facilitate gene discovery: Topics by Science.gov

Sample records for facilitate gene discovery

GWATCH: a web platform for automated gene association discovery analysis.

PubMed

Svitin, Anton; Malov, Sergey; Cherkasov, Nikolay; Geerts, Paul; Rotkevich, Mikhail; Dobrynin, Pavel; Shevchenko, Andrey; Guan, Li; Troyer, Jennifer; Hendrickson, Sher; Dilks, Holli Hutcheson; Oleksyk, Taras K; Donfield, Sharyne; Gomperts, Edward; Jabs, Douglas A; Sezgin, Efe; Van Natta, Mark; Harrigan, P Richard; Brumme, Zabrina L; O'Brien, Stephen J

2014-01-01

As genome-wide sequence analyses for complex human disease determinants are expanding, it is increasingly necessary to develop strategies to promote discovery and validation of potential disease-gene associations. Here we present a dynamic web-based platform - GWATCH - that automates and facilitates four steps in genetic epidemiological discovery: 1) Rapid gene association search and discovery analysis of large genome-wide datasets; 2) Expanded visual display of gene associations for genome-wide variants (SNPs, indels, CNVs), including Manhattan plots, 2D and 3D snapshots of any gene region, and a dynamic genome browser illustrating gene association chromosomal regions; 3) Real-time validation/replication of candidate or putative genes suggested from other sources, limiting Bonferroni genome-wide association study (GWAS) penalties; 4) Open data release and sharing by eliminating privacy constraints (The National Human Genome Research Institute (NHGRI) Institutional Review Board (IRB), informed consent, The Health Insurance Portability and Accountability Act (HIPAA) of 1996 etc.) on unabridged results, which allows for open access comparative and meta-analysis. GWATCH is suitable for both GWAS and whole genome sequence association datasets. We illustrate the utility of GWATCH with three large genome-wide association studies for HIV-AIDS resistance genes screened in large multicenter cohorts; however, association datasets from any study can be uploaded and analyzed by GWATCH.
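The Bonferroni penalty mentioned in step 3 can be illustrated with a short calculation. The sketch below is not GWATCH code: the function names and the test counts (one million variants, fifty candidate variants) are assumptions chosen only for illustration. It shows why replicating a small candidate set carries a much lighter per-test threshold than a genome-wide scan.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def passes(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if p_value survives a Bonferroni correction for n_tests tests."""
    return p_value <= bonferroni_threshold(alpha, n_tests)


# Hypothetical test counts, not taken from the abstract.
genome_wide = bonferroni_threshold(0.05, 1_000_000)  # whole-genome scan
candidate = bonferroni_threshold(0.05, 50)           # small candidate set

# The same association can fail genome-wide yet pass as a candidate.
p_observed = 1e-4
print(passes(p_observed, 0.05, 1_000_000), passes(p_observed, 0.05, 50))
```

Under these assumed counts, a p-value of 1e-4 would be discarded in the genome-wide scan but retained when only the fifty candidates are tested.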
Function-driven discovery of disease genes in zebrafish using an integrated genomics big data resource.

PubMed

Shim, Hongseok; Kim, Ji Hyun; Kim, Chan Yeong; Hwang, Sohyun; Kim, Hyojin; Yang, Sunmo; Lee, Ji Eun; Lee, Insuk

2016-11-16

Whole exome sequencing (WES) accelerates disease gene discovery using rare genetic variants, but further statistical and functional evidence is required to avoid false discovery. To complement variant-driven disease gene discovery, here we present function-driven disease gene discovery in zebrafish (Danio rerio), a promising human disease model owing to its high anatomical and genomic similarity to humans. To facilitate zebrafish-based function-driven disease gene discovery, we developed a genome-scale co-functional network of zebrafish genes, DanioNet (www.inetbio.org/danionet), which was constructed by Bayesian integration of genomics big data. Rigorous statistical assessment confirmed the high prediction capacity of DanioNet for a wide variety of human diseases. To demonstrate the feasibility of the function-driven disease gene discovery using DanioNet, we predicted genes for ciliopathies and performed experimental validation for eight candidate genes. We also validated the existence of heterozygous rare variants in the candidate genes of individuals with ciliopathies yet not in controls derived from the UK10K consortium, suggesting that these variants are potentially involved in enhancing the risk of ciliopathies. These results showed that integrated genomics big data for a disease model animal can expand our opportunity for harnessing WES data in disease gene discovery. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
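One common way to predict disease genes from a co-functional network is guilt by association: a candidate scores highly when it is strongly linked to genes already known for the disease. The abstract does not state DanioNet's scoring scheme, so the following is only a minimal sketch assuming a sum-of-edge-weights score. The gene names, edge weights, and the function name rank_candidates are hypothetical.

```python
from collections import defaultdict


def rank_candidates(edges, seed_genes):
    """Rank non-seed genes by total edge weight to known (seed) disease genes.

    edges: iterable of (gene_a, gene_b, weight) for an undirected network.
    seed_genes: genes already associated with the disease.
    """
    seeds = set(seed_genes)
    scores = defaultdict(float)
    for gene_a, gene_b, weight in edges:
        # Credit a gene only for links that reach a seed gene.
        if gene_a in seeds and gene_b not in seeds:
            scores[gene_b] += weight
        elif gene_b in seeds and gene_a not in seeds:
            scores[gene_a] += weight
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy network, not DanioNet data.
toy_edges = [
    ("geneA", "geneX", 2.5),
    ("geneB", "geneX", 1.5),
    ("geneA", "geneY", 1.0),
    ("geneY", "geneZ", 3.0),
]
ranking = rank_candidates(toy_edges, ["geneA", "geneB"])
print(ranking)
```

In this toy network geneZ receives no score because it has no direct link to a seed gene; a method that propagates scores through the network would treat it differently.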
Developing integrated crop knowledge networks to advance candidate gene discovery.

PubMed

Hassani-Pak, Keywan; Castellote, Martin; Esch, Maria; Hindle, Matthew; Lysenko, Artem; Taubert, Jan; Rawlings, Christopher

2016-12-01

The chances of raising crop productivity to enhance global food security would be greatly improved if we had a complete understanding of all the biological mechanisms that underpinned traits such as crop yield, disease resistance or nutrient and water use efficiency. With more crop genomes emerging all the time, we are nearer having the basic information, at the gene-level, to begin assembling crop gene catalogues and using data from other plant species to understand how the genes function and how their interactions govern crop development and physiology. Unfortunately, the task of creating such a complete knowledge base of gene functions, interaction networks and trait biology is technically challenging because the relevant data are dispersed in myriad databases in a variety of data formats with variable quality and coverage. In this paper we present a general approach for building genome-scale knowledge networks that provide a unified representation of heterogeneous but interconnected datasets to enable effective knowledge mining and gene discovery. We describe the datasets and outline the methods, workflows and tools that we have developed for creating and visualising these networks for the major crop species, wheat and barley. We present the global characteristics of such knowledge networks and with an example linking a seed size phenotype to a barley WRKY transcription factor orthologous to TTG2 from Arabidopsis, we illustrate the value of integrated data in biological knowledge discovery. The software we have developed (www.ondex.org) and the knowledge resources (http://knetminer.rothamsted.ac.uk) we have created are all open-source and provide a first step towards systematic and evidence-based gene discovery in order to facilitate crop improvement.
Knowledge Discovery in Biological Databases for Revealing Candidate Genes Linked to Complex Phenotypes.

PubMed

Hassani-Pak, Keywan; Rawlings, Christopher

2017-06-13

Genetics and "omics" studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.
Metagenomics and novel gene discovery

PubMed Central

Culligan, Eamonn P; Sleator, Roy D; Marchesi, Julian R; Hill, Colin

2014-01-01

Metagenomics provides a means of assessing the total genetic pool of all the microbes in a particular environment, in a culture-independent manner. It has revealed unprecedented diversity in microbial community composition, which is further reflected in the encoded functional diversity of the genomes, a large proportion of which consists of novel genes. Herein, we review both sequence-based and functional metagenomic methods to uncover novel genes and outline some of the associated problems of each type of approach, as well as potential solutions. Furthermore, we discuss the potential for metagenomic biotherapeutic discovery, with a particular focus on the human gut microbiome and finally, we outline how the discovery of novel genes may be used to create bioengineered probiotics. PMID:24317337
Genome Neighborhood Network Reveals Insights into Enediyne Biosynthesis and Facilitates Prediction and Prioritization for Discovery

PubMed Central

Rudolf, Jeffrey D.; Yan, Xiaohui; Shen, Ben

2015-01-01

The enediynes are one of the most fascinating families of bacterial natural products given their unprecedented molecular architecture and extraordinary cytotoxicity. Enediynes are rare with only 11 structurally characterized members and four additional members isolated in their cycloaromatized form. Recent advances in DNA sequencing have resulted in an explosion of microbial genomes. A virtual survey of the GenBank and JGI genome databases revealed 87 enediyne biosynthetic gene clusters from 78 bacteria strains, implying enediynes are more common than previously thought. Here we report the construction and analysis of an enediyne genome neighborhood network (GNN) as a high-throughput approach to analyze secondary metabolite gene clusters. Analysis of the enediyne GNN facilitated rapid gene cluster annotation, revealed genetic trends in enediyne biosynthetic gene clusters resulting in a simple prediction scheme to determine 9- vs 10-membered enediyne gene clusters, and supported a genomic-based strain prioritization method for enediyne discovery. PMID:26318027
Repurposed transcriptomic data facilitate discovery of innate immunity toll-like receptor (TLR) Genes across Lophotrochozoa.

PubMed

Halanych, Kenneth M; Kocot, Kevin M

2014-10-01

The growing volume of genomic data from across life represents opportunities for deriving valuable biological information from data that were initially collected for another purpose. Here, we use transcriptomes collected for phylogenomic studies to search for toll-like receptor (TLR) genes in poorly sampled lophotrochozoan clades (Annelida, Mollusca, Brachiopoda, Phoronida, and Entoprocta) and one ecdysozoan clade (Priapulida). TLR genes are involved in innate immunity across animals by recognizing potential microbial infection. They have an extracellular leucine-rich repeat (LRR) domain connected to a transmembrane domain and an intracellular toll/interleukin-1 receptor (TIR) domain. Consequently, these genes are important in initiating a signaling pathway to trigger defense. We found at least one TLR ortholog in all but two taxa examined, suggesting that a broad array of lophotrochozoans may have innate immune systems similar to those observed in vertebrates and arthropods. Comparison to the SMART database confirmed the presence of both the LRR and the TIR protein motifs characteristic of TLR genes. Because we looked at only one transcriptome per species, discovery of TLR genes was limited for most taxa. However, several TLR-like genes that vary in the number and placement of LRR domains were found in phoronids. Additionally, several contigs contained LRR domains but lacked TIR domains, suggesting they were not TLRs. Many of these LRR-containing contigs had other domains (e.g., immunoglobulin) and are likely involved in innate immunity. © 2014 Marine Biological Laboratory.
Global Landscape of a Co-Expressed Gene Network in Barley and its Application to Gene Discovery in Triticeae Crops

PubMed Central

Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo

2011-01-01

Accumulated transcriptome data can be used to investigate regulatory networks of genes involved in various biological systems. Co-expression analysis data sets generated from comprehensively collected transcriptome data sets now represent efficient resources that are capable of facilitating the discovery of genes with closely correlated expression patterns. In order to construct a co-expression network for barley, we analyzed 45 publicly available experimental series, which are composed of 1,347 sets of GeneChip data for barley. On the basis of a gene-to-gene weighted correlation coefficient, we constructed a global barley co-expression network and classified it into clusters of subnetwork modules. The resulting clusters are candidates for functional regulatory modules in the barley transcriptome. To annotate each of the modules, we performed comparative annotation using genes in Arabidopsis and Brachypodium distachyon. On the basis of a comparative analysis between barley and two model species, we investigated functional properties from the representative distributions of the gene ontology (GO) terms. Modules putatively involved in drought stress response and cellulose biogenesis have been identified. These modules are discussed to demonstrate the effectiveness of the co-expression analysis. Furthermore, we applied the data set of co-expressed genes coupled with comparative analysis in attempts to discover potentially Triticeae-specific network modules. These results demonstrate that analysis of the co-expression network of the barley transcriptome together with comparative analysis should promote the process of gene discovery in barley. Furthermore, the insights obtained should be transferable to investigations of Triticeae plants. The associated data set generated in this analysis is publicly accessible at http://coexpression.psc.riken.jp/barley/. PMID:21441235
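The network construction described above can be sketched in a few lines. This is not the authors' pipeline: it swaps in a plain Pearson correlation for their gene-to-gene weighted correlation coefficient, treats connected components as a stand-in for their subnetwork modules, and uses a hypothetical threshold of 0.9 with invented expression profiles.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.9):
    """Link every gene pair whose correlation reaches the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges


def modules(profiles, edges):
    """Group genes into connected components using union-find."""
    parent = {g: g for g in profiles}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path compression
            g = parent[g]
        return g

    for g1, g2, _ in edges:
        parent[find(g1)] = find(g2)
    groups = {}
    for g in profiles:
        groups.setdefault(find(g), set()).add(g)
    return sorted((sorted(m) for m in groups.values()), key=lambda m: m[0])


# Invented expression values across four samples.
toy_profiles = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [2.0, 4.1, 6.0, 8.2],
    "gene3": [4.0, 3.0, 2.0, 1.0],
    "gene4": [8.0, 6.1, 4.0, 2.1],
}
toy_edges = coexpression_edges(toy_profiles)
toy_modules = modules(toy_profiles, toy_edges)
print(toy_modules)
```

With these invented profiles, the two rising genes form one module and the two falling genes form another, because only positive correlations at or above the threshold are linked.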
Copper homeostasis gene discovery in Drosophila melanogaster.

PubMed

Norgate, Melanie; Southon, Adam; Zou, Sige; Zhan, Ming; Sun, Yu; Batterham, Phil; Camakaris, James

2007-06-01

Recent studies have shown a high level of conservation between Drosophila melanogaster and mammalian copper homeostasis mechanisms. These studies have also demonstrated the efficiency with which this species can be used to characterize novel genes, at both the cellular and whole organism level. As a versatile and inexpensive model organism, Drosophila is also particularly useful for gene discovery applications and thus has the potential to be extremely useful in identifying novel copper homeostasis genes and putative disease genes. In order to assess the suitability of Drosophila for this purpose, three screening approaches have been investigated. These include an analysis of the global transcriptional response to copper in both adult flies and an embryonic cell line using DNA microarray analysis. Two mutagenesis-based screens were also utilized. Several candidate copper homeostasis genes have been identified through this work. In addition, the results of each screen were carefully analyzed to identify any factors influencing efficiency and sensitivity. These are discussed here with the aim of maximizing the efficiency of future screens and the most suitable approaches are outlined. Building on this information, there is great potential for the further use of Drosophila for copper homeostasis gene discovery.
IDENTIFYING TOXIC LEADERSHIP BEHAVIORS AND TOOLS TO FACILITATE THEIR DISCOVERY

DTIC Science & Technology

2016-01-31

AIR WAR COLLEGE AIR UNIVERSITY IDENTIFYING TOXIC LEADERSHIP BEHAVIORS AND TOOLS TO FACILITATE THEIR DISCOVERY by Michael Boger, Lt Col...released investigations for specific, observable traits relating to toxic behavior . 3) Discuss indicators and concerns in steps one and two with...subordinates, which will aid in validating the specific observable behaviors from the lenses of each of these positions. The application of their input
Molecular Networking and Pattern-Based Genome Mining Improves Discovery of Biosynthetic Gene Clusters and their Products from Salinispora Species

DOE Office of Scientific and Technical Information (OSTI.GOV)

Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna

Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. In this paper, we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains, including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. Finally, these efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches.
Molecular Networking and Pattern-Based Genome Mining Improves Discovery of Biosynthetic Gene Clusters and their Products from Salinispora Species

DOE PAGES

Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna; ...

2015-04-09

Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. In this paper, we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains, including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. Finally, these efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches.
Molecular Networking and Pattern-Based Genome Mining Improves discovery of biosynthetic gene clusters and their products from Salinispora species

PubMed Central

Duncan, Katherine R.; Crüsemann, Max; Lechner, Anna; Sarkar, Anindita; Li, Jie; Ziemert, Nadine; Wang, Mingxun; Bandeira, Nuno; Moore, Bradley S.; Dorrestein, Pieter C.; Jensen, Paul R.

2015-01-01

Summary: Genome sequencing has revealed that bacteria contain many more biosynthetic gene clusters than predicted based on the number of secondary metabolites discovered to date. While this biosynthetic reservoir has fostered interest in new tools for natural product discovery, there remains a gap between gene cluster detection and compound discovery. Here we apply molecular networking and the new concept of pattern-based genome mining to 35 Salinispora strains including 30 for which draft genome sequences were either available or obtained for this study. The results provide a method to simultaneously compare large numbers of complex microbial extracts, which facilitated the identification of media components, known compounds and their derivatives, and new compounds that could be prioritized for structure elucidation. These efforts revealed considerable metabolite diversity and led to several molecular family-gene cluster pairings, of which the quinomycin-type depsipeptide retimycin A was characterized and linked to gene cluster NRPS40 using pattern-based bioinformatic approaches. PMID:25865308
Automated Discovery of Long Intergenic RNAs Associated with Breast Cancer Progression

DTIC Science & Technology

2012-02-01

manuscript in preparation), (2) development and publication of an algorithm for detecting gene fusions in RNA-Seq data [1], and (3) discovery of outlier long...subjected to de novo assembly algorithms to discover novel transcripts representing either unannotated genes or novel somatic mutations such as gene...fusions. To this end the P.I. developed and published a novel algorithm called ChimeraScan to facilitate the discovery and validation of gene
Systems Pharmacology-Based Discovery of Natural Products for Precision Oncology Through Targeting Cancer Mutated Genes.

PubMed

Fang, J; Cai, C; Wang, Q; Lin, P; Zhao, Z; Cheng, F

2017-03-01

Massive cancer genomics data have facilitated the rapid revolution of a novel oncology drug discovery paradigm through targeting clinically relevant driver genes or mutations for the development of precision oncology. Natural products with polypharmacological profiles have been demonstrated as promising agents for the development of novel cancer therapies. In this study, we developed an integrated systems pharmacology framework that facilitated identifying potential natural products that target mutated genes across 15 cancer types or subtypes in the realm of precision medicine. High performance was achieved for our systems pharmacology framework. In case studies, we computationally identified novel anticancer indications for several US Food and Drug Administration-approved or clinically investigational natural products (e.g., resveratrol, quercetin, genistein, and fisetin) through targeting significantly mutated genes in multiple cancer types. In summary, this study provides a powerful tool for the development of molecularly targeted cancer therapies through targeting the clinically actionable alterations by exploiting the systems pharmacology of natural products. © 2017 The Authors CPT: Pharmacometrics & Systems Pharmacology published by Wiley Periodicals, Inc. on behalf of American Society for Clinical Pharmacology and Therapeutics.
SSHscreen and SSHdb, generic software for microarray based gene discovery: application to the stress response in cowpea

PubMed Central

2010-01-01

Background: Suppression subtractive hybridization is a popular technique for gene discovery from non-model organisms without an annotated genome sequence, such as cowpea (Vigna unguiculata (L.) Walp). We aimed to use this method to enrich for genes expressed during drought stress in a drought tolerant cowpea line. However, current methods were inefficient in screening libraries and management of the sequence data, and thus there was a need to develop software tools to facilitate the process. Results: Forward and reverse cDNA libraries enriched for cowpea drought response genes were screened on microarrays, and the R software package SSHscreen 2.0.1 was developed (i) to normalize the data effectively using spike-in control spot normalization, and (ii) to select clones for sequencing based on the calculation of enrichment ratios with associated statistics. Enrichment ratio 3 values for each clone showed that 62% of the forward library and 34% of the reverse library clones were significantly differentially expressed by drought stress (adjusted p value < 0.05). Enrichment ratio 2 calculations showed that > 88% of the clones in both libraries were derived from rare transcripts in the original tester samples, thus supporting the notion that suppression subtractive hybridization enriches for rare transcripts. A set of 118 clones were chosen for sequencing, and drought-induced cowpea genes were identified, the most interesting encoding a late embryogenesis abundant Lea5 protein, a glutathione S-transferase, a thaumatin, a universal stress protein, and a wound induced protein. A lipid transfer protein and several components of photosynthesis were down-regulated by the drought stress. Reverse transcriptase quantitative PCR confirmed the enrichment ratio values for the selected cowpea genes. SSHdb, a web-accessible database, was developed to manage the clone sequences and combine the SSHscreen data with sequence annotations derived from BLAST and Blast2GO. The self
A brief history of Alzheimer's disease gene discovery.

PubMed

Tanzi, Rudolph E

2013-01-01

The rich and colorful history of gene discovery in Alzheimer's disease (AD) over the past three decades is as complex and heterogeneous as the disease itself. Twin and family studies indicate that genetic factors are estimated to play a role in at least 80% of AD cases. The inheritance of AD exhibits a dichotomous pattern. On one hand, rare mutations in APP, PSEN1, and PSEN2 are fully penetrant for early-onset (<60 years) familial AD, which represents <5% of AD. On the other hand, common gene polymorphisms, such as the ε4 and ε2 variants of the APOE gene, influence susceptibility for common (>95%) late-onset AD. These four genes account for 30-50% of the inheritability of AD. Genome-wide association studies have recently led to the identification of additional highly confirmed AD candidate genes. Here, I review the past, present, and future of attempts to elucidate the complex and heterogeneous genetic underpinnings of AD along with some of the unique events that made these discoveries possible.
Discovery and validation of a glioblastoma co-expressed gene module

PubMed Central

Dunwoodie, Leland J.; Poehlman, William L.; Ficklin, Stephen P.; Feltus, Frank Alexander

2018-01-01

Tumors exhibit complex patterns of aberrant gene expression. Using a knowledge-independent, noise-reducing gene co-expression network construction software called KINC, we created multiple RNAseq-based gene co-expression networks relevant to brain and glioblastoma biology. In this report, we describe the discovery and validation of a glioblastoma-specific gene module that contains 22 co-expressed genes. The genes are upregulated in glioblastoma relative to normal brain and lower grade glioma samples; they are also hypo-methylated in glioblastoma relative to lower grade glioma tumors. Among the proneural, neural, mesenchymal, and classical glioblastoma subtypes, these genes are most-highly expressed in the mesenchymal subtype. Furthermore, high expression of these genes is associated with decreased survival across each glioblastoma subtype. These genes are of interest to glioblastoma biology and our gene interaction discovery and validation workflow can be used to discover and validate co-expressed gene modules derived from any co-expression network. PMID:29541392
Discovery and validation of a glioblastoma co-expressed gene module.

PubMed

Dunwoodie, Leland J; Poehlman, William L; Ficklin, Stephen P; Feltus, Frank Alexander

2018-02-16

Tumors exhibit complex patterns of aberrant gene expression. Using a knowledge-independent, noise-reducing gene co-expression network construction software called KINC, we created multiple RNAseq-based gene co-expression networks relevant to brain and glioblastoma biology. In this report, we describe the discovery and validation of a glioblastoma-specific gene module that contains 22 co-expressed genes. The genes are upregulated in glioblastoma relative to normal brain and lower grade glioma samples; they are also hypo-methylated in glioblastoma relative to lower grade glioma tumors. Among the proneural, neural, mesenchymal, and classical glioblastoma subtypes, these genes are most-highly expressed in the mesenchymal subtype. Furthermore, high expression of these genes is associated with decreased survival across each glioblastoma subtype. These genes are of interest to glioblastoma biology and our gene interaction discovery and validation workflow can be used to discover and validate co-expressed gene modules derived from any co-expression network.
Standardized Plant Disease Evaluations will Enhance Resistance Gene Discovery

USDA-ARS's Scientific Manuscript database

Gene discovery and marker development using DNA based tools require plant populations with well-documented phenotypes. Related crops such as apples and pears may share a number of genes, for example resistance to common diseases, and data mining in one crop may reveal genes for the other. However, u...

Interdisciplinary Laboratory Course Facilitating Knowledge Integration, Mutualistic Teaming, and Original Discovery.

PubMed

Full, Robert J; Dudley, Robert; Koehl, M A R; Libby, Thomas; Schwab, Cheryl

2015-11-01

Experiencing the thrill of an original scientific discovery can be transformative to students unsure about becoming a scientist, yet few courses offer authentic research experiences. Increasingly, cutting-edge discoveries require an interdisciplinary approach not offered in current departmental-based courses. Here, we describe a one-semester, learning laboratory course on organismal biomechanics offered at our large research university that enables interdisciplinary teams of students from biology and engineering to grow intellectually, collaborate effectively, and make original discoveries. To attain this goal, we avoid traditional "cookbook" laboratories by training 20 students to use a dozen research stations. Teams of five students rotate to a new station each week where a professor, graduate student, and/or team member assists in the use of equipment, guides students through stages of critical thinking, encourages interdisciplinary collaboration, and moves them toward authentic discovery. Weekly discussion sections that involve the entire class offer exchange of discipline-specific knowledge, advice on experimental design, methods of collecting and analyzing data, a statistics primer, and best practices for writing and presenting scientific papers. The building of skills in concert with weekly guided inquiry facilitates original discovery via a final research project that can be presented at a national meeting or published in a scientific journal. © The Author 2015. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology. All rights reserved. For permissions please email: journals.permissions@oup.com.
Discovery of error-tolerant biclusters from noisy gene expression data.

PubMed

Gupta, Rohit; Rao, Navneet; Kumar, Vipin

2011-11-24

An important analysis performed on microarray gene-expression data is to discover biclusters, which denote groups of genes that are coherently expressed for a subset of conditions. Various biclustering algorithms have been proposed to find different types of biclusters from these real-valued gene-expression data sets. However, these algorithms suffer from several limitations such as inability to explicitly handle errors/noise in the data; difficulty in discovering small biclusters due to their top-down approach; inability of some of the approaches to find overlapping biclusters, which is crucial as many genes participate in multiple biological processes. Association pattern mining algorithms also produce biclusters as their result and can naturally address some of these limitations. However, traditional association mining only finds exact biclusters, which limits its applicability in real-life data sets where the biclusters may be fragmented due to random noise/errors. Moreover, as they only work with binary or boolean attributes, their application on gene-expression data requires transforming real-valued attributes to binary attributes, which often results in loss of information. Many past approaches have tried to address the issue of noise and handling real-valued attributes independently but there is no systematic approach that addresses both of these issues together. In this paper, we first propose a novel error-tolerant biclustering model, 'ET-bicluster', and then propose a bottom-up heuristic-based mining algorithm to sequentially discover error-tolerant biclusters directly from real-valued gene-expression data. The efficacy of our proposed approach is illustrated by comparing it with a recent approach RAP in the context of two biological problems: discovery of functional modules and discovery of biomarkers. For the first problem, two real-valued S. cerevisiae microarray gene-expression data sets are used to demonstrate that the biclusters obtained from ET
Phenotypic mutant library: potential for gene discovery

USDA-ARS's Scientific Manuscript database

The rapid development of high throughput and affordable Next- Generation Sequencing (NGS) techniques has renewed interest in gene discovery using forward genetics. The conventional forward genetic approach starts with isolation of mutants with a phenotype of interest, mapping the mutation within a s...
Biomarker discovery for colon cancer using a 761 gene RT-PCR assay.

PubMed

Clark-Langone, Kim M; Wu, Jenny Y; Sangli, Chithra; Chen, Angela; Snable, James L; Nguyen, Anhthu; Hackett, James R; Baker, Joffre; Yothers, Greg; Kim, Chungyeul; Cronin, Maureen T

2007-08-15

Reverse transcription PCR (RT-PCR) is widely recognized to be the gold standard method for quantifying gene expression. Studies using RT-PCR technology as a discovery tool have historically been limited to relatively small gene sets compared to other gene expression platforms such as microarrays. We have recently shown that TaqMan RT-PCR can be scaled up to profile expression for 192 genes in fixed paraffin-embedded (FPE) clinical study tumor specimens. This technology has also been used to develop and commercialize a widely used clinical test for breast cancer prognosis and prediction, the Oncotype DX assay. A similar need exists in colon cancer for a test that provides information on the likelihood of disease recurrence (prognosis) and the likelihood of tumor response to standard chemotherapy regimens (prediction). We have now scaled our RT-PCR assay to efficiently screen 761 biomarkers across hundreds of patient samples and applied this process to biomarker discovery in colon cancer. This screening strategy remains attractive due to the inherent advantages of maintaining platform consistency from discovery through clinical application. RNA was extracted from formalin-fixed paraffin-embedded (FPE) tissue, as old as 28 years, from 354 patients enrolled in NSABP C-01 and C-02 colon cancer studies. Multiplexed reverse transcription reactions were performed using a gene-specific primer pool containing 761 unique primers. PCR was performed as independent TaqMan reactions for each candidate gene. Hierarchical clustering demonstrates that genes expected to co-express form obvious, distinct and in certain cases very tightly correlated clusters, validating the reliability of this technical approach to biomarker discovery. We have developed a high-throughput, quantitatively precise multi-analyte gene expression platform for biomarker discovery that approaches low density DNA arrays in numbers of genes analyzed while maintaining the high specificity
An Evaluation of Active Learning Causal Discovery Methods for Reverse-Engineering Local Causal Pathways of Gene Regulation

PubMed Central

Ma, Sisi; Kemmeren, Patrick; Aliferis, Constantin F.; Statnikov, Alexander

2016-01-01

Reverse-engineering of causal pathways that implicate diseases and vital cellular functions is a fundamental problem in biomedicine. Discovery of the local causal pathway of a target variable (that consists of its direct causes and direct effects) is essential for effective intervention and can facilitate accurate diagnosis and prognosis. Recent research has provided several active learning methods that can leverage passively observed high-throughput data to draft causal pathways and then refine the inferred relations with a limited number of experiments. The current study provides a comprehensive evaluation of the performance of active learning methods for local causal pathway discovery in real biological data. Specifically, 54 active learning methods/variants from 3 families of algorithms were applied for local causal pathways reconstruction of gene regulation for 5 transcription factors in S. cerevisiae. Four aspects of the methods’ performance were assessed, including adjacency discovery quality, edge orientation accuracy, complete pathway discovery quality, and experimental cost. The results of this study show that some methods provide significant performance benefits over others and therefore should be routinely used for local causal pathway discovery tasks. This study also demonstrates the feasibility of local causal pathway reconstruction in real biological systems with significant quality and low experimental cost. PMID:26939894
Discovery of Cationic Polymers for Non-viral Gene Delivery using Combinatorial Approaches

PubMed Central

Barua, Sutapa; Ramos, James; Potta, Thrimoorthy; Taylor, David; Huang, Huang-Chiao; Montanez, Gabriela; Rege, Kaushal

2015-01-01

Gene therapy is an attractive treatment option for diseases of genetic origin, including several cancers and cardiovascular diseases. While viruses are effective vectors for delivering exogenous genes to cells, concerns related to insertional mutagenesis, immunogenicity, lack of tropism, decay and high production costs necessitate the discovery of non-viral methods. Significant efforts have been focused on cationic polymers as non-viral alternatives for gene delivery. Recent studies have employed combinatorial syntheses and parallel screening methods for enhancing the efficacy of gene delivery, biocompatibility of the delivery vehicle, and overcoming cellular level barriers as they relate to polymer-mediated transgene uptake, transport, transcription, and expression. This review summarizes and discusses recent advances in combinatorial syntheses and parallel screening of cationic polymer libraries for the discovery of efficient and safe gene delivery systems. PMID:21843141
Gene Discovery of Characteristic Metabolic Pathways in the Tea Plant (Camellia sinensis) Using ‘Omics’-Based Network Approaches: A Future Perspective

PubMed Central

Zhang, Shihua; Zhang, Liang; Tai, Yuling; Wang, Xuewen; Ho, Chi-Tang; Wan, Xiaochun

2018-01-01

Characteristic secondary metabolites, including flavonoids, theanine and caffeine, in the tea plant (Camellia sinensis) are the primary sources of the rich flavors, fresh taste, and health benefits of tea. The decoding of genes involved in these characteristic components is still significantly lagging, which poses an obstacle to applied genetic improvement and metabolic engineering. With the popularity of high-throughput transcriptomics and metabolomics, ‘omics’-based network approaches, such as gene co-expression networks and gene-to-metabolite networks, have emerged as powerful tools for gene discovery of plant-specialized (secondary) metabolism. Thus, it is pivotal to summarize and introduce such system-based strategies in facilitating gene identification of characteristic metabolic pathways in the tea plant (or other plants). In this review, we describe recent advances in transcriptomics and metabolomics for transcript and metabolite profiling, and highlight ‘omics’-based network strategies using successful examples in model and non-model plants. Further, we summarize recent progress in ‘omics’ analysis for gene identification of characteristic metabolites in the tea plant. Limitations of the current strategies are discussed by comparison with ‘omics’-based network approaches. Finally, we demonstrate the potential of introducing such network strategies in the tea plant, and close with prospects for network-based discovery of characteristic metabolite genes in the tea plant. PMID:29915604
Cancer gene discovery: exploiting insertional mutagenesis

PubMed Central

Ranzani, Marco; Annunziato, Stefano; Adams, David J.; Montini, Eugenio

2013-01-01

Insertional mutagenesis has been utilized as a functional forward genetics screen for the identification of novel genes involved in the pathogenesis of human cancers. Different insertional mutagens have been successfully used to reveal new cancer genes. For example, retroviruses (RVs) are integrating viruses with the capacity to induce the deregulation of genes in the neighborhood of the insertion site. RVs have been employed for more than 30 years to identify cancer genes in the hematopoietic system and mammary gland. Similarly, another tool that has revolutionized cancer gene discovery is the cut-and-paste transposons. These DNA elements have been engineered to contain strong promoters and stop cassettes that may function to perturb gene expression upon integration proximal to genes. In addition, complex mouse models characterized by tissue-restricted activity of transposons have been developed to identify oncogenes and tumor suppressor genes that control the development of a wide range of solid tumor types, extending beyond those tissues accessible using RV-based approaches. Most recently, lentiviral vectors (LVs) have appeared on the scene for use in cancer gene screens. LVs are replication defective integrating vectors that have the advantage of being able to infect non-dividing cells, in a wide range of cell types and tissues. In this review, we describe the various insertional mutagens focusing on their advantages/limitations and we discuss the new and promising tools that will improve the insertional mutagenesis screens of the future. PMID:23928056
Genes@Work: an efficient algorithm for pattern discovery and multivariate feature selection in gene expression data.

PubMed

Lepre, Jorge; Rice, J Jeremy; Tu, Yuhai; Stolovitzky, Gustavo

2004-05-01

Despite the growing literature devoted to finding differentially expressed genes in assays probing different tissue types, little attention has been paid to the combinatorial nature of feature selection inherent to large, high-dimensional gene expression datasets. New flexible data analysis approaches capable of searching relevant subgroups of genes and experiments are needed to understand multivariate associations of gene expression patterns with observed phenotypes. We present in detail a deterministic algorithm to discover patterns of multivariate gene associations in gene expression data. The patterns discovered are differential with respect to a control dataset. The algorithm is exhaustive and efficient, reporting all existing patterns that fit a given input parameter set while avoiding enumeration of the entire pattern space. The value of the pattern discovery approach is demonstrated by finding a set of genes that differentiate between two types of lymphoma. Moreover, these genes are found to behave consistently in an independent dataset produced in a different laboratory using different arrays, thus validating the genes selected using our algorithm. We show that the genes deemed significant in terms of their multivariate statistics will be missed using other methods. Our set of pattern discovery algorithms, including a user interface, is distributed as a package called Genes@Work. This package is freely available to non-commercial users and can be downloaded from our website (http://www.research.ibm.com/FunGen).
Comprehensive Clinical Phenotyping and Genetic Mapping for the Discovery of Autism Susceptibility Genes

DTIC Science & Technology

2013-03-14

Autism is an extremely common and heterogeneous neurodevelopmental disorder. While genetic factors are known to play... (Report number: AFRL-SA-WP-TR-2013-0013.)
Nearing saturation of cancer driver gene discovery.

PubMed

Hsiehchen, David; Hsieh, Antony

2018-06-15

Extensive sequencing efforts of cancer genomes such as The Cancer Genome Atlas (TCGA) have been undertaken to uncover bona fide cancer driver genes, which has enhanced our understanding of cancer and revealed therapeutic targets. However, the number of driver gene mutations is bounded, indicating that there must be a point when further sequencing efforts will be excessive. We found that there was a significant positive correlation between sample size and identified driver gene mutations across 33 cancers sequenced by the TCGA, which is expected if additional sequencing is still leading to the identification of more driver genes. However, the rate of new cancer driver genes being discovered with larger samples is declining rapidly. Our analysis provides a general guide for determining which cancer types would likely benefit from additional sequencing efforts, particularly those with relatively high rates of cancer driver gene discovery. Our results argue that past strategies of indiscriminately sequencing as many specimens as possible for all cancer types are becoming inefficient. In addition, without significant investments into applying our knowledge of cancer genomes, we risk sequencing more cancer genomes for the sake of sequencing rather than meaningful patient benefit.
Standardized plant disease evaluations will enhance resistance gene discovery

USDA-ARS's Scientific Manuscript database

Gene discovery and marker development using DNA-based tools require plant populations with well documented phenotypes. If dissimilar phenotype evaluation methods or data scoring techniques are employed with different crops, or at different labs for the same crops, then data mining for genetic marker...
Literature Mining for the Discovery of Hidden Connections between Drugs, Genes and Diseases

PubMed Central

Frijters, Raoul; van Vugt, Marianne; Smeets, Ruben; van Schaik, René; de Vlieg, Jacob; Alkema, Wynand

2010-01-01

The scientific literature represents a rich source for retrieval of knowledge on associations between biomedical concepts such as genes, diseases and cellular processes. A commonly used method to establish relationships between biomedical concepts from literature is co-occurrence. Apart from its use in knowledge retrieval, the co-occurrence method is also well-suited to discover new, hidden relationships between biomedical concepts following a simple ABC-principle, in which A and C have no direct relationship, but are connected via shared B-intermediates. In this paper we describe CoPub Discovery, a tool that mines the literature for new relationships between biomedical concepts. Statistical analysis using ROC curves showed that CoPub Discovery performed well over a wide range of settings and keyword thesauri. We subsequently used CoPub Discovery to search for new relationships between genes, drugs, pathways and diseases. Several of the newly found relationships were validated using independent literature sources. In addition, new predicted relationships between compounds and cell proliferation were validated and confirmed experimentally in an in vitro cell proliferation assay. The results show that CoPub Discovery is able to identify novel associations between genes, drugs, pathways and diseases that have a high probability of being biologically valid. This makes CoPub Discovery a useful tool to unravel the mechanisms behind disease, to find novel drug targets, or to find novel applications for existing drugs. PMID:20885778
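The ABC-principle described in the abstract above can be illustrated with a minimal sketch. This is not the CoPub Discovery implementation: the concept names, the toy "abstracts", and the functions `build_cooccurrence` and `hidden_links` are all hypothetical, and no statistical scoring of the links is attempted.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(documents):
    """Map each concept to the set of concepts it shares a document with."""
    neighbours = defaultdict(set)
    for concepts in documents:
        for a, b in combinations(sorted(set(concepts)), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def hidden_links(neighbours, a):
    """Return {C: sorted B-intermediates} for concepts C never co-mentioned with A."""
    direct = neighbours.get(a, set())
    candidates = defaultdict(set)
    for b in direct:
        for c in neighbours[b]:
            # Keep C only if it is not A itself and has no direct link to A.
            if c != a and c not in direct:
                candidates[c].add(b)
    return {c: sorted(bs) for c, bs in candidates.items()}


# Invented toy "abstracts", each reduced to the set of concepts it mentions.
docs = [
    {"drug_X", "gene_1"},
    {"drug_X", "gene_2"},
    {"gene_1", "disease_Y"},
    {"gene_2", "disease_Y"},
    {"gene_2", "disease_Z"},
]
links = hidden_links(build_cooccurrence(docs), "drug_X")
print(links)
```

In this toy data, drug_X never appears with either disease, yet both are reachable through shared gene intermediates; a real tool would additionally weight each link, for example by how often the co-occurrences are observed.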
Identification of differentially expressed genes and false discovery rate in microarray studies.

PubMed

Gusnanto, Arief; Calza, Stefano; Pawitan, Yudi

2007-04-01

To highlight the development in microarray data analysis for the identification of differentially expressed genes, particularly via control of false discovery rate. The emergence of high-throughput technology such as microarrays raises two fundamental statistical issues: multiplicity and sensitivity. We focus on the biological problem of identifying differentially expressed genes. First, multiplicity arises due to testing tens of thousands of hypotheses, rendering the standard P value meaningless. Second, known optimal single-test procedures such as the t-test perform poorly in the context of highly multiple tests. The standard approach of dealing with multiplicity is too conservative in the microarray context. The false discovery rate concept is fast becoming the key statistical assessment tool replacing the P value. We review the false discovery rate approach and argue that it is more sensible for microarray data. We also discuss some methods to take into account additional information from the microarrays to improve the false discovery rate. There is growing consensus on how to analyse microarray data using the false discovery rate framework in place of the classical P value. Further research is needed on the preprocessing of the raw data, such as the normalization step and filtering, and on finding the most sensitive test procedure.
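As a rough illustration of why false discovery rate control is less conservative than family-wise correction, the sketch below compares the Benjamini-Hochberg step-up procedure with a Bonferroni correction. Both are standard procedures chosen here as assumptions; the abstract above does not say which specific methods the review covers, and the p-values are invented.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking hypotheses rejected at FDR level q (BH step-up)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k whose sorted p-value satisfies p_(k) <= (k / m) * q.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


def bonferroni(p_values, alpha=0.05):
    """Return booleans marking hypotheses rejected under Bonferroni correction."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


# Invented p-values standing in for per-gene tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print("BH rejections:        ", sum(benjamini_hochberg(p)))
print("Bonferroni rejections:", sum(bonferroni(p)))
```

On these ten made-up p-values the BH procedure keeps two genes where Bonferroni keeps one; with tens of thousands of tests the gap is typically much larger.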
Lung tumor diagnosis and subtype discovery by gene expression profiling.

PubMed

Wang, Lu-yong; Tu, Zhuowen

2006-01-01

The optimal treatment of patients with complex diseases, such as cancers, depends on accurate diagnosis using a combination of clinical and histopathological data. In many scenarios, this becomes tremendously difficult because of the limitations in clinical presentation and histopathology. To accurately diagnose complex diseases, molecular classification based on gene or protein expression profiles is indispensable for modern medicine. Moreover, many heterogeneous diseases consist of various potential subtypes on a molecular basis and differ remarkably in their response to therapies. It is critical to accurately predict subgroups from disease gene expression profiles. More fundamental knowledge of the molecular basis and classification of disease could aid in the prediction of patient outcome, the informed selection of therapies, and identification of novel molecular targets for therapy. In this paper, we propose a new disease diagnostic method, the probabilistic boosting tree (PB tree) method, applied to gene expression profiles of lung tumors. It enables accurate disease classification and subtype discovery in disease. It automatically constructs a tree in which each node combines a number of weak classifiers into a strong classifier. Also, subtype discovery is naturally embedded in the learning process. Our algorithm achieves excellent diagnostic performance, and meanwhile it is capable of detecting the disease subtype based on gene expression profile.
iSyTE 2.0: a database for expression-based gene discovery in the eye

PubMed Central

Kakrana, Atul; Yang, Andrian; Anand, Deepti; Djordjevic, Djordje; Ramachandruni, Deepti; Singh, Abhyudai; Huang, Hongzhan

2018-01-01

Although successful in identifying new cataract-linked genes, the previous version of the database iSyTE (integrated Systems Tool for Eye gene discovery) was based on expression information on just three mouse lens stages and was functionally limited to visualization by only UCSC-Genome Browser tracks. To increase its efficacy, here we provide an enhanced iSyTE version 2.0 (URL: http://research.bioinformatics.udel.edu/iSyTE) based on well-curated, comprehensive genome-level lens expression data as a one-stop portal for the effective visualization and analysis of candidate genes in lens development and disease. iSyTE 2.0 includes all publicly available lens Affymetrix and Illumina microarray datasets representing a broad range of embryonic and postnatal stages from wild-type and specific gene-perturbation mouse mutants with eye defects. Further, we developed a new user-friendly web interface for direct access and cogent visualization of the curated expression data, which supports convenient searches and a range of downstream analyses. The utility of these new iSyTE 2.0 features is illustrated through examples of established genes associated with lens development and pathobiology, which serve as tutorials for its application by the end-user. iSyTE 2.0 will facilitate the prioritization of eye development and disease-linked candidate genes in studies involving transcriptomics or next-generation sequencing data, linkage analysis and GWAS approaches. PMID:29036527
STARNET 2: a web-based tool for accelerating discovery of gene regulatory networks using microarray co-expression data

PubMed Central

Jupiter, Daniel; Chen, Hailin; VanBuren, Vincent

2009-01-01

Background Although expression microarrays have become a standard tool used by biologists, analysis of data produced by microarray experiments may still present challenges. Comparison of data from different platforms, organisms, and labs may involve complicated data processing, and inferring relationships between genes remains difficult. Results STARNET 2 is a new web-based tool that allows post hoc visual analysis of correlations that are derived from expression microarray data. STARNET 2 facilitates user discovery of putative gene regulatory networks in a variety of species (human, rat, mouse, chicken, zebrafish, Drosophila, C. elegans, S. cerevisiae, Arabidopsis and rice) by graphing networks of genes that are closely co-expressed across a large heterogeneous set of preselected microarray experiments. For each of the represented organisms, raw microarray data were retrieved from NCBI's Gene Expression Omnibus for a selected Affymetrix platform. All pairwise Pearson correlation coefficients were computed for expression profiles measured on each platform, respectively. These precompiled results were stored in a MySQL database, and supplemented by additional data retrieved from NCBI. A web-based tool allows user-specified queries of the database, centered at a gene of interest. The result of a query includes graphs of correlation networks, graphs of known interactions involving genes and gene products that are present in the correlation networks, and initial statistical analyses. Two analyses may be performed in parallel to compare networks, which is facilitated by the new HEATSEEKER module. Conclusion STARNET 2 is a useful tool for developing new hypotheses about regulatory relationships between genes and gene products, and has coverage for 10 species. Interpretation of the correlation networks is supported with a database of previously documented interactions, a test for enrichment of Gene Ontology terms, and heat maps of correlation distances that may be used to
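The core computation mentioned in the STARNET 2 abstract, all pairwise Pearson correlation coefficients between expression profiles, can be sketched as follows. The gene names, expression values, the 0.9 threshold, and the use of the absolute correlation are assumptions made for illustration; they are not taken from STARNET 2 itself.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Invented expression profiles: one value per microarray experiment.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 2.5],
}
edges = coexpression_edges(profiles)
for edge in edges:
    print(edge)
```

Precomputing and storing such edges, as the abstract describes for its database, trades storage for fast gene-centered queries, since the number of pairs grows quadratically with the number of genes.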
Use of Heuristics to Facilitate Scientific Discovery Learning in a Simulation Learning Environment in a Physics Domain

ERIC Educational Resources Information Center

Veermans, Koen; van Joolingen, Wouter; de Jong, Ton

2006-01-01

This article describes a study into the role of heuristic support in facilitating discovery learning through simulation-based learning. The study compares the use of two such learning environments in the physics domain of collisions. In one learning environment (implicit heuristics) heuristics are only used to provide the learner with guidance…
Watershed and Economic Data InterOperability (WEDO): Facilitating Discovery, Evaluation and Integration through the Sharing of Watershed Modeling Data

EPA Science Inventory

Watershed and Economic Data InterOperability (WEDO) is a system of information technologies designed to publish watershed modeling studies for reuse. WEDO facilitates three aspects of interoperability: discovery, evaluation and integration of data. This increased level of interop...
Turning publicly available gene expression data into discoveries using gene set context analysis.

PubMed

Ji, Zhicheng; Vokes, Steven A; Dang, Chi V; Ji, Hongkai

2016-01-08

Gene Set Context Analysis (GSCA) is an open source software package to help researchers use massive amounts of publicly available gene expression data (PED) to make discoveries. Users can interactively visualize and explore gene and gene set activities in 25,000+ consistently normalized human and mouse gene expression samples representing diverse biological contexts (e.g. different cells, tissues and disease types, etc.). By providing one or multiple genes or gene sets as input and specifying a gene set activity pattern of interest, users can query the expression compendium to systematically identify biological contexts associated with the specified gene set activity pattern. In this way, researchers with new gene sets from their own experiments may discover previously unknown contexts of gene set functions and hence increase the value of their experiments. GSCA has a graphical user interface (GUI). The GUI makes the analysis convenient and customizable. Analysis results can be conveniently exported as publication quality figures and tables. GSCA is available at https://github.com/zji90/GSCA. This software significantly lowers the bar for biomedical investigators to use PED in their daily research for generating and screening hypotheses, which was previously difficult because of the complexity, heterogeneity and size of the data. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

Discovery of agents that eradicate leukemia stem cells using an in silico screen of public gene expression data

PubMed Central

Hassane, Duane C.; Guzman, Monica L.; Corbett, Cheryl; Li, Xiaojie; Abboud, Ramzi; Young, Fay; Liesveld, Jane L.; Carroll, Martin

2008-01-01

Increasing evidence indicates that malignant stem cells are important for the pathogenesis of acute myelogenous leukemia (AML) and represent a reservoir of cells that drive the development of AML and relapse. Therefore, new treatment regimens are necessary to prevent relapse and improve therapeutic outcomes. Previous studies have shown that the sesquiterpene lactone, parthenolide (PTL), ablates bulk, progenitor, and stem AML cells while causing no appreciable toxicity to normal hematopoietic cells. Thus, PTL must evoke cellular responses capable of mediating AML selective cell death. Given recent advances in chemical genomics such as gene expression-based high-throughput screening (GE-HTS) and the Connectivity Map, we hypothesized that the gene expression signature resulting from treatment of primary AML with PTL could be used to search for similar signatures in publicly available gene expression profiles deposited into the Gene Expression Omnibus (GEO). We therefore devised a broad in silico screen of the GEO database using the PTL gene expression signature as a template and discovered 2 new agents, celastrol and 4-hydroxy-2-nonenal, that effectively eradicate AML at the bulk, progenitor, and stem cell level. These findings suggest the use of multicenter collections of high-throughput data to facilitate discovery of leukemia drugs and drug targets. PMID:18305216
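The idea of using one expression signature as a template to search a collection of public profiles can be sketched in a few lines. The scoring rule below (mean change of template-up genes minus mean change of template-down genes) is a simple stand-in chosen for illustration, not the method used in the study; the gene and compound names and all values are invented.

```python
def signature_score(profile, up_genes, down_genes):
    """Mean log fold change of template-up genes minus that of template-down genes.

    Returns None when the profile measures no up genes or no down genes.
    """
    up = [profile[g] for g in up_genes if g in profile]
    down = [profile[g] for g in down_genes if g in profile]
    if not up or not down:
        return None
    return sum(up) / len(up) - sum(down) / len(down)


def rank_profiles(profiles, up_genes, down_genes):
    """Rank profiles from most to least similar to the template signature."""
    scored = []
    for name, profile in profiles.items():
        score = signature_score(profile, up_genes, down_genes)
        if score is not None:
            scored.append((name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical template: genes induced and repressed by the reference treatment.
up_genes = ["G1", "G2"]
down_genes = ["G3", "G4"]

# Hypothetical public profiles: log fold change per gene for each compound.
profiles = {
    "compound_A": {"G1": 2.0, "G2": 1.0, "G3": -1.0, "G4": -2.0},
    "compound_B": {"G1": -1.0, "G2": 0.0, "G3": 1.0, "G4": 0.0},
    "compound_C": {"G1": 0.5, "G2": 0.5, "G3": 0.0, "G4": -0.5},
    "compound_D": {"G5": 1.0},  # shares no genes with the template, so it is skipped
}
ranking = rank_profiles(profiles, up_genes, down_genes)
for name, score in ranking:
    print(name, score)
```

Top-ranked profiles are candidates whose treatment effect resembles the template; any such hit would still need experimental confirmation, as was done for the agents reported in the abstract.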
Perceptual uncertainty facilitates creative discovery

NASA Astrophysics Data System (ADS)

Tseng, Winger Sei-Wo

2018-06-01

In this study, unstructured and ambiguous figures used as visual stimuli were classified as having high, moderate, and low ambiguity and presented to participants. The experiment was designed to explore how the perceptual ambiguity that is inherent within presented visual cues can affect novice and expert designers' visual discovery during design development. A total of 42 participants took part: half were recruited from non-design departments as novices, and the remainder were chosen from design companies and regarded as experts. The participants were tasked with discovering a sub-shape from the presented sketch and using this shape as a cue to design a concept. To this end, two types of sub-shapes were defined: known feature sub-shapes and innovative feature sub-shapes (IFSs). The experimental results provide strong evidence that with an increase in the ambiguity of the visual stimuli, expert designers produce more ideas and IFSs, whereas novice designers produce fewer. The capability of expert designers to exploit visual ambiguity is interesting, and its absence in novice designers suggests that this capability is likely a unique skill gained, at least in part, through professional practice. Our results can be applied in design learning and education to generalize the principles and strategies of visual discovery by expert designers during concept sketching in order to train novice designers in addressing design problems.
SNP discovery in candidate adaptive genes using exon capture in a free-ranging alpine ungulate

USGS Publications Warehouse

Roffler, Gretchen H.; Amish, Stephen J.; Smith, Seth; Cosart, Ted F.; Kardos, Marty; Schwartz, Michael K.; Luikart, Gordon

2016-01-01

Identification of genes underlying genomic signatures of natural selection is key to understanding adaptation to local conditions. We used targeted resequencing to identify SNP markers in 5321 candidate adaptive genes associated with known immunological, metabolic and growth functions in ovids and other ungulates. We selectively targeted 8161 exons in protein-coding and nearby 5′ and 3′ untranslated regions of chosen candidate genes. Targeted sequences were taken from bighorn sheep (Ovis canadensis) exon capture data and directly from the domestic sheep genome (Ovis aries v. 3; oviAri3). The bighorn sheep sequences used in the Dall's sheep (Ovis dalli dalli) exon capture aligned to 2350 genes on the oviAri3 genome with an average of 2 exons each. We developed a microfluidic qPCR-based SNP chip to genotype 476 Dall's sheep from locations across their range and test for patterns of selection. Using multiple corroborating approaches (lositan and bayescan), we detected 28 SNP loci potentially under selection. We additionally identified candidate loci significantly associated with latitude, longitude, precipitation and temperature, suggesting local environmental adaptation. The three methods demonstrated consistent support for natural selection on nine genes with immune and disease-regulating functions (e.g. Ovar-DRA, APC, BATF2, MAGEB18), cell regulation signalling pathways (e.g. KRIT1, PI3K, ORRC3), and respiratory health (CYSLTR1). Characterizing adaptive allele distributions from novel genetic techniques will facilitate investigation of the influence of environmental variation on local adaptation of a northern alpine ungulate throughout its range. This research demonstrated the utility of exon capture for gene-targeted SNP discovery and subsequent SNP chip genotyping using low-quality samples in a nonmodel species.
Discovery of cancer common and specific driver gene sets

PubMed Central

2017-01-01

Cancer is known as a disease mainly caused by gene alterations. Discovery of mutated driver pathways or gene sets is becoming an important step to understand molecular mechanisms of carcinogenesis. However, systematically investigating commonalities and specificities of driver gene sets among multiple cancer types is still a great challenge, but this investigation will undoubtedly benefit deciphering cancers and will be helpful for personalized therapy and precision medicine in cancer treatment. In this study, we propose two optimization models to de novo discover common driver gene sets among multiple cancer types (ComMDP) and specific driver gene sets of one certain or multiple cancer types to other cancers (SpeMDP), respectively. We first apply ComMDP and SpeMDP to simulated data to validate their efficiency. Then, we further apply these methods to 12 cancer types from The Cancer Genome Atlas (TCGA) and obtain several biologically meaningful driver pathways. As examples, we construct a common cancer pathway model for BRCA and OV, infer a complex driver pathway model for BRCA carcinogenesis based on common driver gene sets of BRCA with eight cancer types, and investigate specific driver pathways of the liquid cancer lymphoblastic acute myeloid leukemia (LAML) versus other solid cancer types. In these processes more candidate cancer genes are also found. PMID:28168295
Genomics-Based Discovery of Plant Genes for Synthetic Biology of Terpenoid Fragrances: A Case Study in Sandalwood oil Biosynthesis.

PubMed

Celedon, J M; Bohlmann, J

2016-01-01

Terpenoid fragrances are powerful mediators of ecological interactions in nature and have a long history of traditional and modern industrial applications. Plants produce a great diversity of fragrant terpenoid metabolites, which make them a superb source of biosynthetic genes and enzymes. Advances in fragrance gene discovery have enabled new approaches in synthetic biology of high-value speciality molecules toward applications in the fragrance and flavor, food and beverage, cosmetics, and other industries. Rapid developments in transcriptome and genome sequencing of nonmodel plant species have accelerated the discovery of fragrance biosynthetic pathways. In parallel, advances in metabolic engineering of microbial and plant systems have established platforms for synthetic biology applications of some of the thousands of plant genes that underlie fragrance diversity. While many fragrance molecules (eg, simple monoterpenes) are abundant in readily renewable plant materials, some highly valuable fragrant terpenoids (eg, santalols, ambroxides) are rare in nature and interesting targets for synthetic biology. As a representative example for genomics/transcriptomics enabled gene and enzyme discovery, we describe a strategy used successfully for elucidation of a complete fragrance biosynthetic pathway in sandalwood (Santalum album) and its reconstruction in yeast (Saccharomyces cerevisiae). We address questions related to the discovery of specific genes within large gene families and recovery of rare gene transcripts that are selectively expressed in recalcitrant tissues. To substantiate the validity of the approaches, we describe the combination of methods used in the gene and enzyme discovery of a cytochrome P450 in the fragrant heartwood of tropical sandalwood, responsible for the fragrance defining, final step in the biosynthesis of (Z)-santalols. © 2016 Elsevier Inc. All rights reserved.
New Form Discovery for the Analgesics Flurbiprofen and Sulindac Facilitated by Polymer-Induced Heteronucleation

PubMed Central

GRZESIAK, ADAM L.; MATZGER, ADAM J.

2008-01-01

The selection and discovery of new crystalline forms is a longstanding issue in solid-state chemistry of critical importance because of the effect molecular packing arrangement exerts on materials properties. Polymer-induced heteronucleation has recently been developed as a powerful approach to discover and control the production of crystal modifications based on the insoluble polymer heteronucleant added to the crystallization solution. The selective nucleation and discovery of new crystal forms of the well-studied pharmaceuticals flurbiprofen (FBP) and sulindac (SUL) has been achieved utilizing this approach. For the first time, FBP form III was produced in bulk quantities and its crystal structure was also determined. Furthermore, a novel 3:2 FBP:H2O phase was discovered that nucleates selectively from only a few polymers. Crystallization of SUL in the presence of insoluble polymers facilitated the growth of form I single crystals suitable for structure determination. Additionally, a new SUL polymorph (form IV) was discovered by this method. The crystal forms of FBP and SUL are characterized by Raman and FTIR spectroscopies, X-ray diffraction, and differential scanning calorimetry. PMID:17567888
Discovery of a widely distributed toxin biosynthetic gene cluster

PubMed Central

Lee, Shaun W.; Mitchell, Douglas A.; Markley, Andrew L.; Hensler, Mary E.; Gonzalez, David; Wohlrab, Aaron; Dorrestein, Pieter C.; Nizet, Victor; Dixon, Jack E.

2008-01-01

Bacteriocins represent a large family of ribosomally produced peptide antibiotics. Here we describe the discovery of a widely conserved biosynthetic gene cluster for the synthesis of thiazole and oxazole heterocycles on ribosomally produced peptides. These clusters encode a toxin precursor and all necessary proteins for toxin maturation and export. Using the toxin precursor peptide and heterocycle-forming synthetase proteins from the human pathogen Streptococcus pyogenes, we demonstrate the in vitro reconstitution of streptolysin S activity. We provide evidence that the synthetase enzymes, as predicted from our bioinformatics analysis, introduce heterocycles onto precursor peptides, thereby providing molecular insight into the chemical structure of streptolysin S. Furthermore, our studies reveal that the synthetase exhibits relaxed substrate specificity and modifies toxin precursors from both related and distant species. Given our findings, it is likely that the discovery of similar peptidic toxins will rapidly expand to existing and emerging genomes. PMID:18375757
Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships.

PubMed

Seok, Junhee; Kaushal, Amit; Davis, Ronald W; Xiao, Wenzhong

2010-01-18

The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data.
MAGIC database and interfaces: an integrated package for gene discovery and expression.

PubMed

Cordonnier-Pratt, Marie-Michèle; Liang, Chun; Wang, Haiming; Kolychev, Dmitri S; Sun, Feng; Freeman, Robert; Sullivan, Robert; Pratt, Lee H

2004-01-01

The rapidly increasing rate at which biological data is being produced requires a corresponding growth in relational databases and associated tools that can help laboratories contend with that data. With this need in mind, we describe here a Modular Approach to a Genomic, Integrated and Comprehensive (MAGIC) Database. This Oracle 9i database derives from an initial focus in our laboratory on gene discovery via production and analysis of expressed sequence tags (ESTs), and subsequently on gene expression as assessed by both EST clustering and microarrays. The MAGIC Gene Discovery portion of the database focuses on information derived from DNA sequences and on its biological relevance. In addition to MAGIC SEQ-LIMS, which is designed to support activities in the laboratory, it contains several additional subschemas. The latter include MAGIC Admin for database administration, MAGIC Sequence for sequence processing as well as sequence and clone attributes, MAGIC Cluster for the results of EST clustering, MAGIC Polymorphism in support of microsatellite and single-nucleotide-polymorphism discovery, and MAGIC Annotation for electronic annotation by BLAST and BLAT. The MAGIC Microarray portion is a MIAME-compliant database with two components at present. These are MAGIC Array-LIMS, which makes possible remote entry of all information into the database, and MAGIC Array Analysis, which provides data mining and visualization. Because all aspects of interaction with the MAGIC Database are via a web browser, it is ideally suited not only for individual research laboratories but also for core facilities that serve clients at any distance.
In silico mining and PCR-based approaches to transcription factor discovery in non-model plants: gene discovery of the WRKY transcription factors in conifers.

PubMed

Liu, Jun-Jun; Xiang, Yu

2011-01-01

WRKY transcription factors are key regulators of numerous biological processes in plant growth and development, as well as plant responses to abiotic and biotic stresses. Research on biological functions of plant WRKY genes has focused in the past on model plant species or species with largely characterized transcriptomes. However, a variety of non-model plants, such as forest conifers, are essential as feed, biofuel, and wood or for sustainable ecosystems. Identification of WRKY genes in these non-model plants is equally important for understanding the evolutionary and function-adaptive processes of this transcription factor family. Because of limited genomic information, the rarity of regulatory gene mRNAs in transcriptomes, and the sequence divergence to model organism genes, identification of transcription factors in non-model plants using methods similar to those generally used for model plants is difficult. This chapter describes a gene family discovery strategy for identification of WRKY transcription factors in conifers by a combination of in silico-based prediction and PCR-based experimental approaches. Compared to traditional cDNA library screening or EST sequencing at transcriptome scales, this integrated gene discovery strategy provides fast, simple, reliable, and specific methods to unveil the WRKY gene family at both genome and transcriptome levels in non-model plants.
Gene Fusion Markup Language: a prototype for exchanging gene fusion data.

PubMed

Kalyana-Sundaram, Shanker; Shanmugam, Achiraman; Chinnaiyan, Arul M

2012-10-16

An avalanche of next generation sequencing (NGS) studies has generated an unprecedented amount of genomic structural variation data. These studies have also identified many novel gene fusion candidates with more detailed resolution than previously achieved. However, in the excitement and necessity of publishing the observations from this recently developed cutting-edge technology, no community standardization approach has arisen to organize and represent the data with the essential attributes in an interchangeable manner. As transcriptome studies have been widely used for gene fusion discoveries, the current non-standard mode of data representation could potentially impede data accessibility, critical analyses, and further discoveries in the near future. Here we propose a prototype, Gene Fusion Markup Language (GFML) as an initiative to provide a standard format for organizing and representing the significant features of gene fusion data. GFML will offer the advantage of representing the data in a machine-readable format to enable data exchange, automated analysis interpretation, and independent verification. As this database-independent exchange initiative evolves it will further facilitate the formation of related databases, repositories, and analysis tools. The GFML prototype is made available at http://code.google.com/p/gfml-prototype/. The Gene Fusion Markup Language (GFML) presented here could facilitate the development of a standard format for organizing, integrating and representing the significant features of gene fusion data in an inter-operable and query-able fashion that will enable biologically intuitive access to gene fusion findings and expedite functional characterization. A similar model is envisaged for other NGS data analyses.
Gene Fusion Markup Language: a prototype for exchanging gene fusion data

PubMed Central

2012-01-01

Background: An avalanche of next generation sequencing (NGS) studies has generated an unprecedented amount of genomic structural variation data. These studies have also identified many novel gene fusion candidates with more detailed resolution than previously achieved. However, in the excitement and necessity of publishing the observations from this recently developed cutting-edge technology, no community standardization approach has arisen to organize and represent the data with the essential attributes in an interchangeable manner. As transcriptome studies have been widely used for gene fusion discoveries, the current non-standard mode of data representation could potentially impede data accessibility, critical analyses, and further discoveries in the near future. Results: Here we propose a prototype, Gene Fusion Markup Language (GFML) as an initiative to provide a standard format for organizing and representing the significant features of gene fusion data. GFML will offer the advantage of representing the data in a machine-readable format to enable data exchange, automated analysis interpretation, and independent verification. As this database-independent exchange initiative evolves it will further facilitate the formation of related databases, repositories, and analysis tools. The GFML prototype is made available at http://code.google.com/p/gfml-prototype/. Conclusion: The Gene Fusion Markup Language (GFML) presented here could facilitate the development of a standard format for organizing, integrating and representing the significant features of gene fusion data in an inter-operable and query-able fashion that will enable biologically intuitive access to gene fusion findings and expedite functional characterization. A similar model is envisaged for other NGS data analyses. PMID:23072312
Systematic Evaluation of Molecular Networks for Discovery of Disease Genes.

PubMed

Huang, Justin K; Carlin, Daniel E; Yu, Michael Ku; Zhang, Wei; Kreisberg, Jason F; Tamayo, Pablo; Ideker, Trey

2018-04-25

Gene networks are rapidly growing in size and number, raising the question of which networks are most appropriate for particular applications. Here, we evaluate 21 human genome-wide interaction networks for their ability to recover 446 disease gene sets identified through literature curation, gene expression profiling, or genome-wide association studies. While all networks have some ability to recover disease genes, we observe a wide range of performance with STRING, ConsensusPathDB, and GIANT networks having the best performance overall. A general tendency is that performance scales with network size, suggesting that new interaction discovery currently outweighs the detrimental effects of false positives. Correcting for size, we find that the DIP network provides the highest efficiency (value per interaction). Based on these results, we create a parsimonious composite network with both high efficiency and performance. This work provides a benchmark for selection of molecular networks in human disease research. Copyright © 2018 Elsevier Inc. All rights reserved.
Modern plant metabolomics: Advanced natural product gene discoveries, improved technologies, and future prospects

DOE PAGES

Sumner, Lloyd W.; Lei, Zhentian; Nikolau, Basil J.; ...

2014-10-24

Plant metabolomics has matured and modern plant metabolomics has accelerated gene discoveries and the elucidation of a variety of plant natural product biosynthetic pathways. This study highlights specific examples of the discovery and characterization of novel genes and enzymes associated with the biosynthesis of natural products such as flavonoids, glucosinolates, terpenoids, and alkaloids. Additional examples of the integration of metabolomics with genome-based functional characterizations of plant natural products that are important to modern pharmaceutical technology are also reviewed. This article also provides a substantial review of recent technical advances in mass spectrometry imaging, nuclear magnetic resonance imaging, integrated LC-MS-SPE-NMR for metabolite identifications, and x-ray crystallography of microgram quantities for structural determinations. The review closes with a discussion on the future prospects of metabolomics related to crop species and herbal medicine.
High-density genetic map using whole-genome resequencing for fine mapping and candidate gene discovery for disease resistance in peanut.

PubMed

Agarwal, Gaurav; Clevenger, Josh; Pandey, Manish K; Wang, Hui; Shasidhar, Yaduru; Chu, Ye; Fountain, Jake C; Choudhary, Divya; Culbreath, Albert K; Liu, Xin; Huang, Guodong; Wang, Xingjun; Deshmukh, Rupesh; Holbrook, C Corley; Bertioli, David J; Ozias-Akins, Peggy; Jackson, Scott A; Varshney, Rajeev K; Guo, Baozhu

2018-04-10

Whole-genome resequencing (WGRS) of mapping populations has facilitated development of high-density genetic maps essential for fine mapping and candidate gene discovery for traits of interest in crop species. Leaf spots, including early leaf spot (ELS) and late leaf spot (LLS), and Tomato spotted wilt virus (TSWV) are devastating diseases in peanut causing significant yield loss. We generated WGRS data on a recombinant inbred line population, developed a SNP-based high-density genetic map, and conducted fine mapping, candidate gene discovery and marker validation for ELS, LLS and TSWV. The first sequence-based high-density map was constructed with 8869 SNPs assigned to 20 linkage groups, representing 20 chromosomes, for the 'T' population (Tifrunner × GT-C20) with a map length of 3120 cM and an average distance of 1.45 cM. The quantitative trait locus (QTL) analysis using high-density genetic map and multiple season phenotyping data identified 35 main-effect QTLs with phenotypic variation explained (PVE) from 6.32% to 47.63%. Among major-effect QTLs mapped, there were two QTLs for ELS on B05 with 47.42% PVE and B03 with 47.38% PVE, two QTLs for LLS on A05 with 47.63% and B03 with 34.03% PVE and one QTL for TSWV on B09 with 40.71% PVE. The epistasis and environment interaction analyses identified significant environmental effects on these traits. The identified QTL regions had disease resistance genes including R-genes and transcription factors. KASP markers were developed for major QTLs and validated in the population and are ready for further deployment in genomics-assisted breeding in peanut. © 2018 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
Culture-independent discovery of natural products from soil metagenomes.

PubMed

Katz, Micah; Hover, Bradley M; Brady, Sean F

2016-03-01

Bacterial natural products have proven to be invaluable starting points in the development of many currently used therapeutic agents. Unfortunately, traditional culture-based methods for natural product discovery have been deemphasized by pharmaceutical companies due in large part to high rediscovery rates. Culture-independent, or "metagenomic," methods, which rely on the heterologous expression of DNA extracted directly from environmental samples (eDNA), have the potential to provide access to metabolites encoded by a large fraction of the earth's microbial biosynthetic diversity. As soil is both ubiquitous and rich in bacterial diversity, it is an appealing starting point for culture-independent natural product discovery efforts. This review provides an overview of the history of soil metagenome-driven natural product discovery studies and elaborates on the recent development of new tools for sequence-based, high-throughput profiling of environmental samples used in discovering novel natural product biosynthetic gene clusters. We conclude with several examples of these new tools being employed to facilitate the recovery of novel secondary metabolite encoding gene clusters from soil metagenomes and the subsequent heterologous expression of these clusters to produce bioactive small molecules.
Knowledge-based analysis of microarrays for the discovery of transcriptional regulation relationships

PubMed Central

2010-01-01

Background: The large amount of high-throughput genomic data has facilitated the discovery of the regulatory relationships between transcription factors and their target genes. While early methods for discovery of transcriptional regulation relationships from microarray data often focused on the high-throughput experimental data alone, more recent approaches have explored the integration of external knowledge bases of gene interactions. Results: In this work, we develop an algorithm that provides improved performance in the prediction of transcriptional regulatory relationships by supplementing the analysis of microarray data with a new method of integrating information from an existing knowledge base. Using a well-known dataset of yeast microarrays and the Yeast Proteome Database, a comprehensive collection of known information of yeast genes, we show that knowledge-based predictions demonstrate better sensitivity and specificity in inferring new transcriptional interactions than predictions from microarray data alone. We also show that comprehensive, direct and high-quality knowledge bases provide better prediction performance. Comparison of our results with ChIP-chip data and growth fitness data suggests that our predicted genome-wide regulatory pairs in yeast are reasonable candidates for follow-up biological verification. Conclusion: High quality, comprehensive, and direct knowledge bases, when combined with appropriate bioinformatic algorithms, can significantly improve the discovery of gene regulatory relationships from high throughput gene expression data. PMID:20122245
VISIONET: intuitive visualisation of overlapping transcription factor networks, with applications in cardiogenic gene discovery.

PubMed

Nim, Hieu T; Furtado, Milena B; Costa, Mauro W; Rosenthal, Nadia A; Kitano, Hiroaki; Boyd, Sarah E

2015-05-01

Existing de novo software platforms have largely overlooked a valuable resource, the expertise of the intended biologist users. Typical data representations such as long gene lists, or highly dense and overlapping transcription factor networks often hinder biologists from relating these results to their expertise. VISIONET, a streamlined visualisation tool built from experimental needs, enables biologists to transform large and dense overlapping transcription factor networks into sparse human-readable graphs via numerically filtering. The VISIONET interface allows users without a computing background to interactively explore and filter their data, and empowers them to apply their specialist knowledge on far more complex and substantial data sets than is currently possible. Applying VISIONET to the Tbx20-Gata4 transcription factor network led to the discovery and validation of Aldh1a2, an essential developmental gene associated with various important cardiac disorders, as a healthy adult cardiac fibroblast gene co-regulated by cardiogenic transcription factors Gata4 and Tbx20. We demonstrate with experimental validations the utility of VISIONET for expertise-driven gene discovery that opens new experimental directions that would not otherwise have been identified.
IMG-ABC: new features for bacterial secondary metabolism analysis and targeted biosynthetic gene cluster discovery in thousands of microbial genomes.

PubMed

Hadjithomas, Michalis; Chen, I-Min A; Chu, Ken; Huang, Jinghua; Ratner, Anna; Palaniappan, Krishna; Andersen, Evan; Markowitz, Victor; Kyrpides, Nikos C; Ivanova, Natalia N

2017-01-04

Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publically available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
IMG-ABC: new features for bacterial secondary metabolism analysis and targeted biosynthetic gene cluster discovery in thousands of microbial genomes

DOE PAGES

Hadjithomas, Michalis; Chen, I-Min A.; Chu, Ken; ...

2016-11-29

Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publically available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery.

IMG-ABC: new features for bacterial secondary metabolism analysis and targeted biosynthetic gene cluster discovery in thousands of microbial genomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hadjithomas, Michalis; Chen, I-Min A.; Chu, Ken

Secondary metabolites produced by microbes have diverse biological functions, which makes them a great potential source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). In order to advance the exploration of microbial secondary metabolism, we developed the largest publically available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Additional features enable fast exploration and analysis of BCs through two new interactive visualization features, a BC function heatmap and a BC similarity network graph. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating their in-depth analysis and accelerating secondary metabolite discovery.
Toward a Data Scalable Solution for Facilitating Discovery of Science Resources

DOE Office of Scientific and Technical Information (OSTI.GOV)

Weaver, Jesse R.; Castellana, Vito G.; Morari, Alessandro

Science is increasingly motivated by the need to process larger quantities of data. It is facing severe challenges in data collection, management, and processing, so much so that the computational demands of “data scaling” are competing with, and in many fields surpassing, the traditional objective of decreasing processing time. Example domains with large datasets include astronomy, biology, genomics, climate/weather, and material sciences. This paper presents a real-world use case in which we wish to answer queries provided by domain scientists in order to facilitate discovery of relevant science resources. The problem is that the metadata for these science resources is very large and is growing quickly, rapidly increasing the need for a data scaling solution. We propose a system – SGEM – designed for answering graph-based queries over large datasets on cluster architectures, and we report performance results for queries on the current RDESC dataset of nearly 1.4 billion triples, and on the well-known BSBM SPARQL query benchmark.
Pine Gene Discovery Project - Final Report - 08/31/1997 - 02/28/2001

DOE Office of Scientific and Technical Information (OSTI.GOV)

Whetten, R. W.; Sederoff, R. R.; Kinlaw, C.

2001-04-30

Integration of pines into the large scope of plant biology research depends on study of pines in parallel with study of annual plants, and on availability of research materials from pine to plant biologists interested in comparing pine with annual plant systems. The objectives of the Pine Gene Discovery Project were to obtain 10,000 partial DNA sequences of genes expressed in loblolly pine, to determine which of those pine genes were similar to known genes from other organisms, and to make the DNA sequences and isolated pine genes available to plant researchers to stimulate integration of pines into the wider scope of plant biology research. Those objectives have been completed, and the results are available to the public. Requests for pine genes have been received from a number of laboratories that would otherwise not have included pine in their research, indicating that progress is being made toward the goal of integrating pine research into the larger molecular biology research community.
Systematic prediction of gene function in Arabidopsis thaliana using a probabilistic functional gene network

PubMed Central

Hwang, Sohyun; Rhee, Seung Y; Marcotte, Edward M; Lee, Insuk

2012-01-01

AraNet is a functional gene network for the reference plant Arabidopsis and has been constructed in order to identify new genes associated with plant traits. It is highly predictive for diverse biological pathways and can be used to prioritize genes for functional screens. Moreover, AraNet provides a web-based tool with which plant biologists can efficiently discover novel functions of Arabidopsis genes (http://www.functionalnet.org/aranet/). This protocol explains how to conduct network-based prediction of gene functions using AraNet and how to interpret the prediction results. Functional discovery in plant biology is facilitated by combining candidate prioritization by AraNet with focused experimental tests. PMID:21886106
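
The network-based prioritization described above can be illustrated with a minimal sketch. It assumes a simple "guilt by association" scheme in which each candidate gene is scored by the summed weights of its links to a set of seed genes already associated with a trait. The function name `prioritize`, the gene labels, and the edge weights are hypothetical; AraNet's actual scoring and data format should be taken from the protocol itself, not from this sketch.

```python
from collections import defaultdict


def prioritize(edges, seed_genes):
    """Rank non-seed genes by total edge weight to the seed genes.

    edges: iterable of (gene_a, gene_b, weight) for an undirected network.
    seed_genes: genes already associated with the trait of interest.
    Returns a list of (gene, score), highest score first.
    """
    seeds = set(seed_genes)
    scores = defaultdict(float)
    for gene_a, gene_b, weight in edges:
        # Only edges linking a seed to a non-seed contribute to a candidate.
        if gene_a in seeds and gene_b not in seeds:
            scores[gene_b] += weight
        elif gene_b in seeds and gene_a not in seeds:
            scores[gene_a] += weight
    # Sort by descending score, then by name for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy network (labels and weights are illustrative only).
toy_edges = [
    ("G1", "G2", 2.0),
    ("G1", "G3", 1.0),
    ("G4", "G3", 1.5),
    ("G2", "G5", 0.5),
]
ranking = prioritize(toy_edges, seed_genes=["G1", "G4"])
```

The top-ranked candidates would then be the ones taken forward to focused experimental tests, as the abstract suggests.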
45 CFR 156.935 - Discovery.

Code of Federal Regulations, 2014 CFR

2014-10-01

... 45 Public Welfare 1 2014-10-01 2014-10-01 false Discovery. 156.935 Section 156.935 Public Welfare... QHP Issuer Sanctions in Federally-Facilitated Exchanges § 156.935 Discovery. (a) The parties must identify any need for discovery from the opposing party as soon as possible, but no later than the time for...
Classification of Genes and Putative Biomarker Identification Using Distribution Metrics on Expression Profiles

PubMed Central

Huang, Hung-Chung; Jupiter, Daniel; VanBuren, Vincent

2010-01-01

Background: Identification of genes with switch-like properties will facilitate discovery of regulatory mechanisms that underlie these properties, and will provide knowledge for the appropriate application of Boolean networks in gene regulatory models. As switch-like behavior is likely associated with tissue-specific expression, these gene products are expected to be plausible candidates as tissue-specific biomarkers. Methodology/Principal Findings: In a systematic classification of genes and search for biomarkers, gene expression profiles (GEPs) of more than 16,000 genes from 2,145 mouse array samples were analyzed. Four distribution metrics (mean, standard deviation, kurtosis and skewness) were used to classify GEPs into four categories: predominantly-off, predominantly-on, graded (rheostatic), and switch-like genes. The arrays under study were also grouped and examined by tissue type. For example, arrays were categorized as ‘brain group’ and ‘non-brain group’; the Kolmogorov-Smirnov distance and Pearson correlation coefficient were then used to compare GEPs between brain and non-brain for each gene. We were thus able to identify tissue-specific biomarker candidate genes. Conclusions/Significance: The methodology employed here may be used to facilitate disease-specific biomarker discovery. PMID:20140228
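
As a rough illustration of the quantities named in this abstract, the sketch below computes the four distribution metrics for one gene's expression profile and the two-sample Kolmogorov-Smirnov distance between two groups of arrays. The function names are hypothetical, the moments are population (not sample-corrected) estimates, and no classification thresholds are included because the abstract does not state them.

```python
import bisect
import math


def distribution_metrics(values):
    """Mean, standard deviation, skewness and excess kurtosis of a profile."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        # A constant profile has no defined shape; report zeros by convention.
        skewness = kurtosis = 0.0
    else:
        skewness = sum(((v - mean) / sd) ** 3 for v in values) / n
        kurtosis = sum(((v - mean) / sd) ** 4 for v in values) / n - 3.0
    return {"mean": mean, "sd": sd, "skewness": skewness, "kurtosis": kurtosis}


def ks_distance(group_a, group_b):
    """Largest gap between the empirical CDFs of two samples."""
    a_sorted, b_sorted = sorted(group_a), sorted(group_b)
    distance = 0.0
    for x in sorted(set(a_sorted) | set(b_sorted)):
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance
```

Under these assumptions, a gene whose expression values in one tissue group barely overlap with those in the other group would give a KS distance close to 1, which is the kind of signal a tissue-specific biomarker screen would look for.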
Comprehensive Clinical Phenotyping & Genetic Mapping for the Discovery of Autism Susceptibility Genes

DTIC Science & Technology

2012-12-05

Bisgaier J, Levinson D, Cutts DB, & Rhodes KV., (2011) Access to autism evaluation appointments with developmental-behavioral and neurodevelopmental ...W403 Columbus, OH 43205 Final Report Comprehensive Clinical Phenotyping & Genetic Mapping for the Discovery of Autism Susceptibility Genes... 1.0 Summary In 2006, the Central Ohio Registry for Autism (CORA) was initiated as a collaboration between Wright-Patterson Air
Whole-genome resequencing: changing the paradigms of SNP detection, molecular mapping and gene discovery

USDA-ARS's Scientific Manuscript database

The next generation sequencing (NGS) technologies have opened a wealth of opportunities for plant breeding and genomics research, and changed the paradigms of marker detection, genotyping, and gene discovery. Abundant genomic resources have been generated using a whole genome resequencing (WGR) str...
An incoherent feedforward loop facilitates adaptive tuning of gene expression.

PubMed

Hong, Jungeui; Brandt, Nathan; Abdul-Rahman, Farah; Yang, Ally; Hughes, Tim; Gresham, David

2018-04-05

We studied adaptive evolution of gene expression using long-term experimental evolution of Saccharomyces cerevisiae in ammonium-limited chemostats. We found repeated selection for non-synonymous variation in the DNA binding domain of the transcriptional activator, GAT1, which functions with the repressor, DAL80 in an incoherent type-1 feedforward loop (I1-FFL) to control expression of the high affinity ammonium transporter gene, MEP2. Missense mutations in the DNA binding domain of GAT1 reduce its binding to the GATAA consensus sequence. However, we show experimentally, and using mathematical modeling, that decreases in GAT1 binding result in increased expression of MEP2 as a consequence of properties of I1-FFLs. Our results show that I1-FFLs, one of the most commonly occurring network motifs in transcriptional networks, can facilitate adaptive tuning of gene expression through modulation of transcription factor binding affinities. Our findings highlight the importance of gene regulatory architectures in the evolution of gene expression. © 2018, Hong et al.
Adeno-associated virus at 50: a golden anniversary of discovery, research, and gene therapy success--a personal perspective.

PubMed

Hastie, Eric; Samulski, R Jude

2015-05-01

Fifty years after the discovery of adeno-associated virus (AAV) and more than 30 years after the first gene transfer experiment was conducted, dozens of gene therapy clinical trials are in progress, one vector is approved for use in Europe, and breakthroughs in virus modification and disease modeling are paving the way for a revolution in the treatment of rare diseases, cancer, as well as HIV. This review will provide a historical perspective on the progression of AAV for gene therapy from discovery to the clinic, focusing on contributions from the Samulski lab regarding basic science and cloning of AAV, optimized large-scale production of vectors, preclinical large animal studies and safety data, vector modifications for improved efficacy, and successful clinical applications.
Genome-wide ENU mutagenesis for the discovery of novel male fertility regulators.

PubMed

Jamsai, Duangporn; O'Bryan, Moira K

2010-06-01

The completion of genome sequencing projects has provided an extensive knowledge of the contents of the genomes of human, mouse, and many other organisms. Despite this, the function of most of the estimated 25,000 human genes remains largely unknown. Attention has now turned to elucidating gene function and identifying biological pathways that contribute to human diseases, including male infertility. Our understanding of the genetic regulation of male fertility has been accelerated through the use of genetically modified mouse models including knockout, knock-in, gene-trapped, and transgenic mice. Such reverse genetic approaches, however, require some foreknowledge of a gene's function and, as such, bias against the discovery of completely novel genes and biological pathways. To facilitate high-throughput gene discovery, genome-wide mouse mutagenesis via the use of a potent chemical mutagen, N-ethyl-N-nitrosourea (ENU), has been developed over the past decade. This forward genetic, or phenotype-driven, approach relies upon observing a phenotype first, then subsequently defining the underlying genetic defect. Mutations are randomly introduced into the mouse genome via ENU exposure. Through a controlled breeding scheme, mutations causing a phenotype of interest (e.g., male infertility) are then identified by linkage analysis and candidate gene sequencing. This approach allows for the possibility of revealing comprehensive phenotype-genotype relationships for a range of genes and pathways, i.e., in addition to null alleles, mice containing partial loss-of-function or gain-of-function mutations can be recovered. Such point mutations are likely to be more reflective of those that occur within the human population. Many research groups have successfully used this approach to generate infertile mouse lines and some novel male fertility genes have been revealed. In this review, we focus on the utility of ENU mutagenesis for the discovery of novel male fertility regulators.
Discovery of rare protein-coding genes in model methylotroph Methylobacterium extorquens AM1.

PubMed

Kumar, Dhirendra; Mondal, Anupam Kumar; Yadav, Amit Kumar; Dash, Debasis

2014-12-01

Proteogenomics involves the use of mass spectrometry (MS) to refine annotation of protein-coding genes and discover genes in a genome. We carried out a comprehensive proteogenomic analysis of Methylobacterium extorquens AM1 (ME-AM1) from publicly available proteomics data with the aim of improving annotation for methylotrophs, organisms capable of surviving on reduced carbon compounds such as methanol. Besides identifying 2482 (50%) proteins, 29 new genes were discovered and 66 annotated gene models were revised in the ME-AM1 genome. One such novel gene is identified with 75 peptides, lacks a homolog in other methylobacteria but has glycosyl transferase and lipopolysaccharide biosynthesis protein domains, indicating its potential role in outer membrane synthesis. Many novel genes are present only in ME-AM1 among methylobacteria. Distant homologs of these genes in unrelated taxonomic classes and the low GC content of a few genes suggest lateral gene transfer as a potential mode of their origin. Annotations of methylotrophy-related genes were also improved by the discovery of a short gene in the methylotrophy gene island and by redefining a gene important for pyrroloquinoline quinone synthesis, essential for methylotrophy. The combined use of proteogenomics and rigorous bioinformatics analysis greatly enhanced the annotation of protein-coding genes in the model methylotroph ME-AM1 genome. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
De Novo Assembly of Auricularia polytricha Transcriptome Using Illumina Sequencing for Gene Discovery and SSR Marker Identification

PubMed Central

Zhou, Yan; Chen, Lianfu; Fan, Xiuzhi; Bian, Yinbing

2014-01-01

Auricularia polytricha (Mont.) Sacc., a type of edible black-brown mushroom with a gelatinous and modality-specific fruiting body, is in high demand in Asia due to its nutritional and medicinal properties. Illumina Solexa sequencing technology was used to generate very large transcript sequences from the mycelium and the mature fruiting body of A. polytricha for gene discovery and molecular marker development. De novo assembly generated 36,483 ESTs with an N50 length of 636 bp. A total of 28,108 ESTs demonstrated significant hits with known proteins in the nr database, and 94.03% of the annotated ESTs showed the greatest similarity to A. delicata, a related species of A. polytricha. Functional categorization of the Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways revealed the conservation of genes involved in various biological processes in A. polytricha. Gene expression profile analysis indicated that a total of 2,057 ESTs were differentially expressed, including 1,020 ESTs that were up-regulated in the mycelium and 1,037 up-regulated in the fruiting body. Functional enrichment showed that the ESTs associated with biosynthesis, metabolism and assembly of proteins were more active in fruiting body development. The expression patterns of homologous transcription factors indicated that the molecular mechanisms of fruiting body formation and development were not exactly the same as for other agarics. Interestingly, an EST encoding tyrosinase was significantly up-regulated in the fruiting body, indicating that melanins accumulated during the processes of the formation of the black-brown color of the fruiting body in A. polytricha development. In addition, a total of 1,715 potential SSRs were detected in this transcriptome. The transcriptome analysis of A. polytricha provides valuable sequence resources and numerous molecular markers to facilitate further functional genomics studies and
Automated Discovery of Functional Generality of Human Gene Expression Programs

PubMed Central

Gerber, Georg K; Dowell, Robin D; Jaakkola, Tommi S; Gifford, David K

2007-01-01

An important research problem in computational biology is the identification of expression programs, sets of co-expressed genes orchestrating normal or pathological processes, and the characterization of the functional breadth of these programs. The use of human expression data compendia for discovery of such programs presents several challenges including cellular inhomogeneity within samples, genetic and environmental variation across samples, uncertainty in the numbers of programs and sample populations, and temporal behavior. We developed GeneProgram, a new unsupervised computational framework based on Hierarchical Dirichlet Processes that addresses each of the above challenges. GeneProgram uses expression data to simultaneously organize tissues into groups and genes into overlapping programs with consistent temporal behavior, to produce maps of expression programs, which are sorted by generality scores that exploit the automatically learned groupings. Using synthetic and real gene expression data, we showed that GeneProgram outperformed several popular expression analysis methods. We applied GeneProgram to a compendium of 62 short time-series gene expression datasets exploring the responses of human cells to infectious agents and immune-modulating molecules. GeneProgram produced a map of 104 expression programs, a substantial number of which were significantly enriched for genes involved in key signaling pathways and/or bound by NF-κB transcription factors in genome-wide experiments. Further, GeneProgram discovered expression programs that appear to implicate surprising signaling pathways or receptor types in the response to infection, including Wnt signaling and neurotransmitter receptors. We believe the discovered map of expression programs involved in the response to infection will be useful for guiding future biological experiments; genes from programs with low generality scores might serve as new drug targets that exhibit minimal “cross-talk,” and
Discovery of new candidate genes related to brain development using protein interaction information.

PubMed

Chen, Lei; Chu, Chen; Kong, Xiangyin; Huang, Tao; Cai, Yu-Dong

2015-01-01

Human brain development is a dramatic process composed of a series of complex and fine-tuned spatiotemporal gene expressions. A good comprehension of this process can assist us in developing the potential of our brain. However, we have only limited knowledge about the genes and gene functions that are involved in this biological process. Therefore, a substantial demand remains to discover new brain development-related genes and identify their biological functions. In this study, we aimed to discover new brain-development related genes by building a computational method. We referred to a series of computational methods used to discover new disease-related genes and developed a similar method. In this method, the shortest path algorithm was executed on a weighted graph that was constructed using protein-protein interactions. New candidate genes fell on at least one of the shortest paths connecting two known genes that are related to brain development. A randomization test was then adopted to filter positive discoveries. Of the final identified genes, several have been reported to be associated with brain development, indicating the effectiveness of the method, whereas several of the others may have potential roles in brain development.
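The shortest-path step described in this abstract can be illustrated with a small sketch. It assumes an undirected weighted graph stored as nested dictionaries, uses Dijkstra's algorithm as the shortest-path routine (the abstract does not name one), and keeps only one shortest path per pair of known genes. The names `shortest_path`, `candidate_genes` and the toy gene labels are hypothetical, and the randomization test used for filtering is not reproduced.

```python
import heapq
from itertools import combinations

def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the node list of one shortest path,
    or an empty list when target cannot be reached from source."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in done:
        return []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

def candidate_genes(graph, known):
    """Collect genes lying strictly inside a shortest path that connects
    two known genes, excluding the known genes themselves."""
    known = set(known)
    found = set()
    for a, b in combinations(sorted(known), 2):
        for node in shortest_path(graph, a, b)[1:-1]:
            if node not in known:
                found.add(node)
    return found

def add_edge(graph, u, v, weight):
    """Insert an undirected weighted edge."""
    graph.setdefault(u, {})[v] = weight
    graph.setdefault(v, {})[u] = weight

# Toy interaction network with made-up weights (lower = stronger link).
ppi = {}
add_edge(ppi, "K1", "X", 1.0)
add_edge(ppi, "X", "K2", 1.0)
add_edge(ppi, "K1", "Y", 5.0)
add_edge(ppi, "Y", "K2", 5.0)
print(candidate_genes(ppi, ["K1", "K2"]))
```

In the toy network only `X` is reported, because the route through `Y` is longer and therefore never lies on the shortest path between the two known genes.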
Sex-Specific Associations between Particulate Matter Exposure and Gene Expression in Independent Discovery and Validation Cohorts of Middle-Aged Men and Women.

PubMed

Vrijens, Karen; Winckelmans, Ellen; Tsamou, Maria; Baeyens, Willy; De Boever, Patrick; Jennen, Danyel; de Kok, Theo M; Den Hond, Elly; Lefebvre, Wouter; Plusquin, Michelle; Reynders, Hans; Schoeters, Greet; Van Larebeke, Nicolas; Vanpoucke, Charlotte; Kleinjans, Jos; Nawrot, Tim S

2017-04-01

Particulate matter (PM) exposure leads to premature death, mainly due to respiratory and cardiovascular diseases. Identification of transcriptomic biomarkers of air pollution exposure and effect in a healthy adult population. Microarray analyses were performed in 98 healthy volunteers (48 men, 50 women). The expression of eight sex-specific candidate biomarker genes (significantly associated with PM10 in the discovery cohort and with a reported link to air pollution-related disease) was measured with qPCR in an independent validation cohort (75 men, 94 women). Pathway analysis was performed using Gene Set Enrichment Analysis. Average daily PM2.5 and PM10 exposures over 2-years were estimated for each participant's residential address using spatiotemporal interpolation in combination with a dispersion model. Average long-term PM10 was 25.9 (± 5.4) and 23.7 (± 2.3) μg/m3 in the discovery and validation cohorts, respectively. In discovery analysis, associations between PM10 and the expression of individual genes differed by sex. In the validation cohort, long-term PM10 was associated with the expression of DNAJB5 and EAPP in men and ARHGAP4 (p = 0.053) in women. AKAP6 and LIMK1 were significantly associated with PM10 in women, although associations differed in direction between the discovery and validation cohorts. Expression of the eight candidate genes in the discovery cohort differentiated between validation cohort participants with high versus low PM10 exposure (area under the receiver operating curve = 0.92; 95% CI: 0.85, 1.00; p = 0.0002 in men, 0.86; 95% CI: 0.76, 0.96; p = 0.004 in women). Expression of the sex-specific candidate genes identified in the discovery population predicted PM10 exposure in an independent cohort of adults from the same area. Confirmation in other populations may further support this as a new approach for exposure assessment, and may contribute to the discovery of molecular mechanisms for PM-induced health effects.
Gene discovery by chemical mutagenesis and whole-genome sequencing in Dictyostelium.

PubMed

Li, Cheng-Lin Frank; Santhanam, Balaji; Webb, Amanda Nicole; Zupan, Blaž; Shaulsky, Gad

2016-09-01

Whole-genome sequencing is a useful approach for identification of chemical-induced lesions, but previous applications involved tedious genetic mapping to pinpoint the causative mutations. We propose that saturation mutagenesis under low mutagenic loads, followed by whole-genome sequencing, should allow direct implication of genes by identifying multiple independent alleles of each relevant gene. We tested the hypothesis by performing three genetic screens with chemical mutagenesis in the social soil amoeba Dictyostelium discoideum. Through genome sequencing, we successfully identified mutant genes with multiple alleles in near-saturation screens, including resistance to intense illumination and strong suppressors of defects in an allorecognition pathway. We tested the causality of the mutations by comparison to published data and by direct complementation tests, finding both dominant and recessive causative mutations. Therefore, our strategy provides a cost- and time-efficient approach to gene discovery by integrating chemical mutagenesis and whole-genome sequencing. The method should be applicable to many microbial systems, and it is expected to revolutionize the field of functional genomics in Dictyostelium by greatly expanding the mutation spectrum relative to other common mutagenesis methods. © 2016 Li et al.; Published by Cold Spring Harbor Laboratory Press.
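The core idea of this abstract, implicating a gene when several independent alleles of it turn up across mutants from one screen, can be sketched as a simple tally. The record layout `(strain, gene, change)`, the function name `genes_with_multiple_alleles`, the cutoff of two alleles and the sample records are all assumptions made for illustration; the authors' actual pipeline is not described in the abstract.

```python
from collections import defaultdict

def genes_with_multiple_alleles(mutations, min_alleles=2):
    """Group mutation calls by gene and keep genes carrying at least
    min_alleles distinct changes.

    mutations: iterable of (strain, gene, change) tuples. The same change
    seen in two strains is counted once, since it may not be independent.
    """
    alleles = defaultdict(set)
    for _strain, gene, change in mutations:
        alleles[gene].add(change)
    return {
        gene: sorted(changes)
        for gene, changes in alleles.items()
        if len(changes) >= min_alleles
    }

# Made-up calls from four hypothetical mutant strains.
calls = [
    ("m1", "geneA", "c.10G>A"),
    ("m2", "geneA", "c.55C>T"),
    ("m3", "geneA", "c.10G>A"),
    ("m3", "geneB", "c.7G>A"),
    ("m4", "geneC", "c.3C>T"),
]
print(genes_with_multiple_alleles(calls))
```

Only `geneA` passes in this toy input, since it is the only gene hit by two different changes.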
Evaluation of tools for highly variable gene discovery from single-cell RNA-seq data.

PubMed

Yip, Shun H; Sham, Pak Chung; Wang, Junwen

2018-02-21

Traditional RNA sequencing (RNA-seq) allows the detection of gene expression variations between two or more cell populations through differentially expressed gene (DEG) analysis. However, genes that contribute to cell-to-cell differences are not discoverable with RNA-seq because RNA-seq samples are obtained from a mixture of cells. Single-cell RNA-seq (scRNA-seq) allows the detection of gene expression in each cell. With scRNA-seq, highly variable gene (HVG) discovery allows the detection of genes that contribute strongly to cell-to-cell variation within a homogeneous cell population, such as a population of embryonic stem cells. This analysis is implemented in many software packages. In this study, we compare seven HVG methods from six software packages, including BASiCS, Brennecke, scLVM, scran, scVEGs and Seurat. Our results demonstrate that reproducibility in HVG analysis requires a larger sample size than DEG analysis. Discrepancies between methods and potential issues in these tools are discussed and recommendations are made.
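As a rough illustration of what highly variable gene discovery measures, the sketch below ranks genes by the squared coefficient of variation of their expression across cells. This is a deliberately simplified stand-in and does not reproduce BASiCS, Brennecke, scLVM, scran, scVEGs or Seurat, all of which model the mean-variance relationship and technical noise. The function name `rank_by_cv2` and the toy counts are hypothetical.

```python
def rank_by_cv2(expression):
    """Rank genes from most to least variable using CV^2 = variance / mean^2.

    expression: dict mapping gene name -> list of per-cell expression values.
    Genes with fewer than two cells or a zero mean are skipped because
    CV^2 is undefined for them.
    """
    scores = {}
    for gene, counts in expression.items():
        n = len(counts)
        if n < 2:
            continue
        mean = sum(counts) / n
        if mean == 0:
            continue
        variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
        scores[gene] = variance / mean ** 2
    return sorted(scores, key=scores.get, reverse=True)

# Toy counts for three genes measured in four cells.
cells = {
    "stable": [10, 10, 10, 10],
    "bursty": [0, 20, 0, 20],
    "silent": [0, 0, 0, 0],
}
print(rank_by_cv2(cells))
```

Ranking on raw CV² favours lowly expressed genes, which is one reason the published methods fit a trend against mean expression before calling a gene highly variable.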
The Matchmaker Exchange: A Platform for Rare Disease Gene Discovery

PubMed Central

Philippakis, Anthony A.; Azzariti, Danielle R.; Beltran, Sergi; Brookes, Anthony J.; Brownstein, Catherine A.; Brudno, Michael; Brunner, Han G.; Buske, Orion J.; Carey, Knox; Doll, Cassie; Dumitriu, Sergiu; Dyke, Stephanie O.M.; den Dunnen, Johan T.; Firth, Helen V.; Gibbs, Richard A.; Girdea, Marta; Gonzalez, Michael; Haendel, Melissa A.; Hamosh, Ada; Holm, Ingrid A.; Huang, Lijia; Hurles, Matthew E.; Hutton, Ben; Krier, Joel B.; Misyura, Andriy; Mungall, Christopher J.; Paschall, Justin; Paten, Benedict; Robinson, Peter N.; Schiettecatte, François; Sobreira, Nara L.; Swaminathan, Ganesh J.; Taschner, Peter E.; Terry, Sharon F.; Washington, Nicole L.; Züchner, Stephan; Boycott, Kym M.; Rehm, Heidi L.

2015-01-01

There are few better examples of the need for data sharing than in the rare disease community, where patients, physicians, and researchers must search for “the needle in a haystack” to uncover rare, novel causes of disease within the genome. Impeding the pace of discovery has been the existence of many small siloed datasets within individual research or clinical laboratory databases and/or disease-specific organizations, hoping for serendipitous occasions when two distant investigators happen to learn they have a rare phenotype in common and can “match” these cases to build evidence for causality. However, serendipity has never proven to be a reliable or scalable approach in science. As such, the Matchmaker Exchange (MME) was launched to provide a robust and systematic approach to rare disease gene discovery through the creation of a federated network connecting databases of genotypes and rare phenotypes using a common application programming interface (API). The core building blocks of the MME have been defined and assembled. Three MME services have now been connected through the API and are available for community use. Additional databases that support internal matching are anticipated to join the MME network as it continues to grow. PMID:26295439
Sex-Specific Associations between Particulate Matter Exposure and Gene Expression in Independent Discovery and Validation Cohorts of Middle-Aged Men and Women

PubMed Central

Vrijens, Karen; Winckelmans, Ellen; Tsamou, Maria; Baeyens, Willy; De Boever, Patrick; Jennen, Danyel; de Kok, Theo M.; Den Hond, Elly; Lefebvre, Wouter; Plusquin, Michelle; Reynders, Hans; Schoeters, Greet; Van Larebeke, Nicolas; Vanpoucke, Charlotte; Kleinjans, Jos; Nawrot, Tim S.

2016-01-01

Background: Particulate matter (PM) exposure leads to premature death, mainly due to respiratory and cardiovascular diseases. Objectives: Identification of transcriptomic biomarkers of air pollution exposure and effect in a healthy adult population. Methods: Microarray analyses were performed in 98 healthy volunteers (48 men, 50 women). The expression of eight sex-specific candidate biomarker genes (significantly associated with PM10 in the discovery cohort and with a reported link to air pollution-related disease) was measured with qPCR in an independent validation cohort (75 men, 94 women). Pathway analysis was performed using Gene Set Enrichment Analysis. Average daily PM2.5 and PM10 exposures over 2-years were estimated for each participant’s residential address using spatiotemporal interpolation in combination with a dispersion model. Results: Average long-term PM10 was 25.9 (± 5.4) and 23.7 (± 2.3) μg/m3 in the discovery and validation cohorts, respectively. In discovery analysis, associations between PM10 and the expression of individual genes differed by sex. In the validation cohort, long-term PM10 was associated with the expression of DNAJB5 and EAPP in men and ARHGAP4 (p = 0.053) in women. AKAP6 and LIMK1 were significantly associated with PM10 in women, although associations differed in direction between the discovery and validation cohorts. Expression of the eight candidate genes in the discovery cohort differentiated between validation cohort participants with high versus low PM10 exposure (area under the receiver operating curve = 0.92; 95% CI: 0.85, 1.00; p = 0.0002 in men, 0.86; 95% CI: 0.76, 0.96; p = 0.004 in women). Conclusions: Expression of the sex-specific candidate genes identified in the discovery population predicted PM10 exposure in an independent cohort of adults from the same area. Confirmation in other populations may further support this as a new approach for exposure assessment, and may contribute to the discovery of molecular

Peroxidase gene discovery from the horseradish transcriptome.

PubMed

Näätsaari, Laura; Krainer, Florian W; Schubert, Michael; Glieder, Anton; Thallinger, Gerhard G

2014-03-24

Horseradish peroxidases (HRPs) from Armoracia rusticana have long been utilized as reporters in various diagnostic assays and histochemical stainings. Regardless of their increasing importance in the field of life sciences and suggested uses in medical applications, chemical synthesis and other industrial applications, the HRP isoenzymes, their substrate specificities and enzymatic properties are poorly characterized. Due to lacking sequence information of natural isoenzymes and the low levels of HRP expression in heterologous hosts, commercially available HRP is still extracted as a mixture of isoenzymes from the roots of A. rusticana. In this study, a normalized, size-selected A. rusticana transcriptome library was sequenced using 454 Titanium technology. The resulting reads were assembled into 14871 isotigs with an average length of 1133 bp. Sequence databases, ORF finding and ORF characterization were utilized to identify peroxidase genes from the 14871 isotigs generated by de novo assembly. The sequences were manually reviewed and verified with Sanger sequencing of PCR amplified genomic fragments, resulting in the discovery of 28 secretory peroxidases, 23 of them previously unknown. A total of 22 isoenzymes including allelic variants were successfully expressed in Pichia pastoris and showed peroxidase activity with at least one of the substrates tested, thus enabling their development into commercial pure isoenzymes. This study demonstrates that transcriptome sequencing combined with sequence motif search is a powerful concept for the discovery and quick supply of new enzymes and isoenzymes from any plant or other eukaryotic organisms. Identification and manual verification of the sequences of 28 HRP isoenzymes do not only contribute a set of peroxidases for industrial, biological and biomedical applications, but also provide valuable information on the reliability of the approach in identifying and characterizing a large group of isoenzymes.
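The ORF-finding step mentioned in this abstract can be illustrated with a minimal scanner. It is a sketch under simplifying assumptions: only the three forward reading frames are searched, an ORF runs from the first ATG to the next in-frame stop codon, and the length cutoff is arbitrary. The abstract does not say which ORF finder the authors used, and the name `find_orfs` and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end, sequence) for each forward-strand ORF.

    start is the 0-based index of the A in ATG, end is one past the stop
    codon, and min_codons counts the start and stop codons as well.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None
    return orfs

# Short made-up sequence containing one ORF in the third reading frame.
print(find_orfs("CCATGAAATTTTAGGG"))
```

A real analysis would also scan the reverse complement and translate each ORF before searching for peroxidase sequence motifs.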
Peroxidase gene discovery from the horseradish transcriptome

PubMed Central

2014-01-01

Background Horseradish peroxidases (HRPs) from Armoracia rusticana have long been utilized as reporters in various diagnostic assays and histochemical stainings. Regardless of their increasing importance in the field of life sciences and suggested uses in medical applications, chemical synthesis and other industrial applications, the HRP isoenzymes, their substrate specificities and enzymatic properties are poorly characterized. Due to lacking sequence information of natural isoenzymes and the low levels of HRP expression in heterologous hosts, commercially available HRP is still extracted as a mixture of isoenzymes from the roots of A. rusticana. Results In this study, a normalized, size-selected A. rusticana transcriptome library was sequenced using 454 Titanium technology. The resulting reads were assembled into 14871 isotigs with an average length of 1133 bp. Sequence databases, ORF finding and ORF characterization were utilized to identify peroxidase genes from the 14871 isotigs generated by de novo assembly. The sequences were manually reviewed and verified with Sanger sequencing of PCR amplified genomic fragments, resulting in the discovery of 28 secretory peroxidases, 23 of them previously unknown. A total of 22 isoenzymes including allelic variants were successfully expressed in Pichia pastoris and showed peroxidase activity with at least one of the substrates tested, thus enabling their development into commercial pure isoenzymes. Conclusions This study demonstrates that transcriptome sequencing combined with sequence motif search is a powerful concept for the discovery and quick supply of new enzymes and isoenzymes from any plant or other eukaryotic organisms. Identification and manual verification of the sequences of 28 HRP isoenzymes do not only contribute a set of peroxidases for industrial, biological and biomedical applications, but also provide valuable information on the reliability of the approach in identifying and characterizing a large group
Improving Interpretation of Cardiac Phenotypes and Enhancing Discovery With Expanded Knowledge in the Gene Ontology

PubMed Central

Roncaglia, Paola; Howe, Douglas G.; Laulederkind, Stanley J.F.; Khodiyar, Varsha K.; Berardini, Tanya Z.; Tweedie, Susan; Foulger, Rebecca E.; Osumi-Sutherland, David; Campbell, Nancy H.; Huntley, Rachael P.; Talmud, Philippa J.; Blake, Judith A.; Breckenridge, Ross; Riley, Paul R.; Lambiase, Pier D.; Elliott, Perry M.; Clapp, Lucie; Tinker, Andrew; Hill, David P.

2018-01-01

Background: A systems biology approach to cardiac physiology requires a comprehensive representation of how coordinated processes operate in the heart, as well as the ability to interpret relevant transcriptomic and proteomic experiments. The Gene Ontology (GO) Consortium provides structured, controlled vocabularies of biological terms that can be used to summarize and analyze functional knowledge for gene products. Methods and Results: In this study, we created a computational resource to facilitate genetic studies of cardiac physiology by integrating literature curation with attention to an improved and expanded ontological representation of heart processes in the Gene Ontology. As a result, the Gene Ontology now contains terms that comprehensively describe the roles of proteins in cardiac muscle cell action potential, electrical coupling, and the transmission of the electrical impulse from the sinoatrial node to the ventricles. Evaluating the effectiveness of this approach to inform data analysis demonstrated that Gene Ontology annotations, analyzed within an expanded ontological context of heart processes, can help to identify candidate genes associated with arrhythmic disease risk loci. Conclusions: We determined that a combination of curation and ontology development for heart-specific genes and processes supports the identification and downstream analysis of genes responsible for the spread of the cardiac action potential through the heart. Annotating these genes and processes in a structured format facilitates data analysis and supports effective retrieval of gene-centric information about cardiac defects. PMID:29440116
Improving Interpretation of Cardiac Phenotypes and Enhancing Discovery With Expanded Knowledge in the Gene Ontology.

PubMed

Lovering, Ruth C; Roncaglia, Paola; Howe, Douglas G; Laulederkind, Stanley J F; Khodiyar, Varsha K; Berardini, Tanya Z; Tweedie, Susan; Foulger, Rebecca E; Osumi-Sutherland, David; Campbell, Nancy H; Huntley, Rachael P; Talmud, Philippa J; Blake, Judith A; Breckenridge, Ross; Riley, Paul R; Lambiase, Pier D; Elliott, Perry M; Clapp, Lucie; Tinker, Andrew; Hill, David P

2018-02-01

A systems biology approach to cardiac physiology requires a comprehensive representation of how coordinated processes operate in the heart, as well as the ability to interpret relevant transcriptomic and proteomic experiments. The Gene Ontology (GO) Consortium provides structured, controlled vocabularies of biological terms that can be used to summarize and analyze functional knowledge for gene products. In this study, we created a computational resource to facilitate genetic studies of cardiac physiology by integrating literature curation with attention to an improved and expanded ontological representation of heart processes in the Gene Ontology. As a result, the Gene Ontology now contains terms that comprehensively describe the roles of proteins in cardiac muscle cell action potential, electrical coupling, and the transmission of the electrical impulse from the sinoatrial node to the ventricles. Evaluating the effectiveness of this approach to inform data analysis demonstrated that Gene Ontology annotations, analyzed within an expanded ontological context of heart processes, can help to identify candidate genes associated with arrhythmic disease risk loci. We determined that a combination of curation and ontology development for heart-specific genes and processes supports the identification and downstream analysis of genes responsible for the spread of the cardiac action potential through the heart. Annotating these genes and processes in a structured format facilitates data analysis and supports effective retrieval of gene-centric information about cardiac defects. © 2018 The Authors.
Identifying candidate driver genes by integrative ovarian cancer genomics data

NASA Astrophysics Data System (ADS)

Lu, Xinguo; Lu, Jibo

2017-08-01

Integrative analysis of molecular mechanics underlying cancer can distinguish interactions that cannot be revealed based on one kind of data for the appropriate diagnosis and treatment of cancer patients. Tumor samples exhibit heterogeneity in omics data, such as somatic mutations, copy number variations (CNVs), gene expression profiles and so on. In this paper we combined gene co-expression modules and mutation modulators separately in tumor patients to obtain the candidate driver genes for resistant and sensitive tumors from the heterogeneous data. The final list of modulators identified is well known in biological processes associated with ovarian cancer, such as CCL17, CACTIN, CCL16, CCL22, APOB, KDF1, CCL11, HNF1B, LRG1, MED1 and so on, which can help to facilitate the discovery of biomarkers, molecular diagnostics, and drug discovery.
Construction and Evaluation of Normalized cDNA Libraries Enriched with Full-Length Sequences for Rapid Discovery of New Genes from Sisal (Agave sisalana Perr.) Different Developmental Stages

PubMed Central

Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng

2012-01-01

To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from Agave sisalana multiple tissues to increase efficiency of normalization and maximize the number of independent genes by SMART™ method and the duplex-specific nuclease (DSN). This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones of libraries revealed 3320 unigenes with an average insert length about 1.2 kb, indicating that the non-redundancy of libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during the leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing. PMID:23202944
New strategies in drug discovery.

PubMed

Ohlstein, Eliot H; Johnson, Anthony G; Elliott, John D; Romanic, Anne M

2006-01-01

Gene identification followed by determination of the expression of genes in a given disease and understanding of the function of the gene products is central to the drug discovery process. The ability to associate a specific gene with a disease can be attributed primarily to the extraordinary progress that has been made in the areas of gene sequencing and information technologies. Selection and validation of novel molecular targets have become of great importance in light of the abundance of new potential therapeutic drug targets that have emerged from human gene sequencing. In response to this revolution within the pharmaceutical industry, the development of high-throughput methods in both biology and chemistry has been necessitated. Further, the successful translation of basic scientific discoveries into clinical experimental medicine and novel therapeutics is an increasing challenge. As such, a new paradigm for drug discovery has emerged. This process involves the integration of clinical, genetic, genomic, and molecular phenotype data partnered with cheminformatics. Central to this process, the data generated are managed, collated, and interpreted with the use of informatics. This review addresses the use of new technologies that have arisen to deal with this new paradigm.
The promise of disease gene discovery in South Asia

PubMed Central

Nakatsuka, Nathan; Moorjani, Priya; Rai, Niraj; Sarkar, Biswanath; Tandon, Arti; Patterson, Nick; Bhavani, Gandham SriLakshmi; Girisha, Katta Mohan; Mustak, Mohammed S; Srinivasan, Sudha; Kaushik, Amit; Vahab, Saadi Abdul; Jagadeesh, Sujatha M.; Satyamoorthy, Kapaettu; Singh, Lalji; Reich, David; Thangaraj, Kumarasamy

2017-01-01

The more than 1.5 billion people who live in South Asia are correctly viewed not as a single large population, but as many small endogamous groups. We assembled genome-wide data from over 2,800 individuals from over 260 distinct South Asian groups. We identify 81 unique groups, of which 14 have estimated census sizes of more than a million, that descend from founder events more extreme than those in Ashkenazi Jews and Finns, both of which have high rates of recessive disease due to founder events. We identify multiple examples of recessive diseases in South Asia that are the result of such founder events. This study highlights an under-appreciated opportunity for reducing disease burden among South Asians through the discovery of and testing for recessive disease genes. PMID:28714977
Genome Enabled Discovery of Carbon Sequestration Genes in Poplar

DOE Office of Scientific and Technical Information (OSTI.GOV)

Filichkin, Sergei; Etherington, Elizabeth; Ma, Caiping

2007-02-22

The goals of the S.H. Strauss laboratory portion of 'Genome-enabled discovery of carbon sequestration genes in poplar' are (1) to explore the functions of candidate genes using Populus transformation by inserting genes provided by Oak Ridge National Laboratory (ORNL) and the University of Florida (UF) into poplar; (2) to expand the poplar transformation toolkit by developing transformation methods for important genotypes; and (3) to allow induced expression, and efficient gene suppression, in roots and other tissues. As part of the transformation improvement effort, OSU developed transformation protocols for Populus trichocarpa 'Nisqually-1' clone and an early flowering P. alba clone, 6K10. Complete descriptions of the transformation systems were published (Ma et al. 2004, Meilan et al. 2004). Twenty-one 'Nisqually-1' and 622 6K10 transgenic plants were generated. To identify root predominant promoters, a set of three promoters were tested for their tissue-specific expression patterns in poplar and in Arabidopsis as a model system. A novel gene, ET304, was identified by analyzing a collection of poplar enhancer trap lines generated at OSU (Filichkin et al. 2006a, 2006b). Other promoters include the pGgMT1 root-predominant promoter from Casuarina glauca and the pAtPIN2 promoter from Arabidopsis root specific PIN2 gene. OSU tested two induction systems, alcohol- and estrogen-inducible, in multiple poplar transgenics. Ethanol proved to be the more efficient when tested in tissue culture and greenhouse conditions. Two estrogen-inducible systems were evaluated in transgenic Populus, neither of which functioned reliably in tissue culture conditions. GATEWAY-compatible plant binary vectors were designed to compare the silencing efficiency of homologous (direct) RNAi vs. heterologous (transitive) RNAi inverted repeats. A set of genes was targeted for post transcriptional silencing in the model Arabidopsis system; these include the floral meristem identity gene
WebMOTIFS: automated discovery, filtering and scoring of DNA sequence motifs using multiple programs and Bayesian approaches

PubMed Central

Romer, Katherine A.; Kayombya, Guy-Richard; Fraenkel, Ernest

2007-01-01

WebMOTIFS provides a web interface that facilitates the discovery and analysis of DNA-sequence motifs. Several studies have shown that the accuracy of motif discovery can be significantly improved by using multiple de novo motif discovery programs and using randomized control calculations to identify the most significant motifs or by using Bayesian approaches. WebMOTIFS makes it easy to apply these strategies. Using a single submission form, users can run several motif discovery programs and score, cluster and visualize the results. In addition, the Bayesian motif discovery program THEME can be used to determine the class of transcription factors that is most likely to regulate a set of sequences. Input can be provided as a list of gene or probe identifiers. Used with the default settings, WebMOTIFS accurately identifies biologically relevant motifs from diverse data in several species. WebMOTIFS is freely available at http://fraenkel.mit.edu/webmotifs. PMID:17584794
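The idea of a randomized control calculation mentioned in the abstract above can be illustrated with a small empirical p-value estimate. This is only a sketch under stated assumptions: it is not WebMOTIFS code, the sequences and motif are made up for illustration, and composition-preserving shuffling is just one possible choice of control.

```python
import random

def count_sequences_with_motif(seqs, motif):
    """Number of sequences that contain the motif at least once."""
    return sum(motif in s for s in seqs)

def shuffled(seq, rng):
    """Return a copy of seq with its letters permuted (same composition)."""
    return "".join(rng.sample(list(seq), len(seq)))

def empirical_p_value(seqs, motif, n_controls=200, seed=0):
    """Fraction of shuffled control sets that match or beat the observed count."""
    rng = random.Random(seed)
    observed = count_sequences_with_motif(seqs, motif)
    exceed = 0
    for _ in range(n_controls):
        control = [shuffled(s, rng) for s in seqs]
        if count_sequences_with_motif(control, motif) >= observed:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_controls + 1)

# Hypothetical input: five short sequences that all carry the same 7-mer.
sequences = [
    "ACGTTGACTCAGGCATCGAT",
    "GGCATGACTCATTACGCGTA",
    "TTGACTCAACGGTCAGCTAG",
    "CAGTCGATTGACTCAGGTAC",
    "ATCGGCTAGCTTGACTCAGC",
]
motif = "TGACTCA"
observed = count_sequences_with_motif(sequences, motif)
p_value = empirical_p_value(sequences, motif)
print(observed, p_value)
```

A motif that appears in every input sequence but almost never in the shuffled controls receives a small empirical p-value, which is the sense in which the controls help rank motifs by significance.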
MorphDB: Prioritizing Genes for Specialized Metabolism Pathways and Gene Ontology Categories in Plants.

PubMed

Zwaenepoel, Arthur; Diels, Tim; Amar, David; Van Parys, Thomas; Shamir, Ron; Van de Peer, Yves; Tzfadia, Oren

2018-01-01

Recent times have seen an enormous growth of "omics" data, of which high-throughput gene expression data are arguably the most important from a functional perspective. Despite huge improvements in computational techniques for the functional classification of gene sequences, common similarity-based methods often fall short of providing full and reliable functional information. Recently, the combination of comparative genomics with approaches in functional genomics has received considerable interest for gene function analysis, leveraging both gene expression based guilt-by-association methods and annotation efforts in closely related model organisms. Besides the identification of missing genes in pathways, these methods also typically enable the discovery of biological regulators (i.e., transcription factors or signaling genes). A previously built guilt-by-association method is MORPH, which was proven to be an efficient algorithm that performs particularly well in identifying and prioritizing missing genes in plant metabolic pathways. Here, we present MorphDB, a resource where MORPH-based candidate genes for large-scale functional annotations (Gene Ontology, MapMan bins) are integrated across multiple plant species. Besides a gene centric query utility, we present a comparative network approach that enables researchers to efficiently browse MORPH predictions across functional gene sets and species, facilitating efficient gene discovery and candidate gene prioritization. MorphDB is available at http://bioinformatics.psb.ugent.be/webtools/morphdb/morphDB/index/. We also provide a toolkit, named "MORPH bulk" (https://github.com/arzwa/morph-bulk), for running MORPH in bulk mode on novel data sets, enabling researchers to apply MORPH to their own species of interest.
The Matchmaker Exchange: a platform for rare disease gene discovery.

PubMed

Philippakis, Anthony A; Azzariti, Danielle R; Beltran, Sergi; Brookes, Anthony J; Brownstein, Catherine A; Brudno, Michael; Brunner, Han G; Buske, Orion J; Carey, Knox; Doll, Cassie; Dumitriu, Sergiu; Dyke, Stephanie O M; den Dunnen, Johan T; Firth, Helen V; Gibbs, Richard A; Girdea, Marta; Gonzalez, Michael; Haendel, Melissa A; Hamosh, Ada; Holm, Ingrid A; Huang, Lijia; Hurles, Matthew E; Hutton, Ben; Krier, Joel B; Misyura, Andriy; Mungall, Christopher J; Paschall, Justin; Paten, Benedict; Robinson, Peter N; Schiettecatte, François; Sobreira, Nara L; Swaminathan, Ganesh J; Taschner, Peter E; Terry, Sharon F; Washington, Nicole L; Züchner, Stephan; Boycott, Kym M; Rehm, Heidi L

2015-10-01

There are few better examples of the need for data sharing than in the rare disease community, where patients, physicians, and researchers must search for "the needle in a haystack" to uncover rare, novel causes of disease within the genome. Impeding the pace of discovery has been the existence of many small siloed datasets within individual research or clinical laboratory databases and/or disease-specific organizations, hoping for serendipitous occasions when two distant investigators happen to learn they have a rare phenotype in common and can "match" these cases to build evidence for causality. However, serendipity has never proven to be a reliable or scalable approach in science. As such, the Matchmaker Exchange (MME) was launched to provide a robust and systematic approach to rare disease gene discovery through the creation of a federated network connecting databases of genotypes and rare phenotypes using a common application programming interface (API). The core building blocks of the MME have been defined and assembled. Three MME services have now been connected through the API and are available for community use. Additional databases that support internal matching are anticipated to join the MME network as it continues to grow. © 2015 WILEY PERIODICALS, INC.
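The matching idea described above, in which two distant databases discover that they hold cases with a genotype and phenotype in common, can be sketched as a toy in-memory comparison. Everything below is hypothetical: the `Case` type, the database names, the gene and phenotype labels, and the matching rule are illustrative assumptions and do not reproduce the actual MME API or its scoring.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Case:
    case_id: str
    database: str            # which member database holds the case
    genes: frozenset         # candidate genes for the case
    phenotypes: frozenset    # phenotype terms describing the case

def find_matches(query, cases, min_shared_phenotypes=1):
    """Return cases from other databases sharing a gene and enough phenotypes."""
    matches = []
    for case in cases:
        if case.database == query.database:
            continue  # only cross-database matches are of interest here
        shared_genes = query.genes & case.genes
        shared_phenotypes = query.phenotypes & case.phenotypes
        if shared_genes and len(shared_phenotypes) >= min_shared_phenotypes:
            matches.append(
                (case.case_id, sorted(shared_genes), sorted(shared_phenotypes))
            )
    return matches

# Hypothetical cases held by three separate databases.
query = Case("Q1", "db_a", frozenset({"GENE_X"}), frozenset({"seizures", "ataxia"}))
pool = [
    Case("C1", "db_b", frozenset({"GENE_X", "GENE_Y"}), frozenset({"ataxia"})),
    Case("C2", "db_c", frozenset({"GENE_Z"}), frozenset({"seizures"})),
    Case("C3", "db_a", frozenset({"GENE_X"}), frozenset({"ataxia"})),
]
matches = find_matches(query, pool)
print(matches)  # [('C1', ['GENE_X'], ['ataxia'])]
```

In a federated setting the comparison would run as a request sent to each remote database rather than over a local list; the local list is used here only to keep the sketch self-contained.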
The Matchmaker Exchange: A Platform for Rare Disease Gene Discovery

DOE PAGES

Philippakis, Anthony A.; Azzariti, Danielle R.; Beltran, Sergi; ...

2015-09-17

There are few better examples of the need for data sharing than in the rare disease community, where patients, physicians, and researchers must search for "the needle in a haystack" to uncover rare, novel causes of disease within the genome. Impeding the pace of discovery has been the existence of many small siloed datasets within individual research or clinical laboratory databases and/or disease-specific organizations, hoping for serendipitous occasions when two distant investigators happen to learn they have a rare phenotype in common and can "match" these cases to build evidence for causality. However, serendipity has never proven to be a reliable or scalable approach in science. As such, the Matchmaker Exchange (MME) was launched to provide a robust and systematic approach to rare disease gene discovery through the creation of a federated network connecting databases of genotypes and rare phenotypes using a common application programming interface (API). The core building blocks of the MME have been defined and assembled. In conclusion, three MME services have now been connected through the API and are available for community use. Additional databases that support internal matching are anticipated to join the MME network as it continues to grow.
The Matchmaker Exchange: A Platform for Rare Disease Gene Discovery

DOE Office of Scientific and Technical Information (OSTI.GOV)

Philippakis, Anthony A.; Azzariti, Danielle R.; Beltran, Sergi

There are few better examples of the need for data sharing than in the rare disease community, where patients, physicians, and researchers must search for "the needle in a haystack" to uncover rare, novel causes of disease within the genome. Impeding the pace of discovery has been the existence of many small siloed datasets within individual research or clinical laboratory databases and/or disease-specific organizations, hoping for serendipitous occasions when two distant investigators happen to learn they have a rare phenotype in common and can "match" these cases to build evidence for causality. However, serendipity has never proven to be a reliable or scalable approach in science. As such, the Matchmaker Exchange (MME) was launched to provide a robust and systematic approach to rare disease gene discovery through the creation of a federated network connecting databases of genotypes and rare phenotypes using a common application programming interface (API). The core building blocks of the MME have been defined and assembled. In conclusion, three MME services have now been connected through the API and are available for community use. Additional databases that support internal matching are anticipated to join the MME network as it continues to grow.
A genomics based discovery of secondary metabolite biosynthetic gene clusters in Aspergillus ustus.

PubMed

Pi, Borui; Yu, Dongliang; Dai, Fangwei; Song, Xiaoming; Zhu, Congyi; Li, Hongye; Yu, Yunsong

2015-01-01

Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, but little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that was commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters were firstly identified in Aspergillus that were possibly acquired by horizontal gene transfer, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but much fewer with other Aspergilli like A. niger and A. oryzae. These findings would help to understand the diversity and evolution of SM biosynthesis pathways in genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in clinic.
A Genomics Based Discovery of Secondary Metabolite Biosynthetic Gene Clusters in Aspergillus ustus

PubMed Central

Pi, Borui; Yu, Dongliang; Dai, Fangwei; Song, Xiaoming; Zhu, Congyi; Li, Hongye; Yu, Yunsong

2015-01-01

Secondary metabolites (SMs) produced by Aspergillus have been extensively studied for their crucial roles in human health, medicine and industrial production. However, the resulting information is almost exclusively derived from a few model organisms, including A. nidulans and A. fumigatus, but little is known about rare pathogens. In this study, we performed a genomics based discovery of SM biosynthetic gene clusters in Aspergillus ustus, a rare human pathogen. A total of 52 gene clusters were identified in the draft genome of A. ustus 3.3904, such as the sterigmatocystin biosynthesis pathway that was commonly found in Aspergillus species. In addition, several SM biosynthetic gene clusters were firstly identified in Aspergillus that were possibly acquired by horizontal gene transfer, including the vrt cluster that is responsible for viridicatumtoxin production. Comparative genomics revealed that A. ustus shared the largest number of SM biosynthetic gene clusters with A. nidulans, but much fewer with other Aspergilli like A. niger and A. oryzae. These findings would help to understand the diversity and evolution of SM biosynthesis pathways in genus Aspergillus, and we hope they will also promote the development of fungal identification methodology in clinic. PMID:25706180
ConGEMs: Condensed Gene Co-Expression Module Discovery Through Rule-Based Clustering and Its Application to Carcinogenesis.

PubMed

Mallik, Saurav; Zhao, Zhongming

2017-12-28

For transcriptomic analysis, there are numerous microarray-based genomic data, especially those generated for cancer research. The typical analysis measures the difference between a cancer sample-group and a matched control group for each transcript or gene. Association rule mining is used to discover interesting item sets through rule-based methodology. Thus, it has advantages to find causal effect relationships between the transcripts. In this work, we introduce two new rule-based similarity measures, the weighted rank-based Jaccard and Cosine measures, and then propose a novel computational framework to detect condensed gene co-expression modules (ConGEMs) through the association rule-based learning system and the weighted similarity scores. In practice, the list of evolved condensed markers that consists of both singular and complex markers in nature depends on the corresponding condensed gene sets in either antecedent or consequent of the rules of the resultant modules. In our evaluation, these markers could be supported by literature evidence, KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway and Gene Ontology annotations. Specifically, we preliminarily identified differentially expressed genes using an empirical Bayes test. A recently developed algorithm, RANWAR, was then utilized to determine the association rules from these genes. Based on that, we computed the integrated similarity scores of these rule-based similarity measures between each rule-pair, and the resultant scores were used for clustering to identify the co-expressed rule-modules. We applied our method to a gene expression dataset for lung squamous cell carcinoma and a genome methylation dataset for uterine cervical carcinogenesis. Our proposed module discovery method produced better results than the traditional gene-module discovery measures. In summary, our proposed rule-based method is useful for exploring biomarker modules from transcriptomic data.
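The abstract above does not define its weighted rank-based Jaccard and Cosine measures, so the sketch below shows only the generic, textbook forms of these similarities applied to a pair of rules. The gene names and weights are made up for illustration, and the weighting scheme (min/max weighted Jaccard) is an assumption rather than the authors' formulation.

```python
import math

def jaccard(genes_a, genes_b):
    """Plain Jaccard similarity between two gene sets."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def weighted_jaccard(weights_a, weights_b):
    """Generic weighted Jaccard: sum of minima over sum of maxima."""
    genes = set(weights_a) | set(weights_b)
    num = sum(min(weights_a.get(g, 0.0), weights_b.get(g, 0.0)) for g in genes)
    den = sum(max(weights_a.get(g, 0.0), weights_b.get(g, 0.0)) for g in genes)
    return num / den if den else 0.0

def cosine(weights_a, weights_b):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * weights_b.get(g, 0.0) for g, w in weights_a.items())
    norm_a = math.sqrt(sum(w * w for w in weights_a.values()))
    norm_b = math.sqrt(sum(w * w for w in weights_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical rules: each maps the genes in a rule to an illustrative weight.
rule_a = {"GENE1": 3.0, "GENE2": 2.0, "GENE3": 1.0}
rule_b = {"GENE1": 2.0, "GENE2": 2.0, "GENE4": 1.0}

plain = jaccard(rule_a, rule_b)              # 2 shared genes out of 4 -> 0.5
weighted = weighted_jaccard(rule_a, rule_b)  # (2 + 2) / (3 + 2 + 1 + 1) = 4/7
cos = cosine(rule_a, rule_b)                 # 10 / (sqrt(14) * 3)
print(plain, weighted, cos)
```

Pairwise scores of this kind, computed for every rule pair, form the similarity matrix that a clustering step would consume.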
Gene Discovery in the Apicomplexa as Revealed by EST Sequencing and Assembly of a Comparative Gene Database

PubMed Central

Li, Li; Brunk, Brian P.; Kissinger, Jessica C.; Pape, Deana; Tang, Keliang; Cole, Robert H.; Martin, John; Wylie, Todd; Dante, Mike; Fogarty, Steven J.; Howe, Daniel K.; Liberator, Paul; Diaz, Carmen; Anderson, Jennifer; White, Michael; Jerome, Maria E.; Johnson, Emily A.; Radke, Jay A.; Stoeckert, Christian J.; Waterston, Robert H.; Clifton, Sandra W.; Roos, David S.; Sibley, L. David

2003-01-01

Large-scale EST sequencing projects for several important parasites within the phylum Apicomplexa were undertaken for the purpose of gene discovery. Included were several parasites of medical importance (Plasmodium falciparum, Toxoplasma gondii) and others of veterinary importance (Eimeria tenella, Sarcocystis neurona, and Neospora caninum). A total of 55,192 ESTs, deposited into dbEST/GenBank, were included in the analyses. The resulting sequences have been clustered into nonredundant gene assemblies and deposited into a relational database that supports a variety of sequence and text searches. This database has been used to compare the gene assemblies using BLAST similarity comparisons to the public protein databases to identify putative genes. Of these new entries, ∼15%–20% represent putative homologs with a conservative cutoff of p < 10^-9, thus identifying many conserved genes that are likely to share common functions with other well-studied organisms. Gene assemblies were also used to identify strain polymorphisms, examine stage-specific expression, and identify gene families. An interesting class of genes that are confined to members of this phylum and not shared by plants, animals, or fungi, was identified. These genes likely mediate the novel biological features of members of the Apicomplexa and hence offer great potential for biological investigation and as possible therapeutic targets. [The sequence data from this study have been submitted to the dbEST division of GenBank.] PMID:12618375
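The conservative cutoff described above amounts to keeping only similarity hits whose p-value falls below 10^-9. The following is a minimal sketch of that filter; the hit tuples, assembly names, and protein names are hypothetical and do not come from the study's database.

```python
P_CUTOFF = 1e-9  # conservative threshold quoted in the abstract

# Hypothetical similarity hits: (gene assembly, matched protein, p-value).
hits = [
    ("assembly_1", "protein_A", 3e-42),
    ("assembly_1", "protein_B", 2e-15),
    ("assembly_2", "protein_C", 5e-4),   # too weak, filtered out
    ("assembly_3", "protein_D", 1e-9),   # not strictly below the cutoff
]

def putative_homologs(hits, cutoff=P_CUTOFF):
    """Keep the best hit per assembly among hits with p strictly below cutoff."""
    best = {}
    for assembly, protein, p in hits:
        if p < cutoff and (assembly not in best or p < best[assembly][1]):
            best[assembly] = (protein, p)
    return best

homologs = putative_homologs(hits)
print(homologs)  # {'assembly_1': ('protein_A', 3e-42)}
```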
NASA Reverb: Standards-Driven Earth Science Data and Service Discovery

NASA Astrophysics Data System (ADS)

Cechini, M. F.; Mitchell, A.; Pilone, D.

2011-12-01

NASA's Earth Observing System Data and Information System (EOSDIS) is a core capability in NASA's Earth Science Data Systems Program. NASA's EOS ClearingHOuse (ECHO) is a metadata catalog for the EOSDIS, providing a centralized catalog of data products and registry of related data services. Working closely with the EOSDIS community, the ECHO team identified a need to develop the next generation EOS data and service discovery tool. This development effort relied on the following principles:
+ Metadata Driven User Interface - Users should be presented with data and service discovery capabilities based on dynamic processing of metadata describing the targeted data.
+ Integrated Data & Service Discovery - Users should be able to discover data and associated data services that facilitate their research objectives.
+ Leverage Common Standards - Users should be able to discover and invoke services that utilize common interface standards.
Metadata plays a vital role in facilitating data discovery and access. As data providers enhance their metadata, more advanced search capabilities become available, enriching a user's search experience. Maturing metadata formats such as ISO 19115 provide the necessary depth of metadata that facilitates advanced data discovery capabilities. Data discovery and access is not limited to simply the retrieval of data granules, but is growing into the more complex discovery of data services. These services include, but are not limited to, services facilitating additional data discovery, subsetting, reformatting, and re-projecting. The discovery and invocation of these data services is made significantly simpler through the use of consistent and interoperable standards. Once a standard is adopted, a single standard-specific adapter can communicate with multiple services implementing that protocol. The emergence of metadata standards such as ISO 19119 plays a similarly important role in discovery as the 19115 standard
Drug discovery based on genetic and metabolic findings in schizophrenia.

PubMed

Dwyer, Donard S; Weeks, Kathrine; Aamodt, Eric J

2008-11-01

Recent progress in the genetics of schizophrenia provides the rationale for re-evaluating causative factors and therapeutic strategies for this disease. Here, we review the major candidate susceptibility genes and relate the aberrant function of these genes to defective regulation of energy metabolism in the schizophrenic brain. Disturbances in energy metabolism potentially lead to neurodevelopmental deficits, impaired function of the mature nervous system and failure to maintain neurites/dendrites and synaptic connections. Current antipsychotic drugs do not specifically address these underlying deficits; therefore, a new generation of more effective medications is urgently needed. Novel targets for future drug discovery are identified in this review. The coordinated application of structure-based drug design, systems biology and research on model organisms may greatly facilitate the search for next-generation antipsychotic drugs.

Gene signature critical to cancer phenotype as a paradigm for anti-cancer drug discovery

PubMed Central

Sampson, Erik R.; McMurray, Helene R.; Hassane, Duane C.; Newman, Laurel; Salzman, Peter; Jordan, Craig T.; Land, Hartmut

2013-01-01

Malignant cell transformation commonly results in the deregulation of thousands of cellular genes, an observation that suggests a complex biological process and an inherently challenging scenario for the development of effective cancer interventions. To better define the genes/pathways essential to regulating the malignant phenotype, we recently described a novel strategy based on the cooperative nature of carcinogenesis that focuses on genes synergistically deregulated in response to cooperating oncogenic mutations. These so-called “cooperation response genes” (CRGs) are highly enriched for genes critical for the cancer phenotype, thereby suggesting their causal role in the malignant state. Here we show that CRGs play an essential role in drug-mediated anti-cancer activity and that anti-cancer agents can be identified through their ability to antagonize the CRG expression profile. These findings provide proof-of-concept for the use of the CRG signature as a novel means of drug discovery with relevance to underlying anti-cancer drug mechanisms. PMID:22964631
IMG-ABC. A knowledge base to fuel discovery of biosynthetic gene clusters and novel secondary metabolites

DOE PAGES

Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; ...

2015-07-14

In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of “big” genomic data for discovering small molecules. IMG-ABC relies on IMG’s comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC’s focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG’s extensive genomic/metagenomic data and analysis tool kits. As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG
IMG-ABC: A Knowledge Base To Fuel Discovery of Biosynthetic Gene Clusters and Novel Secondary Metabolites.

PubMed

Hadjithomas, Michalis; Chen, I-Min Amy; Chu, Ken; Ratner, Anna; Palaniappan, Krishna; Szeto, Ernest; Huang, Jinghua; Reddy, T B K; Cimermančič, Peter; Fischbach, Michael A; Ivanova, Natalia N; Markowitz, Victor M; Kyrpides, Nikos C; Pati, Amrita

2015-07-14

In the discovery of secondary metabolites, analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of computational platforms that enable such a systematic approach on a large scale. In this work, we present IMG-ABC (https://img.jgi.doe.gov/abc), an atlas of biosynthetic gene clusters within the Integrated Microbial Genomes (IMG) system, which is aimed at harnessing the power of "big" genomic data for discovering small molecules. IMG-ABC relies on IMG's comprehensive integrated structural and functional genomic data for the analysis of biosynthetic gene clusters (BCs) and associated secondary metabolites (SMs). SMs and BCs serve as the two main classes of objects in IMG-ABC, each with a rich collection of attributes. A unique feature of IMG-ABC is the incorporation of both experimentally validated and computationally predicted BCs in genomes as well as metagenomes, thus identifying BCs in uncultured populations and rare taxa. We demonstrate the strength of IMG-ABC's focused integrated analysis tools in enabling the exploration of microbial secondary metabolism on a global scale, through the discovery of phenazine-producing clusters for the first time in Alphaproteobacteria. IMG-ABC strives to fill the long-existent void of resources for computational exploration of the secondary metabolism universe; its underlying scalable framework enables traversal of uncovered phylogenetic and chemical structure space, serving as a doorway to a new era in the discovery of novel molecules. IMG-ABC is the largest publicly available database of predicted and experimental biosynthetic gene clusters and the secondary metabolites they produce. The system also includes powerful search and analysis tools that are integrated with IMG's extensive genomic/metagenomic data and analysis tool kits. As new research on biosynthetic gene clusters and secondary metabolites is published and more genomes are sequenced, IMG-ABC will continue to
Discovery of new candidate genes for rheumatoid arthritis through integration of genetic association data with expression pathway analysis.

PubMed

Shchetynsky, Klementy; Diaz-Gallo, Lina-Marcella; Folkersen, Lasse; Hensvold, Aase Haj; Catrina, Anca Irinel; Berg, Louise; Klareskog, Lars; Padyukov, Leonid

2017-02-02

Here we integrate verified signals from previous genetic association studies with gene expression and pathway analysis for discovery of new candidate genes and signaling networks relevant for rheumatoid arthritis (RA). RNA-sequencing (RNA-seq)-based expression analysis of 377 genes from previously verified RA-associated loci was performed in blood cells from 5 newly diagnosed, non-treated patients with RA, 7 patients with treated RA and 12 healthy controls. Differentially expressed genes sharing a similar expression pattern in treated and untreated RA sub-groups were selected for pathway analysis. A set of "connector" genes derived from pathway analysis was tested for differential expression in the initial discovery cohort and validated in blood cells from 73 patients with RA and in 35 healthy controls. There were 11 qualifying genes selected for pathway analysis and these were grouped into two evidence-based functional networks, containing 29 and 27 additional connector molecules. The expression of genes corresponding to connector molecules was then tested in the initial RNA-seq data. Differences in the expression of ERBB2, TP53 and THOP1 were similar in both treated and non-treated patients with RA and an additional nine genes were differentially expressed in at least one group of patients compared to healthy controls. The ERBB2, TP53 and THOP1 expression profile was successfully replicated in RNA-seq data from peripheral blood mononuclear cells from healthy controls and non-treated patients with RA, in an independent collection of samples. Integration of RNA-seq data with findings from association studies, and consequent pathway analysis, implicates new candidate genes, ERBB2, TP53 and THOP1, in the pathogenesis of RA.
Sugar transporter genes of the brown planthopper, Nilaparvata lugens: A facilitated glucose/fructose transporter.

PubMed

Kikuta, Shingo; Kikawada, Takahiro; Hagiwara-Komoda, Yuka; Nakashima, Nobuhiko; Noda, Hiroaki

2010-11-01

The brown planthopper (BPH), Nilaparvata lugens, attacks rice plants and feeds on their phloem sap, which contains large amounts of sugars. The main sugar component of phloem sap is sucrose, a disaccharide composed of glucose and fructose. Sugars appear to be incorporated into the planthopper body by sugar transporters in the midgut. A total of 93 expressed sequence tags (ESTs) for putative sugar transporters were obtained from a BPH EST database, and 18 putative sugar transporter genes (Nlst1-18) were identified. The most abundantly expressed of these genes was Nlst1. This gene has previously been identified in the BPH as the glucose transporter gene NlHT1, which belongs to the major facilitator superfamily. Nlst1, 4, 6, 9, 12, 16, and 18 were highly expressed in the midgut, and Nlst2, 7, 8, 10, 15, 17, and 18 were highly expressed during the embryonic stages. Functional analyses were performed using Xenopus oocytes expressing NlST1 or 6. This showed that NlST6 is a facilitative glucose/fructose transporter that mediates sugar uptake from rice phloem sap in the BPH midgut in a manner similar to NlST1. Copyright © 2010 Elsevier Ltd. All rights reserved.
FORGE Canada Consortium: outcomes of a 2-year national rare-disease gene-discovery project.

PubMed

Beaulieu, Chandree L; Majewski, Jacek; Schwartzentruber, Jeremy; Samuels, Mark E; Fernandez, Bridget A; Bernier, Francois P; Brudno, Michael; Knoppers, Bartha; Marcadier, Janet; Dyment, David; Adam, Shelin; Bulman, Dennis E; Jones, Steve J M; Avard, Denise; Nguyen, Minh Thu; Rousseau, Francois; Marshall, Christian; Wintle, Richard F; Shen, Yaoqing; Scherer, Stephen W; Friedman, Jan M; Michaud, Jacques L; Boycott, Kym M

2014-06-05

Inherited monogenic disease has an enormous impact on the well-being of children and their families. Over half of the children living with one of these conditions are without a molecular diagnosis because of the rarity of the disease, the marked clinical heterogeneity, and the reality that there are thousands of rare diseases for which causative mutations have yet to be identified. It is in this context that in 2010 a Canadian consortium was formed to rapidly identify mutations causing a wide spectrum of pediatric-onset rare diseases by using whole-exome sequencing. The FORGE (Finding of Rare Disease Genes) Canada Consortium brought together clinicians and scientists from 21 genetics centers and three science and technology innovation centers from across Canada. From nation-wide requests for proposals, 264 disorders were selected for study from the 371 submitted; disease-causing variants (including in 67 genes not previously associated with human disease; 41 of these have been genetically or functionally validated, and 26 are currently under study) were identified for 146 disorders over a 2-year period. Here, we present our experience with four strategies employed for gene discovery and discuss FORGE's impact in a number of realms, from clinical diagnostics to the broadening of the phenotypic spectrum of many diseases to the biological insight gained into both disease states and normal human development. Lastly, on the basis of this experience, we discuss the way forward for rare-disease genetic discovery both in Canada and internationally. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
Systems-based biological concordance and predictive reproducibility of gene set discovery methods in cardiovascular disease.

PubMed

Azuaje, Francisco; Zheng, Huiru; Camargo, Anyela; Wang, Haiying

2011-08-01

The discovery of novel disease biomarkers is a crucial challenge for translational bioinformatics. Demonstration of both their classification power and reproducibility across independent datasets are essential requirements to assess their potential clinical relevance. Small datasets and multiplicity of putative biomarker sets may explain lack of predictive reproducibility. Studies based on pathway-driven discovery approaches have suggested that, despite such discrepancies, the resulting putative biomarkers tend to be implicated in common biological processes. Investigations of this problem have been mainly focused on datasets derived from cancer research. We investigated the predictive and functional concordance of five methods for discovering putative biomarkers in four independently-generated datasets from the cardiovascular disease domain. A diversity of biosignatures was identified by the different methods. However, we found strong biological process concordance between them, especially in the case of methods based on gene set analysis. With a few exceptions, we observed lack of classification reproducibility using independent datasets. Partial overlaps between our putative sets of biomarkers and the primary studies exist. Despite the observed limitations, pathway-driven or gene set analysis can predict potentially novel biomarkers and can jointly point to biomedically-relevant underlying molecular mechanisms. Copyright © 2011 Elsevier Inc. All rights reserved.
Discovery and validation of gene classifiers for endocrine-disrupting chemicals in zebrafish (danio rerio)

PubMed Central

2012-01-01

-tissue conditions, thus suggesting a need for a preliminary survey of transcriptomic responses before launching a full scale classifier discovery effort. Classifier discovery based on individual TF networks could yield more mechanistically-oriented biomarkers. GSEA proved to be a flexible and effective tool for application of gene classifiers but a similar and more refined algorithm, connectivity mapping, should also be explored. The distribution characteristics of classifiers across tissues, chemicals, and TF networks suggested a differential biological impact among the EDCs on zebrafish transcriptome involving some basic cellular functions. PMID:22849515
Phenotype discovery by gene expression profiling: mapping of biological processes linked to BMP-2-mediated osteoblast differentiation.

PubMed

Balint, Eva; Lapointe, David; Drissi, Hicham; van der Meijden, Caroline; Young, Daniel W; van Wijnen, Andre J; Stein, Janet L; Stein, Gary S; Lian, Jane B

2003-05-15

osteogenic phenotype is recognized by 8 h, reflected by downregulation of most myogenic-related genes and induction of a spectrum of signaling proteins and enzymes facilitating synthesis and assembly of an extracellular skeletal environment. These genes included collagens Type I and VI and the small leucine rich repeat family of proteoglycans (e.g., decorin, biglycan, osteomodulin, fibromodulin, and osteoadherin/osteoglycin) that reached peak expression at 24 h. With extracellular matrix development, the bone phenotype was further established from 16 to 24 h by induction of genes for cell adhesion and communication and enzymes that organize the bone ECM. Our microarray analysis resulted in the discovery of a class of genes, initially described in relation to differentiation of astrocytes and oligodendrocytes that are functionally coupled to signals for cellular extensions. They include nexin, neuropilin, latexin, neuroglian, neuron specific gene 1, and Ulip; suggesting novel roles for these genes in the bone microenvironment. This global analysis identified a multistage molecular and cellular cascade that supports BMP-2-mediated osteoblast differentiation. Copyright 2003 Wiley-Liss, Inc.
Advanced systems biology methods in drug discovery and translational biomedicine.

PubMed

Zou, Jun; Zheng, Ming-Wu; Li, Gen; Su, Zhi-Guang

2013-01-01

Systems biology is in an exponential development stage in recent years and has been widely utilized in biomedicine to better understand the molecular basis of human disease and the mechanism of drug action. Here, we discuss the fundamental concept of systems biology and its two computational methods that have been commonly used, that is, network analysis and dynamical modeling. The applications of systems biology in elucidating human disease are highlighted, consisting of human disease networks, treatment response prediction, investigation of disease mechanisms, and disease-associated gene prediction. In addition, important advances in drug discovery, to which systems biology makes significant contributions, are discussed, including drug-target networks, prediction of drug-target interactions, investigation of drug adverse effects, drug repositioning, and drug combination prediction. The systems biology methods and applications covered in this review provide a framework for addressing disease mechanism and approaching drug discovery, which will facilitate the translation of research findings into clinical benefits such as novel biomarkers and promising therapies.
Gene discovery using next-generation pyrosequencing to develop ESTs for Phalaenopsis orchids

PubMed Central

2011-01-01

Background Orchids are among the most diversified angiosperms, but few genomic resources are available for these non-model plants. In addition to its ecological significance, Phalaenopsis is considered economically important to the floriculture industry worldwide. We aimed to use massively parallel 454 pyrosequencing for a global characterization of the Phalaenopsis transcriptome. Results To maximize sequence diversity, we pooled RNA from 10 samples of different tissues, various developmental stages, and biotic- or abiotic-stressed plants. We obtained 206,960 expressed sequence tags (ESTs) with an average read length of 228 bp. These reads were assembled into 8,233 contigs and 34,630 singletons. The unigenes were searched against the NCBI non-redundant (NR) protein database. Based on sequence similarity with known proteins, these analyses identified 22,234 different genes (E-value cutoff, e-7). Assembled sequences were annotated with Gene Ontology, Gene Family and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Among these annotations, over 780 unigenes encoding putative transcription factors were identified. Conclusion Pyrosequencing was effective in identifying a large set of unigenes from Phalaenopsis. The informative EST dataset we developed constitutes a much-needed resource for discovery of genes involved in various biological processes in Phalaenopsis and other orchid species. These transcribed sequences will narrow the gap between the study of model organisms with many genomic resources and that of species that are important for ecological and evolutionary studies. PMID:21749684
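The similarity-search step in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the hit rows, the sequence names, and the helper `best_hits` are hypothetical, and the cutoff "e-7" is assumed here to mean an E-value of 1e-7. The sketch keeps the best database hit per unigene below the cutoff and adds up the contig and singleton counts reported in the abstract, assuming the unigene set is their union.

```python
# Assumption: the abstract's "E-value cutoff, e-7" is read as 1e-7.
E_VALUE_CUTOFF = 1e-7

def best_hits(rows, cutoff=E_VALUE_CUTOFF):
    """Keep the lowest-E-value hit per query among hits at or below the cutoff.

    `rows` is an iterable of (query, subject, evalue) tuples.
    """
    best = {}
    for query, subject, evalue in rows:
        if evalue > cutoff:
            continue  # too weak to count as a match
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best

# Hypothetical search results for illustration only (not data from the paper).
rows = [
    ("contig_1", "protA", 1e-30),
    ("contig_1", "protB", 1e-10),
    ("contig_2", "protC", 1e-3),
    ("singleton_7", "protA", 5e-8),
]

hits = best_hits(rows)
distinct_proteins = {subject for subject, _ in hits.values()}

# Counts taken from the abstract; their sum assumes unigenes = contigs + singletons.
unigene_total = 8233 + 34630

print(hits)
print(len(distinct_proteins), unigene_total)
```

Counting distinct matched proteins rather than matched queries is one plausible way to arrive at a number of "different genes"; the abstract does not say which counting rule was used.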
How the serotonin story is being rewritten by new gene-based discoveries principally related to SLC6A4, the serotonin transporter gene, which functions to influence all cellular serotonin systems.

PubMed

Murphy, Dennis L; Fox, Meredith A; Timpano, Kiara R; Moya, Pablo R; Ren-Patterson, Renee; Andrews, Anne M; Holmes, Andrew; Lesch, Klaus-Peter; Wendland, Jens R

2008-11-01

Discovered and crystallized over sixty years ago, serotonin's important functions in the brain and body were identified over the ensuing years by neurochemical, physiological and pharmacological investigations. This 2008 M. Rapport Memorial Serotonin Review focuses on some of the most recent discoveries involving serotonin that are based on genetic methodologies. These include examples of the consequences that result from direct serotonergic gene manipulation (gene deletion or overexpression) in mice and other species; an evaluation of some phenotypes related to functional human serotonergic gene variants, particularly in SLC6A4, the serotonin transporter gene; and finally, a consideration of the pharmacogenomics of serotonergic drugs with respect to both their therapeutic actions and side effects. The serotonin transporter (SERT) has been the most comprehensively studied of the serotonin system molecular components, and will be the primary focus of this review. We provide in-depth examples of gene-based discoveries primarily related to SLC6A4 that have clarified serotonin's many important homeostatic functions in humans, non-human primates, mice and other species.
TOXICOGENOMICS DRUG DISCOVERY AND THE PATHOLOGIST

EPA Science Inventory

Toxicogenomics, drug discovery, and pathologist.

The field of toxicogenomics, which currently focuses on the application of large-scale differential gene expression (DGE) data to toxicology, is starting to influence drug discovery and development in the pharmaceutical indu...
An integrative model for in-silico clinical-genomics discovery science.

PubMed

Lussier, Yves A; Sarkar, Indra Nell; Cantor, Michael

2002-01-01

Human Genome discovery research has set the pace for Post-Genomic Discovery Research. While post-genomic fields focused at the molecular level are intensively pursued, little effort is being deployed in the later stages of molecular medicine discovery research, such as clinical-genomics. The objective of this study is to demonstrate the relevance and significance of integrating mainstream clinical informatics decision support systems with current bioinformatics genomic discovery science. This paper presents an original model enabling novel "in-silico" clinical-genomic discovery science and demonstrates its feasibility. This model is designed to mediate queries among clinical and genomic knowledge bases with relevant bioinformatic analytic tools (e.g. gene clustering). Briefly, trait-disease-gene relationships were successfully illustrated using QMR, OMIM, SNOMED-RT, GeneCluster and TreeView. The analyses were visualized as two-dimensional dendrograms of clinical observations clustered around genes. To our knowledge, this is the first study using knowledge bases of clinical decision support systems for genomic discovery. Although this study is a proof of principle, it provides a framework for the development of clinical decision-support-system driven, high-throughput clinical-genomic technologies which could potentially unveil significant high-level functions of genes.
Discovery of rice essential genes by characterizing a CRISPR-edited mutation of closely related rice MAP kinase genes.

PubMed

Minkenberg, Bastian; Xie, Kabin; Yang, Yinong

2017-02-01

The clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 nuclease (Cas9) system depends on a guide RNA (gRNA) to specify its target. By efficiently co-expressing multiple gRNAs that target different genomic sites, the polycistronic tRNA-gRNA gene (PTG) strategy enables multiplex gene editing in the family of closely related mitogen-activated protein kinase (MPK) genes in Oryza sativa (rice). In this study, we identified MPK1 and MPK6 (Arabidopsis AtMPK6 and AtMPK4 orthologs, respectively) as essential genes for rice development by finding the preservation of MPK functional alleles and normal phenotypes in CRISPR-edited mutants. The true knock-out mutants of MPK1 were severely dwarfed and sterile, and homozygous mpk1 seeds from heterozygous parents were defective in embryo development. By contrast, heterozygous mpk6 mutant plants completely failed to produce homozygous mpk6 seeds. In addition, the functional importance of specific MPK features could be evaluated by characterizing CRISPR-induced allelic variation in the conserved kinase domain of MPK6. By simultaneously targeting between two and eight genomic sites in the closely related MPK genes, we demonstrated 45-86% frequency of biallelic mutations and the successful creation of single, double and quadruple gene mutants. Indels and fragment deletion were both stably inherited to the next generations, and transgene-free mutants of rice MPK genes were readily obtained via genetic segregation, thereby eliminating any positional effects of transgene insertions. Taken together, our study reveals the essentiality of MPK1 and MPK6 in rice development, and enables the functional discovery of previously inaccessible genes or domains with phenotypes masked by lethality or redundancy. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.
Coagulase-negative staphylococci as reservoirs of genes facilitating MRSA infection

PubMed Central

Otto, Michael

2013-01-01

Recent research has suggested that Staphylococcus epidermidis is a reservoir of genes that, after horizontal transfer, facilitate the potential of Staphylococcus aureus to colonize, survive during infection, or resist antibiotic treatment, traits that are notably manifest in methicillin-resistant S. aureus (MRSA). S. aureus is a dangerous human pathogen and notorious for acquiring antibiotic resistance. MRSA in particular is one of the most frequent causes of morbidity and death in hospitalized patients. S. aureus is an extremely versatile pathogen with a multitude of mechanisms to cause disease and circumvent immune defenses. In contrast, most other staphylococci, such as S. epidermidis, are commonly benign commensals and only occasionally cause disease. Recent findings highlight the key importance of efforts to better understand how genes of staphylococci other than S. aureus contribute to survival in the human host, how they are transferred to S. aureus, and why this exchange appears to be uni-directional. PMID:23165978
Novel Role of 3’UTR-Embedded Alu Elements as Facilitators of Processed Pseudogene Genesis and Host Gene Capture by Viral Genomes

PubMed Central

Engel, Pablo; Angulo, Ana

2016-01-01

Since the discovery of the high abundance of Alu elements in the human genome, interest in the functional significance of these retrotransposons has been increasing. Primate Alu and rodent Alu-like elements are retrotransposed by a mechanism driven by the LINE1 (L1) encoded proteins, the same machinery that generates the L1 repeats, the processed pseudogenes (PPs), and other retroelements. Apart from free Alu RNAs, Alus are also transcribed and retrotranscribed as part of cellular gene transcripts, generally embedded inside 3’ untranslated regions (UTRs). Despite different proposed hypotheses, the functional implication of the presence of Alus inside 3’UTRs remains elusive. In this study we hypothesized that Alu elements in 3’UTRs could be involved in the genesis of PPs. By analyzing human genome data we discovered that 3’UTR-embedded Alu elements are overrepresented in genes that are sources of PPs. In contrast, the presence of other retrotransposable elements in 3’UTRs does not show this PP-linked overrepresentation. This research was extended to mouse and rat genomes and the results accordingly reveal overrepresentation of 3’UTR-embedded B1 (Alu-like) elements in PP parent genes. Interestingly, we also demonstrated that the overrepresentation of 3’UTR-embedded Alus is particularly significant in PP parent genes with low germline gene expression level. Finally, we provide data that support the hypothesis that the L1 machinery is also the system that herpesviruses, and possibly other large DNA viruses, use to capture host genes expressed in germline or somatic cells. Altogether our results suggest a novel role for Alu or Alu-like elements inside 3’UTRs as facilitators of the genesis of PPs, particularly in lowly expressed genes. Moreover, we propose that this L1-driven mechanism, aided by the presence of 3’UTR-embedded Alus, may also be exploited by DNA viruses to incorporate host genes into their viral genomes. PMID:28033411
Co-fuse: a new class discovery analysis tool to identify and prioritize recurrent fusion genes from RNA-sequencing data.

PubMed

Paisitkriangkrai, Sakrapee; Quek, Kelly; Nievergall, Eva; Jabbour, Anissa; Zannettino, Andrew; Kok, Chung Hoow

2018-06-07

Recurrent oncogenic fusion genes play a critical role in the development of various cancers and diseases and provide, in some cases, excellent therapeutic targets. To date, analysis tools that can identify and compare recurrent fusion genes across multiple samples have not been available to researchers. To address this deficiency, we developed Co-occurrence Fusion (Co-fuse), a new and easy to use software tool that enables biologists to merge RNA-seq information, allowing them to identify recurrent fusion genes, without the need for exhaustive data processing. Notably, Co-fuse is based on pattern mining and statistical analysis which enables the identification of hidden patterns of recurrent fusion genes. In this report, we show that Co-fuse can be used to identify 2 distinct groups within a set of 49 leukemic cell lines based on their recurrent fusion genes: a multiple myeloma (MM) samples-enriched cluster and an acute myeloid leukemia (AML) samples-enriched cluster. Our experimental results further demonstrate that Co-fuse can identify known driver fusion genes (e.g., IGH-MYC, IGH-WHSC1) in MM, when compared to AML samples, indicating the potential of Co-fuse to aid the discovery of yet unknown driver fusion genes through cohort comparisons. Additionally, using a 272 primary glioma sample RNA-seq dataset, Co-fuse was able to validate recurrent fusion genes, further demonstrating the power of this analysis tool to identify recurrent fusion genes. Taken together, Co-fuse is a powerful new analysis tool that can be readily applied to large RNA-seq datasets, and may lead to the discovery of new disease subgroups and potentially new driver genes, for which, targeted therapies could be developed. The Co-fuse R source code is publicly available at https://github.com/sakrapee/co-fuse .
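The notion of recurrence across samples described above can be illustrated with a minimal Python sketch. This is not Co-fuse's algorithm (the tool itself is R code based on pattern mining and statistical analysis); the sample names, the fusion calls, the placeholder fusion `GENE_A-GENE_B`, and the helper `recurrent_fusions` are all hypothetical. The sketch only counts in how many samples of a group each fusion appears and then compares two groups.

```python
from collections import Counter

def recurrent_fusions(calls, min_samples=2):
    """Return fusions seen in at least `min_samples` distinct samples.

    `calls` maps a sample name to an iterable of fusion names.
    """
    counts = Counter()
    for fusions in calls.values():
        # Count each fusion once per sample, even if it is reported repeatedly.
        counts.update(set(fusions))
    return {fusion: n for fusion, n in counts.items() if n >= min_samples}

# Hypothetical fusion calls for illustration only (not data from the paper).
mm_calls = {
    "MM_1": ["IGH-MYC", "IGH-WHSC1"],
    "MM_2": ["IGH-WHSC1", "IGH-WHSC1"],
    "MM_3": ["IGH-MYC"],
}
aml_calls = {
    "AML_1": ["GENE_A-GENE_B"],
    "AML_2": ["GENE_A-GENE_B", "IGH-MYC"],
}

mm_recurrent = recurrent_fusions(mm_calls)
aml_recurrent = recurrent_fusions(aml_calls)

# Fusions that are recurrent in the MM group but not in the AML group.
mm_specific = sorted(set(mm_recurrent) - set(aml_recurrent))
print(mm_recurrent, aml_recurrent, mm_specific)
```

A real cohort comparison would also need a statistical test for enrichment between groups; plain counting is used here only to keep the idea visible.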
High-throughput platform for the discovery of elicitors of silent bacterial gene clusters.

PubMed

Seyedsayamdost, Mohammad R

2014-05-20

Over the past decade, bacterial genome sequences have revealed an immense reservoir of biosynthetic gene clusters, sets of contiguous genes that have the potential to produce drugs or drug-like molecules. However, the majority of these gene clusters appear to be inactive for unknown reasons prompting terms such as "cryptic" or "silent" to describe them. Because natural products have been a major source of therapeutic molecules, methods that rationally activate these silent clusters would have a profound impact on drug discovery. Herein, a new strategy is outlined for awakening silent gene clusters using small molecule elicitors. In this method, a genetic reporter construct affords a facile read-out for activation of the silent cluster of interest, while high-throughput screening of small molecule libraries provides potential inducers. This approach was applied to two cryptic gene clusters in the pathogenic model Burkholderia thailandensis. The results not only demonstrate a prominent activation of these two clusters, but also reveal that the majority of elicitors are themselves antibiotics, most in common clinical use. Antibiotics, which kill B. thailandensis at high concentrations, act as inducers of secondary metabolism at low concentrations. One of these antibiotics, trimethoprim, served as a global activator of secondary metabolism by inducing at least five biosynthetic pathways. Further application of this strategy promises to uncover the regulatory networks that activate silent gene clusters while at the same time providing access to the vast array of cryptic molecules found in bacteria.
A New Omics Data Resource of Pleurocybella porrigens for Gene Discovery

PubMed Central

Dohra, Hideo; Someya, Takumi; Takano, Tomoyuki; Harada, Kiyonori; Omae, Saori; Hirai, Hirofumi; Yano, Kentaro; Kawagishi, Hirokazu

2013-01-01

Background Pleurocybella porrigens is a mushroom-forming fungus, which has been consumed as a traditional food in Japan. In 2004, 55 people were poisoned by eating the mushroom and 17 people among them died of acute encephalopathy. Since then, the Japanese government has been alerting Japanese people to take precautions against eating the P. porrigens mushroom. Unfortunately, despite efforts, the molecular mechanism of the encephalopathy remains elusive. The genome and transcriptome sequence data of P. porrigens and the related species, however, are not stored in the public database. To gain the omics data in P. porrigens, we sequenced genome and transcriptome of its fruiting bodies and mycelia by next generation sequencing. Methodology/Principal Findings Short read sequences of genomic DNAs and mRNAs in P. porrigens were generated by Illumina Genome Analyzer. Genome short reads were de novo assembled into scaffolds using Velvet. Comparisons of genome signatures among Agaricales showed that P. porrigens has a unique genome signature. Transcriptome sequences were assembled into contigs (unigenes). Biological functions of unigenes were predicted by Gene Ontology and KEGG pathway analyses. The majority of unigenes would be novel genes without significant counterparts in the public omics databases. Conclusions Functional analyses of unigenes present the existence of numerous novel genes in the basidiomycetes division. The results mean that the omics information such as genome, transcriptome and metabolome in basidiomycetes is short in the current databases. The large-scale omics information on P. porrigens, provided from this research, will give a new data resource for gene discovery in basidiomycetes. PMID:23936076

IMG-ABC: An Atlas of Biosynthetic Gene Clusters to Fuel the Discovery of Novel Secondary Metabolites

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, I-Min; Chu, Ken; Ratner, Anna

2014-10-28

In the discovery of secondary metabolites (SMs), large-scale analysis of sequence data is a promising exploration path that remains largely underutilized due to the lack of relevant computational resources. We present IMG-ABC (https://img.jgi.doe.gov/abc/) -- An Atlas of Biosynthetic gene Clusters within the Integrated Microbial Genomes (IMG) system. IMG-ABC is a rich repository of both validated and predicted biosynthetic clusters (BCs) in cultured isolates, single-cells and metagenomes linked with the SM chemicals they produce and enhanced with focused analysis tools within IMG. The underlying scalable framework enables traversal of phylogenetic dark matter and chemical structure space -- serving as a doorway to a new era in the discovery of novel molecules.
An Initiative to Facilitate Park Usage, Discovery, and Physical Activity Among Children and Adolescents in Greenville County, South Carolina, 2014

PubMed Central

Kaczynski, Andrew T.; Hughey, S. Morgan; Besenyi, Gina M.; Powers, Alicia R.

2017-01-01

Introduction Parks are important settings for increasing population-level physical activity (PA). The objective of this study was to evaluate Park Hop, an incentivized scavenger-hunt–style intervention designed to influence park usage, discovery, park-based PA, and perceptions of parks among children and adolescents in Greenville County, South Carolina. Methods We used 2 data collection methods: matched preintervention and postintervention parent-completed surveys and in-park observations during 4 days near the midpoint of the intervention. We used paired-samples t tests and logistic regression to analyze changes in park visitation, perceptions, and PA. Results Children and adolescents visited an average of 12.1 (of 19) Park Hop parks, and discovered an average of 4.6 venues. In a subset of participants, from preintervention to postintervention, the mean number of park visits increased from 5.0 visits to 6.1 visits, the proportion of time engaged in PA during the most recent park visit increased from 77% to 87%, and parents reported more positive perceptions of the quality of park amenities. We observed more children and adolescents (n = 586) in the 2 intervention parks than in the 2 matched control parks (n = 305). However, the likelihood of children and adolescents engaging in moderate-to-vigorous PA was significantly greater in the control parks (74.3%) than in Park Hop parks (64.2%). Conclusion Park Hop facilitated community-collaboration between park agencies and positively influenced park usage, park discovery, time engaged in PA during park visits, and perceptions of parks. This low-cost, replicable, and scalable model can be implemented across communities to facilitate youth and family-focused PA through parks. PMID:28182864
A comparative review of estimates of the proportion unchanged genes and the false discovery rate

PubMed Central

Broberg, Per

2005-01-01

Background In the analysis of microarray data one generally produces a vector of p-values that for each gene give the likelihood of obtaining equally strong evidence of change by pure chance. The distribution of these p-values is a mixture of two components corresponding to the changed genes and the unchanged ones. The focus of this article is how to estimate the proportion unchanged and the false discovery rate (FDR) and how to make inferences based on these concepts. Six published methods for estimating the proportion unchanged genes are reviewed, two alternatives are presented, and all are tested on both simulated and real data. All estimates but one make do without any parametric assumptions concerning the distributions of the p-values. Furthermore, the estimation and use of the FDR and the closely related q-value is illustrated with examples. Five published estimates of the FDR and one new are presented and tested. Implementations in R code are available. Results A simulation model based on the distribution of real microarray data plus two real data sets were used to assess the methods. The proposed alternative methods for estimating the proportion unchanged fared very well, and gave evidence of low bias and very low variance. Different methods perform well depending upon whether there are few or many regulated genes. Furthermore, the methods for estimating FDR showed a varying performance, and were sometimes misleading. The new method had a very low error. Conclusion The concept of the q-value or false discovery rate is useful in practical research, despite some theoretical and practical shortcomings. However, it seems possible to challenge the performance of the published methods, and there is likely scope for further developing the estimates of the FDR. The new methods provide the scientist with more options to choose a suitable method for any particular experiment. The article advocates the use of the conjoint information regarding false positive and
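The two quantities discussed in the abstract above, the proportion of unchanged genes and the FDR or q-value, can be made concrete with a short Python sketch. It is not taken from the article, whose implementations are in R: the estimator shown is a generic threshold-based (Storey-type) estimate and the adjustment is the standard Benjamini-Hochberg step-up procedure, used here as assumptions; the p-values, the threshold `lam`, and the function names are made up for illustration.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Threshold-based estimate of the proportion of unchanged genes.

    Under the null, p-values are uniform, so the fraction above `lam`
    divided by (1 - lam) approximates the proportion unchanged.
    """
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / ((1.0 - lam) * m))

def bh_qvalues(pvalues, pi0=1.0):
    """Benjamini-Hochberg adjusted p-values, optionally scaled by pi0."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the
    # adjusted values are non-decreasing in the p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * pvalues[i] * m / rank)
        q[i] = running_min
    return q

# Made-up p-values for six genes (illustration only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.6, 0.9]

pi0 = estimate_pi0(pvals)
q_plain = bh_qvalues(pvals)        # assumes every gene may be unchanged
q_scaled = bh_qvalues(pvals, pi0)  # scaled by the estimated proportion unchanged
print(pi0, q_plain, q_scaled)
```

Scaling by an estimate of the proportion unchanged makes the adjusted values less conservative, which is why the quality of that estimate matters for the resulting FDR.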
Expression of uncharacterized male germ cell-specific genes and discovery of novel sperm-tail proteins in mice.

PubMed

Kwon, Jun Tae; Ham, Sera; Jeon, Suyeon; Kim, Youil; Oh, Seungmin; Cho, Chunghee

2017-01-01

The identification and characterization of germ cell-specific genes are essential if we hope to comprehensively understand the mechanisms of spermatogenesis and fertilization. Here, we searched the mouse UniGene databases and identified 13 novel genes as being putatively testis-specific or -predominant. Our in silico and in vitro analyses revealed that the expression of these genes is testis- and germ cell-specific, and that they are regulated in a stage-specific manner during spermatogenesis. We generated antibodies against the proteins encoded by seven of the genes to facilitate their characterization in male germ cells. Immunoblotting and immunofluorescence analyses revealed that one of these proteins was expressed only in testicular germ cells, three were expressed in both testicular germ cells and testicular sperm, and the remaining three were expressed in sperm of the testicular stages and in mature sperm from the epididymis. Further analysis of the latter three proteins showed that they were all associated with cytoskeletal structures in the sperm flagellum. Among them, MORN5, which is predicted to contain three MORN motifs, is conserved between mouse and human sperm. In conclusion, we herein identify 13 authentic genes with male germ cell-specific expression, and provide comprehensive information about these genes and their encoded products. Our findings will facilitate future investigations into the functional roles of these novel genes in spermatogenesis and sperm functions.
VIZARD: analysis of Affymetrix Arabidopsis GeneChip data

NASA Technical Reports Server (NTRS)

Moseyko, Nick; Feldman, Lewis J.

2002-01-01

SUMMARY: The Affymetrix GeneChip Arabidopsis genome array has proved to be a very powerful tool for the analysis of gene expression in Arabidopsis thaliana, the most commonly studied plant model organism. VIZARD is a Java program created at the University of California, Berkeley, to facilitate analysis of Arabidopsis GeneChip data. It includes several integrated tools for filtering, sorting, clustering and visualization of gene expression data as well as tools for the discovery of regulatory motifs in upstream sequences. VIZARD also includes annotation and upstream sequence databases for the majority of genes represented on the Affymetrix Arabidopsis GeneChip array. AVAILABILITY: VIZARD is available free of charge for educational, research, and not-for-profit purposes, and can be downloaded at http://www.anm.f2s.com/research/vizard/ CONTACT: moseyko@uclink4.berkeley.edu.
Long noncoding RNA EWSAT1-mediated gene repression facilitates Ewing sarcoma oncogenesis

PubMed Central

Marques Howarth, Michelle; Simpson, David; Ngok, Siu P.; Nieves, Bethsaida; Chen, Ron; Siprashvili, Zurab; Vaka, Dedeepya; Breese, Marcus R.; Crompton, Brian D.; Alexe, Gabriela; Hawkins, Doug S.; Jacobson, Damon; Brunner, Alayne L.; West, Robert; Mora, Jaume; Stegmaier, Kimberly; Khavari, Paul; Sweet-Cordero, E. Alejandro

2014-01-01

Chromosomal translocation that results in fusion of the genes encoding RNA-binding protein EWS and transcription factor FLI1 (EWS-FLI1) is pathognomonic for Ewing sarcoma. EWS-FLI1 alters gene expression through mechanisms that are not completely understood. We performed RNA sequencing (RNAseq) analysis on primary pediatric human mesenchymal progenitor cells (pMPCs) expressing EWS-FLI1 in order to identify gene targets of this oncoprotein. We determined that long noncoding RNA-277 (Ewing sarcoma–associated transcript 1 [EWSAT1]) is upregulated by EWS-FLI1 in pMPCs. Inhibition of EWSAT1 expression diminished the ability of Ewing sarcoma cell lines to proliferate and form colonies in soft agar, whereas EWSAT1 inhibition had no effect on other cell types tested. Expression of EWS-FLI1 and EWSAT1 repressed gene expression, and a substantial fraction of targets that were repressed by EWS-FLI1 were also repressed by EWSAT1. Analysis of RNAseq data from primary human Ewing sarcoma further supported a role for EWSAT1 in mediating gene repression. We identified heterogeneous nuclear ribonucleoprotein (HNRNPK) as an RNA-binding protein that interacts with EWSAT1 and found a marked overlap in HNRNPK-repressed genes and those repressed by EWS-FLI1 and EWSAT1, suggesting that HNRNPK participates in EWSAT1-mediated gene repression. Together, our data reveal that EWSAT1 is a downstream target of EWS-FLI1 that facilitates the development of Ewing sarcoma via the repression of target genes. PMID:25401475
A systems-genetics approach and data mining tool to assist in the discovery of genes underlying complex traits in Oryza sativa.

PubMed

Ficklin, Stephen P; Feltus, Frank Alex

2013-01-01

Many traits of biological and agronomic significance in plants are controlled in a complex manner where multiple genes and environmental signals affect the expression of the phenotype. In Oryza sativa (rice), thousands of quantitative genetic signals have been mapped to the rice genome. In parallel, thousands of gene expression profiles have been generated across many experimental conditions. Through the discovery of networks with real gene co-expression relationships, it is possible to identify co-localized genetic and gene expression signals that implicate complex genotype-phenotype relationships. In this work, we used a knowledge-independent systems genetics approach to discover a high-quality set of co-expression networks, termed Gene Interaction Layers (GILs). Twenty-two GILs were constructed from 1,306 Affymetrix microarray rice expression profiles that were pre-clustered to allow for improved capture of gene co-expression relationships. Functional genomic and genetic data, including over 8,000 QTLs and 766 phenotype-tagged SNPs (p-value ≤ 0.001) from genome-wide association studies, together covering over 230 different rice traits, were integrated with the GILs. An online systems genetics data-mining resource, the GeneNet Engine, was constructed to enable dynamic discovery of gene sets (i.e. network modules) that overlap with genetic traits. GeneNet Engine does not provide the exact set of genes underlying a given complex trait, but through the evidence of gene-marker correspondence, co-expression, and functional enrichment, site visitors can identify genes with potential shared causality for a trait, which could then be used for experimental validation. A set of 2 million SNPs was incorporated into the database and serves as a potential set of testable biomarkers for genes in modules that overlap with genetic traits. Herein, we describe two modules found using GeneNet Engine, one with significant overlap with the trait amylose content and another with
CREB and the discovery of cognitive enhancers.

PubMed

Scott, Roderick; Bourtchuladze, Rusiko; Gossweiler, Scott; Dubnau, Josh; Tully, Tim

2002-01-01

In the past few years, a series of molecular-genetic, biochemical, cellular and behavioral studies in fruit flies, sea slugs and mice have confirmed a long-standing notion that long-term memory formation depends on the synthesis of new proteins. Experiments focused on the cAMP-responsive transcription factor, CREB, have established that neural activity-induced regulation of gene transcription promotes a synaptic growth process that strengthens the connections among active neurons. This process constitutes a physical basis for the engram--and CREB is a "molecular switch" to produce the engram. Helicon Therapeutics has been formed to identify drug compounds that enhance memory formation via augmentation of CREB biochemistry. Candidate compounds have been identified from a high-throughput cell-based screen and are being evaluated in animal models of memory formation. A gene discovery program also seeks to identify new genes that function downstream of CREB during memory formation, as a source for new drug discoveries in the future. Together, these drug and gene discovery efforts promise a new class of pharmaceutical therapies for the treatment of various forms of cognitive dysfunction.
Integrated Bio-Entity Network: A System for Biological Knowledge Discovery

PubMed Central

Bell, Lindsey; Chowdhary, Rajesh; Liu, Jun S.; Niu, Xufeng; Zhang, Jinfeng

2011-01-01

A significant part of our biological knowledge is centered on relationships between biological entities (bio-entities) such as proteins, genes, small molecules, pathways, gene ontology (GO) terms and diseases. Accumulated at an increasing speed, the information on bio-entity relationships is archived in different forms at scattered places. Most of such information is buried in scientific literature as unstructured text. Organizing heterogeneous information in a structured form not only facilitates study of biological systems using integrative approaches, but also allows discovery of new knowledge in an automatic and systematic way. In this study, we performed a large scale integration of bio-entity relationship information from both databases containing manually annotated, structured information and automatic information extraction of unstructured text in scientific literature. The relationship information we integrated in this study includes protein–protein interactions, protein/gene regulations, protein–small molecule interactions, protein–GO relationships, protein–pathway relationships, and pathway–disease relationships. The relationship information is organized in a graph data structure, named integrated bio-entity network (IBN), where the vertices are the bio-entities and edges represent their relationships. Under this framework, graph theoretic algorithms can be designed to perform various knowledge discovery tasks. We designed breadth-first search with pruning (BFSP) and most probable path (MPP) algorithms to automatically generate hypotheses—the indirect relationships with high probabilities in the network. We show that IBN can be used to generate plausible hypotheses, which not only help to better understand the complex interactions in biological systems, but also provide guidance for experimental designs. PMID:21738677
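The abstract above describes finding high-probability indirect relationships in a graph of bio-entities. The following is a minimal sketch of one generic way to find a most probable path, not the authors' MPP or BFSP implementation: it assumes each edge carries an independent probability and runs Dijkstra's algorithm on the negative log of those probabilities. The function name, the adjacency-list layout, and the toy vertex names and probabilities are all hypothetical.

```python
import heapq
import math


def most_probable_path(edges, source, target):
    """Return (probability, path) for the most probable path, or (0.0, []) if none.

    edges maps a vertex to a list of (neighbor, probability) pairs, where each
    probability is in (0, 1]. Maximizing a product of probabilities is the same
    as minimizing the sum of -log(p), so Dijkstra's algorithm applies.
    """
    best = {source: 0.0}          # lowest known cost (-log probability) per vertex
    previous = {}                 # back-pointers for path reconstruction
    queue = [(0.0, source)]
    while queue:
        cost, vertex = heapq.heappop(queue)
        if vertex == target:
            break
        if cost > best.get(vertex, math.inf):
            continue              # stale queue entry
        for neighbor, prob in edges.get(vertex, []):
            new_cost = cost - math.log(prob)
            if new_cost < best.get(neighbor, math.inf):
                best[neighbor] = new_cost
                previous[neighbor] = vertex
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best:
        return 0.0, []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return math.exp(-best[target]), path


# Hypothetical toy network: vertex names and probabilities are invented.
toy_network = {
    "geneA": [("proteinB", 0.9), ("pathwayC", 0.3)],
    "proteinB": [("pathwayC", 0.8)],
    "pathwayC": [("diseaseD", 0.5)],
}
probability, path = most_probable_path(toy_network, "geneA", "diseaseD")
```

In this toy network the three-step route through proteinB is preferred over the shorter route because its product of edge probabilities is higher. A pruning step, such as discarding partial paths whose probability falls below a threshold, could be added inside the loop.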
Genomics-driven discovery of the pneumocandin biosynthetic gene cluster in the fungus Glarea lozoyensis

PubMed Central

2013-01-01

Background The antifungal therapy caspofungin is a semi-synthetic derivative of pneumocandin B0, a lipohexapeptide produced by the fungus Glarea lozoyensis, and was the first member of the echinocandin class approved for human therapy. The nonribosomal peptide synthetase (NRPS)-polyketide synthases (PKS) gene cluster responsible for pneumocandin biosynthesis from G. lozoyensis has not been elucidated to date. In this study, we report the elucidation of the pneumocandin biosynthetic gene cluster by whole genome sequencing of the G. lozoyensis wild-type strain ATCC 20868. Results The pneumocandin biosynthetic gene cluster contains a NRPS (GLNRPS4) and a PKS (GLPKS4) arranged in tandem, two cytochrome P450 monooxygenases, seven other modifying enzymes, and genes for L-homotyrosine biosynthesis, a component of the peptide core. Thus, the pneumocandin biosynthetic gene cluster is significantly more autonomous and organized than that of the recently characterized echinocandin B gene cluster. Disruption mutants of GLNRPS4 and GLPKS4 no longer produced the pneumocandins (A0 and B0), and the Δglnrps4 and Δglpks4 mutants lost antifungal activity against the human pathogenic fungus Candida albicans. In addition to pneumocandins, the G. lozoyensis genome encodes a rich repertoire of natural product-encoding genes including 24 PKSs, six NRPSs, five PKS-NRPS hybrids, two dimethylallyl tryptophan synthases, and 14 terpene synthases. Conclusions Characterization of the gene cluster provides a blueprint for engineering new pneumocandin derivatives with improved pharmacological properties. Whole genome estimation of the secondary metabolite-encoding genes from G. lozoyensis provides yet another example of the huge potential for drug discovery from natural products from the fungal kingdom. PMID:23688303
BioGraph: unsupervised biomedical knowledge discovery via automated hypothesis generation

PubMed Central

2011-01-01

We present BioGraph, a data integration and data mining platform for the exploration and discovery of biomedical information. The platform offers prioritizations of putative disease genes, supported by functional hypotheses. We show that BioGraph can retrospectively confirm recently discovered disease genes and identify potential susceptibility genes, outperforming existing technologies, without requiring prior domain knowledge. Additionally, BioGraph allows for generic biomedical applications beyond gene discovery. BioGraph is accessible at http://www.biograph.be. PMID:21696594
A Novel mRNA Level Subtraction Method for Quick Identification of Target-Orientated Uniquely Expressed Genes Between Peanut Immature Pod and Leaf

PubMed Central

2010-01-01

Subtraction techniques have been broadly applied for target gene discovery. However, most current protocols apply relative differential subtraction and yield large mixtures of clones containing both unique and differentially expressed genes. This makes it more difficult to identify unique or target-orientated expressed genes. In this study, we developed a novel method for subtraction at the mRNA level by integrating magnetic particle technology into driver preparation and tester–driver hybridization to facilitate the discovery of uniquely expressed genes between peanut immature pod and leaf through a single round of subtraction. The resulting target clones were further validated through polymerase chain reaction screening using peanut immature pod and leaf cDNA libraries as templates. This study identified several genes expressed uniquely in immature peanut pod. These target genes can be used for future peanut functional genomics and genetic engineering research. PMID:21406066
An ion channel library for drug discovery and safety screening on automated platforms.

PubMed

Wible, Barbara A; Kuryshev, Yuri A; Smith, Stephen S; Liu, Zhiqi; Brown, Arthur M

2008-12-01

Ion channels represent the third largest class of targets in drug discovery after G-protein coupled receptors and kinases. In spite of this ranking, ion channels continue to be underexploited as drug targets compared with the other two groups for several reasons. First, with 400 ion channel genes and an even greater number of functional channels due to mixing and matching of individual subunits, a systematic collection of ion channel-expressing cell lines for drug discovery and safety screening has not been available. Second, the lack of high-throughput functional assays for ion channels has limited their use as drug targets. Now that automated electrophysiology has come of age and provided the technology to assay ion channels at medium to high throughput, we have addressed the need for a library of ion channel cell lines by constructing the Ion Channel Panel (ChanTest Corp., Cleveland, OH). From 400 ion channel genes, a collection of 82 of the most relevant human ion channels for drug discovery, safety, and human disease has been assembled. Each channel has been stably overexpressed in human embryonic kidney 293 or Chinese hamster ovary cells. Cell lines have been selected and validated on automated electrophysiology systems to facilitate cost-effective screening for safe and selective compounds at earlier stages in the drug development process. The screening and validation processes as well as the relative advantages of different screening platforms are discussed.
Independent Gene Discovery and Testing

ERIC Educational Resources Information Center

Palsule, Vrushalee; Coric, Dijana; Delancy, Russell; Dunham, Heather; Melancon, Caleb; Thompson, Dennis; Toms, Jamie; White, Ashley; Shultz, Jeffry

2010-01-01

A clear understanding of basic gene structure is critical when teaching molecular genetics, the central dogma and the biological sciences. We sought to create a gene-based teaching project to improve students' understanding of gene structure and to integrate this into a research project that can be implemented by instructors at the secondary level…
SNP discovery and development of genetic markers for mapping innate immune response genes in common carp (Cyprinus carpio).

PubMed

Kongchum, Pawapol; Palti, Yniv; Hallerman, Eric M; Hulata, Gideon; David, Lior

2010-08-01

Single nucleotide polymorphisms (SNPs) in immune response genes have been reported as markers for susceptibility to infectious diseases in humans and livestock. A disease caused by cyprinid herpesvirus 3 (CyHV-3) is highly contagious and virulent in common carp (Cyprinus carpio). With the aim of developing molecular tools for breeding CyHV-3-resistant carp, we have amplified and sequenced 11 candidate genes for viral disease resistance including TLR2, TLR3, TLR4ba, TLR7, TLR9, TLR21, TLR22, MyD88, TRAF6, type I IFN and IL-1beta. For each gene, we initially cloned and sequenced PCR amplicons from 8 to 12 fish (2-3 fish per strain) from the SNP discovery panel. We then identified putative SNPs, evaluated their polymorphism in the SNP discovery panel, and validated their usefulness for linkage analysis in a full-sib family using the SNaPshot method. Our sequencing results and phylogenetic analyses suggested that the TLR3, TLR7 and MyD88 genes are duplicated in the common carp genome. We therefore developed locus-specific PCR primers and SNP genotyping assays for the duplicated loci. A total of 48 SNP markers were developed from PCR fragments of the 13 loci (7 single-locus and 3 duplicated genes). Thirty-nine markers were polymorphic with estimated minor allele frequencies of more than 0.1. The utility of the SNP markers was evaluated in one full-sib family and revealed that 20 markers from 9 loci segregated in a disomic and Mendelian pattern and would be useful for linkage analysis. Published by Elsevier Ltd.
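The abstract above filters markers by estimated minor allele frequency (more than 0.1). As a rough illustration of that quantity only, and not the authors' pipeline, the sketch below estimates the minor allele frequency of a biallelic SNP from diploid genotype calls. The function name, the two-character genotype encoding, and the example calls are assumptions made for this example.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Estimate the minor allele frequency of a biallelic SNP.

    genotypes is a list of two-character strings such as "AG", one per
    diploid individual. Returns 0.0 for a monomorphic marker.
    """
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    if len(allele_counts) > 2:
        raise ValueError("expected at most two alleles for a biallelic SNP")
    total = sum(allele_counts.values())
    if len(allele_counts) < 2:
        return 0.0
    return min(allele_counts.values()) / total


# Hypothetical genotype calls for one marker in ten fish.
calls = ["AA", "AG", "AG", "AA", "GG", "AA", "AG", "AA", "AA", "AG"]
maf = minor_allele_frequency(calls)
is_informative = maf > 0.1   # threshold mentioned in the abstract
```

With these invented calls the G allele is seen 6 times out of 20, so the marker would pass the 0.1 threshold.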
Accelerating Gene Discovery by Phenotyping Whole-Genome Sequenced Multi-mutation Strains and Using the Sequence Kernel Association Test (SKAT)

PubMed Central

Garland, Stephanie J.; Mohan, Swetha; Flibotte, Stephane; Muncaster, Quintin; Cai, Jerry; Rademakers, Suzanne; Moerman, Donald G.; Leroux, Michel R.

2016-01-01

Forward genetic screens represent powerful, unbiased approaches to uncover novel components in any biological process. Such screens suffer from a major bottleneck, however, namely the cloning of corresponding genes causing the phenotypic variation. Reverse genetic screens have been employed as a way to circumvent this issue, but can often be limited in scope. Here we demonstrate an innovative approach to gene discovery. Using C. elegans as a model system, we used a whole-genome sequenced multi-mutation library, from the Million Mutation Project, together with the Sequence Kernel Association Test (SKAT), to rapidly screen for and identify genes associated with a phenotype of interest, namely defects in dye-filling of ciliated sensory neurons. Such anomalies in dye-filling are often associated with the disruption of cilia, organelles which in humans are implicated in sensory physiology (including vision, smell and hearing), development and disease. Beyond identifying several well characterised dye-filling genes, our approach uncovered three genes not previously linked to ciliated sensory neuron development or function. From these putative novel dye-filling genes, we confirmed the involvement of BGNT-1.1 in ciliated sensory neuron function and morphogenesis. BGNT-1.1 functions at the trans-Golgi network of sheath cells (glia) to influence dye-filling and cilium length, in a cell non-autonomous manner. Notably, BGNT-1.1 is the orthologue of human B3GNT1/B4GAT1, a glycosyltransferase associated with Walker-Warburg syndrome (WWS). WWS is a multigenic disorder characterised by muscular dystrophy as well as brain and eye anomalies. Together, our work unveils an effective and innovative approach to gene discovery, and provides the first evidence that B3GNT1-associated Walker-Warburg syndrome may be considered a ciliopathy. PMID:27508411
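The abstract above relies on the Sequence Kernel Association Test to aggregate variants per gene. The sketch below shows only the general shape of a SKAT-style score statistic with a weighted linear kernel, under simplifying assumptions that are not taken from the paper: a quantitative phenotype, no covariates, and user-supplied per-variant weights. It does not compute a p-value, which in practice requires the null distribution of the statistic (a mixture of chi-square variables). All names and data are hypothetical.

```python
def skat_statistic(genotypes, phenotypes, weights):
    """Variance-component score statistic Q = sum_j w_j * S_j**2.

    genotypes[i][j] is the minor-allele count of individual i at variant j,
    phenotypes[i] is a quantitative trait value, and weights[j] is the weight
    applied to variant j. S_j is the score sum_i genotypes[i][j] * residual_i,
    with residuals taken around the phenotype mean (no covariates).
    """
    n = len(phenotypes)
    mean = sum(phenotypes) / n
    residuals = [y - mean for y in phenotypes]
    q = 0.0
    for j, weight in enumerate(weights):
        score = sum(genotypes[i][j] * residuals[i] for i in range(n))
        q += weight * score ** 2
    return q


# Hypothetical data: four strains, two variants in one gene.
G = [[1, 0], [0, 0], [2, 1], [0, 1]]
y = [3.0, 1.0, 4.0, 2.0]
Q = skat_statistic(G, y, weights=[1.0, 1.0])
```

Because each variant's score is squared before summing, variants with opposite directions of effect do not cancel, which is the usual motivation for this family of tests over simple burden counts.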
GeneMesh: a web-based microarray analysis tool for relating differentially expressed genes to MeSH terms.

PubMed

Jani, Saurin D; Argraves, Gary L; Barth, Jeremy L; Argraves, W Scott

2010-04-01

An important objective of DNA microarray-based gene expression experimentation is determining inter-relationships that exist between differentially expressed genes and biological processes, molecular functions, cellular components, signaling pathways, physiologic processes and diseases. Here we describe GeneMesh, a web-based program that facilitates analysis of DNA microarray gene expression data. GeneMesh relates genes in a query set to categories available in the Medical Subject Headings (MeSH) hierarchical index. The interface enables hypothesis driven relational analysis to a specific MeSH subcategory (e.g., Cardiovascular System, Genetic Processes, Immune System Diseases etc.) or unbiased relational analysis to broader MeSH categories (e.g., Anatomy, Biological Sciences, Disease etc.). Genes found associated with a given MeSH category are dynamically linked to facilitate tabular and graphical depiction of Entrez Gene information, Gene Ontology information, KEGG metabolic pathway diagrams and intermolecular interaction information. Expression intensity values of groups of genes that cluster in relation to a given MeSH category, gene ontology or pathway can be displayed as heat maps of Z score-normalized values. GeneMesh operates on gene expression data derived from a number of commercial microarray platforms including Affymetrix, Agilent and Illumina. GeneMesh is a versatile web-based tool for testing and developing new hypotheses through relating genes in a query set (e.g., differentially expressed genes from a DNA microarray experiment) to descriptors making up the hierarchical structure of the National Library of Medicine controlled vocabulary thesaurus, MeSH. The system further enhances the discovery process by providing links between sets of genes associated with a given MeSH category to a rich set of html linked tabular and graphic information including Entrez Gene summaries, gene ontologies, intermolecular interactions, overlays of genes onto KEGG
Target genes discovery through copy number alteration analysis in human hepatocellular carcinoma.

PubMed

Gu, De-Leung; Chen, Yen-Hsieh; Shih, Jou-Ho; Lin, Chi-Hung; Jou, Yuh-Shan; Chen, Chian-Feng

2013-12-21

High-throughput short-read sequencing of exomes and whole cancer genomes in multiple human hepatocellular carcinoma (HCC) cohorts confirmed previously identified frequently mutated somatic genes, such as TP53, CTNNB1 and AXIN1, and identified several novel genes with moderate mutation frequencies, including ARID1A, ARID2, MLL, MLL2, MLL3, MLL4, IRF2, ATM, CDKN2A, FGF19, PIK3CA, RPS6KA3, JAK1, KEAP1, NFE2L2, C16orf62, LEPR, RAC2, and IL6ST. Functional classification of these mutated genes suggested that alterations in pathways participating in chromatin remodeling, Wnt/β-catenin signaling, JAK/STAT signaling, and oxidative stress play critical roles in HCC tumorigenesis. Nevertheless, because there are few druggable genes used in HCC therapy, the identification of new therapeutic targets through integrated genomic approaches remains an important task. Because a large amount of HCC genomic data genotyped by high density single nucleotide polymorphism arrays is deposited in the public domain, copy number alteration (CNA) analyses of these arrays is a cost-effective way to reveal target genes through profiling of recurrent and overlapping amplicons, homozygous deletions and potentially unbalanced chromosomal translocations accumulated during HCC progression. Moreover, integration of CNAs with other high-throughput genomic data, such as aberrantly coding transcriptomes and non-coding gene expression in human HCC tissues and rodent HCC models, provides lines of evidence that can be used to facilitate the identification of novel HCC target genes with the potential of improving the survival of HCC patients.
Heuristic Bayesian segmentation for discovery of coexpressed genes within genomic regions.

PubMed

Pehkonen, Petri; Wong, Garry; Törönen, Petri

2010-01-01

Segmentation aims to separate homogeneous areas from the sequential data, and plays a central role in data mining. It has applications ranging from finance to molecular biology, where bioinformatics tasks such as genome data analysis are active application fields. In this paper, we present a novel application of segmentation in locating genomic regions with coexpressed genes. We aim at automated discovery of such regions without requirement for user-given parameters. In order to perform the segmentation within a reasonable time, we use heuristics. Most of the heuristic segmentation algorithms require some decision on the number of segments. This is usually accomplished by using asymptotic model selection methods like the Bayesian information criterion. Such methods are based on some simplification, which can limit their usage. In this paper, we propose a Bayesian model selection to choose the most proper result from heuristic segmentation. Our Bayesian model presents a simple prior for the segmentation solutions with various segment numbers and a modified Dirichlet prior for modeling multinomial data. We show with various artificial data sets in our benchmark system that our model selection criterion has the best overall performance. The application of our method in yeast cell-cycle gene expression data reveals potential active and passive regions of the genome.
De Novo Transcriptomic Analysis of Peripheral Blood Lymphocytes from the Chinese Goose: Gene Discovery and Immune System Pathway Description

PubMed Central

Tariq, Mansoor; Chen, Rong; Yuan, Hongyu; Liu, Yanjie; Wu, Yanan; Wang, Junya; Xia, Chun

2015-01-01

Background The Chinese goose is one of the most economically important poultry birds and is a natural reservoir for many avian viruses. However, the nature and regulation of the innate and adaptive immune systems of this waterfowl species are not completely understood due to limited information on the goose genome. Recently, transcriptome sequencing technology has been applied in genomic studies focused on novel gene discovery. Thus, this study described the transcriptome of the goose peripheral blood lymphocytes to identify immunity-relevant genes. Principal Findings The transcriptome of goose peripheral blood lymphocytes was sequenced with Illumina-Solexa technology and assembled de novo. In total, 211,198 unigenes were assembled from the 69.36 million cleaned reads. The average length, N50 size and the maximum length of the assembled unigenes were 687 bp, 1,298 bp and 18,992 bp, respectively. A total of 36,854 unigenes showed similarity by BLAST search against the NCBI non-redundant (Nr) protein database. For functional classification, 163,161 unigenes were assigned to three Gene Ontology (GO) categories and 67 subcategories. A total of 15,334 unigenes were annotated into 25 eukaryotic orthologous groups (KOGs) categories. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database annotated 39,585 unigenes into six biological functional groups and 308 pathways. Among the 2,757 unigenes that participated in the 15 immune system KEGG pathways, 125 of the most important immune-relevant genes were summarized and analyzed by STRING analysis to identify gene interactions and relationships. Moreover, 10 genes were confirmed by PCR and analyzed. Of these 125 unigenes, 109 unigenes, approximately 87%, were not previously identified in the goose. Conclusion This de novo transcriptome analysis could provide important Chinese goose sequence information and highlights the value of new gene discovery, pathways investigation and immune system gene identification, and comparison with
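The abstract above reports an N50 size for the assembled unigenes. For readers unfamiliar with that summary statistic, the following minimal sketch computes N50 from a list of sequence lengths using the common definition (the length at which the cumulative sum of lengths, sorted longest first, reaches half of the total). The function name and the example lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical unigene lengths in bp (not taken from the study).
example_lengths = [2000, 1500, 1000, 500, 300, 200]
example_n50 = n50(example_lengths)
```

N50 is weighted toward long sequences, which is why an assembly can have an N50 well above its mean unigene length, as in the figures quoted above.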

DAVID-WS: a stateful web service to facilitate gene/protein list analysis.

PubMed

Jiao, Xiaoli; Sherman, Brad T; Huang, Da Wei; Stephens, Robert; Baseler, Michael W; Lane, H Clifford; Lempicki, Richard A

2012-07-01

The database for annotation, visualization and integrated discovery (DAVID), which can be freely accessed at http://david.abcc.ncifcrf.gov/, is a web-based online bioinformatics resource that aims to provide tools for the functional interpretation of large lists of genes/proteins. It has been used by researchers from more than 5000 institutes worldwide, with a daily submission rate of ∼1200 gene lists from ∼400 unique researchers, and has been cited by more than 6000 scientific publications. However, the current web interface does not support programmatic access to DAVID, and the uniform resource locator (URL)-based application programming interface (API) has a limit on URL size and is stateless in nature as it uses URL request and response messages to communicate with the server, without keeping any state-related details. DAVID-WS (web service) has been developed to automate user tasks by providing stateful web services to access DAVID programmatically without the need for human interactions. The web service and sample clients (written in Java, Perl, Python and Matlab) are made freely available under the DAVID License at http://david.abcc.ncifcrf.gov/content.jsp?file=WS.html.
GEM-TREND: a web tool for gene expression data mining toward relevant network discovery

PubMed Central

Feng, Chunlai; Araki, Michihiro; Kunimoto, Ryo; Tamon, Akiko; Makiguchi, Hiroki; Niijima, Satoshi; Tsujimoto, Gozoh; Okuno, Yasushi

2009-01-01

are dynamically linked to external data repositories. Conclusion GEM-TREND was developed to retrieve gene expression data by comparing a query gene-expression pattern with those in GEO gene expression data. It could be a very useful resource for finding similar gene expression profiles and constructing gene co-expression networks from a publicly available database. GEM-TREND was designed to be user-friendly and is expected to support knowledge discovery. GEM-TREND is freely available at http://cgs.pharm.kyoto-u.ac.jp/services/network. PMID:19728865
GEM-TREND: a web tool for gene expression data mining toward relevant network discovery.

PubMed

Feng, Chunlai; Araki, Michihiro; Kunimoto, Ryo; Tamon, Akiko; Makiguchi, Hiroki; Niijima, Satoshi; Tsujimoto, Gozoh; Okuno, Yasushi

2009-09-03

linked to external data repositories. GEM-TREND was developed to retrieve gene expression data by comparing a query gene-expression pattern with those in GEO gene expression data. It could be a very useful resource for finding similar gene expression profiles and constructing gene co-expression networks from a publicly available database. GEM-TREND was designed to be user-friendly and is expected to support knowledge discovery. GEM-TREND is freely available at http://cgs.pharm.kyoto-u.ac.jp/services/network.
Cyanobacteria: photosynthetic factories combining biodiversity, radiation resistance, and genetics to facilitate drug discovery.

PubMed

Cassier-Chauvat, Corinne; Dive, Vincent; Chauvat, Franck

2017-02-01

Cyanobacteria are ancient, abundant, and widely diverse photosynthetic prokaryotes, which are viewed as promising cell factories for the ecologically responsible production of chemicals. Natural cyanobacteria synthesize a vast array of biologically active (secondary) metabolites with great potential for human health, while a few genetic models can be engineered for the (low level) production of biofuels. Recently, genome sequencing and mining has revealed that natural cyanobacteria have the capacity to produce many more secondary metabolites than have been characterized. The corresponding panoply of enzymes (polyketide synthases and non-ribosomal peptide synthases) of interest for synthetic biology can still be increased through gene manipulations with the tools available for the few genetically manipulable strains. In this review, we propose to exploit the metabolic diversity and radiation resistance of cyanobacteria, and when required the genetics of model strains, for the production and radioactive (14C) labeling of bioactive products, in order to facilitate the screening for new drugs.
Genome-Scale Discovery of Cell Wall Biosynthesis Genes in Populus (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment)

ScienceCinema

Muchero, Wellington

2018-01-15

Wellington Muchero from Oak Ridge National Laboratory gives a talk titled "Discovery of Cell Wall Biosynthesis Genes in Populus" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.
Antifreeze protein gene amplification facilitated niche exploitation and speciation in wolffish.

PubMed

Desjardins, Mariève; Graham, Laurie A; Davies, Peter L; Fletcher, Garth L

2012-06-01

During winter, the coastal waters of Newfoundland can be considered a 'freeze risk ecozone' for teleost fishes, where the shallower habitats pose a high (and the deeper habitats a low) risk of freezing. Atlantic (Anarhichas lupus) and spotted (Anarhichas minor) wolffish, which inhabit these waters, reside at opposite ends of this ecozone, with the Atlantic wolffish being the species facing the greatest risk, because of its shallower niche. In order to resist freezing, this species secretes five times the level of antifreeze protein (AFP) activity into the plasma than does the spotted wolffish. The main basis for this interspecific difference in AFP levels is gene dosage, as the Atlantic wolffish has approximately three times as many AFP gene copies as the spotted wolffish. In addition, AFP transcript levels in liver (the primary source of circulating AFPs) are several times higher in the Atlantic wolffish. One explanation for the difference in gene dosage and transcript levels is the presence of tandemly arrayed repeats in the latter, which make up two-thirds of its AFP gene pool. Such repeats are not present in the spotted wolffish. The available evidence indicates that the two species diverged from a common ancestor at a time when the ebb and flow of northern glaciations would have resulted in the emergence of shallow water 'freeze risk ecozones'. The results of this study suggest that the duplication/amplification of AFP genes in a subpopulation of ancestral wolffish would have facilitated the exploitation of this high-risk habitat, resulting in the divergence and evolution of modern-day Atlantic and spotted wolffish species. © 2012 The Authors Journal compilation © 2012 FEBS.
Exploiting Pre-rRNA Processing in Diamond Blackfan Anemia Gene Discovery and Diagnosis

PubMed Central

Farrar, Jason E.; Quarello, Paola; Fisher, Ross; O’Brien, Kelly A.; Aspesi, Anna; Parrella, Sara; Henson, Adrianna L.; Seidel, Nancy E.; Atsidaftos, Eva; Prakash, Supraja; Bari, Shahla; Garelli, Emanuela; Arceci, Robert J.; Dianzani, Irma; Ramenghi, Ugo; Vlachos, Adrianna; Lipton, Jeffrey M.; Bodine, David M.; Ellis, Steven R.

2014-01-01

Diamond Blackfan anemia (DBA), a syndrome primarily characterized by anemia and physical abnormalities, is one among a group of related inherited bone marrow failure syndromes (IBMFS) which share overlapping clinical features. Heterozygous mutations or single-copy deletions have been identified in 12 ribosomal protein genes in approximately 60% of DBA cases, with the genetic etiology unexplained in most remaining patients. Unlike many IBMFS, for which functional screening assays complement clinical and genetic findings, suspected DBA in the absence of typical alterations of the known genes must frequently be diagnosed after exclusion of other IBMFS. We report here a novel deletion in a child that presented such a diagnostic challenge and prompted development of a novel functional assay that can assist in the diagnosis of a significant fraction of patients with DBA. The ribosomal proteins affected in DBA are required for pre-rRNA processing, a process which can be interrogated to monitor steps in the maturation of 40S and 60S ribosomal subunits. In contrast to prior methods used to assess pre-rRNA processing, the assay reported here, based on capillary electrophoresis measurement of the maturation of rRNA in pre-60S ribosomal subunits, would be readily amenable to use in diagnostic laboratories. In addition to utility as a diagnostic tool, we applied this technique to gene discovery in DBA, resulting in the identification of RPL31 as a novel DBA gene. PMID:25042156
Application of industrial scale genomics to discovery of therapeutic targets in heart failure.

PubMed

Mehraban, F; Tomlinson, J E

2001-12-01

In recent years intense activity in both academic and industrial sectors has provided a wealth of information on the human genome with an associated impressive increase in the number of novel gene sequences deposited in sequence data repositories and patent applications. This genomic industrial revolution has transformed the way in which drug target discovery is now approached. In this article we discuss how various differential gene expression (DGE) technologies are being utilized for cardiovascular disease (CVD) drug target discovery. Other approaches such as sequencing cDNA from cardiovascular derived tissues and cells coupled with bioinformatic sequence analysis are used with the aim of identifying novel gene sequences that may be exploited towards target discovery. Additional leverage from gene sequence information is obtained through identification of polymorphisms that may confer disease susceptibility and/or affect drug responsiveness. Pharmacogenomic studies are described wherein gene expression-based techniques are used to evaluate drug response and/or efficacy. Industrial-scale genomics supports and addresses not only novel target gene discovery but also the burgeoning issues in pharmaceutical and clinical cardiovascular medicine relative to polymorphic gene responses.
Myogenin Recruits the Histone Chaperone Facilitates Chromatin Transcription (FACT) to Promote Nucleosome Disassembly at Muscle-specific Genes*

PubMed Central

Lolis, Alexandra A.; Londhe, Priya; Beggs, Benjamin C.; Byrum, Stephanie D.; Tackett, Alan J.; Davie, Judith K.

2013-01-01

Facilitates chromatin transcription (FACT) functions to reorganize nucleosomes by acting as a histone chaperone that destabilizes and restores nucleosomal structure. The FACT complex is composed of two subunits: SSRP1 and SPT16. We have discovered that myogenin interacts with the FACT complex. Transfection of FACT subunits with myogenin is highly stimulatory for endogenous muscle gene expression in 10T1/2 cells. We have also found that FACT subunits do not associate with differentiation-specific genes while C2C12 cells are proliferating but are recruited to muscle-specific genes as differentiation initiates and then dissociate as differentiation proceeds. The recruitment is dependent on myogenin, as knockdowns of myogenin show no recruitment of the FACT complex. These data suggest that FACT is involved in the early steps of gene activation through its histone chaperone activities that serve to open the chromatin structure and facilitate transcription. Consistent with this hypothesis, we find that nucleosomes are depleted at muscle-specific promoters upon differentiation and that this activity is dependent on the presence of FACT. Our results show that the FACT complex promotes myogenin-dependent transcription and suggest that FACT plays an important role in the establishment of the appropriate transcription profile in a differentiated muscle cell. PMID:23364797
Plant-derived isoprenoid sweeteners: recent progress in biosynthetic gene discovery and perspectives on microbial production.

PubMed

Seki, Hikaru; Tamura, Keita; Muranaka, Toshiya

2018-06-01

Increased public awareness of negative health effects associated with excess sugar consumption has triggered increasing interest in plant-derived natural sweeteners. Steviol glycosides are a group of highly sweet diterpene glycosides contained in the leaves of stevia (Stevia rebaudiana). Mogrosides, extracted from monk fruit (Siraitia grosvenorii), are a group of cucurbitane-type triterpenoid glycosides. Glycyrrhizin is an oleanane-type triterpenoid glycoside derived from the underground parts of Glycyrrhiza plants (licorice). This review focuses on the natural isoprenoid sweetening agents steviol glycosides, mogrosides, and glycyrrhizin, and describes recent progress in gene discovery and elucidation of the catalytic functions of their biosynthetic enzymes. Recently, remarkable progress has been made in engineering the production of various plant-specialized metabolites in microbial hosts such as Saccharomyces cerevisiae via the introduction of biosynthetic enzyme genes. Perspectives on the microbial production of plant-derived natural sweeteners are also discussed.
Phenome-driven disease genetics prediction toward drug discovery.

PubMed

Chen, Yang; Li, Li; Zhang, Guo-Qiang; Xu, Rong

2015-06-15

Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed Disease Manifestation Network (DMN). We combined DMN with mimMiner, which was a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analysis, our approach achieved the area under the curves of 90.7% and 90.3%, which are significantly higher than 84.2% (P < e(-4)) and 81.3% (P < e(-12)) for the baseline approach. We further demonstrated that our predicted genes have the translational potential in drug discovery. We used Crohn's disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn's disease pathogenesis, and our rank of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn's disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. nlp. edu/public/data/DMN © The Author 2015. Published by Oxford University Press.
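
The abstract above reports performance as an area under the curve (AUC). As a minimal sketch of what that number measures, and not the authors' evaluation code, the snippet below computes AUC as the fraction of (known gene, other gene) pairs in which the known disease gene receives the higher score. The scores and the function name `auc` are hypothetical and chosen only for illustration.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. This pairwise form is equivalent to the
    area under the ROC curve.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical prediction scores: known disease genes vs. all other genes.
known = [0.9, 0.8, 0.4]
others = [0.7, 0.3, 0.2, 0.1]
value = auc(known, others)  # 11 of the 12 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is the scale on which the reported 90.7% and 84.2% should be read.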
Phenome-driven disease genetics prediction toward drug discovery

PubMed Central

Chen, Yang; Li, Li; Zhang, Guo-Qiang; Xu, Rong

2015-01-01

Motivation: Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. Results: To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed Disease Manifestation Network (DMN). We combined DMN with mimMiner, which was a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analysis, our approach achieved the area under the curves of 90.7% and 90.3%, which are significantly higher than 84.2% (P < e−4) and 81.3% (P < e−12) for the baseline approach. We further demonstrated that our predicted genes have the translational potential in drug discovery. We used Crohn’s disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn’s disease pathogenesis, and our rank of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn’s disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. Availability and implementation: nlp
A comprehensive resource of drought- and salinity- responsive ESTs for gene discovery and marker development in chickpea (Cicer arietinum L.)

PubMed Central

2009-01-01

and their expression profile showed predominance in specific stress-challenged libraries. Conclusion: The generated set of chickpea ESTs serves as a resource of high-quality transcripts for gene discovery and development of functional markers associated with abiotic stress tolerance that will help facilitate chickpea breeding. Mapping of gene-based markers in chickpea will also add more anchoring points to align genomes of chickpea and other legume species. PMID:19912666
iCOSSY: An Online Tool for Context-Specific Subnetwork Discovery from Gene Expression Data

PubMed Central

Saha, Ashis; Jeon, Minji; Tan, Aik Choon; Kang, Jaewoo

2015-01-01

Pathway analyses help reveal underlying molecular mechanisms of complex biological phenotypes. Biologists tend to perform multiple pathway analyses on the same dataset, as there is no single answer. It is often inefficient for them to implement and/or install all the algorithms by themselves. Online tools can help the community in this regard. Here we present an online gene expression analytical tool called iCOSSY which implements a novel pathway-based COntext-specific Subnetwork discoverY (COSSY) algorithm. iCOSSY also includes a few modifications of COSSY to increase its reliability and interpretability. Users can upload their gene expression datasets, and discover important subnetworks of closely interacting molecules to differentiate between two phenotypes (context). They can also interactively visualize the resulting subnetworks. iCOSSY is a web server that finds subnetworks that are differentially expressed in two phenotypes. Users can visualize the subnetworks to understand the biology of the difference. PMID:26147457
Discovery of novel bacterial toxins by genomics and computational biology.

PubMed

Doxey, Andrew C; Mansfield, Michael J; Montecucco, Cesare

2018-06-01

Hundreds and hundreds of bacterial protein toxins are presently known. Traditionally, toxin identification begins with pathological studies of bacterial infectious disease. Following identification and cultivation of a bacterial pathogen, the protein toxin is purified from the culture medium and its pathogenic activity is studied using the methods of biochemistry and structural biology, cell biology, tissue and organ biology, and appropriate animal models, supplemented by bioimaging techniques. The ongoing and explosive development of high-throughput DNA sequencing and bioinformatic approaches has set in motion a revolution in many fields of biology, including microbiology. One consequence is that genes encoding novel bacterial toxins can be identified by bioinformatic and computational methods based on previous knowledge accumulated from studies of the biology and pathology of thousands of known bacterial protein toxins. Starting from the paradigmatic cases of diphtheria toxin, tetanus and botulinum neurotoxins, this review discusses traditional experimental approaches as well as bioinformatics and genomics-driven approaches that facilitate the discovery of novel bacterial toxins. We discuss recent work on the identification of novel botulinum-like toxins from genera such as Weissella, Chryseobacterium, and Enterococcus, and the implications of these computationally identified toxins in the field. Finally, we discuss the promise of metagenomics in the discovery of novel toxins and their ecological niches, and present data suggesting the existence of uncharacterized, botulinum-like toxin genes in insect gut metagenomes. Copyright © 2018. Published by Elsevier Ltd.
DAVID-WS: a stateful web service to facilitate gene/protein list analysis

PubMed Central

Jiao, Xiaoli; Sherman, Brad T.; Huang, Da Wei; Stephens, Robert; Baseler, Michael W.; Lane, H. Clifford; Lempicki, Richard A.

2012-01-01

Summary: The database for annotation, visualization and integrated discovery (DAVID), which can be freely accessed at http://david.abcc.ncifcrf.gov/, is a web-based online bioinformatics resource that aims to provide tools for the functional interpretation of large lists of genes/proteins. It has been used by researchers from more than 5000 institutes worldwide, with a daily submission rate of ∼1200 gene lists from ∼400 unique researchers, and has been cited by more than 6000 scientific publications. However, the current web interface does not support programmatic access to DAVID, and the uniform resource locator (URL)-based application programming interface (API) has a limit on URL size and is stateless in nature as it uses URL request and response messages to communicate with the server, without keeping any state-related details. DAVID-WS (web service) has been developed to automate user tasks by providing stateful web services to access DAVID programmatically without the need for human interactions. Availability: The web service and sample clients (written in Java, Perl, Python and Matlab) are made freely available under the DAVID License at http://david.abcc.ncifcrf.gov/content.jsp?file=WS.html. Contact: xiaoli.jiao@nih.gov; rlempicki@nih.gov PMID:22543366
Distribution and licensing of drug discovery tools – NIH perspectives

PubMed Central

Kim, J. P.

2009-01-01

Now, more than ever, drug discovery conducted at industrial or academic facilities requires rapid access to state-of-the-art research tools. Unreasonable restrictions or delays in the distribution or use of such tools can stifle new discoveries, thus limiting the development of future biomedical products. In grants and its own research programs the National Institutes of Health (NIH) is implementing its new policy to facilitate the exchanges of these tools for research discoveries and product development. PMID:12546842
Constraint of gene expression by the chromatin remodelling protein CHD4 facilitates lineage specification

PubMed Central

O'Shaughnessy-Kirwan, Aoife; Signolet, Jason; Costello, Ita; Gharbi, Sarah; Hendrich, Brian

2015-01-01

Chromatin remodelling proteins are essential for different aspects of metazoan biology, yet functional details of why these proteins are important are lacking. Although it is possible to describe the biochemistry of how they remodel chromatin, their chromatin-binding profiles in cell lines, and gene expression changes upon loss of a given protein, in very few cases can this easily translate into an understanding of how the function of that protein actually influences a developmental process. Here, we investigate how the chromatin remodelling protein CHD4 facilitates the first lineage decision in mammalian embryogenesis. Embryos lacking CHD4 can form a morphologically normal early blastocyst, but are unable to successfully complete the first lineage decision and form functional trophectoderm (TE). In the absence of a functional TE, Chd4 mutant blastocysts do not implant and are hence not viable. By measuring transcript levels in single cells from early embryos, we show that CHD4 influences the frequency at which unspecified cells in preimplantation stage embryos express lineage markers prior to the execution of this first lineage decision. In the absence of CHD4, this frequency is increased in 16-cell embryos, and by the blastocyst stage cells fail to properly adopt a TE gene expression programme. We propose that CHD4 allows cells to undertake lineage commitment in vivo by modulating the frequency with which lineage-specification genes are expressed. This provides novel insight into both how lineage decisions are made in mammalian cells, and how a chromatin remodelling protein functions to facilitate lineage commitment. PMID:26116663
KENNEDY SPACE CENTER, FLA. - In the Orbiter Processing Facility, KSC employee Gene Peavler works in the wheel area on the orbiter Discovery. The vehicle has undergone Orbiter Major Modifications in the past year. Discovery is scheduled to fly on mission STS-121 to the International Space Station.

NASA Image and Video Library

2003-12-09

KENNEDY SPACE CENTER, FLA. - In the Orbiter Processing Facility, KSC employee Gene Peavler works in the wheel area on the orbiter Discovery. The vehicle has undergone Orbiter Major Modifications in the past year. Discovery is scheduled to fly on mission STS-121 to the International Space Station.

Arrayed antibody library technology for therapeutic biologic discovery.

PubMed

Bentley, Cornelia A; Bazirgan, Omar A; Graziano, James J; Holmes, Evan M; Smider, Vaughn V

2013-03-15

Traditional immunization and display antibody discovery methods rely on competitive selection amongst a pool of antibodies to identify a lead. While this approach has led to many successful therapeutic antibodies, targets have been limited to proteins which are easily purified. In addition, selection driven discovery has produced a narrow range of antibody functionalities focused on high affinity antagonism. We review the current progress in developing arrayed protein libraries for screening-based, rather than selection-based, discovery. These single molecule per microtiter well libraries have been screened in multiplex formats against both purified antigens and directly against targets expressed on the cell surface. This facilitates the discovery of antibodies against therapeutically interesting targets (GPCRs, ion channels, and other multispanning membrane proteins) and epitopes that have been considered poorly accessible to conventional discovery methods. Copyright © 2013. Published by Elsevier Inc.
Discovery of Tumor Suppressor Gene Function.

ERIC Educational Resources Information Center

Oppenheimer, Steven B.

1995-01-01

This is an update of a 1991 review on tumor suppressor genes written at a time when understanding of how the genes work was limited. A recent major breakthrough in the understanding of the function of tumor suppressor genes is discussed. (LZ)
The PhytoClust tool for metabolic gene clusters discovery in plant genomes

PubMed Central

Fuchs, Lisa-Maria

2017-01-01

The existence of Metabolic Gene Clusters (MGCs) in plant genomes has recently raised increased interest. Thus far, MGCs were commonly identified for pathways of specialized metabolism, mostly those associated with terpene-type products. For efficient identification of novel MGCs, computational approaches are essential. Here, we present PhytoClust, a tool for the detection of candidate MGCs in plant genomes. The algorithm employs a collection of enzyme families related to plant specialized metabolism, translated into hidden Markov models, to mine given genome sequences for physically co-localized metabolic enzymes. Our tool accurately identifies previously characterized plant MGCs. An exhaustive search of 31 plant genomes detected 1232 and 5531 putative gene cluster types and candidates, respectively. Clustering analysis of putative MGC types by species reflected plant taxonomy. Furthermore, enrichment analysis revealed taxa- and species-specific enrichment of certain enzyme families in MGCs. When operating through our web-interface, PhytoClust users can mine a genome either based on a list of known cluster types or by defining new cluster rules. Moreover, for selected plant species, the output can be complemented by co-expression analysis. Altogether, we envisage PhytoClust to enhance novel MGC discovery which will in turn impact the exploration of plant metabolism. PMID:28486689
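
The idea of mining a genome for physically co-localized metabolic enzymes can be illustrated with a short sliding-window sketch. This is not PhytoClust's implementation: it assumes each gene has already been assigned an enzyme family label (in the tool that assignment comes from hidden Markov models), and the function name, window size, threshold, and family labels below are all hypothetical.

```python
def candidate_clusters(families, window=4, min_distinct=3):
    """Report windows of consecutive genes with enough distinct enzyme families.

    families: list ordered along one chromosome; each entry is an enzyme
    family label, or None for a gene with no metabolic family assigned.
    Returns (start_index, end_index, sorted_family_labels) tuples.
    """
    hits = []
    last_start = max(1, len(families) - window + 1)
    for start in range(last_start):
        chunk = families[start:start + window]
        distinct = {f for f in chunk if f is not None}
        if len(distinct) >= min_distinct:
            hits.append((start, start + len(chunk) - 1, sorted(distinct)))
    return hits


# Hypothetical gene order on one chromosome.
genome = [None, "P450", "terpene_synthase", None, "glycosyltransferase",
          None, None, "P450", None, None]
clusters = candidate_clusters(genome)
```

With these toy inputs only the window spanning genes 1 to 4 contains three distinct families, so it is the single candidate reported. A real tool would also need to merge overlapping windows and handle multiple chromosomes.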
Biomimicry as a basis for drug discovery.

PubMed

Kolb, V M

1998-01-01

Selected works are discussed which clearly demonstrate that mimicking various aspects of the process by which natural products evolved is becoming a powerful tool in contemporary drug discovery. Natural products are an established and rich source of drugs. The term "natural product" is often used synonymously with "secondary metabolite." Knowledge of genetics and molecular evolution helps us understand how biosynthesis of many classes of secondary metabolites evolved. One proposed hypothesis is termed "inventive evolution." It invokes duplication of genes, and mutation of the gene copies, among other genetic events. The modified duplicate genes, per se or in conjunction with other genetic events, may give rise to new enzymes, which, in turn, may generate new products, some of which may be selected for. Steps of the inventive evolution can be mimicked in several ways for the purpose of drug discovery. For example, libraries of chemical compounds of any imaginable structure may be produced by combinatorial synthesis. Out of these libraries new active compounds can be selected. In another example, genetic systems can be manipulated to produce modified natural products ("unnatural natural products"), from which new drugs can be selected. In some instances, similar natural products turn up in species that are not direct descendants of each other. This is presumably due to horizontal gene transfer. The mechanism of this inter-species gene transfer can be mimicked in therapeutic gene delivery. Mimicking specifics or principles of chemical evolution, including experimental and test-tube evolution, also provides leads for new drug discovery.
Interestingness measures and strategies for mining multi-ontology multi-level association rules from gene ontology annotations for the discovery of new GO relationships.

PubMed

Manda, Prashanti; McCarthy, Fiona; Bridges, Susan M

2013-10-01

The Gene Ontology (GO), a set of three sub-ontologies, is one of the most popular bio-ontologies used for describing gene product characteristics. GO annotation data containing terms from multiple sub-ontologies and at different levels in the ontologies is an important source of implicit relationships between terms from the three sub-ontologies. Data mining techniques such as association rule mining that are tailored to mine from multiple ontologies at multiple levels of abstraction are required for effective knowledge discovery from GO annotation data. We present a data mining approach, Multi-ontology data mining at All Levels (MOAL) that uses the structure and relationships of the GO to mine multi-ontology multi-level association rules. We introduce two interestingness measures: Multi-ontology Support (MOSupport) and Multi-ontology Confidence (MOConfidence) customized to evaluate multi-ontology multi-level association rules. We also describe a variety of post-processing strategies for pruning uninteresting rules. We use publicly available GO annotation data to demonstrate our methods with respect to two applications (1) the discovery of co-annotation suggestions and (2) the discovery of new cross-ontology relationships. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
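
For readers unfamiliar with association rule mining, the sketch below shows the classical support and confidence measures on a toy set of annotations. It does not reproduce MOSupport or MOConfidence, whose definitions are not given in the abstract; the gene names, term labels, and function names are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of annotated gene products carrying every term in itemset."""
    itemset = set(itemset)
    hits = sum(1 for terms in transactions.values() if itemset <= terms)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of the whole rule divided by support of its antecedent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


# Hypothetical annotations: gene product -> terms from three sub-ontologies.
annotations = {
    "geneA": {"BP:transport", "MF:binding", "CC:membrane"},
    "geneB": {"BP:transport", "MF:binding"},
    "geneC": {"BP:transport", "CC:membrane"},
    "geneD": {"MF:binding"},
}

# Candidate cross-ontology rule: BP:transport -> MF:binding
rule_support = support({"BP:transport", "MF:binding"}, annotations)
rule_confidence = confidence({"BP:transport"}, {"MF:binding"}, annotations)
```

Here two of the four gene products carry both terms (support 0.5), and two of the three annotated with BP:transport also carry MF:binding (confidence 2/3). A multi-level approach would additionally need to propagate annotations to ancestor terms, which this sketch omits.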
Genome-wide and gene-based association implicates FRMD6 in Alzheimer disease.

PubMed

Hong, Mun-Gwan; Reynolds, Chandra A; Feldman, Adina L; Kallin, Mikael; Lambert, Jean-Charles; Amouyel, Philippe; Ingelsson, Erik; Pedersen, Nancy L; Prince, Jonathan A

2012-03-01

Genome-wide association studies (GWAS) that allow for allelic heterogeneity may facilitate the discovery of novel genes not detectable by models that require replication of a single variant site. One strategy to accomplish this is to focus on genes rather than markers as units of association, and so potentially capture a spectrum of causal alleles that differ across populations. Here, we conducted a GWAS of Alzheimer disease (AD) in 2,586 Swedes and performed gene-based meta-analysis with three additional studies from France, Canada, and the United States, in total encompassing 4,259 cases and 8,284 controls. Implementing a newly designed gene-based algorithm, we identified two loci apart from the region around APOE that achieved study-wide significance in combined samples, the strongest finding being for FRMD6 on chromosome 14q (P = 2.6 × 10^-14) and a weaker signal for NARS2 that is immediately adjacent to GAB2 on chromosome 11q (P = 7.8 × 10^-9). Ontology-based pathway analyses revealed significant enrichment of genes involved in glycosylation. Results suggest that gene-based approaches that accommodate allelic heterogeneity in GWAS can provide a complementary avenue for gene discovery and may help to explain a portion of the missing heritability not detectable with single nucleotide polymorphisms (SNPs) derived from marker-specific meta-analysis. © 2011 Wiley Periodicals, Inc.
An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data.

PubMed

Hsu, Arthur L; Tang, Sen-Lin; Halgamuge, Saman K

2003-11-01

Current Self-Organizing Maps (SOMs) approaches to gene expression pattern clustering require the user to predefine the number of clusters likely to be expected. Hierarchical clustering methods used in this area do not provide unique partitioning of data. We describe an unsupervised dynamic hierarchical self-organizing approach, which suggests an appropriate number of clusters, to perform class discovery and marker gene identification in microarray data. In the process of class discovery, the proposed algorithm identifies corresponding sets of predictor genes that best distinguish one class from other classes. The approach integrates merits of hierarchical clustering with robustness against noise known from self-organizing approaches. The proposed algorithm applied to DNA microarray data sets of two types of cancers has demonstrated its ability to produce the most suitable number of clusters. Further, the corresponding marker genes identified through the unsupervised algorithm also have a strong biological relationship to the specific cancer class. The algorithm tested on leukemia microarray data, which contains three leukemia types, was able to determine three major and one minor cluster. Prediction models built for the four clusters indicate that the prediction strength for the smaller cluster is generally low, therefore labelled as uncertain cluster. Further analysis shows that the uncertain cluster can be subdivided further, and the subdivisions are related to two of the original clusters. Another test performed using colon cancer microarray data has automatically derived two clusters, which is consistent with the number of classes in data (cancerous and normal). JAVA software of dynamic SOM tree algorithm is available upon request for academic use. A comparison of rectangular and hexagonal topologies for GSOM is available from http://www.mame.mu.oz.au/mechatronics/journalinfo/Hsu2003supp.pdf
Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas.

PubMed

Liseron-Monfils, Christophe; Lewis, Tim; Ashlock, Daniel; McNicholas, Paul D; Fauteux, François; Strömvik, Martina; Raizada, Manish N

2013-03-15

The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter
Antibiotic discovery throughout the Small World Initiative: A molecular strategy to identify biosynthetic gene clusters involved in antagonistic activity.

PubMed

Davis, Elizabeth; Sloan, Tyler; Aurelius, Krista; Barbour, Angela; Bodey, Elijah; Clark, Brigette; Dennis, Celeste; Drown, Rachel; Fleming, Megan; Humbert, Allison; Glasgo, Elizabeth; Kerns, Trent; Lingro, Kelly; McMillin, MacKenzie; Meyer, Aaron; Pope, Breanna; Stalevicz, April; Steffen, Brittney; Steindl, Austin; Williams, Carolyn; Wimberley, Carmen; Zenas, Robert; Butela, Kristen; Wildschutte, Hans

2017-06-01

The emergence of bacterial pathogens resistant to all known antibiotics is a global health crisis. Adding to this problem is that major pharmaceutical companies have shifted away from antibiotic discovery due to low profitability. As a result, the pipeline of new antibiotics is essentially dry and many bacteria now resist the effects of most commonly used drugs. To address this global health concern, citizen science through the Small World Initiative (SWI) was formed in 2012. As part of SWI, students isolate bacteria from their local environments, characterize the strains, and assay for antibiotic production. During the 2015 fall semester at Bowling Green State University, students isolated 77 soil-derived bacteria and genetically characterized strains using the 16S rRNA gene, identified strains exhibiting antagonistic activity, and performed an expanded SWI workflow using transposon mutagenesis to identify a biosynthetic gene cluster involved in toxigenic compound production. We identified one mutant with loss of antagonistic activity and through subsequent whole-genome sequencing and linker-mediated PCR identified a 24.9 kb biosynthetic gene locus likely involved in inhibitory activity in that mutant. Further assessment against human pathogens demonstrated the inhibition of Bacillus cereus, Listeria monocytogenes, and methicillin-resistant Staphylococcus aureus in the presence of this compound, thus supporting our molecular strategy as an effective research pipeline for SWI antibiotic discovery and genetic characterization. © 2017 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.
The Biomedical Resource Ontology (BRO) to Enable Resource Discovery in Clinical and Translational Research

PubMed Central

Tenenbaum, Jessica D.; Whetzel, Patricia L.; Anderson, Kent; Borromeo, Charles D.; Dinov, Ivo D.; Gabriel, Davera; Kirschner, Beth; Mirel, Barbara; Morris, Tim; Noy, Natasha; Nyulas, Csongor; Rubenson, David; Saxman, Paul R.; Singh, Harpreet; Whelan, Nancy; Wright, Zach; Athey, Brian D.; Becich, Michael J.; Ginsburg, Geoffrey S.; Musen, Mark A.; Smith, Kevin A.; Tarantal, Alice F.; Rubin, Daniel L; Lyster, Peter

2010-01-01

The biomedical research community relies on a diverse set of resources, both within their own institutions and at other research centers. In addition, an increasing number of shared electronic resources have been developed. Without effective means to locate and query these resources, it is challenging, if not impossible, for investigators to be aware of the myriad resources available, or to effectively perform resource discovery when the need arises. In this paper, we describe the development and use of the Biomedical Resource Ontology (BRO) to enable semantic annotation and discovery of biomedical resources. We also describe the Resource Discovery System (RDS) which is a federated, inter-institutional pilot project that uses the BRO to facilitate resource discovery on the Internet. Through the RDS framework and its associated Biositemaps infrastructure, the BRO facilitates semantic search and discovery of biomedical resources, breaking down barriers and streamlining scientific research that will improve human health. PMID:20955817
STS-114: Discovery Impromptu Briefing

NASA Technical Reports Server (NTRS)

2005-01-01

Dr. Griffin, NASA Administrator, is accompanied by members of the U.S. House of Representatives in this STS-114 Discovery Impromptu briefing. Those present include: Sherwood Boehlert, House Science Committee Chairman, Senator Hutchinson, Sheila Jackson, 18th Congressional District Texas, Al Green, 9th Congressional District, Representative Jim Davis, Florida, and Gene Green, 29th District, Texas. Griffin talks about the problem that occurred with the external fuel tank sensor of the Space Shuttle Discovery and the effort NASA is pursuing to track the problem and identify the root cause. He answers questions from the news media about the next steps for the Space Shuttle Discovery, the time frame for the launch, and activities for the astronauts for the next few days.
Genetic and epigenetic control of gene expression by CRISPR–Cas systems

PubMed Central

Lo, Albert; Qi, Lei

2017-01-01

The discovery and adaption of bacterial clustered regularly interspaced short palindromic repeats (CRISPR)–CRISPR-associated (Cas) systems has revolutionized the way researchers edit genomes. Engineering of catalytically inactivated Cas variants (nuclease-deficient or nuclease-deactivated [dCas]) combined with transcriptional repressors, activators, or epigenetic modifiers enables sequence-specific regulation of gene expression and chromatin state. These CRISPR–Cas-based technologies have contributed to the rapid development of disease models and functional genomics screening approaches, which can facilitate genetic target identification and drug discovery. In this short review, we will cover recent advances of CRISPR–dCas9 systems and their use for transcriptional repression and activation, epigenome editing, and engineered synthetic circuits for complex control of the mammalian genome. PMID:28649363
Cancer in silico drug discovery: a systems biology tool for identifying candidate drugs to target specific molecular tumor subtypes.

PubMed

San Lucas, F Anthony; Fowler, Jerry; Chang, Kyle; Kopetz, Scott; Vilar, Eduardo; Scheet, Paul

2014-12-01

Large-scale cancer datasets such as The Cancer Genome Atlas (TCGA) allow researchers to profile tumors based on a wide range of clinical and molecular characteristics. Subsequently, TCGA-derived gene expression profiles can be analyzed with the Connectivity Map (CMap) to find candidate drugs to target tumors with specific clinical phenotypes or molecular characteristics. This represents a powerful computational approach for candidate drug identification, but due to the complexity of TCGA and technology differences between CMap and TCGA experiments, such analyses are challenging to conduct and reproduce. We present Cancer in silico Drug Discovery (CiDD; scheet.org/software), a computational drug discovery platform that addresses these challenges. CiDD integrates data from TCGA, CMap, and Cancer Cell Line Encyclopedia (CCLE) to perform computational drug discovery experiments, generating hypotheses for the following three general problems: (i) determining whether specific clinical phenotypes or molecular characteristics are associated with unique gene expression signatures; (ii) finding candidate drugs to repress these expression signatures; and (iii) identifying cell lines that resemble the tumors being studied for subsequent in vitro experiments. The primary input to CiDD is a clinical or molecular characteristic. The output is a biologically annotated list of candidate drugs and a list of cell lines for in vitro experimentation. We applied CiDD to identify candidate drugs to treat colorectal cancers harboring mutations in BRAF. CiDD identified EGFR and proteasome inhibitors, while proposing five cell lines for in vitro testing. CiDD facilitates phenotype-driven, systematic drug discovery based on clinical and molecular data from TCGA. ©2014 American Association for Cancer Research.
Discovery of Antibiotics-derived Polymers for Gene Delivery using Combinatorial Synthesis and Cheminformatics Modeling

PubMed Central

Potta, Thrimoorthy; Zhen, Zhuo; Grandhi, Taraka Sai Pavan; Christensen, Matthew D.; Ramos, James; Breneman, Curt M.; Rege, Kaushal

2014-01-01

We describe the combinatorial synthesis and cheminformatics modeling of aminoglycoside antibiotics-derived polymers for transgene delivery and expression. Fifty-six polymers were synthesized by polymerizing aminoglycosides with diglycidyl ether cross-linkers. Parallel screening resulted in identification of several lead polymers that resulted in high transgene expression levels in cells. The role of polymer physicochemical properties in determining efficacy of transgene expression was investigated using Quantitative Structure-Activity Relationship (QSAR) cheminformatics models based on Support Vector Regression (SVR) and ‘building block’ polymer structures. The QSAR model exhibited high predictive ability, and investigation of descriptors in the model, using molecular visualization and correlation plots, indicated that physicochemical attributes related to both, aminoglycosides and diglycidyl ethers facilitated transgene expression. This work synergistically combines combinatorial synthesis and parallel screening with cheminformatics-based QSAR models for discovery and physicochemical elucidation of effective antibiotics-derived polymers for transgene delivery in medicine and biotechnology. PMID:24331709
The PhytoClust tool for metabolic gene clusters discovery in plant genomes.

PubMed

Töpfer, Nadine; Fuchs, Lisa-Maria; Aharoni, Asaph

2017-07-07

The existence of Metabolic Gene Clusters (MGCs) in plant genomes has recently raised increased interest. Thus far, MGCs were commonly identified for pathways of specialized metabolism, mostly those associated with terpene type products. For efficient identification of novel MGCs, computational approaches are essential. Here, we present PhytoClust, a tool for the detection of candidate MGCs in plant genomes. The algorithm employs a collection of enzyme families related to plant specialized metabolism, translated into hidden Markov models, to mine given genome sequences for physically co-localized metabolic enzymes. Our tool accurately identifies previously characterized plant MGCs. An exhaustive search of 31 plant genomes detected 1232 and 5531 putative gene cluster types and candidates, respectively. Clustering analysis of putative MGC types by species reflected plant taxonomy. Furthermore, enrichment analysis revealed taxa- and species-specific enrichment of certain enzyme families in MGCs. When operating through our web-interface, PhytoClust users can mine a genome either based on a list of known cluster types or by defining new cluster rules. Moreover, for selected plant species, the output can be complemented by co-expression analysis. Altogether, we envisage PhytoClust to enhance novel MGC discovery, which will in turn impact the exploration of plant metabolism. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
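The idea of mining a genome for physically co-localized metabolic enzymes can be illustrated with a minimal sketch. This is not PhytoClust's algorithm: it assumes each gene has already been assigned to an enzyme family (for example by an HMM search, which is not shown), and the function name, the `max_gap` and `min_families` parameters, and the family labels are all hypothetical.

```python
def find_candidate_clusters(genes, max_gap=2, min_families=3):
    """Return runs of enzyme genes that lie close together on a chromosome.

    genes: list of (gene_id, family) tuples in chromosomal order, where
           family is None for genes with no enzyme-family assignment.
    max_gap: hypothetical limit on consecutive non-enzyme genes inside a run.
    min_families: hypothetical minimum number of distinct families per cluster.
    """
    clusters = []
    current = []  # enzyme genes collected in the current run
    gap = 0       # consecutive non-enzyme genes seen since the last enzyme

    def flush():
        # Keep the run only if it mixes enough distinct enzyme families.
        if current and len({fam for _, fam in current}) >= min_families:
            clusters.append([gene_id for gene_id, _ in current])

    for gene_id, family in genes:
        if family is not None:
            current.append((gene_id, family))
            gap = 0
        else:
            gap += 1
            if gap > max_gap:
                flush()
                current = []
    flush()
    return clusters


# Illustrative input; gene IDs and family labels are made up for the example.
example_genes = [
    ("g1", None), ("g2", "TPS"), ("g3", "CYP"), ("g4", None), ("g5", "UGT"),
    ("g6", None), ("g7", None), ("g8", None), ("g9", "CYP"), ("g10", "CYP"),
]
print(find_candidate_clusters(example_genes))  # [['g2', 'g3', 'g5']]
```

The run g9-g10 is rejected because it contains a single family; a real tool would apply cluster rules of the kind the abstract mentions rather than one fixed threshold.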
The Lineage-Specific Evolution of Aquaporin Gene Clusters Facilitated Tetrapod Terrestrial Adaptation

PubMed Central

Finn, Roderick Nigel; Chauvigné, François; Hlidberg, Jón Baldur; Cutler, Christopher P.; Cerdà, Joan

2014-01-01

A major physiological barrier for aquatic organisms adapting to terrestrial life is desiccation in the aerial environment. This barrier was nevertheless overcome by the Devonian ancestors of extant Tetrapoda, but the origin of specific molecular mechanisms that solved this water problem remains largely unknown. Here we show that an ancient aquaporin gene cluster evolved specifically in the sarcopterygian lineage, and subsequently diverged into paralogous forms of AQP2, -5, or -6 to mediate water conservation in extant Tetrapoda. To determine the origin of these apomorphic genomic traits, we combined aquaporin sequencing from jawless and jawed vertebrates with broad taxon assembly of >2,000 transcripts amongst 131 deuterostome genomes and developed a model based upon Bayesian inference that traces their convergent roots to stem subfamilies in basal Metazoa and Prokaryota. This approach uncovered an unexpected diversity of aquaporins in every lineage investigated, and revealed that the vertebrate superfamily consists of 17 classes of aquaporins (Aqp0 - Aqp16). The oldest orthologs associated with water conservation in modern Tetrapoda are traced to a cluster of three aqp2-like genes in Actinistia that likely arose >500 Ma through duplication of an aqp0-like gene present in a jawless ancestor. In sea lamprey, we show that aqp0 first arose in a protocluster comprised of a novel aqp14 paralog and a fused aqp01 gene. To corroborate these findings, we conducted phylogenetic analyses of five syntenic nuclear receptor subfamilies, which, together with observations of extensive genome rearrangements, support the coincident loss of ancestral aqp2-like orthologs in Actinopterygii. We thus conclude that the divergence of sarcopterygian-specific aquaporin gene clusters was permissive for the evolution of water conservation mechanisms that facilitated tetrapod terrestrial adaptation. PMID:25426855
Discovery of Possible Gene Relationships through the Application of Self-Organizing Maps to DNA Microarray Databases

PubMed Central

Chavez-Alvarez, Rocio; Chavoya, Arturo; Mendez-Vazquez, Andres

2014-01-01

DNA microarrays and cell cycle synchronization experiments have made possible the study of the mechanisms of cell cycle regulation of Saccharomyces cerevisiae by simultaneously monitoring the expression levels of thousands of genes at specific time points. On the other hand, pattern recognition techniques can contribute to the analysis of such massive measurements, providing a model of gene expression level evolution through the cell cycle process. In this paper, we propose the use of one such technique, an unsupervised artificial neural network called a Self-Organizing Map (SOM), which has been successfully applied to processes involving very noisy signals, classifying and organizing them, and assisting in the discovery of behavior patterns without requiring prior knowledge about the process under analysis. As a test bed for the use of SOMs in finding possible relationships among genes and their possible contribution in some biological processes, we selected 282 S. cerevisiae genes that have been shown through biological experiments to have an activity during the cell cycle. The expression level of these genes was analyzed in five of the most cited time series DNA microarray databases used in the study of the cell cycle of this organism. With the use of SOM, it was possible to find clusters of genes with similar behavior in the five databases along two cell cycles. This result suggested that some of these genes might be biologically related or might have a regulatory relationship, as was corroborated by comparing some of the clusters obtained with SOMs against a previously reported regulatory network that was generated using biological knowledge, such as protein-protein interactions, gene expression levels, metabolism dynamics, promoter binding, and modification, regulation and transport of proteins. The methodology described in this paper could be applied to the study of gene relationships of other biological processes in different organisms. PMID:24699245
Applications of chemogenomic library screening in drug discovery.

PubMed

Jones, Lyn H; Bunnage, Mark E

2017-04-01

The allure of phenotypic screening, combined with the industry preference for target-based approaches, has prompted the development of innovative chemical biology technologies that facilitate the identification of new therapeutic targets for accelerated drug discovery. A chemogenomic library is a collection of selective small-molecule pharmacological agents, and a hit from such a set in a phenotypic screen suggests that the annotated target or targets of that pharmacological agent may be involved in perturbing the observable phenotype. In this Review, we describe opportunities for chemogenomic screening to considerably expedite the conversion of phenotypic screening projects into target-based drug discovery approaches. Other applications are explored, including drug repositioning, predictive toxicology and the discovery of novel pharmacological modalities.
Biomedical Information Extraction: Mining Disease Associated Genes from Literature

ERIC Educational Resources Information Center

Huang, Zhong

2014-01-01

Disease associated gene discovery is a critical step to realize the future of personalized medicine. However empirical and clinical validation of disease associated genes are time consuming and expensive. In silico discovery of disease associated genes from literature is therefore becoming the first essential step for biomarker discovery to…
Multiple Testing in the Context of Gene Discovery in Sickle Cell Disease Using Genome-Wide Association Studies.

PubMed

Kuo, Kevin H M

2017-01-01

The issue of multiple testing, also termed multiplicity, is ubiquitous in studies where multiple hypotheses are tested simultaneously. Genome-wide association study (GWAS), a type of genetic association study that has gained popularity in the past decade, is most susceptible to the issue of multiple testing. Different methodologies have been employed to address the issue of multiple testing in GWAS. The purpose of the review is to examine the methodologies employed in dealing with multiple testing in the context of gene discovery using GWAS in sickle cell disease complications.
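As an illustration of what a multiple-testing correction does, the sketch below applies two widely used procedures, the Bonferroni correction and the Benjamini-Hochberg step-up procedure, to a small list of p-values. The abstract does not name the methods the review covers, so the choice of these two is an assumption, and the p-values are invented for the example.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate.

    Finds the largest rank k such that p_(k) <= (k / m) * alpha and rejects
    the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Invented p-values, for illustration only.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(sum(bonferroni(p)))          # 1
print(sum(benjamini_hochberg(p)))  # 2
```

With eight tests the Bonferroni threshold is 0.05 / 8 = 0.00625, so only the smallest p-value survives, while the less conservative Benjamini-Hochberg procedure also keeps the second. In a GWAS the number of tests is in the hundreds of thousands or more, which makes the gap between such procedures far larger.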

Antisense oligonucleotide technologies in drug discovery.

PubMed

Aboul-Fadl, Tarek

2006-09-01

The principle of antisense oligonucleotide (AS-OD) technologies is based on the specific inhibition of unwanted gene expression by blocking mRNA activity. It has long appeared to be an ideal strategy to leverage new genomic knowledge for drug discovery and development. In recent years, AS-OD technologies have been widely used as potent and promising tools for this purpose. There is a rapid increase in the number of antisense molecules progressing in clinical trials. AS-OD technologies provide a simple and efficient approach for drug discovery and development and are expected to become a reality in the near future. This editorial describes the established and emerging AS-OD technologies in drug discovery.
Biomarker Discovery by Novel Sensors Based on Nanoproteomics Approaches

PubMed Central

Dasilva, Noelia; Díez, Paula; Matarraz, Sergio; González-González, María; Paradinas, Sara; Orfao, Alberto; Fuentes, Manuel

2012-01-01

In recent years, proteomics has facilitated biomarker discovery by coupling high-throughput techniques with novel nanosensors. In the present review, we focus on the study of label-based and label-free detection systems, as well as nanotechnology approaches, indicating their advantages and applications in biomarker discovery. In addition, several disease biomarkers are presented to illustrate the clinical importance of the improved sensitivity and selectivity obtained by using nanoproteomics approaches as novel sensors. PMID:22438764
MobilomeFINDER: web-based tools for in silico and experimental discovery of bacterial genomic islands

PubMed Central

Ou, Hong-Yu; He, Xinyi; Harrison, Ewan M.; Kulasekara, Bridget R.; Thani, Ali Bin; Kadioglu, Aras; Lory, Stephen; Hinton, Jay C. D.; Barer, Michael R.; Rajakumar, Kumar

2007-01-01

MobilomeFINDER (http://mml.sjtu.edu.cn/MobilomeFINDER) is an interactive online tool that facilitates bacterial genomic island or ‘mobile genome’ (mobilome) discovery; it integrates the ArrayOme and tRNAcc software packages. ArrayOme utilizes a microarray-derived comparative genomic hybridization input data set to generate ‘inferred contigs’ produced by merging adjacent genes classified as ‘present’. Collectively these ‘fragments’ represent a hypothetical ‘microarray-visualized genome (MVG)’. ArrayOme permits recognition of discordances between physical genome and MVG sizes, thereby enabling identification of strains rich in microarray-elusive novel genes. Individual tRNAcc tools facilitate automated identification of genomic islands by comparative analysis of the contents and contexts of tRNA sites and other integration hotspots in closely related sequenced genomes. Accessory tools facilitate design of hotspot-flanking primers for in silico and/or wet-science-based interrogation of cognate loci in unsequenced strains and analysis of islands for features suggestive of foreign origins; island-specific and genome-contextual features are tabulated and represented in schematic and graphical forms. To date we have used MobilomeFINDER to analyse several Enterobacteriaceae, Pseudomonas aeruginosa and Streptococcus suis genomes. MobilomeFINDER enables high-throughput island identification and characterization through increased exploitation of emerging sequence data and PCR-based profiling of unsequenced test strains; subsequent targeted yeast recombination-based capture permits full-length sequencing and detailed functional studies of novel genomic islands. PMID:17537813
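The step in which adjacent genes classified as 'present' are merged into 'inferred contigs' can be sketched as follows. This is an illustration of the idea only, not ArrayOme's implementation: the input layout, the function names, and the rule that a contig spans from the first gene's start to the last gene's end are assumptions.

```python
def inferred_contigs(calls):
    """Merge runs of adjacent 'present' genes into inferred contigs.

    calls: list of (gene_id, start, end, present) tuples ordered by position
           on the reference genome; present is a bool from the microarray call.
    """
    contigs = []
    current = None
    for gene_id, start, end, present in calls:
        if present:
            if current is None:
                current = {"genes": [gene_id], "start": start, "end": end}
            else:
                current["genes"].append(gene_id)
                current["end"] = max(current["end"], end)
        elif current is not None:
            # An 'absent' gene closes the current contig.
            contigs.append(current)
            current = None
    if current is not None:
        contigs.append(current)
    return contigs


def mvg_size(contigs):
    """Total length of all inferred contigs (assumed inclusive coordinates)."""
    return sum(c["end"] - c["start"] + 1 for c in contigs)


# Invented coordinates, for illustration only.
calls = [
    ("a", 1, 100, True),
    ("b", 150, 300, True),
    ("c", 350, 500, False),
    ("d", 550, 700, True),
]
contigs = inferred_contigs(calls)
print([c["genes"] for c in contigs])  # [['a', 'b'], ['d']]
print(mvg_size(contigs))              # 451
```

Comparing a size computed this way with a physically measured genome size is the kind of discordance check the abstract describes for spotting strains rich in genes the microarray cannot see.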
Functional Gene Discovery and Characterization of Genes and Alleles Affecting Wood Biomass Yield and Quality in Populus

DOE Office of Scientific and Technical Information (OSTI.GOV)

Busov, Victor

Adoption of biofuels as an economically and environmentally viable alternative to fossil fuels would require development of specialized bioenergy varieties. A major goal in the breeding of such varieties is the improvement of lignocellulosic biomass yield and quality. These are complex traits and understanding the underpinning molecular mechanism can assist and accelerate their improvement. This is particularly important for tree bioenergy crops like poplars (species and hybrids from the genus Populus), for which breeding progress is extremely slow due to long generation cycles. A variety of approaches have already been undertaken to better understand the molecular bases of biomass yield and quality in poplar. An obvious void in these undertakings has been the application of mutagenesis. Mutagenesis has been instrumental in the discovery and characterization of many plant traits, including those that affect biomass yield and quality. In this proposal we use activation tagging to discover genes that can significantly affect biomass associated traits directly in poplar, a premier bioenergy crop. We screened a population of 5,000 independent poplar activation tagging lines under greenhouse conditions for a battery of biomass yield traits. These same plants were then analyzed for changes in wood chemistry using pyMBMS. As a result of these screens we have identified nearly 800 mutants, which are significantly (P<0.05) different when compared to wild type. Of these, the majority (~700) are affected in one of ten different biomass yield traits and 100 in biomass quality traits (e.g., lignin, S/G ratio and C6/C5 sugars). We successfully recovered the position of the tag in approximately 130 lines, showed activation in nearly half of them and performed recapitulation experiments with 20 genes prioritized by the significance of the phenotype. Recapitulation experiments are still ongoing for many of the genes but the results are encouraging. For example, we have shown
Metabolomics-Driven Discovery of a Prenylated Isatin Antibiotic Produced by Streptomyces Species MBT28.

PubMed

Wu, Changsheng; Du, Chao; Gubbens, Jacob; Choi, Young Hae; van Wezel, Gilles P

2015-10-23

Actinomycetes are a major source of antimicrobials, anticancer compounds, and other medically important products, and their genomes harbor extensive biosynthetic potential. Major challenges in the screening of these microorganisms are to activate the expression of cryptic biosynthetic gene clusters and the development of technologies for efficient dereplication of known molecules. Here we report the identification of a previously unidentified isatin-type antibiotic produced by Streptomyces sp. MBT28, following a strategy based on NMR-based metabolomics combined with the introduction of streptomycin resistance in the producer strain. NMR-guided isolation by tracking the target proton signal resulted in the characterization of 7-prenylisatin (1) with antimicrobial activity against Bacillus subtilis. The metabolite-guided genome mining of Streptomyces sp. MBT28 combined with proteomics identified a gene cluster with an indole prenyltransferase that catalyzes the conversion of tryptophan into 7-prenylisatin. This study underlines the applicability of NMR-based metabolomics in facilitating the discovery of novel antibiotics.
Alternative RNA splicing of the MEAF6 gene facilitates neuroendocrine prostate cancer progression.

PubMed

Lee, Ahn R; Li, Yinan; Xie, Ning; Gleave, Martin E; Cox, Michael E; Collins, Colin C; Dong, Xuesen

2017-04-25

Although potent androgen receptor pathway inhibitors (ARPI) improve overall survival of metastatic prostate cancer patients, treatment-induced neuroendocrine prostate cancer (t-NEPC) as a consequence of the selection pressures of ARPI is becoming a more common clinical issue. Improved understanding of the molecular biology of t-NEPC is essential for the development of new effective management approaches for t-NEPC. In this study, we identify a splice variant of the MYST/Esa1-associated factor 6 (MEAF6) gene, MEAF6-1, that is highly expressed in both t-NEPC tumor biopsies and neuroendocrine cell lines of prostate and lung cancers. We show that MEAF6-1 splicing is stimulated by neuronal RNA splicing factor SRRM4. Rather than inducing neuroendocrine trans-differentiation of cells in prostate adenocarcinoma, MEAF6-1 upregulation stimulates cell proliferation, anchorage-independent cell growth, invasion and xenograft tumor growth. Gene microarray identifies that these MEAF6-1 actions are in part mediated by the ID1 and ID3 genes. These findings suggest that the MEAF6-1 variant does not induce neuroendocrine differentiation of prostate cancer cells, but rather facilitates t-NEPC progression by increasing the proliferation rate of cells that have acquired neuroendocrine phenotypes.
DiscoverySpace: an interactive data analysis application

PubMed Central

Robertson, Neil; Oveisi-Fordorei, Mehrdad; Zuyderduyn, Scott D; Varhol, Richard J; Fjell, Christopher; Marra, Marco; Jones, Steven; Siddiqui, Asim

2007-01-01

DiscoverySpace is a graphical application for bioinformatics data analysis. Users can seamlessly traverse references between biological databases and draw together annotations in an intuitive tabular interface. Datasets can be compared using a suite of novel tools to aid in the identification of significant patterns. DiscoverySpace is of broad utility and its particular strength is in the analysis of serial analysis of gene expression (SAGE) data. The application is freely available online. PMID:17210078
Gene/QTL discovery for Anthracnose in common bean (Phaseolus vulgaris L.) from North-western Himalayas

PubMed Central

Choudhary, Neeraj; Bawa, Vanya; Paliwal, Rajneesh; Singh, Bikram; Bhat, Mohd. Ashraf; Mir, Javid Iqbal; Gupta, Moni; Sofi, Parvaze A.; Thudi, Mahendar; Varshney, Rajeev K.

2018-01-01

Common bean (Phaseolus vulgaris L.) is one of the most important grain legume crops in the world. The beans grown in north-western Himalayas possess huge diversity for seed color, shape and size but are mostly susceptible to Anthracnose disease caused by the seed-borne fungus Colletotrichum lindemuthianum. Dozens of QTLs/genes have already been identified for this disease in common bean world-wide. However, this is the first report of gene/QTL discovery for Anthracnose using bean germplasm from north-western Himalayas of state Jammu & Kashmir, India. A core set of 96 bean lines comprising 54 indigenous local landraces from 11 hot-spots and 42 exotic lines from 10 different countries were phenotyped at two locations (SKUAST-Jammu and Bhaderwah, Jammu) for Anthracnose resistance. The core set was also genotyped with genome-wide (91) random and trait linked SSR markers. The study of marker-trait associations (MTAs) led to the identification of 10 QTLs/genes for Anthracnose resistance. Among the 10 QTLs/genes identified, two MTAs are stable (BM45 & BM211), two MTAs (PVctt1 & BM211) are major, explaining more than 20% phenotypic variation for Anthracnose, and one MTA (BM211) is both stable and major. Six genomic regions are reported for the first time, whereas four genomic regions validated the already known QTL/gene regions/clusters for Anthracnose. The major, stable and validated markers reported during the present study associated with Anthracnose resistance will prove useful in common bean molecular breeding programs aimed at enhancing Anthracnose resistance of local bean landraces grown in north-western Himalayas of state Jammu and Kashmir. PMID:29389971
De novo transcriptome sequencing and discovery of genes related to copper tolerance in Paeonia ostii.

PubMed

Wang, Yanjie; Dong, Chunlan; Xue, Zeyun; Jin, Qijiang; Xu, Yingchun

2016-01-15

Paeonia ostii, an important ornamental and medicinal plant, grows normally on copper (Cu) mines with widespread Cu contamination of soils, and it has the ability to lower Cu contents in the Cu-contaminated soils. However, very little molecular information concerned with Cu resistance of P. ostii is available. In this study, high-throughput de novo transcriptome sequencing was carried out for P. ostii with and without Cu treatment using Illumina HiSeq 2000 platform. A total of 77,704 All-unigenes were obtained with a mean length of 710 bp. Of these unigenes, 47,461 were annotated with public databases based on sequence similarities. Comparative transcript profiling allowed the discovery of 4324 differentially expressed genes (DEGs), with 2207 up-regulated and 2117 down-regulated unigenes in Cu-treated library as compared to the control counterpart. Based on these DEGs, Gene Ontology (GO) enrichment analysis indicated Cu stress-relevant terms, such as 'membrane' and 'antioxidant activity'. Meanwhile, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis uncovered some important pathways, including 'biosynthesis of secondary metabolites' and 'metabolic pathways'. In addition, expression patterns of 12 selected DEGs derived from quantitative real-time polymerase chain reaction (qRT-PCR) were consistent with their transcript abundance changes obtained by transcriptomic analyses, suggesting that all the 12 genes were authentically involved in Cu tolerance in P. ostii. This is the first report to identify genes related to Cu stress responses in P. ostii, which could offer valuable information on the molecular mechanisms of Cu resistance, and provide a basis for further genomics research on this and related ornamental species for phytoremediation. Copyright © 2015 Elsevier B.V. All rights reserved.
Gene/QTL discovery for Anthracnose in common bean (Phaseolus vulgaris L.) from North-western Himalayas.

PubMed

Choudhary, Neeraj; Bawa, Vanya; Paliwal, Rajneesh; Singh, Bikram; Bhat, Mohd Ashraf; Mir, Javid Iqbal; Gupta, Moni; Sofi, Parvaze A; Thudi, Mahendar; Varshney, Rajeev K; Mir, Reyazul Rouf

2018-01-01

Common bean (Phaseolus vulgaris L.) is one of the most important grain legume crops in the world. The beans grown in north-western Himalayas possess huge diversity for seed color, shape and size but are mostly susceptible to Anthracnose disease caused by the seed-borne fungus Colletotrichum lindemuthianum. Dozens of QTLs/genes have already been identified for this disease in common bean world-wide. However, this is the first report of gene/QTL discovery for Anthracnose using bean germplasm from north-western Himalayas of state Jammu & Kashmir, India. A core set of 96 bean lines comprising 54 indigenous local landraces from 11 hot-spots and 42 exotic lines from 10 different countries were phenotyped at two locations (SKUAST-Jammu and Bhaderwah, Jammu) for Anthracnose resistance. The core set was also genotyped with genome-wide (91) random and trait linked SSR markers. The study of marker-trait associations (MTAs) led to the identification of 10 QTLs/genes for Anthracnose resistance. Among the 10 QTLs/genes identified, two MTAs are stable (BM45 & BM211), two MTAs (PVctt1 & BM211) are major, explaining more than 20% phenotypic variation for Anthracnose, and one MTA (BM211) is both stable and major. Six genomic regions are reported for the first time, whereas four genomic regions validated the already known QTL/gene regions/clusters for Anthracnose. The major, stable and validated markers reported during the present study associated with Anthracnose resistance will prove useful in common bean molecular breeding programs aimed at enhancing Anthracnose resistance of local bean landraces grown in north-western Himalayas of state Jammu and Kashmir.
Regulation of gene expression in the mammalian eye and its relevance to eye disease.

PubMed

Scheetz, Todd E; Kim, Kwang-Youn A; Swiderski, Ruth E; Philp, Alisdair R; Braun, Terry A; Knudtson, Kevin L; Dorrance, Anne M; DiBona, Gerald F; Huang, Jian; Casavant, Thomas L; Sheffield, Val C; Stone, Edwin M

2006-09-26

We used expression quantitative trait locus mapping in the laboratory rat (Rattus norvegicus) to gain a broad perspective of gene regulation in the mammalian eye and to identify genetic variation relevant to human eye disease. Of >31,000 gene probes represented on an Affymetrix expression microarray, 18,976 exhibited sufficient signal for reliable analysis and at least 2-fold variation in expression among 120 F(2) rats generated from an SR/JrHsd x SHRSP intercross. Genome-wide linkage analysis with 399 genetic markers revealed significant linkage with at least one marker for 1,300 probes (alpha = 0.001; estimated empirical false discovery rate = 2%). Both contiguous and noncontiguous loci were found to be important in regulating mammalian eye gene expression. We investigated one locus of each type in greater detail and identified putative transcription-altering variations in both cases. We found an inserted cREL binding sequence in the 5' flanking sequence of the Abca4 gene associated with an increased expression level of that gene, and we found a mutation of the gene encoding thyroid hormone receptor beta2 associated with a decreased expression level of the gene encoding short-wavelength sensitive opsin (Opn1sw). In addition to these positional studies, we performed a pairwise analysis of gene expression to identify genes that are regulated in a coordinated manner and used this approach to validate two previously undescribed genes involved in the human disease Bardet-Biedl syndrome. These data and analytical approaches can be used to facilitate the discovery of additional genes and regulatory elements involved in human eye disease.
Landscape of genomic diversity and trait discovery in soybean.

PubMed

Valliyodan, Babu; Dan Qiu; Patil, Gunvant; Zeng, Peng; Huang, Jiaying; Dai, Lu; Chen, Chengxuan; Li, Yanjun; Joshi, Trupti; Song, Li; Vuong, Tri D; Musket, Theresa A; Xu, Dong; Shannon, J Grover; Shifeng, Cheng; Liu, Xin; Nguyen, Henry T

2016-03-31

Cultivated soybean [Glycine max (L.) Merr.] is a primary source of vegetable oil and protein. We report a landscape analysis of genome-wide genetic variation and an association study of major domestication and agronomic traits in soybean. A total of 106 soybean genomes representing wild, landraces, and elite lines were re-sequenced at an average of 17x depth with a 97.5% coverage. Over 10 million high-quality SNPs were discovered, and 35.34% of these have not been previously reported. Additionally, 159 putative domestication sweeps were identified, which includes 54.34 Mbp (4.9%) and 4,414 genes; 146 regions were involved in artificial selection during domestication. A genome-wide association study of major traits including oil and protein content, salinity, and domestication traits resulted in the discovery of novel alleles. Genomic information from this study provides a valuable resource for understanding soybean genome structure and evolution, and can also facilitate trait dissection leading to sequencing-based molecular breeding.
Landscape of genomic diversity and trait discovery in soybean

PubMed Central

Valliyodan, Babu; Dan Qiu; Patil, Gunvant; Zeng, Peng; Huang, Jiaying; Dai, Lu; Chen, Chengxuan; Li, Yanjun; Joshi, Trupti; Song, Li; Vuong, Tri D.; Musket, Theresa A.; Xu, Dong; Shannon, J. Grover; Shifeng, Cheng; Liu, Xin; Nguyen, Henry T.

2016-01-01

Cultivated soybean [Glycine max (L.) Merr.] is a primary source of vegetable oil and protein. We report a landscape analysis of genome-wide genetic variation and an association study of major domestication and agronomic traits in soybean. A total of 106 soybean genomes representing wild, landraces, and elite lines were re-sequenced at an average of 17x depth with a 97.5% coverage. Over 10 million high-quality SNPs were discovered, and 35.34% of these have not been previously reported. Additionally, 159 putative domestication sweeps were identified, which includes 54.34 Mbp (4.9%) and 4,414 genes; 146 regions were involved in artificial selection during domestication. A genome-wide association study of major traits including oil and protein content, salinity, and domestication traits resulted in the discovery of novel alleles. Genomic information from this study provides a valuable resource for understanding soybean genome structure and evolution, and can also facilitate trait dissection leading to sequencing-based molecular breeding. PMID:27029319
Computational functional genomics-based approaches in analgesic drug discovery and repurposing.

PubMed

Lippmann, Catharina; Kringel, Dario; Ultsch, Alfred; Lötsch, Jörn

2018-06-01

Persistent pain is a major healthcare problem affecting a fifth of adults worldwide with still limited treatment options. The search for new analgesics increasingly includes the novel research area of functional genomics, which combines data derived from various processes related to DNA sequence, gene expression or protein function and uses advanced methods of data mining and knowledge discovery with the goal of understanding the relationship between the genome and the phenotype. Its use in drug discovery and repurposing for analgesic indications has so far been performed using knowledge discovery in gene function and drug target-related databases; next-generation sequencing; and functional proteomics-based approaches. Here, we discuss recent efforts in functional genomics-based approaches to analgesic drug discovery and repurposing and highlight the potential of computational functional genomics in this field including a demonstration of the workflow using a novel R library 'dbtORA'.
Merging chemical ecology with bacterial genome mining for secondary metabolite discovery.

PubMed

Vizcaino, Maria I; Guo, Xun; Crawford, Jason M

2014-02-01

The integration of chemical ecology and bacterial genome mining can enhance the discovery of structurally diverse natural products in functional contexts. By examining bacterial secondary metabolism in the framework of its ecological niche, insights into the upregulation of orphan biosynthetic pathways and the enhancement of the enzyme substrate supply can be obtained, leading to the discovery of new secondary metabolic pathways that would otherwise be silent or undetected under typical laboratory cultivation conditions. Access to these new natural products (i.e., the chemotypes) facilitates experimental genotype-to-phenotype linkages. Here, we describe certain functional natural products produced by Xenorhabdus and Photorhabdus bacteria with experimentally linked biosynthetic gene clusters as illustrative examples of the synergy between chemical ecology and bacterial genome mining in connecting genotypes to phenotypes through chemotype characterization. These Gammaproteobacteria share a mutualistic relationship with nematodes and a pathogenic relationship with insects and, in select cases, humans. The natural products encoded by these bacteria distinguish their interactions with their animal hosts and other microorganisms in their multipartite symbiotic lifestyles. Though both genera have similar lifestyles, their genetic, chemical, and physiological attributes are distinct. Both undergo phenotypic variation and produce a profuse number of bioactive secondary metabolites. We provide further detail in the context of regulation, production, processing, and function for these genetically encoded small molecules with respect to their roles in mutualism and pathogenicity. These collective insights more widely promote the discovery of atypical orphan biosynthetic pathways encoding novel small molecules in symbiotic systems, which could open up new avenues for investigating and exploiting microbial chemical signaling in host-bacteria interactions.
Discovery of Host Factors and Pathways Utilized in Hantaviral Infection

DTIC Science & Technology

2016-09-01

AWARD NUMBER: W81XWH-14-1-0204 TITLE: Discovery of Host Factors and Pathways Utilized in Hantaviral Infection PRINCIPAL INVESTIGATOR: Paul...Aug 2016 4. TITLE AND SUBTITLE Discovery of Host Factors and Pathways Utilized in Hantaviral Infection 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c...after significance values were calculated and corrected for false discovery rate. The top hit is ATP6V0A1, a gene encoding a subunit of a vacuolar
Genome-wide target profiling of piggyBac and Tol2 in HEK 293: pros and cons for gene discovery and gene therapy

PubMed Central

2011-01-01

Background: DNA transposons have emerged as indispensable tools for manipulating vertebrate genomes with applications ranging from insertional mutagenesis and transgenesis to gene therapy. To fully explore the potential of two highly active DNA transposons, piggyBac and Tol2, as mammalian genetic tools, we have conducted a side-by-side comparison of the two transposon systems in the same setting to evaluate their advantages and disadvantages for use in gene therapy and gene discovery. Results: We have observed that (1) the Tol2 transposase (but not piggyBac) is highly sensitive to molecular engineering; (2) the piggyBac donor with only the 40 bp 3'- and 67 bp 5'-terminal repeat domain is sufficient for effective transposition; and (3) a small amount of piggyBac transposases results in robust transposition suggesting the piggyBac transposase is highly active. Performing genome-wide target profiling on data sets obtained by retrieving chromosomal targeting sequences from individual clones, we have identified several piggyBac and Tol2 hotspots and observed that (4) piggyBac and Tol2 display a clear difference in targeting preferences in the human genome. Finally, we have observed that (5) only sites with a particular sequence context can be targeted by either piggyBac or Tol2. Conclusions: The non-overlapping targeting preference of piggyBac and Tol2 makes them complementary research tools for manipulating mammalian genomes. PiggyBac is the most promising transposon-based vector system for achieving site-specific targeting of therapeutic genes due to the flexibility of its transposase for being molecularly engineered. Insights from this study will provide a basis for engineering piggyBac transposases to achieve site-specific therapeutic gene targeting. PMID:21447194
Using Just-in-Time Information to Support Scientific Discovery Learning in a Computer-Based Simulation

ERIC Educational Resources Information Center

Hulshof, Casper D.; de Jong, Ton

2006-01-01

Students encounter many obstacles during scientific discovery learning with computer-based simulations. It is hypothesized that an effective type of support, that does not interfere with the scientific discovery learning process, should be delivered on a "just-in-time" base. This study explores the effect of facilitating access to…
Influence of smoking status and intensity on discovery of blood pressure loci through gene-smoking interactions

PubMed Central

Fuentes, Lisa de las; Schwander, Karen; Cupples, L. Adrienne; Rao, D. C.

2015-01-01

Background: Genetic variation accounts for approximately 30% of blood pressure (BP) variability but most of that variability hasn't been attributed to specific variants. Interactions between genes and BP-associated factors may explain some ‘missing heritability.’ Cigarette smoking increases BP after short-term exposure and decreases BP with longer exposure. Gene-smoking interactions have discovered novel BP loci, but the contribution of smoking status and intensity to gene discovery is unknown. Methods: We analyzed gene-smoking intensity interactions for association with systolic BP (SBP) in three subgroups from the Framingham Heart Study: current smokers only (N = 1,057), current and former smokers (‘ever smokers’, N = 3,374), and all subjects (N = 6,710). We used three smoking intensity variables defined at cutoffs of 10, 15, and 20 cigarettes per day (CPD). We evaluated the 1 degree-of-freedom (df) interaction and 2df joint test using generalized estimating equations. Results: Analysis of current smokers using a CPD cutoff of 10 produced two loci associated with SBP. The rs9399633 minor allele was associated with increased SBP (5 mmHg) in heavy smokers (CPD>10) but decreased SBP (7 mmHg) in light smokers (CPD≤10). The rs11717948 minor allele was associated with decreased SBP (8 mmHg) in light smokers but decreased SBP (2 mmHg) in heavy smokers. Across all nine analyses, 19 additional loci reached p < 1×10^-6. Discussion: Analysis of current smokers may have the highest power to detect gene-smoking interactions, despite the reduced sample size. Associations of loci near SASH1 and KLHL6/KLHL24 with SBP may be modulated by tobacco smoking. PMID:25940791
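The abstract above names two tests: a 1 df test of the interaction and a 2 df joint test. As a minimal sketch of how such Wald tests can be computed, the Python below assumes the coefficient estimates and their (robust) variances have already been obtained from a fitted model of the form SBP ~ SNP + smoking + SNP x smoking. The function names and the input numbers are illustrative assumptions, not values or code from the study.

```python
import math

def wald_interaction_1df(beta_gxe, var_gxe):
    """1 df Wald test of the SNP-by-smoking interaction coefficient alone."""
    chi2 = beta_gxe ** 2 / var_gxe
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

def wald_joint_2df(beta_g, beta_gxe, var_g, var_gxe, cov):
    """2 df joint Wald test of the SNP main effect and the interaction."""
    det = var_g * var_gxe - cov ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form b' V^-1 b written out for the 2x2 case.
    chi2 = (beta_g ** 2 * var_gxe
            - 2.0 * beta_g * beta_gxe * cov
            + beta_gxe ** 2 * var_g) / det
    # Upper tail of a chi-square with 2 df has the closed form exp(-x / 2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value

# Illustrative (made-up) estimates in mmHg; not taken from the paper.
print(wald_interaction_1df(beta_gxe=12.0, var_gxe=6.0))
print(wald_joint_2df(beta_g=-7.0, beta_gxe=12.0, var_g=4.0, var_gxe=6.0, cov=-3.0))
```

The joint test can detect a locus whose main effect and interaction are each modest, which is one reason both tests are commonly reported side by side.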
Influence of Smoking Status and Intensity on Discovery of Blood Pressure Loci Through Gene-Smoking Interactions.

PubMed

Basson, Jacob; Sung, Yun Ju; Fuentes, Lisa de Las; Schwander, Karen; Cupples, L Adrienne; Rao, D C

2015-09-01

Genetic variation accounts for approximately 30% of blood pressure (BP) variability but most of that variability has not been attributed to specific variants. Interactions between genes and BP-associated factors may explain some "missing heritability." Cigarette smoking increases BP after short-term exposure and decreases BP with longer exposure. Gene-smoking interactions have discovered novel BP loci, but the contribution of smoking status and intensity to gene discovery is unknown. We analyzed gene-smoking intensity interactions for association with systolic BP (SBP) in three subgroups from the Framingham Heart Study: current smokers only (N = 1,057), current and former smokers ("ever smokers," N = 3,374), and all subjects (N = 6,710). We used three smoking intensity variables defined at cutoffs of 10, 15, and 20 cigarettes per day (CPD). We evaluated the 1 degree-of-freedom (df) interaction and 2df joint test using generalized estimating equations. Analysis of current smokers using a CPD cutoff of 10 produced two loci associated with SBP. The rs9399633 minor allele was associated with increased SBP (5 mmHg) in heavy smokers (CPD > 10) but decreased SBP (7 mmHg) in light smokers (CPD ≤ 10). The rs11717948 minor allele was associated with decreased SBP (8 mmHg) in light smokers but decreased SBP (2 mmHg) in heavy smokers. Across all nine analyses, 19 additional loci reached P < 1 × 10^-6. Analysis of current smokers may have the highest power to detect gene-smoking interactions, despite the reduced sample size. Associations of loci near SASH1 and KLHL6/KLHL24 with SBP may be modulated by tobacco smoking. © 2015 WILEY PERIODICALS, INC.

STAT3 Target Genes Relevant to Human Cancers

PubMed Central

Carpenter, Richard L.; Lo, Hui-Wen

2014-01-01

Since its discovery, the STAT3 transcription factor has been extensively studied for its function as a transcriptional regulator and its role as a mediator of development, normal physiology, and pathology of many diseases, including cancers. These efforts have uncovered an array of genes that can be positively and negatively regulated by STAT3, alone and in cooperation with other transcription factors. Through regulating gene expression, STAT3 has been demonstrated to play a pivotal role in many cellular processes including oncogenesis, tumor growth and progression, and stemness. Interestingly, recent studies suggest that STAT3 may behave as a tumor suppressor by activating expression of genes known to inhibit tumorigenesis. Additional evidence suggested that STAT3 may elicit opposing effects depending on cellular context and tumor types. These mixed results signify the need for a deeper understanding of STAT3, including its upstream regulators, parallel transcription co-regulators, and downstream target genes. To help facilitate fulfilling this unmet need, this review will be primarily focused on STAT3 downstream target genes that have been validated to associate with tumorigenesis and/or malignant biology of human cancers. PMID:24743777
Gene flow during glacial habitat shifts facilitates character displacement in a Neotropical flycatcher radiation.

PubMed

Chattopadhyay, Balaji; Garg, Kritika M; Gwee, Chyi Yin; Edwards, Scott V; Rheindt, Frank E

2017-09-01

Pleistocene climatic fluctuations are known to be an engine of biotic diversification at higher latitudes, but their impact on highly diverse tropical areas such as the Andes remains less well-documented. Specifically, while periods of global cooling may have led to fragmentation and differentiation at colder latitudes, they may - at the same time - have led to connectivity among insular patches of montane tropical habitat with unknown consequences on diversification. In the present study we utilized ~5.5 kb of DNA sequence data from eight nuclear loci and one mitochondrial gene alongside diagnostic morphological and bioacoustic markers to test the effects of Pleistocene climatic fluctuations on diversification in a complex of Andean tyrant-flycatchers of the genus Elaenia. Population genetic and phylogenetic approaches coupled with coalescent simulations demonstrated disparate levels of gene flow between the taxon chilensis and two parapatric Elaenia taxa predominantly during the last glacial period but not thereafter, possibly on account of downward shifts of montane forest habitat linking the populations of adjacent ridges. Additionally, morphological and bioacoustic analyses revealed a distinct pattern of character displacement in coloration and vocal traits between the two sympatric taxa albiceps and pallatangae, which were characterized by a lack of gene flow. Our study demonstrates that global periods of cooling are likely to have facilitated gene flow among Andean montane Elaenia flycatchers that are more isolated from one another during warm interglacial periods such as the present era. We also identify a hitherto overlooked case of plumage and vocal character displacement, underpinning the complexities of gene flow patterns caused by Pleistocene climate change across the Andes.
Arid5b facilitates chondrogenesis by recruiting the histone demethylase Phf2 to Sox9-regulated genes

NASA Astrophysics Data System (ADS)

Hata, Kenji; Takashima, Rikako; Amano, Katsuhiko; Ono, Koichiro; Nakanishi, Masako; Yoshida, Michiko; Wakabayashi, Makoto; Matsuda, Akio; Maeda, Yoshinobu; Suzuki, Yutaka; Sugano, Sumio; Whitson, Robert H.; Nishimura, Riko; Yoneda, Toshiyuki

2013-11-01

Histone modification, a critical step for epigenetic regulation, is an important modulator of biological events. Sox9 is a transcription factor critical for endochondral ossification; however, proof of its epigenetic regulation remains elusive. Here we identify AT-rich interactive domain 5b (Arid5b) as a transcriptional co-regulator of Sox9. Arid5b physically associates with Sox9 and synergistically induces chondrogenesis. Growth of Arid5b-/- mice is retarded with delayed endochondral ossification. Sox9-dependent chondrogenesis is attenuated in Arid5b-deficient cells. Arid5b recruits Phf2, a histone lysine demethylase, to the promoter region of Sox9 target genes and stimulates H3K9me2 demethylation of these genes. In the promoters of chondrogenic marker genes, H3K9me2 levels are increased in Arid5b-/- chondrocytes. Finally, we show that Phf2 knockdown inhibits Sox9-induced chondrocyte differentiation. Our findings establish an epigenomic mechanism of skeletal development, whereby Arid5b promotes chondrogenesis by facilitating Phf2-mediated histone demethylation of Sox9-regulated chondrogenic gene promoters.
Regulation of gene expression in the mammalian eye and its relevance to eye disease

PubMed Central

Scheetz, Todd E.; Kim, Kwang-Youn A.; Swiderski, Ruth E.; Philp, Alisdair R.; Braun, Terry A.; Knudtson, Kevin L.; Dorrance, Anne M.; DiBona, Gerald F.; Huang, Jian; Casavant, Thomas L.; Sheffield, Val C.; Stone, Edwin M.

2006-01-01

We used expression quantitative trait locus mapping in the laboratory rat (Rattus norvegicus) to gain a broad perspective of gene regulation in the mammalian eye and to identify genetic variation relevant to human eye disease. Of >31,000 gene probes represented on an Affymetrix expression microarray, 18,976 exhibited sufficient signal for reliable analysis and at least 2-fold variation in expression among 120 F2 rats generated from an SR/JrHsd × SHRSP intercross. Genome-wide linkage analysis with 399 genetic markers revealed significant linkage with at least one marker for 1,300 probes (α = 0.001; estimated empirical false discovery rate = 2%). Both contiguous and noncontiguous loci were found to be important in regulating mammalian eye gene expression. We investigated one locus of each type in greater detail and identified putative transcription-altering variations in both cases. We found an inserted cREL binding sequence in the 5′ flanking sequence of the Abca4 gene associated with an increased expression level of that gene, and we found a mutation of the gene encoding thyroid hormone receptor β2 associated with a decreased expression level of the gene encoding short-wavelength sensitive opsin (Opn1sw). In addition to these positional studies, we performed a pairwise analysis of gene expression to identify genes that are regulated in a coordinated manner and used this approach to validate two previously undescribed genes involved in the human disease Bardet–Biedl syndrome. These data and analytical approaches can be used to facilitate the discovery of additional genes and regulatory elements involved in human eye disease. PMID:16983098
Theoretical modeling of masking DNA application in aptamer-facilitated biomarker discovery.

PubMed

Cherney, Leonid T; Obrecht, Natalia M; Krylov, Sergey N

2013-04-16

In aptamer-facilitated biomarker discovery (AptaBiD), aptamers are selected from a library of random DNA (or RNA) sequences for their ability to specifically bind cell-surface biomarkers. The library is incubated with intact cells, and cell-bound DNA molecules are separated from those unbound and amplified by the polymerase chain reaction (PCR). The partitioning/amplification cycle is repeated multiple times while alternating target cells and control cells. Efficient aptamer selection in AptaBiD relies on the inclusion of masking DNA within the cell and library mixture. Masking DNA lacks primer regions for PCR amplification and is typically taken in excess to the library. The role of masking DNA within the selection mixture is to outcompete any nonspecific binding sequences within the initial library, thus allowing specific DNA sequences (i.e., aptamers) to be selected more efficiently. Efficient AptaBiD requires an optimum ratio of masking DNA to library DNA, at which aptamers still bind specific binding sites but nonaptamers within the library do not bind nonspecific binding sites. Here, we have developed a mathematical model that describes the binding processes taking place within the equilibrium mixture of masking DNA, library DNA, and target cells. An obtained mathematical solution allows one to estimate the concentration of masking DNA that is required to outcompete the library DNA at a desirable ratio of bound masking DNA to bound library DNA. The required concentration depends on concentrations of the library and cells as well as on unknown cell characteristics. These characteristics include the concentration of total binding sites on the cell surface, N, and equilibrium dissociation constants, K(nsL) and K(nsM), for nonspecific binding of the library DNA and masking DNA, respectively. We developed a theory that allows the determination of N, K(nsL), and K(nsM) based on measurements of EC50 values for cells mixed separately with the library and masking DNA
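The competition between masking DNA and library DNA for nonspecific sites can be illustrated with a deliberately simplified equilibrium sketch. It is not the authors' model: it assumes a single class of nonspecific sites and that both DNA pools are in such excess that their free concentrations equal their total concentrations. Under that assumption the site concentration (and hence N and the cell concentration) cancels out of the bound ratio, whereas the abstract states that the full model does depend on them. The function names are hypothetical.

```python
import math

def bound_fractions(lib, mask, k_ns_l, k_ns_m):
    """Fraction of nonspecific sites occupied by library and by masking DNA.

    Standard two-ligand competitive binding with ligand in excess:
    occupancy_i = (c_i / K_i) / (1 + sum_j c_j / K_j).
    """
    a = lib / k_ns_l
    b = mask / k_ns_m
    denom = 1.0 + a + b
    return a / denom, b / denom

def masking_conc_for_ratio(ratio, lib, k_ns_l, k_ns_m):
    """Masking DNA concentration giving bound_mask / bound_library == ratio."""
    # From (mask / K_nsM) / (lib / K_nsL) = ratio.
    return ratio * lib * k_ns_m / k_ns_l

# Illustrative (made-up) concentrations in mol/L.
lib = 1e-6
mask = masking_conc_for_ratio(ratio=100.0, lib=lib, k_ns_l=1e-6, k_ns_m=2e-6)
frac_lib, frac_mask = bound_fractions(lib, mask, k_ns_l=1e-6, k_ns_m=2e-6)
print(mask, frac_mask / frac_lib)
```

Dropping the excess-ligand assumption would require solving the mass-balance equations for the free concentrations, which is where the dependence on N and the cell concentration enters.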
Orphan diseases: state of the drug discovery art.

PubMed

Volmar, Claude-Henry; Wahlestedt, Claes; Brothers, Shaun P

2017-06-01

Since 1983 more than 300 drugs have been developed and approved for orphan diseases. However, considering the development of novel diagnosis tools, the number of rare diseases vastly outpaces therapeutic discovery. Academic centers and nonprofit institutes are now at the forefront of rare disease R&D, partnering with pharmaceutical companies when academic researchers discover novel drugs or targets for specific diseases, thus reducing the failure risk and cost for pharmaceutical companies. Considerable progress has occurred in the art of orphan drug discovery, and a symbiotic relationship now exists between pharmaceutical industry, academia, and philanthropists that provides a useful framework for orphan disease therapeutic discovery. Here, the current state-of-the-art of drug discovery for orphan diseases is reviewed. Current technological approaches and challenges for drug discovery are considered, some of which can present somewhat unique challenges and opportunities in orphan diseases, including the potential for personalized medicine, gene therapy, and phenotypic screening.
Pleiotropic and Epistatic Network-Based Discovery: Integrated Networks for Target Gene Discovery

DOE Office of Scientific and Technical Information (OSTI.GOV)

Weighill, Deborah; Jones, Piet; Shah, Manesh

Biological organisms are complex systems that are composed of functional networks of interacting molecules and macro-molecules. Complex phenotypes are the result of orchestrated, hierarchical, heterogeneous collections of expressed genomic variants. However, the effects of these variants are the result of historic selective pressure and current environmental and epigenetic signals, and, as such, their co-occurrence can be seen as genome-wide correlations in a number of different manners. Biomass recalcitrance (i.e., the resistance of plants to degradation or deconstruction, which ultimately enables access to a plant's sugars) is a complex polygenic phenotype of high importance to biofuels initiatives. This study makes use of data derived from the re-sequenced genomes from over 800 different Populus trichocarpa genotypes in combination with metabolomic and pyMBMS data across this population, as well as co-expression and co-methylation networks in order to better understand the molecular interactions involved in recalcitrance, and identify target genes involved in lignin biosynthesis/degradation. A Lines Of Evidence (LOE) scoring system is developed to integrate the information in the different layers and quantify the number of lines of evidence linking genes to target functions. This new scoring system was applied to quantify the lines of evidence linking genes to lignin-related genes and phenotypes across the network layers, and allowed for the generation of new hypotheses surrounding potential new candidate genes involved in lignin biosynthesis in P. trichocarpa, including various AGAMOUS-LIKE genes. Lastly, the resulting Genome Wide Association Study networks, integrated with Single Nucleotide Polymorphism (SNP) correlation, co-methylation, and co-expression networks through the LOE scores are proving to be a powerful approach to determine the pleiotropic and epistatic relationships underlying cellular functions and, as such, the molecular basis for
Pleiotropic and Epistatic Network-Based Discovery: Integrated Networks for Target Gene Discovery

DOE PAGES

Weighill, Deborah; Jones, Piet; Shah, Manesh; ...

2018-05-11

Biological organisms are complex systems that are composed of functional networks of interacting molecules and macro-molecules. Complex phenotypes are the result of orchestrated, hierarchical, heterogeneous collections of expressed genomic variants. However, the effects of these variants are the result of historic selective pressure and current environmental and epigenetic signals, and, as such, their co-occurrence can be seen as genome-wide correlations in a number of different manners. Biomass recalcitrance (i.e., the resistance of plants to degradation or deconstruction, which ultimately enables access to a plant's sugars) is a complex polygenic phenotype of high importance to biofuels initiatives. This study makes use of data derived from the re-sequenced genomes from over 800 different Populus trichocarpa genotypes in combination with metabolomic and pyMBMS data across this population, as well as co-expression and co-methylation networks in order to better understand the molecular interactions involved in recalcitrance, and identify target genes involved in lignin biosynthesis/degradation. A Lines Of Evidence (LOE) scoring system is developed to integrate the information in the different layers and quantify the number of lines of evidence linking genes to target functions. This new scoring system was applied to quantify the lines of evidence linking genes to lignin-related genes and phenotypes across the network layers, and allowed for the generation of new hypotheses surrounding potential new candidate genes involved in lignin biosynthesis in P. trichocarpa, including various AGAMOUS-LIKE genes. Lastly, the resulting Genome Wide Association Study networks, integrated with Single Nucleotide Polymorphism (SNP) correlation, co-methylation, and co-expression networks through the LOE scores are proving to be a powerful approach to determine the pleiotropic and epistatic relationships underlying cellular functions and, as such, the molecular basis for
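The idea of counting lines of evidence across network layers can be sketched as follows. This is only one plausible reading of the abstract: each layer contributes at most one line of evidence per gene, earned when the gene shares an edge with any target gene in that layer. The published LOE score may weight or count evidence differently, and the layer and gene names below are hypothetical.

```python
from collections import defaultdict

def loe_scores(layers, targets):
    """Count, for each non-target gene, the layers linking it to a target.

    layers:  dict mapping layer name -> iterable of undirected (gene, gene) edges
    targets: iterable of target genes (e.g. known lignin-related genes)
    """
    targets = set(targets)
    scores = defaultdict(int)
    for edges in layers.values():
        linked = set()  # genes linked to a target in this layer
        for a, b in edges:
            if a in targets and b not in targets:
                linked.add(b)
            if b in targets and a not in targets:
                linked.add(a)
        for gene in linked:
            scores[gene] += 1  # one line of evidence per layer
    return dict(scores)

# Hypothetical toy network with three layers.
layers = {
    "co-expression": [("g1", "lig1"), ("g2", "g3")],
    "co-methylation": [("lig1", "g1"), ("g2", "lig2")],
    "gwas": [("g1", "lig2")],
}
print(loe_scores(layers, targets={"lig1", "lig2"}))
```

Genes supported by several independent layers rank above genes supported by one, which is the property that makes such a score useful for prioritizing candidates.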
Gene and enhancer trap tagging of vascular-expressed genes in poplar trees

Treesearch

Andrew Groover; Joseph R. Fontana; Gayle Dupper; Caiping Ma; Robert Martienssen; Steven Strauss; Richard Meilan

2004-01-01

We report a gene discovery system for poplar trees based on gene and enhancer traps. Gene and enhancer trap vectors carrying the β-glucuronidase (GUS) reporter gene were inserted into the poplar genome via Agrobacterium tumefaciens transformation, where they reveal the expression pattern of genes at or near the insertion sites. Because GUS...
Discovery of numerous novel small genes in the intergenic regions of the Escherichia coli O157:H7 Sakai genome

PubMed Central

Hücker, Sarah M.; Ardern, Zachary; Goldberg, Tatyana; Schafferhans, Andrea; Bernhofer, Michael; Vestergaard, Gisle; Nelson, Chase W.; Schloter, Michael; Rost, Burkhard; Scherer, Siegfried

2017-01-01

In the past, short protein-coding genes were often disregarded by genome annotation pipelines. Transcriptome sequencing (RNAseq) signals outside of annotated genes have usually been interpreted to indicate either ncRNA or pervasive transcription. Therefore, in addition to the transcriptome, the translatome (RIBOseq) of the enteric pathogen Escherichia coli O157:H7 strain Sakai was determined at two optimal growth conditions and a severe stress condition combining low temperature and high osmotic pressure. All intergenic open reading frames potentially encoding a protein of ≥ 30 amino acids were investigated with regard to coverage by transcription and translation signals and their translatability expressed by the ribosomal coverage value. This led to discovery of 465 unique, putative novel genes not yet annotated in this E. coli strain, which are evenly distributed over both DNA strands of the genome. For 255 of the novel genes, annotated homologs in other bacteria were found, and a machine-learning algorithm, trained on small protein-coding E. coli genes, predicted that 89% of these translated open reading frames represent bona fide genes. The remaining 210 putative novel genes without annotated homologs were compared to the 255 novel genes with homologs and to 250 short annotated genes of this E. coli strain. All three groups turned out to be similar with respect to their translatability distribution, fractions of differentially regulated genes, secondary structure composition, and the distribution of evolutionary constraint, suggesting that both novel groups represent legitimate genes. However, the machine-learning algorithm only recognized a small fraction of the 210 genes without annotated homologs. It is possible that these genes represent a novel group of genes, which have unusual features dissimilar to the genes of the machine-learning algorithm training set. PMID:28902868
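The first filtering step described above, collecting open reading frames that could encode a protein of at least 30 amino acids, can be sketched in a few lines. The sketch assumes ATG as the only start codon and the three standard stop codons; the study's own ORF definition (for example, alternative start codons) is not given in the abstract. Coordinates are 0-based and refer to the strand being scanned.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq, min_aa=30):
    """List ORFs as (strand, start, end, length_in_amino_acids).

    Scans all six reading frames; an ORF runs from the first ATG after
    the previous stop to the next in-frame stop codon. `end` is the
    position just past the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    aa_len = (i - start) // 3  # codons before the stop
                    if aa_len >= min_aa:
                        orfs.append((strand, start, i + 3, aa_len))
                    start = None
    return orfs

# Synthetic example: ATG + 30 alanine codons + stop = 31 amino acids.
print(find_orfs("ATG" + "GCT" * 30 + "TAA"))
```

In a real pipeline the candidate ORFs would then be intersected with intergenic coordinates and with RNAseq and RIBOseq coverage, as the abstract describes.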
How to Facilitate Teachers' Understanding of Hypotheses and Predictions?

ERIC Educational Resources Information Center

Niaz, Mansoor

2011-01-01

The objective of this study was to facilitate inservice high school and university teachers' understanding of the difference between the terms "hypothesis" and "prediction." The context for understanding these terms was Columbus's discovery of America (as in the previous study). Control group teachers (N = 94) were evaluated before the discussion…
A novel algorithm for simplification of complex gene classifiers in cancer

PubMed Central

Wilson, Raphael A.; Teng, Ling; Bachmeyer, Karen M.; Bissonnette, Mei Lin Z.; Husain, Aliya N.; Parham, David M.; Triche, Timothy J.; Wing, Michele R.; Gastier-Foster, Julie M.; Barr, Frederic G.; Hawkins, Douglas S.; Anderson, James R.; Skapek, Stephen X.; Volchenboum, Samuel L.

2013-01-01

The clinical application of complex molecular classifiers as diagnostic or prognostic tools has been limited by the time and cost needed to apply them to patients. Using an existing fifty-gene expression signature known to separate two molecular subtypes of the pediatric cancer rhabdomyosarcoma, we show that an exhaustive iterative search algorithm can distill this complex classifier down to two or three features with equal discrimination. We validated the two-gene signatures using three separate and distinct data sets, including one that uses degraded RNA extracted from formalin-fixed, paraffin-embedded material. Finally, to demonstrate the generalizability of our algorithm, we applied it to a lung cancer data set to find minimal gene signatures that can distinguish survival. Our approach can easily be generalized and coupled to existing technical platforms to facilitate the discovery of simplified signatures that are ready for routine clinical use. PMID:23913937
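The idea of exhaustively searching small gene subsets can be sketched in Python as follows. This is a hedged illustration rather than the published algorithm: the nearest-centroid rule, the resubstitution accuracy score, and the toy expression matrix are all assumptions, since the abstract does not state how candidate signatures were scored.

```python
from itertools import combinations

def centroid_accuracy(samples, labels, genes):
    """Resubstitution accuracy of a nearest-centroid rule using only `genes`."""
    classes = sorted(set(labels))
    centroids = {}
    for c in classes:
        rows = [s for s, l in zip(samples, labels) if l == c]
        centroids[c] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    correct = 0
    for s, l in zip(samples, labels):
        def dist(c):
            # Squared Euclidean distance restricted to the chosen genes.
            return sum((s[g] - m) ** 2 for g, m in zip(genes, centroids[c]))
        if min(classes, key=dist) == l:
            correct += 1
    return correct / len(samples)

def best_subset(samples, labels, size=2):
    """Score every gene subset of the given size; return (accuracy, genes)."""
    n_genes = len(samples[0])
    return max(
        (centroid_accuracy(samples, labels, g), g)
        for g in combinations(range(n_genes), size)
    )

# Hypothetical data: 6 samples x 4 genes; genes 0 and 2 separate the classes jointly.
samples = [
    [0.0, 5.0, 0.0, 1.0], [2.0, 6.0, -2.0, 2.0], [-2.0, 7.0, 2.0, 3.0],  # class A
    [2.0, 5.0, 2.0, 3.0], [4.0, 6.0, 0.0, 1.0], [0.0, 7.0, 4.0, 2.0],    # class B
]
labels = ["A", "A", "A", "B", "B", "B"]
print(best_subset(samples, labels, size=2))
```

In practice the score would be estimated on held-out data, as the study did with separate validation sets, because resubstitution accuracy is optimistic.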
Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures

PubMed Central

Stark, Alexander; Lin, Michael F.; Kheradpour, Pouya; Pedersen, Jakob S.; Parts, Leopold; Carlson, Joseph W.; Crosby, Madeline A.; Rasmussen, Matthew D.; Roy, Sushmita; Deoras, Ameya N.; Ruby, J. Graham; Brennecke, Julius; Hodges, Emily; Hinrichs, Angie S.; Caspi, Anat; Paten, Benedict; Park, Seung-Won; Han, Mira V.; Maeder, Morgan L.; Polansky, Benjamin J.; Robson, Bryanne E.; Aerts, Stein; van Helden, Jacques; Hassan, Bassem; Gilbert, Donald G.; Eastman, Deborah A.; Rice, Michael; Weir, Michael; Hahn, Matthew W.; Park, Yongkyu; Dewey, Colin N.; Pachter, Lior; Kent, W. James; Haussler, David; Lai, Eric C.; Bartel, David P.; Hannon, Gregory J.; Kaufman, Thomas C.; Eisen, Michael B.; Clark, Andrew G.; Smith, Douglas; Celniker, Susan E.; Gelbart, William M.; Kellis, Manolis

2008-01-01

Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or ‘evolutionary signatures’, dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies. PMID:17994088
Knowledge Discovery/A Collaborative Approach, an Innovative Solution

NASA Technical Reports Server (NTRS)

Fitts, Mary A.

2009-01-01

Collaboration between Medical Informatics and Healthcare Systems (MIHCS) at NASA/Johnson Space Center (JSC) and the Texas Medical Center (TMC) Library was established to investigate technologies for facilitating knowledge discovery across multiple life sciences research disciplines in multiple repositories. After reviewing 14 potential Enterprise Search System (ESS) solutions, Collexis was determined to best meet the expressed needs. A three month pilot evaluation of Collexis produced positive reports from multiple scientists across 12 research disciplines. The joint venture and a pilot-phased approach achieved the desired results without the high cost of purchasing software, hardware or additional resources to conduct the task. Medical research is highly compartmentalized by discipline, e.g. cardiology, immunology, neurology. The medical research community at large, as well as at JSC, recognizes the need for cross-referencing relevant information to generate best evidence. Cross-discipline collaboration at JSC is specifically required to close knowledge gaps affecting space exploration. To facilitate knowledge discovery across these communities, MIHCS combined expertise with the TMC library and found Collexis to best fit the needs of our researchers including:
Display technologies: application for the discovery of drug and gene delivery agents

PubMed Central

Sergeeva, Anna; Kolonin, Mikhail G.; Molldrem, Jeffrey J.; Pasqualini, Renata; Arap, Wadih

2007-01-01

Recognition of molecular diversity of cell surface proteomes in disease is essential for the development of targeted therapies. Progress in targeted therapeutics requires establishing effective approaches for high-throughput identification of agents specific for clinically relevant cell surface markers. Over the past decade, a number of platform strategies have been developed to screen polypeptide libraries for ligands targeting receptors selectively expressed in the context of various cell surface proteomes. Streamlined procedures for identification of ligand-receptor pairs that could serve as targets in disease diagnosis, profiling, imaging and therapy have relied on the display technologies, in which polypeptides with desired binding profiles can be serially selected, in a process called biopanning, based on their physical linkage with the encoding nucleic acid. These technologies include virus/phage display, cell display, ribosomal display, mRNA display and covalent DNA display (CDT), with phage display being by far the most utilized. The scope of this review is the recent advancements in the display technologies with a particular emphasis on molecular mapping of cell surface proteomes with peptide phage display. Prospective applications of targeted compounds derived from display libraries in the discovery of targeted drugs and gene therapy vectors are discussed. PMID:17123658
Text mining-based in silico drug discovery in oral mucositis caused by high-dose cancer therapy.

PubMed

Kirk, Jon; Shah, Nirav; Noll, Braxton; Stevens, Craig B; Lawler, Marshall; Mougeot, Farah B; Mougeot, Jean-Luc C

2018-08-01

Oral mucositis (OM) is a major dose-limiting side effect of chemotherapy and radiation used in cancer treatment. Due to the complex nature of OM, currently available drug-based treatments are of limited efficacy. Our objectives were (i) to determine genes and molecular pathways associated with OM and wound healing using computational tools and publicly available data and (ii) to identify drugs formulated for topical use targeting the relevant OM molecular pathways. OM and wound healing-associated genes were determined by text mining, and the intersection of the two gene sets was selected for gene ontology analysis using the GeneCodis program. Protein interaction network analysis was performed using STRING-db. Enriched gene sets belonging to the identified pathways were queried against the Drug-Gene Interaction database to find drug candidates for topical use in OM. Our analysis identified 447 genes common to both the "OM" and "wound healing" text mining concepts. Gene enrichment analysis yielded 20 genes representing six pathways and targetable by a total of 32 drugs which could possibly be formulated for topical application. A manual search on ClinicalTrials.gov confirmed no relevant pathway/drug candidate had been overlooked. Twenty-five of the 32 drugs can directly affect the PTGS2 (COX-2) pathway, the pathway that has been targeted in previous clinical trials with limited success. Drug discovery using in silico text mining and pathway analysis tools can facilitate the identification of existing drugs that have the potential of topical administration to improve OM treatment.
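The gene-set intersection and drug lookup steps can be sketched in a few lines of Python. The gene lists and the drug-to-target table below are invented placeholders for illustration; they are not the study's text-mining output or the contents of the Drug-Gene Interaction database.

```python
# Hypothetical placeholder data; not results from the study.
om_genes = {"PTGS2", "TNF", "IL6", "MMP9", "TP53"}
wound_healing_genes = {"PTGS2", "IL6", "VEGFA", "TGFB1", "MMP9"}

# Genes common to both text-mining concepts.
shared = sorted(om_genes & wound_healing_genes)

# Hypothetical drug -> target-gene table standing in for a drug-gene database.
drug_targets = {
    "drug_A": {"PTGS2"},
    "drug_B": {"VEGFA"},
    "drug_C": {"IL6", "TNF"},
}

# Keep drugs that hit at least one shared gene.
candidates = sorted(d for d, targets in drug_targets.items() if targets & set(shared))

print(shared)
print(candidates)
```

In the study, an enrichment step sits between the intersection and the drug query, so only genes in enriched pathways would be passed to the lookup.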
Enhancing AstroInformatics and Science Discovery from Data in Journal Articles

NASA Astrophysics Data System (ADS)

Mazzarella, Joseph

2011-05-01

Traditional methods of publishing scientific data and metadata in journal articles are in need of major upgrades to reach the full potential of astronomical databases and astroinformatics techniques to facilitate semi-automated, and eventually autonomous, methods of science discovery. I will review a growing collaboration involving the NASA/IPAC Extragalactic Database (NED), the Astrophysics Data System (ADS), the Virtual Astronomical Observatory (VAO), the AAS Journals and IOP, and the Data Conservancy that is aimed toward transforming the methodology used to publish, capture and link data associated with astrophysics journal articles. We are planning a web-based workflow to assist astronomers during the publication of journal articles. The primary goals are to facilitate the application of structure and standards to (meta)data, reduce errors, remove ambiguities in the identification of astrophysical objects and regions of sky, capture and preserve the images and spectral data files used to make plots, and accelerate the ingestion of the data into relevant repositories, search engines and integration services. The outcome of this community wide effort will address a recent public policy mandate to publish scientific data in open formats to allow reproducibility of results and to facilitate new discoveries. Equally important, this work has the potential to usher in a new wave of science discovery based on seamless connectivity between data relationships that are continuously growing in size and complexity, and increasingly sophisticated data visualization and analysis applications.
Designing for Data with Ask Dr. Discovery: Design Approaches for Facilitating Museum Evaluation with Real-Time Data Mining

ERIC Educational Resources Information Center

Nelson, Brian C.; Bowman, Cassie; Bowman, Judd

2017-01-01

Ask Dr. Discovery is an NSF-funded study addressing the need for ongoing, large-scale museum evaluation while investigating new ways to encourage museum visitors to engage deeply with museum content. To realize these aims, we are developing and implementing a mobile app with two parts: (1) a front-end virtual scientist called Dr. Discovery (Dr. D)…
The first set of EST resource for gene discovery and marker development in pigeonpea (Cajanus cajan L.).

PubMed

Raju, Nikku L; Gnanesh, Belaghihalli N; Lekha, Pazhamala; Jayashree, Balaji; Pande, Suresh; Hiremath, Pavana J; Byregowda, Munishamappa; Singh, Nagendra K; Varshney, Rajeev K

2010-03-11

Pigeonpea (Cajanus cajan (L.) Millsp) is one of the major grain legume crops of the tropics and subtropics, but biotic stresses [Fusarium wilt (FW), sterility mosaic disease (SMD), etc.] are serious challenges for sustainable crop production. Modern genomic tools such as molecular markers and candidate genes associated with resistance to these stresses offer the possibility of facilitating pigeonpea breeding for improving biotic stress resistance. Availability of limited genomic resources, however, is a serious bottleneck to undertake molecular breeding in pigeonpea to develop superior genotypes with enhanced resistance to above mentioned biotic stresses. With an objective of enhancing genomic resources in pigeonpea, this study reports generation and analysis of comprehensive resource of FW- and SMD- responsive expressed sequence tags (ESTs). A total of 16 cDNA libraries were constructed from four pigeonpea genotypes that are resistant and susceptible to FW ('ICPL 20102' and 'ICP 2376') and SMD ('ICP 7035' and 'TTB 7') and a total of 9,888 (9,468 high quality) ESTs were generated and deposited in dbEST of GenBank under accession numbers GR463974 to GR473857 and GR958228 to GR958231. Clustering and assembly analyses of these ESTs resulted in 4,557 unique sequences (unigenes) including 697 contigs and 3,860 singletons. BLASTN analysis of 4,557 unigenes showed a significant identity with ESTs of different legumes (23.2-60.3%), rice (28.3%), Arabidopsis (33.7%) and poplar (35.4%). As expected, pigeonpea ESTs are more closely related to soybean (60.3%) and cowpea ESTs (43.6%) than other plant ESTs. Similarly, BLASTX similarity results showed that only 1,603 (35.1%) out of 4,557 total unigenes correspond to known proteins in the UniProt database (≤ 1E-08). Functional categorization of the annotated unigenes sequences showed that 153 (3.3%) genes were assigned to cellular component category, 132 (2.8%) to biological process, and 132 (2.8%) in molecular function
The first set of EST resource for gene discovery and marker development in pigeonpea (Cajanus cajan L.)

PubMed Central

2010-01-01

Background: Pigeonpea (Cajanus cajan (L.) Millsp) is one of the major grain legume crops of the tropics and subtropics, but biotic stresses [Fusarium wilt (FW), sterility mosaic disease (SMD), etc.] are serious challenges for sustainable crop production. Modern genomic tools such as molecular markers and candidate genes associated with resistance to these stresses offer the possibility of facilitating pigeonpea breeding for improving biotic stress resistance. Availability of limited genomic resources, however, is a serious bottleneck to undertake molecular breeding in pigeonpea to develop superior genotypes with enhanced resistance to above mentioned biotic stresses. With an objective of enhancing genomic resources in pigeonpea, this study reports generation and analysis of comprehensive resource of FW- and SMD- responsive expressed sequence tags (ESTs). Results: A total of 16 cDNA libraries were constructed from four pigeonpea genotypes that are resistant and susceptible to FW ('ICPL 20102' and 'ICP 2376') and SMD ('ICP 7035' and 'TTB 7') and a total of 9,888 (9,468 high quality) ESTs were generated and deposited in dbEST of GenBank under accession numbers GR463974 to GR473857 and GR958228 to GR958231. Clustering and assembly analyses of these ESTs resulted in 4,557 unique sequences (unigenes) including 697 contigs and 3,860 singletons. BLASTN analysis of 4,557 unigenes showed a significant identity with ESTs of different legumes (23.2-60.3%), rice (28.3%), Arabidopsis (33.7%) and poplar (35.4%). As expected, pigeonpea ESTs are more closely related to soybean (60.3%) and cowpea ESTs (43.6%) than other plant ESTs. Similarly, BLASTX similarity results showed that only 1,603 (35.1%) out of 4,557 total unigenes correspond to known proteins in the UniProt database (≤ 1E-08). Functional categorization of the annotated unigenes sequences showed that 153 (3.3%) genes were assigned to cellular component category, 132 (2.8%) to biological process, and 132 (2.8%) in

Transcriptomics Analysis of Crassostrea hongkongensis for the Discovery of Reproduction-Related Genes.

PubMed

Tong, Ying; Zhang, Yang; Huang, Jiaomei; Xiao, Shu; Zhang, Yuehuan; Li, Jun; Chen, Jinhui; Yu, Ziniu

2015-01-01

The reproductive mechanisms of mollusk species have been interesting targets in biological research because of the diverse reproductive strategies observed in this phylum. These species have also been studied for the development of fishery technologies in molluscan aquaculture. Although the molecular mechanisms underlying the reproductive process have been well studied in animal models, the relevant information from mollusks remains limited, particularly in species of great commercial interest. Crassostrea hongkongensis is the dominant oyster species that is distributed along the coast of the South China Sea and little genomic information on this species is available. Currently, high-throughput sequencing techniques have been widely used for investigating the basis of physiological processes and facilitating the establishment of adequate genetic selection programs. The C. hongkongensis transcriptome included a total of 1,595,855 reads, which were generated by 454 sequencing and were assembled into 41,472 contigs using de novo methods. Contigs were clustered into 33,920 isotigs and further grouped into 22,829 isogroups. Approximately 77.6% of the isogroups were successfully annotated by the Nr database. More than 1,910 genes were identified as being related to reproduction. Some key genes involved in germline development, sex determination and differentiation were identified for the first time in C. hongkongensis (nanos, piwi, ATRX, FoxL2, β-catenin, etc.). Gene expression analysis indicated that vasa, nanos, piwi, ATRX, FoxL2, β-catenin and SRD5A1 were highly or specifically expressed in C. hongkongensis gonads. Additionally, 94,056 single nucleotide polymorphisms (SNPs) and 1,699 simple sequence repeats (SSRs) were compiled. Our study significantly increased C. hongkongensis genomic information based on transcriptomics analysis. The group of reproduction-related genes identified in the present study constitutes a new tool for research on bivalve reproduction
Transcriptomics Analysis of Crassostrea hongkongensis for the Discovery of Reproduction-Related Genes

PubMed Central

Tong, Ying; Zhang, Yang; Huang, Jiaomei; Xiao, Shu; Zhang, Yuehuan; Li, Jun; Chen, Jinhui; Yu, Ziniu

2015-01-01

Background: The reproductive mechanisms of mollusk species have been interesting targets in biological research because of the diverse reproductive strategies observed in this phylum. These species have also been studied for the development of fishery technologies in molluscan aquaculture. Although the molecular mechanisms underlying the reproductive process have been well studied in animal models, the relevant information from mollusks remains limited, particularly in species of great commercial interest. Crassostrea hongkongensis is the dominant oyster species that is distributed along the coast of the South China Sea and little genomic information on this species is available. Currently, high-throughput sequencing techniques have been widely used for investigating the basis of physiological processes and facilitating the establishment of adequate genetic selection programs. Results: The C. hongkongensis transcriptome included a total of 1,595,855 reads, which were generated by 454 sequencing and were assembled into 41,472 contigs using de novo methods. Contigs were clustered into 33,920 isotigs and further grouped into 22,829 isogroups. Approximately 77.6% of the isogroups were successfully annotated by the Nr database. More than 1,910 genes were identified as being related to reproduction. Some key genes involved in germline development, sex determination and differentiation were identified for the first time in C. hongkongensis (nanos, piwi, ATRX, FoxL2, β-catenin, etc.). Gene expression analysis indicated that vasa, nanos, piwi, ATRX, FoxL2, β-catenin and SRD5A1 were highly or specifically expressed in C. hongkongensis gonads. Additionally, 94,056 single nucleotide polymorphisms (SNPs) and 1,699 simple sequence repeats (SSRs) were compiled. Conclusions: Our study significantly increased C. hongkongensis genomic information based on transcriptomics analysis. The group of reproduction-related genes identified in the present study constitutes a new tool for research
Customizing microarrays for neuroscience drug discovery.

PubMed

Girgenti, Matthew J; Newton, Samuel S

2007-08-01

Microarray-based gene profiling has become the centerpiece of gene expression studies in the biological sciences. The ability to now interrogate the entire genome using a single chip demonstrates the progress in technology and instrumentation that has been made over the last two decades. Although this unbiased approach provides researchers with an immense quantity of data, obtaining meaningful insight is not possible without intensive data analysis and processing. Custom developed arrays have emerged as a viable and attractive alternative that can take advantage of this robust technology and tailor it to suit the needs and requirements of individual investigations. The ability to simplify data analysis, reduce noise and carefully optimize experimental conditions makes it a suitable tool that can be effectively utilized in neuroscience drug discovery efforts. Furthermore, incorporating recent advancements in fine focusing gene profiling to include specific cellular phenotypes can help resolve the complex cellular heterogeneity of the brain. This review surveys the use of microarray technology in neuroscience paying special attention to customized arrays and their potential in drug discovery. Novel applications of microarrays and ancillary techniques, such as laser microdissection, FAC sorting and RNA amplification, have also been discussed. The notion that a hypothesis-driven approach can be integrated into drug development programs is highlighted.
SBCDDB: Sleeping Beauty Cancer Driver Database for gene discovery in mouse models of human cancers

PubMed Central

Mann, Michael B

2018-01-01

Large-scale oncogenomic studies have identified few frequently mutated cancer drivers and hundreds of infrequently mutated drivers. Defining the biological context for rare driving events is fundamentally important to increasing our understanding of the druggable pathways in cancer. Sleeping Beauty (SB) insertional mutagenesis is a powerful gene discovery tool used to model human cancers in mice. Our lab and others have published a number of studies that identify cancer drivers from these models using various statistical and computational approaches. Here, we have integrated SB data from primary tumor models into an analysis and reporting framework, the Sleeping Beauty Cancer Driver DataBase (SBCDDB, http://sbcddb.moffitt.org), which identifies drivers in individual tumors or tumor populations. Unique to this effort, the SBCDDB utilizes a single, scalable, statistical analysis method that enables data to be grouped by different biological properties. This allows for SB drivers to be evaluated (and re-evaluated) under different contexts. The SBCDDB provides visual representations highlighting the spatial attributes of transposon mutagenesis and couples this functionality with analysis of gene sets, enabling users to interrogate relationships between drivers. The SBCDDB is a powerful resource for comparative oncogenomic analyses with human cancer genomics datasets for driver prioritization. PMID:29059366
A Cloud-enabled Service-oriented Spatial Web Portal for Facilitating Arctic Data Discovery, Integration, and Utilization

NASA Astrophysics Data System (ADS)

dias, S. B.; Yang, C.; Li, Z.; XIA, J.; Liu, K.; Gui, Z.; Li, W.

2013-12-01

Global climate change has become one of the biggest concerns for humankind in the 21st century due to its broad impacts on society and ecosystems across the world. The Arctic has been observed to be one of the regions most vulnerable to climate change. In order to understand the impacts of climate change on the natural environment, ecosystems, biodiversity and other aspects of the Arctic region, and thus to better support the planning and decision-making process, cross-disciplinary research is required to monitor and analyze changes in the Arctic, such as those in water, sea level and biodiversity. Conducting such research demands the efficient utilization of various geospatially referenced data, web services and information related to the Arctic region. In this paper, we propose a cloud-enabled and service-oriented Spatial Web Portal (SWP) to support the discovery, integration and utilization of Arctic-related geospatial resources, serving as a building block of polar CI. This SWP leverages the following techniques: 1) a hybrid searching mechanism combining centralized local search, distributed catalogue search and specialized Internet search for effectively discovering Arctic data and web services from multiple sources; 2) a service-oriented quality-enabled framework for seamless integration and utilization of various geospatial resources; and 3) a cloud-enabled parallel spatial index building approach to facilitate near-real time resource indexing and searching. A proof-of-concept prototype is developed to demonstrate the feasibility of the proposed SWP, using an example of analyzing the Arctic snow cover change over the past 50 years.
Mass spectrometry-driven drug discovery for development of herbal medicine.

PubMed

Zhang, Aihua; Sun, Hui; Wang, Xijun

2018-05-01

Herbal medicine (HM) has made a major contribution to the drug discovery process with regard to identifying natural product compounds. Currently, more attention has been focused on drug discovery from natural compounds of HM. Despite the rapid advancement of modern analytical techniques, drug discovery is still a difficult and lengthy process. Fortunately, mass spectrometry (MS) can provide us with useful structural information for drug discovery, and has been recognized as a sensitive, rapid, and high-throughput technology for advancing drug discovery from HM in the post-genomic era. It is essential to develop an efficient, high-quality, high-throughput screening method integrated with an MS platform for early screening of candidate drug molecules from natural products. We have developed a new chinmedomics strategy reliant on MS that is capable of capturing the candidate molecules and facilitating the identification of their novel chemical structures in the early phase; chinmedomics-guided natural product discovery based on MS may provide an effective tool that addresses challenges in early screening of effective constituents of herbs against disease. This critical review covers the use of MS with related techniques and methodologies for natural product discovery, biomarker identification, and determination of mechanisms of action. It also highlights high-throughput chinmedomics screening methods suitable for lead compound discovery illustrated by recent successes.
Generation of cell lines for drug discovery through random activation of gene expression: application to the human histamine H3 receptor.

PubMed

Song, J; Doucette, C; Hanniford, D; Hunady, K; Wang, N; Sherf, B; Harrington, J J; Brunden, K R; Stricker-Krongrad, A

2005-06-01

Target-based high-throughput screening (HTS) plays an integral role in drug discovery. The implementation of HTS assays generally requires high expression levels of the target protein, and this is typically accomplished using recombinant cDNA methodologies. However, the isolated gene sequences to many drug targets have intellectual property claims that restrict the ability to implement drug discovery programs. The present study describes the pharmacological characterization of the human histamine H3 receptor that was expressed using random activation of gene expression (RAGE), a technology that over-expresses proteins by up-regulating endogenous genes rather than introducing cDNA expression vectors into the cell. Saturation binding analysis using [125I]iodoproxyfan and RAGE-H3 membranes revealed a single class of binding sites with a K(D) value of 0.77 nM and a B(max) equal to 756 fmol/mg of protein. Competition binding studies showed that the rank order of potency for H3 agonists was N(alpha)-methylhistamine approximately (R)-alpha-methylhistamine > histamine and that the rank order of potency for H3 antagonists was clobenpropit > iodophenpropit > thioperamide. The same rank order of potency for H3 agonists and antagonists was observed in the functional assays as in the binding assays. The Fluorometric Imaging Plate Reader assays in RAGE-H3 cells gave high Z' values for agonist and antagonist screening, respectively. These results reveal that the human H3 receptor expressed with the RAGE technology is pharmacologically comparable to that expressed through recombinant methods. Moreover, the level of expression of the H3 receptor in the RAGE-H3 cells is suitable for HTS and secondary assays.
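Two quantities mentioned in this abstract lend themselves to a short worked example in Python: the one-site saturation binding curve, here evaluated with the reported K(D) of 0.77 nM and B(max) of 756 fmol/mg, and the Z' factor used to judge assay quality. The Z' formula is the commonly used definition; the control readings are hypothetical, since the abstract reports no raw data.

```python
from statistics import mean, stdev

def specific_binding(ligand_nM, bmax, kd_nM):
    """One-site saturation binding: B = Bmax * [L] / (Kd + [L])."""
    return bmax * ligand_nM / (kd_nM + ligand_nM)

def z_prime(positive, negative):
    """Z' = 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    spread = stdev(positive) + stdev(negative)
    window = abs(mean(positive) - mean(negative))
    return 1 - 3 * spread / window

# At a ligand concentration equal to Kd, binding is half of Bmax.
half_max = specific_binding(0.77, bmax=756, kd_nM=0.77)
print(round(half_max, 3))

# Hypothetical plate controls (arbitrary fluorescence units).
positive_controls = [100, 102, 98]
negative_controls = [10, 12, 8]
print(round(z_prime(positive_controls, negative_controls), 3))
```

Values of Z' above roughly 0.5 are conventionally taken to indicate an assay with enough separation between controls for screening.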
Output ordering and prioritisation system (OOPS): ranking biosynthetic gene clusters to enhance bioactive metabolite discovery.

PubMed

Peña, Alejandro; Del Carratore, Francesco; Cummings, Matthew; Takano, Eriko; Breitling, Rainer

2017-12-18

The rapid increase of publicly available microbial genome sequences has highlighted the presence of hundreds of thousands of biosynthetic gene clusters (BGCs) encoding valuable secondary metabolites. The experimental characterization of new BGCs is extremely laborious and struggles to keep pace with the in silico identification of potential BGCs. Therefore, the prioritisation of promising candidates among computationally predicted BGCs represents a pressing need. Here, we propose an output ordering and prioritisation system (OOPS) which helps sorting identified BGCs by a wide variety of custom-weighted biological and biochemical criteria in a flexible and user-friendly interface. OOPS facilitates a judicious prioritisation of BGCs using G+C content, coding sequence length, gene number, cluster self-similarity and codon bias parameters, as well as enabling the user to rank BGCs based upon BGC type, novelty, and taxonomic distribution. Effective prioritisation of BGCs will help to reduce experimental attrition rates and improve the breadth of bioactive metabolites characterized.
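Ranking by custom-weighted criteria can be sketched in Python as a weighted sum. This is an assumption about the general idea, not the scoring scheme implemented in OOPS; the feature names, the normalisation of each feature to the range 0 to 1, and the cluster records are hypothetical.

```python
def rank_bgcs(bgcs, weights):
    """Sort clusters by the weighted sum of their (already normalised) criteria."""
    def score(bgc):
        # Criteria without a user weight contribute nothing.
        return sum(weights.get(name, 0.0) * value
                   for name, value in bgc["features"].items())
    return sorted(bgcs, key=score, reverse=True)

# Hypothetical clusters with features scaled to [0, 1].
bgcs = [
    {"id": "bgc1", "features": {"gc": 0.5, "novelty": 0.25, "gene_number": 1.0}},
    {"id": "bgc2", "features": {"gc": 1.0, "novelty": 1.0, "gene_number": 0.0}},
    {"id": "bgc3", "features": {"gc": 0.0, "novelty": 0.5, "gene_number": 0.5}},
]

# Emphasise novelty, then re-rank by gene number alone.
by_novelty = [b["id"] for b in rank_bgcs(bgcs, {"gc": 1.0, "novelty": 2.0, "gene_number": 0.5})]
by_size = [b["id"] for b in rank_bgcs(bgcs, {"gene_number": 1.0})]
print(by_novelty)
print(by_size)
```

Changing the weights reorders the same clusters without recomputing their features, which is the practical appeal of a user-weighted prioritisation.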
Tools to covisualize and coanalyze proteomic data with genomes and transcriptomes: validation of genes and alternative mRNA splicing.

PubMed

Pang, Chi Nam Ignatius; Tay, Aidan P; Aya, Carlos; Twine, Natalie A; Harkness, Linda; Hart-Smith, Gene; Chia, Samantha Z; Chen, Zhiliang; Deshpande, Nandan P; Kaakoush, Nadeem O; Mitchell, Hazel M; Kassem, Moustapha; Wilkins, Marc R

2014-01-03

Direct links between proteomic and genomic/transcriptomic data are not frequently made, partly because of lack of appropriate bioinformatics tools. To help address this, we have developed the PG Nexus pipeline. The PG Nexus allows users to covisualize peptides in the context of genomes or genomic contigs, along with RNA-seq reads. This is done in the Integrated Genome Viewer (IGV). A Results Analyzer reports the precise base position where LC-MS/MS-derived peptides cover genes or gene isoforms, on the chromosomes or contigs where this occurs. In prokaryotes, the PG Nexus pipeline facilitates the validation of genes, where annotation or gene prediction is available, or the discovery of genes using a "virtual protein"-based unbiased approach. We illustrate this with a comprehensive proteogenomics analysis of two strains of Campylobacter concisus. For higher eukaryotes, the PG Nexus facilitates gene validation and supports the identification of mRNA splice junction boundaries and splice variants that are protein-coding. This is illustrated with an analysis of splice junctions covered by human phosphopeptides, and other examples of relevance to the Chromosome-Centric Human Proteome Project. The PG Nexus is open-source and available from https://github.com/IntersectAustralia/ap11_Samifier. It has been integrated into Galaxy and made available in the Galaxy tool shed.
Computational discovery and in vivo validation of hnf4 as a regulatory gene in planarian regeneration.

PubMed

Lobo, Daniel; Morokuma, Junji; Levin, Michael

2016-09-01

Automated computational methods can infer dynamic regulatory network models directly from temporal and spatial experimental data, such as genetic perturbations and their resultant morphologies. Recently, a computational method was able to reverse-engineer the first mechanistic model of planarian regeneration that can recapitulate the main anterior-posterior patterning experiments published in the literature. Validating this comprehensive regulatory model via novel experiments that had not yet been performed would add to our understanding of the remarkable regeneration capacity of planarian worms and demonstrate the power of this automated methodology. Using the Michigan Molecular Interactions and STRING databases and the MoCha software tool, we characterized as hnf4 an unknown regulatory gene predicted to exist by the reverse-engineered dynamic model of planarian regeneration. Then, we used the dynamic model to predict the morphological outcomes under different single and multiple knock-downs (RNA interference) of hnf4 and its predicted gene pathway interactors β-catenin and hh. Interestingly, the model predicted that RNAi of hnf4 would rescue the abnormal regenerated phenotype (tailless) of RNAi of hh in amputated trunk fragments. Finally, we validated these predictions in vivo by performing the same surgical and genetic experiments with planarian worms, obtaining the same phenotypic outcomes predicted by the reverse-engineered model. These results suggest that hnf4 is a regulatory gene in planarian regeneration, validate the computational predictions of the reverse-engineered dynamic model, and demonstrate the automated methodology for the discovery of novel genes, pathways and experimental phenotypes.
Metadata Effectiveness in Internet Discovery: An Analysis of Digital Collection Metadata Elements and Internet Search Engine Keywords

ERIC Educational Resources Information Center

Yang, Le

2016-01-01

This study analyzed digital item metadata and keywords from Internet search engines to learn what metadata elements actually facilitate discovery of digital collections through Internet keyword searching and how significantly each metadata element affects the discovery of items in a digital repository. The study found that keywords from Internet…
Facilitating NCAR Data Discovery by Connecting Related Resources

NASA Astrophysics Data System (ADS)

Rosati, A.

2012-12-01

Linking datasets, creators, and users by employing the proper standards helps to increase the impact of funded research. In order for users to find a dataset, it must first be named. Data citations play the important role of giving datasets a persistent presence by assigning a formal "name" and location. This project focuses on the next step of the "name-find-use" sequence: enhancing discoverability of NCAR data by connecting related resources on the web. By examining metadata schemas that document datasets, I explored how Semantic Web approaches can help to ensure the widest possible range of data users. The focus was to move from search engine optimization (SEO) to information connectivity. Two main markup types are very visible in the Semantic Web and applicable to scientific dataset discovery: The Open Archives Initiative-Object Reuse and Exchange (OAI-ORE - www.openarchives.org) and Microdata (HTML5 and www.schema.org). My project creates pilot aggregations of related resources using both markup types for three case studies: The North American Regional Climate Change Assessment Program (NARCCAP) dataset and related publications, the Palmer Drought Severity Index (PDSI) animation and image files from NCAR's Visualization Lab (VisLab), and the multidisciplinary data types and formats from the Advanced Cooperative Arctic Data and Information Service (ACADIS). This project documents the differences between these markups and how each creates connectedness on the web. My recommendations point toward the most efficient and effective markup schema for aggregating resources within the three case studies based on the following assessment criteria: ease of use, current state of support and adoption of technology, integration with typical web tools, available vocabularies and geoinformatic standards, interoperability with current repositories and access portals (e.g. ESG, Java), and relation to data citation tools and methods.
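As a minimal sketch of the Microdata approach named in this abstract, the Python snippet below renders a schema.org Dataset item together with links to related publications. The function name, the use of the `citation` property to link publications, and the example.org URLs are illustrative assumptions; none of them come from the NCAR project itself.

```python
from html import escape


def dataset_microdata(name, description, url, related_urls):
    """Render a schema.org Dataset as an HTML5 Microdata fragment.

    related_urls: URLs of related resources (hypothetical here), each
    emitted as a `citation` link so crawlers can connect them to the dataset.
    """
    lines = [
        '<div itemscope itemtype="https://schema.org/Dataset">',
        f'  <h2 itemprop="name">{escape(name)}</h2>',
        f'  <p itemprop="description">{escape(description)}</p>',
        f'  <a itemprop="url" href="{escape(url, quote=True)}">Dataset landing page</a>',
    ]
    for rel in related_urls:
        # A <link> element with itemprop carries a URL-valued property.
        lines.append(f'  <link itemprop="citation" href="{escape(rel, quote=True)}">')
    lines.append("</div>")
    return "\n".join(lines)


if __name__ == "__main__":
    # All URLs below are placeholders, not real NCAR endpoints.
    print(dataset_microdata(
        "NARCCAP model output",
        "Regional climate model simulations for North America.",
        "https://example.org/datasets/narccap",
        ["https://example.org/papers/1", "https://example.org/papers/2"],
    ))
```

Embedding the related publications inside the same item scope is one way to express the "connectedness" the abstract describes; an OAI-ORE aggregation would state the same relationships in a separate resource map instead.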
Translational Research 2.0: a framework for accelerating collaborative discovery.

PubMed

Asakiewicz, Chris

2014-05-01

The world wide web has revolutionized the conduct of global, cross-disciplinary research. In the life sciences, interdisciplinary approaches to problem solving and collaboration are becoming increasingly important in facilitating knowledge discovery and integration. Web 2.0 technologies promise to have a profound impact - enabling reproducibility, aiding in discovery, and accelerating and transforming medical and healthcare research across the healthcare ecosystem. However, knowledge integration and discovery require a consistent foundation upon which to operate. Such a foundation should be capable of addressing some of the critical issues associated with how research is conducted within the ecosystem today and how it should be conducted in the future. This article discusses a framework for enhancing collaborative knowledge discovery across the medical and healthcare research ecosystem, one that could serve as a foundation upon which ecosystem stakeholders can enhance the way data, information and knowledge are created, shared and used to accelerate the translation of knowledge from one area of the ecosystem to another.
DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes.

PubMed

Piñero, Janet; Queralt-Rosinach, Núria; Bravo, Àlex; Deu-Pons, Jordi; Bauer-Mehren, Anna; Baron, Martin; Sanz, Ferran; Furlong, Laura I

2015-01-01

DisGeNET is a comprehensive discovery platform designed to address a variety of questions concerning the genetic underpinning of human diseases. DisGeNET contains over 380,000 associations between >16,000 genes and 13,000 diseases, which makes it one of the largest repositories currently available of its kind. DisGeNET integrates expert-curated databases with text-mined data, covers information on Mendelian and complex diseases, and includes data from animal disease models. It features a score based on the supporting evidence to prioritize gene-disease associations. It is an open access resource available through a web interface, a Cytoscape plugin and as a Semantic Web resource. The web interface supports user-friendly data exploration and navigation. DisGeNET data can also be analysed via the DisGeNET Cytoscape plugin, and enriched with the annotations of other plugins of this popular network analysis software suite. Finally, the information contained in DisGeNET can be expanded and complemented using Semantic Web technologies and linked to a variety of resources already present in the Linked Data cloud. Hence, DisGeNET offers one of the most comprehensive collections of human gene-disease associations and a valuable set of tools for investigating the molecular mechanisms underlying diseases of genetic origin, designed to fulfill the needs of different user profiles, including bioinformaticians, biologists and health-care practitioners. Database URL: http://www.disgenet.org/ © The Author(s) 2015. Published by Oxford University Press.
Gene expression signatures differentiate ovarian/peritoneal serous carcinoma from breast carcinoma in effusions

PubMed Central

Davidson, Ben; Stavnes, Helene Tuft; Holth, Arild; Chen, Xu; Yang, Yanqin; Shih, Ie-Ming; Wang, Tian-Li

2011-01-01

Ovarian/primary peritoneal carcinoma and breast carcinoma are the gynaecological cancers that most frequently involve the serosal cavities. With the objective of improving on the limited diagnostic panel currently available for the differential diagnosis of these two malignancies, as well as to define tumour-specific biological targets, we compared their global gene expression patterns. Gene expression profiles of 10 serous ovarian/peritoneal and eight ductal breast carcinoma effusions were analysed using the HumanRef-8 BeadChip from Illumina. Differentially expressed candidate genes were validated using quantitative real-time PCR and immunohistochemistry. Unsupervised hierarchical clustering using all 54,675 genes in the array separated ovarian from breast carcinoma samples. We identified 288 unique probes that were significantly differentially expressed in the two cancers by greater than 3.5-fold, of which 81 and 207 were overexpressed in breast and ovarian/peritoneal carcinoma, respectively. SAM analysis identified 1078 differentially expressed probes with false discovery rate less than 0.05. Genes overexpressed in breast carcinoma included TFF1, TFF3, FOXA1, CA12, GATA3, SDC1, PITX1, TH, EHFD1, EFEMP1, TOB1 and KLF2. Genes overexpressed in ovarian/peritoneal carcinoma included SPON1, RBP1, MFGE8, TM4SF12, MMP7, KLK5/6/7, FOLR1/3, PAX8, APOL2 and NRCAM. The differential expression of 14 genes was validated by quantitative real-time PCR, and differences in 5 gene products were confirmed by immunohistochemistry. Expression profiling distinguishes ovarian/peritoneal carcinoma from breast carcinoma and identifies genes that are differentially expressed in these two tumour types. The molecular signatures unique to these cancers may facilitate their differential diagnosis and may provide a molecular basis for therapeutic target discovery. PMID:20132413
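The two filters quoted in this abstract (a fold change greater than 3.5 and a false discovery rate below 0.05) can be illustrated with a short Python sketch. Note the substitution: the study used SAM, which estimates the FDR by permutation, whereas the sketch below uses the simpler Benjamini-Hochberg adjustment on per-probe p-values. The function names, the input layout and the probe names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_probes(probes, min_fold=3.5, max_fdr=0.05):
    """Keep probes passing both the fold-change and the FDR threshold.

    probes: list of (name, fold_change, p_value) with fold_change > 0;
    values below 1 mean the probe is higher in the second group.
    """
    qvalues = benjamini_hochberg([p for _, _, p in probes])
    selected = []
    for (name, fold, _), q in zip(probes, qvalues):
        magnitude = fold if fold >= 1 else 1 / fold  # direction-agnostic
        if magnitude > min_fold and q < max_fdr:
            selected.append(name)
    return selected


if __name__ == "__main__":
    # Toy values for illustration only; not data from the study.
    toy = [("probe_a", 4.0, 0.01), ("probe_b", 1.2, 0.04),
           ("probe_c", 0.2, 0.03), ("probe_d", 5.0, 0.005)]
    print(select_probes(toy))
```

Treating a fold change of 0.2 as a five-fold difference in the opposite direction mirrors how the abstract splits its 288 probes into those overexpressed in breast (81) and in ovarian/peritoneal carcinoma (207).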
HEx: A heterologous expression platform for the discovery of fungal natural products

PubMed Central

Schlecht, Ulrich; Horecka, Joe; Lin, Hsiao-Ching; Naughton, Brian; Miranda, Molly; Li, Yong Fuga; Hennessy, James R.; Vandova, Gergana A.; Steinmetz, Lars M.; Sattely, Elizabeth; Khosla, Chaitan; Hillenmeyer, Maureen E.

2018-01-01

For decades, fungi have been a source of U.S. Food and Drug Administration–approved natural products such as penicillin, cyclosporine, and the statins. Recent breakthroughs in DNA sequencing suggest that millions of fungal species exist on Earth, with each genome encoding pathways capable of generating as many as dozens of natural products. However, the majority of encoded molecules are difficult or impossible to access because the organisms are uncultivable or the genes are transcriptionally silent. To overcome this bottleneck in natural product discovery, we developed the HEx (Heterologous EXpression) synthetic biology platform for rapid, scalable expression of fungal biosynthetic genes and their encoded metabolites in Saccharomyces cerevisiae. We applied this platform to 41 fungal biosynthetic gene clusters from diverse fungal species from around the world, 22 of which produced detectable compounds. These included novel compounds with unexpected biosynthetic origins, particularly from poorly studied species. This result establishes the HEx platform for rapid discovery of natural products from any fungal species, even those that are uncultivable, and opens the door to discovery of the next generation of natural products. PMID:29651464
Knowledge Discovery from Biomedical Ontologies in Cross Domains

PubMed Central

Shen, Feichen; Lee, Yugyung

2016-01-01

In recent years, there has been an increasing demand for sharing and integration of medical data in biomedical research. To improve a health care system, the integration of data must be supported by facilitating semantic interoperability systems and practices. Semantic interoperability is difficult to achieve in these systems as the conceptual models underlying datasets are not fully exploited. In this paper, we propose a semantic framework, called Medical Knowledge Discovery and Data Mining (MedKDD), that aims to build a topic hierarchy and serve the semantic interoperability between different ontologies. For this purpose, we focus on the discovery of semantic patterns about the association of relations in the heterogeneous information network representing different types of objects and relationships in multiple biological ontologies and the creation of a topic hierarchy through the analysis of the discovered patterns. These patterns are used to cluster heterogeneous information networks into a set of smaller topic graphs in a hierarchical manner and then to conduct cross domain knowledge discovery from the multiple biological ontologies. Thus, patterns made a greater contribution to knowledge discovery across multiple ontologies. We have demonstrated the cross domain knowledge discovery in the MedKDD framework using a case study with 9 primary biological ontologies from Bio2RDF and compared it with the cross domain query processing approach, namely SLAP. We have confirmed the effectiveness of the MedKDD framework in knowledge discovery from multiple medical ontologies. PMID:27548262
SNP Discovery for mapping alien introgressions in wheat

PubMed Central

2014-01-01

Background: Monitoring alien introgressions in crop plants is difficult due to the lack of genetic and molecular mapping information on the wild crop relatives. The tertiary gene pool of wheat is a very important source of genetic variability for wheat improvement against biotic and abiotic stresses. By exploring the 5Mg short arm (5MgS) of Aegilops geniculata, we can apply chromosome genomics for the discovery of SNP markers and their use for monitoring alien introgressions in wheat (Triticum aestivum L). Results: The short arm of chromosome 5Mg of Ae. geniculata Roth (syn. Ae. ovata L.; 2n = 4x = 28, UgUgMgMg) was flow-sorted from a wheat line in which it is maintained as a telocentric chromosome. DNA of the sorted arm was amplified and sequenced using an Illumina HiSeq 2000 with ~45x coverage. The sequence data was used for SNP discovery against wheat homoeologous group-5 assemblies. A total of 2,178 unique, 5MgS-specific SNPs were discovered. Randomly selected samples of 59 5MgS-specific SNPs were tested (44 by KASPar assay and 15 by Sanger sequencing) and 84% were validated. Of the selected SNPs, 97% mapped to a chromosome 5Mg addition to wheat (the source of t5MgS), and 94% to 5Mg introgressed from a different accession of Ae. geniculata substituting for chromosome 5D of wheat. The validated SNPs also identified chromosome segments of 5MgS origin in a set of T5D-5Mg translocation lines; eight SNPs (25%) mapped to TA5601 [T5DL · 5DS-5MgS(0.75)] and three (8%) to TA5602 [T5DL · 5DS-5MgS (0.95)]. SNPs (gsnp_5ms83 and gsnp_5ms94), tagging chromosome T5DL · 5DS-5MgS(0.95) with the smallest introgression carrying resistance to leaf rust (Lr57) and stripe rust (Yr40), were validated in two released germplasm lines with Lr57 and Yr40 genes. Conclusion: This approach should be widely applicable for the identification of species/genome-specific SNPs. The development of a large number of SNP markers will facilitate the precise introgression and
Strategy of Daiichi Sankyo discovery research in oncology.

PubMed

Akahane, Kouichi; Hirokawa, Kazunori

2014-02-01

We would like to introduce Daiichi Sankyo's approach to developing cancer targeted medicines with special reference to the drug discovery strategy, global discovery activities and external research collaboration leading to generation of innovative drugs for cancer patients. We are developing 14 clinical projects for cancer treatment and three of them have been previously approved. These are mostly targeted for growth and survival signals of cancer cells. To overcome the drug resistance mechanism derived from the heterogeneous nature of cancer, we are developing selective inhibitors in three major clusters of signal pathways which may allow future rational combinations of oncology products. In addition to the main research facility in Japan, research sites in the EU and the USA provide us with different technical expertise and diversified ideas of drug discovery. To access novel drug targets, we are facilitating research collaboration with leading academia and successful cancer research scientists. In conclusion, we intend to focus more on developing innovative personalized medicines for better treatment of cancer.

Challenges of the information age: the impact of false discovery on pathway identification.

PubMed

Rog, Colin J; Chekuri, Srinivasa C; Edgerton, Mary E

2012-11-21

Pathways with members that have known relevance to a disease are used to support hypotheses generated from analyses of gene expression and proteomic studies. Using cancer as an example, the pitfalls of searching pathways databases as support for genes and proteins that could represent false discoveries are explored. The frequency with which networks could be generated from 100 instances each of randomly selected five- and ten-gene sets as input to MetaCore, a commercial pathways database, was measured. A PubMed search enumerated cancer-related literature published for any gene in the networks. Using three, two, and one maximum intervening steps between input genes to populate the network, networks were generated with frequencies of 97%, 77%, and 7% using ten-gene sets and 73%, 27%, and 1% using five-gene sets. PubMed reported an average of 4225 cancer-related articles per network gene. This can be attributed to the richly populated pathways databases and the interest in the molecular basis of cancer. As information sources become enriched, they are more likely to generate plausible mechanisms for false discoveries.
Open Access High Throughput Drug Discovery in the Public Domain: A Mount Everest in the Making

PubMed Central

Roy, Anuradha; McDonald, Peter R.; Sittampalam, Sitta; Chaguturu, Rathnam

2013-01-01

High throughput screening (HTS) facilitates screening large numbers of compounds against a biochemical target of interest using validated biological or biophysical assays. In recent years, a significant number of drugs in clinical trials originated from HTS campaigns, validating HTS as a bona fide mechanism for hit finding. In the current drug discovery landscape, the pharmaceutical industry is embracing open innovation strategies with academia to maximize their research capabilities and to feed their drug discovery pipeline. The goals of academic research have therefore expanded from target identification and validation to probe discovery, chemical genomics, and compound library screening. This trend is reflected in the emergence of HTS centers in the public domain over the past decade, ranging in size from modestly equipped academic screening centers to well endowed Molecular Libraries Probe Centers Network (MLPCN) centers funded by the NIH Roadmap initiative. These centers facilitate a comprehensive approach to probe discovery in academia and utilize both classical and cutting-edge assay technologies for executing primary and secondary screening campaigns. The various facets of academic HTS centers as well as their implications on technology transfer and drug discovery are discussed, and a roadmap for successful drug discovery in the public domain is presented. New lead discovery against therapeutic targets, especially those involving the rare and neglected diseases, is indeed a Mount Everestonian size task, and requires diligent implementation of pharmaceutical industry’s best practices for a successful outcome. PMID:20809896
Targeted discovery of glycoside hydrolases from a switchgrass-adapted compost community

DOE Office of Scientific and Technical Information (OSTI.GOV)

Allgaier, M.; Reddy, A.; Park, J. I.

2009-11-15

Development of cellulosic biofuels from non-food crops is currently an area of intense research interest. Tailoring depolymerizing enzymes to particular feedstocks and pretreatment conditions is one promising avenue of research in this area. Here we added a green-waste compost inoculum to switchgrass (Panicum virgatum) and simulated thermophilic composting in a bioreactor to select for a switchgrass-adapted community and to facilitate targeted discovery of glycoside hydrolases. Small-subunit (SSU) rRNA-based community profiles revealed that the microbial community changed dramatically between the initial and switchgrass-adapted compost (SAC) with some bacterial populations being enriched over 20-fold. We obtained 225 Mbp of 454-titanium pyrosequence data from the SAC community and conservatively identified 800 genes encoding glycoside hydrolase domains that were biased toward depolymerizing grass cell wall components. Of these, ~10% were putative cellulases mostly belonging to families GH5 and GH9. We synthesized two SAC GH9 genes with codon optimization for heterologous expression in Escherichia coli and observed activity for one on carboxymethyl cellulose. The active GH9 enzyme has a temperature optimum of 50 °C and pH range of 5.5 to 8 consistent with the composting conditions applied. We demonstrate that microbial communities adapt to switchgrass decomposition using simulated composting conditions and that full-length genes can be identified from complex metagenomic sequence data, synthesized and expressed resulting in active enzyme.
Targeted Discovery of Glycoside Hydrolases from a Switchgrass-Adapted Compost Community

DOE Office of Scientific and Technical Information (OSTI.GOV)

Reddy, Amitha; Allgaier, Martin; Park, Joshua I.

2011-05-11

Development of cellulosic biofuels from non-food crops is currently an area of intense research interest. Tailoring depolymerizing enzymes to particular feedstocks and pretreatment conditions is one promising avenue of research in this area. Here we added a green-waste compost inoculum to switchgrass (Panicum virgatum) and simulated thermophilic composting in a bioreactor to select for a switchgrass-adapted community and to facilitate targeted discovery of glycoside hydrolases. Small-subunit (SSU) rRNA-based community profiles revealed that the microbial community changed dramatically between the initial and switchgrass-adapted compost (SAC) with some bacterial populations being enriched over 20-fold. We obtained 225 Mbp of 454-titanium pyrosequence data from the SAC community and conservatively identified 800 genes encoding glycoside hydrolase domains that were biased toward depolymerizing grass cell wall components. Of these, ~10% were putative cellulases mostly belonging to families GH5 and GH9. We synthesized two SAC GH9 genes with codon optimization for heterologous expression in Escherichia coli and observed activity for one on carboxymethyl cellulose. The active GH9 enzyme has a temperature optimum of 50 °C and pH range of 5.5 to 8 consistent with the composting conditions applied. We demonstrate that microbial communities adapt to switchgrass decomposition using simulated composting conditions and that full-length genes can be identified from complex metagenomic sequence data, synthesized and expressed resulting in active enzyme.
GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data.

PubMed

Rue-Albrecht, Kévin; McGettigan, Paul A; Hernández, Belinda; Nalpas, Nicolas C; Magee, David A; Parnell, Andrew C; Gordon, Stephen V; MacHugh, David E

2016-03-11

Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.
Advancements in Aptamer Discovery Technologies.

PubMed

Gotrik, Michael R; Feagin, Trevor A; Csordas, Andrew T; Nakamoto, Margaret A; Soh, H Tom

2016-09-20

Affinity reagents that specifically bind to their target molecules are invaluable tools in nearly every field of modern biomedicine. Nucleic acid-based aptamers offer many advantages in this domain, because they are chemically synthesized, stable, and economical. Despite these compelling features, aptamers are currently not widely used in comparison to antibodies. This is primarily because conventional aptamer-discovery techniques such as SELEX are time-consuming and labor-intensive and often fail to produce aptamers with comparable binding performance to antibodies. This Account describes a body of work from our laboratory in developing advanced methods for consistently producing high-performance aptamers with higher efficiency, fewer resources, and, most importantly, a greater probability of success. We describe our efforts in systematically transforming each major step of the aptamer discovery process: selection, analysis, and characterization. To improve selection, we have developed microfluidic devices (M-SELEX) that enable discovery of high-affinity aptamers after a minimal number of selection rounds by precisely controlling the target concentration and washing stringency. In terms of improving aptamer pool analysis, our group was the first to use high-throughput sequencing (HTS) for the discovery of new aptamers. We showed that tracking the enrichment trajectory of individual aptamer sequences enables the identification of high-performing aptamers without requiring full convergence of the selected aptamer pool. HTS is now widely used for aptamer discovery, and open-source software has become available to facilitate analysis. To improve binding characterization, we used HTS data to design custom aptamer arrays to measure the affinity and specificity of up to ∼10^4 DNA aptamers in parallel as a means to rapidly discover high-quality aptamers. Most recently, our efforts have culminated in the invention of the "particle display" (PD) screening system, which
High-throughput genotyping-by-sequencing facilitates molecular tagging of a novel rust resistance gene, R15, in sunflower (Helianthus annuus L.).

PubMed

Ma, G J; Song, Q J; Markell, S G; Qi, L L

2018-07-01

A novel rust resistance gene, R15, derived from the cultivated sunflower HA-R8 was assigned to linkage group 8 of the sunflower genome using a genotyping-by-sequencing approach. SNP markers closely linked to R15 were identified, facilitating marker-assisted selection of resistance genes. The rust virulence gene is co-evolving with the resistance gene in sunflower, leading to the emergence of new physiologic pathotypes. This presents a continuous threat to the sunflower crop, necessitating the development of resistant sunflower hybrids providing a more efficient, durable, and environmentally friendly host plant resistance. The inbred line HA-R8 carries a gene conferring resistance to all known races of the rust pathogen in North America and can be used as a broad-spectrum resistance resource. Based on phenotypic assessments of 140 F2 individuals derived from a cross of HA 89 with HA-R8, rust resistance in the population was found to be conferred by a single dominant gene (R15) originating from HA-R8. Genotypic analysis with the currently available SSR markers failed to find any association between rust resistance and any markers. Therefore, we used genotyping-by-sequencing (GBS) analysis to achieve better genomic coverage. The GBS data showed that R15 was located at the top end of linkage group (LG) 8. Saturation with 71 previously mapped SNP markers selected within this region further showed that it was located in a resistance gene cluster on LG8, and mapped to a 1.0-cM region between three co-segregating SNP markers SFW01920, SFW00128, and SFW05824 as well as the NSA_008457 SNP marker. These closely linked markers will facilitate marker-assisted selection and breeding in sunflower.
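A single dominant gene implies a 3:1 resistant-to-susceptible ratio among F2 individuals, which is commonly checked with a chi-square goodness-of-fit test. The Python sketch below shows that arithmetic. The abstract reports only the population size (140), so the 108:32 split used here is a hypothetical illustration, and the function name is likewise an assumption.

```python
import math


def chi_square_3_to_1(resistant, susceptible):
    """Goodness-of-fit test of observed counts against a 3:1 ratio.

    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    total = resistant + susceptible
    expected_r = 0.75 * total
    expected_s = 0.25 * total
    chi2 = ((resistant - expected_r) ** 2 / expected_r
            + (susceptible - expected_s) ** 2 / expected_s)
    # For 1 degree of freedom the upper tail is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts summing to 140; not taken from the study.
    chi2, p = chi_square_3_to_1(108, 32)
    verdict = "consistent with" if p > 0.05 else "deviates from"
    print(f"chi2 = {chi2:.3f}, p = {p:.3f}: {verdict} a 3:1 ratio")
```

A p-value above 0.05 means the observed counts do not differ significantly from 3:1, which is the kind of evidence that supports a single-dominant-gene model.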
Practice-Based Knowledge Discovery for Comparative Effectiveness Research: An Organizing Framework

PubMed Central

Lucero, Robert J.; Bakken, Suzanne

2014-01-01

Electronic health information systems can increase the ability of health-care organizations to investigate the effects of clinical interventions. The authors present an organizing framework that integrates outcomes and informatics research paradigms to guide knowledge discovery in electronic clinical databases. They illustrate its application using the example of hospital acquired pressure ulcers (HAPU). The Knowledge Discovery through Informatics for Comparative Effectiveness Research (KDI-CER) framework was conceived as a heuristic to conceptualize study designs and address potential methodological limitations imposed by using a single research perspective. Advances in informatics research can play a complementary role in advancing the field of outcomes research including CER. The KDI-CER framework can be used to facilitate knowledge discovery from routinely collected electronic clinical data. PMID:25278645
Designing microarray and RNA-Seq experiments for greater systems biology discovery in modern plant genomics.

PubMed

Yang, Chuanping; Wei, Hairong

2015-02-01

Microarray and RNA-seq experiments have become an important part of modern genomics and systems biology. Obtaining meaningful biological data from these experiments is an arduous task that demands close attention to many details. Negligence at any step can lead to gene expression data containing inadequate or composite information that is recalcitrant for pattern extraction. Therefore, it is imperative to carefully consider experimental design before launching a time-consuming and costly experiment. Contemporarily, most genomics experiments have two objectives: (1) to generate two or more groups of comparable data for identifying differentially expressed genes, gene families, biological processes, or metabolic pathways under experimental conditions; (2) to build local gene regulatory networks and identify hierarchically important regulators governing biological processes and pathways of interest. Since the first objective aims to identify the active molecular identities and the second provides a basis for understanding the underlying molecular mechanisms through inferring causality relationships mediated by treatment, an optimal experiment is to produce biologically relevant and extractable data to meet both objectives without substantially increasing the cost. This review discusses the major issues that researchers commonly face when embarking on microarray or RNA-seq experiments and summarizes important aspects of experimental design, which aim to help researchers deliberate how to generate gene expression profiles with low background noise but with more interaction to facilitate novel biological discoveries in modern plant genomics. Copyright © 2015 The Author. Published by Elsevier Inc. All rights reserved.
Classification of lymphoid neoplasms: the microscope as a tool for disease discovery

PubMed Central

Harris, Nancy Lee; Stein, Harald; Isaacson, Peter G.

2008-01-01

In the past 50 years, we have witnessed explosive growth in the understanding of normal and neoplastic lymphoid cells. B-cell, T-cell, and natural killer (NK)–cell neoplasms in many respects recapitulate normal stages of lymphoid cell differentiation and function, so that they can be to some extent classified according to the corresponding normal stage. Likewise, the molecular mechanisms involved in the pathogenesis of lymphomas and lymphoid leukemias are often based on the physiology of the lymphoid cells, capitalizing on deregulated normal physiology by harnessing the promoters of genes essential for lymphocyte function. The clinical manifestations of lymphomas likewise reflect the normal function of lymphoid cells in vivo. The multiparameter approach to classification adopted by the World Health Organization (WHO) classification has been validated in international studies as being highly reproducible, and enhancing the interpretation of clinical and translational studies. In addition, accurate and precise classification of disease entities facilitates the discovery of the molecular basis of lymphoid neoplasms in the basic science laboratory. PMID:19029456
Capacity building in anthelmintic drug discovery.

PubMed

Kron, Michael; Yousif, Fouad; Ramirez, Bernadette

2007-10-01

International collaboration in anthelmintic drug discovery holds special challenges compared with local or national discovery projects, and at the same time presents the opportunity to build capacity, forge long lasting inter-institutional relationships and strengthen infrastructure in multinational priority areas. This chapter discusses important issues that should be considered in the context of anthelmintic screening centre development and will give examples (Philippines and Egypt) of the productivity of developing country based screening centres. The positive outcomes of infrastructure building are realised in greater capacities for anthelmintic screening at institutions in the countries where the parasitic diseases are endemic and allow for optimum use of specialised resources for public health priority diseases that may be different from those in Western countries. Support for developing country based screening centres also can help countries optimise product development procedures and policies and can facilitate diffusion of desirable technology in corresponding global regions around the world.
cudaMap: a GPU accelerated program for gene expression connectivity mapping

PubMed Central

2013-01-01

Background Modern cancer research often involves large datasets and the use of sophisticated statistical techniques. Together these add a heavy computational load to the analysis, which is often coupled with issues surrounding data accessibility. Connectivity mapping is an advanced bioinformatic and computational technique dedicated to therapeutics discovery and drug re-purposing around differential gene expression analysis. On a normal desktop PC, it is common for the connectivity mapping task with a single gene signature to take > 2h to complete using sscMap, a popular Java application that runs on standard CPUs (Central Processing Units). Here, we describe new software, cudaMap, which has been implemented using CUDA C/C++ to harness the computational power of NVIDIA GPUs (Graphics Processing Units) to greatly reduce processing times for connectivity mapping. Results cudaMap can identify candidate therapeutics from the same signature in just over thirty seconds when using an NVIDIA Tesla C2050 GPU. Results from the analysis of multiple gene signatures, which would previously have taken several days, can now be obtained in as little as 10 minutes, greatly facilitating candidate therapeutics discovery with high throughput. We are able to demonstrate dramatic speed differentials between GPU assisted performance and CPU executions as the computational load increases for high accuracy evaluation of statistical significance. Conclusion Emerging ‘omics’ technologies are constantly increasing the volume of data and information to be processed in all areas of biomedical research. Embracing the multicore functionality of GPUs represents a major avenue of local accelerated computing. cudaMap will make a strong contribution in the discovery of candidate therapeutics by enabling speedy execution of heavy duty connectivity mapping tasks, which are increasingly required in modern cancer research. cudaMap is open source and can be freely downloaded from http
cudaMap: a GPU accelerated program for gene expression connectivity mapping.

PubMed

McArt, Darragh G; Bankhead, Peter; Dunne, Philip D; Salto-Tellez, Manuel; Hamilton, Peter; Zhang, Shu-Dong

2013-10-11

Modern cancer research often involves large datasets and the use of sophisticated statistical techniques. Together these add a heavy computational load to the analysis, which is often coupled with issues surrounding data accessibility. Connectivity mapping is an advanced bioinformatic and computational technique dedicated to therapeutics discovery and drug re-purposing around differential gene expression analysis. On a normal desktop PC, it is common for the connectivity mapping task with a single gene signature to take > 2h to complete using sscMap, a popular Java application that runs on standard CPUs (Central Processing Units). Here, we describe new software, cudaMap, which has been implemented using CUDA C/C++ to harness the computational power of NVIDIA GPUs (Graphics Processing Units) to greatly reduce processing times for connectivity mapping. cudaMap can identify candidate therapeutics from the same signature in just over thirty seconds when using an NVIDIA Tesla C2050 GPU. Results from the analysis of multiple gene signatures, which would previously have taken several days, can now be obtained in as little as 10 minutes, greatly facilitating candidate therapeutics discovery with high throughput. We are able to demonstrate dramatic speed differentials between GPU assisted performance and CPU executions as the computational load increases for high accuracy evaluation of statistical significance. Emerging 'omics' technologies are constantly increasing the volume of data and information to be processed in all areas of biomedical research. Embracing the multicore functionality of GPUs represents a major avenue of local accelerated computing. cudaMap will make a strong contribution in the discovery of candidate therapeutics by enabling speedy execution of heavy duty connectivity mapping tasks, which are increasingly required in modern cancer research. cudaMap is open source and can be freely downloaded from http://purl.oclc.org/NET/cudaMap.
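To make the computational load described above more concrete, here is a minimal Python sketch of a rank-based connection score with a permutation test for significance. It is an illustration under stated assumptions, not the cudaMap or sscMap implementation: the function names, the toy profile, and the simplified scoring scheme (signed ranks multiplied by a +1/-1 signature, normalised by the maximum attainable score) are all hypothetical. The point it shows is that each random signature is scored independently, which is the kind of workload that lends itself to many parallel GPU threads.

```python
import random


def signed_ranks(profile):
    """Map gene -> signed rank; the largest |value| gets the highest rank."""
    ordered = sorted(profile, key=lambda g: abs(profile[g]))
    return {g: (i + 1) * (1 if profile[g] > 0 else -1)
            for i, g in enumerate(ordered)}


def connection_score(ref_ranks, signature):
    """Score a signature (gene -> +1 or -1) against a reference profile.

    The raw score is divided by the largest value any signature of the
    same size could reach, so the result lies in [-1, 1].
    """
    raw = sum(ref_ranks[g] * s for g, s in signature.items())
    n, m = len(ref_ranks), len(signature)
    max_raw = sum(n - i for i in range(m))  # sum of the m largest ranks
    return raw / max_raw


def permutation_p_value(ref_ranks, signature, n_perm=1000, seed=0):
    """Estimate significance by scoring random signatures of equal size."""
    rng = random.Random(seed)
    observed = abs(connection_score(ref_ranks, signature))
    genes = sorted(ref_ranks)
    m = len(signature)
    hits = 0
    for _ in range(n_perm):
        # Each iteration is independent of the others (hence parallelisable).
        random_sig = {g: rng.choice((-1, 1)) for g in rng.sample(genes, m)}
        if abs(connection_score(ref_ranks, random_sig)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical reference profile (differential expression values).
profile = {"A": 2.0, "B": -1.5, "C": 0.5, "D": -0.1}
ranks = signed_ranks(profile)
score = connection_score(ranks, {"A": 1, "B": -1})
p_value = permutation_p_value(ranks, {"A": 1, "B": -1}, n_perm=200)
```

Raising `n_perm` tightens the p-value estimate at a linear cost in scoring work, which is one way to read the abstract's remark that the GPU advantage grows as the accuracy demanded of the significance evaluation increases.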
Dynamic Structure-Based Pharmacophore Model Development: A New and Effective Addition in the Histone Deacetylase 8 (HDAC8) Inhibitor Discovery

PubMed Central

Thangapandian, Sundarapandian; John, Shalini; Lee, Yuno; Kim, Songmi; Lee, Keun Woo

2011-01-01

Histone deacetylase 8 (HDAC8) is an enzyme involved in deacetylating the amino groups of terminal lysine residues, thereby repressing the transcription of various genes including tumor suppressor genes. The overexpression of HDAC8 was observed in many cancers and thus inhibition of this enzyme has emerged as an efficient cancer therapeutic strategy. In an effort to facilitate the future discovery of HDAC8 inhibitors, we developed two pharmacophore models containing six and five pharmacophoric features, respectively, using the representative structures from two molecular dynamics (MD) simulations performed in the Gromacs 4.0.5 package. Various analyses of trajectories obtained from MD simulations have displayed the changes upon inhibitor binding. Thus utilization of the dynamically-responded protein structures in pharmacophore development has the added advantage of considering the conformational flexibility of protein. The MD trajectories were clustered based on single-linkage method and representative structures were taken to be used in the pharmacophore model development. Active site complementing structure-based pharmacophore models were developed using Discovery Studio 2.5 program and validated using a dataset of known HDAC8 inhibitors. Virtual screening of a chemical database coupled with a drug-like filter identified drug-like hit compounds that match the pharmacophore models. Molecular docking of these hits reduced the false positives and identified two potential compounds to be used in future HDAC8 inhibitor design. PMID:22272142
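The clustering step mentioned above can be sketched as follows. This is a minimal illustration, assuming a precomputed pairwise distance matrix between trajectory frames (for example RMSD values) and a hypothetical cutoff. It is not the authors' Gromacs workflow. Single-linkage clustering at a fixed cutoff is equivalent to finding connected components of the graph that links any two frames closer than the cutoff. Picking the medoid as the representative structure is an assumption made here for illustration.

```python
def single_linkage_clusters(dist, cutoff):
    """Group frames so that any two frames with dist <= cutoff share a cluster.

    Implemented with a union-find structure; clusters are returned
    largest first, each as a list of frame indices in ascending order.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= cutoff:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values(), key=len, reverse=True)


def medoid(cluster, dist):
    """Return the frame with the smallest summed distance to its cluster."""
    return min(cluster, key=lambda i: sum(dist[i][j] for j in cluster))


# Hypothetical distances (nm) between five trajectory frames.
dist = [
    [0.0, 0.1, 0.2, 1.0, 1.0],
    [0.1, 0.0, 0.1, 1.0, 1.0],
    [0.2, 0.1, 0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 0.0, 0.1],
    [1.0, 1.0, 1.0, 0.1, 0.0],
]
clusters = single_linkage_clusters(dist, cutoff=0.15)
representatives = [medoid(c, dist) for c in clusters]
```

Frames 0 and 2 are farther apart than the cutoff but still end up together because both are close to frame 1. This chaining behaviour is characteristic of single linkage.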
Discovery of time-delayed gene regulatory networks based on temporal gene expression profiling

PubMed Central

Li, Xia; Rao, Shaoqi; Jiang, Wei; Li, Chuanxing; Xiao, Yun; Guo, Zheng; Zhang, Qingpu; Wang, Lihong; Du, Lei; Li, Jing; Li, Li; Zhang, Tianwen; Wang, Qing K

2006-01-01

Background It is one of the ultimate goals for modern biological research to fully elucidate the intricate interplays and the regulations of the molecular determinants that propel and characterize the progression of versatile life phenomena, to name a few, cell cycling, developmental biology, aging, and the progressive and recurrent pathogenesis of complex diseases. The vast amount of large-scale and genome-wide time-resolved data is becoming increasingly available, which provides the golden opportunity to unravel the challenging reverse-engineering problem of time-delayed gene regulatory networks. Results In particular, this methodological paper aims to reconstruct regulatory networks from temporal gene expression data by using delayed correlations between genes, i.e., pairwise overlaps of expression levels shifted in time relative to each other. We have thus developed a novel model-free computational toolbox termed TdGRN (Time-delayed Gene Regulatory Network) to address the underlying regulations of genes that can span any unit(s) of time intervals. This bioinformatics toolbox has provided a unified approach to uncovering time trends of gene regulations through decision analysis of the newly designed time-delayed gene expression matrix. We have applied the proposed method to yeast cell cycling and human HeLa cell cycling and have discovered most of the underlying time-delayed regulations that are supported by multiple lines of experimental evidence and that are remarkably consistent with the current knowledge on phase characteristics for the cell cyclings. Conclusion We established a usable and powerful model-free approach to dissecting high-order dynamic trends of gene-gene interactions. We have carefully validated the proposed algorithm by applying it to two publicly available cell cycling datasets. In addition to uncovering the time trends of gene regulations for cell cycling, this unified approach can also be used to study the complex gene regulations related to
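The idea of a delayed correlation, where one gene's expression is compared with another's after shifting it in time, can be illustrated with a short Python sketch. This is only a sketch of the general concept: it uses plain Pearson correlation on toy data, the function names are hypothetical, and it does not reproduce TdGRN's time-delayed expression matrix or its decision analysis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant series: correlation undefined, treated as 0
    return cov / sqrt(vx * vy)


def delayed_correlation(regulator, target, lag):
    """Correlate the regulator at time t with the target at time t + lag."""
    if lag < 0 or lag > len(regulator) - 2:
        raise ValueError("lag must leave at least two overlapping time points")
    if lag == 0:
        return pearson(regulator, target)
    return pearson(regulator[:-lag], target[lag:])


def best_lag(regulator, target, max_lag):
    """Return (lag, r) with the largest |r| among lags 0..max_lag."""
    scores = [(lag, delayed_correlation(regulator, target, lag))
              for lag in range(max_lag + 1)]
    return max(scores, key=lambda s: abs(s[1]))


# Toy series: the target repeats the regulator two time points later.
regulator = [1, 3, 2, 5, 4, 6, 2, 1]
target = [0, 0, 1, 3, 2, 5, 4, 6]
lag, r = best_lag(regulator, target, max_lag=3)
```

Each additional unit of lag shortens the overlapping window by one time point, so on short time courses the correlations at large lags rest on few observations and should be interpreted with care.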
Conducting On-orbit Gene Expression Analysis on ISS: WetLab-2

NASA Technical Reports Server (NTRS)

Parra, Macarena; Almeida, Eduardo; Boone, Travis; Jung, Jimmy; Lera, Matthew P.; Ricco, Antonio; Souza, Kenneth; Wu, Diana; Richey, C. Scott

2013-01-01

WetLab-2 will enable expanded genomic research on orbit by developing tools that support in situ sample collection, processing, and analysis on ISS. This capability will reduce the time-to-results for investigators and define new pathways for discovery on the ISS National Lab. The primary objective is to develop a research platform on ISS that will facilitate real-time quantitative gene expression analysis of biological samples collected on orbit. WetLab-2 will be capable of processing multiple sample types ranging from microbial cultures to animal tissues dissected on orbit. WetLab-2 will significantly expand the analytical capabilities onboard ISS and enhance science return from ISS.
Accelerating scientific discovery : 2007 annual report.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Beckman, P.; Dave, P.; Drugan, C.

2008-11-14

As a gateway for scientific discovery, the Argonne Leadership Computing Facility (ALCF) works hand in hand with the world's best computational scientists to advance research in a diverse span of scientific domains, ranging from chemistry, applied mathematics, and materials science to engineering physics and life sciences. Sponsored by the U.S. Department of Energy's (DOE) Office of Science, researchers are using the IBM Blue Gene/L supercomputer at the ALCF to study and explore key scientific problems that underlie important challenges facing our society. For instance, a research team at the University of California-San Diego/SDSC is studying the molecular basis of Parkinson's disease. The researchers plan to use the knowledge they gain to discover new drugs to treat the disease and to identify risk factors for other diseases that are equally prevalent. Likewise, scientists from Pratt & Whitney are using the Blue Gene to understand the complex processes within aircraft engines. Expanding our understanding of jet engine combustors is the secret to improved fuel efficiency and reduced emissions. Lessons learned from the scientific simulations of jet engine combustors have already led Pratt & Whitney to newer designs with unprecedented reductions in emissions, noise, and cost of ownership. ALCF staff members provide in-depth expertise and assistance to those using the Blue Gene/L and optimizing user applications. Both the Catalyst and Applications Performance Engineering and Data Analytics (APEDA) teams support the users' projects. In addition to working with scientists running experiments on the Blue Gene/L, we have become a nexus for the broader global community. In partnership with the Mathematics and Computer Science Division at Argonne National Laboratory, we have created an environment where the world's most challenging computational science problems can be addressed. Our expertise in high-end scientific computing enables us to provide guidance for
Leveraging Gene-Environment Interactions and Endotypes for Asthma Gene Discovery

PubMed Central

Bønnelykke, Klaus; Ober, Carole

2016-01-01

Asthma is a heterogeneous clinical syndrome that includes subtypes of disease with different underlying causes and disease mechanisms. Asthma is caused by a complex interaction between genes and environmental exposures; early-life exposures in particular play an important role. Asthma is also heritable, and a number of susceptibility variants have been discovered in genome-wide association studies, although the known risk alleles explain only a small proportion of the heritability. In this review, we present evidence supporting the hypothesis that focusing on more specific asthma phenotypes, such as childhood asthma with severe exacerbations, and on relevant exposures that are involved in gene-environment interactions (GEIs), such as rhinovirus infections, will improve detection of asthma genes and our understanding of the underlying mechanisms. We will discuss the challenges of considering GEIs and the advantages of studying responses to asthma-associated exposures in clinical birth cohorts, as well as in cell models of GEIs, to dissect the context-specific nature of genotypic risks, to prioritize variants in genome-wide association studies, and to identify pathways involved in pathogenesis in subgroups of patients. We propose that such approaches, in spite of their many challenges, present great opportunities for better understanding of asthma pathogenesis and heterogeneity and, ultimately, for improving prevention and treatment of disease. PMID:26947980
Computational Identification of the Paralogs and Orthologs of Human Cytochrome P450 Superfamily and the Implication in Drug Discovery

PubMed Central

Pan, Shu-Ting; Xue, Danfeng; Li, Zhi-Ling; Zhou, Zhi-Wei; He, Zhi-Xu; Yang, Yinxue; Yang, Tianxin; Qiu, Jia-Xuan; Zhou, Shu-Feng

2016-01-01

The human cytochrome P450 (CYP) superfamily consisting of 57 functional genes is the most important group of Phase I drug metabolizing enzymes that oxidize a large number of xenobiotics and endogenous compounds, including therapeutic drugs and environmental toxicants. The CYP superfamily has been shown to expand itself through gene duplication, and some of them become pseudogenes due to gene mutations. Orthologs and paralogs are homologous genes resulting from speciation or duplication, respectively. To explore the evolutionary and functional relationships of human CYPs, we conducted this bioinformatic study to identify their corresponding paralogs, homologs, and orthologs. The functional implications and implications in drug discovery and evolutionary biology were then discussed. GeneCards and Ensembl were used to identify the paralogs of human CYPs. We have used a panel of online databases to identify the orthologs of human CYP genes: NCBI, Ensembl Compara, GeneCards, OMA (“Orthologous MAtrix”) Browser, PANTHER, TreeFam, EggNOG, and Roundup. The results show that each human CYP has various numbers of paralogs and orthologs using GeneCards and Ensembl. For example, the paralogs of CYP2A6 include CYP2A7, 2A13, 2B6, 2C8, 2C9, 2C18, 2C19, 2D6, 2E1, 2F1, 2J2, 2R1, 2S1, 2U1, and 2W1; CYP11A1 has 6 paralogs including CYP11B1, 11B2, 24A1, 27A1, 27B1, and 27C1; CYP51A1 has only three paralogs: CYP26A1, 26B1, and 26C1; while CYP20A1 has no paralog. The majority of human CYPs are well conserved from plants, amphibians, fishes, or mammals to humans due to their important functions in physiology and xenobiotic disposition. The data from different approaches are also cross-validated and validated when experimental data are available. These findings facilitate our understanding of the evolutionary relationships and functional implications of the human CYP superfamily in drug discovery. PMID:27367670
Gene Discovery through Genomic Sequencing of Brucella abortus

PubMed Central

Sánchez, Daniel O.; Zandomeni, Ruben O.; Cravero, Silvio; Verdún, Ramiro E.; Pierrou, Ester; Faccio, Paula; Diaz, Gabriela; Lanzavecchia, Silvia; Agüero, Fernán; Frasch, Alberto C. C.; Andersson, Siv G. E.; Rossetti, Osvaldo L.; Grau, Oscar; Ugalde, Rodolfo A.

2001-01-01

Brucella abortus is the etiological agent of brucellosis, a disease that affects bovines and humans. We generated DNA random sequences from the genome of B. abortus strain 2308 in order to characterize molecular targets that might be useful for developing immunological or chemotherapeutic strategies against this pathogen. The partial sequencing of 1,899 clones allowed the identification of 1,199 genomic sequence surveys (GSSs) with high homology (BLAST expect value < 10⁻⁵) to sequences deposited in the GenBank databases. Among them, 925 represent putative novel genes for the Brucella genus. Out of 925 nonredundant GSSs, 470 were classified in 15 categories based on cellular function. Seven hundred GSSs showed no significant database matches and remain available for further studies in order to identify their function. A high number of GSSs with homology to Agrobacterium tumefaciens and Rhizobium meliloti proteins were observed, thus confirming their close phylogenetic relationship. Among them, several GSSs showed high similarity with genes related to nodule nitrogen fixation, synthesis of nod factors, nodulation protein symbiotic plasmid, and nodule bacteroid differentiation. We have also identified several B. abortus homologs of virulence and pathogenesis genes from other pathogens, including a homolog to both the Shda gene from Salmonella enterica serovar Typhimurium and the AidA-1 gene from Escherichia coli. Other GSSs displayed significant homologies to genes encoding components of the type III and type IV secretion machineries, suggesting that Brucella might also have an active type III secretion machinery. PMID:11159979

Top-K Interesting Subgraph Discovery in Information Networks

DTIC Science & Technology

2014-03-03

Integrative Biomarker Discovery for Breast Cancer Metastasis from Gene Expression and Protein Interaction Data Using Error-tolerant Pattern Mining” at...Jiawei Han¶ ∗Microsoft, India . Email: gmanish@microsoft.com †State University of New York at Buffalo. Email: jing@buffalo.edu ‡University of California
Genomic resources for gene discovery, functional genome annotation, and evolutionary studies of maize and its close relatives.

PubMed

Wang, Chao; Shi, Xue; Liu, Lin; Li, Haiyan; Ammiraju, Jetty S S; Kudrna, David A; Xiong, Wentao; Wang, Hao; Dai, Zhaozhao; Zheng, Yonglian; Lai, Jinsheng; Jin, Weiwei; Messing, Joachim; Bennetzen, Jeffrey L; Wing, Rod A; Luo, Meizhong

2013-11-01

Maize is one of the most important food crops and a key model for genetics and developmental biology. A genetically anchored and high-quality draft genome sequence of maize inbred B73 has been obtained to serve as a reference sequence. To facilitate evolutionary studies in maize and its close relatives, much like the Oryza Map Alignment Project (OMAP) (www.OMAP.org) bacterial artificial chromosome (BAC) resource did for the rice community, we constructed BAC libraries for maize inbred lines Zheng58, Chang7-2, and Mo17 and maize wild relatives Zea mays ssp. parviglumis and Tripsacum dactyloides. Furthermore, to extend functional genomic studies to maize and sorghum, we also constructed binary BAC (BIBAC) libraries for the maize inbred B73 and the sorghum landrace Nengsi-1. The BAC/BIBAC vectors facilitate transfer of large intact DNA inserts from BAC clones to the BIBAC vector and functional complementation of large DNA fragments. These seven Zea Map Alignment Project (ZMAP) BAC/BIBAC libraries have average insert sizes ranging from 92 to 148 kb, organellar DNA from 0.17 to 2.3%, empty vector rates between 0.35 and 5.56%, and genome equivalents of 4.7- to 8.4-fold. The usefulness of the Parviglumis and Tripsacum BAC libraries was demonstrated by mapping clones to the reference genome. Novel genes and alleles present in these ZMAP libraries can now be used for functional complementation studies and positional or homology-based cloning of genes for translational genomics.
Cracking the regulatory code of biosynthetic gene clusters as a strategy for natural product discovery.

PubMed

Rigali, Sébastien; Anderssen, Sinaeda; Naômé, Aymeric; van Wezel, Gilles P

2018-01-05

The World Health Organization (WHO) describes antibiotic resistance as "one of the biggest threats to global health, food security, and development today", as the number of multi- and pan-resistant bacteria is rising dangerously. Acquired resistance phenomena also impair antifungals, antivirals, and anti-cancer drug therapy, while herbicide resistance in weeds threatens the crop industry. On the positive side, it is likely that the chemical space of natural products goes far beyond what has currently been discovered. This idea is fueled by genome sequencing of microorganisms which unveiled numerous so-called cryptic biosynthetic gene clusters (BGCs), many of which are transcriptionally silent under laboratory culture conditions, and by the fact that most bacteria cannot yet be cultivated in the laboratory. However, brute force antibiotic discovery does not yield the same results as it did in the past, and researchers have had to develop creative strategies in order to unravel the hidden potential of microorganisms such as Streptomyces and other antibiotic-producing microorganisms. Identifying the cis elements and their corresponding transcription factor(s) involved in the control of BGCs through bioinformatic approaches is a promising strategy. Theoretically, we are a few 'clicks' away from unveiling the culturing conditions or genetic changes needed to activate the production of cryptic metabolites or increase the production yield of known compounds to make them economically viable. In this opinion article, we describe and illustrate the idea behind 'cracking' the regulatory code for natural product discovery, by presenting a series of proofs of concept, and discuss what still should be achieved to increase the rate of success of this strategy. Copyright © 2018 Elsevier Inc. All rights reserved.
Comprehensive annotation of secondary metabolite biosynthetic genes and gene clusters of Aspergillus nidulans, A. fumigatus, A. niger and A. oryzae

PubMed Central

2013-01-01

Background Secondary metabolite production, a hallmark of filamentous fungi, is an expanding area of research for the Aspergilli. These compounds are potent chemicals, ranging from deadly toxins to therapeutic antibiotics to potential anti-cancer drugs. The genome sequences for multiple Aspergilli have been determined, and provide a wealth of predictive information about secondary metabolite production. Sequence analysis and gene overexpression strategies have enabled the discovery of novel secondary metabolites and the genes involved in their biosynthesis. The Aspergillus Genome Database (AspGD) provides a central repository for gene annotation and protein information for Aspergillus species. These annotations include Gene Ontology (GO) terms, phenotype data, gene names and descriptions and they are crucial for interpreting both small- and large-scale data and for aiding in the design of new experiments that further Aspergillus research. Results We have manually curated Biological Process GO annotations for all genes in AspGD with recorded functions in secondary metabolite production, adding new GO terms that specifically describe each secondary metabolite. We then leveraged these new annotations to predict roles in secondary metabolism for genes lacking experimental characterization. As a starting point for manually annotating Aspergillus secondary metabolite gene clusters, we used antiSMASH (antibiotics and Secondary Metabolite Analysis SHell) and SMURF (Secondary Metabolite Unknown Regions Finder) algorithms to identify potential clusters in A. nidulans, A. fumigatus, A. niger and A. oryzae, which we subsequently refined through manual curation. Conclusions This set of 266 manually curated secondary metabolite gene clusters will facilitate the investigation of novel Aspergillus secondary metabolites. PMID:23617571
Using the iPlant collaborative discovery environment.

PubMed

Oliver, Shannon L; Lenards, Andrew J; Barthelson, Roger A; Merchant, Nirav; McKay, Sheldon J

2013-06-01

The iPlant Collaborative is an academic consortium whose mission is to develop an informatics and social infrastructure to address the "grand challenges" in plant biology. Its cyberinfrastructure supports the computational needs of the research community and facilitates solving major challenges in plant science. The Discovery Environment provides a powerful and rich graphical interface to the iPlant Collaborative cyberinfrastructure by creating an accessible virtual workbench that enables all levels of expertise, ranging from students to traditional biology researchers and computational experts, to explore, analyze, and share their data. By providing access to iPlant's robust data-management system and high-performance computing resources, the Discovery Environment also creates a unified space in which researchers can access scalable tools. Researchers can use available Applications (Apps) to execute analyses on their data, as well as customize or integrate their own tools to better meet the specific needs of their research. These Apps can also be used in workflows that automate more complicated analyses. This module describes how to use the main features of the Discovery Environment, using bioinformatics workflows for high-throughput sequence data as examples. © 2013 by John Wiley & Sons, Inc.
Using the TIGR gene index databases for biological discovery.

PubMed

Lee, Yuandan; Quackenbush, John

2003-11-01

The TIGR Gene Index web pages provide access to analyses of ESTs and gene sequences for nearly 60 species, as well as a number of resources derived from these. Each species-specific database is presented using a common format with a homepage. A variety of methods exist that allow users to search each species-specific database. Methods implemented currently include nucleotide or protein sequence queries using WU-BLAST, text-based searches using various sequence identifiers, searches by gene, tissue and library name, and searches using functional classes through Gene Ontology assignments. This protocol provides guidance for using the Gene Index Databases to extract information.
BioTextQuest(+): a knowledge integration platform for literature mining and concept discovery.

PubMed

Papanikolaou, Nikolas; Pavlopoulos, Georgios A; Pafilis, Evangelos; Theodosiou, Theodosios; Schneider, Reinhard; Satagopam, Venkata P; Ouzounis, Christos A; Eliopoulos, Aristides G; Promponas, Vasilis J; Iliopoulos, Ioannis

2014-11-15

The iterative process of finding relevant information in biomedical literature and performing bioinformatics analyses might result in an endless loop for an inexperienced user, considering the exponential growth of scientific corpora and the plethora of tools designed to mine PubMed® and related biological databases. Herein, we describe BioTextQuest(+), a web-based interactive knowledge exploration platform with significant advances over its predecessor (BioTextQuest), aiming to bridge processes such as bioentity recognition, functional annotation, document clustering and data integration towards literature mining and concept discovery. BioTextQuest(+) enables PubMed and OMIM querying, retrieval of abstracts related to a targeted request and optimal detection of genes, proteins, molecular functions, pathways and biological processes within the retrieved documents. The front-end interface facilitates the browsing of document clustering per subject, the analysis of term co-occurrence, the generation of tag clouds containing highly represented terms per cluster and at-a-glance popup windows with information about relevant genes and proteins. Moreover, to support experimental research, BioTextQuest(+) addresses integration of its primary functionality with biological repositories and software tools able to deliver further bioinformatics services. The Google-like interface extends beyond simple use by offering a range of advanced parameterization for expert users. We demonstrate the functionality of BioTextQuest(+) through several exemplary research scenarios including author disambiguation, functional term enrichment, knowledge acquisition and concept discovery linking major human diseases, such as obesity and ageing. The service is accessible at http://bioinformatics.med.uoc.gr/biotextquest. Contact: g.pavlopoulos@gmail.com or georgios.pavlopoulos@esat.kuleuven.be. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University
SNP discovery in the bovine milk transcriptome using RNA-Seq technology.

PubMed

Cánovas, Angela; Rincon, Gonzalo; Islas-Trejo, Alma; Wickramasinghe, Saumya; Medrano, Juan F

2010-12-01

High-throughput sequencing of RNA (RNA-Seq) was developed primarily to analyze global gene expression in different tissues. However, it also is an efficient way to discover coding SNPs. The objective of this study was to perform a SNP discovery analysis in the milk transcriptome using RNA-Seq. Seven milk samples from Holstein cows were analyzed by sequencing cDNAs using the Illumina Genome Analyzer system. We detected 19,175 genes expressed in milk samples corresponding to approximately 70% of the total number of genes analyzed. The SNP detection analysis revealed 100,734 SNPs in Holstein samples, and a large number of those corresponded to differences between the Holstein breed and the Hereford bovine genome assembly Btau4.0. The number of polymorphic SNPs within Holstein cows was 33,045. The accuracy of RNA-Seq SNP discovery was tested by comparing SNPs detected in a set of 42 candidate genes expressed in milk that had been resequenced earlier using Sanger sequencing technology. Seventy of 86 SNPs were detected using both RNA-Seq and Sanger sequencing technologies. The KASPar Genotyping System was used to validate unique SNPs found by RNA-Seq but not observed by Sanger technology. Our results confirm that analyzing the transcriptome using RNA-Seq technology is an efficient and cost-effective method to identify SNPs in transcribed regions. This study creates guidelines to maximize the accuracy of SNP discovery and prevention of false-positive SNP detection, and provides more than 33,000 SNPs located in coding regions of genes expressed during lactation that can be used to develop genotyping platforms to perform marker-trait association studies in Holstein cattle.
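The validation figures quoted above can be turned into simple proportions. The short Python sketch below only restates numbers given in the abstract. The variable names are illustrative, and treating the 86 candidate-gene SNPs as the denominator for concordance is an interpretation of the abstract's wording.

```python
# Counts taken from the abstract.
snps_detected_total = 100_734   # SNPs found in the Holstein samples
snps_within_holstein = 33_045   # SNPs polymorphic within the Holstein cows
snps_compared = 86              # SNPs in the 42 resequenced candidate genes
snps_both_methods = 70          # detected by both RNA-Seq and Sanger

# Share of candidate-gene SNPs detected by both technologies.
concordance = snps_both_methods / snps_compared

# Share of all detected SNPs that vary within the Holstein breed itself.
within_breed_share = snps_within_holstein / snps_detected_total

print(f"Concordance: {concordance:.1%}")
print(f"Polymorphic within Holstein: {within_breed_share:.1%}")
```

On these counts, roughly four in five of the compared SNPs were found by both methods, and about one third of all detected SNPs are polymorphic among the Holstein cows rather than reflecting differences from the Hereford reference assembly.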
A platform for rapid prototyping of synthetic gene networks in mammalian cells

PubMed Central

Duportet, Xavier; Wroblewska, Liliana; Guye, Patrick; Li, Yinqing; Eyquem, Justin; Rieders, Julianne; Rimchala, Tharathorn; Batt, Gregory; Weiss, Ron

2014-01-01

Mammalian synthetic biology may provide novel therapeutic strategies, help decipher new paths for drug discovery and facilitate synthesis of valuable molecules. Yet, our capacity to genetically program cells is currently hampered by the lack of efficient approaches to streamline the design, construction and screening of synthetic gene networks. To address this problem, here we present a framework for modular and combinatorial assembly of functional (multi)gene expression vectors and their efficient and specific targeted integration into a well-defined chromosomal context in mammalian cells. We demonstrate the potential of this framework by assembling and integrating different functional mammalian regulatory networks including the largest gene circuit built and chromosomally integrated to date (6 transcription units, 27kb) encoding an inducible memory device. Using a library of 18 different circuits as a proof of concept, we also demonstrate that our method enables one-pot/single-flask chromosomal integration and screening of circuit libraries. This rapid and powerful prototyping platform is well suited for comparative studies of genetic regulatory elements, genes and multi-gene circuits as well as facile development of libraries of isogenic engineered cell lines. PMID:25378321
Drug discovery strategies to outer membrane targets in Gram-negative pathogens.

PubMed

Brown, Dean G

2016-12-15

This review will cover selected recent examples of drug discovery strategies which target the outer membrane (OM) of Gram-negative bacteria either by disruption of outer membrane function or by inhibition of essential gene products necessary for outer membrane assembly. Significant advances in pathway elucidation, structural biology and molecular inhibitor designs have created new opportunities for drug discovery within this target-class space. Copyright © 2016 Elsevier Ltd. All rights reserved.
An analysis of gene expression in PTSD implicates genes involved in the glucocorticoid receptor pathway and neural responses to stress

PubMed Central

Logue, Mark W.; Smith, Alicia K.; Baldwin, Clinton; Wolf, Erika J.; Guffanti, Guia; Ratanatharathorn, Andrew; Stone, Annjanette; Schichman, Steven A.; Humphries, Donald; Binder, Elisabeth B.; Arloth, Janine; Menke, Andreas; Uddin, Monica; Wildman, Derek; Galea, Sandro; Aiello, Allison E.; Koenen, Karestan C.; Miller, Mark W.

2015-01-01

We examined the association between posttraumatic stress disorder (PTSD) and gene expression using whole blood samples from a cohort of trauma-exposed white non-Hispanic male veterans (115 cases and 28 controls). 10,264 probes of genes and gene transcripts were analyzed. We found 41 that were differentially expressed in PTSD cases versus controls (multiple-testing corrected p<0.05). The most significant was DSCAM, a neurological gene expressed widely in the developing brain and in the amygdala and hippocampus of the adult brain. We then examined the 41 differentially expressed genes in a meta-analysis using two replication cohorts and found significant associations with PTSD for 7 of the 41 (p<0.05), one of which (ATP6AP1L) survived multiple-testing correction. There was also broad evidence of overlap across the discovery and replication samples for the entire set of genes implicated in the discovery data based on the direction of effect and an enrichment of p<0.05 significant probes beyond what would be expected under the null. Finally, we found that the set of differentially expressed genes from the discovery sample was enriched for genes responsive to glucocorticoid signaling with most showing reduced expression in PTSD cases compared to controls. PMID:25867994
Knock down of Whitefly Gut Gene Expression and Mortality by Orally Delivered Gut Gene-Specific dsRNAs.

PubMed

Vyas, Meenal; Raza, Amir; Ali, Muhammad Yousaf; Ashraf, Muhammad Aleem; Mansoor, Shahid; Shahid, Ahmad Ali; Brown, Judith K

2017-01-01

Control of the whitefly Bemisia tabaci (Genn.) agricultural pest and plant virus vector relies on the use of chemical insecticides. RNA-interference (RNAi) is a homology-dependent innate immune response in eukaryotes, including insects, which results in degradation of the corresponding transcript following its recognition by a double-stranded RNA (dsRNA) that shares 100% sequence homology. In this study, six whitefly 'gut' genes were selected from an in silico-annotated transcriptome library constructed from the whitefly alimentary canal or 'gut' of the B biotype of B. tabaci, and tested for knock down efficacy, post-ingestion of dsRNAs that share 100% sequence homology to each respective gene target. Candidate genes were: Acetylcholine receptor subunit α, Alpha glucosidase 1, Aquaporin 1, Heat shock protein 70, Trehalase1, and Trehalose transporter1. The efficacy of RNAi knock down was further tested in a gene-specific functional bioassay, and mortality was recorded in 24 hr intervals, six days, post-treatment. Based on qPCR analysis, all six genes tested showed significantly reduced gene expression. Moderate-to-high whitefly mortality was associated with the down-regulation of osmoregulation, sugar metabolism and sugar transport-associated genes, demonstrating that whitefly survivability was linked with RNAi results. Silenced Acetylcholine receptor subunit α and Heat shock protein 70 genes showed an initial low whitefly mortality, however, following insecticide or high temperature treatments, respectively, significantly increased knockdown efficacy and death was observed, indicating enhanced post-knockdown sensitivity perhaps related to systemic silencing. The oral delivery of gut-specific dsRNAs, when combined with qPCR analysis of gene expression and a corresponding gene-specific bioassay that relates knockdown and mortality, offers a viable approach for functional genomics analysis and the discovery of prospective dsRNA biopesticide targets. The approach can
Horizontal acquisition of multiple mitochondrial genes from a parasitic plant followed by gene conversion with host mitochondrial genes

PubMed Central

2010-01-01

Background: Horizontal gene transfer (HGT) is relatively common in plant mitochondrial genomes but the mechanisms, extent and consequences of transfer remain largely unknown. Previous results indicate that parasitic plants are often involved as either transfer donors or recipients, suggesting that direct contact between parasite and host facilitates genetic transfer among plants. Results: In order to uncover the mechanistic details of plant-to-plant HGT, the extent and evolutionary fate of transfer was investigated between two groups: the parasitic genus Cuscuta and a small clade of Plantago species. A broad polymerase chain reaction (PCR) survey of mitochondrial genes revealed that at least three genes (atp1, atp6 and matR) were recently transferred from Cuscuta to Plantago. Quantitative PCR assays show that these three genes have a mitochondrial location in the one species line of Plantago examined. Patterns of sequence evolution suggest that these foreign genes degraded into pseudogenes shortly after transfer and reverse transcription (RT)-PCR analyses demonstrate that none are detectably transcribed. Three cases of gene conversion were detected between native and foreign copies of the atp1 gene. The identical phylogenetic distribution of the three foreign genes within Plantago and the retention of cytidines at ancestral positions of RNA editing indicate that these genes were probably acquired via a single, DNA-mediated transfer event. However, samplings of multiple individuals from two of the three species in the recipient Plantago clade revealed complex and perplexing phylogenetic discrepancies and patterns of sequence divergence for all three of the foreign genes. Conclusions: This study reports the best evidence to date that multiple mitochondrial genes can be transferred via a single HGT event and that transfer occurred via a strictly DNA-level intermediate. The discovery of gene conversion between co-resident foreign and native mitochondrial copies suggests
Sea Level Rise Data Discovery

NASA Astrophysics Data System (ADS)

Quach, N.; Huang, T.; Boening, C.; Gill, K. M.

2016-12-01

Research related to sea level rise crosses multiple disciplines from sea ice to land hydrology. The NASA Sea Level Change Portal (SLCP) is a one-stop source for current sea level change information and data, including interactive tools for accessing and viewing regional data, a virtual dashboard of sea level indicators, and ongoing updates through a suite of editorial products that include content articles, graphics, videos, and animations. The architecture behind the SLCP makes it possible to integrate web content and data relevant to sea level change that are archived across various data centers as well as new data generated by sea level change principal investigators. The Extensible Data Gateway Environment (EDGE) is incorporated into the SLCP architecture to provide a unified platform for web content and science data discovery. EDGE is a data integration platform designed to facilitate high-performance geospatial data discovery and access with the ability to support multi-metadata standard specifications. EDGE has the capability to retrieve data from one or more sources and package the resulting sets into a single response to the requestor. With this unified endpoint, the Data Analysis Tool that is available on the SLCP can retrieve dataset and granule level metadata as well as perform geospatial search on the data. This talk focuses on the architecture that makes it possible to seamlessly integrate and enable discovery of disparate data relevant to sea level rise.
Transcriptome Analysis and Discovery of Genes Involved in Immune Pathways from Hepatopancreas of Microbial Challenged Mitten Crab Eriocheir sinensis

PubMed Central

Li, Xihong; Cui, Zhaoxia; Liu, Yuan; Song, Chengwen; Shi, Guohui

2013-01-01

Background: The Chinese mitten crab Eriocheir sinensis is an economically important crustacean that has been seriously affected by various diseases, creating a growing need for genome-scale information on immune-relevant genes. Recently, high-throughput RNA sequencing (RNA-seq) technology has provided a powerful and efficient method for transcript analysis and immune gene discovery. Methods/Principal Findings: A cDNA library from the hepatopancreas of E. sinensis challenged by a mixture of three pathogen strains (the Gram-positive bacterium Micrococcus luteus, the Gram-negative bacterium Vibrio alginolyticus and the fungus Pichia pastoris; 10⁸ cfu·mL⁻¹) was constructed and randomly sequenced using the Illumina technique. In total, 39.76 million clean reads were assembled into 70,300 unigenes. After ruling out short-length and low-quality sequences, 52,074 non-redundant unigenes were compared to public databases for homology searching and 17,617 of them showed high similarity to sequences in the NCBI non-redundant protein (Nr) database. For function classification and pathway assignment, 18,734 (36.00%) unigenes were categorized into three Gene Ontology (GO) categories, 12,243 (23.51%) were classified into 25 Clusters of Orthologous Groups (COG), and 8,983 (17.25%) were assigned to six Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Potentially, 24, 14, 47 and 132 unigenes were characterized as being involved in the Toll, IMD, JAK-STAT and MAPK pathways, respectively. Conclusions/Significance: This is the first systematic transcriptome analysis of components relating to innate immune pathways in E. sinensis. Functional genes and putative pathways identified here will contribute to a better understanding of the immune system and to the prevention of various diseases in crab. PMID:23874555
A new approach to the rationale discovery of polymeric biomaterials

PubMed Central

Kohn, Joachim; Welsh, William J.; Knight, Doyle

2007-01-01

This paper attempts to illustrate both the need for new approaches to biomaterials discovery as well as the significant promise inherent in the use of combinatorial and computational design strategies. The key observation of this Leading Opinion Paper is that the biomaterials community has been slow to embrace advanced biomaterials discovery tools such as combinatorial methods, high throughput experimentation, and computational modeling in spite of the significant promise shown by these discovery tools in materials science, medicinal chemistry and the pharmaceutical industry. It seems that the complexity of living cells and their interactions with biomaterials has been a conceptual as well as a practical barrier to the use of advanced discovery tools in biomaterials science. However, with the continued increase in computer power, the goal of predicting the biological response of cells in contact with biomaterials surfaces is within reach. Once combinatorial synthesis, high throughput experimentation, and computational modeling are integrated into the biomaterials discovery process, a significant acceleration is possible in the pace of development of improved medical implants, tissue regeneration scaffolds, and gene/drug delivery systems. PMID:17644176
Genome-wide characterization of GRAS family genes in Medicago truncatula reveals their evolutionary dynamics and functional diversification

PubMed Central

Zhang, Hailing; Cao, Yingping; Shang, Chen; Li, Jikai; Wang, Jianli; Wu, Zhenying; Ma, Lichao; Qi, Tianxiong; Fu, Chunxiang; Hu, Baozhong

2017-01-01

The GRAS gene family is a large plant-specific family of transcription factors that are involved in diverse processes during plant development. Medicago truncatula is an ideal model plant for genetic research in legumes, and specifically for studying nodulation, which is crucial for nitrogen fixation. In this study, 59 MtGRAS genes were identified and classified into eight distinct subgroups based on phylogenetic relationships. Motifs located in the C-termini were conserved across the subgroups, while motifs in the N-termini were subfamily specific. Gene duplication was the main evolutionary force for MtGRAS expansion, especially proliferation of the LISCL subgroup. Seventeen duplicated genes showed strong effects of purifying selection and diverse expression patterns, highlighting their functional importance and diversification after duplication. Thirty MtGRAS genes, including NSP1 and NSP2, were preferentially expressed in nodules, indicating possible roles in the process of nodulation. A transcriptome study, combined with gene expression analysis under different stress conditions, suggested potential functions of MtGRAS genes in various biological pathways and stress responses. Taken together, these comprehensive analyses provide basic information for understanding the potential functions of GRAS genes, and will facilitate further discovery of MtGRAS gene functions. PMID:28945786
Discovery of four recessive developmental disorders using probabilistic genotype and phenotype matching among 4,125 families

PubMed Central

Ansari, Morad; Balasubramanian, Meena; Blyth, Moira; Brady, Angela F.; Clayton, Stephen; Cole, Trevor; Deshpande, Charu; Fitzgerald, Tomas W.; Foulds, Nicola; Francis, Richard; Gabriel, George; Gerety, Sebastian S.; Goodship, Judith; Hobson, Emma; Jones, Wendy D.; Joss, Shelagh; King, Daniel; Klena, Nikolai; Kumar, Ajith; Lees, Melissa; Lelliott, Chris; Lord, Jenny; McMullan, Dominic; O'Regan, Mary; Osio, Deborah; Piombo, Virginia; Prigmore, Elena; Rajan, Diana; Rosser, Elisabeth; Sifrim, Alejandro; Smith, Audrey; Swaminathan, Ganesh J.; Turnpenny, Peter; Whitworth, James; Wright, Caroline F.; Firth, Helen V.; Barrett, Jeffrey C.; Lo, Cecilia W.; FitzPatrick, David R.; Hurles, Matthew E.

2018-01-01

Discovery of most autosomal recessive disease genes has involved analysis of large, often consanguineous, multiplex families or small cohorts of unrelated individuals with a well-defined clinical condition. Discovery of novel dominant causes of rare, genetically heterogeneous developmental disorders has been revolutionized by exome analysis of large cohorts of phenotypically diverse parent-offspring trios. Here we analysed 4,125 families with diverse, rare, genetically heterogeneous developmental disorders and identified four novel autosomal recessive disorders. These four disorders were identified by integrating Mendelian filtering (identifying probands with rare biallelic putatively damaging variants in the same gene) with statistical assessments of (i) the likelihood of sampling the observed genotypes from the general population, and (ii) the phenotypic similarity of patients with the same recessive candidate gene. This new paradigm promises to catalyse discovery of novel recessive disorders, especially those with less consistent or nonspecific clinical presentations, and those caused predominantly by compound heterozygous genotypes. PMID:26437029
Discovery of four recessive developmental disorders using probabilistic genotype and phenotype matching among 4,125 families.

PubMed

Akawi, Nadia; McRae, Jeremy; Ansari, Morad; Balasubramanian, Meena; Blyth, Moira; Brady, Angela F; Clayton, Stephen; Cole, Trevor; Deshpande, Charu; Fitzgerald, Tomas W; Foulds, Nicola; Francis, Richard; Gabriel, George; Gerety, Sebastian S; Goodship, Judith; Hobson, Emma; Jones, Wendy D; Joss, Shelagh; King, Daniel; Klena, Nikolai; Kumar, Ajith; Lees, Melissa; Lelliott, Chris; Lord, Jenny; McMullan, Dominic; O'Regan, Mary; Osio, Deborah; Piombo, Virginia; Prigmore, Elena; Rajan, Diana; Rosser, Elisabeth; Sifrim, Alejandro; Smith, Audrey; Swaminathan, Ganesh J; Turnpenny, Peter; Whitworth, James; Wright, Caroline F; Firth, Helen V; Barrett, Jeffrey C; Lo, Cecilia W; FitzPatrick, David R; Hurles, Matthew E

2015-11-01

Discovery of most autosomal recessive disease-associated genes has involved analysis of large, often consanguineous multiplex families or small cohorts of unrelated individuals with a well-defined clinical condition. Discovery of new dominant causes of rare, genetically heterogeneous developmental disorders has been revolutionized by exome analysis of large cohorts of phenotypically diverse parent-offspring trios. Here we analyzed 4,125 families with diverse, rare and genetically heterogeneous developmental disorders and identified four new autosomal recessive disorders. These four disorders were identified by integrating Mendelian filtering (selecting probands with rare, biallelic and putatively damaging variants in the same gene) with statistical assessments of (i) the likelihood of sampling the observed genotypes from the general population and (ii) the phenotypic similarity of patients with recessive variants in the same candidate gene. This new paradigm promises to catalyze the discovery of novel recessive disorders, especially those with less consistent or nonspecific clinical presentations and those caused predominantly by compound heterozygous genotypes.
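The Mendelian filtering step named in this abstract (probands with rare, biallelic, putatively damaging variants in the same gene) can be illustrated with a small filter. This is a sketch under stated assumptions, not the study's method. The record fields, the 1% frequency cutoff and the gene and proband names are all hypothetical. Two heterozygous hits in one gene are counted as biallelic without checking phase. The statistical assessments (i) and (ii) are not implemented.

```python
from collections import defaultdict

MAX_ALLELE_FREQ = 0.01  # hypothetical rarity cutoff, not taken from the abstract


def biallelic_candidates(variants, min_probands=2):
    """Map each gene to the probands carrying two rare, damaging alleles in it."""
    allele_count = defaultdict(int)  # (proband, gene) -> number of qualifying alleles
    for v in variants:
        if v["allele_freq"] <= MAX_ALLELE_FREQ and v["damaging"]:
            # A homozygous variant contributes two alleles, a heterozygous one a single allele.
            allele_count[(v["proband"], v["gene"])] += 2 if v["genotype"] == "hom" else 1

    probands_by_gene = defaultdict(set)
    for (proband, gene), n in allele_count.items():
        if n >= 2:  # homozygous, or assumed compound heterozygous (phase not verified)
            probands_by_gene[gene].add(proband)

    return {g: sorted(p) for g, p in probands_by_gene.items() if len(p) >= min_probands}


# Invented example records.
variants = [
    {"proband": "P1", "gene": "GENE_A", "allele_freq": 0.0001, "damaging": True, "genotype": "hom"},
    {"proband": "P2", "gene": "GENE_A", "allele_freq": 0.0020, "damaging": True, "genotype": "het"},
    {"proband": "P2", "gene": "GENE_A", "allele_freq": 0.0005, "damaging": True, "genotype": "het"},
    {"proband": "P3", "gene": "GENE_A", "allele_freq": 0.0010, "damaging": True, "genotype": "het"},
    {"proband": "P4", "gene": "GENE_B", "allele_freq": 0.2000, "damaging": True, "genotype": "hom"},
    {"proband": "P5", "gene": "GENE_B", "allele_freq": 0.0010, "damaging": True, "genotype": "hom"},
]
candidates = biallelic_candidates(variants)
print(candidates)
```

In this toy input only `GENE_A` survives: P3 carries a single allele, P4's variant is too common, and P5 is the sole qualifying proband for `GENE_B`.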
Advanced biological and chemical discovery (ABCD): centralizing discovery knowledge in an inherently decentralized world.

PubMed

Agrafiotis, Dimitris K; Alex, Simson; Dai, Heng; Derkinderen, An; Farnum, Michael; Gates, Peter; Izrailev, Sergei; Jaeger, Edward P; Konstant, Paul; Leung, Albert; Lobanov, Victor S; Marichal, Patrick; Martin, Douglas; Rassokhin, Dmitrii N; Shemanarev, Maxim; Skalkin, Andrew; Stong, John; Tabruyn, Tom; Vermeiren, Marleen; Wan, Jackson; Xu, Xiang Yang; Yao, Xiang

2007-01-01

We present ABCD, an integrated drug discovery informatics platform developed at Johnson & Johnson Pharmaceutical Research & Development, L.L.C. ABCD is an attempt to bridge multiple continents, data systems, and cultures using modern information technology and to provide scientists with tools that allow them to analyze multifactorial SAR and make informed, data-driven decisions. The system consists of three major components: (1) a data warehouse, which combines data from multiple chemical and pharmacological transactional databases, designed for supreme query performance; (2) a state-of-the-art application suite, which facilitates data upload, retrieval, mining, and reporting, and (3) a workspace, which facilitates collaboration and data sharing by allowing users to share queries, templates, results, and reports across project teams, campuses, and other organizational units. Chemical intelligence, performance, and analytical sophistication lie at the heart of the new system, which was developed entirely in-house. ABCD is used routinely by more than 1000 scientists around the world and is rapidly expanding into other functional areas within the J&J organization.

Discovery and development of new antibacterial drugs: learning from experience?

PubMed

Jackson, Nicole; Czaplewski, Lloyd; Piddock, Laura J V

2018-06-01

Antibiotic (antibacterial) resistance is a serious global problem and the need for new treatments is urgent. The current antibiotic discovery model is not delivering new agents at a rate that is sufficient to combat present levels of antibiotic resistance. This has led to fears of the arrival of a 'post-antibiotic era'. Scientific difficulties, an unfavourable regulatory climate, multiple company mergers and the low financial returns associated with antibiotic drug development have led to the withdrawal of many pharmaceutical companies from the field. The regulatory climate has now begun to improve, but major scientific hurdles still impede the discovery and development of novel antibacterial agents. To facilitate discovery activities there must be increased understanding of the scientific problems experienced by pharmaceutical companies. This must be coupled with addressing the current antibiotic resistance crisis so that compounds and ultimately drugs are delivered to treat the most urgent clinical challenges. By understanding the causes of the failures and successes of the pharmaceutical industry's research history, duplication of discovery programmes will be reduced, increasing the productivity of the antibiotic drug discovery pipeline by academia and small companies. The most important scientific issues to address are getting molecules into the Gram-negative bacterial cell and avoiding their efflux. Hence screening programmes should focus their efforts on whole bacterial cells rather than cell-free systems. Despite falling out of favour with pharmaceutical companies, natural product research still holds promise for providing new molecules as a basis for discovery.
Using telephony data to facilitate discovery of clinical workflows.

PubMed

Rucker, Donald W

2017-04-19

Discovery of clinical workflows to target for redesign using methods such as Lean and Six Sigma is difficult. VoIP telephone call pattern analysis may complement direct observation and EMR-based tools in understanding clinical workflows at the enterprise level by allowing visualization of institutional telecommunications activity. The objective was to build an analytic framework that maps repetitive and high-volume telephone call patterns in a large medical center to their associated clinical units, using an enterprise unified communications server log file, and to support visualization of specific call patterns using graphical networks. Consecutive call detail records from the medical center's unified communications server were parsed to cross-correlate telephone call patterns and map associated phone numbers to a cost center dictionary. Hashed data structures were built to allow construction of edge and node files representing high-volume call patterns for display with an open source graph network tool. Summary statistics for an analysis of exactly one week's call detail records at a large academic medical center showed that 912,386 calls were placed with a total duration of 23,186 hours. Approximately half of all calling-called number pairs had an average call duration under 60 seconds and, among these, the average call duration was 27 seconds. Cross-correlation of phone calls identified by clinical cost center can be used to generate graphical displays of clinical enterprise communications. Many calls are short. The compact data transfers within short calls may serve as automation or re-design targets. The large absolute amount of time medical center employees were engaged in VoIP telecommunications suggests that analysis of telephone call patterns may offer additional insights into core clinical workflows.
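The processing described in this abstract (parse call detail records, map numbers to cost centers, accumulate call pairs in hashed structures, emit node and edge lists for a graph tool) can be sketched in a few lines. This is an illustrative reconstruction under assumptions, not the author's code. The record layout `(caller, callee, seconds)`, the phone numbers, the unit names and the `min_calls` threshold are all hypothetical.

```python
from collections import defaultdict


def build_call_graph(records, cost_centers, min_calls=2):
    """Aggregate calls by (calling unit, called unit) and return (nodes, edges)."""
    stats = defaultdict(lambda: [0, 0])  # (source unit, target unit) -> [calls, total seconds]
    for caller, callee, seconds in records:
        # Map each phone number to its clinical unit; unmapped numbers are grouped together.
        key = (cost_centers.get(caller, "unknown"), cost_centers.get(callee, "unknown"))
        stats[key][0] += 1
        stats[key][1] += seconds

    # Keep only high-volume pairs so the rendered graph stays readable.
    edges = [
        {"source": src, "target": dst, "calls": calls, "mean_seconds": total / calls}
        for (src, dst), (calls, total) in stats.items()
        if calls >= min_calls
    ]
    nodes = sorted({e["source"] for e in edges} | {e["target"] for e in edges})
    return nodes, edges


# Invented example data.
cost_centers = {"5551001": "ED", "5551002": "ED", "5552001": "Radiology", "5553001": "Pharmacy"}
records = [
    ("5551001", "5552001", 20),
    ("5551001", "5552001", 40),
    ("5551002", "5552001", 30),
    ("5553001", "5551001", 300),
]
nodes, edges = build_call_graph(records, cost_centers)
print(nodes)
print(edges)
```

For scale, the totals reported in the abstract (23,186 hours over 912,386 calls) correspond to an overall mean of roughly 91 seconds per call.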
Service Demand Discovery Mechanism for Mobile Social Networks.

PubMed

Wu, Dapeng; Yan, Junjie; Wang, Honggang; Wang, Ruyan

2016-11-23

In the last few years, the service demand for wireless data over mobile networks has been soaring. In Mobile Social Networks (MSNs), users can discover adjacent users, establish temporary local connections and share already downloaded content with each other to offload the service demand. Due to the partitioned topology, intermittent connections and social features of such a network, service demand discovery is challenging. In particular, service demand discovery is used to identify the best relay user through service registration, service selection and service activation. In order to maximize the utilization of limited network resources, a hybrid service demand discovery architecture that uses a Virtual Dictionary User (VDU) is proposed in this paper. Based on the historical data of movement, users can discover their relationships with others. Subsequently, according to user activity, a VDU is selected to facilitate the service registration procedure. Further, service information from outside a home community can be obtained through the Global Active User (GAU) to support service selection. To provide Quality of Service (QoS), the Service Providing User (SPU) is chosen among multiple candidates. Numerical results show that, when compared with other classical service algorithms, the proposed scheme can improve the successful service demand discovery ratio by 25% under reduced overheads.
Recent development in software and automation tools for high-throughput discovery bioanalysis.

PubMed

Shou, Wilson Z; Zhang, Jun

2012-05-01

Bioanalysis with LC-MS/MS has been established as the method of choice for quantitative determination of drug candidates in biological matrices in drug discovery and development. The LC-MS/MS bioanalytical support for drug discovery, especially for early discovery, often requires high-throughput (HT) analysis of large numbers of samples (hundreds to thousands per day) generated from many structurally diverse compounds (tens to hundreds per day) with a very quick turnaround time, in order to provide important activity and liability data to move discovery projects forward. Another important consideration for discovery bioanalysis is its fit-for-purpose quality requirement depending on the particular experiments being conducted at this stage, and it is usually not as stringent as those required in bioanalysis supporting drug development. These aforementioned attributes of HT discovery bioanalysis made it an ideal candidate for using software and automation tools to eliminate manual steps, remove bottlenecks, improve efficiency and reduce turnaround time while maintaining adequate quality. In this article we will review various recent developments that facilitate automation of individual bioanalytical procedures, such as sample preparation, MS/MS method development, sample analysis and data review, as well as fully integrated software tools that manage the entire bioanalytical workflow in HT discovery bioanalysis. In addition, software tools supporting the emerging high-resolution accurate MS bioanalytical approach are also discussed.
Comparative Oncogenomics for Peripheral Nerve Sheath Cancer Gene Discovery

DTIC Science & Technology

2015-06-01

neurofibromas and MPNSTs, establish gene signatures defining distinct tumor subtypes and functionally test the role of selected driver mutations ...allografted tumor cells, and a variety of in vitro functional assays. We will validate the relevance of these mutated mouse genes in human neurofibromas...and MPNSTs by determining whether these same genes are mutated in human tumors.
miRvestigator: web application to identify miRNAs responsible for co-regulated gene expression patterns discovered through transcriptome profiling.

PubMed

Plaisier, Christopher L; Bare, J Christopher; Baliga, Nitin S

2011-07-01

Transcriptome profiling studies have produced staggering numbers of gene co-expression signatures for a variety of biological systems. A significant fraction of these signatures will be partially or fully explained by miRNA-mediated targeted transcript degradation. miRvestigator takes as input lists of co-expressed genes from Caenorhabditis elegans, Drosophila melanogaster, Gallus gallus, Homo sapiens, Mus musculus or Rattus norvegicus and identifies the specific miRNAs that are likely to bind to 3' untranslated region (UTR) sequences to mediate the observed co-regulation. The novelty of our approach is the miRvestigator hidden Markov model (HMM) algorithm which systematically computes a similarity P-value for each unique miRNA seed sequence from the miRNA database miRBase to an overrepresented sequence motif identified within the 3'-UTR of the query genes. We have made this miRNA discovery tool accessible to the community by integrating our HMM algorithm with a proven algorithm for de novo discovery of miRNA seed sequences and wrapping these algorithms into a user-friendly interface. Additionally, the miRvestigator web server also produces a list of putative miRNA binding sites within 3'-UTRs of the query transcripts to facilitate the design of validation experiments. The miRvestigator is freely available at http://mirvestigator.systemsbiology.net.
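To make the idea of a putative seed binding site concrete, the sketch below scans a 3'-UTR for perfect matches to a miRNA seed. It deliberately swaps in a plain exact-match scan and is not the miRvestigator HMM, which scores similarity between seeds and a discovered motif. The function names and the UTR string are invented for illustration, and the seed is assumed to be nucleotides 2-8 of the mature miRNA.

```python
# Watson-Crick complement for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna):
    """mRNA sequence that pairs perfectly with miRNA nucleotides 2-8 (the assumed seed)."""
    seed = mirna.upper().replace("T", "U")[1:8]
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement


def find_sites(utr, mirna):
    """Return 0-based start positions of perfect seed matches in a 3'-UTR."""
    utr = utr.upper().replace("T", "U")
    site = seed_site(mirna)
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)  # step by one so overlapping matches are kept
    return positions


mirna = "UGAGGUAGUAGGUUGUAUAGUU"   # example mature miRNA sequence
utr = "AAACUACCUCAGGGCUACCUCUU"    # invented UTR fragment containing two seed matches
print(seed_site(mirna), find_sites(utr, mirna))
```

An exact-match scan like this ignores weaker, non-canonical pairing, which is one reason a probabilistic comparison such as the one the abstract describes can be preferable.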
Novel statistical tools for management of public databases facilitate community-wide replicability and control of false discovery.

PubMed

Rosset, Saharon; Aharoni, Ehud; Neuvirth, Hani

2014-07-01

Issues of publication bias, lack of replicability, and false discovery have long plagued the genetics community. Proper utilization of public and shared data resources presents an opportunity to ameliorate these problems. We present an approach to public database management that we term Quality Preserving Database (QPD). It enables perpetual use of the database for testing statistical hypotheses while controlling false discovery and avoiding publication bias on the one hand, and maintaining testing power on the other hand. We demonstrate it on a use case of a replication server for GWAS findings, underlining its practical utility. We argue that a shift to using QPD in managing current and future biological databases will significantly enhance the community's ability to make efficient and statistically sound use of the available data resources. © 2014 WILEY PERIODICALS, INC.
De-novo discovery of differentially abundant transcription factor binding sites including their positional preference.

PubMed

Keilwagen, Jens; Grau, Jan; Paponov, Ivan A; Posch, Stefan; Strickert, Marc; Grosse, Ivo

2011-02-10

Transcription factors are a main component of gene regulation as they activate or repress gene expression by binding to specific binding sites in promoters. The de-novo discovery of transcription factor binding sites in target regions obtained by wet-lab experiments is a challenging problem in computational biology, which has not been fully solved yet. Here, we present a de-novo motif discovery tool called Dispom for finding differentially abundant transcription factor binding sites that models existing positional preferences of binding sites and adjusts the length of the motif in the learning process. Evaluating Dispom, we find that its prediction performance is superior to existing tools for de-novo motif discovery for 18 benchmark data sets with planted binding sites, and for a metazoan compendium based on experimental data from micro-array, ChIP-chip, ChIP-DSL, and DamID as well as Gene Ontology data. Finally, we apply Dispom to find binding sites differentially abundant in promoters of auxin-responsive genes extracted from Arabidopsis thaliana microarray data, and we find a motif that can be interpreted as a refined auxin responsive element predominately positioned in the 250-bp region upstream of the transcription start site. Using an independent data set of auxin-responsive genes, we find in genome-wide predictions that the refined motif is more specific for auxin-responsive genes than the canonical auxin-responsive element. In general, Dispom can be used to find differentially abundant motifs in sequences of any origin. However, the positional distribution learned by Dispom is especially beneficial if all sequences are aligned to some anchor point like the transcription start site in case of promoter sequences. We demonstrate that the combination of searching for differentially abundant motifs and inferring a position distribution from the data is beneficial for de-novo motif discovery. Hence, we make the tool freely available as a component of the open
Gene editing tools: state-of-the-art and the road ahead for the model and non-model fishes.

PubMed

Barman, Hirak Kumar; Rasal, Kiran Dashrath; Chakrapani, Vemulawada; Ninawe, A S; Vengayil, Doyil T; Asrafuzzaman, Syed; Sundaray, Jitendra K; Jayasankar, Pallipuram

2017-10-01

Advancements in DNA sequencing technologies and computational biology have revolutionized genome/transcriptome sequencing of non-model fishes at an affordable cost. This has led to a paradigm shift in our understanding of the structure-function relationships of genes at a global level, from model animals/fishes to non-model large animals/fishes. Whole genome/transcriptome sequencing technologies were supplemented by a series of discoveries in gene editing tools, which are being used to modify genes at pre-determined positions using programmable nucleases to explore their respective in vivo functions. For a long time, targeted gene disruption experiments were mostly restricted to embryonic stem cells; advances in gene editing technologies such as zinc finger nucleases, transcription activator-like effector nucleases and CRISPR (clustered regularly interspaced short palindromic repeats)/CRISPR-associated nucleases have facilitated targeted genetic modifications beyond stem cells to a wide range of somatic cell lines across species, from laboratory animals to farmed animals/fishes. In this review, we discuss the use of different gene editing tools and their strategic implications in fish species for basic and applied biology research.
Natural products discovery from micro-organisms in the post-genome era.

PubMed

Ikeda, Haruo

2017-01-01

With the decision to award the Nobel Prize in Physiology or Medicine to Drs. S. Ōmura, W.C. Campbell, and Y. Tu, the importance and usefulness of natural drug discovery and development have been revalidated. Since the end of the twentieth century, many genome analyses of organisms have been conducted, and accordingly, numerous microbial genomes have been decoded. In particular, genomic studies of actinomycetes, micro-organisms that readily produce natural products, led to the discovery of biosynthetic gene clusters responsible for producing natural products. New explorations for natural products through a comprehensive approach combining genomic information with conventional methods show great promise for the discovery of new natural products and even systematic generation of unnaturally occurring compounds.
A Host Susceptibility Gene, DR1, Facilitates Influenza A Virus Replication by Suppressing Host Innate Immunity and Enhancing Viral RNA Replication

PubMed Central

Hsu, Shih-Feng; Su, Wen-Chi; Jeng, King-Song

2015-01-01

ABSTRACT Influenza A virus (IAV) depends on cellular factors to complete its replication cycle; thus, investigation of the factors utilized by IAV may facilitate antiviral drug development. To this end, a cellular transcriptional repressor, DR1, was identified from a genome-wide RNA interference (RNAi) screen. Knockdown (KD) of DR1 resulted in reductions of viral RNA and protein production, demonstrating that DR1 acts as a positive host factor in IAV replication. Genome-wide transcriptomic analysis showed that there was a strong induction of interferon-stimulated gene (ISG) expression after prolonged DR1 KD. We found that beta interferon (IFN-β) was induced by DR1 KD, thereby activating the JAK-STAT pathway to turn on ISG expression, which led to a strong inhibition of IAV replication. This result suggests that DR1 in normal cells suppresses IFN induction, probably to prevent undesired cytokine production, but that this suppression may create a milieu that favors IAV replication once cells are infected. Furthermore, biochemical assays of viral RNA replication showed that DR1 KD suppressed viral RNA replication. We also showed that DR1 associated with all three subunits of the viral RNA-dependent RNA polymerase (RdRp) complex, indicating that DR1 may interact with individual components of the viral RdRp complex to enhance viral RNA replication. Thus, DR1 may be considered a novel host susceptibility gene for IAV replication via a dual mechanism, not only suppressing the host defense to indirectly favor IAV replication but also directly facilitating viral RNA replication. IMPORTANCE Investigations of virus-host interactions involved in influenza A virus (IAV) replication are important for understanding viral pathogenesis and host defenses, which may manipulate influenza virus infection or prevent the emergence of drug resistance caused by a high error rate during viral RNA replication. For this purpose, a cellular transcriptional repressor, DR1, was identified from
A new set of ESTs and cDNA clones from full-length and normalized libraries for gene discovery and functional characterization in citrus

PubMed Central

Marques, M Carmen; Alonso-Cantabrana, Hugo; Forment, Javier; Arribas, Raquel; Alamar, Santiago; Conejero, Vicente; Perez-Amador, Miguel A

2009-01-01

Background Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. Conclusion The new EST collection denotes an
Knowledge-Based Topic Model for Unsupervised Object Discovery and Localization.

PubMed

Niu, Zhenxing; Hua, Gang; Wang, Le; Gao, Xinbo

Unsupervised object discovery and localization aims to discover the dominant object classes and localize all object instances in a given image collection without any supervision. Previous work has attempted to tackle this problem with vanilla topic models, such as latent Dirichlet allocation (LDA). However, those methods exploit no prior knowledge about the given image collection to facilitate object discovery. Moreover, the topic models used in those methods suffer from the topic coherence issue: some inferred topics do not have a clear meaning, which limits the final performance of object discovery. In this paper, prior knowledge in the form of so-called must-links is extracted from Web images on the Internet. Furthermore, a novel knowledge-based topic model, called LDA with mixture of Dirichlet trees, is proposed to incorporate the must-links into topic modeling for object discovery. In particular, to better deal with the polysemy of visual words, the must-link is redefined so that one must-link constrains only one or some topic(s) instead of all topics, which leads to significantly improved topic coherence. Moreover, the must-links are built and grouped with respect to specific object classes; thus the must-links in our approach are semantic-specific, which allows discriminative prior knowledge from Web images to be exploited more efficiently. Extensive experiments validated the efficiency of our proposed approach on several data sets. It is shown that our method significantly improves topic coherence and outperforms the unsupervised methods for object discovery and localization. In addition, compared with discriminative methods, the naturally existing object classes in the given image collection can be subtly discovered, which makes our approach well suited for realistic applications of unsupervised object discovery.
A knowledgebase system to enhance scientific discovery: Telemakus

PubMed Central

Fuller, Sherrilynne S; Revere, Debra; Bugni, Paul F; Martin, George M

2004-01-01

Background With the rapid expansion of scientific research, the ability to effectively find or integrate new domain knowledge in the sciences is proving increasingly difficult. Efforts to improve and speed up scientific discovery are being explored on a number of fronts. However, much of this work is based on traditional search and retrieval approaches and the bibliographic citation presentation format remains unchanged. Methods Case study. Results The Telemakus KnowledgeBase System provides flexible new tools for creating knowledgebases to facilitate retrieval and review of scientific research reports. In formalizing the representation of the research methods and results of scientific reports, Telemakus offers a potential strategy to enhance the scientific discovery process. While other research has demonstrated that aggregating and analyzing research findings across domains augments knowledge discovery, the Telemakus system is unique in combining document surrogates with interactive concept maps of linked relationships across groups of research reports. Conclusion Based on how scientists conduct research and read the literature, the Telemakus KnowledgeBase System brings together three innovations in analyzing, displaying and summarizing research reports across a domain: (1) research report schema, a document surrogate of extracted research methods and findings presented in a consistent and structured schema format which mimics the research process itself and provides a high-level surrogate to facilitate searching and rapid review of retrieved documents; (2) research findings, used to index the documents, allowing searchers to request, for example, research studies which have studied the relationship between neoplasms and vitamin E; and (3) visual exploration interface of linked relationships for interactive querying of research findings across the knowledgebase and graphical displays of what is known as well as, through gaps in the map, what is yet to be tested
Novel Directions for Diabetes Mellitus Drug Discovery

PubMed Central

Maiese, Kenneth; Chong, Zhao Zhong; Shang, Yan Chen; Wang, Shaohui

2012-01-01

Introduction Diabetes mellitus impacts almost 200 million individuals worldwide and leads to debilitating complications. New avenues of drug discovery must target the underlying cellular processes of oxidative stress, apoptosis, autophagy, and inflammation that can mediate multi-system pathology during diabetes mellitus. Areas Covered We examine novel directions for drug discovery that involve the β-nicotinamide adenine dinucleotide (NAD+) precursor nicotinamide, the cytokine erythropoietin, the NAD+-dependent protein histone deacetylase SIRT1, the serine/threonine-protein kinase mammalian target of rapamycin (mTOR), and the wingless pathway. Implications for the targeting of these pathways that oversee gluconeogenic genes, insulin signaling and resistance, fatty acid beta-oxidation, inflammation, and cellular survival are presented. Expert Opinion Nicotinamide, erythropoietin, and the downstream pathways of SIRT1, mTOR, forkhead transcription factors, and wingless signaling offer exciting prospects for novel directions of drug discovery for the treatment of metabolic disorders. Future investigations must dissect the complex relationship and fine modulation of these pathways for the successful translation of robust reparative and regenerative strategies against diabetes mellitus and the complications of this disorder. PMID:23092114
Mining disease genes using integrated protein-protein interaction and gene-gene co-regulation information.

PubMed

Li, Jin; Wang, Limei; Guo, Maozu; Zhang, Ruijie; Dai, Qiguo; Liu, Xiaoyan; Wang, Chunyu; Teng, Zhixia; Xuan, Ping; Zhang, Mingming

2015-01-01

In humans, despite the rapid increase in disease-associated gene discovery, a large proportion of disease-associated genes are still unknown. Many network-based approaches have been used to prioritize disease genes. Many networks, such as the protein-protein interaction (PPI), KEGG, and gene co-expression networks, have been used. Expression quantitative trait loci (eQTLs) have been successfully applied for the determination of genes associated with several diseases. In this study, we constructed an eQTL-based gene-gene co-regulation network (GGCRN) and used it to mine for disease genes. We adopted the random walk with restart (RWR) algorithm to mine for genes associated with Alzheimer disease. Compared to the Human Protein Reference Database (HPRD) PPI network alone, the integrated HPRD PPI and GGCRN networks provided faster convergence and revealed new disease-related genes. Therefore, using the RWR algorithm for integrated PPI and GGCRN is an effective method for disease-associated gene mining.
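
The random walk with restart step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `merge_networks` and `random_walk_with_restart`, the restart probability of 0.3, the edge weights, and the toy node labels are all assumptions made for illustration. Integration of the two networks is approximated here by simply summing edge weights, which may differ from how the study combined the HPRD PPI network and the GGCRN.

```python
def merge_networks(*networks):
    """Union several weighted graphs (dict: node -> {neighbor: weight}).

    Hypothetical helper: weights of edges present in more than one
    network are summed. Each input is expected to list both directions
    of every undirected edge.
    """
    merged = {}
    for net in networks:
        for u, neighbors in net.items():
            merged.setdefault(u, {})
            for v, w in neighbors.items():
                merged.setdefault(v, {})
                merged[u][v] = merged[u].get(v, 0.0) + w
    return merged


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return a dict node -> steady-state visiting probability.

    At every step the walker jumps back to a seed node with probability
    `restart`; otherwise it moves to a neighbor chosen in proportion to
    the edge weight.
    """
    nodes = set(adj)
    for neighbors in adj.values():
        nodes.update(neighbors)
    seeds = [s for s in seeds if s in nodes]
    if not seeds:
        raise ValueError("no seed node is present in the network")

    # Restart distribution: uniform over the seed nodes.
    p0 = {n: 0.0 for n in nodes}
    for s in seeds:
        p0[s] = 1.0 / len(seeds)

    out_weight = {u: sum(adj.get(u, {}).values()) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # Isolated node: send its probability mass back to the seeds.
                for s in seeds:
                    nxt[s] += (1 - restart) * p[u] * p0[s]
                continue
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * p[u] * w / out_weight[u]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy example with made-up node labels: two small networks that only
# form a connected path A-B-C-D once they are merged.
ppi = {"A": {"B": 1.0}, "B": {"A": 1.0, "C": 1.0}, "C": {"B": 1.0}}
coreg = {"C": {"D": 1.0}, "D": {"C": 1.0}}
scores = random_walk_with_restart(merge_networks(ppi, coreg), seeds=["A"])
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy case node D is reachable from the seed only through the merged network, which mirrors the general idea that an integrated network can expose candidates that a single network would miss.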
User needs analysis and usability assessment of DataMed - a biomedical data discovery index.

PubMed

Dixit, Ram; Rogith, Deevakar; Narayana, Vidya; Salimi, Mandana; Gururaj, Anupama; Ohno-Machado, Lucila; Xu, Hua; Johnson, Todd R

2017-11-30

To present user needs and usability evaluations of DataMed, a Data Discovery Index (DDI) that allows searching for biomedical data from multiple sources. We conducted 2 phases of user studies. Phase 1 was a user needs analysis conducted before the development of DataMed, consisting of interviews with researchers. Phase 2 involved iterative usability evaluations of DataMed prototypes. We analyzed data qualitatively to document researchers' information and user interface needs. Biomedical researchers' information needs in data discovery are complex, multidimensional, and shaped by their context, domain knowledge, and technical experience. User needs analyses validate the need for a DDI, while usability evaluations of DataMed show that even though aggregating metadata into a common search engine and applying traditional information retrieval tools are promising first steps, there remain challenges for DataMed due to incomplete metadata and the complexity of data discovery. Biomedical data poses distinct problems for search when compared to websites or publications. Making data available is not enough to facilitate biomedical data discovery: new retrieval techniques and user interfaces are necessary for dataset exploration. Consistent, complete, and high-quality metadata are vital to enable this process. While available data and researchers' information needs are complex and heterogeneous, a successful DDI must meet those needs and fit into the processes of biomedical researchers. Research directions include formalizing researchers' information needs, standardizing overviews of data to facilitate relevance judgments, implementing user interfaces for concept-based searching, and developing evaluation methods for open-ended discovery systems such as DDIs. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
A high-density transcript linkage map with 1,845 expressed genes positioned by microarray-based Single Feature Polymorphisms (SFP) in Eucalyptus

PubMed Central

2011-01-01

Background Technological advances are progressively increasing the application of genomics to a wider array of economically and ecologically important species. High-density maps enriched for transcribed genes facilitate the discovery of connections between genes and phenotypes. We report the construction of a high-density linkage map of expressed genes for the heterozygous genome of Eucalyptus using Single Feature Polymorphism (SFP) markers. Results SFP discovery and mapping were achieved using pseudo-testcross screening and selective mapping to simultaneously optimize linkage mapping and microarray costs. SFP genotyping was carried out by hybridizing complementary RNA prepared from the xylem of 4.5-year-old trees to an SFP array containing 103,000 25-mer oligonucleotide probes representing 20,726 unigenes derived from a modest-sized expressed sequence tag collection. An SFP-mapping microarray with 43,777 selected candidate SFP probes representing 15,698 genes was subsequently designed and used to genotype SFPs in a larger subset of the segregating population drawn by selective mapping. A total of 1,845 genes were mapped, with 884 of them ordered with high likelihood support on a framework map anchored to 180 microsatellites with an average density of 1.2 cM. Using more probes per unigene increased the likelihood of detecting segregating SFPs two-fold, eventually resulting in more genes mapped. In silico validation showed that 87% of the SFPs map to the expected location on the 4.5X draft sequence of the Eucalyptus grandis genome. Conclusions The Eucalyptus 1,845 gene map is the most highly enriched map for transcriptional information for any forest tree species to date. It represents a major improvement on the number of genes previously positioned on Eucalyptus maps and provides an initial glimpse at the gene space for this global tree genome. A general protocol is proposed to build high-density transcript linkage maps in less characterized plant species by SFP genotyping
Streptomyces species: Ideal chassis for natural product discovery and overproduction.

PubMed

Liu, Ran; Deng, Zixin; Liu, Tiangang

2018-05-28

There is considerable interest in mining organisms for new natural products (NPs) and in improving methods to overproduce valuable NPs. Because of the rapid development of tools and strategies for metabolic engineering and the markedly increased knowledge of the biosynthetic pathways and genetics of NP-producing organisms, genome mining and overproduction of NPs can be dramatically accelerated. In particular, Streptomyces species have been proposed as suitable chassis organisms for NP discovery and overproduction because of their many unique characteristics not shared with yeast, Escherichia coli, or other microorganisms. In this review, we summarize the methods for genome sequencing, gene cluster prediction, and gene editing in Streptomyces, as well as metabolic engineering strategies for NP overproduction and approaches for generating new products. Finally, two strategies for utilizing Streptomyces as the chassis for NP discovery and overproduction are emphasized. Copyright © 2018 International Metabolic Engineering Society. Published by Elsevier Inc. All rights reserved.
The Genetic Basis of Mendelian Phenotypes: Discoveries, Challenges, and Opportunities

PubMed Central

Chong, Jessica X.; Buckingham, Kati J.; Jhangiani, Shalini N.; Boehm, Corinne; Sobreira, Nara; Smith, Joshua D.; Harrell, Tanya M.; McMillin, Margaret J.; Wiszniewski, Wojciech; Gambin, Tomasz; Coban Akdemir, Zeynep H.; Doheny, Kimberly; Scott, Alan F.; Avramopoulos, Dimitri; Chakravarti, Aravinda; Hoover-Fong, Julie; Mathews, Debra; Witmer, P. Dane; Ling, Hua; Hetrick, Kurt; Watkins, Lee; Patterson, Karynne E.; Reinier, Frederic; Blue, Elizabeth; Muzny, Donna; Kircher, Martin; Bilguvar, Kaya; López-Giráldez, Francesc; Sutton, V. Reid; Tabor, Holly K.; Leal, Suzanne M.; Gunel, Murat; Mane, Shrikant; Gibbs, Richard A.; Boerwinkle, Eric; Hamosh, Ada; Shendure, Jay; Lupski, James R.; Lifton, Richard P.; Valle, David; Nickerson, Deborah A.; Bamshad, Michael J.

2015-01-01

Discovering the genetic basis of a Mendelian phenotype establishes a causal link between genotype and phenotype, making possible carrier and population screening and direct diagnosis. Such discoveries also contribute to our knowledge of gene function, gene regulation, development, and biological mechanisms that can be used for developing new therapeutics. As of February 2015, 2,937 genes underlying 4,163 Mendelian phenotypes have been discovered, but the genes underlying ∼50% (i.e., 3,152) of all known Mendelian phenotypes are still unknown, and many more Mendelian conditions have yet to be recognized. This is a formidable gap in biomedical knowledge. Accordingly, in December 2011, the NIH established the Centers for Mendelian Genomics (CMGs) to provide the collaborative framework and infrastructure necessary for undertaking large-scale whole-exome sequencing and discovery of the genetic variants responsible for Mendelian phenotypes. In partnership with 529 investigators from 261 institutions in 36 countries, the CMGs assessed 18,863 samples from 8,838 families representing 579 known and 470 novel Mendelian phenotypes as of January 2015. This collaborative effort has identified 956 genes, including 375 not previously associated with human health, that underlie a Mendelian phenotype. These results provide insight into study design and analytical strategies, identify novel mechanisms of disease, and reveal the extensive clinical variability of Mendelian phenotypes. Discovering the gene underlying every Mendelian phenotype will require tackling challenges such as worldwide ascertainment and phenotypic characterization of families affected by Mendelian conditions, improvement in sequencing and analytical techniques, and pervasive sharing of phenotypic and genomic data among researchers, clinicians, and families. PMID:26166479

Gene Discovery in Bladder Cancer Progression using cDNA Microarrays

PubMed Central

Sanchez-Carbayo, Marta; Socci, Nicholas D.; Lozano, Juan Jose; Li, Wentian; Charytonowicz, Elizabeth; Belbin, Thomas J.; Prystowsky, Michael B.; Ortiz, Angel R.; Childs, Geoffrey; Cordon-Cardo, Carlos

2003-01-01

To identify gene expression changes along progression of bladder cancer, we compared the expression profiles of early-stage and advanced bladder tumors using cDNA microarrays containing 17,842 known genes and expressed sequence tags. The application of bootstrapping techniques to hierarchical clustering segregated early-stage and invasive transitional carcinomas into two main clusters. Multidimensional analysis confirmed these clusters and more importantly, it separated carcinoma in situ from papillary superficial lesions and subgroups within early-stage and invasive tumors displaying different overall survival. Additionally, it recognized early-stage tumors showing gene profiles similar to invasive disease. Different techniques including standard t-test, single-gene logistic regression, and support vector machine algorithms were applied to identify relevant genes involved in bladder cancer progression. Cytokeratin 20, neuropilin-2, p21, and p33ING1 were selected among the top ranked molecular targets differentially expressed and validated by immunohistochemistry using tissue microarrays (n = 173). Their expression patterns were significantly associated with pathological stage, tumor grade, and altered retinoblastoma (RB) expression. Moreover, p33ING1 expression levels were significantly associated with overall survival. Analysis of the annotation of the most significant genes revealed the relevance of critical genes and pathways during bladder cancer progression, including the overexpression of oncogenic genes such as DEK in superficial tumors or immune response genes such as Cd86 antigen in invasive disease. Gene profiling successfully classified bladder tumors based on their progression and clinical outcome. The present study has identified molecular biomarkers of potential clinical significance and critical molecular targets associated with bladder cancer progression. PMID:12875971
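
Of the gene-selection techniques listed in this abstract, the standard t-test is the simplest to illustrate. The sketch below ranks genes by the absolute value of a pooled-variance two-sample t statistic. The function names, the gene labels and the expression values are invented for illustration and are not taken from the study.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Two-sample t statistic with pooled variance (Student's t-test).

    Each group needs at least two values, and the pooled variance must
    be non-zero.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))


def rank_genes(expr_a, expr_b):
    """Sort genes by |t|, largest first.

    expr_a and expr_b map a gene label to its expression values in the
    two groups being compared (for example early-stage vs. advanced).
    """
    stats = {gene: pooled_t(expr_a[gene], expr_b[gene]) for gene in expr_a}
    return sorted(stats.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up expression values for two hypothetical genes.
early = {"gene_x": [1.0, 2.0, 3.0], "gene_y": [5.0, 6.0, 7.0]}
advanced = {"gene_x": [7.0, 8.0, 9.0], "gene_y": [5.0, 7.0, 6.0]}
ranked = rank_genes(early, advanced)
```

A real analysis of thousands of genes would also convert the statistics to p-values and correct for multiple testing, which this sketch leaves out.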
Cultivation of Hard-To-Culture Subsurface Mercury-Resistant Bacteria and Discovery of New merA Gene Sequences

PubMed Central

Rasmussen, L. D.; Zawadsky, C.; Binnerup, S. J.; Øregaard, G.; Sørensen, S. J.; Kroer, N.

2008-01-01

Mercury-resistant bacteria may be important players in mercury biogeochemistry. To assess the potential for mercury reduction by two subsurface microbial communities, resistant subpopulations and their merA genes were characterized by a combined molecular and cultivation-dependent approach. The cultivation method simulated natural conditions by using polycarbonate membranes as a growth support and a nonsterile soil slurry as a culture medium. Resistant bacteria were pregrown to microcolony-forming units (mCFU) before being plated on standard medium. Compared to direct plating, culturability was increased up to 2,800 times and numbers of mCFU were similar to the total number of mercury-resistant bacteria in the soils. Denaturing gradient gel electrophoresis analysis of DNA extracted from membranes suggested stimulation of growth of hard-to-culture bacteria during the preincubation. A total of 25 different 16S rRNA gene sequences were observed, including Alpha-, Beta-, and Gammaproteobacteria; Actinobacteria; Firmicutes; and Bacteroidetes. The diversity of isolates obtained by direct plating included eight different 16S rRNA gene sequences (Alpha- and Betaproteobacteria and Actinobacteria). Partial sequencing of merA of selected isolates led to the discovery of new merA sequences. With phylum-specific merA primers, PCR products were obtained for Alpha- and Betaproteobacteria and Actinobacteria but not for Bacteroidetes and Firmicutes. The similarity to known sequences ranged between 89 and 95%. One of the sequences did not result in a match in the BLAST search. The results illustrate the power of integrating advanced cultivation methodology with molecular techniques for the characterization of the diversity of mercury-resistant populations and assessing the potential for mercury reduction in contaminated environments. PMID:18441111
GENOME-ENABLED DISCOVERY OF CARBON SEQUESTRATION GENES IN POPLAR

DOE Office of Scientific and Technical Information (OSTI.GOV)

DAVIS J M

2007-10-11

Plants utilize carbon by partitioning the reduced carbon obtained through photosynthesis into different compartments and into different chemistries within a cell and subsequently allocating such carbon to sink tissues throughout the plant. Since the phytohormones auxin and cytokinin are known to influence sink strength in tissues such as roots (Skoog & Miller 1957, Nordstrom et al. 2004), we hypothesized that altering the expression of genes that regulate auxin-mediated (e.g., AUX/IAA or ARF transcription factors) or cytokinin-mediated (e.g., RR transcription factors) control of root growth and development would impact carbon allocation and partitioning belowground (Fig. 1 - Renewal Proposal). Specifically, the ARF, AUX/IAA and RR transcription factor gene families mediate the effects of the growth regulators auxin and cytokinin on cell expansion, cell division and differentiation into root primordia. Invertases (IVR), whose transcript abundance is enhanced by both auxin and cytokinin, are critical components of carbon movement and therefore of carbon allocation. Thus, we initiated comparative genomic studies to identify the AUX/IAA, ARF, RR and IVR gene families in the Populus genome that could impact carbon allocation and partitioning. Bioinformatics searches using Arabidopsis gene sequences as queries identified regions with high degrees of sequence similarities in the Populus genome. These Populus sequences formed the basis of our transgenic experiments. Transgenic modification of gene expression involving members of these gene families was hypothesized to have profound effects on carbon allocation and partitioning.
Predictive Power Estimation Algorithm (PPEA) - A New Algorithm to Reduce Overfitting for Genomic Biomarker Discovery

PubMed Central

Liu, Jiangang; Jolly, Robert A.; Smith, Aaron T.; Searfoss, George H.; Goldstein, Keith M.; Uversky, Vladimir N.; Dunker, Keith; Li, Shuyu; Thomas, Craig E.; Wei, Tao

2011-01-01

Toxicogenomics promises to aid in predicting adverse effects, understanding the mechanisms of drug action or toxicity, and uncovering unexpected or secondary pharmacology. However, modeling adverse effects using high-dimensional and high-noise genomic data is prone to over-fitting. Models constructed from such data sets often consist of a large number of genes with no obvious functional relevance to the biological effect the model is intended to predict, which can make the modeling results challenging to interpret. To address these issues, we developed a novel algorithm, Predictive Power Estimation Algorithm (PPEA), which estimates the predictive power of each individual transcript through an iterative two-way bootstrapping procedure. By repeatedly enforcing that the sample number is larger than the transcript number in each iteration of modeling and testing, PPEA reduces the potential risk of overfitting. We show with three different case studies that: (1) PPEA can quickly derive a reliable rank order of predictive power of individual transcripts in a relatively small number of iterations, (2) the top-ranked transcripts tend to be functionally related to the phenotype they are intended to predict, (3) using only the most predictive top-ranked transcripts greatly facilitates development of a multiplex assay such as qRT-PCR as a biomarker, and (4) more importantly, we were able to demonstrate that a small number of genes identified from the top-ranked transcripts are highly predictive of phenotype, as their expression changes distinguished adverse from nonadverse effects of compounds in completely independent tests. Thus, we believe that the PPEA model effectively addresses the over-fitting problem and can be used to facilitate genomic biomarker discovery for predictive toxicology and drug responses. PMID:21935387
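
The abstract does not spell out the details of PPEA, so the following is only a rough sketch of the general idea of resampling in two directions, over samples and over transcripts, while keeping the number of transcripts in each model below the number of samples. It is not the published algorithm. The nearest-centroid classifier, the out-of-bag accuracy score, the function name `two_way_bootstrap_rank` and the toy data are all assumptions made for illustration.

```python
import random


def two_way_bootstrap_rank(X, y, n_iter=300, n_feat=2, seed=0):
    """Score each transcript by the mean out-of-bag accuracy of the
    small models it took part in.

    X: list of samples, each a list of expression values.
    y: list of 0/1 class labels (e.g. nonadverse / adverse).
    """
    n_samples, n_transcripts = len(X), len(X[0])
    if n_feat >= n_samples:
        raise ValueError("each model must use fewer transcripts than samples")
    rng = random.Random(seed)
    total = [0.0] * n_transcripts
    count = [0] * n_transcripts
    for _ in range(n_iter):
        # Resample the samples (with replacement) and the transcripts.
        boot = [rng.randrange(n_samples) for _ in range(n_samples)]
        in_bag = set(boot)
        out_of_bag = [i for i in range(n_samples) if i not in in_bag]
        feats = rng.sample(range(n_transcripts), n_feat)
        if not out_of_bag or len({y[i] for i in boot}) < 2:
            continue  # nothing to test on, or only one class to train on
        # Train: one centroid per class on the chosen transcripts.
        centroids = {}
        for label in (0, 1):
            rows = [X[i] for i in boot if y[i] == label]
            centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]
        # Test on the samples left out of the bootstrap draw.
        correct = 0
        for i in out_of_bag:
            dist = {
                label: sum((X[i][f] - c) ** 2 for f, c in zip(feats, centroids[label]))
                for label in (0, 1)
            }
            predicted = 0 if dist[0] <= dist[1] else 1
            correct += predicted == y[i]
        accuracy = correct / len(out_of_bag)
        for f in feats:
            total[f] += accuracy
            count[f] += 1
    return [total[f] / count[f] if count[f] else 0.0 for f in range(n_transcripts)]


# Toy data: transcript 0 separates the two classes; transcripts 1-3 are
# constant and therefore carry no information.
X = [[0.0, 5.0, 5.0, 5.0], [1.0, 5.0, 5.0, 5.0],
     [0.5, 5.0, 5.0, 5.0], [0.2, 5.0, 5.0, 5.0],
     [9.0, 5.0, 5.0, 5.0], [10.0, 5.0, 5.0, 5.0],
     [9.5, 5.0, 5.0, 5.0], [9.8, 5.0, 5.0, 5.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
power = two_way_bootstrap_rank(X, y)
best = max(range(len(power)), key=power.__getitem__)
```

Because every model is fitted on fewer transcripts than samples and scored only on samples it did not see, an uninformative transcript gains credit only when it happens to be drawn together with an informative one.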
Collaborative Workspaces to Accelerate Discovery

NASA Astrophysics Data System (ADS)

Meade, Bernard; Fluke, Christopher; Cooke, Jeff; Andreoni, Igor; Pritchard, Tyler; Curtin, Christopher; Bernard, Stephanie R.; Asher, Albany; Mack, Katherine J.; Murphy, Michael T.; Vohl, Dany; Codoreanu, Alex; Kotuš, Srđan M.; Rumokoy, Fanuel; Horst, Chuck; Reynolds, Tristan

2017-05-01

By applying a display ecology to the Deeper, Wider, Faster proactive, simultaneous telescope observing campaign, we have shown a dramatic reduction in the time taken to inspect DECam CCD images for potential transient candidates and to produce time-critical triggers to standby telescopes. We also show how facilitating rapid corroboration of potential candidates and the exclusion of non-candidates improves the accuracy of detection; and establish that a practical and enjoyable workspace can improve the experience of an otherwise taxing task for astronomers. We provide a critical road test of two advanced displays in a research context-a rare opportunity to demonstrate how they can be used rather than simply discuss how they might be used to accelerate discovery.
Next-Generation Sequencing Approaches in Genome-Wide Discovery of Single Nucleotide Polymorphism Markers Associated with Pungency and Disease Resistance in Pepper.

PubMed

Manivannan, Abinaya; Kim, Jin-Hee; Yang, Eun-Young; Ahn, Yul-Kyun; Lee, Eun-Su; Choi, Sena; Kim, Do-Sun

2018-01-01

Pepper is an economically important horticultural plant that has been widely used for its pungency and spicy taste in cuisines worldwide, and it has been domesticated since antiquity. To meet the growing demand for pepper with high quality, organoleptic properties, nutraceutical content, and disease tolerance, genomics-assisted breeding techniques can be incorporated to develop novel pepper varieties with desired traits. The application of next-generation sequencing (NGS) approaches has transformed plant breeding technology, especially in the area of molecular marker-assisted breeding. The availability of genomic information aids in the deeper understanding of several molecular mechanisms behind vital physiological processes. In addition, NGS methods facilitate the genome-wide discovery of DNA-based markers linked to key genes involved in important biological phenomena. Among the molecular markers, the single nucleotide polymorphism (SNP) offers various benefits in comparison with other existing DNA-based markers. The present review concentrates on the impact of NGS approaches on the discovery of useful SNP markers associated with pungency and disease resistance in pepper. The information provided here can be used to improve pepper breeding in the future.
Discovery of Herpes B Virus-Encoded MicroRNAs

PubMed Central

Besecker, Michael I.; Harden, Mallory E.; Li, Guanglin; Wang, Xiu-Jie; Griffiths, Anthony

2009-01-01

Herpes B virus (BV) naturally infects macaque monkeys and is a close relative of herpes simplex virus. BV can zoonotically infect humans to cause a rapidly ascending encephalitis with ∼80% mortality. Therefore, BV is a serious danger to those who come into contact with these monkeys or their tissues and cells. MicroRNAs are regulators of gene expression, and there have been reports of virus-encoded microRNAs. We hypothesize that BV-encoded microRNAs are important for the regulation of viral and cellular genes. Herein, we report the discovery of three herpes B virus-encoded microRNAs. PMID:19144716
Gene discovery in EST sequences from the wheat leaf rust fungus Puccinia triticina sexual spores, asexual spores and haustoria, compared to other rust and corn smut fungi

PubMed Central

2011-01-01

Background Rust fungi are biotrophic basidiomycete plant pathogens that cause major diseases on plants and trees world-wide, affecting agriculture and forestry. Their biotrophic nature precludes many established molecular genetic manipulations and lines of research. The generation of genomic resources for these microbes is leading to novel insights into biology such as interactions with the hosts and guiding directions for breakthrough research in plant pathology. Results To support gene discovery and gene model verification in the genome of the wheat leaf rust fungus, Puccinia triticina (Pt), we have generated Expressed Sequence Tags (ESTs) by sampling several life cycle stages. We focused on several spore stages and isolated haustorial structures from infected wheat, generating 17,684 ESTs. We produced sequences from both the sexual (pycniospores, aeciospores and teliospores) and asexual (germinated urediniospores) stages of the life cycle. From pycniospores and aeciospores, produced by infecting the alternate host, meadow rue (Thalictrum speciosissimum), 4,869 and 1,292 reads were generated, respectively. We generated 3,703 ESTs from teliospores produced on the senescent primary wheat host. Finally, we generated 6,817 reads from haustoria isolated from infected wheat as well as 1,003 sequences from germinated urediniospores. Along with 25,558 previously generated ESTs, we compiled a database of 13,328 non-redundant sequences (4,506 singlets and 8,822 contigs). Fungal genes were predicted using the EST version of the self-training GeneMarkS algorithm. To refine the EST database, we compared EST sequences by BLASTN to a set of 454 pyrosequencing-generated contigs and Sanger BAC-end sequences derived both from the Pt genome, and to ESTs and genome reads from wheat. A collection of 6,308 fungal genes was identified and compared to sequences of the cereal rusts, Puccinia graminis f. sp. tritici (Pgt) and stripe rust, P. striiformis f. sp. tritici (Pst), and poplar
De Novo Assembly, Gene Annotation, and Marker Discovery in Stored-Product Pest Liposcelis entomophila (Enderlein) Using Transcriptome Sequences

PubMed Central

Wei, Dan-Dan; Chen, Er-Hu; Ding, Tian-Bo; Chen, Shi-Chun; Dou, Wei; Wang, Jin-Jun

2013-01-01

Background As a major stored-product pest insect, Liposcelis entomophila has developed high levels of resistance to various insecticides in grain storage systems. However, the molecular mechanisms underlying resistance and environmental stress have not been characterized. To date, there is a lack of genomic information for this species. Therefore, studies aimed at profiling the L. entomophila transcriptome would provide a better understanding of the biological functions at the molecular level. Methodology/Principal Findings We applied Illumina sequencing technology to sequence the transcriptome of L. entomophila. A total of 54,406,328 clean reads were obtained and de novo assembled into 54,220 unigenes, with an average length of 571 bp. Through a similarity search, 33,404 (61.61%) unigenes were matched to known proteins in the NCBI non-redundant (Nr) protein database. These unigenes were further functionally annotated with gene ontology (GO), cluster of orthologous groups of proteins (COG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases. A large number of genes potentially involved in insecticide resistance were manually curated, including 68 putative cytochrome P450 genes, 37 putative glutathione S-transferase (GST) genes, 19 putative carboxyl/cholinesterase (CCE) genes, and 126 other transcripts containing target site sequences or encoding detoxification genes representing eight types of resistance enzymes. Furthermore, to gain insight into the molecular basis of the response of L. entomophila to thermal stress, 25 heat shock protein (Hsp) genes were identified. In addition, 1,100 SSRs and 57,757 SNPs were detected, and 231 pairs of SSR primers were designed for investigating genetic diversity in the future. Conclusions/Significance We developed a comprehensive transcriptomic database for L. entomophila. These sequences and putative molecular markers would further promote our understanding of the molecular mechanisms underlying insecticide resistance
The Next Step: 25 Discoveries That Could Change Our Lives.

ERIC Educational Resources Information Center

Science85, 1985

1985-01-01

Describes (in separate articles) 25 developments in science, technology, and medicine that have potential impact on the near future. They include discoveries related to space butterflies, drugs, twenty-first century software, experimental mathematics, brain drugs, egg development, ultrasmall microchips, the biology of birth, cancer-causing genes,…
Genome-Wide Identification and Expression Analysis of the Cation Diffusion Facilitator Gene Family in Turnip Under Diverse Metal Ion Stresses.

PubMed

Li, Xiong; Wu, Yuansheng; Li, Boqun; He, Wenqi; Yang, Yonghong; Yang, Yongping

2018-01-01

The cation diffusion facilitator (CDF) family is one of the gene families involved in metal ion uptake and transport in plants, but the understanding of the definite roles and mechanisms of most CDF genes remains limited. In the present study, we identified 18 candidate CDF genes from the turnip genome and named them BrrMTP1.1 to BrrMTP12. Then, we performed a comparative genomic analysis of the phylogenetic relationships, gene structures and chromosome distributions, conserved domains, and motifs of turnip CDFs. The constructed phylogenetic tree indicated that the BrrMTPs were divided into seven groups (groups 1, 5, 6, 7, 8, 9, and 12) and formed three major clusters (Zn-CDFs, Fe/Zn-CDFs, and Mn-CDFs). Moreover, the structural characteristics of the BrrMTP members in the same group were similar but varied among groups. To investigate the potential roles of BrrMTPs in turnip, we conducted an expression analysis of all BrrMTP genes under Mg, Zn, Cu, Mn, Fe, Co, Na, and Cd stresses. Results showed that the expression levels of all BrrMTP members were induced by at least one metal ion, indicating that these genes may be related to the tolerance or transport of those metal ions. Based on the roles of different metal ions in plants, we hypothesized that BrrMTP genes are possibly involved in heavy metal accumulation and tolerance to salt stress, apart from their roles in the maintenance of mineral nutrient homeostasis in turnip. These findings are helpful for understanding the roles of MTPs in plants and provide preliminary information for the study of the functions of BrrMTP genes.
Model-driven discovery of underground metabolic functions in Escherichia coli.

PubMed

Guzmán, Gabriela I; Utrilla, José; Nurk, Sergey; Brunk, Elizabeth; Monk, Jonathan M; Ebrahim, Ali; Palsson, Bernhard O; Feist, Adam M

2015-01-20

Enzyme promiscuity toward substrates has been discussed in evolutionary terms as providing the flexibility to adapt to novel environments. In the present work, we describe an approach toward exploring such enzyme promiscuity in the space of a metabolic network. This approach leverages genome-scale models, which have been widely used for predicting growth phenotypes in various environments or following a genetic perturbation; however, these predictions occasionally fail. Failed predictions of gene essentiality offer an opportunity for targeting biological discovery, suggesting the presence of unknown underground pathways stemming from enzymatic cross-reactivity. We demonstrate a workflow that couples constraint-based modeling and bioinformatic tools with KO strain analysis and adaptive laboratory evolution for the purpose of predicting promiscuity at the genome scale. Three cases of genes that are incorrectly predicted as essential in Escherichia coli--aspC, argD, and gltA--are examined, and isozyme functions are uncovered for each to a different extent. Seven isozyme functions based on genetic and transcriptional evidence are suggested between the genes aspC and tyrB, argD and astC, gabT and puuE, and gltA and prpC. This study demonstrates how a targeted model-driven approach to discovery can systematically fill knowledge gaps, characterize underground metabolism, and elucidate regulatory mechanisms of adaptation in response to gene KO perturbations.
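To make the logic of a "failed essentiality prediction" concrete, here is a minimal sketch under strong simplifying assumptions. It replaces genome-scale flux balance analysis with plain reachability over a three-reaction toy network, and the metabolite names, `geneX`, and `geneY` are hypothetical. Only the pairing of aspC with tyrB comes from the abstract; the toy reaction they share is an assumption made for illustration.

```python
def can_grow(reactions, nutrients, target, knocked_out=frozenset()):
    """Return True if `target` is reachable from `nutrients`.

    Each reaction is (substrates, products, genes); the genes are treated
    as isozymes, so one functional gene is enough to keep it active.
    """
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products, genes in reactions:
            active = any(g not in knocked_out for g in genes)
            if (active and set(substrates) <= available
                    and not set(products) <= available):
                available |= set(products)
                changed = True
    return target in available


def essential_genes(reactions, nutrients, target):
    """Genes whose single knockout abolishes growth in the toy model."""
    genes = {g for _, _, gs in reactions for g in gs}
    return sorted(
        g for g in genes if not can_grow(reactions, nutrients, target, {g})
    )


# Model as annotated: only aspC catalyses the middle step.
annotated = [
    (("glucose",), ("precursor",), ("geneX",)),
    (("precursor",), ("amino_acid",), ("aspC",)),
    (("amino_acid",), ("biomass",), ("geneY",)),
]

# Same model after adding a promiscuous "underground" isozyme.
with_underground = [
    annotated[0],
    (("precursor",), ("amino_acid",), ("aspC", "tyrB")),
    annotated[2],
]

print(essential_genes(annotated, {"glucose"}, "biomass"))
print(essential_genes(with_underground, {"glucose"}, "biomass"))
```

In the annotated model aspC is predicted essential; if the knockout strain nevertheless grows, the mismatch points to a missing isozyme, and adding tyrB to the gene rule removes aspC from the essential set.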
Potential biological targets for bioassay development in drug discovery of Sturge-Weber syndrome.

PubMed

Mohammadipanah, Fatemeh; Salimi, Fatemeh

2018-02-01

Sturge-Weber Syndrome (SWS) is a neurocutaneous disease with clinical manifestations including ocular (glaucoma), cutaneous (port-wine birthmark), neurologic (seizures), and vascular problems. Molecular mechanisms of SWS pathogenesis are initiated by a somatic mutation in GNAQ. No definitive treatments exist for SWS, and current treatment options only mitigate the intensity of its clinical manifestations. Biological assay design for drug discovery against this syndrome demands comprehensive knowledge of the mechanisms involved in its pathogenesis. By analysis of the interrelated molecular targets of SWS, some in vitro bioassay systems can be devised for drug screening against its progression. Development of such bioassay platforms can enable high-throughput screening of natural or synthetic compounds in drug discovery programs. Because the study of molecular targets and their integration into biological assay design can facilitate effective drug discovery, some potential biological targets and their respective biological assays for SWS drug discovery are proposed in this review. Suggested targets include acetylcholinesterase, alkaline phosphatase, GABAergic receptors, and Hypoxia-Inducible Factor (HIF)-1α and 2α. © 2017 John Wiley & Sons A/S.
De Novo Regulatory Motif Discovery Identifies Significant Motifs in Promoters of Five Classes of Plant Dehydrin Genes.

PubMed

Zolotarov, Yevgen; Strömvik, Martina

2015-01-01

Plants accumulate dehydrins in response to osmotic stresses. Dehydrins are divided into five different classes, which are thought to be regulated in different manners. To better understand differences in transcriptional regulation of the five dehydrin classes, de novo motif discovery was performed on 350 dehydrin promoter sequences from a total of 51 plant genomes. Overrepresented motifs were identified in the promoters of five dehydrin classes. The Kn dehydrin promoters contain motifs linked with meristem specific expression, as well as motifs linked with cold/dehydration and abscisic acid response. KS dehydrin promoters contain a motif with a GATA core. SKn and YnSKn dehydrin promoters contain motifs that match elements connected with cold/dehydration, abscisic acid and light response. YnKn dehydrin promoters contain motifs that match abscisic acid and light response elements, but not cold/dehydration response elements. Conserved promoter motifs are present in the dehydrin classes and across different plant lineages, indicating that dehydrin gene regulation is likely also conserved.
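The abstract does not name the motif-discovery tool that was used, so the following is only a rough sketch of what "overrepresented" can mean in practice. It assumes the simplest possible approach: count, for every k-mer, the fraction of promoter sequences that contain it and compare that with a background set. The function names, the pseudocount, and the ratio threshold are illustrative assumptions; real motif finders use position weight matrices and proper significance statistics.

```python
from collections import Counter


def kmer_presence(seqs, k):
    """Count in how many sequences each k-mer occurs at least once."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def enriched_kmers(promoters, background, k=4, min_ratio=2.0, pseudo=1.0):
    """Return (k-mer, enrichment ratio) pairs, strongest first.

    The ratio compares smoothed per-sequence occurrence frequencies
    in the promoter set against the background set.
    """
    fg = kmer_presence(promoters, k)
    bg = kmer_presence(background, k)
    results = []
    for kmer, n_fg in fg.items():
        f_fg = (n_fg + pseudo) / (len(promoters) + 2 * pseudo)
        f_bg = (bg.get(kmer, 0) + pseudo) / (len(background) + 2 * pseudo)
        ratio = f_fg / f_bg
        if ratio >= min_ratio:
            results.append((kmer, round(ratio, 2)))
    return sorted(results, key=lambda t: (-t[1], t[0]))
```

Given a promoter set in which every sequence carries the same short core and a background that lacks it, the shared core ranks first; k-mers seen in only one promoter score much lower.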
High-throughput discovery of novel developmental phenotypes.

PubMed

Dickinson, Mary E; Flenniken, Ann M; Ji, Xiao; Teboul, Lydia; Wong, Michael D; White, Jacqueline K; Meehan, Terrence F; Weninger, Wolfgang J; Westerberg, Henrik; Adissu, Hibret; Baker, Candice N; Bower, Lynette; Brown, James M; Caddle, L Brianna; Chiani, Francesco; Clary, Dave; Cleak, James; Daly, Mark J; Denegre, James M; Doe, Brendan; Dolan, Mary E; Edie, Sarah M; Fuchs, Helmut; Gailus-Durner, Valerie; Galli, Antonella; Gambadoro, Alessia; Gallegos, Juan; Guo, Shiying; Horner, Neil R; Hsu, Chih-Wei; Johnson, Sara J; Kalaga, Sowmya; Keith, Lance C; Lanoue, Louise; Lawson, Thomas N; Lek, Monkol; Mark, Manuel; Marschall, Susan; Mason, Jeremy; McElwee, Melissa L; Newbigging, Susan; Nutter, Lauryl M J; Peterson, Kevin A; Ramirez-Solis, Ramiro; Rowland, Douglas J; Ryder, Edward; Samocha, Kaitlin E; Seavitt, John R; Selloum, Mohammed; Szoke-Kovacs, Zsombor; Tamura, Masaru; Trainor, Amanda G; Tudose, Ilinca; Wakana, Shigeharu; Warren, Jonathan; Wendling, Olivia; West, David B; Wong, Leeyean; Yoshiki, Atsushi; MacArthur, Daniel G; Tocchini-Valentini, Glauco P; Gao, Xiang; Flicek, Paul; Bradley, Allan; Skarnes, William C; Justice, Monica J; Parkinson, Helen E; Moore, Mark; Wells, Sara; Braun, Robert E; Svenson, Karen L; de Angelis, Martin Hrabe; Herault, Yann; Mohun, Tim; Mallon, Ann-Marie; Henkelman, R Mark; Brown, Steve D M; Adams, David J; Lloyd, K C Kent; McKerlie, Colin; Beaudet, Arthur L; Bućan, Maja; Murray, Stephen A

2016-09-22

Approximately one-third of all mammalian genes are essential for life. Phenotypes resulting from knockouts of these genes in mice have provided tremendous insight into gene function and congenital disorders. As part of the International Mouse Phenotyping Consortium effort to generate and phenotypically characterize 5,000 knockout mouse lines, here we identify 410 lethal genes during the production of the first 1,751 unique gene knockouts. Using a standardized phenotyping platform that incorporates high-resolution 3D imaging, we identify phenotypes at multiple time points for previously uncharacterized genes and additional phenotypes for genes with previously reported mutant phenotypes. Unexpectedly, our analysis reveals that incomplete penetrance and variable expressivity are common even on a defined genetic background. In addition, we show that human disease genes are enriched for essential genes, thus providing a dataset that facilitates the prioritization and validation of mutations identified in clinical sequencing efforts.
High-throughput discovery of novel developmental phenotypes

PubMed Central

Dickinson, Mary E.; Flenniken, Ann M.; Ji, Xiao; Teboul, Lydia; Wong, Michael D.; White, Jacqueline K.; Meehan, Terrence F.; Weninger, Wolfgang J.; Westerberg, Henrik; Adissu, Hibret; Baker, Candice N.; Bower, Lynette; Brown, James M.; Caddle, L. Brianna; Chiani, Francesco; Clary, Dave; Cleak, James; Daly, Mark J.; Denegre, James M.; Doe, Brendan; Dolan, Mary E.; Edie, Sarah M.; Fuchs, Helmut; Gailus-Durner, Valerie; Galli, Antonella; Gambadoro, Alessia; Gallegos, Juan; Guo, Shiying; Horner, Neil R.; Hsu, Chih-wei; Johnson, Sara J.; Kalaga, Sowmya; Keith, Lance C.; Lanoue, Louise; Lawson, Thomas N.; Lek, Monkol; Mark, Manuel; Marschall, Susan; Mason, Jeremy; McElwee, Melissa L.; Newbigging, Susan; Nutter, Lauryl M.J.; Peterson, Kevin A.; Ramirez-Solis, Ramiro; Rowland, Douglas J.; Ryder, Edward; Samocha, Kaitlin E.; Seavitt, John R.; Selloum, Mohammed; Szoke-Kovacs, Zsombor; Tamura, Masaru; Trainor, Amanda G; Tudose, Ilinca; Wakana, Shigeharu; Warren, Jonathan; Wendling, Olivia; West, David B.; Wong, Leeyean; Yoshiki, Atsushi; MacArthur, Daniel G.; Tocchini-Valentini, Glauco P.; Gao, Xiang; Flicek, Paul; Bradley, Allan; Skarnes, William C.; Justice, Monica J.; Parkinson, Helen E.; Moore, Mark; Wells, Sara; Braun, Robert E.; Svenson, Karen L.; de Angelis, Martin Hrabe; Herault, Yann; Mohun, Tim; Mallon, Ann-Marie; Henkelman, R. Mark; Brown, Steve D.M.; Adams, David J.; Lloyd, K.C. Kent; McKerlie, Colin; Beaudet, Arthur L.; Bucan, Maja; Murray, Stephen A.

2016-01-01

Approximately one third of all mammalian genes are essential for life. Phenotypes resulting from mouse knockouts of these genes have provided tremendous insight into gene function and congenital disorders. As part of the International Mouse Phenotyping Consortium effort to generate and phenotypically characterize 5000 knockout mouse lines, we have identified 410 lethal genes during the production of the first 1751 unique gene knockouts. Using a standardised phenotyping platform that incorporates high-resolution 3D imaging, we identified novel phenotypes at multiple time points for previously uncharacterized genes and additional phenotypes for genes with previously reported mutant phenotypes. Unexpectedly, our analysis reveals that incomplete penetrance and variable expressivity are common even on a defined genetic background. In addition, we show that human disease genes are enriched for essential genes identified in our screen, thus providing a novel dataset that facilitates prioritization and validation of mutations identified in clinical sequencing efforts. PMID:27626380
Can biochemistry drive drug discovery beyond simple potency measurements?

PubMed

Chène, Patrick

2012-04-01

Among the fields of expertise required to develop drugs successfully, biochemistry holds a key position in drug discovery at the interface between chemistry, structural biology and cell biology. However, taking the example of protein kinases, it appears that biochemical assays are mostly used in the pharmaceutical industry to measure compound potency and/or selectivity. This limited use of biochemistry is surprising, given that detailed biochemical analyses are commonly used in academia to unravel molecular recognition processes. In this article, I show that biochemistry can provide invaluable information on the dynamics and energetics of compound-target interactions that cannot be obtained on the basis of potency measurements and structural data. Therefore, an extensive use of biochemistry in drug discovery could facilitate the identification and/or development of new drugs. Copyright © 2012 Elsevier Ltd. All rights reserved.
Genetic and Epigenetic Discoveries in Human Retinoblastoma.

PubMed

McEvoy, Justina D; Dyer, Michael A

2015-01-01

Retinoblastoma is a rare pediatric cancer of the retina. Nearly all retinoblastomas are initiated through the biallelic inactivation of the retinoblastoma tumor susceptibility gene (RB1). Whole-genome sequencing has made it possible to identify secondary genetic lesions following RB1 inactivation. One of the major discoveries from retinoblastoma sequencing studies is that some retinoblastoma tumors have stable genomes. Subsequent epigenetic studies showed that changes in the epigenome contribute to the rapid progression of retinoblastoma following RB1 gene inactivation. In addition, gene amplification and elevated expression of p53 antagonists, MDM2 and MDM4, may also play an important role in retinoblastoma tumorigenesis. The knowledge gained from these recent molecular, cellular, genomic, and epigenomic analyses are now being integrated to identify new therapeutic approaches that can help save lives and vision in children with retinoblastoma, with fewer long-term side effects.
The web server of IBM's Bioinformatics and Pattern Discovery group.

PubMed

Huynh, Tien; Rigoutsos, Isidore; Parida, Laxmi; Platt, Daniel; Shibuya, Tetsuo

2003-07-01

We herein present and discuss the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server is operational around the clock and provides access to a variety of methods that have been published by the group's members and collaborators. The available tools correspond to applications ranging from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences and the interactive annotation of amino acid sequences. Additionally, annotations for more than 70 archaeal, bacterial, eukaryotic and viral genomes are available on-line and can be searched interactively. The tools and code bundles can be accessed beginning at http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/.
The web server of IBM's Bioinformatics and Pattern Discovery group

PubMed Central

Huynh, Tien; Rigoutsos, Isidore; Parida, Laxmi; Platt, Daniel; Shibuya, Tetsuo

2003-01-01

We herein present and discuss the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server is operational around the clock and provides access to a variety of methods that have been published by the group's members and collaborators. The available tools correspond to applications ranging from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences and the interactive annotation of amino acid sequences. Additionally, annotations for more than 70 archaeal, bacterial, eukaryotic and viral genomes are available on-line and can be searched interactively. The tools and code bundles can be accessed beginning at http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/. PMID:12824385

What Neural Substrates Trigger the Adept Scientific Pattern Discovery by Biologists?

NASA Astrophysics Data System (ADS)

Lee, Jun-Ki; Kwon, Yong-Ju

2011-04-01

This study investigated the neural correlates of experts and novices during biological object pattern detection using an fMRI approach in order to reveal the neural correlates of a biologist's superior pattern discovery ability. Sixteen healthy male participants (8 biologists and 8 non-biologists) volunteered for the study. Participants were shown fifteen series of organism pictures and asked to detect patterns amid the stimulus pictures. Primary findings showed significant activations in the right middle temporal gyrus and inferior parietal lobule among participants in the biologist (expert) group. Interestingly, the left superior temporal gyrus was activated in participants from the non-biologist (novice) group. These results suggested that superior pattern discovery ability could be related to a functional facilitation of the parieto-temporal network, which is particularly driven by the right middle temporal gyrus and inferior parietal lobule in addition to the recruitment of additional brain regions. Furthermore, the functional facilitation of the network might actually pertain to highly coherent processing skills and visual working memory capacity. Hence, study results suggested that adept scientific thinking ability can be detected by neuronal substrates, which may be used as criteria for developing and evaluating a brain-based science curriculum and test instrument.
Epithelial-Mesenchymal Transition (EMT) Gene Variants and Epithelial Ovarian Cancer (EOC) Risk.

PubMed

Amankwah, Ernest K; Lin, Hui-Yi; Tyrer, Jonathan P; Lawrenson, Kate; Dennis, Joe; Chornokur, Ganna; Aben, Katja K H; Anton-Culver, Hoda; Antonenkova, Natalia; Bruinsma, Fiona; Bandera, Elisa V; Bean, Yukie T; Beckmann, Matthias W; Bisogna, Maria; Bjorge, Line; Bogdanova, Natalia; Brinton, Louise A; Brooks-Wilson, Angela; Bunker, Clareann H; Butzow, Ralf; Campbell, Ian G; Carty, Karen; Chen, Zhihua; Chen, Y Ann; Chang-Claude, Jenny; Cook, Linda S; Cramer, Daniel W; Cunningham, Julie M; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; du Bois, Andreas; Despierre, Evelyn; Dicks, Ed; Doherty, Jennifer A; Dörk, Thilo; Dürst, Matthias; Easton, Douglas F; Eccles, Diana M; Edwards, Robert P; Ekici, Arif B; Fasching, Peter A; Fridley, Brooke L; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G; Glasspool, Rosalind; Goodman, Marc T; Gronwald, Jacek; Harrington, Patricia; Harter, Philipp; Hasmad, Hanis N; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A T; Hillemanns, Peter; Hogdall, Claus K; Hogdall, Estrid; Hosono, Satoyo; Iversen, Edwin S; Jakubowska, Anna; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y; Jim, Heather; Kellar, Melissa; Kiemeney, Lambertus A; Krakstad, Camilla; Kjaer, Susanne K; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D; Lee, Alice W; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A; Liang, Dong; Lim, Boon Kiong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon F A G; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R; McNeish, Ian; Menon, Usha; Milne, Roger L; Modugno, Francesmary; Moysich, Kirsten B; Ness, Roberta B; Nevanlinna, Heli; Eilber, Ursula; Odunsi, Kunle; Olson, Sara H; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Paul, James; Pearce, Celeste L; Pejovic, Tanja; Pelttari, Liisa M; Permuth-Wey, Jennifer; Pike, Malcolm C; Poole, Elizabeth M; Risch, Harvey A; Rosen, Barry; Rossing, Mary Anne; Rothstein, Joseph H; Rudolph, Anja; Runnebaum, Ingo B; 
Rzepecka, Iwona K; Salvesen, Helga B; Schernhammer, Eva; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C; Spiewankiewicz, Beata; Sucheston-Campbell, Lara; Teo, Soo-Hwang; Terry, Kathryn L; Thompson, Pamela J; Thomsen, Lotte; Tangen, Ingvild L; Tworoger, Shelley S; van Altena, Anne M; Vierkant, Robert A; Vergote, Ignace; Walsh, Christine S; Wang-Gohrke, Shan; Wentzensen, Nicolas; Whittemore, Alice S; Wicklund, Kristine G; Wilkens, Lynne R; Wu, Anna H; Wu, Xifeng; Woo, Yin-Ling; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Kelemen, Linda E; Berchuck, Andrew; Schildkraut, Joellen M; Ramus, Susan J; Goode, Ellen L; Monteiro, Alvaro N A; Gayther, Simon A; Narod, Steven A; Pharoah, Paul D P; Sellers, Thomas A; Phelan, Catherine M

2015-12-01

Epithelial-mesenchymal transition (EMT) is a process whereby epithelial cells assume mesenchymal characteristics to facilitate cancer metastasis. However, EMT also contributes to the initiation and development of primary tumors. Prior studies that explored the hypothesis that EMT gene variants contribute to epithelial ovarian carcinoma (EOC) risk have been based on small sample sizes and none have sought replication in an independent population. We screened 15,816 single-nucleotide polymorphisms (SNPs) in 296 genes in a discovery phase using data from a genome-wide association study of EOC among women of European ancestry (1,947 cases and 2,009 controls) and identified 793 variants in 278 EMT-related genes that were nominally (P < 0.05) associated with invasive EOC. These SNPs were then genotyped in a larger study of 14,525 invasive-cancer patients and 23,447 controls. A P-value <0.05 and a false discovery rate (FDR) <0.2 were considered statistically significant. In the larger dataset, GPC6/GPC5 rs17702471 was associated with the endometrioid subtype among Caucasians (odds ratio (OR) = 1.16, 95% CI = 1.07-1.25, P = 0.0003, FDR = 0.19), whereas F8 rs7053448 (OR = 1.69, 95% CI = 1.27-2.24, P = 0.0003, FDR = 0.12), F8 rs7058826 (OR = 1.69, 95% CI = 1.27-2.24, P = 0.0003, FDR = 0.12), and CAPN13 rs1983383 (OR = 0.79, 95% CI = 0.69-0.90, P = 0.0005, FDR = 0.12) were associated with combined invasive EOC among Asians. In silico functional analyses revealed that GPC6/GPC5 rs17702471 coincided with DNA regulatory elements. These results suggest that EMT gene variants do not appear to play a significant role in the susceptibility to EOC. © 2015 WILEY PERIODICALS, INC.
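The abstract applies two thresholds together (P < 0.05 and FDR < 0.2) but does not say which FDR procedure was used. The sketch below assumes the common Benjamini-Hochberg step-up adjustment; the function names are hypothetical, and any p-values passed in are the caller's own, not data from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues, p_cutoff=0.05, fdr_cutoff=0.2):
    """Flag tests passing both the nominal and the FDR threshold."""
    q = benjamini_hochberg(pvalues)
    return [p < p_cutoff and qi < fdr_cutoff for p, qi in zip(pvalues, q)]
```

Because the adjustment scales each p-value by the number of tests divided by its rank, a variant with P = 0.0003 can still carry an FDR near 0.2 when several hundred SNPs are tested, which is consistent in spirit with the values quoted in the abstract.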
Epithelial-Mesenchymal Transition (EMT) gene variants and Epithelial Ovarian Cancer (EOC) risk

PubMed Central

Amankwah, Ernest K.; Lin, Hui-Yi; Tyrer, Jonathan P.; Lawrenson, Kate; Dennis, Joe; Chornokur, Ganna; Aben, Katja KH.; Anton-Culver, Hoda; Antonenkova, Natalia; Bruinsma, Fiona; Bandera, Elisa V.; Bean, Yukie T.; Beckmann, Matthias W.; Bisogna, Maria; Bjorge, Line; Bogdanova, Natalia; Brinton, Louise A.; Brooks-Wilson, Angela; Bunker, Clareann H.; Butzow, Ralf; Campbell, Ian G.; Carty, Karen; Chen, Zhihua; Chen, Y. Ann; Chang-Claude, Jenny; Cook, Linda S.; Cramer, Daniel W.; Cunningham, Julie M.; Cybulski, Cezary; Dansonka-Mieszkowska, Agnieszka; du Bois, Andreas; Despierre, Evelyn; Dicks, Ed; Doherty, Jennifer A.; Dörk, Thilo; Dürst, Matthias; Easton, Douglas F.; Eccles, Diana M.; Edwards, Robert P.; Ekici, Arif B.; Fasching, Peter A.; Fridley, Brooke L.; Gao, Yu-Tang; Gentry-Maharaj, Aleksandra; Giles, Graham G.; Glasspool, Rosalind; Goodman, Marc T.; Gronwald, Jacek; Harrington, Patricia; Harter, Philipp; Hasmad, Hanis N.; Hein, Alexander; Heitz, Florian; Hildebrandt, Michelle A.T.; Hillemanns, Peter; Hogdall, Claus K.; Hogdall, Estrid; Hosono, Satoyo; Iversen, Edwin S.; Jakubowska, Anna; Jensen, Allan; Ji, Bu-Tian; Karlan, Beth Y.; Jim, Heather; Kellar, Melissa; Kiemeney, Lambertus A.; Krakstad, Camilla; Kjaer, Susanne K.; Kupryjanczyk, Jolanta; Lambrechts, Diether; Lambrechts, Sandrina; Le, Nhu D.; Lee, Alice W.; Lele, Shashi; Leminen, Arto; Lester, Jenny; Levine, Douglas A.; Liang, Dong; Lim, Boon Kiong; Lissowska, Jolanta; Lu, Karen; Lubinski, Jan; Lundvall, Lene; Massuger, Leon F.A.G.; Matsuo, Keitaro; McGuire, Valerie; McLaughlin, John R.; McNeish, Ian; Menon, Usha; Milne, Roger L.; Modugno, Francesmary; Moysich, Kirsten B.; Ness, Roberta B.; Nevanlinna, Heli; Eilber, Ursula; Odunsi, Kunle; Olson, Sara H.; Orlow, Irene; Orsulic, Sandra; Weber, Rachel Palmieri; Paul, James; Pearce, Celeste L.; Pejovic, Tanja; Pelttari, Liisa M.; Permuth-Wey, Jennifer; Pike, Malcolm C.; Poole, Elizabeth M.; Risch, Harvey A.; Rosen, Barry; Rossing, Mary Anne; Rothstein, 
Joseph H.; Rudolph, Anja; Runnebaum, Ingo B.; Rzepecka, Iwona K.; Salvesen, Helga B.; Schernhammer, Eva; Schwaab, Ira; Shu, Xiao-Ou; Shvetsov, Yurii B.; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa C.; Spiewankiewicz, Beata; Sucheston-Campbell, Lara; Teo, Soo-Hwang; Terry, Kathryn L.; Thompson, Pamela J.; Thomsen, Lotte; Tangen, Ingvild L.; Tworoger, Shelley S.; van Altena, Anne M.; Vierkant, Robert A.; Vergote, Ignace; Walsh, Christine S.; Wang-Gohrke, Shan; Wentzensen, Nicolas; Whittemore, Alice S.; Wicklund, Kristine G.; Wilkens, Lynne R.; Wu, Anna H.; Wu, Xifeng; Woo, Yin-Ling; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Kelemen, Linda E.; Berchuck, Andrew; Schildkraut, Joellen M.; Ramus, Susan J.; Goode, Ellen L.; Monteiro, Alvaro N.A.; Gayther, Simon A.; Narod, Steven A.; Pharoah, Paul D. P.; Sellers, Thomas A.; Phelan, Catherine M.

2016-01-01

Introduction Epithelial-mesenchymal transition (EMT) is a process whereby epithelial cells assume mesenchymal characteristics to facilitate cancer metastasis. However, EMT also contributes to the initiation and development of primary tumors. Prior studies that explored the hypothesis that EMT gene variants contribute to epithelial ovarian cancer (EOC) risk have been based on small sample sizes, and none have sought replication in an independent population. Methods We screened 1,254 SNPs in 296 genes in a discovery phase using data from a genome-wide association study of EOC among women of European ancestry (1,947 cases and 2,009 controls) and identified 793 variants in 278 EMT-related genes that were nominally (p<0.05) associated with invasive EOC. These SNPs were then genotyped in a larger study of 14,525 invasive-cancer patients and 23,447 controls. A p-value <0.05 and a false discovery rate (FDR) <0.2 were considered statistically significant. Results In the larger dataset, GPC6/GPC5 rs17702471 was associated with the endometrioid subtype among Caucasians (OR=1.16, 95%CI=1.07–1.25, p=0.0003, FDR=0.19), while F8 rs7053448 (OR=1.69, 95%CI=1.27–2.24, p=0.0003, FDR=0.12), F8 rs7058826 (OR=1.69, 95%CI=1.27–2.24, p=0.0003, FDR=0.12), and CAPN13 rs1983383 (OR=0.79, 95%CI=0.69–0.90, p=0.0005, FDR=0.12) were associated with combined invasive EOC among Asians. In silico functional analyses revealed that GPC6/GPC5 rs17702471 coincided with DNA regulatory elements. Conclusion These results suggest that EMT gene variants do not appear to play a significant role in the susceptibility to EOC. PMID:26399219
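The two-part significance rule in the abstract above (p<0.05 together with FDR<0.2) can be illustrated with a short sketch. The abstract does not say which FDR procedure was used, so the Benjamini-Hochberg adjustment below is an assumption, and the function names `benjamini_hochberg` and `significant` are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the abstract does not name its FDR method; BH is used
    here only as a common choice.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(p_values, alpha=0.05, fdr=0.2):
    """Flag tests that pass both the nominal p-value and the FDR threshold."""
    q_values = benjamini_hochberg(p_values)
    return [p < alpha and q < fdr for p, q in zip(p_values, q_values)]
```

With thousands of SNPs tested, the FDR condition is typically the stricter of the two, which is why a nominal p-value alone is not treated as sufficient.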
Accelerating pathway evolution by increasing the gene dosage of chromosomal segments.

PubMed

Tumen-Velasquez, Melissa; Johnson, Christopher W; Ahmed, Alaa; Dominick, Graham; Fulk, Emily M; Khanna, Payal; Lee, Sarah A; Schmidt, Alicia L; Linger, Jeffrey G; Eiteman, Mark A; Beckham, Gregg T; Neidle, Ellen L

2018-06-18

Experimental evolution is a critical tool in many disciplines, including metabolic engineering and synthetic biology. However, current methods rely on the chance occurrence of a key step that can dramatically accelerate evolution in natural systems, namely increased gene dosage. Our studies sought to induce the targeted amplification of chromosomal segments to facilitate rapid evolution. Since increased gene dosage confers novel phenotypes and genetic redundancy, we developed a method, Evolution by Amplification and Synthetic Biology (EASy), to create tandem arrays of chromosomal regions. In Acinetobacter baylyi, EASy was demonstrated on an important bioenergy problem, the catabolism of lignin-derived aromatic compounds. The initial focus on guaiacol (2-methoxyphenol), a common lignin degradation product, led to the discovery of Amycolatopsis genes (gcoAB) encoding a cytochrome P450 enzyme that converts guaiacol to catechol. However, chromosomal integration of gcoAB in Pseudomonas putida or A. baylyi did not enable guaiacol to be used as the sole carbon source despite catechol being a growth substrate. In ∼1,000 generations, EASy yielded alleles that in single chromosomal copy confer growth on guaiacol. Different variants emerged, including fusions between GcoA and CatA (catechol 1,2-dioxygenase). This study illustrates the power of harnessing chromosomal gene amplification to accelerate the evolution of desirable traits.
Computational biology for cardiovascular biomarker discovery.

PubMed

Azuaje, Francisco; Devaux, Yvan; Wagner, Daniel

2009-07-01

Computational biology is essential in the process of translating biological knowledge into clinical practice, as well as in the understanding of biological phenomena based on the resources and technologies originating from the clinical environment. One such key contribution of computational biology is the discovery of biomarkers for predicting clinical outcomes using 'omic' information. This process involves the predictive modelling and integration of different types of data and knowledge for screening, diagnostic or prognostic purposes. Moreover, this requires the design and combination of different methodologies based on statistical analysis and machine learning. This article introduces key computational approaches and applications to biomarker discovery based on different types of 'omic' data. Although we emphasize applications in cardiovascular research, the computational requirements and advances discussed here are also relevant to other domains. We will start by introducing some of the contributions of computational biology to translational research, followed by an overview of methods and technologies used for the identification of biomarkers with predictive or classification value. The main types of 'omic' approaches to biomarker discovery will be presented with specific examples from cardiovascular research. This will include a review of computational methodologies for single-source and integrative data applications. Major computational methods for model evaluation will be described together with recommendations for reporting models and results. We will present recent advances in cardiovascular biomarker discovery based on the combination of gene expression and functional network analyses. The review will conclude with a discussion of key challenges for computational biology, including perspectives from the biosciences and clinical areas.
Simulating the drug discovery pipeline: a Monte Carlo approach

PubMed Central

2012-01-01

Background The early drug discovery phase in pharmaceutical research and development marks the beginning of a long, complex and costly process of bringing a new molecular entity to market. As such, it plays a critical role in helping to maintain a robust downstream clinical development pipeline. Despite its importance, however, to our knowledge there are no published in silico models to simulate the progression of discrete virtual projects through a discovery milestone system. Results Multiple variables were tested and their impact on productivity metrics examined. Simulations predict that there is an optimum number of scientists for a given drug discovery portfolio, beyond which output in the form of preclinical candidates per year will remain flat. The model further predicts that the frequency of compounds to successfully pass the candidate selection milestone as a function of time will be irregular, with projects entering preclinical development in clusters marked by periods of low apparent productivity. Conclusions The model may be useful as a tool to facilitate analysis of historical growth and achievement over time, help gauge current working group progress against future performance expectations, and provide the basis for dialogue regarding working group best practices and resource deployment strategies. PMID:23186040
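The idea of moving discrete virtual projects through a milestone system can be sketched as a toy Monte Carlo simulation. This is not the authors' model: the per-milestone pass probabilities, the function name `simulate_portfolio`, and the omission of staffing and timing effects are all simplifying assumptions made for illustration.

```python
import random


def simulate_portfolio(n_projects, pass_probs, n_runs=1000, seed=0):
    """Estimate the mean number of projects that clear every milestone.

    pass_probs holds one hypothetical pass probability per milestone;
    a project becomes a preclinical candidate only if it passes all of them.
    """
    rng = random.Random(seed)  # seeded so repeated runs are reproducible
    total_candidates = 0
    for _ in range(n_runs):
        for _ in range(n_projects):
            # Each milestone is an independent Bernoulli trial.
            if all(rng.random() < p for p in pass_probs):
                total_candidates += 1
    return total_candidates / n_runs
```

For example, `simulate_portfolio(20, [0.5, 0.5])` should return a value close to 20 × 0.5 × 0.5 = 5. A fuller model along the lines described in the abstract would also track elapsed time and the number of scientists assigned to each project.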
Cogena, a novel tool for co-expressed gene-set enrichment analysis, applied to drug repositioning and drug mode of action discovery.

PubMed

Jia, Zhilong; Liu, Ying; Guan, Naiyang; Bo, Xiaochen; Luo, Zhigang; Barnes, Michael R

2016-05-27

Drug repositioning, finding new indications for existing drugs, has gained much recent attention as a potentially efficient and economical strategy for accelerating new therapies into the clinic. Although improvement in the sensitivity of computational drug repositioning methods has identified numerous credible repositioning opportunities, few have been progressed. Arguably the "black box" nature of drug action in a new indication is one of the main blocks to progression, highlighting the need for methods that inform on the broader target mechanism in the disease context. We demonstrate that the analysis of co-expressed genes may be a critical first step towards illumination of both disease pathology and mode of drug action. We achieve this using a novel framework, co-expressed gene-set enrichment analysis (cogena), for co-expression analysis of gene expression signatures and gene set enrichment analysis of co-expressed genes. The cogena framework enables simultaneous, pathway-driven disease and drug repositioning analysis. Cogena can be used to illuminate coordinated changes within disease transcriptomes and identify drugs acting mechanistically within this framework. We illustrate this using a psoriatic skin transcriptome as an exemplar, and recover two widely used psoriasis drugs (Methotrexate and Ciclosporin) with distinct modes of action. Cogena out-performs the results of Connectivity Map and NFFinder webservers in similar disease transcriptome analyses. Furthermore, we investigated the literature support for the other top-ranked compounds to treat psoriasis and showed how the outputs of cogena analysis can contribute new insight to support the progression of drugs into the clinic. We have made cogena freely available within Bioconductor or https://github.com/zhilongjia/cogena. In conclusion, by targeting co-expressed genes within disease transcriptomes, cogena offers novel biological insight, which can be effectively harnessed for drug discovery and
DeepBase: annotation and discovery of microRNAs and other noncoding RNAs from deep-sequencing data.

PubMed

Yang, Jian-Hua; Qu, Liang-Hu

2012-01-01

Recent advances in high-throughput deep-sequencing technology have produced large numbers of short and long RNA sequences and enabled the detection and profiling of known and novel microRNAs (miRNAs) and other noncoding RNAs (ncRNAs) at unprecedented sensitivity and depth. In this chapter, we describe the use of deepBase, a database that we have developed to integrate all public deep-sequencing data and to facilitate the comprehensive annotation and discovery of miRNAs and other ncRNAs from these data. deepBase provides an integrative, interactive, and versatile web graphical interface to evaluate miRBase-annotated miRNA genes and other known ncRNAs, explore the expression patterns of miRNAs and other ncRNAs, and discover novel miRNAs and other ncRNAs from deep-sequencing data. deepBase also provides a deepView genome browser to comparatively analyze these data at multiple levels. deepBase is available at http://deepbase.sysu.edu.cn/.
The discovery of zinc fingers and their development for practical applications in gene regulation and genome manipulation.

PubMed

Klug, Aaron

2010-02-01

A long-standing goal of molecular biologists has been to construct DNA-binding proteins for the control of gene expression. The classical Cys2His2 (C2H2) zinc finger design is ideally suited for such purposes. Discriminating between closely related DNA sequences both in vitro and in vivo, this naturally occurring design was adopted for engineering zinc finger proteins (ZFPs) to target genes specifically. Zinc fingers were discovered in 1985, arising from the interpretation of our biochemical studies on the interaction of the Xenopus protein transcription factor IIIA (TFIIIA) with 5S RNA. Subsequent structural studies revealed its three-dimensional structure and its interaction with DNA. Each finger constitutes a self-contained domain stabilized by a zinc (Zn) ion ligated to a pair of cysteines and a pair of histidines and also by an inner structural hydrophobic core. This discovery showed not only a new protein fold but also a novel principle of DNA recognition. Whereas other DNA-binding proteins generally make use of the 2-fold symmetry of the double helix, functioning as homo- or heterodimers, zinc fingers can be linked linearly in tandem to recognize nucleic acid sequences of varying lengths. This modular design offers a large number of combinatorial possibilities for the specific recognition of DNA (or RNA). It is therefore not surprising that the zinc finger is found widespread in nature, including 3% of the genes of the human genome. The zinc finger design can be used to construct DNA-binding proteins for specific intervention in gene expression. By fusing selected zinc finger peptides to repression or activation domains, genes can be selectively switched off or on by targeting the peptide to the desired gene target. It was also suggested that by combining an appropriate zinc finger peptide with other effector or functional domains, e.g. from nucleases or integrases to form chimaeric proteins, genomes could be modified or manipulated. The first example of the
Student Learning Centre (SLC) Embraces the New Melbourne Model of Teaching: Facilitating Collaborative Learning

ERIC Educational Resources Information Center

Ball, Sarah

2010-01-01

Learning is about discovery and change. As schools and universities look to the future, it is fundamental that they provide environments that facilitate collaborative learning and act as points for interaction and social activity. The redevelopment of the existing Engineering Library into a Student Learning Centre (SLC) embraces the new Melbourne…
Identification of rat lung-specific microRNAs by microRNA microarray: valuable discoveries for the facilitation of lung research.

PubMed

Wang, Yang; Weng, Tingting; Gou, Deming; Chen, Zhongming; Chintagari, Narendranath Reddy; Liu, Lin

2007-01-24

An important mechanism for gene regulation utilizes small non-coding RNAs called microRNAs (miRNAs). These small RNAs play important roles in tissue development, cell differentiation and proliferation, lipid and fat metabolism, stem cells, exocytosis, diseases and cancers. To date, relatively little is known about the functions of miRNAs in the lung outside of lung cancer. In this study, we utilized a rat miRNA microarray containing 216 miRNA probes, printed in-house, to detect the expression of miRNAs in the rat lung compared to the rat heart, brain, liver, kidney and spleen. Statistical analysis using Significance Analysis of Microarrays (SAM) and Tukey Honestly Significant Difference (HSD) revealed 2 miRNAs (miR-195 and miR-200c) expressed specifically in the lung and 9 miRNAs co-expressed in the lung and another organ. Twelve selected miRNAs were verified by Northern blot analysis. The identified lung-specific miRNAs from this work will facilitate functional studies of miRNAs during normal physiological and pathophysiological processes of the lung.
Matchmaking facilitates the diagnosis of an autosomal-recessive mitochondrial disease caused by biallelic mutation of the tRNA isopentenyltransferase (TRIT1) gene.

PubMed

Kernohan, Kristin D; Dyment, David A; Pupavac, Mihaela; Cramer, Zvi; McBride, Arran; Bernard, Genevieve; Straub, Isabella; Tetreault, Martine; Hartley, Taila; Huang, Lijia; Sell, Erick; Majewski, Jacek; Rosenblatt, David S; Shoubridge, Eric; Mhanni, Aziz; Myers, Tara; Proud, Virginia; Vergano, Samanta; Spangler, Brooke; Farrow, Emily; Kussman, Jennifer; Safina, Nicole; Saunders, Carol; Boycott, Kym M; Thiffault, Isabelle

2017-05-01

Deleterious variants in the same gene present in two or more families with overlapping clinical features provide convincing evidence of a disease-gene association; this can be a challenge in the study of ultrarare diseases. To facilitate the identification of additional families, several groups have created "matching" platforms. We describe four individuals from three unrelated families "matched" by GeneMatcher and MatchMakerExchange. Individuals had microcephaly, developmental delay, epilepsy, and recessive mutations in TRIT1. A single homozygous mutation in TRIT1 associated with similar features had previously been reported in one family. The identification of these individuals provides additional evidence to support TRIT1 as the disease-causing gene and interprets the variants as "pathogenic." TRIT1 functions to modify mitochondrial tRNAs and is necessary for protein translation. We show that dysfunctional TRIT1 results in decreased levels of select mitochondrial proteins. Our findings confirm the TRIT1 disease association and advance the phenotypic and molecular understanding of this disorder. © 2017 Wiley Periodicals, Inc.
Characterization of Capsicum annuum Genetic Diversity and Population Structure Based on Parallel Polymorphism Discovery with a 30K Unigene Pepper GeneChip

PubMed Central

Hill, Theresa A.; Ashrafi, Hamid; Reyes-Chin-Wo, Sebastian; Yao, JiQiang; Stoffel, Kevin; Truco, Maria-Jose; Kozik, Alexander; Michelmore, Richard W.; Van Deynze, Allen

2013-01-01

The widely cultivated pepper, Capsicum spp., important as a vegetable and spice crop world-wide, is one of the most diverse crops. To enhance breeding programs, a detailed characterization of Capsicum diversity including morphological, geographical and molecular data is required. Currently, molecular data characterizing Capsicum genetic diversity is limited. The development and application of high-throughput genome-wide markers in Capsicum will facilitate more detailed molecular characterization of germplasm collections, genetic relationships, and the generation of ultra-high density maps. We have developed the Pepper GeneChip® array from Affymetrix for polymorphism detection and expression analysis in Capsicum. Probes on the array were designed from 30,815 unigenes assembled from expressed sequence tags (ESTs). Our array design provides a maximum redundancy of 13 probes per base pair position allowing integration of multiple hybridization values per position to detect single position polymorphism (SPP). Hybridization of genomic DNA from 40 diverse C. annuum lines, used in breeding and research programs, and a representative from three additional cultivated species (C. frutescens, C. chinense and C. pubescens) detected 33,401 SPP markers within 13,323 unigenes. Among the C. annuum lines, 6,426 SPPs covering 3,818 unigenes were identified. An estimated three-fold reduction in diversity was detected in non-pungent compared with pungent lines; however, we were able to detect 251 highly informative markers across these C. annuum lines. In addition, an 8.7 cM region without polymorphism was detected around Pun1 in non-pungent C. annuum. An analysis of genetic relatedness and diversity using the software Structure revealed clustering of the germplasm, which was confirmed with statistical support by principal components analysis (PCA) and phylogenetic analysis. This research demonstrates the effectiveness of parallel high-throughput discovery and application of genome
Comparative Analysis of Syntenic Genes in Grass Genomes Reveals Accelerated Rates of Gene Structure and Coding Sequence Evolution in Polyploid Wheat

PubMed Central

Akhunov, Eduard D.; Sehgal, Sunish; Liang, Hanquan; Wang, Shichen; Akhunova, Alina R.; Kaur, Gaganpreet; Li, Wanlong; Forrest, Kerrie L.; See, Deven; Šimková, Hana; Ma, Yaqin; Hayden, Matthew J.; Luo, Mingcheng; Faris, Justin D.; Doležel, Jaroslav; Gill, Bikram S.

2013-01-01

Cycles of whole-genome duplication (WGD) and diploidization are hallmarks of eukaryotic genome evolution and speciation. Polyploid wheat (Triticum aestivum) has had a massive increase in genome size largely due to recent WGDs. How these processes may impact the dynamics of gene evolution was studied by comparing the patterns of gene structure changes, alternative splicing (AS), and codon substitution rates among wheat and model grass genomes. In orthologous gene sets, significantly more acquired and lost exonic sequences were detected in wheat than in model grasses. In wheat, 35% of these gene structure rearrangements resulted in frame-shift mutations and premature termination codons. An increased codon mutation rate in the wheat lineage compared with Brachypodium distachyon was found for 17% of orthologs. The discovery of premature termination codons in 38% of expressed genes was consistent with ongoing pseudogenization of the wheat genome. The rates of AS within the individual wheat subgenomes (21%–25%) were similar to diploid plants. However, we uncovered a high level of AS pattern divergence between the duplicated homeologous copies of genes. Our results are consistent with the accelerated accumulation of AS isoforms, nonsynonymous mutations, and gene structure rearrangements in the wheat lineage, likely due to genetic redundancy created by WGDs. Whereas these processes mostly contribute to the degeneration of a duplicated genome and its diploidization, they have the potential to facilitate the origin of new functional variations, which, upon selection in the evolutionary lineage, may play an important role in the origin of novel traits. PMID:23124323
Plant Enhancers: A Call for Discovery.

PubMed

Weber, Blaise; Zicola, Johan; Oka, Rurika; Stam, Maike

2016-11-01

Higher eukaryotes typically contain many different cell types, displaying different cellular functions that are influenced by biotic and abiotic cues. The different functions are characterized by specific gene expression patterns mediated by regulatory sequences such as transcriptional enhancers. Recent genome-wide approaches have identified thousands of enhancers in animals, reviving interest in enhancers in gene regulation. Although the regulatory roles of plant enhancers are as crucial as those in animals, genome-wide approaches have only very recently been applied to plants. Here we review characteristics of enhancers at the DNA and chromatin level in plants and other species, their similarities and differences, and techniques widely used for genome-wide discovery of enhancers in animal systems that can be implemented in plants. Copyright © 2016 Elsevier Ltd. All rights reserved.
The development of high-content screening (HCS) technology and its importance to drug discovery.

PubMed

Fraietta, Ivan; Gasparri, Fabio

2016-01-01

High-content screening (HCS) was introduced about twenty years ago as a promising analytical approach to facilitate some critical aspects of drug discovery. Its application has spread progressively within the pharmaceutical industry and academia to the point that it today represents a fundamental tool in supporting drug discovery and development. Here, the authors review some of the significant progress in the HCS field in terms of biological models and assay readouts. They highlight the importance of high-content screening in drug discovery, as testified by its numerous applications in a variety of therapeutic areas: oncology, infective diseases, cardiovascular and neurodegenerative diseases. They also dissect the role of HCS technology in different phases of the drug discovery pipeline: target identification, primary compound screening, secondary assays, mechanism of action studies and in vitro toxicology. Recent advances in cellular assay technologies, such as the introduction of three-dimensional (3D) cultures, induced pluripotent stem cells (iPSCs) and genome editing technologies (e.g., CRISPR/Cas9), have tremendously expanded the potential of high-content assays to contribute to the drug discovery process. Increasingly predictive cellular models and readouts, together with the development of more sophisticated and affordable HCS readers, will further consolidate the role of HCS technology in drug discovery.
Knowledge Discovery from Vibration Measurements

PubMed Central

Li, Jian; Wang, Daoyao

2014-01-01

The framework and particular algorithms of the pattern recognition process are widely adopted in structural health monitoring (SHM). However, as part of the overall process of knowledge discovery from databases (KDD), pattern recognition yields only changes, and patterns of changes, in data features. In this paper, based on the similarity between KDD and SHM and considering the particularity of SHM problems, a four-step framework of SHM is proposed which extends the final goal of SHM from detecting damage to extracting knowledge to facilitate decision making. The purposes and proper methods of each step of this framework are discussed. To demonstrate the proposed SHM framework, a specific SHM method composed of second-order structural parameter identification, statistical control chart analysis, and system reliability analysis is then presented. To examine the performance of this SHM method, real sensor data measured from a lab-scale steel bridge model structure are used. The developed four-step framework of SHM has the potential to clarify the process of SHM and to facilitate the further development of SHM techniques. PMID:24574933
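The statistical control chart step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the chart type, so a Shewhart-style individuals chart with mean ± 3 standard deviation limits is assumed here; the function names and the idea of charting an identified structural parameter are hypothetical.

```python
import statistics


def control_limits(baseline, k=3.0):
    """Lower and upper control limits from baseline (healthy-state) data.

    Assumption: a Shewhart-style chart with limits at mean +/- k * stdev.
    """
    mean = statistics.fmean(baseline)
    sd = statistics.stdev(baseline)  # sample standard deviation
    return mean - k * sd, mean + k * sd


def out_of_control(samples, limits):
    """Indices of new measurements that fall outside the control limits."""
    lower, upper = limits
    return [i for i, x in enumerate(samples) if x < lower or x > upper]
```

In an SHM setting, the baseline would hold a feature identified from the undamaged structure, and flagged indices would mark measurements worth passing on to the reliability analysis step.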
GeoSearch: A lightweight broking middleware for geospatial resources discovery

NASA Astrophysics Data System (ADS)

Gui, Z.; Yang, C.; Liu, K.; Xia, J.

2012-12-01

With petabytes of geodata and thousands of geospatial web services available over the Internet, it is critical to support geoscience research and applications by finding the best-fit geospatial resources from the massive and heterogeneous resources. Past decades' developments witnessed the operation of many service components to facilitate geospatial resource management and discovery. However, efficient and accurate geospatial resource discovery is still a big challenge for the following reasons: 1) The entry barriers (also called "learning curves") hinder the usability of discovery services to end users. Different portals and catalogues adopt various access protocols, metadata formats and GUI styles to organize, present and publish metadata. It is hard for end users to learn all these technical details and differences. 2) The cost of federating heterogeneous services is high. To provide sufficient resources and facilitate data discovery, many registries adopt a periodic harvesting mechanism to retrieve metadata from other federated catalogues. These time-consuming processes lead to network and storage burdens, data redundancy, and the overhead of maintaining data consistency. 3) Heterogeneous semantics cause problems in data discovery. Since keyword matching is still the primary search method in many operational discovery services, the search accuracy (precision and recall) is hard to guarantee. Semantic technologies (such as semantic reasoning and similarity evaluation) offer a solution to these issues. However, integrating semantic technologies with existing services is challenging due to the expandability limitations on the service frameworks and metadata templates. 4) The capabilities to help users make a final selection are inadequate. Most of the existing search portals lack intuitive and diverse information visualization methods and functions (sort, filter) to present, explore and analyze search results. Furthermore, the presentation of the value
Facilitating Students' Interaction with Real Gas Properties Using a Discovery-Based Approach and Molecular Dynamics Simulations

ERIC Educational Resources Information Center

Sweet, Chelsea; Akinfenwa, Oyewumi; Foley, Jonathan J., IV

2018-01-01

We present an interactive discovery-based approach to studying the properties of real gases using simple, yet realistic, molecular dynamics software. Use of this approach opens up a variety of opportunities for students to interact with the behaviors and underlying theories of real gases. Students can visualize gas behavior under a variety of…

Insights into inner ear-specific gene regulation: epigenetics and non-coding RNAs in inner ear development and regeneration

PubMed Central

Avraham, Karen B.

2016-01-01

The vertebrate inner ear houses highly specialized sensory organs, tuned to detect and encode sound, head motion and gravity. Gene expression programs under the control of transcription factors orchestrate the formation and specialization of the non-sensory inner ear labyrinth and its sensory constituents. More recently, epigenetic factors and non-coding RNAs emerged as an additional layer of gene regulation, both in inner ear development and disease. In this review, we provide an overview on how epigenetic modifications and non-coding RNAs, in particular microRNAs (miRNAs), influence gene expression and summarize recent discoveries that highlight their critical role in the proper formation of the inner ear labyrinth and its sensory organs. In contrast to non-mammalian vertebrates, adult mammals lack the ability to regenerate inner ear mechano-sensory hair cells. Finally, we discuss recent insights into how epigenetic factors and miRNAs may facilitate, or in the case of mammals, restrict sensory hair cell regeneration. PMID:27836639
AKT phosphorylates H3-threonine 45 to facilitate termination of gene transcription in response to DNA damage.

PubMed

Lee, Jong-Hyuk; Kang, Byung-Hee; Jang, Hyonchol; Kim, Tae Wan; Choi, Jinmi; Kwak, Sojung; Han, Jungwon; Cho, Eun-Jung; Youn, Hong-Duk

2015-05-19

Post-translational modifications of core histones affect various cellular processes, primarily through transcription. However, their relationship with the termination of transcription has remained largely unknown. In this study, we show that DNA damage-activated AKT phosphorylates threonine 45 of core histone H3 (H3-T45). By genome-wide chromatin immunoprecipitation sequencing (ChIP-seq) analysis, H3-T45 phosphorylation was distributed throughout DNA damage-responsive gene loci, particularly immediately after the transcription termination site. The H3-T45 phosphorylation pattern closely resembled that of RNA polymerase II C-terminal domain (CTD) serine 2 phosphorylation, which establishes the transcription termination signal. AKT1 was more effective than AKT2 in phosphorylating H3-T45. Blocking H3-T45 phosphorylation by inhibiting AKT or through amino acid substitution limited RNA decay downstream of mRNA cleavage sites and decreased RNA polymerase II release from chromatin. Our findings suggest that AKT-mediated phosphorylation of H3-T45 regulates the processing of the 3' end of DNA damage-activated genes to facilitate transcriptional termination. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Elements of discovery.

PubMed

Toledo-Pereyra, Luis H

2008-01-01

I understand discovery as the essence of thinking man, or to paraphrase the notable French philosopher René Descartes, "I think, therefore I discover." In this study, I introduce discovery as the foundation of modern science. Discovery consists of six stages or elements, including: concept, belief, ability, support, proof, and protection. Each element is discussed within the context of the whole discovery enterprise. Fundamental tenets for understanding discovery are given throughout the paper, and a few examples illustrate the significance of some of the most important elements. I invite clinicians, researchers, and/or clinical researchers to integrate themselves into the active process of discovery. Remember--I think, therefore I discover.
Natural product discovery: past, present, and future.

PubMed

Katz, Leonard; Baltz, Richard H

2016-03-01

Microorganisms have provided abundant sources of natural products which have been developed as commercial products for human medicine, animal health, and plant crop protection. In the early years of natural product discovery from microorganisms (The Golden Age), new antibiotics were found with relative ease from low-throughput fermentation and whole cell screening methods. Later, molecular genetic and medicinal chemistry approaches were applied to modify and improve the activities of important chemical scaffolds, and more sophisticated screening methods were directed at target disease states. In the 1990s, the pharmaceutical industry moved to high-throughput screening of synthetic chemical libraries against many potential therapeutic targets, including new targets identified from the human genome sequencing project, largely to the exclusion of natural products, and discovery rates dropped dramatically. Nonetheless, natural products continued to provide key scaffolds for drug development. In the current millennium, it was discovered from genome sequencing that microbes with large genomes have the capacity to produce about ten times as many secondary metabolites as was previously recognized. Indeed, the most gifted actinomycetes have the capacity to produce around 30-50 secondary metabolites. With the precipitous drop in cost for genome sequencing, it is now feasible to sequence thousands of actinomycete genomes to identify the "biosynthetic dark matter" as sources for the discovery of new and novel secondary metabolites. Advances in bioinformatics, mass spectrometry, proteomics, transcriptomics, metabolomics and gene expression are driving the new field of microbial genome mining for applications in natural product discovery and development.
Co-clustering phenome–genome for phenotype classification and disease gene discovery

PubMed Central

Hwang, TaeHyun; Atluri, Gowtham; Xie, MaoQiang; Dey, Sanjoy; Hong, Changjin; Kumar, Vipin; Kuang, Rui

2012-01-01

Understanding the categorization of human diseases is critical for reliably identifying disease causal genes. Recently, genome-wide studies of abnormal chromosomal locations related to diseases have mapped >2000 phenotype–gene relations, which provide valuable information for classifying diseases and identifying candidate genes as drug targets. In this article, a regularized non-negative matrix tri-factorization (R-NMTF) algorithm is introduced to co-cluster phenotypes and genes, and simultaneously detect associations between the detected phenotype clusters and gene clusters. The R-NMTF algorithm factorizes the phenotype–gene association matrix under the prior knowledge from phenotype similarity network and protein–protein interaction network, supervised by the label information from known disease classes and biological pathways. In the experiments on disease phenotype–gene associations in OMIM and KEGG disease pathways, R-NMTF significantly improved the classification of disease phenotypes and disease pathway genes compared with support vector machines and Label Propagation in cross-validation on the annotated phenotypes and genes. The newly predicted phenotypes in each disease class are highly consistent with human phenotype ontology annotations. The roles of the new member genes in the disease pathways are examined and validated in the protein–protein interaction subnetworks. Extensive literature review also confirmed many new members of the disease classes and pathways as well as the predicted associations between disease phenotype classes and pathways. PMID:22735708
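The factorization at the core of the R-NMTF abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it shows plain non-negative matrix tri-factorization (X ≈ F S Gᵀ) with standard multiplicative updates, and it omits the network regularization and label supervision that the abstract describes. The function name `nmtf`, the toy phenotype-by-gene matrix, and all parameter values are assumptions made for illustration.

```python
import numpy as np


def nmtf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Plain (unregularized) non-negative matrix tri-factorization.

    Approximates a non-negative matrix X (rows x cols) as F @ S @ G.T, where
    F (rows x k1) holds row-cluster memberships, G (cols x k2) holds
    column-cluster memberships, and S (k1 x k2) links the two sets of clusters.
    Returns F, S, G and the squared reconstruction error before the first
    update and after every update round.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k1))
    S = rng.random((k1, k2))
    G = rng.random((m, k2))
    errors = [float(np.sum((X - F @ S @ G.T) ** 2))]
    for _ in range(n_iter):
        # Multiplicative updates keep every factor non-negative.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        errors.append(float(np.sum((X - F @ S @ G.T) ** 2)))
    return F, S, G, errors


if __name__ == "__main__":
    # Hypothetical association matrix: 4 phenotypes (rows) x 4 genes (columns).
    X = np.array(
        [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 1, 1],
         [0, 0, 1, 1]],
        dtype=float,
    )
    F, S, G, errors = nmtf(X, k1=2, k2=2)
    print("phenotype cluster per row:", F.argmax(axis=1))
    print("gene cluster per column:  ", G.argmax(axis=1))
    print("error before/after:", errors[0], errors[-1])
```

In this sketch the rows of F and G are read as soft cluster memberships, and S indicates which phenotype cluster is associated with which gene cluster. Adding the similarity-network and label terms from the abstract would change the update rules.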
Computational drug discovery

PubMed Central

Ou-Yang, Si-sheng; Lu, Jun-yan; Kong, Xiang-qian; Liang, Zhong-jie; Luo, Cheng; Jiang, Hualiang

2012-01-01

Computational drug discovery is an effective strategy for accelerating and economizing the drug discovery and development process. Because of the dramatic increase in the availability of biological macromolecule and small molecule information, computational drug discovery has been extended and is now applied to nearly every stage in the drug discovery and development workflow, including target identification and validation, lead discovery and optimization and preclinical tests. Over the past decades, computational drug discovery methods such as molecular docking, pharmacophore modeling and mapping, de novo design, molecular similarity calculation and sequence-based virtual screening have been greatly improved. In this review, we present an overview of these important computational methods, platforms and successful applications in this field. PMID:22922346
Novel Biomarker Discovery for Diagnostic and Therapeutic Strategies in Prostate Cancer

DTIC Science & Technology

2015-06-01

PURPOSE: to identify high affinity aptamers that distinguish between prostate cancers that are likely to remain organ-confined and those with potential to...metastasize. SCOPE: This was a pilot project to generate RNA aptamers that selectively react with a prostate cancer cell line that remains confined... Aptamer-Facilitated Biomarker Discovery (AptaBiD) technology. TASKS AND PROGRESS: (1) Non-metastatic LNCaP-Pro-5 cells, metastasis-prone LNCaP-LN3
Novel Biomarker Discovery for Diagnostic and Therapeutic Strategies in Prostate Cancer

DTIC Science & Technology

2014-03-01

aptamers that distinguish between prostate cancers that are likely to remain organ-confined and those with potential to metastasize. The scope of this...pilot is to generate DNA aptamers that selectively react with a prostate cancer cell line that remains confined to the prostate (LNCaP) vs. a...subpopulation of this cell line that has acquired the ability to metastasize aggressively, employing Cell-Selex and Aptamer-Facilitated Biomarker Discovery
De Novo Transcriptome Analysis of an Aerial Microalga Trentepohlia jolithus: Pathway Description and Gene Discovery for Carbon Fixation and Carotenoid Biosynthesis

PubMed Central

Li, Qianqian; Liu, Jianguo; Zhang, Litao; Liu, Qian

2014-01-01

Background: Algae in the order Trentepohliales have a broad geographic distribution and are generally characterized by the presence of abundant β-carotene. The many monographs published to date have mainly focused on their morphology, taxonomy, phylogeny, distribution and reproduction; molecular studies of this order are still rare. High-throughput RNA sequencing (RNA-Seq) technology provides a powerful and efficient method for transcript analysis and gene discovery in Trentepohlia jolithus. Methods/Principal Findings: Illumina HiSeq 2000 sequencing generated 55,007,830 Illumina PE raw reads, which were assembled into 41,328 unigenes. Based on NR annotation, 53.28% of the unigenes (22,018) could be assigned to gene ontology classes with 54 subcategories and 161,451 functional terms. A total of 26,217 (63.44%) assembled unigenes were mapped to 128 KEGG pathways. Furthermore, a set of 5,798 SSRs in 5,206 unigenes and 131,478 putative SNPs were identified. Moreover, the fact that all of the C4 photosynthesis genes exist in T. jolithus suggests a complex carbon acquisition and fixation system. Similarities and differences between T. jolithus and other algae in carotenoid biosynthesis are also described in depth. Conclusions/Significance: This is the first broad transcriptome survey for T. jolithus, increasing the amount of molecular data available for the class Ulvophyceae. As well as providing resources for functional genomics studies, the functional genes and putative pathways identified here will contribute to a better understanding of carbon fixation and fatty acid and carotenoid biosynthesis in T. jolithus. PMID:25254555
Potential biological targets for bioassay development in drug discovery of Sturge-Weber syndrome.

PubMed

Mohammadipanah, Fatemeh; Salimi, Fatemeh

2017-04-29

Sturge-Weber Syndrome (SWS) is a neurocutaneous disease with several clinical manifestations: ocular (glaucoma), cutaneous (port-wine stain), neurological (seizures) and vascular problems. Molecular mechanisms of SWS pathogenesis are initiated by the somatic mutation in GNAQ. Therefore, no definite treatments exist for SWS, and treatment options only mitigate the intensity of its clinical manifestations. Biological assay design for drug discovery against this syndrome demands comprehensive knowledge of the mechanisms involved in its pathogenesis. By analysis of the interrelated molecular targets of SWS, some in vitro bioassay systems can be allotted for drug screening against this syndrome. Development of such bioassay platforms can enable high throughput screening of natural or synthetic compounds in drug discovery programs. Because the study of biological targets and their integration in biological assay design can facilitate effective drug discovery, some potential biological targets and their respective biological assays for SWS drug discovery are proposed in this review. For this purpose, some biological targets for SWS drug discovery such as acetylcholine esterase, alkaline phosphatase, gamma-aminobutyricacidergic, Hypoxia-Inducible Factor (HIF)-1α and -2α are suggested. This article is protected by copyright. All rights reserved.
Rapid Molecular Analysis of the STAT3 Gene in Job Syndrome of Hyper-IgE and Recurrent Infectious Diseases

PubMed Central

Kumánovics, Attila; Wittwer, Carl T.; Pryor, Robert J.; Augustine, Nancy H.; Leppert, Mark F.; Carey, John C.; Ochs, Hans D.; Wedgwood, Ralph J.; Faville, Ralph J.; Quie, Paul G.; Hill, Harry R.

2010-01-01

With the recent discovery of mutations in the STAT3 gene in the majority of patients with classic Hyper-IgE syndrome, it is now possible to make a molecular diagnosis in most of these cases. We have developed a PCR-based high-resolution DNA-melting assay to scan selected exons of the STAT3 gene for mutations responsible for Hyper-IgE syndrome, which is then followed by targeted sequencing. We scanned for mutations in 10 unrelated pedigrees, which include 16 patients with classic Hyper-IgE syndrome. These pedigrees include both sporadic and familial cases and their relatives, and we have found STAT3 mutations in all affected individuals. High-resolution melting analysis allows a single day turn-around time for mutation scanning and targeted sequencing of the STAT3 gene, which will greatly facilitate the rapid diagnosis of the Hyper-IgE syndrome, allowing prompt and appropriate therapy, prophylaxis, improved clinical outcome, and accurate genetic counseling. PMID:20093388
Gene expression in thiazide diuretic or statin users in relation to incident type 2 diabetes.

PubMed

Suchy-Dicey, Astrid; Heckbert, Susan R; Smith, Nicholas L; McKnight, Barbara; Rotter, Jerome I; Chen, Yd Ida; Psaty, Bruce M; Enquobahrie, Daniel A

2014-01-01

Thiazide diuretics and statins are used to improve cardiovascular outcomes, but may also cause type 2 diabetes (T2DM), although mechanisms are unknown. Gene expression studies may facilitate understanding of these associations. Participants from ongoing population-based studies were sampled for these longitudinal studies of peripheral blood microarray gene expression, and followed to incident diabetes. All sampled subjects were statin or thiazide users. Those who developed diabetes during follow-up comprised cases (44 thiazide users; 19 statin users), and were matched on several factors to drug-using controls who did not develop diabetes. Supervised normalization and surrogate variable analyses removed technical bias and confounding. Differentially-expressed genes were those with a false discovery rate Q-value<0.05. Among thiazide users, diabetes cases had significantly different expression of CCL14 (down-regulated 6%, Q-value=0.0257), compared with controls. Among statin users, diabetes cases had marginally but not significantly different expression of ZNF532 (up-regulated 15%, Q-value=0.0584), CXORF21 (up-regulated 11%, Q-value=0.0584), and ZNHIT3 (up-regulated 19%, Q-value=0.0959), compared with controls. These genes comprise potential targets for future expression or mechanistic research on medication-related diabetes development.
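The Q-value<0.05 cutoff in the abstract above can be made concrete with a small sketch. The abstract does not say which false discovery rate procedure was used, so the Benjamini-Hochberg adjustment shown here is an assumption chosen because it is widely used; the function name `bh_qvalues` and the example p-values are hypothetical and are not taken from the study.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    For the p-value of rank r among m tests the raw adjustment is p * m / r.
    A running minimum taken from the largest p-value downwards makes the
    adjusted values monotone in the original p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-gene p-values, for illustration only.
    pvals = [0.01, 0.04, 0.03, 0.005]
    qvals = bh_qvalues(pvals)
    called = [q < 0.05 for q in qvals]
    print(qvals, called)
```

Under this procedure a gene is reported as differentially expressed when its adjusted value falls below the chosen threshold, which is the role the 0.05 cutoff plays in the abstract.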
Neofunctionalization of embryonic head patterning genes facilitates the positioning of novel traits on the dorsal head of adult beetles

PubMed Central

Busey, Hannah A.; Linz, David M.; Tomoyasu, Yoshinori; Moczek, Armin P.

2016-01-01

The origin and integration of novel traits are fundamental processes during the developmental evolution of complex organisms. Yet how novel traits integrate into pre-existing contexts remains poorly understood. Beetle horns represent a spectacular evolutionary novelty integrated within the context of the adult dorsal head, a highly conserved trait complex present since the origin of insects. We investigated whether otd1/2 and six3, members of a highly conserved gene network that instructs the formation of the anterior end of most bilaterians, also play roles in patterning more recently evolved traits. Using ablation-based fate-mapping, comparative larval RNA interference (RNAi) and transcript sequencing, we found that otd1/2, but not six3, play a fundamental role in the post-embryonic formation of the adult dorsal head and head horns of Onthophagus beetles. By contrast, neither gene appears to pattern the adult head of Tribolium flour beetles even though all are expressed in the dorsal head epidermis of both Onthophagus and Tribolium. We propose that, at least in beetles, the roles of otd genes during post-embryonic development are decoupled from their embryonic functions, and that potentially non-functional post-embryonic expression in the dorsal head facilitated their co-option into a novel horn-patterning network during Onthophagus evolution. PMID:27412276
Toolbox for Antibiotics Discovery from Microorganisms.

PubMed

Fisch, Katja M; Schäberle, Till F

2016-09-01

Microorganisms produce a vast array of biologically active metabolites. Such compounds are applied by humans to positively influence their health and, therefore, natural products serve as drug leads for pharmaceutical and medicinal chemistry. In this minireview, tools for the discovery and the production of potential drug leads are explained. A snapshot is provided, starting from the isolation of new producer strains, across genomic mining of (meta)genomes to identify biosynthetic gene clusters corresponding to natural products, toward heterologous expression to produce potential drug leads. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Toward Routine Automatic Pathway Discovery from On-line Scientific Text Abstracts.

PubMed

Ng; Wong

1999-01-01

We are entering a new era of research where the latest scientific discoveries are often first reported online and are readily accessible by scientists worldwide. This rapid electronic dissemination of research breakthroughs has greatly accelerated the current pace in genomics and proteomics research. The race to the discovery of a gene or a drug has now become increasingly dependent on how quickly a scientist can scan through the voluminous amount of information available online to construct the relevant picture (such as protein-protein interaction pathways) as it takes shape amongst the rapidly expanding pool of globally accessible biological data (e.g. GENBANK) and scientific literature (e.g. MEDLINE). We describe a prototype system for automatic pathway discovery from on-line text abstracts, combining technologies that (1) retrieve research abstracts from online sources, (2) extract relevant information from the free texts, and (3) present the extracted information graphically and intuitively. Our work demonstrates that this framework allows us to routinely scan online scientific literature for automatic discovery of knowledge, giving modern scientists the necessary competitive edge in managing the information explosion in this electronic age.
Exploitation of Fungal Biodiversity for Discovery of Novel Antibiotics.

PubMed

Karwehl, Sabrina; Stadler, Marc

Fungi were among the first sources for antibiotics. The discovery and development of the penicillin-type and cephalosporin-type β-lactams and their synthetic versions were transformative in the emergence of the modern pharmaceutical industry. They remain some of the most important antibiotics, even 70 years after their discovery. Meanwhile, thousands of fungal metabolites have been discovered, yet these metabolites have only contributed a few additional compounds that have entered clinical development. Substantial expansion in fungal biodiversity assessment, along with the availability of modern "-OMICS" technology and revolutionary developments in fungal biotechnology, has been made in the last 15 years, subsequent to the exit of most of the big Pharma companies from the field of novel antibiotics discovery. Therefore, the timing seems opportune to revisit these fascinating chemically rich organisms as a reservoir of small-molecule templates for lead discovery. This review will describe ongoing interdisciplinary scenarios in which specialists in fungal biology collaborate with chemists, pharmacologists and biochemical and process engineers in order to reveal and make new antibiotics. The utility of a pre-selection process based on phylogenetic data and the distribution of secondary metabolite encoding gene clusters will be highlighted. Examples of novel bioactive metabolites from fungi derived from special ecological groups and new phylogenetic lineages will also be discussed.
Systems biology impact on antiepileptic drug discovery.

PubMed

Margineanu, Doru Georg

2012-02-01

Systems biology (SB), a recent trend in bioscience research to consider the complex interactions in biological systems from a holistic perspective, sees the disease as a disturbed network of interactions, rather than alteration of single molecular component(s). SB-relying network pharmacology replaces the prevailing focus on specific drug-receptor interaction and the corollary of rational drug design of "magic bullets", by the search for multi-target drugs that would act on biological networks as "magic shotguns". Epilepsy being a multi-factorial, polygenic and dynamic pathology, SB approach appears particularly fit and promising for antiepileptic drug (AED) discovery. In fact, long before the advent of SB, AED discovery already involved some SB-like elements. A reported SB project aimed to find out new drug targets in epilepsy relies on a relational database that integrates clinical information, recordings from deep electrodes and 3D-brain imagery with histology and molecular biology data on modified expression of specific genes in the brain regions displaying spontaneous epileptic activity. Since hitting a single target does not treat complex diseases, a proper pharmacological promiscuity might impart on an AED the merit of being multi-potent. However, multi-target drug discovery entails the complicated task of optimizing multiple activities of compounds, while having to balance drug-like properties and to control unwanted effects. Specific design tools for this new approach in drug discovery barely emerge, but computational methods making reliable in silico predictions of poly-pharmacology did appear, and their progress might be quite rapid. The current move away from reductionism into network pharmacology allows expecting that a proper integration of the intrinsic complexity of epileptic pathology in AED discovery might result in literally anti-epileptic drugs. Copyright © 2011 Elsevier B.V. All rights reserved.
Using transcriptomics to guide lead optimization in drug discovery projects: Lessons learned from the QSTAR project.

PubMed

Verbist, Bie; Klambauer, Günter; Vervoort, Liesbet; Talloen, Willem; Shkedy, Ziv; Thas, Olivier; Bender, Andreas; Göhlmann, Hinrich W H; Hochreiter, Sepp

2015-05-01

The pharmaceutical industry is faced with steadily declining R&D efficiency which results in fewer drugs reaching the market despite increased investment. A major cause for this low efficiency is the failure of drug candidates in late-stage development owing to safety issues or previously undiscovered side-effects. We analyzed to what extent gene expression data can help to de-risk drug development in early phases by detecting the biological effects of compounds across disease areas, targets and scaffolds. For eight drug discovery projects within a global pharmaceutical company, gene expression data were informative and able to support go/no-go decisions. Our studies show that gene expression profiling can detect adverse effects of compounds, and is a valuable tool in early-stage drug discovery decision making. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery

PubMed Central

2014-01-01

The Semanticscience Integrated Ontology (SIO) is an ontology to facilitate biomedical knowledge discovery. SIO features a simple upper level comprised of essential types and relations for the rich description of arbitrary (real, hypothesized, virtual, fictional) objects, processes and their attributes. SIO specifies simple design patterns to describe and associate qualities, capabilities, functions, quantities, and informational entities including textual, geometrical, and mathematical entities, and provides specific extensions in the domains of chemistry, biology, biochemistry, and bioinformatics. SIO provides an ontological foundation for the Bio2RDF linked data for the life sciences project and is used for semantic integration and discovery for SADI-based semantic web services. SIO is freely available to all users under a creative commons by attribution license. See website for further information: http://sio.semanticscience.org. PMID:24602174
Polar Domain Discovery with Sparkler

NASA Astrophysics Data System (ADS)

Duerr, R.; Khalsa, S. J. S.; Mattmann, C. A.; Ottilingam, N. K.; Singh, K.; Lopez, L. A.

2017-12-01

The scientific web is vast and ever growing. It encompasses millions of textual, scientific and multimedia documents describing research in a multitude of scientific streams. Most of these documents are hidden behind forms which require user action to retrieve and thus can't be directly accessed by content crawlers. These documents are hosted on web servers across the world, most often on outdated hardware and network infrastructure. Hence it is difficult and time-consuming to aggregate documents from the scientific web, especially those relevant to a specific domain. Thus generating meaningful domain-specific insights is currently difficult. We present an automated discovery system (Figure 1) using Sparkler, an open-source, extensible, horizontally scalable crawler which facilitates high throughput and focused crawling of documents pertinent to a particular domain such as information about polar regions. With this set of highly domain-relevant documents, we show that it is possible to answer analytical questions about that domain. Our domain discovery algorithm leverages prior domain knowledge to reach out to commercial/scientific search engines to generate seed URLs. Subject matter experts then annotate these seed URLs manually on a scale from highly relevant to irrelevant. We leverage this annotated dataset to train a machine learning model which predicts the 'domain relevance' of a given document. We extend Sparkler with this model to focus crawling on documents relevant to that domain. Sparkler avoids disruption of service by 1) partitioning URLs by hostname such that every node gets a different host to crawl and by 2) inserting delays between subsequent requests. Using the NSF-funded supercomputer Wrangler, we scaled our domain discovery pipeline to crawl about 200k polar-specific documents from the scientific web, within a day.
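The two politeness measures named in the abstract above, partitioning URLs by hostname and inserting delays between requests, can be sketched in a few lines. This is not Sparkler's code or API: the function names `partition_by_host` and `schedule`, the CRC32-based node assignment, and the example URLs are all assumptions. Note also that hash partitioning only guarantees that each host is handled by a single node; a node may still receive several hosts.

```python
import zlib
from collections import defaultdict
from urllib.parse import urlsplit


def host_of(url):
    """Return the lower-cased hostname of a URL, or '' if it has none."""
    return urlsplit(url).hostname or ""


def partition_by_host(urls, n_nodes):
    """Assign every URL to a node so that one host is never split across nodes."""
    parts = defaultdict(list)
    for url in urls:
        # CRC32 is stable across runs, unlike Python's built-in hash() for str.
        node = zlib.crc32(host_of(url).encode("utf-8")) % n_nodes
        parts[node].append(url)
    return dict(parts)


def schedule(urls, delay_s):
    """Plan fetch times (seconds from start) with delay_s between same-host requests."""
    next_free = defaultdict(float)  # earliest allowed fetch time per host
    plan = []
    for url in urls:
        host = host_of(url)
        start = next_free[host]
        plan.append((start, url))
        next_free[host] = start + delay_s
    return plan


if __name__ == "__main__":
    seeds = [
        "http://a.example/1",
        "http://a.example/2",
        "http://b.example/1",
    ]
    print(partition_by_host(seeds, n_nodes=4))
    print(schedule(seeds, delay_s=2.0))
```

Keeping each host on one node means the per-host delay can be enforced locally, without coordination between nodes.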

Towards Robot Scientists for autonomous scientific discovery

PubMed Central

2010-01-01

We review the main components of autonomous scientific discovery, and how they lead to the concept of a Robot Scientist. This is a system which uses techniques from artificial intelligence to automate all aspects of the scientific discovery process: it generates hypotheses from a computer model of the domain, designs experiments to test these hypotheses, runs the physical experiments using robotic systems, analyses and interprets the resulting data, and repeats the cycle. We describe our two prototype Robot Scientists: Adam and Eve. Adam has recently proven the potential of such systems by identifying twelve genes responsible for catalysing specific reactions in the metabolic pathways of the yeast Saccharomyces cerevisiae. This work has been formally recorded in great detail using logic. We argue that the reporting of science needs to become fully formalised and that Robot Scientists can help achieve this. This will make scientific information more reproducible and reusable, and promote the integration of computers in scientific reasoning. We believe the greater automation of both the physical and intellectual aspects of scientific investigations to be essential to the future of science. Greater automation improves the accuracy and reliability of experiments, increases the pace of discovery and, in common with conventional laboratory automation, removes tedious and repetitive tasks from the human scientist. PMID:20119518
Towards Robot Scientists for autonomous scientific discovery.

PubMed

Sparkes, Andrew; Aubrey, Wayne; Byrne, Emma; Clare, Amanda; Khan, Muhammed N; Liakata, Maria; Markham, Magdalena; Rowland, Jem; Soldatova, Larisa N; Whelan, Kenneth E; Young, Michael; King, Ross D

2010-01-04

We review the main components of autonomous scientific discovery, and how they lead to the concept of a Robot Scientist. This is a system which uses techniques from artificial intelligence to automate all aspects of the scientific discovery process: it generates hypotheses from a computer model of the domain, designs experiments to test these hypotheses, runs the physical experiments using robotic systems, analyses and interprets the resulting data, and repeats the cycle. We describe our two prototype Robot Scientists: Adam and Eve. Adam has recently proven the potential of such systems by identifying twelve genes responsible for catalysing specific reactions in the metabolic pathways of the yeast Saccharomyces cerevisiae. This work has been formally recorded in great detail using logic. We argue that the reporting of science needs to become fully formalised and that Robot Scientists can help achieve this. This will make scientific information more reproducible and reusable, and promote the integration of computers in scientific reasoning. We believe the greater automation of both the physical and intellectual aspects of scientific investigations to be essential to the future of science. Greater automation improves the accuracy and reliability of experiments, increases the pace of discovery and, in common with conventional laboratory automation, removes tedious and repetitive tasks from the human scientist.
Reducing the Bottleneck in Discovery of Novel Antibiotics.

PubMed

Jones, Marcus B; Nierman, William C; Shan, Yue; Frank, Bryan C; Spoering, Amy; Ling, Losee; Peoples, Aaron; Zullo, Ashley; Lewis, Kim; Nelson, Karen E

2017-04-01

Most antibiotics were discovered by screening soil actinomycetes, but the efficiency of the discovery platform collapsed in the 1960s. By now, more than 3000 antibiotics have been described and most of the current discovery effort is focused on the rediscovery of known compounds, making the approach impractical. The last marketed broad-spectrum antibiotics discovered were daptomycin, linezolid, and fidaxomicin. The current state of the art in the development of new anti-infectives is a non-existent pipeline in the absence of a discovery platform. This is particularly troubling given the emergence of pan-resistant pathogens. The current practice in dealing with the problem of the background of known compounds is to use chemical dereplication of extracts to assess the relative novelty of a compound it contains. Dereplication typically requires scale-up, extraction, and often fractionation before an accurate mass and structure can be produced by MS analysis in combination with 2D NMR. Here, we describe a transcriptome analysis approach using RNA sequencing (RNASeq) to identify promising novel antimicrobial compounds from microbial extracts. Our pipeline permits identification of antimicrobial compounds that produce distinct transcription profiles using unfractionated cell extracts. This efficient pipeline will eliminate the requirement for purification and structure determination of compounds from extracts and will facilitate high-throughput screening of cell extracts for identification of novel compounds.
DDMGD: the database of text-mined associations between genes methylated in diseases from different species.

PubMed

Bin Raies, Arwa; Mansour, Hicham; Incitti, Roberto; Bajic, Vladimir B

2015-01-01

Gathering information about associations between methylated genes and diseases is important for disease diagnosis and treatment decisions. Recent advancements in epigenetics research allow for large-scale discoveries of associations of genes methylated in diseases in different species. Searching manually for such information is not easy, as it is scattered across a large number of electronic publications and repositories. Therefore, we developed the DDMGD database (http://www.cbrc.kaust.edu.sa/ddmgd/) to provide a comprehensive repository of information related to genes methylated in diseases that can be found through text mining. DDMGD's scope is not limited to a particular group of genes, diseases or species. Using the text mining system DEMGD we developed earlier and additional post-processing, we extracted associations of genes methylated in different diseases from PubMed Central articles and PubMed abstracts. The accuracy of extracted associations is 82% as estimated on 2500 hand-curated entries. DDMGD provides a user-friendly interface facilitating retrieval of these associations ranked according to confidence scores. Submission of new associations to DDMGD is supported. A comparison analysis of DDMGD with several other databases focused on genes methylated in diseases shows that DDMGD is comprehensive and includes most of the recent information on genes methylated in diseases. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Transcriptome Characterization of Cymbidium sinense 'Dharma' Using 454 Pyrosequencing and Its Application in the Identification of Genes Associated with Leaf Color Variation.

PubMed

Zhu, Genfa; Yang, Fengxi; Shi, Shanshan; Li, Dongmei; Wang, Zhen; Liu, Hailin; Huang, Dan; Wang, Caiyun

2015-01-01

The highly variable leaf color of Cymbidium sinense significantly improves its horticultural and economic value, and makes it highly desirable in the flower markets in China and Southeast Asia. However, little is understood about the molecular mechanism underlying leaf-color variations. In this study, we found the content of photosynthetic pigments, especially chlorophyll degradation metabolite in the leaf-color mutants is distinguished significantly from that in the wild type of Cymbidium sinense 'Dharma'. To further determine the candidate genes controlling leaf-color variations, we first sequenced the global transcriptome using 454 pyrosequencing. More than 0.7 million expressed sequence tags (ESTs) with an average read length of 445.9 bp were generated and assembled into 103,295 isotigs representing 68,460 genes. Of these isotigs, 43,433 were significantly aligned to known proteins in the public database, of which 29,299 could be categorized into 42 functional groups in the gene ontology system, 10,079 classified into 23 functional classifications in the clusters of orthologous groups system, and 23,092 assigned to 139 clusters of specific metabolic pathways in the Kyoto Encyclopedia of Genes and Genomes. Among these annotations, 95 isotigs were designated as involved in chlorophyll metabolism. On this basis, we identified 16 key enzyme-encoding genes in the chlorophyll metabolism pathway, the full length cDNAs and expressions of which were further confirmed. Expression pattern indicated that the key enzyme-encoding genes for chlorophyll degradation were more highly expressed in the leaf color mutants, as was consistent with their lower chlorophyll contents. This study is the first to supply an informative 454 EST dataset for Cymbidium sinense 'Dharma' and to identify original leaf color-associated genes, which provide important resources to facilitate gene discovery for molecular breeding, marketable trait discovery, and investigating various biological
Transcriptome Characterization of Cymbidium sinense 'Dharma' Using 454 Pyrosequencing and Its Application in the Identification of Genes Associated with Leaf Color Variation

PubMed Central

Shi, Shanshan; Li, Dongmei; Wang, Zhen; Liu, Hailin; Huang, Dan; Wang, Caiyun

2015-01-01

The highly variable leaf color of Cymbidium sinense significantly improves its horticultural and economic value and makes it highly desirable in the flower markets of China and Southeast Asia. However, little is understood about the molecular mechanism underlying leaf-color variation. In this study, we found that the content of photosynthetic pigments, especially chlorophyll degradation metabolites, in the leaf-color mutants differs significantly from that in wild-type Cymbidium sinense 'Dharma'. To identify candidate genes controlling leaf-color variation, we first sequenced the global transcriptome using 454 pyrosequencing. More than 0.7 million expressed sequence tags (ESTs) with an average read length of 445.9 bp were generated and assembled into 103,295 isotigs representing 68,460 genes. Of these isotigs, 43,433 were significantly aligned to known proteins in the public database, of which 29,299 could be categorized into 42 functional groups in the Gene Ontology system, 10,079 classified into 23 functional classifications in the Clusters of Orthologous Groups system, and 23,092 assigned to 139 clusters of specific metabolic pathways in the Kyoto Encyclopedia of Genes and Genomes. Among these annotations, 95 isotigs were designated as involved in chlorophyll metabolism. On this basis, we identified 16 key enzyme-encoding genes in the chlorophyll metabolism pathway, the full-length cDNAs and expression of which were further confirmed. Expression patterns indicated that the key enzyme-encoding genes for chlorophyll degradation were more highly expressed in the leaf-color mutants, consistent with their lower chlorophyll contents. This study is the first to supply an informative 454 EST dataset for Cymbidium sinense 'Dharma' and to identify original leaf color-associated genes, which provide important resources to facilitate gene discovery for molecular breeding, marketable trait discovery, and investigating various biological
PDPR Gene Expression Correlates with Exercise-Training Insulin Sensitivity Changes

PubMed Central

Barberio, Matthew D.; Huffman, Kim M.; Giri, Mamta; Hoffman, Eric P.; Kraus, William E.; Hubal, Monica J.

2016-01-01

Purpose: Whole body insulin sensitivity (Si) typically improves following aerobic exercise training; however, individual responses can be highly variable. The purpose of this study was to use global gene expression to identify skeletal muscle genes that correlate with exercise-induced Si changes. Methods: Longitudinal cohorts from the Studies of Targeted Risk Reduction Intervention through Defined Exercise (STRRIDE) were utilized as Discovery (Affymetrix) and Confirmation (Illumina) of vastus lateralis gene expression profiles. Discovery (n=39; 21 men) and Confirmation (n=42; 19 men) cohorts were matched for age (52 ± 8 vs. 51 ± 10 yr), BMI (30.4 ± 2.8 vs. 29.7 ± 2.8 kg/m²), and VO2max (30.4 ± 2.8 vs. 29.7 ± 2.8 mL/kg/min). Si was determined via intravenous glucose tolerance test pre- and post-training. Pearson product-moment correlation coefficients determined relationships between a) baseline and b) training-induced changes in gene expression and %ΔSi after training. Results: Expression of 2454 (Discovery) and 1778 (Confirmation) genes at baseline was significantly (P<0.05) correlated with %ΔSi; 112 genes overlapped. Pathway analyses identified Ca2+-signaling-related transcripts in this 112-gene list. Expression changes of 1384 (Discovery) and 1288 (Confirmation) genes following training were significantly (P<0.05) correlated with %ΔSi; 33 genes overlapped, representing genes of the contractile apparatus of skeletal and smooth muscle. Pyruvate dehydrogenase phosphatase regulatory subunit (PDPR) expression at baseline (p=0.01, r=0.41) and post-training (p=0.01, r=0.43) were both correlated with %ΔSi. Conclusion: Exercise-induced adaptations in skeletal muscle Si are related to baseline levels of Ca2+-regulating transcripts, which may prime the muscle for adaptation. Relationships between %ΔSi and PDPR, a regulatory subunit of the pyruvate dehydrogenase complex, indicate that the Si response is strongly related to key steps in metabolic regulation. PMID:27846149
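The correlate-then-overlap design in this abstract (correlate each gene's expression with %ΔSi in two cohorts, then keep the genes that pass in both) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the gene names and values are invented toy data, the function names are hypothetical, and a cutoff on |r| stands in for the P<0.05 significance filter reported in the abstract.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def correlated_genes(expression, delta_si, min_abs_r):
    """Return the genes whose expression correlates with delta_si.

    expression: dict mapping gene name -> per-subject expression values.
    delta_si:   per-subject percent change in insulin sensitivity.
    min_abs_r:  hypothetical |r| cutoff (assumption; the abstract filters on P<0.05).
    """
    return {
        gene
        for gene, values in expression.items()
        if abs(pearson_r(values, delta_si)) >= min_abs_r
    }


# Toy data (invented): two cohorts measured on different subjects.
discovery_expr = {"GENE_A": [1, 2, 3, 4], "GENE_B": [4, 1, 3, 2]}
discovery_dsi = [10, 20, 30, 40]
confirmation_expr = {"GENE_A": [2, 4, 6, 8, 10], "GENE_C": [1, 3, 2, 5, 4]}
confirmation_dsi = [5, 10, 15, 20, 25]

discovery_hits = correlated_genes(discovery_expr, discovery_dsi, 0.9)
confirmation_hits = correlated_genes(confirmation_expr, confirmation_dsi, 0.9)

# Genes that pass the filter in both cohorts.
overlap = discovery_hits & confirmation_hits
print(sorted(overlap))
```

Requiring a gene to pass in two independently profiled cohorts is what reduces the thousands of single-cohort hits to a short overlapping list.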
Serendipitous Discovery of an Immunoglobulin-Binding Autotransporter in Bordetella Species

PubMed Central

Williams, Corinne L.; Haines, Robert; Cotter, Peggy A.

2008-01-01

We describe the serendipitous discovery of BatB, a classical-type Bordetella autotransporter (AT) protein with an ∼180-kDa passenger domain that remains noncovalently associated with the outer membrane. Like genes encoding all characterized protein virulence factors in Bordetella species, batB transcription is positively regulated by the master virulence regulatory system BvgAS. BatB is predicted to share similarity with immunoglobulin A (IgA) proteases, and we showed that BatB binds Ig in vitro. In vivo, a Bordetella bronchiseptica ΔbatB mutant was unable to overcome innate immune defenses and was cleared from the lower respiratory tracts of mice more rapidly than wild-type B. bronchiseptica. This defect was abrogated in SCID mice, suggesting that BatB functions to resist clearance during the first week postinoculation in a manner dependent on B- and T-cell-mediated activities. Taken together with the previous demonstration that polymorphonuclear neutrophils (PMN) are critical for the control of B. bronchiseptica in mice, our data support the hypothesis that BatB prevents nonspecific antibodies from facilitating PMN-mediated clearance during the first few days postinoculation. Neither of the strictly human-adapted Bordetella subspecies produces a fully functional BatB protein; nucleotide differences within the putative promoter region prevent batB transcription in Bordetella pertussis, and although expressed, the batB gene of human-derived Bordetella parapertussis (B. parapertussishu) contains a large in-frame deletion relative to batB of B. bronchiseptica. Taken together, our data suggest that BatB played an important role in the evolution of virulence and host specificity among the mammalian-adapted bordetellae. PMID:18426869
Discovery and characterization of miRNA genes in Atlantic salmon (Salmo salar) by use of a deep sequencing approach

PubMed Central

2013-01-01

enriched in liver tissue and the precursor was mapped to intron 7 of the transferrin gene. Conclusions: The identification and annotation of evolutionarily conserved and novel Salmo salar miRNAs as well as the characterization of miRNA gene clusters provide biological knowledge that will greatly facilitate further functional studies on miRNAs in this species. PMID:23865519
Flightless I (Drosophila) homolog facilitates chromatin accessibility of the estrogen receptor α target genes in MCF-7 breast cancer cells

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jeong, Kwang Won, E-mail: kwjeong@gachon.ac.kr

2014-04-04

Highlights: • H3K4me3 and Pol II binding at TFF1 promoter were reduced in FLII-depleted MCF-7 cells. • FLII is required for chromatin accessibility of the enhancer of ERalpha target genes. • Depletion of FLII causes inhibition of proliferation of MCF-7 cells. - Abstract: The coordinated activities of multiple protein complexes are essential to the remodeling of chromatin structure and for the recruitment of RNA polymerase II (Pol II) to the promoter in order to facilitate the initiation of transcription in nuclear receptor-mediated gene expression. Flightless I (Drosophila) homolog (FLII), a nuclear receptor coactivator, is associated with the SWI/SNF-chromatin remodeling complex during estrogen receptor (ER)α-mediated transcription. However, the function of FLII in estrogen-induced chromatin opening has not been fully explored. Here, we show that FLII plays a critical role in establishing active histone modification marks and generating the open chromatin structure of ERα target genes. We observed that the enhancer regions of ERα target genes are heavily occupied by FLII, and histone H3K4me3 and Pol II binding induced by estrogen are decreased in FLII-depleted MCF-7 cells. Furthermore, formaldehyde-assisted isolation of regulatory elements (FAIRE)-quantitative polymerase chain reaction (qPCR) experiments showed that depletion of FLII resulted in reduced chromatin accessibility of multiple ERα target genes. These data suggest FLII as a key regulator of ERα-mediated transcription through its role in regulating chromatin accessibility for the binding of RNA Polymerase II and possibly other transcriptional coactivators.
Knowledge discovery by accuracy maximization

PubMed Central

Cacciatore, Stefano; Luchinat, Claudio; Tenori, Leonardo

2014-01-01

Here we describe KODAMA (knowledge discovery by accuracy maximization), an unsupervised and semisupervised learning algorithm that performs feature extraction from noisy and high-dimensional data. Unlike other data mining methods, the peculiarity of KODAMA is that it is driven by an integrated procedure of cross-validation of the results. The discovery of a local manifold’s topology is led by a classifier through a Monte Carlo procedure of maximization of cross-validated predictive accuracy. Briefly, our approach differs from previous methods in that it has an integrated procedure of validation of the results. In this way, the method ensures the highest robustness of the obtained solution. This robustness is demonstrated on experimental datasets of gene expression and metabolomics, where KODAMA compares favorably with other existing feature extraction methods. KODAMA is then applied to an astronomical dataset, revealing unexpected features. Interesting and not easily predictable features are also found in the analysis of the State of the Union speeches by American presidents: KODAMA reveals an abrupt linguistic transition sharply separating all post-Reagan from all pre-Reagan speeches. The transition occurs during Reagan’s presidency and not from its beginning. PMID:24706821
Discovery and characterization of proteins associated with aflatoxin-resistance: evaluating their potential as breeding markers.

PubMed

Brown, Robert L; Chen, Zhi-Yuan; Warburton, Marilyn; Luo, Meng; Menkir, Abebe; Fakhoury, Ahmad; Bhatnagar, Deepak

2010-04-01

Host resistance has become a viable approach to eliminating aflatoxin contamination of maize since the discovery of several maize lines with natural resistance. However, to derive commercial benefit from this resistance and develop lines that can aid growers, markers need to be identified to facilitate the transfer of resistance into commercially useful genetic backgrounds without transfer of unwanted traits. To accomplish this, research efforts have focused on the identification of kernel resistance-associated proteins (RAPs) including the employment of comparative proteomics to investigate closely-related maize lines that vary in aflatoxin accumulation. RAPs have been identified and several further characterized through physiological and biochemical investigations to determine their causal role in resistance and, therefore, their suitability as breeding markers. Three RAPs, a 14 kDa trypsin inhibitor, pathogenesis-related protein 10 and glyoxalase I are being investigated using RNAi gene silencing and plant transformation. Several resistant lines have been subjected to QTL mapping to identify loci associated with the aflatoxin-resistance phenotype. Results of proteome and characterization studies are discussed.
LINCS Canvas Browser: interactive web app to query, browse and interrogate LINCS L1000 gene expression signatures.

PubMed

Duan, Qiaonan; Flynn, Corey; Niepel, Mario; Hafner, Marc; Muhlich, Jeremy L; Fernandez, Nicolas F; Rouillard, Andrew D; Tan, Christopher M; Chen, Edward Y; Golub, Todd R; Sorger, Peter K; Subramanian, Aravind; Ma'ayan, Avi

2014-07-01

For the Library of Integrated Network-based Cellular Signatures (LINCS) project many gene expression signatures using the L1000 technology have been produced. The L1000 technology is a cost-effective method to profile gene expression at large scale. LINCS Canvas Browser (LCB) is an interactive HTML5 web-based software application that facilitates querying, browsing and interrogating much of the currently available LINCS L1000 data. LCB implements two compacted layered canvases, one to visualize clustered L1000 expression data, and the other to display enrichment analysis results using 30 different gene set libraries. Clicking on an experimental condition highlights gene sets enriched for the differentially expressed genes from the selected experiment. A search interface allows users to input gene lists and query them against over 100 000 conditions to find the top matching experiments. The tool integrates many resources for an unprecedented potential for new discoveries in systems biology and systems pharmacology. The LCB application is available at http://www.maayanlab.net/LINCS/LCB. Customized versions will be made part of the http://lincscloud.org and http://lincs.hms.harvard.edu websites. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Genetics of rheumatoid arthritis contributes to biology and drug discovery.

PubMed

Okada, Yukinori; Wu, Di; Trynka, Gosia; Raj, Towfique; Terao, Chikashi; Ikari, Katsunori; Kochi, Yuta; Ohmura, Koichiro; Suzuki, Akari; Yoshida, Shinji; Graham, Robert R; Manoharan, Arun; Ortmann, Ward; Bhangale, Tushar; Denny, Joshua C; Carroll, Robert J; Eyler, Anne E; Greenberg, Jeffrey D; Kremer, Joel M; Pappas, Dimitrios A; Jiang, Lei; Yin, Jian; Ye, Lingying; Su, Ding-Feng; Yang, Jian; Xie, Gang; Keystone, Ed; Westra, Harm-Jan; Esko, Tõnu; Metspalu, Andres; Zhou, Xuezhong; Gupta, Namrata; Mirel, Daniel; Stahl, Eli A; Diogo, Dorothée; Cui, Jing; Liao, Katherine; Guo, Michael H; Myouzen, Keiko; Kawaguchi, Takahisa; Coenen, Marieke J H; van Riel, Piet L C M; van de Laar, Mart A F J; Guchelaar, Henk-Jan; Huizinga, Tom W J; Dieudé, Philippe; Mariette, Xavier; Bridges, S Louis; Zhernakova, Alexandra; Toes, Rene E M; Tak, Paul P; Miceli-Richard, Corinne; Bang, So-Young; Lee, Hye-Soon; Martin, Javier; Gonzalez-Gay, Miguel A; Rodriguez-Rodriguez, Luis; Rantapää-Dahlqvist, Solbritt; Arlestig, Lisbeth; Choi, Hyon K; Kamatani, Yoichiro; Galan, Pilar; Lathrop, Mark; Eyre, Steve; Bowes, John; Barton, Anne; de Vries, Niek; Moreland, Larry W; Criswell, Lindsey A; Karlson, Elizabeth W; Taniguchi, Atsuo; Yamada, Ryo; Kubo, Michiaki; Liu, Jun S; Bae, Sang-Cheol; Worthington, Jane; Padyukov, Leonid; Klareskog, Lars; Gregersen, Peter K; Raychaudhuri, Soumya; Stranger, Barbara E; De Jager, Philip L; Franke, Lude; Visscher, Peter M; Brown, Matthew A; Yamanaka, Hisashi; Mimori, Tsuneyo; Takahashi, Atsushi; Xu, Huji; Behrens, Timothy W; Siminovitch, Katherine A; Momohara, Shigeki; Matsuda, Fumihiko; Yamamoto, Kazuhiko; Plenge, Robert M

2014-02-20

A major challenge in human genetics is to devise a systematic strategy to integrate disease-associated variants with diverse genomic and biological data sets to provide insight into disease pathogenesis and guide drug discovery for complex traits such as rheumatoid arthritis (RA). Here we performed a genome-wide association study meta-analysis in a total of >100,000 subjects of European and Asian ancestries (29,880 RA cases and 73,758 controls), by evaluating ∼10 million single-nucleotide polymorphisms. We discovered 42 novel RA risk loci at a genome-wide level of significance, bringing the total to 101 (refs 2 - 4). We devised an in silico pipeline using established bioinformatics methods based on functional annotation, cis-acting expression quantitative trait loci and pathway analyses--as well as novel methods based on genetic overlap with human primary immunodeficiency, haematological cancer somatic mutations and knockout mouse phenotypes--to identify 98 biological candidate genes at these 101 risk loci. We demonstrate that these genes are the targets of approved therapies for RA, and further suggest that drugs approved for other indications may be repurposed for the treatment of RA. Together, this comprehensive genetic study sheds light on fundamental genes, pathways and cell types that contribute to RA pathogenesis, and provides empirical evidence that the genetics of RA can provide important information for drug discovery.
GSEH: A Novel Approach to Select Prostate Cancer-Associated Genes Using Gene Expression Heterogeneity.

PubMed

Kim, Hyunjin; Choi, Sang-Min; Park, Sanghyun

2018-01-01

When a gene shows varying levels of expression among normal people but similar levels in disease patients or shows similar levels of expression among normal people but different levels in disease patients, we can assume that the gene is associated with the disease. By utilizing this gene expression heterogeneity, we can obtain additional information that abets discovery of disease-associated genes. In this study, we used collaborative filtering to calculate the degree of gene expression heterogeneity between classes and then scored the genes on the basis of the degree of gene expression heterogeneity to find "differentially predicted" genes. Through the proposed method, we discovered more prostate cancer-associated genes than 10 comparable methods. The genes prioritized by the proposed method are potentially significant to biological processes of a disease and can provide insight into them.
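The underlying idea in this abstract, that a gene whose expression spread differs between normal and disease samples is a candidate, can be illustrated with a much simpler stand-in than the paper's method. The sketch below does not use collaborative filtering; it scores each gene by the absolute log ratio of sample variances between the two classes, which is an assumption made purely for illustration. Gene names, values, and function names are hypothetical.

```python
import math
from statistics import variance


def heterogeneity_score(normal, disease):
    """Absolute log ratio of sample variances between two classes.

    A score near 0 means similar spread in both classes; a large score
    means one class is far more heterogeneous than the other.
    (Stand-in measure, not the collaborative-filtering score of the paper.)
    """
    var_normal = variance(normal)
    var_disease = variance(disease)
    if var_normal == 0 or var_disease == 0:
        raise ValueError("score is undefined when a class has zero variance")
    return abs(math.log(var_normal / var_disease))


def rank_by_heterogeneity(normal_expr, disease_expr):
    """Rank genes present in both dicts from most to least heterogeneous."""
    shared = set(normal_expr) & set(disease_expr)
    return sorted(
        shared,
        key=lambda g: heterogeneity_score(normal_expr[g], disease_expr[g]),
        reverse=True,
    )


# Toy data (invented): GENE_X varies widely in normals but not in patients;
# GENE_Y has the same spread in both classes and differs only in mean.
normal_expr = {"GENE_X": [1.0, 5.0, 9.0, 13.0], "GENE_Y": [3.0, 4.0, 5.0, 6.0]}
disease_expr = {"GENE_X": [7.0, 7.5, 6.5, 7.0], "GENE_Y": [4.0, 5.0, 6.0, 7.0]}

ranked = rank_by_heterogeneity(normal_expr, disease_expr)
print(ranked)
```

Note that GENE_Y would be picked up by an ordinary differential-expression test on means but scores zero here, which is the sense in which heterogeneity carries additional information.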
Improved accuracy of supervised CRM discovery with interpolated Markov models and cross-species comparison.

PubMed

Kazemian, Majid; Zhu, Qiyun; Halfon, Marc S; Sinha, Saurabh

2011-12-01

Despite recent advances in experimental approaches for identifying transcriptional cis-regulatory modules (CRMs, 'enhancers'), direct empirical discovery of CRMs for all genes in all cell types and environmental conditions is likely to remain an elusive goal. Effective methods for computational CRM discovery are thus a critically needed complement to empirical approaches. However, existing computational methods that search for clusters of putative binding sites are ineffective if the relevant TFs and/or their binding specificities are unknown. Here, we provide a significantly improved method for 'motif-blind' CRM discovery that does not depend on knowledge or accurate prediction of TF-binding motifs and is effective when limited knowledge of functional CRMs is available to 'supervise' the search. We propose a new statistical method, based on 'Interpolated Markov Models', for motif-blind, genome-wide CRM discovery. It captures the statistical profile of variable length words in known CRMs of a regulatory network and finds candidate CRMs that match this profile. The method also uses orthologs of the known CRMs from closely related genomes. We perform in silico evaluation of predicted CRMs by assessing whether their neighboring genes are enriched for the expected expression patterns. This assessment uses a novel statistical test that extends the widely used Hypergeometric test of gene set enrichment to account for variability in intergenic lengths. We find that the new CRM prediction method is superior to existing methods. Finally, we experimentally validate 12 new CRM predictions by examining their regulatory activity in vivo in Drosophila; 10 of the tested CRMs were found to be functional, while 6 of the top 7 predictions showed the expected activity patterns. We make our program available as downloadable source code, and as a plugin for a genome browser installed on our servers. © The Author(s) 2011. Published by Oxford University Press.
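For readers unfamiliar with the baseline that this abstract extends, the standard hypergeometric test of gene set enrichment can be sketched as below. This is only the widely used unadjusted test; the authors' extension that accounts for intergenic length is not reproduced here, and the function name and the numbers in the example are assumptions chosen for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, overlap):
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    population: total number of genes considered.
    annotated:  genes in the population with the expected expression pattern.
    drawn:      genes neighboring the predicted CRMs (the sample).
    overlap:    genes in the sample that carry the annotation.
    """
    if not (0 <= annotated <= population and 0 <= drawn <= population):
        raise ValueError("annotated and drawn must lie within the population")
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    # Sum the probability mass for every outcome at least as extreme as observed.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20 genes, 5 annotated, 4 neighbor predicted CRMs,
# and 2 of those 4 are annotated.
p_value = hypergeom_enrichment_p(population=20, annotated=5, drawn=4, overlap=2)
print(round(p_value, 4))
```

A small p-value indicates that the genes next to predicted CRMs carry the expected expression pattern more often than random draws from the genome would.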
Improved accuracy of supervised CRM discovery with interpolated Markov models and cross-species comparison

PubMed Central

Kazemian, Majid; Zhu, Qiyun; Halfon, Marc S.; Sinha, Saurabh

2011-01-01

Despite recent advances in experimental approaches for identifying transcriptional cis-regulatory modules (CRMs, ‘enhancers’), direct empirical discovery of CRMs for all genes in all cell types and environmental conditions is likely to remain an elusive goal. Effective methods for computational CRM discovery are thus a critically needed complement to empirical approaches. However, existing computational methods that search for clusters of putative binding sites are ineffective if the relevant TFs and/or their binding specificities are unknown. Here, we provide a significantly improved method for ‘motif-blind’ CRM discovery that does not depend on knowledge or accurate prediction of TF-binding motifs and is effective when limited knowledge of functional CRMs is available to ‘supervise’ the search. We propose a new statistical method, based on ‘Interpolated Markov Models’, for motif-blind, genome-wide CRM discovery. It captures the statistical profile of variable length words in known CRMs of a regulatory network and finds candidate CRMs that match this profile. The method also uses orthologs of the known CRMs from closely related genomes. We perform in silico evaluation of predicted CRMs by assessing whether their neighboring genes are enriched for the expected expression patterns. This assessment uses a novel statistical test that extends the widely used Hypergeometric test of gene set enrichment to account for variability in intergenic lengths. We find that the new CRM prediction method is superior to existing methods. Finally, we experimentally validate 12 new CRM predictions by examining their regulatory activity in vivo in Drosophila; 10 of the tested CRMs were found to be functional, while 6 of the top 7 predictions showed the expected activity patterns. We make our program available as downloadable source code, and as a plugin for a genome browser installed on our servers. PMID:21821659
Discovery of Novel Mammary Developmental and Cancer Genes Using ENU Mutagenesis

DTIC Science & Technology

2002-10-01

death rates we need new therapeutic targets, currently a major challenge facing cancer researchers. This requires an understanding of the undiscovered pathways that operate to drive breast cancer cell proliferation, cell survival and cell differentiation, pathways which are also likely to operate during normal mammary development, and which go awry in cancer. The discovery of signalling pathways operative in breast cancer has utilised examination of mammary gland development following systemic endocrine ablation or viral insertion, positional cloning in affected families and
Discovery of Seven Novel Mammalian and Avian Coronaviruses in the Genus Deltacoronavirus Supports Bat Coronaviruses as the Gene Source of Alphacoronavirus and Betacoronavirus and Avian Coronaviruses as the Gene Source of Gammacoronavirus and Deltacoronavirus

PubMed Central

Woo, Patrick C. Y.; Lau, Susanna K. P.; Lam, Carol S. F.; Lau, Candy C. Y.; Tsang, Alan K. L.; Lau, John H. N.; Bai, Ru; Teng, Jade L. L.; Tsang, Chris C. C.; Wang, Ming; Zheng, Bo-Jian; Chan, Kwok-Hung

2012-01-01

Recently, we reported the discovery of three novel coronaviruses, bulbul coronavirus HKU11, thrush coronavirus HKU12, and munia coronavirus HKU13, which were identified as representatives of a novel genus, Deltacoronavirus, in the subfamily Coronavirinae. In this territory-wide molecular epidemiology study involving 3,137 mammals and 3,298 birds, we discovered seven additional novel deltacoronaviruses in pigs and birds, which we named porcine coronavirus HKU15, white-eye coronavirus HKU16, sparrow coronavirus HKU17, magpie robin coronavirus HKU18, night heron coronavirus HKU19, wigeon coronavirus HKU20, and common moorhen coronavirus HKU21. Complete genome sequencing and comparative genome analysis showed that the avian and mammalian deltacoronaviruses have similar genome characteristics and structures. They all have relatively small genomes (25.421 to 26.674 kb), the smallest among all coronaviruses. They all have a single papain-like protease domain in the nsp3 gene; an accessory gene, NS6 open reading frame (ORF), located between the M and N genes; and a variable number of accessory genes (up to four) downstream of the N gene. Moreover, they all have the same putative transcription regulatory sequence of ACACCA. Molecular clock analysis showed that the most recent common ancestor of all coronaviruses was estimated at approximately 8100 BC, and those of Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus were at approximately 2400 BC, 3300 BC, 2800 BC, and 3000 BC, respectively. From our studies, it appears that bats and birds, the warm blooded flying vertebrates, are ideal hosts for the coronavirus gene source, bats for Alphacoronavirus and Betacoronavirus and birds for Gammacoronavirus and Deltacoronavirus, to fuel coronavirus evolution and dissemination. PMID:22278237
Discovery of seven novel Mammalian and avian coronaviruses in the genus deltacoronavirus supports bat coronaviruses as the gene source of alphacoronavirus and betacoronavirus and avian coronaviruses as the gene source of gammacoronavirus and deltacoronavirus.

PubMed

Woo, Patrick C Y; Lau, Susanna K P; Lam, Carol S F; Lau, Candy C Y; Tsang, Alan K L; Lau, John H N; Bai, Ru; Teng, Jade L L; Tsang, Chris C C; Wang, Ming; Zheng, Bo-Jian; Chan, Kwok-Hung; Yuen, Kwok-Yung

2012-04-01

Recently, we reported the discovery of three novel coronaviruses, bulbul coronavirus HKU11, thrush coronavirus HKU12, and munia coronavirus HKU13, which were identified as representatives of a novel genus, Deltacoronavirus, in the subfamily Coronavirinae. In this territory-wide molecular epidemiology study involving 3,137 mammals and 3,298 birds, we discovered seven additional novel deltacoronaviruses in pigs and birds, which we named porcine coronavirus HKU15, white-eye coronavirus HKU16, sparrow coronavirus HKU17, magpie robin coronavirus HKU18, night heron coronavirus HKU19, wigeon coronavirus HKU20, and common moorhen coronavirus HKU21. Complete genome sequencing and comparative genome analysis showed that the avian and mammalian deltacoronaviruses have similar genome characteristics and structures. They all have relatively small genomes (25.421 to 26.674 kb), the smallest among all coronaviruses. They all have a single papain-like protease domain in the nsp3 gene; an accessory gene, NS6 open reading frame (ORF), located between the M and N genes; and a variable number of accessory genes (up to four) downstream of the N gene. Moreover, they all have the same putative transcription regulatory sequence of ACACCA. Molecular clock analysis showed that the most recent common ancestor of all coronaviruses was estimated at approximately 8100 BC, and those of Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus were at approximately 2400 BC, 3300 BC, 2800 BC, and 3000 BC, respectively. From our studies, it appears that bats and birds, the warm blooded flying vertebrates, are ideal hosts for the coronavirus gene source, bats for Alphacoronavirus and Betacoronavirus and birds for Gammacoronavirus and Deltacoronavirus, to fuel coronavirus evolution and dissemination.

COMPUTER-AIDED DRUG DISCOVERY AND DEVELOPMENT (CADDD): in silico-chemico-biological approach

PubMed Central

Kapetanovic, I.M.

2008-01-01

It is generally recognized that drug discovery and development are very time- and resource-consuming processes. There is an ever-growing effort to apply computational power to the combined chemical and biological space in order to streamline drug discovery, design, development and optimization. In the biomedical arena, computer-aided or in silico design is being utilized to expedite and facilitate hit identification and hit-to-lead selection, to optimize the absorption, distribution, metabolism, excretion and toxicity profile, and to avoid safety issues. Commonly used computational approaches include ligand-based drug design (pharmacophore, a 3-D spatial arrangement of chemical features essential for biological activity), structure-based drug design (drug-target docking), and quantitative structure-activity and quantitative structure-property relationships. Regulatory agencies as well as the pharmaceutical industry are actively involved in the development of computational tools that will improve the effectiveness and efficiency of the drug discovery and development process, decrease the use of animals, and increase predictability. It is expected that the power of CADDD will grow as the technology continues to evolve. PMID:17229415
Neofunctionalization of embryonic head patterning genes facilitates the positioning of novel traits on the dorsal head of adult beetles.

PubMed

Zattara, Eduardo E; Busey, Hannah A; Linz, David M; Tomoyasu, Yoshinori; Moczek, Armin P

2016-07-13

The origin and integration of novel traits are fundamental processes during the developmental evolution of complex organisms. Yet how novel traits integrate into pre-existing contexts remains poorly understood. Beetle horns represent a spectacular evolutionary novelty integrated within the context of the adult dorsal head, a highly conserved trait complex present since the origin of insects. We investigated whether otd1/2 and six3, members of a highly conserved gene network that instructs the formation of the anterior end of most bilaterians, also play roles in patterning more recently evolved traits. Using ablation-based fate-mapping, comparative larval RNA interference (RNAi) and transcript sequencing, we found that otd1/2, but not six3, play a fundamental role in the post-embryonic formation of the adult dorsal head and head horns of Onthophagus beetles. By contrast, neither gene appears to pattern the adult head of Tribolium flour beetles even though all are expressed in the dorsal head epidermis of both Onthophagus and Tribolium. We propose that, at least in beetles, the roles of otd genes during post-embryonic development are decoupled from their embryonic functions, and that potentially non-functional post-embryonic expression in the dorsal head facilitated their co-option into a novel horn-patterning network during Onthophagus evolution. © 2016 The Author(s).
A monograph proposing the use of canine mammary tumours as a model for the study of hereditary breast cancer susceptibility genes in humans.

PubMed

Goebel, Katie; Merner, Nancy D

2017-05-01

Canines are excellent models for cancer studies because of their physiological and genomic similarity to humans, companion status and limited intra-breed heterogeneity. Because they are prone to mammary cancers, canines can serve as powerful genetic models of hereditary breast cancers. Variants within known human breast cancer susceptibility genes only explain a fraction of familial cases. Thus, further discovery is necessary, but such efforts have been thwarted by genetic heterogeneity. Reducing heterogeneity is key, and studying isolated human populations has helped in the endeavour. An alternative is to study dog pedigrees, since artificial selection has resulted in extreme homogeneity. Identifying the genetic predisposition to canine mammary tumours can translate to human discoveries, a strategy that is currently underutilized. To explore this potential, we reviewed published canine mammary tumour genetic studies and proposed benefits of next-generation sequencing of canine cohorts to facilitate moving beyond incremental advances.
Technical guide for applications of gene expression profiling in human health risk assessment of environmental chemicals.

PubMed

Bourdon-Lacombe, Julie A; Moffat, Ivy D; Deveau, Michelle; Husain, Mainul; Auerbach, Scott; Krewski, Daniel; Thomas, Russell S; Bushel, Pierre R; Williams, Andrew; Yauk, Carole L

2015-07-01

Toxicogenomics promises to be an important part of future human health risk assessment of environmental chemicals. The application of gene expression profiles (e.g., for hazard identification, chemical prioritization, chemical grouping, mode of action discovery, and quantitative analysis of response) is growing in the literature, but their use in formal risk assessment by regulatory agencies is relatively infrequent. Although additional validations for specific applications are required, gene expression data can be of immediate use for increasing confidence in chemical evaluations. We believe that a primary reason for the current lack of integration is the limited practical guidance available for risk assessment specialists with limited experience in genomics. The present manuscript provides basic information on gene expression profiling, along with guidance on evaluating the quality of genomic experiments and data, and interpretation of results presented in the form of heat maps, pathway analyses and other common approaches. Moreover, potential ways to integrate information from gene expression experiments into current risk assessment are presented using published studies as examples. The primary objective of this work is to facilitate integration of gene expression data into human health risk assessments of environmental chemicals. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.
Down-Regulation of Gene Expression by RNA-Induced Gene Silencing

NASA Astrophysics Data System (ADS)

Travella, Silvia; Keller, Beat

Down-regulation of endogenous genes via post-transcriptional gene silencing (PTGS) is a key to the characterization of gene function in plants. Many RNA-based silencing mechanisms such as post-transcriptional gene silencing, co-suppression, quelling, and RNA interference (RNAi) have been discovered among species of different kingdoms (plants, fungi, and animals). One of the most interesting discoveries was RNAi, a sequence-specific gene-silencing mechanism initiated by the introduction of double-stranded RNA (dsRNA), homologous in sequence to the silenced gene, which triggers degradation of mRNA. Infection of plants with modified viruses can also induce RNA silencing and is referred to as virus-induced gene silencing (VIGS). In contrast to insertional mutagenesis, these emerging new reverse genetic approaches represent a powerful tool for exploring gene function and for manipulating gene expression experimentally in cereal species such as barley and wheat. We examined how RNAi and VIGS have been used to assess gene function in barley and wheat, including molecular mechanisms involved in the process and available methodological elements, such as vectors, inoculation procedures, and analysis of silenced phenotypes.
Identification of novel mutations in the XLRS1 gene in Chinese patients with X-linked juvenile retinoschisis.

PubMed

Zeng, Meizhen; Yi, Changxian; Guo, Xiangming; Jia, Xiaoyun; Deng, Yan; Wang, Juan; Shen, Huangxuan

2007-01-01

X-linked juvenile retinoschisis (XLRS) is a major cause of macular degeneration in young men. In this study we analyzed all six exons of the XLRS1 gene in four sporadic XLRS patients and in an affected family in China who were recently diagnosed. We found five different mutations: four missense point mutations and one frame-shift deletion. Among these mutations, both c.644A>T and c.520delC are novel and have not been previously reported. Moreover, all the second-generation offspring and most of the third-generation ones in the affected family were found to carry the mutation-bearing X chromosome. The discovery of novel mutations in the XLRS1 gene increases the available information about the spectrum of genetic abnormalities causing XLRS. Although the limited data failed to reveal a correlation between mutations and disease phenotypes, our identification of novel mutations in the XLRS1 gene will facilitate early and correct diagnosis and genetic counseling regarding the prognosis of XLRS disease.
Discovery Systems

NASA Technical Reports Server (NTRS)

Pell, Barney

2003-01-01

A viewgraph presentation on NASA's Discovery Systems Project is given. The topics of discussion include: 1) NASA's Computing Information and Communications Technology Program; 2) Discovery Systems Program; and 3) Ideas for Information Integration Using the Web.
The Gene Set Builder: collation, curation, and distribution of sets of genes

PubMed Central

Yusuf, Dimas; Lim, Jonathan S; Wasserman, Wyeth W

2005-01-01

Background: In bioinformatics and genomics, there are many applications designed to investigate the common properties for a set of genes. Often, these multi-gene analysis tools attempt to reveal sequential, functional, and expressional ties. However, while tremendous effort has been invested in developing tools that can analyze a set of genes, minimal effort has been invested in developing tools that can help researchers compile, store, and annotate gene sets in the first place. As a result, the process of making or accessing a set often involves tedious and time-consuming steps such as finding identifiers for each individual gene. These steps are often repeated extensively to shift from one identifier type to another, or to recreate a published set. In this paper, we present a simple online tool which – with the help of the gene catalogs Ensembl and GeneLynx – can help researchers build and annotate sets of genes quickly and easily. Description: The Gene Set Builder is a database-driven, web-based tool designed to help researchers compile, store, export, and share sets of genes. This application supports the 17 eukaryotic genomes found in version 32 of the Ensembl database, which includes species from yeast to human. User-created information such as sets and customized annotations is stored to facilitate easy access. Gene sets stored in the system can be "exported" in a variety of output formats – as lists of identifiers, in tables, or as sequences. In addition, gene sets can be "shared" with specific users to facilitate collaborations or fully released to provide access to published results. The application also features a Perl API (Application Programming Interface) for direct connectivity to custom analysis tools. A downloadable Quick Reference guide and an online tutorial are available to help new users learn its functionalities. Conclusion: The Gene Set Builder is an Ensembl-facilitated online tool designed to help researchers compile and manage sets of
Personal discovery in diabetes self-management: Discovering cause and effect using self-monitoring data

PubMed Central

Mamykina, Lena; Heitkemper, Elizabeth M.; Smaldone, Arlene M.; Kukafka, Rita; Cole-Lewis, Heather J.; Davidson, Patricia G.; Mynatt, Elizabeth D.; Cassells, Andrea; Tobin, Jonathan N.; Hripcsak, George

2017-01-01

Objective: To outline new design directions for informatics solutions that facilitate personal discovery with self-monitoring data. We investigate this question in the context of chronic disease self-management with the focus on type 2 diabetes. Materials and methods: We conducted an observational qualitative study of discovery with personal data among adults attending a diabetes self-management education (DSME) program that utilized a discovery-based curriculum. The study included observations of class sessions, and interviews and focus groups with the educator and attendees of the program (n = 14). Results: The main discovery in diabetes self-management revolved around discovering patterns of association between characteristics of individuals’ activities and changes in their blood glucose levels that the participants referred to as “cause and effect”. This discovery empowered individuals to actively engage in self-management and provided a desired flexibility in selection of personalized self-management strategies. We show that discovery of cause and effect involves four essential phases: (1) feature selection, (2) hypothesis generation, (3) feature evaluation, and (4) goal specification. Further, we identify opportunities to support discovery at each stage with informatics and data visualization solutions by providing assistance with: (1) active manipulation of collected data (e.g., grouping, filtering and side-by-side inspection), (2) hypotheses formulation (e.g., using natural language statements or constructing visual queries), (3) inference evaluation (e.g., through aggregation and visual comparison, and statistical analysis of associations), and (4) translation of discoveries into actionable goals (e.g., tailored selection from computable knowledge sources of effective diabetes self-management behaviors). Discussion: The study suggests that discovery of cause and effect in diabetes can be a powerful approach to helping individuals to improve their self
Personal discovery in diabetes self-management: Discovering cause and effect using self-monitoring data.

PubMed

Mamykina, Lena; Heitkemper, Elizabeth M; Smaldone, Arlene M; Kukafka, Rita; Cole-Lewis, Heather J; Davidson, Patricia G; Mynatt, Elizabeth D; Cassells, Andrea; Tobin, Jonathan N; Hripcsak, George

2017-12-01

To outline new design directions for informatics solutions that facilitate personal discovery with self-monitoring data. We investigate this question in the context of chronic disease self-management with the focus on type 2 diabetes. We conducted an observational qualitative study of discovery with personal data among adults attending a diabetes self-management education (DSME) program that utilized a discovery-based curriculum. The study included observations of class sessions, and interviews and focus groups with the educator and attendees of the program (n = 14). The main discovery in diabetes self-management revolved around discovering patterns of association between characteristics of individuals' activities and changes in their blood glucose levels that the participants referred to as "cause and effect". This discovery empowered individuals to actively engage in self-management and provided a desired flexibility in selection of personalized self-management strategies. We show that discovery of cause and effect involves four essential phases: (1) feature selection, (2) hypothesis generation, (3) feature evaluation, and (4) goal specification. Further, we identify opportunities to support discovery at each stage with informatics and data visualization solutions by providing assistance with: (1) active manipulation of collected data (e.g., grouping, filtering and side-by-side inspection), (2) hypotheses formulation (e.g., using natural language statements or constructing visual queries), (3) inference evaluation (e.g., through aggregation and visual comparison, and statistical analysis of associations), and (4) translation of discoveries into actionable goals (e.g., tailored selection from computable knowledge sources of effective diabetes self-management behaviors). The study suggests that discovery of cause and effect in diabetes can be a powerful approach to helping individuals to improve their self-management strategies, and that self-monitoring data can
MUFFINN: cancer gene discovery via network analysis of somatic mutation data.

PubMed

Cho, Ara; Shim, Jung Eun; Kim, Eiru; Supek, Fran; Lehner, Ben; Lee, Insuk

2016-06-23

A major challenge for distinguishing cancer-causing driver mutations from inconsequential passenger mutations is the long-tail of infrequently mutated genes in cancer genomes. Here, we present and evaluate a method for prioritizing cancer genes accounting not only for mutations in individual genes but also in their neighbors in functional networks, MUFFINN (MUtations For Functional Impact on Network Neighbors). This pathway-centric method shows high sensitivity compared with gene-centric analyses of mutation data. Notably, only a marginal decrease in performance is observed when using 10 % of TCGA patient samples, suggesting the method may potentiate cancer genome projects with small patient populations.
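The idea of scoring a gene by its own mutations together with those of its network neighbors can be illustrated with a minimal Python sketch. The scoring rule used here (a gene's own mutation count plus the largest count among its direct neighbors) is an assumption chosen for illustration and is not taken from the abstract; the gene names, counts, and the `neighbor_aware_scores` helper are likewise hypothetical.

```python
def neighbor_aware_scores(mutation_counts, network):
    """Score each gene by its own mutation count plus the highest
    mutation count among its direct network neighbors.

    mutation_counts: dict mapping gene -> number of observed mutations
    network: dict mapping gene -> iterable of neighboring genes
    NOTE: this scoring rule is an illustrative assumption, not the
    published method.
    """
    genes = set(mutation_counts) | set(network)
    scores = {}
    for gene in genes:
        own = mutation_counts.get(gene, 0)
        # default=0 covers genes with no neighbors in the network
        neighbor_max = max(
            (mutation_counts.get(n, 0) for n in network.get(gene, ())),
            default=0,
        )
        scores[gene] = own + neighbor_max
    return scores


# Hypothetical toy data: GENE_A is frequently mutated, GENE_B rarely.
counts = {"GENE_A": 50, "GENE_B": 1, "GENE_C": 0, "GENE_D": 2}
links = {
    "GENE_A": ["GENE_B"],
    "GENE_B": ["GENE_A", "GENE_C"],
    "GENE_C": ["GENE_B"],
    "GENE_D": [],
}

scores = neighbor_aware_scores(counts, links)
for gene in sorted(scores, key=scores.get, reverse=True):
    print(gene, scores[gene])
```

In this toy network, GENE_B is mutated only once but scores level with the frequently mutated GENE_A because the two are neighbors, which is the kind of long-tail gene a purely gene-centric count would rank low.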
14 CFR 406.143 - Discovery.

Code of Federal Regulations, 2014 CFR

2014-01-01

... 14 Aeronautics and Space 4 2014-01-01 2014-01-01 false Discovery. 406.143 Section 406.143... Transportation Adjudications § 406.143 Discovery. (a) Initiation of discovery. Any party may initiate discovery... after a complaint has been filed. (b) Methods of discovery. The following methods of discovery are...
14 CFR 406.143 - Discovery.

Code of Federal Regulations, 2011 CFR

2011-01-01

... 14 Aeronautics and Space 4 2011-01-01 2011-01-01 false Discovery. 406.143 Section 406.143... Transportation Adjudications § 406.143 Discovery. (a) Initiation of discovery. Any party may initiate discovery... after a complaint has been filed. (b) Methods of discovery. The following methods of discovery are...
14 CFR 406.143 - Discovery.

Code of Federal Regulations, 2012 CFR

2012-01-01

... 14 Aeronautics and Space 4 2012-01-01 2012-01-01 false Discovery. 406.143 Section 406.143... Transportation Adjudications § 406.143 Discovery. (a) Initiation of discovery. Any party may initiate discovery... after a complaint has been filed. (b) Methods of discovery. The following methods of discovery are...
14 CFR 406.143 - Discovery.

Code of Federal Regulations, 2013 CFR

2013-01-01

... 14 Aeronautics and Space 4 2013-01-01 2013-01-01 false Discovery. 406.143 Section 406.143... Transportation Adjudications § 406.143 Discovery. (a) Initiation of discovery. Any party may initiate discovery... after a complaint has been filed. (b) Methods of discovery. The following methods of discovery are...
14 CFR 406.143 - Discovery.

Code of Federal Regulations, 2010 CFR

2010-01-01

... 14 Aeronautics and Space 4 2010-01-01 2010-01-01 false Discovery. 406.143 Section 406.143... Transportation Adjudications § 406.143 Discovery. (a) Initiation of discovery. Any party may initiate discovery... after a complaint has been filed. (b) Methods of discovery. The following methods of discovery are...
Strain Prioritization for Natural Product Discovery by a High-Throughput Real-Time PCR Method

PubMed Central

2015-01-01

Natural products offer unmatched chemical and structural diversity compared to other small-molecule libraries, but traditional natural product discovery programs are not sustainable, demanding too much time, effort, and resources. Here we report a strain prioritization method for natural product discovery. Central to the method is the application of real-time PCR, targeting genes characteristic to the biosynthetic machinery of natural products with distinct scaffolds in a high-throughput format. The practicality and effectiveness of the method were showcased by prioritizing 1911 actinomycete strains for diterpenoid discovery. A total of 488 potential diterpenoid producers were identified, among which six were confirmed as platensimycin and platencin dual producers and one as a viguiepinol and oxaloterpin producer. While the method as described is most appropriate to prioritize strains for discovering specific natural products, variations of this method should be applicable to the discovery of other classes of natural products. Applications of genome sequencing and genome mining to the high-priority strains could essentially eliminate the chance elements from traditional discovery programs and fundamentally change how natural products are discovered. PMID:25238028
The Fragile X Mental Retardation Syndrome 20 Years After the FMR1 Gene Discovery: an Expanding Universe of Knowledge

PubMed Central

Rousseau, François; Labelle, Yves; Bussières, Johanne; Lindsay, Carmen

2011-01-01

The fragile X mental retardation (FXMR) syndrome is one of the most frequent causes of mental retardation. Affected individuals display a wide range of additional characteristic features including behavioural and physical phenotypes, and the extent to which individuals are affected is highly variable. For these reasons, elucidation of the pathophysiology of this disease has been an important challenge to the scientific community. 1991 marks the year of the discovery of both the FMR1 gene mutations involved in this disease, and of their dynamic nature. Although a mouse model for the disease has been available for 16 years and extensive research has been performed on the FMR1 protein (FMRP), we still understand little about how the disease develops, and no treatment has yet been shown to be effective. In this review, we summarise current knowledge on FXMR with an emphasis on the technical challenges of molecular diagnostics, on its prevalence and dynamics among populations, and on the potential of screening for FMR1 mutations. PMID:21912443
The fragile x mental retardation syndrome 20 years after the FMR1 gene discovery: an expanding universe of knowledge.

PubMed

Rousseau, François; Labelle, Yves; Bussières, Johanne; Lindsay, Carmen

2011-08-01

The fragile X mental retardation (FXMR) syndrome is one of the most frequent causes of mental retardation. Affected individuals display a wide range of additional characteristic features including behavioural and physical phenotypes, and the extent to which individuals are affected is highly variable. For these reasons, elucidation of the pathophysiology of this disease has been an important challenge to the scientific community. 1991 marks the year of the discovery of both the FMR1 gene mutations involved in this disease, and of their dynamic nature. Although a mouse model for the disease has been available for 16 years and extensive research has been performed on the FMR1 protein (FMRP), we still understand little about how the disease develops, and no treatment has yet been shown to be effective. In this review, we summarise current knowledge on FXMR with an emphasis on the technical challenges of molecular diagnostics, on its prevalence and dynamics among populations, and on the potential of screening for FMR1 mutations.
High-throughput discovery of rare human nucleotide polymorphisms by Ecotilling

PubMed Central

Till, Bradley J.; Zerr, Troy; Bowers, Elisabeth; Greene, Elizabeth A.; Comai, Luca; Henikoff, Steven

2006-01-01

Human individuals differ from one another at only ∼0.1% of nucleotide positions, but these single nucleotide differences account for most heritable phenotypic variation. Large-scale efforts to discover and genotype human variation have been limited to common polymorphisms. However, these efforts overlook rare nucleotide changes that may contribute to phenotypic diversity and genetic disorders, including cancer. Thus, there is an increasing need for high-throughput methods to robustly detect rare nucleotide differences. Toward this end, we have adapted the mismatch discovery method known as Ecotilling for the discovery of human single nucleotide polymorphisms. To increase throughput and reduce costs, we developed a universal primer strategy and implemented algorithms for automated band detection. Ecotilling was validated by screening 90 human DNA samples for nucleotide changes in 5 gene targets and by comparing results to public resequencing data. To increase throughput for discovery of rare alleles, we pooled samples 8-fold and found Ecotilling to be efficient relative to resequencing, with a false negative rate of 5% and a false discovery rate of 4%. We identified 28 new rare alleles, including some that are predicted to damage protein function. The detection of rare damaging mutations has implications for models of human disease. PMID:16893952
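For readers less familiar with the two error measures quoted above, the following Python sketch shows how a false negative rate and a false discovery rate are conventionally computed from validation counts. The counts are hypothetical and are not the study's data; the `error_rates` helper is an illustrative name.

```python
def error_rates(true_pos, false_pos, false_neg):
    """Return (false negative rate, false discovery rate).

    false negative rate  = FN / (TP + FN): share of real variants missed
    false discovery rate = FP / (TP + FP): share of calls that are wrong
    """
    if true_pos + false_neg == 0 or true_pos + false_pos == 0:
        raise ValueError("need at least one real variant and one call")
    fnr = false_neg / (true_pos + false_neg)
    fdr = false_pos / (true_pos + false_pos)
    return fnr, fdr


# Hypothetical counts from comparing calls against a reference set.
fnr, fdr = error_rates(true_pos=95, false_pos=4, false_neg=5)
print(f"false negative rate: {fnr:.3f}")
print(f"false discovery rate: {fdr:.3f}")
```

The two rates use different denominators: the false negative rate is measured against the variants that truly exist, while the false discovery rate is measured against the variants that were reported.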

Integrated physical map of bread wheat chromosome arm 7DS to facilitate gene cloning and comparative studies.

PubMed

Tulpová, Zuzana; Luo, Ming-Cheng; Toegelová, Helena; Visendi, Paul; Hayashi, Satomi; Vojta, Petr; Paux, Etienne; Kilian, Andrzej; Abrouk, Michaël; Bartoš, Jan; Hajdúch, Marián; Batley, Jacqueline; Edwards, David; Doležel, Jaroslav; Šimková, Hana

2018-03-08

Bread wheat (Triticum aestivum L.) is a staple food for a significant part of the world's population. The growing demand on its production can be satisfied by improving yield and resistance to biotic and abiotic stress. Knowledge of the genome sequence would aid in discovering genes and QTLs underlying these traits and provide a basis for genomics-assisted breeding. Physical maps and BAC clones associated with them have been valuable resources from which to generate a reference genome of bread wheat and to assist map-based gene cloning. As a part of a joint effort coordinated by the International Wheat Genome Sequencing Consortium, we have constructed a BAC-based physical map of bread wheat chromosome arm 7DS consisting of 895 contigs and covering 94% of its estimated length. By anchoring BAC contigs to one radiation hybrid map and three high resolution genetic maps, we assigned 73% of the assembly to a distinct genomic position. This map integration, interconnecting a total of 1713 markers with ordered and sequenced BAC clones from a minimal tiling path, provides a tool to speed up gene cloning in wheat. The process of physical map assembly included the integration of the 7DS physical map with a whole-genome physical map of Aegilops tauschii and a 7DS Bionano genome map, which together enabled efficient scaffolding of physical-map contigs, even in the non-recombining region of the genetic centromere. Moreover, this approach facilitated a comparison of bread wheat and its ancestor at BAC-contig level and revealed a reconstructed region in the 7DS pericentromere. Copyright © 2018. Published by Elsevier B.V.
hNaa10p contributes to tumorigenesis by facilitating DNMT1-mediated tumor suppressor gene silencing

PubMed Central

Lee, Chung-Fan; Ou, Derick S.-C.; Lee, Sung-Bau; Chang, Liang-Hao; Lin, Ruo-Kai; Li, Ying-Shiuan; Upadhyay, Anup K.; Cheng, Xiaodong; Wang, Yi-Ching; Hsu, Han-Shui; Hsiao, Michael; Wu, Cheng-Wen; Juan, Li-Jung

2010-01-01

Hypermethylation-mediated tumor suppressor gene silencing plays a crucial role in tumorigenesis. Understanding its underlying mechanism is essential for cancer treatment. Previous studies on human N-α-acetyltransferase 10, NatA catalytic subunit (hNaa10p; also known as human arrest-defective 1 [hARD1]), have generated conflicting results with regard to its role in tumorigenesis. Here we provide multiple lines of evidence indicating that it is oncogenic. We have shown that hNaa10p overexpression correlated with poor survival of human lung cancer patients. In vitro, enforced expression of hNaa10p was sufficient to cause cellular transformation, and siRNA-mediated depletion of hNaa10p impaired cancer cell proliferation in colony assays and xenograft studies. The oncogenic potential of hNaa10p depended on its interaction with DNA methyltransferase 1 (DNMT1). Mechanistically, hNaa10p positively regulated DNMT1 enzymatic activity by facilitating its binding to DNA in vitro and its recruitment to promoters of tumor suppressor genes, such as E-cadherin, in vivo. Consistent with this, interaction between hNaa10p and DNMT1 was required for E-cadherin silencing through promoter CpG methylation, and E-cadherin repression contributed to the oncogenic effects of hNaa10p. Together, our data not only establish hNaa10p as an oncoprotein, but also reveal that it contributes to oncogenesis through modulation of DNMT1 function. PMID:20592467
Refinement of light-responsive transcript lists using rice oligonucleotide arrays: evaluation of gene-redundancy.

PubMed

Jung, Ki-Hong; Dardick, Christopher; Bartley, Laura E; Cao, Peijian; Phetsom, Jirapa; Canlas, Patrick; Seo, Young-Su; Shultz, Michael; Ouyang, Shu; Yuan, Qiaoping; Frank, Bryan C; Ly, Eugene; Zheng, Li; Jia, Yi; Hsia, An-Ping; An, Kyungsook; Chou, Hui-Hsien; Rocke, David; Lee, Geun Cheol; Schnable, Patrick S; An, Gynheung; Buell, C Robin; Ronald, Pamela C

2008-10-06

Studies of gene function are often hampered by gene-redundancy, especially in organisms with large genomes such as rice (Oryza sativa). We present an approach for using transcriptomics data to focus functional studies and address redundancy. To this end, we have constructed and validated an inexpensive and publicly available rice oligonucleotide near-whole genome array, called the rice NSF45K array. We generated expression profiles for light- vs. dark-grown rice leaf tissue and validated the biological significance of the data by analyzing sources of variation and confirming expression trends with reverse transcription polymerase chain reaction. We examined trends in the data by evaluating enrichment of gene ontology terms at multiple false discovery rate thresholds. To compare data generated with the NSF45K array with published results, we developed publicly available, web-based tools (www.ricearray.org). The Oligo and EST Anatomy Viewer enables visualization of EST-based expression profiling data for all genes on the array. The Rice Multi-platform Microarray Search Tool facilitates comparison of gene expression profiles across multiple rice microarray platforms. Finally, we incorporated gene expression and biochemical pathway data to reduce the number of candidate gene products putatively participating in the eight steps of the photorespiration pathway from 52 to 10, based on expression levels of putatively functionally redundant genes. We confirmed the efficacy of this method to cope with redundancy by correctly predicting participation in photorespiration of a gene with five paralogs. Applying these methods will accelerate rice functional genomics.
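Evaluating enrichment at several false discovery rate thresholds can be sketched as follows. The abstract does not say which correction was used, so this example assumes the Benjamini-Hochberg procedure; the term names and p-values are hypothetical, as is the `benjamini_hochberg` helper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each
    # adjusted value never exceeds the one ranked above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical enrichment p-values for four gene ontology terms.
terms = ["term_1", "term_2", "term_3", "term_4"]
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)

# Count how many terms pass at each FDR threshold.
for threshold in (0.01, 0.03, 0.05):
    passing = [t for t, q in zip(terms, adj_p) if q <= threshold]
    print(threshold, passing)
```

Listing the surviving terms at each threshold makes it easy to see which enrichment signals are stable and which appear only when the cutoff is relaxed.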
An integrative data analysis platform for gene set analysis and knowledge discovery in a data warehouse framework.

PubMed

Chen, Yi-An; Tripathi, Lokesh P; Mizuguchi, Kenji

2016-01-01

Data analysis is one of the most critical and challenging steps in drug discovery and disease biology. A user-friendly resource to visualize and analyse high-throughput data provides a powerful medium for both experimental and computational biologists to understand vastly different biological data types and obtain a concise, simplified and meaningful output for better knowledge discovery. We have previously developed TargetMine, an integrated data warehouse optimized for target prioritization. Here we describe how upgraded and newly modelled data types in TargetMine can now survey the wider biological and chemical data space, relevant to drug discovery and development. To enhance the scope of TargetMine from target prioritization to broad-based knowledge discovery, we have also developed a new auxiliary toolkit to assist with data analysis and visualization in TargetMine. This toolkit features interactive data analysis tools to query and analyse the biological data compiled within the TargetMine data warehouse. The enhanced system enables users to discover new hypotheses interactively by performing complicated searches with no programming and obtaining the results in an easy to comprehend output format. Database URL: http://targetmine.mizuguchilab.org. © The Author(s) 2016. Published by Oxford University Press.
An integrative data analysis platform for gene set analysis and knowledge discovery in a data warehouse framework

PubMed Central

Chen, Yi-An; Tripathi, Lokesh P.; Mizuguchi, Kenji

2016-01-01

Data analysis is one of the most critical and challenging steps in drug discovery and disease biology. A user-friendly resource to visualize and analyse high-throughput data provides a powerful medium for both experimental and computational biologists to understand vastly different biological data types and obtain a concise, simplified and meaningful output for better knowledge discovery. We have previously developed TargetMine, an integrated data warehouse optimized for target prioritization. Here we describe how upgraded and newly modelled data types in TargetMine can now survey the wider biological and chemical data space, relevant to drug discovery and development. To enhance the scope of TargetMine from target prioritization to broad-based knowledge discovery, we have also developed a new auxiliary toolkit to assist with data analysis and visualization in TargetMine. This toolkit features interactive data analysis tools to query and analyse the biological data compiled within the TargetMine data warehouse. The enhanced system enables users to discover new hypotheses interactively by performing complicated searches with no programming and obtaining the results in an easy to comprehend output format. Database URL: http://targetmine.mizuguchilab.org PMID:26989145
Molecular-genetic imaging based on reporter gene expression.

PubMed

Kang, Joo Hyun; Chung, June-Key

2008-06-01

Molecular imaging includes proteomic, metabolic, cellular biologic process, and genetic imaging. In a narrow sense, molecular imaging means genetic imaging and can be called molecular-genetic imaging. Imaging reporter genes play a leading role in molecular-genetic imaging. There are 3 major methods of molecular-genetic imaging, based on optical, MRI, and nuclear medicine modalities. For each of these modalities, various reporter genes and probes have been developed, and these have resulted in successful transitions from bench to bedside applications. Each of these imaging modalities has its unique advantages and disadvantages. Fluorescent and bioluminescent optical imaging modalities are simple, less expensive, more convenient, and more user friendly than other imaging modalities. Another advantage, especially of bioluminescence imaging, is its ability to detect low levels of gene expression. MRI has the advantage of high spatial resolution, whereas nuclear medicine methods are highly sensitive and allow data from small-animal imaging studies to be translated to clinical practice. Moreover, multimodality imaging reporter genes will allow us to choose the imaging technologies that are most appropriate for the biologic problem at hand and facilitate the clinical application of reporter gene technologies. Reporter genes can be used to visualize the levels of expression of particular exogenous and endogenous genes and several intracellular biologic phenomena, including specific signal transduction pathways, nuclear receptor activities, and protein-protein interactions. This technique provides a straightforward means of monitoring tumor mass and can visualize the in vivo distributions of target cells, such as immune cells and stem cells. Molecular imaging has gradually evolved into an important tool for drug discovery and development, and transgenic mice with an imaging reporter gene can be useful during drug and stem cell therapy development. Moreover, instrumentation
Machine Learning for Detecting Gene-Gene Interactions

PubMed Central

McKinney, Brett A.; Reif, David M.; Ritchie, Marylyn D.; Moore, Jason H.

2011-01-01

Complex interactions among genes and environmental factors are known to play a role in common human disease aetiology. There is a growing body of evidence to suggest that complex interactions are ‘the norm’ and, rather than amounting to a small perturbation to classical Mendelian genetics, interactions may be the predominant effect. Traditional statistical methods are not well suited for detecting such interactions, especially when the data are high dimensional (many attributes or independent variables) or when interactions occur between more than two polymorphisms. In this review, we discuss machine-learning models and algorithms for identifying and characterising susceptibility genes in common, complex, multifactorial human diseases. We focus on the following machine-learning methods that have been used to detect gene-gene interactions: neural networks, cellular automata, random forests, and multifactor dimensionality reduction. We conclude with some ideas about how these methods and others can be integrated into a comprehensive and flexible framework for data mining and knowledge discovery in human genetics. PMID:16722772
Limitations and potentials of current motif discovery algorithms

PubMed Central

Hu, Jianjun; Li, Bin; Kihara, Daisuke

2005-01-01

Computational methods for de novo identification of gene regulation elements, such as transcription factor binding sites, have proved to be useful for deciphering genetic regulatory networks. However, despite the availability of a large number of algorithms, their strengths and weaknesses are not sufficiently understood. Here, we designed a comprehensive set of performance measures and benchmarked five modern sequence-based motif discovery algorithms using large datasets generated from Escherichia coli RegulonDB. Factors that affect the prediction accuracy, scalability and reliability are characterized. It is revealed that the nucleotide and the binding site level accuracy are very low, while the motif level accuracy is relatively high, which indicates that the algorithms can usually capture at least one correct motif in an input sequence. To exploit diverse predictions from multiple runs of one or more algorithms, a consensus ensemble algorithm has been developed, which achieved 6–45% improvement over the base algorithms by increasing both the sensitivity and specificity. Our study illustrates limitations and potentials of existing sequence-based motif discovery algorithms. Taking advantage of the revealed potentials, several promising directions for further improvements are discussed. Since the sequence-based algorithms are the baseline of most of the modern motif discovery algorithms, this paper suggests substantial improvements would be possible for them. PMID:16284194
Evaluating reproducibility of differential expression discoveries in microarray studies by considering correlated molecular changes.

PubMed

Zhang, Min; Zhang, Lin; Zou, Jinfeng; Yao, Chen; Xiao, Hui; Liu, Qing; Wang, Jing; Wang, Dong; Wang, Chenguang; Guo, Zheng

2009-07-01

According to current consistency metrics such as percentage of overlapping genes (POG), lists of differentially expressed genes (DEGs) detected from different microarray studies for a complex disease are often highly inconsistent. This irreproducibility problem also exists in other high-throughput post-genomic areas such as proteomics and metabolism. A complex disease is often characterized by many coordinated molecular changes, which should be considered when evaluating the reproducibility of discovery lists from different studies. We proposed the metrics percentage of overlapping genes-related (POGR) and normalized POGR (nPOGR) to evaluate the consistency between two DEG lists for a complex disease, considering correlated molecular changes rather than only counting gene overlaps between the lists. Based on microarray datasets of three diseases, we showed that though the POG scores for DEG lists from different studies for each disease are extremely low, the POGR and nPOGR scores can be rather high, suggesting that the apparently inconsistent DEG lists may be highly reproducible in the sense that they are actually significantly correlated. Evaluating different discovery results for a disease by the POGR and nPOGR scores will reduce the uncertainty of the microarray studies. The proposed metrics could also be applicable in many other high-throughput post-genomic areas.
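The abstract does not give formulas for these metrics, so the following is only one plausible reading, offered as a sketch. POG is taken as the fraction of list 1 that also appears in list 2; the POGR-style score additionally counts genes in list 1 that are flagged as significantly correlated with at least one gene in list 2. The set of correlated pairs is assumed to be computed elsewhere, the gene names are hypothetical, and the normalization used for nPOGR is not attempted.

```python
def pog(list1, list2):
    """Fraction of genes in list1 that are also in list2."""
    s1, s2 = set(list1), set(list2)
    return len(s1 & s2) / len(s1)


def pogr(list1, list2, correlated_pairs):
    """Like pog, but also credits non-shared genes in list1 that are
    correlated with some gene in list2 (assumed definition)."""
    s1, s2 = set(list1), set(list2)
    shared = s1 & s2
    related = {
        g for g in s1 - shared
        if any((g, h) in correlated_pairs or (h, g) in correlated_pairs for h in s2)
    }
    return (len(shared) + len(related)) / len(s1)


# Hypothetical DEG lists from two studies and hypothetical correlated gene pairs.
study1 = ["A", "B", "C", "D"]
study2 = ["A", "E", "F", "G"]
correlated = {("B", "E"), ("F", "C")}

print(pog(study1, study2))               # 0.25
print(pogr(study1, study2, correlated))  # 0.75
```

Both scores are asymmetric in this form, so swapping the two lists can give a different value.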
A method to facilitate and monitor expression of exogenous genes in the rat kidney using plasmid and viral vectors

PubMed Central

Corridon, Peter R.; Rhodes, George J.; Leonard, Ellen C.; Basile, David P.; Gattone, Vincent H.; Bacallao, Robert L.

2013-01-01

Gene therapy has been proposed as a novel alternative to treat kidney disease. This goal has been hindered by the inability to reliably deliver transgenes to target cells throughout the kidney, while minimizing injury. Since hydrodynamic forces have previously shown promising results, we optimized this approach and designed a method that utilizes retrograde renal vein injections to facilitate transgene expression in rat kidneys. We show, using intravital fluorescence two-photon microscopy, that fluorescent albumin and dextrans injected into the renal vein under defined conditions of hydrodynamic pressure distribute broadly throughout the kidney in live animals. We found injection parameters that result in no kidney injury as determined by intravital microscopy, histology, and serum creatinine measurements. Plasmids, baculovirus, and adenovirus vectors, designed to express EGFP, EGFP-actin, EGFP-occludin, EGFP-tubulin, tdTomato-H2B, or RFP-actin fusion proteins, were introduced into live kidneys in a similar fashion. Gene expression was then observed in live and ex vivo kidneys using two-photon imaging and confocal laser scanning microscopy. We recorded widespread fluorescent protein expression lasting more than 1 mo after introduction of transgenes. Plasmid and adenovirus vectors provided gene transfer efficiencies ranging from 50 to 90%, compared with 10–50% using baculovirus. Using plasmids and adenovirus, fluorescent protein expression was observed 1) in proximal and distal tubule epithelial cells; 2) within glomeruli; and 3) within the peritubular interstitium. In isolated kidneys, fluorescent protein expression was observed from the cortex to the papilla. These results provide a robust approach for gene delivery and the study of protein function in live mammal kidneys. PMID:23467422
Glycosyltransferase Gene Expression Profiles Classify Cancer Types and Propose Prognostic Subtypes

NASA Astrophysics Data System (ADS)

Ashkani, Jahanshah; Naidoo, Kevin J.

2016-05-01

Aberrant glycosylation in tumours stems from altered glycosyltransferase (GT) gene expression, but can the expression profiles of these signature genes be used to classify cancer types and lead to cancer subtype discovery? The differential structural changes to cellular glycan structures are predominantly regulated by the expression patterns of GT genes and are a hallmark of neoplastic cell metamorphoses. We found that the expression of 210 GT genes taken from 1893 cancer patient samples in The Cancer Genome Atlas (TCGA) microarray data is able to classify six cancers: breast, ovarian, glioblastoma, kidney, colon and lung. The GT gene expression profiles are used to develop cancer classifiers and propose subtypes. The subclassification of breast cancer solid tumour samples illustrates the discovery of subgroups from GT genes that match well against basal-like and HER2-enriched subtypes and correlate with clinical, mutation and survival data. This cancer type glycosyltransferase gene signature finding provides foundational evidence for the centrality of glycosylation in cancer.
High Throughput Screening for Anti–Trypanosoma cruzi Drug Discovery

PubMed Central

Alonso-Padilla, Julio; Rodríguez, Ana

2014-01-01

The discovery of new therapeutic options against Trypanosoma cruzi, the causative agent of Chagas disease, stands as a fundamental need. Currently, there are only two drugs available to treat this neglected disease, which represents a major public health problem in Latin America. Both available therapies, benznidazole and nifurtimox, have significant toxic side effects and their efficacy against the life-threatening symptomatic chronic stage of the disease is variable. Thus, there is an urgent need for new, improved anti–T. cruzi drugs. With the objective to reliably accelerate the drug discovery process against Chagas disease, several advances have been made in the last few years. Availability of engineered reporter gene expressing parasites triggered the development of phenotypic in vitro assays suitable for high throughput screening (HTS) as well as the establishment of new in vivo protocols that allow faster experimental outcomes. Recently, automated high content microscopy approaches have also been used to identify new parasitic inhibitors. These in vitro and in vivo early drug discovery approaches, which hopefully will contribute to bring better anti–T. cruzi drug entities in the near future, are reviewed here. PMID:25474364
Large-Scale Discovery of Disease-Disease and Disease-Gene Associations

PubMed Central

Gligorijevic, Djordje; Stojanovic, Jelena; Djuric, Nemanja; Radosavljevic, Vladan; Grbovic, Mihajlo; Kulathinal, Rob J.; Obradovic, Zoran

2016-01-01

Data-driven phenotype analyses on Electronic Health Record (EHR) data have recently drawn benefits across many areas of clinical practice, uncovering new links in the medical sciences that can potentially affect the well-being of millions of patients. In this paper, EHR data is used to discover novel relationships between diseases by studying their comorbidities (co-occurrences in patients). A novel embedding model is designed to extract knowledge from disease comorbidities by learning from a large-scale EHR database comprising more than 35 million inpatient cases spanning nearly a decade, revealing significant improvements on disease phenotyping over current computational approaches. In addition, the use of the proposed methodology is extended to discover novel disease-gene associations by including valuable domain knowledge from genome-wide association studies. To evaluate our approach, its effectiveness is compared against a held-out set where, again, it revealed very compelling results. For selected diseases, we further identify candidate gene lists for which disease-gene associations were not studied previously. Our approach thus provides biomedical researchers with new tools to filter genes of interest, reducing costly lab studies. PMID:27578529
The discovery of the periodic table as a case of simultaneous discovery.

PubMed

Scerri, Eric

2015-03-13

The article examines the question of priority and simultaneous discovery in the context of the discovery of the periodic system. It is argued that rather than being anomalous, simultaneous discovery is the rule. Moreover, I argue that the discovery of the periodic system by at least six authors over a period of 7 years represents one of the best examples of a multiple discovery. This notion is supported by a new view of the evolutionary development of science through a mechanism that is dubbed Sci-Gaia by analogy with Lovelock's Gaia hypothesis. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
Gene expression in thiazide diuretic or statin users in relation to incident type 2 diabetes

PubMed Central

Suchy-Dicey, Astrid; Heckbert, Susan R; Smith, Nicholas L; McKnight, Barbara; Rotter, Jerome I; Chen, YD Ida; Psaty, Bruce M; Enquobahrie, Daniel A

2014-01-01

Thiazide diuretics and statins are used to improve cardiovascular outcomes, but may also cause type 2 diabetes (T2DM), although mechanisms are unknown. Gene expression studies may facilitate understanding of these associations. Participants from ongoing population-based studies were sampled for these longitudinal studies of peripheral blood microarray gene expression, and followed to incident diabetes. All sampled subjects were statin or thiazide users. Those who developed diabetes during follow-up comprised cases (44 thiazide users; 19 statin users), and were matched on several factors to drug-using controls who did not develop diabetes. Supervised normalization and surrogate variable analyses removed technical bias and confounding. Differentially-expressed genes were those with a false discovery rate Q-value<0.05. Among thiazide users, diabetes cases had significantly different expression of CCL14 (down-regulated 6%, Q-value=0.0257), compared with controls. Among statin users, diabetes cases had marginally but not significantly different expression of ZNF532 (up-regulated 15%, Q-value=0.0584), CXORF21 (up-regulated 11%, Q-value=0.0584), and ZNHIT3 (up-regulated 19%, Q-value=0.0959), compared with controls. These genes comprise potential targets for future expression or mechanistic research on medication-related diabetes development. PMID:24596594
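The abstract applies a false discovery rate threshold (Q-value<0.05) but does not say how the Q-values were computed. As an illustration only, the sketch below uses the Benjamini-Hochberg adjustment, one common way to obtain FDR-adjusted values; the study may have used a different estimator. The p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values.
pvals = [0.01, 0.04, 0.03, 0.20]
qvals = bh_adjust(pvals)
significant = [i for i, q in enumerate(qvals) if q < 0.05]

print(qvals)
print(significant)
```

With these inputs only the first gene passes the 0.05 threshold, even though three raw p-values are below 0.05, which is the effect of correcting for multiple tests.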
Discovery of genomic intervals that underlie nematode responses to benzimidazoles.

PubMed

Zamanian, Mostafa; Cook, Daniel E; Zdraljevic, Stefan; Brady, Shannon C; Lee, Daehan; Lee, Junho; Andersen, Erik C

2018-03-01

Parasitic nematodes impose a debilitating health and economic burden across much of the world. Nematode resistance to anthelmintic drugs threatens parasite control efforts in both human and veterinary medicine. Despite this threat, the genetic landscape of potential resistance mechanisms to these critical drugs remains largely unexplored. Here, we exploit natural variation in the model nematodes Caenorhabditis elegans and Caenorhabditis briggsae to discover quantitative trait loci (QTL) that control sensitivity to benzimidazoles widely used in human and animal medicine. High-throughput phenotyping of albendazole, fenbendazole, mebendazole, and thiabendazole responses in panels of recombinant lines led to the discovery of over 15 QTL in C. elegans and four QTL in C. briggsae associated with divergent responses to these anthelmintics. Many of these QTL are conserved across benzimidazole derivatives, but others show drug and dose specificity. We used near-isogenic lines to recapitulate and narrow the C. elegans albendazole QTL of largest effect and identified candidate variants correlated with the resistance phenotype. These QTL do not overlap with known benzimidazole target resistance genes from parasitic nematodes and present specific new leads for the discovery of novel mechanisms of nematode benzimidazole resistance. Analyses of orthologous genes reveal conservation of candidate benzimidazole resistance genes in medically important parasitic nematodes. These data provide a basis for extending these approaches to other anthelmintic drug classes and a pathway towards validating new markers for anthelmintic resistance that can be deployed to improve parasite disease control.
"Eureka, Eureka!" Discoveries in Science

ERIC Educational Resources Information Center

Agarwal, Pankaj

2011-01-01

Accidental discoveries have been of significant value in the progress of science. Although accidental discoveries are more common in pharmacology and chemistry, other branches of science have also benefited from such discoveries. While most discoveries are the result of persistent research, famous accidental discoveries provide a fascinating…
Anthropogenic Habitats Facilitate Dispersal of an Early Successional Obligate: Implications for Restoration of an Endangered Ecosystem

PubMed Central

Amaral, Katrina E.; Palace, Michael; O’Brien, Kathleen M.; Fenderson, Lindsey E.; Kovach, Adrienne I.

2016-01-01

Landscape modification and habitat fragmentation disrupt the connectivity of natural landscapes, with major consequences for biodiversity. Species that require patchily distributed habitats, such as those that specialize on early successional ecosystems, must disperse through a landscape matrix with unsuitable habitat types. We evaluated landscape effects on dispersal of an early successional obligate, the New England cottontail (Sylvilagus transitionalis). Using a landscape genetics approach, we identified barriers and facilitators of gene flow and connectivity corridors for a population of cottontails in the northeastern United States. We modeled dispersal in relation to landscape structure and composition and tested hypotheses about the influence of habitat fragmentation on gene flow. Anthropogenic and natural shrubland habitats facilitated gene flow, while the remainder of the matrix, particularly development and forest, impeded gene flow. The relative influence of matrix habitats differed between study areas in relation to a fragmentation gradient. Barrier features had higher explanatory power in the more fragmented site, while facilitating features were important in the less fragmented site. Landscape models that included a simultaneous barrier and facilitating effect of roads had higher explanatory power than models that considered either effect separately, supporting the hypothesis that roads act as both barriers and facilitators at all spatial scales. The inclusion of LiDAR-identified shrubland habitat improved the fit of our facilitator models. Corridor analyses using circuit and least cost path approaches revealed the importance of anthropogenic, linear features for restoring connectivity between the study areas. In fragmented landscapes, human-modified habitats may enhance functional connectivity by providing suitable dispersal conduits for early successional specialists. PMID:26954014
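The corridor analyses above use least cost path approaches. A minimal sketch of that idea on a small raster follows, assuming 4-neighbour movement and that the cost of a step equals the resistance of the cell being entered; both are modelling assumptions made here, not details from the abstract. The resistance values are hypothetical (low for shrubland, high for forest or development).

```python
import heapq


def least_cost_path(resistance, start, goal):
    """Dijkstra search over a grid; returns (total_cost, list of (row, col) cells)."""
    rows, cols = len(resistance), len(resistance[0])
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + resistance[nr][nc]  # cost of entering the neighbour
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    # Walk back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]


# Hypothetical resistance surface: 1 = shrubland, 10 = forest or development.
grid = [
    [1, 10, 10],
    [1, 10, 10],
    [1, 1, 1],
]

cost, path = least_cost_path(grid, (0, 0), (2, 2))
print(cost, path)
```

The route follows the low-resistance cells around the edge rather than the shorter diagonal direction through high-resistance cells.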
A Hybrid Computational Method for the Discovery of Novel Reproduction-Related Genes

PubMed Central

Chen, Lei; Chu, Chen; Kong, Xiangyin; Huang, Guohua; Huang, Tao; Cai, Yu-Dong

2015-01-01

Uncovering the molecular mechanisms underlying reproduction is of great importance to infertility treatment and to the generation of healthy offspring. In this study, we discovered novel reproduction-related genes with a hybrid computational method, integrating three different types of method, which offered new clues for further reproduction research. This method was first executed on a weighted graph, constructed based on known protein-protein interactions, to search the shortest paths connecting any two known reproduction-related genes. Genes occurring in these paths were deemed to have a special relationship with reproduction. These newly discovered genes were filtered with a randomization test. Then, the remaining genes were further selected according to their associations with known reproduction-related genes measured by protein-protein interaction score and alignment score obtained by BLAST. The in-depth analysis of the high confidence novel reproduction genes revealed hidden mechanisms of reproduction and provided guidelines for further experimental validations. PMID:25768094
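The first step described above, searching shortest paths between known genes on a weighted protein-protein interaction graph, can be sketched as follows. This is an illustration only: the edge weights, the gene names, and the helper functions are hypothetical, and the later filtering steps (randomization test, interaction and BLAST alignment scores) are not shown.

```python
import heapq
from itertools import combinations


def build_graph(edges):
    """Undirected weighted graph as a dict of dicts from (a, b, weight) triples."""
    graph = {}
    for a, b, w in edges:
        graph.setdefault(a, {})[b] = w
        graph.setdefault(b, {})[a] = w
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns the node list of one shortest path, or None."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def candidate_genes(graph, known):
    """Genes lying on a shortest path between any two known genes, excluding known ones."""
    found = set()
    for a, b in combinations(sorted(known), 2):
        path = shortest_path(graph, a, b)
        if path:
            found.update(path[1:-1])
    return found - set(known)


# Hypothetical network: K1..K3 are known genes; X, Y, Z are unannotated.
ppi = build_graph([
    ("K1", "X", 1), ("X", "K2", 1),
    ("K1", "Y", 5), ("Y", "K2", 5),
    ("K2", "Z", 1), ("Z", "K3", 1),
])

print(sorted(candidate_genes(ppi, {"K1", "K2", "K3"})))
```

Gene Y is connected to known genes but never lies on a shortest path, so it is not reported as a candidate in this toy network.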
30 CFR 44.24 - Discovery.

Code of Federal Regulations, 2014 CFR

2014-07-01

... 30 Mineral Resources 1 2014-07-01 2014-07-01 false Discovery. 44.24 Section 44.24 Mineral... Discovery. Parties shall be governed in their conduct of discovery by appropriate provisions of the Federal... discovery. Alternative periods of time for discovery may be prescribed by the presiding administrative law...
39 CFR 952.21 - Discovery.

Code of Federal Regulations, 2014 CFR

2014-07-01

... 39 Postal Service 1 2014-07-01 2014-07-01 false Discovery. 952.21 Section 952.21 Postal Service... AND LOTTERY ORDERS § 952.21 Discovery. (a) Voluntary discovery. The parties are encouraged to engage in voluntary discovery procedures. In connection with any deposition or other discovery procedure...
39 CFR 952.21 - Discovery.

Code of Federal Regulations, 2013 CFR

2013-07-01

... 39 Postal Service 1 2013-07-01 2013-07-01 false Discovery. 952.21 Section 952.21 Postal Service... AND LOTTERY ORDERS § 952.21 Discovery. (a) Voluntary discovery. The parties are encouraged to engage in voluntary discovery procedures. In connection with any deposition or other discovery procedure...
30 CFR 44.24 - Discovery.

Code of Federal Regulations, 2012 CFR

2012-07-01

... 30 Mineral Resources 1 2012-07-01 2012-07-01 false Discovery. 44.24 Section 44.24 Mineral... Discovery. Parties shall be governed in their conduct of discovery by appropriate provisions of the Federal... discovery. Alternative periods of time for discovery may be prescribed by the presiding administrative law...
30 CFR 44.24 - Discovery.

Code of Federal Regulations, 2011 CFR

2011-07-01

... 30 Mineral Resources 1 2011-07-01 2011-07-01 false Discovery. 44.24 Section 44.24 Mineral... Discovery. Parties shall be governed in their conduct of discovery by appropriate provisions of the Federal... discovery. Alternative periods of time for discovery may be prescribed by the presiding administrative law...
19 CFR 356.20 - Discovery.

Code of Federal Regulations, 2014 CFR

2014-04-01

... 19 Customs Duties 3 2014-04-01 2014-04-01 false Discovery. 356.20 Section 356.20 Customs Duties... § 356.20 Discovery. (a) Voluntary discovery. All parties are encouraged to engage in voluntary discovery... sanctions proceeding. (b) Limitations on discovery. The administrative law judge shall place such limits...
39 CFR 952.21 - Discovery.

Code of Federal Regulations, 2012 CFR

2012-07-01

... 39 Postal Service 1 2012-07-01 2012-07-01 false Discovery. 952.21 Section 952.21 Postal Service... AND LOTTERY ORDERS § 952.21 Discovery. (a) Voluntary discovery. The parties are encouraged to engage in voluntary discovery procedures. In connection with any deposition or other discovery procedure...
19 CFR 356.20 - Discovery.

Code of Federal Regulations, 2013 CFR

2013-04-01

... 19 Customs Duties 3 2013-04-01 2013-04-01 false Discovery. 356.20 Section 356.20 Customs Duties... § 356.20 Discovery. (a) Voluntary discovery. All parties are encouraged to engage in voluntary discovery... sanctions proceeding. (b) Limitations on discovery. The administrative law judge shall place such limits...
19 CFR 356.20 - Discovery.

Code of Federal Regulations, 2012 CFR

2012-04-01

... 19 Customs Duties 3 2012-04-01 2012-04-01 false Discovery. 356.20 Section 356.20 Customs Duties... § 356.20 Discovery. (a) Voluntary discovery. All parties are encouraged to engage in voluntary discovery... sanctions proceeding. (b) Limitations on discovery. The administrative law judge shall place such limits...
30 CFR 44.24 - Discovery.

Code of Federal Regulations, 2013 CFR

2013-07-01

... 30 Mineral Resources 1 2013-07-01 2013-07-01 false Discovery. 44.24 Section 44.24 Mineral... Discovery. Parties shall be governed in their conduct of discovery by appropriate provisions of the Federal... discovery. Alternative periods of time for discovery may be prescribed by the presiding administrative law...
19 CFR 356.20 - Discovery.

Code of Federal Regulations, 2011 CFR

2011-04-01

... 19 Customs Duties 3 2011-04-01 2011-04-01 false Discovery. 356.20 Section 356.20 Customs Duties... § 356.20 Discovery. (a) Voluntary discovery. All parties are encouraged to engage in voluntary discovery... sanctions proceeding. (b) Limitations on discovery. The administrative law judge shall place such limits...
30 CFR 44.24 - Discovery.

Code of Federal Regulations, 2010 CFR

2010-07-01

... 30 Mineral Resources 1 2010-07-01 2010-07-01 false Discovery. 44.24 Section 44.24 Mineral... Discovery. Parties shall be governed in their conduct of discovery by appropriate provisions of the Federal... discovery. Alternative periods of time for discovery may be prescribed by the presiding administrative law...
19 CFR 356.20 - Discovery.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 19 Customs Duties 3 2010-04-01 2010-04-01 false Discovery. 356.20 Section 356.20 Customs Duties... § 356.20 Discovery. (a) Voluntary discovery. All parties are encouraged to engage in voluntary discovery... sanctions proceeding. (b) Limitations on discovery. The administrative law judge shall place such limits...
Chemical Discovery

ERIC Educational Resources Information Center

Brown, Herbert C.

1974-01-01

The role of discovery in the advance of the science of chemistry and the factors that are currently operating to handicap that function are considered. Examples are drawn from the author's work with boranes. The thesis that exploratory research and discovery should be encouraged is stressed. (DT)
From General Aberrant Alternative Splicing in Cancers and Its Therapeutic Application to the Discovery of an Oncogenic DMTF1 Isoform

PubMed Central

Tian, Na; Li, Jialiang; Shi, Jinming; Sui, Guangchao

2017-01-01

Alternative pre-mRNA splicing is a crucial process that allows the generation of diversified RNA and protein products from a multi-exon gene. In tumor cells, this mechanism can facilitate cancer development and progression through both creating oncogenic isoforms and reducing the expression of normal or controllable protein species. We recently demonstrated that an alternative cyclin D-binding myb-like transcription factor 1 (DMTF1) pre-mRNA splicing isoform, DMTF1β, is increasingly expressed in breast cancer and promotes mammary tumorigenesis in a transgenic mouse model. Aberrant pre-mRNA splicing is a typical event occurring for many cancer-related functional proteins. In this review, we introduce general aberrant pre-mRNA splicing in cancers and discuss its therapeutic application using our recent discovery of the oncogenic DMTF1 isoform as an example. We also summarize new insights in designing novel targeting strategies of cancer therapies based on the understanding of deregulated pre-mRNA splicing mechanisms. PMID:28257090
Sub-inhibitory concentrations of heavy metals facilitate the horizontal transfer of plasmid-mediated antibiotic resistance genes in water environment.

PubMed

Zhang, Ye; Gu, April Z; Cen, Tianyu; Li, Xiangyang; He, Miao; Li, Dan; Chen, Jianmin

2018-06-01

Although widespread antibiotic resistance has been mostly attributed to the selective pressure generated by overuse and misuse of antibiotics, recent growing evidence suggests that chemicals other than antibiotics, such as certain metals, can also select and stimulate antibiotic resistance via both co-resistance and cross-resistance mechanisms. For instance, tetL, merE, and oprD genes confer resistance to both antibiotics and metals. However, the potential de novo resistance induced by heavy metals at environmentally-relevant low concentrations (much below the minimum inhibitory concentrations [MICs], also referred to as sub-inhibitory) has hardly been explored. This study investigated and revealed that heavy metals, namely Cu(II), Ag(I), Cr(VI), and Zn(II), at environmentally-relevant and sub-inhibitory concentrations, promoted conjugative transfer of antibiotic resistance genes (ARGs) between E. coli strains. The mechanisms of this phenomenon were further explored, which involved intracellular reactive oxygen species (ROS) formation, SOS response, increased cell membrane permeability, and altered expression of conjugation-relevant genes. These findings suggest that sub-inhibitory levels of heavy metals that are widely present in various environments contribute to the resistance phenomena via facilitating horizontal transfer of ARGs. This study provides evidence from multiple aspects implicating the ecological effect of low levels of heavy metals on antibiotic resistance dissemination and highlights the urgency of strengthening efficacious policy and technology to control metal pollutants in the environment. Copyright © 2018 Elsevier Ltd. All rights reserved.
22 CFR 224.21 - Discovery.

Code of Federal Regulations, 2013 CFR

2013-04-01

... 22 Foreign Relations 1 2013-04-01 2013-04-01 false Discovery. 224.21 Section 224.21 Foreign....21 Discovery. (a) The following types of discovery are authorized: (1) Requests for production of... parties, discovery is available only as ordered by the ALJ. The ALJ shall regulate the timing of discovery...
24 CFR 180.500 - Discovery.

Code of Federal Regulations, 2013 CFR

2013-04-01

... 24 Housing and Urban Development 1 2013-04-01 2013-04-01 false Discovery. 180.500 Section 180.500... OPPORTUNITY CONSOLIDATED HUD HEARING PROCEDURES FOR CIVIL RIGHTS MATTERS Discovery § 180.500 Discovery. (a) In general. This subpart governs discovery in aid of administrative proceedings under this part. Discovery in...

24 CFR 180.500 - Discovery.

Code of Federal Regulations, 2014 CFR

2014-04-01

... 24 Housing and Urban Development 1 2014-04-01 2014-04-01 false Discovery. 180.500 Section 180.500... OPPORTUNITY CONSOLIDATED HUD HEARING PROCEDURES FOR CIVIL RIGHTS MATTERS Discovery § 180.500 Discovery. (a) In general. This subpart governs discovery in aid of administrative proceedings under this part. Discovery in...
22 CFR 224.21 - Discovery.

Code of Federal Regulations, 2011 CFR

2011-04-01

... 22 Foreign Relations 1 2011-04-01 2011-04-01 false Discovery. 224.21 Section 224.21 Foreign....21 Discovery. (a) The following types of discovery are authorized: (1) Requests for production of... parties, discovery is available only as ordered by the ALJ. The ALJ shall regulate the timing of discovery...
22 CFR 224.21 - Discovery.

Code of Federal Regulations, 2012 CFR

2012-04-01

... 22 Foreign Relations 1 2012-04-01 2012-04-01 false Discovery. 224.21 Section 224.21 Foreign....21 Discovery. (a) The following types of discovery are authorized: (1) Requests for production of... parties, discovery is available only as ordered by the ALJ. The ALJ shall regulate the timing of discovery...
24 CFR 180.500 - Discovery.

Code of Federal Regulations, 2012 CFR

2012-04-01

... 24 Housing and Urban Development 1 2012-04-01 2012-04-01 false Discovery. 180.500 Section 180.500... OPPORTUNITY CONSOLIDATED HUD HEARING PROCEDURES FOR CIVIL RIGHTS MATTERS Discovery § 180.500 Discovery. (a) In general. This subpart governs discovery in aid of administrative proceedings under this part. Discovery in...
22 CFR 224.21 - Discovery.

Code of Federal Regulations, 2014 CFR

2014-04-01

... 22 Foreign Relations 1 2014-04-01 2014-04-01 false Discovery. 224.21 Section 224.21 Foreign....21 Discovery. (a) The following types of discovery are authorized: (1) Requests for production of... parties, discovery is available only as ordered by the ALJ. The ALJ shall regulate the timing of discovery...
24 CFR 180.500 - Discovery.

Code of Federal Regulations, 2011 CFR

2011-04-01

... 24 Housing and Urban Development 1 2011-04-01 2011-04-01 false Discovery. 180.500 Section 180.500... OPPORTUNITY CONSOLIDATED HUD HEARING PROCEDURES FOR CIVIL RIGHTS MATTERS Discovery § 180.500 Discovery. (a) In general. This subpart governs discovery in aid of administrative proceedings under this part. Discovery in...
24 CFR 180.500 - Discovery.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 24 Housing and Urban Development 1 2010-04-01 2010-04-01 false Discovery. 180.500 Section 180.500... OPPORTUNITY CONSOLIDATED HUD HEARING PROCEDURES FOR CIVIL RIGHTS MATTERS Discovery § 180.500 Discovery. (a) In general. This subpart governs discovery in aid of administrative proceedings under this part. Discovery in...
22 CFR 224.21 - Discovery.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 22 Foreign Relations 1 2010-04-01 2010-04-01 false Discovery. 224.21 Section 224.21 Foreign....21 Discovery. (a) The following types of discovery are authorized: (1) Requests for production of... parties, discovery is available only as ordered by the ALJ. The ALJ shall regulate the timing of discovery...
Drug target ontology to classify and integrate drug discovery data.

PubMed

Lin, Yu; Mehta, Saurabh; Küçük-McGinty, Hande; Turner, John Paul; Vidovic, Dusica; Forlin, Michele; Koleti, Amar; Nguyen, Dac-Trung; Jensen, Lars Juhl; Guha, Rajarshi; Mathias, Stephen L; Ursu, Oleg; Stathias, Vasileios; Duan, Jianbin; Nabizadeh, Nooshin; Chung, Caty; Mader, Christopher; Visser, Ubbo; Yang, Jeremy J; Bologa, Cristian G; Oprea, Tudor I; Schürer, Stephan C

2017-11-09

model for druggable targets including various related information such as protein, gene, protein domain, protein structure, binding site, small molecule drug, mechanism of action, protein tissue localization, disease association, and many other types of information. DTO will further facilitate the otherwise challenging integration and formal linking to biological assays, phenotypes, disease models, drug poly-pharmacology, binding kinetics and many other processes, functions and qualities that are at the core of drug discovery. The first version of DTO is publicly available via the website http://drugtargetontology.org/, GitHub (http://github.com/DrugTargetOntology/DTO), and the NCBO BioPortal (http://bioportal.bioontology.org/ontologies/DTO). The long-term goal of DTO is to provide such an integrative framework and to populate the ontology with this information as a community resource.
Can Untargeted Metabolomics Be Utilized in Drug Discovery/Development?

PubMed

Caldwell, Gary W; Leo, Gregory C

2017-01-01

Untargeted metabolomics is a promising approach for reducing the significant attrition rate for discovering and developing drugs in the pharmaceutical industry. This review aims to highlight the practical decision-making value of untargeted metabolomics for the advancement of drug candidates in drug discovery/development including potentially identifying and validating novel therapeutic targets, creating alternative screening paradigms, facilitating the selection of specific and translational metabolite biomarkers, identifying metabolite signatures for the drug efficacy mechanism of action, and understanding potential drug-induced toxicity. The review provides an overview of the pharmaceutical process workflow to discover and develop new small molecule drugs followed by the metabolomics process workflow that is involved in conducting metabolomics studies. The pros and cons of the major components of the pharmaceutical and metabolomics workflows are reviewed and discussed. Finally, selected untargeted metabolomics literature examples, from primarily 2010 to 2016, are used to illustrate why, how, and where untargeted metabolomics can be integrated into the drug discovery/preclinical drug development process.
Fragment-based approaches to the discovery of kinase inhibitors.

PubMed

Mortenson, Paul N; Berdini, Valerio; O'Reilly, Marc

2014-01-01

Protein kinases are one of the most important families of drug targets, and aberrant kinase activity has been linked to a large number of disease areas. Although eminently targetable using small molecules, kinases present a number of challenges as drug targets, not least obtaining selectivity across such a large and relatively closely related target family. Fragment-based drug discovery involves screening simple, low-molecular weight compounds to generate initial hits against a target. These hits are then optimized to more potent compounds via medicinal chemistry, usually facilitated by structural biology. Here, we will present a number of recent examples of fragment-based approaches to the discovery of kinase inhibitors, detailing the construction of fragment-screening libraries, the identification and validation of fragment hits, and their optimization into potent and selective lead compounds. The advantages of fragment-based methodologies will be discussed, along with some of the challenges associated with using this route. Finally, we will present a number of key lessons derived both from our own experience running fragment screens against kinases and from a large number of published studies.
The Energy Industry Profile of ISO/DIS 19115-1: Facilitating Discovery and Evaluation of, and Access to Distributed Information Resources

NASA Astrophysics Data System (ADS)

Hills, S. J.; Richard, S. M.; Doniger, A.; Danko, D. M.; Derenthal, L.; Energistics Metadata Work Group

2011-12-01

established, capability-rich, open standard for geographic metadata, EIP v1 is expected to be widely acceptable within the community and readily sustainable over the long-term. The EIP design, also per community requirements, will enable discovery, evaluation, and access to types of information resources considered important to the community, including structured and unstructured digital resources, and physical assets such as hardcopy documents and material samples. This presentation will briefly review the development of this initiative as well as the current and planned Work Group activities. More time will be spent providing an overview of the EIP v1, including the requirements it prescribes, design efforts made to enable automated metadata capture and processing, and the structure and content of its documentation, which was written to minimize ambiguity and facilitate implementation. The Work Group considers EIP v1 a solid initial design for interoperable metadata, and first step toward the vision of the Initiative.
15 CFR 25.21 - Discovery.

Code of Federal Regulations, 2013 CFR

2013-01-01

... 15 Commerce and Foreign Trade 1 2013-01-01 2013-01-01 false Discovery. 25.21 Section 25.21... Discovery. (a) The following types of discovery are authorized: (1) Requests for production of documents for..., discovery is available only as ordered by the ALJ. The ALJ shall regulate the timing of discovery. (d...
37 CFR 42.224 - Discovery.

Code of Federal Regulations, 2013 CFR

2013-07-01

... 37 Patents, Trademarks, and Copyrights 1 2013-07-01 2013-07-01 false Discovery. 42.224 Section 42... Post-Grant Review § 42.224 Discovery. Notwithstanding the discovery provisions of subpart A: (a) Requests for additional discovery may be granted upon a showing of good cause as to why the discovery is...
15 CFR 25.21 - Discovery.

Code of Federal Regulations, 2012 CFR

2012-01-01

... 15 Commerce and Foreign Trade 1 2012-01-01 2012-01-01 false Discovery. 25.21 Section 25.21... Discovery. (a) The following types of discovery are authorized: (1) Requests for production of documents for..., discovery is available only as ordered by the ALJ. The ALJ shall regulate the timing of discovery. (d...
15 CFR 25.21 - Discovery.

Code of Federal Regulations, 2014 CFR

2014-01-01

... 15 Commerce and Foreign Trade 1 2014-01-01 2014-01-01 false Discovery. 25.21 Section 25.21... Discovery. (a) The following types of discovery are authorized: (1) Requests for production of documents for..., discovery is available only as ordered by the ALJ. The ALJ shall regulate the timing of discovery. (d...
19 CFR 207.109 - Discovery.

Code of Federal Regulations, 2014 CFR

2014-04-01

... 19 Customs Duties 3 2014-04-01 2014-04-01 false Discovery. 207.109 Section 207.109 Customs Duties... and Committee Proceedings § 207.109 Discovery. (a) Discovery methods. All parties may obtain discovery under such terms and limitations as the administrative law judge may order. Discovery may be by one or...
5 CFR 185.122 - Discovery.

Code of Federal Regulations, 2012 CFR

2012-01-01

... 5 Administrative Personnel 1 2012-01-01 2012-01-01 false Discovery. 185.122 Section 185.122... § 185.122 Discovery. (a) The following types of discovery are authorized: (1) Requests for production of..., discovery is available only as ordered by the ALJ. The ALJ shall regulate the timing of discovery. (d...
37 CFR 42.224 - Discovery.

Code of Federal Regulations, 2014 CFR

2014-07-01

... 37 Patents, Trademarks, and Copyrights 1 2014-07-01 2014-07-01 false Discovery. 42.224 Section 42... Post-Grant Review § 42.224 Discovery. Notwithstanding the discovery provisions of subpart A: (a) Requests for additional discovery may be granted upon a showing of good cause as to why the discovery is...
15 CFR 25.21 - Discovery.

Code of Federal Regulations, 2011 CFR

2011-01-01

... 15 Commerce and Foreign Trade 1 2011-01-01 2011-01-01 false Discovery. 25.21 Section 25.21... Discovery. (a) The following types of discovery are authorized: (1) Requests for production of documents for..., discovery is available only as ordered by the ALJ. The ALJ shall regulate the timing of discovery. (d...

19 CFR 207.109 - Discovery.

Code of Federal Regulations, 2011 CFR

2011-04-01

... 19 Customs Duties 3 2011-04-01 2011-04-01 false Discovery. 207.109 Section 207.109 Customs Duties... and Committee Proceedings § 207.109 Discovery. (a) Discovery methods. All parties may obtain discovery under such terms and limitations as the administrative law judge may order. Discovery may be by one or...
19 CFR 207.109 - Discovery.

Code of Federal Regulations, 2013 CFR

2013-04-01

... 19 Customs Duties 3 2013-04-01 2013-04-01 false Discovery. 207.109 Section 207.109 Customs Duties... and Committee Proceedings § 207.109 Discovery. (a) Discovery methods. All parties may obtain discovery under such terms and limitations as the administrative law judge may order. Discovery may be by one or...
5 CFR 185.122 - Discovery.

Code of Federal Regulations, 2011 CFR

2011-01-01

... 5 Administrative Personnel 1 2011-01-01 2011-01-01 false Discovery. 185.122 Section 185.122... § 185.122 Discovery. (a) The following types of discovery are authorized: (1) Requests for production of..., discovery is available only as ordered by the ALJ. The ALJ shall regulate the timing of discovery. (d...
5 CFR 185.122 - Discovery.

Code of Federal Regulations, 2013 CFR

2013-01-01

... 5 Administrative Personnel 1 2013-01-01 2013-01-01 false Discovery. 185.122 Section 185.122... § 185.122 Discovery. (a) The following types of discovery are authorized: (1) Requests for production of..., discovery is available only as ordered by the ALJ. The ALJ shall regulate the timing of discovery. (d...
5 CFR 185.122 - Discovery.

Code of Federal Regulations, 2014 CFR

2014-01-01

... 5 Administrative Personnel 1 2014-01-01 2014-01-01 false Discovery. 185.122 Section 185.122... § 185.122 Discovery. (a) The following types of discovery are authorized: (1) Requests for production of..., discovery is available only as ordered by the ALJ. The ALJ shall regulate the timing of discovery. (d...
19 CFR 207.109 - Discovery.

Code of Federal Regulations, 2012 CFR

2012-04-01

... 19 Customs Duties 3 2012-04-01 2012-04-01 false Discovery. 207.109 Section 207.109 Customs Duties... and Committee Proceedings § 207.109 Discovery. (a) Discovery methods. All parties may obtain discovery under such terms and limitations as the administrative law judge may order. Discovery may be by one or...
15 CFR 25.21 - Discovery.

Code of Federal Regulations, 2010 CFR

2010-01-01

... 15 Commerce and Foreign Trade 1 2010-01-01 2010-01-01 false Discovery. 25.21 Section 25.21... Discovery. (a) The following types of discovery are authorized: (1) Requests for production of documents for..., discovery is available only as ordered by the ALJ. The ALJ shall regulate the timing of discovery. (d...
19 CFR 207.109 - Discovery.

Code of Federal Regulations, 2010 CFR

2010-04-01

... 19 Customs Duties 3 2010-04-01 2010-04-01 false Discovery. 207.109 Section 207.109 Customs Duties... and Committee Proceedings § 207.109 Discovery. (a) Discovery methods. All parties may obtain discovery under such terms and limitations as the administrative law judge may order. Discovery may be by one or...
CUAHSI-HIS: an Internet based system to facilitate public discovery, access, and exploration of different water science data sources

NASA Astrophysics Data System (ADS)

Arrigo, J. S.; Hooper, R. P.; Choi, Y.; Ames, D. P.; Kadlec, J.; Whiteaker, T.

2011-12-01

"Water is everywhere." This sentiment underscores the importance of instilling hydrologic and earth science literacy in educators, students, and the general public, but also presents challenges for water scientists and educators. Scientific data about water is collected and distributed by several different sources, from federal agencies to scientific investigators to citizen scientists. As competition for limited water resources increases, increasing access to and understanding of the wealth of information about the nation's and the world's water will be critical. The CUAHSI-HIS system is a web-based system for sharing hydrologic data that can help address this need. HydroDesktop is a free, open source application for finding, getting, analyzing and using hydrologic data from the CUAHSI-HIS system. It works with HydroCatalog, which indexes the data, to find out what data exists and where it is, and then retrieves the data from the HydroServers where it is stored, communicating via WaterOneFlow web services. Currently, there are over 65 services registered in HydroCatalog providing central discovery of water data from several federal and state agencies, university projects, and other sources. HydroDesktop provides a simplified GIS that allows users to incorporate spatial data, and simple analysis tools to facilitate graphing and visualization. HydroDesktop is designed to be useful for a number of different groups of users with a wide variety of needs and skill levels including university faculty, graduate and undergraduate students, K-12 students, engineering and scientific consultants, and others. This presentation will highlight some of the features of HydroDesktop and the CUAHSI-HIS system that make it particularly appropriate for use in educational and public outreach settings, and will present examples of educational use. The incorporation of "real data," localization to an area of interest, and problem-based learning are all recognized as effective strategies for
[Discovery of the target genes inhibited by formic acid in Candida shehatae].

PubMed

Cai, Peng; Xiong, Xujie; Xu, Yong; Yong, Qiang; Zhu, Junjun; Shiyuan, Yu

2014-01-04

The inhibitory effects of formic acid on Candida shehatae, a model yeast strain capable of fermenting xylose to ethanol, were investigated at the transcriptional level in order to discover the target genes regulated by formic acid and their transcript profiles. On the basis of the transcriptome data of C. shehatae metabolizing glucose and xylose, the genes responsible for ethanol fermentation were chosen as candidates by combining yeast metabolic pathway analysis with manual gene BLAST searches. These candidates were then quantified by RQ-PCR to find the genes regulated under gradient doses of formic acid. By quantitative analysis of 42 candidate genes, we identified 10 markedly down-regulated and 5 markedly up-regulated target genes. The markedly down-regulated genes, ranked in declining order, were: xylitol dehydrogenase (XYL2), acetyl-CoA synthetase (ACS), ribose-5-phosphate isomerase (RKI), transaldolase (TAL), phosphogluconate dehydrogenase (GND1), transketolase (TKL), glucose-6-phosphate dehydrogenase (ZWF1), xylose reductase (XYL1), pyruvate dehydrogenase (PDH) and pyruvate decarboxylase (PDC). The up-regulated genes, ranked in declining order, were: fructose-bisphosphate aldolase (ALD), glucokinase (GLK), malate dehydrogenase (MDH), 6-phosphofructokinase (PFK) and alcohol dehydrogenase (ADH).
Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus.

PubMed

Hu, Ben; Zeng, Lei-Ping; Yang, Xing-Lou; Ge, Xing-Yi; Zhang, Wei; Li, Bei; Xie, Jia-Zheng; Shen, Xu-Rui; Zhang, Yun-Zhi; Wang, Ning; Luo, Dong-Sheng; Zheng, Xiao-Shuang; Wang, Mei-Niang; Daszak, Peter; Wang, Lin-Fa; Cui, Jie; Shi, Zheng-Li

2017-11-01

A large number of SARS-related coronaviruses (SARSr-CoV) have been detected in horseshoe bats since 2005 in different areas of China. However, these bat SARSr-CoVs show sequence differences from SARS coronavirus (SARS-CoV) in different genes (S, ORF8, ORF3, etc.) and are considered unlikely to represent the direct progenitor of SARS-CoV. Herein, we report the findings of our 5-year surveillance of SARSr-CoVs in a cave inhabited by multiple species of horseshoe bats in Yunnan Province, China. The full-length genomes of 11 newly discovered SARSr-CoV strains, together with our previous findings, reveal that the SARSr-CoVs circulating in this single location are highly diverse in the S gene, ORF3 and ORF8. Importantly, strains with high genetic similarity to SARS-CoV in the hypervariable N-terminal domain (NTD) and receptor-binding domain (RBD) of the S1 gene, the ORF3 and ORF8 region, respectively, were all discovered in this cave. In addition, we report the first discovery of bat SARSr-CoVs highly similar to human SARS-CoV in ORF3b and in the split ORF8a and 8b. Moreover, SARSr-CoV strains from this cave were more closely related to SARS-CoV in the non-structural protein genes ORF1a and 1b compared with those detected elsewhere. Recombination analysis shows evidence of frequent recombination events within the S gene and around the ORF8 between these SARSr-CoVs. We hypothesize that the direct progenitor of SARS-CoV may have originated after sequential recombination events between the precursors of these SARSr-CoVs. Cell entry studies demonstrated that three newly identified SARSr-CoVs with different S protein sequences are all able to use human ACE2 as the receptor, further exhibiting the close relationship between strains in this cave and SARS-CoV. This work provides new insights into the origin and evolution of SARS-CoV and highlights the necessity of preparedness for future emergence of SARS-like diseases.
Discovery of a rich gene pool of bat SARS-related coronaviruses provides new insights into the origin of SARS coronavirus

PubMed Central

Ge, Xing-Yi; Zhang, Wei; Li, Bei; Xie, Jia-Zheng; Shen, Xu-Rui; Zhang, Yun-Zhi; Wang, Ning; Luo, Dong-Sheng; Zheng, Xiao-Shuang; Wang, Mei-Niang; Wang, Lin-Fa

2017-01-01

A large number of SARS-related coronaviruses (SARSr-CoV) have been detected in horseshoe bats since 2005 in different areas of China. However, these bat SARSr-CoVs show sequence differences from SARS coronavirus (SARS-CoV) in different genes (S, ORF8, ORF3, etc.) and are considered unlikely to represent the direct progenitor of SARS-CoV. Herein, we report the findings of our 5-year surveillance of SARSr-CoVs in a cave inhabited by multiple species of horseshoe bats in Yunnan Province, China. The full-length genomes of 11 newly discovered SARSr-CoV strains, together with our previous findings, reveal that the SARSr-CoVs circulating in this single location are highly diverse in the S gene, ORF3 and ORF8. Importantly, strains with high genetic similarity to SARS-CoV in the hypervariable N-terminal domain (NTD) and receptor-binding domain (RBD) of the S1 gene, the ORF3 and ORF8 region, respectively, were all discovered in this cave. In addition, we report the first discovery of bat SARSr-CoVs highly similar to human SARS-CoV in ORF3b and in the split ORF8a and 8b. Moreover, SARSr-CoV strains from this cave were more closely related to SARS-CoV in the non-structural protein genes ORF1a and 1b compared with those detected elsewhere. Recombination analysis shows evidence of frequent recombination events within the S gene and around the ORF8 between these SARSr-CoVs. We hypothesize that the direct progenitor of SARS-CoV may have originated after sequential recombination events between the precursors of these SARSr-CoVs. Cell entry studies demonstrated that three newly identified SARSr-CoVs with different S protein sequences are all able to use human ACE2 as the receptor, further exhibiting the close relationship between strains in this cave and SARS-CoV. This work provides new insights into the origin and evolution of SARS-CoV and highlights the necessity of preparedness for future emergence of SARS-like diseases. PMID:29190287
Facilitated diffusion in chromatin lattices: mechanistic diversity and regulatory potential.

PubMed

Kampmann, Martin

2005-08-01

The interaction between a protein and a specific DNA site is the molecular basis for vital processes in all organisms. Location of the DNA target site by the protein commonly involves facilitated diffusion. Mechanisms of facilitated diffusion vary among proteins; they include one- and two-dimensional sliding along DNA, direct transfer between uncorrelated sites, as well as combinations of these mechanisms. Facilitated diffusion has almost exclusively been studied in vitro. This review discusses facilitated diffusion in the context of the living cell and proposes a theoretical model for facilitated diffusion in chromatin lattices. Chromatin structure differentially affects proteins in different modes of diffusion. The interplay of facilitated diffusion and chromatin structure can determine the rate of protein association with the target site, the frequency of association-dissociation events at the target site, and, under particular conditions, the occupancy of the target site. Facilitated diffusion is required in vivo for efficient DNA repair and bacteriophage restriction and has potential roles in fine-tuning gene regulatory networks and kinetically compartmentalizing the eukaryotic nucleus.
39 CFR 963.14 - Discovery.

Code of Federal Regulations, 2012 CFR

2012-07-01

... 39 Postal Service 1 2012-07-01 2012-07-01 false Discovery. 963.14 Section 963.14 Postal Service... PANDERING ADVERTISEMENTS STATUTE, 39 U.S.C. 3008 § 963.14 Discovery. Discovery is to be conducted on a... such discovery as he or she deems reasonable and necessary. Discovery may include one or more of the...
39 CFR 963.14 - Discovery.

Code of Federal Regulations, 2013 CFR

2013-07-01

... 39 Postal Service 1 2013-07-01 2013-07-01 false Discovery. 963.14 Section 963.14 Postal Service... PANDERING ADVERTISEMENTS STATUTE, 39 U.S.C. 3008 § 963.14 Discovery. Discovery is to be conducted on a... such discovery as he or she deems reasonable and necessary. Discovery may include one or more of the...
39 CFR 963.14 - Discovery.

Code of Federal Regulations, 2014 CFR

2014-07-01

... 39 Postal Service 1 2014-07-01 2014-07-01 false Discovery. 963.14 Section 963.14 Postal Service... PANDERING ADVERTISEMENTS STATUTE, 39 U.S.C. 3008 § 963.14 Discovery. Discovery is to be conducted on a... such discovery as he or she deems reasonable and necessary. Discovery may include one or more of the...
39 CFR 963.14 - Discovery.

Code of Federal Regulations, 2011 CFR

2011-07-01

... 39 Postal Service 1 2011-07-01 2011-07-01 false Discovery. 963.14 Section 963.14 Postal Service... PANDERING ADVERTISEMENTS STATUTE, 39 U.S.C. 3008 § 963.14 Discovery. Discovery is to be conducted on a... such discovery as he or she deems reasonable and necessary. Discovery may include one or more of the...
39 CFR 963.14 - Discovery.

Code of Federal Regulations, 2010 CFR

2010-07-01

... 39 Postal Service 1 2010-07-01 2010-07-01 false Discovery. 963.14 Section 963.14 Postal Service... PANDERING ADVERTISEMENTS STATUTE, 39 U.S.C. 3008 § 963.14 Discovery. Discovery is to be conducted on a... such discovery as he or she deems reasonable and necessary. Discovery may include one or more of the...
High-throughput platform assay technology for the discovery of pre-microRNA-selective small molecule probes.

PubMed

Lorenz, Daniel A; Song, James M; Garner, Amanda L

2015-01-21

MicroRNAs (miRNA) play critical roles in human development and disease. As such, the targeting of miRNAs is considered attractive as a novel therapeutic strategy. A major bottleneck toward this goal, however, has been the identification of small molecule probes that are specific for select RNAs and methods that will facilitate such discovery efforts. Using pre-microRNAs as proof-of-concept, herein we report a conceptually new and innovative approach for assaying RNA-small molecule interactions. Through this platform assay technology, which we term catalytic enzyme-linked click chemistry assay or cat-ELCCA, we have designed a method that can be implemented in high throughput, is virtually free of false readouts, and is general for all nucleic acids. Through cat-ELCCA, we envision the discovery of selective small molecule ligands for disease-relevant miRNAs to promote the field of RNA-targeted drug discovery and further our understanding of the role of miRNAs in cellular biology.
Apparently low reproducibility of true differential expression discoveries in microarray studies.

PubMed

Zhang, Min; Yao, Chen; Guo, Zheng; Zou, Jinfeng; Zhang, Lin; Xiao, Hui; Wang, Dong; Yang, Da; Gong, Xue; Zhu, Jing; Li, Yanhui; Li, Xia

2008-09-15

Differentially expressed gene (DEG) lists detected from different microarray studies for the same disease are often highly inconsistent. Even in technical replicate tests using identical samples, DEG detection still shows very low reproducibility. It is often believed that current small microarray studies will largely introduce false discoveries. Based on a statistical model, we show that even in technical replicate tests using identical samples, it is highly likely that the selected DEG lists will be very inconsistent in the presence of small measurement variations. Therefore, the apparently low reproducibility of DEG detection from current technical replicate tests does not indicate low quality of microarray technology. We also demonstrate that heterogeneous biological variations existing in real cancer data will further reduce the overall reproducibility of DEG detection. Nevertheless, in small subsamples from both simulated and real data, the actual false discovery rate (FDR) for each DEG list tends to be low, suggesting that each separately determined list may comprise mostly true DEGs. Rather than simply counting the overlaps of the discovery lists from different studies for a complex disease, novel metrics are needed for evaluating the reproducibility of discoveries characterized with correlated molecular changes. Supplementary information: Supplementary data are available at Bioinformatics online.
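The distinction the abstract draws between list overlap and per-list false discovery rate can be illustrated with a toy simulation. The sketch below is not the authors' statistical model: the effect-size distribution, noise level, gene counts, and all function names are hypothetical choices made only for illustration. It draws two noisy "technical replicates" of the same underlying effects, takes the top-k genes from each, and reports the fraction shared between the two lists alongside the fraction of each list that is not truly differential.

```python
import random

def simulate_replicate(true_effects, noise_sd, rng):
    """Return one noisy measurement per gene (true effect plus measurement noise)."""
    return [effect + rng.gauss(0.0, noise_sd) for effect in true_effects]

def top_k(scores, k):
    """Indices of the k genes with the largest absolute score."""
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return set(ranked[:k])

def compare_replicates(n_genes=2000, n_true=600, k=100, noise_sd=1.0, seed=0):
    """Compare top-k DEG lists from two simulated technical replicates.

    All parameter values are illustrative assumptions, not taken from the paper.
    Returns (overlap fraction, FDR of list A, FDR of list B).
    """
    rng = random.Random(seed)
    # Assumption: the first n_true genes are truly differential, the rest are null.
    true_effects = [rng.gauss(0.0, 2.0) if i < n_true else 0.0 for i in range(n_genes)]
    list_a = top_k(simulate_replicate(true_effects, noise_sd, rng), k)
    list_b = top_k(simulate_replicate(true_effects, noise_sd, rng), k)
    overlap = len(list_a & list_b) / k
    fdr_a = sum(1 for i in list_a if i >= n_true) / k
    fdr_b = sum(1 for i in list_b if i >= n_true) / k
    return overlap, fdr_a, fdr_b

if __name__ == "__main__":
    overlap, fdr_a, fdr_b = compare_replicates()
    print(f"overlap={overlap:.2f}  FDR(A)={fdr_a:.2f}  FDR(B)={fdr_b:.2f}")
```

Varying `noise_sd`, `n_true`, and `k` shows how the two quantities move independently of each other, which is the reason the abstract argues that counting overlaps alone is a poor reproducibility metric.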

Decision tree-based method for integrating gene expression, demographic, and clinical data to determine disease endotypes

PubMed Central

2013-01-01

Background: Complex diseases are often difficult to diagnose, treat and study due to the multi-factorial nature of the underlying etiology. Large data sets are now widely available that can be used to define novel, mechanistically distinct disease subtypes (endotypes) in a completely data-driven manner. However, significant challenges exist with regard to how to segregate individuals into suitable subtypes of the disease and understand the distinct biological mechanisms of each when the goal is to maximize the discovery potential of these data sets. Results: A multi-step decision tree-based method is described for defining endotypes based on gene expression, clinical covariates, and disease indicators using childhood asthma as a case study. We attempted to use alternative approaches such as the Student’s t-test, single data domain clustering and the Modk-prototypes algorithm, which incorporates multiple data domains into a single analysis, and none performed as well as the novel multi-step decision tree method. This new method gave the best segregation of asthmatics and non-asthmatics, and it provides easy access to all genes and clinical covariates that distinguish the groups. Conclusions: The multi-step decision tree method described here will lead to better understanding of complex disease in general by allowing purely data-driven disease endotypes to facilitate the discovery of new mechanisms underlying these diseases. This application should be considered a complement to ongoing efforts to better define and diagnose known endotypes. When coupled with existing methods developed to determine the genetics of gene expression, these methods provide a mechanism for linking genetics and exposomics data and thereby accounting for both major determinants of disease. PMID:24188919
BioProspecting: novel marker discovery obtained by mining the bibleome.

PubMed

Elkin, Peter L; Tuttle, Mark S; Trusko, Brett E; Brown, Steven H

2009-02-05

BioProspecting is a novel approach that enabled our team to mine data related to genetic markers from the New England Journal of Medicine (NEJM) utilizing SNOMED CT and the Human Gene Ontology (HUGO). The Biomedical Informatics Research Collaborative was able to link genes and disorders using the Multi-threaded Clinical Vocabulary Server (MCVS) and natural language processing engine, whose output creates an ontology-network using the semantic encodings of the literature that is organized by these two terminologies. We identified relationships between (genes or proteins) and (diseases or drugs) as linked by metabolic functions and identified potentially novel functional relationships between, for example, genes and diseases (e.g. Article #1 ([Gene - IL27] => {Enzyme - Dipeptidyl Carboxypeptidase 1}) and Article #2 ({Enzyme - Dipeptidyl Carboxypeptidase 1} <= [Disorder - Type II DM]) showing a metabolic link between IL27 and Type II DM). In this manuscript we describe our method for developing the database and its content as well as its potential to assist in the discovery of novel markers and drugs.
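The linking step in the abstract's example, where one article ties a gene to an enzyme and another ties the same enzyme to a disorder, amounts to joining two sets of relations on a shared intermediate concept. The sketch below shows that join on the single example given in the abstract. It is an illustration only: the function name and the tuple representation are hypothetical, and it does not reproduce the MCVS pipeline or its SNOMED CT encodings.

```python
from collections import defaultdict

def link_through_intermediates(gene_edges, disorder_edges):
    """Pair each gene with each disorder that shares an intermediate concept.

    gene_edges:     iterable of (gene, intermediate) pairs
    disorder_edges: iterable of (disorder, intermediate) pairs
    Returns a sorted list of (gene, intermediate, disorder) triples.
    """
    disorders_by_intermediate = defaultdict(set)
    for disorder, intermediate in disorder_edges:
        disorders_by_intermediate[intermediate].add(disorder)

    links = set()
    for gene, intermediate in gene_edges:
        # .get avoids creating empty entries for intermediates with no disorder.
        for disorder in disorders_by_intermediate.get(intermediate, ()):
            links.add((gene, intermediate, disorder))
    return sorted(links)

# The one relation pair quoted in the abstract (Article #1 and Article #2).
gene_edges = [("IL27", "Dipeptidyl Carboxypeptidase 1")]
disorder_edges = [("Type II DM", "Dipeptidyl Carboxypeptidase 1")]

candidate_links = link_through_intermediates(gene_edges, disorder_edges)
for gene, enzyme, disorder in candidate_links:
    print(f"{gene} -- {enzyme} -- {disorder}")
```

Indexing the disorder relations by intermediate first keeps the join linear in the number of relations plus the number of links found, which matters once the relations come from a whole journal archive rather than two articles.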
Data protection in biomaterial banks for Parkinson's disease research: the model of GEPARD (Gene Bank Parkinson's Disease Germany).

PubMed

Eggert, Karla; Wüllner, Ullrich; Antony, Gisela; Gasser, Thomas; Janetzky, Bernd; Klein, Christine; Schöls, Ludger; Oertel, Wolfgang

2007-04-15

Parkinson's disease (PD) is the second most common neurodegenerative disease. Although 10 gene loci have been identified to cause a Parkinsonian syndrome, these loci account only for a minority of PD patients. Large, systematic research programs are required to collect, store, and analyze DNA samples and clinical information to support further discovery of additional genetic components of PD or other movement disorders. Such programs facilitate research into the relationship between genotype and phenotype. The German Competence Network on Parkinson's disease (CNP) initiated the Gene Bank Parkinson's Disease Germany (GEPARD), providing an administrative and scientific infrastructure for the storage of DNA and clinical data that are electronically accessible and protective of patient rights. In this article, we offer guidance on how to establish a framework for a clinical genetic data and DNA bank, and describe GEPARD as a model that may be useful to other local, national, and international research groups developing similar programs.
Systematic discovery of novel ciliary genes through functional genomics in the zebrafish

PubMed Central

Choksi, Semil P.; Babu, Deepak; Lau, Doreen; Yu, Xianwen; Roy, Sudipto

2014-01-01

Cilia are microtubule-based hair-like organelles that play many important roles in development and physiology, and are implicated in a rapidly expanding spectrum of human diseases, collectively termed ciliopathies. Primary ciliary dyskinesia (PCD), one of the most prevalent of ciliopathies, arises from abnormalities in the differentiation or motility of the motile cilia. Despite their biomedical importance, a methodical functional screen for ciliary genes has not been carried out in any vertebrate at the organismal level. We sought to systematically discover novel motile cilia genes by identifying the genes induced by Foxj1, a winged-helix transcription factor that has an evolutionarily conserved role as the master regulator of motile cilia biogenesis. Unexpectedly, we find that the majority of the Foxj1-induced genes have not been associated with cilia before. To characterize these novel putative ciliary genes, we subjected 50 randomly selected candidates to a systematic functional phenotypic screen in zebrafish embryos. Remarkably, we find that over 60% are required for ciliary differentiation or function, whereas 30% of the proteins encoded by these genes localize to motile cilia. We also show that these genes regulate the proper differentiation and beating of motile cilia. This collection of Foxj1-induced genes will be invaluable for furthering our understanding of ciliary biology, and in the identification of new mutations underlying ciliary disorders in humans. PMID:25139857
The web server of IBM's Bioinformatics and Pattern Discovery group: 2004 update.

PubMed

Huynh, Tien; Rigoutsos, Isidore

2004-07-01

In this report, we provide an update on the services and content which are available on the web server of IBM's Bioinformatics and Pattern Discovery group. The server, which is operational around the clock, provides access to a large number of methods that have been developed and published by the group's members. There is an increasing number of problems that these tools can help tackle; these problems range from the discovery of patterns in streams of events and the computation of multiple sequence alignments, to the discovery of genes in nucleic acid sequences, the identification--directly from sequence--of structural deviations from alpha-helicity and the annotation of amino acid sequences for antimicrobial activity. Additionally, annotations for more than 130 archaeal, bacterial, eukaryotic and viral genomes are now available on-line and can be searched interactively. The tools and code bundles continue to be accessible from http://cbcsrv.watson.ibm.com/Tspd.html whereas the genomics annotations are available at http://cbcsrv.watson.ibm.com/Annotations/.
Drug Discovery Prospect from Untapped Species: Indications from Approved Natural Product Drugs

PubMed Central

Qin, Chu; Tao, Lin; Liu, Xin; Shi, Zhe; Zhang, Cun Long; Tan, Chun Yan; Chen, Yu Zong; Jiang, Yu Yang

2012-01-01

Due to extensive bioprospecting efforts of the past and technology factors, there have been questions about drug discovery prospect from untapped species. We analyzed recent trends of approved drugs derived from previously untapped species, which show no sign of untapped drug-productive species being near extinction and suggest high probability of deriving new drugs from new species in existing drug-productive species families and clusters. Case histories of recently approved drugs reveal useful strategies for deriving new drugs from the scaffolds and pharmacophores of the natural product leads of these untapped species. New technologies such as cryptic gene-cluster exploration may generate novel natural products with highly anticipated potential impact on drug discovery. PMID:22808057
Translating Discovery in Zebrafish Pancreatic Development to Human Pancreatic Cancer: Biomarkers, Targets, Pathogenesis, and Therapeutics

PubMed Central

Kazi, Abid A.; Yee, Rosemary K.

2013-01-01

Experimental studies in the zebrafish have greatly facilitated understanding of genetic regulation of the early developmental events in the pancreas. Various approaches using forward and reverse genetics, chemical genetics, and transgenesis in zebrafish have demonstrated generally conserved regulatory roles of mammalian genes and discovered novel genetic pathways in exocrine pancreatic development. Accumulating evidence has supported the use of zebrafish as a model of human malignant diseases, including pancreatic cancer. Studies have shown that the genetic regulators of exocrine pancreatic development in zebrafish can be translated into potential clinical biomarkers and therapeutic targets in human pancreatic adenocarcinoma. Transgenic zebrafish expressing oncogenic K-ras and zebrafish tumor xenograft models have emerged as valuable tools for dissecting the pathogenetic mechanisms of pancreatic cancer and for drug discovery and toxicology. Future analysis of the pancreas in zebrafish will continue to advance understanding of the genetic regulation and biological mechanisms during organogenesis. Results of those studies are expected to provide new insights into how aberrant developmental pathways contribute to formation and growth of pancreatic neoplasia, and hopefully generate valid biomarkers and targets as well as effective and safe therapeutics in pancreatic cancer. PMID:23682805
Translating discovery in zebrafish pancreatic development to human pancreatic cancer: biomarkers, targets, pathogenesis, and therapeutics.

PubMed

Yee, Nelson S; Kazi, Abid A; Yee, Rosemary K

2013-06-01

Experimental studies in the zebrafish have greatly facilitated understanding of genetic regulation of the early developmental events in the pancreas. Various approaches using forward and reverse genetics, chemical genetics, and transgenesis in zebrafish have demonstrated generally conserved regulatory roles of mammalian genes and discovered novel genetic pathways in exocrine pancreatic development. Accumulating evidence has supported the use of zebrafish as a model of human malignant diseases, including pancreatic cancer. Studies have shown that the genetic regulators of exocrine pancreatic development in zebrafish can be translated into potential clinical biomarkers and therapeutic targets in human pancreatic adenocarcinoma. Transgenic zebrafish expressing oncogenic K-ras and zebrafish tumor xenograft models have emerged as valuable tools for dissecting the pathogenetic mechanisms of pancreatic cancer and for drug discovery and toxicology. Future analysis of the pancreas in zebrafish will continue to advance understanding of the genetic regulation and biological mechanisms during organogenesis. Results of those studies are expected to provide new insights into how aberrant developmental pathways contribute to formation and growth of pancreatic neoplasia, and hopefully generate valid biomarkers and targets as well as effective and safe therapeutics in pancreatic cancer.
Dissection of Insertion–Deletion Variants within Differentially Expressed Genes Involved in Wood Formation in Populus

PubMed Central

Gong, Chenrui; Du, Qingzhang; Xie, Jianbo; Quan, Mingyang; Chen, Beibei; Zhang, Deqiang

2018-01-01

Short insertions and deletions (InDels) are one of the major genetic variants and are distributed widely across the genome; however, few investigations of InDels have been conducted in long-lived perennial plants. Here, we employed a combination of RNA-seq and population resequencing to identify InDels within differentially expressed (DE) genes underlying wood formation in a natural population of Populus tomentosa (435 individuals) and utilized InDel-based association mapping to detect the causal variants under additive, dominance, and epistasis underlying growth and wood properties. In the present paper, 5,482 InDels detected from 629 DE genes showed uneven distributions throughout all 19 chromosomes, and 95.9% of these loci were diallelic InDels. Seventy-four InDels (positive false discovery rate q ≤ 0.10) from 68 genes exhibited significant additive/dominant effects on 10 growth and wood-properties, with an average of 14.7% phenotypic variance explained. Potential pleiotropy was observed in one-third of the InDels (representing 24 genes). Seven genes exhibited significantly differential expression among the genotypic classes of associated InDels, indicating possible important roles for these InDels. Epistasis analysis showed that overlapping interacting genes formed unique interconnected networks for each trait, supporting the putative biochemical links that control quantitative traits. Therefore, the identification and utilization of InDels in trees will be recognized as an effective marker system for molecular marker-assisted breeding applications, and further facilitate our understanding of quantitative genomics. PMID:29403506
Emerging techniques for the discovery and validation of therapeutic targets for skeletal diseases.

PubMed

Cho, Christine H; Nuttall, Mark E

2002-12-01

Advances in genomics and proteomics have revolutionised the drug discovery process and target validation. Identification of novel therapeutic targets for chronic skeletal diseases is an extremely challenging process based on the difficulty of obtaining high-quality human diseased versus normal tissue samples. The quality of tissue and genomic information obtained from the sample is critical to identifying disease-related genes. Using a genomics-based approach, novel genes or genes with similar homology to existing genes can be identified from cDNA libraries generated from normal versus diseased tissue. High-quality cDNA libraries are prepared from uncontaminated homogeneous cell populations harvested from tissue sections of interest. Localised gene expression analysis and confirmation are obtained through in situ hybridisation or immunohistochemical studies. Cells overexpressing the recombinant protein are subsequently designed for primary cell-based high-throughput assays that are capable of screening large compound banks for potential hits. Afterwards, secondary functional assays are used to test promising compounds. The same overexpressing cells are used in the secondary assay to test protein activity and functionality as well as screen for small-molecule agonists or antagonists. Once a hit is generated, a structure-activity relationship of the compound is optimised for better oral bioavailability and pharmacokinetics allowing the compound to progress into development. Parallel efforts from proteomics, as well as genetics/transgenics, bioinformatics and combinatorial chemistry, and improvements in high-throughput automation technologies, allow the drug discovery process to meet the demands of the medicinal market. This review discusses and illustrates how different approaches are incorporated into the discovery and validation of novel targets and, consequently, the development of potentially therapeutic agents in the areas of osteoporosis and osteoarthritis.
Identification of Genes in the Phenylalanine Metabolic Pathway by Ectopic Expression of a MYB Transcription Factor in Tomato Fruit

PubMed Central

Dal Cin, Valeriano; Tieman, Denise M.; Tohge, Takayuki; McQuinn, Ryan; de Vos, Ric C.H.; Osorio, Sonia; Schmelz, Eric A.; Taylor, Mark G.; Smits-Kroon, Miriam T.; Schuurink, Robert C.; Haring, Michel A.; Giovannoni, James; Fernie, Alisdair R.; Klee, Harry J.

2011-01-01

Altering expression of transcription factors can be an effective means to coordinately modulate entire metabolic pathways in plants. It can also provide useful information concerning the identities of genes that constitute metabolic networks. Here, we used ectopic expression of a MYB transcription factor, Petunia hybrida ODORANT1, to alter Phe and phenylpropanoid metabolism in tomato (Solanum lycopersicum) fruits. Despite the importance of Phe and phenylpropanoids to plant and human health, the pathway for Phe synthesis has not been unambiguously determined. Microarray analysis of ripening fruits from transgenic and control plants permitted identification of a suite of coregulated genes involved in synthesis and further metabolism of Phe. The pattern of coregulated gene expression facilitated discovery of the tomato gene encoding prephenate aminotransferase, which converts prephenate to arogenate. The expression and biochemical data establish an arogenate pathway for Phe synthesis in tomato fruits. Metabolic profiling and 13C flux analysis of ripe fruits further revealed large increases in the levels of a specific subset of phenylpropanoid compounds. However, while increased levels of these human nutrition-related phenylpropanoids may be desirable, there were no increases in levels of Phe-derived flavor volatiles. PMID:21750236
Concept Formation in Scientific Knowledge Discovery from a Constructivist View

NASA Astrophysics Data System (ADS)

Peng, Wei; Gero, John S.

The central goal of scientific knowledge discovery is to learn cause-effect relationships among natural phenomena presented as variables and the consequences of their interactions. Scientific knowledge is normally expressed as scientific taxonomies and qualitative and quantitative laws [1]. This type of knowledge represents intrinsic regularities of the observed phenomena that can be used to explain and predict behaviors of the phenomena. It is a generalization that is abstracted and externalized from a set of contexts and applicable to a broader scope. Scientific knowledge is a type of third-person knowledge, i.e., knowledge that is independent of a specific enquirer. Artificial intelligence approaches, particularly data mining algorithms that are used to identify meaningful patterns from large data sets, are approaches that aim to facilitate the knowledge discovery process [2]. A broad spectrum of algorithms has been developed in addressing classification, associative learning, and clustering problems. However, their linkages to people who use them have not been adequately explored. Issues in relation to supporting the interpretation of the patterns, the application of prior knowledge to the data mining process and addressing user interactions remain challenges for building knowledge discovery tools [3]. As a consequence, scientists rely on their experience to formulate problems, evaluate hypotheses, reason about untraceable factors and derive new problems. This type of knowledge which they have developed during their career is called “first-person” knowledge. The formation of scientific knowledge (third-person knowledge) is highly influenced by the enquirer’s first-person knowledge construct, which is a result of his or her interactions with the environment. There have been attempts to craft automatic knowledge discovery tools but these systems are limited in their capabilities to handle the dynamics of personal experience. There are now trends in developing
A role for physicians in ethnopharmacology and drug discovery.

PubMed

Raza, Mohsin

2006-04-06

Ethnopharmacology investigations classically involved traditional healers, botanists, anthropologists, chemists and pharmacologists. The role of some groups of researchers, but not of physicians, has been highlighted and well defined in ethnopharmacological investigations. Historical data show that the discovery of several important modern drugs of herbal origin is owed to the medical knowledge and clinical expertise of physicians. Current trends indicate a negligible role of physicians in ethnopharmacological studies. The rising cost of modern drug development is attributed to the lack of a classical ethnopharmacological approach. Physicians can play multiple roles in ethnopharmacological studies to facilitate drug discovery as well as to rescue authentic traditional knowledge of the use of medicinal plants. These include: (1) Ethnopharmacological field work, which involves interviewing healers, interpreting traditional terminologies into their modern counterparts, examining patients consuming herbal remedies and identifying the disease for which an herbal remedy is used. (2) Interpretation of signs and symptoms mentioned in ancient texts and suggesting proper use of old traditional remedies in the light of modern medicine. (3) Clinical studies on herbs and their interaction with modern medicines. (4) Advising pharmacologists to carry out laboratory studies on herbs observed during field studies. (5) Work in collaboration with local healers to strengthen the traditional system of medicine in a community. In conclusion, physicians' involvement in ethnopharmacological studies will lead to more reliable information on traditional use of medicinal plants both from field and ancient texts, more focused and cheaper natural product based drug discovery, as well as bridge the gap between traditional and modern medicine.
Pharmacogenetics in type 2 diabetes: precision medicine or discovery tool?

PubMed

Florez, Jose C

2017-05-01

In recent years, technological and analytical advances have led to an explosion in the discovery of genetic loci associated with type 2 diabetes. However, their ability to improve prediction of disease outcomes beyond standard clinical risk factors has been limited. On the other hand, genetic effects on drug response may be stronger than those commonly seen for disease incidence. Pharmacogenetic findings may aid in identifying new drug targets, elucidate pathophysiology, unravel disease heterogeneity, help prioritise specific genes in regions of genetic association, and contribute to personalised or precision treatment. In diabetes, precedent for the successful application of pharmacogenetic concepts exists in its monogenic subtypes, such as MODY or neonatal diabetes. Whether similar insights will emerge for the much more common entity of type 2 diabetes remains to be seen. As genetic approaches advance, the progressive deployment of candidate gene, large-scale genotyping and genome-wide association studies has begun to produce suggestive results that may transform clinical practice. However, many barriers to the translation of diabetes pharmacogenetic discoveries to the clinic still remain. This perspective offers a contemporary overview of the field with a focus on sulfonylureas and metformin, identifies the major uses of pharmacogenetics, and highlights potential limitations and future directions.
Strategies for Discovery of Small Molecule Radiation Protectors and Radiation Mitigators

PubMed Central

Greenberger, Joel S.; Clump, David; Kagan, Valerian; Bayir, Hülya; Lazo, John S.; Wipf, Peter; Li, Song; Gao, Xiang; Epperly, Michael W.

2011-01-01

Mitochondrial targeted radiation damage protectors (delivered prior to irradiation) and mitigators (delivered after irradiation, but before the appearance of symptoms associated with radiation syndrome) have been a recent focus in drug discovery for (1) normal tissue radiation protection during fractionated radiotherapy, and (2) radiation terrorism counter measures. Several categories of such molecules have been discovered: nitroxide-linked hybrid molecules, including GS-nitroxide, GS-nitric oxide synthase inhibitors, p53/mdm2/mdm4 inhibitors, and pharmaceutical agents including inhibitors of the phosphoinositide-3-kinase pathway and the anti-seizure medicine, carbamazepine. Evaluation of potential new radiation dose modifying molecules to protect normal tissue includes: clonogenic radiation survival curves, assays for apoptosis and DNA repair, and irradiation-induced depletion of antioxidant stores. Studies of organ specific radioprotection and in total body irradiation-induced hematopoietic syndrome in the mouse model for protection/mitigation facilitate rational means by which to move candidate small molecule drugs along the drug discovery pipeline into clinical development. PMID:22655254
In silico method for modelling metabolism and gene product expression at genome scale

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lerman, Joshua A.; Hyduke, Daniel R.; Latif, Haythem

2012-07-03

Transcription and translation use raw materials and energy generated metabolically to create the macromolecular machinery responsible for all cellular functions, including metabolism. A biochemically accurate model of molecular biology and metabolism will facilitate comprehensive and quantitative computations of an organism's molecular constitution as a function of genetic and environmental parameters. Here we formulate a model of metabolism and macromolecular expression. Prototyping it using the simple microorganism Thermotoga maritima, we show our model accurately simulates variations in cellular composition and gene expression. Moreover, through in silico comparative transcriptomics, the model allows the discovery of new regulons and improving the genome and transcription unit annotations. Our method presents a framework for investigating molecular biology and cellular physiology in silico and may allow quantitative interpretation of multi-omics data sets in the context of an integrated biochemical description of an organism.
Sphingosine 1-Phosphate Receptor Modulators and Drug Discovery

PubMed Central

Park, Soo-Jin; Im, Dong-Soon

2017-01-01

The initial discovery of sphingosine 1-phosphate (S1P) as an intracellular second messenger was unexpectedly followed by the finding that S1P also acts as a first messenger, which subsequently resulted in the cloning of its G protein-coupled receptors, S1P1–5. The molecular identification of S1P receptors opened up a new avenue for pathophysiological research on this lipid mediator. Cellular and molecular in vitro studies and in vivo studies on gene-deficient mice have elucidated cellular signaling pathways and the pathophysiological meanings of S1P receptors. Another unexpected finding, that fingolimod (FTY720) modulates S1P receptors, accelerated drug discovery in this field. Fingolimod was approved as a first-in-class, orally active drug for relapsing multiple sclerosis in 2010, and its applications in other disease conditions are currently under clinical trials. In addition, more selective S1P receptor modulators with better pharmacokinetic profiles and fewer side effects are under development. Some of them are being clinically tested in the contexts of multiple sclerosis and other autoimmune and inflammatory disorders, such as psoriasis, Crohn’s disease, ulcerative colitis, polymyositis, dermatomyositis, liver failure, renal failure, acute stroke, and transplant rejection. In this review, the authors discuss the state of the art regarding the status of drug discovery efforts targeting S1P receptors and place emphasis on potential clinical applications. PMID:28035084
Routine Discovery of Complex Genetic Models using Genetic Algorithms

PubMed Central

Moore, Jason H.; Hahn, Lance W.; Ritchie, Marylyn D.; Thornton, Tricia A.; White, Bill C.

2010-01-01

Simulation studies are useful in various disciplines for a number of reasons including the development and evaluation of new computational and statistical methods. This is particularly true in human genetics and genetic epidemiology where new analytical methods are needed for the detection and characterization of disease susceptibility genes whose effects are complex, nonlinear, and partially or solely dependent on the effects of other genes (i.e. epistasis or gene-gene interaction). Despite this need, the development of complex genetic models that can be used to simulate data is not always intuitive. In fact, only a few such models have been published. We have previously developed a genetic algorithm approach to discovering complex genetic models in which two single nucleotide polymorphisms (SNPs) influence disease risk solely through nonlinear interactions. In this paper, we extend this approach for the discovery of high-order epistasis models involving three to five SNPs. We demonstrate that the genetic algorithm is capable of routinely discovering interesting high-order epistasis models in which each SNP influences risk of disease only through interactions with the other SNPs in the model. This study opens the door for routine simulation of complex gene-gene interactions among SNPs for the development and evaluation of new statistical and computational approaches for identifying common, complex multifactorial disease susceptibility genes. PMID:20948983
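The defining property described above, that each SNP influences risk only through interaction, can be checked numerically: every single-locus marginal penetrance must be the same. The sketch below is an assumption-laden illustration, not the authors' genetic algorithm. The 3x3 penetrance table is a hypothetical two-SNP example (not taken from the paper), both loci are assumed to be in Hardy-Weinberg equilibrium with allele frequency 0.5, and the function names are invented here.

```python
# Genotype frequencies (AA, Aa, aa) under Hardy-Weinberg with p = 0.5 (assumed).
GENOTYPE_FREQS = [0.25, 0.5, 0.25]

# Hypothetical penetrance table: rows are SNP 1 genotypes, columns are SNP 2 genotypes.
EPISTATIC_TABLE = [
    [0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0],
]


def marginal_penetrances(table, freqs):
    """Average disease risk per genotype of each SNP, weighting by the other SNP's frequencies."""
    snp1 = [sum(f * p for f, p in zip(freqs, row)) for row in table]
    snp2 = [
        sum(f * table[i][j] for i, f in enumerate(freqs))
        for j in range(len(freqs))
    ]
    return snp1, snp2


def is_purely_epistatic(table, freqs, tol=1e-9):
    """True when no single SNP shows a marginal effect (all marginals equal within tol)."""
    snp1, snp2 = marginal_penetrances(table, freqs)
    values = snp1 + snp2
    return max(values) - min(values) <= tol


pure = is_purely_epistatic(EPISTATIC_TABLE, GENOTYPE_FREQS)
```

A search procedure such as a genetic algorithm could use a check like this as part of its fitness function, scoring candidate tables by how close their marginals are to equal. Extending the idea to three to five SNPs would mean marginalising a higher-dimensional table in the same way.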
37 CFR 11.52 - Discovery.

Code of Federal Regulations, 2013 CFR

2013-07-01

... 37 Patents, Trademarks, and Copyrights 1 2013-07-01 2013-07-01 false Discovery. 11.52 Section 11... Disciplinary Proceedings; Jurisdiction, Sanctions, Investigations, and Proceedings § 11.52 Discovery. Discovery... establishes that discovery is reasonable and relevant, the hearing officer, under such conditions as he or she...
