Integration of pines into the large scope of plant biology research depends on study of pines in parallel with study of annual plants, and on availability of research materials from pine to plant biologists interested in comparing pine with annual plant systems. The objectives of the Pine Gene Discovery Project were to obtain 10,000 partial DNA sequences of genes expressed in loblolly pine, to determine which of those pine genes were similar to known genes from other organisms, and to make the DNA sequences and isolated pine genes available to plant researchers to stimulate integration of pines into the wider scope of plant biology research. Those objectives have been completed, and the results are available to the public. Requests for pine genes have been received from a number of laboratories that would otherwise not have included pine in their research, indicating that progress is being made toward the goal of integrating pine research into the larger molecular biology research community.
Whetten, R. W.; Sederoff, R. R.; Kinlaw, C.; Retzel, E.
Inherited monogenic disease has an enormous impact on the well-being of children and their families. Over half of the children living with one of these conditions are without a molecular diagnosis because of the rarity of the disease, the marked clinical heterogeneity, and the reality that there are thousands of rare diseases for which causative mutations have yet to be identified. It is in this context that in 2010 a Canadian consortium was formed to rapidly identify mutations causing a wide spectrum of pediatric-onset rare diseases by using whole-exome sequencing. The FORGE (Finding of Rare Disease Genes) Canada Consortium brought together clinicians and scientists from 21 genetics centers and three science and technology innovation centers from across Canada. From nation-wide requests for proposals, 264 disorders were selected for study from the 371 submitted; disease-causing variants (including in 67 genes not previously associated with human disease; 41 of these have been genetically or functionally validated, and 26 are currently under study) were identified for 146 disorders over a 2-year period. Here, we present our experience with four strategies employed for gene discovery and discuss FORGE's impact in a number of realms, from clinical diagnostics to the broadening of the phenotypic spectrum of many diseases to the biological insight gained into both disease states and normal human development. Lastly, on the basis of this experience, we discuss the way forward for rare-disease genetic discovery both in Canada and internationally. PMID:24906018
Beaulieu, Chandree L; Majewski, Jacek; Schwartzentruber, Jeremy; Samuels, Mark E; Fernandez, Bridget A; Bernier, Francois P; Brudno, Michael; Knoppers, Bartha; Marcadier, Janet; Dyment, David; Adam, Shelin; Bulman, Dennis E; Jones, Steve J M; Avard, Denise; Nguyen, Minh Thu; Rousseau, Francois; Marshall, Christian; Wintle, Richard F; Shen, Yaoqing; Scherer, Stephen W; Friedman, Jan M; Michaud, Jacques L; Boycott, Kym M
This Project Information Package (PIP) has been designed to provide teachers with the information needed to teach in the Discovery through Reading Project, originally developed by the Clarkston School District in Michigan. The Project uses a modified tutorial approach for second- and third-grade students who encounter reading difficulties in their…
CEMREL, Inc., Minneapolis, MN.
Despite recent discoveries in the genetics of sporadic Alzheimer's disease, there remains substantial "hidden heritability." It is thought that some of this missing heritability may be because of gene-gene, i.e., epistatic, interactions. We examined potential epistasis between 110 candidate polymorphisms in 1757 cases of Alzheimer's disease and 6294 control subjects of the Epistasis Project, divided between a discovery and a replication dataset. We found an epistatic interaction, between rs7483 in GSTM3 and rs1111875 in the HHEX/IDE/KIF11 gene cluster, with a closely similar, significant result in both datasets. The synergy factor (SF) in the combined dataset was 1.79, 95% confidence interval [CI], 1.35-2.36; p = 0.00004. Consistent interaction was also found in 7 out of the 8 additional subsets that we examined post hoc: i.e., it was shown in both North Europe and North Spain, in both men and women, in both those with and without the ε4 allele of apolipoprotein E, and in people older than 75 years (SF, 2.27; 95% CI, 1.60-3.20; p < 0.00001), but not in those younger than 75 years (SF, 1.06; 95% CI, 0.59-1.91; p = 0.84). The association with Alzheimer's disease was purely epistatic with neither polymorphism showing an independent effect: odds ratio, 1.0; p ≥ 0.7. Indeed, each factor was associated with protection in the absence of the other factor, but with risk in its presence. In conclusion, this epistatic interaction showed a high degree of consistency when stratifying by sex, the ε4 allele of apolipoprotein E genotype, and geographic region. PMID:23036584
Bullock, James M; Medway, Christopher; Cortina-Borja, Mario; Turton, James C; Prince, Jonathan A; Ibrahim-Verbaas, Carla A; Schuur, Maaike; Breteler, Monique M; van Duijn, Cornelia M; Kehoe, Patrick G; Barber, Rachel; Coto, Eliecer; Alvarez, Victoria; Deloukas, Panos; Hammond, Naomi; Combarros, Onofre; Mateo, Ignacio; Warden, Donald R; Lehmann, Michael G; Belbin, Olivia; Brown, Kristelle; Wilcock, Gordon K; Heun, Reinhard; Kölsch, Heike; Smith, A David; Lehmann, Donald J; Morgan, Kevin
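The synergy factor figures quoted in the abstract above can be checked for internal consistency. The following is a minimal sketch, not the authors' code: it assumes the usual multiplicative definition of the synergy factor (odds ratio for carrying both factors divided by the product of the two single-factor odds ratios) and assumes that the reported 95% confidence intervals are symmetric on the log scale. The function names and the odds ratios passed to `synergy_factor` are hypothetical; only the SF and CI values are taken from the abstract.

```python
import math


def synergy_factor(or_both, or_first, or_second):
    """Multiplicative interaction: OR for both factors over the product of
    the single-factor ORs (assumed definition, see the lead-in)."""
    return or_both / (or_first * or_second)


def se_z_p_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Recover SE(log estimate), the z statistic and a two-sided normal
    p value from a ratio estimate and its 95% confidence interval."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(estimate) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return se, z, p


# SF and 95% CI values as reported in the abstract.
reported = {
    "combined dataset": (1.79, 1.35, 2.36),
    "younger than 75 years": (1.06, 0.59, 1.91),
}
for label, (sf, lower, upper) in reported.items():
    se, z, p = se_z_p_from_ci(sf, lower, upper)
    print(f"{label}: SE(log SF)={se:.3f}, z={z:.2f}, two-sided p={p:.2g}")

# Hypothetical odds ratios, for illustration only: each factor alone is
# protective (OR < 1), yet the combination carries risk, so SF > 1.
print(synergy_factor(or_both=1.2, or_first=0.8, or_second=0.8))
```

Under these assumptions the recovered p values land close to the ones reported (about 0.00004 for the combined dataset and about 0.84 for the younger subset), which suggests the published figures are mutually consistent.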
The elucidation of the human and mouse genome sequence and developments in high-throughput genome analysis, and in computational tools, have made it possible to profile entire cancer genomes. In parallel with these advances mouse models of cancer have evolved into a powerful tool for cancer gene discovery. Here we discuss the approaches that may be used for cancer gene identification in both human and mouse and discuss how a cross-species ‘oncogenomics’ approach to cancer gene discovery represents a powerful strategy for finding genes that drive tumourigenesis.
Mattison, Jenny; van der Weyden, Louise; Hubbard, Tim; Adams, David J.
Background Triatoma infestans is the most relevant vector of Chagas disease in the southern cone of South America. Since its genome has not yet been studied, sequencing of Expressed Sequence Tags (ESTs) is one of the most powerful tools for efficiently identifying large numbers of expressed genes in this insect vector. Results In this work, we generated 826 ESTs, resulting in an increase of 47% in the number of ESTs available for T. infestans. These ESTs were assembled in 471 unique sequences, 151 of which represent 136 new genes for the Reduviidae family. Conclusions Among the putative new genes for the Reduviidae family, we identified and described an interesting subset of genes involved in development and reproduction, which constitute potential targets for insecticide development.
The rise of comparative genomics and related technologies has added important new dimensions to the study of human evolution. Our knowledge of the genes that underwent expression changes or were targets of positive selection in human evolution is rapidly increasing, as is our knowledge of gene duplications, translocations, and deletions. It is now clear that the genetic differences between humans and chimpanzees are far more extensive than previously thought; their genomes are not 98% or 99% identical. Despite the rapid growth in our understanding of the evolution of the human genome, our understanding of the relationship between genetic changes and phenotypic changes is tenuous. This is true even for the most intensively studied gene, FOXP2, which underwent positive selection in the human terminal lineage and is thought to have played an important role in the evolution of human speech and language. In part, the difficulty of connecting genes to phenotypes reflects our generally poor knowledge of human phenotypic specializations, as well as the difficulty of interpreting the consequences of genetic changes in species that are not amenable to invasive research. On the positive side, investigations of FOXP2, along with genomewide surveys of gene-expression changes and selection-driven sequence changes, offer the opportunity for “phenotype discovery,” providing clues to human phenotypic specializations that were previously unsuspected. What is more, at least some of the specializations that have been proposed are amenable to testing with noninvasive experimental techniques appropriate for the study of humans and apes.
Preuss, Todd M.
Analysis of large gene expression data sets in the presence and absence of a phenotype can lead to the selection of a group of genes serving as biomarkers jointly predicting the phenotype. Among gene selection methods, filter methods derived from ranked individual genes have been widely used in existing products for diagnosis and prognosis. Univariate filter approaches selecting genes individually, although computationally efficient, often ignore gene interactions inherent in the biological data. On the other hand, multivariate approaches selecting gene subsets are known to have a higher risk of selecting spurious gene subsets due to the overfitting of the vast number of gene subsets evaluated. Here we propose a framework of statistical significance tests for multivariate feature selection that can reduce the risk of selecting spurious gene subsets. Using three existing data sets, we show that our proposed approach is an essential step to identify such a gene set that is generated by a significant interaction of its members, even improving classification performance when compared to established approaches. This technique can be applied for the discovery of robust biomarkers for medical diagnosis. PMID:21457009
Kim, Hoon; Watkinson, John; Anastassiou, Dimitris
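The abstract above does not spell out its significance tests, so the following is only a generic sketch of the underlying idea, not the authors' framework. It swaps in a standard max-statistic permutation test: because the best gene pair is chosen after searching all pairs, the null distribution is built from the best score found under each label shuffle, which accounts for the number of subsets evaluated. The interaction score (class difference in the mean of the product of two genes), all function names, and the toy data are assumptions made for illustration.

```python
import itertools
import random
import statistics


def pair_score(x, y, labels):
    """Absolute class difference in the mean of the product x*y."""
    prod = [a * b for a, b in zip(x, y)]
    pos = [v for v, lab in zip(prod, labels) if lab == 1]
    neg = [v for v, lab in zip(prod, labels) if lab == 0]
    return abs(statistics.fmean(pos) - statistics.fmean(neg))


def best_pair(genes, labels):
    """Return (score, (i, j)) for the highest-scoring gene pair."""
    return max(
        (pair_score(genes[i], genes[j], labels), (i, j))
        for i, j in itertools.combinations(range(len(genes)), 2)
    )


def max_stat_permutation_p(genes, labels, n_perm=200, seed=0):
    """Permutation p value for the best pair, using the maximum score over
    all pairs under each label shuffle as the null statistic."""
    rng = random.Random(seed)
    observed, pair = best_pair(genes, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_score, _ = best_pair(genes, shuffled)
        if null_score >= observed:
            exceed += 1
    return pair, observed, (exceed + 1) / (n_perm + 1)


# Toy data (hypothetical): genes 0 and 1 are uninformative on their own,
# but their product separates the two classes; genes 2-4 are low-amplitude
# deterministic noise.
labels = [1] * 10 + [0] * 10
gene0 = [2.0 if i % 2 == 0 else -2.0 for i in range(20)]
gene1 = [g if lab == 1 else -g for g, lab in zip(gene0, labels)]
noise = [[((i * (k + 3)) % 7 - 3) / 3 for i in range(20)] for k in range(3)]
genes = [gene0, gene1] + noise

pair, score, p_value = max_stat_permutation_p(genes, labels)
print(pair, score, p_value)
```

In this toy set neither gene 0 nor gene 1 differs in mean between the classes, so a univariate filter would miss them, while the pair score picks them out and the permutation step indicates whether that score could plausibly arise from the search alone.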
The GeneEd website was created by the National Library of Medicine (NLM), the National Human Genome Research Institute (NHGRI), and the National Institutes of Health (NIH) as a helpful resource for the teaching and learning of genetics. On the site, visitors can find labs and experiments, fact sheets, and teacher resources on topics including DNA forensics, genetic conditions, evolution, and biostatistics. First-time visitors will want to start their journey by looking over the Topics tab at the top of the page. There are 40 different thematic areas here consisting of articles, video clips, webcasts, and links to additional quality resources vetted by the GeneEd web team. The Labs & Experiments section includes virtual labs that explore the genetics of different organisms as well as links to resources provided by the Howard Hughes Medical Institute and Cold Spring Harbor Laboratory. Young people may also wish to take a look at the Careers in Genetics section as it features interviews with scientists that will inspire and delight.
Insertional mutagenesis has been utilized as a functional forward genetics screen for the identification of novel genes involved in the pathogenesis of human cancers. Different insertional mutagens have been successfully used to reveal new cancer genes. For example, retroviruses (RVs) are integrating viruses with the capacity to induce the deregulation of genes in the neighborhood of the insertion site. RVs have been employed for more than 30 years to identify cancer genes in the hematopoietic system and mammary gland. Similarly, another tool that has revolutionized cancer gene discovery is the cut-and-paste transposons. These DNA elements have been engineered to contain strong promoters and stop cassettes that may function to perturb gene expression upon integration proximal to genes. In addition, complex mouse models characterized by tissue-restricted activity of transposons have been developed to identify oncogenes and tumor suppressor genes that control the development of a wide range of solid tumor types, extending beyond those tissues accessible using RV-based approaches. Most recently, lentiviral vectors (LVs) have appeared on the scene for use in cancer gene screens. LVs are replication defective integrating vectors that have the advantage of being able to infect non-dividing cells, in a wide range of cell types and tissues. In this review, we describe the various insertional mutagens focusing on their advantages/limitations and we discuss the new and promising tools that will improve the insertional mutagenesis screens of the future.
Ranzani, Marco; Annunziato, Stefano; Adams, David J.; Montini, Eugenio
Discovery of prognostic and diagnostic biomarker gene signatures for diseases such as cancer is seen as a major step towards better personalized medicine. During the last decade various methods, mainly from machine learning and statistics, have been proposed for that purpose. However, one important obstacle to making gene signatures a standard tool in clinical diagnosis is their typically low reproducibility, combined with the difficulty of achieving a clear biological interpretation. For this reason, recent years have seen growing interest in approaches that integrate information from molecular interaction networks. Here we review the current state of research in this field by giving an overview of the approaches proposed so far.
Cun, Yupeng; Frohlich, Holger
Retroviral short hairpin RNA (shRNA)–mediated genetic screens in mammalian cells are powerful tools for discovering loss-of-function phenotypes. We describe a highly parallel multiplex methodology for screening large pools of shRNAs using half-hairpin barcodes for microarray deconvolution. We carried out dropout screens for shRNAs that affect cell proliferation and viability in cancer cells and normal cells. We identified many shRNAs to be antiproliferative that target core cellular processes, such as the cell cycle and protein translation, in all cells examined. Moreover, we identified genes that are selectively required for proliferation and survival in different cell lines. Our platform enables rapid and cost-effective genome-wide screens to identify cancer proliferation and survival genes for target discovery. Such efforts are complementary to the Cancer Genome Atlas and provide an alternative functional view of cancer cells.
Schlabach, Michael R.; Luo, Ji; Solimini, Nicole L.; Hu, Guang; Xu, Qikai; Li, Mamie Z.; Zhao, Zhenming; Smogorzewska, Agata; Sowa, Mathew E.; Ang, Xiaolu L.; Westbrook, Thomas F.; Liang, Anthony C.; Chang, Kenneth; Hackett, Jennifer A.; Harper, J. Wade; Hannon, Gregory J.; Elledge, Stephen J.
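As a rough illustration of how a dropout screen like the one described above is read out, the sketch below computes a normalized log2 fold change per shRNA barcode between the start and end of a screen and flags strongly depleted barcodes. It is a simplified assumption-laden example: the abstract describes microarray deconvolution of half-hairpin barcodes, whereas this sketch treats the per-barcode signal as simple abundance values. The barcode names, the values, the pseudocount, and the threshold are all hypothetical.

```python
import math


def log2_fold_changes(start, end, pseudocount=1.0):
    """Per-barcode log2(end fraction / start fraction), with each value
    normalized by the total signal of its own time point."""
    total_start = sum(start.values())
    total_end = sum(end.values())
    lfc = {}
    for barcode, start_value in start.items():
        end_value = end.get(barcode, 0.0)
        start_frac = (start_value + pseudocount) / total_start
        end_frac = (end_value + pseudocount) / total_end
        lfc[barcode] = math.log2(end_frac / start_frac)
    return lfc


def dropouts(lfc, threshold=-1.0):
    """Barcodes depleted at least 2-fold (log2 fold change <= threshold)."""
    return sorted(b for b, value in lfc.items() if value <= threshold)


# Hypothetical barcode abundances at the start and end of a screen.
start = {"shA": 1000, "shB": 1000, "shC": 1000, "shD": 1000}
end = {"shA": 1100, "shB": 100, "shC": 950, "shD": 1050}

lfc = log2_fold_changes(start, end)
for barcode, value in sorted(lfc.items()):
    print(f"{barcode}: {value:+.2f}")
print("dropouts:", dropouts(lfc))
```

Normalizing by the total signal at each time point keeps a general change in yield from being mistaken for depletion of individual shRNAs; a real analysis would also need replicates and a statistical model of the measurement noise.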
The ultimate goal of this project is to identify the human gene(s) responsible for the disorder known as IBD. The work was planned in two phases. The desired products resulting from Phase 1 were BAC clone(s) containing the genetic marker(s) identified by gene/Networks, Inc. as potentially linked to IBD, plasmid subclones of those BAC(s), and new genetic markers developed from these plasmid subclones. The newly developed markers would be genotyped by gene/Networks, Inc. to ascertain evidence for linkage or non-linkage of IBD to this region. If non-linkage was indicated, the project would move to investigation of other candidate chromosomal regions. Where linkage was indicated, the project would move to Phase 2, in which a physical map of the candidate region(s) would be developed. The products of this phase would be contig(s) of BAC clones in the region exhibiting linkage to IBD, as well as plasmid subclones of the BACs and further genetic marker development. There would also be continued genotyping with new polymorphic markers during this phase. It was anticipated that clones identified and developed during these two phases would provide the physical resources for eventual disease gene discovery.
Launched in October 2009, the CARTaGENE project is the largest prospective longitudinal cohort in Québec. CARTaGENE's distinguishing feature is that it is a quantitative prospective cohort that has deeply phenotyped 20,000 individuals aged 40-69, the age range in which most individuals develop chronic disease. 37,000 individuals will be enrolled by 2014. It is an open-access infrastructure enabling researchers to investigate the genetic, environmental, and lifestyle determinants of disease in the French Canadian population. CARTaGENE has collected whole blood for DNA from 30,000 individuals.
The Merck Gene Index project (MGIP) fills an important niche in the Human Genome Project by directly identifying genes through sequences of their transcripts and placing in the public domain a set of EST sequences and associated clones for the uniquely expressed human genes. The MGIP promotes the unrestricted exchange of human genomic data, and facilitates progress in biomedical research
Alan R. Williamson
Background: Hereditary hemochromatosis is an inherited disorder of iron metabolism that is characterized by excessive iron deposition in major organs of the body. Chronic increased iron absorption leads to multiorgan dysfunction. Since the discovery of the gene responsible for the majority of cases, research has progressed rapidly to identify the gene product, the effects of mutations, and the implications
Elaine Lyon; Elizabeth L. Frank
Mutations or aberrations of the von Hippel-Lindau gene are responsible for the hereditary neoplastic syndrome that bears the same name, as well as for the majority of sporadic clear cell renal cell carcinomas. The discovery of this gene and subsequent clarification of its mechanism of action have led to a series of targeted treatments for advanced kidney cancer and have dramatically changed how we manage this disease. The discovery of the VHL gene is a prime example of how discoveries at the bench can inform and revolutionize therapeutics at the bedside. In this review, the authors trace this illuminating tale, from the cloning of the VHL gene, to elucidating its biologic function, to the development of novel therapeutics that have dramatically changed the paradigm of managing advanced renal cell carcinoma. PMID:18800388
Clark, Peter E; Cookson, Michael S
This is an update of a 1991 review on tumor suppressor genes written at a time when understanding of how the genes work was limited. A recent major breakthrough in the understanding of the function of tumor suppressor genes is discussed. (LZ)
Oppenheimer, Steven B.
Background The use of gene signatures can potentially be of considerable value in the field of clinical diagnosis. However, gene signatures defined with different methods can vary considerably even when applied to the same disease and the same endpoint. Previous studies have shown that the correct selection of subsets of genes from microarray data is key for the accurate classification of disease phenotypes, and a number of methods have been proposed for the purpose. However, these methods refine the subsets by considering only each single feature, and they do not confirm the association between the genes identified in each gene signature and the phenotype of the disease. We propose a new method termed Minimize Feature's Size (MFS), based on multiple-level similarity analyses and on the association between genes and disease for breast cancer endpoints. By comparing classifier models generated in the second phase of MicroArray Quality Control (MAQC-II), we aim to develop effective meta-analysis strategies that transform the MAQC-II signatures into a robust and reliable set of biomarkers for clinical applications. Results We analyzed the similarity of the multiple gene signatures within an endpoint and between the two endpoints of breast cancer at the probe and gene levels. The results indicate that disease-related genes can be preferentially selected as the components of a gene signature, and that the gene signatures for the two endpoints could be interchangeable. The minimized signatures were built at the probe level by using MFS for each endpoint. By applying this approach, we generated a much smaller gene signature with predictive power similar to that of the gene signatures from MAQC-II. Conclusions Our results indicate that gene signatures of both large and small sizes could perform equally well in clinical applications. In addition, consistency and biological significance can be detected among different gene signatures, reflecting the endpoints studied.
New classifiers built with MFS exhibit improved performance with both internal and external validation, suggesting that the MFS method generally reduces redundancies among features within gene signatures and improves the performance of the model. Consequently, our strategy will be beneficial for microarray-based clinical applications.
Cytogenetics is the study of chromosomes and how changes in chromosome structure and number affect the individual. In this video, Professor Porteous describes the process of hunting for the DISC1 gene, a gene disrupted by a balanced translocation on chromosome 1q42.
Recently, it has become possible to mobilize the Tc1/mariner transposon, Sleeping Beauty (SB), in mouse somatic cells at frequencies high enough to induce cancer. Tumours result from SB insertional mutagenesis of cancer genes, thus facilitating the identification of the genes and signalling pathways that drive tumour formation. A conditional SB transposition system has also been developed that makes it possible
Neal G. Copeland; Nancy A. Jenkins
Presents a laboratory in which students are provided with cultures of three bacterial strains. Using the results, students will determine which of the strains corresponds to a mutant lacking a particular functional gene. (DDR)
Genetic testing is important for diagnosis and prediction of many diseases. The development of a clinical genetic test can be rapid for common disorders, but for rare genetic disorders this process can take years, if it occurs at all. We review the path from gene discovery to development of a clinical genetic test, using frontotemporal dementia with parkinsonism
Vivianna M. Van Deerlin; Lisa H. Gill; Jennifer M. Farmer; John Q. Trojanowski
Cancer Genes: Discovery and Function. Speaker: Michael Dean, Ph.D., Chief, Human Genetics Section, Laboratory of Experimental Immunology, CCR, NCI. April 29, 2014, 1:00 PM - 2:00 PM, Shady Grove, Room TE408/410.
ABSTRACT: BACKGROUND: To date, few peptides in the complex mixture of platypus venom have been identified and sequenced, in part due to the limited amounts of platypus venom available to study. We have constructed and sequenced a cDNA library from an active platypus venom gland to identify the remaining components. RESULTS: We identified 83 novel putative platypus venom genes from
Camilla M Whittington; Anthony T Papenfuss; Devin P Locke; Elaine R Mardis; Richard K Wilson; Sahar Abubucker; Makedonka Mitreva; Emily SW Wong; Arthur L Hsu; Philip W Kuchel; Katherine Belov; Wesley C Warren
This review makes the case that gene discovery is a worthwhile approach to the study of ingestive behavior in general and to calcium appetite in particular. A description of the methods used to discover genes is provided for non-geneticists. Areas covered include the characterization of an appropriate phenotype, the choice of suitable mouse strains, the generation of a hybrid cross, interval mapping, congenic strain production, and candidate gene analysis. The approach is illustrated with an example involving mice of the C57BL/6J and PWK/PhJ strains, which differ in avidity for calcium solutions. The variation between the strains can be attributed to at least seven quantitative trait loci (QTLs). One of these QTLs is most likely accounted for by Tas1r3, which is a gene involved in the detection of sweet and umami tastes. The discovery of a novel function for a gene with no previously known role in calcium consumption illustrates the power of gene discovery methods to uncover novel mechanisms.
Tordoff, Michael G.
In previous years, with support from the U.S. Department of Energy, we developed methods for construction of normalized and subtracted cDNA libraries, and constructed hundreds of high-quality libraries for production of Expressed Sequence Tags (ESTs). Our clones were made widely available to the scientific community through the IMAGE Consortium, and millions of ESTs were produced from our libraries either by collaborators or by our own sequencing laboratory at the University of Iowa. During this grant period, we focused on (1) the development of a method for preferential cloning of tissue-specific and/or rare transcripts, (2) its utilization to expedite EST-based gene discovery for the NIH Mouse Brain Molecular Anatomy Project, (3) further development and optimization of a method for construction of full-length-enriched cDNA libraries, and (4) modification of a plasmid vector to maximize efficiency of full-length cDNA sequencing by the transposon-mediated approach. It is noteworthy that the technology developed for preferential cloning of rare mRNAs enabled identification of over 2,000 mouse transcripts differentially expressed in the hippocampus. In addition, the method that we optimized for construction of full-length-enriched cDNA libraries was successfully utilized for the production of approximately fifty libraries from the developing mouse nervous system, from which over 2,500 full-ORF-containing cDNAs have been identified and accurately sequenced in their entirety either by our group or by the NIH-Mammalian Gene Collection Program Sequencing Team.
Marcelo Bento Soares
Background Discovering patterns from gene expression levels is regarded as a classification problem when the tissue classes of the samples are given, and it is solved as a discrete-data problem by discretizing the expression levels of each gene into intervals that maximize the interdependence between that gene and the class labels. However, when class information is unavailable, discovering gene expression patterns becomes difficult. Methods For a gene pool with a large number of genes, we first cluster the genes into smaller groups. In each group, we use the representative gene, the one with the highest interdependence with the others in the group, to drive the discretization of the expression levels of the other genes. Treating intervals as discrete events, association patterns of events can be discovered. If the gene groups obtained are crisp gene clusters, significant patterns overlapping different gene clusters cannot be found. This paper presents a new method of “fuzzifying” the crisp gene clusters to overcome this problem. Results To evaluate the effectiveness of our approach, we first apply the procedure described above to a synthetic data set and then to a gene expression data set with known class labels. The class labels are not used in either analysis but are used later as the ground truth in a classification problem for assessing the algorithm’s effectiveness in fuzzy gene clustering and discretization. The results show the efficacy of the proposed method. The existence of correlation among continuous-valued gene expression levels suggests that certain genes in the gene groups have high interdependence with other genes in the group. Fuzzification of a crisp gene cluster allows the cluster to take in genes from other clusters so that overlapping relationships among gene clusters can be uncovered. Hence, previously unknown hidden patterns residing in overlapping gene clusters are discovered.
From the experimental results, the high order patterns discovered reveal multiple gene interaction patterns in cancerous tissues not found in normal tissues. It was also found that for the colon cancer experiment, 70% of the top patterns and most of the discriminative patterns between cancerous and normal tissues are among those spanning across different crisp gene clusters. Conclusions We show that the proposed method for analyzing the error-prone microarray is effective even without the presence of tissue class information. A unified framework is presented, allowing fast and accurate pattern discovery for gene expression data. For a large gene set, to discover a comprehensive set of patterns, gene clustering, gene expression discretization and gene cluster fuzzification are absolutely necessary.
Filamentous fungi are a large group of diverse and economically important microorganisms. Large-scale gene disruption strategies developed in budding yeast are not applicable to these organisms because of their larger genomes and lower rate of targeted integration (TI) during transformation. We developed transposon-arrayed gene knockouts (TAGKO) to discover genes and simultaneously create gene disruption cassettes for subsequent transformation and mutant
Lisbeth Hamer; Kiichi Adachi; Maria V. Montenegro-Chamorro; Matthew M. Tanzer; Sanjoy K. Mahanty; Clive Lo; Rex W. Tarpey; Amy R. Skalchunes; Ryan W. Heiniger; Sheryl A. Frank; Blaise A. Darveaux; David J. Lampe; Ted M. Slater; Lakshman Ramamurthy; Todd M. Dezwaan; Grant H. Nelson; Jeffrey R. Shuster; Jeffrey Woessner; John E. Hamer
Background To date, few peptides in the complex mixture of platypus venom have been identified and sequenced, in part due to the limited amounts of platypus venom available to study. We have constructed and sequenced a cDNA library from an active platypus venom gland to identify the remaining components. Results We identified 83 novel putative platypus venom genes from 13 toxin families, which are homologous to known toxins from a wide range of vertebrates (fish, reptiles, insectivores) and invertebrates (spiders, sea anemones, starfish). A number of these are expressed in tissues other than the venom gland, and at least three of these families (those with homology to toxins from distant invertebrates) may play non-toxin roles. Thus, further functional testing is required to confirm venom activity. However, the presence of similar putative toxins in such widely divergent species provides further evidence for the hypothesis that there are certain protein families that are selected preferentially during evolution to become venom peptides. We have also used homology with known proteins to speculate on the contributions of each venom component to the symptoms of platypus envenomation. Conclusions This study represents a step towards fully characterizing the first mammal venom transcriptome. We have found similarities between putative platypus toxins and those of a number of unrelated species, providing insight into the evolution of mammalian venom.
The composition of oils, proteins, and carbohydrates in seeds of corn, soybean, and other crops has been modified to produce grains with enhanced value. Both plant breeding and molecular technologies have been used to produce plants carrying the desired traits. Genomics-based strategies for gene discovery, coupled with high-throughput transformation processes and miniaturized, automated analytical and functionality assays, have accelerated the identification of product candidates. Molecular marker-based breeding strategies have been used to accelerate the process of moving trait genes into high-yielding germplasm for commercialization. These products are being tested for applications in food, feed, and industrial markets.
Barbara Mazur (DuPont Agricultural Products Experimental Station); Enno Krebers (DuPont Agricultural Products Experimental Station); Scott Tingey (DuPont Agricultural Products Experimental Station)
Motivation: Computational methods are widely used to discover gene–disease relationships hidden in vast masses of available genomic and post-genomic data. In most current methods, a similarity measure is calculated between gene annotations and known disease genes or disease descriptions. However, more explicit gene–disease relationships are required for better insights into the molecular bases of diseases, especially for complex multi-gene diseases. Results: Explicit relationships between genes and diseases are formulated as candidate gene definitions that may include intermediary genes, e.g. orthologous or interacting genes. These definitions guide data modelling in our database approach for gene–disease relationship discovery and are expressed as views which ultimately lead to the retrieval of documented sets of candidate genes. A system called ACGR (Approach for Candidate Gene Retrieval) has been implemented and tested with three case studies including a rare orphan gene disease. Availability: The ACGR sources are freely available at http://bioinfo.loria.fr/projects/acgr/acgr-software/. See especially the file ‘disease_description’ and the folders ‘Xcollect_scenarios’ and ‘ACGR_views’. Contact: email@example.com Supplementary information: Supplementary data are available at Bioinformatics online.
Yilmaz, S.; Jonveaux, P.; Bicep, C.; Pierron, L.; Smail-Tabbone, M.; Devignes, M.D.
Advances in pig gene identification, mapping and functional analysis have continued to make rapid progress. The porcine genetic linkage map now has nearly 3000 loci, including several hundred genes, and is likely to expand considerably in the next few years, with many more genes and amplified fragment length polymorphism (AFLP) markers being added to the map. The physical genetic map is also growing rapidly and has over 3000 genes and markers. Several recent quantitative trait loci (QTL) scans and candidate gene analyses have identified important chromosomal regions and individual genes associated with traits of economic interest. The commercial pig industry is actively using this information and traditional performance information to improve pig production by marker-assisted selection (MAS). Research to study the co-expression of thousands of genes is now advancing and methods to combine these approaches to aid in gene discovery are under way. The pig's role in xenotransplantation and biomedical research makes the study of its genome important for the study of human disease. This review will briefly describe advances made, directions for future research and the implications for both the pig industry and human health.
Bacteriocins represent a large family of ribosomally produced peptide antibiotics. Here we describe the discovery of a widely conserved biosynthetic gene cluster for the synthesis of thiazole and oxazole heterocycles on ribosomally produced peptides. These clusters encode a toxin precursor and all necessary proteins for toxin maturation and export. Using the toxin precursor peptide and heterocycle-forming synthetase proteins from the human pathogen Streptococcus pyogenes, we demonstrate the in vitro reconstitution of streptolysin S activity. We provide evidence that the synthetase enzymes, as predicted from our bioinformatics analysis, introduce heterocycles onto precursor peptides, thereby providing molecular insight into the chemical structure of streptolysin S. Furthermore, our studies reveal that the synthetase exhibits relaxed substrate specificity and modifies toxin precursors from both related and distant species. Given our findings, it is likely that the discovery of similar peptidic toxins will rapidly expand to existing and emerging genomes.
Lee, Shaun W.; Mitchell, Douglas A.; Markley, Andrew L.; Hensler, Mary E.; Gonzalez, David; Wohlrab, Aaron; Dorrestein, Pieter C.; Nizet, Victor; Dixon, Jack E.
Filamentous fungi are a large group of diverse and economically important microorganisms. Large-scale gene disruption strategies developed in budding yeast are not applicable to these organisms because of their larger genomes and lower rate of targeted integration (TI) during transformation. We developed transposon-arrayed gene knockouts (TAGKO) to discover genes and simultaneously create gene disruption cassettes for subsequent transformation and mutant analysis. Transposons carrying a bacterial and fungal drug resistance marker are used to mutagenize individual cosmids or entire libraries in vitro. Cosmids are annotated by DNA sequence analysis at the transposon insertion sites, and cosmid inserts are liberated to direct insertional mutagenesis events in the genome. Based on saturation analysis of a cosmid insert and insertions in a fungal cosmid library, we show that TAGKO can be used to rapidly identify and mutate genes. We further show that insertions can create alterations in gene expression, and we have used this approach to investigate an amino acid oxidation pathway in two important fungal phytopathogens. PMID:11296265
Hamer, L; Adachi, K; Montenegro-Chamorro, M V; Tanzer, M M; Mahanty, S K; Lo, C; Tarpey, R W; Skalchunes, A R; Heiniger, R W; Frank, S A; Darveaux, B A; Lampe, D J; Slater, T M; Ramamurthy, L; DeZwaan, T M; Nelson, G H; Shuster, J R; Woessner, J; Hamer, J E
The sequencing of the human genome and the genomes of several model organisms is the first step toward the long-term objective of genetic research: the identification of all genes, and the discovery of their functions and mutual interactions. This article presents a methodology and a computer program called GenePath to support the discovery of gene function. GenePath uses mutant
Ivan Bratko; Adam Kuspa; John A Halter; Robert J Beck; Gad Shaulsky
The Discovery and New Frontiers (D&NF) programs are multi-project, uncoupled programs that currently comprise 13 missions in phases A through F. The ability to fly frequent science missions to explore the solar system is the primary measure of program success. The program office uses a Budget Analysis Tool to perform "what-if" analyses, compare mission scenarios to the current program budget, and rapidly forecast the programs' ability to meet their launch rate requirements. The tool allows the user to specify the total mission cost (fixed year), mission development and operations profile by phase (percent total mission cost and duration), launch vehicle, and launch date for multiple missions. The tool automatically applies inflation and rolls up the total program costs (in real year dollars) for comparison against available program budget. Thus, the tool allows the user to rapidly and easily explore a variety of launch rates and analyze the effect of changes in future mission or launch vehicle costs, the differing development profiles or operational durations of a future mission, or a replan of a current mission on the overall program budget. Because the tool also reports average monthly costs for the specified mission profile, the development or operations cost profile can easily be validated against program experience for similar missions. While specifically designed for predicting overall program budgets for programs that develop and operate multiple missions concurrently, the basic concept of the tool (rolling up multiple, independently budgeted lines) could easily be adapted to other applications.
Newhouse, Marilyn E.
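The roll-up idea in the abstract above can be illustrated with a minimal Python sketch. It is not the program office's Budget Analysis Tool: the class names, the function names, the flat annual inflation rate, the even spreading of each phase's cost over its months, and the use of a start year in place of a launch date are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    cost_fraction: float  # share of total mission cost (fixed-year dollars)
    months: int           # phase duration in months


@dataclass
class Mission:
    name: str
    total_cost: float     # fixed-year dollars
    start_year: int       # assumed: calendar year in which the first phase begins
    phases: list


def yearly_costs(mission, base_year, inflation):
    """Spread each phase evenly over its months, then inflate to real-year dollars."""
    costs = {}
    month = 0
    for phase in mission.phases:
        monthly = mission.total_cost * phase.cost_fraction / phase.months
        for _ in range(phase.months):
            year = mission.start_year + month // 12
            factor = (1 + inflation) ** (year - base_year)
            costs[year] = costs.get(year, 0.0) + monthly * factor
            month += 1
    return costs


def program_rollup(missions, base_year, inflation):
    """Sum the real-year costs of independently budgeted missions by year."""
    total = {}
    for mission in missions:
        for year, cost in yearly_costs(mission, base_year, inflation).items():
            total[year] = total.get(year, 0.0) + cost
    return dict(sorted(total.items()))
```

Comparing the dictionary returned by `program_rollup` against a yearly budget line would then give the kind of "what-if" view the abstract describes, under the simplifications listed above.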
Sugarcane is a highly productive crop used for centuries as the main source of sugar and recently to produce ethanol, a renewable bio-fuel energy source. There is increased interest in this crop due to the impending need to decrease fossil fuel usage. Sugarcane has a highly polyploid genome. Expressed sequence tag (EST) sequencing has significantly contributed to gene discovery and expression studies used to associate function with sugarcane genes. A significant amount of data exists on regulatory events controlling responses to herbivory, drought, and phosphate deficiency, which cause important constraints on yield and on endophytic bacteria, which are highly beneficial. The means to reduce drought, phosphate deficiency, and herbivory by the sugarcane borer have a negative impact on the environment. Improved tolerance for these constraints is being sought. Sugarcane's ability to accumulate sucrose up to 16% of its culm dry weight is a challenge for genetic manipulation. Genome-based technology such as cDNA microarray data indicates genes associated with sugar content that may be used to develop new varieties improved for sucrose content or for traits that restrict the expansion of the cultivated land. The genes can also be used as molecular markers of agronomic traits in traditional breeding programs. PMID:18273390
Menossi, M; Silva-Filho, M C; Vincentz, M; Van-Sluys, M-A; Souza, G M
Analysis of expressed sequence tags (ESTs) constitutes a useful approach for gene identification that, in the case of human pathogens, might result in the identification of new targets for chemotherapy and vaccine development. As part of the Trypanosoma cruzi genome project, we have partially sequenced the 5′ ends of 1,949 clones to generate ESTs. The clones were randomly selected from
RAMIRO E. VERDUN; NELSON DI PAOLO; TURAN P. URMENYI; EDSON RONDINELLI; ALBERTO C. C. FRASCH; DANIEL O. SANCHEZ
Amyotrophic lateral sclerosis (ALS) is the most common form of motor neuron disease (MND). It is currently incurable and treatment is largely limited to supportive care. Family history is associated with an increased risk of ALS, and many Mendelian causes have been discovered. However, most forms of the disease are not obviously familial. Recent advances in human genetics have enabled genome-wide analyses of single nucleotide polymorphisms (SNPs) that make it possible to study complex genetic contributions to human disease. Genome-wide SNP analyses require a large sample size and thus depend upon collaborative efforts to collect and manage the biological samples and corresponding data. Public availability of biological samples (such as DNA), phenotypic and genotypic data further enhances research endeavors. Here we discuss a large collaboration among academic investigators, government, and non-government organizations which has created a public repository of human DNA, immortalized cell lines, and clinical data to further gene discovery in ALS. This resource currently maintains samples and associated phenotypic data from 2332 MND subjects and 4692 controls. This resource should facilitate genetic discoveries which we anticipate will ultimately provide a better understanding of the biological mechanisms of neurodegeneration in ALS.
Gwinn, Katrina; Corriveau, Roderick A.; Mitsumoto, Hiroshi; Bednarz, Kate; Brown, Robert H.; Cudkowicz, Merit; Gordon, Paul H.; Hardy, John; Kasarskis, Edward J.; Kaufmann, Petra; Miller, Robert; Sorenson, Eric; Tandan, Rup; Traynor, Bryan J.; Nash, Josefina; Sherman, Alex; Mailman, Matthew D.; Ostell, James; Bruijn, Lucie; Cwik, Valerie; Rich, Stephen S.; Singleton, Andrew; Refolo, Larry; Andrews, Jaime; Zhang, Ran; Conwit, Robin; Keller, Margaret A.
The Archean Biosphere Drilling Project (ABDP), an international scientific drilling project involving scientists from the USA, Australia and Japan, was initiated in the Pilbara Craton, Western Australia. The scientific objectives of the ABDP are the identification of microfossils and biomarkers, the clarification of the geochemical environment of the early Earth, and the understanding of the geophysical contribution to the co-evolution of life and environment. Through 2003 and 2004 activities, we have drilled 150 to 300 m deep holes to recover "fresh" (modern weathering-free) geologic formations that range from 3.5 to 2.7 Ga in age. The drilling targets were: (1) 3.46 Ga Towers Formation, (2) mid-Archean Mosquito Formation, (3) 2.77 Ga Mt Roe Basalt, (4) 2.76 Ga Tumbiana Formation, (5) 2.74 Ga Hardey Formation. The initial investigations on the ABDP drill cores by Japanese members have already produced many exciting and interesting data and observations. The 3.46 Ga Marble Bar Jasper could provide clues to the argument about the early photosynthetic cyanobacteria that produced free oxygen and raised the oxygen level on the earth. There have been many ideas about how the hematite in jasper was formed. Our most important discoveries are the confirmations, using FE-SEM, ESCA, Laser-Raman and cathodoluminescence, that hematite, magnetite and siderite precipitated separately as primary minerals, and that a remnant texture resembling microfossils is present. Taking into account the carbon isotopic ratios of the remains, from -25 to -40 per mil, these iron oxides might be biogenic. We need to identify the iron bacteria in detail to deduce the early earth's surface environment. In addition, the black shale of the Apex Basalt overlying the Marble Bar Jasper contains organic carbon from 0.7 to 5.2 percent, with a carbon isotopic ratio from -26 to -30 per mil, suggesting that various microbes inhabited the early Archean ocean.
The 2.77 Ga Mt Roe Basalt, which is composed of basaltic lavas interbedded with tuffs, clastic sediment and minor evaporites, preserves the primary biogeochemical, geochemical and geophysical phenomena well. The discovery of black shale with sulfide nodules is worthy of special attention. Our study suggests that the following succession of events occurred more than once: (1) eruption of amygdaloidal basaltic lava followed by eruption of tuff into shallower water, (2) deposition of sandstone and black shale, and (3) concurrent hydrothermal activity in which reduced fluids altered the tuff and the lowermost clastic sediments. The extremely light carbon isotopic ratios suggest the activities of methanogens in hydrothermal veinlets and methanotrophs in black shale. In addition, the wide range of sulfur isotopic ratios in black shale suggests activity of co-existing sulfate-reducing bacteria in the black shale. Occasional presence of sandstone, especially in the late stage of clastic sedimentation, suggests sedimentation in a near-coastal environment. Stromatolite-like microtexture in the sandstone suggests the existence of photosynthetic microbes, which is supported by heavy carbon isotopic ratios (up to -25 per mil) and by the signals of hopanoid biomarkers. The three-dimensional geochemical data suggest the existence of a marine environment ranging from oxic at the shallow site to euxinic at the deeper site. Paleomagnetic analyses suggest the episodic initiation of the earth's dynamo at about 3.5 Ga and the increase of its momentum since at least 2.77 Ga. Taking into account the biogeochemical evidence confirmed from other ABDP cores, the increase of geomagnetic intensity might have accelerated the diversification of early life.
Expressed sequence tags (ESTs) are providing a new approach to gene discovery in plant secondary metabolism. Stevia rebaudiana Bert. leaves produce high concentrations of diterpene steviol glycosides and should be a rich source of transcripts involved in diterpene synthesis. In order to create a resource for gene discovery and increase our understanding of steviol glycoside biosynthesis, we sequenced 5548 ESTs from
J. E. Brandle; A. Richman; A. K. Swanson; B. P. Chapman
...DEPARTMENT OF AGRICULTURE Forest Service Lake Tahoe Basin Management Unit, California...Resort Epic Discovery Project AGENCY: Lake Tahoe Basin Management Unit, Forest Service...expanding needs and expectations of visitors to Lake Tahoe, better support the...
This final report describes accomplishments of Project Discovery, a 3-year project in Kentucky to assist teachers in creating an innovative learning environment for gifted and talented primary-aged children. Major goals focused on and achieved by the project included: (1) increasing the percentage of disadvantaged students identified as gifted in…
Luvisi, Christopher L.
Uncovering the underlying genetic component of any disease is key to the understanding of its pathophysiology and may open new avenues for development of therapeutic strategies and biomarkers. In the past several years, there has been an explosion of genome-wide association studies (GWAS) resulting in the discovery of novel candidate genes conferring risk for complex diseases, including neurodegenerative diseases. Despite
Despite intensive research over many years, the treatment of schizophrenia remains a major health issue. Current and emerging treatments for schizophrenia are based upon the classical dopamine and glutamate hypotheses of disease. Existing first and second generation antipsychotic drugs based upon the dopamine hypothesis are limited by their inability to treat all symptom domains and their undesirable side effect profiles. Third generation drugs based upon the glutamate hypothesis of disease are currently under evaluation but are more likely to be used as add-on treatments. Hence there is a large unmet clinical need. A major challenge in neuropsychiatric disease research is the relatively limited knowledge of disease mechanisms. However, as our understanding of the genetic causes of the disease evolves, novel strategies for the development of improved therapeutic agents will become apparent. In this review we consider the current status of knowledge of the genetic basis of schizophrenia, including methods for identifying genetic variants associated with the disorder and how they impact on gene function. Although the genetic architecture of schizophrenia is complex, some targets amenable to pharmacological intervention can be discerned. We conclude that many challenges lie ahead but that the stratification of patients according to biobehavioural constructs that cross existing disease classifications, but have common genetic and neurobiological bases, offers opportunities for new approaches to effective drug discovery. PMID:24561132
Winchester, Catherine L; Pratt, Judith A; Morris, Brian J
A key goal of molecular/cell biology/biotechnology is to identify essential genes in virtually every physiological process to uncover basic mechanisms of cell function and to establish potential targets of drug therapy combating human disease. The current article describes a semester-long, project-oriented molecular/cellular/biotechnology laboratory providing students, within a framework of bone cell biology, with a modern approach to gene discovery. Students are introduced to the topics of bone cells, bone synthesis, bone resorption, and osteoporosis. They then review the theory of microchip gene arrays, and study microchip array data generated during the differentiation of bone-resorbing osteoclasts in vitro. The class selects genes whose expression increases during osteoclastogenesis, and researches them in small groups using web-based bioinformatics tools. Students then go to a biotechnology company website to find and order siRNAs designed to “knockdown” expression of the gene of interest. Students then learn to transfect these siRNAs into osteoclasts, stimulate the cells to differentiate, assay osteoclast differentiation in vitro, and measure specific gene expression using real-time PCR and immunoblotting. Specific siRNA knockdown resulting in a decrease in osteoclastogenesis is indicative of a gene's physiological relevance. The results are analyzed statistically, and presented to the class in groups. In the past two years, students identified several genes essential for optimal osteoclast differentiation, including Myo1d. The students hypothesize that the myo1d protein functions in osteoclasts to deliver important proteins to the cell surface via vesicular transport along microfilaments. Student response to the new course was overwhelmingly positive.
Picco, Jenna; Clements, Meghan; Witwicka, Hanna; Yang, Meiheng; Hoey, Margaret T.; Odgren, Paul R.
The goal of our current consortium project is to launch a new era, functional genomics of poultry, by providing genomic resources (expressed sequence tags (EST) and DNA microarrays) and by examining global gene expression in target tissues of chickens. DNA microarray analysis has been a fruitful strategy for the identification of functional genes in several model organisms (i.e., human,
L. A. Cogburn; X. Wang; W. Carre; L. Rejto; T. E. Porter; S. E. Aggrey; J. Simon
Background Cholangiocarcinoma (CCA) – cancer of the bile ducts – is associated with chronic infection with the liver fluke, Opisthorchis viverrini. Despite being the only eukaryote that is designated as a 'class I carcinogen' by the International Agency for Research on Cancer, little is known about its genome. Results Approximately 5,000 randomly selected cDNAs from the adult stage of O. viverrini were characterized and accounted for 1,932 contigs, representing ~14% of the entire transcriptome, and, presently, the largest sequence dataset for any species of liver fluke. Twenty percent of contigs were assigned GO classifications. Abundantly represented protein families included those involved in physiological functions that are essential to parasitism, such as anaerobic respiration, reproduction, detoxification, surface maintenance and feeding. GO assignments were well conserved in relation to other parasitic flukes, however, some categories were over-represented in O. viverrini, such as structural and motor proteins. An assessment of evolutionary relationships showed that O. viverrini was more similar to other parasitic (Clonorchis sinensis and Schistosoma japonicum) than to free-living (Schmidtea mediterranea) flatworms, and 105 sequences had close homologues in both parasitic species but not in S. mediterranea. A total of 164 O. viverrini contigs contained ORFs with signal sequences, many of which were platyhelminth-specific. Examples of convergent evolution between host and parasite secreted/membrane proteins were identified as were homologues of vaccine antigens from other helminths. Finally, ORFs representing secreted proteins with known roles in tumorigenesis were identified, and these might play roles in the pathogenesis of O. viverrini-induced CCA. Conclusion This gene discovery effort for O. viverrini should expedite molecular studies of cholangiocarcinogenesis and accelerate research focused on developing new interventions, drugs and vaccines, to control O. viverrini and related flukes.
Laha, Thewarach; Pinlaor, Porntip; Mulvenna, Jason; Sripa, Banchob; Sripa, Manop; Smout, Michael J; Gasser, Robin B; Brindley, Paul J; Loukas, Alex
To accelerate gene discovery and facilitate genetic mapping in the protozoan parasite Toxoplasma gondii, we have generated >7000 new ESTs from the 5′ ends of randomly selected tachyzoite cDNAs. Comparison of the ESTs with the existing gene databases identified possible functions for more than 500 new T. gondii genes by virtue of sequence motifs shared with conserved protein families, including
James W. Ajioka; John C. Boothroyd; Brian P. Brunk; Adrian Hehl; Ledeana Hillier; Ian D. Manger; Marco Marra; G. Christian Overton; David S. Roos; Kiew-Lian Wan; Robert Waterston; L. David Sibley
The Gene Ontology (GO) project (http://www.geneontology.org) develops and uses a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences (also see http://song.sourceforge.net/). The GO Consortium continues to improve the vocabulary content, reflecting the impact of several novel mechanisms of incorporating community input. A growing number of model organism databases
The first genome survey sequencing of the rodent malaria parasite Plasmodium chabaudi is presented here. In 766 sequences, 131 putative gene sequences have been identified by sequence similarity database searches. Further, 7 potential gene families, four of which have not previously been described, were discovered. These genes may be important in understanding the biology of malaria, as well as offering
Christoph S. Janssen; Michael P. Barrett; Daniel Lawson; Michael A. Quail; David Harris; Sharen Bowman; R. Stephen Phillips; C. Michael R. Turner
Gene annotation underpins genome science. Most often protein coding sequence is inferred from the genome based on transcript evidence and computational predictions. While generally correct, gene models suffer from errors in reading frame, exon border definition, and exon identification. To ascertain the error rate of Arabidopsis thaliana gene models, we isolated proteins from a sample of Arabidopsis tissues and determined the amino acid sequences of 144,079 distinct peptides by tandem mass spectrometry. The peptides corresponded to 1 or more of 3 different translations of the genome: a 6-frame translation, an exon splice-graph, and the currently annotated proteome. The majority of the peptides (126,055) resided in existing gene models (12,769 confirmed proteins), comprising 40% of annotated genes. Surprisingly, 18,024 novel peptides were found that do not correspond to annotated genes. Using the gene finding program AUGUSTUS and 5,426 novel peptides that occurred in clusters, we discovered 778 new protein-coding genes and refined the annotation of an additional 695 gene models. The remaining 13,449 novel peptides provide high quality annotation (>99% correct) for thousands of additional genes. Our observation that 18,024 of 144,079 peptides did not match current gene models suggests that 13% of the Arabidopsis proteome was incomplete due to approximately equal numbers of missing and incorrect gene models.
Castellana, Natalie E.; Payne, Samuel H.; Shen, Zhouxin; Stanke, Mario; Bafna, Vineet; Briggs, Steven P.
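The abstract above mentions matching peptides against a 6-frame translation of the genome. The following Python sketch shows only that one idea, under stated assumptions: it uses the standard genetic code, toy input, and hypothetical function names (`six_frame`, `frames_matching`), and it does plain substring lookup rather than the mass-spectrometry database search the authors performed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(dna):
    """Return the three forward and three reverse-strand translations."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


def frames_matching(peptide, dna):
    """List the frames whose translation contains the peptide as a substring."""
    return sorted(f for f, prot in six_frame(dna).items() if peptide in prot)
```

For example, `six_frame("ATGGCCAAATAG")["+1"]` yields `"MAK*"`, where `*` marks a stop codon. A peptide that appears only in a frame outside any annotated gene model is the kind of "novel peptide" the abstract counts.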
As genomic sequences become easier to acquire, shotgun proteomics will play an increasingly important role in genome annotation. With proteomics, researchers can confirm and revise existing genome annotations and discover completely new genes. Proteomic-based de novo gene discovery should be especially useful for sets of genes with characteristics that make them difficult to predict with gene-finding algorithms. Here, we report the proteomic discovery of 19 previously unannotated genes encoding seminal fluid proteins (Sfps) that are transferred from males to females during mating in Drosophila. Using bioinformatics, we detected putative orthologs of these genes, as well as 19 others detected by the same method in a previous study, across several related species. Gene expression analysis revealed that nearly all predicted orthologs are transcribed and that most are expressed in a male-specific or male-biased manner. We suggest several reasons why these genes escaped computational prediction. Like annotated Sfps, many of these new proteins show a pattern of adaptive evolution, consistent with their potential role in influencing male sperm competitive ability. However, in contrast to annotated Sfps, these new genes are shorter, have a higher rate of nonsynonymous substitution, and have a markedly lower GC content in coding regions. Our data demonstrate the utility of applying proteomic gene discovery methods to a specific biological process and provide a more complete picture of the molecules that are critical to reproductive success in Drosophila.
Findlay, Geoffrey D.; MacCoss, Michael J.; Swanson, Willie J.
The concept of discovery learning is exemplified in this account of four students and their instructor who began and developed a nurse-run clinic for the homeless in a community health project. The students were registered nurses returning for their bachelor's degrees. They experienced frustration at learning the difference between care based on their assessments of patients' needs and care geared to clients' assessments of desired interventions. The journals that they kept reveal self-discovery as well as new respect for other humans. In addition, a new type of community care emerged, which gives all indications of surviving. PMID:2780501
Turner, S L; Bauer, G; McNair, E; McNutt, B; Walker, W
Estrogen has a profound impact on human physiology and affects numerous genes. The classical estrogen reaction is mediated by its receptors (ERs), which bind to the estrogen response elements (EREs) in the promoter regions of target genes. Because the experiments are tedious and expensive, only a limited number of human genes are functionally well characterized. It is still unclear how many and which human genes respond to estrogen treatment. We propose a simple, economical, yet effective computational method to predict a subclass of estrogen responsive genes. Our method relies on the similarity of ERE frames across different promoters in the human genome. Matching ERE frames of a test set of 60 known estrogen responsive genes to the collection of over 18,000 human promoters, we obtained 604 candidate genes. Evaluating our result by comparison with the published microarray data and literature, we found that more than half (53.6%, 324/604) of the predicted candidate genes are responsive to estrogen. We believe this method can significantly reduce the number of potential estrogen target genes that need to be tested and provide functional clues for annotating genes that lack functional information.
Tang, Suisheng; Tan, Sin Lam; Ramadoss, Suresh Kumar; Kumar, Arun Prashanth; Tang, Man-Hung Eric; Bajic, Vladimir B.
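The promoter-matching idea in the abstract above can be illustrated with a minimal sketch. It does not reproduce the authors' ERE-frame similarity method, whose details are not given here. As a stand-in, it scans each promoter for the widely cited palindromic ERE consensus GGTCAnnnTGACC and tolerates a small number of mismatches. The function names, the mismatch threshold, and the toy promoter sequences are all assumptions made for illustration.

```python
# Illustrative sketch only: a consensus scan substituted for the paper's
# "ERE frame" matching. All names and sequences below are hypothetical.

CONSENSUS = "GGTCANNNTGACC"  # N marks the 3-bp spacer, where any base is allowed


def mismatches(window, pattern=CONSENSUS):
    """Count positions where window differs from the pattern, ignoring N."""
    return sum(1 for base, ref in zip(window, pattern) if ref != "N" and base != ref)


def scan_promoter(seq, max_mismatches=1):
    """Return (offset, matched window, mismatch count) for each ERE-like site."""
    seq = seq.upper()
    k = len(CONSENSUS)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        m = mismatches(window)
        if m <= max_mismatches:
            hits.append((i, window, m))
    return hits


def candidate_genes(promoters, max_mismatches=1):
    """Names of genes whose promoter contains at least one ERE-like site."""
    return sorted(
        gene for gene, seq in promoters.items()
        if scan_promoter(seq, max_mismatches)
    )


# Toy input: geneA carries a perfect ERE, geneB carries none.
toy_promoters = {
    "geneA": "AAAGGTCACAGTGACCTTT",
    "geneB": "ACGTACGTACGTACGTACGT",
}
print(candidate_genes(toy_promoters))
```

Allowing one mismatch reflects the fact that functional EREs often deviate from the perfect palindrome; a stricter or looser threshold trades specificity against the number of candidates returned.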
Advances in genome sequencing technologies have begun to revolutionize neurogenetics, allowing the full spectrum of genetic variation to be better understood in relation to disease. Exome sequencing of hundreds to thousands of samples from patients with autism spectrum disorder, intellectual disability, epilepsy and schizophrenia provides strong evidence of the importance of de novo and gene-disruptive events. There are now several hundred new candidate genes and targeted resequencing technologies that allow screening of dozens of genes in tens of thousands of individuals with high specificity and sensitivity. The decision of which genes to pursue depends on many factors, including recurrence, previous evidence of overlap with pathogenic copy number variants, the position of the mutation in the protein, the mutational burden among healthy individuals and membership of the candidate gene in disease-implicated protein networks. We discuss these emerging criteria for gene prioritization and the potential impact on the field of neuroscience. PMID:24866042
Hoischen, Alexander; Krumm, Niklas; Eichler, Evan E
Inherited retinal dystrophies are Mendelian neurodegenerative conditions classified as pigmentary retinopathies, macular dystrophies and others. Over a 21-year period, from 1990 to 2011, we screened 107 genes in 609 families in Montpellier and identified a causal mutation in 68.5% of them. Following a candidate gene approach, we established that RPE65, the isomerohydrolase of the visual cycle, is responsible for severe childhood blindness (Leber congenital amaurosis or early onset retinal dystrophy). In an ongoing study, we screened the genes in a series of 283 families with dominant retinitis pigmentosa and estimated that 80% of the families have a mutation in a known gene. A similar study is currently under way for autosomal recessive retinitis pigmentosa. Finally, we have identified IMPG1 as a gene responsible for rare cases of macular vitelliform dystrophy with dominant or recessive inheritance. PMID:24702842
Hamel, Christian P
Plants utilize carbon by partitioning the reduced carbon obtained through photosynthesis into different compartments and into different chemistries within a cell and subsequently allocating such carbon to sink tissues throughout the plant. Since the phytohormones auxin and cytokinin are known to influence sink strength in tissues such as roots (Skoog & Miller 1957, Nordstrom et al. 2004), we hypothesized that altering the expression of genes that regulate auxin-mediated (e.g., AUX/IAA or ARF transcription factors) or cytokinin-mediated (e.g., RR transcription factors) control of root growth and development would impact carbon allocation and partitioning belowground (Fig. 1 - Renewal Proposal). Specifically, the ARF, AUX/IAA and RR transcription factor gene families mediate the effects of the growth regulators auxin and cytokinin on cell expansion, cell division and differentiation into root primordia. Invertases (IVR), whose transcript abundance is enhanced by both auxin and cytokinin, are critical components of carbon movement and therefore of carbon allocation. Thus, we initiated comparative genomic studies to identify the AUX/IAA, ARF, RR and IVR gene families in the Populus genome that could impact carbon allocation and partitioning. Bioinformatics searches using Arabidopsis gene sequences as queries identified regions with high degrees of sequence similarities in the Populus genome. These Populus sequences formed the basis of our transgenic experiments. Transgenic modification of gene expression involving members of these gene families was hypothesized to have profound effects on carbon allocation and partitioning.
DAVIS J M
Background Horned beetles, in particular in the genus Onthophagus, are important models for studies on sexual selection, biological radiations, the origin of novel traits, developmental plasticity, biocontrol, conservation, and forensic biology. Despite their growing prominence as models for studying both basic and applied questions in biology, little genomic or transcriptomic data are available for this genus. We used massively parallel pyrosequencing (Roche 454-FLX platform) to produce a comprehensive EST dataset for the horned beetle Onthophagus taurus. To maximize sequence diversity, we pooled RNA extracted from a normalized library encompassing diverse developmental stages and both sexes. Results We used 454 pyrosequencing to sequence ESTs from all post-embryonic stages of O. taurus. Approximately 1.36 million reads assembled into 50,080 non-redundant sequences encompassing a total of 26.5 Mbp. The non-redundant sequences match over half of the genes in Tribolium castaneum, the most closely related species with a sequenced genome. Analyses of Gene Ontology annotations and biochemical pathways indicate that the O. taurus sequences reflect a wide and representative sampling of biological functions and biochemical processes. An analysis of sequence polymorphisms revealed that SNP frequency was negatively related to overall expression level and the number of tissue types in which a given gene is expressed. The most variable genes were enriched for a limited number of GO annotations whereas the least variable genes were enriched for a wide range of GO terms directly related to fitness. Conclusions This study provides the first large-scale EST database for horned beetles, a much-needed resource for advancing the study of these organisms. Furthermore, we identified instances of gene duplications and alternative splicing, useful for future study of gene regulation, and a large number of SNP markers that could be used in population-genetic studies of O. taurus and possibly other horned beetles.
Insertional mutagenesis was applied to Cryptococcus neoformans to identify genes associated with virulence attributes. Using biolistic transformation, we generated 4,300 nourseothricin (NAT)-resistant strains, of which 590 exhibited stable resistance. We focused on mutants with defects in established virulence factors and identified two with reduced growth at 37°C, four with reduced production of the antioxidant pigment melanin, and two with an increased sensitivity to nitric oxide (NO). The NAT insertion and mutant phenotypes were genetically linked in five of eight mutants, and the DNA flanking the insertions was characterized. For the strains with altered growth at 37°C and altered melanin production, mutations were in previously uncharacterized genes, while the two NO-sensitive strains bore insertions in the flavohemoglobin gene FHB1, whose product counters NO stress. Because of the frequent instability of nourseothricin resistance associated with biolistic transformation, Agrobacterium-mediated transformation was tested. This transkingdom DNA delivery approach produced 100% stable nourseothricin-resistant transformants, and three melanin-defective strains were identified from 576 transformants, of which 2 were linked to NAT in segregation analysis. One of these mutants contained a T-DNA insertion in the promoter of the LAC1 (laccase) gene, which encodes a key enzyme required for melanin production, while the second contained an insertion in the promoter of the CLC1 gene, encoding a voltage-gated chloride channel. Clc1 and its homologs are required for ion homeostasis, and in their absence Cu+ transport into the secretory pathway is compromised, depriving laccase and other Cu+-dependent proteins of their essential cofactor. 
The NAT resistance cassette was optimized for cryptococcal codon usage and GC content and was then used to disrupt a mitogen-activated protein kinase gene, a predicted gene, and two putative chloride channel genes to analyze their contributions to fungal physiology. Our findings demonstrate that both insertional mutagenesis methods can be applied to gene identification, but Agrobacterium-mediated transformation is more efficient and generates exclusively stable insertion mutations.
Idnurm, Alexander; Reedy, Jennifer L.; Nussbaum, Jesse C.; Heitman, Joseph
Purpose. To facilitate the identification of genes associated with cataract and other ocular defects, the authors developed and validated a computational tool termed iSyTE (integrated Systems Tool for Eye gene discovery; http://bioinformatics.udel.edu/Research/iSyTE). iSyTE uses a mouse embryonic lens gene expression data set as a bioinformatics filter to select candidate genes from human or mouse genomic regions implicated in disease and to prioritize them for further mutational and functional analyses. Methods. Microarray gene expression profiles were obtained for microdissected embryonic mouse lens at three key developmental time points in the transition from the embryonic day (E)10.5 stage of lens placode invagination to E12.5 lens primary fiber cell differentiation. Differentially regulated genes were identified by in silico comparison of lens gene expression profiles with those of whole embryo body (WB) lacking ocular tissue. Results. Gene set analysis demonstrated that this strategy effectively removes highly expressed but nonspecific housekeeping genes from lens tissue expression profiles, allowing identification of less highly expressed lens disease–associated genes. Among 24 previously mapped human genomic intervals containing genes associated with isolated congenital cataract, the mutant gene is ranked within the top two iSyTE-selected candidates in approximately 88% of cases. Finally, in situ hybridization confirmed lens expression of several novel iSyTE-identified genes. Conclusions. iSyTE is a publicly available Web resource that can be used to prioritize candidate genes within mapped genomic intervals associated with congenital cataract for further investigation. Extension of this approach to other ocular tissue components will facilitate eye disease gene discovery.
Lachke, Salil A.; Ho, Joshua W. K.; Kryukov, Gregory V.; O'Connell, Daniel J.; Aboukhalil, Anton; Bulyk, Martha L.; Park, Peter J.
Background Technological leaps in genome sequencing have resulted in a surge in discovery of human disease genes. These discoveries have led to increased clarity on the molecular pathology of disease and have also demonstrated considerable overlap in the genetic roots of human diseases. In light of this large genetic overlap, we tested whether cross-disease research approaches lead to faster, more impactful discoveries. Methods We leveraged several gene-disease association databases to calculate a Mutual Citation Score (MCS) for 10,853 pairs of genetically related diseases to measure the frequency of cross-citation between research fields. To assess the importance of cooperative research, we computed an Individual Disease Cooperation Score (ICS) and the average publication rate for each disease. Results For all disease pairs with one gene in common, we found that the degree of genetic overlap was a poor predictor of cooperation (r²=0.3198) and that the vast majority of disease pairs (89.56%) never cited previous discoveries of the same gene in a different disease, irrespective of the level of genetic similarity between the diseases. A fraction (0.25%) of the pairs demonstrated cross-citation in greater than 5% of their published genetic discoveries and 0.037% cross-referenced discoveries more than 10% of the time. We found strong positive correlations between ICS and publication rate (r²=0.7931), and an even stronger correlation between the publication rate and the number of cross-referenced diseases (r²=0.8585). These results suggested that cross-disease research may have the potential to yield novel discoveries at a faster pace than singular disease research. Conclusions Our findings suggest that the frequency of cross-disease study is low despite the high level of genetic similarity among many human diseases, and that collaborative methods may accelerate and increase the impact of new genetic discoveries. 
Until we have a better understanding of the taxonomy of human diseases, cross-disease research approaches should become the rule rather than the exception.
Similarity measurement is one of the most important stages in the process of cancer discovery from gene expression data. Traditional distance functions, such as the Euclidean distance, the correlation coefficient measure, the cosine distance, and so on, are selected to quantify the similarity between two cancer samples. However, these measures do not take into account the properties of cancer samples and do not consider the relationships among the genes in gene expression data. In order to explore the properties of cancer samples and the relationships among genes, we design a new similarity measure called representative distance (RD) to identify cancer samples in gene expression data. Specifically, RD does not compute the distance between two cancer samples using all the genes, but only calculates the similarity using representative genes selected by the affinity propagation algorithm. Then, a similarity matrix is constructed based on the representative distance. Finally, the spectral clustering algorithm is adopted to partition the similarity matrix, and discover the biological meaningful samples. To our knowledge, this is the first time in which the representative distance is applied to class discovery for gene expression data. Experiments on real cancer datasets indicate that our similarity measure can i) outperform most of the traditional distance measures, ii) identify cancer samples correctly in most of the datasets. PMID:22893451
Yu, Zhiwen; You, Jane; Li, Le; Wong, Hau-San; Han, Guoqiang
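The core idea of the representative distance described above, comparing samples on a selected subset of genes instead of all genes, can be sketched as follows. This is a hedged illustration and not the authors' implementation: the representative gene indices are supplied by hand as a stand-in for the output of affinity propagation, the Euclidean form of the distance and the Gaussian kernel used to turn distances into similarities are assumptions, and the spectral clustering step is omitted.

```python
import math

# Illustrative sketch only. rep_idx is a hypothetical, hand-picked stand-in
# for the representative genes that affinity propagation would select.


def representative_distance(x, y, rep_idx):
    """Euclidean distance between two samples, restricted to representative genes."""
    return math.sqrt(sum((x[i] - y[i]) ** 2 for i in rep_idx))


def similarity_matrix(samples, rep_idx, sigma=1.0):
    """Gaussian-kernel similarity matrix (an assumed choice of kernel)."""
    n = len(samples)
    return [
        [
            math.exp(
                -representative_distance(samples[a], samples[b], rep_idx) ** 2
                / (2 * sigma ** 2)
            )
            for b in range(n)
        ]
        for a in range(n)
    ]


# Toy expression profiles: three samples, three genes.
samples = [
    [1.0, 9.0, 2.0],
    [1.0, -5.0, 2.0],
    [4.0, 0.0, 6.0],
]
rep_idx = [0, 2]  # gene 1 is ignored, so samples 0 and 1 look identical

S = similarity_matrix(samples, rep_idx, sigma=5.0)
print(representative_distance(samples[0], samples[2], rep_idx))
```

The resulting matrix `S` is the kind of input a spectral clustering routine would partition; restricting the distance to representative genes means that noisy or redundant genes, such as gene 1 in the toy data, do not influence the grouping.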
Discovery and Functional Characterization of Recurrent Gene Fusions from 4,932 Primary Tumor Transcriptomes
The Arctic Research Mapping Application (ARMAP) is a suite of online applications and data services that support Arctic science by providing project tracking information (who's doing what, when and where in the region) for United States Government funded projects. Development of an interagency standard for tracking discovery level metadata for projects has been achieved through collaboration with the Alaska Data Integration work group. The US National Science Foundation plus 17 other agencies and organizations have adopted the standard with several entities successfully implementing XML based REST webservices. With ARMAP's web mapping applications and data services (http://armap.org), users can search for research projects by location, year, funding program, keyword, investigator, and discipline, among other variables. Key information about each project is displayed within the application with links to web pages that provide additional information. The ARMAP 2D mapping application has been significantly enhanced to include support for multiple projections, improved base maps, additional reference data layers, and optimization for better performance. In 2013, ship tracks for US National Science Foundation supported vessel based surveys and health care facilities have been included in ARMAP. The additional functionality of this tool will increase awareness of projects funded by numerous entities in the Arctic, enhance coordination for logistics support, help identify geographic gaps in research efforts and potentially foster more collaboration amongst researchers working in the region. Additionally, ARMAP can be used to demonstrate the effects of the International Polar Year (IPY) on funding of different research disciplines by the U.S. Government.
Gaylord, A. G.; Kassin, A.; Cody, R. P.; Manley, W. F.; Dover, M.; Score, R.; Garcia-Lavigne, D.; Tweedie, C. E.
Background Perkinsus marinus, a protozoan parasite of the eastern oyster Crassostrea virginica, has devastated natural and farmed oyster populations along the Atlantic and Gulf coasts of the United States. It is classified as a member of the Perkinsozoa, a recently established phylum considered close to the ancestor of ciliates, dinoflagellates, and apicomplexans, and a key taxon for understanding unique adaptations (e.g. parasitism) within the Alveolata. Despite intense parasite pressure, no disease-resistant oysters have been identified and no effective therapies have been developed to date. Results To gain insight into the biological basis of the parasite's virulence and pathogenesis mechanisms, and to identify genes encoding potential targets for intervention, we generated >31,000 5' expressed sequence tags (ESTs) derived from four trophozoite libraries generated from two P. marinus strains. Trimming and clustering of the sequence tags yielded 7,863 unique sequences, some of which carry a spliced leader. Similarity searches revealed that 55% of these had hits in protein sequence databases, of which 1,729 had their best hit with proteins from the chromalveolates (E-value ≤ 1e-5). Some sequences are similar to those proven to be targets for effective intervention in other protozoan parasites, and include not only proteases, antioxidant enzymes, and heat shock proteins, but also those associated with relict plastids, such as acetyl-CoA carboxylase and methylerythritol phosphate pathway components, and those involved in glycan assembly, protein folding/secretion, and parasite-host interactions. Conclusions Our transcriptome analysis of P. marinus, the first for any member of the Perkinsozoa, contributes new insight into its biology and taxonomic position. 
It provides a very informative, albeit preliminary, glimpse into the expression of genes encoding functionally relevant proteins as potential targets for chemotherapy, and evidence for the presence of a relict plastid. Further, although P. marinus sequences display significant similarity to those from both apicomplexans and dinoflagellates, the presence of trans-spliced transcripts confirms the previously established affinities with the latter. The EST analysis reported herein, together with the recently completed sequence of the P. marinus genome and the development of transfection methodology, should result in improved intervention strategies against dermo disease.
The Research Project Knowledge Base (RPKB) is currently being designed and will be implemented in a manner that is fully compatible and interoperable with enterprise architecture tools developed to support NASA's Applied Sciences Program. Through user needs assessment and collaboration with Stennis Space Center, Goddard Space Flight Center, and NASA's DEVELOP staff, insight into information needs for the RPKB was gathered from across NASA scientific communities of practice. To enable efficient, consistent, standard, structured, and managed data entry and research results compilation, a prototype RPKB has been designed and fully integrated with the existing NASA Earth Science Systems Components database. The RPKB will compile research project and keyword information of relevance to the six major science focus areas, 12 national applications, and the Global Change Master Directory (GCMD). The RPKB will include information about projects awarded from NASA research solicitations, project investigator information, research publications, NASA data products employed, and model or decision support tools used or developed as well as new data product information. The RPKB will be developed in a multi-tier architecture that will include a SQL Server relational database backend, middleware, and front end client interfaces for data entry. The purpose of this project is to intelligently harvest the results of research sponsored by the NASA Applied Sciences Program and related research program results. We present various approaches for a wide spectrum of knowledge discovery of research results, publications, projects, etc. from the NASA Systems Components database and global information systems and show how this is implemented in a SQL Server database. The application of knowledge discovery is useful for intelligent query answering and multiple-layered database construction. 
Using advanced EA tools such as the Earth Science Architecture Tool (ESAT), RPKB will enable NASA and partner agencies to efficiently identify the significant results for new experiment directions and principal investigators to formulate experiment directions for new proposals.
Dabiru, L.; O'Hara, C. G.; Shaw, D.; Katragadda, S.; Anderson, D.; Kim, S.; Shrestha, B.; Aanstoos, J.; Frisbie, T.; Policelli, F.; Keblawi, N.
In the framework of the Magnetism in Massive Stars (MiMeS) project, a HARPSpol Large Program at the 3.6m-ESO telescope has recently started to collect high-resolution spectropolarimetric data of a large number of Southern massive OB stars in the field of the Galaxy and in many young clusters and associations. In this contribution, we present details of the HARPSpol survey, the first HARPSpol discoveries of magnetic fields in massive stars, and the magnetic properties of two previously known magnetic stars.
Alecian, E.; Peralta, R.; Oksala, M. E.; Neiner, C.
Malignant cell transformation commonly results in the deregulation of thousands of cellular genes, an observation that suggests a complex biological process and an inherently challenging scenario for the development of effective cancer interventions. To better define the genes/pathways essential to regulating the malignant phenotype, we recently described a novel strategy based on the cooperative nature of carcinogenesis that focuses on genes synergistically deregulated in response to cooperating oncogenic mutations. These so-called 'cooperation response genes' (CRGs) are highly enriched for genes critical for the cancer phenotype, thereby suggesting their causal role in the malignant state. Here, we show that CRGs have an essential role in drug-mediated anticancer activity and that anticancer agents can be identified through their ability to antagonize the CRG expression profile. These findings provide proof-of-concept for the use of the CRG signature as a novel means of drug discovery with relevance to underlying anticancer drug mechanisms. PMID:22964631
Sampson, E R; McMurray, H R; Hassane, D C; Newman, L; Salzman, P; Jordan, C T; Land, H
BACKGROUND: MicroRNAs (miRNAs) are endogenous non-protein-coding RNA genes which exist in a wide variety of organisms, including animals, plants, viruses and even unicellular organisms. Medaka (Oryzias latipes) is a useful model organism among vertebrate animals. However, no medaka miRNAs have been investigated systematically. It is beneficial to conduct a genome-wide miRNA discovery study using the next generation sequencing (NGS) technology,
Sung-Chou Li; Wen-Ching Chan; Meng-Ru Ho; Kuo-Wang Tsai; Ling-Yueh Hu; Chun-Hung Lai; Chun-Nan Hsu; Pung-Pung Hwang; Wen-chang Lin
Accumulated transcriptome data can be used to investigate regulatory networks of genes involved in various biological systems. Co-expression analysis data sets generated from comprehensively collected transcriptome data sets now represent efficient resources that are capable of facilitating the discovery of genes with closely correlated expression patterns. In order to construct a co-expression network for barley, we analyzed 45 publicly available experimental series, which are composed of 1,347 sets of GeneChip data for barley. On the basis of a gene-to-gene weighted correlation coefficient, we constructed a global barley co-expression network and classified it into clusters of subnetwork modules. The resulting clusters are candidates for functional regulatory modules in the barley transcriptome. To annotate each of the modules, we performed comparative annotation using genes in Arabidopsis and Brachypodium distachyon. On the basis of a comparative analysis between barley and two model species, we investigated functional properties from the representative distributions of the gene ontology (GO) terms. Modules putatively involved in drought stress response and cellulose biogenesis have been identified. These modules are discussed to demonstrate the effectiveness of the co-expression analysis. Furthermore, we applied the data set of co-expressed genes coupled with comparative analysis in attempts to discover potentially Triticeae-specific network modules. These results demonstrate that analysis of the co-expression network of the barley transcriptome together with comparative analysis should promote the process of gene discovery in barley. Furthermore, the insights obtained should be transferable to investigations of Triticeae plants. The associated data set generated in this analysis is publicly accessible at http://coexpression.psc.riken.jp/barley/.
Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo
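The network construction step described in the abstract above can be sketched in a few lines. The abstract refers to a gene-to-gene weighted correlation coefficient whose weighting scheme is not specified here, so this sketch substitutes the plain Pearson correlation and a fixed cutoff. The gene names, expression values, and the 0.9 threshold are hypothetical.

```python
import math
from itertools import combinations

# Illustrative sketch only: unweighted Pearson correlation is used in place
# of the weighted coefficient mentioned in the abstract.


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sx * sy)


def coexpression_edges(expr, threshold=0.9):
    """Edges (gene1, gene2, r) for gene pairs whose correlation meets the cutoff."""
    edges = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if r >= threshold:
            edges.append((g1, g2, r))
    return edges


# Toy data: g1 and g2 rise together across four arrays, g3 falls.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
edges = coexpression_edges(expr)
print([(a, b) for a, b, _ in edges])
```

Connected groups of genes in the resulting edge list correspond loosely to the subnetwork modules that the study clusters and annotates; a real analysis over 1,347 arrays would use a vectorized correlation routine instead of the pairwise loop shown here.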
Musa (banana and plantain) is an important genus for the global export market and in local markets where it provides staple food for approximately 400 million people. Hybridization and polyploidization of several (sub)species, combined with vegetative propagation and human selection have produced a complex genetic history. We describe the application of the Ecotilling method for the discovery and characterization of nucleotide polymorphisms in diploid and polyploid accessions of Musa. We discovered over 800 novel alleles in 80 accessions. Sequencing and band evaluation shows Ecotilling to be a robust and accurate platform for the discovery of polymorphisms in homologous and homeologous gene targets. In the process of validating the method, we identified two single nucleotide polymorphisms that may be deleterious for the function of a gene putatively important for phototropism. Evaluation of heterozygous polymorphism and haplotype blocks revealed a high level of nucleotide diversity in Musa accessions. We further applied a strategy for the simultaneous discovery of heterozygous and homozygous polymorphisms in diploid accessions to rapidly evaluate nucleotide diversity in accessions of the same genome type. This strategy can be used to develop hypotheses for inheritance patterns of nucleotide polymorphisms within and between genome types. We conclude that Ecotilling is suitable for diversity studies in Musa, that it can be considered for functional genomics studies and as tool in selecting germplasm for traditional and mutation breeding approaches. PMID:20589365
Till, Bradley J; Jankowicz-Cieslak, Joanna; Sági, László; Huynh, Owen A; Utsushi, Hiroe; Swennen, Rony; Terauchi, Ryohei; Mba, Chikelu
Female infertility syndromes are among the most prevalent chronic health disorders in women, but their genetic basis remains unknown because of uncertainty regarding the number and identity of ovarian factors controlling the assembly, preservation, and maturation of ovarian follicles. To systematically discover ovarian fertility genes en masse, we employed a mouse model (Foxo3) in which follicles are assembled normally but then undergo synchronous activation. We developed a microarray-based approach for the systematic discovery of tissue-specific genes and, by applying it to Foxo3 ovaries and other samples, defined a surprisingly large set of ovarian factors (n = 348, ~1% of the mouse genome). This set included the vast majority of known ovarian factors, 44% of which when mutated produce female sterility phenotypes, but most were novel. Comparative profiling of other tissues, including microdissected oocytes and somatic cells, revealed distinct gene classes and provided new insights into oogenesis and ovarian function, demonstrating the utility of our approach for tissue-specific gene discovery. This study will thus facilitate comprehensive analyses of follicle development, ovarian function, and female infertility.
Gallardo, Teresa D.; John, George B.; Shirley, Lane; Contreras, Cristina M.; Akbay, Esra A.; Haynie, J. Marshall; Ward, Samuel E.; Shidler, Meredith J.; Castrillon, Diego H.
The wealth of genomic technologies has enabled biologists to rapidly ascribe phenotypic characters to biological substrates. Central to effective biological investigation is the operational definition of the process under investigation. We propose an elucidation of categories of biological characters, including disease relevant traits, based on natural endogenous processes and experimentally observed biological networks, pathways and systems rather than on externally manifested constructs and current semantics such as disease names and processes. The Ontological Discovery Environment (ODE) is an Internet accessible resource for the storage, sharing, retrieval and analysis of phenotype-centered genomic data sets across species and experimental model systems. Any type of data set representing gene-phenotype relationships, such as quantitative trait loci (QTL) positional candidates, literature reviews, microarray experiments, ontological or even meta-data, may serve as inputs. To demonstrate a use case leveraging the homology capabilities of ODE and its ability to synthesize diverse data sets, we conducted an analysis of genomic studies related to alcoholism. The core of ODE’s gene-set similarity, distance and hierarchical analysis is the creation of a bipartite network of gene-phenotype relations, a unique discrete graph approach to analysis that enables set-set matching of non-referential data. Gene sets are annotated with several levels of metadata, including community ontologies, while gene set translations compare models across species. Computationally derived gene sets are integrated into hierarchical trees based on gene-derived phenotype interdependencies. Automated set identifications are augmented by statistical tools which enable users to interpret the confidence of modeled results. 
This approach allows data integration and hypothesis discovery across multiple experimental contexts, regardless of the face similarity and semantic annotation of the experimental systems or species domain.
Baker, Erich J.; Jay, Jeremy J.; Philip, Vivek M.; Zhang, Yun; Li, Zuopan; Kirova, Roumyana; Langston, Michael A.; Chesler, Elissa J.
Background The Signal-to-Noise-Ratio (SNR) is often used for identification of biomarkers for two-class problems and no formal and useful generalization of SNR is available for multiclass problems. We propose innovative generalizations of SNR for multiclass cancer discrimination through introduction of two indices, Gene Dominant Index and Gene Dormant Index (GDIs). These two indices lead to the concepts of dominant and dormant genes with biological significance. We use these indices to develop methodologies for discovery of dominant and dormant biomarkers with interesting biological significance. The dominancy and dormancy of the identified biomarkers and their excellent discriminating power are also demonstrated pictorially using the scatterplot of individual gene and 2-D Sammon's projection of the selected set of genes. Using information from the literature we have shown that the GDI based method can identify dominant and dormant genes that play significant roles in cancer biology. These biomarkers are also used to design diagnostic prediction systems. Results and discussion To evaluate the effectiveness of the GDIs, we have used four multiclass cancer data sets (Small Round Blue Cell Tumors, Leukemia, Central Nervous System Tumors, and Lung Cancer). For each data set we demonstrate that the new indices can find biologically meaningful genes that can act as biomarkers. We then use six machine learning tools, Nearest Neighbor Classifier (NNC), Nearest Mean Classifier (NMC), Support Vector Machine (SVM) classifier with linear kernel, and SVM classifier with Gaussian kernel, where both SVMs are used in conjunction with one-vs-all (OVA) and one-vs-one (OVO) strategies. We found GDIs to be very effective in identifying biomarkers with strong class specific signatures. 
With all six tools and for all data sets we could achieve better or comparable prediction accuracies usually with fewer marker genes than results reported in the literature using the same computational protocols. The dominant genes are usually easy to find while good dormant genes may not always be available as dormant genes require stronger constraints to be satisfied; but when they are available, they can be used for authentication of diagnosis. Conclusion Since GDI based schemes can find a small set of dominant/dormant biomarkers that is adequate to design diagnostic prediction systems, it opens up the possibility of using real-time qPCR assays or antibody based methods such as ELISA for an easy and low cost diagnosis of diseases. The dominant and dormant genes found by GDIs can be used in different ways to design more reliable diagnostic prediction systems.
Tsai, Yu-Shuen; Lin, Chin-Teng; Tseng, George C; Chung, I-Fang; Pal, Nikhil Ranjan
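For readers unfamiliar with the two-class signal-to-noise ratio that the preceding abstract sets out to generalize, the following is a minimal sketch of the commonly used form, (mean_a - mean_b) / (sd_a + sd_b). The use of the sample standard deviation, the gene names, and the expression values are assumptions made for illustration; the abstract does not define the GDIs themselves, so they are not reproduced here.

```python
from statistics import mean, stdev


def snr(class_a, class_b):
    """Two-class signal-to-noise ratio: (mean_a - mean_b) / (sd_a + sd_b).

    Uses the sample standard deviation (an assumption of this sketch).
    """
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))


# Hypothetical expression values: (samples in class A, samples in class B).
expr = {
    "gene_up": ([8.0, 9.0, 10.0], [2.0, 3.0, 4.0]),
    "gene_flat": ([5.0, 6.0, 7.0], [5.5, 6.0, 6.5]),
}

# Rank genes by absolute SNR, largest first.
ranked = sorted(expr, key=lambda g: abs(snr(*expr[g])), reverse=True)
print(ranked)
```

A gene whose class means are far apart relative to the within-class spread ranks first; a gene with equal class means scores zero.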
Membrane protein targets constitute a key segment of drug discovery portfolios and significant effort has gone into increasing the speed and efficiency of pursuing these targets. However, issues still exist in routine gene expression and stable cell-based assay development for membrane proteins, which are often multimeric or toxic to host cells. To enhance cell-based assay capabilities, modified baculovirus (BacMam virus) gene delivery technology has been successfully applied to the transient expression of target proteins in mammalian cells. Here, we review the development, full implementation and benefits of this platform-based gene expression technology in support of SAR and HTS assays across GlaxoSmithKline. PMID:17467576
Kost, Thomas A; Condreay, J Patrick; Ames, Robert S; Rees, Stephen; Romanos, Michael A
The field of bacterial natural product research is currently undergoing a paradigm change concerning the discovery of natural products. Previously most efforts were based on isolation of the most abundant compound in an extract, or on tracking bioactivity. However, traditional activity-guided approaches are limited by the available test panels and frequently lead to the rediscovery of already known compounds. The constantly increasing availability of bacterial genome sequences provides the potential for the discovery of a huge number of new natural compounds by in silico identification of biosynthetic gene clusters. Examination of the information on the biosynthetic machinery can further prevent rediscovery of known compounds, and can help identify so far unknown biosynthetic pathways of known compounds. By in silico screening of the genome of the myxobacterium Stigmatella aurantiaca Sg a15, a trans-AT polyketide synthase/non-ribosomal peptide synthetase (PKS/NRPS) gene cluster was identified that could not be correlated to any secondary metabolite known to be produced by this strain. Targeted gene inactivation and analysis of extracts from the resulting mutants by high performance liquid chromatography coupled to high resolution mass spectrometry (HPLC-HRMS), in combination with the use of statistical tools resulted in the identification of a compound that was absent in the mutants extracts. By matching with our in-house database of myxobacterial secondary metabolites, this compound was identified as rhizopodin. A detailed analysis of the rhizopodin biosynthetic machinery is presented in this manuscript. PMID:22278953
Pistorius, Dominik; Müller, Rolf
Over the past decade, bacterial genome sequences have revealed an immense reservoir of biosynthetic gene clusters, sets of contiguous genes that have the potential to produce drugs or drug-like molecules. However, the majority of these gene clusters appear to be inactive for unknown reasons prompting terms such as "cryptic" or "silent" to describe them. Because natural products have been a major source of therapeutic molecules, methods that rationally activate these silent clusters would have a profound impact on drug discovery. Herein, a new strategy is outlined for awakening silent gene clusters using small molecule elicitors. In this method, a genetic reporter construct affords a facile read-out for activation of the silent cluster of interest, while high-throughput screening of small molecule libraries provides potential inducers. This approach was applied to two cryptic gene clusters in the pathogenic model Burkholderia thailandensis. The results not only demonstrate a prominent activation of these two clusters, but also reveal that the majority of elicitors are themselves antibiotics, most in common clinical use. Antibiotics, which kill B. thailandensis at high concentrations, act as inducers of secondary metabolism at low concentrations. One of these antibiotics, trimethoprim, served as a global activator of secondary metabolism by inducing at least five biosynthetic pathways. Further application of this strategy promises to uncover the regulatory networks that activate silent gene clusters while at the same time providing access to the vast array of cryptic molecules found in bacteria. PMID:24808135
Seyedsayamdost, Mohammad R
Background Constructing coexpression networks and performing network analysis using large-scale gene expression data sets is an effective way to uncover new biological knowledge; however, the methods used for gene association in constructing these coexpression networks have not been thoroughly evaluated. Since different methods lead to structurally different coexpression networks and provide different information, selecting the optimal gene association method is critical. Methods and Results In this study, we compared eight gene association methods – Spearman rank correlation, Weighted Rank Correlation, Kendall, Hoeffding's D measure, Theil-Sen, Rank Theil-Sen, Distance Covariance, and Pearson – and focused on their true knowledge discovery rates in associating pathway genes and constructing coordination networks of regulatory genes. We also examined the behaviors of different methods on microarray data with different properties, and whether the biological processes affect the efficiency of different methods. Conclusions We found that the Spearman, Hoeffding and Kendall methods are effective in identifying coexpressed pathway genes, whereas the Theil-Sen, Rank Theil-Sen, Spearman, and Weighted Rank methods perform well in identifying coordinated transcription factors that control the same biological processes and traits. Surprisingly, the widely used Pearson method is generally less efficient, and so is the Distance Covariance method that can find gene pairs of multiple relationships. Some of our analyses clearly show that the Pearson and Distance Covariance methods behave differently from the other six methods. The efficiencies of different methods vary with the data properties to some degree and are largely contingent upon the biological processes, which necessitates a pre-analysis to identify the best performing method for gene association and coexpression network construction.
Kumari, Sapna; Nie, Jeff; Chen, Huann-Sheng; Ma, Hao; Stewart, Ron; Li, Xiang; Lu, Meng-Zhu; Taylor, William M.; Wei, Hairong
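To make the contrast between a rank-based and a linear association measure concrete, the sketch below implements Pearson and Spearman correlation in plain Python and applies both to a monotonic but nonlinear relationship. The data are invented for illustration and the code is not the authors' pipeline; it only shows why the two measures can disagree on the same gene pair.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical expression profiles: gene_b rises monotonically with gene_a,
# but not linearly.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b = [v ** 3 for v in gene_a]
print(round(pearson(gene_a, gene_b), 3), round(spearman(gene_a, gene_b), 3))
```

Spearman treats the pair as perfectly associated because only the ordering matters, whereas Pearson is pulled below 1 by the curvature.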
Background The antifungal therapy caspofungin is a semi-synthetic derivative of pneumocandin B0, a lipohexapeptide produced by the fungus Glarea lozoyensis, and was the first member of the echinocandin class approved for human therapy. The nonribosomal peptide synthetase (NRPS)-polyketide synthases (PKS) gene cluster responsible for pneumocandin biosynthesis from G. lozoyensis has not been elucidated to date. In this study, we report the elucidation of the pneumocandin biosynthetic gene cluster by whole genome sequencing of the G. lozoyensis wild-type strain ATCC 20868. Results The pneumocandin biosynthetic gene cluster contains a NRPS (GLNRPS4) and a PKS (GLPKS4) arranged in tandem, two cytochrome P450 monooxygenases, seven other modifying enzymes, and genes for L-homotyrosine biosynthesis, a component of the peptide core. Thus, the pneumocandin biosynthetic gene cluster is significantly more autonomous and organized than that of the recently characterized echinocandin B gene cluster. Disruption mutants of GLNRPS4 and GLPKS4 no longer produced the pneumocandins (A0 and B0), and the Δglnrps4 and Δglpks4 mutants lost antifungal activity against the human pathogenic fungus Candida albicans. In addition to pneumocandins, the G. lozoyensis genome encodes a rich repertoire of natural product-encoding genes including 24 PKSs, six NRPSs, five PKS-NRPS hybrids, two dimethylallyl tryptophan synthases, and 14 terpene synthases. Conclusions Characterization of the gene cluster provides a blueprint for engineering new pneumocandin derivatives with improved pharmacological properties. Whole genome estimation of the secondary metabolite-encoding genes from G. lozoyensis provides yet another example of the huge potential for drug discovery from natural products from the fungal kingdom.
Microviridins are ribosomally synthesized tricyclic depsipeptides produced by different genera of cyanobacteria. The prevalence of the microviridin gene clusters and the natural diversity of microviridin precursor sequences are currently unknown. Screening of laboratory strains and field samples of the bloom-forming freshwater cyanobacterium Microcystis via PCR revealed global occurrence of the microviridin pathway and an unexpected natural variety. We could detect 15 new variants of the precursor gene mdnA encoding microviridin backbones that differ in up to 4 amino acid positions from known isoforms of the peptide. The survey not only provides insights into the versatility of the biosynthetic enzymes in a closely related group of cyanobacteria, but also facilitates the discovery and characterization of cryptic microviridin variants. This is demonstrated for microviridin L in Microcystis aeruginosa strain NIES843 and heterologously produced variants.
Ziemert, Nadine; Ishida, Keishi; Weiz, Annika; Hertweck, Christian; Dittmann, Elke
Background Orchids are one of the most diversified angiosperms, but few genomic resources are available for these non-model plants. In addition to the ecological significance, Phalaenopsis has been considered as an economically important floriculture industry worldwide. We aimed to use massively parallel 454 pyrosequencing for a global characterization of the Phalaenopsis transcriptome. Results To maximize sequence diversity, we pooled RNA from 10 samples of different tissues, various developmental stages, and biotic- or abiotic-stressed plants. We obtained 206,960 expressed sequence tags (ESTs) with an average read length of 228 bp. These reads were assembled into 8,233 contigs and 34,630 singletons. The unigenes were searched against the NCBI non-redundant (NR) protein database. Based on sequence similarity with known proteins, these analyses identified 22,234 different genes (E-value cutoff, e-7). Assembled sequences were annotated with Gene Ontology, Gene Family and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Among these annotations, over 780 unigenes encoding putative transcription factors were identified. Conclusion Pyrosequencing was effective in identifying a large set of unigenes from Phalaenopsis. The informative EST dataset we developed constitutes a much-needed resource for discovery of genes involved in various biological processes in Phalaenopsis and other orchid species. These transcribed sequences will narrow the gap between study of model organisms with many genomic resources and species that are important for ecological and evolutionary studies.
Potato is the world's fourth largest food crop yet it continues to endure late blight, a devastating disease caused by the Irish famine pathogen Phytophthora infestans. Breeding broad-spectrum disease resistance (R) genes into potato (Solanum tuberosum) is the best strategy for genetically managing late blight but current approaches are slow and inefficient. We used a repertoire of effector genes predicted computationally from the P. infestans genome to accelerate the identification, functional characterization, and cloning of potentially broad-spectrum R genes. An initial set of 54 effectors containing a signal peptide and a RXLR motif was profiled for activation of innate immunity (avirulence or Avr activity) on wild Solanum species and tentative Avr candidates were identified. The RXLR effector family IpiO induced hypersensitive responses (HR) in S. stoloniferum, S. papita and the more distantly related S. bulbocastanum, the source of the R gene Rpi-blb1. Genetic studies with S. stoloniferum showed cosegregation of resistance to P. infestans and response to IpiO. Transient co-expression of IpiO with Rpi-blb1 in a heterologous Nicotiana benthamiana system identified IpiO as Avr-blb1. A candidate gene approach led to the rapid cloning of S. stoloniferum Rpi-sto1 and S. papita Rpi-pta1, which are functionally equivalent to Rpi-blb1. Our findings indicate that effector genomics enables discovery and functional profiling of late blight R genes and Avr genes at an unprecedented rate and promises to accelerate the engineering of late blight resistant potato varieties.
Vleeshouwers, Vivianne G. A. A.; Rietman, Hendrik; Krenek, Pavel; Champouret, Nicolas; Young, Carolyn; Oh, Sang-Keun; Wang, Miqia; Bouwmeester, Klaas; Vosman, Ben; Visser, Richard G. F.; Jacobsen, Evert; Govers, Francine; Kamoun, Sophien; Van der Vossen, Edwin A. G.
Target discovery, which involves the identification and early validation of disease-modifying targets, is an essential first step in the drug discovery pipeline. Indeed, the drive to determine protein function has been stimulated, both in industry and academia, by the completion of the human genome project. In this article, we critically examine the strategies and methodologies used for both the identification
Mark A. Lindsay
The rampant use of antibiotics in the last half-century has imposed an unforeseen biological cost, the unprecedented acceleration of bacterial evolution to produce drug-resistant strains to practically every approved antibiotic. This rise in antimicrobial drug resistance, alongside the failure of conventional research efforts to discover new antibiotics, may eventually lead to a public health crisis that can drastically curtail our ability to combat infectious disease. To address this public health need for novel countermeasure strategies, research efforts have recently focused on identification of genes in the host, rather than the pathogen, that are essential for successful pathogen infection, as potential targets for drug discovery. In the past decade, RNA interference (RNAi) has emerged as a powerful tool for analyzing gene function by silencing target genes through the specific destruction of their mRNAs. Based on RNAi methodology, high-throughput genome-wide assay platforms have been developed to identify candidate host genes that are manipulated by pathogens during infection. In this review, we will discuss recent strategies for RNAi-based genomic screens to investigate host-pathogen mechanisms in human cell models using both bacterial pathogens, including Salmonella typhimurium, Mycobacterium tuberculosis, and Listeria monocytogenes, and viruses, such as Human Immunodeficiency Virus (HIV), Hepatitis C Virus (HCV) and influenza. These functional genomics studies have begun to elucidate novel pathogen virulence mechanisms and thus, may serve as the basis for the design of novel host-based inhibitor therapeutics that can block or alleviate the downstream effects of pathogen infection. PMID:20836760
Hong-Geller, Elizabeth; Micheva-Viteva, Sofiya N
Background Development and application of transcriptomics-based gene classifiers for ecotoxicological applications lag far behind those of biomedical sciences. Many such classifiers discovered thus far lack vigorous statistical and experimental validations. A combination of genetic algorithm/support vector machines and genetic algorithm/K nearest neighbors was used in this study to search for classifiers of endocrine-disrupting chemicals (EDCs) in zebrafish. Searches were conducted on both tissue-specific and tissue-combined datasets, either across the entire transcriptome or within individual transcription factor (TF) networks previously linked to EDC effects. Candidate classifiers were evaluated by gene set enrichment analysis (GSEA) on both the original training data and a dedicated validation dataset. Results Multi-tissue dataset yielded no classifiers. Among the 19 chemical-tissue conditions evaluated, the transcriptome-wide searches yielded classifiers for six of them, each having approximately 20 to 30 gene features unique to a condition. Searches within individual TF networks produced classifiers for 15 chemical-tissue conditions, each containing 100 or fewer top-ranked gene features pooled from those of multiple TF networks and also unique to each condition. For the training dataset, 10 out of 11 classifiers successfully identified the gene expression profiles (GEPs) of their targeted chemical-tissue conditions by GSEA. For the validation dataset, classifiers for prochloraz-ovary and flutamide-ovary also correctly identified the GEPs of corresponding conditions while no classifier could predict the GEP from prochloraz-brain. Conclusions The discrepancies in the performance of these classifiers were attributed in part to varying data complexity among the conditions, as measured to some degree by Fisher’s discriminant ratio statistic. 
This variation in data complexity could likely be compensated by adjusting sample size for individual chemical-tissue conditions, thus suggesting a need for a preliminary survey of transcriptomic responses before launching a full scale classifier discovery effort. Classifier discovery based on individual TF networks could yield more mechanistically-oriented biomarkers. GSEA proved to be a flexible and effective tool for application of gene classifiers but a similar and more refined algorithm, connectivity mapping, should also be explored. The distribution characteristics of classifiers across tissues, chemicals, and TF networks suggested a differential biological impact among the EDCs on zebrafish transcriptome involving some basic cellular functions.
A fundamental goal of genetics and functional genomics is to identify and mutate every gene in model organisms such as Drosophila melanogaster. The Berkeley Drosophila Genome Project (BDGP) gene disruption project generates single P-element insertion strains that each mutate unique genomic open reading frames. Such strains strongly facilitate further genetic and molecular studies of the disrupted loci, but it has
Allan C. Spradling; Dianne Stern; Amy Beaton; E. Jay Rhem; Todd Laverty; Nicole Mozden; Sima Misra; Gerald M. Rubin
Decoding transcriptional programs governing transcriptomic diversity across multiple human tissues is a major challenge in bioinformatics. To address this problem, a number of computational methods have focused on cis-regulatory codes driving overexpression or underexpression in a single tissue as compared to others. On the other hand, we recently proposed a different approach to mine cis-regulatory codes: starting from gene sets sharing common cis-regulatory motifs, the method screens for expression modules based on expression coherence. However, both approaches seem to be insufficient to capture transcriptional programs that control gene expression in a subset of all samples. This limitation is especially serious when analyzing data from multiple tissues. To overcome this limitation, we developed a new module discovery method termed BEEM (Biclustering-based Extraction of Expression Modules) in order to discover expression modules that are functional in a subset of tissues. We showed that, when applied to expression profiles of multiple human tissues, BEEM finds expression modules missed by two existing approaches that are based on the coherent expression and the single tissue-specific differential expression. From the BEEM results, we obtained new insights into transcriptional programs controlling transcriptomic diversity across various types of tissues. This study introduces BEEM as a powerful tool for decoding regulatory programs from a compendium of gene expression profiles. PMID:20544005
Niida, Atsushi; Imoto, Seiya; Yamaguchi, Rui; Nagasaki, Masao; Miyano, Satoru
Stagonospora nodorum is an important wheat (Triticum aestivum) pathogen in many parts of the world, causing major yield losses. It was the first species in the large fungal Dothideomycete class to be genome sequenced. The reference genome sequence (SN15) has been instrumental in the discovery of genes encoding necrotrophic effectors that induce disease symptoms in specific host genotypes. Here we present the genome sequence of two further S. nodorum strains (Sn4 and Sn79) that differ in their effector repertoire from the reference. Sn79 is avirulent on wheat and produces no apparent effectors when infiltrated onto many cultivars and mapping population parents. Sn4 is pathogenic on wheat and has virulences not found in SN15. The new strains, sequenced with short-read Illumina chemistry, are compared with SN15 by a combination of mapping and de novo assembly approaches. Each of the genomes contains a large number of strain-specific genes, many of which have no meaningful similarity to any known gene. Large contiguous sections of the reference genome are absent in the two newly sequenced strains. We refer to these differences as “sectional gene absences.” The presence of genes in pathogenic strains and absence in Sn79 is added to computationally predicted properties of known proteins to produce a list of likely effector candidates. Transposon insertion was observed in the mitochondrial genomes of virulent strains where the avirulent strain retained the likely ancestral sequence. The study suggests that short-read enabled comparative genomics is an effective way to both identify new S. nodorum effector candidates and to illuminate evolutionary processes in this species.
Syme, Robert Andrew; Hane, James K.; Friesen, Timothy L.; Oliver, Richard P.
In recent years, RNA-seq has become a very competitive alternative to microarrays. In RNA-seq experiments, the expected read count for a gene is proportional to its expression level multiplied by its transcript length. Even when two genes are expressed at the same level, differences in length will yield differing numbers of total reads. The characteristics of these RNA-seq experiments create a gene-level bias such that the proportion of significantly differentially expressed genes increases with the transcript length, whereas such bias is not present in microarray data. Gene-set analysis seeks to identify the gene sets that are enriched in the list of the identified significant genes. In the gene-set analysis of RNA-seq, the gene-level bias subsequently yields the gene-set-level bias that a gene set with genes of long length will be more likely to show up as enriched than will a gene set with genes of shorter length. Because gene expression is not related to its transcript length, any gene set containing long genes is not of biologically greater interest than gene sets with shorter genes. Accordingly the gene-set-level bias should be removed to accurately calculate the statistical significance of each gene-set enrichment in the RNA-seq. We present a new gene set analysis method of RNA-seq, called FDRseq, which can accurately calculate the statistical significance of a gene-set enrichment score by the grouped false-discovery rate. Numerical examples indicated that FDRseq is appropriate for controlling the transcript length bias in the gene-set analysis of RNA-seq data. To implement FDRseq, we developed the R program, which can be downloaded at no cost from http://home.mju.ac.kr/home/index.action?siteId=tyang.
Yang, Tae Young; Jeong, Seongmun
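The transcript-length bias described in the preceding abstract can be illustrated with a small calculation. The sketch below assumes that the expected read count is expression × length × a depth constant, and it uses a simple Poisson z-statistic as a stand-in for a differential-expression test. Both the statistic and the numbers are assumptions for illustration only; this is not the FDRseq method.

```python
import math


def expected_reads(expression, length_bp, depth=0.01):
    """Expected read count, proportional to expression level times length.

    `depth` is a hypothetical sequencing-depth constant.
    """
    return expression * length_bp * depth


def poisson_z(count_a, count_b):
    """Approximate z-statistic for a difference between two Poisson counts."""
    return (count_b - count_a) / math.sqrt(count_a + count_b)


# Two genes with the same 1.5-fold change in expression (10 -> 15),
# differing only in transcript length.
z_short = poisson_z(expected_reads(10, 1000), expected_reads(15, 1000))
z_long = poisson_z(expected_reads(10, 4000), expected_reads(15, 4000))
print(round(z_short, 2), round(z_long, 2))
```

Under these assumptions the statistic grows with the square root of transcript length, so the longer gene is more likely to be called significant even though the fold change is identical.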
We report the discovery of a low-mass companion orbiting the metal-rich, main sequence F star TYC 2949-00557-1 during the Multi-object APO Radial Velocity Exoplanet Large-area Survey (MARVELS) pilot project. The host star has an effective temperature T_eff = 6135 ± 40 K, log g = 4.4 ± 0.1, and [Fe/H] = 0.32 ± 0.01, indicating a mass of M =
Scott W. Fleming; Jian Ge; Suvrath Mahadevan; Brian Lee; Jason D. Eastman; Robert J. Siverd; B. Scott Gaudi; Andrzej Niedzielski; Thirupathi Sivarani; Keivan G. Stassun; Alex Wolszczan; Rory Barnes; Bruce Gary; Duy Cuong Nguyen; Robert C. Morehead; Xiaoke Wan; Bo Zhao; Jian Liu; Pengcheng Guo; Stephen R. Kane; Julian C. van Eyken; Nathan M. De Lee; Justin R. Crepp; Alaina C. Shelden; Chris Laws; John P. Wisniewski; Donald P. Schneider; Joshua Pepper; Stephanie A. Snedden; Kaike Pan; Dmitry Bizyaev; Howard Brewington; Olena Malanushenko; Viktor Malanushenko; Daniel Oravetz; Audrey Simmons; Shannon Watters
Summary Heme biosynthesis consists of a series of eight enzymatic reactions that originate in mitochondria and continue in the cytosol before returning to mitochondria. Although these core enzymes are well studied, additional mitochondrial transporters and regulatory factors are predicted to be required. To discover such unknown components, we utilized a large-scale computational screen to identify mitochondrial proteins whose transcripts consistently co-express with the core machinery of heme biosynthesis. We identified SLC25A39, SLC22A4 and TMEM14C, which are putative mitochondrial transporters, as well as C1orf69 and ISCA1, which are iron-sulfur cluster proteins. Targeted knockdowns of all five genes in zebrafish resulted in profound anemia without impacting erythroid lineage specification. Moreover, silencing of Slc25a39 in murine erythroleukemia cells impaired iron incorporation into protoporphyrin IX, and vertebrate Slc25a39 complemented an iron homeostasis defect in the orthologous yeast mtm1Δ deletion mutant. Our results advance the molecular understanding of heme biosynthesis and offer promising candidate genes for inherited anemias.
Nilsson, Roland; Schultz, Iman J.; Pierce, Eric L.; Soltis, Kathleen A.; Naranuntarat, Amornrat; Ward, Diane M.; Baughman, Joshua; Paradkar, Prasad N.; Kingsley, Paul D.; Culotta, Valeria C.; Kaplan, Jerry; Palis, James; Paw, Barry H.; Mootha, Vamsi K.
Chickpea ranks third in production among the food legume crops of the world. However, the genomic resources available for chickpea are still very limited. In the present study, the transcriptome of chickpea was sequenced with short reads on the Illumina Genome Analyzer platform. We have assessed the effect of sequence quality, various assembly parameters and assembly programs on the final assembly output. We assembled ~107 million high-quality trimmed reads using Velvet followed by Oases with optimal parameters into a non-redundant set of 53 409 transcripts (≥100 bp), representing about 28 Mb of unique transcriptome sequence. The average length of transcripts was 523 bp and the N50 length was 900 bp, with coverage of 25.7 rpkm (reads per kilobase per million). At the protein level, a total of 45 636 (85.5%) chickpea transcripts showed significant similarity with unigenes/predicted proteins from other legumes or sequenced plant genomes. Functional categorization revealed the conservation of genes involved in various biological processes in chickpea. In addition, we identified simple sequence repeat motifs in transcripts. The chickpea transcript set generated here provides a resource for gene discovery and development of functional molecular markers. In addition, the strategy for de novo assembly of transcriptome data presented here will be helpful in other similar transcriptome studies.
Garg, Rohini; Patel, Ravi K.; Tyagi, Akhilesh K.; Jain, Mukesh
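The N50 length reported in the preceding abstract is a standard assembly summary: the length L such that transcripts of length L or longer together contain at least half of all assembled bases. A minimal sketch of that calculation follows; the input lengths are invented for illustration and are not the chickpea data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are taken from longest to shortest until their cumulative
    length reaches at least half of the total; the length of the last
    sequence taken is the N50.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical transcript lengths in bp.
transcripts = [100, 200, 300, 400, 500]
print(n50(transcripts), sum(transcripts) / len(transcripts))
```

Because N50 weights long sequences more heavily than the arithmetic mean does, it is usually larger than the average length, as in the 900 bp versus 523 bp figures quoted above.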
Testing for significance with gene expression data from DNA microarray experiments involves simultaneous comparisons of hundreds or thousands of genes. If R denotes the number of rejections (declared significant genes) and V denotes the number of false rejections, then V/R, if R > 0, is the proportion of false rejected hypotheses. This paper proposes a model for the distribution of the number of rejections and the conditional distribution of V given R, V | R. Under the independence assumption, the distribution of R is a convolution of two binomials and the distribution of V | R is a noncentral hypergeometric distribution. Under an equicorrelated model, the distributions are more complex and are also derived. Five false discovery rate probability error measures are considered: FDR = E(V/R), pFDR = E(V/R | R > 0) (positive FDR), cFDR = E(V/R | R = r) (conditional FDR), mFDR = E(V)/E(R) (marginal FDR), and eFDR = E(V)/r (empirical FDR). The pFDR, cFDR, and mFDR are shown to be equivalent under the Bayesian framework, in which the number of true null hypotheses is modeled as a random variable. We present a parametric and a bootstrap procedure to estimate the FDRs. Monte Carlo simulations were conducted to evaluate the performance of these two methods. The bootstrap procedure appears to perform reasonably well, even when the alternative hypotheses are correlated (rho = .25). An example from a toxicogenomic microarray experiment is presented for illustration. PMID:14969487
Tsai, Chen-An; Hsueh, Huey-miin; Chen, James J
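The error measures defined in the abstract above can be made concrete with a small Monte Carlo sketch in Python under the independence assumption. Here V is drawn as a binomial count over the true nulls and the remaining rejections as a binomial count over the false nulls, so R is a sum of two binomials. The function name `simulate_fdr` and all parameter values are hypothetical; this is a plain simulation, not the parametric or bootstrap estimators proposed in the paper.

```python
import random


def simulate_fdr(m0, m1, alpha, power, n_sim=2000, seed=0):
    """Estimate FDR, pFDR and mFDR by simulation.

    m0: number of true null hypotheses, each falsely rejected with prob. alpha
    m1: number of false null hypotheses, each rejected with prob. power
    """
    rng = random.Random(seed)
    ratios = []            # V/R for replicates with R > 0
    total_v = total_r = 0
    for _ in range(n_sim):
        v = sum(rng.random() < alpha for _ in range(m0))   # false rejections
        s = sum(rng.random() < power for _ in range(m1))   # true rejections
        r = v + s
        total_v += v
        total_r += r
        if r > 0:
            ratios.append(v / r)
    # FDR counts V/R as 0 whenever R == 0; pFDR conditions on R > 0.
    fdr = sum(ratios) / n_sim
    pfdr = sum(ratios) / len(ratios) if ratios else float("nan")
    mfdr = total_v / total_r if total_r else float("nan")
    return {"FDR": fdr, "pFDR": pfdr, "mFDR": mfdr}


if __name__ == "__main__":
    print(simulate_fdr(m0=90, m1=10, alpha=0.05, power=0.8))
```

With these illustrative settings E(V) = 4.5 and E(R) = 12.5, so the simulated mFDR should be close to 0.36; the FDR estimate can never exceed the pFDR estimate because the two differ only in the denominator.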
Experienced science journalist David Bradley serves up this resource on current chemical happenings. Tracking some of the discoveries and controversies at the forefront of chemistry, each issue of elemental discoveries summarizes a range of newsworthy topics, from gene control and tubular sensors to singing fish. In addition to the current issue, readers may browse past issues beginning December 1997. Two additional sections, Elemental Reviews and Book Sale, provide brief commentary on or descriptions (with UK prices) of related resources.
An interagency task force would be the best way to coordinate the many research and technology projects aimed at mapping and sequencing our genes. The immediate goal of these projects is not to completely understand human genetic information, but to creat...
Clustering analysis has a growing role in the study of co-expressed genes for gene discovery. Conventional binary and fuzzy clustering do not embrace the biological reality that some genes may be irrelevant for a problem and not be assigned to a cluster, while other genes may participate in several biological functions and should simultaneously belong to multiple clusters. Also, these algorithms cannot generate tight clusters that focus on their cores or wide clusters that overlap and contain all possibly relevant genes. In this paper, a new clustering paradigm is proposed. In this paradigm, all three eventualities of a gene being exclusively assigned to a single cluster, being assigned to multiple clusters, and being not assigned to any cluster are possible. These possibilities are realised through the primary novelty of the introduction of tunable binarization techniques. Results from multiple clustering experiments are aggregated to generate one fuzzy consensus partition matrix (CoPaM), which is then binarized to obtain the final binary partitions. This is referred to as Binarization of Consensus Partition Matrices (Bi-CoPaM). The method has been tested with a set of synthetic datasets and a set of five real yeast cell-cycle datasets. The results demonstrate its validity in generating relevant tight, wide, and complementary clusters that can meet requirements of different gene discovery studies.
Abu-Jamous, Basel; Fa, Rui; Roberts, David J.; Nandi, Asoke K.
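The idea of binarizing a fuzzy consensus partition matrix, described in the abstract above, can be sketched in Python. The matrix layout (rows are clusters, columns are genes), the function names, and the two rules shown (a membership threshold and a maximum-membership rule) are assumptions made for illustration; the abstract does not spell out the paper's binarization techniques, so this should not be read as the Bi-CoPaM implementation.

```python
def binarize_by_threshold(copam, threshold):
    """Assign a gene to every cluster whose fuzzy membership is >= threshold.

    A high threshold yields tight clusters and may leave genes unassigned;
    a low threshold yields wide, overlapping clusters.
    """
    return [[1 if u >= threshold else 0 for u in row] for row in copam]


def binarize_by_max(copam):
    """Assign each gene to the cluster(s) where its membership is maximal."""
    n_clusters, n_genes = len(copam), len(copam[0])
    binary = [[0] * n_genes for _ in range(n_clusters)]
    for g in range(n_genes):
        best = max(copam[k][g] for k in range(n_clusters))
        for k in range(n_clusters):
            if copam[k][g] == best:
                binary[k][g] = 1
    return binary


if __name__ == "__main__":
    # Toy consensus matrix: 3 clusters x 3 genes, each column sums to 1.
    copam = [
        [0.9, 0.5, 0.4],
        [0.1, 0.5, 0.3],
        [0.0, 0.0, 0.3],
    ]
    for row in binarize_by_threshold(copam, 0.5):
        print(row)
```

In the toy matrix with a threshold of 0.5, the first gene lands in exactly one cluster, the second in two clusters, and the third in none, which mirrors the three eventualities the abstract lists.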
Using the complete genome sequence of Pseudomonas aeruginosa PAO1, sequenced by the Pseudomonas Genome Project (ftp://ftp.pseudomonas.com/data/pacontigs.121599), a genome database (http://pseudomonas.bit.uq.edu.au/) has been developed containing information on more than 95% of all ORFs in Pseudomonas aeruginosa. The database is searchable by a variety of means, including gene name, position, keyword, sequence similarity and Pfam domain. Automated and manual annotation, nucleotide and peptide sequences, Pfam and SMART domains (where available), Medline and GenBank links and a scrollable, graphical representation of the surrounding genomic landscape are available for each ORF. Using the database has revealed, among other things, that P. aeruginosa contains four chemotaxis systems, two novel general secretion pathways, at least three loci encoding F17-like thin fimbriae, six novel filamentous haemagglutinin-like genes, a number of unusual composite genetic loci related to vgr/RHS elements in Escherichia coli, a number of fix-like genes encoding a micro-oxic respiration system, novel biosynthetic pathways and 38 genes containing domains of unknown function (DUF1/DUF2). It is anticipated that this database will be a useful bioinformatic tool for the Pseudomonas community that will continue to evolve. PMID:11021912
Croft, L; Beatson, S A; Whitchurch, C B; Huang, B; Blakeley, R L; Mattick, J S
Inteins are rare, translated genetic parasites mainly found in bacteria and archaea, while spliceosomal introns are distinctly eukaryotic features abundant in most nuclear genomes. Using targeted metagenomics, we discovered an intein in an Atlantic population of the photosynthetic eukaryote, Bathycoccus, harbored by the essential spliceosomal protein PRP8 (processing factor 8 protein). Although previously thought exclusive to fungi, we also identified PRP8 inteins in parasitic (Capsaspora) and predatory (Salpingoeca) protists. Most new PRP8 inteins were at novel insertion sites that, surprisingly, were not in the most conserved regions of the gene. Evolutionarily, Dikarya fungal inteins at PRP8 insertion site a appeared more related to the Bathycoccus intein at a unique insertion site, than to other fungal and opisthokont inteins. Strikingly, independent analyses of Pacific and Atlantic samples revealed an intron at the same codon as the Bathycoccus PRP8 intein. The two elements are mutually exclusive and neither was found in cultured Bathycoccus or other picoprasinophyte genomes. Thus, wild Bathycoccus contain one of few non-fungal eukaryotic inteins known and a rare polymorphic intron. Our data indicate at least two Bathycoccus ecotypes exist, associated respectively with oceanic or mesotrophic environments. We hypothesize that intein propagation is facilitated by marine viruses; and, while intron gain is still poorly understood, presence of a spliceosomal intron where a locus lacks an intein raises the possibility of new, intein-primed mechanisms for intron gain. The discovery of nucleus-encoded inteins and associated sequence polymorphisms in uncultivated marine eukaryotes highlights their diversity and reveals potential sexual boundaries between populations indistinguishable by common marker genes. PMID:23635865
Monier, Adam; Sudek, Sebastian; Fast, Naomi M; Worden, Alexandra Z
All cells need to protect themselves against the osmotic challenges of their environment by maintaining low permeability to ions across their cell membranes. This is a basic principle of cellular function, which is reflected in the interactions among ion transport and drug efflux genes that have arisen during cellular evolution. Thus, upon exposure to pore-forming antibiotics such as amphotericin B (AmB) or daptomycin (Dap), sensitive cells overexpress common resistance genes to protect themselves from added osmotic challenges. These genes share pathway interactions with the various types of multidrug resistance (MDR) transporter genes, which both preserve the native lipid membrane composition and at the same time eliminate disruptive hydrophobic molecules that partition excessively within the lipid bilayer. An increased understanding of the relationships between the genes (and their products) that regulate osmotic stress responses and MDR transporters will help to identify novel strategies and targets to overcome the current stalemate in drug discovery.
Background Single nucleotide polymorphisms (SNPs) are the most abundant genetic variant found in vertebrates and invertebrates. SNP discovery has become a highly automated, robust and relatively inexpensive process allowing the identification of many thousands of mutations for model and non-model organisms. Annotating large numbers of SNPs can be a difficult and complex process. Many tools available are optimised for use with organisms densely sampled for SNPs, such as humans. There are currently few tools available that are species non-specific or support non-model organism data. Results Here we present SNPdat, a high throughput analysis tool that can provide a comprehensive annotation of both novel and known SNPs for any organism with a draft sequence and annotation. Using a dataset of 4,566 SNPs identified in cattle using high-throughput DNA sequencing we demonstrate the annotations performed and the statistics that can be generated by SNPdat. Conclusions SNPdat provides users with a simple tool for annotation of genomes that are either not supported by other tools or have a small number of annotated SNPs available. SNPdat can also be used to analyse datasets from organisms which are densely sampled for SNPs. As a command line tool it can easily be incorporated into existing SNP discovery pipelines and fills a niche for analyses involving non-model organisms that are not supported by many available SNP annotation tools. SNPdat will be of great interest to scientists involved in SNP discovery and analysis projects, particularly those with limited bioinformatics experience.
The Discovery Corps Fellowship Program is a pilot program seeking new postdoctoral and professional development models that combine research expertise with service-oriented projects. The Discovery Corps Fellowship Program comprises two categories of awards: recent doctoral recipients serve as Discovery Corps Postdoctoral Fellows; and mid-career professionals serve as Discovery Corps Senior Fellows. More information about the Discovery Corps Fellowship Program, including abstracts of the ...
The Discovery Corps Fellowship Program is a pilot program seeking new postdoctoral and professional development models that combine research expertise with professional service. Discovery Corps Fellows leverage their research expertise through projects that address areas of national need. The Discovery Corps Fellowship Program comprises two categories of awards: recent doctoral recipients serve as Discovery Corps Postdoctoral Fellows; and mid-career professionals serve as Discovery Corps ...
Knowledge Discovery Nuggets is both a web site and an associated newsletter. The newsletter focuses on the latest research, new applications, conference announcements, and news about data mining and knowledge discovery. The web site offers a large index of categorized pointers to data mining and knowledge discovery software, informative reference materials, related research projects, data sets, and much more. While somewhat difficult to navigate, Knowledge Discovery Nuggets offers an excellent place to start a data mining or knowledge discovery related search.
Wellington Muchero from Oak Ridge National Laboratory gives a talk titled "Discovery of Cell Wall Biosynthesis Genes in Populus" at the JGI 7th Annual Users Meeting: Genomics of Energy & Environment Meeting on March 22, 2012 in Walnut Creek, California.
Muchero, Wellington [Oak Ridge National Laboratory]
Probing the functional complexity of the human genome will require new gene cloning techniques, not only to discover intraspecies gene homologs and interspecies gene orthologs, but also to identify alternatively spliced gene variants. We report homologous cDNA cloning methods that allow cloning of gene family members, genes from different species, and alternatively spliced gene variants. We cloned human 14-3-3 gene
Hong Zeng; Elizabeth Allen; Chris W. Lehman; R. Geoffrey Sargent; Sushma Pati; David A. Zarling
The Berkeley Drosophila Genome Project (BDGP) strives to disrupt each Drosophila gene by the insertion of a single transposable element. As part of this effort, transposons in more than 30,000 fly strains were localized and analyzed relative to predicted Drosophila gene structures. Approximately 6,300 lines that maximize genomic coverage were selected to be sent to the Bloomington Stock Center for
Hugo J. Bellen; Robert W. Levis; Guochun Liao; Yuchun He; Joseph W. Carlson; Garson Tsang; Martha Evans-Holm; P. Robin Hiesinger; Karen L. Schulze; Gerald M. Rubin; Roger A. Hoskins; Allan C. Spradling
The main topic of this paper is evaluating a system that uses the expected value of experimentation for discovering causal pathways in gene expression data. By experimentation we mean both interventions (e.g., a gene knock-out experiment) and observations (e.g., passively observing the expression level of a "wild-type" gene). We introduce a system called GEEVE (causal discovery in Gene Expression data using Expected Value of Experimentation), which implements expected value of experimentation in discovering causal pathways using gene expression data. GEEVE provides the following assistance, which is intended to help biologists in their quest to discover gene-regulation pathways: (1) recommending which experiments to perform (with a focus on "knock-out" experiments) using an expected value of experimentation (EVE) method; (2) recommending the number of measurements (observational and experimental) to include in the experimental design, again using an EVE method; and (3) providing a Bayesian analysis that combines prior knowledge with recent microarray experimental results to derive posterior probabilities of gene regulation relationships. In recommending which experiments to perform (and how many times to repeat them), the EVE approach considers the biologist's preferences for which genes to focus on in the discovery process. Also, since exact EVE calculations are exponential in time, GEEVE incorporates approximation methods. GEEVE is able to combine data from knock-out experiments with data from wild-type experiments to suggest additional experiments to perform and then to analyze the results of those microarray experiments. It models the possibility that unmeasured (latent) variables may be responsible for some of the statistical associations among the expression levels of the genes under study. To evaluate the GEEVE system, we used a gene expression simulator to generate data from specified models of gene regulation. Using the simulator, we evaluated the GEEVE system using a randomized control study that involved 10 biologists, some of whom used GEEVE and some of whom did not. The results show that biologists who used GEEVE reached correct causal assessments about gene regulation more often than did those biologists who did not use GEEVE. The GEEVE users also reached their assessments in a more cost-effective manner. PMID:16203178
Yoo, Changwon; Cooper, Gregory F; Schmidt, Martin
A fundamental goal of genetics and functional genomics is to identify and mutate every gene in model organisms such as Drosophila melanogaster. The Berkeley Drosophila Genome Project (BDGP) gene disruption project generates single P-element insertion strains that each mutate unique genomic open reading frames. Such strains strongly facilitate further genetic and molecular studies of the disrupted loci, but it has remained unclear if P elements can be used to mutate all Drosophila genes. We now report that the primary collection has grown to contain 1045 strains that disrupt more than 25% of the estimated 3600 Drosophila genes that are essential for adult viability. Of these P insertions, 67% have been verified by genetic tests to cause the associated recessive mutant phenotypes, and the validity of most of the remaining lines is predicted on statistical grounds. Sequences flanking >920 insertions have been determined to exactly position them in the genome and to identify 376 potentially affected transcripts from collections of EST sequences. Strains in the BDGP collection are available from the Bloomington Stock Center and have already assisted the research community in characterizing >250 Drosophila genes. The likely identity of 131 additional genes in the collection is reported here. Our results show that Drosophila genes have a wide range of sensitivity to inactivation by P elements, and provide a rationale for greatly expanding the BDGP primary collection based entirely on insertion site sequencing. We predict that this approach can bring >85% of all Drosophila open reading frames under experimental control.
Spradling, A C; Stern, D; Beaton, A; Rhem, E J; Laverty, T; Mozden, N; Misra, S; Rubin, G M
Bacterial infections are increasingly difficult to treat owing to the spread of antibiotic resistance. A major concern is Gram-negative bacteria, for which the discovery of new antimicrobial drugs has been particularly scarce. In an effort to accelerate early steps in drug discovery, the EU-funded AEROPATH project aims to identify novel targets in the opportunistic pathogen Pseudomonas aeruginosa by applying a multidisciplinary approach encompassing target validation, structural characterization, assay development and hit identification from small-molecule libraries. Here, the strategies used for target selection are described and progress in protein production and structure analysis is reported. Of the 102 selected targets, 84 could be produced in soluble form and the de novo structures of 39 proteins have been determined. The crystal structures of eight of these targets, ranging from hypothetical unknown proteins to metabolic enzymes from different functional classes (PA1645, PA1648, PA2169, PA3770, PA4098, PA4485, PA4992 and PA5259), are reported here. The structural information is expected to provide a firm basis for the improvement of hit compounds identified from fragment-based and high-throughput screening campaigns.
Moynie, Lucille; Schnell, Robert; McMahon, Stephen A.; Sandalova, Tatyana; Boulkerou, Wassila Abdelli; Schmidberger, Jason W.; Alphey, Magnus; Cukier, Cyprian; Duthie, Fraser; Kopec, Jolanta; Liu, Huanting; Jacewicz, Agata; Hunter, William N.; Naismith, James H.; Schneider, Gunter
Hematopoietic stem cells (HSCs) are rare quiescent cells that continuously replenish the cellular components of the peripheral blood. Observing that the ataxia-associated gene Ataxin-1-like (Atxn1L) was highly expressed in HSCs, we examined its role in HSC function through in vitro and in vivo assays. Mice lacking Atxn1L had greater numbers of HSCs that regenerated the blood more quickly than their wild-type counterparts. Molecular analyses indicated Atxn1L null HSCs had gene expression changes that regulate a program consistent with their higher level of proliferation, suggesting that Atxn1L is a novel regulator of HSC quiescence. To determine if additional brain-associated genes were candidates for hematologic regulation, we examined genes encoding proteins from autism- and ataxia-associated protein–protein interaction networks for their representation in hematopoietic cell populations. The interactomes were found to be highly enriched for proteins encoded by genes specifically expressed in HSCs relative to their differentiated progeny. Our data suggest a heretofore unappreciated similarity between regulatory modules in the brain and HSCs, offering a new strategy for novel gene discovery in both systems.
Yu, Peng; Zohren, Fabian; Lee, Yoontae; Shaw, Chad A.; Zoghbi, Huda Y.; Goodell, Margaret A.
Single nucleotide polymorphisms (SNPs) in immune response genes have been reported as markers for susceptibility to infectious diseases in human and livestock. A disease caused by cyprinid herpesvirus 3 (CyHV-3) is highly contagious and virulent in common carp (Cyprinus carpio). With the aim to develop molecular tools for breeding CyHV-3-resistant carp, we have amplified and sequenced 11 candidate genes for viral disease resistance including TLR2, TLR3, TLR4ba, TLR7, TLR9, TLR21, TLR22, MyD88, TRAF6, type I IFN and IL-1beta. For each gene, we initially cloned and sequenced PCR amplicons from 8 to 12 fish (2-3 fish per strain) from the SNP discovery panel. We then identified and evaluated putative SNPs for their polymorphisms in the SNP discovery panel and validated their usefulness for linkage analysis in a full-sib family using the SNaPshot method. Our sequencing results and phylogenetic analyses suggested that TLR3, TLR7 and MyD88 genes are duplicated in the common carp genome. We, therefore, developed locus-specific PCR primers and SNP genotyping assays for the duplicated loci. A total of 48 SNP markers were developed from PCR fragments of the 13 loci (7 single-locus and 3 duplicated genes). Thirty-nine markers were polymorphic with estimated minor allele frequencies of more than 0.1. The utility of the SNP markers was evaluated in one full-sib family and revealed that 20 markers from 9 loci segregated in a disomic and Mendelian pattern and would be useful for linkage analysis. PMID:20420915
Kongchum, Pawapol; Palti, Yniv; Hallerman, Eric M; Hulata, Gideon; David, Lior
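The polymorphism criterion mentioned in the abstract above (estimated minor allele frequency of more than 0.1) can be illustrated with a short Python sketch. The genotype encoding (one two-letter string per diploid fish), the function name `minor_allele_frequency`, and the toy genotypes are assumptions for illustration only; the paper's own genotype-calling and estimation procedure is not described in the abstract.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the minor allele frequency of a biallelic SNP.

    genotypes: iterable of two-character strings such as "AG",
    one per diploid individual.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no genotypes supplied")
    if len(counts) > 2:
        raise ValueError("expected at most two alleles at a biallelic SNP")
    if len(counts) == 1:
        return 0.0          # monomorphic in this sample
    return min(counts.values()) / total


if __name__ == "__main__":
    panel = ["AA", "AG", "GG", "AA"]        # toy discovery panel
    maf = minor_allele_frequency(panel)
    print(maf, "polymorphic" if maf > 0.1 else "below threshold")
```

In the toy panel the G allele appears 3 times out of 8 allele copies, giving a frequency of 0.375, which would pass a 0.1 cutoff.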
Background Meiosis is a critical process in the reproduction and life cycle of flowering plants in which homologous chromosomes pair, synapse, recombine and segregate. Understanding meiosis will not only advance our knowledge of the mechanisms of genetic recombination, but also has substantial applications in crop improvement. Despite the tremendous progress in the past decade in other model organisms (e.g., Saccharomyces cerevisiae and Drosophila melanogaster), the global identification of meiotic genes in flowering plants has remained a challenge due to the lack of efficient methods to collect pure meiocytes for analyzing the temporal and spatial gene expression patterns during meiosis, and for the sensitive identification and quantitation of novel genes. Results A high-throughput approach to identify meiosis-specific genes by combining isolated meiocytes, RNA-Seq, bioinformatic and statistical analysis pipelines was developed. By analyzing the studied genes that have a meiosis function, a pipeline for identifying meiosis-specific genes has been defined. More than 1,000 genes that are specifically or preferentially expressed in meiocytes have been identified as candidate meiosis-specific genes. A group of 55 genes that have mitochondrial genome origins and a significant number of transposable element (TE) genes (1,036) were also found to have up-regulated expression levels in meiocytes. Conclusion These findings advance our understanding of meiotic genes, gene expression and regulation, especially the transcript profiles of MGI genes and TE genes, and provide a framework for functional analysis of genes in meiosis.
DNA microarrays and cell cycle synchronization experiments have made possible the study of the mechanisms of cell cycle regulation of Saccharomyces cerevisiae by simultaneously monitoring the expression levels of thousands of genes at specific time points. On the other hand, pattern recognition techniques can contribute to the analysis of such massive measurements, providing a model of gene expression level evolution through the cell cycle process. In this paper, we propose the use of one such technique, an unsupervised artificial neural network called a Self-Organizing Map (SOM), which has been successfully applied to processes involving very noisy signals, classifying and organizing them, and assisting in the discovery of behavior patterns without requiring prior knowledge about the process under analysis. As a test bed for the use of SOMs in finding possible relationships among genes and their possible contribution in some biological processes, we selected 282 S. cerevisiae genes that have been shown through biological experiments to have an activity during the cell cycle. The expression level of these genes was analyzed in five of the most cited time series DNA microarray databases used in the study of the cell cycle of this organism. With the use of SOM, it was possible to find clusters of genes with similar behavior in the five databases along two cell cycles. This result suggested that some of these genes might be biologically related or might have a regulatory relationship, as was corroborated by comparing some of the clusters obtained with SOMs against a previously reported regulatory network that was generated using biological knowledge, such as protein-protein interactions, gene expression levels, metabolism dynamics, promoter binding, and modification, regulation and transport of proteins. The methodology described in this paper could be applied to the study of gene relationships of other biological processes in different organisms.
Chavez-Alvarez, Rocio; Chavoya, Arturo; Mendez-Vazquez, Andres
Project ARCHIMEDES was designed in cooperation with local teachers to enhance concept understanding of teachers of physics and physical sciences, to increase use of electronics and computers in the classroom, and to introduce research on students' misconceptions in physics, teaching methods for identifying and remediating misconceptions, and ways…
Lea, Suzanne M.
A gene-specific, metagenomic PCR method has led to the discovery of a novel esterase subfamily consisting of five homologous members. Sequence analysis of this esterase subfamily, named the ArmEst subfamily, revealed a unique conserved pattern with a significant variable interior sequence flanked by two symmetric and identical long arm sequences. The two homologous long arm sequences had 100% sequence identity and symmetry at both ends between the five members of this esterase class, but only 17-58% identity was shared for the internal sequence. The biochemical properties of two of the ArmEst esterases definitively demonstrated that they are true active esterases rather than pseudogenes. This is the first report presenting an esterase subfamily containing a unique arm sequence, indicating a rare homologous recombination occurring in the coding area of a functional gene to generate their functional diversity. PMID:23881330
Zhang, Ao; Zhao, Rong; Jin, Peng; Ma, Lifang; Xiong, Xiaolong; Xie, Tian; Pei, Xiaolin; Yu, Li; Yin, Xiaopu; Wang, Qiuyan
We report the construction and characterization of a normalized cDNA library from the digestive gland of the marine bivalve Nodipecten nodosus, a commercially valuable tropical scallop. A total of 288 clones were sequenced, and 250 unique sequences were obtained. The cDNA library showed a small sequence redundancy (2.3%) and high numbers of recombinant (99.9%) and independent clones (2.0 × 10^6 cfu), indicating that the cDNA library generated in this study is a profitable resource for efficient gene discovery for N. nodosus. EST functional annotation by Gene Ontology term assignment identified sequences potentially involved in processes relevant to aquaculture and ecotoxicology, such as apoptosis, growth, lipid metabolism, reproduction, development, response to stress and immunity. PMID:23669241
Americo, Juliana Alves; Dondero, Francesco; Moraes, Milton Ozório; Allodi, Silvana; de Freitas Rebelo, Mauro
The Berkeley Drosophila Genome Project (BDGP) strives to disrupt each Drosophila gene by the insertion of a single transposable element. As part of this effort, transposons in more than 30,000 fly strains were localized and analyzed relative to predicted Drosophila gene structures. Approximately 6,300 lines that maximize genomic coverage were selected to be sent to the Bloomington Stock Center for public distribution, bringing the size of the BDGP gene disruption collection to 7,140 lines. It now includes individual lines predicted to disrupt 5,362 of the 13,666 currently annotated Drosophila genes (39 percent). Other lines contain an insertion at least 2 kb from others in the collection and likely mutate additional incompletely annotated or uncharacterized genes and chromosomal regulatory elements. The remaining strains contain insertions likely to disrupt alternative gene promoters or to allow gene mis-expression. The expanded BDGP gene disruption collection provides a public resource that will facilitate the application of Drosophila genetics to diverse biological problems. Finally, the project reveals new insight into how transposons interact with a eukaryotic genome and helps define optimal strategies for using insertional mutagenesis as a genomic tool.
Bellen, Hugo J.; Levis, Robert W.; Liao, Guochun; He, Yuchun; Carlson, Joseph W.; Tsang, Garson; Evans-Holm, Martha; Hiesinger, P. Robin; Schulze, Karen L.; Rubin, Gerald M.; Hoskins, Roger A.; Spradling, Allan C.
TILLING (Targeting Induced Local Lesions IN Genomes) is a powerful reverse genetic technique that employs a mismatch-specific endonuclease to discover induced point mutations in genes of interest. The use of the TILLING technique to survey natural variation in genes is called Ecotilling. We report an adaptation of Ecotilling for rapid detection of single-nucleotide mutations in the acetolactate synthase (ALS) genes
Guang-Xi Wang; Mui-Keng Tan; Sujay Rakshit; Hiromasa Saitoh; Ryohei Terauchi; Toshiyuki Imaizumi; Takanori Ohsako; Tohru Tominaga
Microarrays are an effective tool for monitoring genome-wide gene expression levels. In current microarray analyses, the majority of genes on arrays are frequently eliminated for further analysis because the changes in their expression levels (ratios) are considered to be not significant. This strategy risks failure to discover whole sets of genes related to a quantitative trait of interest, which is
Kentaro Yano; Kazuhide Imai; Akifumi Shimizu; Takao Hanashita
Musa (banana and plantain) is an important genus for the global export market and in local markets where it provides staple food for approximately 400 million people. Hybridization and polyploidization of several (sub)species, combined with vegetative propagation and human selection, have produced a complex genetic history. We describe the application of the Ecotilling method for the discovery and characterization of nucleotide
Bradley J. Till; Joanna Jankowicz-Cieslak; László Sági; Owen A. Huynh; Hiroe Utsushi; Rony Swennen; Ryohei Terauchi
H-InvDB (http://www.h-invitational.jp/) is a comprehensive human gene database started in 2004. In the latest version, H-InvDB 8.0, a total of 244 709 human complementary DNA was mapped onto the hg19 reference genome and 43 829 gene loci, including nonprotein-coding ones, were identified. Of these loci, 35 631 were identified as potential protein-coding genes, and 22 898 of these were identical to known genes. In our analysis, 19 309 annotated genes were specific to H-InvDB and not found in RefSeq and Ensembl. In fact, 233 genes of the 19 309 turned out to have protein functions in this version of H-InvDB; they were annotated as unknown protein functions in the previous version. Furthermore, 11 genes were identified as known Mendelian disorder genes. It is advantageous that many biologically functional genes are hidden in the H-InvDB unique genes. As large-scale proteomic projects have been conducted to elucidate the functions of all human proteins, we have enhanced the proteomic information with an advanced protein view and new subdatabase of protein complexes (Protein Complex Database with quality index). We propose that H-InvDB is an important resource for finding novel candidate targets for medical care and drug development.
Takeda, Jun-ichi; Yamasaki, Chisato; Murakami, Katsuhiko; Nagai, Yoko; Sera, Miho; Hara, Yuichiro; Obi, Nobuo; Habara, Takuya; Gojobori, Takashi; Imanishi, Tadashi
Cell autolysis plays important physiological roles in the life cycle of clostridial cells. Understanding the genetic basis of the autolysis phenomenon of pathogenic Clostridium or solvent-producing Clostridium cells might provide new insights into this important species. Genes that might be involved in autolysis of Clostridium acetobutylicum, a model clostridial species, were investigated in this study. Twelve putative autolysin genes were predicted in the C. acetobutylicum DSM 1731 genome through bioinformatics analysis. Of these 12 genes, gene SMB_G3117 was selected for testing the intracellular autolysin activity, growth profile, viable cell numbers, and cellular morphology. We found that overexpression of the SMB_G3117 gene led to earlier cessation of growth, a significantly increased number of dead cells, and clear electrolucent cavities, while disruption of the SMB_G3117 gene resulted in remarkably reduced intracellular autolysin activity. These results indicate that SMB_G3117 is a novel gene involved in cellular autolysis of C. acetobutylicum. PMID:23702687
Yang, Liejian; Bao, Guanhui; Zhu, Yan; Dong, Hongjun; Zhang, Yanping; Li, Yin
We report on the discovery of Cepheids in the field spiral galaxy NGC 3621, based on observations made with the Wide Field and Planetary Camera 2 on board the Hubble Space Telescope (HST). NGC 3621 is one of 18 galaxies observed as part of the HST Key Project on the Extragalactic Distance Scale, which aims to measure the Hubble Constant to 10 percent accuracy.
Rawson, D. M.; Mould, J. R.; Macri, L. M.; Huchra, J. P.; Kennicutt, R. C.; Harding, P.; Freedman, W. L.; Hill, R. J.; Phelps, R. L.; Madore, B. F.; Silbermann, N. A.; Graham, J. A.; Ferrarese, L.; Ford, H. C.; Illingworth, G. D.; Hoessel, J. G.; Han, M.; Hughes, S. M.; Saha, A.; Stetson, P. B.
Understanding the categorization of human diseases is critical for reliably identifying disease causal genes. Recently, genome-wide studies of abnormal chromosomal locations related to diseases have mapped >2000 phenotype–gene relations, which provide valuable information for classifying diseases and identifying candidate genes as drug targets. In this article, a regularized non-negative matrix tri-factorization (R-NMTF) algorithm is introduced to co-cluster phenotypes and genes, and simultaneously detect associations between the detected phenotype clusters and gene clusters. The R-NMTF algorithm factorizes the phenotype–gene association matrix under the prior knowledge from phenotype similarity network and protein–protein interaction network, supervised by the label information from known disease classes and biological pathways. In the experiments on disease phenotype–gene associations in OMIM and KEGG disease pathways, R-NMTF significantly improved the classification of disease phenotypes and disease pathway genes compared with support vector machines and Label Propagation in cross-validation on the annotated phenotypes and genes. The newly predicted phenotypes in each disease class are highly consistent with human phenotype ontology annotations. The roles of the new member genes in the disease pathways are examined and validated in the protein–protein interaction subnetworks. Extensive literature review also confirmed many new members of the disease classes and pathways as well as the predicted associations between disease phenotype classes and pathways.
Hwang, TaeHyun; Atluri, Gowtham; Xie, MaoQiang; Dey, Sanjoy; Hong, Changjin; Kumar, Vipin; Kuang, Rui
Over 900 genes have been annotated within duplicated regions of the human genome, yet their functions and potential roles in disease remain largely unknown. One major obstacle has been our inability to accurately and comprehensively assay genetic variation for these genes in a high-throughput manner. We developed a sequencing-based method for rapid and high-throughput genotyping of duplicated genes using molecular inversion probes designed to unique paralogous sequence variants. We apply this method to genotype all members of two gene families, SRGAP2 and RH, among a diversity panel of 1,056 humans. The approach can accurately distinguish copy number in paralogs having up to ~99.6% sequence identity, identify small gene-disruptive deletions, detect single nucleotide variants, define breakpoints of unequal crossover, and discover regions of interlocus gene conversion. Our analysis of SRGAP2 suggests that nonreciprocal genetic exchange akin to interlocus gene conversion can occur over long distances (> 80 Mbp) between paralogs. The ability to rapidly and accurately genotype multiple gene families in thousands of individuals at low cost enables the development of genome-wide gene conversion maps and unlocks many duplicated genes for association with human traits.
Nuttle, Xander; Huddleston, John; O'Roak, Brian J.; Antonacci, Francesca; Fichera, Marco; Romano, Corrado; Shendure, Jay; Eichler, Evan E.
Summary While a few cancer genes are mutated in a high proportion of tumors of a given type (>20%), most are mutated at intermediate frequencies (2–20%). To explore the feasibility of creating a comprehensive catalog of cancer genes, we analyzed somatic point mutations in exome sequence from 4,742 tumor-normal pairs across 21 cancer types. We found that large-scale genomic analysis can identify nearly all known cancer genes in these tumor types. Our analysis also identified 33 genes not previously known to be significantly mutated, including genes related to proliferation, apoptosis, genome stability, chromatin regulation, immune evasion, RNA processing and protein homeostasis. Down-sampling analysis indicates that larger sample sizes will reveal many more genes, mutated at clinically important frequencies. We estimate that near-saturation may be achieved with 600–5000 samples per tumor type, depending on background mutation rate. The results help guide the next stage of cancer genomics.
Lawrence, Michael S.; Stojanov, Petar; Mermel, Craig H.; Garraway, Levi A.; Golub, Todd R.; Meyerson, Matthew; Gabriel, Stacey B.; Lander, Eric S.; Getz, Gad
The Comparative Toxicogenomics Database (CTD) is a curated database that promotes understanding about the effects of environmental chemicals on human health. Biocurators at CTD manually curate chemical-gene interactions, chemical-disease relationships and gene-disease relationships from the literature. This strategy allows data to be integrated to construct chemical-gene-disease networks. CTD is unique in numerous respects: curation focuses on environmental chemicals; interactions
Allan Peter Davis; Cynthia G. Murphy; Cynthia A. Saraceni-richards; Michael C. Rosenstein; Thomas C. Wiegers; Carolyn J. Mattingly
Scientists at Albert Einstein College of Medicine of Yeshiva University have discovered, in a mouse model of acute myeloid leukemia, that the gene HLX is expressed at abnormally high levels in leukemia stem cells. Gene expression is the process by which a gene synthesizes the molecule that it codes for; an "over-expressed" gene makes its product in abnormally high amounts. These findings suggest that targeting elevated HLX expression may be a promising novel strategy for treating AML. The Albert Einstein College of Medicine is home to the Albert Einstein Cancer Center.
We implemented least absolute shrinkage and selection operator (LASSO) regression to evaluate gene effects in genome-wide association studies (GWAS) of brain images, using an MRI-derived temporal lobe volume measure from 729 subjects scanned as part of the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Sparse groups of SNPs in individual genes were selected by LASSO, which identifies efficient sets of variants influencing the data. These SNPs were considered jointly when assessing their association with neuroimaging measures. We discovered 22 genes that passed genome-wide significance for influencing temporal lobe volume. This was a substantially greater number of significant genes compared to those found with standard, univariate GWAS. These top genes are all expressed in the brain and include genes previously related to brain function or neuropsychiatric disorders such as MACROD2, SORCS2, GRIN2B, MAGI2, NPAS3, CLSTN2, GABRG3, NRXN3, PRKAG2, GAS7, RBFOX1, ADARB2, CHD4, and CDH13. The top genes we identified with this method also displayed significant and widespread post hoc effects on voxelwise, tensor-based morphometry (TBM) maps of the temporal lobes. The most significantly associated gene was an autism susceptibility gene known as MACROD2. We were able to successfully replicate the effect of the MACROD2 gene in an independent cohort of 564 young, Australian healthy adult twins and siblings scanned with MRI (mean age: 23.8 ± 2.2 SD years). Our approach powerfully complements univariate techniques in detecting influences of genes on the living brain.
Kohannim, Omid; Hibar, Derrek P.; Stein, Jason L.; Jahanshad, Neda; Hua, Xue; Rajagopalan, Priya; Toga, Arthur W.; Jack, Clifford R.; Weiner, Michael W.; de Zubicaray, Greig I.; McMahon, Katie L.; Hansell, Narelle K.; Martin, Nicholas G.; Wright, Margaret J.; Thompson, Paul M.
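The sparsity that LASSO imposes in the abstract above can be illustrated with a minimal sketch. It is a generic cyclic coordinate-descent LASSO in plain Python, not the authors' pipeline; the function names, the toy data, and the penalty value lam=0.5 are all assumptions made for illustration. The point is only that weak predictors are shrunk to exactly zero while strong ones are kept.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X is a list of n rows (each a list of p predictor values); y has n entries.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta, since beta starts at zero
    for _ in range(n_sweeps):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero predictor carries no information
            # Covariance of predictor j with the residual that excludes j's own fit
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / z
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                beta[j] = new_beta
    return beta


# Hypothetical toy data: predictor 0 has a strong effect, predictor 1 a weak one.
X_toy = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y_toy = [2.1, -1.9, 1.9, -2.1]
beta_toy = lasso_coordinate_descent(X_toy, y_toy, lam=0.5)
print(beta_toy)
```

In this toy case the strong predictor keeps a shrunken nonzero coefficient and the weak one is dropped, which is the behaviour that lets a small set of variants per gene be selected and then tested jointly.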
The Human Genome Project (HGP), a $437 million effort that began in 1990 to chart the chemical sequence of our three billion base pairs of DNA, was completed in 2003, marking the 50th anniversary that proved the definitive structure of the molecule. This study considered how dialectical and rhetorical arguments functioned in the science, political, and public forums over a 20-year period, from 1980 to 2000, to advance human genome research and to establish the official project. I argue that Aristotle's continuum of knowledge--which ranges from the probable on one end to certified or demonstrated knowledge on the other--provides useful distinctions for analyzing scientific reasoning. While contemporary scientific research seeks to discover certified knowledge, investigators generally employ the hypothetico-deductive or scientific method, which often yields probable rather than certain findings, making these dialectical in nature. Analysis of the discourse describing human genome research revealed the use of numerous rhetorical figures and topics. Persuasive and probable reasoning were necessary for scientists to characterize unknown genetic phenomena, to secure interest in and funding for large-scale human genome research, to solve scientific problems, to issue probable findings, to convince colleagues and government officials that the findings were sound and to disseminate information to the public. Both government and private venture scientists drew on these tools of reasoning to promote their methods of mapping and sequencing the genome. The debate over how to carry out sequencing was rooted in conflicting values. Scientists representing the academic tradition valued a more conservative method that would establish high quality results, and those supporting private industry valued an unconventional approach that would yield products and profits more quickly. Values in turn influenced political and public forum arguments. 
Agency representatives and investors sided with the approach that reflected values they supported. Fascinated with this controversy and the convincing comparisons, the media often endorsed Celera's work for its efficiency. The analysis of discourse from the science, political, and public forums revealed that value systems influenced the accuracy and quality of the arguments more than the type or number of figures used to describe the research to various audiences.
Robidoux, Charlotte A.
NGC 2090 is a highly resolved ScII galaxy with an inclination that makes it suitable for Tully-Fisher (TF) studies. The high probability of resolving individual Cepheid variable stars, and hence the prospect of refining the calibration of the TF relation, made NGC 2090 an ideal target for observations as part of the HST Extragalactic Distance Scale Key Project. The HST observing sequence for NGC 2090, incorporating 13 separate visits, has been completed. Thirteen F555W (V) and 5 F814W (I) cosmic-ray split images were obtained. In this paper we report on the detection of 34 Cepheid variables in NGC 2090, with periods in the range 5 < P < 58 days. Apparent V and I period-luminosity relations have been constructed, assuming a fiducial distance modulus, mu_o = 18.50 +/- 0.10 mag and a reddening, E(B-V) = 0.10 mag for the Large Magellanic Cloud. A true distance modulus of 30.41 +/- 0.05 mag is found for NGC 2090, corresponding to a distance of 12.1 +/- 0.3 Mpc. The apparent moduli are found to be mu_I = 30.55 +/- 0.04 mag and mu_v = 30.66 +/- 0.05 mag, with an inferred reddening E(V-I) = 0.11 mag.
Sakai, S.; Phelps, R. L.; Freedman, W. L.; Madore, B.; Saha, A.; Stetson, P. B.; Kennicutt, R. C.; Mould, J. R.; Ferrarese, L.; Ford, H. C.; Graham, J. A.; Han, M.; Hoessel, J. G.; Huchra, J. P.; Hughes, S. M. G.; Illingworth, G. D.; Silbermann, N.
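The numbers quoted in the NGC 2090 abstract can be checked with a short worked calculation. The sketch below uses the standard definition of the distance modulus, mu = 5 log10(d / 10 pc), and first-order error propagation; the function names are illustrative and not taken from the paper.

```python
import math


def distance_mpc(mu):
    """Distance in Mpc from a true distance modulus mu = 5*log10(d / 10 pc)."""
    return 10 ** ((mu - 25.0) / 5.0)


def distance_error_mpc(mu, sigma_mu):
    """First-order error propagation: sigma_d = d * ln(10) / 5 * sigma_mu."""
    return distance_mpc(mu) * math.log(10) / 5.0 * sigma_mu


d = distance_mpc(30.41)                  # true modulus from the abstract
d_err = distance_error_mpc(30.41, 0.05)  # its quoted uncertainty
e_vi = 30.66 - 30.55                     # E(V-I) = mu_V - mu_I

print(f"d = {d:.1f} +/- {d_err:.1f} Mpc, E(V-I) = {e_vi:.2f} mag")
```

To the precision quoted, the result agrees with the 12.1 +/- 0.3 Mpc distance and the 0.11 mag reddening given in the abstract.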
Asthma and atopy are complex phenotypes that are influenced by both genetic and environmental factors. A review of nearly 500 papers on disease association studies identified 25 genes that have been associated with an asthma or atopy phenotype in six or more populations. An additional 54 genes have been associated in 2–5 populations. Here, we discuss the methods that have
C Ober; S Hoffjan
Treatment of pediatric acute lymphoblastic leukemia (ALL) is based on the concept of tailoring the intensity of therapy to a patient's risk of relapse. To determine whether gene expression profiling could enhance risk assignment, we used oligonucleotide microarrays to analyze the pattern of genes expressed in leukemic blasts from 360 pediatric ALL patients. Distinct expression profiles identified each of the
Eng-Juh Yeoh; Mary E. Ross; Sheila A. Shurtleff; W. Kent Williams; Divyen Patel; Rami Mahfouz; Fred G. Behm; Susana C. Raimondi; Mary V. Relling; Anami Patel; Cheng Cheng; Dario Campana; Dawn Wilkins; Xiaodong Zhou; Jinyan Li; Huiqing Liu; Ching-Hon Pui; William E Evans; Clayton Naeve; Limsoon Wong; James R Downing
Recent genome-wide studies on nucleosome positioning in model organisms have shown strong evidence that nucleosome landscapes in the proximity of protein-coding genes exhibit regular characteristic patterns. Here, we propose a computational framework to discover novel genes in the human malaria parasite genome P. falciparum using nucleosome positioning inferred from MAINE-seq data. We rely on a classifier trained on the nucleosome landscape profiles of experimentally verified genes, and then used to discover new genes (without considering the primary DNA sequence). Cross-validation experiments show that our classifier is very accurate. About two thirds of the locations reported by the classifier match experimentally determined expressed sequence tags in GenBank, for which no gene has been annotated in the human malaria parasite.
Pokhriyal, N.; Ponts, N.; Harris, E. Y.; Le Roch, K. G.; Lonardi, S.
To identify gene expression changes along progression of bladder cancer, we compared the expression profiles of early-stage and advanced bladder tumors using cDNA microarrays containing 17,842 known genes and expressed sequence tags. The application of bootstrapping techniques to hierarchical clustering segregated early-stage and invasive transitional carcinomas into two main clusters. Multidimensional analysis confirmed these clusters and more importantly, it separated carcinoma in situ from papillary superficial lesions and subgroups within early-stage and invasive tumors displaying different overall survival. Additionally, it recognized early-stage tumors showing gene profiles similar to invasive disease. Different techniques including standard t-test, single-gene logistic regression, and support vector machine algorithms were applied to identify relevant genes involved in bladder cancer progression. Cytokeratin 20, neuropilin-2, p21, and p33ING1 were selected among the top ranked molecular targets differentially expressed and validated by immunohistochemistry using tissue microarrays (n = 173). Their expression patterns were significantly associated with pathological stage, tumor grade, and altered retinoblastoma (RB) expression. Moreover, p33ING1 expression levels were significantly associated with overall survival. Analysis of the annotation of the most significant genes revealed the relevance of critical genes and pathways during bladder cancer progression, including the overexpression of oncogenic genes such as DEK in superficial tumors or immune response genes such as Cd86 antigen in invasive disease. Gene profiling successfully classified bladder tumors based on their progression and clinical outcome. The present study has identified molecular biomarkers of potential clinical significance and critical molecular targets associated with bladder cancer progression. PMID:12875971
Sanchez-Carbayo, Marta; Socci, Nicholas D; Lozano, Juan Jose; Li, Wentian; Charytonowicz, Elizabeth; Belbin, Thomas J; Prystowsky, Michael B; Ortiz, Angel R; Childs, Geoffrey; Cordon-Cardo, Carlos
Background Estrogens and their receptors are important in human development, physiology and disease. In this study, we utilized an integrated genome-wide molecular and computational approach to characterize the interaction between the activated estrogen receptor (ER) and the regulatory elements of candidate target genes. Results Of around 19,000 genes surveyed in this study, we observed 137 ER-regulated genes in T-47D cells, of which only 89 were direct target genes. Meta-analysis of heterogeneous in vitro and in vivo datasets showed that the expression profiles in T-47D and MCF-7 cells are remarkably similar and overlap with genes differentially expressed between ER-positive and ER-negative tumors. Computational analysis revealed a significant enrichment of putative estrogen response elements (EREs) in the cis-regulatory regions of direct target genes. Chromatin immunoprecipitation confirmed ligand-dependent ER binding at the computationally predicted EREs in our highest ranked ER direct target genes, NRIP1, GREB1 and ABCA3. Wider examination of the cis-regulatory regions flanking the transcriptional start sites showed species conservation in mouse-human comparisons in only 6% of predicted EREs. Conclusions Only a small core set of human genes, validated across experimental systems and closely associated with ER status in breast tumors, appear to be sufficient to induce ER effects in breast cancer cells. That cis-regulatory regions of these core ER target genes are poorly conserved suggests that different evolutionary mechanisms are operative at transcriptional control elements than at coding regions. These results predict that certain biological effects of estrogen signaling will differ between mouse and human to a larger extent than previously thought.
Lin, Chin-Yo; Strom, Anders; Vega, Vinsensius Berlian; Li Kong, Say; Li Yeo, Ai; Thomsen, Jane S; Chan, Wan Ching; Doray, Balraj; Bangarusamy, Dhinoth K; Ramasamy, Adaikalavan; Vergara, Liza A; Tang, Suisheng; Chong, Allen; Bajic, Vladimir B; Miller, Lance D; Gustafsson, Jan-Ake; Liu, Edison T
Background Grosmannia clavigera is a bark beetle-vectored fungal pathogen of pines that causes wood discoloration and may kill trees by disrupting nutrient and water transport. Trees respond to attacks from beetles and associated fungi by releasing terpenoid and phenolic defense compounds. It is unclear which genes are important for G. clavigera's ability to overcome antifungal pine terpenoids and phenolics. Results We constructed seven cDNA libraries from eight G. clavigera isolates grown under various culture conditions, and Sanger sequenced the 5' and 3' ends of 25,000 cDNA clones, resulting in 44,288 high quality ESTs. The assembled dataset of unique transcripts (unigenes) consists of 6,265 contigs and 2,459 singletons that mapped to 6,467 locations on the G. clavigera reference genome, representing ~70% of the predicted G. clavigera genes. Although only 54% of the unigenes matched characterized proteins at the NCBI database, this dataset extensively covers major metabolic pathways, cellular processes, and genes necessary for response to environmental stimuli and genetic information processing. Furthermore, we identified genes expressed in spores prior to germination, and genes involved in response to treatment with lodgepole pine phloem extract (LPPE). Conclusions We provide a comprehensively annotated EST dataset for G. clavigera that represents a rich resource for gene characterization in this and other ophiostomatoid fungi. Genes expressed in response to LPPE treatment are indicative of fungal oxidative stress response. We identified two clusters of potentially functionally related genes responsive to LPPE treatment. Furthermore, we report a simple method for identifying contig misassemblies in de novo assembled EST collections caused by gene overlap on the genome.
Background Drug discovery and chemical biology are exceedingly complex and demanding enterprises. In recent years there has been increasing awareness of the importance of predicting/optimizing the absorption, distribution, metabolism, excretion and toxicity (ADMET) properties of small chemical compounds along the search process rather than at the final stages. Fast methods for evaluating ADMET properties of small molecules often involve applying a set of simple empirical rules (educated guesses) and as such, compound collections' property profiling can be performed in silico. Clearly, these rules cannot assess the full complexity of the human body but can provide valuable information and assist decision-making. Results This paper presents FAF-Drugs2, a free adaptable tool for ADMET filtering of electronic compound collections. FAF-Drugs2 is a command line utility program (e.g., written in Python) based on the open source chemistry toolkit OpenBabel, which performs various physicochemical calculations and identifies key functional groups and some toxic or unstable molecules/functional groups. In addition to filtered collections, FAF-Drugs2 can provide, via Gnuplot, several distribution diagrams of major physicochemical properties of the screened compound libraries. Conclusion We have developed FAF-Drugs2 to facilitate compound collection preparation, prior to (or after) experimental screening or virtual screening computations. Users can select to apply various filtering thresholds and add rules as needed for a given project. As it stands, FAF-Drugs2 implements numerous filtering rules (23 physicochemical rules and 204 substructure searching rules) that can be easily tuned.
Lagorce, David; Sperandio, Olivier; Galons, Herve; Miteva, Maria A; Villoutreix, Bruno O
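The idea of filtering a compound collection with simple, tunable empirical rules can be sketched as follows. This is not FAF-Drugs2 code and does not use its rule set or the OpenBabel API: the thresholds are the widely cited Lipinski rule-of-five bounds, the descriptor values are assumed to have been precomputed elsewhere, and the compound records and all names are hypothetical.

```python
# Assumed rule set: upper bounds on precomputed physicochemical descriptors
# (Lipinski-style values, used here only as an example of an empirical rule).
RULES = {
    "mol_weight": 500.0,  # molecular weight, Da
    "logp": 5.0,          # octanol-water partition coefficient
    "h_donors": 5,        # hydrogen-bond donors
    "h_acceptors": 10,    # hydrogen-bond acceptors
}


def violations(compound, rules=RULES):
    """Return the names of descriptors whose value exceeds the rule's upper bound."""
    return [name for name, limit in rules.items() if compound[name] > limit]


def filter_collection(compounds, rules=RULES, max_violations=0):
    """Split compound ids into (kept, rejected) by the number of rule violations."""
    kept, rejected = [], []
    for compound in compounds:
        if len(violations(compound, rules)) <= max_violations:
            kept.append(compound["id"])
        else:
            rejected.append(compound["id"])
    return kept, rejected


# Hypothetical compound records with made-up descriptor values.
collection = [
    {"id": "cmpd_A", "mol_weight": 310.4, "logp": 2.1, "h_donors": 2, "h_acceptors": 5},
    {"id": "cmpd_B", "mol_weight": 612.8, "logp": 6.3, "h_donors": 1, "h_acceptors": 8},
]

kept, rejected = filter_collection(collection)

# Tuning a threshold for a given project: relax the molecular-weight bound
# and tolerate one violation.
relaxed_rules = dict(RULES, mol_weight=700.0)
kept_relaxed, rejected_relaxed = filter_collection(collection, relaxed_rules, max_violations=1)
```

Keeping the rules in a plain dictionary is one way to make thresholds easy to adjust or extend per project, in the spirit of the tunable filters described above.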
Development and application of transcriptomics-based gene classifiers for ecotoxicological applications lag far behind those of human biomedical science. Many such classifiers discovered thus far lack vigorous statistical and experimental validations, with their stability and rel...
Many genes and biological processes function in similar ways across different species. Cross-species gene expression analysis, as a powerful tool to characterize the dynamical properties of the cell, has found a number of applications, such as identifying a conserved core set of cell cycle genes. However, to the best of our knowledge, there is limited effort on developing appropriate techniques to capture the causality relations between genes from time-series microarray data across species. In this paper, we present hidden Markov random field regression with L(1) penalty to uncover the regulatory network structure for different species. The algorithm provides a framework for sharing information across species via hidden component graphs and is able to incorporate domain knowledge across species easily. We demonstrate our method on two synthetic datasets and apply it to discover causal graphs from innate immune response data. PMID:21523930
Liu, Yan; Niculescu-Mizil, Alexandru; Lozano, Aurélie; Lu, Yong
Benzylisoquinoline alkaloids (BIAs) are a large, diverse group of ~2500 specialized plant metabolites. Many BIAs display potent pharmacological activities, including the narcotic analgesics codeine and morphine, the vasodilator papaverine, the cough suppressant and potential anticancer drug noscapine, the antimicrobial agents sanguinarine and berberine, and the muscle relaxant (+)-tubocurarine. Opium poppy remains the sole commercial source for codeine, morphine, and a variety of semisynthetic drugs, including oxycodone and buprenorphine, derived primarily from the biosynthetic pathway intermediate thebaine. Recent advances in transcriptomics, proteomics, and metabolomics have created unprecedented opportunities for isolating and characterizing novel BIA biosynthetic genes. Here, we describe the application of next-generation sequencing and cDNA microarrays for selecting gene candidates based on comparative transcriptome analysis. We outline the basic mass spectrometric techniques to perform deep proteome and targeted metabolite analyses on BIA-producing plant tissues and provide methodologies for functionally characterizing biosynthetic gene candidates through in vitro enzyme assays and transient gene silencing in planta. PMID:22999177
Dang, Thu Thuy T; Onoyovwi, Akpevwe; Farrow, Scott C; Facchini, Peter J
Autism is an extremely common and heterogeneous neurodevelopmental disorder. While genetic factors are known to play a critical role in the etiologies of autism, the underlying genes and mechanisms remain unknown in approximately 70-75% of cases. Advances...
E. Hansen; G. E. Herman; M. B. Dewitt; R. Smith; W. Sadee
About 15 percent of cases of an aggressive, difficult-to-detect form of ovarian cancer contain a unique fusion between two neighboring, normally separate genes, say researchers at the Stanford University School of Medicine.
Background Promoter and 5′ end methylation regulation of tumour suppressor genes is a common feature of many cancers. Such occurrences often lead to the silencing of these key genes and thus they may contribute to the development of cancer, including prostate cancer. Methodology/Principal Findings In order to identify methylation changes in prostate cancer, we performed a genome-wide analysis of DNA methylation using Agilent
Ken Kron; Vaijayanti Pethe; Laurent Briollais; Bekim Sadikovic; Hilmi Ozcelik; Alia Sunderji; Vasundara Venkateswaran; Jehonathan Pinthus; Neil Fleshner; Theodorus van der Kwast; Bharati Bapat; Mikhail V. Blagosklonny
To identify gene expression changes along progression of bladder cancer, we compared the expression profiles of early-stage and advanced bladder tumors using cDNA microarrays containing 17,842 known genes and expressed sequence tags. The application of bootstrapping techniques to hierarchical clustering segregated early-stage and invasive transitional carcinomas into two main clusters. Multidimensional analysis confirmed these clusters and more impor-
Marta Sanchez-Carbayo; Nicholas D. Socci; Juan Jose Lozano; Wentian Li; Elizabeth Charytonowicz; Thomas J. Belbin; Michael B. Prystowsky; Angel R. Ortiz; Geoffrey Childs; Carlos Cordon-Cardo
The advent of DNA microarray technology has offered the promise of casting new insights onto deciphering secrets of life by monitoring activities of thousands of genes simultaneously. Current analyses of microarray data focus on precise classification of biological types, for example, tumor versus normal tissues. A further scientifically challenging task is to extract disease-relevant genes from the bewildering amounts of
Xia Li; Shaoqi Rao; Tianwen Zhang; Zheng Guo; Qingpu Zhang; Kathy L. Moser; Eric J. Topol
The exact biochemical steps of xylan backbone synthesis remain elusive. In Arabidopsis, three non-redundant genes from two glycosyltransferase (GT) families, IRX9 and IRX14 from GT43 and IRX10 from GT47, are candidates for forming the xylan backbone. In other plants, evidence exists that different tissues express these three genes at widely different levels, which suggests that diversity in the makeup of the xylan synthase complex exists. Recently we have profiled the transcripts present in the developing mucilaginous tissue of psyllium (Plantago ovata Forsk). This tissue was found to have high expression levels of an IRX10 homolog, but very low levels of the two GT43 family members. This contrasts with recent wheat endosperm tissue profiling that found a relatively high abundance of the GT43 family members. We have performed an in-depth analysis of all GT genes expressed in four developmental stages of the psyllium mucilaginous layer and in a single stage of the psyllium stem using RNA-Seq. This analysis revealed several IRX10 homologs, an expansion in GT61 (homologs of At3g18170/At3g18180), and several GTs from other GT families that are highly abundant and specifically expressed in the mucilaginous tissue. Our current hypothesis is that the four IRX10 genes present in the mucilaginous tissues have evolved to function without the GT43 genes. These four genes represent some of the most divergent IRX10 genes identified to date. Conversely, those present in the psyllium stem are very similar to those in other eudicots. This suggests these genes are under selective pressure, likely due to the synthesis of the various xylan structures present in mucilage that has a different biochemical role than that present in secondary walls. The numerous GT61 family members also show a wide sequence diversity and may be responsible for the larger number of side chain structures present in the psyllium mucilage. PMID:23761806
Jensen, Jacob K; Johnson, Nathan; Wilkerson, Curtis G
Auricularia polytricha (Mont.) Sacc., a type of edible black-brown mushroom with a gelatinous and modality-specific fruiting body, is in high demand in Asia due to its nutritional and medicinal properties. Illumina Solexa sequencing technology was used to generate very large transcript sequences from the mycelium and the mature fruiting body of A. polytricha for gene discovery and molecular marker development. De novo assembly generated 36,483 ESTs with an N50 length of 636 bp. A total of 28,108 ESTs demonstrated significant hits with known proteins in the nr database, and 94.03% of the annotated ESTs showed the greatest similarity to A. delicata, a related species of A. polytricha. Functional categorization of the Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathways revealed the conservation of genes involved in various biological processes in A. polytricha. Gene expression profile analysis indicated that a total of 2,057 ESTs were differentially expressed, including 1,020 ESTs that were up-regulated in the mycelium and 1,037 up-regulated in the fruiting body. Functional enrichment showed that the ESTs associated with biosynthesis, metabolism and assembly of proteins were more active in fruiting body development. The expression patterns of homologous transcription factors indicated that the molecular mechanisms of fruiting body formation and development were not exactly the same as for other agarics. Interestingly, an EST encoding tyrosinase was significantly up-regulated in the fruiting body, indicating that melanins accumulated during the processes of the formation of the black-brown color of the fruiting body in A. polytricha development. In addition, a total of 1,715 potential SSRs were detected in this transcriptome. The transcriptome analysis of A. polytricha provides valuable sequence resources and numerous molecular markers to facilitate further functional genomics studies and genetic research on this fungus.
Zhou, Yan; Chen, Lianfu; Fan, Xiuzhi; Bian, Yinbing
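The N50 length quoted in the abstract above is a standard assembly summary statistic: the length L such that sequences of length L or longer account for at least half of the total assembled bases. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name and the toy lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths (hypothetical, not from the study): total is 1500 bp,
# and 500 + 400 = 900 bp is the first running sum to reach half of it.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))
```

Sorting in descending order and stopping at the first running sum that reaches half the total keeps the calculation exact for integer lengths, with no floating-point division.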
Understanding the control of any trait optimally requires the detection of causal genes, gene interaction, and mechanism of action to discover and model the biochemical pathways underlying the expressed phenotype. Functional genomics techniques, including RNA expression profiling via microarray and high-throughput DNA sequencing, allow for the precise genome localization of biological information. Powerful genetic approaches, including quantitative trait locus (QTL) and genome-wide association study mapping, link phenotype with genome positions, yet genetics is less precise in localizing the relevant mechanistic information encoded in DNA. The coupling of salient functional genomic signals with genetically mapped positions is an appealing approach to discover meaningful gene-phenotype relationships. Techniques used to define this genetic-genomic convergence comprise the field of systems genetics. This short review will address an application of systems genetics where RNA profiles are associated with genetically mapped genome positions of individual genes (eQTL mapping) or as gene sets (co-expression network modules). Both approaches can be applied for knowledge independent selection of candidate genes (and possible control mechanisms) underlying complex traits where multiple, likely unlinked, genomic regions might control specific complex traits. PMID:24767114
Feltus, F Alex
The transcription factor Etsrp is required for vasculogenesis and primitive myelopoiesis in zebrafish. When ectopically expressed, etsrp is sufficient to induce the expression of many vascular and myeloid genes in zebrafish. The mammalian homolog of etsrp, ER71/Etv2, is also essential for vascular and hematopoietic development. To identify genes downstream of etsrp, gain-of-function experiments were performed for etsrp in zebrafish embryos followed by transcription profile analysis by microarray. Subsequent in vivo expression studies resulted in the identification of fourteen genes with blood and/or vascular expression, six of these being completely novel. Regulation of these genes by etsrp was confirmed by ectopic induction in etsrp overexpressing embryos and decreased expression in etsrp deficient embryos. Additional functional analysis of two newly discovered genes, hapln1b and sh3gl3, demonstrates their importance in embryonic vascular development. The results described here identify a group of genes downstream of etsrp likely to be critical for vascular and/or myeloid development.
Zhao, Yan; Burgess, Shawn; Lin, Shuo
Background Promoter and 5′ end methylation regulation of tumour suppressor genes is a common feature of many cancers. Such occurrences often lead to the silencing of these key genes and thus they may contribute to the development of cancer, including prostate cancer. Methodology/Principal Findings In order to identify methylation changes in prostate cancer, we performed a genome-wide analysis of DNA methylation using Agilent human CpG island arrays. Using computational and gene-specific validation approaches we have identified a large number of potential epigenetic biomarkers of prostate cancer. Further validation of candidate genes on a separate cohort of low and high grade prostate cancers by quantitative MethyLight analysis has allowed us to confirm DNA hypermethylation of HOXD3 and BMP7, two genes that may play a role in the development of high grade tumours. We also show that promoter hypermethylation is responsible for downregulated expression of these genes in the DU-145 PCa cell line. Conclusions/Significance This study identifies novel epigenetic biomarkers of prostate cancer and prostate cancer progression, and provides a global assessment of DNA methylation in prostate cancer.
Briollais, Laurent; Sadikovic, Bekim; Ozcelik, Hilmi; Sunderji, Alia; Venkateswaran, Vasundara; Pinthus, Jehonathan; Fleshner, Neil; van der Kwast, Theodorus; Bapat, Bharati
Antibodies are the fastest-growing segment of the biologics market. The success of antibody-based drugs resides in their exquisite specificity, high potency, stability, solubility, safety, and relatively inexpensive manufacturing process in comparison with other biologics. We outline here the structural studies and fundamental principles that define how antibodies interact with diverse targets. We also describe the antibody repertoires and affinity maturation mechanisms of humans, mice, and chickens, plus the use of novel single-domain antibodies in camelids and sharks. These species all utilize diverse evolutionary solutions to generate specific and high affinity antibodies and illustrate the plasticity of natural antibody repertoires. In addition, we discuss the multiple variations of man-made antibody repertoires designed and validated in the last two decades, which have served as tools to explore how the size, diversity, and composition of a repertoire impact the antibody discovery process. PMID:23162556
Finlay, William J J; Almagro, Juan C
A genomics approach was used to identify nutritionally regulated genes involved in growth of fast skeletal muscle in Atlantic salmon (Salmo salar L.). Forward and reverse subtractive cDNA libraries were prepared comparing fish with zero growth rates to fish growing rapidly. We produced 7,420 ESTs and assembled them into nonredundant clusters prior to annotation. Contigs representing 40 potentially unrecognized nutritionally responsive candidate genes were identified. Twenty-three of the subtractive library candidates were also differentially regulated by nutritional state in an independent fasting-refeeding experiment and their expression placed in the context of 26 genes with established roles in muscle growth regulation. The expression of these genes was also determined during the maturation of a primary myocyte culture, identifying 13 candidates from the subtractive cDNA libraries with putative roles in the myogenic program. During early stages of refeeding DNAJA4, HSPA1B, HSP90A, and CHAC1 expression increased, indicating activation of unfolded protein response pathways. Four genes were considered inhibitory to myogenesis based on their in vivo and in vitro expression profiles (CEBPD, ASB2, HSP30, novel transcript GE623928). Other genes showed increased expression with feeding and highest in vitro expression during the proliferative phase of the culture (FOXD1, DRG1) or as cells differentiated (SMYD1, RTN1, MID1IP1, HSP90A, novel transcript GE617747). The genes identified were associated with chromatin modification (SMYD1, RTN1), microtubule stabilization (MID1IP1), cell cycle regulation (FOXD1, CEBPD, DRG1), and negative regulation of signaling (ASB2) and may play a role in the stimulation of myogenesis during the transition from a catabolic to anabolic state in skeletal muscle.
Johnston, Ian A.
High-throughput short-read sequencing of exomes and whole cancer genomes in multiple human hepatocellular carcinoma (HCC) cohorts confirmed previously identified frequently mutated somatic genes, such as TP53, CTNNB1 and AXIN1, and identified several novel genes with moderate mutation frequencies, including ARID1A, ARID2, MLL, MLL2, MLL3, MLL4, IRF2, ATM, CDKN2A, FGF19, PIK3CA, RPS6KA3, JAK1, KEAP1, NFE2L2, C16orf62, LEPR, RAC2, and IL6ST. Functional classification of these mutated genes suggested that alterations in pathways participating in chromatin remodeling, Wnt/β-catenin signaling, JAK/STAT signaling, and oxidative stress play critical roles in HCC tumorigenesis. Nevertheless, because there are few druggable genes used in HCC therapy, the identification of new therapeutic targets through integrated genomic approaches remains an important task. Because a large amount of HCC genomic data genotyped by high density single nucleotide polymorphism arrays is deposited in the public domain, copy number alteration (CNA) analyses of these arrays is a cost-effective way to reveal target genes through profiling of recurrent and overlapping amplicons, homozygous deletions and potentially unbalanced chromosomal translocations accumulated during HCC progression. Moreover, integration of CNAs with other high-throughput genomic data, such as aberrantly coding transcriptomes and non-coding gene expression in human HCC tissues and rodent HCC models, provides lines of evidence that can be used to facilitate the identification of novel HCC target genes with the potential of improving the survival of HCC patients. PMID:24379610
Gu, De-Leung; Chen, Yen-Hsieh; Shih, Jou-Ho; Lin, Chi-Hung; Jou, Yuh-Shan; Chen, Chian-Feng
To pursue a systematic approach to discovery of mechanisms of action of traditional Chinese medicine (TCM), we used microarrays, bioinformatics and the “Connectivity Map” (CMAP) to examine TCM-induced changes in gene expression. We demonstrated that this approach can be used to elucidate new molecular targets using a model TCM herbal formula Si-Wu-Tang (SWT) which is widely used for women's health. The human breast cancer MCF-7 cells treated with 0.1 µM estradiol or 2.56 mg/ml of SWT showed dramatic gene expression changes, while no significant change was detected for ferulic acid, a known bioactive compound of SWT. Pathway analysis using differentially expressed genes related to the treatment effect identified that expression of genes in the nuclear factor erythroid 2-related factor 2 (Nrf2) cytoprotective pathway was most significantly affected by SWT, but not by estradiol or ferulic acid. The Nrf2-regulated genes HMOX1, GCLC, GCLM, SLC7A11 and NQO1 were upregulated by SWT in a dose-dependent manner, which was validated by real-time RT-PCR. Consistently, treatment with SWT and its four herbal ingredients resulted in an increased antioxidant response element (ARE)-luciferase reporter activity in MCF-7 and HEK293 cells. Furthermore, the gene expression profile of differentially expressed genes related to SWT treatment was used to compare with those of 1,309 compounds in the CMAP database. The CMAP profiles of estradiol-treated MCF-7 cells showed an excellent match with SWT treatment, consistent with SWT's widely claimed use for women's diseases and indicating a phytoestrogenic effect. The CMAP profiles of chemopreventive agents withaferin A and resveratrol also showed high similarity to the profiles of SWT.
This study identified SWT as an Nrf2 activator and phytoestrogen, suggesting its use as a nontoxic chemopreventive agent, and demonstrated the feasibility of combining microarray gene expression profiling with CMAP mining to discover mechanisms of actions and to identify new health benefits of TCMs.
Wen, Zhining; Wang, Zhijun; Wang, Steven; Ravula, Ranadheer; Yang, Lun; Xu, Jun; Wang, Charles; Zuo, Zhong; Chow, Moses S. S.; Shi, Leming; Huang, Ying
Background MicroRNAs (miRNAs) are small non-coding RNAs which play a key role in the post-transcriptional regulation of many genes. Elucidating miRNA-regulated gene networks is crucial for the understanding of mechanisms and functions of miRNAs in many biological processes, such as cell proliferation, development, differentiation and cell homeostasis, as well as in many types of human tumors. To this aim, we have recently presented the biclustering method HOCCLUS2, for the discovery of miRNA regulatory networks. Experiments on predicted interactions revealed that the statistical and biological consistency of the obtained networks is negatively affected by the poor reliability of the output of miRNA target prediction algorithms. Recently, some learning approaches have been proposed to learn to combine the outputs of distinct prediction algorithms and improve their accuracy. However, the application of classical supervised learning algorithms presents two challenges: i) the presence of only positive examples in datasets of experimentally verified interactions and ii) unbalanced number of labeled and unlabeled examples. Results We present a learning algorithm that learns to combine the score returned by several prediction algorithms, by exploiting information conveyed by (only positively labeled/) validated and unlabeled examples of interactions. To face the two related challenges, we resort to a semi-supervised ensemble learning setting. Results obtained using miRTarBase as the set of labeled (positive) interactions and mirDIP as the set of unlabeled interactions show a significant improvement, over competitive approaches, in the quality of the predictions. This solution also improves the effectiveness of HOCCLUS2 in discovering biologically realistic miRNA:mRNA regulatory networks from large-scale prediction data. 
Using the miR-17-92 gene cluster family as a reference system and comparing results with previous experiments, we find a large increase in the number of significantly enriched biclusters in pathways, consistent with miR-17-92 functions. Conclusion The proposed approach proves to be fundamental for the computational discovery of miRNA regulatory networks from large-scale predictions. This paves the way to the systematic application of HOCCLUS2 for a comprehensive reconstruction of all the possible multiple interactions established by miRNAs in regulating the expression of gene networks, which would be otherwise impossible to reconstruct by considering only experimentally validated interactions.
Dozens of common genetic variants associated with cancer risk have been identified through genome-wide association studies (GWASs). However, these variants only explain a modest fraction of the heritability of disease. The missing heritability has been attributed to several factors, among them the existence of genetic interactions (G × G). Systematic screens for G × G in model organisms have revealed their fundamental influence in complex phenotypes. In this scenario, G × G overlap significantly with other types of gene and/or protein relationships. Here, by integrating predicted G × G from GWAS data and complex- and context-defined gene coexpression profiles, we provide evidence for G × G associated with cancer risk. G × G predicted from a breast cancer GWAS dataset identified significant overlaps [relative enrichments (REs) of 8-36%, empirical P values < 0.05 to 10⁻⁴] with complex (non-linear) gene coexpression in breast tumors. The use of gene or protein data not specific for breast cancer did not reveal overlaps. According to the predicted G × G, experimental assays demonstrated functional interplay between lipoma-preferred partner and transforming growth factor-β signaling in the MCF10A non-tumorigenic mammary epithelial cell model. Next, integration of pancreatic tumor gene expression profiles with pancreatic cancer G × G predicted from a GWAS corroborated the observations made for breast cancer risk (REs of 25-59%). The method presented here can potentially support the identification of genetic interactions associated with cancer risk, providing novel mechanistic hypotheses for carcinogenesis. PMID:24296589
Bonifaci, Núria; Colas, Eva; Serra-Musach, Jordi; Karbalai, Nazanin; Brunet, Joan; Gómez, Antonio; Esteller, Manel; Fernández-Taboada, Enrique; Berenguer, Antoni; Reventós, Jaume; Müller-Myhsok, Bertram; Amundadottir, Laufey; Duell, Eric J; Pujana, Miquel Àngel
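The abstract above reports overlaps as relative enrichments with empirical P values but does not define either quantity. The sketch below shows one common way such figures are computed from a set of permuted (null) overlap counts. Both definitions are assumptions rather than the authors' method: relative enrichment is taken here as (observed - expected) / expected, with the expectation being the mean of the null counts, and the empirical P value uses the usual add-one correction.

```python
def relative_enrichment(observed, null_counts):
    """Assumed definition: excess of the observed overlap over the
    mean of the null overlaps, as a fraction of that mean."""
    expected = sum(null_counts) / len(null_counts)
    return (observed - expected) / expected


def empirical_p(observed, null_counts):
    """One-sided empirical P value with add-one correction:
    (number of null values >= observed, plus 1) / (permutations + 1)."""
    at_least = sum(1 for value in null_counts if value >= observed)
    return (at_least + 1) / (len(null_counts) + 1)


# Hypothetical overlap counts: one observed value and four permutations.
null_counts = [8, 10, 12, 10]
observed = 12
print(relative_enrichment(observed, null_counts))
print(empirical_p(observed, null_counts))
```

With the add-one correction the smallest attainable P value is 1 / (permutations + 1), so a reported value near 10⁻⁴ would imply on the order of ten thousand permutations under this scheme.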
Purpose To identify a prognostic gene signature for HPV-negative OSCC patients. Experimental Design Two gene expression datasets were used; a training dataset from the Fred Hutchinson Cancer Research Center (FHCRC) (n=97), and a validation dataset from the MD Anderson Cancer Center (MDACC) (n=71). We applied L1/L2-penalized Cox regression models to the FHCRC data on the 131–gene signature previously identified to be prognostic in OSCC patients to identify a prognostic model specific for high-risk HPV-negative OSCC patients. The models were tested with the MDACC dataset using a receiver operating characteristic analysis. Results A 13-gene model was identified as the best predictor of HPV-negative OSCC-specific survival in the training dataset. The risk score for each patient in the validation dataset was calculated from this model and dichotomized at the median. The estimated 2-year mortality (± SE) of patients with high risk scores was 47.1 (±9.24)% compared with 6.35 (± 4.42)% for patients with low risk scores. ROC analyses showed that the areas under the curve for the age, gender, and treatment modality-adjusted models with risk score (0.78, 95%CI: 0.74-0.86) and risk score plus tumor stage (0.79, 95%CI: 0.75-0.87) were substantially higher than for the model with tumor stage (0.54, 95%CI: 0.48-0.62). Conclusions We identified and validated a 13-gene signature that is considerably better than tumor stage in predicting survival of HPV-negative OSCC patients. Further evaluation of this gene signature as a prognostic marker in other populations of patients with HPV-negative OSCC is warranted.
Lohavanichbutr, Pawadee; Mendez, Eduardo; Holsinger, F. Christopher; Rue, Tessa C.; Zhang, Yuzheng; Houck, John; Upton, Melissa P.; Futran, Neal; Schwartz, Stephen M.; Wang, Pei; Chen, Chu
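The validation step described above computes a risk score per patient from the fitted model and dichotomizes the scores at the median. A minimal sketch of that step follows, assuming the risk score is the usual Cox linear predictor (the sum of coefficient times expression over the signature genes). The gene names, coefficients, and expression values are hypothetical and are not taken from the 13-gene model.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Linear predictor per patient: sum of coefficient * expression
    over the genes in the model."""
    return {
        patient: sum(coefficients[g] * values[g] for g in coefficients)
        for patient, values in expression.items()
    }


def dichotomize_at_median(scores):
    """Label each patient 'high' if the score exceeds the cohort median,
    otherwise 'low'."""
    cut = median(scores.values())
    return {p: ("high" if s > cut else "low") for p, s in scores.items()}


# Hypothetical two-gene model and four patients.
coefficients = {"geneA": 0.5, "geneB": -1.0}
expression = {
    "p1": {"geneA": 2.0, "geneB": 1.0},
    "p2": {"geneA": 4.0, "geneB": 0.0},
    "p3": {"geneA": 0.0, "geneB": 2.0},
    "p4": {"geneA": 6.0, "geneB": 0.0},
}
scores = risk_scores(expression, coefficients)
print(dichotomize_at_median(scores))
```

In a real validation the coefficients would be fixed from the training cohort and only the expression values would come from the validation cohort, so that the median split is the only quantity estimated on the validation data.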
Genome sequencing of Streptomyces, myxobacteria, and fungi showed that although each strain contains genes that encode the enzymes to synthesize a plethora of potential secondary metabolites, only a fraction are expressed during fermentation. Interest has therefore grown in the activation of these cryptic pathways. We review current progress on this topic, describing concepts for activating silent genes, utilization of "natural" mutant-type RNA polymerases and rare earth elements, and the applicability of ribosome engineering to myxobacteria and fungi, the microbial groups known as excellent searching sources, as well as actinomycetes, for secondary metabolites. PMID:23143535
Ochi, Kozo; Hosaka, Takeshi
Understanding physiological control of osteoblast differentiation necessitates characterization of the regulatory signals that initiate the events directing a cell to lineage commitment and establishing competency for bone formation. The bone morphogenetic protein, BMP-2, a member of the TGFbeta superfamily, induces osteoblast differentiation and functions through the Smad signal transduction pathway during in vivo bone formation. However, the molecular targets of BMP-mediated gene transcription during the process of osteoblast differentiation have not been comprehensively identified. In the present study, BMP-2 responsive factors involved in the early stages of commitment and differentiation to the osteoblast phenotype were analyzed by microarray gene expression profiling in samples ranging from 1 to 24 h following BMP-2 dependent differentiation of C2C12 premyoblasts into the osteogenic lineage. A total of 1,800 genes were responsive to BMP-2 and expression was modulated from 3- to 14-fold for less than 100 genes during the time course. Approximately 50% of these 100 genes are either up- or downregulated. Major events associated with phenotypic changes towards the osteogenic lineage were identified from hierarchical and functional clustering analyses. BMP-2 immediately responsive genes (1-4 h), which exhibited either transient or sustained expression, reflect activation and repression of non-osseous BMP-2 developmental systems. This initial response was followed by waves of expression of nuclear proteins and developmental regulatory factors including inhibitors of DNA binding, Runx2, C/EBP, Zn finger binding proteins, forkhead, and numerous homeobox proteins (e.g., CDP/cut, paired, distaless, Hox) which are expressed at characterized stages during osteoblast differentiation. A sequential profile of genes mediating changes in cell morphology, cell growth, and basement membrane formation is observed as a secondary transient early response (2-8 h). 
Commitment to the osteogenic phenotype is recognized by 8 h, reflected by downregulation of most myogenic-related genes and induction of a spectrum of signaling proteins and enzymes facilitating synthesis and assembly of an extracellular skeletal environment. These genes included collagens Type I and VI and the small leucine rich repeat family of proteoglycans (e.g., decorin, biglycan, osteomodulin, fibromodulin, and osteoadherin/osteoglycin) that reached peak expression at 24 h. With extracellular matrix development, the bone phenotype was further established from 16 to 24 h by induction of genes for cell adhesion and communication and enzymes that organize the bone ECM. Our microarray analysis resulted in the discovery of a class of genes, initially described in relation to differentiation of astrocytes and oligodendrocytes that are functionally coupled to signals for cellular extensions. They include nexin, neuropilin, latexin, neuroglian, neuron specific gene 1, and Ulip; suggesting novel roles for these genes in the bone microenvironment. This global analysis identified a multistage molecular and cellular cascade that supports BMP-2-mediated osteoblast differentiation. PMID:12704803
Balint, Eva; Lapointe, David; Drissi, Hicham; van der Meijden, Caroline; Young, Daniel W; van Wijnen, Andre J; Stein, Janet L; Stein, Gary S; Lian, Jane B
BACKGROUND: Molecular alterations critical to development of cancer include mutations, copy number alterations (amplifications and deletions) as well as genomic rearrangements resulting in gene fusions. Massively parallel next generation sequencing, which enables the discovery of such changes, uses considerable quantities of genomic DNA (> 5 ug), a serious limitation in ever smaller clinical samples. However, a commonly available microarray platforms
Ewa Przybytkowski; Cristiano Ferrario; Mark Basik
Background Chickpea (Cicer arietinum L.), an important grain legume crop of the world, is seriously challenged by terminal drought and salinity stresses. However, only a very limited number of molecular markers and candidate genes are available for undertaking molecular breeding in chickpea to tackle these stresses. This study reports generation and analysis of a comprehensive resource of drought- and salinity-responsive expressed sequence tags (ESTs) and gene-based markers. Results A total of 20,162 (18,435 high quality) drought- and salinity-responsive ESTs were generated from ten different root tissue cDNA libraries of chickpea. Sequence editing, clustering and assembly analysis resulted in 6,404 unigenes (1,590 contigs and 4,814 singletons). Functional annotation of unigenes based on BLASTX analysis showed that 46.3% (2,965) had significant similarity (≤1E-05) to sequences in the non-redundant UniProt database. BLASTN analysis of unique sequences with ESTs of four legume species (Medicago, Lotus, soybean and groundnut) and three model plant species (rice, Arabidopsis and poplar) provided insights on conserved genes across legumes as well as novel transcripts for chickpea. Of 2,965 (46.3%) significant unigenes, only 2,071 (32.3%) unigenes could be functionally categorised according to Gene Ontology (GO) descriptions. A total of 2,029 sequences containing 3,728 simple sequence repeats (SSRs) were identified and 177 new EST-SSR markers were developed. Experimental validation of a set of 77 SSR markers on 24 genotypes revealed 230 alleles with an average of 4.6 alleles per marker and average polymorphism information content (PIC) value of 0.43. Besides SSR markers, 21,405 high confidence single nucleotide polymorphisms (SNPs) in 742 contigs (with ≥ 5 ESTs) were also identified. Recognition sites for restriction enzymes were identified for 7,884 SNPs in 240 contigs.
Hierarchical clustering of 105 selected contigs provided clues about stress-responsive candidate genes and their expression profile showed predominance in specific stress-challenged libraries. Conclusion The generated set of chickpea ESTs serves as a resource of high quality transcripts for gene discovery and development of functional markers associated with abiotic stress tolerance that will be helpful to facilitate chickpea breeding. Mapping of gene-based markers in chickpea will also add more anchoring points to align genomes of chickpea and other legume species.
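The polymorphism information content (PIC) reported for the SSR markers above is conventionally computed per marker from allele frequencies. The abstract does not state which formula was used; the sketch below assumes the widely used definition of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², and the allele frequencies are hypothetical.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content of one marker, given the
    frequencies of its alleles (which must sum to 1)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Sum of squared frequencies (expected homozygosity).
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction term over every unordered pair of distinct alleles.
    pair_term = sum(
        2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2)
    )
    return 1.0 - homozygosity - pair_term


# Hypothetical marker with two equally frequent alleles.
print(pic([0.5, 0.5]))
# A monomorphic marker carries no information.
print(pic([1.0]))
```

An average PIC such as the one quoted in the abstract would then be the mean of this per-marker value across the markers scored.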
Over the past 14 years, researchers in the Clinical Genetics Branch (CGB), led by Branch Chief Sharon Savage, M.D., have carried out a study of dyskeratosis congenita (DC) at the NIH Clinical Center to better understand the disorder and to identify the genes responsible for it.
The characterization of transcriptional networks (TNs) is essential for understanding complex biological phenomena such as development, disease, and evolution. In this study, we have designed and implemented a procedure that combines in silico target screens with zebrafish and mouse validation, in order to identify cis-elements and genes directly regulated by Pax6. We chose Pax6 as the paradigm because of its crucial roles in organogenesis and human disease. We identified over 600 putative Pax6 binding sites and more than 200 predicted direct target genes, conserved in evolution from zebrafish to human and to mouse. This was accomplished using hidden Markov models (HMMs) generated from experimentally validated Pax6 binding sites. A small sample of genes, expressed in the neural lineage, was chosen from the predictions for RNA in situ validation using zebrafish and mouse models. Validation of DNA binding to some predicted cis-elements was also carried out using chromatin immunoprecipitation (ChIP) and zebrafish reporter transgenic studies. The results show that this combined procedure is a highly efficient tool to investigate the architecture of TNs and constitutes a useful complementary resource to ChIP and expression data sets because of its inherent spatiotemporal independence. We have identified several novel direct targets, including some putative disease genes, among them Foxp2; these will allow further dissection of Pax6 function in development and disease.
Coutinho, Pedro; Pavlou, Sofia; Bhatia, Shipra; Chalmers, Kevin J.; Kleinjan, Dirk A.; van Heyningen, Veronica
Transporters are essential in homeostatic exchange of endogenous and exogenous substances at the systematic, organic, cellular, and subcellular levels. Gene mutations of transporters are often related to pharmacogenetic traits. Recent developments in high-throughput technologies in genomics, transcriptomics and proteomics allow in-depth studies of transporter genes in normal cellular processes and diverse disease conditions. The flood of high-throughput data has resulted in an urgent need for an updated knowledgebase with curated, organized, and annotated human transporters in an easily accessible way. Using a pipeline with the combination of automated keywords query, sequence similarity search and manual curation on transporters, we collected 1,555 human non-redundant transporter genes to develop the Human Transporter Database (HTD) (http://htd.cbi.pku.edu.cn). Based on the extensive annotations, global properties of the transporter genes were illustrated, such as expression patterns and polymorphisms in relationships with their ligands. We noted that the human transporters were enriched in many fundamental biological processes such as oxidative phosphorylation and cardiac muscle contraction, and significantly associated with Mendelian and complex diseases such as epilepsy and sudden infant death syndrome. Overall, HTD provides a well-organized interface to facilitate research communities to search detailed molecular and genetic information of transporters for development of personalized medicine. PMID:24558441
Ye, Adam Y; Liu, Qing-Rong; Li, Chuan-Yun; Zhao, Min; Qu, Hong
Soybean production in South and North America has recently been threatened by the widespread dissemination of soybean rust (SBR) caused by the fungus Phakopsora pachyrhizi. Currently, chemical spray containing fungicides is the only effective method to control the disease. This strategy increases production costs and exposes the environment to higher levels of fungicides. As a first step towards the development of SBR resistant cultivars, we studied the genetic basis of SBR resistance in five F2 populations derived from crossing the Brazilian-adapted susceptible cultivar CD 208 to each of five different plant introductions (PI 200487, PI 200526, PI 230970, PI 459025, PI 471904) carrying SBR-resistant genes (Rpp). Molecular mapping of SBR-resistance genes was performed in three of these PIs (PI 459025, PI 200526, PI 471904), and also in two other PIs (PI 200456 and 224270). The strategy mapped two genes present in PI 230970 and PI 459025, the original sources of Rpp2 and Rpp4, to linkage groups (LG) J and G, respectively. A new SBR resistance locus, rpp5 was mapped in the LG-N. Together, the genetic and molecular analysis suggested multiple alleles or closely linked genes that govern SBR resistance in soybean. PMID:18506417
Garcia, Alexandre; Calvo, Eberson Sanches; de Souza Kiihl, Romeu Afonso; Harada, Arlindo; Hiromoto, Dario Minoru; Vieira, Luiz Gonzaga Esteves
Candida albicans is the most common human fungal pathogen, causing infections that can be lethal in immunocompromised patients. Although Saccharomyces cerevisiae has been used as a model for C. albicans, it lacks C. albicans' diverse morphogenic forms and is primarily non-pathogenic. Comprehensive genetic analyses that have been instrumental for determining gene function in S. cerevisiae are hampered in C. albicans,
Julia Oh; Eula Fung; Ulrich Schlecht; Ronald W. Davis; Guri Giaever; Robert P. St. Onge; Adam Deutschbauer; Corey Nislow
Background Cnidarians, including corals and anemones, offer unique insights into metazoan evolution because they harbor genetic similarities with vertebrates beyond that found in model invertebrates and retain genes known only from non-metazoans. Cataloging genes expressed in Acropora palmata, a foundation species of reefs in the Caribbean and western Atlantic, will advance our understanding of the genetic basis of ecologically important traits in corals and comes at a time when sequencing efforts in other cnidarians allow for multi-species comparisons. Results A cDNA library from a sample enriched for symbiont-free larval tissue was sequenced on the 454 GS-FLX platform. Over 960,000 reads were obtained and assembled into 42,630 contigs. Annotation data were acquired for 57% of the assembled sequences. Analysis of the assembled sequences indicated that 83–100% of all A. palmata transcripts were tagged, and provided a rough estimate of the total number of genes expressed in our samples (~18,000–20,000). The coral annotation data contained many of the same molecular components as in the Bilateria, particularly in pathways associated with oxidative stress and DNA damage repair, and provided evidence that homologs of p53, a key player in DNA repair pathways, have experienced selection along the branch separating Cnidaria and Bilateria. Transcriptome-wide screens of paralog groups and transition/transversion ratios highlighted genes including green fluorescent proteins, carbonic anhydrase, and oxidative stress proteins, and functional groups involved in protein and nucleic acid metabolism and the formation of structural molecules. These results provide a starting point for the study of adaptive evolution in corals. Conclusions Currently available transcriptome data now make comparative studies of the mechanisms underlying coral's evolutionary success possible.
Here we identified candidate genes that enable corals to maintain genomic integrity despite considerable exposure to genotoxic stress over long life spans, and showed conservation of important physiological pathways between corals and bilaterians.
Polato, Nicholas R.; Vera, J. Cristobal; Baums, Iliana B.
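The transition/transversion screen mentioned in the abstract above can be illustrated with a small sketch. It is a minimal example and not the authors' pipeline: it assumes two already aligned sequences of equal length, and the function name `ts_tv_counts` is hypothetical. A transition swaps a purine for a purine (A/G) or a pyrimidine for a pyrimidine (C/T); every other substitution is a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def ts_tv_counts(seq_a, seq_b):
    """Count transitions and transversions between two aligned sequences.

    Hypothetical helper for illustration; positions with gaps or
    ambiguous bases are skipped rather than counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = 0
    transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical site, gap, or ambiguity code
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    return transitions, transversions


if __name__ == "__main__":
    ts, tv = ts_tv_counts("ACGTACGT", "GCGTATGA")
    print(ts, tv)
```

The ratio is then `ts / tv` when `tv` is non-zero; how the ratio was normalized or tested in the study is not stated in the abstract.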
The characterization of molecular alterations specific to cancer facilitates the discovery of predictive and prognostic biomarkers important to targeted therapeutics. Alterations critical to cancer therapeutics include copy number alterations (CNAs) such as gene amplifications and deletions as well as genomic rearrangements resulting in gene fusions. There are two genome-wide technologies used to detect CNAs: next generation sequencing (NGS) and dense microarray based comparative genomic hybridization (aCGH). Array CGH is a mature, robust technology that costs less and is more accessible than NGS. This chapter describes the protocol steps and analysis required to obtain reliable aCGH results from clinical samples. Technical options and various necessary compromises related to the nature of clinical material are considered and the consequences of these choices for data analysis and interpretation are discussed. The chapter includes a brief description of the data analysis, even though analysis is often performed by bioinformaticians. Today’s cancer research requires collaboration of clinicians, molecular biologists and mathematicians. Acquaintance with the basic principles related to the extraction of the data from arrays, its normalization and the algorithms available for analysis provides a baseline for mutual understanding and communication.
Przybytkowski, Ewa; Aguilar-Mahecha, Adrianan; Nabavi, Sheida; Tonellato, Peter J.; Basik, Mark
Human brain connectivity is disrupted in a wide range of disorders - from Alzheimer's disease to autism - but little is known about which specific genes affect it. Here we conducted a genome-wide association for connectivity matrices that capture information on the density of fiber connections between 70 brain regions. We scanned a large twin cohort (N=366) with 4-Tesla high angular resolution diffusion imaging (105-gradient HARDI). Using whole brain HARDI tractography, we extracted a relatively sparse 70×70 matrix representing fiber density between all pairs of cortical regions automatically labeled in co-registered anatomical scans. Additive genetic factors accounted for 1-58% of the variance in connectivity between 90 (of 122) tested nodes. We discovered genome-wide significant associations between variants and connectivity. GWAS permutations at various levels of heritability, and split-sample replication, validated our genetic findings. The resulting genes may offer new leads for mechanisms influencing aberrant connectivity and neurodegeneration. PMID:22903411
Jahanshad, Neda; Hibar, Derrek P; Ryles, April; Toga, Arthur W; McMahon, Katie L; de Zubicaray, Greig I; Hansell, Narelle K; Montgomery, Grant W; Martin, Nicholas G; Wright, Margaret J; Thompson, Paul M
In the past decade, there have been fundamental advances in our understanding of genetic factors that contribute to the inflammatory bowel diseases (IBDs) Crohn’s disease and ulcerative colitis. The latest international collaborative studies have brought the number of IBD susceptibility gene loci to 163. However, genetic factors account for only a portion of overall disease variance, indicating a need to better explore gene-environment interactions in the development of IBD. Epigenetic factors can mediate interactions between the environment and the genome; their study could provide new insight into the pathogenesis of IBD. We review recent progress in identification of genetic factors associated with IBD and discuss epigenetic mechanisms that could affect development and progression of IBD.
Ventham, Nicholas T.; Kennedy, Nicholas A.; Nimmo, Elaine R.; Satsangi, Jack
In the present study, allele mining was conducted on a panel of drought related candidate genes in a set of 96 barley genotypes using EcoTILLING, which is a variant of the targeting induced local lesions in genomes (TILLING) technology. Analyzing approximately 1.5 million basepairs in barley a total number of 94 verified unique haplotypes were identified in 18 amplicons designed
András Cseri; Mátyás Cserháti; Maria von Korff; Bettina Nagy; Gábor V. Horváth; András Palágyi; János Pauk; Dénes Dudits; Ottó Törjék
The success of the Asian bivalve Limnoperna fortunei as an invader in South America is related to its high acclimation capability. It can inhabit waters with a wide range of temperatures and salinity and handle long-term periods of air exposure. We describe the transcriptome of L. fortunei aiming to give a first insight into the phenotypic plasticity that allows non-native taxa to become established and widespread. We sequenced 95,219 reads from five main tissues of the mussel L. fortunei using Roche’s 454 and assembled them to form a set of 84,063 unigenes (contigs and singletons) representing partial or complete gene sequences. We annotated 24,816 unigenes using a BLAST sequence similarity search against a NCBI nr database. Unigenes were divided into 20 eggNOG functional categories and 292 KEGG metabolic pathways. From the total unigenes, 1,351 represented putative full-length genes of which 73.2% were functionally annotated. We described the first partial and complete gene sequences in order to start understanding bivalve invasiveness. An expansion of the hsp70 gene family, seen also in other bivalves, is present in L. fortunei and could be involved in its adaptation to extreme environments, e.g. during intertidal periods. The presence of toll-like receptors gives a first insight into an immune system that could be more complex than previously assumed and may be involved in the prevention of disease and extinction when population densities are high. Finally, the apparent lack of special adaptations to extremely low O2 levels is a target worth pursuing for the development of a molecular control approach.
Uliano-Silva, Marcela; Americo, Juliana Alves; Brindeiro, Rodrigo; Dondero, Francesco; Prosdocimi, Francisco; de Freitas Rebelo, Mauro
Background A key step in the regulation of gene expression is the sequence-specific binding of transcription factors (TFs) to their DNA recognition sites. However, elucidating TF binding site (TFBS) motifs in higher eukaryotes has been challenging, even when employing cross-species sequence conservation. We hypothesized that for human and mouse, many orthologous genes expressed in a similarly tissue-specific manner in both human and mouse gene expression data, are likely to be co-regulated by orthologous TFs that bind to DNA sequence motifs present within noncoding sequence conserved between these genomes. Results We performed automated motif searching and merging across four different motif finding algorithms, followed by filtering of the resulting motifs for those that contain blocks of information content. Applying this motif finding strategy to conserved noncoding regions surrounding co-expressed tissue-specific human genes allowed us to discover both previously known, and many novel candidate, regulatory DNA motifs in all 18 tissue-specific expression clusters that we examined. For previously known TFBS motifs, we observed that if a TF was expressed in the specified tissue of interest, then in most cases we identified a motif that matched its TRANSFAC motif; conversely, of all those discovered motifs that matched TRANSFAC motifs, most of the corresponding TF transcripts were expressed in the tissue(s) corresponding to the expression cluster for which the motif was found. Conclusion Our results indicate that the integration of the results from multiple motif finding tools identifies and ranks highly more known and novel motifs than does the use of just one of these tools. In addition, we believe that our simultaneous enrichment strategies helped to identify likely human cis regulatory elements. A number of the discovered motifs may correspond to novel binding site motifs for as yet uncharacterized tissue-specific TFs. 
We expect this strategy to be useful for identifying motifs in other metazoan genomes.
Huber, Bertrand R; Bulyk, Martha L
Saxitoxin is a potent neurotoxin that occurs in aquatic environments worldwide. Ingestion of vector species can lead to paralytic shellfish poisoning, a severe human illness that may lead to paralysis and death. In freshwaters, the toxin is produced by prokaryotic cyanobacteria; in marine waters, it is associated with eukaryotic dinoflagellates. However, several studies suggest that saxitoxin is not produced by dinoflagellates themselves, but by co-cultured bacteria. Here, we show that genes required for saxitoxin synthesis are encoded in the nuclear genomes of dinoflagellates. We sequenced >1.2×10^6 mRNA transcripts from the two saxitoxin-producing dinoflagellate strains Alexandrium fundyense CCMP1719 and A. minutum CCMP113 using high-throughput sequencing technology. In addition, we used in silico transcriptome analyses, RACE, qPCR and conventional PCR coupled with Sanger sequencing. These approaches successfully identified genes required for saxitoxin synthesis in the two transcriptomes. We focused on sxtA, the unique starting gene of saxitoxin synthesis, and show that the dinoflagellate transcripts of sxtA have the same domain structure as the cyanobacterial sxtA genes. But, in contrast to the bacterial homologs, the dinoflagellate transcripts are monocistronic, have a higher GC content, occur in multiple copies, and contain typical dinoflagellate spliced-leader sequences and eukaryotic polyA-tails. Further, we investigated 28 saxitoxin-producing and non-producing dinoflagellate strains from six different genera for the presence of genomic sxtA homologs. Our results show very good agreement between the presence of sxtA and saxitoxin synthesis, except in three strains of A. tamarense, for which we amplified sxtA, but did not detect the toxin. Our work opens up possibilities to develop molecular tools to detect saxitoxin-producing dinoflagellates in the environment. PMID:21625593
Stüken, Anke; Orr, Russell J S; Kellmann, Ralf; Murray, Shauna A; Neilan, Brett A; Jakobsen, Kjetill S
Juvenile Batten disease is a neurodegenerative disease caused by accelerated apoptotic death of photoreceptors and neurons attributable to defects in the CLN3 gene. CLN3 is antiapoptotic when overexpressed in NT2 neuronal precursor cells. CLN3 negatively modulates endogenous ceramide levels in NT2 cells and acts upstream of ceramide generation. Because defects in regulation of apoptosis are involved in the development
Svetlana N. Rylova; Andrea Amalfitano; Dixie-Ann Persaud-Sawin; Wei-Xing Guo; Jerry Chang; Paul J. Jansen; Alan D. Proia; Rose-Mary Boustany
Background Rapid improvements in the development of new sequencing technologies have led to the availability of genome sequences of more than 300 organisms today. Thanks to bioinformatic analyses, prediction of gene models and protein-coding transcripts has become feasible. Various reverse and forward genetics strategies have been followed to determine the functions of these gene models and regulatory sequences. Using T-DNA or transposons as tags, significant progress has been made by using "Knock-in" approaches ("gain-of-function" or "activation tagging") in different plant species but not in perennial plants species, e.g. long-lived trees. Here, large scale gene tagging resources are still lacking. Results We describe the first application of an inducible transposon-based activation tagging system for a perennial plant species, as example a poplar hybrid (P. tremula L. × P. tremuloides Michx.). Four activation-tagged populations comprising a total of 12,083 individuals derived from 23 independent "Activation Tagging Ds" (ATDs) transgenic lines were produced and phenotyped. To date, 29 putative variants have been isolated and new ATDs genomic positions were successfully determined for 24 of those. Sequences obtained were blasted against the publicly available genome sequence of P. trichocarpa v2.0 (Phytozome v7.0; http://www.phytozome.net/poplar) revealing possible transcripts for 17 variants. In a second approach, 300 randomly selected individuals without any obvious phenotypic alterations were screened for ATDs excision. For one third of those transposition of ATDs was confirmed and in about 5% of these cases genes were tagged. Conclusions The novel strategy of first genotyping and then phenotyping a tagging population as proposed here is, in particular, applicable for long-lived, difficult to transform plant species. We could demonstrate the power of the ATDs transposon approach and the simplicity to induce ATDs transposition in vitro. 
Since a transposon is able to cross chromosomal boundaries, only very few primary transposon-carrying transgenic lines are required to establish large transposon tagging populations. In contrast to T-DNA-based activation tagging, which is hampered by low transformation efficiency and is time-consuming, this approach makes it feasible, for the first time, to one day tag every gene within a perennial plant genome, similarly to Arabidopsis.
Many traits of biological and agronomic significance in plants are controlled in a complex manner where multiple genes and environmental signals affect the expression of the phenotype. In Oryza sativa (rice), thousands of quantitative genetic signals have been mapped to the rice genome. In parallel, thousands of gene expression profiles have been generated across many experimental conditions. Through the discovery of networks with real gene co-expression relationships, it is possible to identify co-localized genetic and gene expression signals that implicate complex genotype-phenotype relationships. In this work, we used a knowledge-independent, systems genetics approach to discover a high-quality set of co-expression networks, termed Gene Interaction Layers (GILs). Twenty-two GILs were constructed from 1,306 Affymetrix microarray rice expression profiles that were pre-clustered to allow for improved capture of gene co-expression relationships. Functional genomic and genetic data, including over 8,000 QTLs and 766 phenotype-tagged SNPs (p-value ≤ 0.001) from genome-wide association studies, both covering over 230 different rice traits, were integrated with the GILs. An online systems genetics data-mining resource, the GeneNet Engine, was constructed to enable dynamic discovery of gene sets (i.e. network modules) that overlap with genetic traits. GeneNet Engine does not provide the exact set of genes underlying a given complex trait, but through the evidence of gene-marker correspondence, co-expression, and functional enrichment, site visitors can identify genes with potential shared causality for a trait, which could then be used for experimental validation. A set of 2 million SNPs was incorporated into the database and serves as a potential set of testable biomarkers for genes in modules that overlap with genetic traits.
Herein, we describe two modules found using GeneNet Engine, one with significant overlap with the trait amylose content and another with significant overlap with blast disease resistance.
Ficklin, Stephen P.; Feltus, Frank Alex
Background Paspalum dilatatum Poir. (common name dallisgrass) is a native grass species of South America, with special relevance to dairy and red meat production. P. dilatatum exhibits higher forage quality than other C4 forage grasses and is tolerant to frost and water stress. This species is predominantly cultivated in an apomictic monoculture, with an inherent high risk that biotic and abiotic stresses could potentially devastate productivity. Therefore, advanced breeding strategies that characterise and use available genetic diversity, or assess germplasm collections effectively are required to deliver advanced cultivars for production systems. However, there are limited genomic resources available for this forage grass species. Results Transcriptome sequencing using second-generation sequencing platforms has been employed using pooled RNA from different tissues (stems, roots, leaves and inflorescences) at the final reproductive stage of P. dilatatum cultivar Primo. A total of 324,695 sequence reads were obtained, corresponding to c. 102 Mbp. The sequences were assembled, generating 20,169 contigs of a combined length of 9,336,138 nucleotides. The contigs were BLAST analysed against the fully sequenced grass species of Oryza sativa subsp. japonica, Brachypodium distachyon, the closely related Sorghum bicolor and foxtail millet (Setaria italica) genomes as well as against the UniRef 90 protein database allowing a comprehensive gene ontology analysis to be performed. The contigs generated from the transcript sequencing were also analysed for the presence of simple sequence repeats (SSRs). A total of 2,339 SSR motifs were identified within 1,989 contigs and corresponding primer pairs were designed. Empirical validation of a cohort of 96 SSRs was performed, with 34% being polymorphic between sexual and apomictic biotypes. Conclusions The development of genetic and genomic resources for P. dilatatum will contribute to gene discovery and expression studies. 
Association of gene function with agronomic traits will significantly enable molecular breeding and advance germplasm enhancement.
Giordano, Andrea; Cogan, Noel O. I.; Kaur, Sukhjiwan; Drayton, Michelle; Mouradov, Aidyn; Panter, Stephen; Schrauf, Gustavo E.; Mason, John G.; Spangenberg, German C.
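The SSR screening step described in the abstract above can be sketched with a regular-expression scan. This is a minimal illustration and not the tool used in the study: the function name `find_ssrs` is hypothetical, only perfect di-, tri- and tetranucleotide repeats are reported, and the minimum repeat counts (6, 5 and 4) are assumed defaults that the abstract does not specify.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) tuples for perfect SSRs.

    Hypothetical helper; thresholds are illustrative assumptions.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}  # motif length -> minimum repeats
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (for example AAAA or ATAT), which a shorter length already covers.
            if motif in (motif + motif)[1:-1]:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits


if __name__ == "__main__":
    contig = "GGC" + "AT" * 7 + "CCGT" + "GAA" * 5 + "TTC"
    for start, motif, count in find_ssrs(contig):
        print(start, motif, count)
```

Primer design around each hit, as reported in the abstract, would require flanking-sequence checks that are outside the scope of this sketch.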
Background Grass carp (Ctenopharyngodon idella) is one of the most economically important freshwater fish, but its production is often affected by diseases that cause serious economic losses. To date, no good breeding varieties have been obtained using the oriented cultivation technique. The ability to identify disease resistance genes in grass carp is important to cultivate disease-resistant varieties of grass carp. Results In this study, we constructed a non-normalized cDNA library of head kidney in grass carp, and, after clustering and assembly, we obtained 3,027 high-quality unigenes. Solexa sequencing was used to generate sequence tags from the transcriptomes of the head kidney in grass carp before and after grass carp reovirus (GCRV) infection. After processing, we obtained 22,144 tags that were differentially expressed by more than 2-fold between the uninfected and infected groups. 679 of the differentially expressed tags (3.1%) mapped to 483 of the unigenes (16.0%). The up-regulated and down-regulated unigenes were annotated using gene ontology terms; 16 were annotated as immune-related and 42 were of unknown function having no matches to any of the sequences in the databases that were used in the similarity searches. Semi-quantitative RT-PCR revealed four unknown unigenes that showed significant responses to the viral infection. Based on domain structure predictions, one of these sequences was found to encode a protein that contained two transmembrane domains and, therefore, may be a transmembrane protein. Here, we proposed that this novel unigene may encode a virus receptor or a protein that mediates the immune signalling pathway at the cell surface. Conclusion This study enriches the molecular basis data of grass carp and further confirms that, based on fish tissue-specific EST databases, transcriptome analysis is an effective route to discover novel functional genes.
The intracellular suppressors of cytokine signaling (SOCS) family members, including CISH and SOCS1 to 7 in mammals, are important regulators of cytokine signaling pathways. So far, the orthologues of all the eight mammalian SOCS members have been identified in fish, with several of them having multiple copies. Whilst fish CISH, SOCS3, and SOCS5 paralogues are possibly the result of the fish-specific whole genome duplication event, gene duplication or lineage-specific genome duplication may also contribute to some paralogues, as with the three trout SOCS2s and three zebrafish SOCS5s. Fish SOCS genes are broadly expressed and also show species-specific expression patterns. They can be upregulated by cytokines, such as IFN-γ, TNF-α, IL-1β, IL-6, and IL-21, by immune stimulants such as LPS, poly I:C, and PMA, as well as by viral, bacterial, and parasitic infections in member- and species-dependent manners. Initial functional studies demonstrate conserved mechanisms of fish SOCS action via JAK/STAT pathways.
Wang, Tiehui; Gorgoglione, Bartolomeo; Maehr, Tanja; Holland, Jason W.; Vecino, Jose L. Gonzalez; Wadsworth, Simon; Secombes, Christopher J.
The red palm weevil (RPW; Rhynchophorus ferrugineus) is a devastating pest of palms, prevalent in the Middle East as well as many other regions of the world. Here, we report a large-scale de novo complementary DNA (cDNA) sequencing effort that acquired ~5 million reads and assembled them into 26,765 contigs from 12 libraries made from samples of different RPW developmental stages based on the Roche/454 GS FLX platform. We annotated these contigs based on the publicly available known insect genes and the Tribolium castaneum genome assembly. We find that over 80% of coding sequences (CDS) from the RPW contigs have high-identity homologs to known proteins with complete CDS. Gene expression analysis shows that the pupa and larval stages have the highest and lowest expression levels, respectively. In addition, we identified more than 60,000 single nucleotide polymorphisms and 1,200 simple sequence repeat markers. This study provides the first large-scale cDNA dataset for RPW, a much-needed resource for future molecular studies. PMID:23955844
Wang, Lei; Zhang, Xiao-Wei; Pan, Lin-Lin; Liu, Wan-Fei; Wang, Da-Peng; Zhang, Guang-Yu; Yin, Yu-Xin; Yin, An; Jia, Shan-Gang; Yu, Xiao-Guang; Sun, Gao-Yuan; Hu, Song-Nian; Al-Mssallem, Ibrahim S; Yu, Jun
Forty-eight prevalent strains of Riemerella anatipestifer (RA) isolated in China were tested for susceptibility to eighteen antibiotics and investigated for the frequencies and characteristics of integrons and gene cassettes. All isolates were resistant to between three and ten antimicrobial drugs. Forty-seven isolates contained class 1 integron (97.92%), and 15 of the 47 isolates contained class 2 integron (31.25%). Class 3 integron was not detected in the strains analysed. Three different cassette arrays (aadA1, aadA5 and aacA4-aadA1) of class 1 integron and one gene cassette (sat2-aadA1) of class 2 integron were discovered. Three out of the four cassette arrays were novel, with the exception of aadA5. The location of integrons was confirmed by transforming extracted plasmids into an integron-negative strain of Escherichia coli (E. coli) BL21 (DE3). Class 1 integrons were always discovered in plasmids, while class 2 integrons could be located on plasmids or in the chromosome. This is the first description of class 2 integrons, three novel cassette arrays and the location of integrons in RA species. PMID:22112855
Zheng, Fuying; Lin, Guozhen; Zhou, Jizhang; Cao, Xiaoan; Gong, Xiaowei; Wang, Guanghua; Qiu, Changqing
The application of established drug compounds to novel therapeutic indications, known as drug repositioning, offers several advantages over traditional drug development, including reduced development costs and shorter paths to approval. Recent approaches to drug repositioning employ high-throughput experimental approaches to assess a compound’s potential therapeutic qualities. Here we present a systematic computational approach to predict novel therapeutic indications based on comprehensive testing of molecular signatures in drug-disease pairs. We integrated gene expression measurements from 100 diseases and gene expression measurements on 164 drug compounds, yielding predicted therapeutic potentials for these drugs. We demonstrate the ability to recover many known drug and disease relationships using computationally derived therapeutic potentials, and also predict many new indications for these drugs. We experimentally validated a prediction for the anti-ulcer drug cimetidine as a candidate therapeutic in the treatment of lung adenocarcinoma, and demonstrate its efficacy both in vitro and in vivo using mouse xenograft models. This computational method provides a systematic approach to repositioning established drugs to treat a wide range of human diseases.
Sirota, Marina; Dudley, Joel T.; Kim, Jeewon; Chiang, Annie P.; Morgan, Alex A.; Sweet-Cordero, Alejandro; Sage, Julien; Butte, Atul J.
Background Pigeonpea (Cajanus cajan (L.) Millsp) is one of the major grain legume crops of the tropics and subtropics, but biotic stresses [Fusarium wilt (FW), sterility mosaic disease (SMD), etc.] are serious challenges for sustainable crop production. Modern genomic tools such as molecular markers and candidate genes associated with resistance to these stresses offer the possibility of facilitating pigeonpea breeding for improving biotic stress resistance. The limited availability of genomic resources, however, is a serious bottleneck for molecular breeding in pigeonpea to develop superior genotypes with enhanced resistance to the above-mentioned biotic stresses. With the objective of enhancing genomic resources in pigeonpea, this study reports the generation and analysis of a comprehensive resource of FW- and SMD-responsive expressed sequence tags (ESTs). Results A total of 16 cDNA libraries were constructed from four pigeonpea genotypes that are resistant and susceptible to FW ('ICPL 20102' and 'ICP 2376') and SMD ('ICP 7035' and 'TTB 7'), and a total of 9,888 (9,468 high quality) ESTs were generated and deposited in dbEST of GenBank under accession numbers GR463974 to GR473857 and GR958228 to GR958231. Clustering and assembly analyses of these ESTs resulted in 4,557 unique sequences (unigenes) including 697 contigs and 3,860 singletons. BLASTN analysis of the 4,557 unigenes showed a significant identity with ESTs of different legumes (23.2-60.3%), rice (28.3%), Arabidopsis (33.7%) and poplar (35.4%). As expected, pigeonpea ESTs are more closely related to soybean (60.3%) and cowpea ESTs (43.6%) than to other plant ESTs. Similarly, BLASTX similarity results showed that only 1,603 (35.1%) out of 4,557 total unigenes correspond to known proteins in the UniProt database (≤ 1E-08).
Functional categorization of the annotated unigenes sequences showed that 153 (3.3%) genes were assigned to cellular component category, 132 (2.8%) to biological process, and 132 (2.8%) in molecular function. Further, 19 genes were identified differentially expressed between FW- responsive genotypes and 20 between SMD- responsive genotypes. Generated ESTs were compiled together with 908 ESTs available in public domain, at the time of analysis, and a set of 5,085 unigenes were defined that were used for identification of molecular markers in pigeonpea. For instance, 3,583 simple sequence repeat (SSR) motifs were identified in 1,365 unigenes and 383 primer pairs were designed. Assessment of a set of 84 primer pairs on 40 elite pigeonpea lines showed polymorphism with 15 (28.8%) markers with an average of four alleles per marker and an average polymorphic information content (PIC) value of 0.40. Similarly, in silico mining of 133 contigs with ? 5 sequences detected 102 single nucleotide polymorphisms (SNPs) in 37 contigs. As an example, a set of 10 contigs were used for confirming in silico predicted SNPs in a set of four genotypes using wet lab experiments. Occurrence of SNPs were confirmed for all the 6 contigs for which scorable and sequenceable amplicons were generated. PCR amplicons were not obtained in case of 4 contigs. Recognition sites for restriction enzymes were identified for 102 SNPs in 37 contigs that indicates possibility of assaying SNPs in 37 genes using cleaved amplified polymorphic sequences (CAPS) assay. Conclusion The pigeonpea EST dataset generated here provides a transcriptomic resource for gene discovery and development of functional markers associated with biotic stress resistance. Sequence analyses of this dataset have showed conservation of a considerable number of pigeonpea transcripts across legume and model plant species analysed as well as some putative pigeonpea specific genes. 
Validation of identified biotic stress responsive genes should provide candidate genes for allele mining as well as candidate markers for molecular breeding.
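The SSR mining step mentioned in the pigeonpea abstract can be illustrated with a minimal sketch. The abstract does not say which tool or thresholds were used, so the function name `find_ssrs` and the minimum repeat counts in `MIN_REPEATS` are assumptions chosen only for illustration; the sketch finds perfect repeats only.

```python
import re

# Assumed thresholds (not from the study): minimum number of repeat units
# required for each motif length, from mono- to hexanucleotide.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA").
            if any(
                motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence containing an (AT)7 and a (GAA)5 repeat.
example = "GGC" + "AT" * 7 + "CCG" + "GAA" * 5 + "T"
print(find_ssrs(example))
```

Dedicated SSR tools also report compound and imperfect repeats, which this sketch deliberately ignores.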
The fragile X mental retardation (FXMR) syndrome is one of the most frequent causes of mental retardation. Affected individuals display a wide range of additional characteristic features including behavioural and physical phenotypes, and the extent to which individuals are affected is highly variable. For these reasons, elucidation of the pathophysiology of this disease has been an important challenge to the scientific community. 1991 marks the year of the discovery of both the FMR1 gene mutations involved in this disease, and of their dynamic nature. Although a mouse model for the disease has been available for 16 years and extensive research has been performed on the FMR1 protein (FMRP), we still understand little about how the disease develops, and no treatment has yet been shown to be effective. In this review, we summarise current knowledge on FXMR with an emphasis on the technical challenges of molecular diagnostics, on its prevalence and dynamics among populations, and on the potential of screening for FMR1 mutations.
Rousseau, Francois; Labelle, Yves; Bussieres, Johanne; Lindsay, Carmen
The emergence of a huge volume of “omics” data enables a computational approach to the investigation of the biology of cancer. The cancer informatics approach is a useful supplement to the traditional experimental approach. I reviewed several reports that used a bioinformatics approach to analyze the associations among aging, stem cells, and cancer by microarray gene expression profiling. The high expression of aging- or human embryonic stem cell-related molecules in cancer suggests that certain important mechanisms are commonly underlying aging, stem cells, and cancer. These mechanisms are involved in cell cycle regulation, metabolic process, DNA damage response, apoptosis, p53 signaling pathway, immune/inflammatory response, and other processes, suggesting that cancer is a developmental and evolutional disease that is strongly related to aging. Moreover, these mechanisms demonstrate that the initiation, proliferation, and metastasis of cancer are associated with the deregulation of stem cells. These findings provide insights into the biology of cancer. Certainly, the findings that are obtained by the informatics approach should be justified by experimental validation. This review also noted that next-generation sequencing data provide enriched sources for cancer informatics study.
This report describes the results of a Laboratory-Directed Research and Development project on techniques for pattern discovery in discrete event time series data. In this project, we explored two different aspects of the pattern matching/discovery proble...
G. N. Conrad; J. M. Britanik; S. M. DeLand; C. L. Jenkin
BACKGROUND: Drug discovery and chemical biology are exceedingly complex and demanding enterprises. In recent years there has been increasing awareness about the importance of predicting/optimizing the absorption, distribution, metabolism, excretion and toxicity (ADMET) properties of small chemical compounds along the search process rather than at the final stages. Fast methods for evaluating ADMET properties of small molecules often involve applying
David Lagorce; Olivier Sperandio; Hervé Galons; Maria A. Miteva; Bruno O. Villoutreix
Peanut is vulnerable to a range of foliar diseases such as spotted wilt caused by Tomato spotted wilt virus (TSWV), early (Cercospora arachidicola) and late (Cercosporidium personatum) leaf spots, southern stem rot (Sclerotium rolfsii), and sclerotinia blight (Sclerotinia minor). In this study, we report the generation of 17,376 peanut expressed sequence tags (ESTs) from leaf tissues of a peanut cultivar (Tifrunner, resistant to TSWV and leaf spots) and a breeding line (GT-C20, susceptible to TSWV and leaf spots). After trimming vector and discarding low quality sequences, a total of 14,432 high-quality ESTs were selected for further analysis and deposition in GenBank. Sequence clustering resulted in 6,888 unique ESTs composed of 1,703 tentative consensus (TC) sequences and 5,185 singletons. A large number of ESTs (5,717) representing genes of unknown function were also identified. Among the unique sequences, 856 EST-SSRs were identified. A total of 290 new EST-based SSR markers were developed and examined for amplification and polymorphism in cultivated peanut and wild species. Resequencing of selected amplified alleles revealed that allelic diversity could be attributed mainly to differences in repeat type and length in the SSR regions. A few additional INDEL mutations and substitutions were also observed in the regions flanking the microsatellites. Some defense-related transcripts were identified as well, such as putative oxalate oxidase (EU024476) and NBS-LRR domains. EST data in this study have provided a new source of information for gene discovery and development of SSR markers in cultivated peanut. A total of 16,931 ESTs have been deposited in the NCBI GenBank database with accession numbers ES751523 to ES768453.
Guo, Baozhu; Chen, Xiaoping; Hong, Yanbin; Liang, Xuanqiang; Dang, Phat; Brenneman, Tim; Holbrook, Corley; Culbreath, Albert
Background Serial Analysis of Gene Expression (SAGE) is a DNA sequencing-based method for large-scale gene expression profiling that provides an alternative to microarray analysis. Most analyses of SAGE data aimed at identifying co-expressed genes have been accomplished using various versions of clustering approaches that often result in a number of false positives. Principal Findings Here we explore the use of seriation, a statistical approach for ordering sets of objects based on their similarity, for large-scale expression pattern discovery in SAGE data. For this specific task we implement a seriation heuristic we term ‘progressive construction of contigs’ that constructs local chains of related elements by sequentially rearranging margins of the correlation matrix. We apply the heuristic to the analysis of simulated and experimental SAGE data and compare our results to those obtained with a clustering algorithm developed specifically for SAGE data. We show using simulations that the performance of seriation compares favorably to that of the clustering algorithm on noisy SAGE data. Conclusions We explore the use of a seriation approach for visualization-based pattern discovery in SAGE data. Using both simulations and experimental data, we demonstrate that seriation is able to identify groups of co-expressed genes more accurately than a clustering algorithm developed specifically for SAGE data. Our results suggest that seriation is a useful method for the analysis of gene expression data whose applicability should be further pursued.
Morozova, Olena; Morozov, Vyacheslav; Hoffman, Brad G.; Helgason, Cheryl D.; Marra, Marco A.
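To illustrate what seriation means in the SAGE abstract above, here is a minimal sketch that orders expression profiles so that highly correlated ones end up adjacent. It is a generic greedy chain-building heuristic, not the authors' "progressive construction of contigs"; the function names and the toy tag profiles are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def greedy_seriation(profiles):
    """Order profile names so that neighbours in the list are similar."""
    names = list(profiles)
    corr = {
        (a, b): pearson(profiles[a], profiles[b])
        for a in names
        for b in names
        if a != b
    }
    # Seed the chain with the most strongly correlated pair.
    first, second = max(corr, key=corr.get)
    chain = [first, second]
    remaining = set(names) - {first, second}
    while remaining:
        # Attach the best remaining profile to whichever chain end it fits best.
        _, at_tail, name = max(
            (corr[(end, g)], at_tail, g)
            for at_tail, end in ((0, chain[0]), (1, chain[-1]))
            for g in remaining
        )
        if at_tail:
            chain.append(name)
        else:
            chain.insert(0, name)
        remaining.remove(name)
    return chain


# Hypothetical tag counts across four libraries: two rising, two falling.
toy = {
    "tagA": [1, 2, 3, 4],
    "tagB": [2, 4, 6, 8.5],
    "tagC": [4, 3, 2, 1],
    "tagD": [8, 6, 4.5, 2],
}
print(greedy_seriation(toy))
```

Unlike clustering, the output is a single linear order; groups of co-expressed tags show up as contiguous runs when the reordered correlation matrix is visualised.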
Cyanobactins are cyclic peptides assembled through the cleavage and modification of short precursor proteins. An inactive cyanobactin gene cluster has been described from the genome of Microcystis aeruginosa NIES843. Here we report the discovery of active counterparts in strains of the genus Microcystis guided by this silent cyanobactin gene cluster. The end products of the gene clusters were structurally diverse cyclic peptides, which we named piricyclamides. Some of the piricyclamides consisted solely of proteinogenic amino acids while others contained disulfide bridges and some were prenylated or geranylated. The piricyclamide gene clusters encoded between 1 and 4 precursor genes. They encoded highly diverse core peptides ranging in length from 7 to 17 amino acids with just a single conserved amino acid. Heterologous expression of the pir gene cluster from Microcystis aeruginosa PCC7005 in Escherichia coli confirmed that this gene cluster is responsible for the biosynthesis of piricyclamides. Chemical analysis demonstrated that Microcystis strains could produce an array of piricyclamides, some of which are geranylated or prenylated. The genetic diversity of piricyclamides in a bloom sample was explored and 19 different piricyclamide precursor genes were found. This study provides evidence for a stunning array of piricyclamides in Microcystis, a bloom-forming cyanobacterium that occurs worldwide.
Leikoski, Niina; Fewer, David P.; Jokela, Jouni; Alakoski, Pirita; Wahlsten, Matti; Sivonen, Kaarina
We evaluated the usefulness and robustness of Agrobacterium tumefaciens-mediated transformation (ATMT) as a high-throughput transformation tool for pathogenicity gene discovery in the filamentous phytopathogen Leptosphaeria maculans. Thermal asymmetric interlaced polymerase chain reaction allowed us to amplify the left border (LB) flanking sequence in 135 of 400 transformants analysed, and indicated a high level of preservation of the T-DNA LB. In addition, T-DNA preferentially integrated as a single copy in gene-rich regions of the fungal genome, with a probable bias towards intergenic and/or regulatory regions. A total of 53 transformants out of 1388 (3.8%) showed reproducible pathogenicity defects when inoculated on cotyledons of Brassica napus, with diverse altered phenotypes. Co-segregation of the altered phenotype with the T-DNA integration was observed for 6 of 12 transformants crossed. If extrapolated to the whole collection, this indicates that 1.9% of the collection actually corresponds to tagged pathogenicity mutants. The preferential insertion into gene-rich regions along with the high ratio of tagged mutants renders ATMT a tool of choice for large-scale gene discovery in L. maculans. PMID:16979359
Blaise, Françoise; Rémy, Estelle; Meyer, Michel; Zhou, Ligang; Narcy, Jean-Paul; Roux, Jacqueline; Balesdent, Marie-Hélène; Rouxel, Thierry
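The extrapolation in the ATMT abstract above is a product of two observed fractions, which a short sketch makes explicit. The function name is hypothetical; the four counts are the ones quoted in the abstract.

```python
def tagged_mutant_fraction(n_defective, n_screened, n_cosegregating, n_crossed):
    """Estimate the share of a collection that is a T-DNA-tagged mutant.

    Multiplies the fraction of transformants with a pathogenicity defect
    by the fraction of crossed transformants in which the defect
    co-segregated with the T-DNA insertion.
    """
    defect_rate = n_defective / n_screened
    tagged_rate = n_cosegregating / n_crossed
    return defect_rate * tagged_rate


# Counts quoted in the abstract: 53 of 1388 defective, 6 of 12 co-segregating.
fraction = tagged_mutant_fraction(53, 1388, 6, 12)
print(f"{fraction:.1%}")  # prints 1.9%
```

The estimate assumes the 12 crossed transformants are representative of all 53 defective ones, which a sample of that size supports only loosely.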
The mouse has become an indispensable and versatile model organism for the study of development, genetics, behavior, and disease. The application of comprehensive gene expression profiling technologies to compare normal and diseased tissues or to assess molecular alterations resulting from various experimental interventions has the potential to provide highly detailed qualitative and quantitative descriptions of these processes. Ideally, to interpret experimental data, the magnitude and diversity of gene expression for the system under study should be well characterized, yet little is known about the normal variation of mouse gene expression in vivo. To assess natural differences in murine gene expression, we used a 5406-clone spotted cDNA microarray to quantitate transcript levels in the kidney, liver, and testis from each of 6 normal male C57BL6 mice. We used ANOVA to compare the variance across the six mice to the variance among four replicate experiments performed for each mouse tissue. For the 6 kidney samples, 102 of 3,088 genes (3.3%) exhibited a statistically significant mouse variance at a level of 0.05. In the testis, 62 of 3,252 genes (1.9%) showed statistically significant variance, and in the liver, there were 21 of 2,514 (0.8%) genes with significantly variable expression. Immune-modulated, stress-induced, and hormonally regulated genes were highly represented among the transcripts that were most variable. The expression levels of several genes varied significantly in more than one tissue. These studies help to define the baseline level of variability in mouse gene expression and emphasize the importance of replicate microarray experiments.
Pritchard, Colin C.; Hsu, Li; Delrow, Jeffrey; Nelson, Peter S.
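The per-gene comparison of between-mouse variance with replicate variance described above can be sketched as a one-way ANOVA F statistic. This is a simplified illustration, not the authors' exact model; the function name and toy values are assumptions.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Each inner list holds the replicate measurements of one gene for one
    mouse, so 'between' is mouse-to-mouse variation and 'within' is
    replicate-to-replicate variation.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy log-ratios for one gene: three mice with three replicates each.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_b, df_w)  # prints 13.0 2 6
```

With six mice and four replicates per tissue, the degrees of freedom would be 5 and 18; the p-value follows from comparing F with the corresponding F distribution and applying the 0.05 threshold quoted in the abstract.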
The Immunological Genome Project combines immunology and computational biology laboratories in an effort to establish a complete 'road map' of gene-expression and regulatory networks in all immune cells
Tracy S P Heng; Michio W Painter; Kutlu Elpek; Veronika Lukacs-Kornek; Nora Mauermann; Shannon J Turley; Daphne Koller; Francis S Kim; Amy J Wagers; Natasha Asinovski; Scott Davis; Marlys Fassett; Markus Feuerer; Daniel H D Gray; Sokol Haxhinasto; Jonathan A Hill; Gordon Hyatt; Catherine Laplace; Kristen Leatherbee; Diane Mathis; Christophe Benoist; Radu Jianu; David H Laidlaw; J Adam Best; Jamie Knell; Ananda W Goldrath; Jessica Jarjoura; Joseph C Sun; Yanan Zhu; Lewis L Lanier; Ayla Ergun; Zheng Li; James J Collins; Susan A Shinton; Richard R Hardy; Randall Friedline; Katelyn Sylvia; Joonsoo Kang
In an attempt to determine the relationships between the plant profiles (country of collection, taxonomy, plant part) and the compound classes isolated with cytotoxic activity against a panel of human tumor cell lines, the data compiled from a 15-year anticancer drug-discovery project were subjected to an analysis of variance (ANOVA). The results indicate significant trends in cytotoxic activity relative to collection location, taxonomy, plant part, and compound classes isolated. Plant collections were made in tropical forests in six countries, with collections from Ecuador resulting in higher activity than those from Indonesia and Peru. Interestingly, collections from Florida were not statistically different than those from the countries with higher biodiversity. One hundred and forty-five families were represented in the collections, with the Clusiaceae, Elaeocarpaceae, Meliaceae, and Rubiaceae having low ED50 (half maximal effective dose) values. Especially active genera included Aglaia, Casearia, Exostema, Mallotus, and Trichosanthes. Roots and below-ground plant materials were significantly more active than above-ground materials. Cucurbitacins, flavaglines, anthraquinones, fatty acids, tropane alkaloids, lignans, and sesquiterpenoids were significantly more active than xanthones and oligorhamnosides. The results from this study should serve as a guide for future plant collection endeavors for anticancer drug discovery. PMID:17193321
Balunas, Marcy J; Jones, William P; Chin, Young-Won; Mi, Qiuwen; Farnsworth, Norman R; Soejarto, Djaja D; Cordell, Geoffrey A; Swanson, Steven M; Pezzuto, John M; Chai, Hee-Byung; Kinghorn, A Douglas
Background It is well-known that health care workers in today’s general hospitals have to deal with high levels of job demands, which could have negative effects on their health, well-being, and job performance. A way to reduce job-related stress reactions and to optimize positive work-related outcomes is to raise the level of specific job resources and opportunities to recover from work. However, the question remains how to translate the optimization of the balance between job demands, job resources, and recovery opportunities into effective workplace interventions. The aim of the DISCovery project is to develop and implement tailored work-oriented interventions to improve health, well-being, and performance of health care personnel. Methods/Design A quasi-experimental field study with a non-equivalent control group pretest-posttest design will be conducted in a top general hospital. Four existing organizational departments will provide both an intervention and a comparison group. Two types of research methods are used: (1) a longitudinal web-based survey study, and (2) a longitudinal daily diary study. After base-line measures of both methods, existing and yet to be developed interventions will be implemented within the experimental groups. Follow-up measurements will be taken one and two years after the base-line measures to analyze short-term and long-term effects of the interventions. Additionally, a process evaluation and a cost-effectiveness analysis will be carried out. Discussion The DISCovery project fulfills a strong need for theory-driven and scientifically well-performed research on job stress and performance interventions. It will provide insight into (1) how a balance between job demands, job resources, and recovery from work can be optimized, (2) the short-term and long-term effects of tailored work-oriented effects, and (3) indicators for successful or unsuccessful implementation of interventions.
The Gene Ontology (GO) is a collaborative effort that provides structured vocabularies for annotating the molecular function, biological role, and cellular location of gene products in a highly systematic way and in a species-neutral manner with the aim of unifying the representation of gene function across different organisms. Each contributing member of the GO Consortium independently associates GO terms to gene products from the organism(s) they are annotating. Here we introduce the Reference Genome project, which brings together those independent efforts into a unified framework based on the evolutionary relationships between genes in these different organisms. The Reference Genome project has two primary goals: to increase the depth and breadth of annotations for genes in each of the organisms in the project, and to create data sets and tools that enable other genome annotation efforts to infer GO annotations for homologous genes in their organisms. In addition, the project has several important incidental benefits, such as increasing annotation consistency across genome databases, and providing important improvements to the GO's logical structure and biological content. PMID:19578431
INFOGENE is a database of known and predicted gene structures with descriptions of basic functional signals and gene components. It provides a possibility to create compilations of sequences with a given gene feature as well as to accumulate and analyze predicted genes in finished and unfinished sequences from genome sequencing projects. Protein sequence similarity searches in the database of predicted proteins is offered through the BLASTP program. INFOGENE
Victor V. Solovyev; Asaf A. Salamov
The SALMFamides are a family of neuropeptides that act as muscle relaxants in the phylum Echinodermata. Two types of SALMFamides have been identified in echinoderms: firstly, the prototypical L-type SALMFamide peptides with the C-terminal sequence Leu-X-Phe-NH(2) (where X is variable), which have been identified in several starfish species and in the sea cucumber Holothuria glaberrima; secondly, F-type SALMFamide peptides with the C-terminal sequence Phe-X-Phe-NH(2), which have been identified in the sea cucumber Apostichopus japonicus. However, the genetic basis and functional significance of the occurrence of these two types of SALMFamides in echinoderms are unknown. Here we have obtained a new insight on this issue with the discovery that in the sea urchin Strongylocentrotus purpuratus there are two SALMFamide genes. In addition to a gene encoding seven putative F-type SALMFamide neuropeptides with the C-terminal sequence Phe-X-Phe-NH(2) (SpurS1-SpurS7), which has been reported previously (Elphick and Thorndyke, 2005; J. Exp. Biol., 208, 4273-4282), we have identified a gene that is expressed in the nervous system and that encodes a precursor of two putative L-type SALMFamide neuropeptides with the C-terminal sequences Ile-His-Phe-NH(2) (SpurS8) and Leu-Leu-Phe-NH(2) (SpurS9). Our discovery has revealed for the first time that L-type and F-type SALMFamide neuropeptides can coexist in an echinoderm species but are encoded by different genes. We speculate that this feature of S. purpuratus may apply to other echinoderms and further insights on this issue will be possible if genomic and/or neural cDNA sequence data are obtained for other echinoderm species. PMID:21798202
Rowe, Matthew L; Elphick, Maurice R
Biologists require genetic as well as molecular tools to decipher genomic information and ultimately to understand gene function. The Berkeley Drosophila Genome Project is addressing these needs with a massive gene disruption project that uses individual, genetically engineered P transposable elements to target open reading frames throughout the Drosophila genome. DNA flanking the insertions is sequenced, thereby placing an extensive series of genetic markers on the physical genomic map and associating insertions with specific open reading frames and genes. Insertions from the collection now lie within or near most Drosophila genes, greatly reducing the time required to identify new mutations and analyze gene functions. Information revealed from these studies about P element site specificity is being used to target the remaining open reading frames. 38 refs., 5 figs., 1 tab.
Spradling, A.C.; Stern, D.M. [Howard Hughes Medical Institute Research Labs., Baltimore, MD (United States)]; Kiss, I. [Institute of Genetics, Szeged (Hungary)] [and others]
Biologists require genetic as well as molecular tools to decipher genomic information and ultimately to understand gene function. The Berkeley Drosophila Genome Project is addressing these needs with a massive gene disruption project that uses individual, genetically engineered P transposable elements to target open reading frames throughout the Drosophila genome. DNA flanking the insertions is sequenced, thereby placing an extensive series of genetic markers on the physical genomic map and associating insertions with specific open reading frames and genes. Insertions from the collection now lie within or near most Drosophila genes, greatly reducing the time required to identify new mutations and analyze gene functions. Information revealed from these studies about P element site specificity is being used to target the remaining open reading frames. PMID:7479892
Spradling, A C; Stern, D M; Kiss, I; Roote, J; Laverty, T; Rubin, G M
The Drosophila Gene Disruption Project (GDP) has created a public collection of mutant strains containing single transposon insertions associated with different genes. These strains often disrupt gene function directly, allow production of new alleles, and have many other applications for analyzing gene function. Here we describe the addition of ~7600 new strains, which were selected from >140,000 additional P or piggyBac element integrations and 12,500 newly generated insertions of the Minos transposon. These additions nearly double the size of the collection and increase the number of tagged genes to at least 9440, approximately two-thirds of all annotated protein-coding genes. We also compare the site specificity of the three major transposons used in the project. All three elements insert only rarely within many Polycomb-regulated regions, a property that may contribute to the origin of “transposon-free regions” (TFRs) in metazoan genomes. Within other genomic regions, Minos transposes essentially at random, whereas P or piggyBac elements display distinctive hotspots and coldspots. P elements, as previously shown, have a strong preference for promoters. In contrast, piggyBac site selectivity suggests that it has evolved to reduce deleterious and increase adaptive changes in host gene expression. The propensity of Minos to integrate broadly makes possible a hybrid finishing strategy for the project that will bring >95% of Drosophila genes under experimental control within their native genomic contexts.
Bellen, Hugo J.; Levis, Robert W.; He, Yuchun; Carlson, Joseph W.; Evans-Holm, Martha; Bae, Eunkyung; Kim, Jaeseob; Metaxakis, Athanasios; Savakis, Charalambos; Schulze, Karen L.; Hoskins, Roger A.; Spradling, Allan C.
The rhodopsin family of G-protein-coupled receptors (GPCRs) is the largest known group of cell-surface mediators of signal transduction. The vast majority of these receptors were discovered by methods based upon shared sequence homologies found throughout this family. While such efforts identified a multitude of receptor subtypes for previously known ligands, numerous receptors have been discovered for which endogenous ligands were unknown. These receptors are commonly referred to as orphan receptors. One of the most important tasks of modern pharmacology lies in elucidating the functions of these receptors. Of particular interest are receptors with recognised expression in the central nervous system, given that many psychiatric and neurodegenerative disorders are mediated by unknown mechanisms. Hence, this collection of putative neurotransmitter and neuromodulator signal mediators represents a substantial and untapped resource for novel drug discovery. Recently, various methodologies have accelerated the discovery of novel ligands for these orphan receptors, identifying the basic components required for further physiological ligand/receptor system characterisation. Equipped with proven ligand identification strategies, the characterisation of all orphan GPCRs and the exploitation of their exciting potential as targets for the discovery of novel drugs is anticipated. PMID:12223080
Lee, Dennis K; George, Susan R; O'Dowd, Brian F
Information on gene clusters for natural product biosynthesis is accumulating rapidly because of the current boom of available genome sequencing data. However, linking a natural product to a specific gene cluster remains challenging. Here, we present a widely applicable strategy for the identification of gene clusters for specific natural products, which we name natural product proteomining. The method is based on using fluctuating growth conditions that ensure differential biosynthesis of the bioactivity of interest. Subsequent combination of metabolomics and quantitative proteomics establishes correlations between abundance of natural products and concomitant changes in the protein pool, which allows identification of the relevant biosynthetic gene cluster. We used this approach to elucidate gene clusters for different natural products in Bacillus and Streptomyces, including a novel juglomycin-type antibiotic. Natural product proteomining does not require prior knowledge of the gene cluster or secondary metabolite and therefore represents a general strategy for identification of all types of gene clusters. PMID:24816229
Gubbens, Jacob; Zhu, Hua; Girard, Geneviève; Song, Lijiang; Florea, Bogdan I; Aston, Philip; Ichinose, Koji; Filippov, Dmitri V; Choi, Young H; Overkleeft, Herman S; Challis, Gregory L; van Wezel, Gilles P
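The correlation step at the heart of natural product proteomining can be sketched as ranking proteins by how closely their abundance tracks the metabolite across growth conditions. The abstract does not specify the statistic used, so Pearson correlation, the function names, and the toy abundances below are all assumptions for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_proteins(metabolite, proteins):
    """Sort (protein, r) pairs from most to least correlated with the metabolite.

    `metabolite` is the product abundance per growth condition and
    `proteins` maps a protein ID to its abundance in the same conditions.
    """
    scored = [(name, pearson(metabolite, levels)) for name, levels in proteins.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical abundances across four growth conditions.
metabolite = [1, 2, 4, 8]
proteins = {
    "P1": [10, 21, 39, 82],     # tracks the metabolite closely
    "P2": [5, 5.1, 4.9, 5.2],   # roughly constant housekeeping protein
    "P3": [8, 4, 2, 1],         # anti-correlated
}
for name, r in rank_proteins(metabolite, proteins):
    print(name, round(r, 2))
```

Top-ranked proteins encoded by neighbouring genes would then point to a candidate biosynthetic gene cluster; a real analysis would need more conditions and a correction for multiple testing.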
The antibiotics lactonamycin and lactonamycin Z provide attractive leads for antibacterial drug development. Both antibiotics contain a novel aglycone core called lactonamycinone. To gain insight into lactonamycinone biosynthesis, cloning and precursor incorporation experiments were undertaken. The lactonamycin gene cluster was initially cloned from Streptomyces rishiriensis. Sequencing of ca. 61 kb of S. rishiriensis DNA revealed the presence of 57 open reading frames. These included genes coding for the biosynthesis of l-rhodinose, the sugar found in lactonamycin, and genes similar to those in the tetracenomycin biosynthetic gene cluster. Since lactonamycin production by S. rishiriensis could not be sustained, additional proof for the identity of the S. rishiriensis cluster was obtained by cloning the lactonamycin Z gene cluster from Streptomyces sanglieri. Partial sequencing of the S. sanglieri cluster revealed 15 genes that exhibited a very high degree of similarity to genes within the lactonamycin cluster, as well as an identical organization. Double-crossover disruption of one gene in the S. sanglieri cluster abolished lactonamycin Z production, and production was restored by complementation. These results confirm the identity of the genetic locus cloned from S. sanglieri and indicate that the highly similar locus in S. rishiriensis encodes lactonamycin biosynthetic genes. Precursor incorporation experiments with S. sanglieri revealed that lactonamycinone is biosynthesized in an unusual manner whereby glycine or a glycine derivative serves as a starter unit that is extended by nine acetate units. Analysis of the gene clusters and of the precursor incorporation data suggested a hypothetical scheme for lactonamycinone biosynthesis. PMID:18070976
Zhang, Xiujun; Alemany, Lawrence B; Fiedler, Hans-Peter; Goodfellow, Michael; Parry, Ronald J
The scientific discovery process comes alive for 70 minority students each year at Uniondale High School in New York where students have won top awards for "in-house" projects. Uniondale High School is in a middle-income school district where over 95% of students are from minority groups. Founded in 2000, the Uniondale High School Research Program…
Zaikowski, Lori; Lichtman, Paul; Quarless, Duncan
In the paper "The Extragalactic Distance Scale Key Project. III. The Discovery of Cepheids and a New Distance to M101 Using the Hubble Space Telescope" by Daniel D. Kelson, Garth D. Illingworth, Wendy F. Freedman, John A. Graham, Robert Hill, Barry F. Madore, Abhijit Saha, Peter B. Stetson, Robert C. Kennicutt, Jr., Jeremy R. Mould, Shaun M. Hughes, Laura Ferrarese, Randy Phelps, Anne Turner, Kem H. Cook, Holland Ford, John G. Hoessel, and John Huchra (ApJ, 463, 26), two of the tables are in error. The magnitudes in Tables B1 and B2, in Appendix B, are ordered incorrectly. As a result, the Julian dates are not associated with their correct Cepheid magnitudes. We have now corrected these data, and updated versions of the tables are available on the World Wide Web. The tables are available in ASCII format at our Key Project site (http://www.ipac.caltech.edu/H0kp/) and will appear in volume 7 of the AAS CDROM. PostScript and paper copies are also available from the first author (http://www.ucolick.org/~kelson/H0/home.html or firstname.lastname@example.org).
Kelson, Daniel D; Illingworth, Garth D.; Freedman, Wendy F.; Graham, John A.; Hill, Robert; Madore, Barry F.; Saha, Abhijit; Stetson, Peter B.; Kennicutt, Robert C., Jr.; Mould, Jeremy R.; Hughes, Shaun M.; Ferrarese, Laura; Phelps, Randy; Turner, Anne; Cook, Kem H.; Ford, Holland; Hoessel, John G.; Huchra, John
We report on the discovery of 30 new Cepheids in the nearby galaxy M81 based on observations using the Hubble Space Telescope (HST). The periods of these Cepheids lie in the range of 10-55 days, based on 18 independent epochs using the HST wide-band F555W filter. The HST F555W and F785LP data have been transformed to the Cousins standard V and I magnitude system using a ground-based calibration. Apparent period-luminosity relations at V and I were constructed, from which apparent distance moduli were measured with respect to assumed values of μ_0 = 18.50 mag and E(B - V) = 0.10 mag for the Large Magellanic Cloud. The difference in the apparent V and I moduli yields a measure of the difference in the total mean extinction between the M81 and the LMC Cepheid samples. A low total mean extinction to the M81 sample of E(B - V) = 0.03 +/- 0.05 mag is obtained. The true distance modulus to M81 is determined to be 27.80 +/- 0.20 mag, corresponding to a distance of 3.63 +/- 0.34 Mpc. These data illustrate that with an optimal (power-law) sampling strategy, the HST provides a powerful tool for the discovery of extragalactic Cepheids and their application to the distance scale. M81 is the first calibrating galaxy in the target sample of the HST Key Project on the Extragalactic Distance Scale, the ultimate aim of which is to provide a value of the Hubble constant to 10% accuracy.
Freedman, Wendy L.; Hughes, Shaun M.; Madore, Barry F.; Mould, Jeremy R.; Lee, Myung Gyoon; Stetson, Peter; Kennicutt, Robert C.; Turner, Anne; Ferrarese, Laura; Ford, Holland
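The conversion from the quoted true distance modulus to a distance can be reproduced with the standard relation mu = 5 log10(d / 10 pc). The following is a minimal sketch, not the authors' reduction code; the function names are illustrative, and the error term is a simple first-order propagation that is assumed here rather than taken from the abstract.

```python
import math


def modulus_to_mpc(mu):
    """Convert a true distance modulus (mag) to a distance in megaparsecs."""
    # mu = 5 * log10(d / 10 pc)  =>  d = 10 ** (mu / 5 + 1) parsecs
    return 10 ** (mu / 5 + 1) / 1e6


def distance_error_mpc(mu, sigma_mu):
    """First-order uncertainty in the distance (Mpc) for an error sigma_mu (mag)."""
    # d(d)/d(mu) = d * ln(10) / 5
    return modulus_to_mpc(mu) * math.log(10) / 5 * sigma_mu


d = modulus_to_mpc(27.80)
err = distance_error_mpc(27.80, 0.20)
print(f"{d:.2f} +/- {err:.2f} Mpc")
```

With mu = 27.80 mag this gives about 3.63 Mpc, and the linearized uncertainty for 0.20 mag comes out near 0.33 Mpc, close to the quoted 0.34 Mpc.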
Background Next-generation sequencing (NGS) technologies are providing new ways to accelerate fine-mapping and gene isolation in many species. To date, the majority of these efforts have focused on diploid organisms with readily available whole genome sequence information. In this study, as a proof of concept, we tested the use of NGS for SNP discovery in tetraploid wheat lines differing for the previously cloned grain protein content (GPC) gene GPC-B1. Bulked segregant analysis (BSA) was used to define a subset of putative SNPs within the candidate gene region, which were then used to fine-map GPC-B1. Results We used Illumina paired-end technology to sequence mRNA (RNAseq) from near-isogenic lines differing across a ~30-cM interval including the GPC-B1 locus. After discriminating for SNPs between the two homoeologous wheat genomes and additional quality filtering, we identified inter-varietal SNPs in wheat unigenes between the parental lines. The relative frequency of these SNPs was examined by RNAseq in two bulked samples made up of homozygous recombinant lines differing for their GPC phenotype. SNPs that were enriched at least 3-fold in the corresponding pool (6.5% of all SNPs) were further evaluated. Marker assays were designed for a subset of the enriched SNPs and mapped using DNA from individuals of each bulk. Thirty-nine new SNP markers, corresponding to 67% of the validated SNPs, mapped across a 12.2-cM interval including GPC-B1. This translated to 1 SNP marker per 0.31 cM defining the GPC-B1 gene to within 13-18 genes in syntenic cereal genomes and to a 0.4 cM interval in wheat. Conclusions This study exemplifies the use of RNAseq for SNP discovery in polyploid species and supports the use of BSA as an effective way to target SNPs to specific genetic intervals to fine-map genes in unsequenced genomes.
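The enrichment filter described above (keeping SNPs whose allele is at least 3-fold more frequent in one bulk than in the other) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the fold enrichment was computed, so the read-count layout, the pseudocount, and the function name `enriched_snps` are all hypothetical, and the example counts are invented.

```python
def enriched_snps(counts, min_fold=3.0, pseudo=0.5):
    """Return SNP ids whose allele frequency differs >= min_fold between bulks.

    counts maps snp_id -> (allele_reads_bulk1, total_reads_bulk1,
                           allele_reads_bulk2, total_reads_bulk2).
    The pseudocount (an assumption) avoids division by zero.
    """
    keep = []
    for snp, (a1, n1, a2, n2) in counts.items():
        f1 = (a1 + pseudo) / (n1 + 2 * pseudo)  # allele frequency in bulk 1
        f2 = (a2 + pseudo) / (n2 + 2 * pseudo)  # allele frequency in bulk 2
        fold = max(f1 / f2, f2 / f1)            # enrichment in either direction
        if fold >= min_fold:
            keep.append(snp)
    return keep


# Invented read counts for three hypothetical SNPs
example = {
    "snp1": (45, 50, 5, 50),   # strongly enriched in bulk 1
    "snp2": (25, 50, 24, 50),  # similar frequency in both bulks
    "snp3": (2, 40, 30, 40),   # strongly enriched in bulk 2
}
print(enriched_snps(example))

# Marker density reported in the abstract: 39 markers across 12.2 cM
cm_per_marker = 12.2 / 39
print(round(cm_per_marker, 2))
```

The last two lines simply reproduce the reported density of roughly one SNP marker per 0.31 cM.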
The ridgetail white prawn Exopalaemon carinicauda is one of the most important commercial species in eastern China. However, little information on immune genes in E. carinicauda has been reported. To identify distinctive genes associated with immunity, an expressed sequence tag (EST) library was constructed from hemocytes of E. carinicauda. A total of 3411 clones were sequenced, yielding 2853 ESTs with an average sequence length of 436 bp. Cluster and assembly analysis yielded 1053 unique sequences, including 329 contigs and 724 singletons. BLAST analysis identified 593 (56.3%) of the unique sequences as orthologs of genes from other organisms (E-value < 1e-5). These 593 unique sequences were classified based on COG and Gene Ontology (GO) annotations. Through comparison with previous studies, 153 genes assembled from 367 ESTs were identified as possibly involved in defense or immune functions. These genes fall into seven categories according to their putative functions in the shrimp immune system: antimicrobial peptides, prophenoloxidase activating system, antioxidant defense systems, chaperone proteins, clottable proteins, pattern recognition receptors and other immune-related genes. According to EST abundance, the major immune-related genes were thioredoxin (141, 4.94% of all ESTs) and calmodulin (14, 0.49% of all ESTs). The EST sequences of E. carinicauda hemocytes provide important information on the immune system and lay the groundwork for development of molecular markers related to disease resistance in prawn species.
Duan, Yafei; Liu, Ping; Li, Jitao; Li, Jian; Chen, Ping
Background Kaposi's sarcoma-associated herpesvirus (KSHV) and Epstein-Barr virus (EBV) are related human tumor viruses that cause primary effusion lymphomas (PEL) and Burkitt's lymphomas (BL), respectively. Viral genes expressed in naturally-infected cancer cells contribute to disease pathogenesis; knowing which viral genes are expressed is critical in understanding how these viruses cause cancer. To evaluate the expression of viral genes, we used high-resolution separation and mass spectrometry coupled with custom tiling arrays to align the viral proteomes and transcriptomes of three PEL and two BL cell lines under latent and lytic culture conditions. Results The majority of viral genes were efficiently detected at the transcript and/or protein level on manipulating the viral life cycle. Overall the correlation of expressed viral proteins and transcripts was highly complementary in both validating and providing orthogonal data with latent/lytic viral gene expression. Our approach also identified novel viral genes in both KSHV and EBV, and extends viral genome annotation. Several previously uncharacterized genes were validated at both transcript and protein levels. Conclusions This systems biology approach coupling proteome and transcriptome measurements provides a comprehensive view of viral gene expression that could not have been attained using each methodology independently. Detection of viral proteins in combination with viral transcripts is a potentially powerful method for establishing virus-disease relationships.
In spite of its economic importance, very little molecular genetics and genomics research has been targeted at the genus Paulownia. The scarcity of genetic information on this plant is a major obstacle to studying the mechanisms of its resistance to Paulownia Witches' Broom (PaWB) disease. Analysis of the Paulownia transcriptome and its expression profile data is essential to extending the genetic resources for this species and will greatly improve studies on Paulownia. In the current study, we performed de novo assembly of the transcriptome of P. tomentosa × P. fortunei using short-read sequencing technology (Illumina). A total of 203,664 unigenes with a mean length of 1,328 bp were obtained. Of these unigenes, 32,976 (30% of all unigenes) containing complete structures were chosen. Eukaryotic clusters of orthologous groups, Gene Ontology, and Kyoto Encyclopedia of Genes and Genomes annotations were performed on these unigenes. Genes related to PaWB disease resistance were analyzed in detail. To our knowledge, this is the first study to elucidate the genetic makeup of Paulownia. This transcriptome provides a quick route to understanding Paulownia, increases the number of gene sequences available for further functional genomics studies and provides clues to the identification of potential PaWB disease resistance genes. This study has provided a comprehensive insight into gene expression profiles at different states, which facilitates the study of each gene's role in the developmental process and in PaWB disease resistance. PMID:24278262
Liu, Rongning; Dong, Yanpeng; Fan, Guoqiang; Zhao, Zhenli; Deng, Minjie; Cao, Xibing; Niu, Suyan
Exocarp color of sand pear is an important trait for fruit production and has long been of interest. Our previous study explored the differentially expressed genes between two genotypes contrasting for exocarp color, which indicated differences in suberin, cutin, wax and lignin biosynthesis between the russet and green exocarp. In this study, we carried out microscopic observation and Fourier transform infrared spectroscopy analysis to detect differences in tissue structure and biochemical composition between the russet and green exocarp of sand pear. The green exocarp was covered with epidermis and cuticle, which were replaced by a cork layer on the surface of the russet exocarp, and the chemical composition of the russet exocarp was characterized by lignin, cellulose and hemicellulose. We explored differential gene expression between the russet exocarp of 'Niitaka' and the green exocarp of its mutant cv. 'Suisho' using Illumina RNA sequencing. A total of 559 unigenes showed differential expression between the two types of exocarp, and 123 of them were common to the previous study. Quantitative real-time PCR analysis supported the RNA-seq-derived differential expression between the two types of exocarp and revealed preferential expression of these genes in exocarp relative to mesocarp and fruit core. Gene ontology enrichment analysis revealed divergent expression of lipid metabolic process genes, transport genes, stress-responsive genes and other biological process genes in the two types of exocarp. Expression changes in lignin metabolism-related genes were consistent with the different pigmentation of russet and green exocarp. Increased transcripts of putative genes involved in suberin, cutin and wax biosynthesis in 'Suisho' exocarp could facilitate deposition of these chemicals and play a role in the mutant trait responsible for the green exocarp.
In addition, the divergent expression of ATP-binding cassette transporters involved in the trans-membrane transport of lignin, cutin, and suberin precursors suggests that the transport process could also affect the composition of the exocarp and play a role in the regulation of exocarp pigmentation. Results from this study provide a basis for the analysis of the molecular mechanism underlying the sand pear russet/green exocarp mutation and present a comprehensive list of candidate genes that could be used to further investigate the trait mutation at the molecular level. PMID:24445590
Wang, Yue-zhi; Zhang, Shujun; Dai, Mei-song; Shi, Ze-bin
The role of genetic and environmental factors, as well as their interaction, in the natural history of asthma, allergic rhinitis and chronic obstructive pulmonary disease (COPD) is largely unknown. This is mainly due to the lack of large-scale analytical epidemiological/genetic studies aimed at investigating these 3 respiratory conditions simultaneously. The GEIRD project is a collaborative initiative designed to collect information on biomarkers of inflammation and oxidative stress, individual and ecological exposures, diet, early-life factors, smoking habits, genetic traits and medication use in large and accurately defined series of asthma, allergic rhinitis and COPD phenotypes. It is a population-based multicase-control design, where cases and controls are identified through a 2-stage screening process (postal questionnaire and clinical examination) in pre-existing cohorts or new samples of subjects. It is aimed at elucidating the role that modifiable and genetic factors play in the occurrence, persistence, severity and control of inflammatory airway diseases, by way of the establishment of a historical multicentre standardized databank of phenotypes, contributed by and openly available to international epidemiologists. Researchers conducting population-based surveys with standardized methods may contribute to the public-domain case-control database, and use the resulting increased power to answer their own scientific questions. PMID:20150743
de Marco, R; Accordini, S; Antonicelli, L; Bellia, V; Bettin, M D; Bombieri, C; Bonifazi, F; Bugiani, M; Carosso, A; Casali, L; Cazzoletti, L; Cerveri, I; Corsico, A G; Ferrari, M; Fois, A G; Lo Cascio, V; Marcon, A; Marinoni, A; Olivieri, M; Perbellini, L; Pignatti, P; Pirina, P; Poli, A; Rolla, G; Trabetti, E; Verlato, G; Villani, S; Zanolin, M E
Summary We discuss a Bayesian discovery procedure for multiple comparison problems. We show that under a coherent decision theoretic framework, a loss function combining true positive and false positive counts leads to a decision rule based on a threshold of the posterior probability of the alternative. Under a semi-parametric model for the data, we show that the Bayes rule can be approximated by the optimal discovery procedure (ODP), recently introduced by Storey (2007a). Improving the approximation leads us to a Bayesian discovery procedure (BDP), which exploits the multiple shrinkage in clusters implied by the assumed nonparametric model. We compare the BDP and the ODP estimates in a simple simulation study and in an assessment of differential gene expression based on microarray data from tumor samples. We extend the setting of the ODP by discussing modifications of the loss function that lead to different single thresholding statistics. Finally, we provide an application of the previous arguments to dependent (spatial) data.
Guindani, Michele; Muller, Peter; Zhang, Song
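The thresholding result stated in the abstract can be illustrated with a small sketch. It assumes one particular linear loss, L = c * (false positives) - (true positives), which is a common choice but is not spelled out in the abstract; the names `bayes_discoveries`, `expected_loss` and `fp_cost` are hypothetical. Under this loss, flagging comparison i changes the posterior expected loss by c * (1 - v_i) - v_i, where v_i is the posterior probability of the alternative, so the Bayes rule flags i exactly when v_i > c / (1 + c).

```python
def bayes_discoveries(post_probs, fp_cost=1.0):
    """Flag comparisons whose posterior probability of the alternative
    exceeds the threshold fp_cost / (1 + fp_cost)."""
    threshold = fp_cost / (1.0 + fp_cost)
    return [v > threshold for v in post_probs]


def expected_loss(post_probs, decisions, fp_cost=1.0):
    """Posterior expected loss, fp_cost * E[FP] - E[TP], of a decision vector."""
    # Each flagged comparison contributes fp_cost * (1 - v) - v.
    return sum(fp_cost * (1 - v) - v
               for v, flagged in zip(post_probs, decisions) if flagged)


# Illustrative posterior probabilities (invented values)
probs = [0.95, 0.60, 0.30, 0.85]
flags = bayes_discoveries(probs, fp_cost=4.0)  # threshold = 4 / 5 = 0.8
print(flags)
print(expected_loss(probs, flags, fp_cost=4.0))
```

Raising `fp_cost` moves the threshold toward 1 and makes the rule more conservative; the ODP and BDP statistics discussed in the abstract are different, model-based approximations and are not implemented here.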
Insecticide resistance has recently become a critical concern for control of many insect pest species. Genome sequencing and global quantization of gene expression through analysis of the transcriptome can provide useful information relevant to this challenging problem. The oriental fruit fly, Bactrocera dorsalis, is one of the world's most destructive agricultural pests, and recently it has been used as a target for studies of genetic mechanisms related to insecticide resistance. However, prior to this study, the molecular data available for this species was largely limited to genes identified through homology. To provide a broader pool of gene sequences of potential interest with regard to insecticide resistance, this study uses whole transcriptome analysis developed through de novo assembly of short reads generated by next-generation sequencing (NGS). The transcriptome of B. dorsalis was initially constructed using Illumina's Solexa sequencing technology. Qualified reads were assembled into contigs and potential splicing variants (isotigs). A total of 29,067 isotigs have putative homologues in the non-redundant (nr) protein database from NCBI, and 11,073 of these correspond to distinct D. melanogaster proteins in the RefSeq database. Approximately 5,546 isotigs contain coding sequences that are at least 80% complete and appear to represent B. dorsalis genes. We observed a strong correlation between the completeness of the assembled sequences and the expression intensity of the transcripts. The assembled sequences were also used to identify large numbers of genes potentially belonging to families related to insecticide resistance. A total of 90 P450-, 42 GST- and 37 COE-related genes, representing three major enzyme families involved in insecticide metabolism and resistance, were identified. In addition, 36 isotigs were discovered to contain target site sequences related to four classes of resistance genes. 
Identified sequence motifs were also analyzed to characterize putative polypeptide translational products and associate them with specific genes and protein functions.
Hsu, Ju-Chun; Wu, Wen-Jer; Feng, Hai-Tung; Haymer, David S.; Chen, Chien-Yu
We report the generation and analysis of a total of 77,583 expressed sequence tags (ESTs) from two grapevine (Vitis vinifera L.) cultivars, Cabernet Sauvignon (wine grape) and Muscat Hamburg (table grape) with a focus on EST sequence quality and assembly optimization. The majority of the ESTs were derived from normalized cDNA libraries representing berry pericarp and seed developmental series, pooled non-berry tissues including root, flower, and leaf in Cabernet Sauvignon, and pooled tissues of berry, seed, and flower in Muscat Hamburg. EST and unigene sequence quality were determined by computational filtering coupled with small-scale contig reassembly, manual review, and BLAST analyses. EST assembly was optimized to better discriminate among closely related paralogs using two independent grape sequence sets, a previously published set of Vitis spp. gene families and our EST dataset derived from pooled leaf, flower, and root tissues of Cabernet Sauvignon. Sequence assembly within individual libraries indicated that those prepared from pooled tissues contributed the most to gene discovery. Annotations based upon searches against multiple databases including tomato and strawberry sequences helped to identify putative functions of ESTs and unigenes, particularly with respect to fleshy fruit development. Sequence comparison among the three wine grape libraries identified a number of genes preferentially expressed in the pericarp tissue, including transcription factors, receptor-like protein kinases, and hexose transporters. Gene ontology (GO) classification in the biological process aspect showed that GO categories corresponding to 'transport' and 'cell organization and biogenesis', which are associated with metabolite movement and cell wall structural changes during berry ripening, were higher in pericarp than in other tissues in the wine grape studied. The sequence data were used to characterize potential roles of new genes in berry development and composition. 
PMID:17761391
Peng, Fred Y; Reid, Karen E; Liao, Nancy; Schlosser, James; Lijavetzky, Diego; Holt, Robert; Martínez Zapater, José M; Jones, Steven; Marra, Marco; Bohlmann, Jörg; Lund, Steven T
Understanding the molecular underpinnings involved in the reproduction of the salmon louse is critical for designing novel strategies of pest management for this ectoparasite. However, genomic information on sex-related genes is still limited. In the present work, sex-specific gene transcription was revealed in the salmon louse Caligus rogercresseyi using high-throughput Illumina sequencing. A total of 30,191,914 and 32,292,250 high quality reads were generated for females and males, and these were de novo assembled into 32,173 and 38,177 contigs, respectively. Gene ontology analysis showed a pattern of higher expression in the female as compared to the male transcriptome. Based on our sequence analysis and known sex-related proteins, several genes putatively involved in sex differentiation, including Dmrt3, FOXL2, VASA, and FEM1, and other potentially significant candidate genes in C. rogercresseyi, were identified for the first time. In addition, the occurrence of SNPs in several differentially expressed contigs annotating for sex-related genes was found. This transcriptome dataset provides a useful resource for future functional analyses, opening new opportunities for sea lice pest control. PMID:24642131
Farlora, Rodolfo; Araya-Garay, José; Gallardo-Escárate, Cristian
Little is known about the molecular development and heterogeneity of callosal projection neurons (CPN), cortical commissural neurons that connect homotopic regions of the two cerebral hemispheres via the corpus callosum, and that are critical for bilateral integration of cortical information. Here we report on the identification of a series of genes that individually and in combination define CPN and novel CPN subpopulations during embryonic and postnatal development. We used in situ hybridization analysis, immunocytochemistry, and retrograde labeling to define the layer- and neuron type-specific distribution of these newly identified CPN genes across different stages of maturation. We demonstrate that a subset of these genes (e.g. Hspb3 and Lpl), appear specific to all CPN (in layers II/III and V–VI), while others (e.g. Nectin-3, Plexin-D1 and Dkk3) discriminate between CPN of the deep layers and those of the upper layers. Further, the data show that several genes finely subdivide CPN within individual layers and appear to label CPN subpopulations that have not been previously described using anatomical or morphological criteria. The genes identified here likely reflect the existence of distinct programs of gene expression governing the development, maturation, and function of the newly identified subpopulations of CPN. Together, these data define the first set of genes that identify and molecularly subcategorize distinct populations of callosal projection neurons, often located in distinct subdivisions of the canonical cortical laminae.
Molyneaux, Bradley J.; Arlotta, Paola; Fame, Ryann M.; MacDonald, Jessica L.; MacQuarrie, Kyle L.; Macklis, Jeffrey D.
Background Turbot (Scophthalmus maximus L.) is an important aquacultural resource both in Europe and Asia. However, there is little information on gene sequences available in public databases. Currently, one of the main problems affecting the culture of this flatfish is mortality due to several pathogens, especially viral diseases which are not treatable. In order to identify new genes involved in immune defense, we conducted 454-pyrosequencing of the turbot transcriptome after different immune stimulations. Methodology/Principal Findings Turbot were injected with viral stimuli to increase the expression level of immune-related genes. High-throughput deep sequencing using 454-pyrosequencing technology yielded 915,256 high-quality reads. These sequences were assembled into 55,404 contigs that were subjected to annotation steps. Intriguingly, 55.16% of the deduced proteins were not significantly similar to any sequences in the databases used for the annotation and only 0.85% of the BLASTx top-hits matched S. maximus protein sequences. This relatively low level of annotation is possibly due to the limited information for this species and other flatfish in the database. These results suggest the identification of a large number of new genes in turbot and in fish in general. A more detailed analysis showed the presence of putative members of several innate and specific immune pathways. Conclusions/Significance To our knowledge, this study is the first transcriptome analysis using 454-pyrosequencing for turbot. Previously, there were only 12,471 ESTs and fewer than 1,500 nucleotide sequences for S. maximus in the NCBI database. Our results provide a rich source of data (55,404 contigs and 181,845 singletons) for discovering and identifying new genes, which will serve as a basis for microarray construction, gene expression characterization and for identification of genetic markers to be used in several applications. 
Immune stimulation in turbot was very effective, obtaining an enormous variety of sequences belonging to genes involved in the defense mechanisms.
Pereiro, Patricia; Balseiro, Pablo; Romero, Alejandro; Dios, Sonia; Forn-Cuni, Gabriel; Fuste, Berta; Planas, Josep V.; Beltran, Sergi; Novoa, Beatriz; Figueras, Antonio
Background MicroRNAs (miRNAs) are an abundant class of endogenous small RNA molecules that downregulate gene expression at the posttranscriptional level. They play important roles in multiple biological processes by regulating genes that control developmental timing, growth, stem cell division and apoptosis by binding to the mRNA of target genes. Despite the position Atlantic salmon (Salmo salar) has as an economically important domesticated animal, there has been little research on miRNAs in this species. Knowledge about miRNAs and their target genes may be used to control health and to improve performance of economically important traits. However, before their biological function can be unravelled they must be identified and annotated. The aims of this study were to identify and characterize miRNA genes in Atlantic salmon by deep sequencing analysis of small RNA libraries from nine different tissues. Results A total of 180 distinct mature miRNAs belonging to 106 families of evolutionarily conserved miRNAs, and 13 distinct novel mature miRNAs were discovered and characterized. The mature miRNAs corresponded to 521 putative precursor sequences located at unique genome locations. About 40% of these precursors were part of gene clusters, and the majority of the Salmo salar gene clusters discovered were conserved across species. Comparison of expression levels in samples from different tissues applying DESeq indicated that there were tissue specific expression differences in three conserved and one novel miRNA. Ssa-miR 736 was detected in heart tissue only, while two other clustered miRNAs (ssa-miR 212 and 132) seem to be expressed at a higher level in brain tissue. These observations correlate well with their expected functions as regulators of signal pathways in cardiac and neuronal cells, respectively. Ssa-miR 8163 is one of the novel miRNAs discovered and its function remains unknown. 
However, differential expression analysis using DESeq suggests that this miRNA is enriched in liver tissue and the precursor was mapped to intron 7 of the transferrin gene. Conclusions The identification and annotation of evolutionarily conserved and novel Salmo salar miRNAs as well as the characterization of miRNA gene clusters provide biological knowledge that will greatly facilitate further functional studies on miRNAs in this species.
The oriental river prawn, Macrobrachium nipponense, is an important crustacean species in aquaculture. However, early gonad maturity is a ubiquitous problem that reduces product quality. While husbandry and nutritional management have achieved little success in tackling this issue, a molecular approach may discover the genes involved in reproduction and development, which will provide basic knowledge on reproductive control. In this study, a high-quality cDNA library of prawn was constructed from the ovary tissue. A total of 3294 successful sequencing reactions yielded 3256 expressed sequence tags (ESTs) longer than 100 bp. The cluster and assembly analyses yielded 1514 unique sequences including 414 contigs and 1168 singletons. A total of 719 (47.49%) unique sequences were identified as orthologs of genes from other organisms. Sequence similarity analysis identified 28 important expressed genes, including cathepsin B, chromobox protein, Cdc2, cyclin B, DEAD box protein and ADF/cofilin protein. These genes may be involved in reproductive and developmental functions in prawn. Peritrophin consisting of cortical rods was also found in this species. The identification of these EST sequences in M. nipponense would improve our understanding of the genes that regulate reproduction and development in prawn species. This study also lays the groundwork for development of molecular markers related to ovary development in other prawn species. PMID:20403747
Wu, Ping; Qi, Dan; Chen, Liqiao; Zhang, Hao; Zhang, Xiaowei; Qin, Jian Guang; Hu, Songnian
Genes involved in ribosome biogenesis and assembly (RBA) are responsible for ribosome formation. In Saccharomyces cerevisiae, their transcription is regulated by two dissimilar DNA motifs. We were interested in analyzing conservation and divergence of RBA transcription regulation machinery throughout fungal evolution. We have identified orthologs of S. cerevisiae RBA genes in 39 species across fungal phylogeny and searched upstream regions of these gene sets for DNA sequences significantly similar to S. cerevisiae RBA regulatory motifs. In addition to confirming known motif arrangements comprising two different motifs in a set of S. cerevisiae close relatives or two instances of the same motif (that we refer to as modules), we have also discovered novel modules in a group of fungi closely related to Neurospora crassa. Despite a single nucleotide difference between consensus sequences of RBA motifs, modules associated with the S. cerevisiae group and the N. crassa group displayed consistently different characteristics with respect to preferred module organization and several other module properties. For a given species, we have found a correlation between the configuration of the RBA module and significant enrichment in a set of specific Gene Ontology biological processes. We have identified several likely new candidates for a role in ribosome biogenesis in S. cerevisiae based on the combined evidence of RBA module presence in the upstream regions, functional annotation information and microarray expression profiles. We believe that this approach will be useful in terms of generating hypotheses about functional roles of genes for which only fragmentary data from a single source are available.
Martyanov, Viktor; Gross, Robert H.
Osteoporosis is a complex disorder and commonly leads to fractures in elderly persons. Genome-wide association studies (GWAS) have become an unbiased approach to identify variations in the genome that potentially affect health. However, the genetic variants identified so far only explain a small proportion of the heritability for complex traits. Due to the modest genetic effect size and inadequate power, true association signals may not be revealed based on a stringent genome-wide significance threshold. Here, we take advantage of SNP and transcript arrays and integrate GWAS and expression signature profiling relevant to the skeletal system in cellular and animal models to prioritize the discovery of novel candidate genes for osteoporosis-related traits, including bone mineral density (BMD) at the lumbar spine (LS) and femoral neck (FN), as well as geometric indices of the hip (femoral neck-shaft angle, NSA; femoral neck length, NL; and narrow-neck width, NW). A two-stage meta-analysis of GWAS from 7,633 Caucasian women and 3,657 men revealed three novel loci associated with osteoporosis-related traits, including chromosome 1p13.2 (RAP1A, p = 3.6×10^-8), 2q11.2 (TBC1D8), and 18q11.2 (OSBPL1A), and confirmed a previously reported region near TNFRSF11B/OPG gene. We also prioritized 16 suggestive genome-wide significant candidate genes based on their potential involvement in skeletal metabolism. Among them, 3 candidate genes were associated with BMD in women. Notably, 2 out of these 3 genes (GPR177, p = 2.6×10^-13; SOX6, p = 6.4×10^-10) associated with BMD in women have been successfully replicated in a large-scale meta-analysis of BMD, but none of the non-prioritized candidates (associated with BMD) did. Our results support the concept of our prioritization strategy. 
In the absence of direct biological support for identified genes, we highlighted the efficiency of subsequent functional characterization using publicly available expression profiling relevant to the skeletal system in cellular or whole animal models to prioritize candidate genes for further functional validation.
Demissie, Serkalem; Soranzo, Nicole; Bianchi, Estelle N.; Grundberg, Elin; Liang, Liming; Richards, J. Brent; Estrada, Karol; Zhou, Yanhua; van Nas, Atila; Moffatt, Miriam F.; Zhai, Guangju; Hofman, Albert; van Meurs, Joyce B.; Pols, Huibert A. P.; Price, Roger I.; Nilsson, Olle; Pastinen, Tomi; Cupples, L. Adrienne; Lusis, Aldons J.; Schadt, Eric E.; Ferrari, Serge; Uitterlinden, Andre G.
Compared with the actinomycetes in stony corals, the phylogenetic diversity of soft coral-associated culturable actinomycetes is essentially unexplored. Meanwhile, knowledge of the natural products from coral-associated actinomycetes is very limited. In this study, thirty-two strains were isolated from the tissue of the soft coral Scleronephthya sp. in the East China Sea and were grouped into eight genera by 16S rDNA phylogenetic analysis: Micromonospora, Gordonia, Mycobacterium, Nocardioides, Streptomyces, Cellulomonas, Dietzia and Rhodococcus. Six Micromonospora strains and four Streptomyces strains were found to have the potential to produce aromatic polyketides, based on analysis of the KSα (ketoacyl-synthase) gene in the PKS II (type II polyketide synthase) gene cluster. Among the six Micromonospora strains, an angucycline cyclase gene was amplified in two strains (A5-1 and A6-2), suggesting their potential for synthesizing angucyclines, e.g., jadomycin. Under the guidance of functional gene prediction, one jadomycin B analogue (7b,13-dihydro-7-O-methyl jadomycin B) was detected in the fermentation broth of Micromonospora sp. strain A5-1. This study highlights the phylogenetically diverse culturable actinomycetes associated with the tissue of the soft coral Scleronephthya sp. and the potential of coral-derived actinomycetes, especially Micromonospora, for producing aromatic polyketides.
Sun, Wei; Peng, Chongsheng; Zhao, Yunyu; Li, Zhiyong
Background: MicroRNAs (miRNAs) are a class of small non-coding RNAs which have been recognized as ubiquitous post-transcriptional regulators. The analysis of interactions between different miRNAs and their target genes is necessary for the understanding of miRNAs' role in the control of cell life and death. In this paper we propose a novel data mining algorithm, called HOCCLUS2, specifically designed to bicluster miRNAs and target messenger RNAs (mRNAs) on the basis of their experimentally-verified and/or predicted interactions. Indeed, existing biclustering approaches, typically used to analyze gene expression data, fail when applied to miRNA:mRNA interactions since they usually do not extract possibly overlapping biclusters (miRNAs and their target genes may have multiple roles), extract a huge number of biclusters (difficult to browse and rank on the basis of their importance) and work on similarities of feature values (do not limit the analysis to reliable interactions). Results: To overcome these limitations, HOCCLUS2 i) extracts possibly overlapping biclusters, to catch multiple roles of both miRNAs and their target genes; ii) extracts hierarchically organized biclusters, to facilitate bicluster browsing and to distinguish between universe and pathway-specific miRNAs; iii) extracts highly cohesive biclusters, to consider only reliable interactions; iv) ranks biclusters according to the functional similarities, computed on the basis of Gene Ontology, to facilitate bicluster analysis. Conclusions: Our results show that HOCCLUS2 is a valid tool to support biologists in the identification of context-specific miRNA regulatory modules and in the detection of possibly unknown miRNA target genes.
Indeed, results prove that HOCCLUS2 is able to extract cohesiveness-preserving biclusters, when compared with competitive approaches, and statistically confirm (at a confidence level of 99%) that mRNAs which belong to the same biclusters are, on average, more functionally similar than mRNAs which belong to different biclusters. Finally, the hierarchy of biclusters provides useful insights to understand the intrinsic hierarchical organization of miRNAs and their potential multiple interactions on target genes.
Chickpea, a member of the Fabaceae family, is the world's third most important legume crop but suffers severe yield losses due to various biotic and abiotic stresses. Development of modern genomic tools such as molecular markers and identification of resistance genes associated with these stresses facilitate improvement in chickpea breeding towards abiotic stress tolerance. In this study, 1597 high-quality expressed sequence tags (ESTs) were generated from a cDNA library of variety Pusa 1105 root tissue after cadmium (Cd) treatment. Assembly of ESTs resulted in a total of 914 unigenes, of which putative homology was obtained for 38.8 % after BLASTX search. In terms of species distribution, the majority of sequences showed similarity with Medicago truncatula, followed by Glycine max, Vitis vinifera, Populus trichocarpa and Pisum sativum sequences. Functional annotation was assigned using Blast2GO, and the Gene Ontology (GO) terms were categorized into biological process, molecular function and cellular component. Approximately 10.83 % of unigenes were assigned at least one GO term. Moreover, in the distribution of transcripts into various biological pathways, 20 of the annotated transcripts were assigned to ten pathways in the KEGG database. A majority of the genes were found to be involved in sulphur and nitrogen metabolism. In the quantitative real-time PCR analysis, five of the transcription factors and three of the transporter genes were found to be highly expressed after Cd treatment. In addition, the utility of the ESTs was demonstrated by exploiting them for the development of 83 genic molecular markers, including EST-simple sequence repeats and intron-targeted polymorphisms, that would assist in tagging genes related to metal stress in future work. PMID:24414095
Gaur, Rashmi; Bhatia, Sabhyata; Gupta, Meetu
In the effort to prepare the mouse full-length cDNA encyclopedia, we previously developed several techniques to prepare and select full-length cDNAs. To increase the number of different cDNAs, we introduce here a strategy to prepare normalized and subtracted cDNA libraries in a single step. The method is based on hybridization of the first-strand, full-length cDNA with several RNA drivers, including starting mRNA as the normalizing driver and run-off transcripts from minilibraries containing highly expressed genes, rearrayed clones, and previously sequenced cDNAs as subtracting drivers. Our method keeps the proportion of full-length cDNAs in the subtracted/normalized library high. Moreover, our method dramatically enhances the discovery of new genes as compared to results obtained by using standard, full-length cDNA libraries. This procedure can be extended to the preparation of full-length cDNA encyclopedias from other organisms. PMID:11042159
Carninci, P; Shibata, Y; Hayatsu, N; Sugahara, Y; Shibata, K; Itoh, M; Konno, H; Okazaki, Y; Muramatsu, M; Hayashizaki, Y
The students will understand that scientific theories change in the face of new evidence, but that those changes can be slow in coming. To download the lesson plan as a PDF, see the document below. Students will research scientific discoveries that happened by accident in the past and learn how gamma rays were discovered by 20th-century scientists.
The report is a transcript of a workshop held to discuss issues involved in the costs of human genome projects, in support of the OTA assessment 'Mapping Our Genes'. The workshop was chaired by Paul Berg, Stanford University. Also participating were Christian Bur...
Background: Whirling disease, caused by the pathogen Myxobolus cerebralis, afflicts several salmonid species. Rainbow trout are particularly susceptible and may suffer high mortality rates. The disease is persistent and spreading in hatcheries and natural waters of several countries, including the U.S.A., and the economic losses attributed to whirling disease are substantial. In this study, genome-wide expression profiling using cDNA microarrays was conducted for resistant Hofer and susceptible Trout Lodge rainbow trout strains following pathogen exposure with the primary objective of identifying specific genes implicated in whirling disease resistance. Results: Several genes were significantly up-regulated in skin following pathogen exposure for both the resistant and susceptible rainbow trout strains. For both strains, response to infection appears to be linked with the interferon system. Expression profiles for three genes identified with microarrays were confirmed with qRT-PCR. Ubiquitin-like protein 1 was up-regulated over 100 fold and interferon regulating factor 1 was up-regulated over 15 fold following pathogen exposure for both strains. Expression of metallothionein B, which has known roles in inflammation and immune response, was up-regulated over 5 fold in the resistant Hofer strain but was unchanged in the susceptible Trout Lodge strain following pathogen exposure. Conclusion: The present study has provided an initial view into the genetic basis underlying immune response and resistance of rainbow trout to the whirling disease parasite. The identified genes have allowed us to gain insight into the molecular mechanisms implicated in salmonid immune response and resistance to whirling disease infection.
Baerwald, Melinda R; Welsh, Amy B; Hedrick, Ronald P; May, Bernie
Background: Eustigmatos cf. polyphem is a yellow-green unicellular soil microalga belonging to the eustigmatophytes, with high biomass and considerable production of triacylglycerols (TAGs) for biofuels, and is thus referred to as an oleaginous microalga. The paucity of microalgae genome sequences, however, limits development of gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for a
LingLin Wan; Juan Han; Min Sang; AiFen Li; Hong Wu; ShunJi Yin; ChengWu Zhang
The aim of this study was to investigate the gene and protein expression profiles of important drug-transporting proteins in human cell lines commonly used for studies of drug transport mechanisms. Human cell lines used to transiently or stably express single transporters [HeLa, human embryonic kidney (HEK) 293] and leukemia cell lines used to study drug resistance by ATP-binding cassette transporters (HL-60, K562) were investigated and compared with organotypic cell lines (HepG2, Saos-2, Caco-2, and Caco-2 TC7). For gene expression studies, real-time polymerase chain reaction was used, whereas monospecific polyclonal antibodies were generated and used to investigate protein expression by immunohistochemistry. Thirty-six transporters were studied for gene expression, and nine were studied for protein expression. The antibodies were validated using expression patterns in human tissues. Finally, the function of one ubiquitously expressed transporter, MCT1/SLC16A1, was investigated using [¹⁴C]lactic acid as a substrate. In general, the adherent cell lines (HeLa, HEK293) displayed low transporter expression, and the expression patterns were barely affected by transfection. The leukemia cell lines (K562, HL-60) and Saos-2 also had low endogenous transporter expression, whereas the organotypic cell lines (HepG2 and Caco-2) showed higher expression of some transporters. Comparison of gene and protein expression profiles gave poor correlations, but better agreement was obtained for antibodies with a good validation score, indicating that antibody quality was a significant variable. It is noteworthy that the monocarboxylic acid-transporting protein MCT1 was significantly expressed in all and was functional in most of the cell lines, indicating that MCT1 may be a confounding factor when the transport of small anionic drugs is investigated. PMID:19741037
Ahlin, Gustav; Hilgendorf, Constanze; Karlsson, Johan; Szigyarto, Cristina Al-Khalili; Uhlén, Mathias; Artursson, Per
Background: The swimming crab, Portunus trituberculatus, is an important farmed species in China and has attracted extensive study, which increasingly requires background knowledge of its genome. To date, a whole-genome sequence is unavailable and transcriptomic information is also scarce for this species. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive transcript dataset for major tissues of Portunus trituberculatus using Illumina paired-end sequencing technology. Results: Total RNA was isolated from eyestalk, gill, heart, hepatopancreas and muscle. Equal quantities of RNA from each tissue were pooled to construct a cDNA library. Using Illumina paired-end sequencing technology, we generated a total of 120,137 transcripts with an average length of 1037 bp. Further assembly analysis showed that all contigs contributed to 87,100 unigenes; of these, 16,029 unigenes (18.40% of the total) could be matched in the GenBank non-redundant database. Potential genes and their functions were predicted by GO, KEGG pathway mapping and COG analysis. Based on our sequence analysis and published literature, many putative genes with fundamental roles in growth and muscle development, including actin, myosin, tropomyosin, troponin and other potentially important candidate genes, were identified for the first time in this species. Furthermore, 22,673 SSRs and 66,191 high-confidence SNPs were identified in this EST dataset. Conclusion: The transcriptome provides invaluable new data for a functional genomics resource and future biological research in Portunus trituberculatus. The data will also guide future functional studies to manipulate or select for genes influencing growth that should find practical applications in aquaculture breeding programs.
The molecular markers identified in this study will provide a material basis for future genetic linkage and quantitative trait loci analyses, and will be essential for accelerating aquaculture breeding programs with this species.
Lv, Jianjian; Liu, Ping; Gao, Baoquan; Wang, Yu; Wang, Zheng; Chen, Ping; Li, Jian
Isogenic cell lines differing only in the expression of the protein of interest provide the ideal platform for cell-based screening. However, related natural lines differentially expressing the therapeutic target of choice are rare. Here the authors report a strategy for drug screening employing isogenic human cell lines in which the expression of the target protein is regulated by a gene-specific engineered zinc-finger protein (ZFP) transcription factor (TF). To demonstrate this approach, a ZFP TF activator of the human parathyroid hormone receptor 1 (PTHR1) gene was identified and introduced into HEK293 cells (negative for PTHR1). Following induction of ZFP TF expression, this cell line produced functional PTHR1 protein, resulting in a robust and ligand-specific cyclic adenosine monophosphate (cAMP) response. Reciprocally, the natural expression of PTHR1 observed in SAOS2 cells was dramatically reduced by the introduction of the appropriate PTHR1-specific ZFP TF repressor. Moreover, this ZFP-driven PTHR1 repression selectively eliminated the functional cAMP response invoked by known ligands of PTHR1. These data establish ZFP TF-generated isogenic lines as a general approach for the identification of therapeutic agents specific for the target gene of interest. PMID:15964931
Liu, Pei-Qi; Tan, Siyuan; Mendel, Matthew C; Murrills, Richard J; Bhat, Bheem M; Schlag, Brian; Samuel, Rachelle; Matteo, Jeanne J; de la Rosa, Ragan; Howes, Katherine; Reik, Andreas; Case, Casey C; Bex, Frederick J; Young, Kathleen; Gregory, Philip D
Background: The oriental river prawn, Macrobrachium nipponense, is an economically and nutritionally important species of the Palaemonidae family of decapod crustaceans. As for many non-model organisms, a whole-genome sequence is not yet available. Transcriptomic information is also scarce for this species. In this study, we performed de novo transcriptome sequencing to produce the first comprehensive expressed sequence tag (EST) dataset for M. nipponense using high-throughput sequencing technologies. Methodology and Principal Findings: Total RNA was isolated from eyestalk, gill, heart, ovary, testis, hepatopancreas, muscle, and embryos at the cleavage, gastrula, nauplius and zoea stages. Equal quantities of RNA from each tissue and stage were pooled to construct a cDNA library. Using 454 pyrosequencing technology, we generated a total of 984,204 high quality reads (338.59 Mb) with an average length of 344 bp. Clustering and assembly of these reads produced a non-redundant set of 81,411 unique sequences, comprising 42,551 contigs and 38,860 singletons. All of the unique sequences were involved in the molecular function (30,425), cellular component (44,112) and biological process (67,679) categories by GO analysis. Potential genes and their functions were predicted by KEGG pathway mapping and COG analysis. Based on our sequence analysis and published literature, many putative genes involved in sex determination, including DMRT1, FTZ-F1, FOXL2, FEM1 and other potentially important candidate genes, were identified for the first time in this prawn. Furthermore, 6,689 SSRs and 18,107 high-confidence SNPs were identified in this EST dataset. Conclusions: The transcriptome provides invaluable new data for a functional genomics resource and future biological research in M. nipponense.
The molecular markers identified in this study will provide a material basis for future genetic linkage and quantitative trait loci analyses, and will be essential for accelerating aquaculture breeding programs with this species.
Ma, Keyi; Qiu, Gaofeng; Feng, Jianbin; Li, Jiale
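The abstract above reports SSR counts without describing how the repeats were detected. As a minimal sketch of one common approach, perfect microsatellites can be located with a back-referencing regular expression. The minimum repeat counts in `MIN_REPEATS` and the function names are hypothetical choices for illustration and are not taken from the study.

```python
import re
from typing import List, Tuple

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(unit: str) -> bool:
    """True if the unit is not itself a repeat of a shorter unit (e.g. ACAC, AA)."""
    return unit not in (unit + unit)[1:-1]


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return sorted (start, repeat_unit, copy_number) for perfect SSRs in seq."""
    seq = seq.upper()
    found = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) exact copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if not is_primitive(unit):
                continue  # already reported under its shorter repeat unit, or a homopolymer
            found.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(found)


# Made-up example sequence containing (AC)7 and (AAT)5.
example = "GG" + "AC" * 7 + "GG" + "AAT" * 5 + "G"
for start, unit, copies in find_ssrs(example):
    print(start, unit, copies)
```

The primitive-unit check keeps a dinucleotide run such as (AC)12 from being counted again as a tetra- or hexanucleotide repeat; imperfect or compound repeats are outside the scope of this sketch.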
Background: The Antarctic clam, Laternula elliptica, is an infaunal stenothermal bivalve mollusc with a circumpolar distribution. It plays a significant role in bentho-pelagic coupling and hence has been proposed as a sentinel species for climate change monitoring. Previous studies have shown that this mollusc displays a high level of plasticity with regard to shell deposition and damage repair against a background of genetic homogeneity. The Southern Ocean has amongst the lowest present-day CaCO3 saturation rate of any ocean region, and is predicted to be among the first to become undersaturated under current ocean acidification scenarios. Hence, this species presents as an ideal candidate for studies into the processes of calcium regulation and shell deposition in our changing ocean environments. Results: 454 sequencing of L. elliptica mantle tissue generated 18,290 contigs with an average size of 535 bp (ranging from 142 bp to 5.591 kb). BLAST sequence similarity searching assigned putative function to 17% of the data set, with a significant proportion of these transcripts being involved in binding and potentially of a secretory nature, as defined by GO molecular function and biological process classifications. These results indicated that the mantle is a transcriptionally active tissue which is actively proliferating. All transcripts were screened against an in-house database of genes shown to be involved in extracellular matrix formation and calcium homeostasis in metazoans. Putative identifications were made for a number of classical shell deposition genes, such as tyrosinase, carbonic anhydrase and metalloprotease 1, along with novel members of the family 2 G-Protein Coupled Receptors (GPCRs). A membrane transport protein (SEC61) was also characterised and this demonstrated the utility of the clam sequence data as a resource for examining cold adapted amino acid substitutions.
The sequence data contained 46,235 microsatellites and 13,084 Single Nucleotide Polymorphisms (SNPs/INDELs), providing a resource for population and also gene function studies. Conclusions: This is the first 454 data from an Antarctic marine invertebrate. Sequencing of mantle tissue from this non-model species has considerably increased resources for the investigation of the processes of shell deposition and repair in molluscs in a changing environment. A number of promising candidate genes were identified for functional analyses, which will be the subject of further investigation in this species and also used in model-hopping experiments in more tractable and economically important model aquaculture species, such as Crassostrea gigas and Mytilus edulis.
Discovery of Seven Novel Mammalian and Avian Coronaviruses in the Genus Deltacoronavirus Supports Bat Coronaviruses as the Gene Source of Alphacoronavirus and Betacoronavirus and Avian Coronaviruses as the Gene Source of Gammacoronavirus and Deltacoronavirus
Recently, we reported the discovery of three novel coronaviruses, bulbul coronavirus HKU11, thrush coronavirus HKU12, and munia coronavirus HKU13, which were identified as representatives of a novel genus, Deltacoronavirus, in the subfamily Coronavirinae. In this territory-wide molecular epidemiology study involving 3,137 mammals and 3,298 birds, we discovered seven additional novel deltacoronaviruses in pigs and birds, which we named porcine coronavirus HKU15, white-eye coronavirus HKU16, sparrow coronavirus HKU17, magpie robin coronavirus HKU18, night heron coronavirus HKU19, wigeon coronavirus HKU20, and common moorhen coronavirus HKU21. Complete genome sequencing and comparative genome analysis showed that the avian and mammalian deltacoronaviruses have similar genome characteristics and structures. They all have relatively small genomes (25.421 to 26.674 kb), the smallest among all coronaviruses. They all have a single papain-like protease domain in the nsp3 gene; an accessory gene, NS6 open reading frame (ORF), located between the M and N genes; and a variable number of accessory genes (up to four) downstream of the N gene. Moreover, they all have the same putative transcription regulatory sequence of ACACCA. Molecular clock analysis showed that the most recent common ancestor of all coronaviruses was estimated at approximately 8100 BC, and those of Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and Deltacoronavirus were at approximately 2400 BC, 3300 BC, 2800 BC, and 3000 BC, respectively. From our studies, it appears that bats and birds, the warm blooded flying vertebrates, are ideal hosts for the coronavirus gene source, bats for Alphacoronavirus and Betacoronavirus and birds for Gammacoronavirus and Deltacoronavirus, to fuel coronavirus evolution and dissemination. PMID:22278237
Woo, Patrick C Y; Lau, Susanna K P; Lam, Carol S F; Lau, Candy C Y; Tsang, Alan K L; Lau, John H N; Bai, Ru; Teng, Jade L L; Tsang, Chris C C; Wang, Ming; Zheng, Bo-Jian; Chan, Kwok-Hung; Yuen, Kwok-Yung
Conserved non-coding sequences (CNSs) are islands of non-coding sequence that, like protein-coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNSs in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of Arabidopsis, japonica rice, foxtail millet, sorghum, Brachypodium, and maize.
Turco, Gina; Schnable, James C.; Pedersen, Brent; Freeling, Michael
Tef (Eragrostis tef) is a major cereal crop in Ethiopia. Lodging is the primary constraint to increasing productivity in this allotetraploid species, accounting for losses of ~15–45% in yield each year. As a first step toward identifying semi-dwarf varieties that might have improved lodging resistance, an ~6× fosmid library was constructed and used to identify both homeologues of the dw3 semi-dwarfing gene of Sorghum bicolor. An EMS mutagenized population, consisting of ~21,210 tef plants, was planted and leaf materials were collected into 23 superpools. Two dwarfing candidate genes, homeologues of dw3 of sorghum and rht1 of wheat, were sequenced directly from each superpool with 454 technology, and 120 candidate mutations were identified. Out of 10 candidates tested, six independent mutations were validated by Sanger sequencing, including two predicted detrimental mutations in both dw3 homeologues with a potential to improve lodging resistance in tef through further breeding. This study demonstrates that high-throughput sequencing can identify potentially valuable mutations in under-studied plant species like tef and has provided mutant lines that can now be combined and tested in breeding programs for improved lodging resistance.
Zhu, Qihui; Smith, Shavannor M.; Ayele, Mulu; Yang, Lixing; Jogi, Ansuya; Chaluvadi, Srinivasa R.; Bennetzen, Jeffrey L.
What is math for anyway? Ever hear that question from your students? Bookmark this National Science Foundation link and begin to share news of important math-related discoveries and developments. Sample: "Math Could Aid in Curing Cancer" or, perhaps closer to the middle school mindset: "Cloaking Device Concept Moves Beyond Theory: Applied mathematician Graeme Milton brings the dream of cloaking devices portrayed in Star Trek and Harry Potter closer to reality."
Chickpea (Cicer arietinum) is an important food legume crop but lags in the availability of genomic resources. In this study, we have generated about 2 million high-quality sequences of average length of 372 bp using pyrosequencing technology. The optimization of de novo assembly clearly indicated that hybrid assembly of long-read and short-read primary assemblies gave better results. The hybrid assembly generated a set of 34,760 transcripts with an average length of 1,020 bp representing about 4.8% (35.5 Mb) of the total chickpea genome. We identified more than 4,000 simple sequence repeats, which can be developed as functional molecular markers in chickpea. Putative function and Gene Ontology terms were assigned to at least 73.2% and 71.0% of chickpea transcripts, respectively. We have also identified several chickpea transcripts that showed tissue-specific expression and validated the results using real-time polymerase chain reaction analysis. Based on sequence comparison with other species within the plant kingdom, we identified two sets of lineage-specific genes, including those conserved in the Fabaceae family (legume specific) and those lacking significant similarity with any non chickpea species (chickpea specific). Finally, we have developed a Web resource, Chickpea Transcriptome Database, which provides public access to the data and results reported in this study. The strategy for optimization of de novo assembly presented here may further facilitate the transcriptome sequencing and characterization in other organisms. Most importantly, the data and results reported in this study will help to accelerate research in various areas of genomics and implementing breeding programs in chickpea. PMID:21653784
Garg, Rohini; Patel, Ravi K; Jhanwar, Shalu; Priya, Pushp; Bhattacharjee, Annapurna; Yadav, Gitanjali; Bhatia, Sabhyata; Chattopadhyay, Debasis; Tyagi, Akhilesh K; Jain, Mukesh
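The assembly figures quoted in the abstract above can be cross-checked with a few lines of arithmetic. This is only a consistency check on the reported numbers; the implied genome size is a derived estimate, not a value stated in the abstract, and it is approximate because the average transcript length is rounded.

```python
# Figures as reported in the abstract.
n_transcripts = 34_760      # transcripts in the hybrid assembly
mean_length_bp = 1_020      # average transcript length (rounded in the source)
genome_fraction = 0.048     # "about 4.8%" of the chickpea genome

# Total assembled transcript length.
total_bp = n_transcripts * mean_length_bp
total_mb = total_bp / 1e6

# Genome size implied by the reported fraction (derived, approximate).
implied_genome_mb = total_mb / genome_fraction

print(f"Total transcript length: {total_mb:.1f} Mb")
print(f"Implied genome size: about {implied_genome_mb:.0f} Mb")
```

The product of transcript count and mean length agrees with the 35.5 Mb stated in the abstract, so the three reported figures are mutually consistent.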
Salicorn 46, an endophytic fungus isolated from Salicornia herbacea Torr., was identified as Penicillium citrinum based on its internal transcribed spacer and ribosomal large-subunit DNA sequences using a type I polyketide synthase (PKS I) gene screening approach. A new polyketide, penicitriketo (1), and seven known compounds, including ergone (2), (3β,5α,8α,22E)-5,8-epidioxyergosta-6,9,22-trien-3-ol (3), (3β,5α,8α,22E)-5,8-epidioxyergosta-6,22-dien-3-ol (4), stigmasta-7,22-diene-3?,5?,6?-triol (5), 3?,5?-dihydroxy-(22E,24R)-ergosta-7,22-dien-6?-yl oleate (6), Nb-acetyltryptamine (7), and 2-(1-oxo-2-hydroxyethyl) furan (8), were isolated from the culture of Salicorn 46, and their chemical structures were elucidated by spectroscopic analysis. Antioxidant experiments revealed that compound 1 possessed moderate DPPH radical scavenging activity with an IC50 value of 85.33 ± 1.61 μM. Antimicrobial assays revealed that compound 2 exhibited broad-spectrum antimicrobial activity against Candida albicans, Clostridium perfringens, Mycobacterium smegmatis, and Mycobacterium phlei with minimal inhibitory concentration (MIC) values of 25.5, 25.5, 18.5, and 51.0 μM, respectively. Compound 3 displayed potent antimicrobial activities against C. perfringens and Micrococcus tetragenus with a MIC value of 23.5 μM. Compounds 5 and 6 showed high levels of selectivity toward Bacillus subtilis and M. phlei with MIC values of 22.5 and 14.4 μM, respectively. The results of this study highlight the use of PCR-based techniques for the screening of new polyketides from endophytic fungi containing PKS I genes. PMID:24535256
Wang, Xiaomin; Wang, Hui; Liu, Tianxing; Xin, Zhihong
Contrary to the scepticism that characterised the planning stages of the human genome project, the technology and sequence data resulting from the project are set to revolutionise medical practice for good. The expected benefits include: enhanced discovery of disease genes, which will lead to improved knowledge on the genetic basis of diseases; availability of DNA-based diagnostic methods, which will find
Bennett C. Nwanguma
SILVA (from Latin silva, forest, http://www.arb-silva.de) is a comprehensive web resource for up to date, quality-controlled databases of aligned ribosomal RNA (rRNA) gene sequences from the Bacteria, Archaea and Eukaryota domains and supplementary online services. The referred database release 111 (July 2012) contains 3 194 778 small subunit and 288 717 large subunit rRNA gene sequences. Since the initial description of the project, substantial new features have been introduced, including advanced quality control procedures, an improved rRNA gene aligner, online tools for probe and primer evaluation and optimized browsing, searching and downloading on the website. Furthermore, the extensively curated SILVA taxonomy and the new non-redundant SILVA datasets provide an ideal reference for high-throughput classification of data from next-generation sequencing approaches. PMID:23193283
Quast, Christian; Pruesse, Elmar; Yilmaz, Pelin; Gerken, Jan; Schweer, Timmy; Yarza, Pablo; Peplies, Jörg; Glöckner, Frank Oliver
Purpose: The development of adverse effects resulting from the radiotherapy of cancer limits the use of this treatment modality. The validation of a test capable of predicting which patients would be most likely to develop adverse responses to radiation treatment, based on the possession of specific genetic variants, would therefore be of value. The purpose of the Genetic Predictors of Adverse Radiotherapy Effects (Gene-PARE) project is to help achieve this goal. Methods and Materials: A continuously expanding biorepository has been created consisting of frozen lymphocytes and DNA isolated from patients treated with radiotherapy. In conjunction with this biorepository, a database is maintained with detailed clinical information pertaining to diagnosis, treatment, and outcome. The DNA samples are screened using denaturing high performance liquid chromatography (DHPLC) and the Surveyor nuclease assay for variants in ATM, TGFB1, XRCC1, XRCC3, SOD2, and hHR21. It is anticipated that additional genes that control the biologic response to radiation will be screened in future work. Results: Evidence has been obtained that possession of variants in genes, the products of which play a role in radiation response, is predictive for the development of adverse effects after radiotherapy. Conclusions: It is anticipated that the Gene-PARE project will yield information that will allow radiation oncologists to use genetic data to optimize treatment on an individual basis.
Ho, Alice Y. [Department of Radiation Oncology, Mount Sinai School of Medicine, New York, NY (United States); Atencio, David P. [Department of Radiation Oncology, Mount Sinai School of Medicine, New York, NY (United States); Peters, Sheila [Department of Radiation Oncology, Mount Sinai School of Medicine, New York, NY (United States); Stock, Richard G.; Cesaretti, Jamie A.; Green, Sheryl [Department of Radiation Oncology, Mount Sinai School of Medicine, New York, NY (United States); Formenti, Silvia C. [Department of Radiation Oncology, New York University School of Medicine, New York, NY (United States); Haffty, Bruce [Department of Therapeutic Radiology, Yale University School of Medicine, New Haven, CT (United States); Drumea, Karen [Department of Oncology, Rambam Medical Center, Haifa (Israel); Leitzin, Larisa M.D. [Department of Oncology, Rambam Medical Center, Haifa (Israel); Kuten, Abraham [Department of Oncology, Rambam Medical Center, Haifa (Israel); Azria, David [Department of Radiation Oncology, CRLC Val d'Aurelle, Montpellier (France); Ozsahin, Mahmut [Department of Radiation Oncology, Centre Hospitalier Universitaire Vaudois (CHUV), Lausanne (Switzerland); Overgaard, Jens; Andreassen, Christian N. [Department of Experimental Clinical Oncology, Aarhus University Hospital, Aarhus (Denmark); Trop, Cynthia S. [Department of Urology, Bronx VA Medical Center, Bronx, NY (United States); Park, Janelle [Department of Radiation Oncology, Bronx VA Medical Center, Bronx, NY (United States); Rosenstein, Barry S. [Department of Radiation Oncology, Mount Sinai School of Medicine, New York, NY (United States)]|[Department of Community and Preventive Medicine, Mount Sinai School of Medicine, New York NY (United States)]|[Department of Dermatology, Mount Sinai School of Medicine, New York, NY (United States)]|[Department of Radiation Oncology, New York University School of Medicine, New York, NY (United States)]
About half of schwannomas are driven by loss of the NF2 tumor suppressor gene; however, a major barrier to more effective therapies is the lack of a comprehensive understanding of the genetic events outside of the NF2 gene in these tumors. In this project...
P. M. Horowitz
Fibromyalgia syndrome (FMS) is a chronic musculoskeletal pain disorder affecting 2% to 5% of the general population. Both genetic and environmental factors may be involved. To ascertain in an unbiased manner which genes play a role in the disorder, we performed complete exome sequencing on a subset of FMS patients. Out of 150 nuclear families (trios) DNA from 19 probands was subjected to complete exome sequencing. Since >80,000 SNPs were found per proband, the data were further filtered, including analysis of those with stop codons, a rare frequency (<2.5%) in the 1000 Genomes database, and presence in at least 2/19 probands sequenced. Two nonsense mutations, W32X in C11orf40 and Q100X in ZNF77 among 150 FMS trios had a significantly elevated frequency of transmission to affected probands (p = 0.026 and p = 0.032, respectively) and were present in a subset of 13% and 11% of FMS patients, respectively. Among 9 patients bearing more than one of the variants we have described, 4 had onset of symptoms between the ages of 10 and 18. The subset with the C11orf40 mutation had elevated plasma levels of the inflammatory cytokines, MCP-1 and IP-10, compared with unaffected controls or FMS patients with the wild-type allele. Similarly, patients with the ZNF77 mutation have elevated levels of the inflammatory cytokine, IL-12, compared with controls or patients with the wild type allele. Our results strongly implicate an inflammatory basis for FMS, as well as specific cytokine dysregulation, in at least 35% of our FMS cohort. PMID:23762283
Feng, Jinong; Zhang, Zhifang; Wu, Xiwei; Mao, Allen; Chang, Frances; Deng, Xutao; Gao, Harry; Ouyang, Ching; Dery, Kenneth J; Le, Keith; Longmate, Jeffrey; Marek, Claudia; St Amand, R Paul; Krontiris, Theodore G; Shively, John E
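The filtering step described in the abstract above (stop codons, frequency below 2.5% in a reference panel, presence in at least 2 of the 19 sequenced probands) can be pictured as a simple predicate over variant records. The following is a minimal sketch under stated assumptions: the `Variant` record, its field names, the example genes, and the choice to require all three criteria at once are hypothetical, and this is not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record; field names are illustrative only."""
    gene: str
    change: str
    is_stop_gain: bool       # introduces a premature stop codon
    population_freq: float   # allele frequency in a reference panel (0..1)
    carriers: int            # sequenced probands carrying the variant


def passes_filters(v: Variant, max_freq: float = 0.025, min_carriers: int = 2) -> bool:
    """Keep rare stop-gain variants seen in at least `min_carriers` probands.

    Combining the three criteria with AND is an assumption of this sketch.
    """
    return v.is_stop_gain and v.population_freq < max_freq and v.carriers >= min_carriers


# Made-up records that exercise each criterion.
variants = [
    Variant("GENE_A", "W10X", True, 0.010, 3),   # passes all three filters
    Variant("GENE_B", "R50Q", False, 0.001, 5),  # missense, not a stop codon
    Variant("GENE_C", "Q20X", True, 0.100, 4),   # too common in the panel
    Variant("GENE_D", "Y5X", True, 0.004, 1),    # seen in only one proband
]

kept = [v.gene for v in variants if passes_filters(v)]
print(kept)
```

Keeping the thresholds as parameters makes it easy to see how the candidate list changes when the frequency cutoff or the carrier count is relaxed.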
The arterivirus family (order Nidovirales) of single-stranded, positive-sense RNA viruses includes porcine respiratory and reproductive syndrome virus and equine arteritis virus (EAV). Their replicative enzymes are translated from their genomic RNA, while their seven structural proteins are encoded by a set of small, partially overlapping genes in the genomic 3′-proximal region. The latter are expressed via synthesis of a set of subgenomic mRNAs that, in general, are functionally monocistronic (except for a bicistronic mRNA encoding the E and GP2 proteins). ORF5, which encodes the major glycoprotein GP5, has been used extensively for phylogenetic analyses. However, an in-depth computational analysis now reveals the arterivirus-wide conservation of an additional AUG-initiated ORF, here termed ORF5a, that overlaps the 5′ end of ORF5. The pattern of substitutions across sequence alignments indicated that ORF5a is subject to functional constraints at the amino acid level, while an analysis of substitutions at synonymous sites in ORF5 revealed a greatly reduced frequency of substitution in the portion of ORF5 that is overlapped by ORF5a. The 43–64 aa ORF5a protein and GP5 are probably expressed from the same subgenomic mRNA, via a translation initiation mechanism involving leaky ribosomal scanning. Inactivation of ORF5a expression by reverse genetics yielded a severely crippled EAV mutant, which displayed lower titres and a tiny plaque phenotype. These defects, which could be partially complemented in ORF5a-expressing cells, indicate that the novel protein, which may be the eighth structural protein of arteriviruses, is expressed and important for arterivirus infection.
Firth, Andrew E.; Zevenhoven-Dobbe, Jessika C.; Wills, Norma M.; Go, Yun Young; Balasuriya, Udeni B. R.; Atkins, John F.
Most genomic resources available for insects represent the Holometabola, which are insects that undergo complete metamorphosis like beetles and flies. In contrast, the Hemimetabola (direct developing insects), representing the basal branches of the insect tree, have very few genomic resources. We have therefore created a large and publicly available transcriptome for the hemimetabolous insect Gryllus bimaculatus (cricket), a well-developed laboratory model organism whose potential for functional genetic experiments is currently limited by the absence of genomic resources. cDNA was prepared using mRNA obtained from adult ovaries containing all stages of oogenesis, and from embryo samples on each day of embryogenesis. Using 454 Titanium pyrosequencing, we sequenced over four million raw reads, and assembled them into 21,512 isotigs (predicted transcripts) and 120,805 singletons with an average coverage per base pair of 51.3. We annotated the transcriptome manually for over 400 conserved genes involved in embryonic patterning, gametogenesis, and signaling pathways. BLAST comparison of the transcriptome against the NCBI non-redundant protein database (nr) identified significant similarity to nr sequences for 55.5% of transcriptome sequences, and suggested that the transcriptome may contain 19,874 unique transcripts. For predicted transcripts without significant similarity to known sequences, we assessed their similarity to other orthopteran sequences, and determined that these transcripts contain recognizable protein domains, largely of unknown function. We created a searchable, web-based database to allow public access to all raw, assembled and annotated data. This database is to our knowledge the largest de novo assembled and annotated transcriptome resource available for any hemimetabolous insect. We therefore anticipate that these data will contribute significantly to more effective and higher-throughput deployment of molecular analysis tools in Gryllus.
Zeng, Victor; Ewen-Campen, Ben; Horch, Hadley W.; Roth, Siegfried; Mito, Taro; Extavour, Cassandra G.
Many drugs inhibit the human ether-a-go-go-related gene (HERG) cardiac K+ channel. This leads to action potential prolongation on the cellular level, a prolongation of the QT interval on the electrocardiogram, and sometimes cardiac arrhythmia. To date, no activators of this channel have been reported. Here, we describe the in vitro electrophysiological effects of (3R,4R)-4-[3-(6-methoxyquinolin-4-yl)-3-oxo-propyl]-1-[3-(2,3,5-trifluoro-phenyl)-prop-2-ynyl]-piperidine-3-carboxylic acid (RPR260243), a novel activator of HERG. Using patch-clamp electrophysiology, we found that RPR260243 dramatically slowed current deactivation when applied to cells stably expressing HERG. The effects of RPR260243 on HERG channel deactivation were temperature- and voltage-dependent and occurred over the concentration range of 1 to 30 microM. RPR260243-modified HERG currents were inhibited by dofetilide (IC50 = 58 nM). RPR260243 had little effect on HERG current amplitude and no significant effects on steady-state activation parameters or on channel inactivation processes. RPR260243 displayed no activator-like effects on other voltage-dependent ion channels, including the closely related erg3 K+ channel. RPR260243 enhanced the delayed rectifier current in guinea pig myocytes but, when administered alone, had little effect on action potential parameters in these cells. However, RPR260243 completely reversed the action potential-prolonging effects of dofetilide in this preparation. Using the Langendorff heart method, we found that 5 microM RPR260243 increased T-wave amplitude, prolonged the PR interval, and shortened the QT interval. We believe RPR260243 represents the first known HERG channel activator and that the drug works primarily by inhibiting channel closure, leading to a persistent HERG channel current upon repolarization. Compounds like RPR260243 will be useful for studying the physiological role of HERG and may one day find use in treating cardiac disease. PMID:15548764
Kang, Jiesheng; Chen, Xiao-Liang; Wang, Hongge; Ji, Junzhi; Cheng, Hsien; Incardona, Josephine; Reynolds, William; Viviani, Fabrice; Tabart, Michel; Rampe, David
Extreme Engineering is a program on the Discovery Channel that unveils "some of the most ambitious architectural plans of our times." The projects highlighted in each episode come from locations around the world. Some are in the planning stages, while others are only theoretical. This Web site serves as a companion to the television broadcasts and has interactive multimedia tours that offer a glimpse of what the projects would look like when completed. An underwater train tunnel linking New York and London, a massive skyscraper city in Tokyo, and a bridge over the Bering Strait are just some of the remarkable projects featured. The online elements are added after the corresponding episode is aired.
Socio-Culturally Oriented Plan Discovery Environment (SCOPE) is a link discovery project in the Evidence Assessment, Grouping, Linking, and Evaluation (EAGLE) program. The primary objective was to model terrorist organization (TO) mission plans from a cou...
The Mayo Clinic is one of the most well-respected medical facilities in the world, so it makes sense for them to have a great online publication to celebrate their work. Designed as a general interest publication, Discovery's Edge offers "insight into the process and progress of medical science in support of the world's largest group medical practice." Visitors can explore the user-friendly site by clicking through recent stories such as "Putting the hurt on tobacco addiction" and "Genomics: The dawn of a new medical era." In the Features Archive users can browse through some recent triumphs, including reports on asthma triggers and the future of biomechanics. Visitors can also browse the complete online archive or sign up to receive each new edition via email or RSS feed.
Discovery and molecular mapping of a new gene conferring resistance to stem rust, Sr53, derived from Aegilops geniculata and characterization of spontaneous translocation stocks with reduced alien chromatin
This study reports the discovery and molecular mapping of a resistance gene effective against stem rust races RKQQC and TTKSK (Ug99) derived from Aegilops geniculata (2n = 4x = 28, UgUgMgMg). Two populations from the crosses TA5599 (T5DL-5MgL·5MgS)/TA3809 (ph1b mutant in Chinese Spring background) and TA5599/Lakin were developed and used for genetic mapping to identify markers linked to the resistance gene. Further molecular and
Wenxuan Liu; Matthew Rouse; Bernd Friebe; Yue Jin; Bikram Gill; Michael O. Pumphrey
Background Phosphorus (P) is an essential macronutrient for plant growth and development. To modulate their P homeostasis, plants must balance P uptake, mobilisation, and partitioning to various organs. Despite the worldwide importance of wheat as a cultivated food crop, molecular mechanisms associated with phosphate (Pi) starvation in wheat remain unclear. To elucidate these mechanisms, we used RNA-Seq methods to generate transcriptome profiles of the wheat variety ‘Chinese Spring’ responding to 10 days of Pi starvation. Results We carried out de novo assembly on 73.8 million high-quality reads generated from RNA-Seq libraries. We then constructed a transcript dataset containing 29,617 non-redundant wheat transcripts, comprising 15,047 contigs and 14,570 non-redundant full-length cDNAs from the TriFLDB database. When compared with barley full-length cDNAs, 10,656 of the 15,047 contigs were unalignable, suggesting that many might be distinct from barley transcripts. The average expression level of the contigs was lower than that of the known cDNAs, implying that these contigs included transcripts that were rarely represented in the full-length cDNA library. Within the non-redundant transcript set, we identified 892–2,833 responsive transcripts in roots and shoots, corresponding on average to 23.4% of the contigs not covered by cDNAs in TriFLDB under Pi starvation. The relative expression level of the wheat IPS1 (Induced by Phosphate Starvation 1) homologue, TaIPS1, was 341-fold higher in roots and 13-fold higher in shoots; this finding was further confirmed by qRT-PCR analysis. A comparative analysis of the wheat- and rice-responsive transcripts for orthologous genes under Pi-starvation revealed commonly upregulated transcripts, most of which appeared to be involved in a general response to Pi starvation, namely, an IPS1-mediated signalling cascade and its downstream functions such as Pi remobilisation, Pi uptake, and changes in Pi metabolism. 
Conclusions Our transcriptome profiles demonstrated the impact of Pi starvation on global gene expression in wheat. This study revealed that enhancement of the Pi-mediated signalling cascade using IPS1 is a potent adaptation mechanism to Pi starvation that is conserved in both wheat and rice and validated the effectiveness of using short-read next-generation sequencing data for wheat transcriptome analysis in the absence of reference genome information.
Objective: This paper presents the results of the first public consultation for the creation of a large-scale genetic database, the Quebec CARTaGENE project. A consultation has been undertaken in order to gauge whether the general public is receptive to the project. An integral part of the approach of the researchers is to establish a dialogue with the public. Methods: Two
Béatrice Godard; Jennifer Marshall; Claude Laberge
We report on the discovery of 29 Cepheid variables in the galaxy M101 using the original Wide Field Camera (WFC) and the new Wide Field and Planetary Camera 2 (WFPC2) on the Hubble Space Telescope. We observed a field in M101 at 17 independent epochs in V (F555W), five epochs in I (F785LP/F814W), and one epoch in B (F439W),
Daniel D. Kelson; Garth D. Illingworth; Wendy F. Freedman; John A. Graham; Robert Hill; Barry F. Madore; Abhijit Saha; Peter B. Stetson; Robert C. Kennicutt Jr.; Jeremy R. Mould; Shaun M. Hughes; Laura Ferrarese; Randy Phelps; Anne Turner; Kem H. Cook; Holland Ford; John G. Hoessel; John Huchra
We report on the discovery of 30 new Cepheids in the nearby galaxy M81 based on observations using the Hubble Space Telescope (HST). The periods of these Cepheids lie in the range of 10-55 days, based on 18 independent epochs using the HST wide-band F555W filter. The HST F555W and F785LP data have been transformed to the Cousins standard V
Wendy L. Freedman; Shaun M. Hughes; Barry F. Madore; Jeremy R. Mould; Myung Gyoon Lee; Peter Stetson; Robert C. Kennicutt; Anne Turner; Laura Ferrarese; Holland Ford; John A. Graham; Robert Hill; John G. Hoessel; John Huchra; Garth D. Illingworth
"Harry Stottlemeier's Discovery" is the student book for the project in philosophical thinking described in SO 008 123-126. It offers a model of dialogue -- both of children with one another and of children with adults. The story is set among a classroom of children who begin to understand the basics of logical reasoning when Harry, who isn't…
The Discovery Channel promotes student participation in science fairs at this appealing, vibrant website. Users can find a terrific, thorough guide to creating science fair projects, including project ideas and lists of books and external web sites for students to utilize during their research. Students can find tip sheets for projects in many science subjects including astronomy, chemistry, and earth science. Educators can discover how to organize a science fair and parents can learn how to get involved with their children's projects. This site is a great way to excite children about science and scientific investigations.
The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line) designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds), and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine.
Horton, Roger; Gibson, Richard; Coggill, Penny; Miretti, Marcos; Allcock, Richard J.; Almeida, Jeff; Forbes, Simon; Gilbert, James G. R.; Halls, Karen; Harrow, Jennifer L.; Hart, Elizabeth; Howe, Kevin; Jackson, David K.; Palmer, Sophie; Roberts, Anne N.; Sims, Sarah; Stewart, C. Andrew; Traherne, James A.; Trevanion, Steve; Wilming, Laurens; Rogers, Jane; de Jong, Pieter J.; Elliott, John F.; Sawcer, Stephen; Todd, John A.; Trowsdale, John
To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from Agave sisalana multiple tissues to increase efficiency of normalization and maximize the number of independent genes by SMART™ method and the duplex-specific nuclease (DSN). This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones of libraries revealed 3320 unigenes with an average insert length about 1.2 kb, indicating that the non-redundancy of libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during the leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing.
Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng
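The non-redundancy figure in the sisal abstract above follows directly from the clone and unigene counts it quotes. A short check, in which the function name is an illustrative assumption and the two counts are taken from the abstract:

```python
def non_redundancy(unigenes: int, clones_sequenced: int) -> float:
    """Fraction of sequenced clones that represent distinct unigenes."""
    if clones_sequenced <= 0:
        raise ValueError("clones_sequenced must be positive")
    return unigenes / clones_sequenced


# Counts quoted in the abstract: 3875 cDNA clones, 3320 unigenes.
ratio = non_redundancy(unigenes=3320, clones_sequenced=3875)
print(f"non-redundancy: {ratio:.1%}")
```

The result rounds to the 85.7% reported, which confirms that the percentage is the ratio of unigenes to sequenced clones.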
The Discovery Channel Telescope (DCT) is a 4.2-m telescope to be built at a new site near Happy Jack, Arizona. The DCT features a large prime focus mosaic CCD camera with a 2-degree-diameter field of view especially designed for surveys of KBOs, Centaurs, NEAs and other moving or time-variable targets. The telescope can be switched quickly to a Ritchey-Chretien configuration for optical/IR spectroscopy or near-IR imaging. This flexibility allows timely follow-up physical studies of high priority objects discovered in survey mode. The ULE (ultra-low-expansion) meniscus primary and secondary mirror blanks for the telescope are currently in fabrication by Corning Glass. Goodrich Aerospace, Vertex RSI, M3 Engineering and Technology Corp., and e2v Technologies have recently completed in-depth conceptual design studies of the optics, mount, enclosure, and mosaic focal plane, respectively. The results of these studies were subjected to a formal design review in July, 2004. Site testing at the 7760-ft altitude Happy Jack site began in 2001. Differential image motion observations from 117 nights since January 1, 2003 gave median seeing of 0.84 arcsec FWHM, and the average of the first quartile was 0.62 arcsec. The National Environmental Policy Act (NEPA) process for securing long-term access to this site on the Coconino National Forest is nearing completion and ground breaking is expected in the spring of 2005. The Discovery Channel Telescope is a project of the Lowell Observatory with major financial support from Discovery Communications, Inc. (DCI). DCI plans ongoing television programming featuring the construction of the telescope and the research ultimately undertaken with the DCT. An additional partner can be accommodated in the project. Interested parties should contact the lead author.
Millis, R. L.; Dunham, E. W.; Sebring, T. A.; Smith, B. W.; de Kock, M.; Wiecha, O.
The DFCI Gene Index Project creates databases for specific organisms. The goal for these databases is to provide an analysis of publicly available Expressed Sequence Tags (ESTs) and gene sequence data to identify transcripts. ESTs are fragments of genes that were, at some time, copied from DNA to RNA. The databases are in zipped files and free for download. The website also provides software and tools for use with the data, along with instructions on how to link to background resources. The Gene Indices are organized into four categories: Animals, Plants, Protist, and Fungi.
Although the list of completed genome sequencing projects has expanded rapidly, sequencing and analysis of expressed sequence tags (ESTs) remain a primary tool for discovery of novel genes in many eukaryotes and a key element in genome annotation. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi) are a collection of 77 species-specific databases that use a highly refined protocol to analyze gene
Yuandan Lee; Jennifer Tsai; S. Sunkara; Svetlana Karamycheva; Geo Pertea; Razvan Sultana; Valentin Antonescu; Agnes P. Chan; Foo Cheung; John Quackenbush
Background: An ever increasing number of techniques are being used to find genes with similar profiles from microarray studies. Visualization of gene expression profiles can aid this process, potentially contributing to the identification of co-regulated genes and gene function as well as network development. Results: We introduce the h-Profile plot to display gene expression profiles. Thumbnail versions of plots of
Yvonne E. Pittelkow; Susan R. Wilson
Dr. Griffin, NASA Administrator, is accompanied by members of the U.S. House of Representatives in this STS-114 Discovery impromptu briefing. Those present include Sherwood Boehlert, House Science Committee Chairman; Senator Hutchinson; Sheila Jackson, 18th Congressional District, Texas; Al Green, 9th Congressional District; Representative Jim Davis, Florida; and Gene Green, 29th District, Texas. Griffin talks about the problem that occurred with the external fuel tank sensor of the Space Shuttle Discovery and the effort NASA is pursuing to track the problem and identify the root cause. He answers questions from the news media about the next steps for the Space Shuttle Discovery, the time frame for the launch, and activities for the astronauts for the next few days.
Phenotype-driven approaches in mice are powerful strategies for the discovery of genes and gene functions and for unravelling complex biological mechanisms. Traditional methods for mutation discovery are reliable and robust, but they can also be laborious and time consuming. Recently, high-throughput sequencing (HTS) technologies have revolutionised the process of forward genetics in mice by paving the way to rapid mutation discovery. However, successful application of HTS for mutation discovery relies heavily on the sequencing approach employed and strategies for data analysis. Here we review current HTS applications and resources for mutation discovery and provide an overview of the practical considerations for HTS implementation and data analysis. PMID:22991087
Simon, Michelle M; Mallon, Ann-Marie; Howell, Gareth R; Reinholdt, Laura G
Motivation: Classification is a powerful tool for uncovering interesting phenomena, for example classes of cancer, in microarray data. Due to the small number of observations (n) in comparison to the number of variables (p), genes, classification on microarray data is challenging. Thus, multivariate dimension reduction techniques are commonly used as a precursor to classification of microarray data; typically this is principal component analysis (PCA) or singular value decomposition (SVD). Since PCA and SVD are concerned with explaining the variance-covariance structure of the data, they may not be the best choice when the between-cluster variance is smaller than the within-cluster variance. Recently an attractive alternative to PCA, sequential projection pursuit (SPP), has been introduced which is designed to elicit clustering tendencies in the data. Thus, in some cases SPP may be more appropriate when performing clustering or classification analysis. Results: We compare the performance of SPP to PCA on two cancer gene expression datasets related to leukemia and colon cancer. Using PCA and SPP to reduce the dimensionality of the data to m<
Webb-Robertson, Bobbie-Jo M.; Havre, Susan L.
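The caveat about PCA in the abstract above can be seen on a toy example: when within-cluster variance exceeds between-cluster variance, the first principal component follows the within-cluster spread and the projection no longer separates the groups. The sketch below is a hedged illustration on made-up two-dimensional data. It computes only the leading principal component in closed form and does not implement sequential projection pursuit; all names and data are assumptions.

```python
import math


def leading_pc_2d(points):
    """Unit vector of the first principal component of 2-D points.

    Uses the closed-form principal-axis angle of the 2x2 covariance matrix.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)


def mean_gap(cluster_a, cluster_b, direction):
    """Distance between the cluster means after projecting onto `direction`."""
    dx, dy = direction
    pa = [x * dx + y * dy for x, y in cluster_a]
    pb = [x * dx + y * dy for x, y in cluster_b]
    return abs(sum(pa) / len(pa) - sum(pb) / len(pb))


# Toy data: clusters differ only slightly in x, but both spread widely in y.
cluster_a = [(-1.0, float(y)) for y in range(-10, 11)]
cluster_b = [(+1.0, float(y)) for y in range(-10, 11)]

pc1 = leading_pc_2d(cluster_a + cluster_b)
gap_on_pc1 = mean_gap(cluster_a, cluster_b, pc1)
gap_on_x = mean_gap(cluster_a, cluster_b, (1.0, 0.0))

print("PC1 direction:", pc1)
print("cluster gap along PC1:", gap_on_pc1)
print("cluster gap along x:", gap_on_x)
```

Here the first component points along y, where the clusters overlap completely, while the low-variance x direction separates them. A projection index that rewards clustering rather than variance is aimed at this situation.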
Imaging can potentially make a major contribution to the Zebrafish Phenome Project, which will probe the functions of vertebrate genes through the generation and phenotyping of mutants. Imaging of whole animals at different developmental stages through adulthood will be used to infer biological function. Cell resolutions will be required to identify cellular mechanism and to detect a full range of organ effects. Light-based imaging of live zebrafish embryos is practical only up to ~2 days of development, owing to increasing pigmentation and diminishing tissue lucency with age. The small size of the zebrafish makes possible whole-animal imaging at cell resolutions by histology and micron-scale tomography (microCT). The histological study of larvae is facilitated by the use of arrays, and histology's standard use in the study of human disease enhances its translational value. Synchrotron microCT with X-rays of moderate energy (10-25 keV) is unimpeded by pigmentation or the tissue thicknesses encountered in zebrafish of larval stages and beyond, and is well-suited to detecting phenotypes that may require 3D modeling. The throughput required for this project will require robotic sample preparation and loading, increases in the dimensions and sensitivity of scintillator and CCD chips, increases in computer power, and the development of new approaches to image processing, segmentation, and quantification. PMID:21963132
Cheng, Keith C; Xin, Xuying; Clark, Darin P; La Riviere, Patrick
This online exhibit from the American Institute of Physics contains the history of the discovery of the first optical pulsar in the Crab Nebula. It includes both a transcription of the conversation at the moment of the discovery and commentary by the astronomers who made the discovery and by Philip Morrison. MP3 files of the conversations, commentary, and ideas for use by teachers are included.
Over the last decade, the introduction of microarray technology has had a profound impact on gene expression research. The publication of studies with dissimilar or altogether contradictory results, obtained using different microarray platforms to analyze identical RNA samples, has raised concerns about the reliability of this technology. The MicroArray Quality Control (MAQC) project was initiated to address these concerns, as
Leming Shi; Laura H Reid; Wendell D Jones; Richard Shippy; Janet A Warrington; Shawn C Baker; Patrick J Collins; Francoise de Longueville; Ernest S Kawasaki; Kathleen Y Lee; Yuling Luo; Yongming Andrew Sun; James C Willey; Robert A Setterquist; Gavin M Fischer; Weida Tong; Yvonne P Dragan; David J Dix; Felix W Frueh; Federico M Goodsaid; Damir Herman; Roderick V Jensen; Charles D Johnson; Edward K Lobenhofer; Raj K Puri; Uwe Scherf; Jean Thierry-Mieg; Charles Wang; Mike Wilson; Paul K Wolber; Lu Zhang; Shashi Amur; Wenjun Bao; Catalin C Barbacioru; Anne Bergstrom Lucas; Vincent Bertholet; Cecilie Boysen; Bud Bromley; Donna Brown; Alan Brunner; Roger Canales; Xiaoxi Megan Cao; Thomas A Cebula; James J Chen; Jing Cheng; Tzu-Ming Chu; Eugene Chudin; John Corson; J Christopher Corton; Lisa J Croner; Christopher Davies; Timothy S Davison; Glenda Delenstarr; Xutao Deng; David Dorris; Aron C Eklund; Xiao-hui Fan; Hong Fang; Stephanie Fulmer-Smentek; James C Fuscoe; Kathryn Gallagher; Weigong Ge; Lei Guo; Xu Guo; Janet Hager; Paul K Haje; Jing Han; Tao Han; Heather C Harbottle; Stephen C Harris; Eli Hatchwell; Craig A Hauser; Susan Hester; Huixiao Hong; Patrick Hurban; Scott A Jackson; Hanlee Ji; Charles R Knight; Winston P Kuo; J Eugene LeClerc; Shawn Levy; Quan-Zhen Li; Chunmei Liu; Michael J Lombardi; Yunqing Ma; Scott R Magnuson; Botoul Maqsodi; Tim McDaniel; Nan Mei; Ola Myklebost; Baitang Ning; Natalia Novoradovskaya; Michael S Orr; Terry W Osborn; Adam Papallo; Tucker A Patterson; Roger G Perkins; Elizabeth H Peters; Ron Peterson; Kenneth L Philips; P Scott Pine; Lajos Pusztai; Feng Qian; Hongzu Ren; Mitch Rosen; Barry A Rosenzweig; Raymond R Samaha; Mark Schena; Gary P Schroth; Svetlana Shchegrova; Dave D Smith; Frank Staedtler; Zhenqiang Su; Hongmei Sun; Zoltan Szallasi; Zivana Tezak; Danielle Thierry-Mieg; Karol L Thompson; Irina Tikhonova; Yaron Turpaz; Beena Vallanat; Christophe Van; Stephen J Walker; Sue Jane Wang; Yonghong Wang; Russ Wolfinger; Alex Wong; Jie 
Wu; Chunlin Xiao; Qian Xie; Jun Xu; Wen Yang; Liang Zhang; Sheng Zhong; Yaping Zong; William Slikker; Ying Liu
Background Molecular breeding of pepper (Capsicum spp.) can be accelerated by developing DNA markers associated with transcriptomes in breeding germplasm. Before the advent of next generation sequencing (NGS) technologies, the majority of sequencing data were generated by the Sanger sequencing method. By leveraging Sanger EST data, we have generated a wealth of genetic information for pepper including thousands of SNPs and Single Position Polymorphic (SPP) markers. To complement and enhance these resources, we applied NGS to three pepper genotypes: Maor, Early Jalapeño and Criollo de Morelos-334 (CM334) to identify SNPs and SSRs in the assembly of these three genotypes. Results Two pepper transcriptome assemblies were developed with different purposes. The first reference sequence, assembled by CAP3 software, comprises 31,196 contigs from >125,000 Sanger-EST sequences that were mainly derived from a Korean F1-hybrid line, Bukang. Overlapping probes were designed for 30,815 unigenes to construct a pepper Affymetrix GeneChip® microarray for whole genome analyses. In addition, custom Python scripts were used to identify 4,236 SNPs in contigs of the assembly. A total of 2,489 simple sequence repeats (SSRs) were identified from the assembly, and primers were designed for the SSRs. Annotation of contigs using Blast2GO software resulted in information for 60% of the unigenes in the assembly. The second transcriptome assembly was constructed from more than 200 million Illumina Genome Analyzer II reads (80–120 nt) using a combination of Velvet, CLC workbench and CAP3 software packages. BWA, SAMtools and in-house Perl scripts were used to identify SNPs among three pepper genotypes. The SNPs were filtered to be at least 50 bp from any intron-exon junctions as well as flanking SNPs. More than 22,000 high-quality putative SNPs were identified. 
Using the MISA software, 10,398 SSR markers were also identified within the Illumina transcriptome assembly and primers were designed for the identified markers. The assembly was annotated by Blast2GO and 14,740 (12%) of annotated contigs were associated with functional proteins. Conclusions Before a pepper genome sequence became available, assembling transcriptomes of this economically important crop was required to generate thousands of high-quality molecular markers that could be used in breeding programs. To better understand the assembled sequences and to identify candidate genes underlying QTLs, we annotated the contigs of the Sanger-EST and Illumina transcriptome assemblies. This and other information has been curated in a database dedicated to the pepper project.
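The SNP distance filter mentioned in the abstract above (SNPs at least 50 bp from intron-exon junctions and from flanking SNPs) can be sketched as follows. This is an illustration only: the authors used in-house Perl scripts that are not shown in the source, and the function name `filter_snps`, the single-contig coordinate model, and the example positions are all assumptions.

```python
def filter_snps(snp_positions, junction_positions, min_dist=50):
    """Keep SNPs that are at least min_dist bp from every junction and every other SNP.

    Positions are assumed to be unique integer coordinates on one contig.
    """
    kept = []
    for pos in snp_positions:
        near_junction = any(abs(pos - j) < min_dist for j in junction_positions)
        near_other_snp = any(other != pos and abs(pos - other) < min_dist
                             for other in snp_positions)
        if not (near_junction or near_other_snp):
            kept.append(pos)
    return kept

# Made-up coordinates: 100 and 120 flank each other, 500 is close to a junction at 480.
print(filter_snps([100, 120, 300, 500], [480]))  # prints [300]
```

The quadratic scan keeps the sketch short. For transcriptome-scale data, sorting the positions and comparing only neighbours would be the more usual choice.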
This study describes and validates a new method for metagenomic biomarker discovery by way of class comparison, tests of biological consistency and effect size estimation. This addresses the challenge of finding organisms, genes, or pathways that consistently explain the differences between two or more microbial communities, which is a central problem to the study of metagenomics. We extensively validate our method on several microbiomes and a convenient online interface for the method is provided at http://huttenhower.sph.harvard.edu/lefse/.
The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5′-EST sequences, and then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is now reaching a saturation point, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline.
Retroviral insertional mutagenesis in mouse hematopoietic tumors provides a potent cancer gene discovery tool in the post-genome-sequence era. To manage multiple high-throughput insertional mutagenesis screening projects, we developed the Retroviral Tagged Cancer Gene Database (RTCGD; http://RTCGD.ncifcrf.gov). A sequence analysis pipeline determines the genomic position of each retroviral integration site cloned from a mouse tumor, the distance between it and the
Keiko Akagi; Takeshi Suzuki; Robert M. Stephens; Nancy A. Jenkins; Neal G. Copeland
The Generate, Prune & Prove (GPP) methodology for discovering definitions of mathematical operators is introduced. GPP is a task within the IL exploration discovery system. We developed GPP for use in the discovery of mathematical operators with a wider c...
M. H. Sims; J. L. Bresina
Altered glucose metabolism has been described in Alzheimer's disease (AD). We re-investigated the interaction of the insulin (INS) and the peroxisome proliferator-activated receptor alpha (PPARA) genes in AD risk in the Epistasis Project, including 1,757 AD cases and 6,294 controls. Allele frequencies of both SNPs (PPARA L162V, INS intron 0 A/T) differed between Northern Europeans and Northern Spanish. The PPARA 162LL genotype increased AD risk in Northern Europeans (p = 0.04), but not in Northern Spanish (p = 0.2). There was no association of the INS intron 0 TT genotype with AD. We observed an interaction on AD risk between PPARA 162LL and INS intron 0 TT genotypes in Northern Europeans (Synergy factor 2.5, p = 0.016), but not in Northern Spanish. We suggest that dysregulation of glucose metabolism contributes to the development of AD and might be due in part to genetic variations in INS and PPARA and their interaction especially in Northern Europeans. PMID:22065208
Kölsch, Heike; Lehmann, Donald J; Ibrahim-Verbaas, Carla A; Combarros, Onofre; van Duijn, Cornelia M; Hammond, Naomi; Belbin, Olivia; Cortina-Borja, Mario; Lehmann, Michael G; Aulchenko, Yurii S; Schuur, Maaike; Breteler, Monique; Wilcock, Gordon K; Brown, Kristelle; Kehoe, Patrick G; Barber, Rachel; Coto, Eliecer; Alvarez, Victoria; Deloukas, Panos; Mateo, Ignacio; Maier, Wolfgang; Morgan, Kevin; Warden, Donald R; Smith, A David; Heun, Reinhard
Iron overload may contribute to the risk of Alzheimer's disease (AD). In the Epistasis Project, with 1757 cases of AD and 6295 controls, we studied 4 variants in 2 genes of iron metabolism: hemochromatosis (HFE) C282Y and H63D, and transferrin (TF) C2 and -2G/A. We replicated the reported interaction between HFE 282Y and TF C2 in the risk of AD: synergy factor, 1.75 (95% confidence interval, 1.1-2.8, p = 0.02) in Northern Europeans. The synergy factor was 3.1 (1.4-6.9; 0.007) in subjects with the APOE ε4 allele. We found another interaction, between HFE 63HH and TF -2AA, markedly modified by age. Both interactions were found mainly or only in Northern Europeans. The interaction between HFE 282Y and TF C2 has now been replicated twice, in altogether 2313 cases of AD and 7065 controls, and has also been associated with increased iron load. We therefore suggest that iron overload may be a causative factor in the development of AD. Treatment for iron overload might thus be protective in some cases. PMID:20817350
Lehmann, Donald J; Schuur, Maaike; Warden, Donald R; Hammond, Naomi; Belbin, Olivia; Kölsch, Heike; Lehmann, Michael G; Wilcock, Gordon K; Brown, Kristelle; Kehoe, Patrick G; Morris, Chris M; Barker, Rachel; Coto, Eliecer; Alvarez, Victoria; Deloukas, Panos; Mateo, Ignacio; Gwilliam, Rhian; Combarros, Onofre; Arias-Vásquez, Alejandro; Aulchenko, Yurii S; Ikram, M Arfan; Breteler, Monique M; van Duijn, Cornelia M; Oulhaj, Abderrahim; Heun, Reinhard; Cortina-Borja, Mario; Morgan, Kevin; Robson, Kathryn; Smith, A David
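The synergy factors reported in the two Epistasis Project abstracts above can be illustrated with a short sketch. It assumes the usual definition for two binary risk factors, where the odds ratio for carrying both factors is divided by the product of the odds ratios for carrying each factor alone, all measured against subjects carrying neither. The function names and the counts are hypothetical; they are not data from either study, and the confidence interval is not computed here.

```python
def odds_ratio(cases_exposed, controls_exposed, cases_ref, controls_ref):
    """Odds ratio of an exposed group against the reference group."""
    return (cases_exposed * controls_ref) / (controls_exposed * cases_ref)

def synergy_factor(counts):
    """Synergy factor for two binary factors.

    counts maps (has_factor_1, has_factor_2) -> (cases, controls).
    """
    ref_cases, ref_controls = counts[(False, False)]
    or_1 = odds_ratio(*counts[(True, False)], ref_cases, ref_controls)
    or_2 = odds_ratio(*counts[(False, True)], ref_cases, ref_controls)
    or_12 = odds_ratio(*counts[(True, True)], ref_cases, ref_controls)
    # A value above 1 suggests the combined effect exceeds the product of the single effects.
    return or_12 / (or_1 * or_2)

# Made-up counts for illustration only.
example = {
    (False, False): (100, 100),
    (True, False): (20, 10),
    (False, True): (30, 10),
    (True, True): (60, 5),
}
print(synergy_factor(example))  # prints 2.0
```

Here the single-factor odds ratios are 2 and 3 and the joint odds ratio is 12, so the synergy factor is 12 / (2 x 3) = 2.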
Rheumatoid arthritis (RA) is a heterogeneous disease. We used cDNA microarray technology to subclassify RA patients and disclose disease pathways in rheumatoid synovium. Hierarchical clustering of gene expression data identified two main groups of tissues (RA-I and RA-II). A total of 121 genes were expressed at significantly higher levels in the RA-I tissues, whereas 39 genes were overexpressed in the RA-II tissues.
T C T M van der Pouw Kraan; F A van Gaalen; T W J Huizinga; E Pieterman; F. C. Breedveld; C. L. Verweij; CTM van der Pouw Kraan
The PTF (ATEL #1964, #3253; http://www.astro.caltech.edu/ptf; Law et al. 2009, PASP, 121, 1395; Rau et al. 2009, PASP, 121, 1334) reports the discovery of 9 new supernovae. PTF discoveries are made by autonomous PTF software (Bloom et al. 2011, http://adsabs.harvard.edu/abs/2011arXiv1106.5491B ), as well as by the Galaxy Zoo Supernova Project (Smith et al. 2011, MNRAS, 412, 1309; http://supernova.galaxyzoo.org ).
Gal-Yam, A.; Nugent, P.; Cao, Y.; Levitan, D.; Hallinan, G.; Kyne, G.; Silverman, J.; Clubb, K.; Miller, A.; Fox, O.; Suzuki, N.; Quimby, R.
The PTF (ATEL #1964, #3253; http://www.astro.caltech.edu/ptf; Law et al. 2009, PASP, 121, 1395; Rau et al. 2009, PASP, 121, 1334) reports the discovery of 12 new supernovae. PTF discoveries are made by autonomous PTF software (Bloom et al. 2011, http://adsabs.harvard.edu/abs/2011arXiv1106.5491B ), as well as by the Galaxy Zoo Supernova Project (Smith et al. 2011, MNRAS, 412, 1309; http://supernova.galaxyzoo.org ).
Arcavi, Iair; Gal-Yam, A.; Ben-Ami, S.; Yaron, O.; Horesh, A.; Nugent, P.; Cao, Y.; Bellm, E.; Fynbo, J.; Wiis, J.; Olesen, J.; Engedal, L.; Larsen, A.; Kasliwal, M.; Pan, Y.-C.; Graham, M.; Parrent, J.; Quimby, R.; PTF Team
As of today, no comprehensive study has been made covering the initial observations and identifications of isotopes. A project has been undertaken at MSU to document the discovery of all the known isotopes. The criterion defining discovery of a given isotope is the publication of clear mass and element assignment in a refereed journal. Prior to the current work the documentation of the discovery of the isotopes of eleven elements had been completed^1. These elements are cerium^2, arsenic, gold, tungsten, krypton, silver, vanadium, einsteinium, iron, barium, and cobalt. We will present the new documentation for the cadmium, indium, and tin isotopes. Thirty-seven cadmium isotopes, thirty-eight indium isotopes, and thirty-eight tin isotopes have been discovered so far. The description for each discovered isotope includes the year of discovery, the article published on the discovery, the article's author, the method of production, the method of identification, and any previous information concerning the isotope discovery. A summary and overview of all ~500 isotopes documented so far as a function of discovery year, method and place will also be presented. ^1http://www.nscl.msu.edu/~thoennes/2009/discovery.htm ^2J.Q. Ginepro, J. Snyder, and M. Thoennessen, At. Data Nucl. Data. Tables, in press (2009), doi:10.1016/j.adt.2009.06.002
Amos, Stephanie; Thoennessen, Michael
Current research in drug discovery from medicinal plants involves a multifaceted approach combining botanical, phytochemical, biological, and molecular techniques. Medicinal plant drug discovery continues to provide new and important leads against various pharmacological targets including cancer, HIV/AIDS, Alzheimer's, malaria, and pain. Several natural product drugs of plant origin have either recently been introduced to the United States market, including arteether, galantamine, nitisinone, and tiotropium, or are currently involved in late-phase clinical trials. As part of our National Cooperative Drug Discovery Group (NCDDG) research project, numerous compounds from tropical rainforest plant species with potential anticancer activity have been identified. Our group has also isolated several compounds, mainly from edible plant species or plants used as dietary supplements, that may act as chemopreventive agents. Although drug discovery from medicinal plants continues to provide an important source of new drug leads, numerous challenges are encountered including the procurement of plant materials, the selection and implementation of appropriate high-throughput screening bioassays, and the scale-up of active compounds. PMID:16198377
Balunas, Marcy J; Kinghorn, A Douglas
Background The Hedgehog (Hh) signaling pathway plays important roles in human and animal development as well as in carcinogenesis. Hh molecules have been found in both protostomes and deuterostomes, but curiously the nematode Caenorhabditis elegans lacks a bona-fide Hh. Instead a series of Hh-related proteins are found, which share the Hint/Hog domain with Hh, but have distinct N-termini. Results We performed extensive searches of genomes, such as those of the cnidarian Nematostella vectensis and several nematodes, to gain further insights into Hh evolution. We found six genes in N. vectensis with a relationship to Hh: two Hh genes, one gene with a Hh N-terminal domain fused to a von Willebrand factor type A domain (VWA), and three genes containing Hint/Hog domains with distinct novel N-termini. In the nematode Brugia malayi we find the same types of hh-related genes as in C. elegans. In the more distantly related Enoplea nematodes Xiphinema and Trichinella spiralis we find a bona-fide Hh. In addition, T. spiralis also has a quahog gene like C. elegans, and there are several additional hh-related genes, some of which have secreted N-terminal domains of only 15 to 25 residues. Examination of other Hh pathway components revealed that T. spiralis - like C. elegans - lacks some of these components. Extending our search to all eukaryotes, we recovered genes containing a Hog domain similar to Hh from many different groups of protists. In addition, we identified a novel Hint gene family present in many eukaryote groups that encodes a VWA domain fused to a distinct Hint domain we call Vint. Further members of a poorly characterized Hint family were also retrieved from bacteria. Conclusion In Cnidaria and nematodes the evolution of hh genes occurred in parallel to the evolution of other genes that contain a Hog domain but have different N-termini. 
The fact that Hog genes comprising a secreted N-terminus and a Hog domain are found in many protists indicates that this gene family must have arisen in very early eukaryotic evolution, and gave rise eventually to hh and hh-related genes in animals. The results indicate a hitherto unsuspected ability of Hog domain encoding genes to evolve new N-termini. In one instance in Cnidaria, the Hh N-terminal signaling domain is associated with a VWA domain and lacks a Hog domain, suggesting a modular mode of evolution also for the N-terminal domain. The Hog domain proteins, the inteins and VWA-Vint proteins are three families of Hint domain proteins that evolved in parallel in eukaryotes.
Burglin, Thomas R
The results of the first stage of the "Cosmological Gene" project of the Russian Academy of Sciences are reported. These results consist in the accumulation of multi-frequency data in 31 frequency channels in the wavelength interval 1-55 cm with maximum achievable statistical sensitivity limited by the noise of background radio sources at all wavelengths exceeding 1.38 cm. The survey region is determined by constraints 00 h < RA < 24 h and 40°30' < DEC < 42°30'. The scientific goals of the project are refined in view of recent proposals to use cosmological background radiation data for the development of a unified physical theory. Experimental data obtained with the RATAN-600 radio telescope are used to refine the contribution of the main "screens" located between the observer and the formation epoch of cosmic background radiation ( z = 1100). Experimental data for synchrotron radiation and free-free noise on scales that are of interest for the anisotropy of cosmic microwave background are reported as well as the contribution of these noise components in millimeter-wave experiments to be performed in the nearest years. The role of dipole radio emission of fullerene-type dust nanostructures is shown to be small. The most precise estimates of the role of background radio sources with inverted spectra are given and these sources are shown to create no serious interference in experiments. The average spectral indices of the weakest sources of the NVSS and FIRST catalogs are estimated. The "saturation" data for all wavelengths allowed a constraint to be imposed on the Sunyaev-Zeldovich noise (the SZ noise) at all wavelengths, and made it possible to obtain independent estimates of the average sky temperature from sources, substantially weaker than those listed in the NVSS catalog. These estimates are inconsistent with the existence of powerful extragalactic synchrotron background associated with radio sources. 
Appreciable "quadrupole" anisotropy is detected in the distribution of the spectral index of the synchrotron radiation of the Galaxy, and this anisotropy should be taken into account when estimating the polarization of the cosmic microwave background on small l. All the results are compared to the results obtained by foreign researchers in recent years.
Parijskij, Yu. N.; Mingaliev, M. G.; Nizhel'Skii, N. A.; Bursov, N. N.; Berlin, A. B.; Grechkin, A. A.; Zharov, V. I.; Zhekanis, G. V.; Majorova, E. K.; Semenova, T. A.; Stolyarov, V. A.; Tsybulev, P. G.; Kratov, D. V.; Udovitskii, R. Yu.; Khaikin, V. B.
Gene finding is complicated in organisms that exhibit insertional RNA editing. Here, we demonstrate how our new algorithm Predictor of Insertional Editing (PIE) can be used to locate genes whose mRNAs are subjected to multiple frameshifting events, and extend the algorithm to include probabilistic predictions for sites of nucleotide insertion; this feature is particularly useful when designing primers for
Jonatha M. Gott; Neeta Parimi; Ralf Bundschuh
During early embryogenesis the zygotic genome is transcriptionally silent and all mRNAs present are of maternal origin. The maternal-zygotic transition marks the time over which embryogenesis changes its dependence from maternal RNAs to zygotically transcribed RNAs. Here we present the first systematic investigation of early zygotic genes (EZGs) in a mosquito species and focus on genes involved in the onset
James K. Biedler; Wanqi Hu; Hongseok Tae; Zhijian Tu
Over the last decade, the introduction of microarray technology has had a profound impact on gene expression research. The publication of studies with dissimilar or altogether contradictory results, obtained using different microarray platforms to analyze identical RNA samples, has raised concerns about the reliability of this technology. The MicroArray Quality Control (MAQC) project was initiated to address these concerns, as well as other performance and data analysis issues. Expression data on four titration pools from two distinct reference RNA samples were generated at multiple test sites using a variety of microarray-based and alternative technology platforms. Here we describe the experimental design and probe mapping efforts behind the MAQC project. We show intraplatform consistency across test sites as well as a high level of interplatform concordance in terms of genes identified as differentially expressed. This study provides a resource that represents an important first step toward establishing a framework for the use of microarrays in clinical and regulatory settings.
Bacterial leaf streak of rice, caused by Xanthomonas oryzae pv. oryzicola (Xoc) is an increasingly important yield constraint in this staple crop. A mesophyll colonizer, Xoc differs from X. oryzae pv. oryzae (Xoo), which invades xylem to cause bacterial blight of rice. Both produce multiple distinct TAL effectors, type III-delivered proteins that transactivate effector-specific host genes. A TAL effector finds its target(s) via a partially degenerate code whereby the modular effector amino acid sequence identifies nucleotide sequences to which the protein binds. Virulence contributions of some Xoo TAL effectors have been shown, and their relevant targets, susceptibility (S) genes, identified, but the role of TAL effectors in leaf streak is uncharacterized. We used host transcript profiling to compare leaf streak to blight and to probe functions of Xoc TAL effectors. We found that Xoc and Xoo induce almost completely different host transcriptional changes. Roughly one in three genes upregulated by the pathogens is preceded by a candidate TAL effector binding element. Experimental analysis of the 44 such genes predicted to be Xoc TAL effector targets verified nearly half, and identified most others as false predictions. None of the Xoc targets is a known bacterial blight S gene. Mutational analysis revealed that Tal2g, which activates two genes, contributes to lesion expansion and bacterial exudation. Use of designer TAL effectors discriminated a sulfate transporter gene as the S gene. Across all targets, basal expression tended to be higher than genome-average, and induction moderate. Finally, machine learning applied to real vs. falsely predicted targets yielded a classifier that recalled 92% of the real targets with 88% precision, providing a tool for better target prediction in the future. Our study expands the number of known TAL effector targets, identifies a new class of S gene, and improves our ability to predict functional targeting. PMID:24586171
Cernadas, Raul A; Doyle, Erin L; Niño-Liu, David O; Wilkins, Katherine E; Bancroft, Timothy; Wang, Li; Schmidt, Clarice L; Caldo, Rico; Yang, Bing; White, Frank F; Nettleton, Dan; Wise, Roger P; Bogdanove, Adam J
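The classifier performance quoted in the abstract above (92% recall with 88% precision) can be unpacked with a small sketch of how the two measures are computed from prediction counts. The counts below are invented so that the arithmetic lands near the reported percentages; they are not the study's confusion matrix, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Precision: share of predicted targets that are real.
    Recall: share of real targets that were predicted."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

# Invented counts: 25 real targets, 23 of them recovered, plus 3 false predictions.
precision, recall = precision_recall(true_pos=23, false_pos=3, false_neg=2)
print(f"precision={precision:.2f} recall={recall:.2f}")  # prints precision=0.88 recall=0.92
```

Read this way, high recall means few real TAL effector targets are missed, while the precision figure indicates how many predicted targets would be expected to fail experimental verification.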
Mycosporines and mycosporine-like amino acids (MAAs), including shinorine (mycosporine-glycine-serine) and porphyra-334 (mycosporine-glycine-threonine), are UV-absorbing compounds produced by cyanobacteria, fungi, and marine micro- and macroalgae. These MAAs have the ability to protect these organisms from damage by environmental UV radiation. Although no reports have described the production of MAAs and the corresponding genes involved in MAA biosynthesis from Gram-positive bacteria to date, genome mining of the Gram-positive bacterial database revealed that two microorganisms belonging to the order Actinomycetales, Actinosynnema mirum DSM 43827 and Pseudonocardia sp. strain P1, possess a gene cluster homologous to the biosynthetic gene clusters identified from cyanobacteria. When the two strains were grown in liquid culture, Pseudonocardia sp. accumulated a very small amount of MAA-like compound in a medium-dependent manner, whereas A. mirum did not produce MAAs under any culture conditions, indicating that the biosynthetic gene cluster of A. mirum was in a cryptic state in this microorganism. In order to characterize these biosynthetic gene clusters, each biosynthetic gene cluster was heterologously expressed in an engineered host, Streptomyces avermitilis SUKA22. Since the resultant transformants carrying the entire biosynthetic gene cluster controlled by an alternative promoter produced mainly shinorine, this is the first confirmation of a biosynthetic gene cluster for MAA from Gram-positive bacteria. Furthermore, S. avermitilis SUKA22 transformants carrying the biosynthetic gene cluster for MAA of A. mirum accumulated not only shinorine and porphyra-334 but also a novel MAA. Structure elucidation revealed that the novel MAA is mycosporine-glycine-alanine, which substitutes l-alanine for the l-serine of shinorine. PMID:24907338
Miyamoto, Kiyoko T; Komatsu, Mamoru; Ikeda, Haruo
The authors have used an online community approach, and tools that were readily available via the Internet, to discover genealogically and therefore phylogenetically relevant Y-chromosome polymorphisms within core haplogroup R1b1a2-L11/S127 (rs9786076). Presented here is the analysis of 135 unrelated L11 derived samples from the 1000 Genomes Project. We were able to discover new variants and build a much more complex phylogenetic relationship for L11 sub-clades. Many of the variants were further validated using PCR amplification and Sanger sequencing. The identification of these new variants will help further the understanding of population history including patrilineal migrations in Western and Central Europe where R1b1a2 is the most frequent haplogroup. The fine-grained phylogenetic tree we present here will also help to refine historical genetic dating studies. Our findings demonstrate the power of citizen science for analysis of whole genome sequence data. PMID:22911832
Rocca, Richard A; Magoon, Gregory; Reynolds, David F; Krahn, Thomas; Tilroe, Vincent O; Op den Velde Boots, Peter M; Grierson, Andrew J
Background Interpretation of ever-increasing raw sequence information generated by modern genome sequencing technologies faces multiple challenges, such as gene function analysis and genome annotation. Indeed, nearly 40% of genes in plants encode proteins of unknown function. Functional characterization of these genes is one of the main challenges in modern biology. In this regard, the availability of full-length cDNA clones may fill in the gap created between sequence information and biological knowledge. Full-length cDNA clones facilitate functional analysis of the corresponding genes enabling manipulation of their expression in heterologous systems and the generation of a variety of tagged versions of the native protein. In addition, the development of full-length cDNA sequences has the power to improve the quality of genome annotation. Results We developed an integrated method to generate a new normalized EST collection enriched in full-length and rare transcripts of different citrus species from multiple tissues and developmental stages. We constructed a total of 15 cDNA libraries, from which we isolated 10,898 high-quality ESTs representing 6142 different genes. Percentages of redundancy and proportion of full-length clones range from 8 to 33, and 67 to 85, respectively, indicating good efficiency of the approach employed. The new EST collection adds 2113 new citrus ESTs, representing 1831 unigenes, to the collection of citrus genes available in the public databases. To facilitate functional analysis, cDNAs were introduced in a Gateway-based cloning vector for high-throughput functional analysis of genes in planta. Herein, we describe the technical methods used in the library construction, sequence analysis of clones and the overexpression of CitrSEP, a citrus homolog to the Arabidopsis SEP3 gene, in Arabidopsis as an example of a practical application of the engineered Gateway vector for functional analysis. 
Conclusion The new EST collection denotes an important step towards the identification of all genes in the citrus genome. Furthermore, public availability of the cDNA clones generated in this study, and not only their sequence, enables testing of the biological function of the genes represented in the collection. Expression of the citrus SEP3 homologue, CitrSEP, in Arabidopsis results in early flowering, along with other phenotypes resembling the over-expression of the Arabidopsis SEPALLATA genes. Our findings suggest that the members of the SEP gene family play similar roles in these quite distant plant species.
Marques, M Carmen; Alonso-Cantabrana, Hugo; Forment, Javier; Arribas, Raquel; Alamar, Santiago; Conejero, Vicente; Perez-Amador, Miguel A
As the prevalence of Alzheimer's disease (AD) grows, so do the costs it imposes on society. Scientific, clinical, and financial interests have focused current drug discovery efforts largely on the single biological pathway that leads to amyloid deposition. This effort has resulted in slow progress and disappointing outcomes. Here, we describe a "portfolio approach" in which multiple distinct drug development projects are undertaken simultaneously. Although a greater upfront investment is required, the probability of at least one success should be higher with "multiple shots on goal," increasing the efficiency of this undertaking. However, our portfolio simulations show that the risk-adjusted return on investment of parallel discovery is insufficient to attract private-sector funding. Nevertheless, the future cost savings of an effective AD therapy to Medicare and Medicaid far exceed this investment, suggesting that government funding is both essential and financially beneficial. PMID:24944190
Lo, Andrew W; Ho, Carole; Cummings, Jayna; Kosik, Kenneth S
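The "multiple shots on goal" argument in the abstract above can be illustrated with a short calculation. This is only a sketch: it assumes the projects succeed independently and share one success probability, neither of which the abstract states, and the function name and inputs are hypothetical.

```python
def prob_at_least_one_success(p: float, n: int) -> float:
    """Chance that at least one of n projects succeeds.

    Assumption (not from the abstract): projects are independent and
    each succeeds with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Complement rule: 1 minus the chance that all n projects fail.
    return 1.0 - (1.0 - p) ** n
```

Under these assumptions the probability rises with every added project, but each extra project adds less than the one before it, which is one reason the upfront cost of a large portfolio matters.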
An accurate and precisely annotated genome assembly is a fundamental requirement for functional genomic analysis. Here, the complete DNA sequence and gene annotation of mouse Chromosome 11 was used to test the efficacy of large-scale sequencing for mutation identification. We re-sequenced the 14,000 annotated exons and boundaries from over 900 genes in 41 recessive mutant mouse lines that were isolated
Melissa K. Boles; Bonney M. Wilkinson; Laurens G. Wilming; Bin Liu; Frank J. Probst; Jennifer Harrow; Darren Grafham; Kathryn E. Hentges; Lanette P. Woodward; Andrea Maxwell; Karen Mitchell; Michael D. Risley; Randy Johnson; Karen Hirschi; James R. Lupski; Yosuke Funato; Hiroaki Miki; Pablo Marin-Garcia; Lucy Matthews; Alison J. Coffey; Anne Parker; Tim J. Hubbard; Jane Rogers; Allan Bradley; David J. Adams; Monica J. Justice
Paralogous sequences of the RPB2 gene are demonstrated in the angiosperm order Gentianales. Two different copies were found by using different PCR primer pairs targeting a region that corresponds to exons 22-24 in the Arabidopsis RPB2 gene. One of the copies (RPB2-d) lacks introns in this region, whereas the other has introns at locations corresponding to those of green
Bengt Oxelman; Birgitta Bremer
NAD+ is essential for life in all organisms, both as a coenzyme for oxidoreductases and as a source of ADPribosyl groups used in various reactions, including those that retard aging in experimental systems. Nicotinic acid and nicotinamide were defined as the vitamin precursors of NAD+ in Elvehjem's classic discoveries of the 1930s. The accepted view of eukaryotic NAD+ biosynthesis, that all anabolism flows through nicotinic acid mononucleotide, was challenged experimentally and revealed that nicotinamide riboside is an unanticipated NAD+ precursor in yeast. Nicotinamide riboside kinases from yeast and humans essential for this pathway were identified and found to be highly specific for phosphorylation of nicotinamide riboside and the cancer drug tiazofurin. Nicotinamide riboside was discovered as a nutrient in milk, suggesting that nicotinamide riboside is a useful compound for elevation of NAD+ levels in humans. PMID:15137942
Bieganowski, Pawel; Brenner, Charles
Background Kiwifruit (Actinidia spp.) are a relatively new, but economically important crop grown in many different parts of the world. Commercial success is driven by the development of new cultivars with novel consumer traits including flavor, appearance, healthful components and convenience. To increase our understanding of the genetic diversity and gene-based control of these key traits in Actinidia, we have produced a collection of 132,577 expressed sequence tags (ESTs). Results The ESTs were derived mainly from four Actinidia species (A. chinensis, A. deliciosa, A. arguta and A. eriantha) and fell into 41,858 non-redundant clusters (18,070 tentative consensus sequences and 23,788 EST singletons). Analysis of flavor and fragrance-related gene families (acyltransferases and carboxylesterases) and pathways (terpenoid biosynthesis) is presented in comparison with a chemical analysis of the compounds present in Actinidia including esters, acids, alcohols and terpenes. ESTs are identified for most genes in color pathways controlling chlorophyll degradation and carotenoid biosynthesis. In the health area, data are presented on the ESTs involved in ascorbic acid and quinic acid biosynthesis showing not only that genes for many of the steps in these pathways are represented in the database, but that genes encoding some critical steps are absent. In the convenience area, genes related to different stages of fruit softening are identified. Conclusion This large EST resource will allow researchers to undertake the tremendous challenge of understanding the molecular basis of genetic diversity in the Actinidia genus as well as provide an EST resource for comparative fruit genomics. The various bioinformatics analyses we have undertaken demonstrate the extent of coverage of ESTs for genes encoding different biochemical pathways in Actinidia.
Crowhurst, Ross N; Gleave, Andrew P; MacRae, Elspeth A; Ampomah-Dwamena, Charles; Atkinson, Ross G; Beuning, Lesley L; Bulley, Sean M; Chagne, David; Marsh, Ken B; Matich, Adam J; Montefiori, Mirco; Newcomb, Richard D; Schaffer, Robert J; Usadel, Bjorn; Allan, Andrew C; Boldingh, Helen L; Bowen, Judith H; Davy, Marcus W; Eckloff, Rheinhart; Ferguson, A Ross; Fraser, Lena G; Gera, Emma; Hellens, Roger P; Janssen, Bart J; Klages, Karin; Lo, Kim R; MacDiarmid, Robin M; Nain, Bhawana; McNeilage, Mark A; Rassam, Maysoon; Richardson, Annette C; Rikkerink, Erik HA; Ross, Gavin S; Schroder, Roswitha; Snowden, Kimberley C; Souleyre, Edwige JF; Templeton, Matt D; Walton, Eric F; Wang, Daisy; Wang, Mindy Y; Wang, Yanming Y; Wood, Marion; Wu, Rongmei; Yauk, Yar-Khing; Laing, William A
SUMMARY Azaphilones are a class of fungal metabolites characterized by a highly oxygenated pyrano-quinone bicyclic core and exhibit a broad range of bioactivities. While widespread among various fungi, their biosynthesis has not been thoroughly elucidated. By activation of a silent (aza) gene cluster in Aspergillus niger ATCC 1015, we have discovered six new azaphilone compounds, azanigerones A-F (1, 3-7). Transcriptional analysis and deletion of a key polyketide synthase (PKS) gene further confirmed the involvement of the aza gene cluster. The biosynthetic pathway was shown to involve the convergent actions of a highly-reducing and a non-reducing PKS. Most significantly, in vitro reaction of a key flavin-dependent monooxygenase encoded in the cluster with an early benzaldehyde intermediate revealed its roles in hydroxylation and pyran-ring formation to afford the characteristic bicyclic core shared by azaphilones.
Zabala, Angelica O.; Xu, Wei; Chooi, Yit-Heng; Tang, Yi
Today's security threats are being met with 30-year old radiation technology. Discovery of new radiation detection materials is currently a slow and Edisonian process. With heightened concerns over nuclear proliferation, terrorism and unconventional warfare, an alternative strategy for identification and development of potential radiation detection materials must be adopted. Through the Radiation Detection Materials Discovery Initiative, PNNL focuses on the science-based discovery of next generation materials for radiation detection by addressing three "grand challenges": fundamental understanding of radiation detection, identification of new materials, and accelerating the discovery process. The new initiative has eight projects addressing these challenges, which will be described, including early work, paths forward and the opportunities for collaboration.
Rationale: Left ventricular hypertrophy (LVH) is a heritable predictor of cardiovascular disease, particularly in blacks. Objective: Determine the feasibility of combining evidence from two distinct but complementary experimental approaches to identify novel genetic predictors of increased LV mass. Methods: Whole-exome sequencing (WES) was conducted in seven African-American sibling trios ascertained on high average familial LV mass indexed to height (LVMHT) using Illumina HiSeq technology. Identified missense or nonsense (MS/NS) mutations were examined for association with LVMHT using linear mixed models adjusted for age, sex, body weight, and familial relationship. To functionally assess WES findings, human induced pluripotent stem cell-derived cardiomyocytes (induced pluripotent stem cell-CM) were stimulated to induce hypertrophy; mRNA sequencing (RNA-seq) was used to determine gene expression differences associated with hypertrophy onset. Statistically significant findings under both experimental approaches identified LVH candidate genes. Candidate genes were further prioritized by seven supportive criteria that included additional association tests (two criteria), regional linkage evidence in the larger HyperGEN cohort (one criterion), and publicly available gene and variant based annotations (four criteria). Results: WES reads covered 91% of the target capture region (of size 37.2 MB) with an average coverage of 65×. WES identified 31,426 MS/NS mutations among the 21 individuals. A total of 295 MS/NS variants in 265 genes were associated with LVMHT with q-value <0.25. Of the 265 WES genes, 44 were differentially expressed (P ≤ 0.05) in hypertrophied cells. Among the 44 candidate genes identified, 5, including HLA-B, HTT, MTSS1, SLC5A12, and THBS1, met 3 of 7 supporting criteria. THBS1 encodes an adhesive glycoprotein that promotes matrix preservation in pressure-overload LVH. THBS1 gene expression was 34% higher in hypertrophied cells (P = 0.0003) and a predicted conserved and damaging NS variant in exon 13 (A2099G) was significantly associated with LVMHT (P = 4 × 10⁻⁶). Conclusion: Combining evidence from cutting-edge genetic and cellular experiments can enable identification of novel LVH risk loci.
Zhi, D.; Irvin, M. R.; Gu, C. C.; Stoddard, A. J.; Lorier, R.; Matter, A.; Rao, D. C.; Srinivasasainagendra, V.; Tiwari, H. K.; Turner, A.; Broeckel, U.; Arnett, D. K.
Deinococcus radiodurans (Dr) possesses a prominent ability to repair the DNA injury induced by various DNA-damaging agents including mitomycin C (MC), ultraviolet light (UV) and ionizing radiation. DNA damage resistance was restored in MC sensitive (MCS) mutants 2621 and 3021 by transforming with DNAs of four cosmid clones derived from the gene library of strain KD8301, which showed wild type
Issay Narumi; Korbkit Cherdchu; Shigeru Kitayama; Hiroshi Watanabe
Background Biodiesel or ethanol derived from lipids or starch produced by microalgae may overcome many of the sustainability challenges previously ascribed to petroleum-based fuels and first generation plant-based biofuels. The paucity of microalgae genome sequences, however, limits gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for the non-model microalgae species, Dunaliella tertiolecta, and identify pathways and genes of importance related to biofuel production. Results Next generation DNA pyrosequencing technology applied to D. tertiolecta transcripts produced 1,363,336 high quality reads with an average length of 400 bases. Following quality and size trimming, ~45% of the high quality reads were assembled into 33,307 isotigs with a 31-fold coverage and 376,482 singletons. Assembled sequences and singletons were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology (KO) identifiers. These analyses identified the majority of lipid and starch biosynthesis and catabolism pathways in D. tertiolecta. Conclusions The construction of metabolic pathways involved in the biosynthesis and catabolism of fatty acids, triacylglycerols, and starch in D. tertiolecta as well as the assembled transcriptome provide a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character of microalgae-based biofuel feedstock.
Accidental discoveries have been of significant value in the progress of science. Although accidental discoveries are more common in pharmacology and chemistry, other branches of science have also benefited from such discoveries. While most discoveries are the result of persistent research, famous accidental discoveries provide a fascinating…
ABSTRACT Security researchers are applying software reliability models to vulnerability data, in an attempt to model the vulnerability discovery process. I show that most current work on these vulnerability discovery models (VDMs) is theoretically unsound. I propose a standard set of definitions relevant to measuring characteristics of vulnerabilities and their discovery process. I then describe the theoretical requirements
This article features Friends' Discovery Camp, a program that allows children with and without autism spectrum disorder to learn and play together. In Friends' Discovery Camp, campers take part in sensory-rich experiences, ranging from hands-on activities and performing arts to science experiments and stories teaching social skills. Now in its 7th…
The discovery of penicillin is cited in a discussion of the role of serendipity as it relates to scientific discovery. The importance of sagacity as a personality trait is noted. Successful researchers have questioning minds, are willing to view data from several perspectives, and recognize and appreciate the unexpected. (JW)
Rosenman, Martin F.
Genetics has played only a modest role in drug discovery, but new technologies will radically change this. Whole genome sequencing will identify new drug discovery targets, and emerging methods for the determination of gene function will increase the ability to select robust targets. Detection of single nucleotide polymorphisms and common polymorphisms will enhance the investigation of polygenic diseases and the
Lawrence M Gelbert; Richard E Gregg
Contents: Bibliometric analysis of work on human gene mapping; Medical implications of extensive physical and sequence characterization of the human genome; Mapping the human genome: Some implications; Mapping and sequencing the human genome: Consideratio...
H. F. Judson; J. Glover; J. L. Heilbron; S. R. Reisher; T. Friedmann
There is an acceptance that plasmid-based delivery of interfering RNA always generates the intended targeting sequences in cells, making it as specific as its synthetic counterpart. However, recent studies have reported on cellular inefficiencies of the former, especially in light of emerging gene discordance at inter-screen level and across formats. Focusing primarily on the TRC plasmid-based shRNA hairpins, we reasoned that alleged specificities were perhaps compromised due to altered processing, resulting in a multitude of random interfering sequences. For this purpose, we opted to study the processing of hairpin TRCN#40273 targeting CTTN, which showed activity in a miRNA-21 gain-of-function shRNA screen but was inactive when used as an siRNA duplex. Using a previously described walk-through method, we identified 36 theoretical cleavage variants resulting in 78 potential siRNA duplexes targeting 53 genes. We synthesized and tested all of them. Surprisingly, six duplexes targeting ASH1L, DROSHA, GNG7, PRKCH, THEM4, and WDR92 scored as active. qRT-PCR analysis on hairpin transduced reporter cells confirmed knockdown of all six genes, besides CTTN, revealing a surprising 7 gene-signature perturbation by this one single hairpin. We expanded our qRT-PCR studies to 26 additional cell lines and observed unique knockdown profiles associated with each cell line tested, even for those lacking a functional DICER1 gene, suggesting no obvious dependence on dicer for shRNA hairpin processing, contrary to published models. Taken together, we report on a novel dicer independent, cell-type dependent mechanism for non-specific RNAi gene silencing we coin Alternate Targeting Sequence Generator (ATSG). In summary, ATSG adds another dimension to the already complex interpretation of RNAi screening data, and provides for the first time strong evidence in support of arrayed screening, and questions the scientific merits of performing pooled RNAi screens, where deconvolution of up to genome-scale pools is indispensable for target identification. PMID:24987961
Bhinder, Bhavneet; Shum, David; Li, Mu; Ibáñez, Glorymar; Vlassov, Alexander V; Magdaleno, Susan; Djaballah, Hakim
Recently, we described a novel denaturing high-performance liquid chromatography (DHPLC) approach useful for initial detection and identification of crustacean parasites. Because this approach utilizes general primers targeted to conserved regions of the 18S rRNA gene, a priori genetic sequence information on eukaryotic parasites is not required. This distinction provides a significant advantage over specifically targeted PCR assays that do not
Christofer Troedsson; Richard F. Lee; Tina Walters; Vivica Stokes; Karrie Brinkley; Verena Naegele; Marc E. Frischer
The caudate is a subcortical brain structure implicated in many common neurological and psychiatric disorders. To identify specific genes associated with variations in caudate volume, structural magnetic resonance imaging and genome-wide genotypes were acquired from two large cohorts, the Alzheimer's Disease NeuroImaging Initiative (ADNI; N=734) and the Brisbane Adolescent/Young Adult Longitudinal Twin Study (BLTS; N=464). In a preliminary analysis of
J L Stein; D P Hibar; S K Madsen; M Khamis; K L McMahon; G I de Zubicaray; N K Hansell; G W Montgomery; N G Martin; M J Wright; A J Saykin; C R Jack; M W Weiner; A W Toga; P M Thompson
Objective Neuroinflammation contributes to the pathogenesis of sporadic Alzheimer’s disease (AD). Variations in genes relevant to inflammation may be candidate genes for AD risk. Whole-genome association studies have identified relevant new and known genes. Their combined effects do not explain 100% of the risk; genetic interactions may contribute. We investigated whether genes involved in inflammation, i.e. PPAR-α and the interleukins (IL) IL-1α, IL-1β, IL-6, and IL-10, may interact to increase AD risk. Methods The Epistasis Project identifies interactions that affect the risk of AD. Genotyping of single nucleotide polymorphisms (SNPs) in PPARA, IL1A, IL1B, IL6 and IL10 was performed. Possible associations were analyzed by fitting logistic regression models with AD as outcome, controlling for centre, age, sex and presence of apolipoprotein ε4 allele (APOEε4). Adjusted synergy factors were derived from interaction terms (p<0.05 two-sided). Results We observed four significant interactions between different SNPs in PPARA and in interleukins IL1A, IL1B, IL10 that may affect AD risk. There were no significant interactions between PPARA and IL6. Conclusions In addition to an association of the PPARA L162V polymorphism with the AD risk, we observed four significant interactions between SNPs in PPARA and SNPs in IL1A, IL1B and IL10 affecting AD risk. We prove that gene-gene interactions explain part of the heritability of AD and are to be considered when assessing the genetic risk. Necessary replications will require between 1450 and 2950 of both cases and controls, depending on the prevalence of the SNP, to have 80% power to detect the observed synergy factors.
Heun, Reinhard; Kolsch, Heike; Ibrahim-Verbaas, Carla A; Combarros, Onofre; Aulchenko, Yurii S; Breteler, Monique; Schuur, Maaike; van Duijn, Cornelia M; Hammond, Naomi; Belbin, Olivia; Cortina-Borja, Mario; Wilcock, Gordon K; Brown, Kristelle; Barber, Rachel; Kehoe, Patrick G; Coto, Eliecer; Alvarez, Victoria; Lehmann, Michael G; Deloukas, Panos; Mateo, Ignacio; Morgan, Kevin; Warden, Donald R; Smith, A David; Lehmann, Donald J
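The abstract above derives synergy factors from logistic-regression interaction terms. A minimal sketch of that relationship follows, assuming the usual definition of the synergy factor as the joint odds ratio divided by the product of the two single-factor odds ratios; the function names and example values are hypothetical and are not taken from the study.

```python
import math


def synergy_factor(or_a: float, or_b: float, or_both: float) -> float:
    """Synergy factor from three odds ratios (assumed definition).

    or_a, or_b: odds ratios for each risk factor alone.
    or_both:    odds ratio when both factors are present.
    A value of 1 means no interaction on the multiplicative scale.
    """
    return or_both / (or_a * or_b)


def synergy_factor_from_coefficient(beta_interaction: float) -> float:
    """Same quantity from a logistic model with an interaction term.

    For logit(p) = b0 + b1*A + b2*B + b3*A*B, the single-factor odds
    ratios are exp(b1) and exp(b2), the joint one is exp(b1 + b2 + b3),
    so the ratio above reduces to exp(b3).
    """
    return math.exp(beta_interaction)
```

Both routes give the same number when the odds ratios come from the same fitted model; covariate adjustment (centre, age, sex, APOEε4 in the abstract) would enter through additional model terms that this sketch leaves out.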
This chapter addresses the topic of knowledge discovery in heterogeneous environments. It begins with an overview of the knowledge-discovery process. Because of the importance of using clean, consistent data in the knowledge-discovery process, the chapte...
M. N. Kamel; M. G. Ceruti
Selected works are discussed which clearly demonstrate that mimicking various aspects of the process by which natural products evolved is becoming a powerful tool in contemporary drug discovery. Natural products are an established and rich source of drugs. The term "natural product" is often used synonymously with "secondary metabolite." Knowledge of genetics and molecular evolution helps us understand how biosynthesis of many classes of secondary metabolites evolved. One proposed hypothesis is termed "inventive evolution." It invokes duplication of genes, and mutation of the gene copies, among other genetic events. The modified duplicate genes, per se or in conjunction with other genetic events, may give rise to new enzymes, which, in turn, may generate new products, some of which may be selected for. Steps of the inventive evolution can be mimicked in several ways for the purpose of drug discovery. For example, libraries of chemical compounds of any imaginable structure may be produced by combinatorial synthesis. Out of these libraries new active compounds can be selected. In another example, genetic systems can be manipulated to produce modified natural products ("unnatural natural products"), from which new drugs can be selected. In some instances, similar natural products turn up in species that are not direct descendants of each other. This is presumably due to horizontal gene transfer. The mechanism of this inter-species gene transfer can be mimicked in therapeutic gene delivery. Mimicking specifics or principles of chemical evolution including experimental and test-tube evolution also provides leads for new drug discovery. PMID:9949862
Kolb, V M
This manual is intended for use with the PINE (Projects in Imaginative Nature Education) discovery box in elementary school conservation education. The box contains 21 natural specimens which can serve as the starting point for simple student investigations. Specimens and activities are keyed for grade level. For each item, background information…
Busch, Phyllis S.
This paper explores the role of self discovery in the early stages of caregiver professional development, with a focus on the array of choices available to university students. The assumption is that many people do not know their repertoire of skills until asked to complete a project requiring those skills; thus, "the heart of becoming a…
This paper examines a software tool that assists Defence intelligence analysts in efficient discovery and assimilation of large volumes of information derived from a range of sources. The Health INTelligence System (HINTS) is a prototype that was developed during a collaborative Research and Development project between Computer Sciences Corporation (CSC) Australia and the Defence Science and Technology Organisation (DSTO). The
K. Lang; Mark Burnett
Scientists routinely integrate information from various channels to explore topics under study. We designed a 4-wk undergraduate laboratory module that used a multifaceted approach to study a question in molecular genetics. Specifically, students investigated whether Caenorhabditis elegans can be a useful model system for studying genes associated with human disease. In a large-enrollment, sophomore-level laboratory course, groups of three to four students were assigned a gene associated with either breast cancer (brc-1), Wilson disease (cua-1), ovarian dysgenesis (fshr-1), or colon cancer (mlh-1). Students compared observable phenotypes of wild-type C. elegans and C. elegans with a homozygous deletion in the assigned gene. They confirmed the genetic deletion with nested polymerase chain reaction and performed a bioinformatics analysis to predict how the deletion would affect the encoded mRNA and protein. Students also performed RNA interference (RNAi) against their assigned gene and evaluated whether RNAi caused a phenotype similar to that of the genetic deletion. As a capstone activity, students prepared scientific posters in which they presented their data, evaluated whether C. elegans was a useful model system for studying their assigned genes, and proposed future directions. Assessment showed gains in understanding genotype versus phenotype, RNAi, common bioinformatics tools, and the utility of model organisms. PMID:22665589
Cox-Paulson, Elisabeth A; Grana, Theresa M; Harris, Michelle A; Batzli, Janet M
We present a novel laboratory project employing "real-time" RT-qPCR to measure the effect of environment on the expression of the "FLOWERING LOCUS C" gene, a key regulator of floral timing in "Arabidopsis thaliana" plants. The project requires four 3-hr laboratory sessions and is aimed at upper-level undergraduate…
Eickelberg, Garrett J.; Fisher, Alison J.
The Generate, Prune & Prove (GPP) methodology for discovering definitions of mathematical operators is introduced. GPP is a task within the IL exploration discovery system. We developed GPP for use in the discovery of mathematical operators with a wider class of representations than was possible with the previous methods by Lenat and by Shen. GPP utilizes the purpose for which an operator is created to prune the possible definitions. The relevant search spaces are immense and there exists insufficient information for a complete evaluation of the purpose constraint, so it is necessary to perform a partial evaluation of the purpose (i.e., pruning) constraint. The constraint is first transformed so that it is operational with respect to the partial information, and then it is applied to examples in order to test the generated candidates for an operator's definition. In the GPP process, once a candidate definition survives this empirical prune, it is passed on to a theorem prover for formal verification. We describe the application of this methodology to the (re)discovery of the definition of multiplication for Conway numbers, a discovery which is difficult for human mathematicians. We successfully model this discovery process utilizing information which was reasonably available at the time of Conway's original discovery. As part of this discovery process, we reduce the size of the search space from a computationally intractable size to 3468 elements.
Sims, Michael H.; Bresina, John L.
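The generate-and-prune portion of the methodology described above can be illustrated with a toy sketch. This is not the authors' system: the candidate definitions, the purpose constraint (the operator has 1 as an identity and distributes over addition), and the sample values are all assumptions chosen for illustration, and the theorem-proving stage is omitted entirely.

```python
from itertools import product

# Hypothetical candidate definitions for an unknown binary operator on integers.
CANDIDATES = {
    "a + b": lambda a, b: a + b,
    "a - b": lambda a, b: a - b,
    "a * b": lambda a, b: a * b,
    "a * b + a": lambda a, b: a * b + a,
    "a * a": lambda a, b: a * a,
    "max(a, b)": lambda a, b: max(a, b),
}

def satisfies_purpose(op, samples):
    """Partial, example-based check of an assumed purpose constraint:
    1 is a two-sided identity and op distributes over addition."""
    for a, b, c in product(samples, repeat=3):
        if op(a, 1) != a or op(1, a) != a:
            return False
        if op(a, b + c) != op(a, b) + op(a, c):
            return False
    return True

def generate_and_prune(candidates, samples):
    """Return the names of candidates that survive the empirical prune."""
    return [name for name, op in candidates.items()
            if satisfies_purpose(op, samples)]

# Survivors would next be handed to a theorem prover (not modeled here).
survivors = generate_and_prune(CANDIDATES, samples=[-2, 0, 1, 3])
print(survivors)
```

Testing on examples can only reject candidates; a survivor is merely consistent with the sampled cases, which is why the abstract describes a separate formal verification step.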
In the laboratory, students can actively explore concepts and experience the nature of scientific research. We have devised a 5-wk laboratory project in our introductory college biology course that aims to improve understanding of five major concepts central to basic cell biology, molecular biology, and genetics while teaching molecular biology techniques. The project focused on the production of adenine in Saccharomyces cerevisiae and investigated the nature of mutant red colonies of this yeast. Students created red mutants from a wild-type strain, amplified the two genes capable of giving rise to the red phenotype, and then analyzed the nucleotide sequences. A quiz assessing student understanding in the five areas was given at the start and the end of the course. Analysis of the quiz showed significant improvement in each of the areas. These areas were taught in both the laboratory and the classroom; therefore, students were surveyed to determine whether the laboratory played a role in their improved understanding of the five areas. Student survey data demonstrated that the laboratory did have an important role in their learning of the concepts. This project simulated steps in a research project and could be adapted for an advanced course in genetics.
Silveira, Linda A.
Data mining of available biomedical data and information has greatly boosted target discovery in the 'omics' era. Target discovery is the key step in the biomarker and drug discovery pipeline to diagnose and fight human diseases. In biomedical science, the 'target' is a broad concept ranging from molecular entities (such as genes, proteins and miRNAs) to biological phenomena (such as molecular functions, pathways and phenotypes). Within the context of biomedical science, data mining refers to a bioinformatics approach that combines biological concepts with computer tools or statistical methods that are mainly used to discover, select and prioritize targets. In response to the huge demand of data mining for target discovery in the 'omics' era, this review explicates various data mining approaches and their applications to target discovery with emphasis on text and microarray data analysis. Two emerging data mining approaches, chemogenomic data mining and proteomic data mining, are briefly introduced. Also discussed are the limitations of various data mining approaches found in the level of database integration, the quality of data annotation, sample heterogeneity and the performance of analytical and mining tools. Tentative strategies of integrating different data sources for target discovery, such as integrated text mining with high-throughput data analysis and integrated mining with pathway databases, are introduced. PMID:22178890
Yang, Yongliang; Adelstein, S James; Kassis, Amin I
Objective: ABT-751, a novel orally available antitubulin agent, is mainly eliminated as inactive glucuronide (ABT-751G) and sulfate (ABT-751S) conjugates. We performed a pharmacogenetic investigation of ABT-751 pharmacokinetics using in-vitro data to guide the selection of genes for genotyping in a phase I trial of ABT-751. Methods: UDP-glucuronosyltransferase (UGT) and sulfotransferase (SULT) enzymes were screened for ABT-751 metabolite formation in vitro. Forty-seven cancer patients treated with ABT-751 were genotyped for 21 variants in these genes. Results: UGT1A1, UGT1A4, UGT1A8, UGT2B7, and SULT1A1 were found to be involved in the formation of inactive ABT-751 glucuronide (ABT-751G) and sulfate (ABT-751S). SULT1A1 copy number (>2) was associated with an average 34% increase in ABT-751 clearance (P = 0.044), an 18% reduction in ABT-751 AUC (P = 0.045), and a 50% increase in sulfation metabolic ratios (P = 0.025). UGT1A8 rs6431558 was associated with a 28% increase in glucuronidation metabolic ratios (P = 0.022), and UGT1A4*2 was associated with a 65% decrease in ABT-751 Ctrough (P = 0.009). Conclusion: These results might represent the first example of a clinical pharmacokinetic effect of the SULT1A1 copy number variant on the clearance of a SULT1A1 substrate. A-priori selection of candidate genes guided by in-vitro metabolic screening enhanced our ability to identify genetic determinants of interpatient pharmacokinetic variability.
Innocenti, Federico; Ramirez, Jacqueline; Obel, Jennifer; Xiong, Julia; Mirkov, Snezana; Chiu, Yi-Lin; Katz, David A.; Carr, Robert A.; Zhang, Wei; Das, Soma; Adjei, Araba; Moyer, Ann M.; Chen, Pei Xian; Krivoshik, Andrew; Medina, Diane; Gordon, Gary B.; Ratain, Mark J.; Sahelijo, Leonardo; Weinshilboum, Richard M.; Fleming, Gini F.; Bhathena, Anahita
During early embryogenesis the zygotic genome is transcriptionally silent and all mRNAs present are of maternal origin. The maternal-zygotic transition marks the time over which embryogenesis changes its dependence from maternal RNAs to zygotically transcribed RNAs. Here we present the first systematic investigation of early zygotic genes (EZGs) in a mosquito species and focus on genes involved in the onset of transcription during 2-4 hr. We used transcriptome sequencing to identify the "pure" (without maternal expression) EZGs by analyzing transcripts from four embryonic time ranges of 0-2, 2-4, 4-8, and 8-12 hr, which includes the time of cellular blastoderm formation and up to the start of gastrulation. Blast of 16,789 annotated transcripts vs. the transcriptome reads revealed evidence for 63 (P<0.001) and 143 (P<0.05) nonmaternally derived transcripts having a significant increase in expression at 2-4 hr. One third of the 63 EZG transcripts do not have predicted introns compared to 10% of all Ae. aegypti genes. We have confirmed by RT-PCR that zygotic transcription starts as early as 2-3 hours. A degenerate motif VBRGGTA was found to be overrepresented in the upstream sequences of the identified EZGs using a motif identification software called SCOPE. We find evidence for homology between this motif and the TAGteam motif found in Drosophila that has been implicated in EZG activation. A 38 bp sequence in the proximal upstream sequence of a kinesin light chain EZG (KLC2.1) contains two copies of the mosquito motif. This sequence was shown to support EZG transcription by luciferase reporter assays performed on injected early embryos, and confers early zygotic activity to a heterologous promoter from a divergent mosquito species. The results of these studies are consistent with the model of early zygotic genome activation via transcriptional activators, similar to what has been found recently in Drosophila. PMID:22457801
Biedler, James K; Hu, Wanqi; Tae, Hongseok; Tu, Zhijian
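As a rough illustration of what searching for a degenerate motif such as VBRGGTA involves, the sketch below expands the standard IUPAC nucleotide codes into a regular expression and scans one strand of a sequence. It is not the SCOPE software mentioned in the abstract; the function names and the example sequence are hypothetical, and no overrepresentation statistics are computed.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

def motif_to_regex(motif):
    """Compile a degenerate motif into a regex; the lookahead allows overlapping hits."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")

def find_motif(seq, motif):
    """Return (0-based start, matched site) for each forward-strand occurrence."""
    pattern = motif_to_regex(motif)
    return [(m.start(), m.group(1)) for m in pattern.finditer(seq.upper())]

# Hypothetical upstream sequence, made up for this example.
upstream = "TTACGGGTAACCTAGGTAGG"
for start, site in find_motif(upstream, "VBRGGTA"):
    print(start, site)
```

A fuller analysis would also scan the reverse complement and compare hit counts against a background set of sequences before calling the motif overrepresented.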
Describes Project COLD (Climate, Ocean, Land, Discovery), a scientific study of the Polar Regions and a collection of 35 modules used within the framework of existing subjects: oceanography, biology, geology, meteorology, geography, social science. Includes a partial list of topics and one activity (geodesic dome) from a module. (Author/SK)
Kazanjian, Wendy C.
NAGS catalyzes the conversion of glutamate and acetyl-CoA to N-acetylglutamate (NAG), the essential allosteric activator of carbamyl phosphate synthetase I, the first urea cycle enzyme in mammals. A 17-year-old female with recurrent hyperammonemia attacks, the cause of which remained undiagnosed for 8 years in spite of multiple molecular and biochemical investigations, showed markedly enhanced ureagenesis (measured by isotope incorporation) in response to N-carbamylglutamate (NCG). This led to sequencing of the regulatory regions of the NAG synthase (NAGS) gene and identification of a deleterious single-base substitution in the upstream enhancer. The homozygous mutation (-3063C>A), affecting a highly conserved nucleotide within the Hepatic Nuclear Factor 1 (HNF-1) binding site, was not found in SNP databases or in a screen of 1086 alleles from a diverse population. Functional assays demonstrated that this mutation decreases transcription and binding of HNF-1 to the NAGS gene, while a consensus HNF-1 binding sequence enhances binding to HNF-1 and increases transcription. Oral daily NCG therapy appears to have restored ureagenesis in this patient, normalizing her biochemical markers and allowing discontinuation of alternate pathway therapy and normalization of her diet with no recurrence of hyperammonemia.
Heibel, Sandra K.; Mew, Nicholas Ah; Caldovic, Ljubica; Daikhin, Yevgeny; Yudkoff, Marc; Tuchman, Mendel
More and more application domains, from financial market analysis to weather prediction, from monitoring supermarket purchases to monitoring satellite images, are becoming increasingly data-intensive. The result is massive databases that are growing at a rapid rate - it has been estimated that the world…
James Clifford; Vasant Dhar; Alex Tuzhilin
The Physiome Project seeks to facilitate the exchange of information among the research community and to "speed up the discovery of how biological systems work." By acting as a central repository for data and computational models, the project hopes to advance the study of the physiome for the benefit of the scientific community. Along with a description of the physiome and the project, the Web site will include various databases and models organized into categories based on body systems. The site is still under construction, so many of the categories are not yet active. There are also opportunities for researchers to contribute to the project in a variety of ways.
This is the home page of Project Exploration, a living classroom that involves the public and students, especially city kids and girls, in scientific discoveries involving paleontology. Project Exploration was founded in 1999 to make science accessible to the public, with a special focus on city kids and girls. Interactive exhibits, labs, online journals, unique science programs and interactions with real scientists help people go beyond the edge of science.
Scientific and technical journals in biology and medicine in recent years have extensively covered a debate about whether and how to determine the function and order of human genes on human chromosomes and when to determine the sequence of molecular building blocks that comprise DNA in those chromosomes. In 1987, these issues rose to become part…
Congress of the U.S., Washington, DC. Office of Technology Assessment.
Cost and schedule overruns are often caused by poor requirements that are produced by people who do not understand the requirement process. This paper provides a high-level overview of the requirements discovery process.
Bahill, A.T. [Univ. of Arizona, Tucson, AZ (United States). Systems and Industrial Engineering]; Dean, F.F. [Sandia National Lab., Albuquerque, NM (United States)]
Discusses recently published work that appears to have many of the answers to the question of how the nervous system develops. Focuses on the discovery of what is believed to be neural inducer, a protein called noggin. (LZ)
Oppenheimer, Steven B.
This paper describes the substructure discovery method used in the SUBDUE system. The method involves a computationally constrained best-first search guided by four heuristics: cognitive savings, compactness, connectivity and coverage. The two main proces...
L. B. Holder
NASA's Discovery Program represents a new era in planetary exploration. Discovery's primary goal is to maintain U.S. scientific leadership in planetary research by conducting a series of highly focused, cost-effective missions to answer critical questions in solar system science. The Program will stimulate the development of innovative management approaches by encouraging new teaming arrangements among industry, universities and the government. The program encourages the prudent use of new technologies to enable/enhance science return and to reduce life cycle cost, and it supports the transfer of these technologies to the private sector for secondary applications. The Near-Earth Asteroid Rendezvous and Mars Pathfinder missions have been selected as the first two Discovery missions. Both will be launched in 1996. Subsequent, competitively selected missions will be conceived and proposed to NASA by teams of scientists and engineers from industry, academia, and government organizations. This paper summarizes the status of Discovery Program planning.
Kicza, Mary; Bruegge, Richard Vorder
Chlorophyll d-producing cyanobacteria are a recently described group of phototrophic bacteria that is a major focus of photosynthesis research, previously known only from marine environments in symbiosis with eukaryotes. We have discovered a free-living member of this group from a eutrophic hypersaline lake. Phylogenetic analyses indicated these strains are closely related to each other but not to prochlorophyte cyanobacteria that also use an alternative form of chlorophyll as the major light-harvesting pigment. We have also demonstrated that these bacteria acquired a fragment of the small-subunit rRNA gene encoding a conserved hairpin in the bacterial ribosome from a proteobacterial donor at least 10 million years before the present. Thus, our most widely used phylogenetic marker can be a mosaic of sequence fragments with widely divergent evolutionary histories.
Miller, Scott R.; Augustine, Sunny; Olson, Tien Le; Blankenship, Robert E.; Selker, Jeanne; Wood, A. Michelle
Chickpea (Cicer arietinum) is an important food legume crop but lags in the availability of genomic resources. In this study, we have generated about 2 million high-quality sequences with an average length of 372 bp using pyrosequencing technology. The optimization of de novo assembly clearly indicated that hybrid assembly of long-read and short-read primary assemblies gave better results. The hybrid assembly generated a set of 34,760 transcripts with an average length of 1,020 bp representing about 4.8% (35.5 Mb) of the total chickpea genome. We identified more than 4,000 simple sequence repeats, which can be developed as functional molecular markers in chickpea. Putative function and Gene Ontology terms were assigned to at least 73.2% and 71.0% of chickpea transcripts, respectively. We have also identified several chickpea transcripts that showed tissue-specific expression and validated the results using real-time polymerase chain reaction analysis. Based on sequence comparison with other species within the plant kingdom, we identified two sets of lineage-specific genes, including those conserved in the Fabaceae family (legume specific) and those lacking significant similarity with any non-chickpea species (chickpea specific). Finally, we have developed a Web resource, Chickpea Transcriptome Database, which provides public access to the data and results reported in this study. The strategy for optimization of de novo assembly presented here may further facilitate transcriptome sequencing and characterization in other organisms. Most importantly, the data and results reported in this study will help to accelerate research in various areas of genomics and to implement breeding programs in chickpea.
Garg, Rohini; Patel, Ravi K.; Jhanwar, Shalu; Priya, Pushp; Bhattacharjee, Annapurna; Yadav, Gitanjali; Bhatia, Sabhyata; Chattopadhyay, Debasis; Tyagi, Akhilesh K.; Jain, Mukesh
The striped catfish (Pangasianodon hypophthalmus) culture industry in the Mekong Delta in Vietnam has developed rapidly over the past decade. The culture industry now, however, faces some significant challenges, especially those related to climate change impacts, notably predicted extensive saltwater intrusion into many low-lying coastal provinces across the Mekong Delta. This problem highlights a need for development of culture stocks that can tolerate more saline culture environments as a response to expansion of saline water-intruded land. While a traditional artificial selection program can potentially address this need, understanding the genomic basis of salinity tolerance can assist development of more productive culture lines. The current study applied a transcriptomic approach using Ion PGM technology to generate expressed sequence tag (EST) resources from the intestine and swim bladder of striped catfish reared at a salinity level of 9 ppt, which showed the best growth performance. Total sequence data generated was 467.8 Mbp, consisting of 4,116,424 reads with an average length of 112 bp. De novo assembly generated 51,188 contigs and allowed identification of 16,116 putative genes based on the GenBank non-redundant database. GO annotation, KEGG pathway mapping, and functional annotation of the EST sequences recovered a wide diversity of biological functions and processes. In addition, more than 11,600 simple sequence repeats were detected. This is the first comprehensive analysis of a striped catfish transcriptome, and it provides a valuable genomic resource for future selective breeding programs and for functional or evolutionary studies of genes that influence salinity tolerance in this important culture species. PMID:24841517
Thanh, Nguyen Minh; Jung, Hyungtaek; Lyons, Russell E; Chand, Vincent; Tuan, Nguyen Viet; Thu, Vo Thi Minh; Mather, Peter
In this lesson, learners will use History of Discovery cards and interpretive skits to examine how scientists throughout history have explored Saturn. The lesson enables students to discern the multicultural nature of scientific inquiry and to see how technology improvements increase our ability to solve scientific mysteries. The lesson also prepares students to create and interpret their own timelines spanning the years 1610 to 2010. The timelines depict scientists, technologies, and discoveries. This is lesson 4 of 6 in the Saturn Educators Guide.
The wide diversity of routes to astronomical, astrophysical and cosmological discovery is discussed through a number of historical case studies. Prime ingredients for success include new technology, precision observation, extensive databases, capitalising upon discoveries in cognate disciplines, imagination and luck. Being in the right place at the right time is a huge advantage. The changing perspectives on the essential tools for tackling frontier problems and astronomical advance are discussed.
We have developed a new class of cloning vectors: lambda-full-length cDNA (lambda-FLC) cloning vectors. These vectors can be bulk-excised for preparing full-length cDNA libraries in which a high proportion of the plasmids carry large inserts that can be transferred into other (for example, functional) vectors. Unlike other cloning vectors, lambda-FLC vectors accommodate a broad range of sizes of eukaryotic cDNA inserts because they contain "size balancers." Further, the main protocol we use for direct bulk excision of plasmids is mediated by a Cre-lox system and is apparently free of size bias. The average size of the inserts from excised plasmid cDNA libraries was 2.9 kb for standard and 6.9 kb for size-selected cDNA. The average insert size of the full-length cDNA libraries was correlated with the rate of new gene discovery, suggesting that effectively cloning rarely expressed mRNAs requires vectors that can accommodate large inserts from a variety of sources. Some of the vectors are also suitable for bulk transfer of inserts into various functional vectors. PMID:11543636
Carninci, P; Shibata, Y; Hayatsu, N; Itoh, M; Shiraki, T; Hirozane, T; Watahiki, A; Shibata, K; Konno, H; Muramatsu, M; Hayashizaki, Y
As a gateway for scientific discovery, the Argonne Leadership Computing Facility (ALCF) works hand in hand with the world's best computational scientists to advance research in a diverse span of scientific domains, ranging from chemistry, applied mathematics, and materials science to engineering physics and life sciences. Sponsored by the U.S. Department of Energy's (DOE) Office of Science, researchers are using the IBM Blue Gene/L supercomputer at the ALCF to study and explore key scientific problems that underlie important challenges facing our society. For instance, a research team at the University of California-San Diego/SDSC is studying the molecular basis of Parkinson's disease. The researchers plan to use the knowledge they gain to discover new drugs to treat the disease and to identify risk factors for other diseases that are equally prevalent. Likewise, scientists from Pratt & Whitney are using the Blue Gene to understand the complex processes within aircraft engines. Expanding our understanding of jet engine combustors is the secret to improved fuel efficiency and reduced emissions. Lessons learned from the scientific simulations of jet engine combustors have already led Pratt & Whitney to newer designs with unprecedented reductions in emissions, noise, and cost of ownership. ALCF staff members provide in-depth expertise and assistance to those using the Blue Gene/L and optimizing user applications. Both the Catalyst and Applications Performance Engineering and Data Analytics (APEDA) teams support the users' projects. In addition to working with scientists running experiments on the Blue Gene/L, we have become a nexus for the broader global community. In partnership with the Mathematics and Computer Science Division at Argonne National Laboratory, we have created an environment where the world's most challenging computational science problems can be addressed. Our expertise in high-end scientific computing enables us to provide guidance for applications that are transitioning to petascale as well as to produce software that facilitates their development, such as the MPICH library, which provides a portable and efficient implementation of the MPI standard (the prevalent programming model for large-scale scientific applications), and the PETSc toolkit, which provides a programming paradigm that eases the development of many scientific applications on high-end computers.
Beckman, P.; Dave, P.; Drugan, C.
Drugs that enhance a process called oxidative stress were found to kill rhabdomyosarcoma tumor cells growing in the laboratory and possibly bolstered the effectiveness of chemotherapy against this aggressive tumor of muscle and other soft tissue. The findings are the latest from the St. Jude Children’s Research Hospital–Washington University Pediatric Cancer Genome Project and appear in the December 9 edition of the scientific journal Cancer Cell.
Antisense technology provides a high-throughput and systematic approach to drug target validation and gene function discovery. In combination with other emerging technologies (such as microarrays), this technology will enable efficient evaluation of the sequence data generated by the Human Genome Project. The authors review recent advances in the antisense field and discuss the potential use of antisense technology for functional
Margaret F Taylor; Kristin Wiederholt; Fran Sverdrup
Even the most cursory explorations into how scientific discoveries are made reveal that many of these discoveries are tinged with a certain serendipity and with circumstances that are not immediately attributable to a wholly reasoned and logical progression of methodical experiments. Presented by the American Institute of Physics, this online multimedia exhibit tells the story of two important 20th century scientific discoveries: the discovery of nuclear fission and the detection of the first optical pulsar. The discovery of nuclear fission section contains audio clips from some of those responsible for this scientific endeavor, including Enrico Fermi, Arthur Holly Compton, and Otto Hahn. One particularly noteworthy clip features Compton's firsthand recollection of the first successful self-sustaining nuclear chain reaction under the bleachers of Stagg Field on the campus of the University of Chicago. The second exhibit homes in on the detection of the first optical pulsar, and includes clips from Philip Morrison, John Cocke, and Michael Disney. The site is rounded out by a set of teachers' guides designed to complement these online exhibits.
Here we describe KODAMA (knowledge discovery by accuracy maximization), an unsupervised and semisupervised learning algorithm that performs feature extraction from noisy and high-dimensional data. Unlike other data mining methods, the peculiarity of KODAMA is that it is driven by an integrated procedure of cross-validation of the results. The discovery of a local manifold's topology is led by a classifier through a Monte Carlo procedure of maximization of cross-validated predictive accuracy. Briefly, our approach differs from previous methods in that it has an integrated procedure of validation of the results. In this way, the method ensures the highest robustness of the obtained solution. This robustness is demonstrated on experimental datasets of gene expression and metabolomics, where KODAMA compares favorably with other existing feature extraction methods. KODAMA is then applied to an astronomical dataset, revealing unexpected features. Interesting and not easily predictable features are also found in the analysis of the State of the Union speeches by American presidents: KODAMA reveals an abrupt linguistic transition sharply separating all post-Reagan from all pre-Reagan speeches. The transition occurs during Reagan's presidency and not from its beginning. PMID:24706821
Cacciatore, Stefano; Luchinat, Claudio; Tenori, Leonardo
This site describes what early civilizations knew about our solar system and how astronomy developed over the centuries. The early theories describing the movements of the planets, development of the first telescopes, and discoveries of the planets Uranus, Neptune and Pluto are some of the topics addressed in Discovery. Here you will find the Pluto discovery plate, the photographic plate taken the day Pluto's position was discovered by Clyde Tombaugh. Other topics covered at this site include: the Renaissance with the ideas of Copernicus and Kepler; the age of the telescope, which traces its development; Galileo, who is credited with discovering the moons of Jupiter, phases of Venus, and the craters on the Moon; and planetary satellites.