USDA-ARS's Scientific Manuscript database
This article documents the addition of 220 microsatellite marker loci to the Molecular Ecology Resources Database. Loci were developed for the following species: Allanblackia floribunda, Amblyraja radiata, Bactrocera cucurbitae, Brachycaudus helichrysi, Calopogonium mucunoides, Dissodactylus primiti...
ABREU, ALUANA G.; ALBAINA, A.; ALPERMANN, TILMAN J.; APKENAS, VANESSA E.; BANKHEAD-DRONNET, S.; BERGEK, SARA; BERUMEN, MICHAEL L.; CHO, CHANG-HUNG; CLOBERT, JEAN; COULON, AURÉLIE; DE FERAUDY, D.; ESTONBA, A.; HANKELN, THOMAS; HOCHKIRCH, AXEL; HSU, TSAI-WEN; HUANG, TSURNG-JUHN; IRIGOIEN, X.; IRIONDO, M.; KAY, KATHLEEN M.; KINITZ, TIM; KOTHERA, LINDA; LE HÉNANFF, MAXIME; LIEUTIER, F.; LOURDAIS, OLIVIER; MACRINI, CAMILA M. T.; MANZANO, C.; MARTIN, C.; MORRIS, VERONICA R. F.; NANNINGA, GERRIT; PARDO, M. A.; PLIESKE, JÖRG; POINTEAU, S.; PRESTEGAARD, TORE; QUACK, MARKUS; RICHARD, MURIELLE; SAVAGE, HARRY M.; SCHWARCZ, KAISER D.; SHADE, JESSICA; SIMMS, ELLEN L.; SOLFERINI, VERA N.; STEVENS, VIRGINIE M.; VEITH, MICHAEL; WEN, MEI-JUAN; WICKER, FLORIAN; YOST, JENNIFER M.; ZARRAONAINDIA, I.
2017-01-01
This article documents the addition of 139 microsatellite marker loci and 90 pairs of single-nucleotide polymorphism sequencing primers to the Molecular Ecology Resources Database. Loci were developed for the following species: Aglaoctenus lagotis, Costus pulverulentus, Costus scaber, Culex pipiens, Dascyllus marginatus, Lupinus nanus Benth, Phloeomyzus passerini, Podarcis muralis, Rhododendron rubropilosum Hayata var. taiwanalpinum and Zoarces viviparus. These loci were cross-tested on the following species: Culex quinquefasciatus, Rhododendron pseudochrysanthum Hay. ssp. morii (Hay.) Yamazaki and R. pseudochrysanthum Hayata. This article also documents the addition of 48 sequencing primer pairs and 90 allele-specific primers for Engraulis encrasicolus. PMID:22296658
An, Junghwa; Bechet, Arnaud; Berggren, Asa; Brown, Sarah K; Bruford, Michael W; Cai, Qingui; Cassel-Lundhagen, Anna; Cezilly, Frank; Chen, Song-Lin; Cheng, Wei; Choi, Sung-Kyoung; Ding, X Y; Fan, Yong; Feldheim, Kevin A; Feng, Z Y; Friesen, Vicki L; Gaillard, Maria; Galaraza, Juan A; Gallo, Leonardo; Ganeshaiah, K N; Geraci, Julia; Gibbons, John G; Grant, William S; Grauvogel, Zac; Gustafsson, S; Guyon, Jeffrey R; Han, L; Heath, Daniel D; Hemmilä, S; Hogan, J Derek; Hou, B W; Jakse, Jernej; Javornik, Branka; Kaňuch, Peter; Kim, Kyung-Kil; Kim, Kyung-Seok; Kim, Sang-Gyu; Kim, Sang-In; Kim, Woo-Jin; Kim, Yi-Kyung; Klich, Maren A; Kreiser, Brian R; Kwan, Ye-Seul; Lam, Athena W; Lasater, Kelly; Lascoux, M; Lee, Hang; Lee, Yun-Sun; Li, D L; Li, Shao-Jing; Li, W Y; Liao, Xiaolin; Liber, Zlatko; Lin, Lin; Liu, Shaoying; Luo, Xin-Hui; Ma, Y H; Ma, Yajun; Marchelli, Paula; Min, Mi-Sook; Moccia, Maria Domenica; Mohana, Kumara P; Moore, Marcelle; Morris-Pocock, James A; Park, Han-Chan; Pfunder, Monika; Ivan, Radosavljević; Ravikanth, G; Roderick, George K; Rokas, Antonis; Sacks, Benjamin N; Saski, Christopher A; Satovic, Zlatko; Schoville, Sean D; Sebastiani, Federico; Sha, Zhen-Xia; Shin, Eun-Ha; Soliani, Carolina; Sreejayan, N; Sun, Zhengxin; Tao, Yong; Taylor, Scott A; Templin, William D; Shaanker, R Uma; Vasudeva, R; Vendramin, Giovanni G; Walter, Ryan P; Wang, Gui-Zhong; Wang, Ke-Jian; Wang, Y Q; Wattier, Rémi A; Wei, Fuwen; Widmer, Alex; Woltmann, Stefan; Won, Yong-Jin; Wu, Jing; Xie, M L; Xu, Genbo; Xu, Xiao-Jun; Ye, Hai-Hui; Zhan, Xiangjiang; Zhang, F; Zhong, J
2010-03-01
This article documents the addition of 411 microsatellite marker loci and 15 pairs of Single Nucleotide Polymorphism (SNP) sequencing primers to the Molecular Ecology Resources Database. Loci were developed for the following species: Acanthopagrus schlegeli, Anopheles lesteri, Aspergillus clavatus, Aspergillus flavus, Aspergillus fumigatus, Aspergillus oryzae, Aspergillus terreus, Branchiostoma japonicum, Branchiostoma belcheri, Colias behrii, Coryphopterus personatus, Cynoglossus semilaevis, Dendrobium officinale, Dysoxylum malabaricum, Metrioptera roeselii, Myrmeciza exsul, Ochotona thibetana, Neosartorya fischeri, Nothofagus pumilio, Onychodactylus fischeri, Phoenicopterus roseus, Salvia officinalis L., Scylla paramamosain, Silene latifo, Sula sula, and Vulpes vulpes. These loci were cross-tested on the following species: Aspergillus giganteus, Colias pelidne, Colias interior, Colias meadii, Colias eurytheme, Coryphopterus lipernes, Coryphopterus glaucofrenum, Coryphopterus eidolon, Gnatholepis thompsoni, Elacatinus evelynae, Dendrobium loddigesii, Dendrobium devonianum, Dysoxylum binectariferum, Nothofagus antarctica, Nothofagus dombeyii, Nothofagus nervosa, Nothofagus obliqua, Sula nebouxii, and Sula variegata. This article also documents the addition of 39 sequencing primer pairs and 15 allele-specific primers or probes for Paralithodes camtschaticus. © 2010 Blackwell Publishing Ltd.
A'Hara, S W; Amouroux, P; Argo, Emily E; Avand-Faghih, A; Barat, Ashoktaru; Barbieri, Luiz; Bert, Theresa M; Blatrix, R; Blin, Aurélie; Bouktila, D; Broome, A; Burban, C; Capdevielle-Dulac, C; Casse, N; Chandra, Suresh; Cho, Kyung Jin; Cottrell, J E; Crawford, Charles R; Davis, Michelle C; Delatte, H; Desneux, Nicolas; Djieto-Lordon, C; Dubois, M P; El-Mergawy, R A A M; Gallardo-Escárate, C; Garcia, M; Gardiner, Mary M; Guillemaud, Thomas; Haye, P A; Hellemans, B; Hinrichsen, P; Jeon, Ji Hyun; Kerdelhué, C; Kharrat, I; Kim, Ki Hwan; Kim, Yong Yul; Kwan, Ye-Seul; Labbe, Ellen M; LaHood, Eric; Lee, Kyung Mi; Lee, Wan-Ok; Lee, Yat-Hung; Legoff, Isabelle; Li, H; Lin, Chung-Ping; Liu, S S; Liu, Y G; Long, D; Maes, G E; Magnoux, E; Mahanta, Prabin Chandra; Makni, H; Makni, M; Malausa, Thibaut; Matura, Rakesh; McKey, D; McMillen-Jackson, Anne L; Méndez, M A; Mezghani-Khemakhem, M; Michel, Andy P; Paul, Moran; Muriel-Cunha, Janice; Nibouche, S; Normand, F; Palkovacs, Eric P; Pande, Veena; Parmentier, K; Peccoud, J; Piatscheck, F; Puchulutegui, Cecilia; Ramos, R; Ravest, G; Richner, Heinz; Robbens, J; Rochat, D; Rousselet, J; Saladin, Verena; Sauve, M; Schlei, Ora; Schultz, Thomas F; Scobie, A R; Segovia, N I; Seyoum, Seifu; Silvain, J-F; Tabone, Elisabeth; Van Houdt, J K J; Vandamme, S G; Volckaert, F A M; Wenburg, John; Willis, Theodore V; Won, Yong-Jin; Ye, N H; Zhang, W; Zhang, Y X
2012-01-01
This article documents the addition of 299 microsatellite marker loci and nine pairs of single-nucleotide polymorphism (SNP) EPIC primers to the Molecular Ecology Resources (MER) Database. Loci were developed for the following species: Alosa pseudoharengus, Alosa aestivalis, Aphis spiraecola, Argopecten purpuratus, Coreoleuciscus splendidus, Garra gotyla, Hippodamia convergens, Linnaea borealis, Menippe mercenaria, Menippe adina, Parus major, Pinus densiflora, Portunus trituberculatus, Procontarinia mangiferae, Rhynchophorus ferrugineus, Schizothorax richardsonii, Scophthalmus rhombus, Tetraponera aethiops, Thaumetopoea pityocampa, Tuta absoluta and Ugni molinae. These loci were cross-tested on the following species: Barilius bendelisis, Chiromantes haematocheir, Eriocheir sinensis, Eucalyptus camaldulensis, Eucalyptus cladocalix, Eucalyptus globulus, Garra litaninsis vishwanath, Garra para lissorhynchus, Guindilla trinervis, Hemigrapsus sanguineus, Luma chequen. Guayaba, Myrceugenia colchagüensis, Myrceugenia correifolia, Myrceugenia exsucca, Parasesarma plicatum, Parus major, Portunus pelagicus, Psidium guayaba, Schizothorax richardsonii, Scophthalmus maximus, Tetraponera latifrons, Thaumetopoea bonjeani, Thaumetopoea ispartensis, Thaumetopoea libanotica, Thaumetopoea pinivora, Thaumetopoea pityocampa ena clade, Thaumetopoea solitaria, Thaumetopoea wilkinsoni and Tor putitora. This article also documents the addition of nine EPIC primer pairs for Euphaea decorata, Euphaea formosa, Euphaea ornata and Euphaea yayeyamana. © 2011 Blackwell Publishing Ltd.
Permanent Genetic Resources added to Molecular Ecology Resources Database 1 May 2009-31 July 2009.
Almany, Glenn R; DE Arruda, Maurício P; Arthofer, Wolfgang; Atallah, Z K; Beissinger, Steven R; Berumen, Michael L; Bogdanowicz, S M; Brown, S D; Bruford, Michael W; Burdine, C; Busch, Jeremiah W; Campbell, Nathan R; Carey, D; Carstens, Bryan C; Chu, K H; Cubeta, Marc A; Cuda, J P; Cui, Zhaoxia; Datnoff, L E; Dávila, J A; Davis, Emily S; Davis, R M; Diekmann, Onno E; Eizirik, Eduardo; Fargallo, J A; Fernandes, Fabiano; Fukuda, Hideo; Gale, L R; Gallagher, Elizabeth; Gao, Yongqiang; Girard, Philippe; Godhe, Anna; Gonçalves, Evonnildo C; Gouveia, Licinia; Grajczyk, Amber M; Grose, M J; Gu, Zhifeng; Halldén, Christer; Härnström, Karolina; Hemmingsen, Amanda H; Holmes, Gerald; Huang, C H; Huang, Chuan-Chin; Hudman, S P; Jones, Geoffrey P; Kanetis, Loukas; Karunasagar, Iddya; Karunasagar, Indrani; Keyghobadi, Nusha; Klosterman, S J; Klug, Page E; Koch, J; Koopman, Margaret M; Köppler, Kirsten; Koshimizu, Eriko; Krumböck, Susanne; Kubisiak, T; Landis, J B; Lasta, Mario L; Lee, Chow-Yang; Li, Qianqian; Li, Shou-Hsien; Lin, Rong-Chien; Liu, M; Liu, Na; Liu, W C; Liu, Yuan; Loiseau, A; Luan, Weisha; Maruthachalam, K K; McCormick, Helen M; Mellick, Rohan; Monnahan, P J; Morielle-Versute, Eliana; Murray, Tomás E; Narum, Shawn R; Neufeld, Katie; De Nova, P J G; Ojiambo, Peter S; Okamoto, Nobuaki; Othman, Ahmad Sofiman; Overholt, W A; Pardini, Renata; Paterson, Ian G; Patty, Olivia A; Paxton, Robert J; Planes, Serge; Porter, Carolyn; Pratchett, Morgan S; Püttker, Thomas; Rasic, Gordana; Rasool, Bilal; Rey, O; Riegler, Markus; Riehl, C; Roberts, John M K; Roberts, P D; Rochel, Elisabeth; Roe, Kevin J; Rossetto, Maurizio; Ruzzante, Daniel E; Sakamoto, Takashi; Saravanan, V; Sarturi, Cladinara Roberts; Schmidt, Anke; Schneider, Maria Paula Cruz; Schuler, Hannes; Serb, Jeanne M; Serrão, Ester T A; Shi, Yaohua; Silva, Artur; Sin, Y W; Sommer, Simone; Stauffer, Christian; Strüssmann, Carlos Augusto; Subbarao, K V; Syms, Craig; Tan, Feng; Tejedor, Eugenio Daniel; Thorrold, Simon R; 
Trigiano, Robert N; Trucco, María I; Tsuchiya-Jerep, Mirian Tieko Nunes; Vergara, P; Van De Vliet, Mirjam S; Wadl, Phillip A; Wang, Aimin; Wang, Hongxia; Wang, R X; Wang, Xinwang; Wang, Yan; Weeks, Andrew R; Wei, Fuwen; Werner, William J; Wiley, E O; Williams, D A; Wilkins, Richard J; Wisely, Samantha M; With, Kimberly A; Wu, Danhua; Yao, Cheng-Te; Yau, Cynthia; Yeap, Beng-Keok; Zhai, Bao-Ping; Zhan, Xiangjiang; Zhang, Guo-Yan; Zhang, S Y; Zhao, Ru; Zhu, Lifeng
2009-11-01
This article documents the addition of 512 microsatellite marker loci and nine pairs of Single Nucleotide Polymorphism (SNP) sequencing primers to the Molecular Ecology Resources Database. Loci were developed for the following species: Alcippe morrisonia morrisonia, Bashania fangiana, Bashania fargesii, Chaetodon vagabundus, Colletes floralis, Coluber constrictor flaviventris, Coptotermes gestroi, Crotophaga major, Cyprinella lutrensis, Danaus plexippus, Fagus grandifolia, Falco tinnunculus, Fletcherimyia fletcheri, Hydrilla verticillata, Laterallus jamaicensis coturniculus, Leavenworthia alabamica, Marmosops incanus, Miichthys miiuy, Nasua nasua, Noturus exilis, Odontesthes bonariensis, Quadrula fragosa, Pinctada maxima, Pseudaletia separata, Pseudoperonospora cubensis, Podocarpus elatus, Portunus trituberculatus, Rhagoletis cerasi, Rhinella schneideri, Sarracenia alata, Skeletonema marinoi, Sminthurus viridis, Syngnathus abaster, Uroteuthis (Photololigo) chinensis, Verticillium dahliae, Wasmannia auropunctata, and Zygochlamys patagonica. These loci were cross-tested on the following species: Chaetodon baronessa, Falco columbarius, Falco eleonorae, Falco naumanni, Falco peregrinus, Falco subbuteo, Didelphis aurita, Gracilinanus microtarsus, Marmosops paulensis, Monodelphis americana, Odontesthes hatcheri, Podocarpus grayi, Podocarpus lawrencei, Podocarpus smithii, Portunus pelagicus, Syngnathus acus, Syngnathus typhle, Uroteuthis (Photololigo) edulis, Uroteuthis (Photololigo) duvauceli and Verticillium albo-atrum. This article also documents the addition of nine sequencing primer pairs and sixteen allele-specific primers or probes for Oncorhynchus mykiss and Oncorhynchus tshawytscha; these primers and assays were cross-tested in both species. © 2009 Blackwell Publishing Ltd.
Jung, Won Yong; Lee, Sang Sook; Kim, Chul Wook; Kim, Hyun-Soon; Min, Sung Ran; Moon, Jae Sun; Kwon, Suk-Yoon; Jeon, Jae-Heung; Cho, Hye Sun
2014-01-01
Jerusalem artichoke (Helianthus tuberosus L.) has long been cultivated as a vegetable and as a source of fructans (inulin) for pharmaceutical applications in diabetes and obesity prevention. However, transcriptomic and genomic data for Jerusalem artichoke remain scarce. In this study, Illumina RNA sequencing (RNA-Seq) was performed on samples from Jerusalem artichoke leaves, roots, stems and two different tuber tissues (early and late tuber development). Data were used for de novo assembly and characterization of the transcriptome. In total, 206,215,632 paired-end reads were generated. These were assembled into 66,322 loci with 272,548 transcripts. Loci were annotated by querying against the NCBI non-redundant, Phytozome and UniProt databases, and 40,215 loci were homologous to existing database sequences. Gene Ontology terms were assigned to 19,848 loci, 15,434 loci were matched to 25 Clusters of Eukaryotic Orthologous Groups classifications, and 11,844 loci were classified into 142 Kyoto Encyclopedia of Genes and Genomes pathways. The assembled loci also contained 10,778 potential simple sequence repeats. The newly assembled transcriptome was used to identify loci with tissue-specific differential expression patterns. In total, 670 loci exhibited tissue-specific expression, and a subset of these was confirmed using RT-PCR and qRT-PCR. Gene expression related to inulin biosynthesis in tuber tissue was also investigated. Existing genetic and genomic data for H. tuberosus are scarce. The sequence resources developed in this study will enable the analysis of thousands of transcripts and will thus accelerate marker-assisted breeding studies and studies of inulin biosynthesis in Jerusalem artichoke.
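The abstract reports potential simple sequence repeats (SSRs) in the assembled loci but does not say how they were detected. As a rough illustration only, a regex scan for tandemly repeated short motifs might look like the sketch below. The motif length range (2 to 6 bp), the minimum of five repeat units, and the function name `find_ssrs` are all assumptions made here, not details from the study.

```python
import re

# Assumed thresholds (not from the abstract): motifs of 2-6 bp,
# repeated at least 5 times in tandem (motif + 4 or more further copies).
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{4,}")

def find_ssrs(seq):
    """Return (start, motif, repeat_count) for each tandem repeat found."""
    hits = []
    for match in SSR_PATTERN.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits

# Toy sequence: six copies of "AC" starting at index 3.
print(find_ssrs("TTGACACACACACACAGGT"))
```

The lazy quantifier makes the scan prefer the shortest motif, so a run of "ACAC" units is reported as "AC" repeats. Dedicated SSR tools apply per-motif-length thresholds and handle compound repeats, which this sketch does not attempt.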
Plant rDNA database: update and new features.
Garcia, Sònia; Gálvez, Francisco; Gras, Airy; Kovařík, Aleš; Garnatje, Teresa
2014-01-01
The Plant rDNA database (www.plantrdnadatabase.com) is an open access online resource providing detailed information on numbers, structures and positions of 5S and 18S-5.8S-26S (35S) ribosomal DNA loci. The data have been obtained from >600 publications on plant molecular cytogenetics, mostly based on fluorescent in situ hybridization (FISH). This edition of the database contains information on 1609 species derived from 2839 records, which means an expansion of 55.76 and 94.45%, respectively. It holds the data for angiosperms, gymnosperms, bryophytes and pteridophytes available as of June 2013. Information from publications reporting data for a single rDNA (either 5S or 35S alone) and annotation regarding transcriptional activity of 35S loci now appears in the database. Preliminary analyses suggest greater variability in the number of rDNA loci in gymnosperms than in angiosperms. New applications provide ideograms of the species showing the positions of rDNA loci as well as a visual representation of their genome sizes. We have also introduced other features to boost the usability of the Web interface, such as an application for convenient data export and a new section with rDNA-FISH-related information (mostly detailing protocols and reagents). In addition, we upgraded and/or proofread tabs and links and modified the website for a more dynamic appearance. This manuscript provides a synopsis of these changes and developments. http://www.plantrdnadatabase.com. © The Author(s) 2014. Published by Oxford University Press.
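The growth figures quoted above can be sanity-checked by back-calculating the size of the earlier release. The sketch below assumes that each percentage is an increase relative to the previous release; the helper name `previous_count` is illustrative and does not come from the paper.

```python
def previous_count(current, percent_increase):
    """Invert current = previous * (1 + percent_increase / 100)."""
    return current / (1 + percent_increase / 100)

# Current totals and stated expansions from the abstract.
species_before = previous_count(1609, 55.76)
records_before = previous_count(2839, 94.45)

print(round(species_before), round(records_before))
```

Under that assumption, the earlier release would have held about 1033 species from about 1460 records.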
On-line resources for bacterial micro-evolution studies using MLVA or CRISPR typing.
Grissa, Ibtissem; Bouchon, Patrick; Pourcel, Christine; Vergnaud, Gilles
2008-04-01
The control of bacterial pathogens requires the development of tools allowing the precise identification of strains at the subspecies level. It is now widely accepted that these tools will need to be DNA-based assays (in contrast to identification at the species level, where biochemical-based assays are still widely used, even though very powerful 16S DNA sequence databases exist). Typing assays need to be cheap and amenable to the design of international databases. The success of such subspecies typing tools will eventually be measured by the size of the associated reference databases accessible over the internet. Three methods have shown some potential in this direction: the so-called spoligotyping assay (Mycobacterium tuberculosis, 40,000-entry database), Multiple Loci Sequence Typing (MLST; up to a few thousand entries for more than 20 bacterial species), and more recently Multiple Loci VNTR Analysis (MLVA; up to a few hundred entries, assays available for more than 20 pathogens). In the present report we review the current status of the tools and resources we have developed over the past seven years to help in setting up or using MLVA assays or, more recently, in analysing Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs), which are the basis for spoligotyping assays.
Phytophthora database 2.0: update and future direction.
Park, Bongsoo; Martin, Frank; Geiser, David M; Kim, Hye-Seon; Mansfield, Michele A; Nikolaeva, Ekaterina; Park, Sook-Young; Coffey, Michael D; Russo, Joseph; Kim, Seong H; Balci, Yilmaz; Abad, Gloria; Burgess, Treena; Grünwald, Niklaus J; Cheong, Kyeongchae; Choi, Jaeyoung; Lee, Yong-Hwan; Kang, Seogchan
2013-12-01
The online community resource Phytophthora database (PD) was developed to support accurate and rapid identification of Phytophthora and to help characterize and catalog the diversity and evolutionary relationships within the genus. Since its release in 2008, the sequence database has grown to cover 1 to 12 loci for ≈2,600 isolates (representing 138 described and provisional species). Sequences of multiple mitochondrial loci were added to complement nuclear loci-based phylogenetic analyses and diagnostic tool development. Key characteristics of most newly described and provisional species have been summarized. Other additions to improve the PD functionality include: (i) geographic information system tools that enable users to visualize the geographic origins of chosen isolates on a global-scale map, (ii) a tool for comparing genetic similarity between isolates via microsatellite markers to support population genetic studies, (iii) a comprehensive review of molecular diagnostics tools and relevant references, (iv) sequence alignments used to develop polymerase chain reaction-based diagnostics tools to support their utilization and new diagnostic tool development, and (v) an online community forum for sharing and preserving experience and knowledge accumulated in the global Phytophthora community. Here we present how these improvements can support users and discuss the PD's future direction.
Yang, Tsun-Po; Beazley, Claude; Montgomery, Stephen B; Dimas, Antigone S; Gutierrez-Arcelus, Maria; Stranger, Barbara E; Deloukas, Panos; Dermitzakis, Emmanouil T
2010-10-01
Genevar (GENe Expression VARiation) is a database and Java tool designed to integrate multiple datasets, and provides analysis and visualization of associations between sequence variation and gene expression. Genevar allows researchers to investigate expression quantitative trait loci (eQTL) associations within a gene locus of interest in real time. The database and application can be installed on a standard computer in database mode and, in addition, on a server to share discoveries among affiliations or the broader community over the Internet via web services protocols. http://www.sanger.ac.uk/resources/software/genevar.
Gramene database in 2010: updates and extensions.
Youens-Clark, Ken; Buckler, Ed; Casstevens, Terry; Chen, Charles; Declerck, Genevieve; Derwent, Paul; Dharmawardhana, Palitha; Jaiswal, Pankaj; Kersey, Paul; Karthikeyan, A S; Lu, Jerry; McCouch, Susan R; Ren, Liya; Spooner, William; Stein, Joshua C; Thomason, Jim; Wei, Sharon; Ware, Doreen
2011-01-01
Now in its 10th year, the Gramene database (http://www.gramene.org) has grown from its primary focus on rice, the first fully-sequenced grass genome, to become a resource for major model and crop plants including Arabidopsis, Brachypodium, maize, sorghum, poplar and grape in addition to several species of rice. Gramene began with the addition of an Ensembl genome browser and has expanded in the last decade to become a robust resource for plant genomics hosting a wide array of data sets including quantitative trait loci (QTL), metabolic pathways, genetic diversity, genes, proteins, germplasm, literature, ontologies and a fully-structured markers and sequences database integrated with genome browsers and maps from various published studies (genetic, physical, bin, etc.). In addition, Gramene now hosts a variety of web services including a Distributed Annotation Server (DAS), BLAST and a public MySQL database. Twice a year, Gramene releases a major build of the database and makes interim releases to correct errors or to make important updates to software and/or data.
Yang, Tsun-Po; Beazley, Claude; Montgomery, Stephen B.; Dimas, Antigone S.; Gutierrez-Arcelus, Maria; Stranger, Barbara E.; Deloukas, Panos; Dermitzakis, Emmanouil T.
2010-01-01
Summary: Genevar (GENe Expression VARiation) is a database and Java tool designed to integrate multiple datasets, and provides analysis and visualization of associations between sequence variation and gene expression. Genevar allows researchers to investigate expression quantitative trait loci (eQTL) associations within a gene locus of interest in real time. The database and application can be installed on a standard computer in database mode and, in addition, on a server to share discoveries among affiliations or the broader community over the Internet via web services protocols. Availability: http://www.sanger.ac.uk/resources/software/genevar Contact: emmanouil.dermitzakis@unige.ch PMID:20702402
Meyer, Michael J; Geske, Philip; Yu, Haiyuan
2016-05-15
Biological sequence databases are integral to efforts to characterize and understand biological molecules and share biological data. However, when analyzing these data, scientists are often left holding disparate biological currency: molecular identifiers from different databases. For downstream applications that require converting the identifiers themselves, there are many resources available, but analyzing associated loci and variants can be cumbersome if the data are not given in a form amenable to particular analyses. Here we present BISQUE, a web server and customizable command-line tool for converting molecular identifiers and their contained loci and variants between different database conventions. BISQUE uses a graph traversal algorithm to generalize the conversion process for residues in the human genome, genes, transcripts and proteins, allowing for conversion across classes of molecules and in all directions through an intuitive web interface and a URL-based web service. BISQUE is freely available via the web using any major web browser (http://bisque.yulab.org/). Source code is available in a public GitHub repository (https://github.com/hyulab/BISQUE). Contact: haiyuan.yu@cornell.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
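To illustrate the general idea of converting identifiers by graph traversal, the following is a minimal sketch and not BISQUE's actual implementation. Identifier types are treated as nodes and mapping tables as directed edges; a breadth-first search finds a chain of tables linking the source type to the target type, and the identifier is then pushed through each table in turn. The type names, the toy identifiers (`G1`, `T1`, ...), and the functions `find_path` and `convert` are all hypothetical.

```python
from collections import deque

# Hypothetical toy mapping tables:
# (source_type, target_type) -> {source_id: [target_ids]}
EDGES = {
    ("gene", "transcript"): {"G1": ["T1", "T2"]},
    ("transcript", "protein"): {"T1": ["P1"], "T2": ["P2"]},
    ("protein", "external_protein"): {"P1": ["X1"]},
}

def find_path(src_type, dst_type):
    """Breadth-first search over identifier types; returns the shortest chain of types."""
    queue = deque([[src_type]])
    seen = {src_type}
    while queue:
        path = queue.popleft()
        if path[-1] == dst_type:
            return path
        for (a, b) in EDGES:
            if a == path[-1] and b not in seen:
                seen.add(b)
                queue.append(path + [b])
    return None  # no chain of mapping tables connects the two types

def convert(identifier, src_type, dst_type):
    """Map one identifier along the discovered path, one table per step."""
    path = find_path(src_type, dst_type)
    if path is None:
        return []
    current = [identifier]
    for a, b in zip(path, path[1:]):
        table = EDGES[(a, b)]
        current = [t for s in current for t in table.get(s, [])]
    return current

print(convert("G1", "gene", "protein"))
print(convert("G1", "gene", "external_protein"))
```

This sketch only follows edges in one direction. Supporting conversion "in all directions", as the abstract describes, would require adding reverse mapping tables, and converting loci or variants would also need coordinate mapping at each step.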
CGDSNPdb: a database resource for error-checked and imputed mouse SNPs.
Hutchins, Lucie N; Ding, Yueming; Szatkiewicz, Jin P; Von Smith, Randy; Yang, Hyuna; de Villena, Fernando Pardo-Manuel; Churchill, Gary A; Graber, Joel H
2010-07-06
The Center for Genome Dynamics Single Nucleotide Polymorphism Database (CGDSNPdb) is an open-source value-added database with more than nine million mouse single nucleotide polymorphisms (SNPs), drawn from multiple sources, with genotypes assigned to multiple inbred strains of laboratory mice. All SNPs are checked for accuracy and annotated for properties specific to the SNP as well as those implied by changes to overlapping protein-coding genes. CGDSNPdb serves as the primary interface to two unique data sets, the 'imputed genotype resource' in which a Hidden Markov Model was used to assess local haplotypes and the most probable base assignment at several million genomic loci in tens of strains of mice, and the Affymetrix Mouse Diversity Genotyping Array, a high density microarray with over 600,000 SNPs and over 900,000 invariant genomic probes. CGDSNPdb is accessible online through either a web-based query tool or a MySQL public login. Database URL: http://cgd.jax.org/cgdsnpdb/
e-GRASP: an integrated evolutionary and GRASP resource for exploring disease associations.
Karim, Sajjad; NourEldin, Hend Fakhri; Abusamra, Heba; Salem, Nada; Alhathli, Elham; Dudley, Joel; Sanderford, Max; Scheinfeldt, Laura B; Chaudhary, Adeel G; Al-Qahtani, Mohammed H; Kumar, Sudhir
2016-10-17
Genome-wide association studies (GWAS) have become a mainstay of biological research concerned with discovering genetic variation linked to phenotypic traits and diseases. Both discrete and continuous traits can be analyzed in GWAS to discover associations between single nucleotide polymorphisms (SNPs) and traits of interest. Associations are typically determined by estimating the significance of the statistical relationship between genetic loci and the given trait. However, the prioritization of bona fide, reproducible genetic associations from GWAS results remains a central challenge in identifying genomic loci underlying common complex diseases. Evolutionary-aware meta-analysis of the growing GWAS literature is one way to address this challenge and to advance from association to causation in the discovery of genotype-phenotype relationships. We have created an evolutionary GWAS resource to enable in-depth query and exploration of published GWAS results. This resource uses the publicly available GWAS results annotated in the GRASP2 database. The GRASP2 database includes results from 2082 studies, 177 broad phenotype categories, and ~8.87 million SNP-phenotype associations. For each SNP in e-GRASP, we present information from the GRASP2 database for convenience as well as evolutionary information (e.g., rate and timespan). Users can, therefore, identify not only SNPs with highly significant phenotype-association P-values, but also SNPs that are highly replicated and/or occur at evolutionarily conserved sites that are likely to be functionally important. Additionally, we provide an evolutionary-adjusted SNP association ranking (E-rank) that uses cross-species evolutionary conservation scores and population allele frequencies to transform P-values in an effort to enhance the discovery of SNPs with a greater probability of biologically meaningful disease associations.
By adding an evolutionary dimension to the GWAS results available in the GRASP2 database, our e-GRASP resource will enable a more effective exploration of SNPs not only by the statistical significance of trait associations, but also by the number of studies in which associations have been replicated, and the evolutionary context of the associated mutations. Therefore, e-GRASP will be a valuable resource for aiding researchers in the identification of bona fide, reproducible genetic associations from GWAS results. This resource is freely available at http://www.mypeg.info/egrasp.
NEIBank: Genomics and bioinformatics resources for vision research
Peterson, Katherine; Gao, James; Buchoff, Patee; Jaworski, Cynthia; Bowes-Rickman, Catherine; Ebright, Jessica N.; Hauser, Michael A.; Hoover, David
2008-01-01
NEIBank is an integrated resource for genomics and bioinformatics in vision research. It includes expressed sequence tag (EST) data and sequence-verified cDNA clones for multiple eye tissues of several species, web-based access to human eye-specific SAGE data through EyeSAGE, and comprehensive, annotated databases of known human eye disease genes and candidate disease gene loci. All expression- and disease-related data are integrated in EyeBrowse, an eye-centric genome browser. NEIBank provides a comprehensive overview of current knowledge of the transcriptional repertoires of eye tissues and their relation to pathology. PMID:18648525
Baum, Thierry-Pascal; Hierle, Vivien; Pasqual, Nicolas; Bellahcene, Fatena; Chaume, Denys; Lefranc, Marie-Paule; Jouvin-Marche, Evelyne; Marche, Patrice Noël; Demongeot, Jacques
2006-01-01
Background: Adaptive immune repertoire diversity in vertebrate species is generated by recombination of variable (V), diversity (D) and joining (J) genes in the immunoglobulin (IG) loci of B lymphocytes and in the T cell receptor (TR) loci of T lymphocytes. These V-J and V-D-J gene rearrangements at the DNA level involve recombination signal sequences (RSS). Although many data exist, they are scattered across non-specialized resources with different nomenclatures (e.g. flat files) and are difficult to extract. Description: IMGT/GeneInfo is an online information system that provides, through a user-friendly interface, exhaustive information resulting from the complex mechanisms of T cell receptor V-J and V-D-J recombinations. T cells comprise two populations which express the αβ and γδ TR, respectively. The first version of the system dealt with the Homo sapiens and Mus musculus TRA and TRB loci, whose gene rearrangements allow the synthesis of the αβ TR chains. In this paper, we present the second version of IMGT/GeneInfo, in which we complete the database for the Homo sapiens and Mus musculus TRG and TRD loci and introduce a quality control procedure for existing and new data. We also add new functionalities to the analysis of the four loci, giving, to date, a very informative tool for working on the V(D)J genes of all TR loci in both human and mouse. IMGT/GeneInfo provides more than 59,000 rearrangement combinations with a full gene description which is freely available at . Conclusion: IMGT/GeneInfo brings all TR sequence information together in one place, where it is available within two mouse clicks. This is useful for biologists and bioinformaticians studying T lymphocyte V(D)J gene rearrangements and their applications in immune response analysis. PMID:16640788
Cannistraci, Carlo V; Ogorevc, Jernej; Zorc, Minja; Ravasi, Timothy; Dovc, Peter; Kunej, Tanja
2013-02-14
Cryptorchidism is the most frequent congenital disorder in male children; however, the genetic causes of cryptorchidism remain poorly investigated. A comparative integratomics approach combined with systems biology was employed to elucidate genetic factors and molecular pathways underlying testis descent. Literature mining was performed to collect genomic loci associated with cryptorchidism in seven mammalian species. Information regarding the collected candidate genes was stored in a MySQL relational database. A genomic view of the loci was presented using the Flash GViewer web tool (http://gmod.org/wiki/Flashgviewer/). DAVID Bioinformatics Resources 6.7 was used for pathway enrichment analysis. The Cytoscape plug-in PiNGO 1.11 was employed for protein-network-based prediction of novel candidate genes. Relevant protein-protein interactions were confirmed and visualized using the STRING database (version 9.0). The developed cryptorchidism gene atlas includes 217 candidate loci (genes, regions involved in chromosomal mutations, and copy number variations) identified at the genomic, transcriptomic, and proteomic level. Human orthologs of the collected candidate loci were presented using a genomic map viewer. The cryptorchidism gene atlas is freely available online: http://www.integratomics-time.com/cryptorchidism/. Pathway analysis suggested the presence of twelve enriched pathways associated with the list of 179 literature-derived candidate genes. Additionally, a list of 43 network-predicted novel candidate genes was significantly associated with four enriched pathways. Joint pathway analysis of the collected and predicted candidate genes revealed the pivotal importance of the muscle-contraction pathway in cryptorchidism and evidence for genomic associations with cardiomyopathy pathways in RASopathies. The developed gene atlas represents an important resource for the scientific community researching the genetics of cryptorchidism.
The collected data will further facilitate the development of novel genetic markers and could be of interest for functional studies in animals and humans. The proposed network-based systems biology approach elucidates molecular mechanisms underlying the co-presence of cryptorchidism and cardiomyopathy in RASopathies. Such an approach could also aid in the molecular explanation of the co-presence of diverse and apparently unrelated clinical manifestations in other syndromes.
Towards pathogenomics: a web-based resource for pathogenicity islands
Yoon, Sung Ho; Park, Young-Kyu; Lee, Soohyun; Choi, Doil; Oh, Tae Kwang; Hur, Cheol-Goo; Kim, Jihyun F.
2007-01-01
Pathogenicity islands (PAIs) are genetic elements whose products are essential to the process of disease development. They have been horizontally (laterally) transferred from other microbes and are important in evolution of pathogenesis. In this study, a comprehensive database and search engines specialized for PAIs were established. The pathogenicity island database (PAIDB) is a comprehensive relational database of all the reported PAIs and potential PAI regions which were predicted by a method that combines feature-based analysis and similarity-based analysis. Also, using the PAI Finder search application, a multi-sequence query can be analyzed onsite for the presence of potential PAIs. As of April 2006, PAIDB contains 112 types of PAIs and 889 GenBank accessions containing either partial or all PAI loci previously reported in the literature, which are present in 497 strains of pathogenic bacteria. The database also offers 310 candidate PAIs predicted from 118 sequenced prokaryotic genomes. With the increasing number of prokaryotic genomes without functional inference and sequenced genetic regions of suspected involvement in diseases, this web-based, user-friendly resource has the potential to be of significant use in pathogenomics. PAIDB is freely accessible at . PMID:17090594
Moore, Jean-Sébastien; Bourret, Vincent; Dionne, Mélanie; Bradbury, Ian; O'Reilly, Patrick; Kent, Matthew; Chaput, Gérald; Bernatchez, Louis
2014-12-01
Anadromous Atlantic salmon (Salmo salar) is a species of major conservation and management concern in North America, where population abundance has been declining over the past 30 years. Effective conservation actions require the delineation of conservation units to appropriately reflect the spatial scale of intraspecific variation and local adaptation. Towards this goal, we used the most comprehensive genetic and genomic database for Atlantic salmon to date, covering the entire North American range of the species. The database included microsatellite data from 9142 individuals from 149 sampling locations and data from a medium-density SNP array providing genotypes for >3000 SNPs for 50 sampling locations. We used neutral and putatively selected loci to integrate adaptive information in the definition of conservation units. Bayesian clustering with the microsatellite data set and with neutral SNPs identified regional groupings largely consistent with previously published regional assessments. The use of outlier SNPs did not result in major differences in the regional groupings, suggesting that neutral markers can reflect the geographic scale of local adaptation despite not being under selection. We also performed assignment tests to compare power obtained from microsatellites, neutral SNPs and outlier SNPs. Using SNP data substantially improved power compared to microsatellites, and an assignment success of 97% to the population of origin and of 100% to the region of origin was achieved when all SNP loci were used. Using outlier SNPs only resulted in minor improvements to assignment success to the population of origin but improved regional assignment. We discuss the implications of these new genetic resources for the conservation and management of Atlantic salmon in North America. © 2014 John Wiley & Sons Ltd.
Mapping Genetic Variants Associated with Beta-Adrenergic Responses in Inbred Mice
Hersch, Micha; Peter, Bastian; Kang, Hyun Min; Schüpfer, Fanny; Abriel, Hugues; Pedrazzini, Thierry; Eskin, Eleazar; Beckmann, Jacques S.
2012-01-01
β-blockers and β-agonists are primarily used to treat cardiovascular diseases. Inter-individual variability in response to both drug classes is well recognized, yet the identity and relative contribution of the genetic players involved are poorly understood. This work is the first genome-wide association study (GWAS) addressing the values and susceptibility of cardiovascular-related traits to a selective β1-blocker, Atenolol (ate), and a β-agonist, Isoproterenol (iso). The phenotypic dataset consisted of 27 highly heritable traits, each measured across 22 inbred mouse strains and four pharmacological conditions. The genotypic panel comprised 79922 informative SNPs of the mouse HapMap resource. Associations were mapped by Efficient Mixed Model Association (EMMA), a method that corrects for the population structure and genetic relatedness of the various strains. A total of 205 separate genome-wide scans were analyzed. The most significant hits include three candidate loci related to cardiac and body weight, three loci for electrocardiographic (ECG) values, two loci for the susceptibility of atrial weight index to iso, four loci for the susceptibility of systolic blood pressure (SBP) to perturbations of the β-adrenergic system, and one locus for the responsiveness of QTc (p < 10⁻⁸). An additional 60 loci were suggestive for one or the other of the 27 traits, while 46 others were suggestive for one or the other drug effects (p < 10⁻⁶). Most hits tagged unexpected regions, yet at least two loci for the susceptibility of SBP to β-adrenergic drugs pointed at members of the hypothalamic-pituitary-thyroid axis. Loci for cardiac-related traits were preferentially enriched in genes expressed in the heart, while 23% of the testable loci were replicated with datasets of the Mouse Phenome Database (MPD). Altogether these data and validation tests indicate that the mapped loci are relevant to the traits and responses studied. PMID:22859963
Tanabe, Akifumi S.; Toju, Hirokazu
2013-01-01
Taxonomic identification of biological specimens based on DNA sequence information (a.k.a. DNA barcoding) is becoming increasingly common in biodiversity science. Although several methods have been proposed, many of them are not universally applicable due to the need for prerequisite phylogenetic/machine-learning analyses, the need for huge computational resources, or the lack of a firm theoretical background. Here, we propose two new computational methods of DNA barcoding and show a benchmark for bacterial/archaeal 16S, animal COX1, fungal internal transcribed spacer, and three plant chloroplast (rbcL, matK, and trnH-psbA) barcode loci that can be used to compare the performance of existing and new methods. The benchmark was performed under two alternative situations: query sequences were available in the corresponding reference sequence databases in one, but were not available in the other. In the former situation, the commonly used “1-nearest-neighbor” (1-NN) method, which assigns the taxonomic information of the most similar sequences in a reference database (i.e., BLAST-top-hit reference sequence) to a query, displays the highest rate and highest precision of successful taxonomic identification. However, in the latter situation, the 1-NN method produced extremely high rates of misidentification for all the barcode loci examined. In contrast, one of our new methods, the query-centric auto-k-nearest-neighbor (QCauto) method, consistently produced low rates of misidentification for all the loci examined in both situations. These results indicate that the 1-NN method is most suitable if the reference sequences of all potentially observable species are available in databases; otherwise, the QCauto method returns the most reliable identification results. The benchmark results also indicated that the taxon coverage of reference sequences is far from complete for genus or species level identification in all the barcode loci examined. 
Therefore, we need to accelerate the registration of reference barcode sequences to apply high-throughput DNA barcoding to genus or species level identification in biodiversity research. PMID:24204702
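The 1-NN assignment described in this abstract can be illustrated with a minimal sketch. It assumes a toy in-memory reference set and uses Python's difflib similarity ratio as a crude stand-in for a BLAST top-hit score; the sequences, taxon labels and function names are hypothetical, and the QCauto method is not reproduced here because the abstract does not specify it.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; a stand-in for a BLAST score (assumption)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def one_nn_assign(query, references):
    """Return (taxon, similarity) of the single most similar reference."""
    best_taxon, best_seq = max(references, key=lambda item: similarity(query, item[1]))
    return best_taxon, similarity(query, best_seq)


# Hypothetical reference "database": (taxon label, barcode sequence).
references = [
    ("Genus_a species_1", "ACGTACGTACGTTTGA"),
    ("Genus_a species_2", "ACGTACGAACGTTTGA"),
    ("Genus_b species_3", "TTGACCGGTTAACCGG"),
]

# Query whose exact sequence is present in the reference set.
taxon, score = one_nn_assign("ACGTACGTACGTTTGA", references)

# Query with no close relative in the reference set: 1-NN still returns a label.
forced_taxon, forced_score = one_nn_assign("GGGGGGGGCCCCCCCC", references)
```

Because the function always returns the best available label, a query from an unrepresented species is still assigned to some taxon, which is consistent with the high misidentification rates the abstract reports when query sequences are missing from the reference database.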
RatMap—rat genome tools and data
Petersen, Greta; Johnson, Per; Andersson, Lars; Klinga-Levan, Karin; Gómez-Fabre, Pedro M.; Ståhl, Fredrik
2005-01-01
The rat genome database RatMap (http://ratmap.org or http://ratmap.gen.gu.se) has been one of the main resources for rat genome information since 1994. The database is maintained by CMB–Genetics at Göteborg University in Sweden and provides information on rat genes, polymorphic rat DNA-markers and rat quantitative trait loci (QTLs), all curated at RatMap. The database is under the supervision of the Rat Gene and Nomenclature Committee (RGNC); thus much attention is paid to rat gene nomenclature. RatMap presents information on rat idiograms, karyotypes and provides a unified presentation of the rat genome sequence and integrated rat linkage maps. A set of tools is also available to facilitate the identification and characterization of rat QTLs, as well as the estimation of exon/intron number and sizes in individual rat genes. Furthermore, comparative gene maps of rat in regard to mouse and human are provided. PMID:15608244
Guo, Rui; Landis, Jacob B.; Moore, Michael J.; Meng, Aiping; Jian, Shuguang; Yao, Xiaohong; Wang, Hengchang
2017-01-01
Actinidia eriantha Benth. is a diploid perennial woody vine native to China and is recognized as a valuable species for commercial kiwifruit improvement with high levels of ascorbic acid as well as having been used in traditional Chinese medicine. Due to the lack of genomic resources for the species, microsatellite markers for population genetics studies are scarce. In this study, RNASeq was conducted on fruit tissue of A. eriantha, yielding 5,678,129 reads with a total output of 3.41 Gb. De novo assembly yielded 69,783 non-redundant unigenes (41.3 Mb), of which 21,730 were annotated using protein databases. A total of 8,658 EST-SSR loci were identified in 7,495 unigene sequences, for which primer pairs were successfully designed for 3,842 loci (44.4%). Among these, 183 primer pairs were assayed for PCR amplification, yielding 69 with detectable polymorphism in A. eriantha. Additionally, 61 of the 69 polymorphic loci could be successfully amplified in at least one other Actinidia species. Of these, 14 polymorphic loci (mean NA = 6.07 ± 2.30) were randomly selected for assessing levels of genetic diversity and population structure within A. eriantha. Finally, a neighbor-joining tree and Bayesian clustering analysis showed distinct clustering into two groups (K = 2), agreeing with the geographical distributions of these populations. Overall, our results will facilitate further studies of genetic diversity within A. eriantha and will aid in discriminating outlier loci involved in local adaptation. PMID:28890721
Evolutionary trends in animal ribosomal DNA loci: introduction to a new online database.
Sochorová, Jana; Garcia, Sònia; Gálvez, Francisco; Symonová, Radka; Kovařík, Aleš
2018-03-01
Ribosomal DNA (rDNA) loci encoding 5S and 45S (18S-5.8S-28S) rRNAs are important components of eukaryotic chromosomes. Here, we set up the animal rDNA database containing cytogenetic information about these loci in 1343 animal species (264 families) collected from 542 publications. The data are based on in situ hybridisation studies (both radioactive and fluorescent) carried out in major groups of vertebrates (fish, reptiles, amphibians, birds, and mammals) and invertebrates (mostly insects and mollusks). The database is accessible online at www.animalrdnadatabase.com. The median number of 45S and 5S sites was close to two per diploid chromosome set for both rDNAs despite large variation (1-74 for 5S and 1-54 for 45S sites). No significant correlation between the number of 5S and 45S rDNA loci was observed, suggesting that their distribution and amplification across the chromosomes follow independent evolutionary trajectories. Each group, irrespective of taxonomic classification, contained rDNA sites at any chromosome location. However, the distal and pericentromeric positions were the most prevalent (>75% of karyotypes) for 45S loci, while the position of 5S loci was more variable. We also examined potential relationships between molecular attributes of rDNA (homogenisation and expression) and cytogenetic parameters such as rDNA positions, chromosome number, and morphology.
An imputed genotype resource for the laboratory mouse
Szatkiewicz, Jin P.; Beane, Glen L.; Ding, Yueming; Hutchins, Lucie; de Villena, Fernando Pardo-Manuel; Churchill, Gary A.
2009-01-01
We have created a high-density SNP resource encompassing 7.87 million polymorphic loci across 49 inbred mouse strains of the laboratory mouse by combining data available from public databases and training a hidden Markov model to impute missing genotypes in the combined data. The strong linkage disequilibrium found in dense sets of SNP markers in the laboratory mouse provides the basis for accurate imputation. Using genotypes from eight independent SNP resources, we empirically validated the quality of the imputed genotypes and demonstrate that they are highly reliable for most inbred strains. The imputed SNP resource will be useful for studies of natural variation and complex traits. It will facilitate association study designs by providing high density SNP genotypes for large numbers of mouse strains. We anticipate that this resource will continue to evolve as new genotype data become available for laboratory mouse strains. The data are available for bulk download or query at http://cgd.jax.org/. PMID:18301946
Franke, Lude; Bakel, Harm van; Fokkens, Like; de Jong, Edwin D.; Egmont-Petersen, Michael; Wijmenga, Cisca
2006-01-01
Most common genetic disorders have a complex inheritance and may result from variants in many genes, each contributing only weak effects to the disease. Pinpointing these disease genes within the myriad of susceptibility loci identified in linkage studies is difficult because these loci may contain hundreds of genes. However, in any disorder, most of the disease genes will be involved in only a few different molecular pathways. If we know something about the relationships between the genes, we can assess whether some genes (which may reside in different loci) functionally interact with each other, indicating a joint basis for the disease etiology. There are various repositories of information on pathway relationships. To consolidate this information, we developed a functional human gene network that integrates information on genes and the functional relationships between genes, based on data from the Kyoto Encyclopedia of Genes and Genomes, the Biomolecular Interaction Network Database, Reactome, the Human Protein Reference Database, the Gene Ontology database, predicted protein-protein interactions, human yeast two-hybrid interactions, and microarray coexpressions. We applied this network to interrelate positional candidate genes from different disease loci and then tested 96 heritable disorders for which the Online Mendelian Inheritance in Man database reported at least three disease genes. Artificial susceptibility loci, each containing 100 genes, were constructed around each disease gene, and we used the network to rank these genes on the basis of their functional interactions. By following up the top five genes per artificial locus, we were able to detect at least one known disease gene in 54% of the loci studied, representing a 2.8-fold increase over random selection. 
This suggests that our method can significantly reduce the cost and effort of pinpointing true disease genes in analyses of disorders for which numerous loci have been reported but for which most of the genes are unknown. PMID:16685651
SZDB: A Database for Schizophrenia Genetic Research
Wu, Yong; Yao, Yong-Gang
2017-01-01
Schizophrenia (SZ) is a debilitating brain disorder with a complex genetic architecture. Genetic studies, especially recent genome-wide association studies (GWAS), have identified multiple variants (loci) conferring risk to SZ. However, how to efficiently extract meaningful biological information from bulk genetic findings of SZ remains a major challenge. There is a pressing need to integrate multiple layers of data from various sources, e.g., genetic findings from GWAS, copy number variations (CNVs), association and linkage studies, gene expression, protein–protein interaction (PPI), co-expression, expression quantitative trait loci (eQTL), and Encyclopedia of DNA Elements (ENCODE) data, to provide a comprehensive resource to facilitate the translation of genetic findings into SZ molecular diagnosis and mechanism study. Here we developed the SZDB database (http://www.szdb.org/), a comprehensive resource for SZ research. SZ genetic data, gene expression data, network-based data, brain eQTL data, and SNP function annotation information were systematically extracted, curated and deposited in SZDB. In-depth analyses and systematic integration were performed to identify top prioritized SZ genes and enriched pathways. We further showed that genes implicated in SZ are highly co-expressed in human brain and that proteins encoded by the prioritized SZ risk genes interact significantly. The user-friendly SZDB provides high-confidence candidate variants and genes for further functional characterization. More importantly, SZDB provides convenient online tools for data search and browsing, data integration, and customized data analyses. PMID:27451428
Cormier, Alexandre; Avia, Komlan; Sterck, Lieven; Derrien, Thomas; Wucher, Valentin; Andres, Gwendoline; Monsoor, Misharl; Godfroy, Olivier; Lipinska, Agnieszka; Perrineau, Marie-Mathilde; Van De Peer, Yves; Hitte, Christophe; Corre, Erwan; Coelho, Susana M; Cock, J Mark
2017-04-01
The genome of the filamentous brown alga Ectocarpus was the first to be completely sequenced from within the brown algal group and has served as a key reference genome both for this lineage and for the stramenopiles. We present a complete structural and functional reannotation of the Ectocarpus genome. The large-scale assembly of the Ectocarpus genome was significantly improved and genome-wide gene re-annotation using extensive RNA-seq data improved the structure of 11 108 existing protein-coding genes and added 2030 new loci. A genome-wide analysis of splicing isoforms identified an average of 1.6 transcripts per locus. A large number of previously undescribed noncoding genes were identified and annotated, including 717 loci that produce long noncoding RNAs. Conservation of lncRNAs between Ectocarpus and another brown alga, the kelp Saccharina japonica, suggests that at least a proportion of these loci serve a function. Finally, a large collection of single nucleotide polymorphism-based markers was developed for genetic analyses. These resources are available through an updated and improved genome database. This study significantly improves the utility of the Ectocarpus genome as a high-quality reference for the study of many important aspects of brown algal biology and as a reference for genomic analyses across the stramenopiles. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
Du, Lianming; Li, Wujiao; Fan, Zhenxin; Shen, Fujun; Yang, Mingyu; Wang, Zili; Jian, Zuoyi; Hou, Rong; Yue, Bisong; Zhang, Xiuyue
2015-07-01
The giant panda (Ailuropoda melanoleuca) is one of the most famous flagship species for conservation, and its draft genome has recently been assembled. However, the transcriptome is not yet available. In this study, the blood transcriptomes of three pandas were characterized and about 160 million sequencing reads were generated using Illumina HiSeq 2000 paired-end sequencing technology. The assembly yielded 92 598 transcripts with an average length of 1626 bp and N50 length of 2842 bp. Based on a sequence similarity search against nonredundant (nr) protein database, a total of 38 522 (41.6%) transcripts were annotated. Of these annotated transcripts, 25 142 and 8272 transcripts were assigned to gene ontology terms and clusters of orthologous group, respectively. A search against the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG) indicated that 9098 (9.83%) transcripts mapped to 324 KEGG pathways, and the best represented functional categories of pathways were signal transduction and immune system. We have also identified 23 460 microsatellites, 43 560 SNPs as well as 21 456 alternative splicing events in the assembly. Additionally, a total of 24 341 complete open reading frames (ORFs) were detected from the assembly where 1492 ORFs were found to be novel gene loci as these have not been annotated so far in any public database. © 2014 John Wiley & Sons Ltd.
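The assembly statistics quoted in this abstract (average length and N50) can be computed from a list of transcript lengths. The sketch below uses a small made-up list of lengths, not the panda data, and the function name is illustrative.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical transcript lengths in bp (not taken from the study).
toy_lengths = [100, 200, 300, 400, 500]

toy_mean = sum(toy_lengths) / len(toy_lengths)
toy_n50 = n50(toy_lengths)
```

N50 weights long sequences more heavily than the arithmetic mean does, which is why an assembly's N50 (2842 bp here) is typically larger than its average transcript length (1626 bp).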
ALDB: a domestic-animal long noncoding RNA database.
Li, Aimin; Zhang, Junying; Zhou, Zhongyin; Wang, Lei; Liu, Yujuan; Liu, Yajun
2015-01-01
Long noncoding RNAs (lncRNAs) have attracted significant attention in recent years due to their important roles in many biological processes. Domestic animals constitute a unique resource for understanding the genetic basis of phenotypic variation and are ideal models relevant to diverse areas of biomedical research. With improving sequencing technologies, numerous domestic-animal lncRNAs are now available. Thus, there is an immediate need for a database resource that can assist researchers to store, organize, analyze and visualize domestic-animal lncRNAs. The domestic-animal lncRNA database, named ALDB, is the first comprehensive database with a focus on the domestic-animal lncRNAs. It currently archives 12,103 pig intergenic lncRNAs (lincRNAs), 8,923 chicken lincRNAs and 8,250 cow lincRNAs. In addition to the annotations of lincRNAs, it offers related data that is not available yet in existing lncRNA databases (lncRNAdb and NONCODE), such as genome-wide expression profiles and animal quantitative trait loci (QTLs) of domestic animals. Moreover, a collection of interfaces and applications, such as the Basic Local Alignment Search Tool (BLAST), the Generic Genome Browser (GBrowse) and flexible search functionalities, are available to help users effectively explore, analyze and download data related to domestic-animal lncRNAs. ALDB enables the exploration and comparative analysis of lncRNAs in domestic animals. A user-friendly web interface, integrated information and tools make it valuable to researchers in their studies. ALDB is freely available from http://res.xaut.edu.cn/aldb/index.jsp.
Van Neste, Christophe; Van Criekinge, Wim; Deforce, Dieter; Van Nieuwerburgh, Filip
2016-01-01
It is difficult to predict if and when massively parallel sequencing of forensic STR loci will replace capillary electrophoresis as the new standard technology in forensic genetics. The main benefits of sequencing are increased multiplexing scales and SNP detection. There is not yet a consensus on how sequenced profiles should be reported. We present the Forensic Loci Allele Database (FLAD) service, made freely available at http://forensic.ugent.be/FLAD/. It offers permanent identifiers for sequenced forensic alleles (STR or SNP) and their microvariants for use in forensic allele nomenclature. Analogous to GenBank, its aim is to provide permanent identifiers for forensically relevant allele sequences. Researchers who are developing forensic sequencing kits or performing population studies can register at http://forensic.ugent.be/FLAD/ and add loci and allele sequences through a short and simple application programming interface (API). Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Testing independence of fragment lengths within VNTR loci
DOE Office of Scientific and Technical Information (OSTI.GOV)
Geisser, S.; Johnson, W.
1993-11-01
Methods that were devised to test independence of the bivariate fragment lengths obtained from VNTR loci are applied to several population databases. It is shown that for many of the probes independence (Hardy-Weinberg equilibrium) cannot be sustained. 3 refs., 3 tabs.
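For orientation only, the simplest textbook test of Hardy-Weinberg equilibrium can be sketched as follows. This is a chi-square goodness-of-fit test for a single biallelic locus with discrete genotype counts, which is a deliberately simpler technique than the methods for bivariate VNTR fragment lengths that the report applies; the genotype counts and function name are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions at a biallelic locus."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("need at least one genotype")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip cells with zero expectation (only possible when one allele is absent).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts (AA, AB, BB).
chi2_fit, p_fit = hwe_chi_square(25, 50, 25)      # matches HWE proportions exactly
chi2_def, p_def = hwe_chi_square(40, 20, 40)      # strong heterozygote deficit
```

A small p-value, as in the second example, means the independence of the two alleles within individuals is rejected for that locus.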
Jaiswal, Sarika; Sheoran, Sonia; Arora, Vasu; Angadi, Ulavappa B.; Iquebal, Mir A.; Raghav, Nishu; Aneja, Bharti; Kumar, Deepender; Singh, Rajender; Sharma, Pradeep; Singh, G. P.; Rai, Anil; Tiwari, Ratan; Kumar, Dinesh
2017-01-01
Wheat fulfills 20% of the global caloric requirement. The world will need 60% more wheat for a population of 9 billion by 2050, but climate change with increasing temperature is projected to affect wheat productivity adversely. Trait improvement and management of wheat germplasm require genomic resources. Simple Sequence Repeats (SSRs), being highly polymorphic and ubiquitously distributed in the genome, can be a marker of choice, but there is no structured marker database with options to generate primer pairs for genotyping at a desired chromosome/physical location. Markers previously associated with different wheat traits are also not available in any database. Limitations of in vitro SSR discovery can be overcome by genome-wide in silico mining of SSRs. The Triticum aestivum SSR database (TaSSRDb) is an integrated online database with a three-tier architecture, developed using PHP and MySQL and accessible at http://webtom.cabgrid.res.in/wheatssr/. For genotyping, Primer3 standalone code computes primers on user request. Chromosome-wise SSR calling for all three subgenomes, along with a choice of motif types, is provided in addition to primer generation for the desired marker. We report here a database of the highest number of SSRs (476,169) from the complex, hexaploid wheat genome (~17 Gb) along with 268 previously reported SSR markers associated with 11 traits. The highest (116.93 SSRs/Mb) and lowest (74.57 SSRs/Mb) SSR densities were found on chromosomes 2D and 3A, respectively. To obtain homozygous loci, e-PCR was performed. Thirty such loci were randomly selected for PCR validation in a panel of 18 wheat Advance Varietal Trial (AVT) lines. TaSSRDb can be a valuable genomic resource tool for linkage mapping, gene/QTL (quantitative trait locus) discovery, diversity analysis, traceability and variety identification. Variety-specific profiling and differentiation can supplement DUS (Distinctiveness, Uniformity, and Stability) testing, EDV (Essentially Derived Variety)/IV (Initial Variety) disputes, seed purity and hybrid wheat testing. All of these are required for germplasm management as well as for improving wheat productivity. PMID:29234333
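The in silico SSR mining and the SSRs/Mb density figures mentioned in the abstract above can be illustrated with a minimal sketch. It is not the TaSSRDb pipeline: the function names `find_ssrs` and `ssr_density_per_mb` and the minimum repeat counts are assumptions chosen for illustration, and only perfect (uninterrupted) repeats are detected.

```python
import re

# Minimum repeat counts per motif length. These thresholds are illustrative
# assumptions, not the values used to build TaSSRDb.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_count in sorted(min_repeats.items()):
        # One motif of motif_len bases followed by at least (min_count - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_count - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit, so that
            # "ATAT" is not reported separately from the dinucleotide "AT".
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


def ssr_density_per_mb(n_ssrs, length_bp):
    """Number of SSRs per megabase of sequence."""
    return n_ssrs / (length_bp / 1_000_000)
```

Calling `find_ssrs` on a chromosome sequence and passing the number of hits and the chromosome length to `ssr_density_per_mb` would give a per-chromosome density comparable in kind, though not necessarily in value, to the figures reported above.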
Comprehensive annotated STR physical map of the human Y chromosome: Forensic implications.
Hanson, Erin K; Ballantyne, Jack
2006-03-01
A plethora of Y-STR markers from diverse sources have been deposited in public databases and represent potential candidates for incorporation into the next generation of Y-STR multiplexes for forensic use. Here, based upon all of the Y-STR loci that have been deposited in the human genome database (>400), we have sequentially positioned each one along the Y chromosome using the most current human genome sequencing data (NCBI Build 35). The information derived from this work defines the number and relative position of all potentially forensically relevant Y-STR loci, their location within the physical linkage map of the Y chromosome and their relationship to structural genes. We conclude that there exist at present at least 417 separate Y-STR markers available for potential forensic use, although many of these will be found to be unsuitable for other reasons. However, from these data, we were able to identify 28 pairs of duplicated loci that were given separate DYS designations and four pairs of loci with overlapping flanking regions. Removing one locus from each set of duplicates reduced the number of potentially useful loci from 417 to 389. The derived information should be useful for workers who are designing novel Y-STR multiplexes to ensure the presence of non-synonymous loci and, if so desired, to avoid loci that lie within structural genes. It may also be useful for forensic casework practitioners (or molecular anthropologists) to aid in distinguishing between chromosomal rearrangements (such as duplications and deletions) and bona fide DNA admixtures or null alleles caused by primer binding site mutations. We illustrate the practical usefulness of the chromosomal positioning data in the design of eight multiplex systems using 94 Y-STR loci.
Larson, E L; Bogdanowicz, S M; Agrawal, A A; Johnson, M T J; Harrison, R G
2008-03-01
We developed nine polymorphic microsatellite loci for evening primrose (Oenothera biennis). These loci have two to 18 alleles per locus and observed heterozygosities ranging from 0 to 0.879 in a sample of 34 individuals. In a pattern consistent with the functionally asexual reproductive system of this species, 17/36 pairs of loci revealed significant linkage disequilibrium and three loci showed significant deviations from Hardy-Weinberg equilibrium. The loci will be informative in identifying genotypes in multigenerational field studies to assess changes in genotype frequencies. © 2007 The Authors.
Chen, H X; Cai, C; Liu, J Y; Zhang, Z G; Yuan, M; Jia, J N; Sun, Z G; Huang, H R; Gao, J M; Li, W M
2017-06-10
Objective: Using the standard genotyping method, variable number of tandem repeats (VNTR) typing, we constructed a VNTR database covering all provinces and proposed a set of optimized VNTR loci combinations for each province, in order to improve tuberculosis prevention and control programs in China. Methods: A total of 15 VNTR loci were used to analyze 4 116 Mycobacterium tuberculosis strains isolated during the national survey of drug-resistant tuberculosis in 2007. The Hunter-Gaston Index (HGI) was used to analyze the discriminatory power of each VNTR locus. Combinations of 12-VNTR, 10-VNTR, 8-VNTR and 5-VNTR loci were constructed for each province, based on the epidemic characteristics of M. tuberculosis lineages in China, high discriminatory power and genetic stability. Results: From the complete 15-locus VNTR patterns of 3 966 strains (96.36 % coverage, 3 966/4 116), we found seven loci with high HGI (including QUB11b and MIRU26) as well as loci with low stability (including QUB26, MIRU16, Mtub21 and QUB11b) in several areas. Across the 31 provinces, the optimized VNTR combination was the 10-VNTR set in Inner Mongolia, Chongqing and Heilongjiang, whereas an 8-VNTR combination was shared by the other provinces. Conclusions: The VNTR database should be used for tracing the source of infection and clusters of M. tuberculosis nationwide, and the optimized VNTR combinations should be used for monitoring local epidemics and the genetics of local M. tuberculosis populations.
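The Hunter-Gaston Index used above to rank VNTR loci has a simple closed form, D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of strains with the j-th type and N is the total number of strains. The sketch below computes it for one locus; the function name `hunter_gaston_index` and the input layout (one type label per strain) are assumptions for illustration and do not reflect the authors' software.

```python
from collections import Counter


def hunter_gaston_index(types):
    """Hunter-Gaston discriminatory index for a list of type labels.

    D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)), i.e. the probability that
    two strains drawn without replacement have different types.
    """
    n_total = len(types)
    if n_total < 2:
        raise ValueError("at least two strains are required")
    counts = Counter(types)
    same_type_pairs = sum(n * (n - 1) for n in counts.values())
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))
```

For a single VNTR locus, `types` would be the repeat number observed in each strain; for a locus combination, each label could be the tuple of repeat numbers across the chosen loci.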
High-throughput STR analysis for DNA database using direct PCR.
Sim, Jeong Eun; Park, Su Jeong; Lee, Han Chul; Kim, Se-Yong; Kim, Jong Yeol; Lee, Seung Hwan
2013-07-01
Since the Korean criminal DNA database was launched in 2010, we have focused on establishing an automated DNA database profiling system that analyzes short tandem repeat loci in a high-throughput and cost-effective manner. We established a DNA database profiling system without DNA purification using a direct PCR buffer system. The quality of the direct PCR procedures was compared with that of the conventional PCR system under their respective optimized conditions. The results revealed not only perfect concordance but also an excellent PCR success rate, good electropherogram quality, and an optimal intra/inter-loci peak height ratio. In particular, the proportion of samples requiring DNA extraction because of direct PCR failure could be minimized to <3%. In conclusion, the newly developed direct PCR system can be adopted for automated DNA database profiling systems to replace or supplement the conventional PCR system in a time- and cost-saving manner. © 2013 American Academy of Forensic Sciences Published 2013. This article is a U.S. Government work and is in the public domain in the U.S.A.
Domínguez-Contreras, J. F.; Munguía-Vega, A.; Ceballos-Vázquez, B. P.; Arellano-Martínez, M.; Culver, Melanie
2014-01-01
We characterized 22 novel microsatellite loci in the two-spotted octopus Octopus bimaculatus using 454 pyrosequencing reads. All loci were polymorphic and will be used in studies of marine connectivity aimed at increasing sustainability of the resource. The mean number of alleles per locus was 13.09 (range 7–19) and observed heterozygosities ranged from 0.50 to 1.00. Four pairs of loci were linked and three loci deviated from Hardy–Weinberg equilibrium. Eighteen and 12 loci were polymorphic in Octopus bimaculoides and Octopus hubbsorum, respectively.
Signature of genetic associations in oral cancer.
Sharma, Vishwas; Nandan, Amrita; Sharma, Amitesh Kumar; Singh, Harpreet; Bharadwaj, Mausumi; Sinha, Dhirendra Narain; Mehrotra, Ravi
2017-10-01
Oral cancer etiology is complex and controlled by multi-factorial events including genetic events. Candidate gene studies, genome-wide association studies, and next-generation sequencing have identified various chromosomal loci associated with oral cancer. No available review gives a comprehensive picture of the genetic loci identified as associated with oral cancer by candidate gene studies-based, genome-wide association studies-based, and next-generation sequencing-based approaches. A systematic literature search was performed in the PubMed database to identify the loci associated with oral cancer by exclusive candidate gene studies-based, genome-wide association studies-based, and next-generation sequencing-based study approaches. Information on the loci associated with oral cancer is made available online through the resource "ORNATE." Next, the loci were screened for those validated by both candidate gene studies and next-generation sequencing approaches, or by two independent studies within candidate gene studies or next-generation sequencing approaches. A total of 264 loci were identified to be associated with oral cancer by candidate gene studies, genome-wide association studies, and next-generation sequencing approaches. In total, 28 loci, that is, 14q32.33 (AKT1), 5q22.2 (APC), 11q22.3 (ATM), 2q33.1 (CASP8), 11q13.3 (CCND1), 16q22.1 (CDH1), 9p21.3 (CDKN2A), 1q31.1 (COX-2), 7p11.2 (EGFR), 22q13.2 (EP300), 4q35.2 (FAT1), 4q31.3 (FBXW7), 4p16.3 (FGFR3), 1p13.3 (GSTM1-GSTT1), 11q13.2 (GSTP1), 11p15.5 (H-RAS), 3p25.3 (hOGG1), 1q32.1 (IL-10), 4q13.3 (IL-8), 12p12.1 (KRAS), 12q15 (MDM2), 12q13.12 (MLL2), 9q34.3 (NOTCH1), 17p13.1 (p53), 3q26.32 (PIK3CA), 10q23.31 (PTEN), 13q14.2 (RB1), and 5q14.2 (XRCC4), were validated to be associated with oral cancer. "ORNATE" gives a snapshot of genetic loci associated with oral cancer. All 28 loci were validated to be linked to oral cancer, and further fine-mapping followed by gene-by-gene and gene-environment interaction studies is needed to confirm their involvement in modifying oral cancer.
Satya, Pratik; Paswan, Pramod Kumar; Ghosh, Swagata; Majumdar, Snehalata; Ali, Nasim
2016-06-01
Cross-species transferability is a quick and economical way to enrich SSR databases, particularly for minor crops where little genomic information is available. However, transferability of SSR markers varies greatly between species, genera and families of plants. We assessed confamiliar transferability of SSR markers from cotton (Gossypium hirsutum) and jute (Corchorus olitorius) to 22 species distributed in different taxonomic groups of Malvaceae. All the species selected were potential industrial crop species having little or no genomic resources or SSR database. Of the 14 cotton SSR loci tested, 13 (92.86 %) amplified in G. arboreum and 71.43 % exhibited cross-genera transferability. Nine out of 11 jute SSRs (81.81 %) showed cross-transferability across genera. SSRs from both species exhibited high polymorphism and resolving power in other species. The correlation between the transferability of cotton and jute SSRs was highly significant (r = 0.813). The difference in transferability among species was also significant for both marker groups. High transferability was observed at genus, tribe and subfamily level. At tribe level, transferability of jute SSRs (41.04 %) was higher than that of cotton SSRs (33.74 %). The tribe Byttnerieae exhibited the highest SSR transferability (48.7 %). The high level of cross-genera transferability (>50 %) in ten species of Malvaceae, where no SSR resource is available, calls for large-scale transferability testing from the enriched SSR databases of cotton and jute.
Oostdik, Kathryn; Lenz, Kristy; Nye, Jeffrey; Schelling, Kristin; Yet, Donald; Bruski, Scott; Strong, Joshua; Buchanan, Clint; Sutton, Joel; Linner, Jessica; Frazier, Nicole; Young, Hays; Matthies, Learden; Sage, Amber; Hahn, Jeff; Wells, Regina; Williams, Natasha; Price, Monica; Koehler, Jody; Staples, Melisa; Swango, Katie L; Hill, Carolyn; Oyerly, Karen; Duke, Wendy; Katzilierakis, Lesley; Ensenberger, Martin G; Bourdeau, Jeanne M; Sprecher, Cynthia J; Krenke, Benjamin; Storts, Douglas R
2014-09-01
The original CODIS database based on 13 core STR loci has been overwhelmingly successful for matching suspects with evidence. Yet there remain situations that argue for inclusion of more loci and increased discrimination. The PowerPlex® Fusion System allows simultaneous amplification of the following loci: Amelogenin, D3S1358, D1S1656, D2S441, D10S1248, D13S317, Penta E, D16S539, D18S51, D2S1338, CSF1PO, Penta D, TH01, vWA, D21S11, D7S820, D5S818, TPOX, DYS391, D8S1179, D12S391, D19S433, FGA, and D22S1045. The comprehensive list of loci amplified by the system generates a profile compatible with databases based on either the expanded CODIS or European Standard Set (ESS) requirements. Developmental validation testing followed SWGDAM guidelines and demonstrated the quality and robustness of the PowerPlex® Fusion System across a number of variables. Consistent and high-quality results were compiled using data from 12 separate forensic and research laboratories. The results verify that the PowerPlex® Fusion System is a robust and reliable STR-typing multiplex suitable for human identification. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
Database of cattle candidate genes and genetic markers for milk production and mastitis
Ogorevc, J; Kunej, T; Razpet, A; Dovc, P
2009-01-01
A cattle database of candidate genes and genetic markers for milk production and mastitis has been developed to provide an integrated research tool incorporating different types of information supporting a genomic approach to study lactation, udder development and health. The database contains 943 genes and genetic markers involved in mammary gland development and function, representing candidates for further functional studies. The candidate loci were drawn on a genetic map to reveal positional overlaps. For identification of candidate loci, data from seven different research approaches were exploited: (i) gene knockouts or transgenes in mice that result in specific phenotypes associated with the mammary gland (143 loci); (ii) cattle QTL for milk production (344) and mastitis-related traits (71); (iii) loci with sequence variations that show specific allele-phenotype interactions associated with milk production (24) or mastitis (10) in cattle; (iv) genes with expression profiles associated with milk production (207) or mastitis (107) in cattle or mouse; (v) cattle milk protein genes that exist in different genetic variants (9); (vi) miRNAs expressed in bovine mammary gland (32) and (vii) epigenetically regulated cattle genes associated with mammary gland function (1). Forty-four genes found by multiple independent analyses were suggested as the most promising candidates and were further analysed in silico for expression levels in lactating mammary gland, genetic variability and top biological functions in functional networks. A miRNA target search for mammary gland expressed miRNAs identified 359 putative binding sites in 3′UTRs of candidate genes. PMID:19508288
From biomedicine to natural history research: EST resources for ambystomatid salamanders
Putta, Srikrishna; Smith, Jeramiah J; Walker, John A; Rondet, Mathieu; Weisrock, David W; Monaghan, James; Samuels, Amy K; Kump, Kevin; King, David C; Maness, Nicholas J; Habermann, Bianca; Tanaka, Elly; Bryant, Susan V; Gardiner, David M; Parichy, David M; Voss, S Randal
2004-01-01
Background: Establishing genomic resources for closely related species will provide comparative insights that are crucial for understanding diversity and variability at multiple levels of biological organization. We developed ESTs for Mexican axolotl (Ambystoma mexicanum) and Eastern tiger salamander (A. tigrinum tigrinum), species with deep and diverse research histories. Results: Approximately 40,000 quality cDNA sequences were isolated for these species from various tissues, including regenerating limb and tail. These sequences and an existing set of 16,030 cDNA sequences for A. mexicanum were processed to yield 35,413 and 20,599 high quality ESTs for A. mexicanum and A. t. tigrinum, respectively. Because the A. t. tigrinum ESTs were obtained primarily from a normalized library, an approximately equal number of contigs were obtained for each species, with 21,091 unique contigs identified overall. The 10,592 contigs that showed significant similarity to sequences from the human RefSeq database reflected a diverse array of molecular functions and biological processes, with many corresponding to genes expressed during spinal cord injury in rat and fin regeneration in zebrafish. To demonstrate the utility of these EST resources, we searched databases to identify probes for regeneration research, characterized intra- and interspecific nucleotide polymorphism, saturated a human – Ambystoma synteny group with marker loci, and extended PCR primer sets designed for A. mexicanum / A. t. tigrinum orthologues to a related tiger salamander species. Conclusions: Our study highlights the value of developing resources in traditional model systems where the likelihood of information transfer to multiple, closely related taxa is high, thus simultaneously enabling both laboratory and natural history research. PMID:15310388
AgdbNet – antigen sequence database software for bacterial typing
Jolley, Keith A; Maiden, Martin CJ
2006-01-01
Background: Bacterial typing schemes based on the sequences of genes encoding surface antigens require databases that provide a uniform, curated, and widely accepted nomenclature of the variants identified. Due to the differences in typing schemes, imposed by the diversity of genes targeted, creating these databases has typically required the writing of one-off code to link the database to a web interface. Here we describe agdbNet, widely applicable web database software that facilitates simultaneous BLAST querying of multiple loci using either nucleotide or peptide sequences. Results: Databases are described by XML files that are parsed by a Perl CGI script. Each database can have any number of loci, which may be defined by nucleotide and/or peptide sequences. The software is currently in use on at least five public databases for the typing of Neisseria meningitidis, Campylobacter jejuni and Streptococcus equi and can be set up to query internal isolate tables or suitably-configured external isolate databases, such as those used for multilocus sequence typing. The style of the resulting website can be fully configured by modifying stylesheets and through the use of customised header and footer files that surround the output of the script. Conclusion: The software provides a rapid means of setting up customised Internet antigen sequence databases. The flexible configuration options enable typing schemes with differing requirements to be accommodated. PMID:16790057
Li, Caijuan; Ling, Qufei; Ge, Chen; Ye, Zhuqing; Han, Xiaofei
2015-02-25
The large-scale loach (Paramisgurnus dabryanus, Cypriniformes) is a bottom-dwelling freshwater fish found mainly in eastern Asia. The natural germplasm resources of this important aquaculture species have recently been threatened by overfishing and artificial propagation. The objective of this study was to obtain the first functional genomic resource and candidate molecular markers for future conservation and breeding research. Illumina paired-end sequencing generated over one hundred million reads that resulted in 71,887 assembled transcripts, with an average length of 1,465 bp. In total, 42,093 (58.56%) protein-coding sequences were predicted, and 43,837 transcripts had significant matches in the NCBI nonredundant protein (Nr) database. A total of 29,389 and 14,419 transcripts were assigned to gene ontology (GO) categories and Eukaryotic Orthologous Groups (KOG), respectively, and 22,102 (31.14%) transcripts were mapped to 302 KEGG pathways. In addition, 15,106 candidate SSR markers were identified, with 11,037 pairs of PCR primers designed. Of 400 randomly selected SSR primer pairs that were validated, 364 (91%) produced PCR products. Further testing with 41 loci and 20 large-scale loach specimens collected from the four largest lakes in China showed that 36 (87.8%) loci were polymorphic. The transcriptomic profile and SSR repertoire obtained in this study will facilitate population genetic studies and selective breeding of large-scale loach in the future. Copyright © 2015. Published by Elsevier B.V.
Zhi, Hui; Li, Xin; Wang, Peng; Gao, Yue; Gao, Baoqing; Zhou, Dianshuang; Zhang, Yan; Guo, Maoni; Yue, Ming; Shen, Weitao
2018-01-01
Lnc2Meth (http://www.bio-bigdata.com/Lnc2Meth/), an interactive resource to identify regulatory relationships between human long non-coding RNAs (lncRNAs) and DNA methylation, is not only a manually curated collection and annotation of experimentally supported lncRNA-DNA methylation associations but also a platform that effectively integrates tools for calculating and identifying the differentially methylated lncRNAs and protein-coding genes (PCGs) in diverse human diseases. The resource provides: (i) advanced search possibilities, e.g. retrieval of the database by searching the lncRNA symbol of interest, DNA methylation patterns, regulatory mechanisms and disease types; (ii) abundant computationally calculated DNA methylation array profiles for the lncRNAs and PCGs; (iii) the prognostic values for each hit transcript calculated from the patients' clinical data; (iv) a genome browser to display the DNA methylation landscape of the lncRNA transcripts for a specific type of disease; (v) tools to re-annotate probes to lncRNA loci and identify the differential methylation patterns for lncRNAs and PCGs with user-supplied external datasets; (vi) an R package (LncDM) to complete the identification and visualization of differentially methylated lncRNAs on local computers. Lnc2Meth provides a timely and valuable resource that can be applied to significantly expand our understanding of the regulatory relationships between lncRNAs and DNA methylation in various human diseases. PMID:29069510
Ramey, A; Graziano, S L; Nielsen, J L
2008-03-01
Eight polymorphic microsatellite loci were isolated and characterized for the Arctic cisco, Coregonus autumnalis. Loci were evaluated in 21 samples from the Colville River subsistence fishery. The number of alleles per locus ranged from two to 18. Observed heterozygosity of loci varied from 0.10 to 1.00, and expected heterozygosity ranged from 0.09 to 0.92. All eight microsatellite markers were in Hardy-Weinberg equilibrium. The loci presented here will be useful in describing population structure and exploring populations of origin for Arctic cisco. © 2007 Blackwell Publishing Ltd No claim to original US government works.
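Observed and expected heterozygosity, reported in this and several other microsatellite records in this listing, can be computed per locus from diploid genotypes. The sketch below assumes genotypes are given as allele pairs and uses the standard definitions (observed: fraction of heterozygous individuals; expected: 1 - Σ p_i²). The function name `heterozygosity` and the optional small-sample correction are illustrative assumptions; the records do not state which estimator each study used.

```python
from collections import Counter


def heterozygosity(genotypes, unbiased=False):
    """Return (observed, expected) heterozygosity for one diploid locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    unbiased:  if True, apply the 2n / (2n - 1) small-sample correction
               to the expected value (an optional assumption here).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Observed: share of individuals carrying two different alleles.
    observed = sum(1 for a, b in genotypes if a != b) / n
    # Expected under Hardy-Weinberg: one minus the sum of squared allele frequencies.
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    expected = 1.0 - sum((c / total) ** 2 for c in allele_counts.values())
    if unbiased:
        expected *= total / (total - 1)
    return observed, expected
```

A large gap between the two values at a locus is one informal signal of the Hardy-Weinberg deviations that these records test for formally.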
Allele frequency distribution for 21 autosomal STR loci in Nepal.
Kraaijenbrink, T; van Driem, G L; Opgenort, J R M L; Tuladhar, N M; de Knijff, P
2007-05-24
The allele frequency distributions of 21 autosomal loci contained in the AmpFlSTR Identifiler, the Powerplex 16 and the FFFL multiplex PCR kits were studied in 953 unrelated individuals from Nepal. Several new alleles (i.e. not yet reported in the NIST Short Tandem Repeat DNA Internet DataBase [http://www.cstl.nist.gov/biotech/strbase/]) have been detected in the process.
Gene Expression Elucidates Functional Impact of Polygenic Risk for Schizophrenia
Fromer, Menachem; Roussos, Panos; Sieberts, Solveig K; Johnson, Jessica S; Kavanagh, David H; Perumal, Thanneer M; Ruderfer, Douglas M; Oh, Edwin C; Topol, Aaron; Shah, Hardik R; Klei, Lambertus L; Kramer, Robin; Pinto, Dalila; Gümüş, Zeynep H; Cicek, A. Ercument; Dang, Kristen K; Browne, Andrew; Lu, Cong; Xie, Lu; Readhead, Ben; Stahl, Eli A; Parvizi, Mahsa; Hamamsy, Tymor; Fullard, John F; Wang, Ying-Chih; Mahajan, Milind C; Derry, Jonathan M J; Dudley, Joel; Hemby, Scott E; Logsdon, Benjamin A; Talbot, Konrad; Raj, Towfique; Bennett, David A; De Jager, Philip L; Zhu, Jun; Zhang, Bin; Sullivan, Patrick F; Chess, Andrew; Purcell, Shaun M; Shinobu, Leslie A; Mangravite, Lara M; Toyoshiba, Hiroyoshi; Gur, Raquel E; Hahn, Chang-Gyu; Lewis, David A; Haroutunian, Vahram; Peters, Mette A; Lipska, Barbara K; Buxbaum, Joseph D; Schadt, Eric E; Hirai, Keisuke; Roeder, Kathryn; Brennand, Kristen J; Katsanis, Nicholas; Domenici, Enrico; Devlin, Bernie; Sklar, Pamela
2016-01-01
Over 100 genetic loci harbor schizophrenia associated variants, yet how these variants confer liability is uncertain. The CommonMind Consortium sequenced RNA from dorsolateral prefrontal cortex of schizophrenia cases (N = 258) and control subjects (N = 279), creating a resource of gene expression and its genetic regulation. Using this resource, ~20% of schizophrenia loci have variants that could contribute to altered gene expression and liability. In five loci, only a single gene was involved: FURIN, TSNARE1, CNTN4, CLCN3, or SNAP91. Altering expression of FURIN, TSNARE1, or CNTN4 changes neurodevelopment in zebrafish; knockdown of FURIN in human neural progenitor cells yields abnormal migration. Of 693 genes showing significant case/control differential expression, their fold changes are ≤ 1.33, and an independent cohort yields similar results. Gene co-expression implicates a network relevant for schizophrenia. Our findings show schizophrenia is polygenic and highlight the utility of this resource for mechanistic interpretations of genetic liability for brain diseases. PMID:27668389
Gene expression elucidates functional impact of polygenic risk for schizophrenia.
Fromer, Menachem; Roussos, Panos; Sieberts, Solveig K; Johnson, Jessica S; Kavanagh, David H; Perumal, Thanneer M; Ruderfer, Douglas M; Oh, Edwin C; Topol, Aaron; Shah, Hardik R; Klei, Lambertus L; Kramer, Robin; Pinto, Dalila; Gümüş, Zeynep H; Cicek, A Ercument; Dang, Kristen K; Browne, Andrew; Lu, Cong; Xie, Lu; Readhead, Ben; Stahl, Eli A; Xiao, Jianqiu; Parvizi, Mahsa; Hamamsy, Tymor; Fullard, John F; Wang, Ying-Chih; Mahajan, Milind C; Derry, Jonathan M J; Dudley, Joel T; Hemby, Scott E; Logsdon, Benjamin A; Talbot, Konrad; Raj, Towfique; Bennett, David A; De Jager, Philip L; Zhu, Jun; Zhang, Bin; Sullivan, Patrick F; Chess, Andrew; Purcell, Shaun M; Shinobu, Leslie A; Mangravite, Lara M; Toyoshiba, Hiroyoshi; Gur, Raquel E; Hahn, Chang-Gyu; Lewis, David A; Haroutunian, Vahram; Peters, Mette A; Lipska, Barbara K; Buxbaum, Joseph D; Schadt, Eric E; Hirai, Keisuke; Roeder, Kathryn; Brennand, Kristen J; Katsanis, Nicholas; Domenici, Enrico; Devlin, Bernie; Sklar, Pamela
2016-11-01
Over 100 genetic loci harbor schizophrenia-associated variants, yet how these variants confer liability is uncertain. The CommonMind Consortium sequenced RNA from dorsolateral prefrontal cortex of people with schizophrenia (N = 258) and control subjects (N = 279), creating a resource of gene expression and its genetic regulation. Using this resource, ∼20% of schizophrenia loci have variants that could contribute to altered gene expression and liability. In five loci, only a single gene was involved: FURIN, TSNARE1, CNTN4, CLCN3 or SNAP91. Altering expression of FURIN, TSNARE1 or CNTN4 changed neurodevelopment in zebrafish; knockdown of FURIN in human neural progenitor cells yielded abnormal migration. Of 693 genes showing significant case-versus-control differential expression, their fold changes were ≤ 1.33, and an independent cohort yielded similar results. Gene co-expression implicates a network relevant for schizophrenia. Our findings show that schizophrenia is polygenic and highlight the utility of this resource for mechanistic interpretations of genetic liability for brain diseases.
Uchihi, Rieko; Yamamoto, Toshimichi; Yoshimoto, Takashi; Inoue, Chikako; Kishida, Tetsuko; Yoshioka, Naofumi; Katsumata, Yoshinao
2007-07-04
Regional differences in the allele frequency distributions for six STR loci (D20S480, D6S2439, D6S1056, D9S1118, D4S2639, and D17S1290) in Japan were examined using our recently designed hexaplex amplification and typing system, newly named "Midi-6", to construct a database for the Japanese population. Genotypes at the six loci were analyzed in 198, 200, 175, and 196 individuals from the areas of Akita, Nagoya, Oita, and Okinawa, respectively. In pairwise comparisons among the four populations, the allele frequency distributions were significantly different (p<0.05) at one to five loci. Significant differences were also observed at two or three loci between Oita- or Okinawa-Japanese and the "pooled" population (n=769), respectively. However, since F(ST) (theta) values were extremely low (<0.05), ranging from 0.0020 to 0.0118 for the six loci, genetic differentiation within the pooled Japanese population was negligible. This suggests that the allele frequencies at the six loci in the pooled population can be used as the basis for calculating statistical probabilities.
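To make the F(ST) argument above concrete, the following sketch computes a simple Wright-style fixation index for one locus as (H_T - H_S) / H_T from per-population allele frequencies. This is a simplification and not the theta estimator reported in the abstract, which also accounts for sample sizes; the function name `fst_from_frequencies`, the equal weighting of populations and the input layout are assumptions for illustration.

```python
def fst_from_frequencies(populations):
    """Wright-style F_ST = (H_T - H_S) / H_T for a single locus.

    populations: list of dicts mapping allele -> frequency, each summing to 1.
    Populations are weighted equally (an assumption of this sketch).
    """
    if not populations:
        raise ValueError("no populations supplied")
    k = len(populations)
    alleles = set().union(*populations)
    # H_S: mean expected heterozygosity within populations.
    h_s = sum(1.0 - sum(p ** 2 for p in pop.values()) for pop in populations) / k
    # H_T: expected heterozygosity of the pooled (mean) allele frequencies.
    mean_freq = {a: sum(pop.get(a, 0.0) for pop in populations) / k for a in alleles}
    h_t = 1.0 - sum(p ** 2 for p in mean_freq.values())
    if h_t == 0.0:
        return 0.0  # locus is monomorphic everywhere; no differentiation
    return (h_t - h_s) / h_t
```

Values near zero, as in the range reported above, indicate that almost all of the variation lies within rather than between the regional samples, which is the rationale given for pooling them.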
Genomic copy number variations in three Southeast Asian populations.
Ku, Chee-Seng; Pawitan, Yudi; Sim, Xueling; Ong, Rick T H; Seielstad, Mark; Lee, Edmund J D; Teo, Yik-Ying; Chia, Kee-Seng; Salim, Agus
2010-07-01
Research on the role of copy number variations (CNVs) in the genetic risk of diseases in Asian populations has been hampered by a relative lack of reference CNV maps for Asian populations outside the East Asians. In this article, we report the population characteristics of CNVs in Chinese, Malay, and Asian Indian populations in Singapore. Using the Illumina Human 1M Beadchip array, we identify 1,174 CNV loci in these populations that were corroborated by findings when the same samples were typed on the Affymetrix 6.0 platform. We identify 441 novel loci not previously reported in the Database of Genomic Variants (DGV). We observe a considerable number of loci that span all three populations and were previously unreported, as well as population-specific loci that are quite common in the respective populations. From this we observe the distribution of CNVs in the Asian Indian population to be considerably different from that in the Chinese and Malay populations. About half of the deletion loci and three-quarters of the duplication loci overlap UCSC genes. Tens of loci show population differentiation and overlap with genes previously known to be associated with genetic risk of diseases. One of these loci is the CYP2A6 deletion, previously linked to reduced susceptibility to lung cancer. © 2010 Wiley-Liss, Inc.
Broad-Scale Genetic Diversity of Cannabis for Forensic Applications
Dufresnes, Christophe; Jan, Catherine; Bienert, Friederike; Goudet, Jérôme; Fumagalli, Luca
2017-01-01
Cannabis (hemp and marijuana) is an iconic yet controversial crop. On the one hand, it represents a growing market for pharmaceutical and agricultural sectors. On the other hand, plants synthesizing the psychoactive THC produce the most widespread illicit drug in the world. Yet, the difficulty of reliably distinguishing between Cannabis varieties based on morphological or biochemical criteria impedes the development of promising industrial programs and hinders the fight against narcotrafficking. Genetics offers an appropriate alternative to characterize drug vs. non-drug Cannabis. However, forensic applications require rapid and affordable genotyping of informative and reliable molecular markers for which a broad-scale reference database, representing both intra- and inter-variety variation, is available. Here we provide such a resource for Cannabis, by genotyping 13 microsatellite loci (STRs) in 1 324 samples selected specifically for fibre (24 hemp varieties) and drug (15 marijuana varieties) production. We showed that these loci are sufficient to capture most of the genome-wide diversity patterns recently revealed by NGS data. We recovered strong genetic structure between marijuana and hemp and demonstrated that anonymous samples can be confidently assigned to either plant type. Fibre varieties appear genetically homogeneous whereas drug varieties show low (often clonal) diversity within varieties, but very high genetic differentiation between them, likely resulting from breeding practices. Based on an additional test dataset including samples from 41 local police seizures, we showed that the genetic signature of marijuana cultivars could be used to trace crime scene evidence. To date, our study provides the most comprehensive genetic resource for Cannabis forensics worldwide. PMID:28107530
The Mouse Genome Database (MGD): facilitating mouse as a model for human biology and disease.
Eppig, Janan T; Blake, Judith A; Bult, Carol J; Kadin, James A; Richardson, Joel E
2015-01-01
The Mouse Genome Database (MGD, http://www.informatics.jax.org) serves the international biomedical research community as the central resource for integrated genomic, genetic and biological data on the laboratory mouse. To facilitate use of mouse as a model in translational studies, MGD maintains a core of high-quality curated data and integrates experimentally and computationally generated data sets. MGD maintains a unified catalog of genes and genome features, including functional RNAs, QTL and phenotypic loci. MGD curates and provides functional and phenotype annotations for mouse genes using the Gene Ontology and Mammalian Phenotype Ontology. MGD integrates phenotype data and associates mouse genotypes to human diseases, providing critical mouse-human relationships and access to repositories holding mouse models. MGD is the authoritative source of nomenclature for genes, genome features, alleles and strains following guidelines of the International Committee on Standardized Genetic Nomenclature for Mice. A new addition to MGD, the Human-Mouse: Disease Connection, allows users to explore gene-phenotype-disease relationships between human and mouse. MGD has also updated search paradigms for phenotypic allele attributes, incorporated incidental mutation data, added a module for display and exploration of genes and microRNA interactions and adopted the JBrowse genome browser. MGD resources are freely available to the scientific community. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Yoneyama, Sachiko; Guo, Yiran; Lanktree, Matthew B.; Barnes, Michael R.; Elbers, Clara C.; Karczewski, Konrad J; Padmanabhan, Sandosh; Bauer, Florianne; Baumert, Jens; Beitelshees, Amber; Berenson, Gerald S.; Boer, Jolanda M.A.; Burke, Gregory; Cade, Brian; Chen, Wei; Cooper-Dehoff, Rhonda M.; Gaunt, Tom R.; Gieger, Christian; Gong, Yan; Gorski, Mathias; Heard-Costa, Nancy; Johnson, Toby; Lamonte, Michael J.; Mcdonough, Caitrin; Monda, Keri L.; Onland-Moret, N. Charlotte; Nelson, Christopher P.; O'Connell, Jeffrey R.; Ordovas, Jose; Peter, Inga; Peters, Annette; Shaffer, Jonathan; Shen, Haiqinq; Smith, Erin; Speilotes, Liz; Thomas, Fridtjof; Thorand, Barbara; Monique Verschuren, W. M.; Anand, Sonia S.; Dominiczak, Anna; Davidson, Karina W.; Hegele, Robert A.; Heid, Iris; Hofker, Marten H.; Huggins, Gordon S.; Illig, Thomas; Johnson, Julie A.; Kirkland, Susan; König, Wolfgang; Langaee, Taimour Y.; Mccaffery, Jeanne; Melander, Olle; Mitchell, Braxton D.; Munroe, Patricia; Murray, Sarah S.; Papanicolaou, George; Redline, Susan; Reilly, Muredach; Samani, Nilesh J.; Schork, Nicholas J.; Van Der Schouw, Yvonne T.; Shimbo, Daichi; Shuldiner, Alan R.; Tobin, Martin D.; Wijmenga, Cisca; Yusuf, Salim; Hakonarson, Hakon; Lange, Leslie A.; Demerath, Ellen W; Fox, Caroline S.; North, Kari E; Reiner, Alex P.; Keating, Brendan; Taylor, Kira C.
2014-01-01
Waist circumference (WC) and waist-to-hip ratio (WHR) are surrogate measures of central adiposity that are associated with adverse cardiovascular events, type 2 diabetes and cancer independent of body mass index (BMI). WC and WHR are highly heritable with multiple susceptibility loci identified to date. We assessed the association between SNPs and BMI-adjusted WC and WHR and unadjusted WC in up to 57 412 individuals of European descent from 22 cohorts collaborating with the NHLBI's Candidate Gene Association Resource (CARe) project. The study population consisted of women and men aged 20–80 years. Study participants were genotyped using the ITMAT/Broad/CARE array, which includes ∼50 000 cosmopolitan tagged SNPs across ∼2100 cardiovascular-related genes. Each trait was modeled as a function of age, study site and principal components to control for population stratification, and we conducted a fixed-effects meta-analysis. No new loci for WC were observed. For WHR analyses, three novel loci were significantly associated (P < 2.4 × 10⁻⁶). Previously unreported rs2811337-G near TMCC1 was associated with increased WHR (β ± SE, 0.048 ± 0.008, P = 7.7 × 10⁻⁹) as was rs7302703-G in HOXC10 (β = 0.044 ± 0.008, P = 2.9 × 10⁻⁷) and rs936108-C in PEMT (β = 0.035 ± 0.007, P = 1.9 × 10⁻⁶). Sex-stratified analyses revealed two additional novel signals among females only, rs12076073-A in SHC1 (β = 0.10 ± 0.02, P = 1.9 × 10⁻⁶) and rs1037575-A in ATBDB4 (β = 0.046 ± 0.01, P = 2.2 × 10⁻⁶), supporting an already established sexual dimorphism of central adiposity-related genetic variants. Functional analysis using ENCODE and eQTL databases revealed that several of these loci are in regulatory regions or regions with differential expression in adipose tissue. PMID:24345515
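The fixed-effects meta-analysis mentioned in the abstract above is commonly carried out as an inverse-variance weighted average of per-cohort effect estimates. The following is a minimal sketch of that general technique, not the authors' pipeline; the function name `fixed_effects_meta` and the example effect sizes are hypothetical, and the two-sided P value assumes a normal approximation.

```python
import math

def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects pooling of per-cohort estimates.

    betas: per-cohort effect sizes (hypothetical inputs)
    ses:   per-cohort standard errors, all strictly positive
    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("betas and ses must be non-empty and equal in length")
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se ** 2) for se in ses]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled_beta / pooled_se
    # Two-sided P value under a standard normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p

if __name__ == "__main__":
    # Two made-up cohorts, purely for illustration.
    beta, se, z, p = fixed_effects_meta([0.05, 0.04], [0.01, 0.02])
    print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

Cohorts with smaller standard errors dominate the pooled estimate, which is why large cohorts carry most of the weight in an analysis of this kind.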
Genenames.org: the HGNC and VGNC resources in 2017.
Yates, Bethan; Braschi, Bryony; Gray, Kristian A; Seal, Ruth L; Tweedie, Susan; Bruford, Elspeth A
2017-01-04
The HUGO Gene Nomenclature Committee (HGNC) based at the European Bioinformatics Institute (EMBL-EBI) assigns unique symbols and names to human genes. Currently the HGNC database contains almost 40 000 approved gene symbols, over 19 000 of which represent protein-coding genes. In addition to naming genomic loci we manually curate genes into family sets based on shared characteristics such as homology, function or phenotype. We have recently updated our gene family resources and introduced new improved visualizations which can be seen alongside our gene symbol reports on our primary website http://www.genenames.org. In 2016 we expanded our remit and formed the Vertebrate Gene Nomenclature Committee (VGNC) which is responsible for assigning names to vertebrate species lacking a dedicated nomenclature group. Using the chimpanzee genome as a pilot project we have approved symbols and names for over 14 500 protein-coding genes in chimpanzee, and have developed a new website http://vertebrate.genenames.org to distribute these data. Here, we review our online data and resources, focusing particularly on the improvements and new developments made during the last two years. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Zhi, Hui; Li, Xin; Wang, Peng; Gao, Yue; Gao, Baoqing; Zhou, Dianshuang; Zhang, Yan; Guo, Maoni; Yue, Ming; Shen, Weitao; Ning, Shangwei; Jin, Lianhong; Li, Xia
2018-01-04
Lnc2Meth (http://www.bio-bigdata.com/Lnc2Meth/), an interactive resource to identify regulatory relationships between human long non-coding RNAs (lncRNAs) and DNA methylation, is not only a manually curated collection and annotation of experimentally supported lncRNA-DNA methylation associations but also a platform that effectively integrates tools for calculating and identifying the differentially methylated lncRNAs and protein-coding genes (PCGs) in diverse human diseases. The resource provides: (i) advanced search possibilities, e.g. retrieval of the database by searching the lncRNA symbol of interest, DNA methylation patterns, regulatory mechanisms and disease types; (ii) abundant computationally calculated DNA methylation array profiles for the lncRNAs and PCGs; (iii) the prognostic values for each hit transcript calculated from the patients' clinical data; (iv) a genome browser to display the DNA methylation landscape of the lncRNA transcripts for a specific type of disease; (v) tools to re-annotate probes to lncRNA loci and identify the differential methylation patterns for lncRNAs and PCGs with user-supplied external datasets; (vi) an R package (LncDM) to identify and visualize differentially methylated lncRNAs on local computers. Lnc2Meth provides a timely and valuable resource that can be applied to significantly expand our understanding of the regulatory relationships between lncRNAs and DNA methylation in various human diseases. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
Cerqueira, Gustavo C; Arnaud, Martha B; Inglis, Diane O; Skrzypek, Marek S; Binkley, Gail; Simison, Matt; Miyasato, Stuart R; Binkley, Jonathan; Orvis, Joshua; Shah, Prachi; Wymore, Farrell; Sherlock, Gavin; Wortman, Jennifer R
2014-01-01
The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a freely available web-based resource that was designed for Aspergillus researchers and is also a valuable source of information for the entire fungal research community. In addition to being a repository and central point of access to genome, transcriptome and polymorphism data, AspGD hosts a comprehensive comparative genomics toolbox that facilitates the exploration of precomputed orthologs among the 20 currently available Aspergillus genomes. AspGD curators perform gene product annotation based on review of the literature for four key Aspergillus species: Aspergillus nidulans, Aspergillus oryzae, Aspergillus fumigatus and Aspergillus niger. We have iteratively improved the structural annotation of Aspergillus genomes through the analysis of publicly available transcription data, mostly expressed sequence tags, as described in a previous NAR Database article (Arnaud et al. 2012). In this update, we report substantive structural annotation improvements for A. nidulans, A. oryzae and A. fumigatus genomes based on recently available RNA-Seq data. Over 26 000 loci were updated across these species; although those primarily comprise the addition and extension of untranslated regions (UTRs), the new analysis also enabled over 1000 modifications affecting the coding sequence of genes in each target genome.
Yoo, Seong Yeon; Cho, Nam Soo; Park, Myung Jin; Seong, Ki Min; Hwang, Jung Ho; Song, Seok Bean; Han, Myun Soo; Lee, Won Tae; Chung, Ki Wha
2011-01-01
Genotyping of highly polymorphic short tandem repeat (STR) markers is widely used for the genetic identification of individuals in forensic DNA analyses and in paternity disputes. The National DNA Profile Databank recently established by the DNA Identification Act in Korea contains the computerized STR DNA profiles of individuals convicted of crimes. For the establishment of a large autosomal STR loci population database, 1805 samples were obtained at random from Korean individuals and 15 autosomal STR markers were analyzed using the AmpFlSTR Identifiler PCR Amplification kit. For the 15 autosomal STR markers, no deviations from the Hardy-Weinberg equilibrium were observed. The most informative locus in our data set was D2S1338, with a discrimination power of 0.9699. The combined matching probability was 1.521 × 10⁻¹⁷. This large STR profile dataset including atypical alleles will be important for the establishment of the Korean DNA database and for forensic applications. PMID:21597912
Yoo, Seong Yeon; Cho, Nam Soo; Park, Myung Jin; Seong, Ki Min; Hwang, Jung Ho; Song, Seok Bean; Han, Myun Soo; Lee, Won Tae; Chung, Ki Wha
2011-07-01
Genotyping of highly polymorphic short tandem repeat (STR) markers is widely used for the genetic identification of individuals in forensic DNA analyses and in paternity disputes. The National DNA Profile Databank recently established by the DNA Identification Act in Korea contains the computerized STR DNA profiles of individuals convicted of crimes. For the establishment of a large autosomal STR loci population database, 1805 samples were obtained at random from Korean individuals and 15 autosomal STR markers were analyzed using the AmpFlSTR Identifiler PCR Amplification kit. For the 15 autosomal STR markers, no deviations from the Hardy-Weinberg equilibrium were observed. The most informative locus in our data set was D2S1338, with a discrimination power of 0.9699. The combined matching probability was 1.521 × 10⁻¹⁷. This large STR profile dataset including atypical alleles will be important for the establishment of the Korean DNA database and for forensic applications.
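The discrimination power and combined matching probability reported above follow from standard forensic statistics: the per-locus matching probability is the sum of squared genotype frequencies, the power of discrimination is one minus that value, and the combined matching probability is the product across independent loci. The sketch below illustrates these definitions under two stated assumptions: genotype frequencies are derived from allele frequencies under Hardy-Weinberg equilibrium (the abstract reports no deviation from it, but its figures may have been computed from observed genotype counts), and the allele frequencies used are invented for illustration, not taken from the Korean dataset.

```python
from itertools import combinations_with_replacement
from math import prod

def genotype_freqs_hwe(allele_freqs):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium.

    allele_freqs: dict mapping allele label -> frequency (must sum to 1).
    """
    if abs(sum(allele_freqs.values()) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    freqs = {}
    items = sorted(allele_freqs.items())
    for (a, pa), (b, pb) in combinations_with_replacement(items, 2):
        # Homozygote: p^2; heterozygote: 2pq.
        freqs[(a, b)] = pa * pa if a == b else 2.0 * pa * pb
    return freqs

def matching_probability(allele_freqs):
    """Chance that two unrelated individuals share a genotype at one locus."""
    return sum(g * g for g in genotype_freqs_hwe(allele_freqs).values())

def power_of_discrimination(allele_freqs):
    return 1.0 - matching_probability(allele_freqs)

def combined_matching_probability(loci):
    """Product of per-locus matching probabilities (loci assumed independent)."""
    return prod(matching_probability(f) for f in loci)

if __name__ == "__main__":
    # Hypothetical allele frequencies at two made-up loci.
    locus_a = {"10": 0.2, "11": 0.3, "12": 0.5}
    locus_b = {"7": 0.5, "8": 0.5}
    print(power_of_discrimination(locus_a))
    print(combined_matching_probability([locus_a, locus_b]))
```

Because the combined value is a product, each additional polymorphic locus shrinks it multiplicatively, which is how a 15-locus panel can reach values many orders of magnitude below any single-locus figure.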
Chau, John H; Rahfeldt, Wolfgang A; Olmstead, Richard G
2018-03-01
Targeted sequence capture can be used to efficiently gather sequence data for large numbers of loci, such as single-copy nuclear loci. Most published studies in plants have used taxon-specific locus sets developed individually for a clade using multiple genomic and transcriptomic resources. General locus sets can also be developed from loci that have been identified as single-copy and have orthologs in large clades of plants. We identify and compare a taxon-specific locus set and three general locus sets (conserved ortholog set [COSII], shared single-copy nuclear [APVO SSC] genes, and pentatricopeptide repeat [PPR] genes) for targeted sequence capture in Buddleja (Scrophulariaceae) and outgroups. We evaluate their performance in terms of assembly success, sequence variability, and resolution and support of inferred phylogenetic trees. The taxon-specific locus set had the most target loci. Assembly success was high for all locus sets in Buddleja samples. For outgroups, general locus sets had greater assembly success. Taxon-specific and PPR loci had the highest average variability. The taxon-specific data set produced the best-supported tree, but all data sets showed improved resolution over previous non-sequence capture data sets. General locus sets can be a useful source of sequence capture targets, especially if multiple genomic resources are not available for a taxon.
Gill, Peter; Haned, Hinda; Bleka, Oyvind; Hansson, Oskar; Dørum, Guro; Egeland, Thore
2015-09-01
The introduction of Short Tandem Repeat (STR) DNA was a revolution within a revolution that transformed forensic DNA profiling into a tool that could be used, for the first time, to create National DNA databases. This transformation would not have been possible without the concurrent development of fluorescent automated sequencers, combined with the ability to multiplex several loci together. Use of the polymerase chain reaction (PCR) increased the sensitivity of the method to enable the analysis of a handful of cells. The first multiplexes were simple: 'the quad', introduced by the defunct UK Forensic Science Service (FSS) in 1994, rapidly followed by a more discriminating 'six-plex' (Second Generation Multiplex) in 1995 that was used to create the world's first national DNA database. The success of the database rapidly outgrew the functionality of the original system - by the year 2000 a new multiplex of ten loci was introduced to reduce the chance of adventitious matches. The technology was adopted world-wide, albeit with different loci. The political requirement to introduce pan-European databases encouraged standardisation - the development of the European Standard Set (ESS) of markers comprising twelve loci is the latest iteration. Although development has been impressive, the methods used to interpret evidence have lagged behind. For example, the theory to interpret complex DNA profiles (low-level mixtures) was developed fifteen years ago, but only in the past year or so have the concepts started to be widely adopted. A plethora of different models (some commercial and others non-commercial) have appeared. This has led to a confusing 'debate' about the 'best' to use. The different models available are described along with their advantages and disadvantages. A section discusses the development of national DNA databases, along with details of an associated controversy to estimate the strength of evidence of matches.
Current methodology is limited to searches of complete profiles - another example where the interpretation of matches has not kept pace with development of theory. STRs have also transformed the area of Disaster Victim Identification (DVI) which frequently requires kinship analysis. However, genotyping efficiency is complicated by complex, degraded DNA profiles. Finally, there is now a detailed understanding of the causes of stochastic effects that cause DNA profiles to exhibit the phenomena of drop-out and drop-in, along with artefacts such as stutters. The phenomena discussed include: heterozygote balance; stutter; degradation; the effect of decreasing quantities of DNA; the dilution effect. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
The Pharmacogenetics of Type 2 Diabetes: A Systematic Review
Maruthur, Nisa M.; Gribble, Matthew O.; Bennett, Wendy L.; Bolen, Shari; Wilson, Lisa M.; Balakrishnan, Poojitha; Sahu, Anita; Bass, Eric; Kao, W.H. Linda; Clark, Jeanne M.
2014-01-01
OBJECTIVE We performed a systematic review to identify which genetic variants predict response to diabetes medications. RESEARCH DESIGN AND METHODS We performed a search of electronic databases (PubMed, EMBASE, and Cochrane Database) and a manual search to identify original, longitudinal studies of the effect of diabetes medications on incident diabetes, HbA1c, fasting glucose, and postprandial glucose in prediabetes or type 2 diabetes by genetic variation. Two investigators reviewed titles, abstracts, and articles independently. Two investigators abstracted data sequentially and evaluated study quality independently. Quality evaluations were based on the Strengthening the Reporting of Genetic Association Studies guidelines and Human Genome Epidemiology Network guidance. RESULTS Of 7,279 citations, we included 34 articles (N = 10,407) evaluating metformin (n = 14), sulfonylureas (n = 4), repaglinide (n = 8), pioglitazone (n = 3), rosiglitazone (n = 4), and acarbose (n = 4). Studies were not standalone randomized controlled trials, and most evaluated patients with diabetes. Significant medication–gene interactions for glycemic outcomes included 1) metformin and the SLC22A1, SLC22A2, SLC47A1, PRKAB2, PRKAA2, PRKAA1, and STK11 loci; 2) sulfonylureas and the CYP2C9 and TCF7L2 loci; 3) repaglinide and the KCNJ11, SLC30A8, NEUROD1/BETA2, UCP2, and PAX4 loci; 4) pioglitazone and the PPARG2 and PTPRD loci; 5) rosiglitazone and the KCNQ1 and RBP4 loci; and 6) acarbose and the PPARA, HNF4A, LIPC, and PPARGC1A loci. Data were insufficient for meta-analysis. CONCLUSIONS We found evidence of pharmacogenetic interactions for metformin, sulfonylureas, repaglinide, thiazolidinediones, and acarbose consistent with their pharmacokinetics and pharmacodynamics. While high-quality controlled studies with prespecified analyses are still lacking, our results bring the promise of personalized medicine in diabetes one step closer to fruition. PMID:24558078
Sequence capture of ultraconserved elements from bird museum specimens.
McCormack, John E; Tsai, Whitney L E; Faircloth, Brant C
2016-09-01
New DNA sequencing technologies are allowing researchers to explore the genomes of the millions of natural history specimens collected prior to the molecular era. Yet, we know little about how well specific next-generation sequencing (NGS) techniques work with the degraded DNA typically extracted from museum specimens. Here, we use one type of NGS approach, sequence capture of ultraconserved elements (UCEs), to collect data from bird museum specimens as old as 120 years. We targeted 5060 UCE loci in 27 western scrub-jays (Aphelocoma californica) representing three evolutionary lineages that could be species, and we collected an average of 3749 UCE loci containing 4460 single nucleotide polymorphisms (SNPs). Despite older specimens producing fewer and shorter loci in general, we collected thousands of markers from even the oldest specimens. More sequencing reads per individual helped to boost the number of UCE loci we recovered from older specimens, but more sequencing was not as successful at increasing the length of loci. We detected contamination in some samples and determined that contamination was more prevalent in older samples that were subject to less sequencing. For the phylogeny generated from concatenated UCE loci, contamination led to incorrect placement of some individuals. In contrast, a species tree constructed from SNPs called within UCE loci correctly placed individuals into three monophyletic groups, perhaps because of the stricter analytical procedures used for SNP calling. This study and other recent studies on the genomics of museum specimens have profound implications for natural history collections, where millions of older specimens should now be considered genomic resources. © 2015 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.
Genomic resources in fruit plants: an assessment of current status.
Rai, Manoj K; Shekhawat, N S
2015-01-01
The availability of many genomic resources such as genome sequences, functional genomics resources including microarrays and RNA-seq, sufficient numbers of molecular markers, expressed sequence tags (ESTs) and high-density genetic maps is causing a rapid acceleration of genetics and genomic research of many fruit plants. This is leading to an increase in our knowledge of the genes that are linked to many horticultural and agronomically important traits. Recently, some progress has also been made on the identification and functional analysis of miRNAs in some fruit plants. This is one of the most active research fields in plant sciences. The last decade has witnessed development of genomic resources in many fruit plants such as apple, banana, citrus, grapes, papaya, pears, strawberry etc.; however, many of them are still not being exploited. Furthermore, owing to lack of resources, infrastructure and research facilities in many lesser-developed countries, development of genomic resources in many underutilized or less-studied fruit crops, which grow in these countries, is limited. Thus, research emphasis should be given to those fruit crops for which genomic resources are relatively scarce. The development of genomic databases of these less-studied fruit crops will enable biotechnologists to identify target genes that underlie key horticultural and agronomical traits. This review presents an overview of the current status of the development of genomic resources in fruit plants with the main emphasis being on genome sequencing, EST resources, functional genomics resources including microarray and RNA-seq, identification of quantitative trait loci and construction of genetic maps as well as efforts made on the identification and functional analysis of miRNAs in fruit plants.
Song, Xuhao; Shen, Fujun; Huang, Jie; Huang, Yan; Du, Lianming; Wang, Chengdong; Fan, Zhenxin; Hou, Rong; Yue, Bisong; Zhang, Xiuyue
2016-09-01
Recently, an increasing number of microsatellites or simple sequence repeats (SSRs) have been found and characterized from transcriptomes. Such SSRs can be employed as putative functional markers to easily tag corresponding genes, which play an important role in biomedical studies and genetic analysis. However, transcriptome-derived SSRs for giant panda (Ailuropoda melanoleuca) are not yet available. In this work, we identified and characterized 20 tetranucleotide microsatellite loci from a transcript database generated from the blood of giant panda. Furthermore, we assigned their predicted transcriptome locations: 16 loci were assigned to untranslated regions (UTRs) and 4 loci were assigned to coding regions (CDSs). Gene identities of 14 transcripts containing corresponding microsatellites were determined, which provide useful information to study the potential contribution of SSRs to gene regulation in giant panda. The polymorphic information content (PIC) values ranged from 0.293 to 0.789 with an average of 0.603 for the 16 UTR-derived SSRs. Interestingly, 4 CDS-derived microsatellites developed in our study were also polymorphic, and the instability of these 4 CDS-derived SSRs was further validated by re-genotyping and sequencing. The genes containing these 4 CDS-derived SSRs were embedded with various types of repeat motifs. The interaction of all the length-changing SSRs might provide a way to guard against coding region frameshifts caused by microsatellite instability. We hope these new gene-associated biomarkers will pave the way for genetic and biomedical studies for giant panda in the future. In sum, this set of transcriptome-derived markers complements the genetic resources available for giant panda. © The American Genetic Association. 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
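The polymorphic information content (PIC) values quoted above are usually computed from allele frequencies with the formula of Botstein et al. (1980): PIC = 1 − Σ pᵢ² − Σ_{i<j} 2 pᵢ² pⱼ². The abstract does not state which software or formula the authors used, so the following is only a sketch of that common definition; the function name `pic` and the example frequencies are hypothetical.

```python
def pic(allele_freqs):
    """Polymorphic information content for one locus (Botstein et al. 1980).

    allele_freqs: sequence of allele frequencies that sum to 1.
    """
    freqs = list(allele_freqs)
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Expected homozygosity term.
    sum_sq = sum(p * p for p in freqs)
    # Correction for uninformative matings between identical heterozygotes.
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - sum_sq - cross

if __name__ == "__main__":
    # Made-up frequencies for a four-allele tetranucleotide locus.
    print(round(pic([0.4, 0.3, 0.2, 0.1]), 3))
```

A monomorphic locus gives a PIC of 0, and a biallelic locus cannot exceed 0.375, so values such as the 0.603 average reported above imply loci with several reasonably common alleles.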
Markers and mapping revisited: finding your gene.
Jones, Neil; Ougham, Helen; Thomas, Howard; Pasakinskiene, Izolda
2009-01-01
This paper is an update of our earlier review (Jones et al., 1997, Markers and mapping: we are all geneticists now. New Phytologist 137: 165-177), which dealt with the genetics of mapping, in terms of recombination as the basis of the procedure, and covered some of the first generation of markers, including restriction fragment length polymorphisms (RFLPs), random amplified polymorphic DNA (RAPDs), simple sequence repeats (SSRs) and quantitative trait loci (QTLs). In the intervening decade there have been numerous developments in marker science with many new systems becoming available, which are herein described: cleavage amplification polymorphism (CAP), sequence-specific amplification polymorphism (S-SAP), inter-simple sequence repeat (ISSR), sequence tagged site (STS), sequence characterized amplification region (SCAR), selective amplification of microsatellite polymorphic loci (SAMPL), single nucleotide polymorphism (SNP), expressed sequence tag (EST), sequence-related amplified polymorphism (SRAP), target region amplification polymorphism (TRAP), microarrays, diversity arrays technology (DArT), single-strand conformation polymorphism (SSCP), denaturing gradient gel electrophoresis (DGGE), temperature gradient gel electrophoresis (TGGE) and methylation-sensitive PCR. In addition there has been an explosion of knowledge and databases in the area of genomics and bioinformatics. The number of flowering plant ESTs is c. 19 million and counting, with all the opportunity that this provides for gene-hunting, while the survey of bioinformatics and computer resources points to a rapid growth point for future activities in unravelling and applying the burst of new information on plant genomes. A case study is presented on tracking down a specific gene (stay-green (SGR), a post-transcriptional senescence regulator) using the full suite of mapping tools and comparative mapping resources. 
We end with a brief speculation on how genome analysis may progress into the future of this highly dynamic arena of plant science.
Bubier, Jason A.; Jay, Jeremy J.; Baker, Christopher L.; Bergeson, Susan E.; Ohno, Hiroshi; Metten, Pamela; Crabbe, John C.; Chesler, Elissa J.
2014-01-01
Extensive genetic and genomic studies of the relationship between alcohol drinking preference and withdrawal severity have been performed using animal models. Data from multiple such publications and public data resources have been incorporated in the GeneWeaver database with >60,000 gene sets including 285 alcohol withdrawal and preference-related gene sets. Among these is evidence for positional candidates regulating these behaviors in overlapping quantitative trait loci (QTL) mapped in distinct mouse populations. Combinatorial integration of functional genomics experimental results revealed a single QTL positional candidate gene in one of the loci common to both preference and withdrawal. Functional validation studies in Ap3m2 knockout mice confirmed these relationships. Genetic validation involves confirming the existence of segregating polymorphisms that could account for the phenotypic effect. By exploiting recent advances in mouse genotyping, sequence, epigenetics, and phylogeny resources, we confirmed that Ap3m2 resides in an appropriately segregating genomic region. We have demonstrated genetic and alcohol-induced regulation of Ap3m2 expression. Although sequence analysis revealed no polymorphisms in the Ap3m2-coding region that could account for all phenotypic differences, there are several upstream SNPs that could. We have identified one of these to be an H3K4me3 site that exhibits strain differences in methylation. Thus, by making cross-species functional genomics readily computable we identified a common QTL candidate for two related bio-behavioral processes via functional evidence and demonstrated sufficiency of the genetic locus as a source of variation underlying two traits. PMID:24923803
Bubier, Jason A; Jay, Jeremy J; Baker, Christopher L; Bergeson, Susan E; Ohno, Hiroshi; Metten, Pamela; Crabbe, John C; Chesler, Elissa J
2014-08-01
Extensive genetic and genomic studies of the relationship between alcohol drinking preference and withdrawal severity have been performed using animal models. Data from multiple such publications and public data resources have been incorporated in the GeneWeaver database with >60,000 gene sets including 285 alcohol withdrawal and preference-related gene sets. Among these is evidence for positional candidates regulating these behaviors in overlapping quantitative trait loci (QTL) mapped in distinct mouse populations. Combinatorial integration of functional genomics experimental results revealed a single QTL positional candidate gene in one of the loci common to both preference and withdrawal. Functional validation studies in Ap3m2 knockout mice confirmed these relationships. Genetic validation involves confirming the existence of segregating polymorphisms that could account for the phenotypic effect. By exploiting recent advances in mouse genotyping, sequence, epigenetics, and phylogeny resources, we confirmed that Ap3m2 resides in an appropriately segregating genomic region. We have demonstrated genetic and alcohol-induced regulation of Ap3m2 expression. Although sequence analysis revealed no polymorphisms in the Ap3m2-coding region that could account for all phenotypic differences, there are several upstream SNPs that could. We have identified one of these to be an H3K4me3 site that exhibits strain differences in methylation. Thus, by making cross-species functional genomics readily computable we identified a common QTL candidate for two related bio-behavioral processes via functional evidence and demonstrated sufficiency of the genetic locus as a source of variation underlying two traits. Copyright © 2014 by the Genetics Society of America.
DNA typing in forensic medicine and in criminal investigations: a current survey.
Benecke, M
1997-05-01
Since 1985 DNA typing of biological material has become one of the most powerful tools for personal identification in forensic medicine and in criminal investigations [1-6]. Classical DNA "fingerprinting" is increasingly being replaced by polymerase chain reaction (PCR) based technology which detects very short polymorphic stretches of DNA [7-15]. DNA loci which forensic scientists study do not code for proteins, and they are spread over the whole genome [16, 17]. These loci are neutral, and few provide any information about individuals except for their identity. Minute amounts of biological material are sufficient for DNA typing. Many European countries are beginning to establish databases to store DNA profiles of crime scenes and known offenders. A brief overview is given of past and present DNA typing and the establishment of forensic DNA databases in Europe.
DNA typing in forensic medicine and in criminal investigations: a current survey
NASA Astrophysics Data System (ADS)
Benecke, Mark
Since 1985 DNA typing of biological material has become one of the most powerful tools for personal identification in forensic medicine and in criminal investigations [1-6]. Classical DNA "fingerprinting" is increasingly being replaced by polymerase chain reaction (PCR) based technology which detects very short polymorphic stretches of DNA [7-15]. DNA loci which forensic scientists study do not code for proteins, and they are spread over the whole genome [16, 17]. These loci are neutral, and few provide any information about individuals except for their identity. Minute amounts of biological material are sufficient for DNA typing. Many European countries are beginning to establish databases to store DNA profiles of crime scenes and known offenders. A brief overview is given of past and present DNA typing and the establishment of forensic DNA databases in Europe.
The insertional history of an active family of L1 retrotransposons in humans.
Boissinot, Stéphane; Entezam, Ali; Young, Lynn; Munson, Peter J; Furano, Anthony V
2004-07-01
As humans contain a currently active L1 (LINE-1) non-LTR retrotransposon family (Ta-1), the human genome database likely provides only a partial picture of Ta-1-generated diversity. We used a non-biased method to clone Ta-1 retrotransposon-containing loci from representatives of four ethnic populations. We obtained 277 distinct Ta-1 loci and identified an additional 67 loci in the human genome database. This collection represents approximately 90% of the Ta-1 population in the individuals examined and is thus more representative of the insertional history of Ta-1 than the human genome database, which lacked approximately 40% of our cloned Ta-1 elements. As both polymorphic and fixed Ta-1 elements are as abundant in the GC-poor genomic regions as in ancestral L1 elements, the enrichment of L1 elements in GC-poor areas is likely due to insertional bias rather than selection. Although the chromosomal distribution of Ta-1 inserts is generally a function of chromosomal length and gene density, chromosome 4 significantly deviates from this pattern and has been much more hospitable to Ta-1 insertions than any other chromosome. Also, the intra-chromosomal distribution of Ta-1 elements is not uniform. Ta-1 elements tend to cluster, and the maximal gaps between Ta-1 inserts are larger than would be expected from a model of uniform random insertion. Copyright 2004 Cold Spring Harbor Laboratory Press
Huel, René L. M.; Bašić, Lara; Madacki-Todorović, Kamelija; Smajlović, Lejla; Eminović, Izet; Berbić, Irfan; Miloš, Ana; Parsons, Thomas J.
2007-01-01
Aim To present a compendium of off-ladder alleles and other genotyping irregularities relating to rare/unexpected population genetic variation, observed in a large short tandem repeat (STR) database from Bosnia and Serbia. Methods DNA was extracted from blood stain cards relating to reference samples from a population of 32 800 individuals from Bosnia and Serbia, and typed using Promega’s PowerPlex®16 STR kit. Results A total of 31 distinct off-ladder alleles were observed in 10 of the 15 STR loci amplified from the PowerPlex®16 STR kit. Of these 31 alleles, 3 have not been previously reported. Furthermore, 16 instances of triallelic patterns were observed in 9 of the 15 loci. Primer binding site mismatches that affected amplification were observed in two loci, D5S818 and D8S1179. Conclusion Instances of deviations from manufacturer’s allelic ladders should be expected and caution taken to properly designate the correct alleles in large DNA databases. Particular care should be taken in kinship matching or paternity cases as incorrect designation of any of these deviations from allelic ladders could lead to false exclusions. PMID:17696304
Sakai, Hiroaki; Lee, Sung Shin; Tanaka, Tsuyoshi; Numa, Hisataka; Kim, Jungsok; Kawahara, Yoshihiro; Wakimoto, Hironobu; Yang, Ching-chia; Iwamoto, Masao; Abe, Takashi; Yamada, Yuko; Muto, Akira; Inokuchi, Hachiro; Ikemura, Toshimichi; Matsumoto, Takashi; Sasaki, Takuji; Itoh, Takeshi
2013-02-01
The Rice Annotation Project Database (RAP-DB, http://rapdb.dna.affrc.go.jp/) has been providing a comprehensive set of gene annotations for the genome sequence of rice, Oryza sativa (japonica group) cv. Nipponbare. Since the first release in 2005, RAP-DB has been updated several times along with the genome assembly updates. Here, we present our newest RAP-DB based on the latest genome assembly, Os-Nipponbare-Reference-IRGSP-1.0 (IRGSP-1.0), which was released in 2011. We detected 37,869 loci by mapping transcript and protein sequences of 150 monocot species. To provide plant researchers with highly reliable and up to date rice gene annotations, we have been incorporating literature-based manually curated data, and 1,626 loci currently incorporate literature-based annotation data, including commonly used gene names or gene symbols. Transcriptional activities are shown at the nucleotide level by mapping RNA-Seq reads derived from 27 samples. We also mapped the Illumina reads of a Japanese leading japonica cultivar, Koshihikari, and a Chinese indica cultivar, Guangluai-4, to the genome and show alignments together with the single nucleotide polymorphisms (SNPs) and gene functional annotations through a newly developed browser, Short-Read Assembly Browser (S-RAB). We have developed two satellite databases, Plant Gene Family Database (PGFD) and Integrative Database of Cereal Gene Phylogeny (IDCGP), which display gene family and homologous gene relationships among diverse plant species. RAP-DB and the satellite databases offer simple and user-friendly web interfaces, enabling plant and genome researchers to access the data easily and facilitating a broad range of plant research topics.
Xanthopoulou, Aliki; Ganopoulos, Ioannis; Psomopoulos, Fotis; Manioudaki, Maria; Moysiadis, Theodoros; Kapazoglou, Aliki; Osathanunkul, Maslin; Michailidou, Sofia; Kalivas, Apostolos; Tsaftaris, Athanasios; Nianiou-Obeidat, Irini; Madesis, Panagiotis
2017-07-30
The genetic basis of fruit size and shape was investigated for the first time in Cucurbita species and genetic loci associated with fruit morphology have been identified. Although extensive genomic resources are available at present for tomato (Solanum lycopersicum), cucumber (Cucumis sativus), melon (Cucumis melo) and watermelon (Citrullus lanatus), genomic databases for Cucurbita species are limited. Recently, our group reported the generation of pumpkin (Cucurbita pepo) transcriptome databases from two contrasting cultivars with extreme fruit sizes. In the current study we used these databases to perform comparative transcriptome analysis in order to identify genes with potential roles in fruit morphology and fruit size. Differential Gene Expression (DGE) analysis between cv. 'Munchkin' (small-fruit) and cv. 'Big Moose' (large-fruit) revealed a variety of candidate genes associated with fruit morphology with significant differences in gene expression between the two cultivars. In addition, we have set the framework for generating EST-SSR markers, which discriminate different C. pepo cultivars and show transferability to related Cucurbitaceae species. The results of the present study will contribute to both further understanding the molecular mechanisms regulating fruit morphology and furthermore identifying the factors that determine fruit size. Moreover, they may lead to the development of molecular marker tools for selecting genotypes with desired morphological traits. Copyright © 2017. Published by Elsevier B.V.
10. international mouse genome conference
DOE Office of Scientific and Technical Information (OSTI.GOV)
Meisler, M.H.
Ten years after hosting the First International Mammalian Genome Conference in Paris in 1986, Dr. Jean-Louis Guenet presided over the Tenth Conference at the Pasteur Institute, October 7-10, 1996. The 1986 conference was a satellite to the Human Gene Mapping Workshop and had approximately 50 attendees. The 1996 meeting was attended by 300 scientists from around the world. In the interim, the number of mapped loci in the mouse increased from 1,000 to over 20,000. This report contains a listing of the program and its participants, and two articles that review the meeting and the role of the laboratory mouse in the Human Genome project. More than 200 papers were presented at the conference covering the following topics: International mouse chromosome committee meetings; Mutant generation and identification; Physical and genetic maps; New technology and resources; Chromatin structure and gene regulation; Rat and hamster genetic maps; Informatics and databases; and Quantitative trait analysis.
Warburton, Marilyn L; Williams, William Paul; Hawkins, Leigh; Bridges, Susan; Gresham, Cathy; Harper, Jonathan; Ozkan, Seval; Mylroie, J Erik; Shan, Xueyan
2011-07-01
A public candidate gene testing pipeline for resistance to aflatoxin accumulation or Aspergillus flavus infection in maize is presented here. The pipeline consists of steps for identifying, testing, and verifying the association of selected maize gene sequences with resistance under field conditions. Resources include a database of genetic and protein sequences associated with the reduction in aflatoxin contamination from previous studies; eight diverse inbred maize lines for polymorphism identification within any maize gene sequence; four Quantitative Trait Loci (QTL) mapping populations and one association mapping panel, all phenotyped for aflatoxin accumulation resistance and associated phenotypes; and capacity for Insertion/Deletion (InDel) and SNP genotyping in the population(s) for mapping. To date, ten genes have been identified as possible candidate genes and put through the candidate gene testing pipeline, and results are presented here to demonstrate the utility of the pipeline.
Haplotype data for 23 Y-chromosome markers in a reference sample from Bosnia and Herzegovina.
Kovačević, Lejla; Fatur-Cerić, Vera; Hadzic, Negra; Čakar, Jasmina; Primorac, Dragan; Marjanović, Damir
2013-06-01
To detect polymorphisms of 23 Y-chromosomal short tandem repeat (STR) loci, including 6 new loci, in a reference database of male population of Bosnia and Herzegovina, as well as to assess the importance of increasing the number of Y-STR loci utilized in forensic DNA analysis. The reference sample consisted of 100 healthy, unrelated men originating from Bosnia and Herzegovina. Sample collection using buccal swabs was performed in all geographical regions of Bosnia and Herzegovina in the period from 2010 to 2011. DNA samples were typed for 23 Y STR loci, including 6 new loci: DYS576, DYS481, DYS549, DYS533, DYS570, and DYS643, which are included in the new PowerPlex® Y 23 amplification kit. The absolute frequency of generated haplotypes was calculated and results showed that 98 samples had unique Y 23 haplotypes, and that only two samples shared the same haplotype. The most polymorphic locus was DYS418, with 14 detected alleles and the least polymorphic loci were DYS389I, DYS391, DYS437, and DYS393. This study showed that increasing the number of highly polymorphic Y STR markers, to include those tested in our analysis, leads to a reduction of repeating haplotypes, which is very important in the application of forensic DNA analysis.
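The haplotype counts reported above (100 men, 98 unique haplotypes, one haplotype shared by two men) are enough to work out two summary statistics commonly quoted for Y-STR panels. The sketch below is an illustration only: it assumes Nei's unbiased haplotype diversity, H = n/(n-1) * (1 - sum of squared haplotype frequencies), and defines discrimination capacity as distinct haplotypes divided by sample size. The haplotype labels are placeholders, and neither statistic is claimed to be the one the authors reported.

```python
from collections import Counter

def haplotype_stats(haplotypes):
    """Return (discrimination capacity, Nei's unbiased haplotype diversity)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    # Discrimination capacity: distinct haplotypes / number of samples
    dc = len(counts) / n
    # Nei's diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    hd = n / (n - 1) * (1 - sum_sq)
    return dc, hd

# 98 singleton haplotypes plus one haplotype observed twice (placeholder labels)
sample = [f"hap{i}" for i in range(98)] + ["shared", "shared"]
dc, hd = haplotype_stats(sample)
print(f"discrimination capacity = {dc:.2f}, haplotype diversity = {hd:.4f}")
```

With these counts there are 99 distinct haplotypes among 100 samples, so both values sit close to 1, which is the sense in which adding polymorphic markers reduces repeated haplotypes.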
Mutation rates for 20 STR loci in a population from São Paulo state, Southeast, Brazil.
Martinez, Juliana; Braganholi, Danilo Faustino; Ambrósio, Isabela Brunelli; Polverari, Fernanda Silva; Cicarelli, Regina Maria Barretto
2017-11-01
Short tandem repeats (STRs) are genetic markers largely employed in forensic analysis and paternity investigation cases. When an inconsistency between the parent and child is considered as a possible mutation, the mutation rate should be incorporated into paternity index calculations to give a robust result and to reduce the chance of misinterpretation. The aim of this study was to estimate the mutation rates of 20 autosomal STR loci used for paternity tests. In these loci we analysed 29,831 parent-child allelic transfers from 929 duo or trio paternity tests carried out during 2012-2016 from São Paulo State, Brazil. We identified 35 mutations in 16 loci, and they were more frequent in the paternal germline compared to the maternal germline. The loci with the highest rate were vWA and FGA and the ones with the lowest rate were PENTA E, PENTA D, D21S11, D7S820 and D6S1043. We did not identify any mutation in D2S1338, TH01, TPOX and D16S539 loci. All mutations consisted of losses or gains of one repeat unit. Mutation rates found in the São Paulo population have peculiarities, which justifies the use of regional databases in laboratories.
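As a worked illustration of the arithmetic behind a pooled mutation rate, the sketch below divides the 35 mutations by the 29,831 allelic transfers stated above and attaches a Wilson score interval. The pooled figure is derived here from those two counts and is not quoted from the study; the choice of the Wilson interval and the function name are assumptions made for the example. Per-locus rates would need per-locus counts, which the abstract does not give.

```python
import math

def mutation_rate(mutations, transfers, z=1.96):
    """Point estimate and Wilson score interval for a per-transfer mutation rate."""
    p = mutations / transfers
    denom = 1 + z**2 / transfers
    centre = (p + z**2 / (2 * transfers)) / denom
    half = z * math.sqrt(p * (1 - p) / transfers + z**2 / (4 * transfers**2)) / denom
    return p, centre - half, centre + half

# Counts taken from the abstract: 35 mutations in 29,831 allelic transfers
rate, low, high = mutation_rate(35, 29831)
print(f"pooled rate = {rate:.2e} (95% interval {low:.2e} to {high:.2e})")
```

The pooled estimate is on the order of one mutation per thousand transfers; a laboratory would substitute locus-specific counts from its own regional database.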
Liu, Luxian; Jin, Xinjie; Chen, Nan; Li, Xian; Li, Pan; Fu, Chengxin
2015-01-01
Phylogenetic relationships among Chinese species of Morella (Myricaceae) are unresolved. Here, we use restriction site-associated DNA sequencing (RAD-seq) to identify candidate loci that will help in determining phylogenetic relationships among Morella rubra, M. adenophora, M. nana and M. esculenta. Three methods for inferring phylogeny, maximum parsimony (MP), maximum likelihood (ML) and Bayesian concordance, were applied to data sets including as many as 4253 RAD loci with 8360 parsimony informative variable sites. All three methods significantly favored the topology of (((M. rubra, M. adenophora), M. nana), M. esculenta). Two species from North America (M. cerifera and M. pensylvanica) were placed as sister to the four Chinese species. According to BEAST analysis, we deduced speciation of M. rubra to be at about the Miocene-Pliocene boundary (5.28 Ma). Intraspecific divergence in M. rubra occurred in the late Pliocene (3.39 Ma). From pooled data, we assembled 29378, 21902 and 23552 de novo contigs with an average length of 229, 234 and 234 bp for M. rubra, M. nana and M. esculenta respectively. The contigs were used to investigate functional classification of RAD tags in a BLASTX search. Additionally, we identified 3808 unlinked SNP sites across the four populations of M. rubra and discovered genes associated with fruit ripening and senescence, fruit quality and disease/defense metabolism based on KEGG database. PMID:26431030
Genome wide association mapping for grain shape traits in indica rice.
Feng, Yue; Lu, Qing; Zhai, Rongrong; Zhang, Mengchen; Xu, Qun; Yang, Yaolong; Wang, Shan; Yuan, Xiaoping; Yu, Hanyong; Wang, Yiping; Wei, Xinghua
2016-10-01
Using genome-wide association mapping, 47 SNPs within 27 significant loci were identified for four grain shape traits, and 424 candidate genes were predicted from a public database. Grain shape is a key determinant of grain yield and quality in rice (Oryza sativa L.). However, our knowledge of genes controlling rice grain shape remains limited. Genome-wide association mapping based on linkage disequilibrium (LD) has recently emerged as an effective approach for identifying genes or quantitative trait loci (QTL) underlying complex traits in plants. In this study, association mapping based on 5291 single nucleotide polymorphisms (SNPs) was conducted to identify significant loci associated with grain shape traits in a global collection of 469 diverse rice accessions. A total of 47 SNPs were located in 27 significant loci for four grain traits, and explained ~44.93-65.90 % of the phenotypic variation for each trait. In total, 424 candidate genes within a 200 kb extension region (±100 kb of each locus) of these loci were predicted. Of them, the cloned genes GS3 and qSW5 showed very strong effects on grain length and grain width in our study. Compared with previously reported QTLs for grain shape traits, we found 11 novel loci, including 3, 3, 2 and 3 loci for grain length, grain width, grain length-width ratio and thousand grain weight, respectively. Validation of these new loci will be performed in future studies. These results revealed that besides GS3 and qSW5, multiple novel loci and mechanisms were involved in determining rice grain shape. These findings provided valuable information for understanding of the genetic control of grain shape and molecular marker-assisted selection (MAS) breeding in rice.
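The candidate-gene step described above (genes within ±100 kb of each significant locus) amounts to an interval-overlap query. The following is a minimal sketch under stated assumptions: the gene names and coordinates are hypothetical, a single chromosome is assumed, and a locus is treated as a single base-pair position. It is not the authors' pipeline.

```python
def genes_near_locus(locus_pos, genes, flank=100_000):
    """Return names of genes overlapping [locus_pos - flank, locus_pos + flank]."""
    start, end = locus_pos - flank, locus_pos + flank
    return [name for name, g_start, g_end in genes
            if g_start <= end and g_end >= start]

# Hypothetical genes on one chromosome: (name, start, end) in bp
genes = [
    ("geneA", 50_000, 60_000),
    ("geneB", 180_000, 195_000),
    ("geneC", 400_000, 410_000),
]
hits = genes_near_locus(150_000, genes)
print(hits)
```

A gene counts as a candidate if any part of it falls inside the window, so genes straddling the window edge are included.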
Exploring Genetic, Genomic, and Phenotypic Data at the Rat Genome Database
Laulederkind, Stanley J. F.; Hayman, G. Thomas; Wang, Shur-Jen; Lowry, Timothy F.; Nigam, Rajni; Petri, Victoria; Smith, Jennifer R.; Dwinell, Melinda R.; Jacob, Howard J.; Shimoyama, Mary
2013-01-01
The laboratory rat, Rattus norvegicus, is an important model of human health and disease, and experimental findings in the rat have relevance to human physiology and disease. The Rat Genome Database (RGD, http://rgd.mcw.edu) is a model organism database that provides access to a wide variety of curated rat data including disease associations, phenotypes, pathways, molecular functions, biological processes and cellular components for genes, quantitative trait loci, and strains. We present an overview of the database followed by specific examples that can be used to gain experience in employing RGD to explore the wealth of functional data available for the rat. PMID:23255149
Glenn, Travis C; Lance, Stacey L; McKee, Anna M; Webster, Bonnie L; Emery, Aidan M; Zerlotini, Adhemar; Oliveira, Guilherme; Rollinson, David; Faircloth, Brant C
2013-10-17
Urogenital schistosomiasis caused by Schistosoma haematobium is widely distributed across Africa and is increasingly being targeted for control. Genome sequences and population genetic parameters can give insight into the potential for population- or species-level drug resistance. Microsatellite DNA loci are genetic markers in wide use by Schistosoma researchers, but there are few primers available for S. haematobium. We sequenced 1,058,114 random DNA fragments from clonal cercariae collected from a snail infected with a single Schistosoma haematobium miracidium. We assembled and aligned the S. haematobium sequences to the genomes of S. mansoni and S. japonicum, identifying microsatellite DNA loci across all three species and designing primers to amplify the loci in S. haematobium. To validate our primers, we screened 32 randomly selected primer pairs with population samples of S. haematobium. We designed >13,790 primer pairs to amplify unique microsatellite loci in S. haematobium (available at http://www.cebio.org/projetos/schistosoma-haematobium-genome). The three Schistosoma genomes contained similar overall frequencies of microsatellites, but the frequency and length distributions of specific motifs differed among species. We identified 15 primer pairs that amplified consistently and were easily scored. We genotyped these 15 loci in S. haematobium individuals from six locations: Zanzibar had the highest levels of diversity; Malawi, Mauritius, Nigeria, and Senegal were nearly as diverse; but the sample from South Africa was much less diverse. About half of the primers in the database of Schistosoma haematobium microsatellite DNA loci should yield amplifiable and easily scored polymorphic markers, thus providing thousands of potential markers. Sequence conservation among S. haematobium, S. japonicum, and S. mansoni is relatively high, thus it should now be possible to identify markers that are universal among Schistosoma species (i.e., using DNA sequences conserved among species), as well as other markers that are specific to species or species-groups (i.e., using DNA sequences that differ among species). Full genome-sequencing of additional species and specimens of S. haematobium, S. japonicum, and S. mansoni is desirable to better characterize differences within and among these species, to develop additional genetic markers, and to examine genes as well as conserved non-coding elements associated with drug resistance.
An Enhanced Linkage Map of the Sheep Genome Comprising More Than 1000 Loci
Maddox, Jillian F.; Davies, Kizanne P.; Crawford, Allan M.; Hulme, Dennis J.; Vaiman, Daniel; Cribiu, Edmond P.; Freking, Bradley A.; Beh, Ken J.; Cockett, Noelle E.; Kang, Nina; Riffkin, Christopher D.; Drinkwater, Roger; Moore, Stephen S.; Dodds, Ken G.; Lumsden, Joanne M.; van Stijn, Tracey C.; Phua, Sin H.; Adelson, David L.; Burkin, Heather R.; Broom, Judith E.; Buitkamp, Johannes; Cambridge, Lisa; Cushwa, William T.; Gerard, Emily; Galloway, Susan M.; Harrison, Blair; Hawken, Rachel J.; Hiendleder, Stefan; Henry, Hannah M.; Medrano, Juan F.; Paterson, Korena A.; Schibler, Laurent; Stone, Roger T.; van Hest, Beryl
2001-01-01
A medium-density linkage map of the ovine genome has been developed. Marker data for 550 new loci were generated and merged with the previous sheep linkage map. The new map comprises 1093 markers representing 1062 unique loci (941 anonymous loci, 121 genes) and spans 3500 cM (sex-averaged) for the autosomes and 132 cM (female) on the X chromosome. There is an average spacing of 3.4 cM between autosomal loci and 8.3 cM between highly polymorphic [polymorphic information content (PIC) ≥ 0.7] autosomal loci. The largest gap between markers is 32.5 cM, and the number of gaps of >20 cM between loci, or regions where loci are missing from chromosome ends, has been reduced from 40 in the previous map to 6. Five hundred and seventy-three of the loci can be ordered on a framework map with odds of >1000 : 1. The sheep linkage map contains strong links to both the cattle and goat maps. Five hundred and seventy-two of the loci positioned on the sheep linkage map have also been mapped by linkage analysis in cattle, and 209 of the loci mapped on the sheep linkage map have also been placed on the goat linkage map. Inspection of ruminant linkage maps indicates that the genomic coverage by the current sheep linkage map is comparable to that of the available cattle maps. The sheep map provides a valuable resource to the international sheep, cattle, and goat gene mapping community. PMID:11435411
Feng, Chunmei; Wang, Xin; Wang, Xiaolong; Yu, Hao; Zhang, Guohua
2018-03-01
We investigated the frequencies of 15 autosomal STR loci in the Kazak population of the Ili Kazakh Autonomous Prefecture with the aim of expanding the available population information in human genetic databases and for forensic DNA analysis. Genetic polymorphisms of 15 autosomal short tandem repeat (STR) loci were analysed in 456 individuals of the Kazak population from Ili Kazakh Autonomous Prefecture, northwestern China. A total of 173 alleles at 15 autosomal STR loci were found; the allele frequencies ranged from 0.0011 to 0.5022. The combined power of discrimination and exclusion statistics for the 15 STR loci were 0.999 999 999 85 and 0.999 998 800 65, respectively. In addition, phylogenetic analysis involving the Ili Uygur population and other relevant populations was carried out. A neighbour-joining tree and multidimensional scaling plot were generated based on Nei's standard genetic distance. Results of the population comparison indicated that the Ili Uygur population was most closely related genetically to the Uygur populations from other regions in China. These findings are consistent with the historical and geographic backgrounds of these populations.
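Combined statistics such as those quoted above are conventionally obtained by multiplying the per-locus complements: combined power = 1 - product over loci of (1 - p_i), which assumes the loci are independent. The sketch below shows that formula only. The per-locus values are hypothetical, since the abstract reports just the combined figures, and the function name is an assumption.

```python
import math

def combined_power(per_locus):
    """Combine independent per-locus probabilities as 1 - prod(1 - p_i)."""
    return 1 - math.prod(1 - p for p in per_locus)

# Hypothetical per-locus powers of discrimination (not taken from the study)
pd_values = [0.90, 0.95, 0.92]
cpd = combined_power(pd_values)
print(f"combined power of discrimination = {cpd:.4f}")
```

The same function applies to per-locus powers of exclusion. Each added locus multiplies the residual probability by a factor below 1, which is why 15 loci can push the combined value to many nines.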
Integrated database for identifying candidate genes for Aspergillus flavus resistance in maize
2010-01-01
Background Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent aflatoxin accumulation is generally considered an efficient means of reducing grain losses to aflatoxin. Different proteomic, genomic and genetic studies of maize (Zea mays L.) have generated large data sets with the goal of identifying genes responsible for conferring resistance to A. flavus, or aflatoxin. Results In order to maximize the usage of different data sets in new studies, including association mapping, we have constructed a relational database with web interface integrating the results of gene expression, proteomic (both gel-based and shotgun), Quantitative Trait Loci (QTL) genetic mapping studies, and sequence data from the literature to facilitate selection of candidate genes for continued investigation. The Corn Fungal Resistance Associated Sequences Database (CFRAS-DB) (http://agbase.msstate.edu/) was created with the main goal of identifying genes important to aflatoxin resistance. CFRAS-DB is implemented using MySQL as the relational database management system running on a Linux server, using an Apache web server, and Perl CGI scripts as the web interface. The database and the associated web-based interface allow researchers to examine many lines of evidence (e.g. microarray, proteomics, QTL studies, SNP data) to assess the potential role of a gene or group of genes in the response of different maize lines to A. flavus infection and subsequent production of aflatoxin by the fungus. Conclusions CFRAS-DB provides the first opportunity to integrate data pertaining to the problem of A. flavus and aflatoxin resistance in maize in one resource and to support queries across different datasets. 
The web-based interface gives researchers different query options for mining the database across different types of experiments. The database is publicly available at http://agbase.msstate.edu. PMID:20946609
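To make the idea of querying across several lines of evidence concrete, here is a small relational sketch. Everything in it is an assumption for illustration: the two-table schema, the gene identifiers, and the evidence labels are hypothetical, and Python's built-in sqlite3 is swapped in for the MySQL server named in the abstract so that the example runs without any setup. It does not reflect the actual CFRAS-DB schema.

```python
import sqlite3

# In-memory database with a hypothetical two-table schema
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (gene_id TEXT PRIMARY KEY);
CREATE TABLE evidence (
    gene_id TEXT REFERENCES gene(gene_id),
    source  TEXT  -- e.g. 'microarray', 'proteomics', 'QTL', 'SNP'
);
""")
conn.executemany("INSERT INTO gene VALUES (?)", [("g1",), ("g2",), ("g3",)])
conn.executemany("INSERT INTO evidence VALUES (?, ?)", [
    ("g1", "microarray"), ("g1", "QTL"), ("g1", "proteomics"),
    ("g2", "microarray"),
    ("g3", "QTL"), ("g3", "SNP"),
])

def genes_with_min_evidence(conn, min_sources):
    """Return (gene_id, n_sources) for genes backed by at least min_sources evidence types."""
    rows = conn.execute("""
        SELECT gene_id, COUNT(DISTINCT source) AS n_sources
        FROM evidence
        GROUP BY gene_id
        HAVING COUNT(DISTINCT source) >= ?
        ORDER BY n_sources DESC, gene_id
    """, (min_sources,))
    return rows.fetchall()

candidates = genes_with_min_evidence(conn, 2)
print(candidates)
```

Ranking genes by the number of independent evidence types is one plausible way to shortlist candidates; the real database may weight or combine evidence differently.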
Integrated database for identifying candidate genes for Aspergillus flavus resistance in maize.
Kelley, Rowena Y; Gresham, Cathy; Harper, Jonathan; Bridges, Susan M; Warburton, Marilyn L; Hawkins, Leigh K; Pechanova, Olga; Peethambaran, Bela; Pechan, Tibor; Luthe, Dawn S; Mylroie, J E; Ankala, Arunkanth; Ozkan, Seval; Henry, W B; Williams, W P
2010-10-07
Aspergillus flavus Link:Fr, an opportunistic fungus that produces aflatoxin, is pathogenic to maize and other oilseed crops. Aflatoxin is a potent carcinogen, and its presence markedly reduces the value of grain. Understanding and enhancing host resistance to A. flavus infection and/or subsequent aflatoxin accumulation is generally considered an efficient means of reducing grain losses to aflatoxin. Different proteomic, genomic and genetic studies of maize (Zea mays L.) have generated large data sets with the goal of identifying genes responsible for conferring resistance to A. flavus, or aflatoxin. In order to maximize the usage of different data sets in new studies, including association mapping, we have constructed a relational database with web interface integrating the results of gene expression, proteomic (both gel-based and shotgun), Quantitative Trait Loci (QTL) genetic mapping studies, and sequence data from the literature to facilitate selection of candidate genes for continued investigation. The Corn Fungal Resistance Associated Sequences Database (CFRAS-DB) (http://agbase.msstate.edu/) was created with the main goal of identifying genes important to aflatoxin resistance. CFRAS-DB is implemented using MySQL as the relational database management system running on a Linux server, using an Apache web server, and Perl CGI scripts as the web interface. The database and the associated web-based interface allow researchers to examine many lines of evidence (e.g. microarray, proteomics, QTL studies, SNP data) to assess the potential role of a gene or group of genes in the response of different maize lines to A. flavus infection and subsequent production of aflatoxin by the fungus. CFRAS-DB provides the first opportunity to integrate data pertaining to the problem of A. flavus and aflatoxin resistance in maize in one resource and to support queries across different datasets. 
The web-based interface gives researchers different query options for mining the database across different types of experiments. The database is publicly available at http://agbase.msstate.edu.
Van Neste, Christophe; Vandewoestyne, Mado; Van Criekinge, Wim; Deforce, Dieter; Van Nieuwerburgh, Filip
2014-03-01
Forensic scientists are currently investigating how to transition from capillary electrophoresis (CE) to massive parallel sequencing (MPS) for analysis of forensic DNA profiles. MPS offers several advantages over CE such as virtually unlimited multiplexy of loci, combining both short tandem repeat (STR) and single nucleotide polymorphism (SNP) loci, small amplicons without constraints of size separation, more discrimination power, deep mixture resolution and sample multiplexing. We present our bioinformatic framework My-Forensic-Loci-queries (MyFLq) for analysis of MPS forensic data. For allele calling, the framework uses a MySQL reference allele database with automatically determined regions of interest (ROIs) by a generic maximal flanking algorithm which makes it possible to use any STR or SNP forensic locus. Python scripts were designed to automatically make allele calls starting from raw MPS data. We also present a method to assess the usefulness and overall performance of a forensic locus with respect to MPS, as well as methods to estimate whether an unknown allele, which sequence is not present in the MySQL database, is in fact a new allele or a sequencing error. The MyFLq framework was applied to an Illumina MiSeq dataset of a forensic Illumina amplicon library, generated from multilocus STR polymerase chain reaction (PCR) on both single contributor samples and multiple person DNA mixtures. Although the multilocus PCR was not yet optimized for MPS in terms of amplicon length or locus selection, the results show excellent results for most loci. The results show a high signal-to-noise ratio, correct allele calls, and a low limit of detection for minor DNA contributors in mixed DNA samples. Technically, forensic MPS affords great promise for routine implementation in forensic genomics. The method is also applicable to adjacent disciplines such as molecular autopsy in legal medicine and in mitochondrial DNA research. Copyright © 2013 The Authors. 
Published by Elsevier Ireland Ltd. All rights reserved.
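The allele-calling idea described above (locate a region of interest between flanking sequences, then look the region up in a reference allele table) can be sketched in a few lines. This is a deliberately simplified illustration and not the MyFLq implementation: it uses exact string matching for the flanks, a Python dictionary in place of the MySQL allele database, and made-up flank and repeat sequences.

```python
def extract_roi(read, left_flank, right_flank):
    """Return the sequence between the two flanks, or None if either flank is missing."""
    i = read.find(left_flank)
    if i == -1:
        return None
    start = i + len(left_flank)
    j = read.find(right_flank, start)
    if j == -1:
        return None
    return read[start:j]

def call_allele(read, left_flank, right_flank, known_alleles):
    """Return the allele name for a read, 'unknown' for an unseen ROI, None if no ROI."""
    roi = extract_roi(read, left_flank, right_flank)
    if roi is None:
        return None
    return known_alleles.get(roi, "unknown")

# Hypothetical reference alleles for one STR locus, keyed by ROI sequence
known = {"GATA" * 3: "3", "GATA" * 4: "4"}
read = "ACGTTC" + "GATA" * 4 + "CCTGAA"
call = call_allele(read, "ACGTTC", "CCTGAA", known)
print(call)
```

A read whose region of interest is absent from the table comes back as "unknown", which is the case the abstract says needs a separate judgment between a genuinely new allele and a sequencing error.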
Forsythe, Stephen J; Dickins, Benjamin; Jolley, Keith A
2014-12-16
Following the association of Cronobacter spp. with several publicized fatal outbreaks in neonatal intensive care units of meningitis and necrotising enterocolitis, the World Health Organization (WHO) in 2004 requested the establishment of a molecular typing scheme to enable the international control of the organism. This paper presents the application of Next Generation Sequencing (NGS) to Cronobacter which has led to the establishment of the Cronobacter PubMLST genome and sequence definition database (http://pubmlst.org/cronobacter/) containing over 1000 isolates with metadata along with the recognition of specific clonal lineages linked to neonatal meningitis and adult infections. Whole genome sequencing and multilocus sequence typing (MLST) support the formal recognition of the genus Cronobacter composed of seven species to replace the former single species Enterobacter sakazakii. Applying the 7-loci MLST scheme to 1007 strains revealed 298 definable sequence types, yet only C. sakazakii clonal complex 4 (CC4) was principally associated with neonatal meningitis. This clonal lineage has been confirmed using ribosomal-MLST (51-loci) and whole genome-MLST (1865 loci) to analyse 107 whole genomes via the Cronobacter PubMLST database. This database has enabled the retrospective analysis of historic cases and outbreaks following re-identification of those strains. The Cronobacter PubMLST database offers a central, open access, reliable sequence-based repository for researchers. It has the capacity to create new analysis schemes 'on the fly', and to integrate metadata (source, geographic distribution, clinical presentation). It is also expandable and adaptable to changes in taxonomy, and able to support the development of reliable detection methods of use to industry and regulatory authorities. Therefore it meets the WHO (2004) request for the establishment of a typing scheme for this emergent bacterial pathogen. 
Whole genome sequencing has additionally shown a range of potential virulence and environmental fitness traits which may account for the association of C. sakazakii CC4 pathogenicity, and propensity for neonatal CNS.
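In a 7-locus MLST scheme of the kind described above, each isolate is reduced to a tuple of seven allele numbers and a sequence type is simply a label for a distinct tuple. The sketch below shows that lookup with placeholder data. The profiles and the "ST-A"/"ST-B" labels are hypothetical and are not entries from the Cronobacter PubMLST database, and the function name is an assumption.

```python
def sequence_type(profile, scheme):
    """Look up the sequence type for a 7-locus allele profile; None if the profile is unseen."""
    if len(profile) != 7:
        raise ValueError("expected one allele number per locus (7 loci)")
    return scheme.get(tuple(profile))

# Hypothetical scheme: allele-number profiles mapped to sequence-type labels
scheme = {
    (1, 1, 1, 1, 1, 1, 1): "ST-A",
    (1, 1, 1, 1, 1, 1, 2): "ST-B",
}

isolates = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 2],
    [1, 1, 1, 1, 1, 1, 1],
    [3, 1, 1, 1, 1, 1, 1],  # profile not yet defined in the scheme
]
assigned = [sequence_type(p, scheme) for p in isolates]
n_distinct_profiles = len({tuple(p) for p in isolates})
print(assigned, n_distinct_profiles)
```

Counting distinct tuples is how a figure such as "298 definable sequence types among 1007 strains" arises; an unseen profile would be submitted to the curated database for a new sequence-type number rather than labelled locally.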
da Costa Francez, Pablo Abdon; Rodrigues, Elzemar Martins Ribeiro; Frazão, Gleycianne Furtado; dos Reis Borges, Nathalia Danielly; dos Santos, Sidney Emanuel Batista
2011-01-01
The allelic frequencies of 12 short tandem repeat loci were obtained from a sample of 307 unrelated individuals living in Macapá, a city in the northern Amazon region, Brazil. These loci are the most commonly used in forensics and paternity testing. Based on the allele frequency obtained for the population of Macapá, we estimated an interethnic admixture for the three parental groups (European, Native American and African) of, respectively, 46%, 35% and 19%. Comparing these allele frequencies with those of other Brazilian populations and of the Iberian Peninsula population, no significant distances were observed. The interpopulation genetic distances (FST coefficients) to the present database ranged from FST = 0.0016 between Macapá and Belém to FST = 0.0036 between Macapá and the Iberian Peninsula. PMID:21637540
Haplotype data for 23 Y-chromosome markers in a reference sample from Bosnia and Herzegovina
Kovačević, Lejla; Fatur-Cerić, Vera; Hadžić, Negra; Čakar, Jasmina; Primorac, Dragan; Marjanović, Damir
2013-01-01
Aim: To detect polymorphisms of 23 Y-chromosomal short tandem repeat (STR) loci, including 6 new loci, in a reference database of the male population of Bosnia and Herzegovina, as well as to assess the importance of increasing the number of Y-STR loci utilized in forensic DNA analysis. Methods: The reference sample consisted of 100 healthy, unrelated men originating from Bosnia and Herzegovina. Sample collection using buccal swabs was performed in all geographical regions of Bosnia and Herzegovina in the period from 2010 to 2011. DNA samples were typed for 23 Y STR loci, including 6 new loci: DYS576, DYS481, DYS549, DYS533, DYS570, and DYS643, which are included in the new PowerPlex® Y 23 amplification kit. Results: The absolute frequency of generated haplotypes was calculated and results showed that 98 samples had unique Y 23 haplotypes, and that only two samples shared the same haplotype. The most polymorphic locus was DYS418, with 14 detected alleles, and the least polymorphic loci were DYS389I, DYS391, DYS437, and DYS393. Conclusion: This study showed that increasing the number of highly polymorphic Y STR markers, to include those tested in our analysis, leads to a reduction of repeating haplotypes, which is very important in the application of forensic DNA analysis. PMID:23771760
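The haplotype counts reported above (100 men, 98 unique haplotypes, one haplotype shared by two men) can be turned into the summary statistics commonly quoted for Y-STR reference databases. The sketch below is illustrative only: the function name and placeholder haplotype labels are assumptions, and the abstract does not state which statistics the authors computed. It uses discrimination capacity (distinct haplotypes divided by sample size) and Nei's haplotype diversity, n/(n-1) × (1 − Σp²).

```python
from collections import Counter

def haplotype_stats(haplotypes):
    """Return (distinct haplotypes, discrimination capacity, haplotype diversity)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    distinct = len(counts)
    # Discrimination capacity: share of the sample made up of distinct haplotypes
    discrimination_capacity = distinct / n
    # Nei's haplotype diversity with the n/(n-1) small-sample correction
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    diversity = n / (n - 1) * (1 - sum_sq)
    return distinct, discrimination_capacity, diversity

# Hypothetical labels standing in for the real 23-locus haplotypes:
# 98 unique haplotypes plus one haplotype carried by two men.
sample = [f"H{i}" for i in range(98)] + ["shared", "shared"]
distinct, dc, hd = haplotype_stats(sample)
print(distinct, round(dc, 2), round(hd, 4))
```

Under these assumptions the sample contains 99 distinct haplotypes, so both statistics sit very close to 1, which is the sense in which adding loci reduces repeating haplotypes.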
Sinclair, Samuel Justin; Slavin-Mulford, Jenelle; Antonius, Daniel; Stein, Michelle B; Siefert, Caleb J; Haggerty, Greg; Malone, Johanna C; O'Keefe, Sheila; Blais, Mark A
2013-06-01
Research over the last decade has been promising in terms of the incremental utility of psychometric tools in predicting important clinical outcomes, such as mental health service utilization and inpatient psychiatric hospitalization. The purpose of this study was to develop and validate a new Level of Care Index (LOCI) from the Personality Assessment Inventory (PAI). Logistic regression was initially used in a development sample (n = 253) of psychiatric patients to identify unique PAI indicators associated with inpatient (n = 75) as opposed to outpatient (n = 178) status. Five PAI variables were ultimately retained (Suicidal Ideation, Antisocial Personality-Stimulus Seeking, Paranoia-Persecution, Negative Impression Management, and Depression-Affective) and were then aggregated into a single LOCI and independently evaluated in a second validation sample (n = 252). Results indicated the LOCI effectively differentiated inpatients from outpatients after controlling for demographic variables and was significantly associated with both internalizing and externalizing risk factors for psychiatric admission (range of ds = 0.46 for history of arrests to 0.88 for history of suicidal ideation). The LOCI was additionally found to be meaningfully associated with measures of normal personality, performance-based tests of psychological functioning, and measures of neurocognitive (executive) functioning. The clinical implications of these findings and potential utility of the LOCI are discussed. PsycINFO Database Record (c) 2013 APA, all rights reserved.
Vera, Manuel; Bello, Xabier; Álvarez-Dios, Jose-Antonio; Pardo, Belen G; Sánchez, Laura; Carlsson, Jens; Carlsson, Jeanette E L; Bartolomé, Carolina; Maside, Xulio; Martinez, Paulino
2015-12-01
The flat oyster (Ostrea edulis) is one of the most appreciated molluscs in Europe, but its production has been greatly reduced by the parasite Bonamia ostreae. Here, new generation genomic resources were used to analyse the repetitive fraction of the oyster genome, with the aim of developing molecular markers to face this main oyster production challenge. The resulting oyster database consists of two sets of 10,318 and 7159 unique contigs (4.8 Mbp and 6.8 Mbp in total length) representing the oyster's genome (WG) and haemocyte transcriptome (HT), respectively. A total of 1083 sequences were identified as TE-derived, which corresponded to 4.0% of WG and 1.1% of HT. They were clustered into 142 homology groups, most of which were assigned to the Penelope order of retrotransposons, and to the Helitron and TIR DNA-transposons. Simple repeats and rRNA pseudogenes also made a significant contribution to the oyster's genome (0.5% and 0.3% of WG and HT, respectively). The most frequent short tandem repeats identified in WG were tetranucleotide motifs, whereas trinucleotide motifs were the most frequent in HT. Forty identified microsatellite loci, 20 from each database, were selected for technical validation. Success was much lower among WG than HT microsatellites (15% vs 55%), which could reflect higher variation in anonymous regions interfering with primer annealing. All microsatellites developed adjusted to Hardy-Weinberg proportions and represent a useful tool to support future breeding programmes and to manage genetic resources of natural flat oyster beds. Copyright © 2015 Elsevier B.V. All rights reserved.
Estimating haplotype frequencies by combining data from large DNA pools with database information.
Gasbarra, Dario; Kulathinal, Sangita; Pirinen, Matti; Sillanpää, Mikko J
2011-01-01
We assume that allele frequency data have been extracted from several large DNA pools, each containing genetic material of up to hundreds of sampled individuals. Our goal is to estimate the haplotype frequencies among the sampled individuals by combining the pooled allele frequency data with prior knowledge about the set of possible haplotypes. Such prior information can be obtained, for example, from a database such as HapMap. We present a Bayesian haplotyping method for pooled DNA based on a continuous approximation of the multinomial distribution. The proposed method is applicable when the sizes of the DNA pools and/or the number of considered loci exceed the limits of several earlier methods. In the example analyses, the proposed model clearly outperforms a deterministic greedy algorithm on real data from the HapMap database. With a small number of loci, the performance of the proposed method is similar to that of an EM-algorithm, which uses a multinormal approximation for the pooled allele frequencies, but which does not utilize prior information about the haplotypes. The method has been implemented using Matlab and the code is available upon request from the authors.
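The quantity observed in a DNA pool is the per-locus allele frequency, which is a simple linear function of the underlying haplotype frequencies. The sketch below shows only that forward relationship, not the authors' Bayesian estimator; the function name, the 0/1 string encoding of haplotypes and the toy frequencies are all assumptions made for illustration.

```python
def expected_allele_frequencies(haplotype_freqs):
    """Expected frequency of allele '1' at each locus in a large pool.

    haplotype_freqs maps haplotype strings over {'0', '1'} (one character
    per locus) to haplotype frequencies that sum to 1.
    """
    n_loci = len(next(iter(haplotype_freqs)))
    freqs = [0.0] * n_loci
    for hap, f in haplotype_freqs.items():
        for j, allele in enumerate(hap):
            if allele == "1":
                # Each haplotype adds its frequency at loci where it carries '1'
                freqs[j] += f
    return freqs

# Toy two-locus example with three candidate haplotypes (hypothetical values)
pool = {"00": 0.5, "01": 0.2, "11": 0.3}
allele_freqs = expected_allele_frequencies(pool)
print(allele_freqs)
```

Estimating haplotype frequencies means inverting this mapping, which is generally underdetermined; that is where prior knowledge of the plausible haplotypes, for example from HapMap, becomes useful.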
Zhang, Kunpu; Wang, Junjun; Zhang, Liyi; Rong, Chaowu; Zhao, Fengwu; Peng, Tao; Li, Huimin; Cheng, Dongmei; Liu, Xin; Qin, Huanju; Zhang, Aimin; Tong, Yiping; Wang, Daowen
2013-01-01
Grain weight, an essential yield component, is under strong genetic control and markedly influenced by the environment. Here, by genome-wide association analysis with a panel of 94 elite common wheat varieties, 37 loci were found significantly associated with thousand-grain weight (TGW) in one or more environments differing in water and fertiliser levels. Five loci were stably associated with TGW under all 12 environments examined. Their elite alleles had positive effects on TGW. Four, two, three, and two loci were consistently associated with TGW in the irrigated and fertilised (IF), rainfed (RF), reduced nitrogen (RN), and reduced phosphorus (RP) environments. The elite alleles of the IF-specific loci enhanced TGW under well-resourced conditions, whereas those of the RF-, RN-, or RP-specific loci conferred tolerance to the TGW decrease when irrigation, nitrogen, or phosphorus were reduced. Moreover, the elite alleles of the environment-independent and -specific loci often acted additively to enhance TGW. Four additional loci were found associated with TGW in specific locations, one of which was shown to contribute to the TGW difference between two experimental sites. Further analysis of 14 associated loci revealed that nine affected both grain length and width, whereas the remaining loci influenced either grain length or width, indicating that these loci control grain weight by regulating kernel size. Finally, the elite allele of Xpsp3152 frequently co-segregated with the larger grain haplotype of TaGW2-6A, suggesting probable genetic and functional linkages between Xpsp3152 and GW2 that are important for grain weight control in cereal plants. Our study provides new knowledge on TGW control in elite common wheat lines, which may aid the improvement of wheat grain weight trait in further research. PMID:23469248
Yuan, Can; Peng, Fang; Yang, Ze-Mao; Zhong, Wen-Juan; Mou, Fang-Sheng; Gong, Yi-Yun; Ji, Pei-Cheng; Pu, De-Qiang; Huang, Hai-Yan; Yang, Xiao; Zhang, Chao
2017-09-01
Ligusticum chuanxiong is a well-known traditional Chinese medicinal plant, and the development of molecular markers and the study of its germplasm resources are important. In this study, we obtained 24 422 unigenes by assembling transcriptome sequencing reads of L. chuanxiong root. EST-SSRs were detected, and 4 073 SSR loci were identified. Analysis of EST-SSR distribution and characteristics showed that mono-nucleotide repeats were the main repeat type, accounting for 41.0%. In addition, the SSR-containing sequences were functionally annotated in Gene Ontology (GO) and KEGG pathway and were assigned to 49 GO categories and 242 KEGG pathways; among them, 2 201 sequences were annotated against the Nr database. Of 235 EST-SSRs validated, 74 primer pairs ultimately proved to give high-quality amplification. Subsequently, genetic diversity analysis, UPGMA cluster analysis, PCoA analysis and population structure analysis of 34 L. chuanxiong germplasm resources were carried out with the 74 primer pairs. In both the UPGMA tree and PCoA results, L. chuanxiong resources were clustered into two groups, which are believed to be partially related to their geographical distribution. In this study, EST-SSRs in L. chuanxiong were identified for the first time, and the newly developed molecular markers should contribute significantly to further genetic diversity studies, purity detection, gene mapping, and molecular breeding. Copyright© by the Chinese Pharmaceutical Association.
Filipino DNA variation at 12 X-chromosome short tandem repeat markers.
Salvador, Jazelyn M; Apaga, Dame Loveliness T; Delfin, Frederick C; Calacal, Gayvelline C; Dennis, Sheila Estacio; De Ungria, Maria Corazon A
2018-06-08
Demands for solving complex kinship scenarios where only distant relatives are available for testing have risen in the past years. In these instances, other genetic markers such as X-chromosome short tandem repeat (X-STR) markers are employed to supplement autosomal and Y-chromosomal STR DNA typing. However, prior to use, the degree of STR polymorphism in the population requires evaluation through generation of an allele or haplotype frequency population database. This population database is also used for statistical evaluation of DNA typing results. Here, we report X-STR data from 143 unrelated Filipino male individuals who were genotyped via conventional polymerase chain reaction-capillary electrophoresis (PCR-CE) using the 12 X-STR loci included in the Investigator® Argus X-12 kit (Qiagen) and via massively parallel sequencing (MPS) of seven X-STR loci included in the ForenSeq™ DNA Signature Prep kit of the MiSeq® FGx™ Forensic Genomics System (Illumina). Allele calls between PCR-CE and MPS systems were consistent (100% concordance) across seven overlapping X-STRs. Allele and haplotype frequencies and other parameters of forensic interest were calculated based on length (PCR-CE, 12 X-STRs) and sequence (MPS, seven X-STRs) variations observed in the population. Results of our study indicate that the 12 X-STRs in the PCR-CE system are highly informative for the Filipino population. MPS of seven X-STR loci identified 73 X-STR alleles compared with 55 X-STR alleles that were identified solely by length via PCR-CE. Of the 73 sequence-based alleles observed, six alleles have not been reported in the literature. The population data presented here may serve as a reference Philippine frequency database of X-STRs for forensic casework applications. Copyright © 2018 Elsevier B.V. All rights reserved.
Bioinformatic mining of EST-SSR loci in the Pacific oyster, Crassostrea gigas.
Wang, Y; Ren, R; Yu, Z
2008-06-01
A set of expressed sequence tag-simple sequence repeat (EST-SSR) markers of the Pacific oyster, Crassostrea gigas, was developed through bioinformatic mining of the GenBank public database. As of June 30, 2007, a total of 5132 EST sequences from GenBank were downloaded and screened for di-, tri- and tetra-nucleotide repeats, with criteria set at a minimum of 5, 4 and 4 repeats for the three categories of SSRs respectively. Seventeen polymorphic microsatellite markers were characterized. Allele numbers ranged from 3 to 10, and the observed and expected heterozygosity values varied from 0.125 to 0.770 and from 0.113 to 0.732 respectively. Eleven loci were at Hardy-Weinberg equilibrium (HWE); the other six loci showed significant departure from HWE (P < 0.01), suggesting possible presence of null alleles. Pairwise check of linkage disequilibrium (LD) indicated that 11 of 136 pairs of loci showed significant LD (P < 0.01), likely due to HWE present in single markers. Cross-species amplification was examined for five other Crassostrea species and reasonable results were obtained, promising usefulness of these markers in oyster genetics.
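The screening criteria described above (di-, tri- and tetra-nucleotide repeats with at least 5, 4 and 4 repeat units, respectively) can be expressed as a small regular-expression search. This is a minimal sketch, not the pipeline used in the study: the function name and the toy sequence are assumptions, and only perfect (uninterrupted) repeats are reported.

```python
import re

# Minimum repeat counts taken from the abstract: di >= 5, tri >= 4, tetra >= 4
MIN_REPEATS = {2: 5, 3: 4, 4: 4}

def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # A motif of `unit` bases followed by at least (min_rep - 1) more copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # homopolymer run, not a di/tri/tetra repeat
            if unit == 4 and motif[:2] == motif[2:]:
                continue  # really a dinucleotide repeat, e.g. ACAC
            hits.append((m.start(), motif, len(m.group(0)) // unit))
    return sorted(hits)

# Toy EST fragment (hypothetical): (AC)5 followed by (CAT)4
est = "TTG" + "AC" * 5 + "GG" + "CAT" * 4 + "TT"
ssrs = find_ssrs(est)
print(ssrs)
```

Dedicated SSR-mining tools also handle compound and imperfect repeats and report flanking sequence for primer design, which this sketch leaves out.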
Tromp, Gerard; Kuivaniemi, Helena; Gretarsdottir, Solveig; Baas, Annette F.; Giusti, Betti; Strauss, Ewa; van't Hof, Femke N.G.; Webb, Thomas R.; Erdman, Robert; Ritchie, Marylyn D.; Elmore, James R.; Verma, Anurag; Pendergrass, Sarah; Kullo, Iftikhar J.; Ye, Zi; Peissig, Peggy L.; Gottesman, Omri; Verma, Shefali S.; Malinowski, Jennifer; Rasmussen-Torvik, Laura J.; Borthwick, Kenneth M.; Smelser, Diane T.; Crosslin, David R.; de Andrade, Mariza; Ryer, Evan J.; McCarty, Catherine A.; Böttinger, Erwin P.; Pacheco, Jennifer A.; Crawford, Dana C.; Carrell, David S.; Gerhard, Glenn S.; Franklin, David P.; Carey, David J.; Phillips, Victoria L.; Williams, Michael J.A.; Wei, Wenhua; Blair, Ross; Hill, Andrew A.; Vasudevan, Thodor M.; Lewis, David R.; Thomson, Ian A.; Krysa, Jo; Hill, Geraldine B.; Roake, Justin; Merriman, Tony R.; Oszkinis, Grzegorz; Galora, Silvia; Saracini, Claudia; Abbate, Rosanna; Pulli, Raffaele; Pratesi, Carlo; Saratzis, Athanasios; Verissimo, Ana R.; Bumpstead, Suzannah; Badger, Stephen A.; Clough, Rachel E.; Cockerill, Gillian; Hafez, Hany; Scott, D. Julian A.; Futers, T. Simon; Romaine, Simon P.R.; Bridge, Katherine; Griffin, Kathryn J.; Bailey, Marc A.; Smith, Alberto; Thompson, Matthew M.; van Bockxmeer, Frank M.; Matthiasson, Stefan E.; Thorleifsson, Gudmar; Thorsteinsdottir, Unnur; Blankensteijn, Jan D.; Teijink, Joep A.W.; Wijmenga, Cisca; de Graaf, Jacqueline; Kiemeney, Lambertus A.; Lindholt, Jes S.; Hughes, Anne; Bradley, Declan T.; Stirrups, Kathleen; Golledge, Jonathan; Norman, Paul E.; Powell, Janet T.; Humphries, Steve E.; Hamby, Stephen E.; Goodall, Alison H.; Nelson, Christopher P.; Sakalihasan, Natzi; Courtois, Audrey; Ferrell, Robert E.; Eriksson, Per; Folkersen, Lasse; Franco-Cereceda, Anders; Eicher, John D.; Johnson, Andrew D.; Betsholtz, Christer; Ruusalepp, Arno; Franzén, Oscar; Schadt, Eric E.; Björkegren, Johan L.M.; Lipovich, Leonard; Drolet, Anne M.; Verhoeven, Eric L.; Zeebregts, Clark J.; Geelkerken, Robert H.; van Sambeek, Marc R.; van Sterkenburg, Steven M.; de Vries, Jean-Paul; Stefansson, Kari; Thompson, John R.; de Bakker, Paul I.W.; Deloukas, Panos; Sayers, Robert D.; Harrison, Seamus C.; van Rij, Andre M.; Samani, Nilesh J.
2017-01-01
Rationale: Abdominal aortic aneurysm (AAA) is a complex disease with both genetic and environmental risk factors. Together, 6 previously identified risk loci only explain a small proportion of the heritability of AAA. Objective: To identify additional AAA risk loci using data from all available genome-wide association studies. Methods and Results: Through a meta-analysis of 6 genome-wide association study data sets and a validation study totaling 10 204 cases and 107 766 controls, we identified 4 new AAA risk loci: 1q32.3 (SMYD2), 13q12.11 (LINC00540), 20q13.12 (near PCIF1/MMP9/ZNF335), and 21q22.2 (ERG). In various database searches, we observed no new associations between the lead AAA single nucleotide polymorphisms and coronary artery disease, blood pressure, lipids, or diabetes mellitus. Network analyses identified ERG, IL6R, and LDLR as modifiers of MMP9, with a direct interaction between ERG and MMP9. Conclusions: The 4 new risk loci for AAA seem to be specific for AAA compared with other cardiovascular diseases and related traits suggesting that traditional cardiovascular risk factor management may only have limited value in preventing the progression of aneurysmal disease. PMID:27899403
NASA Astrophysics Data System (ADS)
Zhan, Aibin; Bao, Zhenmin; Wang, Mingling; Chang, Dan; Yuan, Jian; Wang, Xiaolong; Hu, Xiaoli; Liang, Chengzhu; Hu, Jingjie
2008-05-01
The EST database of the Pacific abalone (Haliotis discus) was mined for developing microsatellite markers. A total of 1476 EST sequences were registered in GenBank when data mining was performed. Fifty sequences (approximately 3.4%) were found to contain one or more microsatellites. Based on the length and GC content of the flanking regions, cluster analysis and BLASTN, 13 microsatellite-containing ESTs were selected for PCR primer design. The results showed that 10 out of 13 primer pairs could amplify scorable PCR products and showed polymorphism. The number of alleles ranged from 2 to 13 and the values of Ho and He varied from 0.1222 to 0.8611 and 0.2449 to 0.9311, respectively. No significant linkage disequilibrium (LD) between any pairs of these loci was found, and 6 of 10 loci conformed to the Hardy-Weinberg equilibrium (HWE). These EST-SSRs are therefore potential tools for studies of intraspecies variation and hybrid identification.
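For readers unfamiliar with the observed and expected heterozygosity values quoted above, the sketch below shows how they are usually computed for one locus from diploid genotypes. The function name and the toy genotypes are assumptions, and the abstract does not say whether the authors used the plain or the small-sample-corrected estimator, so both are returned.

```python
from collections import Counter

def heterozygosity(genotypes):
    """genotypes: list of (allele_a, allele_b) tuples for one locus.

    Returns (observed heterozygosity, expected heterozygosity under
    Hardy-Weinberg proportions, small-sample-corrected expected value).
    """
    n = len(genotypes)
    # Observed: fraction of individuals carrying two different alleles
    ho = sum(a != b for a, b in genotypes) / n
    allele_counts = Counter(allele for g in genotypes for allele in g)
    total = 2 * n  # number of gene copies sampled
    # Expected: 1 minus the sum of squared allele frequencies
    he = 1 - sum((c / total) ** 2 for c in allele_counts.values())
    he_unbiased = he * total / (total - 1)
    return ho, he, he_unbiased

# Toy data: allele sizes in base pairs for four individuals (hypothetical)
toy = [(150, 150), (150, 154), (154, 154), (150, 154)]
ho, he, he_u = heterozygosity(toy)
print(ho, he, round(he_u, 4))
```

A large gap between the observed and expected values at a locus is what a Hardy-Weinberg test formalizes; heterozygote deficits are often attributed to null alleles.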
Strope, Pooja K; Chaverri, Priscila; Gazis, Romina; Ciufo, Stacy; Domrachev, Michael; Schoch, Conrad L
2017-01-01
The ITS (nuclear ribosomal internal transcribed spacer) RefSeq database at the National Center for Biotechnology Information (NCBI) is dedicated to the clear association between name, specimen and sequence data. This database is focused on sequences obtained from type material stored in public collections. While the initial ITS sequence curation effort together with numerous fungal taxonomy experts attempted to cover as many orders as possible, we extended our latest focus to the family and genus ranks. We focused on Trichoderma for several reasons, mainly because the asexual and sexual synonyms were well documented, and a list of proposed names and type material was recently published. In this case study the recent taxonomic information was applied to do a complete taxonomic audit for the genus Trichoderma in the NCBI Taxonomy database. A name status report is available here: https://www.ncbi.nlm.nih.gov/Taxonomy/TaxIdentifier/tax_identifier.cgi. As a result, the ITS RefSeq Targeted Loci database at NCBI has been augmented with more sequences from type and verified material from Trichoderma species. Additionally, to aid in the cross referencing of data from single loci and genomes we have collected a list of quality records of the RPB2 gene obtained from type material in GenBank that could help validate future submissions. During the process of curation misidentified genomes were discovered, and sequence records from type material were found hidden under previous classifications. Source metadata curation, although more cumbersome, proved to be useful as confirmation of the type material designation. Database URL: http://www.ncbi.nlm.nih.gov/bioproject/PRJNA177353 PMID:29220466
Morrison, Cheryl L.; Springmann, Marcus J.; Iwanowicz, Deborah D.; Wade, Christopher M.
2015-01-01
A suite of tetra-nucleotide microsatellite loci were developed for the invasive giant African land snail, Achatina (=Lissachatina) fulica Bowdich, 1822, from Ion Torrent next-generation sequencing data. Ten of the 96 primer sets tested amplified consistently in 30 snails from Miami, Florida, plus 12 individuals representative of their native East Africa, Indian and Pacific Ocean regions. The loci displayed moderate levels of allelic diversity (average 5.6 alleles/locus) and heterozygosity (average 42 %). Levels of genetic diversity were sufficient to produce unique multi-locus genotypes and detect phylogeographic structuring among regional samples. The invasive A. fulica can cause extensive damage to important food crops and natural resources, including native flora and fauna. The loci characterized here will be useful for determining the origins and tracking the spread of invasions, detecting fine-scale spatial structuring and estimating demographic parameters.
The African Genome Variation Project shapes medical genetics in Africa
NASA Astrophysics Data System (ADS)
Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O.; Choudhury, Ananyo; Ritchie, Graham R. S.; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N.; Young, Elizabeth H.; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P.; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A.; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S.
2015-01-01
Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.
Variation analysis and gene annotation of eight MHC haplotypes: The MHC Haplotype Project
Horton, Roger; Gibson, Richard; Coggill, Penny; Miretti, Marcos; Allcock, Richard J.; Almeida, Jeff; Forbes, Simon; Gilbert, James G. R.; Halls, Karen; Harrow, Jennifer L.; Hart, Elizabeth; Howe, Kevin; Jackson, David K.; Palmer, Sophie; Roberts, Anne N.; Sims, Sarah; Stewart, C. Andrew; Traherne, James A.; Trevanion, Steve; Wilming, Laurens; Rogers, Jane; de Jong, Pieter J.; Elliott, John F.; Sawcer, Stephen; Todd, John A.; Trowsdale, John
2008-01-01
The human major histocompatibility complex (MHC) is contained within about 4 Mb on the short arm of chromosome 6 and is recognised as the most variable region in the human genome. The primary aim of the MHC Haplotype Project was to provide a comprehensively annotated reference sequence of a single, human leukocyte antigen-homozygous MHC haplotype and to use it as a basis against which variations could be assessed from seven other similarly homozygous cell lines, representative of the most common MHC haplotypes in the European population. Comparison of the haplotype sequences, including four haplotypes not previously analysed, resulted in the identification of >44,000 variations, both substitutions and indels (insertions and deletions), which have been submitted to the dbSNP database. The gene annotation uncovered haplotype-specific differences and confirmed the presence of more than 300 loci, including over 160 protein-coding genes. Combined analysis of the variation and annotation datasets revealed 122 gene loci with coding substitutions of which 97 were non-synonymous. The haplotype (A3-B7-DR15; PGF cell line) designated as the new MHC reference sequence, has been incorporated into the human genome assembly (NCBI35 and subsequent builds), and constitutes the largest single-haplotype sequence of the human genome to date. The extensive variation and annotation data derived from the analysis of seven further haplotypes have been made publicly available and provide a framework and resource for future association studies of all MHC-associated diseases and transplant medicine. PMID:18193213
STBase: one million species trees for comparative biology.
McMahon, Michelle M; Deepak, Akshay; Fernández-Baca, David; Boss, Darren; Sanderson, Michael J
2015-01-01
Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user's query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. 
It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed trees.
Pakkad, Greuk; Ueno, Saneyoshi; Yoshimaru, Hiroshi
2009-05-01
Afzelia xylocarpa is listed as an endangered species on the International Union for the Conservation of Nature and Natural Resources World list of Threatened Trees, due to overexploitation for its valuable timber and habitat loss. We isolated eight polymorphic microsatellite loci from A. xylocarpa using dual suppression polymerase chain reaction technique. These loci provide microsatellite markers with high polymorphism, as the number of alleles ranged from two to seven, and the estimate of gene diversity was between 0.285 and 0.795. The markers are now available for more detailed investigation of population genetic structure and gene flow among A. xylocarpa populations. © 2009 The Authors. Journal compilation © 2009 Blackwell Publishing Ltd.
van der Harst, Pim; Verweij, Niek
2018-02-02
Coronary artery disease (CAD) is a complex phenotype driven by genetic and environmental factors. Ninety-seven genetic risk loci have been identified to date, but the identification of additional susceptibility loci might be important to enhance our understanding of the genetic architecture of CAD. To expand the number of genome-wide significant loci, catalog functional insights, and enhance our understanding of the genetic architecture of CAD. We performed a genome-wide association study in 34 541 CAD cases and 261 984 controls of UK Biobank resource followed by replication in 88 192 cases and 162 544 controls from CARDIoGRAMplusC4D. We identified 75 loci that replicated and were genome-wide significant (P < 5×10⁻⁸) in meta-analysis, 13 of which had not been reported previously. Next, to further identify novel loci, we identified all promising (P < 0.0001) loci in the CARDIoGRAMplusC4D data and performed reciprocal replication and meta-analyses with UK Biobank. This led to the identification of 21 additional novel loci reaching genome-wide significance (P < 5×10⁻⁸) in meta-analysis. Finally, we performed a genome-wide meta-analysis of all available data revealing 30 additional novel loci (P < 5×10⁻⁸) without further replication. The increase in sample size by UK Biobank raised the number of reconstituted gene sets from 4.2% to 13.9% of all gene sets to be involved in CAD. For the 64 novel loci, 155 candidate causal genes were prioritized, many without an obvious connection to CAD. Fine mapping of the 161 CAD loci generated lists of credible sets of single causal variants and genes for functional follow-up. Genetic risk variants of CAD were linked to development of atrial fibrillation, heart failure, and death. We identified 64 novel genetic risk loci for CAD and performed fine mapping of all 161 risk loci to obtain a credible set of causal variants.
The large expansion of reconstituted gene sets argues in favor of an expanded omnigenic model view on the genetic architecture of CAD. © 2017 The Authors.
Expanding Omics Resources for Improvement of Soybean Seed Composition Traits
Chaudhary, Juhi; Patil, Gunvant B.; Sonah, Humira; Deshmukh, Rupesh K.; Vuong, Tri D.; Valliyodan, Babu; Nguyen, Henry T.
2015-01-01
Food resources of the modern world are strained due to the increasing population. There is an urgent need for innovative methods and approaches to augment food production. Legume seeds are major resources of human food and animal feed with their unique nutrient compositions including oil, protein, carbohydrates, and other beneficial nutrients. Recent advances in next-generation sequencing (NGS) together with “omics” technologies have considerably strengthened soybean research. The availability of a well-annotated soybean genome sequence, along with hundreds of identified quantitative trait loci (QTL) associated with different seed traits, can be used for gene discovery and molecular marker development for breeding applications. Despite the remarkable progress in these technologies, the analysis and mining of existing seed genomics data are still challenging due to the complexity of genetic inheritance, metabolic partitioning, and developmental regulation. Integration of “omics” tools is an effective strategy to discover key regulators of various seed traits. In this review, recent advances in “omics” approaches and their use in soybean seed trait investigations are presented along with the available databases and technological platforms and their applicability in the improvement of soybean. This article also highlights the use of modern breeding approaches, such as genome-wide association studies (GWAS), genomic selection (GS), and marker-assisted recurrent selection (MARS), for developing superior cultivars. A catalog of important available resources for major seed composition traits, such as seed oil, protein, carbohydrates, and yield traits, is provided to improve the knowledge base and the future use of this information in soybean crop improvement programs. PMID:26635846
Lin, Qiang; Luo, Wei; Wan, Shiming; Gao, Zexia
2016-01-01
Seahorse conservation has been pursued using various strategies for many decades, and a deeper understanding of genomic information is necessary to protect the germplasm resources of seahorse species more efficiently. However, little genetic information about seahorses currently exists in the public databases. In this study, high-throughput RNA sequencing for two seahorse species, Hippocampus erectus and H. mohnikei, was carried out, and de novo assembly generated 37,506 unigenes for H. erectus and 36,113 unigenes for H. mohnikei. Among them, 17,338 (46.23%) unigenes for H. erectus and 17,900 (49.57%) for H. mohnikei were successfully annotated based on the information available from the public databases. By comparing the unigenes of the two seahorse species, 7,802 candidate orthologous genes were identified, of which 5,268 could be annotated. In addition, gene ontology analysis of the two species was similarly performed on biological processes, cellular components, and molecular functions. Twenty-four and twenty-one unigenes in H. erectus and H. mohnikei, respectively, were annotated in the biosynthesis of unsaturated fatty acids pathways, and both seahorses lacked the Δ12 and Δ15 desaturases. A total of 8,992 and 9,116 SSR loci were obtained from H. erectus and H. mohnikei unigenes, respectively. Dozens of SSR markers were developed and then applied to assess the population genetic diversity, as well as cross-amplified in a related species, H. trimaculatus. The HO and HE values of the tested populations for H. erectus, H. mohnikei, and H. trimaculatus were medium. These resources would facilitate the conservation of the species through a better understanding of the genomics and comparative genome analysis within the Hippocampus genus. PMID:27128031
Kang, Se Won; Patnaik, Bharat Bhusan; Hwang, Hee-Ju; Park, So Young; Chung, Jong Min; Song, Dae Kwon; Patnaik, Hongray Howrelia; Lee, Jae Bong; Kim, Changmu; Kim, Soonok; Park, Hong Seog; Park, Seung-Hwan; Park, Young-Su; Han, Yeon Soo; Lee, Jun Sang; Lee, Yong Seok
2017-03-01
Satsuma myomphala is critically endangered through loss of natural habitats, predation by natural enemies, and indiscriminate collection. It is a protected species in Korea but lacks genomic resources for an understanding of varied functional processes attributable to evolutionary success under natural habitats. For assessing the genetic information of S. myomphala, we performed for the first time, de novo transcriptome sequencing and functional annotation of expressed sequences using Illumina Next-Generation Sequencing (NGS) platform and bioinformatics analysis. We identified 103,774 unigenes of which 37,959, 12,890, and 17,699 were annotated in the PANM (Protostome DB), Unigene, and COG (Clusters of Orthologous Groups) databases, respectively. In addition, 14,451 unigenes were predicted under Gene Ontology functional categories, with 4581 assigned to a single category. Furthermore, 3369 sequences with 646 having Enzyme Commission (EC) numbers were mapped to 122 pathways in the Kyoto Encyclopedia of Genes and Genomes Pathway database. The prominent protein domains included the Zinc finger (C2H2-like), Reverse Transcriptase, Thioredoxin-like fold, and RNA recognition motif domain. Many unigenes with homology to immunity, defense, and reproduction-related genes were screened in the transcriptome. We also detected 3120 putative simple sequence repeats (SSRs) encompassing dinucleotide to hexanucleotide repeat motifs from >1kb unigene sequences. A list of PCR primers of SSR loci have been identified to study the genetic polymorphisms. The transcriptome data represents a valuable resource for further investigations on the species genome structure and biology. The unigenes information and microsatellites would provide an indispensable tool for conservation of the species in natural and adaptive environments. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
An xQTL map integrates the genetic architecture of the human brain's transcriptome and epigenome.
Ng, Bernard; White, Charles C; Klein, Hans-Ulrich; Sieberts, Solveig K; McCabe, Cristin; Patrick, Ellis; Xu, Jishu; Yu, Lei; Gaiteri, Chris; Bennett, David A; Mostafavi, Sara; De Jager, Philip L
2017-10-01
We report a multi-omic resource generated by applying quantitative trait locus (xQTL) analyses to RNA sequence, DNA methylation and histone acetylation data from the dorsolateral prefrontal cortex of 411 older adults who have all three data types. We identify SNPs significantly associated with gene expression, DNA methylation and histone modification levels. Many of these SNPs influence multiple molecular features, and we demonstrate that SNP effects on RNA expression are fully mediated by epigenetic features in 9% of these loci. Further, we illustrate the utility of our new resource, xQTL Serve, by using it to prioritize the cell type(s) most affected by an xQTL. We also reanalyze published genome wide association studies using an xQTL-weighted analysis approach and identify 18 new schizophrenia and 2 new bipolar susceptibility variants, which is more than double the number of loci that can be discovered with a larger blood-based expression eQTL resource.
2010-01-01
Background Parkinson's disease is the second most common neurodegenerative disorder. The pathological hallmark of the disease is degeneration of midbrain dopaminergic neurons. Genetic association studies have linked 13 human chromosomal loci to Parkinson's disease. Identification of gene(s), as part of the etiology of Parkinson's disease, within the large number of genes residing in these loci can be achieved through several approaches, including screening methods, and considering appropriate criteria. Since several of the identified Parkinson's disease genes are expressed in the substantia nigra pars compacta of the midbrain, expression within the neurons of this area could be a suitable criterion to limit the number of candidates and identify PD genes. Methods In this work we have used the combination of findings from six rodent transcriptome analysis studies on the gene expression profile of midbrain dopaminergic neurons and the PARK loci in the OMIM (Online Mendelian Inheritance in Man) database to identify new candidate genes for Parkinson's disease. Results Merging the two datasets, we identified 20 genes within PARK loci, 7 of which are located in an orphan Parkinson's disease locus and one of which had been identified as a disease gene. In addition to identifying a set of candidates for further genetic association studies, these results show that the criterion of expression in midbrain dopaminergic neurons may be used to narrow down the number of genes in PARK loci for such studies. PMID:20716345
Ciavaglia, Sherryn; Linacre, Adrian
2018-05-01
Reptile species, and in particular snakes, are protected by national and international agreements yet are commonly handled illegally. To aid in the enforcement of such legislation, we report on the development of three 11-plex assays from the genome of the carpet python to type 24 loci of tetra-nucleotide and penta-nucleotide repeat motifs (pure, compound and complex included). The loci range in size between 70 and 550 bp. Seventeen of the loci are newly characterised with the inclusion of seven previously developed loci to facilitate cross-comparison with previous carpet python genotyping studies. Assays were optimised in accordance with human forensic profiling kits using one nanogram template DNA. Three loci are included in all three of the multiplex reactions as quality assurance markers, to ensure sample identity and genotyping accuracy is maintained across the three profiling assays. Allelic ladders have been developed for the three assays to ensure consistent and precise allele designation. A DNA reference database of allele frequencies is presented based on 249 samples collected from throughout the species native range. A small number of validation tests are conducted to demonstrate the utility of these multiplex assays. We suggest further appropriate validation tests that should be conducted prior to the application of the multiplex assays in criminal investigations involving carpet pythons. Copyright © 2018 Elsevier B.V. All rights reserved.
Ghavidel, Mahdis; Mansury, Davood; Nourian, Kimiya; Ghazvini, Kiarash
2018-03-22
Mycobacterium bovis is a neglected zoonotic organism for which epidemiological studies are crucial to identifying its source, controlling it and preventing its spread. The aim of this study was to investigate the most common spoligotypes of Mycobacterium bovis circulating around the world and to identify the VNTR loci with the highest and lowest discriminatory power. We searched several databases, including ISC, ScienceDirect, Embase (Elsevier), Web of Science, Scopus and Medline via PubMed. Searches were performed using keywords including: Mycobacterium bovis, MIRU-VNTR, spoligotyping and discrimination power. Finally, thirty-one articles were selected after screening titles, abstracts and full texts. Spoligotype SB0120 was the most common circulating type on several continents, while SB0121 existed in Europe, Africa and America. SB0140 was also detected in Asia, Europe and America. Among the loci with high discriminatory power, QUB3232 and QUB11b were the most appropriate. MIRU 10 and MIRU4 were among the loci with poor discriminatory power. Taking the published data into consideration, SB0120 and SB0121 are the predominant spoligotypes of M. bovis circulating among animals around the world. Determining the most common spoligotype of M. bovis is key to finding the source of infection and to controlling and preventing the disease. Copyright © 2018 Elsevier Ltd. All rights reserved.
Aberrant methylation patterns affect the molecular pathogenesis of rheumatoid arthritis.
Lin, Yang; Luo, Zhengqiang
2017-05-01
This study aims to investigate DNA methylation signatures in fibroblast-like synoviocytes (FLS) from patients with rheumatoid arthritis (RA), and to explore the relationship with transcription factors (TFs) that help to distinguish RA from osteoarthritis (OA). Microarray dataset of GSE46346, including six FLS samples from patients with RA and five FLS samples from patients with OA, was downloaded from the Gene Expression Omnibus database. RA and OA samples were screened for differentially methylated loci (DMLs). The corresponding differentially methylated genes (DMGs) were identified, followed by Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene Ontology (GO) enrichment analysis. A transcriptional regulatory network was built with TFs and their corresponding DMGs. Overall, 280 hypomethylated loci and 561 hypermethylated loci were screened. Genes containing hypermethylated loci were enriched in pathways in cancer, ECM-receptor interaction, focal adhesion and neurotrophin signaling pathways. Genes containing hypomethylated loci were enriched in the neurotrophin signaling pathway. Moreover, we found that CCCTC-binding factor (CTCF), Yin Yang 1 (YY1), v-myc avian myelocytomatosis viral oncogene homolog (c-MYC), and early growth response 1 (EGR1) were important TFs in the transcriptional regulatory network. Therefore, DMGs might participate in the neurotrophin signaling pathway, pathways in cancer, ECM-receptor interaction and focal adhesion pathways in RA. Furthermore, CTCF, c-MYC, YY1, and EGR1 may play important roles in RA through regulating DMGs. Copyright © 2017 Elsevier B.V. All rights reserved.
Development of a 20-locus fluorescent multiplex system as a valuable tool for national DNA database.
Jiang, Xianhua; Guo, Fei; Jia, Fei; Jin, Ping; Sun, Zhu
2013-02-01
The multiplex system allows the detection of 19 autosomal short tandem repeat (STR) loci [including all Combined DNA Index System (CODIS) STR loci as well as D2S1338, D6S1043, D12S391, D19S433, Penta D and Penta E] plus the sex-determining locus Amelogenin in a single reaction, comprising all STR loci in various commercial kits used in the China national DNA database (NDNAD). Primers are designed so that the amplicons range from 90 base pairs (bp) to 450 bp within a five-dye fluorescent design, with the fifth dye reserved for the internal size standard. With 30 cycles, 125 pg to 2 ng of DNA template gave optimal profiling results, while robust profiles could also be achieved by adjusting the cycle number for DNA template amounts outside that optimal input range. Mixture studies showed that 83% and 87% of minor alleles were detected at 9:1 and 1:9 ratios, respectively. When 4 ng of degraded DNA was digested by DNase for 2 min, and when 1 ng of undegraded DNA was added to 400 μM haematin, complete profiles were still observed. Polymerase chain reaction (PCR)-based procedures were examined and optimized, including the concentrations of the primer set, magnesium and Taq polymerase, as well as volume, cycle number and annealing temperature. In addition, the system has been validated with 3000 bloodstain samples and 35 common case samples in line with the Chinese National Standards and Scientific Working Group on DNA Analysis Methods (SWGDAM) guidelines. The total probability of identity (TPI) can reach 8×10⁻²⁴, so the system can support a DNA database of 10 million profiles or more, because the expected number of matches (4×10⁻¹⁰) is far below one person and is negligible. Further, our system also performs well on case samples, and it will be an ideal tool for forensic DNA typing and databasing with potential application. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
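The abstract does not state how the expected number of matches was derived. One reading that is consistent with the reported figures is that the TPI is multiplied by the number of distinct profile pairs in a database of 10 million profiles. The Python sketch below works through that arithmetic; the function name and the pairwise interpretation are assumptions made for illustration, not the authors' stated method.

```python
def expected_pairwise_matches(n_profiles: int, match_probability: float) -> float:
    """Expected number of coincidentally matching pairs among n unrelated profiles.

    Assumption (not stated in the abstract): every pair of profiles is an
    independent comparison with the same match probability.
    """
    n_pairs = n_profiles * (n_profiles - 1) // 2  # number of distinct pairs
    return n_pairs * match_probability


# Figures taken from the abstract: TPI = 8e-24, database of 10 million profiles.
expected = expected_pairwise_matches(10_000_000, 8e-24)
print(f"{expected:.1e}")  # prints 4.0e-10
```

Under this reading, roughly 5×10¹³ pairs times 8×10⁻²⁴ gives about 4×10⁻¹⁰, which agrees with the value quoted in the abstract.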
2013-01-01
Background Although banana (Musa sp.) is an important edible crop, contributing towards poverty alleviation and food security, limited transcriptome datasets are available for use in accelerated molecular-based breeding in this genus. 454 GS-FLX Titanium technology was employed to determine the sequence of gene transcripts in genotypes of Musa acuminata ssp. burmannicoides Calcutta 4 and M. acuminata subgroup Cavendish cv. Grande Naine, contrasting in resistance to the fungal pathogen Mycosphaerella musicola, causal organism of Sigatoka leaf spot disease. To enrich for transcripts under biotic stress responses, full length-enriched cDNA libraries were prepared from whole plant leaf materials, both uninfected and artificially challenged with pathogen conidiospores. Results The study generated 846,762 high quality sequence reads, with an average length of 334 bp and totalling 283 Mbp. De novo assembly generated 36,384 and 35,269 unigene sequences for M. acuminata Calcutta 4 and Cavendish Grande Naine, respectively. A total of 64.4% of the unigenes were annotated through Basic Local Alignment Search Tool (BLAST) similarity analyses against public databases. Assembled sequences were functionally mapped to Gene Ontology (GO) terms, with unigene functions covering a diverse range of molecular functions, biological processes and cellular components. Genes from a number of defense-related pathways were observed in transcripts from each cDNA library. Over 99% of contig unigenes mapped to exon regions in the reference M. acuminata DH Pahang whole genome sequence. A total of 4068 genic-SSR loci were identified in Calcutta 4 and 4095 in Cavendish Grande Naine. A subset of 95 potential defense-related gene-derived simple sequence repeat (SSR) loci were validated for specific amplification and polymorphism across M. acuminata accessions. 
Fourteen loci were polymorphic, with alleles per polymorphic locus ranging from 3 to 8 and polymorphism information content ranging from 0.34 to 0.82. Conclusions A large set of unigenes were characterized in this study for both M. acuminata Calcutta 4 and Cavendish Grande Naine, increasing the number of public domain Musa ESTs. This transcriptome is an invaluable resource for furthering our understanding of biological processes elicited during biotic stresses in Musa. Gene-based markers will facilitate molecular breeding strategies, forming the basis of genetic linkage mapping and analysis of quantitative trait loci. PMID:23379821
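For readers unfamiliar with the polymorphism information content (PIC) values reported above, the following Python sketch computes PIC and gene diversity (expected heterozygosity) from allele frequencies at one locus. It uses the widely cited definition of Botstein et al. (1980); whether the authors used exactly this estimator is an assumption, and the allele frequencies in the example are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def gene_diversity(freqs: Sequence[float]) -> float:
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs: Sequence[float]) -> float:
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2.0 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical locus with three equally frequent alleles.
example_freqs = [1 / 3, 1 / 3, 1 / 3]
print(round(gene_diversity(example_freqs), 4))  # 0.6667
print(round(pic(example_freqs), 4))             # 0.5926
```

PIC is never larger than gene diversity for the same frequencies, and the two converge as the number of alleles grows.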
A Brief Review of RNA–Protein Interaction Database Resources
Yi, Ying; Zhao, Yue; Huang, Yan; Wang, Dong
2017-01-01
RNA–Protein interactions play critical roles in various biological processes. By collecting and analyzing the RNA–Protein interactions and binding sites from experiments and predictions, RNA–Protein interaction databases have become an essential resource for the exploration of the transcriptional and post-transcriptional regulatory network. Here, we briefly review several widely used RNA–Protein interaction database resources developed in recent years to provide a guide to these databases. The content and major functions of the databases are presented. The brief description of each database helps users quickly choose the database containing the information they are interested in. In short, these RNA–Protein interaction database resources are continually updated, and their current state reflects the ongoing effort to identify and analyze the large number of RNA–Protein interactions. PMID:29657278
Moura, Andre E; Kenny, John G; Chaudhuri, Roy; Hughes, Margaret A; J Welch, Andreanna; Reisinger, Ryan R; de Bruyn, P J Nico; Dahlheim, Marilyn E; Hall, Neil; Hoelzel, A Rus
2014-11-01
The evolution of diversity in the marine ecosystem is poorly understood, given the relatively high potential for connectivity, especially for highly mobile species such as whales and dolphins. The killer whale (Orcinus orca) has a worldwide distribution, and individual social groups travel over a wide geographic range. Even so, regional populations have been shown to be genetically differentiated, including among different foraging specialists (ecotypes) in sympatry. Given the strong matrifocal social structure of this species together with strong resource specializations, understanding the process of differentiation will require an understanding of the relative importance of both genetic drift and local adaptation. Here we provide a high-resolution analysis based on nuclear single-nucleotide polymorphic markers and inference about differentiation at both neutral loci and those potentially under selection. We find that all population comparisons, within or among foraging ecotypes, show significant differentiation, including populations in parapatry and sympatry. Loci putatively under selection show a different pattern of structure compared to neutral loci and are associated with gene ontology terms reflecting physiologically relevant functions (e.g. related to digestion). The pattern of differentiation for one ecotype in the North Pacific suggests local adaptation and shows some fixed differences among sympatric ecotypes. We suggest that differential habitat use and resource specializations have promoted sufficient isolation to allow differential evolution at neutral and functional loci, but that the process is recent and dependent on both selection and drift. © 2014 The Authors. Molecular Ecology published by John Wiley & Sons Ltd.
Zhivotovsky, Lev A; Malyarchuk, Boris A; Derenko, Miroslava V; Wozniak, Marcin; Grzybowski, Tomasz
2009-09-01
Developing a forensic DNA database on a population that consists of local ethnic groups separated by physical and cultural barriers is questionable, as such a population can be genetically subdivided. On the other hand, the small sizes of ethnic groups, especially in alpine regions where they are sub-structured further into small villages, prevent collecting a large sample from each ethnic group. For such situations, we suggest obtaining both a total population database of allele frequencies across ethnic groups and a list of theta-values between the groups and the total data. We have genotyped 558 individuals from the native population of South Siberia, consisting of nine ethnic groups, at 17 autosomal STR loci of the AmpFlSTR SGM Plus and AmpFlSTR Profiler Plus kits. The groups differ from each other with average theta-values of around 1.1%, and some reach three to four percent at certain loci. There is between-village differentiation as well. Therefore, a database for the population of South Siberia is composed of data on allele frequencies in the pool of ethnic groups and data on theta-values that indicate variation in allele frequencies across the groups. Comparison with additional data on northeastern Asia (the Chukchi and Koryak) shows that differentiation in allele frequencies among small groups that are separated by large geographic distances can be even greater. In contrast, populations of Russians that live in large cities of the European part of Russia are homogeneous in allele frequencies, despite the large geographic distances between them, and thus can be described by a database of allele frequencies alone, without any specific information on theta-values.
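To illustrate why theta-values matter alongside pooled allele frequencies, the Python sketch below applies the commonly used subpopulation correction for single-locus genotype match probabilities (the Balding-Nichols form recommended in the 1996 NRC report). The abstract does not say that the authors use this exact formula, so treat it as an assumed illustration; the allele frequencies are hypothetical, and theta = 0.011 mirrors the average of about 1.1% reported above.

```python
from typing import Optional


def match_probability(p: float, q: Optional[float] = None, theta: float = 0.0) -> float:
    """Theta-corrected single-locus genotype match probability.

    p     -- frequency of the first allele in the pooled database
    q     -- frequency of the second allele; None means a homozygote (p, p)
    theta -- coancestry coefficient between the subgroup and the pooled data
    """
    denominator = (1.0 + theta) * (1.0 + 2.0 * theta)
    if q is None:
        # Homozygote: reduces to p**2 when theta == 0.
        return (2 * theta + (1 - theta) * p) * (3 * theta + (1 - theta) * p) / denominator
    # Heterozygote: reduces to 2*p*q when theta == 0.
    return 2 * (theta + (1 - theta) * p) * (theta + (1 - theta) * q) / denominator


# Hypothetical allele frequencies at one STR locus.
print(match_probability(0.1, 0.2, theta=0.0))    # uncorrected heterozygote
print(match_probability(0.1, 0.2, theta=0.011))  # corrected, slightly larger
```

A positive theta raises the match probability relative to the uncorrected product-rule value, which is the conservative direction for forensic reporting.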
NREL: Renewable Resource Data Center - Biomass Resource Publications
Marginal Lands in APEC Economies NREL Publications Database For a comprehensive list of other NREL biomass resource publications, explore NREL's Publications Database. When searching the database, search on "
Exploring human disease using the Rat Genome Database.
Shimoyama, Mary; Laulederkind, Stanley J F; De Pons, Jeff; Nigam, Rajni; Smith, Jennifer R; Tutaj, Marek; Petri, Victoria; Hayman, G Thomas; Wang, Shur-Jen; Ghiasvand, Omid; Thota, Jyothi; Dwinell, Melinda R
2016-10-01
Rattus norvegicus, the laboratory rat, has been a crucial model for studies of the environmental and genetic factors associated with human diseases for over 150 years. It is the primary model organism for toxicology and pharmacology studies, and has features that make it the model of choice in many complex-disease studies. Since 1999, the Rat Genome Database (RGD; http://rgd.mcw.edu) has been the premier resource for genomic, genetic, phenotype and strain data for the laboratory rat. The primary role of RGD is to curate rat data and validate orthologous relationships with human and mouse genes, and make these data available for incorporation into other major databases such as NCBI, Ensembl and UniProt. RGD also provides official nomenclature for rat genes, quantitative trait loci, strains and genetic markers, as well as unique identifiers. The RGD team adds enormous value to these basic data elements through functional and disease annotations, the analysis and visual presentation of pathways, and the integration of phenotype measurement data for strains used as disease models. Because much of the rat research community focuses on understanding human diseases, RGD provides a number of datasets and software tools that allow users to easily explore and make disease-related connections among these datasets. RGD also provides comprehensive human and mouse data for comparative purposes, illustrating the value of the rat in translational research. This article introduces RGD and its suite of tools and datasets to researchers - within and beyond the rat community - who are particularly interested in leveraging rat-based insights to understand human diseases. © 2016. Published by The Company of Biologists Ltd.
Stepanov, V.A.; Balanovsky, O.P.; Melnikov, A.V.; Lash-Zavada, A.Yu.; Khar’kov, V.N.; Tyazhelova, T.V.; Akhmetova, V.L.; Zhukova, O.V.; Shneider, Yu.V.; Shil’nikova, I.N.; Borinskaya, S.A.; Marusin, A.V.; Spiridonova, M.G.; Simonova, K.V.; Khitrinskaya, I.Yu.; Radzhabov, M.O.; Romanov, A.G.; Shtygasheva, O.V.; Koshel’, S.M.; Balanovskaya, E.V.; Rybakova, A.V.; Khusnutdinova, E.K.; Puzyrev, V.P.; Yankovsky, N.K.
2011-01-01
Seventeen population groups within the Russian Federation were characterized for the first time using a panel of 15 genetic markers that are used for DNA identification and in forensic medical examinations. The following were determined: the degree of polymorphism and population diversity of microsatellite loci within the PowerPlex system (Promega) in Russian populations; the distribution of alleles and genotypes within the populations of six cities and 11 ethnic groups of the Russian Federation; the levels of intra- and interpopulation genetic differentiation; genetic relations between populations; and the identification and forensic medical characteristics of the system of markers under study. Significant differences were revealed between the Russian populations and the U.S. reference base that was recently used in forensic medical examinations in the Russian Federation. A database of the allelic frequencies of 15 microsatellite loci that are used for DNA identification and forensic medical examination was created; the database has the potential to become the reference for performing forensic medical examinations in Russia. The spatial organization of genetic diversity over the panel of STR markers that are used for DNA identification was revealed. It reflects the general regularities of geographical clustering of human populations over various types of genetic markers. The necessity of taking into account a population’s genetic structure during forensic medical examinations and DNA identification of criminal suspects was substantiated. PMID:22649684
Dicken, Connie L.; Dunlap, Pamela; Parks, Heather L.; Hammarstrom, Jane M.; Zientek, Michael L.; Zientek, Michael L.; Hammarstrom, Jane M.; Johnson, Kathleen M.
2016-07-13
As part of the first-ever U.S. Geological Survey global assessment of undiscovered copper resources, data common to several regional spatial databases published by the U.S. Geological Survey, including one report from Finland and one from Greenland, were standardized, updated, and compiled into a global copper resource database. This integrated collection of spatial databases provides location, geologic and mineral resource data, and source references for deposits, significant prospects, and areas permissive for undiscovered deposits of both porphyry copper and sediment-hosted copper. The copper resource database allows for efficient modeling on a global scale in a geographic information system (GIS) and is provided in an Esri ArcGIS file geodatabase format.
Tan, Lay-Kim; Mohd-Farid, Baharin; Salsabil, Sulaiman; Heselynn, Hussein; Wahinuddin, Sulaiman; Lau, Ing-Soo; Gun, Suk-Chyn; Nor-Suhaila, Sharil; Eashwary, M; Mohd-Shahrir, Mohamed Said; Ainon, Mohd-Mokhtar; Azmillah, Rosman; Muhaini, Othman; Shahnaz, Murad; Too, Chun-Lai
2016-10-01
A total of 951 Southeast Asia Malays from Peninsular Malaysia were genotyped for HLA-A, -B, -C, -DRB1, and -DQB1 loci using polymerase chain reaction sequence-specific oligonucleotide probe hybridization methods. In this report, there were significant deviations from Hardy-Weinberg proportions for the HLA-A (p<0.0001), -B (p<0.0001), -DRB1 (p<0.0001) and -DQB1 (p<0.01) loci. Minor deviations from Hardy-Weinberg equilibrium proportions (HWEP) were detected for HLA-C (p=0.01). The genotype data are available in the Allele Frequencies Network Database (AFND; Gonzalez-Galarza et al., 2015). Copyright © 2016. Published by Elsevier Inc.
An Integrated Molecular Database on Indian Insects.
Pratheepa, Maria; Venkatesan, Thiruvengadam; Gracy, Gandhi; Jalali, Sushil Kumar; Rangheswaran, Rajagopal; Antony, Jomin Cruz; Rai, Anil
2018-01-01
MOlecular Database on Indian Insects (MODII) is an online database linking several databases, including Insect Pest Info, Insect Barcode Information System (IBIn), Insect Whole Genome sequence, Other Genomic Resources of National Bureau of Agricultural Insect Resources (NBAIR), Whole Genome sequencing of Honey bee viruses, Insecticide resistance gene database and Genomic tools. This database was developed with a holistic approach to collecting phenomic and genomic information on agriculturally important insects. This insect resource database is available online for free at http://cib.res.in/.
Vatanparast, Mohammad; Powell, Adrian; Doyle, Jeff J; Egan, Ashley N
2018-03-01
The development of pipelines for locus discovery has spurred the use of target enrichment for plant phylogenomics. However, few studies have compared pipelines from locus discovery and bait design, through validation, to tree inference. We compared three methods within Leguminosae (Fabaceae) and present a workflow for future efforts. Using 30 transcriptomes, we compared Hyb-Seq, MarkerMiner, and the Yang and Smith (Y&S) pipelines for locus discovery, validated 7501 baits targeting 507 loci across 25 genera via Illumina sequencing, and inferred gene and species trees via concatenation- and coalescent-based methods. Hyb-Seq discovered loci with the longest mean length. MarkerMiner discovered the most conserved loci with the least flagged as paralogous. Y&S offered the most parsimony-informative sites and putative orthologs. Target recovery averaged 93% across taxa. We optimized our targeted locus set based on a workflow designed to minimize paralog/ortholog conflation and thus present 423 loci for legume phylogenomics. Methods differed across criteria important for phylogenetic marker development. We recommend Hyb-Seq as a method that may be useful for most phylogenomic projects. Our targeted locus set is a resource for future, community-driven efforts to reconstruct the legume tree of life.
Genomic atlas of the human plasma proteome.
Sun, Benjamin B; Maranville, Joseph C; Peters, James E; Stacey, David; Staley, James R; Blackshaw, James; Burgess, Stephen; Jiang, Tao; Paige, Ellie; Surendran, Praveen; Oliver-Williams, Clare; Kamat, Mihir A; Prins, Bram P; Wilcox, Sheri K; Zimmerman, Erik S; Chi, An; Bansal, Narinder; Spain, Sarah L; Wood, Angela M; Morrell, Nicholas W; Bradley, John R; Janjic, Nebojsa; Roberts, David J; Ouwehand, Willem H; Todd, John A; Soranzo, Nicole; Suhre, Karsten; Paul, Dirk S; Fox, Caroline S; Plenge, Robert M; Danesh, John; Runz, Heiko; Butterworth, Adam S
2018-06-01
Although plasma proteins have important roles in biological processes and are the direct targets of many drugs, the genetic factors that control inter-individual variation in plasma protein levels are not well understood. Here we characterize the genetic architecture of the human plasma proteome in healthy blood donors from the INTERVAL study. We identify 1,927 genetic associations with 1,478 proteins, a fourfold increase on existing knowledge, including trans associations for 1,104 proteins. To understand the consequences of perturbations in plasma protein levels, we apply an integrated approach that links genetic variation with biological pathway, disease, and drug databases. We show that protein quantitative trait loci overlap with gene expression quantitative trait loci, as well as with disease-associated loci, and find evidence that protein biomarkers have causal roles in disease using Mendelian randomization analysis. By linking genetic factors to diseases via specific proteins, our analyses highlight potential therapeutic targets, opportunities for matching existing drugs with new disease indications, and potential safety concerns for drugs under development.
Allelic database and divergence among Psidium accessions by using microsatellite markers.
da Costa, S R; Santos, C A F
2013-12-16
This study aimed to investigate the genetic variability among guava accessions and wild Psidium species of the Embrapa Semiárido germplasm collection by using microsatellite loci to guide genetic resources and breeding programs, emphasizing crosses between guava and other Psidium species. DNA was extracted using the 2X CTAB method, and polymerase chain reaction products were analyzed on 6% denaturing polyacrylamide gels stained with silver nitrate. An unweighted pair-group method with arithmetic average (UPGMA) dendrogram, generated from the Jaccard-coefficient distance matrix for 183 alleles at 13 microsatellite loci, was used to visualize genetic similarity. The number of base pairs was estimated using the inverse mobility method, based on the regression of known-size products. Analysis of molecular variance was performed using total decomposition between and within guava accessions. The accessions showed similarity from 0.75 to 1.00, with the dendrogram presenting a cophenetic value of 0.85. Five groups were observed: the first included guava accessions; the second, P. guineense accessions; the third, one accession of P. friedrichsthalianum; and the last 2 groups, P. cattleianum. The genetic similarity between P. guineense and some guava accessions was above 80%, suggesting a greater possibility of obtaining interspecies hybrids between these 2 species. The genetic variability between the accessions was considered to be high (ΦST = 0.238), indicating that guava genetic variability is not uniformly distributed among the 9 Brazilian states from which the accessions were obtained. Obtaining a greater number of accessions per Brazilian state is recommended in order to have greater diversity among the species.
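The Jaccard coefficient named in the abstract above can be illustrated with a minimal sketch. It assumes each accession is scored as the set of microsatellite alleles it carries; the accession names and allele labels are hypothetical and are not data from the study, and the UPGMA clustering step is not shown.

```python
def jaccard_similarity(alleles_a, alleles_b):
    """Jaccard coefficient: shared alleles / alleles present in either accession."""
    a, b = set(alleles_a), set(alleles_b)
    union = a | b
    if not union:
        return 1.0  # convention chosen for this sketch: two empty profiles count as identical
    return len(a & b) / len(union)


# Hypothetical allele-presence profiles ("locus:allele size"), for illustration only.
accessions = {
    "guava_1": {"L1:150", "L1:154", "L2:200", "L3:98"},
    "guava_2": {"L1:150", "L1:154", "L2:200", "L3:102"},
    "wild_1": {"L1:160", "L2:200", "L3:110"},
}

# Pairwise similarities for every unordered pair of accessions.
names = sorted(accessions)
similarity = {
    (x, y): jaccard_similarity(accessions[x], accessions[y])
    for i, x in enumerate(names)
    for y in names[i + 1:]
}
for pair, value in similarity.items():
    print(pair, round(value, 2))
```

A distance matrix for clustering would typically be taken as one minus these similarities; that choice is an assumption here, since the abstract does not spell it out.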
Nine microsatellite loci developed from the octocoral, Paragorgia arborea
Coykendall, D. Katharine; Morrison, Cheryl L.
2015-01-01
Paragorgia arborea, or bubblegum coral, occurs in continental slope habitats worldwide, which are increasingly threatened by human activities such as energy development and fisheries practices. From 101 putative loci screened, nine microsatellite markers were developed from samples taken from Baltimore Canyon in the western North Atlantic Ocean. The number of alleles ranged from two to thirteen per locus and each displayed equilibrium. These nuclear resources will help further research on population connectivity in threatened coral species where mitochondrial markers are known to lack fine-scale genetic diversity.
Flandrois, Jean-Pierre; Lina, Gérard; Dumitrescu, Oana
2014-04-14
Tuberculosis is an infectious bacterial disease caused by Mycobacterium tuberculosis. It remains a major health threat, killing over one million people every year worldwide. An early antibiotic therapy is the basis of the treatment, and the emergence and spread of multidrug and extensively drug-resistant mutant strains raise significant challenges. As these bacteria grow very slowly, drug resistance mutations are currently detected using molecular biology techniques. Resistance mutations are identified by sequencing the resistance-linked genes followed by a comparison with the literature data. The only online database is the TB Drug Resistance Mutation database (TBDReaM database); however, it requires mutation detection before use, and its interrogation is complex due to its loose syntax and grammar. The MUBII-TB-DB database is a simple, highly structured text-based database that contains a set of Mycobacterium tuberculosis mutations (DNA and proteins) occurring at seven loci: rpoB, pncA, katG, mabA(fabG1)-inhA, gyrA, gyrB, and rrs. Resistance mutation data were extracted after a systematic review of MEDLINE-referenced publications before March 2013. MUBII analyzes the query sequence obtained by PCR-sequencing using two parallel strategies: i) a BLAST search against a set of previously reconstructed mutated sequences and ii) the alignment of the query sequences (DNA and its protein translation) with the wild-type sequences. The post-treatment includes the extraction of the aligned sequences together with their descriptors (position and nature of mutations). The whole procedure is performed over the internet. The results are graphs (alignments) and text (description of the mutation, therapeutic significance). The system is quick and easy to use, even for technicians without bioinformatics training. MUBII-TB-DB is a structured database of the mutations occurring at seven loci of major therapeutic value in tuberculosis management.
Moreover, the system provides interpretation of the mutations in biological and therapeutic terms and can evolve by the addition of newly described mutations. Its goal is to provide easy and comprehensive access through a client-server model over the Web to an up-to-date database of mutations that lead to the resistance of M. tuberculosis to antibiotics.
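The second strategy described in the abstract above (comparing a query sequence and its protein translation with the wild type, then reporting the position and nature of each mutation) can be sketched as follows. This is not MUBII's implementation: it assumes the two sequences are already aligned, uppercase, in frame and of equal length (no indels), it uses the standard genetic code, and the sequences are toy examples rather than real M. tuberculosis loci.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def describe_substitutions(wild_type, query):
    """Report codon-level differences between two in-frame, equal-length DNA sequences."""
    if len(wild_type) != len(query) or len(wild_type) % 3:
        raise ValueError("sketch handles only equal-length, in-frame sequences")
    changes = []
    for start in range(0, len(wild_type), 3):
        wt_codon, q_codon = wild_type[start:start + 3], query[start:start + 3]
        if wt_codon == q_codon:
            continue
        position = start // 3 + 1  # 1-based codon number
        wt_aa, q_aa = CODON_TABLE[wt_codon], CODON_TABLE[q_codon]
        changes.append({
            "codon": position,
            "dna": f"{wt_codon}>{q_codon}",
            "protein": f"{wt_aa}{position}{q_aa}",
            "kind": "synonymous" if wt_aa == q_aa else "nonsynonymous",
        })
    return changes


# Toy sequences (hypothetical), chosen to show one amino-acid change and one silent change.
for change in describe_substitutions("ATGTCGGCA", "ATGTTGGCC"):
    print(change)
```

A real pipeline would first align the query to the reference to handle insertions, deletions and ambiguous bases; the sketch skips that step to keep the descriptor extraction visible.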
Recapitulation of genome-wide association studies on body mass index in the Korean population.
Hong, K W; Oh, B
2012-08-01
Obesity is a risk factor for multiple disorders such as diabetes and cardiovascular disease. Recently, a genome-wide association study for body mass index (BMI) was conducted in 249 796 individuals of European ancestry by the Genetic Investigation of Anthropometric Traits (GIANT) consortium. They identified 14 known obesity susceptibility loci and 18 new loci associated with BMI at the genome-wide significance level (P<5 × 10⁻⁸). Because the prevalence and severity of obesity vary among ethnic groups, it is worthwhile to investigate these results in another ethnic population. We examined the BMI association of 19 single-nucleotide polymorphisms (SNPs) out of the 32 in 8842 individuals from the Korean Association Resource data, and found 12 SNPs to be associated with BMI in the Korean population. Eight loci, rs10968576 (BDNF), rs3817334 (MTCH2), rs1558902 (FTO), rs571312 (MC4R), rs543874 (SEC16B), rs987237 (TFAP2B), rs2867125 (TMEM18) and rs7138803 (FAIM2), were previously known obesity susceptibility loci, and the remaining four loci, rs1514175 (TNNI3K), rs206936 (NUDT3), rs4771122 (MTIF3) and rs2241423 (MAP2K5), were newly identified as BMI loci by the GIANT study. Further, all 12 SNPs showed the same direction of effect on BMI in the two ethnic groups, suggesting a similar genetic architecture governing obesity.
Montinaro, Francesco; Boschi, Ilaria; Trombetta, Federica; Merigioli, Sara; Anagnostou, Paolo; Battaggia, Cinzia; Capocasa, Marco; Crivellaro, Federica; Destro Bisol, Giovanni; Coia, Valentina
2012-12-01
The study of geographically and/or linguistically isolated populations could represent a potential area of interaction between population and forensic genetics. These investigations may be useful to evaluate the suitability of loci which have been selected using forensic criteria for bio-anthropological studies. At the same time, they give us an opportunity to evaluate the efficiency of forensic tools for parentage testing in groups with peculiar allele frequency profiles. Within the frame of a long-term project concerning Italian linguistic isolates, we studied 15 microsatellite loci (Identifiler kit) comprising the CODIS panel in 11 populations from the north-eastern Italian Alps (Veneto, Trentino and Friuli Venezia Giulia regions). All our analyses of inter-population differentiation highlight the genetic distinctiveness of most Alpine populations comparing them either to each other or with large and non-isolated Italian populations. Interestingly, we brought to light some aspects of population genetic structure which cannot be detected using unilinear polymorphisms. In fact, the analysis of genotypic disequilibrium between loci detected signals of population substructure when all the individuals of Alpine populations are pooled in a single group. Furthermore, despite the relatively low number of loci analyzed, genetic differentiation among Alpine populations was detected at individual level using a Bayesian method to cluster multilocus genotypes. Among the various populations studied, the four linguistic minorities (Fassa Valley, Luserna, Sappada and Sauris) showed the most pronounced diversity and signatures of a peculiar genetic ancestry. Finally, we show that database replacement may affect estimates of probability of paternity even when the local database is replaced by another based on populations which share a common genetic background but which differ in their demographic history. 
These findings point to the importance of considering the demographic and cultural profile of populations in forensic applications, even in a context of substantial genetic homogeneity such as that of European populations. Copyright © 2012 Elsevier Ireland Ltd. All rights reserved.
Nakagawa, So; Takahashi, Mahoko Ueda
2016-01-01
In mammals, approximately 10% of genome sequences correspond to endogenous viral elements (EVEs), which are derived from ancient viral infections of germ cells. Although most EVEs have been inactivated, some open reading frames (ORFs) of EVEs obtained functions in the hosts. However, EVE ORFs usually remain unannotated in the genomes, and no databases are available for EVE ORFs. To investigate the function and evolution of EVEs in mammalian genomes, we developed EVE ORF databases for 20 genomes of 19 mammalian species. A total of 736,771 non-overlapping EVE ORFs were identified and archived in a database named gEVE (http://geve.med.u-tokai.ac.jp). The gEVE database provides nucleotide and amino acid sequences, genomic loci and functional annotations of EVE ORFs for all 20 genomes. In analyzing RNA-seq data with the gEVE database, we successfully identified the expressed EVE genes, suggesting that the gEVE database facilitates studies of the genomic analyses of various mammalian species. Database URL: http://geve.med.u-tokai.ac.jp. © The Author(s) 2016. Published by Oxford University Press. PMID:27242033
Detection and analysis of CRISPRs of Shigella.
Guo, Xiangjiao; Wang, Yingfang; Duan, Guangcai; Xue, Zerun; Wang, Linlin; Wang, Pengfei; Qiu, Shaofu; Xi, Yuanlin; Yang, Haiyan
2015-01-01
The recently discovered CRISPRs (clustered regularly interspaced short palindromic repeats) and Cas (CRISPR-associated) proteins are a novel genetic barrier that limits horizontal gene transfer in prokaryotes, and the CRISPR loci provide a historical view of the exposure of prokaryotes to a variety of foreign genetic elements. The aim of this study was to investigate the occurrence and distribution of CRISPRs in Shigella. A collection of 61 strains of Shigella was screened for the existence of CRISPRs. Three CRISPR loci were identified among the 61 Shigella strains. CRISPR1/cas loci were detected in 49 strains of Shigella. Yet, IS elements were detected in the cas gene in some strains. In the remaining 12 Shigella flexneri strains, the CRISPR1/cas locus is deleted and only a cas3' pseudo gene and a repeat sequence are present. The presence of CRISPR2 is frequently accompanied by the emergence of CRISPR1. CRISPR3 loci were present in almost all strains (52/61). The length of CRISPR arrays varied from 1 to 9 spacers. Sequence analysis of the CRISPR arrays revealed that few spacers had matches in the GenBank databases. However, one spacer in CRISPR3 loci matches the cognate cas3 genes, and no cas gene was present around the CRISPR3 region. Analysis of CRISPR sequences shows that the CRISPRs vary little, which makes them poor genotyping markers. The present study is the first attempt to determine and analyze CRISPRs of Shigella isolated from clinical patients.
Provençal, Nadine; Suderman, Matthew J.; Caramaschi, Doretta; Wang, Dongsha; Hallett, Michael; Vitaro, Frank
2013-01-01
Background: Animal and human studies suggest that inflammation is associated with behavioral disorders including aggression. We have recently shown that physical aggression of boys during childhood is strongly associated with reduced plasma levels of cytokines IL-1α, IL-4, IL-6, IL-8 and IL-10, later in early adulthood. This study tests the hypothesis that there is an association between differential DNA methylation regions in cytokine genes in T cells and monocytes DNA in adult subjects and a trajectory of physical aggression from childhood to adolescence. Methodology/Principal Findings: We compared the methylation profiles of the entire genomic loci encompassing the IL-1α, IL-6, IL-4, IL-10 and IL-8 and three of their regulatory transcription factors (TF) NFkB1, NFAT5 and STAT6 genes in adult males on a chronic physical aggression trajectory (CPA) and males with the same background who followed a normal physical aggression trajectory (control group) from childhood to adolescence. We used the method of methylated DNA immunoprecipitation with comprehensive cytokine gene loci and TF loci microarray hybridization, statistical analysis and false discovery rate correction. We found differentially methylated regions to associate with CPA in both the cytokine loci as well as in their transcription factors loci analyzed. Some of these differentially methylated regions were located in known regulatory regions whereas others, to our knowledge, were previously unknown as regulatory areas. However, using the ENCODE database, we were able to identify key regulatory elements in many of these regions that indicate that they might be involved in the regulation of cytokine expression. Conclusions: We provide here the first evidence for an association between differential DNA methylation in cytokines and their regulators in T cells and monocytes and male physical aggression. PMID:23977113
Distinct evolutionary strategies of human leucocyte antigen loci in pathogen-rich environments
Sanchez-Mazas, Alicia; Lemaître, Jean-François; Currat, Mathias
2012-01-01
Human leucocyte antigen (HLA) loci have a complex evolution where both stochastic (e.g. genetic drift) and deterministic (natural selection) forces are involved. Owing to their extraordinary level of polymorphism, HLA genes are useful markers for reconstructing human settlement history. However, HLA variation often deviates significantly from neutral expectations towards an excess of genetic diversity. Because HLA molecules play a crucial role in immunity, this observation is generally explained by pathogen-driven-balancing selection (PDBS). In this study, we investigate the PDBS model by analysing HLA allelic diversity on a large database of 535 populations in relation to pathogen richness. Our results confirm that geographical distances are excellent predictors of HLA genetic differentiation worldwide. We also find a significant positive correlation between genetic diversity and pathogen richness at two HLA class I loci (HLA-A and -B), as predicted by PDBS, and a significant negative correlation at one HLA class II locus (HLA-DQB1). Although these effects are weak, as shown by a loss of significance when populations submitted to rapid genetic drift are removed from the analysis, the inverse relationship between genetic diversity and pathogen richness at different loci indicates that HLA genes have adopted distinct evolutionary strategies to provide immune protection in pathogen-rich environments. PMID:22312050
Musanovic, Jasmin; Filipovska-Musanovic, Marijana; Kovacevic, Lejla; Buljugic, Dzenisa; Dzehverovic, Mirela; Avdic, Jasna; Marjanovic, Damir
2012-05-01
In our previous population studies of the human population of Bosnia and Herzegovina, we used autosomal STR, Y-STR, and X-STR loci, as well as Y-chromosome NRY biallelic markers. All results obtained were included in the Bosnian reference database. To support future applied population molecular genetics research on this population, we examined the effectiveness of a 15-STR-locus system in determining sibship by calculating different cut-off points of combined sibship indices (CSI) and the distribution of shared alleles. From the perspective of application, it is very difficult to establish strict CSI cut-off values that determine sibship beyond doubt. A highly statistically significant difference was observed between siblings and non-siblings in mean CSI values and in the distribution of shared alleles (P < 0.0001). After constructing the "gray zone", only one false positive result was found across three CSI cut-off levels, with the highest percentage of determined sibship/non-sibship at CSI = 0.067, confirming its practical benefit. The distribution of shared alleles is recommended as an informative estimator for use within the human population of Bosnia and Herzegovina.
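The allele-sharing statistic mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each person's profile maps a locus name to a pair of alleles; the genotypes are hypothetical, and the combined sibship index itself is not computed because it requires population allele frequencies that the abstract does not give.

```python
from collections import Counter


def shared_alleles(genotype_a, genotype_b):
    """Number of alleles (0, 1 or 2) two people share at one locus, counting copies."""
    overlap = Counter(genotype_a) & Counter(genotype_b)  # multiset intersection
    return sum(overlap.values())


def sharing_profile(person_a, person_b):
    """Tally loci by number of shared alleles, over loci typed in both people."""
    common_loci = person_a.keys() & person_b.keys()
    return Counter(
        shared_alleles(person_a[locus], person_b[locus]) for locus in common_loci
    )


# Hypothetical genotypes (allele repeat numbers) at three STR loci.
person_1 = {"D3S1358": (15, 16), "vWA": (17, 17), "FGA": (21, 24)}
person_2 = {"D3S1358": (15, 16), "vWA": (17, 18), "FGA": (22, 25)}

profile = sharing_profile(person_1, person_2)
total_shared = sum(count * n_loci for count, n_loci in profile.items())
print(dict(profile), total_shared)
```

Using a multiset intersection means a homozygote (17, 17) compared with a heterozygote (17, 18) counts as one shared allele rather than two, which is the design choice made in this sketch.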
Chen, Lin; An, Yixin; Li, Yong-xiang; Li, Chunhui; Shi, Yunsu; Song, Yanchun; Zhang, Dengfeng; Wang, Tianyu; Li, Yu
2017-01-01
Maize grain yield and related traits are complex and are controlled by a large number of genes of small effect or quantitative trait loci (QTL). Over the years, a large number of yield-related QTLs have been identified in maize and deposited in public databases. However, integrating and re-analyzing these data and mining candidate loci for yield-related traits has become a major issue in maize. In this study, we collected information on QTLs conferring maize yield-related traits from 33 published studies. Then, 999 of these QTLs were iteratively projected and subjected to meta-analysis to obtain metaQTLs (MQTLs). A total of 76 MQTLs were found across the maize genome. Based on a comparative genomics strategy, several maize orthologs of rice yield-related genes were identified in these MQTL regions. Furthermore, three potential candidate genes (Gene ID: GRMZM2G359974, GRMZM2G301884, and GRMZM2G083894) associated with kernel size and weight within three MQTL regions were identified using regional association mapping, based on the results of the meta-analysis. This strategy, combining MQTL analysis and regional association mapping, is helpful for functional marker development and rapid identification of candidate genes or loci. PMID:29312420
Tragante, Vinicius; Barnes, Michael R.; Ganesh, Santhi K.; Lanktree, Matthew B.; Guo, Wei; Franceschini, Nora; Smith, Erin N.; Johnson, Toby; Holmes, Michael V.; Padmanabhan, Sandosh; Karczewski, Konrad J.; Almoguera, Berta; Barnard, John; Baumert, Jens; Chang, Yen-Pei Christy; Elbers, Clara C.; Farrall, Martin; Fischer, Mary E.; Gaunt, Tom R.; Gho, Johannes M.I.H.; Gieger, Christian; Goel, Anuj; Gong, Yan; Isaacs, Aaron; Kleber, Marcus E.; Leach, Irene Mateo; McDonough, Caitrin W.; Meijs, Matthijs F.L.; Melander, Olle; Nelson, Christopher P.; Nolte, Ilja M.; Pankratz, Nathan; Price, Tom S.; Shaffer, Jonathan; Shah, Sonia; Tomaszewski, Maciej; van der Most, Peter J.; Van Iperen, Erik P.A.; Vonk, Judith M.; Witkowska, Kate; Wong, Caroline O.L.; Zhang, Li; Beitelshees, Amber L.; Berenson, Gerald S.; Bhatt, Deepak L.; Brown, Morris; Burt, Amber; Cooper-DeHoff, Rhonda M.; Connell, John M.; Cruickshanks, Karen J.; Curtis, Sean P.; Davey-Smith, George; Delles, Christian; Gansevoort, Ron T.; Guo, Xiuqing; Haiqing, Shen; Hastie, Claire E.; Hofker, Marten H.; Hovingh, G. Kees; Kim, Daniel S.; Kirkland, Susan A.; Klein, Barbara E.; Klein, Ronald; Li, Yun R.; Maiwald, Steffi; Newton-Cheh, Christopher; O’Brien, Eoin T.; Onland-Moret, N. Charlotte; Palmas, Walter; Parsa, Afshin; Penninx, Brenda W.; Pettinger, Mary; Vasan, Ramachandran S.; Ranchalis, Jane E.; M Ridker, Paul; Rose, Lynda M.; Sever, Peter; Shimbo, Daichi; Steele, Laura; Stolk, Ronald P.; Thorand, Barbara; Trip, Mieke D.; van Duijn, Cornelia M.; Verschuren, W. Monique; Wijmenga, Cisca; Wyatt, Sharon; Young, J. 
Hunter; Zwinderman, Aeilko H.; Bezzina, Connie R.; Boerwinkle, Eric; Casas, Juan P.; Caulfield, Mark J.; Chakravarti, Aravinda; Chasman, Daniel I.; Davidson, Karina W.; Doevendans, Pieter A.; Dominiczak, Anna F.; FitzGerald, Garret A.; Gums, John G.; Fornage, Myriam; Hakonarson, Hakon; Halder, Indrani; Hillege, Hans L.; Illig, Thomas; Jarvik, Gail P.; Johnson, Julie A.; Kastelein, John J.P.; Koenig, Wolfgang; Kumari, Meena; März, Winfried; Murray, Sarah S.; O’Connell, Jeffery R.; Oldehinkel, Albertine J.; Pankow, James S.; Rader, Daniel J.; Redline, Susan; Reilly, Muredach P.; Schadt, Eric E.; Kottke-Marchant, Kandice; Snieder, Harold; Snyder, Michael; Stanton, Alice V.; Tobin, Martin D.; Uitterlinden, André G.; van der Harst, Pim; van der Schouw, Yvonne T.; Samani, Nilesh J.; Watkins, Hugh; Johnson, Andrew D.; Reiner, Alex P.; Zhu, Xiaofeng; de Bakker, Paul I.W.; Levy, Daniel; Asselbergs, Folkert W.; Munroe, Patricia B.; Keating, Brendan J.
2014-01-01
Blood pressure (BP) is a heritable risk factor for cardiovascular disease. To investigate genetic associations with systolic BP (SBP), diastolic BP (DBP), mean arterial pressure (MAP), and pulse pressure (PP), we genotyped ∼50,000 SNPs in up to 87,736 individuals of European ancestry and combined these in a meta-analysis. We replicated findings in an independent set of 68,368 individuals of European ancestry. Our analyses identified 11 previously undescribed associations in independent loci containing 31 genes including PDE1A, HLA-DQB1, CDK6, PRKAG2, VCL, H19, NUCB2, RELA, HOXC@ complex, FBN1, and NFAT5 at the Bonferroni-corrected array-wide significance threshold (p < 6 × 10⁻⁷) and confirmed 27 previously reported associations. Bioinformatic analysis of the 11 loci provided support for a putative role in hypertension of several genes, such as CDK6 and NUCB2. Analysis of potential pharmacological targets in databases of small molecules showed that ten of the genes are predicted to be a target for small molecules. In summary, we identified previously unknown loci associated with BP. Our findings extend our understanding of genes involved in BP regulation, which may provide new targets for therapeutic intervention or drug response stratification. PMID:24560520
Tringali, Michael D; Seyoum, Seifu; Carney, Susan L; Davis, Michelle C; Rodriguez-Lopez, Marta A; Reynolds Iii, John E; Haubold, Elsa
2008-03-01
Here we describe 18 polymorphic microsatellite loci for Trichechus manatus latirostris (Florida manatee), isolated using a polymerase chain reaction-based technique. The number of alleles at each locus ranged from two to four (mean = 2.5) in specimens from southwest (n = 58) and northeast (n = 58) Florida. Expected and observed heterozygosities ranged from 0.11 to 0.67 (mean = 0.35) and from 0.02 to 0.78 (mean = 0.34), respectively. Departures from Hardy-Weinberg equilibrium occurred at two loci. There was no evidence of genotypic disequilibrium for any pair of loci. For individual identification, mean random-mating and θ-corrected match probabilities were 9.36 × 10⁻⁷ and 1.95 × 10⁻⁶, respectively. © 2007 The Authors.
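The summary statistics reported in the abstract above can be illustrated with standard textbook formulas. The sketch assumes random mating, so that expected heterozygosity is 1 - Σp² and the single-locus match probability is the sum of squared genotype frequencies. The abstract does not state which estimators the authors used, the θ-corrected version is not shown, and the allele frequencies below are hypothetical.

```python
from itertools import combinations
from math import prod


def expected_heterozygosity(freqs):
    """Expected heterozygosity under random mating: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def locus_match_probability(freqs):
    """Chance that two unrelated individuals share a genotype at one locus."""
    homozygotes = sum((p * p) ** 2 for p in freqs)
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygotes + heterozygotes


# Hypothetical allele frequencies at two loci (each list sums to 1).
loci = [[0.5, 0.5], [0.8, 0.2]]

for freqs in loci:
    print(
        round(expected_heterozygosity(freqs), 3),
        round(locus_match_probability(freqs), 4),
    )

# Loci are assumed independent, so per-locus match probabilities multiply.
combined = prod(locus_match_probability(freqs) for freqs in loci)
print(round(combined, 4))
```

Multiplying across loci is only justified when loci are independent, which is consistent with the abstract's report of no genotypic disequilibrium between any pair of loci.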
A taxonomy of bacterial microcompartment loci constructed by a novel scoring method
Axen, Seth D.; Erbilgin, Onur; Kerfeld, Cheryl A.; ...
2014-10-23
Bacterial microcompartments (BMCs) are proteinaceous organelles involved in both autotrophic and heterotrophic metabolism. All BMCs share homologous shell proteins but differ in their complement of enzymes; these are typically encoded adjacent to shell protein genes in genetic loci, or operons. To enable the identification and prediction of functional (sub)types of BMCs, we developed LoClass, an algorithm that finds putative BMC loci and inventories, weights, and compares their constituent pfam domains to construct a locus similarity network and predict locus (sub)types. In addition to using LoClass to analyze sequences in the Non-redundant Protein Database, we compared predicted BMC loci found in seven candidate bacterial phyla (six from single-cell genomic studies) to the LoClass taxonomy. Together, these analyses resulted in the identification of 23 different types of BMCs encoded in 30 distinct locus (sub)types found in 23 bacterial phyla. These include the two carboxysome types and a divergent set of metabolosomes, BMCs that share a common catalytic core and process distinct substrates via specific signature enzymes. Furthermore, many Candidate BMCs were found that lack one or more core metabolosome components, including one that is predicted to represent an entirely new paradigm for BMC-associated metabolism, joining the carboxysome and metabolosome. By placing these results in a phylogenetic context, we provide a framework for understanding the horizontal transfer of these loci, a starting point for studies aimed at understanding the evolution of BMCs. This comprehensive taxonomy of BMC loci, based on their constituent protein domains, foregrounds the functional diversity of BMCs and provides a reference for interpreting the role of BMC gene clusters encoded in isolate, single cell, and metagenomic data.
Many loci encode ancillary functions such as transporters or genes for cofactor assembly; this expanded vocabulary of BMC-related functions should be useful for design of genetic modules for introducing BMCs in bioengineering applications.
A Taxonomy of Bacterial Microcompartment Loci Constructed by a Novel Scoring Method
Kerfeld, Cheryl A.
2014-01-01
Bacterial microcompartments (BMCs) are proteinaceous organelles involved in both autotrophic and heterotrophic metabolism. All BMCs share homologous shell proteins but differ in their complement of enzymes; these are typically encoded adjacent to shell protein genes in genetic loci, or operons. To enable the identification and prediction of functional (sub)types of BMCs, we developed LoClass, an algorithm that finds putative BMC loci and inventories, weights, and compares their constituent pfam domains to construct a locus similarity network and predict locus (sub)types. In addition to using LoClass to analyze sequences in the Non-redundant Protein Database, we compared predicted BMC loci found in seven candidate bacterial phyla (six from single-cell genomic studies) to the LoClass taxonomy. Together, these analyses resulted in the identification of 23 different types of BMCs encoded in 30 distinct locus (sub)types found in 23 bacterial phyla. These include the two carboxysome types and a divergent set of metabolosomes, BMCs that share a common catalytic core and process distinct substrates via specific signature enzymes. Furthermore, many Candidate BMCs were found that lack one or more core metabolosome components, including one that is predicted to represent an entirely new paradigm for BMC-associated metabolism, joining the carboxysome and metabolosome. By placing these results in a phylogenetic context, we provide a framework for understanding the horizontal transfer of these loci, a starting point for studies aimed at understanding the evolution of BMCs. This comprehensive taxonomy of BMC loci, based on their constituent protein domains, foregrounds the functional diversity of BMCs and provides a reference for interpreting the role of BMC gene clusters encoded in isolate, single cell, and metagenomic data. 
Many loci encode ancillary functions such as transporters or genes for cofactor assembly; this expanded vocabulary of BMC-related functions should be useful for design of genetic modules for introducing BMCs in bioengineering applications. PMID:25340524
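The idea of weighting and comparing the domain content of loci can be illustrated with a small sketch. This is not the LoClass scoring method, which the abstract does not specify; it assumes a simple weighted Jaccard index, and the domain labels and weights below are hypothetical placeholders.

```python
def weighted_similarity(domains_a, domains_b, weights, default_weight=1.0):
    """Weighted Jaccard similarity between two loci described as sets of domain labels.

    Assumption: domains shared by nearly all loci (e.g. shell proteins) get a
    lower weight so that signature enzymes dominate the score.
    """
    union = domains_a | domains_b
    if not union:
        return 0.0
    shared = domains_a & domains_b
    total = sum(weights.get(d, default_weight) for d in union)
    return sum(weights.get(d, default_weight) for d in shared) / total


# Hypothetical loci and weights, for illustration only.
locus_1 = {"shell", "aldehyde_dh", "pta"}
locus_2 = {"shell", "aldehyde_dh", "rubisco"}
weights = {"shell": 0.5}

score = weighted_similarity(locus_1, locus_2, weights)
print(round(score, 3))
```

Pairs of loci whose score exceeds a chosen threshold could then be joined by an edge to form a similarity network; the threshold would have to be tuned on real data.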
Genetics Home Reference: obstructive sleep apnea
... Association of genetic loci with sleep apnea in European Americans and African-Americans: the Candidate Gene Association Resource (CARe). PLoS One. 2012;7(11):e48836. doi: 10.1371/journal.pone.0048836. Epub 2012 Nov 14. Citation on ...
The African Genome Variation Project shapes medical genetics in Africa
Gurdasani, Deepti; Carstensen, Tommy; Tekola-Ayele, Fasil; Pagani, Luca; Tachmazidou, Ioanna; Hatzikotoulas, Konstantinos; Karthikeyan, Savita; Iles, Louise; Pollard, Martin O.; Choudhury, Ananyo; Ritchie, Graham R. S.; Xue, Yali; Asimit, Jennifer; Nsubuga, Rebecca N.; Young, Elizabeth H.; Pomilla, Cristina; Kivinen, Katja; Rockett, Kirk; Kamali, Anatoli; Doumatey, Ayo P.; Asiki, Gershim; Seeley, Janet; Sisay-Joof, Fatoumatta; Jallow, Muminatou; Tollman, Stephen; Mekonnen, Ephrem; Ekong, Rosemary; Oljira, Tamiru; Bradman, Neil; Bojang, Kalifa; Ramsay, Michele; Adeyemo, Adebowale; Bekele, Endashaw; Motala, Ayesha; Norris, Shane A.; Pirie, Fraser; Kaleebu, Pontiano; Kwiatkowski, Dominic; Tyler-Smith, Chris; Rotimi, Charles; Zeggini, Eleftheria; Sandhu, Manjinder S.
2014-01-01
Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterisation of African genetic diversity is needed. The African Genome Variation Project (AGVP) provides a resource to help design, implement and interpret genomic studies in sub-Saharan Africa (SSA) and worldwide. The AGVP represents dense genotypes from 1,481 individuals and whole genome sequences (WGS) from 320 individuals across SSA. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across SSA. We identify new loci under selection, including for malaria and hypertension. We show that modern imputation panels can identify association signals at highly differentiated loci across populations in SSA. Using WGS, we show further improvement in imputation accuracy supporting efforts for large-scale sequencing of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa, showing for the first time that such designs are feasible. PMID:25470054
Characteristics of Resources Represented in the OCLC CORC Database.
ERIC Educational Resources Information Center
Connell, Tschera Harkness; Prabha, Chandra
2002-01-01
Examines the characteristics of Web resources in Online Computer Library Center's (OCLC) Cooperative Online Resource Catalog (CORC) in terms of subject matter, source of content, publication patterns, and units of information chosen for representation in the database. Suggests that the ability to successfully use a database depends on…
An efficient method to find potentially universal population genetic markers, applied to metazoans
2010-01-01
Background: Despite the impressive growth of sequence databases, the limited availability of nuclear markers that are sufficiently polymorphic for population genetics and phylogeography and applicable across various phyla restricts many potential studies, particularly in non-model organisms. Numerous introns have invariant positions among kingdoms, providing a potential source for such markers. Unfortunately, most of the few known EPIC (Exon Primed Intron Crossing) loci are restricted to vertebrates or belong to multigenic families. Results: In order to develop markers with broad applicability, we designed a bioinformatic approach aimed at avoiding multigenic families while identifying intron positions conserved across metazoan phyla. We developed a program facilitating the identification of EPIC loci which allowed slight variation in intron position. From the Homolens databases we selected 29 gene families which contained 52 promising introns for which we designed 93 primer pairs. PCR tests were performed on several ascidians, echinoderms, bivalves and cnidarians. On average, 24 different introns per genus were amplified in bilaterians. Remarkably, five of the introns successfully amplified in all of the metazoan genera tested (a dozen genera, including cnidarians). The influence of several factors on amplification success was investigated. Success rate was not related to the phylogenetic relatedness of a taxon to the groups that most influenced primer design, showing that these EPIC markers are extremely conserved in animals. Conclusions: Our new method now makes it possible to (i) rapidly isolate a set of EPIC markers for any phylum, even outside the animal kingdom, and thus, (ii) compare genetic diversity at potentially homologous polymorphic loci between divergent taxa. PMID:20836842
Tucker, Valerie C; Hopwood, Andrew J; Sprecher, Cynthia J; McLaren, Robert S; Rabbach, Dawn R; Ensenberger, Martin G; Thompson, Jonelle M; Storts, Douglas R
2011-11-01
In response to the ENFSI and EDNAP groups' call for new STR multiplexes for Europe, Promega® developed a suite of four new DNA profiling kits. This paper describes the developmental validation study performed on the PowerPlex® ESI 16 (European Standard Investigator 16) and the PowerPlex® ESI 17 Systems. The PowerPlex® ESI 16 System combines the 11 loci compatible with the UK National DNA Database®, contained within the AmpFlSTR® SGM Plus® PCR Amplification Kit, with five additional loci: D2S441, D10S1248, D22S1045, D1S1656 and D12S391. The multiplex was designed to reduce the amplicon size of the loci found in the AmpFlSTR® SGM Plus® kit. This design facilitates increased robustness and amplification success for the loci used in the national DNA databases created in many countries, when analyzing degraded DNA samples. The PowerPlex® ESI 17 System amplifies the same loci as the PowerPlex® ESI 16 System, but with the addition of a primer pair for the SE33 locus. Tests were designed to address the developmental validation guidelines issued by the Scientific Working Group on DNA Analysis Methods (SWGDAM), and those of the DNA Advisory Board (DAB). Samples processed include DNA mixtures, PCR reactions spiked with inhibitors, a sensitivity series, and 306 United Kingdom donor samples to determine concordance with data generated with the AmpFlSTR® SGM Plus® kit. Allele frequencies from 242 white Caucasian samples collected in the United Kingdom are also presented. The PowerPlex® ESI 16 and ESI 17 Systems are robust and sensitive tools, suitable for the analysis of forensic DNA samples. Full profiles were routinely observed with 62.5 pg of a fully heterozygous single source DNA template. This high level of sensitivity was found to impact on mixture analyses, where 54-86% of unique minor contributor alleles were routinely observed in a 1:19 mixture ratio.
Improved sensitivity combined with the robustness afforded by smaller amplicons has substantially improved the quantity of data obtained from degraded samples, and the improved chemistry confers exceptional tolerance to high levels of laboratory prepared inhibitors. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Tripathi, Abhinandan Mani; Tyagi, Antariksh; Kumar, Anoop; Singh, Akanksha; Singh, Shivani; Chaudhary, Lal Babu; Roy, Sribash
2013-01-01
DNA barcoding as a tool for species identification has been successful in animals and other organisms, including certain groups of plants. This tool has rarely been explored for species identification, particularly in tree species, in biodiversity-rich countries like India. rbcL and matK are standard barcode loci, while ITS and trnH-psbA are considered supplementary loci for plants. Plant barcode loci, namely, rbcL, matK, ITS, trnH-psbA, and the recently proposed ITS2, were tested for their efficacy as barcode loci using 300 accessions of tropical tree species. We tested these loci for PCR, sequencing success, and species discrimination ability using three methods. rbcL was the best locus as far as PCR and sequencing success rate were concerned, but not for the species discrimination ability of tropical tree species. ITS and trnH-psbA were the second best loci in PCR and sequencing success, respectively. The species discrimination ability of ITS ranged from 24.4 percent to 74.3 percent and that of trnH-psbA was 25.6 percent to 67.7 percent, depending upon the data set and the method used. matK provided the least PCR success, followed by ITS2 (59.0%). Species resolution by ITS2 and rbcL ranged from 9.0 percent to 48.7 percent and 13.2 percent to 43.6 percent, respectively. Further, we observed that sequences of the barcode loci studied here are poorly represented in the NCBI nucleotide database for tree species. Although a success rate of 60-70 percent by both ITS and trnH-psbA, a conservative estimate, may not be considered highly successful, it would certainly help in large-scale biodiversity inventorization, particularly for tropical tree species, considering the standard success rate of plant DNA barcode programs reported so far. The recommended matK and rbcL primer combination may not work in tropical tree species as barcode markers.
Tsybovskii, I S; Veremeichik, V M; Kotova, S A; Kritskaya, S V; Evmenenko, S A; Udina, I G
2017-02-01
For the Republic of Belarus, the development of a forensic reference database based on 18 autosomal microsatellites (STR) is described, using a population dataset (N = 1040), a “familial” genotypic dataset (N = 2550) obtained from paternity testing casework, and a dataset of genotypes from a criminal registration database (N = 8756). The population samples studied consist of 80% ethnic Belarusians and 20% individuals of other nationality or of mixed origin (by questionnaire data). Genotypes of 12346 inhabitants of the Republic of Belarus from 118 regional samples, typed for 18 autosomal microsatellites, are included in the sample: 16 tetranucleotide STR (D2S1338, TPOX, D3S1358, CSF1PO, D5S818, D8S1179, D7S820, THO1, vWA, D13S317, D16S539, D18S51, D19S433, D21S11, F13B, and FGA) and two pentanucleotide STR (Penta D and Penta E). The genotype distributions at the 18 STR show that the samples studied are in Hardy–Weinberg equilibrium. No significant differences were detected between discrete populations or between samples from various historical ethnographic regions of the Republic of Belarus (Western and Eastern Polesie, Podneprovye, Ponemanye, Poozerye, and Center), which indicates the absence of prominent genetic differentiation. No statistically significant differences were detected between the studied genotypic datasets either, which made it possible to combine the datasets and consider the total sample as a unified forensic reference database for 18 “criminalistic” STR loci. No differences in the distribution of a range of autosomal STR were detected between the reference database of the Republic of Belarus and Russians or Ukrainians, consistent with the close genetic relationship of the three Eastern Slavic nations, mediated by common origin and intense mutual migrations. Significant differences at individual STR loci were observed between the reference database of the Republic of Belarus and populations of Southern and Western Slavs.
The necessity of using a national reference database to support forensic casework in the Republic of Belarus was demonstrated.
An ontology for major histocompatibility restriction.
Vita, Randi; Overton, James A; Seymour, Emily; Sidney, John; Kaufman, Jim; Tallmadge, Rebecca L; Ellis, Shirley; Hammond, John; Butcher, Geoff W; Sette, Alessandro; Peters, Bjoern
2016-01-01
MHC molecules are a highly diverse family of proteins that play a key role in cellular immune recognition. Over time, different techniques and terminologies have been developed to identify the specific type(s) of MHC molecule involved in a specific immune recognition context. No consistent nomenclature exists across different vertebrate species. To correctly represent MHC-related data in The Immune Epitope Database (IEDB), we built upon a previously established MHC ontology and created an ontology to represent MHC molecules as they relate to immunological experiments. This ontology models MHC protein chains from 16 species, deals with different approaches used to identify MHC, such as direct sequencing versus serotyping, relates engineered MHC molecules to naturally occurring ones, connects genetic loci, alleles, protein chains and multi-chain proteins, and establishes evidence codes for MHC restriction. Where available, this work is based on existing ontologies from the OBO Foundry. Overall, representing MHC molecules provides a challenging and practically important test case for ontology building, and could serve as an example of how to integrate other ontology building efforts into web resources.
First genetic linkage map of Taraxacum koksaghyz Rodin based on AFLP, SSR, COS and EST-SSR markers.
Arias, Marina; Hernandez, Monica; Remondegui, Naroa; Huvenaars, Koen; van Dijk, Peter; Ritter, Enrique
2016-08-04
Taraxacum koksaghyz Rodin (TKS) has been studied on many occasions as a possible alternative source of good-quality natural rubber and of inulin. Some tire companies are already testing TKS tire prototypes. There are also many investigations on the production of bio-fuels from inulin and inulin applications for health improvement and in the food industry. Few genomic resources exist for TKS; in particular, no genetic linkage map is available for this species. We have constructed the first TKS genetic linkage map based on AFLP, COS, SSR and EST-SSR markers. The integrated linkage map with eight linkage groups (LG), representing the eight chromosomes of Russian dandelion, has 185 individual AFLP markers from parent 1, 188 individual AFLP markers from parent 2, 75 common AFLP markers and 6 COS, 1 SSR and 63 EST-SSR loci. BLAST searches of the EST-SSR sequences against known sequences from lettuce allowed a partial alignment of our TKS map with a lettuce map. BLAST searches against plant gene databases revealed some homologies with genes that may be useful for future downstream applications.
Robertson, Laura S.; Cornman, Robert S.
2014-01-01
We developed genetic resources for two North American frogs, Lithobates clamitans and Pseudacris regilla, widespread native amphibians that are potential indicator species of environmental health. For both species, mRNA from multiple tissues was sequenced using 454 technology. De novo assemblies with Mira3 resulted in 50 238 contigs (N50 = 687 bp) and 48 213 contigs (N50 = 686 bp) for L. clamitans and P. regilla, respectively, after clustering with CD-Hit-EST and purging contigs below 200 bp. We performed BLASTX similarity searches against the Xenopus tropicalis proteome and, for predicted ORFs, HMMER similarity searches against the Pfam-A database. Because there is broad interest in amphibian immune factors, we manually annotated putative antimicrobial peptides. To identify conserved regions suitable for amplicon resequencing across a broad taxonomic range, we performed an additional assembly of public short-read transcriptome data derived from two species of the genus Rana and identified reciprocal best TBLASTX matches among all assemblies. Although P. regilla, a hylid frog, is substantially more diverged from the ranid species, we identified 56 genes that were sufficiently conserved to allow nondegenerate primer design with Primer3. In addition to providing a foundation for comparative genomics and quantitative gene expression analysis, our results enable quick development of nuclear sequence-based markers for phylogenetics or population genetics.
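The N50 values quoted for the two assemblies summarise contig length distributions. As a minimal sketch (using the common definition of N50 and made-up contig lengths, not the study's data), N50 is the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size:

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths supplied")


# Hypothetical contig lengths; contigs below 200 bp are purged first,
# mirroring the filtering step described in the abstract.
raw_contigs = [120, 200, 300, 400, 687, 900, 1200]
kept = [c for c in raw_contigs if c >= 200]
print(n50(kept))
```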
Murphy, Philip N; Bruno, Raimondo; Ryland, Ida; Wareing, Michele; Fisk, John E; Montgomery, Catharine; Hilton, Joanne
2012-03-01
To review, with meta-analyses where appropriate, performance differences between ecstasy (3,4-methylenedioxymethamphetamine) users and non-users on a wider range of visuospatial tasks than previously reviewed. Such tasks have been shown to draw upon working memory executive resources. Abstract databases were searched using the United Kingdom National Health Service Evidence Health Information Resource. Inclusion criteria were publication in English language peer-reviewed journals and the reporting of new findings regarding human ecstasy-users' performance on visuospatial tasks. Data extracted included specific task requirements to provide a basis for meta-analyses for categories of tasks with similar requirements. Fifty-two studies were identified for review, although not all were suitable for meta-analysis. Significant weighted mean effect sizes indicating poorer performance by ecstasy users compared with matched controls were found for tasks requiring recall of spatial stimulus elements, recognition of figures and production/reproduction of figures. There was no evidence of a linear relationship between estimated ecstasy consumption and effect sizes. Given the networked nature of processing for spatial and non-spatial visual information, future scanning and imaging studies should focus on brain activation differences between ecstasy users and non-users in the context of specific tasks to facilitate identification of loci of potentially compromised activity in users. Copyright © 2012 John Wiley & Sons, Ltd.
The Emerging Oilseed Crop Sesamum indicum Enters the “Omics” Era
Dossa, Komivi; Diouf, Diaga; Wang, Linhai; Wei, Xin; Zhang, Yanxin; Niang, Mareme; Fonceka, Daniel; Yu, Jingyin; Mmadi, Marie A.; Yehouessi, Louis W.; Liao, Boshou; Zhang, Xiurong; Cisse, Ndiaga
2017-01-01
Sesame (Sesamum indicum L.) is one of the oldest oilseed crops widely grown in Africa and Asia for its high-quality nutritional seeds. It is well adapted to harsh environments and constitutes an alternative cash crop for smallholders in developing countries. Despite its economic and nutritional importance, sesame is considered an orphan crop because it has received very little attention from science. As a consequence, it lags behind the other major oil crops as far as genetic improvement is concerned. In recent years, the scenario has considerably changed with the decoding of the sesame nuclear genome leading to the development of various genomic resources including molecular markers, comprehensive genetic maps, high-quality transcriptome assemblies, web-based functional databases and diverse draft genome sequences. The availability of these tools, in association with the discovery of candidate genes and quantitative trait loci for key agronomic traits including high oil content and quality, waterlogging and drought tolerance, disease resistance, cytoplasmic male sterility and high yield, paves the way to the development of some new strategies for sesame genetic improvement. As a result, sesame has graduated from an “orphan crop” to a “genomic resource-rich crop.” With the limited research teams working on sesame worldwide, more synergic efforts are needed to integrate these resources in sesame breeding for productivity upsurge, ensuring food security and improved livelihood in developing countries. This review retraces the evolution of sesame research by highlighting the recent advances in the “Omics” area and also critically discusses the future prospects for a further genetic improvement and a better expansion of this crop. PMID:28713412
The Emerging Oilseed Crop Sesamum indicum Enters the "Omics" Era.
Dossa, Komivi; Diouf, Diaga; Wang, Linhai; Wei, Xin; Zhang, Yanxin; Niang, Mareme; Fonceka, Daniel; Yu, Jingyin; Mmadi, Marie A; Yehouessi, Louis W; Liao, Boshou; Zhang, Xiurong; Cisse, Ndiaga
2017-01-01
Sesame (Sesamum indicum L.) is one of the oldest oilseed crops widely grown in Africa and Asia for its high-quality nutritional seeds. It is well adapted to harsh environments and constitutes an alternative cash crop for smallholders in developing countries. Despite its economic and nutritional importance, sesame is considered an orphan crop because it has received very little attention from science. As a consequence, it lags behind the other major oil crops as far as genetic improvement is concerned. In recent years, the scenario has considerably changed with the decoding of the sesame nuclear genome leading to the development of various genomic resources including molecular markers, comprehensive genetic maps, high-quality transcriptome assemblies, web-based functional databases and diverse draft genome sequences. The availability of these tools, in association with the discovery of candidate genes and quantitative trait loci for key agronomic traits including high oil content and quality, waterlogging and drought tolerance, disease resistance, cytoplasmic male sterility and high yield, paves the way to the development of some new strategies for sesame genetic improvement. As a result, sesame has graduated from an "orphan crop" to a "genomic resource-rich crop." With the limited research teams working on sesame worldwide, more synergic efforts are needed to integrate these resources in sesame breeding for productivity upsurge, ensuring food security and improved livelihood in developing countries. This review retraces the evolution of sesame research by highlighting the recent advances in the "Omics" area and also critically discusses the future prospects for a further genetic improvement and a better expansion of this crop.
Database resources of the National Center for Biotechnology Information: 2002 update
Wheeler, David L.; Church, Deanna M.; Lash, Alex E.; Leipe, Detlef D.; Madden, Thomas L.; Pontius, Joan U.; Schuler, Gregory D.; Schriml, Lynn M.; Tatusova, Tatiana A.; Wagner, Lukas; Rapp, Barbara A.
2002-01-01
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI’s web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, Human–Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov. PMID:11752242
STBase: One Million Species Trees for Comparative Biology
McMahon, Michelle M.; Deepak, Akshay; Fernández-Baca, David; Boss, Darren; Sanderson, Michael J.
2015-01-01
Comprehensively sampled phylogenetic trees provide the most compelling foundations for strong inferences in comparative evolutionary biology. Mismatches are common, however, between the taxa for which comparative data are available and the taxa sampled by published phylogenetic analyses. Moreover, many published phylogenies are gene trees, which cannot always be adapted immediately for species level comparisons because of discordance, gene duplication, and other confounding biological processes. A new database, STBase, lets comparative biologists quickly retrieve species level phylogenetic hypotheses in response to a query list of species names. The database consists of 1 million single- and multi-locus data sets, each with a confidence set of 1000 putative species trees, computed from GenBank sequence data for 413,000 eukaryotic taxa. Two bodies of theoretical work are leveraged to aid in the assembly of multi-locus concatenated data sets for species tree construction. First, multiply labeled gene trees are pruned to conflict-free singly-labeled species-level trees that can be combined between loci. Second, impacts of missing data in multi-locus data sets are ameliorated by assembling only decisive data sets. Data sets overlapping with the user’s query are ranked using a scheme that depends on user-provided weights for tree quality and for taxonomic overlap of the tree with the query. Retrieval times are independent of the size of the database, typically a few seconds. Tree quality is assessed by a real-time evaluation of bootstrap support on just the overlapping subtree. Associated sequence alignments, tree files and metadata can be downloaded for subsequent analysis. STBase provides a tool for comparative biologists interested in exploiting the most relevant sequence data available for the taxa of interest. 
It may also serve as a prototype for future species tree oriented databases and as a resource for assembly of larger species phylogenies from precomputed trees. PMID:25679219
DOE Office of Scientific and Technical Information (OSTI.GOV)
Alonso, S.; Castro, A.; Fernandez-Fernandez, I.
1997-02-01
Short VNTR alleles that go undetected after conventional Southern blot hybridization may constitute an alternative explanation for the heterozygosity deficiency observed at some minisatellite loci. To examine this hypothesis, we have employed a screening procedure based on PCR amplification of those individuals classified as homozygotes in our databases for the loci D1S7, D7S21, and D12S11. The results obtained indicate that the frequency of these short alleles is related to the heterozygosity deficiency observed. For the most polymorphic locus, D1S7, approximately 60% of those individuals previously classified as homozygotes were in fact heterozygotes for a short allele. After the inclusion of these new alleles, the agreement between observed and expected heterozygosity, along with other statistical tests employed, provide additional evidence for lack of population substructuring. Comparisons of allele frequency distributions reveal greater differences between racial groups than between closely related populations. 45 refs., 3 figs., 6 tabs.
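The effect described here can be reproduced with a toy calculation. The sketch below assumes the usual definitions (observed heterozygosity is the fraction of heterozygous individuals; expected heterozygosity is one minus the sum of squared allele frequencies) and uses invented genotypes in which a short allele, labelled "S", is invisible to the assay, so its carriers are scored as homozygotes.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    """
    n = len(genotypes)
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    expected = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    observed = sum(1 for a, b in genotypes if a != b) / n
    return observed, expected


def mask_short_allele(genotypes, short="S"):
    """Score carriers of the undetected short allele as apparent homozygotes.

    Simplification: individuals homozygous for the short allele are left unchanged.
    """
    masked = []
    for a, b in genotypes:
        if a == short and b != short:
            a = b
        elif b == short and a != short:
            b = a
        masked.append((a, b))
    return masked


# Hypothetical data, for illustration only.
true_genotypes = [("A", "S"), ("A", "B"), ("B", "S"), ("A", "A")]
print(heterozygosity(true_genotypes))
print(heterozygosity(mask_short_allele(true_genotypes)))
```

In this toy data the true genotypes show an excess of heterozygotes, while the masked scoring shows a deficit relative to expectation, which is the pattern the abstract attributes to undetected short alleles.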
Gu, Lijun; Kawana-Tachikawa, Ai; Shiino, Teiichiro; Nakamura, Hitomi; Koga, Michiko; Kikuchi, Tadashi; Adachi, Eisuke; Koibuchi, Tomohiko; Ishida, Takaomi; Gao, George F; Matsushita, Masaki; Sugiura, Wataru; Iwamoto, Aikichi; Hosoya, Noriaki
2014-01-01
Drug resistance (DR) of HIV-1 can be examined genotypically or phenotypically. Although sequencing is the gold standard of genotypic resistance testing (GRT), high-throughput GRT targeted to the codons responsible for DR may be more appropriate for epidemiological studies and public health research. We used a Japanese database to design and synthesize sequence-specific oligonucleotide probes (SSOP) for the detection of wild-type sequences and 6 DR mutations in the clade B HIV-1 reverse transcriptase region. We coupled SSOP to microbeads of the Luminex 100 xMAP system and developed a GRT based on the polymerase chain reaction (PCR)-SSOP-Luminex method. Sixteen oligoprobes for discriminating DR mutations from wild-type sequences at 6 loci were designed and synthesized, and their sensitivity and specificity were confirmed using isogenic plasmids. The PCR-SSOP-Luminex DR assay was then compared to direct sequencing using 74 plasma specimens from treatment-naïve patients or those on failing treatment. In the majority of specimens, the results of the PCR-SSOP-Luminex DR assay were concordant with sequencing results: 62/74 (83.8%) for M41, 43/74 (58.1%) for K65, 70/74 (94.6%) for K70, 55/73 (75.3%) for K103, 63/73 (86.3%) for M184 and 68/73 (93.2%) for T215. There were a number of specimens without any positive signals, especially for K65. The nucleotide substitutions A2723G, A2747G and C2750T were frequent polymorphisms for the wild-type amino acids K65, K66 and D67, respectively, and 14 specimens had the D67N mutation encoded by G2748A. We synthesized 14 additional oligoprobes for K65, and the sensitivity for the K65 locus improved from 43/74 (58.1%) to 68/74 (91.9%). We developed a rapid high-throughput assay for clade B HIV-1 DR mutations, which could be customized by synthesizing oligoprobes suitable for the circulating viruses. The assay could be a useful tool especially for public health research in both resource-rich and resource-limited settings.
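The concordance percentages above follow directly from the reported counts. A short check, using only the figures given in the abstract (the function name is an illustrative choice):

```python
def percent_concordant(matching, total):
    """Percentage of specimens whose assay call matched sequencing, to one decimal."""
    return round(100.0 * matching / total, 1)


# (concordant specimens, specimens compared) per codon, as reported in the abstract.
reported = {
    "M41": (62, 74),
    "K65": (43, 74),
    "K70": (70, 74),
    "K103": (55, 73),
    "M184": (63, 73),
    "T215": (68, 73),
}

for codon, (matching, total) in reported.items():
    print(codon, percent_concordant(matching, total))

# K65 after the 14 additional oligoprobes were added.
print("K65 (additional probes)", percent_concordant(68, 74))
```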
Molecular mapping and breeding with microsatellite markers.
Lightfoot, David A; Iqbal, Muhammad J
2013-01-01
In genetics databases for crop plant species across the world, there are thousands of mapped loci that underlie quantitative traits, oligogenic traits, and simple traits recognized by association mapping in populations. The number of loci will increase as new phenotypes are measured in more diverse genotypes and genetic maps based on saturating numbers of markers are developed. A period of locus reevaluation will decrease the number of important loci as those underlying mega-environmental effects are recognized. A second wave of reevaluation of loci will follow from developmental series analysis, especially for harvest traits like seed yield and composition. Breeding methods to properly use the accurate maps of QTL are being developed. New methods to map, fine map, and isolate the genes underlying the loci will be critical to future advances in crop biotechnology. Microsatellite markers are the most useful tool for breeders. They are codominant, abundant in all genomes, highly polymorphic so useful in many populations, and both economical and technically easy to use. The selective genotyping approaches, including genotype ranking (indexing) based on partial phenotype data combined with favorable allele data and bulked segregation event (segregant) analysis (BSA), will be increasingly important uses for microsatellites. Examples of the methods for developing and using microsatellites derived from genomic sequences are presented for monogenic, oligogenic, and polygenic traits. Examples of successful mapping, fine mapping, and gene isolation are given. When combined with high-throughput methods for genotyping and a genome sequence, the use of association mapping with microsatellite markers will provide critical advances in the analysis of crop traits.
In silico mapping of quantitative trait loci in maize.
Parisseaux, B; Bernardo, R
2004-08-01
Quantitative trait loci (QTL) are most often detected through designed mapping experiments. An alternative approach is in silico mapping, whereby genes are detected using existing phenotypic and genomic databases. We explored the usefulness of in silico mapping via a mixed-model approach in maize (Zea mays L.). Specifically, our objective was to determine if the procedure gave results that were repeatable across populations. Multilocation data were obtained from the 1995-2002 hybrid testing program of Limagrain Genetics in Europe. Nine heterotic patterns comprised 22,774 single crosses. These single crosses were made from 1,266 inbreds that had data for 96 simple sequence repeat (SSR) markers. By a mixed-model approach, we estimated the general combining ability effects associated with marker alleles in each heterotic pattern. The numbers of marker loci with significant effects--37 for plant height, 24 for smut [Ustilago maydis (DC.) Cda.] resistance, and 44 for grain moisture--were consistent with previous results from designed mapping experiments. Each trait had many loci with small effects and few loci with large effects. For smut resistance, a marker in bin 8.05 on chromosome 8 had a significant effect in seven (out of a maximum of 18) instances. For this major QTL, the maximum effect of an allele substitution ranged from 5.4% to 41.9%, with an average of 22.0%. We conclude that in silico mapping via a mixed-model approach can detect associations that are repeatable across different populations. We speculate that in silico mapping will be more useful for gene discovery than for selection in plant breeding programs. Copyright 2004 Springer-Verlag
[Bio-Resources and Database for Preemptive Medicine].
Saito, Kuniaki
2016-05-01
Establishing a primary defense for the improvement of individual quality of life by epidemiology and various clinical studies applying bio-resources/database analysis is very important. Furthermore, recent studies on understanding the epigenetic regulatory mechanisms of developmental origins of health and diseases are attracting increasing interest. Therefore, the storing of not only bio-fluid (e.g., blood, urine) but also certain tissues (e.g., placenta, cord) is very important for research. The Resource Center for Health Science (RECHS) and Bio-databases Institute of Reproductive and Developmental Medicine (BIRD) have established a Bio-bank and initiated a project based on the development and utilization of bio-resources/database, comprising personal health records (PHR), such as health/medical records including individual records of daily diet and exercise, physically consolidated with bio-resources, taken from the same individuals. These Bio-Resources/Database projects are very important for the establishment of preemptive medicine and understanding the mechanisms of the developmental origins of health and diseases.
2009-01-01
In order to identify new markers around the glaucoma locus GLC1B as a tool to refine its critical region at 2p11.2-2q11.2, we searched the critical region sequence obtained from the UCSC database for tetranucleotide (GATA)n and (GTCT)n repeats of at least 10 units in length. Three out of four potential microsatellite loci were found to be polymorphic, heterozygosity ranging from 64.56% to 79.59%. The identified markers are useful not only for GLC1B locus but also for the study of other disease loci at 2p11.2-2q11.2, a region with scarcity of microsatellite markers. PMID:21637444
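The repeat search described above can be sketched with a regular expression. This is only an illustration under stated assumptions: the helper name `find_repeats`, its parameters, and the toy sequence are hypothetical, only perfect repeats on the given strand are reported, and the abstract does not say which tool the authors actually used.

```python
import re

def find_repeats(sequence, motifs=("GATA", "GTCT"), min_units=10):
    """Return (motif, start, units) for perfect tandem repeats of each motif.

    Hypothetical helper: only the given strand is scanned, and interrupted
    (imperfect) repeats are not reported.
    """
    hits = []
    seq = sequence.upper()
    for motif in motifs:
        # e.g. (?:GATA){10,} matches 10 or more consecutive copies of GATA
        pattern = re.compile(f"(?:{motif}){{{min_units},}}")
        for match in pattern.finditer(seq):
            units = (match.end() - match.start()) // len(motif)
            hits.append((motif, match.start(), units))
    return sorted(hits, key=lambda hit: hit[1])

# Toy sequence made up for illustration; it is not from the GLC1B region.
toy = "ACGT" + "GATA" * 12 + "CCCTTT" + "GTCT" * 9 + "AAAC" + "GTCT" * 10
for motif, start, units in find_repeats(toy):
    print(motif, start, units)
```

The 9-unit GTCT run in the toy sequence is skipped because it falls below the 10-unit threshold; candidate loci found this way would still need to be tested for polymorphism in the laboratory, as the abstract reports.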
Huang, Jing; Guo, Na; Li, Yinghui; Sun, Jutao; Hu, Guanjun; Zhang, Haipeng; Li, Yanfei; Zhang, Xing; Zhao, Jinming; Xing, Han; Qiu, Lijuan
2016-06-18
Phytophthora root and stem rot (PRR) caused by Phytophthora sojae is one of the most serious diseases affecting soybean (Glycine max (L.) Merr.) production all over the world. The most economical and environmentally-friendly way to control the disease is the exploration and utilization of resistant varieties. We screened a soybean mini core collection composed of 224 germplasm accessions for resistance against eleven P. sojae isolates. Soybean accessions from the Southern and Huanghuai regions, especially the Hubei, Jiangsu, Sichuan and Fujian provinces, had the most varied and broadest spectrum of resistance. Based on gene postulation, Rps1b, Rps1c, Rps4, Rps7 and novel resistance genes were identified in resistant accessions. Consequently, association mapping of resistance to each isolate was performed with 1,645 single nucleotide polymorphism (SNP) markers. A total of 14 marker-trait associations for Phytophthora resistance were identified. Among them, four were located in known PRR resistance loci intervals, five were located in other disease resistance quantitative trait locus (QTL) regions, and five associations unmasked novel loci for PRR resistance. In addition, we also identified candidate genes related to resistance. This is the first P. sojae resistance evaluation conducted using the Chinese soybean mini core collection, which is a representative sample of Chinese soybean cultivars. The resistance reaction analyses provided an excellent database of resistant resources and genetic variations for future breeding programs. The SNP markers associated with resistance will facilitate marker-assisted selection (MAS) in breeding programs for resistance to PRR, and the candidate genes may be useful for exploring the mechanism underlying P. sojae resistance.
Protein Bioinformatics Databases and Resources
Chen, Chuming; Huang, Hongzhan; Wu, Cathy H.
2017-01-01
Many publicly available data repositories and resources have been developed to support protein related information management, data-driven hypothesis generation and biological knowledge discovery. To help researchers quickly find the appropriate protein related informatics resources, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases in this chapter. We also discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era. PMID:28150231
Tragante, Vinicius; Barnes, Michael R; Ganesh, Santhi K; Lanktree, Matthew B; Guo, Wei; Franceschini, Nora; Smith, Erin N; Johnson, Toby; Holmes, Michael V; Padmanabhan, Sandosh; Karczewski, Konrad J; Almoguera, Berta; Barnard, John; Baumert, Jens; Chang, Yen-Pei Christy; Elbers, Clara C; Farrall, Martin; Fischer, Mary E; Gaunt, Tom R; Gho, Johannes M I H; Gieger, Christian; Goel, Anuj; Gong, Yan; Isaacs, Aaron; Kleber, Marcus E; Mateo Leach, Irene; McDonough, Caitrin W; Meijs, Matthijs F L; Melander, Olle; Nelson, Christopher P; Nolte, Ilja M; Pankratz, Nathan; Price, Tom S; Shaffer, Jonathan; Shah, Sonia; Tomaszewski, Maciej; van der Most, Peter J; Van Iperen, Erik P A; Vonk, Judith M; Witkowska, Kate; Wong, Caroline O L; Zhang, Li; Beitelshees, Amber L; Berenson, Gerald S; Bhatt, Deepak L; Brown, Morris; Burt, Amber; Cooper-DeHoff, Rhonda M; Connell, John M; Cruickshanks, Karen J; Curtis, Sean P; Davey-Smith, George; Delles, Christian; Gansevoort, Ron T; Guo, Xiuqing; Haiqing, Shen; Hastie, Claire E; Hofker, Marten H; Hovingh, G Kees; Kim, Daniel S; Kirkland, Susan A; Klein, Barbara E; Klein, Ronald; Li, Yun R; Maiwald, Steffi; Newton-Cheh, Christopher; O'Brien, Eoin T; Onland-Moret, N Charlotte; Palmas, Walter; Parsa, Afshin; Penninx, Brenda W; Pettinger, Mary; Vasan, Ramachandran S; Ranchalis, Jane E; M Ridker, Paul; Rose, Lynda M; Sever, Peter; Shimbo, Daichi; Steele, Laura; Stolk, Ronald P; Thorand, Barbara; Trip, Mieke D; van Duijn, Cornelia M; Verschuren, W Monique; Wijmenga, Cisca; Wyatt, Sharon; Young, J Hunter; Zwinderman, Aeilko H; Bezzina, Connie R; Boerwinkle, Eric; Casas, Juan P; Caulfield, Mark J; Chakravarti, Aravinda; Chasman, Daniel I; Davidson, Karina W; Doevendans, Pieter A; Dominiczak, Anna F; FitzGerald, Garret A; Gums, John G; Fornage, Myriam; Hakonarson, Hakon; Halder, Indrani; Hillege, Hans L; Illig, Thomas; Jarvik, Gail P; Johnson, Julie A; Kastelein, John J P; Koenig, Wolfgang; Kumari, Meena; März, Winfried; Murray, Sarah S; O'Connell, Jeffery R; Oldehinkel, Albertine J; Pankow, James S; Rader, Daniel J; Redline, Susan; Reilly, Muredach P; Schadt, Eric E; Kottke-Marchant, Kandice; Snieder, Harold; Snyder, Michael; Stanton, Alice V; Tobin, Martin D; Uitterlinden, André G; van der Harst, Pim; van der Schouw, Yvonne T; Samani, Nilesh J; Watkins, Hugh; Johnson, Andrew D; Reiner, Alex P; Zhu, Xiaofeng; de Bakker, Paul I W; Levy, Daniel; Asselbergs, Folkert W; Munroe, Patricia B; Keating, Brendan J
2014-03-06
Blood pressure (BP) is a heritable risk factor for cardiovascular disease. To investigate genetic associations with systolic BP (SBP), diastolic BP (DBP), mean arterial pressure (MAP), and pulse pressure (PP), we genotyped ~50,000 SNPs in up to 87,736 individuals of European ancestry and combined these in a meta-analysis. We replicated findings in an independent set of 68,368 individuals of European ancestry. Our analyses identified 11 previously undescribed associations in independent loci containing 31 genes including PDE1A, HLA-DQB1, CDK6, PRKAG2, VCL, H19, NUCB2, RELA, HOXC@ complex, FBN1, and NFAT5 at the Bonferroni-corrected array-wide significance threshold (p < 6 × 10^-7) and confirmed 27 previously reported associations. Bioinformatic analysis of the 11 loci provided support for a putative role in hypertension of several genes, such as CDK6 and NUCB2. Analysis of potential pharmacological targets in databases of small molecules showed that ten of the genes are predicted to be a target for small molecules. In summary, we identified previously unknown loci associated with BP. Our findings extend our understanding of genes involved in BP regulation, which may provide new targets for therapeutic intervention or drug response stratification. Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
Li, Zhenghui; Zhang, Jian; Zhang, Hantao; Lin, Ziqing; Ye, Jian
2018-05-01
Short tandem repeats (STRs) play a vitally important role in forensics. Population data are needed to improve the field. There is currently no large population data set for the Chamdo Tibetan population. In our study, the allele frequencies and forensic statistical parameters of 18 autosomal STR loci (D5S818, D21S11, D7S820, CSF1PO, D2S1338, D3S1358, VWA, D8S1179, D16S539, PentaE, TPOX, TH01, D19S433, D18S51, FGA, D6S1043, D13S317, and D12S391) included in the DNATyper™19 kit were investigated in 2249 healthy, unrelated Tibetan subjects living in Chamdo, Tibet, Southwest China. The combined power of discrimination and the combined probability of exclusion of all 18 loci were 0.9999999999999999999998174 and 0.99999994704, respectively. Furthermore, the genetic relationship between our Tibetan group and 33 previously published populations was also investigated. Phylogenetic analyses revealed that the Chamdo Tibetan population is more closely related genetically to the Lhasa Tibetan group than to the other populations compared. Our results suggest that these autosomal STR loci are highly polymorphic in the Tibetan population living in Chamdo, Tibet, and can be used as a powerful tool in forensics, linguistics, and population genetic analyses.
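Combined forensic parameters of this kind are conventionally obtained by treating loci as independent: combined PD = 1 - ∏(1 - PD_i), and likewise for the probability of exclusion. The sketch below illustrates that arithmetic only; the function name and the per-locus values are hypothetical and are not taken from the study. Exact fractions are used because an ordinary float cannot hold a value such as 0.9999999999999999999998174 and would round it to 1.0.

```python
from fractions import Fraction
from math import prod

def combined_power(per_locus_values):
    """Return 1 - prod(1 - x_i), assuming the loci are independent."""
    return 1 - prod(1 - x for x in per_locus_values)

# Hypothetical per-locus values for three loci (NOT the study's data).
pd_values = [Fraction("0.95"), Fraction("0.90"), Fraction("0.85")]
pe_values = [Fraction("0.60"), Fraction("0.50"), Fraction("0.40")]

cpd = combined_power(pd_values)  # combined power of discrimination
cpe = combined_power(pe_values)  # combined probability of exclusion
print(float(cpd), float(cpe))
```

Each additional locus multiplies the residual term ∏(1 - x_i) by a number below one, which is why combined values over 18 loci sit so close to 1.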
Zhang, Honghua; Xia, Mingying; Qi, Lijie; Dong, Lei; Song, Shuang; Ma, Teng; Yang, Shuping; Jin, Li; Li, Liming; Li, Shilin
2016-05-01
Estimating the allele frequencies and forensic statistical parameters of commonly used short tandem repeat (STR) loci of the Uyghur population, which is the fifth largest group in China, provides a more precise reference database for forensic investigation. The 6-dye GlobalFiler™ Express PCR Amplification kit incorporates 21 autosomal STRs, which have been shown to provide reliable DNA typing results and enhance the power of discrimination. Here we analyzed the GlobalFiler STR loci in 1962 unrelated individuals from the Chinese Uyghur population of Xinjiang, China. No significant deviations from Hardy-Weinberg equilibrium and no significant linkage disequilibrium were detected within or between the GlobalFiler STR loci. SE33 showed the greatest power of discrimination in the Uyghur population, whereas TPOX showed the lowest. The combined power of discrimination was 99.999999999999999999999998746%. No significant differences were observed at any tested STR between this Uyghur population and two other Uyghur populations, or the Dai and Mongolian populations. Significant differences were observed only between Uyghur and other Chinese populations at TH01, Central-South Asian populations at D13S317, and East Asian populations at TH01 and VWA. The phylogenetic analysis showed that the Uyghur population is genetically close to Chinese populations, as well as to East Asian and Central-South Asian populations. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
A comparative study of six European databases of medically oriented Web resources.
Abad García, Francisca; González Teruel, Aurora; Bayo Calduch, Patricia; de Ramón Frias, Rosa; Castillo Blasco, Lourdes
2005-10-01
The paper describes six European medically oriented databases of Web resources, pertaining to five quality-controlled subject gateways, and compares their performance. The characteristics, coverage, procedure for selecting Web resources, record structure, searching possibilities, and existence of user assistance were described for each database. Performance indicators for each database were obtained by means of searches carried out using the key words, "myocardial infarction." Most of the databases originated in the 1990s in an academic or library context and include all types of Web resources of an international nature. Five databases use Medical Subject Headings. The number of fields per record varies between three and nineteen. The language of the search interfaces is mostly English, and some of them allow searches in other languages. In some databases, the search can be extended to Pubmed. Organizing Medical Networked Information, Catalogue et Index des Sites Médicaux Francophones, and Diseases, Disorders and Related Topics produced the best results. The usefulness of these databases as quick reference resources is clear. In addition, their lack of content overlap means that, for the user, they complement each other. Their continued survival faces three challenges: the instability of the Internet, maintenance costs, and lack of use in spite of their potential usefulness.
Database resources of the National Center for Biotechnology Information
Wheeler, David L.; Barrett, Tanya; Benson, Dennis A.; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Feolo, Michael; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Khovayko, Oleg; Landsman, David; Lipman, David J.; Madden, Thomas L.; Maglott, Donna R.; Miller, Vadim; Ostell, James; Pruitt, Kim D.; Schuler, Gregory D.; Shumway, Martin; Sequeira, Edwin; Sherry, Steven T.; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusov, Roman L.; Tatusova, Tatiana A.; Wagner, Lukas; Yaschenko, Eugene
2008-01-01
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data available through NCBI's web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace, Assembly, and Short Read Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Database of Genotype and Phenotype, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting the web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:18045790
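As an illustration of the Entrez Programming Utilities mentioned above, the following sketch builds (but does not send) an ESummary request for this record's PMID. The helper name `esummary_url` is hypothetical, and the base URL and the `db`, `id`, and `retmode` parameters reflect the present-day E-utilities interface, which may differ from the one available when the article was written.

```python
from urllib.parse import urlencode

# Present-day E-utilities base URL (an assumption relative to the 2008 article).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def esummary_url(db, ids, retmode="json"):
    """Build an ESummary request URL for one or more record IDs (hypothetical helper)."""
    query = urlencode({
        "db": db,
        "id": ",".join(str(i) for i in ids),
        "retmode": retmode,
    })
    return f"{EUTILS_BASE}esummary.fcgi?{query}"

url = esummary_url("pubmed", [18045790])
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out so that the sketch runs offline; NCBI's usage guidelines on request rates and API keys would apply to real queries.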
Initiative for standardization of reporting genetics of male infertility.
Traven, Eva; Ogrinc, Ana; Kunej, Tanja
2017-02-01
The number of publications on research of male infertility is increasing. Technologies used in research of male infertility generate complex results and various types of data that need to be appropriately managed, arranged, and made available to other researchers for further use. In our previous study, we collected over 800 candidate loci for male fertility in seven mammalian species. However, the continuation of the work towards a comprehensive database of candidate genes associated with different types of idiopathic human male infertility is challenging due to fragmented information, obtained from a variety of technologies and various omics approaches. Results are published in different forms and usually need to be excavated from the text, which hinders the gathering of information. Standardized reporting of genetic anomalies as well as causative and risk factors of male infertility therefore presents an important issue. The aim of the study was to collect examples of diverse genomic loci published in association with human male infertility and to propose a standardized format for reporting genetic causes of male infertility. From the currently available data we have selected 75 studies reporting 186 representative genomic loci which have been proposed as genetic risk factors for male infertility. Based on collected and formatted data, we suggested a first step towards unification of reporting the genetics of male infertility in original and review studies. The proposed initiative consists of five relevant data types: 1) genetic locus, 2) race/ethnicity, number of participants (infertile/controls), 3) methodology, 4) phenotype (clinical data, disease ontology, and disease comorbidity), and 5) reference. The proposed form for standardized reporting presents a baseline for further optimization with additional genetic and clinical information. This data standardization initiative will enable faster multi-omics data integration, database development and sharing, establishing more targeted hypotheses, and facilitating biomarker discovery.
Genome-wide distribution comparative and composition analysis of the SSRs in Poaceae.
Wang, Yi; Yang, Chao; Jin, Qiaojun; Zhou, Dongjie; Wang, Shuangshuang; Yu, Yuanjie; Yang, Long
2015-02-15
The Poaceae family is of great importance to human beings since it comprises the cereal grasses, which are the main sources of human food and animal feed. With the rapid growth of genomic data from Poaceae members, comparative genomics has become a convenient method to study the genetics of different species. SSRs (Simple Sequence Repeats) are widely used markers in studies of the Poaceae because of their high abundance and stability. In this study, using the genomic sequences of 9 Poaceae species, we detected 11,993,943 SSR loci and developed 6,799,910 SSR primer pairs. The results show that SSRs are distributed on all the genomic elements in grass. Hexamers are the most frequent motif class, and AT/TA is the most frequent dimer motif. The abundance of the SSRs has a positive linear relationship with the recombination rate. SSR sequences in the coding regions have a higher GC content in the Poaceae than in other species. SSRs of 70-80 bp in length showed the highest AT/GC base ratio among all of these loci. The highest polymorphism rate was found in SSRs ranging from 30 bp to 40 bp. Using all the SSR primers of Japonica, nineteen universal primers were selected and located on the genome of the grass family. Information on the SSR loci, the SSR primers, and tools for mining and analyzing SSRs are provided in the PSSRD (Poaceae SSR Database, http://biodb.sdau.edu.cn/pssrd/). Our study and the PSSRD database provide a foundation for comparative study in the Poaceae and will accelerate studies on marker application, gene mapping and molecular breeding.
Wang, Baosheng; Khalili Mahani, Marjan; Ng, Wei Lun; Kusumi, Junko; Phi, Hai Hong; Inomata, Nobuyuki; Wang, Xiao-Ru; Szmidt, Alfred E
2014-01-01
Pinus krempfii Lecomte is a morphologically and ecologically unique pine, endemic to Vietnam. It is regarded as a vulnerable species with a distribution limited to just two provinces: Khanh Hoa and Lam Dong. Although a few phylogenetic studies have included this species, almost nothing is known about its genetic features. In particular, there are no studies addressing the levels and patterns of genetic variation in natural populations of P. krempfii. In this study, we sampled 57 individuals from six natural populations of P. krempfii and analyzed their sequence variation in ten nuclear gene regions (approximately 9 kb) and 14 mitochondrial (mt) DNA regions (approximately 10 kb). We also analyzed variation at seven chloroplast (cp) microsatellite (SSR) loci. We found very low haplotype and nucleotide diversity at nuclear loci compared with other pine species. Furthermore, all investigated populations were monomorphic across all mitochondrial DNA (mtDNA) regions included in our study, which are polymorphic in other pine species. Population differentiation at nuclear loci was low (5.2%) but significant. However, structure analysis of nuclear loci did not detect genetically differentiated groups of populations. Approximate Bayesian computation (ABC) using nuclear sequence data and mismatch distribution analysis for cpSSR loci suggested recent expansion of the species. The implications of these findings for the management and conservation of P. krempfii genetic resources were discussed. PMID:25360263
Davis, G L; McMullen, M D; Baysdorfer, C; Musket, T; Grant, D; Staebell, M; Xu, G; Polacco, M; Koster, L; Melia-Hancock, S; Houchins, K; Chao, S; Coe, E H
1999-01-01
We have constructed a 1736-locus maize genome map containing 1156 loci probed by cDNAs, 545 probed by random genomic clones, 16 by simple sequence repeats (SSRs), 14 by isozymes, and 5 by anonymous clones. Sequence information is available for 56% of the loci with 66% of the sequenced loci assigned functions. A total of 596 new ESTs were mapped from a B73 library of 5-wk-old shoots. The map contains 237 loci probed by barley, oat, wheat, rice, or tripsacum clones, which serve as grass genome reference points in comparisons between maize and other grass maps. Ninety core markers selected for low copy number, high polymorphism, and even spacing along the chromosome delineate the 100 bins on the map. The average bin size is 17 cM. Use of bin assignments enables comparison among different maize mapping populations and experiments including those involving cytogenetic stocks, mutants, or quantitative trait loci. Integration of nonmaize markers in the map extends the resources available for gene discovery beyond the boundaries of maize mapping information into the expanse of map, sequence, and phenotype information from other grass species. This map provides a foundation for numerous basic and applied investigations including studies of gene organization, gene and genome evolution, targeted cloning, and dissection of complex traits. PMID:10388831
Freely Accessible Chemical Database Resources of Compounds for in Silico Drug Discovery.
Yang, JingFang; Wang, Di; Jia, Chenyang; Wang, Mengyao; Hao, GeFei; Yang, GuangFu
2018-05-07
In silico drug discovery has proved to be a well-established key component of early drug discovery. However, this task is hampered by the limited quantity and quality of compound databases for screening. To overcome these obstacles, freely accessible database resources of compounds have bloomed in recent years. Nevertheless, choosing appropriate tools to handle these freely accessible databases is crucial. To the best of our knowledge, this is the first systematic review on this issue. The advantages and drawbacks of chemical databases were analyzed and summarized based on six categories of freely accessible chemical databases collected from the literature. Suggestions on how and under which conditions the use of these databases could be reasonable were provided. Tools and procedures for building 3D structure chemical libraries were also introduced. In this review, we described the freely accessible chemical database resources for in silico drug discovery. In particular, the chemical information for building chemical databases appears to be an attractive resource for drug design to alleviate experimental pressure. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
Database resources of the National Center for Biotechnology Information
Wheeler, David L.; Church, Deanna M.; Lash, Alex E.; Leipe, Detlef D.; Madden, Thomas L.; Pontius, Joan U.; Schuler, Gregory D.; Schriml, Lynn M.; Tatusova, Tatiana A.; Wagner, Lukas; Rapp, Barbara A.
2001-01-01
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources that operate on the data in GenBank and a variety of other biological data made available through NCBI’s Web site. NCBI data retrieval resources include Entrez, PubMed, LocusLink and the Taxonomy Browser. Data analysis resources include BLAST, Electronic PCR, OrfFinder, RefSeq, UniGene, HomoloGene, Database of Single Nucleotide Polymorphisms (dbSNP), Human Genome Sequencing, Human MapViewer, GeneMap’99, Human–Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, Cancer Genome Anatomy Project (CGAP), SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB) and the Conserved Domain Database (CDD). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov. PMID:11125038
Emissions & Generation Resource Integrated Database (eGRID)
The Emissions & Generation Resource Integrated Database (eGRID) is an integrated source of data on environmental characteristics of electric power generation. Twelve federal databases are represented by eGRID, which provides air emission and resource mix information for thousands of power plants and generating companies. eGRID allows direct comparison of the environmental attributes of electricity from different plants, companies, States, or regions of the power grid.
National Vulnerability Database (NVD)
National Institute of Standards and Technology Data Gateway
National Vulnerability Database (NVD) (Web, free access) NVD is a comprehensive cyber security vulnerability database that integrates all publicly available U.S. Government vulnerability resources and provides references to industry resources. It is based on and synchronized with the CVE vulnerability naming standard.
Towards BioDBcore: a community-defined information specification for biological databases
Gaudet, Pascale; Bairoch, Amos; Field, Dawn; Sansone, Susanna-Assunta; Taylor, Chris; Attwood, Teresa K.; Bateman, Alex; Blake, Judith A.; Bult, Carol J.; Cherry, J. Michael; Chisholm, Rex L.; Cochrane, Guy; Cook, Charles E.; Eppig, Janan T.; Galperin, Michael Y.; Gentleman, Robert; Goble, Carole A.; Gojobori, Takashi; Hancock, John M.; Howe, Douglas G.; Imanishi, Tadashi; Kelso, Janet; Landsman, David; Lewis, Suzanna E.; Mizrachi, Ilene Karsch; Orchard, Sandra; Ouellette, B. F. Francis; Ranganathan, Shoba; Richardson, Lorna; Rocca-Serra, Philippe; Schofield, Paul N.; Smedley, Damian; Southan, Christopher; Tan, Tin Wee; Tatusova, Tatiana; Whetzel, Patricia L.; White, Owen; Yamasaki, Chisato
2011-01-01
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases. PMID:21097465
Towards BioDBcore: a community-defined information specification for biological databases
Gaudet, Pascale; Bairoch, Amos; Field, Dawn; Sansone, Susanna-Assunta; Taylor, Chris; Attwood, Teresa K.; Bateman, Alex; Blake, Judith A.; Bult, Carol J.; Cherry, J. Michael; Chisholm, Rex L.; Cochrane, Guy; Cook, Charles E.; Eppig, Janan T.; Galperin, Michael Y.; Gentleman, Robert; Goble, Carole A.; Gojobori, Takashi; Hancock, John M.; Howe, Douglas G.; Imanishi, Tadashi; Kelso, Janet; Landsman, David; Lewis, Suzanna E.; Karsch Mizrachi, Ilene; Orchard, Sandra; Ouellette, B.F. Francis; Ranganathan, Shoba; Richardson, Lorna; Rocca-Serra, Philippe; Schofield, Paul N.; Smedley, Damian; Southan, Christopher; Tan, Tin W.; Tatusova, Tatiana; Whetzel, Patricia L.; White, Owen; Yamasaki, Chisato
2011-01-01
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases. PMID:21205783
Yu, Long-Xi
2017-01-01
Alfalfa is a forage crop grown worldwide and is important due to its high biomass production and nutritional value. However, the production of alfalfa is challenged by adverse environmental factors such as drought and other stresses. Developing drought-resistant alfalfa is an important breeding target for enhancing alfalfa productivity in arid and semi-arid regions. In the present study, we used genotyping-by-sequencing and genome-wide association to identify marker loci associated with biomass yield under drought in the field in a panel of diverse germplasm of alfalfa. A total of 28 markers at 22 genetic loci were associated with yield under water deficit, whereas only four markers were associated with the same trait under well-watered conditions. Comparisons of marker-trait associations between water deficit and well-watered conditions showed no similarity except for one association. Most of the markers were identical across harvest periods within the treatment, although different levels of significance were found among the three harvests. The loci associated with biomass yield under water deficit were located on all chromosomes of the alfalfa genome, in agreement with previous reports. Our results suggest that biomass yield under drought is a complex quantitative trait with polygenic inheritance and may involve a different mechanism compared to that of non-stress. BLAST searches of the flanking sequences of the associated loci against DNA databases revealed several stress-responsive genes linked to the drought resistance loci, including leucine-rich repeat receptor-like kinase, B3 DNA-binding domain protein, translation initiation factor IF2, and phospholipase-like protein. With further investigation, those markers closely linked to drought resistance can be used for MAS to accelerate the development of new alfalfa cultivars with improved resistance to drought and other abiotic stresses. PMID:28706532
Jin, Han Jun; Kim, Ki Cheol; Yoon, Cha Eun; Kim, Wook
2013-11-01
We analyzed the variation of eighteen miniSTR loci in 411 randomly chosen individuals from Korea to increase the probability that a degraded sample can be typed, as well as to provide an expanded and reliable population database. Six multiplex PCR systems were developed (multiplex I: D1S1677, D2S441 and D4S2364; multiplex II: D10S1248, D14S1434 and D22S1045; multiplex III: D12S391, D16S3253 and D20S161; multiplex IV: D3S4529, D8S1115 and D18S853; multiplex V: D6S1017, D11S4463 and D17S1301; multiplex VI: D5S2500, D9S1122 and D21S1437). Allele frequencies and forensic parameters were calculated to evaluate the suitability and robustness of these non-CODIS miniSTR systems. No significant deviations from Hardy-Weinberg equilibrium expectations were observed, except for the D4S2364, D5S2500 and D20S161 loci. A multidimensional scaling plot based on allele frequencies of the six miniSTR loci (D1S1677, D2S441, D4S2364, D10S1248, D14S1434 and D22S1045) showed that Koreans appeared to have greater genetic affinity with Chinese and Japanese than with the other Eurasian populations compared here. The combined probability of match calculated from the 18 miniSTR loci was 2.902 × 10^-17, indicating a high degree of polymorphism. Thus, the 18 miniSTR loci can be suitable for recovering useful information for analyzing degraded forensic casework samples and for adding supplementary genetic information for a variety of analyses involving closely related individuals where there is a need for additional genetic information. Copyright © 2013 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
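A deviation from Hardy-Weinberg expectations can be illustrated with a chi-square goodness-of-fit statistic computed from genotype counts. The sketch below is an assumption-laden illustration: the function name and the toy counts are hypothetical, and the abstract does not state which test the authors used (exact tests are often preferred for STR loci with many rare alleles).

```python
from collections import Counter
from itertools import combinations_with_replacement

def hwe_chi_square(genotype_counts):
    """Chi-square statistic and degrees of freedom for one locus.

    genotype_counts maps sorted allele pairs, e.g. (12, 13), to observed counts.
    Hypothetical helper; no p-value is computed here.
    """
    n = sum(genotype_counts.values())
    allele_counts = Counter()
    for (a, b), count in genotype_counts.items():
        allele_counts[a] += count  # a homozygote contributes two copies
        allele_counts[b] += count
    freqs = {allele: c / (2 * n) for allele, c in allele_counts.items()}

    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # Expected genotype frequency: p^2 for homozygotes, 2pq for heterozygotes
        p_exp = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
        expected = p_exp * n
        observed = genotype_counts.get((a, b), 0)
        chi2 += (observed - expected) ** 2 / expected

    k = len(freqs)
    dof = k * (k - 1) // 2  # genotype classes minus alleles
    return chi2, dof

# Toy biallelic counts made up for illustration (NOT the study's data).
balanced = hwe_chi_square({(1, 1): 25, (1, 2): 50, (2, 2): 25})
skewed = hwe_chi_square({(1, 1): 30, (1, 2): 40, (2, 2): 30})
print(balanced, skewed)
```

A large statistic relative to the chi-square distribution with `dof` degrees of freedom would flag a locus, as reported here for D4S2364, D5S2500 and D20S161.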
Temperature-responsive genetic loci in the plant pathogen Pseudomonas syringae pv. glycinea.
Ullrich, M S; Schergaut, M; Boch, J; Ullrich, B
2000-10-01
Plant-pathogenic bacteria may sense variations in environmental factors, such as temperature, to adapt to plant-associated habitats during pathogenesis or epiphytic growth. The bacterial blight pathogen of soybean, Pseudomonas syringae pv. glycinea PG4180, preferentially produces the phytotoxin coronatine at 18 degrees C and infects the host plant under conditions of low temperature and high humidity. A miniTn5-based promoterless glucuronidase (uidA) reporter gene was used to identify genetic loci of PG4180 preferentially expressed at 18 or 28 degrees C. Out of 7500 transposon mutants, 61 showed thermoregulated uidA expression as determined by a three-step screening procedure. Two-thirds of these mutants showed an increased reporter gene expression at 18 degrees C whilst the remainder exhibited higher uidA expression at 28 degrees C. MiniTn5-uidA insertion loci from these mutants were subcloned and their nucleotide sequences were determined. Several of the mutants induced at 18 degrees C contained the miniTn5-uidA insertion within the 32.8 kb coronatine biosynthetic gene cluster. Among the other mutants with increased uidA expression at 18 degrees C, insertions were found in genes encoding formaldehyde dehydrogenase, short-chain dehydrogenase and mannuronan C-5-epimerase, in a plasmid-borne replication protein, and in the hrpT locus, involved in pathogenicity of P. syringae. Among the mutants induced at 28 degrees C, insertions disrupted loci with similarities to a repressor of conjugal plasmid transfer, UV resistance determinants, an isoflavanoid-degrading enzyme, a HU-like DNA-binding protein, two additional regulatory proteins, a homologue of bacterial adhesins, transport proteins, LPS synthesis enzymes and two proteases. Genetic loci from 13 mutants did not show significant similarities to any database entries. Results of plant inoculations showed that three of the mutants tested were inhibited in symptom development and in planta multiplication rates. 
Temperature-shift experiments suggested that all of the identified loci showed a rather slow induction of expression upon change of temperature.
NASA Astrophysics Data System (ADS)
Wang, Jian
2017-01-01
In order to change the traditional PE teaching mode and realize the interconnection, interworking and sharing of PE teaching resources, a distance PE teaching platform based on a broadband network is designed and a PE teaching information resource database is set up. The database design takes Windows NT 4/2000 Server as the operating system platform and Microsoft SQL Server 7.0 as the RDBMS, and uses NAS technology for data storage and streaming technology for video service. Analysis of the system design and implementation shows that the dynamic PE teaching information resource sharing platform based on Web Service supports loosely coupled collaboration as well as dynamic and active integration, and offers good integration, openness and encapsulation. The distance PE teaching platform based on Web Service and the design scheme of the PE teaching information resource database can effectively realize the interconnection, interworking and sharing of PE teaching resources and adapt to the informatization demands of PE teaching.
Ridyard, Colin H; Hughes, Dyfrig A
2012-01-01
Health economists frequently rely on methods based on patient recall to estimate resource utilization. Access to questionnaires and diaries, however, is often limited. This study examined the feasibility of establishing an open-access Database of Instruments for Resource-Use Measurement, identified relevant fields for data extraction, and outlined its design. An electronic survey was sent to authors of full UK economic evaluations listed in the National Health Service Economic Evaluation Database (2008-2010), authors of monographs of Health Technology Assessments (1998-2010), and subscribers to the JISCMail health economics e-mailing list. The survey included questions on piloting, validation, recall period, and data capture method. Responses were analyzed and data extracted to generate relevant fields for the database. A total of 143 responses to the survey provided data on 54 resource-use instruments for inclusion in the database. All were reliant on patient or carer recall, and a majority (47) were questionnaires. Thirty-seven were designed for self-completion by the patient, carer, or guardian, and the remainder were designed for completion by researchers or health care professionals while interviewing patients. Methods of development were diverse, particularly in areas such as the planning of resource itemization (evident in 25 instruments), piloting (25), and validation (29). On the basis of the present analysis, we developed a Web-enabled Database of Instruments for Resource-Use Measurement, accessible via www.DIRUM.org. This database may serve as a practical resource for health economists, as well as a means to facilitate further research in the area of resource-use data collection. Copyright © 2012 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
2012-01-01
Background Cotton is the world’s most important natural textile fiber and a significant oilseed crop. Decoding cotton genomes will provide the ultimate reference and resource for research and utilization of the species. Integration of high-density genetic maps with genomic sequence information will greatly accelerate the process of whole-genome assembly in cotton. Results In this paper, we update a high-density interspecific genetic linkage map of allotetraploid cultivated cotton. An additional 1,167 marker loci have been added to our previously published map of 2,247 loci. Three new marker types, InDel (insertion-deletion) and SNP (single nucleotide polymorphism) developed from gene information, and REMAP (retrotransposon-microsatellite amplified polymorphism), were used to increase map density. The updated map consists of 3,414 loci in 26 linkage groups covering 3,667.62 cM with an average inter-locus distance of 1.08 cM. Furthermore, genome-wide sequence analysis was performed using 3,324 informative sequence-based markers and publicly available Gossypium DNA sequence information. A total of 413,113 EST and 195 BAC sequences were physically anchored and clustered by 3,324 sequence-based markers. Of these, 14,243 ESTs and 188 BACs from different species of Gossypium were clustered and specifically anchored to the high-density genetic map. A total of 2,748 candidate unigenes from 2,111 EST clusters and 63 BACs were mined for functional annotation and classification. The 337 ESTs/genes related to fiber quality traits were integrated with 132 previously reported cotton fiber quality quantitative trait loci, which demonstrated the important roles of these genes in fiber quality. Higher-level sequence conservation between different cotton species and between the A- and D-subgenomes in tetraploid cotton was found, indicating a common evolutionary origin for orthologous and paralogous loci in Gossypium. 
Conclusion This study will serve as a valuable genomic resource for tetraploid cotton genome assembly, for cloning genes related to superior agronomic traits, and for further comparative genomic analyses in Gossypium. PMID:23046547
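The map statistics reported in the Results of this abstract can be checked with simple arithmetic. The sketch below assumes that the average inter-locus distance is the total map length divided by the number of intervals, taken as loci minus linkage groups. The abstract does not state which convention was used, so this is an assumption that happens to reproduce the reported 1.08 cM; the variable names are illustrative.

```python
# Figures taken from the abstract.
previous_loci = 2247
added_loci = 1167
map_length_cm = 3667.62
linkage_groups = 26

# Total loci on the updated map.
total_loci = previous_loci + added_loci

# Assumed convention: each linkage group with n loci has n - 1 intervals,
# so the whole map has (total loci - number of linkage groups) intervals.
intervals = total_loci - linkage_groups
average_interval_cm = map_length_cm / intervals

print(total_loci)                     # 3414
print(round(average_interval_cm, 2))  # 1.08
```

Dividing by the locus count instead of the interval count gives about 1.07 cM, so the two conventions differ only in the second decimal place for a map this dense.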
A second-generation anchored genetic linkage map of the tammar wallaby (Macropus eugenii)
2011-01-01
Background The tammar wallaby, Macropus eugenii, a small kangaroo used for decades for studies of reproduction and metabolism, is the model Australian marsupial for genome sequencing and genetic investigations. The production of a more comprehensive cytogenetically-anchored genetic linkage map will significantly contribute to the deciphering of the tammar wallaby genome. It has great value as a resource to identify novel genes and for comparative studies, and is vital for the ongoing genome sequence assembly and gene ordering in this species. Results A second-generation anchored tammar wallaby genetic linkage map has been constructed based on a total of 148 loci. The linkage map contains the original 64 loci included in the first-generation map, plus an additional 84 microsatellite loci that were chosen specifically to increase coverage and assist with the anchoring and orientation of linkage groups to chromosomes. These additional loci were derived from (a) sequenced BAC clones that had been previously mapped to tammar wallaby chromosomes by fluorescence in situ hybridization (FISH), (b) End sequence from BACs subsequently FISH-mapped to tammar wallaby chromosomes, and (c) tammar wallaby genes orthologous to opossum genes predicted to fill gaps in the tammar wallaby linkage map as well as three X-linked markers from a published study. Based on these 148 loci, eight linkage groups were formed. These linkage groups were assigned (via FISH-mapped markers) to all seven autosomes and the X chromosome. The sex-pooled map size is 1402.4 cM, which is estimated to provide 82.6% total coverage of the genome, with an average interval distance of 10.9 cM between adjacent markers. The overall ratio of female/male map length is 0.84, which is comparable to the ratio of 0.78 obtained for the first-generation map. 
Conclusions Construction of this second-generation genetic linkage map is a significant step towards complete coverage of the tammar wallaby genome and considerably extends that of the first-generation map. It will be a valuable resource for ongoing tammar wallaby genetic research and assembling the genome sequence. The sex-pooled map is available online at http://compldb.angis.org.au/. PMID:21854616
A second-generation anchored genetic linkage map of the tammar wallaby (Macropus eugenii).
Wang, Chenwei; Webley, Lee; Wei, Ke-jun; Wakefield, Matthew J; Patel, Hardip R; Deakin, Janine E; Alsop, Amber; Marshall Graves, Jennifer A; Cooper, Desmond W; Nicholas, Frank W; Zenger, Kyall R
2011-08-19
Enrichment of putative PAX8 target genes at serous epithelial ovarian cancer susceptibility loci.
Kar, Siddhartha P; Adler, Emily; Tyrer, Jonathan; Hazelett, Dennis; Anton-Culver, Hoda; Bandera, Elisa V; Beckmann, Matthias W; Berchuck, Andrew; Bogdanova, Natalia; Brinton, Louise; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Cook, Linda S; Cramer, Daniel W; Cunningham, Julie M; Dansonka-Mieszkowska, Agnieszka; Doherty, Jennifer Anne; Dörk, Thilo; Dürst, Matthias; Eccles, Diana; Fasching, Peter A; Flanagan, James; Gentry-Maharaj, Aleksandra; Glasspool, Rosalind; Goode, Ellen L; Goodman, Marc T; Gronwald, Jacek; Heitz, Florian; Hildebrandt, Michelle A T; Høgdall, Estrid; Høgdall, Claus K; Huntsman, David G; Jensen, Allan; Karlan, Beth Y; Kelemen, Linda E; Kiemeney, Lambertus A; Kjaer, Susanne K; Kupryjanczyk, Jolanta; Lambrechts, Diether; Levine, Douglas A; Li, Qiyuan; Lissowska, Jolanta; Lu, Karen H; Lubiński, Jan; Massuger, Leon F A G; McGuire, Valerie; McNeish, Iain; Menon, Usha; Modugno, Francesmary; Monteiro, Alvaro N; Moysich, Kirsten B; Ness, Roberta B; Nevanlinna, Heli; Paul, James; Pearce, Celeste L; Pejovic, Tanja; Permuth, Jennifer B; Phelan, Catherine; Pike, Malcolm C; Poole, Elizabeth M; Ramus, Susan J; Risch, Harvey A; Rossing, Mary Anne; Salvesen, Helga B; Schildkraut, Joellen M; Sellers, Thomas A; Sherman, Mark; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa; Terry, Kathryn L; Tworoger, Shelley S; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S; Wu, Anna H; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Freedman, Matthew L; Gayther, Simon A; Pharoah, Paul D P; Lawrenson, Kate
2017-02-14
Genome-wide association studies (GWAS) have identified 18 loci associated with serous ovarian cancer (SOC) susceptibility but the biological mechanisms driving these findings remain poorly characterised. Germline cancer risk loci may be enriched for target genes of transcription factors (TFs) critical to somatic tumorigenesis. All 615 TF-target sets from the Molecular Signatures Database were evaluated using gene set enrichment analysis (GSEA) and three GWAS for SOC risk: discovery (2196 cases/4396 controls), replication (7035 cases/21 693 controls; independent from discovery), and combined (9627 cases/30 845 controls; including additional individuals). The PAX8-target gene set was ranked 1/615 in the discovery (P_GSEA<0.001; FDR=0.21), 7/615 in the replication (P_GSEA=0.004; FDR=0.37), and 1/615 in the combined (P_GSEA<0.001; FDR=0.21) studies. Adding other genes reported to interact with PAX8 in the literature to the PAX8-target set and applying an alternative to GSEA, interval enrichment, further confirmed this association (P=0.006). Fifteen of the 157 genes from this expanded PAX8 pathway were near eight loci associated with SOC risk at P<10^-5 (including six with P<5 × 10^-8). The pathway was also associated with differential gene expression after shRNA-mediated silencing of PAX8 in HeyA8 (P_GSEA=0.025) and IGROV1 (P_GSEA=0.004) SOC cells and several PAX8 targets near SOC risk loci demonstrated in vitro transcriptomic perturbation. Putative PAX8 target genes are enriched for common SOC risk variants. This finding from our agnostic evaluation is of particular interest given that PAX8 is well-established as a specific marker for the cell of origin of SOC.
Lu, Xin; Zhou, Haijian; Du, Xiaoli; Liu, Sha; Xu, Jialiang; Cui, Zhigang; Pang, Bo; Kan, Biao
2016-11-01
Vibrio parahaemolyticus is a common seafood-borne pathogenic bacterium that causes gastroenteritis in humans. Continuous surveillance of the molecular characteristics of clinical and environmental V. parahaemolyticus strains is needed for epidemiological and genetic purposes. To generate a picture of the population distribution of V. parahaemolyticus isolated from clinical cases of gastroenteritis and environmental samples in eastern China, we investigated the genetic and evolutionary relationships of the strains using multi-locus sequence typing (MLST), a commonly used protocol based on seven housekeeping genes. High genetic diversity was observed within the V. parahaemolyticus population, but ST3 was still dominant among the clinical strains, and 103 new sequence types (STs) were found in the clinical strains by searching the global V. parahaemolyticus MLST database. With these genetically diverse strains, we estimated the recombination rates of the loci used in MLST analysis. The locus recA was found to be subject to an exceptionally high rate of recombination, and recombinant single nucleotide polymorphisms (SNPs) were also identified within the seven loci. The phylogenetic tree of the strains was reconstructed using the maximum likelihood method after removing the recombinant SNPs of the seven loci, and the minimum spanning tree was reconstructed with the six loci excluding recA. Some changes were observed in comparison with the previously used methods, suggesting that homologous recombination plays a role in shaping the clonal structure of V. parahaemolyticus. We propose the recombination-free SNP strategy for clonality analysis of V. parahaemolyticus, especially when using the maximum likelihood method. Copyright © 2016. Published by Elsevier B.V.
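The MLST logic described above, assigning a sequence type from a seven-locus allele profile and then comparing isolates with and without recA, can be sketched as follows. This is an illustration under stated assumptions: only recA is named in the abstract, the other six locus names are assumed from the commonly used V. parahaemolyticus scheme, and the allele numbers and sequence type labels are hypothetical rather than entries from the real MLST database.

```python
# Seven housekeeping loci; all names except recA are assumptions.
LOCI = ("dnaE", "gyrB", "recA", "dtdS", "pntA", "pyrC", "tnaA")

# Hypothetical profile table standing in for the MLST database.
KNOWN_PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): "ST-A",
    (2, 1, 1, 3, 1, 1, 1): "ST-B",
}


def assign_st(alleles, known=KNOWN_PROFILES):
    """Return the sequence type for an allele profile, or 'novel' if unseen."""
    profile = tuple(alleles[locus] for locus in LOCI)
    return known.get(profile, "novel")


def allelic_distance(a, b, exclude=()):
    """Count loci at which two isolates carry different alleles."""
    return sum(
        1 for locus in LOCI
        if locus not in exclude and a[locus] != b[locus]
    )


# Two hypothetical isolates that differ only at recA.
isolate_1 = dict(zip(LOCI, (1, 1, 1, 1, 1, 1, 1)))
isolate_2 = dict(zip(LOCI, (1, 1, 5, 1, 1, 1, 1)))

print(assign_st(isolate_1))                                      # ST-A
print(assign_st(isolate_2))                                      # novel
print(allelic_distance(isolate_1, isolate_2))                    # 1
print(allelic_distance(isolate_1, isolate_2, exclude=("recA",))) # 0
```

In this toy case a single recombination event at recA produces a nominally novel sequence type, while the six-locus comparison treats the two isolates as identical. Allelic distances of this kind are the edge weights a minimum spanning tree would be built from.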
Zhang, Jing; Zhang, Lu; Zhang, Yan; Yang, Jing; Guo, Mengbiao; Sun, Liangdan; Pan, Hai-Feng; Hirankarn, Nattiya; Ying, Dingge; Zeng, Shuai; Lee, Tsz Leung; Lau, Chak Sing; Chan, Tak Mao; Leung, Alexander Moon Ho; Mok, Chi Chiu; Wong, Sik Nin; Lee, Ka Wing; Ho, Marco Hok Kung; Lee, Pamela Pui Wah; Chung, Brian Hon-Yin; Chong, Chun Yin; Wong, Raymond Woon Sing; Mok, Mo Yin; Wong, Wilfred Hing Sang; Tong, Kwok Lung; Tse, Niko Kei Chiu; Li, Xiang-Pei; Avihingsanon, Yingyos; Rianthavorn, Pornpimol; Deekajorndej, Thavatchai; Suphapeetiporn, Kanya; Shotelersuk, Vorasuk; Ying, Shirley King Yee; Fung, Samuel Ka Shun; Lai, Wai Ming; Garcia-Barceló, Maria-Mercè; Cherny, Stacey S; Sham, Pak Chung; Cui, Yong; Yang, Sen; Ye, Dong Qing; Zhang, Xue-Jun; Lau, Yu Lung; Yang, Wanling
2015-11-01
Previous genome-wide association studies (GWAS), which were mainly based on single-variant analysis, have identified many systemic lupus erythematosus (SLE) susceptibility loci. However, the genetic architecture of this complex disease is far from being understood. The aim of this study was to investigate whether using a gene-based analysis may help to identify novel loci, by considering global evidence of association from a gene or a genomic region rather than focusing on evidence for individual variants. Based on the results of a meta-analysis of 2 GWAS of SLE conducted in 2 Asian cohorts, we performed an in-depth gene-based analysis followed by replication in a total of 4,626 patients and 7,466 control subjects of Asian ancestry. Differential allelic expression was measured by pyrosequencing. More than one-half of the reported SLE susceptibility loci showed evidence of independent effects, and this finding is important for understanding the mechanisms of association and explaining disease heritability. ANXA6 was detected as a novel SLE susceptibility gene, with several single-nucleotide polymorphisms (SNPs) contributing independently to the association with disease. The risk allele of rs11960458 correlated significantly with increased expression of ANXA6 in peripheral blood mononuclear cells from heterozygous healthy control subjects. Several other associated SNPs may also regulate ANXA6 expression, according to data obtained from public databases. Higher expression of ANXA6 in patients with SLE was also reported previously. Our study demonstrated the merit of using gene-based analysis to identify novel susceptibility loci, especially those with independent effects, and also demonstrated the widespread presence of loci with independent effects in SLE susceptibility genes. © 2015, American College of Rheumatology.
Identification of Susceptibility Loci and Genes for Colorectal Cancer Risk
Zeng, Chenjie; Matsuda, Koichi; Jia, Wei-Hua; Chang, Jiang; Kweon, Sun-Seog; Xiang, Yong-Bing; Shin, Aesun; Jee, Sun Ha; Kim, Dong-Hyun; Zhang, Ben; Cai, Qiuyin; Guo, Xingyi; Long, Jirong; Wang, Nan; Courtney, Regina; Pan, Zhi-Zhong; Wu, Chen; Takahashi, Atsushi; Shin, Min-Ho; Matsuo, Keitaro; Matsuda, Fumihiko; Gao, Yu-Tang; Oh, Jae Hwan; Kim, Soriul; Jung, Keum Ji; Ahn, Yoon-Ok; Ren, Zefang; Li, Hong-Lan; Wu, Jie; Shi, Jiajun; Wen, Wanqing; Yang, Gong; Li, Bingshan; Ji, Bu-Tian; Brenner, Hermann; Schoen, Robert E.; Küry, Sébastien; Gruber, Stephen B.; Schumacher, Fredrick R.; Stenzel, Stephanie L.; Casey, Graham; Hopper, John L.; Jenkins, Mark A.; Kim, Hyeong-Rok; Jeong, Jin-Young; Park, Ji Won; Tajima, Kazuo; Cho, Sang-Hee; Kubo, Michiaki; Shu, Xiao-Ou; Lin, Dongxin; Zeng, Yi-Xin; Zheng, Wei
2016-01-01
Background & Aims Known genetic factors explain only a small fraction of genetic variation in colorectal cancer (CRC). We conducted a genome-wide association study (GWAS) to identify risk loci for CRC. Methods The discovery stage included 8027 cases and 22577 controls of East-Asian ancestry. Promising variants were evaluated in studies including as many as 11044 cases and 12047 controls. Tumor-adjacent normal tissues from 188 patients were analyzed to evaluate correlations of risk variants with expression levels of nearby genes. Potential functionality of risk variants was evaluated using public genomic and epigenomic databases. Results We identified 4 loci associated with CRC risk; P values for the most significant variant in each locus ranged from 3.92×10^-8 to 1.24×10^-12: 6p21.1 (rs4711689), 8q23.3 (rs2450115, rs6469656), 10q24.3 (rs4919687), and 12p13.3 (rs11064437). We also identified 2 risk variants at loci previously associated with CRC: 10q25.2 (rs10506868) and 20q13.3 (rs6061231). These risk variants, conferring an approximate 10%–18% increase in risk per allele, are located either inside or near protein-coding genes that include TFEB (lysosome biogenesis and autophagy), EIF3H (initiation of translation), CYP17A1 (steroidogenesis), SPSB2 (proteasome degradation), and RPS21 (ribosome biogenesis). Gene expression analyses showed a significant association (P <.05) for rs4711689 with TFEB, rs6469656 with EIF3H, rs11064437 with SPSB2, and rs6061231 with RPS21. Conclusions We identified susceptibility loci and genes associated with CRC risk, linking CRC predisposition to steroid hormone, protein synthesis and degradation, and autophagy pathways and providing added insight into the mechanism of CRC pathogenesis. PMID:26965516
Enrichment of putative PAX8 target genes at serous epithelial ovarian cancer susceptibility loci
Kar, Siddhartha P; Adler, Emily; Tyrer, Jonathan; Hazelett, Dennis; Anton-Culver, Hoda; Bandera, Elisa V; Beckmann, Matthias W; Berchuck, Andrew; Bogdanova, Natalia; Brinton, Louise; Butzow, Ralf; Campbell, Ian; Carty, Karen; Chang-Claude, Jenny; Cook, Linda S; Cramer, Daniel W; Cunningham, Julie M; Dansonka-Mieszkowska, Agnieszka; Doherty, Jennifer Anne; Dörk, Thilo; Dürst, Matthias; Eccles, Diana; Fasching, Peter A; Flanagan, James; Gentry-Maharaj, Aleksandra; Glasspool, Rosalind; Goode, Ellen L; Goodman, Marc T; Gronwald, Jacek; Heitz, Florian; Hildebrandt, Michelle A T; Høgdall, Estrid; Høgdall, Claus K; Huntsman, David G; Jensen, Allan; Karlan, Beth Y; Kelemen, Linda E; Kiemeney, Lambertus A; Kjaer, Susanne K; Kupryjanczyk, Jolanta; Lambrechts, Diether; Levine, Douglas A; Li, Qiyuan; Lissowska, Jolanta; Lu, Karen H; Lubiński, Jan; Massuger, Leon F A G; McGuire, Valerie; McNeish, Iain; Menon, Usha; Modugno, Francesmary; Monteiro, Alvaro N; Moysich, Kirsten B; Ness, Roberta B; Nevanlinna, Heli; Paul, James; Pearce, Celeste L; Pejovic, Tanja; Permuth, Jennifer B; Phelan, Catherine; Pike, Malcolm C; Poole, Elizabeth M; Ramus, Susan J; Risch, Harvey A; Rossing, Mary Anne; Salvesen, Helga B; Schildkraut, Joellen M; Sellers, Thomas A; Sherman, Mark; Siddiqui, Nadeem; Sieh, Weiva; Song, Honglin; Southey, Melissa; Terry, Kathryn L; Tworoger, Shelley S; Walsh, Christine; Wentzensen, Nicolas; Whittemore, Alice S; Wu, Anna H; Yang, Hannah; Zheng, Wei; Ziogas, Argyrios; Freedman, Matthew L; Gayther, Simon A; Pharoah, Paul D P; Lawrenson, Kate
2017-01-01
Background: Genome-wide association studies (GWAS) have identified 18 loci associated with serous ovarian cancer (SOC) susceptibility but the biological mechanisms driving these findings remain poorly characterised. Germline cancer risk loci may be enriched for target genes of transcription factors (TFs) critical to somatic tumorigenesis. Methods: All 615 TF-target sets from the Molecular Signatures Database were evaluated using gene set enrichment analysis (GSEA) and three GWAS for SOC risk: discovery (2196 cases/4396 controls), replication (7035 cases/21 693 controls; independent from discovery), and combined (9627 cases/30 845 controls; including additional individuals). Results: The PAX8-target gene set was ranked 1/615 in the discovery (P_GSEA<0.001; FDR=0.21), 7/615 in the replication (P_GSEA=0.004; FDR=0.37), and 1/615 in the combined (P_GSEA<0.001; FDR=0.21) studies. Adding other genes reported to interact with PAX8 in the literature to the PAX8-target set and applying an alternative to GSEA, interval enrichment, further confirmed this association (P=0.006). Fifteen of the 157 genes from this expanded PAX8 pathway were near eight loci associated with SOC risk at P<10^-5 (including six with P<5 × 10^-8). The pathway was also associated with differential gene expression after shRNA-mediated silencing of PAX8 in HeyA8 (P_GSEA=0.025) and IGROV1 (P_GSEA=0.004) SOC cells and several PAX8 targets near SOC risk loci demonstrated in vitro transcriptomic perturbation. Conclusions: Putative PAX8 target genes are enriched for common SOC risk variants. This finding from our agnostic evaluation is of particular interest given that PAX8 is well-established as a specific marker for the cell of origin of SOC. PMID:28103614
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tuli, J.K.; Sonzogni, A.
The National Nuclear Data Center (NNDC) has provided remote access to the nuclear physics databases it maintains and to other resources since 1986. The major databases and other resources currently available through the NNDC Web site are summarized. With considerable innovation, access is now mostly through the Web. The NNDC Web pages have been modernized to provide a consistent state-of-the-art style. The improved database services and other resources available from the NNDC site at www.nndc.bnl.gov will be described.
Genes Downregulated in Endometriosis Are Located Near the Known Imprinting Genes
Higashiura, Yumi; Koike, Natsuki; Akasaka, Juria; Uekuri, Chiharu; Iwai, Kana; Niiro, Emiko; Morioka, Sachiko; Yamada, Yuki
2014-01-01
There is now accumulating evidence that endometriosis is a disease associated with an epigenetic disorder. Genomic imprinting is an epigenetic phenomenon known to regulate DNA methylation of either maternal or paternal alleles. We hypothesize that hypermethylated endometriosis-associated genes may be enriched at imprinted gene loci. We sought to determine whether downregulated genes associated with endometriosis susceptibility are associated with the chromosomal location of known paternally and maternally expressed imprinted genes. Gene information was gathered from the National Center for Biotechnology Information database and geneimprint.com. Several researchers have identified specific loci with strong DNA methylation in eutopic endometrium and ectopic lesions of patients with endometriosis. Of the 29 hypermethylated genes in endometriosis, 19 genes were located near 45 known imprinted foci. There may be an association in genomic location between genes specifically downregulated in endometriosis and epigenetically imprinted genes. PMID:24615936
Pettis, Gregg S.; Prakash, Shubha
1999-01-01
A database search revealed extensive sequence similarity between Streptomyces lividans plasmid pIJ101 and Streptomyces plasmid pSB24.2, which is a deletion derivative of Streptomyces cyanogenus plasmid pSB24.1. The high degree of relatedness between the two plasmids allowed the construction of a genetic map of pSB24.2, consisting of putative transfer and replication loci. Two pSB24.2 loci, namely, the cis-acting locus for transfer (clt) and the transfer-associated korB gene, were shown to be capable of complementing the pIJ101 clt and korB functions, respectively, a result that is consistent with the notion that pIJ101 and the parental plasmid pSB24.1 encode highly similar, if not identical, conjugation systems. PMID:10419972
Orris, Greta J.; Cocker, Mark D.; Dunlap, Pamela; Wynn, Jeff C.; Spanski, Gregory T.; Briggs, Deborah A.; Gass, Leila; Bliss, James D.; Bolm, Karen S.; Yang, Chao; Lipin, Bruce R.; Ludington, Stephen; Miller, Robert J.; Słowakiewicz, Mirosław
2014-01-01
This report describes a global, evaporite-related potash deposits and occurrences database and a potash tracts database. Chapter 1 summarizes potash resource history and use. Chapter 2 describes a global potash deposits and occurrences database, which contains more than 900 site records. Chapter 3 describes a potash tracts database, which contains 84 tracts with geology permissive for the presence of evaporite-hosted potash resources, including areas with active evaporite-related potash production, areas with known mineralization that has not been quantified or exploited, and areas with potential for undiscovered potash resources. Chapter 4 describes geographic information system (GIS) data files that include (1) potash deposits and occurrences data, (2) potash tract data, (3) reference databases for potash deposit and tract data, and (4) representative graphics of geologic features related to potash tracts and deposits. Summary descriptive models for stratabound potash-bearing salt and halokinetic potash-bearing salt are included in appendixes A and B, respectively. A glossary of salt- and potash-related terms is contained in appendix C and a list of database abbreviations is given in appendix D. Appendix E describes GIS data files, and appendix F is a guide to using the geodatabase.
Database resources of the National Center for Biotechnology Information
2015-01-01
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (Bookshelf, PubMed Central (PMC) and PubReader); medical genetics (ClinVar, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen); genes and genomics (BioProject, BioSample, dbSNP, dbVar, Epigenomics, Gene, Gene Expression Omnibus (GEO), Genome, HomoloGene, the Map Viewer, Nucleotide, PopSet, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser, Trace Archive and UniGene); and proteins and chemicals (Biosystems, COBALT, the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB), Protein Clusters, Protein and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for many of these databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov. PMID:25398906
Database resources of the National Center for Biotechnology Information
2016-01-01
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank® nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (PubMed Central (PMC), Bookshelf and PubReader), health (ClinVar, dbGaP, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen), genomes (BioProject, Assembly, Genome, BioSample, dbSNP, dbVar, Epigenomics, the Map Viewer, Nucleotide, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser and the Trace Archive), genes (Gene, Gene Expression Omnibus (GEO), HomoloGene, PopSet and UniGene), proteins (Protein, the Conserved Domain Database (CDD), COBALT, Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB) and Protein Clusters) and chemicals (Biosystems and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for most of these databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:26615191
Database resources of the National Center for Biotechnology Information.
Wheeler, David L; Barrett, Tanya; Benson, Dennis A; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Geer, Lewis Y; Kapustin, Yuri; Khovayko, Oleg; Landsman, David; Lipman, David J; Madden, Thomas L; Maglott, Donna R; Ostell, James; Miller, Vadim; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Steven T; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusov, Roman L; Tatusova, Tatiana A; Wagner, Lukas; Yaschenko, Eugene
2007-01-01
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace and Assembly Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Viral Genotyping Tools, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
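As an illustration of the Entrez Programming Utilities mentioned in the abstract above, the following minimal sketch builds request URLs for the ESearch and ESummary endpoints. It assumes the present-day public E-utilities base URL and JSON output mode, which may differ from what was available when the abstract was written; the helper names `esearch_url` and `esummary_url` and the example query are illustrative only, and no network request is made.

```python
from urllib.parse import urlencode

# Assumed base URL of the current public E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL asking Entrez for record IDs that match a query."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def esummary_url(db, ids):
    """Build an ESummary URL that retrieves document summaries for known IDs."""
    params = {"db": db, "id": ",".join(ids), "retmode": "json"}
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Illustrative query; the search term is an example, not taken from the abstract.
    print(esearch_url("pubmed", "microsatellite[Title] AND database[Title]"))
    # PMID 25398906 is the 2015 NCBI resources record listed in this document.
    print(esummary_url("pubmed", ["25398906"]))
```

The URLs can be passed to any HTTP client; keeping URL construction separate from fetching makes the query parameters easy to inspect before a request is sent.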
De Groote, Sandra L; Shultz, Mary; Blecic, Deborah D
2014-07-01
The research assesses the information-seeking behaviors of health sciences faculty, including their use of online databases, journals, and social media. A survey was designed and distributed via email to 754 health sciences faculty at a large urban research university with 6 health sciences colleges. Twenty-six percent (198) of faculty responded. MEDLINE was the primary database utilized, with 78.5% of respondents indicating they use the database at least once a week. Compared to MEDLINE, Google was utilized more often on a daily basis. Other databases showed much lower usage. Low use of online databases other than MEDLINE, link-out tools to online journals, and online social media and collaboration tools demonstrates a need for meaningful promotion of online resources and informatics literacy instruction for faculty. Library resources are plentiful and perhaps somewhat overwhelming. Librarians need to help faculty discover and utilize the resources and tools that libraries have to offer.
Interspecific Introgression in Cetaceans: DNA Markers Reveal Post-F1 Status of a Pilot Whale
Miralles, Laura; Lens, Santiago; Rodríguez-Folgar, Antonio; Carrillo, Manuel; Martín, Vidal; Mikkelsen, Bjarni; Garcia-Vazquez, Eva
2013-01-01
Visual species identification of cetacean strandings is difficult, especially when dead specimens are degraded and/or species are morphologically similar. The two recognised pilot whale species (Globicephala melas and Globicephala macrorhynchus) are sympatric in the North Atlantic Ocean. These species are very similar in external appearance and their morphometric characteristics partially overlap; thus visual identification is not always reliable. Genetic species identification ensures correct identification of specimens. Here we have employed one mitochondrial (D-Loop region) and eight nuclear loci (microsatellites) as genetic markers to identify six stranded pilot whales found in Galicia (Northwest Spain), one of them of ambiguous phenotype. DNA analyses yielded positive amplification of all loci and enabled species identification. Nuclear microsatellite DNA genotypes revealed mixed ancestry for one individual, identified as a post-F1 interspecific hybrid employing two different Bayesian methods. From the mitochondrial sequence the maternal species was Globicephala melas. This is the first hybrid documented between Globicephala melas and G. macrorhynchus, and the first post-F1 hybrid genetically identified between cetaceans, revealing interspecific genetic introgression in marine mammals. We propose to add nuclear loci to genetic databases for cetacean species identification in order to detect hybrid individuals. PMID:23990883
Resources | Office of Cancer Genomics
OCG provides a variety of scientific and educational resources for both cancer researchers and members of the general public. These resources are divided into the following types: OCG-Supported Resources: Tools, databases, and reagents generated by initiated and completed OCG programs for researchers, educators, and students. (Note: Databases for current OCG programs are available through program-specific data matrices)
Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones
Imanishi, Tadashi; Itoh, Takeshi; Suzuki, Yutaka; O'Donovan, Claire; Fukuchi, Satoshi; Koyanagi, Kanako O; Barrero, Roberto A; Tamura, Takuro; Yamaguchi-Kabata, Yumi; Tanino, Motohiko; Yura, Kei; Miyazaki, Satoru; Ikeo, Kazuho; Homma, Keiichi; Kasprzyk, Arek; Nishikawa, Tetsuo; Hirakawa, Mika; Thierry-Mieg, Jean; Thierry-Mieg, Danielle; Ashurst, Jennifer; Jia, Libin; Nakao, Mitsuteru; Thomas, Michael A; Mulder, Nicola; Karavidopoulou, Youla; Jin, Lihua; Kim, Sangsoo; Yasuda, Tomohiro; Lenhard, Boris; Eveno, Eric; Suzuki, Yoshiyuki; Yamasaki, Chisato; Takeda, Jun-ichi; Gough, Craig; Hilton, Phillip; Fujii, Yasuyuki; Sakai, Hiroaki; Tanaka, Susumu; Amid, Clara; Bellgard, Matthew; Bonaldo, Maria de Fatima; Bono, Hidemasa; Bromberg, Susan K; Brookes, Anthony J; Bruford, Elspeth; Carninci, Piero; Chelala, Claude; Couillault, Christine; de Souza, Sandro J.; Debily, Marie-Anne; Devignes, Marie-Dominique; Dubchak, Inna; Endo, Toshinori; Estreicher, Anne; Eyras, Eduardo; Fukami-Kobayashi, Kaoru; R. 
Gopinath, Gopal; Graudens, Esther; Hahn, Yoonsoo; Han, Michael; Han, Ze-Guang; Hanada, Kousuke; Hanaoka, Hideki; Harada, Erimi; Hashimoto, Katsuyuki; Hinz, Ursula; Hirai, Momoki; Hishiki, Teruyoshi; Hopkinson, Ian; Imbeaud, Sandrine; Inoko, Hidetoshi; Kanapin, Alexander; Kaneko, Yayoi; Kasukawa, Takeya; Kelso, Janet; Kersey, Paul; Kikuno, Reiko; Kimura, Kouichi; Korn, Bernhard; Kuryshev, Vladimir; Makalowska, Izabela; Makino, Takashi; Mano, Shuhei; Mariage-Samson, Regine; Mashima, Jun; Matsuda, Hideo; Mewes, Hans-Werner; Minoshima, Shinsei; Nagai, Keiichi; Nagasaki, Hideki; Nagata, Naoki; Nigam, Rajni; Ogasawara, Osamu; Ohara, Osamu; Ohtsubo, Masafumi; Okada, Norihiro; Okido, Toshihisa; Oota, Satoshi; Ota, Motonori; Ota, Toshio; Otsuki, Tetsuji; Piatier-Tonneau, Dominique; Poustka, Annemarie; Ren, Shuang-Xi; Saitou, Naruya; Sakai, Katsunaga; Sakamoto, Shigetaka; Sakate, Ryuichi; Schupp, Ingo; Servant, Florence; Sherry, Stephen; Shiba, Rie; Shimizu, Nobuyoshi; Shimoyama, Mary; Simpson, Andrew J; Soares, Bento; Steward, Charles; Suwa, Makiko; Suzuki, Mami; Takahashi, Aiko; Tamiya, Gen; Tanaka, Hiroshi; Taylor, Todd; Terwilliger, Joseph D; Unneberg, Per; Veeramachaneni, Vamsi; Watanabe, Shinya; Wilming, Laurens; Yasuda, Norikazu; Yoo, Hyang-Sook; Stodolsky, Marvin; Makalowski, Wojciech; Go, Mitiko; Nakai, Kenta; Takagi, Toshihisa; Kanehisa, Minoru; Sakaki, Yoshiyuki; Quackenbush, John; Okazaki, Yasushi; Hayashizaki, Yoshihide; Hide, Winston; Chakraborty, Ranajit; Nishikawa, Ken; Sugawara, Hideaki; Tateno, Yoshio; Chen, Zhu; Oishi, Michio; Tonellato, Peter; Apweiler, Rolf; Okubo, Kousaku; Wagner, Lukas; Wiemann, Stefan; Strausberg, Robert L; Isogai, Takao; Auffray, Charles; Nomura, Nobuo; Sugano, Sumio
2004-01-01
The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. 
In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology. PMID:15103394
2011-01-01
Background Over recent years, a growing effort has been made to develop microsatellite markers for the genomic analysis of the common bean (Phaseolus vulgaris) to broaden the knowledge of the molecular genetic basis of this species. The availability of large sets of expressed sequence tags (ESTs) in public databases has given rise to an expedient approach for the identification of SSRs (Simple Sequence Repeats), specifically EST-derived SSRs. In the present work, a battery of new microsatellite markers was obtained from a search of the Phaseolus vulgaris EST database. The diversity, degree of transferability and polymorphism of these markers were tested. Results From 9,583 valid ESTs, 4,764 had microsatellite motifs, from which 377 were used to design primers, and 302 (80.11%) showed good amplification quality. To analyze transferability, a group of 167 SSRs were tested, and the results showed that they were 82% transferable across at least one species. The highest amplification rates were observed between the species from the Phaseolus (63.7%), Vigna (25.9%), Glycine (19.8%), Medicago (10.2%), Dipterix (6%) and Arachis (1.8%) genera. The average PIC (Polymorphism Information Content) varied from 0.53 for genomic SSRs to 0.47 for EST-SSRs, and the average number of alleles per locus was 4 and 3, respectively. Among the 315 newly tested SSRs in the BJ (BAT93 X Jalo EEP558) population, 24% (76) were polymorphic. The integration of these segregant loci into a framework map composed of 123 previously obtained SSR markers yielded a total of 199 segregant loci, of which 182 (91.5%) were mapped to 14 linkage groups, resulting in a map length of 1,157 cM. Conclusions A total of 302 newly developed EST-SSR markers, showing good amplification quality, are available for the genetic analysis of Phaseolus vulgaris. These markers showed satisfactory rates of transferability, especially between species that have great economic and genomic values. 
Their diversity was comparable to genomic SSRs, and they were incorporated in the common bean reference genetic map, which constitutes an important contribution to and advance in Phaseolus vulgaris genomic research. PMID:21554695
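The proportions quoted in the abstract above can be checked directly from the reported counts. The short sketch below recomputes them; the helper name `percent` is illustrative, and every number comes from the abstract itself.

```python
def percent(part, whole, digits=2):
    """Return part/whole as a percentage rounded to the given number of digits."""
    return round(100 * part / whole, digits)


# Primer pairs with good amplification quality out of those designed.
amplified = percent(302, 377)        # reported as 80.11%

# Polymorphic SSRs among those tested in the BJ population.
polymorphic = percent(76, 315, 0)    # reported as 24%

# Framework markers plus newly integrated segregant loci.
total_segregant = 123 + 76           # reported as 199

# Segregant loci placed on linkage groups.
mapped = percent(182, total_segregant, 1)  # reported as 91.5%

print(amplified, polymorphic, total_segregant, mapped)
```

All four values agree with the figures stated in the abstract.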
Atanur, Santosh S; Diaz, Ana Garcia; Maratou, Klio; Sarkis, Allison; Rotival, Maxime; Game, Laurence; Tschannen, Michael R; Kaisaki, Pamela J; Otto, Georg W; Ma, Man Chun John; Keane, Thomas M; Hummel, Oliver; Saar, Kathrin; Chen, Wei; Guryev, Victor; Gopalakrishnan, Kathirvel; Garrett, Michael R; Joe, Bina; Citterio, Lorena; Bianchi, Giuseppe; McBride, Martin; Dominiczak, Anna; Adams, David J; Serikawa, Tadao; Flicek, Paul; Cuppen, Edwin; Hubner, Norbert; Petretto, Enrico; Gauguier, Dominique; Kwitek, Anne; Jacob, Howard; Aitman, Timothy J
2013-08-01
Large numbers of inbred laboratory rat strains have been developed for a range of complex disease phenotypes. To gain insights into the evolutionary pressures underlying selection for these phenotypes, we sequenced the genomes of 27 rat strains, including 11 models of hypertension, diabetes, and insulin resistance, along with their respective control strains. Altogether, we identified more than 13 million single-nucleotide variants, indels, and structural variants across these rat strains. Analysis of strain-specific selective sweeps and gene clusters implicated genes and pathways involved in cation transport, angiotensin production, and regulators of oxidative stress in the development of cardiovascular disease phenotypes in rats. Many of the rat loci that we identified overlap with previously mapped loci for related traits in humans, indicating the presence of shared pathways underlying these phenotypes in rats and humans. These data represent a step change in resources available for evolutionary analysis of complex traits in disease models. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.
Atanur, Santosh S.; Diaz, Ana Garcia; Maratou, Klio; Sarkis, Allison; Rotival, Maxime; Game, Laurence; Tschannen, Michael R.; Kaisaki, Pamela J.; Otto, Georg W.; Ma, Man Chun John; Keane, Thomas M.; Hummel, Oliver; Saar, Kathrin; Chen, Wei; Guryev, Victor; Gopalakrishnan, Kathirvel; Garrett, Michael R.; Joe, Bina; Citterio, Lorena; Bianchi, Giuseppe; McBride, Martin; Dominiczak, Anna; Adams, David J.; Serikawa, Tadao; Flicek, Paul; Cuppen, Edwin; Hubner, Norbert; Petretto, Enrico; Gauguier, Dominique; Kwitek, Anne; Jacob, Howard; Aitman, Timothy J.
2013-01-01
Summary Large numbers of inbred laboratory rat strains have been developed for a range of complex disease phenotypes. To gain insights into the evolutionary pressures underlying selection for these phenotypes, we sequenced the genomes of 27 rat strains, including 11 models of hypertension, diabetes, and insulin resistance, along with their respective control strains. Altogether, we identified more than 13 million single-nucleotide variants, indels, and structural variants across these rat strains. Analysis of strain-specific selective sweeps and gene clusters implicated genes and pathways involved in cation transport, angiotensin production, and regulators of oxidative stress in the development of cardiovascular disease phenotypes in rats. Many of the rat loci that we identified overlap with previously mapped loci for related traits in humans, indicating the presence of shared pathways underlying these phenotypes in rats and humans. These data represent a step change in resources available for evolutionary analysis of complex traits in disease models. PMID:23890820
CottonDB: A resource for cotton genome research
USDA-ARS's Scientific Manuscript database
CottonDB (http://cottondb.org/) is a database and web resource for cotton genomic and genetic research. Created in 1995, CottonDB was among the first plant genome databases established by the USDA-ARS. Accessed through a website interface, the database aims to be a convenient, inclusive medium of ...
Hongdan, Wang; Bing, Kang; Ning, Su; Miao, He; Bo, Zhang; Yuxin, Guo; Bofeng, Zhu; Shixiu, Liao; Zhaoshu, Zeng
2017-01-01
At present, the Han nationality is China's main ethnic group and also the most populous ethnic group in the world. This makes it a valuable resource for studying microsatellite mutations and ethnogeny. The aim of this study is to investigate the genetic polymorphisms and mutations of 22 autosomal STR loci in 2475 individuals from Henan province, China. DNA is amplified and genotyped using the PowerPlex™24 system. The gene frequencies, forensic parameters, and mutation rates of the 22 STR loci are analyzed. A total of 295 alleles are observed in this Henan Han population, and the allelic frequencies range from 0.0003 to 0.5036. In order to investigate the genetic relationships between the Henan Han and 14 other populations, our present data were compared with previously published data for the same 15 STR loci. The results indicated that the Henan Han had closer genetic relationships with the Minnan Han, Maonan, Yi and Guangdong Han groups, while the South Morocco population, the Moroccan population, the Malay group, and the Uigur were more distant from the Henan Han. Mutation events are found in 15 of the STR loci, that is, in all loci except D2S441, D13S317, PentaE, D2S1338, D5S818, TPOX and D19S433. A total of 40 mutation events are observed in these 15 STR loci. The mutation rates range from 0 to 4.85 × 10⁻³. In this study, 39 mutations are single-step mutations, and only one, at FGA, comprises two steps. STR mutations are commonly encountered in paternity testing, yet there have been no mutation studies of these 22 STR loci in the Henan Han population. These data are of great importance for forensic individual discrimination and paternity testing.
Use of EST-SSR loci flanking regions for phylogenetic analysis of genus Arachis
USDA-ARS's Scientific Manuscript database
All wild peanut collections in the genus Arachis were assigned to nine taxonomy sections on the bases of cross-compatibility and morphologic character clustering. These nine sections consist of 80 species from the most ancient to the most advanced, providing a diverse genetic resource for phylogenet...
U.S. Geological Survey mineral databases; MRDS and MAS/MILS
McFaul, E.J.; Mason, G.T.; Ferguson, W.B.; Lipin, B.R.
2000-01-01
These two CD-ROMs contain the latest version of the Mineral Resources Data System (MRDS) database and the Minerals Availability System/Minerals Industry Location System (MAS/MILS) database for coverage of North America and the world outside North America. The records in the MRDS database each contain almost 200 data fields describing metallic and nonmetallic mineral resources, deposits, and commodities. The records in the MAS/MILS database each contain almost 100 data fields describing mines and mineral processing plants.
Kodama, Yuichi; Mashima, Jun; Kaminuma, Eli; Gojobori, Takashi; Ogasawara, Osamu; Takagi, Toshihisa; Okubo, Kousaku; Nakamura, Yasukazu
2012-01-01
The DNA Data Bank of Japan (DDBJ; http://www.ddbj.nig.ac.jp) maintains and provides archival, retrieval and analytical resources for biological information. The central DDBJ resource consists of public, open-access nucleotide sequence databases including raw sequence reads, assembly information and functional annotation. Database content is exchanged with EBI and NCBI within the framework of the International Nucleotide Sequence Database Collaboration (INSDC). In 2011, DDBJ launched two new resources: the 'DDBJ Omics Archive' (DOR; http://trace.ddbj.nig.ac.jp/dor) and BioProject (http://trace.ddbj.nig.ac.jp/bioproject). DOR is an archival database of functional genomics data generated by microarray and highly parallel new generation sequencers. Data are exchanged between the ArrayExpress at EBI and DOR in the common MAGE-TAB format. BioProject provides an organizational framework to access metadata about research projects and the data from the projects that are deposited into different databases. In this article, we describe major changes and improvements introduced to the DDBJ services, and the launch of two new resources: DOR and BioProject.
Hardison, Ross C; Chui, David H K; Giardine, Belinda; Riemer, Cathy; Patrinos, George P; Anagnou, Nicholas; Miller, Webb; Wajcman, Henri
2002-03-01
We have constructed a relational database of hemoglobin variants and thalassemia mutations, called HbVar, which can be accessed on the web at http://globin.cse.psu.edu. Extensive information is recorded for each variant and mutation, including a description of the variant and associated pathology, hematology, electrophoretic mobility, methods of isolation, stability information, ethnic occurrence, structure studies, functional studies, and references. The initial information was derived from books by Dr. Titus Huisman and colleagues [Huisman et al., 1996, 1997, 1998]. The current database is updated regularly with the addition of new data and corrections to previous data. Queries can be formulated based on fields in the database. Tables of common categories of variants, such as all those involving the alpha1-globin gene (HBA1) or all those that result in high oxygen affinity, are maintained by automated queries on the database. Users can formulate more precise queries, such as identifying "all beta-globin variants associated with instability and found in Scottish populations." This new database should be useful for clinical diagnosis as well as in fundamental studies of hemoglobin biochemistry, globin gene regulation, and human sequence variation at these loci. Copyright 2002 Wiley-Liss, Inc.
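The kind of field-based query described in the abstract above can be sketched against a toy relational schema. Everything in the example below is an assumption made for illustration: the table and column names, the placeholder rows, and the helper `find_variants` are hypothetical and do not reflect the actual HbVar schema or its data.

```python
import sqlite3

# Hypothetical two-table schema; the real HbVar layout is not described in the abstract.
SCHEMA = """
CREATE TABLE variant (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    gene TEXT NOT NULL,
    stability TEXT NOT NULL
);
CREATE TABLE occurrence (
    variant_id INTEGER NOT NULL REFERENCES variant(id),
    population TEXT NOT NULL
);
"""

# Placeholder rows only; these are not real hemoglobin variants.
VARIANTS = [
    (1, "placeholder-1", "HBB", "unstable"),
    (2, "placeholder-2", "HBB", "stable"),
    (3, "placeholder-3", "HBA1", "unstable"),
]
OCCURRENCES = [(1, "Scottish"), (2, "Scottish"), (3, "Scottish"), (1, "Italian")]


def find_variants(conn, gene, stability, population):
    """Return names of variants matching a gene, stability class and population."""
    rows = conn.execute(
        """
        SELECT DISTINCT v.name
        FROM variant AS v
        JOIN occurrence AS o ON o.variant_id = v.id
        WHERE v.gene = ? AND v.stability = ? AND o.population = ?
        ORDER BY v.name
        """,
        (gene, stability, population),
    )
    return [name for (name,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO variant VALUES (?, ?, ?, ?)", VARIANTS)
conn.executemany("INSERT INTO occurrence VALUES (?, ?)", OCCURRENCES)

# "All beta-globin variants associated with instability and found in Scottish populations."
result = find_variants(conn, "HBB", "unstable", "Scottish")
print(result)
```

Parameterized placeholders (`?`) keep user-supplied search terms out of the SQL text, which matters for any query form exposed on the web.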
Content Is King: Databases Preserve the Collective Information of Science.
Yates, John R
2018-04-01
Databases store sequence information experimentally gathered to create resources that further science. In the last 20 years databases have become critical components of fields like proteomics where they provide the basis for large-scale and high-throughput proteomic informatics. Amos Bairoch, winner of the Association of Biomolecular Resource Facilities Frederick Sanger Award, has created some of the important databases proteomic research depends upon for accurate interpretation of data.
Bayesian variable selection for post-analytic interrogation of susceptibility loci.
Chen, Siying; Nunez, Sara; Reilly, Muredach P; Foulkes, Andrea S
2017-06-01
Understanding the complex interplay among protein coding genes and regulatory elements requires rigorous interrogation with analytic tools designed for discerning the relative contributions of overlapping genomic regions. To this aim, we offer a novel application of Bayesian variable selection (BVS) for classifying genomic class level associations using existing large meta-analysis summary level resources. This approach is applied using the expectation maximization variable selection (EMVS) algorithm to typed and imputed SNPs across 502 protein coding genes (PCGs) and 220 long intergenic non-coding RNAs (lncRNAs) that overlap 45 known loci for coronary artery disease (CAD) using publicly available Global Lipids Genetics Consortium (GLGC) (Teslovich et al., 2010; Willer et al., 2013) meta-analysis summary statistics for low-density lipoprotein cholesterol (LDL-C). The analysis reveals 33 PCGs and three lncRNAs across 11 loci with >50% posterior probabilities for inclusion in an additive model of association. The findings are consistent with previous reports, while providing some new insight into the architecture of LDL-cholesterol to be investigated further. As genomic taxonomies continue to evolve, additional classes such as enhancer elements and splicing regions, can easily be layered into the proposed analysis framework. Moreover, application of this approach to alternative publicly available meta-analysis resources, or more generally as a post-analytic strategy to further interrogate regions that are identified through single point analysis, is straightforward. All coding examples are implemented in R version 3.2.1 and provided as supplemental material. © 2016, The International Biometric Society.
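The final reporting step in the abstract above, keeping features whose posterior probability of inclusion exceeds 50%, can be sketched as a simple filter. This is not the authors' R code and does not implement EMVS itself; the feature names and probabilities below are hypothetical placeholders, and the helpers `select_included` and `count_by_class` are illustrative.

```python
from collections import Counter

# Hypothetical posterior inclusion probabilities keyed by feature name;
# each value is (genomic class, probability). Not values from the study.
pip = {
    "gene_A": ("PCG", 0.91),
    "gene_B": ("PCG", 0.42),
    "lnc_C": ("lncRNA", 0.67),
    "lnc_D": ("lncRNA", 0.08),
}


def select_included(pip, threshold=0.5):
    """Return the names of features whose inclusion probability exceeds the threshold."""
    return sorted(name for name, (_, prob) in pip.items() if prob > threshold)


def count_by_class(pip, selected):
    """Count the selected features per genomic class."""
    return Counter(pip[name][0] for name in selected)


selected = select_included(pip)
print(selected)
print(dict(count_by_class(pip, selected)))
```

In a real analysis the probabilities would come from the fitted variable selection model rather than a hand-written dictionary.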
Medema, Marnix H; Blin, Kai; Cimermancic, Peter; de Jager, Victor; Zakrzewski, Piotr; Fischbach, Michael A; Weber, Tilmann; Takano, Eriko; Breitling, Rainer
2011-07-01
Bacterial and fungal secondary metabolism is a rich source of novel bioactive compounds with potential pharmaceutical applications as antibiotics, anti-tumor drugs or cholesterol-lowering drugs. To find new drug candidates, microbiologists are increasingly relying on sequencing genomes of a wide variety of microbes. However, rapidly and reliably pinpointing all the potential gene clusters for secondary metabolites in dozens of newly sequenced genomes has been extremely challenging, due to their biochemical heterogeneity, the presence of unknown enzymes and the dispersed nature of the necessary specialized bioinformatics tools and resources. Here, we present antiSMASH (antibiotics & Secondary Metabolite Analysis Shell), the first comprehensive pipeline capable of identifying biosynthetic loci covering the whole range of known secondary metabolite compound classes (polyketides, non-ribosomal peptides, terpenes, aminoglycosides, aminocoumarins, indolocarbazoles, lantibiotics, bacteriocins, nucleosides, beta-lactams, butyrolactones, siderophores, melanins and others). It aligns the identified regions at the gene cluster level to their nearest relatives from a database containing all other known gene clusters, and integrates or cross-links all previously available secondary-metabolite specific gene analysis methods in one interactive view. antiSMASH is available at http://antismash.secondarymetabolites.org.
First genetic linkage map of Taraxacum koksaghyz Rodin based on AFLP, SSR, COS and EST-SSR markers
Arias, Marina; Hernandez, Monica; Remondegui, Naroa; Huvenaars, Koen; van Dijk, Peter; Ritter, Enrique
2016-01-01
Taraxacum koksaghyz Rodin (TKS) has been studied on many occasions as a possible alternative source of good-quality natural rubber and of inulin. Some tire companies are already testing TKS tire prototypes. There are also many investigations on the production of bio-fuels from inulin and on inulin applications for health improvement and in the food industry. Only limited genomic resources exist for TKS, and in particular no genetic linkage map has been available for this species. We have constructed the first TKS genetic linkage map based on AFLP, COS, SSR and EST-SSR markers. The integrated linkage map with eight linkage groups (LG), representing the eight chromosomes of Russian dandelion, has 185 individual AFLP markers from parent 1, 188 individual AFLP markers from parent 2, 75 common AFLP markers and 6 COS, 1 SSR and 63 EST-SSR loci. Blasting the EST-SSR sequences against known sequences from lettuce allowed a partial alignment of our TKS map with a lettuce map. Blast searches against plant gene databases revealed some homologies with useful genes for downstream applications in the future. PMID:27488242
Li, Bojiang; Dong, Chao; Li, Pinghua; Ren, Zhuqing; Wang, Han; Yu, Fengxiang; Ning, Caibo; Liu, Kaiqing; Wei, Wei; Huang, Ruihua; Chen, Jie; Wu, Wangjun; Liu, Honglin
2016-10-17
Meat color is considered to be the most important indicator of meat quality; however, the molecular mechanisms underlying traits related to meat color remain mostly unknown. In this study, to elucidate the molecular basis of meat color, we constructed six cDNA libraries from biceps femoris (Bf) and soleus (Sol), which exhibit obvious differences in meat color, and analyzed the whole-transcriptome differences between Bf (white muscle) and Sol (red muscle) using high-throughput sequencing technology. Using the DESeq2 method, we identified 138 differentially expressed genes (DEGs) between Bf and Sol. Using the DEGseq method, we identified 770, 810, and 476 DEGs in comparisons between Bf and Sol in three separate animals. Of these DEGs, 52 were overlapping DEGs. Using these data, we determined the enriched GO terms, metabolic pathways and candidate genes associated with meat color traits. Additionally, we mapped 114 non-redundant DEGs to the meat color QTLs via a comparative analysis with the porcine quantitative trait loci (QTL) database. Overall, our data serve as a valuable resource for identifying genes whose functions are critical for meat color traits and can accelerate studies of the molecular mechanisms of meat color formation.
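Finding overlapping DEGs across several comparisons amounts to a set intersection. The sketch below assumes that "overlapping" means genes called differentially expressed in all three per-animal comparisons, which the abstract does not state explicitly; the gene identifiers are hypothetical placeholders and the helper `overlapping_degs` is illustrative.

```python
def overlapping_degs(*deg_lists):
    """Return the genes present in every supplied list of DEGs."""
    if not deg_lists:
        return set()
    return set.intersection(*(set(degs) for degs in deg_lists))


# Hypothetical gene identifiers for three per-animal Bf vs. Sol comparisons.
animal_1 = ["g1", "g2", "g3", "g4"]
animal_2 = ["g2", "g3", "g4", "g5"]
animal_3 = ["g3", "g4", "g6"]

shared = overlapping_degs(animal_1, animal_2, animal_3)
print(sorted(shared))
```

Applied to the real DEG lists, the size of the returned set would be the overlap count under this assumed definition.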
Li, Bojiang; Dong, Chao; Li, Pinghua; Ren, Zhuqing; Wang, Han; Yu, Fengxiang; Ning, Caibo; Liu, Kaiqing; Wei, Wei; Huang, Ruihua; Chen, Jie; Wu, Wangjun; Liu, Honglin
2016-01-01
Meat color is considered to be the most important indicator of meat quality; however, the molecular mechanisms underlying traits related to meat color remain mostly unknown. In this study, to elucidate the molecular basis of meat color, we constructed six cDNA libraries from biceps femoris (Bf) and soleus (Sol), which exhibit obvious differences in meat color, and analyzed the whole-transcriptome differences between Bf (white muscle) and Sol (red muscle) using high-throughput sequencing technology. Using the DESeq2 method, we identified 138 differentially expressed genes (DEGs) between Bf and Sol. Using the DEGseq method, we identified 770, 810, and 476 DEGs in comparisons between Bf and Sol in three separate animals. Of these DEGs, 52 were overlapping DEGs. Using these data, we determined the enriched GO terms, metabolic pathways and candidate genes associated with meat color traits. Additionally, we mapped 114 non-redundant DEGs to the meat color QTLs via a comparative analysis with the porcine quantitative trait loci (QTL) database. Overall, our data serve as a valuable resource for identifying genes whose functions are critical for meat color traits and can accelerate studies of the molecular mechanisms of meat color formation. PMID:27748458
Database resources of the National Center for Biotechnology Information
Wheeler, David L.; Barrett, Tanya; Benson, Dennis A.; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Kenton, David L.; Khovayko, Oleg; Lipman, David J.; Madden, Thomas L.; Maglott, Donna R.; Ostell, James; Pruitt, Kim D.; Schuler, Gregory D.; Schriml, Lynn M.; Sequeira, Edwin; Sherry, Stephen T.; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Suzek, Tugba O.; Tatusov, Roman; Tatusova, Tatiana A.; Wagner, Lukas; Yaschenko, Eugene
2006-01-01
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, SAGEmap, Gene Expression Omnibus, Entrez Probe, GENSAT, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of the resources can be accessed through the NCBI home page. PMID:16381840
Database resources of the National Center for Biotechnology Information.
Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bolton, Evan; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; Dicuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Krasnov, Sergey; Landsman, David; Lipman, David J; Lu, Zhiyong; Madden, Thomas L; Madej, Tom; Maglott, Donna R; Marchler-Bauer, Aron; Miller, Vadim; Karsch-Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Wang, Yanli; Wilbur, W John; Yaschenko, Eugene; Ye, Jian
2012-01-01
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
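The Entrez Programming Utilities mentioned above are reached over HTTP. As a minimal sketch, the snippet below only builds an ESearch request URL and does not contact the network. The base URL and the `db`, `term`, `retmax` and `retmode` parameters follow the public E-utilities documentation as currently published, not anything stated in the abstract; the helper name `esearch_url` and the example query are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Base address of the E-utilities service (taken from NCBI's public docs,
# not from the abstract above).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL for the given Entrez database and query term."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Hypothetical query: PubMed records with "microsatellite" in the title.
url = esearch_url("pubmed", "microsatellite[Title]", retmax=5)
print(url)
```

Keeping URL construction separate from the actual request makes it easy to inspect or log a query before sending it.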
Database resources of the National Center for Biotechnology Information
Acland, Abigail; Agarwala, Richa; Barrett, Tanya; Beck, Jeff; Benson, Dennis A.; Bollin, Colleen; Bolton, Evan; Bryant, Stephen H.; Canese, Kathi; Church, Deanna M.; Clark, Karen; DiCuccio, Michael; Dondoshansky, Ilya; Federhen, Scott; Feolo, Michael; Geer, Lewis Y.; Gorelenkov, Viatcheslav; Hoeppner, Marilu; Johnson, Mark; Kelly, Christopher; Khotomlianski, Viatcheslav; Kimchi, Avi; Kimelman, Michael; Kitts, Paul; Krasnov, Sergey; Kuznetsov, Anatoliy; Landsman, David; Lipman, David J.; Lu, Zhiyong; Madden, Thomas L.; Madej, Tom; Maglott, Donna R.; Marchler-Bauer, Aron; Karsch-Mizrachi, Ilene; Murphy, Terence; Ostell, James; O'Sullivan, Christopher; Panchenko, Anna; Phan, Lon; Preuss, Don; Pruitt, Kim D.; Rubinstein, Wendy; Sayers, Eric W.; Schneider, Valerie; Schuler, Gregory D.; Sequeira, Edwin; Sherry, Stephen T.; Shumway, Martin; Sirotkin, Karl; Siyan, Karanjit; Slotta, Douglas; Soboleva, Alexandra; Soussov, Vladimir; Starchenko, Grigory; Tatusova, Tatiana A.; Trawick, Bart W.; Vakatov, Denis; Wang, Yanli; Ward, Minghong; Wilbur, W. John; Yaschenko, Eugene; Zbicz, Kerry
2014-01-01
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, PubReader, Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Primer-BLAST, COBALT, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, the Genetic Testing Registry, Genome and related tools, the Map Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, ClinVar, MedGen, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Probe, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All these resources can be accessed through the NCBI home page. PMID:24259429
Database resources of the National Center for Biotechnology Information
Wheeler, David L.; Church, Deanna M.; Federhen, Scott; Lash, Alex E.; Madden, Thomas L.; Pontius, Joan U.; Schuler, Gregory D.; Schriml, Lynn M.; Sequeira, Edwin; Tatusova, Tatiana A.; Wagner, Lukas
2003-01-01
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's Web site. NCBI resources include Entrez, PubMed, PubMed Central (PMC), LocusLink, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR (e-PCR), Open Reading Frame (ORF) Finder, Reference Sequence (RefSeq), UniGene, HomoloGene, ProtEST, Database of Single Nucleotide Polymorphisms (dbSNP), Human/Mouse Homology Map, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes and related tools, the Map Viewer, Model Maker (MM), Evidence Viewer (EV), Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov. PMID:12519941
Anglès d'Auriac, Marc B; Hobæk, Anders; Christie, Hartvig; Gundersen, Hege; Fagerli, Camilla With; Haugstetter, Johannes; Norderhaug, Kjell Magnus
2014-10-07
The green sea urchin Strongylocentrotus droebachiensis has a wide circumpolar distribution and plays a key role in coastal ecosystems worldwide by destructively grazing macroalgae beds and turning them into marine deserts, so-called barren grounds. In the past decades, large established kelp forests have been overgrazed and transformed into such barren grounds on the Norwegian coast. This has important repercussions for coastal diversity and production, including the reproduction of several fish species that rely on the kelp forests as nurseries. Genetic diversity is an important parameter for the study and further anticipation of this large-scale phenomenon. Microsatellites were developed using a Norwegian S. droebachiensis individual, primarily for the study of Northeast Atlantic populations. The 10 new microsatellite loci were amplified using M13 forward tails, enabling the use of M13 fluorescent tagged primers for multiplex reading. Among these loci, 2 acted polysomic and should therefore not be considered useful for population genetic analysis. We screened 96 individuals sampled from 4 different sites along the Norwegian coast which have shown unexpected diversity. The new microsatellite loci should be a useful resource for further research into connectivity among S. droebachiensis populations, and for assessing the risks of spreading and new overgrazing events.
The Protein Information Resource: an integrated public resource of functional annotation of proteins
Wu, Cathy H.; Huang, Hongzhan; Arminski, Leslie; Castro-Alvear, Jorge; Chen, Yongxing; Hu, Zhang-Zhi; Ledley, Robert S.; Lewis, Kali C.; Mewes, Hans-Werner; Orcutt, Bruce C.; Suzek, Baris E.; Tsugita, Akira; Vinayaka, C. R.; Yeh, Lai-Su L.; Zhang, Jian; Barker, Winona C.
2002-01-01
The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases). PMID:11752247
Systems and methods for automatically identifying and linking names in digital resources
Parker, Charles T.; Lyons, Catherine M.; Roston, Gerald P.; Garrity, George M.
2017-06-06
The present invention provides systems and methods for automatically identifying name-like-strings in digital resources, matching these name-like-strings against a set of names held in an expertly curated database, and for those name-like-strings found in said database, enhancing the content by associating additional matter with the name, wherein said matter includes information about the names that is held within said database and pointers to other digital resources which include the same name and its synonyms.
Shabi, Iwok N; Shabi, Olabode M; Akewukereke, Modupe A; Udofia, Emem P
2011-12-01
To determine the extent, purpose, determinants and the impact of the utilization of Internet medical databases among the respondents. A descriptive cross-sectional survey of 540 randomly selected physicians at the two tertiary health institutions in Osun State, southwest Nigeria. A total of 444 (82.2%) physicians completed the questionnaires. All the respondents had used the internet medical databases within the last 4 weeks of the study. The majority (53.8%) used the internet resources at least once in 2 weeks, while 12.2% used the resources every day. The online resources are mainly sought for routine patient care and for research purposes. PubMed (70.3%), HINARI (69.0%) and Free Medical Journals (60.1%) are the frequently used online databases/digital archives. The internet resources have positively impacted the clinical practice (40.0%) and research output (65.5%) of the physicians. There has been a considerable increase in the extent and quality of utilization of online medical databases, which has positively impacted the clinical practice and research output of the physicians. Ease of finding the needed information and the availability of evidence-based resources are the major determinants of the databases utilized. © 2011 The authors. Health Information and Libraries Journal © 2011 Health Libraries Group.
DNA database of populations from different parts in the Kingdom of Thailand.
Shotivaranon, Jittima; Chirachariyavej, Thamrong; Leetrakool, Nipapan; Rerkamnuaychoke, Budsaba
2009-12-01
The polymorphisms of 15 short tandem repeat (STR) loci (D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, vWA, TPOX, D18S51, D5S818 and FGA) from the AmpFlSTR Identifiler PCR amplification kit were analysed in 929 unrelated individuals living in the north, northeast, central and south of Thailand. The comparison between these four subpopulations demonstrated that subpopulations in the north and northeast were different in two loci from all paired groups while those in the north, central and south were closely related. The inter-population comparisons between the combined Thai population and other ethnic groups including Eastern Chinese, Japanese, Iraqi and Egyptian revealed that Eastern Chinese and Thai were closely related.
Fox, Ervin R.; Musani, Solomon K.; Barbalic, Maja; Lin, Honghuang; Yu, Bing; Ogunyankin, Kofo O.; Smith, Nicholas L.; Kutlar, Abdullah; Glazer, Nicole L.; Post, Wendy S.; Paltoo, Dina N.; Dries, Daniel L.; Farlow, Deborah N.; Duarte, Christine W.; Kardia, Sharon L.; Meyers, Kristin J.; Sun, Yan V.; Arnett, Donna K.; Patki, Amit A.; Sha, Jin; Cui, Xiangqui; Samdarshi, Tandaw E.; Penman, Alan D.; Bibbins-Domingo, Kirsten; Bůžková, Petra; Benjamin, Emelia J.; Bluemke, David A.; Morrison, Alanna C.; Heiss, Gerardo; Carr, J. Jeffrey; Tracy, Russell P.; Mosley, Thomas H.; Taylor, Herman A.; Psaty, Bruce M.; Heckbert, Susan R.; Cappola, Thomas P.; Vasan, Ramachandran S.
2013-01-01
Background Using data from four community-based cohorts of African Americans (AA), we tested the association between genome-wide markers (SNPs) and cardiac phenotypes in the Candidate-gene Association REsource (CARe) study. Methods and Results Among 6,765 AA, we related age, sex, height and weight-adjusted residuals for nine cardiac phenotypes (assessed by echocardiogram or MRI) to 2.5 million SNPs genotyped using Genome-Wide Affymetrix Human SNP Array 6.0 (Affy6.0) and the remainder imputed. Within-cohort genome-wide association analysis was conducted, followed by meta-analysis across cohorts using inverse variance weights (genome-wide significance threshold = 4.0 × 10⁻⁷). Supplementary pathway analysis was performed. We attempted replication in 3 smaller cohorts of African ancestry and tested look-ups in one consortium of European ancestry (EchoGEN). Across the 9 phenotypes, variants in 4 genetic loci reached genome-wide significance: rs4552931 in UBE2V2 (p = 1.43 × 10⁻⁷) for left ventricular mass (LVM); rs7213314 in WIPI1 (p = 1.68 × 10⁻⁷) for LV internal diastolic diameter (LVIDD); rs1571099 in PPAPDC1A (p = 2.57 × 10⁻⁸) for interventricular septal wall thickness (IVST); and rs9530176 in KLF5 (p = 4.02 × 10⁻⁷) for ejection fraction (EF). Associated variants were enriched in three signaling pathways involved in cardiac remodeling. None of the 4 loci replicated in cohorts of African ancestry were confirmed in look-ups in EchoGEN. Conclusions In the largest GWAS of cardiac structure and function to date in AA, we identified 4 genetic loci related to LVM, IVST, LVIDD and EF that reached genome-wide significance. Replication results suggest that these loci may be unique to individuals of African ancestry. Additional large-scale studies are warranted for these complex phenotypes. PMID:23275298
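The meta-analysis step named in the abstract above (combining per-cohort results with inverse variance weights) can be illustrated as follows. This is a minimal sketch assuming a standard fixed-effect model, which the abstract does not specify; the function name and the per-cohort numbers are hypothetical.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect inverse-variance meta-analysis of per-cohort effects.

    Returns the pooled effect, its standard error, and a two-sided
    p-value from the normal approximation.
    """
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("need one standard error per effect estimate")
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p_value


# Hypothetical effect sizes and standard errors for one SNP in four cohorts.
beta, se, p = inverse_variance_meta([0.12, 0.08, 0.15, 0.10],
                                    [0.04, 0.05, 0.06, 0.03])
print(f"pooled beta={beta:.4f} se={se:.4f} p={p:.3g}")
```

Cohorts with smaller standard errors dominate the pooled estimate, which is the intended behaviour of inverse variance weighting.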
SalmonDB: a bioinformatics resource for Salmo salar and Oncorhynchus mykiss
Di Génova, Alex; Aravena, Andrés; Zapata, Luis; González, Mauricio; Maass, Alejandro; Iturra, Patricia
2011-01-01
SalmonDB is a new multiorganism database containing EST sequences from Salmo salar, Oncorhynchus mykiss and the whole genome sequence of Danio rerio, Gasterosteus aculeatus, Tetraodon nigroviridis, Oryzias latipes and Takifugu rubripes, built with core components from GMOD project, GOPArc system and the BioMart project. The information provided by this resource includes Gene Ontology terms, metabolic pathways, SNP prediction, CDS prediction, orthologs prediction, several precalculated BLAST searches and domains. It also provides a BLAST server for matching user-provided sequences to any of the databases and an advanced query tool (BioMart) that allows easy browsing of EST databases with user-defined criteria. These tools make the SalmonDB database a valuable resource for researchers searching for transcripts and genomic information regarding S. salar and other salmonid species. The database is expected to grow in the near future, particularly with the S. salar genome sequencing project. Database URL: http://genomicasalmones.dim.uchile.cl/ PMID:22120661
Using glycome databases for drug discovery.
Aoki-Kinoshita, Kiyoko F
2008-08-01
The glycomics field has advanced greatly in the last decade owing to technologies for glycan synthesis and analysis, including carbohydrate microarrays. Accordingly, databases for glycomics research have also emerged and been made publicly available by many major institutions worldwide. This review introduces these and other useful databases on which new methods for drug discovery can be developed. The scope of this review covers current documented and accessible databases and resources pertaining to glycomics. These were selected with the expectation that they may be useful for drug discovery research. There is a plethora of glycomics databases that have much potential for drug discovery. This may seem daunting at first but this review helps to put some of these resources into perspective. Additionally, some thoughts on how to integrate these resources to allow more efficient research are presented.
USDA-ARS's Scientific Manuscript database
Sugarcane (Saccharum spp.) is an important economic crop for producing edible sugar and bioethanol. Brown rust has long been a major disease impacting sugarcane production worldwide. Resistance resources and markers linked to resistance are valuable tools for improving disease resistance. An...
USDA-ARS's Scientific Manuscript database
Phosphorus (P) is a critical element for plant growth and is frequently the limiting nutrient in many soils. Continued production and application of P fertilizer relies on a nonrenewable resource which will peak in about 2050. This will result in significantly increased cost, particularly for develo...
USDA-ARS's Scientific Manuscript database
All plants must optimize their growth with finite resources. Water use efficiency (WUE) measures the relationship between biomass acquisition and transpired water. In the present study, we performed two experiments to understand the genetic basis of WUE and other parameters of plant-water interact...
Yield effects of two southern leaf blight resistance loci in maize hybrids
USDA-ARS's Scientific Manuscript database
Plants need to balance resources between yield and defense. This phenomenon has rarely been investigated in the context of naturally-occurring quantitative resistance alleles in an agricultural production environment. B73-3B and B73-6A are two near-isogenic lines (NILs) in the background of the mai...
Human Ageing Genomic Resources: new and updated databases
Tacutu, Robi; Thornton, Daniel; Johnson, Emily; Budovsky, Arie; Barardo, Diogo; Craig, Thomas; Diana, Eugene; Lehmann, Gilad; Toren, Dmitri; Wang, Jingwei; Fraifeld, Vadim E
2018-01-01
In spite of a growing body of research and data, human ageing remains a poorly understood process. Over 10 years ago we developed the Human Ageing Genomic Resources (HAGR), a collection of databases and tools for studying the biology and genetics of ageing. Here, we present HAGR’s main functionalities, highlighting new additions and improvements. HAGR consists of six core databases: (i) the GenAge database of ageing-related genes, in turn composed of a dataset of >300 human ageing-related genes and a dataset with >2000 genes associated with ageing or longevity in model organisms; (ii) the AnAge database of animal ageing and longevity, featuring >4000 species; (iii) the GenDR database with >200 genes associated with the life-extending effects of dietary restriction; (iv) the LongevityMap database of human genetic association studies of longevity with >500 entries; (v) the DrugAge database with >400 ageing or longevity-associated drugs or compounds; (vi) the CellAge database with >200 genes associated with cell senescence. All our databases are manually curated by experts and regularly updated to ensure high-quality data. Cross-links across our databases and to external resources help researchers locate and integrate relevant information. HAGR is freely available online (http://genomics.senescence.info/). PMID:29121237
Measuring use patterns of online journals and databases
De Groote, Sandra L.; Dorsch, Josephine L.
2003-01-01
Purpose: This research sought to determine use of online biomedical journals and databases and to assess current user characteristics associated with the use of online resources in an academic health sciences center. Setting: The Library of the Health Sciences–Peoria is a regional site of the University of Illinois at Chicago (UIC) Library with 350 print journals, more than 4,000 online journals, and multiple online databases. Methodology: A survey was designed to assess online journal use, print journal use, database use, computer literacy levels, and other library user characteristics. A survey was sent through campus mail to all (471) UIC Peoria faculty, residents, and students. Results: Forty-one percent (188) of the surveys were returned. Ninety-eight percent of the students, faculty, and residents reported having convenient access to a computer connected to the Internet. While 53% of the users indicated they searched MEDLINE at least once a week, other databases showed much lower usage. Overall, 71% of respondents indicated a preference for online over print journals when possible. Conclusions: Users prefer online resources to print, and many choose to access these online resources remotely. Convenience and full-text availability appear to play roles in selecting online resources. The findings of this study suggest that databases without links to full text and online journal collections without links from bibliographic databases will have lower use. These findings have implications for collection development, promotion of library resources, and end-user training. PMID:12883574
Extracting patterns of database and software usage from the bioinformatics literature
Duck, Geraint; Nenadic, Goran; Brass, Andy; Robertson, David L.; Stevens, Robert
2014-01-01
Motivation: As a natural consequence of being a computer-based discipline, bioinformatics has a strong focus on database and software development, but the volume and variety of resources are growing at unprecedented rates. An audit of database and software usage patterns could help provide an overview of developments in bioinformatics and community common practice, and comparing the links between resources through time could demonstrate both the persistence of existing software and the emergence of new tools. Results: We study the connections between bioinformatics resources and construct networks of database and software usage patterns, based on resource co-occurrence, that correspond to snapshots of common practice in the bioinformatics community. We apply our approach to pairings of phylogenetics software reported in the literature and argue that these could provide a stepping stone into the identification of scientific best practice. Availability and implementation: The extracted resource data, the scripts used for network generation and the resulting networks are available at http://bionerds.sourceforge.net/networks/ Contact: robert.stevens@manchester.ac.uk PMID:25161253
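The co-occurrence idea in the abstract above (linking resources that are mentioned in the same article) can be sketched by counting pairs. This is a minimal illustration, not the authors' scripts; the function name and the three example "documents" are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents):
    """Count how many documents mention each unordered pair of resources.

    `documents` is an iterable of resource-name lists, one per article.
    The result maps (resource_a, resource_b) to a count, with names
    ordered alphabetically inside each pair.
    """
    edges = Counter()
    for resources in documents:
        # De-duplicate within a document so each pair counts once per article.
        for a, b in combinations(sorted(set(resources)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical resource mentions extracted from three articles.
docs = [
    ["ClustalW", "PhyML", "MrBayes"],
    ["ClustalW", "PhyML"],
    ["MUSCLE", "MrBayes"],
]
edges = cooccurrence_edges(docs)
for pair, count in edges.most_common():
    print(pair, count)
```

The resulting weighted edge list can be loaded into any graph tool; repeating the count per publication year would give the time-sliced snapshots the abstract describes.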
Database resources of the National Center for Biotechnology Information.
2016-01-04
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank(®) nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (PubMed Central (PMC), Bookshelf and PubReader), health (ClinVar, dbGaP, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen), genomes (BioProject, Assembly, Genome, BioSample, dbSNP, dbVar, Epigenomics, the Map Viewer, Nucleotide, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser and the Trace Archive), genes (Gene, Gene Expression Omnibus (GEO), HomoloGene, PopSet and UniGene), proteins (Protein, the Conserved Domain Database (CDD), COBALT, Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB) and Protein Clusters) and chemicals (Biosystems and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for most of these databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized datasets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Database resources of the National Center for Biotechnology Information.
2015-01-01
The National Center for Biotechnology Information (NCBI) provides a large suite of online resources for biological information and data, including the GenBank(®) nucleic acid sequence database and the PubMed database of citations and abstracts for published life science journals. Additional NCBI resources focus on literature (Bookshelf, PubMed Central (PMC) and PubReader); medical genetics (ClinVar, dbMHC, the Genetic Testing Registry, HIV-1/Human Protein Interaction Database and MedGen); genes and genomics (BioProject, BioSample, dbSNP, dbVar, Epigenomics, Gene, Gene Expression Omnibus (GEO), Genome, HomoloGene, the Map Viewer, Nucleotide, PopSet, Probe, RefSeq, Sequence Read Archive, the Taxonomy Browser, Trace Archive and UniGene); and proteins and chemicals (Biosystems, COBALT, the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), the Molecular Modeling Database (MMDB), Protein Clusters, Protein and the PubChem suite of small molecule databases). The Entrez system provides search and retrieval operations for many of these databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at http://www.ncbi.nlm.nih.gov. Published by Oxford University Press on behalf of Nucleic Acids Research 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.
The ChEMBL database as linked open data
2013-01-01
Background Making data available as Linked Data using Resource Description Framework (RDF) promotes integration with other web resources. RDF documents can natively link to related data, and others can link back using Uniform Resource Identifiers (URIs). RDF makes the data machine-readable and uses extensible vocabularies for additional information, making it easier to scale up inference and data analysis. Results This paper describes recent developments in an ongoing project converting data from the ChEMBL database into RDF triples. Relative to earlier versions, this updated version of ChEMBL-RDF uses recently introduced ontologies, including CHEMINF and CiTO; exposes more information from the database; and is now available as dereferenceable, linked data. To demonstrate these new features, we present novel use cases showing further integration with other web resources, including Bio2RDF, Chem2Bio2RDF, and ChemSpider, and showing the use of standard ontologies for querying. Conclusions We have illustrated the advantages of using open standards and ontologies to link the ChEMBL database to other databases. Using those links and the knowledge encoded in standards and ontologies, the ChEMBL-RDF resource creates a foundation for integrated semantic web cheminformatics applications, such as the presented decision support. PMID:23657106
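To make the notion of an RDF triple in the abstract above concrete, the sketch below serialises subject/predicate/object statements as N-Triples lines using only the standard library. All URIs are placeholders under example.org and do not reflect the actual ChEMBL-RDF vocabulary; the helper name `to_ntriple` is likewise an assumption.

```python
def to_ntriple(subject, predicate, obj, obj_is_uri=True):
    """Format one RDF statement as an N-Triples line.

    Subject and predicate are URIs; the object is either a URI or a
    plain string literal.
    """
    if obj_is_uri:
        obj_term = f"<{obj}>"
    else:
        # Escape backslashes and double quotes inside the literal.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    return f"<{subject}> <{predicate}> {obj_term} ."


# Hypothetical URIs: a compound, a label property, and a cross-reference.
compound = "http://example.org/compound/C1"
lines = [
    to_ntriple(compound, "http://example.org/vocab/label", "aspirin",
               obj_is_uri=False),
    to_ntriple(compound, "http://example.org/vocab/sameAs",
               "http://example.org/other-db/12345"),
]
print("\n".join(lines))
```

The second statement is the kind of link the abstract refers to: because the object is itself a URI, another dataset can be reached by following it.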
Julier, Bernadette; Flajoulot, Sandrine; Barre, Philippe; Cardinet, Gaëlle; Santoni, Sylvain; Huguet, Thierry; Huyghe, Christian
2003-01-01
Background Alfalfa (Medicago sativa) is a major forage crop. The genetic progress is slow in this legume species because of its autotetraploidy and allogamy. The genetic structure of this species makes the construction of genetic maps difficult. To reach this objective, and to be able to detect QTLs in segregating populations, we used the available codominant microsatellite markers (SSRs), most of them identified in the model legume Medicago truncatula from EST database. A genetic map was constructed with AFLP and SSR markers using specific mapping procedures for autotetraploids. The tetrasomic inheritance was analysed in an alfalfa mapping population. Results We have demonstrated that 80% of primer pairs defined on each side of SSR motifs in M. truncatula EST database amplify with the alfalfa DNA. Using an F1 mapping population of 168 individuals produced from the cross of 2 heterozygous parental plants from Magali and Mercedes cultivars, we obtained 599 AFLP markers and 107 SSR loci. All but 3 SSR loci showed a clear tetrasomic inheritance. For most of the SSR loci, the double-reduction was not significant. For the other loci no specific genotypes were produced, so the significant double-reduction could arise from segregation distortion. For each parent, the genetic map contained 8 groups of four homologous chromosomes. The lengths of the maps were 2649 and 3045 cM, with an average distance of 7.6 and 9.0 cM between markers, for Magali and Mercedes parents, respectively. Using only the SSR markers, we built a composite map covering 709 cM. Conclusions Compared to diploid alfalfa genetic maps, our maps cover about 88–100% of the genome and are close to saturation. The inheritance of the codominant markers (SSR) and the pattern of linkage repulsions between markers within each homology group are consistent with the hypothesis of a tetrasomic meiosis in alfalfa. Except for 2 out of 107 SSR markers, we found a similar order of markers on the chromosomes between the tetraploid alfalfa and M. truncatula genomes indicating a high level of colinearity between these two species. These maps will be a valuable tool for alfalfa breeding and are being used to locate QTLs. PMID:14683527
Pattin, Kristine A.; Moore, Jason H.
2009-01-01
One of the central goals of human genetics is the identification of loci with alleles or genotypes that confer increased susceptibility. The availability of dense maps of single-nucleotide polymorphisms (SNPs) along with high-throughput genotyping technologies has set the stage for routine genome-wide association studies that are expected to significantly improve our ability to identify susceptibility loci. Before this promise can be realized, there are some significant challenges that need to be addressed. We address here the challenge of detecting epistasis or gene-gene interactions in genome-wide association studies. Discovering epistatic interactions in high dimensional datasets remains a challenge due to the computational complexity resulting from the analysis of all possible combinations of SNPs. One potential way to overcome the computational burden of a genome-wide epistasis analysis would be to devise a logical way to prioritize the many SNPs in a dataset so that the data may be analyzed more efficiently and yet still retain important biological information. One of the strongest demonstrations of the functional relationship between genes is protein-protein interaction. Thus, it is plausible that the expert knowledge extracted from protein interaction databases may allow for a more efficient analysis of genome-wide studies as well as facilitate the biological interpretation of the data. In this review we will discuss the challenges of detecting epistasis in genome-wide genetic studies and the means by which we propose to apply expert knowledge extracted from protein interaction databases to facilitate this process. We explore some of the fundamentals of protein interactions and the databases that are publicly available. PMID:18551320
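To give a sense of the computational burden described above, the number of SNP combinations can be counted directly. The sketch below is illustrative only: the panel size of 500,000 SNPs and the helper name `n_interaction_tests` are assumptions made for the example, not figures or code from the review.

```python
from math import comb


def n_interaction_tests(n_snps: int, order: int = 2) -> int:
    """Count the distinct SNP combinations of a given order (hypothetical helper)."""
    return comb(n_snps, order)


# Assumed panel size, chosen only for illustration.
panel_size = 500_000
print(n_interaction_tests(panel_size, 2))  # all pairwise combinations
print(n_interaction_tests(panel_size, 3))  # all three-way combinations

# Prioritizing a smaller subset (here an assumed 5,000 SNPs) shrinks the search space.
print(n_interaction_tests(5_000, 2))
```

Because the count grows roughly with the square of the panel size for pairs, any filter that removes SNPs before the interaction scan reduces the work much faster than linearly.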
Santos, Eduardo Jose Melos Dos; McCabe, Antony; Gonzalez-Galarza, Faviel F; Jones, Andrew R; Middleton, Derek
2016-03-01
The Allele Frequencies Net Database (AFND) is a freely accessible database which stores population frequencies for alleles or genes of the immune system in worldwide populations. Herein we introduce two new tools. We have defined new classifications of data (gold, silver and bronze) to assist users in identifying the most suitable populations for their tasks. The gold standard datasets are defined by allele frequencies summing to 1, sample sizes >50 and high resolution genotyping, while silver standard datasets do not meet gold standard genotyping resolution and/or sample size criteria. The bronze standard datasets are those that could not be classified under the silver or gold standards. The gold standard includes >500 datasets covering over 3 million individuals from >100 countries at one or more of the following loci: HLA-A, -B, -C, -DPA1, -DPB1, -DQA1, -DQB1 and -DRB1 - with all loci except DPA1 present in more than 220 datasets. Three out of 12 geographic regions have low representation (the majority of their countries having less than five datasets) and the Central Asia region has no representation. There are 18 countries that are not represented by any gold standard datasets but are represented by at least one dataset that is either silver or bronze standard. We also briefly summarize the data held by AFND for KIR genes, alleles and their ligands. Our second new component is a data submission tool to assist users in the collection of the genotypes of the individuals (raw data), facilitating submission of short population reports to Human Immunology, as well as simplifying the submission of population demographics and frequency data. Copyright © 2015 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.
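The gold, silver and bronze criteria stated in this abstract can be read as a simple decision rule. The following is a minimal sketch of that reading, not AFND's actual implementation: the `Dataset` fields, the `classify` function and the tolerance used to decide whether frequencies "sum to 1" are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical summary of one population dataset."""
    freq_sum: float         # sum of reported allele frequencies
    sample_size: int        # number of individuals typed
    high_resolution: bool   # whether genotyping is high resolution


def classify(ds: Dataset, tol: float = 0.015) -> str:
    """Assign gold, silver or bronze following the criteria in the abstract.

    The tolerance `tol` is an assumption; the abstract only says that
    allele frequencies must sum to 1.
    """
    sums_to_one = abs(ds.freq_sum - 1.0) <= tol
    if not sums_to_one:
        # Cannot be classified as gold or silver.
        return "bronze"
    if ds.sample_size > 50 and ds.high_resolution:
        return "gold"
    # Frequencies sum to 1, but resolution and/or sample size falls short of gold.
    return "silver"


print(classify(Dataset(freq_sum=1.0, sample_size=120, high_resolution=True)))
print(classify(Dataset(freq_sum=1.0, sample_size=30, high_resolution=True)))
print(classify(Dataset(freq_sum=0.8, sample_size=500, high_resolution=True)))
```

Treating "frequencies do not sum to 1" as the only route to bronze is one interpretation of the abstract's wording; the database may apply further checks.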
Nims, Raymond W; Sykes, Greg; Cottrill, Karin; Ikonomi, Pranvera; Elmore, Eugene
2010-12-01
The role of cell authentication in biomedical science has received considerable attention, especially within the past decade. This quality control attribute is now beginning to be given the emphasis it deserves by granting agencies and by scientific journals. Short tandem repeat (STR) profiling, one of a few DNA profiling technologies now available, is being proposed for routine identification (authentication) of human cell lines, stem cells, and tissues. The advantage of this technique over methods such as isoenzyme analysis, karyotyping, human leukocyte antigen typing, etc., is that STR profiling can establish identity to the individual level, provided that the appropriate number and types of loci are evaluated. To best employ this technology, a standardized protocol and a data-driven, quality-controlled, and publicly searchable database will be necessary. This public STR database (currently under development) will enable investigators to rapidly authenticate human-based cultures to the individual from whom the cells were sourced. Use of similar approaches for non-human animal cells will require developing other suitable loci sets. While implementing STR analysis on a more routine basis should significantly reduce the frequency of cell misidentification, additional technologies may be needed as part of an overall authentication paradigm. For instance, isoenzyme analysis, PCR-based DNA amplification, and sequence-based barcoding methods enable rapid confirmation of a cell line's species of origin while screening against cross-contaminations, especially when the cells present are not recognized by the species-specific STR method. Karyotyping may also be needed as a supporting tool during establishment of an STR database. Finally, good cell culture practices must always remain a major component of any effort to reduce the frequency of cell misidentification.
Espinal, Allyson C; Wang, Dan; Yan, Li; Liu, Song; Tang, Li; Hu, Qiang; Morrison, Carl D; Ambrosone, Christine B; Higgins, Michael J; Sucheston-Campbell, Lara E
2017-02-28
DNA from archival formalin-fixed and paraffin-embedded (FFPE) tissue is an invaluable resource for genome-wide methylation studies although concerns about poor quality may limit its use. In this study, we compared DNA methylation profiles of breast tumors using DNA from fresh-frozen (FF) tissues and three types of matched FFPE samples. For 9/10 patients, correlation and unsupervised clustering analysis revealed that the FF and FFPE samples were consistently correlated with each other and clustered into distinct subgroups. Greater than 84% of the top 100 loci previously shown to differentiate ER+ and ER- tumors in FF tissues were also FFPE DML. Weighted Correlation Gene Network Analyses (WCGNA) grouped the DML loci into 16 modules in FF tissue, with ~85% of the module membership preserved across tissue types. Restored FFPE and matched FF samples were profiled using the Illumina Infinium HumanMethylation450K platform. Methylation levels (β-values) across all loci and the top 100 loci previously shown to differentiate tumors by estrogen receptor status (ER+ or ER-) in a larger FF study, were compared between matched FF and FFPE samples using Pearson's correlation, hierarchical clustering and WCGNA. Positive predictive values and sensitivity levels for detecting differentially methylated loci (DML) in FF samples were calculated in an independent FFPE cohort. FFPE breast tumor samples show lower overall detection of DMLs than FF samples; however, FFPE and FF DMLs compare favorably. These results support the emerging consensus that the 450K platform can be employed to investigate epigenetics in large sets of archival FFPE tissues.
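The positive predictive value and sensitivity mentioned in this abstract can be illustrated with a small set-based calculation. This is a hedged sketch that treats the FF-derived loci as the reference and the FFPE-derived loci as the calls; the function name and the probe identifiers are made up for the example and are not data from the study.

```python
def ppv_and_sensitivity(reference_dml, called_dml):
    """Return (PPV, sensitivity) of called loci against a reference set.

    PPV         = true positives / all called loci
    sensitivity = true positives / all reference loci
    """
    reference, called = set(reference_dml), set(called_dml)
    true_positives = len(reference & called)
    ppv = true_positives / len(called) if called else float("nan")
    sensitivity = true_positives / len(reference) if reference else float("nan")
    return ppv, sensitivity


# Hypothetical probe identifiers, for illustration only.
ff_dml = {"cg01", "cg02", "cg03", "cg04"}
ffpe_dml = {"cg02", "cg03", "cg05"}

ppv, sens = ppv_and_sensitivity(ff_dml, ffpe_dml)
print(f"PPV={ppv:.2f}, sensitivity={sens:.2f}")
```

With these made-up sets, two of the three FFPE calls are in the reference and two of the four reference loci are recovered, which is the kind of trade-off the abstract summarizes.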
Protein Information Resource: a community resource for expert annotation of protein data
Barker, Winona C.; Garavelli, John S.; Hou, Zhenglin; Huang, Hongzhan; Ledley, Robert S.; McGarvey, Peter B.; Mewes, Hans-Werner; Orcutt, Bruce C.; Pfeiffer, Friedhelm; Tsugita, Akira; Vinayaka, C. R.; Xiao, Chunlin; Yeh, Lai-Su L.; Wu, Cathy
2001-01-01
The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP. PMID:11125041
Genomics Community Resources | Informatics Technology for Cancer Research (ITCR)
To facilitate genomic research and the dissemination of its products, National Human Genome Research Institute (NHGRI) supports genomic resources that are crucial for basic research, disease studies, model organism studies, and other biomedical research. Awards under this FOA will support the development and distribution of genomic resources that will be valuable for the broad research community, using cost-effective approaches. Such resources include (but are not limited to) databases and informatics resources (such as human and model organism databases, ontologies, and analysi
Mapping, fine mapping, and molecular dissection of quantitative trait Loci in domestic animals.
Georges, Michel
2007-01-01
Artificial selection has created myriad breeds of domestic animals, each characterized by unique phenotypes pertaining to behavior, morphology, physiology, and disease. Most domestic animal populations share features with isolated founder populations, making them well suited for positional cloning. Genome sequences are now available for most domestic species, and with them a panoply of tools including high-density single-nucleotide polymorphism panels. As a result, domestic animal populations are becoming invaluable resources for studying the molecular architecture of complex traits and of adaptation. Here we review recent progress and issues in the positional identification of genes underlying complex traits in domestic animals. As many phenotypes studied in animals are quantitative, we focus on mapping, fine mapping, and cloning of quantitative trait loci.
Analysis of disease-associated objects at the Rat Genome Database
Wang, Shur-Jen; Laulederkind, Stanley J. F.; Hayman, G. T.; Smith, Jennifer R.; Petri, Victoria; Lowry, Timothy F.; Nigam, Rajni; Dwinell, Melinda R.; Worthey, Elizabeth A.; Munzenmaier, Diane H.; Shimoyama, Mary; Jacob, Howard J.
2013-01-01
The Rat Genome Database (RGD) is the premier resource for genetic, genomic and phenotype data for the laboratory rat, Rattus norvegicus. In addition to organizing biological data from rats, the RGD team focuses on manual curation of gene–disease associations for rat, human and mouse. In this work, we have analyzed disease-associated strains, quantitative trait loci (QTL) and genes from rats. These disease objects form the basis for seven disease portals. Among disease portals, the cardiovascular disease and obesity/metabolic syndrome portals have the highest number of rat strains and QTL. These two portals share 398 rat QTL, and these shared QTL are highly concentrated on rat chromosomes 1 and 2. For disease-associated genes, we performed gene ontology (GO) enrichment analysis across portals using RatMine enrichment widgets. Fifteen GO terms, five from each GO aspect, were selected to profile enrichment patterns of each portal. Of the selected biological process (BP) terms, ‘regulation of programmed cell death’ was the top enriched term across all disease portals except in the obesity/metabolic syndrome portal where ‘lipid metabolic process’ was the most enriched term. ‘Cytosol’ and ‘nucleus’ were common cellular component (CC) annotations for disease genes, but only the cancer portal genes were highly enriched with ‘nucleus’ annotations. Similar enrichment patterns were observed in a parallel analysis using the DAVID functional annotation tool. The relationship between the preselected 15 GO terms and disease terms was examined reciprocally by retrieving rat genes annotated with these preselected terms. The individual GO term–annotated gene list showed enrichment in physiologically related diseases. For example, the ‘regulation of blood pressure’ genes were enriched with cardiovascular disease annotations, and the ‘lipid metabolic process’ genes with obesity annotations. 
Furthermore, we were able to enhance enrichment of neurological diseases by combining ‘G-protein coupled receptor binding’ annotated genes with ‘protein kinase binding’ annotated genes. Database URL: http://rgd.mcw.edu PMID:23794737
Swine Leukocyte Antigen Diversity in Canadian Specific Pathogen-Free Yorkshire and Landrace Pigs
Gao, Caixia; Quan, Jinqiang; Jiang, Xinjie; Li, Changwen; Lu, Xiaoye; Chen, Hongyan
2017-01-01
The highly polymorphic swine major histocompatibility complex (MHC), termed swine leukocyte antigen (SLA), is associated with different levels of immunologic responses to infectious diseases, vaccines, and transplantation. Pig breeds with known SLA haplotypes are important genetic resources for biomedical research. Canadian Yorkshire and Landrace pigs represent the current specific pathogen-free (SPF) breeding stock maintained in the isolation environment at the Harbin Veterinary Research Institute, Chinese Academy of Agricultural Sciences. In this study, we identified 61 alleles at five polymorphic SLA loci (SLA-1, SLA-2, SLA-3, DRB1, and DQB1) representing 17 class I haplotypes and 11 class II haplotypes using reverse transcription-polymerase chain reaction (RT-PCR) sequence-based typing and PCR-sequence specific primers methods in 367 Canadian SPF Yorkshire and Landrace pigs. The official designation of the alleles has been assigned by the SLA Nomenclature Committee of the International Society for Animal Genetics and released in updated Immuno Polymorphism Database-MHC SLA sequence database [Release 2.0.0.3 (2016-11-03)]. The submissions confirmed some unassigned alleles and standardized nomenclatures of many previously unconfirmed alleles in the GenBank database. Three class I haplotypes, Hp-37.0, 63.0, and 73.0, appeared to be novel and have not previously been reported in other pig populations. One crossover within the class I region and two between class I and class II regions were observed, resulting in three new recombinant haplotypes. The presence of the duplicated SLA-1 locus was confirmed in three class I haplotypes Hp-28.0, Hp-35.0, and Hp-63.0. Furthermore, we also analyzed the functional diversities of 19 identified frequent SLA class I molecules in this study and confirmed the existence of four supertypes using the MHCcluster method. 
These results will be useful for studying the adaptive immune response and immunological phenotypic differences in pigs, screening potential T-cell epitopes, and further developing more effective vaccines. PMID:28360911
Rajaram, Vengaldas; Nepolean, Thirunavukkarasu; Senthilvel, Senapathy; Varshney, Rajeev K; Vadez, Vincent; Srivastava, Rakesh K; Shah, Trushar M; Supriya, Ambawat; Kumar, Sushil; Ramana Kumari, Basava; Bhanuprakash, Amindala; Narasu, Mangamoori Lakshmi; Riera-Lizarazu, Oscar; Hash, Charles Thomas
2013-03-09
Pearl millet [Pennisetum glaucum (L.) R. Br.] is a widely cultivated drought- and high-temperature tolerant C4 cereal grown under dryland, rainfed and irrigated conditions in drought-prone regions of the tropics and sub-tropics of Africa, South Asia and the Americas. It is considered an orphan crop with relatively few genomic and genetic resources. This study was undertaken to increase the EST-based microsatellite marker and genetic resources for this crop to facilitate marker-assisted breeding. Newly developed EST-SSR markers (99), along with previously mapped EST-SSR (17), genomic SSR (53) and STS (2) markers, were used to construct linkage maps of four F7 recombinant inbred populations (RIP) based on crosses ICMB 841-P3 × 863B-P2 (RIP A), H 77/833-2 × PRLT 2/89-33 (RIP B), 81B-P6 × ICMP 451-P8 (RIP C) and PT 732B-P2 × P1449-2-P1 (RIP D). Mapped loci numbers were greatest for RIP A (104), followed by RIP B (78), RIP C (64) and RIP D (59). Total map lengths (Haldane) were 615 cM, 690 cM, 428 cM and 276 cM, respectively. A total of 176 loci detected by 171 primer pairs were mapped among the four crosses. A consensus map of 174 loci (899 cM) detected by 169 primer pairs was constructed using MergeMap to integrate the individual linkage maps. Locus order in the consensus map was well conserved for nearly all linkage groups. Eighty-nine EST-SSR marker loci from this consensus map had significant BLAST hits (top hits with e-value ≤ 1E-10) on the genome sequences of rice, foxtail millet, sorghum, maize and Brachypodium with 35, 88, 58, 48 and 38 loci, respectively. The consensus map developed in the present study contains the largest set of mapped SSRs reported to date for pearl millet, and represents a major consolidation of existing pearl millet genetic mapping information. 
This study increased the number of mapped pearl millet SSR markers by >50%, filling important gaps in previously published SSR-based linkage maps for this species, and will greatly facilitate SSR-based QTL mapping and applied marker-assisted selection programs.
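The map lengths in this abstract are reported in Haldane centimorgans. As a reminder of what that unit implies, the standard Haldane mapping function converts a recombination fraction r into a map distance d = -50 ln(1 - 2r) cM. The sketch below implements that textbook formula and its inverse; the function names are hypothetical, and nothing here reflects how the authors' mapping software was actually run.

```python
import math


def haldane_cm(r: float) -> float:
    """Map distance in cM from a recombination fraction r (Haldane function)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)


def haldane_r(d_cm: float) -> float:
    """Recombination fraction from a map distance in cM (inverse Haldane function)."""
    if d_cm < 0.0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


# Example value chosen for illustration only.
d = haldane_cm(0.10)
print(round(d, 2))             # distance in cM for r = 0.10
print(round(haldane_r(d), 4))  # round trip back to r
```

Because the Haldane function assumes no crossover interference, distances computed this way are additive along a linkage group, which is why total map lengths can be compared across the four populations.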
A Simple Test of Class-Level Genetic Association Can Reveal Novel Cardiometabolic Trait Loci.
Qian, Jing; Nunez, Sara; Reed, Eric; Reilly, Muredach P; Foulkes, Andrea S
2016-01-01
Characterizing the genetic determinants of complex diseases can be further augmented by incorporating knowledge of underlying structure or classifications of the genome, such as newly developed mappings of protein-coding genes, epigenetic marks, enhancer elements and non-coding RNAs. We apply a simple class-level testing framework, termed Genetic Class Association Testing (GenCAT), to identify protein-coding gene association with 14 cardiometabolic (CMD) related traits across 6 publicly available genome wide association (GWA) meta-analysis data resources. GenCAT uses SNP-level meta-analysis test statistics across all SNPs within a class of elements, as well as the size of the class and its unique correlation structure, to determine if the class is statistically meaningful. The novelty of findings is evaluated through investigation of regional signals. A subset of findings are validated using recently updated, larger meta-analysis resources. A simulation study is presented to characterize overall performance with respect to power, control of family-wise error and computational efficiency. All analysis is performed using the GenCAT package, R version 3.2.1. We demonstrate that class-level testing complements the common first stage minP approach that involves individual SNP-level testing followed by post-hoc ascribing of statistically significant SNPs to genes and loci. GenCAT suggests 54 protein-coding genes at 41 distinct loci for the 13 CMD traits investigated in the discovery analysis, that are beyond the discoveries of minP alone. An additional application to biological pathways demonstrates flexibility in defining genetic classes. We conclude that it would be prudent to include class-level testing as standard practice in GWA analysis. 
GenCAT, for example, can be used as a simple, complementary and efficient strategy for class-level testing that leverages existing data resources, requires only summary level data in the form of test statistics, and adds significant value with respect to its potential for identifying multiple novel and clinically relevant trait associations.
The Chinchilla Research Resource Database: resource for an otolaryngology disease model
Shimoyama, Mary; Smith, Jennifer R.; De Pons, Jeff; Tutaj, Marek; Khampang, Pawjai; Hong, Wenzhou; Erbe, Christy B.; Ehrlich, Garth D.; Bakaletz, Lauren O.; Kerschner, Joseph E.
2016-01-01
The long-tailed chinchilla (Chinchilla lanigera) is an established animal model for diseases of the inner and middle ear, among others. In particular, chinchilla is commonly used to study diseases involving viral and bacterial pathogens and polymicrobial infections of the upper respiratory tract and the ear, such as otitis media. The value of the chinchilla as a model for human diseases prompted the sequencing of its genome in 2012 and the more recent development of the Chinchilla Research Resource Database (http://crrd.mcw.edu) to provide investigators with easy access to relevant datasets and software tools to enhance their research. The Chinchilla Research Resource Database contains a complete catalog of genes for chinchilla and, for comparative purposes, human. Chinchilla genes can be viewed in the context of their genomic scaffold positions using the JBrowse genome browser. In contrast to the corresponding records at NCBI, individual gene reports at CRRD include functional annotations for Disease, Gene Ontology (GO) Biological Process, GO Molecular Function, GO Cellular Component and Pathway assigned to chinchilla genes based on annotations from the corresponding human orthologs. Data can be retrieved via keyword and gene-specific searches. Lists of genes with similar functional attributes can be assembled by leveraging the hierarchical structure of the Disease, GO and Pathway vocabularies through the Ontology Search and Browser tool. Such lists can then be further analyzed for commonalities using the Gene Annotator (GA) Tool. All data in the Chinchilla Research Resource Database is freely accessible and downloadable via the CRRD FTP site or using the download functions available in the search and analysis tools. The Chinchilla Research Resource Database is a rich resource for researchers using, or considering the use of, chinchilla as a model for human disease. Database URL: http://crrd.mcw.edu PMID:27173523
Database resources of the National Center for Biotechnology Information
Sayers, Eric W.; Barrett, Tanya; Benson, Dennis A.; Bolton, Evan; Bryant, Stephen H.; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M.; DiCuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M.; Geer, Lewis Y.; Helmberg, Wolfgang; Kapustin, Yuri; Krasnov, Sergey; Landsman, David; Lipman, David J.; Lu, Zhiyong; Madden, Thomas L.; Madej, Tom; Maglott, Donna R.; Marchler-Bauer, Aron; Miller, Vadim; Karsch-Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D.; Schuler, Gregory D.; Sequeira, Edwin; Sherry, Stephen T.; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A.; Wagner, Lukas; Wang, Yanli; Wilbur, W. John; Yaschenko, Eugene; Ye, Jian
2012-01-01
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Website. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Probe, Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. PMID:22140104
Database resources of the National Center for Biotechnology Information
2013-01-01
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, the Genetic Testing Registry, Genome and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, BioProject, BioSample, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Probe, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool, Biosystems, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page. PMID:23193264
Database resources of the National Center for Biotechnology Information.
Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; DiCuccio, Michael; Edgar, Ron; Federhen, Scott; Feolo, Michael; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Landsman, David; Lipman, David J; Madden, Thomas L; Maglott, Donna R; Miller, Vadim; Mizrachi, Ilene; Ostell, James; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Yaschenko, Eugene; Ye, Jian
2009-01-01
In addition to maintaining the GenBank nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs), Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART) and the PubChem suite of small molecule databases. Augmenting many of the web applications is custom implementation of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
Iorio, A; De Angelis, F; Garzoli, A; Battistini, A; De Stefano, G F
2014-06-01
Human Leucocyte Antigen (HLA) loci are widely known for their role in the generation of immune responses and are often considered to be effective in reconstructing human relationships. This is due to the high degree of polymorphism and the rarity of recombination observed at HLA loci. In this study, we have made an attempt to support the potential of HLA class II loci by analysing DQA1 and DQB1 in 52 Ecuadorians with ties to the Tsachilas community. Little is known about this population either ethnologically or historically: they are considered to retain much of the ancient Chibchan culture in spite of the lack of significant genetic characterization. A total of 21 alleles were observed, with very low heterozygosity. The obtained data were then assessed for relationship reconstruction. The compiled database of 63 populations was segregated and resolved in clusters corresponding to the ethnogeographic distribution of the populations. This analysis of Central and Southern Amerindians allowed us to support a historical hypothesis related to the origin and migration of Ecuadorian people. Indeed, the relationships with neighbour human groups, especially Cayapas and Colombians, could shed light on the genetic similarity within ancient Chibchan culture that was dispersed by tribes coming up the Barbacoas. This indicates that if an appropriate analysis was to be carried out on a set of populations representative of different geographic locations, and that analysis was properly interpreted, then there would be a high possibility that HLA class II loci could infer accurate assessments, as revealed by uniparental markers. © 2014 John Wiley & Sons Ltd.
RPA tree-level database users guide
Patrick D. Miles; Scott A. Pugh; Brad Smith; Sonja N. Oswalt
2014-01-01
The Forest and Rangeland Renewable Resources Planning Act (RPA) of 1974 calls for a periodic assessment of the Nation's renewable resources. The Forest Inventory and Analysis (FIA) program of the U.S. Forest Service supports the RPA effort by providing information on the forest resources of the United States. The RPA tree-level database (RPAtreeDB) was generated...
RNAcentral: A vision for an international database of RNA sequences
Bateman, Alex; Agrawal, Shipra; Birney, Ewan; Bruford, Elspeth A.; Bujnicki, Janusz M.; Cochrane, Guy; Cole, James R.; Dinger, Marcel E.; Enright, Anton J.; Gardner, Paul P.; Gautheret, Daniel; Griffiths-Jones, Sam; Harrow, Jen; Herrero, Javier; Holmes, Ian H.; Huang, Hsien-Da; Kelly, Krystyna A.; Kersey, Paul; Kozomara, Ana; Lowe, Todd M.; Marz, Manja; Moxon, Simon; Pruitt, Kim D.; Samuelsson, Tore; Stadler, Peter F.; Vilella, Albert J.; Vogel, Jan-Hinnerk; Williams, Kelly P.; Wright, Mathew W.; Zwieb, Christian
2011-01-01
During the last decade there has been a great increase in the number of noncoding RNA genes identified, including new classes such as microRNAs and piRNAs. There is also a large growth in the amount of experimental characterization of these RNA components. Despite this growth in information, it is still difficult for researchers to access RNA data, because key data resources for noncoding RNAs have not yet been created. The most pressing omission is the lack of a comprehensive RNA sequence database, much like UniProt, which provides a comprehensive set of protein knowledge. In this article we propose the creation of a new open public resource that we term RNAcentral, which will contain a comprehensive collection of RNA sequences and fill an important gap in the provision of biomedical databases. We envision RNA researchers from all over the world joining a federated RNAcentral network, contributing specialized knowledge and databases. RNAcentral would centralize key data that are currently held across a variety of databases, allowing researchers instant access to a single, unified resource. This resource would facilitate the next generation of RNA research and help drive further discoveries, including those that improve food production and human and animal health. We encourage additional RNA database resources and research groups to join this effort. We aim to obtain international network funding to further this endeavor. PMID:21940779
A versatile genome-scale PCR-based pipeline for high-definition DNA FISH.
Bienko, Magda; Crosetto, Nicola; Teytelman, Leonid; Klemm, Sandy; Itzkovitz, Shalev; van Oudenaarden, Alexander
2013-02-01
We developed a cost-effective genome-scale PCR-based method for high-definition DNA FISH (HD-FISH). We visualized gene loci with diffraction-limited resolution, chromosomes as spot clusters and single genes together with transcripts by combining HD-FISH with single-molecule RNA FISH. We provide a database of over 4.3 million primer pairs targeting the human and mouse genomes that is readily usable for rapid and flexible generation of probes.
Genetic and Environmental Pathways in Type 1 Diabetes Complications
2010-09-01
active duty members of the military, their families and retired military personnel will potentially allow focused preventative treatment of at-risk...association and assess potential heterogeneity of association signals, such as by ancestry. Query eQTL databases for relevant associations. Goals: 1a1...1, so that at most an average of 7 SNPs remain as potential risk SNPs at each of the 30 loci. In Stage 2 we will genotype another 2000 cases and
Diversity of Mycobacterium tuberculosis lineages in French Polynesia.
Osman, Djaltou Aboubaker; Phelippeau, Michael; Drancourt, Michel; Musso, Didier
2017-04-01
French Polynesia is an overseas territory located in the South Pacific. The incidence of tuberculosis in French Polynesia has been stable since 2000 with an average of 20 cases/y/100,000 inhabitants. Molecular epidemiology of Mycobacterium tuberculosis in French Polynesia is unknown because M. tuberculosis isolates have not been routinely genotyped. From 2009 to 2012, 34 isolates collected from 32 French Polynesian patients were identified as M. tuberculosis by probe hybridization. These isolates were genotyped using spoligotyping and 24-loci mycobacterial interspersed repetitive units (MIRUs)-variable number of tandem repeat (VNTR). Spoligotype patterns obtained using commercial kits were compared with the online international database SITVIT. MIRU-VNTR genotyping was performed using an in-house protocol based on capillary electrophoresis sizing for 24-loci MIRU-VNTR genotyping. The results of the spoligotyping method revealed that 25 isolates grouped into six previously described spoligotypes [H1, H3, U likely (S), T1, Manu, and Beijing] and nine isolates grouped into six new spoligotypes. Comparison with the international database MIRU-VNTRplus distributed 30 isolates into five lineages (Haarlem, Latin American Mediterranean, S, X, and Beijing) and four as unassigned isolates. Genotyping identified four phylogenetic lineages belonging to the modern Euro-American subgroup, one Beijing genotype responsible for worldwide pandemics, including remote islands in the South Pacific, and one Manu genotype of the ancestral lineage of M. tuberculosis. Copyright © 2015. Published by Elsevier B.V.
Pe’er, Itsik
2017-01-01
Genome-wide association studies (GWAS) have identified hundreds of SNPs responsible for variation in human quantitative traits. However, genome-wide-significant associations often fail to replicate across independent cohorts, which seems inconsistent with their apparently strong effects in discovery cohorts. This limited success of replication raises pervasive questions about the utility of the GWAS field. We identify all 332 studies of quantitative traits from the NHGRI-EBI GWAS Database with attempted replication. We find that the majority of studies provide insufficient data to evaluate replication rates. The remaining papers replicate significantly worse than expected (p < 10^-14), even when adjusting for regression-to-the-mean of effect size between discovery and replication cohorts, termed the Winner’s Curse (p < 10^-16). We show this is due in part to misreporting replication cohort size as a maximum number, rather than a per-locus one. In 39 studies accurately reporting per-locus cohort size for attempted replication of 707 loci in samples with similar ancestry, replication rate matched expectation (predicted 458, observed 457, p = 0.94). In contrast, ancestry differences between replication and discovery (13 studies, 385 loci) cause the most highly-powered decile of loci to replicate worse than expected, due to differences in linkage disequilibrium. PMID:28715421
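The comparison of predicted and observed replication counts described above can be illustrated with a minimal sketch. The abstract does not give the authors' procedure, so everything here is an assumption: the function names are hypothetical, power is computed for a one-sided test with a normal approximation, and no Winner's Curse correction is applied.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def replication_power(beta, se, alpha=0.05):
    """Approximate one-sided power to replicate an effect `beta`
    whose estimate in the replication cohort has standard error `se`.
    (Assumed model; not taken from the paper.)"""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha)
    return _STD_NORMAL.cdf(abs(beta) / se - z_crit)


def compare_replication(powers, observed):
    """Compare the observed number of replicated loci with the number
    expected from per-locus power, using a normal approximation to the
    sum of independent Bernoulli outcomes."""
    expected = sum(powers)
    variance = sum(p * (1 - p) for p in powers)
    z = (observed - expected) / sqrt(variance)
    p_two_sided = 2 * (1 - _STD_NORMAL.cdf(abs(z)))
    return expected, z, p_two_sided


# Toy input: four loci, each with 50% power, two of which replicated.
expected, z, p_value = compare_replication([0.5, 0.5, 0.5, 0.5], observed=2)
```

Because the standard error depends on the cohort size actually genotyped at each locus, using a study-wide maximum cohort size in place of the per-locus size would overstate power and therefore the expected count, which is the misreporting effect the abstract describes.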
Almeida Prado Oliveira e Sousa, Maria Luiza; de Oliveira, Marco Aurelio Tuena; Auler-Bittencout, Eloisa A; Soares-Vieira, Jose Arnaldo; Munoz, Daniel Romero; Iwamura, Edna Sadayo Miazato
2014-07-01
The State of São Paulo is the most populous state in Brazil, including approximately one fifth of the population of the country. In addition to a strong economy, the state has relatively good social indicators when compared with the rest of the country. The capital city, also called São Paulo, is the sixth largest city in the world. Its population is considered the most multicultural and racially mixed in Brazil. Currently, the largest populations in São Paulo are of Italian, Lebanese, Spanish and Japanese origin, and the state has the largest number of Northeasterners outside of the Northeast region. This population structure may lead to a particular genotype frequency. In this context, the formation of a new database containing the allele frequencies of five new genetic markers (D2S441, D10S1248, D22S1045, D1S1656 and D12S391) in a sample population is relevant. The allele frequencies of 16 STR loci, including the five new European Standard Set (ESS) loci, were calculated in a sample of 1088-1098 unrelated individuals, who geographically represent the Capital city. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
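As a rough illustration of how allele frequencies at an STR locus are obtained from a sample of unrelated individuals, the following sketch uses simple gene counting. The function name and the toy genotypes are hypothetical and are not data from the study.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies at one locus by gene counting.

    `genotypes` is an iterable of (allele, allele) pairs, one pair per
    diploid individual; homozygotes contribute the same allele twice.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())  # equals 2 x number of typed individuals
    return {allele: n / total for allele, n in counts.items()}


# Toy genotypes at a single locus (hypothetical, for illustration only).
sample = [("10", "11"), ("11", "11"), ("10", "14")]
freqs = allele_frequencies(sample)
```

The denominator is the number of alleles actually typed at that locus, so the sample size can differ slightly from locus to locus when some individuals lack a call at a given marker.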
Solberg, Owen D; Mack, Steven J; Lancaster, Alex K; Single, Richard M; Tsai, Yingssu; Sanchez-Mazas, Alicia; Thomson, Glenys
2008-07-01
This paper presents a meta-analysis of high-resolution human leukocyte antigen (HLA) allele frequency data describing 497 population samples. Most of the datasets were compiled from studies published in eight journals from 1990 to 2007; additional datasets came from the International Histocompatibility Workshops and from the AlleleFrequencies.net database. In all, these data represent approximately 66,800 individuals from throughout the world, providing an opportunity to observe trends that may not have been evident at the time the data were originally analyzed, especially with regard to the relative importance of balancing selection among the HLA loci. Population genetic measures of allele frequency distributions were summarized across populations by locus and geographic region. A role for balancing selection maintaining much of HLA variation was confirmed. Further, the breadth of this meta-analysis allowed the ranking of the HLA loci, with DQA1 and HLA-C showing the strongest balancing selection and DPB1 being compatible with neutrality. Comparisons of the allelic spectra reported by studies since 1990 indicate that most of the HLA alleles identified since 2000 are very-low-frequency alleles. The literature-based allele-count data, as well as maps summarizing the geographic distributions for each allele, are available online.
Solberg, Owen D.; Mack, Steven J.; Lancaster, Alex K.; Single, Richard M.; Tsai, Yingssu; Sanchez-Mazas, Alicia; Thomson, Glenys
2008-01-01
This paper presents a meta-analysis of high-resolution human leukocyte antigen (HLA) allele frequency data describing 497 population samples. Most of the datasets were compiled from studies published in eight journals from 1990 to 2007; additional datasets came from the International Histocompatibility Workshops and from the AlleleFrequencies.net database. In all, these data represent approximately 66,800 individuals from throughout the world, providing an opportunity to observe trends that may not have been evident at the time the data were originally analyzed, especially with regard to the relative importance of balancing selection among the HLA loci. Population genetic measures of allele frequency distributions were summarized across populations by locus and geographic region. A role for balancing selection maintaining much of HLA variation was confirmed. Further, the breadth of this meta-analysis allowed the ranking of the HLA loci, with DQA1 and HLA-C showing strongest balancing selection and DPB1 being compatible with neutrality. Comparisons of the allelic spectra reported by studies since 1990 suggest that most of the HLA alleles identified since 2000 are very-low-frequency alleles. The literature-based allele-count data, as well as maps summarizing the geographic distributions for each allele, are available online. PMID:18638659
biochem4j: Integrated and extensible biochemical knowledge through graph databases.
Swainston, Neil; Batista-Navarro, Riza; Carbonell, Pablo; Dobson, Paul D; Dunstan, Mark; Jervis, Adrian J; Vinaixa, Maria; Williams, Alan R; Ananiadou, Sophia; Faulon, Jean-Loup; Mendes, Pedro; Kell, Douglas B; Scrutton, Nigel S; Breitling, Rainer
2017-01-01
Biologists and biochemists have at their disposal a number of excellent, publicly available data resources such as UniProt, KEGG, and NCBI Taxonomy, which catalogue biological entities. Despite the usefulness of these resources, they remain fundamentally unconnected. While links may appear between entries across these databases, users are typically only able to follow such links by manual browsing or through specialised workflows. Although many of the resources provide web-service interfaces for computational access, performing federated queries across databases remains a non-trivial but essential activity in interdisciplinary systems and synthetic biology programmes. What is needed are integrated repositories to catalogue both biological entities and, crucially, the relationships between them. Such a resource should be extensible, such that newly discovered relationships (for example, those between novel, synthetic enzymes and non-natural products) can be added over time. With the introduction of graph databases, the barrier to the rapid generation, extension and querying of such a resource has been lowered considerably. With a particular focus on metabolic engineering as an illustrative application domain, biochem4j, freely available at http://biochem4j.org, is introduced to provide an integrated, queryable database that warehouses chemical, reaction, enzyme and taxonomic data from a range of reliable resources. The biochem4j framework establishes a starting point for the flexible integration and exploitation of an ever-wider range of biological data sources, from public databases to laboratory-specific experimental datasets, for the benefit of systems biologists, biosystems engineers and the wider community of molecular biologists and biological chemists.
biochem4j: Integrated and extensible biochemical knowledge through graph databases
Batista-Navarro, Riza; Dunstan, Mark; Jervis, Adrian J.; Vinaixa, Maria; Ananiadou, Sophia; Faulon, Jean-Loup; Kell, Douglas B.
2017-01-01
Biologists and biochemists have at their disposal a number of excellent, publicly available data resources such as UniProt, KEGG, and NCBI Taxonomy, which catalogue biological entities. Despite the usefulness of these resources, they remain fundamentally unconnected. While links may appear between entries across these databases, users are typically only able to follow such links by manual browsing or through specialised workflows. Although many of the resources provide web-service interfaces for computational access, performing federated queries across databases remains a non-trivial but essential activity in interdisciplinary systems and synthetic biology programmes. What is needed are integrated repositories to catalogue both biological entities and–crucially–the relationships between them. Such a resource should be extensible, such that newly discovered relationships–for example, those between novel, synthetic enzymes and non-natural products–can be added over time. With the introduction of graph databases, the barrier to the rapid generation, extension and querying of such a resource has been lowered considerably. With a particular focus on metabolic engineering as an illustrative application domain, biochem4j, freely available at http://biochem4j.org, is introduced to provide an integrated, queryable database that warehouses chemical, reaction, enzyme and taxonomic data from a range of reliable resources. The biochem4j framework establishes a starting point for the flexible integration and exploitation of an ever-wider range of biological data sources, from public databases to laboratory-specific experimental datasets, for the benefit of systems biologists, biosystems engineers and the wider community of molecular biologists and biological chemists. PMID:28708831
Developmental validation of the PowerPlex® 21 System.
Ensenberger, Martin G; Hill, Carolyn R; McLaren, Robert S; Sprecher, Cynthia J; Storts, Douglas R
2014-03-01
The PowerPlex® 21 System is a STR multiplex that has been optimized for casework samples while still being capable of database workflows including direct amplification. The loci included in the multiplex offer increasing overlap with core loci used in different countries and regions throughout the world. The PowerPlex® 21 System contains D1S1656, D2S1338, D3S1358, D5S818, D6S1043, D7S820, D8S1179, D12S391, D13S317, D16S539, D18S51, D19S433, D21S11, Amelogenin, CSF1PO, FGA, Penta D, Penta E, TH01, TPOX, and vWA. These loci represent all 13 core CODIS loci in addition to loci commonly used in Asia and Europe. A developmental validation study was completed to document performance capabilities and limitations of the PowerPlex® 21 System. Data from this validation work served as the basis for the following conclusions: genotyping of single-source samples was reliable across a range of template DNA concentrations with >95% alleles called at 50 pg. Direct amplification of samples from FTA® storage cards was successfully performed using the reagents provided with the system and modified cycling protocols provided in the technical manual. Mixture analysis showed that over 95% of minor alleles were detected at 1:9 ratios. Reaction conditions including volume and annealing temperature as well as the concentrations of primers, DNA polymerase, magnesium, and Master Mix were shown to be optimal and able to withstand moderate variations without affecting system performance. Reproducible results were generated by different users at different sites. Finally, concordance studies showed consistent results when comparing the PowerPlex® 21 System with other commercially available STR-genotyping systems. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Scally, Mark; Schuenzel, Erin L; Stouthamer, Richard; Nunney, Leonard
2005-12-01
Multilocus sequence typing (MLST) identifies and groups bacterial strains based on DNA sequence data from (typically) seven housekeeping genes. MLST has also been employed to estimate the relative contributions of recombination and point mutation to clonal divergence. We applied MLST to the plant pathogen Xylella fastidiosa using an initial set of sequences for 10 loci (9.3 kb) of 25 strains from five different host plants, grapevine (PD strains), oleander (OLS strains), oak (OAK strains), almond (ALS strains), and peach (PP strains). An eBURST analysis identified six clonal complexes using the grouping criterion that each member must be identical to at least one other member at 7 or more of the 10 loci. These clonal complexes corresponded to previously identified phylogenetic clades; clonal complex 1 (CC1) (all PD strains plus two ALS strains) and CC2 (OLS strains) defined the X. fastidiosa subsp. fastidiosa and X. fastidiosa subsp. sandyi clades, while CC3 (ALS strains), CC4 (OAK strains), and CC5 (PP strains) were subclades of X. fastidiosa subsp. multiplex. CC6 (ALS strains) identified an X. fastidiosa subsp. multiplex-like group characterized by a high frequency of intersubspecific recombination. Compared to the recombination rate in other bacterial species, the recombination rate in X. fastidiosa is relatively low. Recombination between different alleles was estimated to give rise to 76% of the nucleotide changes and 31% of the allelic changes observed. The housekeeping loci holC, nuoL, leuA, gltT, cysG, petC, and lacF were chosen to form the basis of a public database for typing X. fastidiosa (www.mlst.net). These loci identified the same six clonal complexes using the strain grouping criterion of identity at five or more loci with at least one other member.
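The grouping criterion quoted above (each member must match at least one other member at a minimum number of loci) amounts to single-linkage clustering of allelic profiles. The sketch below illustrates that criterion only; it is not the eBURST implementation, and the function names and toy profiles are hypothetical.

```python
from itertools import combinations


def shared_loci(profile_a, profile_b):
    """Count loci at which two allelic profiles carry the same allele."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a == b)


def clonal_complexes(profiles, min_shared):
    """Group strains so that every member matches at least one other
    member at `min_shared` or more loci (single linkage, via union-find).

    `profiles` maps strain name -> tuple of allele identifiers, one per
    locus, in the same locus order for every strain. Strains that match
    no other strain at the threshold are left out of the result.
    """
    parent = {strain: strain for strain in profiles}

    def find(strain):
        # Follow parent links to the group representative, compressing the path.
        while parent[strain] != strain:
            parent[strain] = parent[parent[strain]]
            strain = parent[strain]
        return strain

    for a, b in combinations(profiles, 2):
        if shared_loci(profiles[a], profiles[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for strain in profiles:
        groups.setdefault(find(strain), set()).add(strain)
    return [members for members in groups.values() if len(members) > 1]


# Toy 10-locus profiles (hypothetical allele numbers).
toy_profiles = {
    "A": (1,) * 10,
    "B": (1,) * 7 + (2,) * 3,  # matches A at 7 loci
    "C": (1,) * 4 + (2,) * 6,  # matches B at 7 loci, A at only 4
    "D": (9,) * 10,            # matches nothing
}
complexes = clonal_complexes(toy_profiles, min_shared=7)
```

In the toy data, strain C joins the complex through B even though it matches A at only 4 loci, which is the chaining behaviour implied by requiring identity with "at least one other member". The same function covers the seven-locus scheme by passing seven-allele profiles and `min_shared=5`.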
Patel, Sanjay R.; Goodloe, Robert; De, Gourab; Kowgier, Matthew; Weng, Jia; Buxbaum, Sarah G.; Cade, Brian; Fulop, Tibor; Gharib, Sina A.; Gottlieb, Daniel J.; Hillman, David; Larkin, Emma K.; Lauderdale, Diane S.; Li, Li; Mukherjee, Sutapa; Palmer, Lyle; Zee, Phyllis; Zhu, Xiaofeng; Redline, Susan
2012-01-01
Although obstructive sleep apnea (OSA) is known to have a strong familial basis, no genetic polymorphisms influencing apnea risk have been identified in cross-cohort analyses. We utilized the National Heart, Lung, and Blood Institute (NHLBI) Candidate Gene Association Resource (CARe) to identify sleep apnea susceptibility loci. Using a panel of 46,449 polymorphisms from roughly 2,100 candidate genes on a customized Illumina iSelect chip, we tested for association with the apnea hypopnea index (AHI) as well as moderate to severe OSA (AHI≥15) in 3,551 participants of the Cleveland Family Study and two cohorts participating in the Sleep Heart Health Study. Among 647 African-Americans, rs11126184 in the pleckstrin (PLEK) gene was associated with OSA while rs7030789 in the lysophosphatidic acid receptor 1 (LPAR1) gene was associated with AHI using a chip-wide significance threshold of p-value<2×10^-6. Among 2,904 individuals of European ancestry, rs1409986 in the prostaglandin E2 receptor (PTGER3) gene was significantly associated with OSA. Consistent effects of rs7030789 in LPAR1 and rs1409986 in PTGER3 on apnea phenotypes were observed in independent clinic-based cohorts. Novel genetic loci for apnea phenotypes were identified through the use of customized gene chips and meta-analyses of cohort data with replication in clinic-based samples. The identified SNPs all lie in genes associated with inflammation, suggesting inflammation may play a role in OSA pathogenesis. PMID:23155414
The 2018 Nucleic Acids Research database issue and the online molecular biology database collection.
Rigden, Daniel J; Fernández, Xosé M
2018-01-04
The 2018 Nucleic Acids Research Database Issue contains 181 papers spanning molecular biology. Among them, 82 are new and 84 are updates describing resources that appeared in the Issue previously. The remaining 15 cover databases most recently published elsewhere. Databases in the area of nucleic acids include 3DIV for visualisation of data on genome 3D structure and RNArchitecture, a hierarchical classification of RNA families. Protein databases include the established SMART, ELM and MEROPS while GPCRdb and the newcomer STCRDab cover families of biomedical interest. In the area of metabolism, HMDB and Reactome both report new features while PULDB appears in NAR for the first time. This issue also contains reports on genomics resources including Ensembl, the UCSC Genome Browser and ENCODE. Update papers from the IUPHAR/BPS Guide to Pharmacology and DrugBank are highlights of the drug and drug target section while a number of proteomics databases including proteomicsDB are also covered. The entire Database Issue is freely available online on the Nucleic Acids Research website (https://academic.oup.com/nar). The NAR online Molecular Biology Database Collection has been updated, reviewing 138 entries, adding 88 new resources and eliminating 47 discontinued URLs, bringing the current total to 1737 databases. It is available at http://www.oxfordjournals.org/nar/database/c/. © The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research.
ERIC Educational Resources Information Center
Howley, Craig B.; And Others
This guide explains what the Educational Resources Information Center (ERIC) database is and how it can be used by parents to learn more about schooling and parenting. The guide also presents sample records of 55 documents in the ERIC database. The cited resources are particularly relevant to parents' concerns about meeting children's basic needs,…
USDA-ARS's Scientific Manuscript database
High-throughput genotyping arrays provide a standardized resource for crop research communities that are useful for a breadth of applications including high-density genetic mapping, genome-wide association studies (GWAS), genomic selection (GS), candidate marker and quantitative trait loci (QTL) ide...
USDA-ARS's Scientific Manuscript database
A genetic linkage map is critical for identifying the QTL (quantitative trait loci) underlying targeted traits. Over the last few years, progress has been made in marker development from multiple sources enabling the expansion of quality resources needed for genotyping applications in cultivated x cu...
Gao, Shilin; Feng, Guiwen; Feng, Yonghua; Wang, Zhigang; Zhang, Xiaobai
2016-01-01
Chronic kidney disease is becoming a global public health problem and usually leads to uremia at the end stage of chronic kidney failure. So far, kidney transplant is the most effective and proper therapy for uremia; however, the short supply of matched donor kidneys has been a persistent bottleneck for transplantation. HLA matching of HLA-A, -B and -DRB1 loci is very important for the allocation of kidney transplants. In this study, we investigated genotypes of HLA-A, -B and -DRB1 loci based on 1,464 uremia patients and 10,000 unrelated healthy individuals in Henan province of China, and compared the frequency distribution of these HLA alleles and corresponding haplotypes between patient and healthy groups. We detected 23 HLA-A, 49 HLA-B and 17 HLA-DRB1 alleles in total. The predominant alleles of HLA-A, -B and -DRB1 loci in patients are the same as those in the healthy group. The seven most frequent alleles account for about 87%, 50%, and 77% at HLA-A, -B and -DRB1 loci, respectively. The haplotypes (combinations of HLA-A, -B, and -DRB1) with significantly different frequency between patients and controls mostly account for less than 1%. Overall, this suggests that HLA matching is not a potential difficulty for kidney transplant of uremia patients. However, three of the top seven frequent HLA-DRB1 alleles have a significantly different distribution in patients and controls, compared with only one allele for HLA-B and none for HLA-A. These HLA-DRB1 alleles may be closely associated with uremia. This study sheds new light on the composition and difference of HLA genotypes in uremia patients and healthy populations in Central China that can serve as a guide to HLA matching for kidney transplants and a resource for HLA typing-related studies. PMID:27780235
Wu, Jing; Zhu, Jifeng; Wang, Lanfen; Wang, Shumin
2017-01-01
Nucleotide-binding site and leucine-rich repeat (NBS-LRR) genes represent the largest and most important disease resistance genes in plants. The genome sequence of the common bean (Phaseolus vulgaris L.) provides valuable data for determining the genomic organization of NBS-LRR genes. However, data on the NBS-LRR genes in the common bean are limited. In total, 178 NBS-LRR-type genes and 145 partial genes (with or without a NBS) located on 11 common bean chromosomes were identified from the genome sequence database. Furthermore, 30 NBS-LRR genes were classified into Toll/interleukin-1 receptor (TIR)-NBS-LRR (TNL) types, and 148 NBS-LRR genes were classified into coiled-coil (CC)-NBS-LRR (CNL) types. Moreover, the phylogenetic tree supported the division of these PvNBS genes into two obvious groups, TNL types and CNL types. We also built expression profiles of NBS genes in response to anthracnose and common bacterial blight using qRT-PCR. Finally, we detected nine disease resistance loci for anthracnose (ANT) and seven for common bacterial blight (CBB) using the developed NBS-SSR markers. Among these loci, NSSR24, NSSR73, and NSSR265 may be located at new regions for ANT resistance, while NSSR65 and NSSR260 may be located at new regions for CBB resistance. Furthermore, we validated NSSR24, NSSR65, NSSR73, NSSR260, and NSSR265 using a new natural population. Our results provide useful information regarding the function of the NBS-LRR proteins and will accelerate the functional genomics and evolutionary studies of NBS-LRR genes in food legumes. NBS-SSR markers represent a wide-reaching resource for molecular breeding in the common bean and other food legumes. Collectively, our results should be of broad interest to bean scientists and breeders.
Transcriptomic Signatures of Ash (Fraxinus spp.) Phloem
Mamidala, Praveen; Bonello, Pierluigi; Herms, Daniel A.; Mittapalli, Omprakash
2011-01-01
Background: Ash (Fraxinus spp.) is a dominant tree species throughout urban and forested landscapes of North America (NA). The rapid invasion of NA by emerald ash borer (Agrilus planipennis), a wood-boring beetle endemic to Eastern Asia, has resulted in the death of millions of ash trees and threatens billions more. Larvae feed primarily on phloem tissue, which girdles and kills the tree. While NA ash species including black (F. nigra), green (F. pennsylvanica) and white (F. americana) are highly susceptible, the Asian species Manchurian ash (F. mandshurica) is resistant to A. planipennis perhaps due to their co-evolutionary history. Little is known about the molecular genetics of ash. Hence, we undertook a functional genomics approach to identify the repertoire of genes expressed in ash phloem. Methodology and Principal Findings: Using 454 pyrosequencing we obtained 58,673 high quality ash sequences from pooled phloem samples of green, white, black, blue and Manchurian ash. Intriguingly, 45% of the deduced proteins were not significantly similar to any sequences in the GenBank non-redundant database. KEGG analysis of the ash sequences revealed a high occurrence of defense related genes. Expression analysis of early regulators potentially involved in plant defense (i.e. transcription factors, calcium dependent protein kinases and a lipoxygenase 3) revealed higher mRNA levels in resistant ash compared to susceptible ash species. Lastly, we predicted a total of 1,272 single nucleotide polymorphisms and 980 microsatellite loci, among which seven microsatellite loci showed polymorphism between different ash species. Conclusions and Significance: The current transcriptomic data provide an invaluable resource for understanding the genetic make-up of ash phloem, the target tissue of A. planipennis. These data along with future functional studies could lead to the identification/characterization of defense genes involved in resistance of ash to A. planipennis, and in future ash breeding programs for marker development. PMID:21283712
Wu, Jing; Zhu, Jifeng; Wang, Lanfen; Wang, Shumin
2017-01-01
Nucleotide-binding site and leucine-rich repeat (NBS-LRR) genes represent the largest and most important disease resistance genes in plants. The genome sequence of the common bean (Phaseolus vulgaris L.) provides valuable data for determining the genomic organization of NBS-LRR genes. However, data on the NBS-LRR genes in the common bean are limited. In total, 178 NBS-LRR-type genes and 145 partial genes (with or without a NBS) located on 11 common bean chromosomes were identified from the genome sequence database. Furthermore, 30 NBS-LRR genes were classified into Toll/interleukin-1 receptor (TIR)-NBS-LRR (TNL) types, and 148 NBS-LRR genes were classified into coiled-coil (CC)-NBS-LRR (CNL) types. Moreover, the phylogenetic tree supported the division of these PvNBS genes into two obvious groups, TNL types and CNL types. We also built expression profiles of NBS genes in response to anthracnose and common bacterial blight using qRT-PCR. Finally, we detected nine disease resistance loci for anthracnose (ANT) and seven for common bacterial blight (CBB) using the developed NBS-SSR markers. Among these loci, NSSR24, NSSR73, and NSSR265 may be located at new regions for ANT resistance, while NSSR65 and NSSR260 may be located at new regions for CBB resistance. Furthermore, we validated NSSR24, NSSR65, NSSR73, NSSR260, and NSSR265 using a new natural population. Our results provide useful information regarding the function of the NBS-LRR proteins and will accelerate the functional genomics and evolutionary studies of NBS-LRR genes in food legumes. NBS-SSR markers represent a wide-reaching resource for molecular breeding in the common bean and other food legumes. Collectively, our results should be of broad interest to bean scientists and breeders. PMID:28848595
How trees allocate carbon for optimal growth: insight from a game-theoretic model.
Fu, Liyong; Sun, Lidan; Han, Hao; Jiang, Libo; Zhu, Sheng; Ye, Meixia; Tang, Shouzheng; Huang, Minren; Wu, Rongling
2017-02-01
How trees allocate photosynthetic products to primary height growth and secondary radial growth reflects their capacity to best use environmental resources. Despite substantial efforts to explore the tree height-diameter relationship empirically and through theoretical modeling, our understanding of the biological mechanisms that govern this phenomenon is still limited. By thinking of stem woody biomass production as an ecological system of apical and lateral growth components, we implement game theory to model and discern how these two components cooperate symbiotically with each other or compete for resources to determine the size of a tree stem. The resulting allometry game theory is further embedded within a genetic mapping and association paradigm, allowing the genetic loci mediating the carbon allocation of stemwood growth to be characterized and mapped throughout the genome. Allometry game theory was validated by analyzing mapping data of stem height and diameter growth over perennial seasons in a poplar tree. Several key quantitative trait loci were found to interpret the process and pattern of stemwood growth through regulating the ecological interactions of stem apical and lateral growth. The application of allometry game theory enables the prediction of the situations in which cooperation, competition or altruism is the optimal decision of a tree to fully use the environmental resources it owns. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Functional Annotation of the Arabidopsis Genome Using Controlled Vocabularies
Berardini, Tanya Z.; Mundodi, Suparna; Reiser, Leonore; Huala, Eva; Garcia-Hernandez, Margarita; Zhang, Peifen; Mueller, Lukas A.; Yoon, Jungwoon; Doyle, Aisling; Lander, Gabriel; Moseyko, Nick; Yoo, Danny; Xu, Iris; Zoeckler, Brandon; Montoya, Mary; Miller, Neil; Weems, Dan; Rhee, Seung Y.
2004-01-01
Controlled vocabularies are increasingly used by databases to describe genes and gene products because they facilitate identification of similar genes within an organism or among different organisms. One of The Arabidopsis Information Resource's goals is to associate all Arabidopsis genes with terms developed by the Gene Ontology Consortium that describe the molecular function, biological process, and subcellular location of a gene product. We have also developed terms describing Arabidopsis anatomy and developmental stages and use these to annotate published gene expression data. As of March 2004, we used computational and manual annotation methods to make 85,666 annotations representing 26,624 unique loci. We focus on associating genes to controlled vocabulary terms based on experimental data from the literature and use The Arabidopsis Information Resource-developed PubSearch software to facilitate this process. Each annotation is tagged with a combination of evidence codes, evidence descriptions, and references that provide a robust means to assess data quality. Annotation of all Arabidopsis genes will allow quantitative comparisons between sets of genes derived from sources such as microarray experiments. The Arabidopsis annotation data will also facilitate annotation of newly sequenced plant genomes by using sequence similarity to transfer annotations to homologous genes. In addition, complete and up-to-date annotations will make unknown genes easy to identify and target for experimentation. Here, we describe the process of Arabidopsis functional annotation using a variety of data sources and illustrate several ways in which this information can be accessed and used to infer knowledge about Arabidopsis and other plant species. PMID:15173566
Concierge: Personal Database Software for Managing Digital Research Resources
Sakai, Hiroyuki; Aoyama, Toshihiro; Yamaji, Kazutsuna; Usui, Shiro
2007-01-01
This article introduces a desktop application, named Concierge, for managing personal digital research resources. Using simple operations, it enables storage of various types of files and indexes them based on content descriptions. A key feature of the software is a high level of extensibility. By installing optional plug-ins, users can customize and extend the usability of the software based on their needs. In this paper, we also introduce a few optional plug-ins: literature management, electronic laboratory notebook, and XooNIps client plug-ins. XooNIps is a content management system developed to share digital research resources among neuroscience communities. It has been adopted as the standard database system in Japanese neuroinformatics projects. Concierge, therefore, offers comprehensive support from management of personal digital research resources to their sharing in open-access neuroinformatics databases such as XooNIps. This interaction between personal and open-access neuroinformatics databases is expected to enhance the dissemination of digital research resources. Concierge is developed as an open source project; Mac OS X and Windows XP versions have been released at the official site (http://concierge.sourceforge.jp). PMID:18974800
Jasinska, Anna J; Zelaya, Ivette; Service, Susan K; Peterson, Christine B; Cantor, Rita M; Choi, Oi-Wa; DeYoung, Joseph; Eskin, Eleazar; Fairbanks, Lynn A; Fears, Scott; Furterer, Allison E; Huang, Yu S; Ramensky, Vasily; Schmitt, Christopher A; Svardal, Hannes; Jorgensen, Matthew J; Kaplan, Jay R; Villar, Diego; Aken, Bronwen L; Flicek, Paul; Nag, Rishi; Wong, Emily S; Blangero, John; Dyer, Thomas D; Bogomolov, Marina; Benjamini, Yoav; Weinstock, George M; Dewar, Ken; Sabatti, Chiara; Wilson, Richard K; Jentsch, J David; Warren, Wesley; Coppola, Giovanni; Woods, Roger P; Freimer, Nelson B
2017-12-01
By analyzing multitissue gene expression and genome-wide genetic variation data in samples from a vervet monkey pedigree, we generated a transcriptome resource and produced the first catalog of expression quantitative trait loci (eQTLs) in a nonhuman primate model. This catalog contains more genome-wide significant eQTLs per sample than comparable human resources and identifies sex- and age-related expression patterns. Findings include a master regulatory locus that likely has a role in immune function and a locus regulating hippocampal long noncoding RNAs (lncRNAs), whose expression correlates with hippocampal volume. This resource will facilitate genetic investigation of quantitative traits, including brain and behavioral phenotypes relevant to neuropsychiatric disorders.
TryTransDB: A web-based resource for transport proteins in Trypanosomatidae.
Sonar, Krushna; Kabra, Ritika; Singh, Shailza
2018-03-12
TryTransDB is a web-based resource that stores transport protein data which can be retrieved using a standalone BLAST tool. We have attempted to create an integrated database that can be a one-stop shop for researchers working with transport proteins of the Trypanosomatidae family. TryTransDB (Trypanosomatidae Transport Protein Database) is a comprehensive web-based resource that can run a BLAST search against most of the transport protein sequences (protein and nucleotide) from organisms of the Trypanosomatidae family. The resource also lets users compute a phylogenetic tree by performing multiple sequence alignment (MSA) with the embedded CLUSTALW suite. In addition, cross-links to other databases help gather more information about a given transport protein from a single website.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weckwerth, Wolfram; Baginsky, Sacha; Van Wijk, Klass
2009-12-01
In the past 10 years, we have witnessed remarkable advances in the field of plant molecular biology. The rapid development of proteomic technologies and the speed with which these techniques have been applied to the field have altered our perception of how we can analyze proteins in complex systems. At nearly the same time, the complete genome for the model plant Arabidopsis thaliana was released; this effort provides an unsurpassed resource for the identification of proteins when researchers use MS to analyze plant samples. Recognizing the growth in this area, the Multinational Arabidopsis Steering Committee (MASC) established a subcommittee for A. thaliana proteomics in 2006 with the objective of consolidating databases, technique standards, and experimentally validated candidate genes and functions. Since the establishment of the Multinational Arabidopsis Steering Subcommittee for Proteomics (MASCP), many new approaches and resources have become available. Recently, the subcommittee established a webpage to consolidate this information (www.masc-proteomics.org). It includes links to plant proteomic databases, general information about proteomic techniques, meeting information, a summary of proteomic standards, and other relevant resources. Altogether, this website provides a useful resource for the Arabidopsis proteomics community. In the future, the website will host discussions and investigate the cross-linking of databases. The subcommittee members have extensive experience in Arabidopsis proteomics and collectively have produced some of the most extensive proteomics data sets for this model plant (Table S1 in the Supporting Information has a list of resources). The largest collection of proteomics data from a single study in A. thaliana was assembled into an accessible database (AtProteome; http://fgcz-atproteome.unizh.ch/index.php) and was recently published by the Baginsky lab.1 The database provides links to major Arabidopsis online resources, and raw data have been deposited in PRIDE and PRIDE BioMart. Included in this database is an Arabidopsis proteome map that provides evidence for the expression of ~50% of all predicted gene models, including several alternative gene models that are not represented in The Arabidopsis Information Resource (TAIR) protein database. A set of organ-specific biomarkers is provided, as well as organ-specific proteotypic peptides for 4105 proteins that can be used to facilitate targeted quantitative proteomic surveys. In the future, the AtProteome database will be linked to additional existing resources developed by MASCP members, such as PPDB, ProMEX, and SUBA. The most comprehensive study on the Arabidopsis chloroplast proteome, which includes information on chloroplast sorting signals, posttranslational modifications (PTMs), and protein abundances (analyzed by high-accuracy MS [Orbitrap]), was recently published by the van Wijk lab.2 These and previous data are available via the plant proteome database (PPDB; http://ppdb.tc.cornell.edu) for A. thaliana and maize. PPDB provides genome-wide experimental and functional characterization of the A. thaliana and maize proteomes, including PTMs and subcellular localization information, with an emphasis on leaf and plastid proteins. Maize and Arabidopsis proteome entries are directly linked via internal BLAST alignments within PPDB. Direct links for each protein to TAIR, SUBA, ProMEX, and other resources are also provided.
Database resources of the National Center for Biotechnology Information.
Sayers, Eric W; Barrett, Tanya; Benson, Dennis A; Bolton, Evan; Bryant, Stephen H; Canese, Kathi; Chetvernin, Vyacheslav; Church, Deanna M; DiCuccio, Michael; Federhen, Scott; Feolo, Michael; Fingerman, Ian M; Geer, Lewis Y; Helmberg, Wolfgang; Kapustin, Yuri; Landsman, David; Lipman, David J; Lu, Zhiyong; Madden, Thomas L; Madej, Tom; Maglott, Donna R; Marchler-Bauer, Aron; Miller, Vadim; Mizrachi, Ilene; Ostell, James; Panchenko, Anna; Phan, Lon; Pruitt, Kim D; Schuler, Gregory D; Sequeira, Edwin; Sherry, Stephen T; Shumway, Martin; Sirotkin, Karl; Slotta, Douglas; Souvorov, Alexandre; Starchenko, Grigory; Tatusova, Tatiana A; Wagner, Lukas; Wang, Yanli; Wilbur, W John; Yaschenko, Eugene; Ye, Jian
2011-01-01
In addition to maintaining the GenBank® nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data made available through the NCBI Web site. NCBI resources include Entrez, the Entrez Programming Utilities, MyNCBI, PubMed, PubMed Central (PMC), Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Primer-BLAST, COBALT, Electronic PCR, OrfFinder, Splign, ProSplign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, dbVar, Epigenomics, Cancer Chromosomes, Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Trace Archive, Sequence Read Archive, Retroviral Genotyping Tools, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus (GEO), Entrez Probe, GENSAT, Online Mendelian Inheritance in Man (OMIM), Online Mendelian Inheritance in Animals (OMIA), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD), the Conserved Domain Architecture Retrieval Tool (CDART), IBIS, Biosystems, Peptidome, OMSSA, Protein Clusters and the PubChem suite of small molecule databases. Augmenting many of the Web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of these resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov.
Nicholson, Suzanne W.; Stoeser, Douglas B.; Wilson, Frederic H.; Dicken, Connie L.; Ludington, Steve
2007-01-01
The growth in the use of Geographic Information Systems (GIS) has highlighted the need for regional and national digital geologic maps attributed with age and rock type information. Such spatial data can be conveniently used to generate derivative maps for purposes that include mineral-resource assessment, metallogenic studies, tectonic studies, human health and environmental research. In 1997, the United States Geological Survey’s Mineral Resources Program initiated an effort to develop national digital databases for use in mineral resource and environmental assessments. One primary activity of this effort was to compile a national digital geologic map database, utilizing state geologic maps, to support mineral resource studies in the range of 1:250,000- to 1:1,000,000-scale. Over the course of the past decade, state databases were prepared using a common standard for the database structure, fields, attributes, and data dictionaries. As of late 2006, standardized geological map databases for all conterminous (CONUS) states have been available on-line as USGS Open-File Reports. For Alaska and Hawaii, new state maps are being prepared, and the preliminary work for Alaska is being released as a series of 1:500,000-scale regional compilations. See below for a list of all published databases.
Development and Implementation of Kumamoto Technopolis Regional Database T-KIND
NASA Astrophysics Data System (ADS)
Onoue, Noriaki
T-KIND (Techno-Kumamoto Information Network for Data-Base) is a system for efficiently searching information on the technology, human resources, and industries needed to realize the Kumamoto Technopolis. It consists of a coded database, an image database, and a LAN inside the techno-research park that is the center of R&D in the Technopolis. The online system networks general-purpose computers, minicomputers, optical disk file systems, and other equipment, and provides its service over public telephone lines. Two databases are now available, covering enterprise information and human resource information. The former covers about 4,000 enterprises and the latter about 2,000 people.
Influenza research database: an integrated bioinformatics resource for influenza virus research
USDA-ARS's Scientific Manuscript database
The Influenza Research Database (IRD) is a U.S. National Institute of Allergy and Infectious Diseases (NIAID)-sponsored Bioinformatics Resource Center dedicated to providing bioinformatics support for influenza virus research. IRD facilitates the research and development of vaccines, diagnostics, an...
The Resource Identification Initiative: A cultural shift in publishing.
Bandrowski, Anita; Brush, Matthew; Grethe, Jeffery S; Haendel, Melissa A; Kennedy, David N; Hill, Sean; Hof, Patrick R; Martone, Maryann E; Pols, Maaike; Tan, Serena; Washington, Nicole; Zudilova-Seinstra, Elena; Vasilevsky, Nicole
2015-01-01
A central tenet in support of research reproducibility is the ability to uniquely identify research resources, i.e., reagents, tools, and materials that are used to perform experiments. However, current reporting practices for research resources are insufficient to allow humans and algorithms to identify the exact resources that are reported or answer basic questions such as "What other studies used resource X?" To address this issue, the Resource Identification Initiative was launched as a pilot project to improve the reporting standards for research resources in the methods sections of papers and thereby improve identifiability and reproducibility. The pilot engaged over 25 biomedical journal editors from most major publishers, as well as scientists and funding officials. Authors were asked to include Research Resource Identifiers (RRIDs) in their manuscripts prior to publication for three resource types: antibodies, model organisms, and tools (including software and databases). RRIDs represent accession numbers assigned by an authoritative database, e.g., the model organism databases, for each type of resource. To make it easier for authors to obtain RRIDs, resources were aggregated from the appropriate databases and their RRIDs made available in a central web portal ( www.scicrunch.org/resources). RRIDs meet three key criteria: they are machine readable, free to generate and access, and are consistent across publishers and journals. The pilot was launched in February of 2014 and over 300 papers have appeared that report RRIDs. The number of journals participating has expanded from the original 25 to more than 40. Here, we present an overview of the pilot project and its outcomes to date. We show that authors are generally accurate in performing the task of identifying resources and supportive of the goals of the project. 
We also show that identifiability of the resources pre- and post-pilot showed a dramatic improvement for all three resource types, suggesting that the project has had a significant impact on reproducibility relating to research resources.
The Resource Identification Initiative: a cultural shift in publishing.
Bandrowski, Anita; Brush, Matthew; Grethe, Jeffery S; Haendel, Melissa A; Kennedy, David N; Hill, Sean; Hof, Patrick R; Martone, Maryann E; Pols, Maaike; Tan, Serena C; Washington, Nicole; Zudilova-Seinstra, Elena; Vasilevsky, Nicole
2016-01-01
A central tenet in support of research reproducibility is the ability to uniquely identify research resources, that is, reagents, tools, and materials that are used to perform experiments. However, current reporting practices for research resources are insufficient to identify the exact resources that are reported or to answer basic questions such as "How did other studies use resource X?" To address this issue, the Resource Identification Initiative was launched as a pilot project to improve the reporting standards for research resources in the methods sections of papers and thereby improve identifiability and scientific reproducibility. The pilot engaged over 25 biomedical journal editors from most major publishers, as well as scientists and funding officials. Authors were asked to include Research Resource Identifiers (RRIDs) in their manuscripts prior to publication for three resource types: antibodies, model organisms, and tools (i.e., software and databases). RRIDs are assigned by an authoritative database, for example, a model organism database for each type of resource. To make it easier for authors to obtain RRIDs, resources were aggregated from the appropriate databases and their RRIDs made available in a central web portal ( http://scicrunch.org/resources). RRIDs meet three key criteria: they are machine readable, free to generate and access, and are consistent across publishers and journals. The pilot was launched in February of 2014 and over 300 papers have appeared that report RRIDs. The number of journals participating has expanded from the original 25 to more than 40 with RRIDs appearing in 62 different journals to date. Here, we present an overview of the pilot project and its outcomes to date. We show that authors are able to identify resources and are supportive of the goals of the project. 
Identifiability of the resources post-pilot showed a dramatic improvement for all three resource types, suggesting that the project has had a significant impact on identifiability of research resources.
The Resource Identification Initiative: A Cultural Shift in Publishing.
Bandrowski, Anita; Brush, Matthew; Grethe, Jeffery S; Haendel, Melissa A; Kennedy, David N; Hill, Sean; Hof, Patrick R; Martone, Maryann E; Pols, Maaike; Tan, Serena C; Washington, Nicole; Zudilova-Seinstra, Elena; Vasilevsky, Nicole
2016-01-01
A central tenet in support of research reproducibility is the ability to uniquely identify research resources, i.e., reagents, tools, and materials that are used to perform experiments. However, current reporting practices for research resources are insufficient to identify the exact resources that are reported or to answer basic questions such as "How did other studies use resource X?" To address this issue, the Resource Identification Initiative was launched as a pilot project to improve the reporting standards for research resources in the Methods sections of articles and thereby improve identifiability and scientific reproducibility. The pilot engaged over 25 biomedical journal editors from most major publishers, as well as scientists and funding officials. Authors were asked to include Research Resource Identifiers (RRIDs) in their articles prior to publication for three resource types: antibodies, model organisms, and tools (i.e., software and databases). RRIDs are assigned by an authoritative database, for example, a model organism database for each type of resource. To make it easier for authors to obtain RRIDs, resources were aggregated from the appropriate databases and their RRIDs made available in a central Web portal (http://scicrunch.org/resources). RRIDs meet three key criteria: they are machine-readable, free to generate and access, and are consistent across publishers and journals. The pilot was launched in February of 2014 and over 300 articles have appeared that report RRIDs. The number of journals participating has expanded from the original 25 to more than 40, with RRIDs appearing in 62 different journals to date. Here we present an overview of the pilot project and its outcomes to date. We show that authors are able to identify resources and are supportive of the goals of the project. 
Identifiability of the resources post-pilot showed a dramatic improvement for all three resource types, suggesting that the project has had a significant impact on identifiability of research resources. © 2015 Wiley Periodicals, Inc.
The Resource Identification Initiative: A cultural shift in publishing
Bandrowski, Anita; Brush, Matthew; Grethe, Jeffery S.; ...
2015-05-29
A central tenet in support of research reproducibility is the ability to uniquely identify research resources, i.e., reagents, tools, and materials that are used to perform experiments. However, current reporting practices for research resources are insufficient to allow humans and algorithms to identify the exact resources that are reported or answer basic questions such as “What other studies used resource X?” To address this issue, the Resource Identification Initiative was launched as a pilot project to improve the reporting standards for research resources in the methods sections of papers and thereby improve identifiability and reproducibility. The pilot engaged over 25 biomedical journal editors from most major publishers, as well as scientists and funding officials. Authors were asked to include Research Resource Identifiers (RRIDs) in their manuscripts prior to publication for three resource types: antibodies, model organisms, and tools (including software and databases). RRIDs represent accession numbers assigned by an authoritative database, e.g., the model organism databases, for each type of resource. To make it easier for authors to obtain RRIDs, resources were aggregated from the appropriate databases and their RRIDs made available in a central web portal (www.scicrunch.org/resources). RRIDs meet three key criteria: they are machine readable, free to generate and access, and are consistent across publishers and journals. The pilot was launched in February of 2014 and over 300 papers have appeared that report RRIDs. The number of journals participating has expanded from the original 25 to more than 40. Here, we present an overview of the pilot project and its outcomes to date. We show that authors are generally accurate in performing the task of identifying resources and supportive of the goals of the project. 
We also show that identifiability of the resources pre- and post-pilot showed a dramatic improvement for all three resource types, suggesting that the project has had a significant impact on reproducibility relating to research resources.
The Resource Identification Initiative: A cultural shift in publishing
Bandrowski, Anita; Brush, Matthew; Grethe, Jeffery S.; Haendel, Melissa A.; Kennedy, David N.; Hill, Sean; Hof, Patrick R.; Martone, Maryann E.; Pols, Maaike; Tan, Serena; Washington, Nicole; Zudilova-Seinstra, Elena; Vasilevsky, Nicole
2015-01-01
A central tenet in support of research reproducibility is the ability to uniquely identify research resources, i.e., reagents, tools, and materials that are used to perform experiments. However, current reporting practices for research resources are insufficient to allow humans and algorithms to identify the exact resources that are reported or answer basic questions such as “What other studies used resource X?” To address this issue, the Resource Identification Initiative was launched as a pilot project to improve the reporting standards for research resources in the methods sections of papers and thereby improve identifiability and reproducibility. The pilot engaged over 25 biomedical journal editors from most major publishers, as well as scientists and funding officials. Authors were asked to include Research Resource Identifiers (RRIDs) in their manuscripts prior to publication for three resource types: antibodies, model organisms, and tools (including software and databases). RRIDs represent accession numbers assigned by an authoritative database, e.g., the model organism databases, for each type of resource. To make it easier for authors to obtain RRIDs, resources were aggregated from the appropriate databases and their RRIDs made available in a central web portal ( www.scicrunch.org/resources). RRIDs meet three key criteria: they are machine readable, free to generate and access, and are consistent across publishers and journals. The pilot was launched in February of 2014 and over 300 papers have appeared that report RRIDs. The number of journals participating has expanded from the original 25 to more than 40. Here, we present an overview of the pilot project and its outcomes to date. We show that authors are generally accurate in performing the task of identifying resources and supportive of the goals of the project. 
We also show that identifiability of the resources pre- and post-pilot showed a dramatic improvement for all three resource types, suggesting that the project has had a significant impact on reproducibility relating to research resources. PMID:26594330
The Resource Identification Initiative: A cultural shift in publishing
Bandrowski, Anita; Brush, Matthew; Grethe, Jeffery S; Haendel, Melissa A; Kennedy, David N; Hill, Sean; Hof, Patrick R; Martone, Maryann E; Pols, Maaike; Tan, Serena S; Washington, Nicole; Zudilova-Seinstra, Elena; Vasilevsky, Nicole
2016-01-01
A central tenet in support of research reproducibility is the ability to uniquely identify research resources, i.e., reagents, tools, and materials that are used to perform experiments. However, current reporting practices for research resources are insufficient to identify the exact resources that are reported or to answer basic questions such as “How did other studies use resource X?” To address this issue, the Resource Identification Initiative was launched as a pilot project to improve the reporting standards for research resources in the methods sections of papers and thereby improve identifiability and scientific reproducibility. The pilot engaged over 25 biomedical journal editors from most major publishers, as well as scientists and funding officials. Authors were asked to include Research Resource Identifiers (RRIDs) in their manuscripts prior to publication for three resource types: antibodies, model organisms, and tools (i.e. software and databases). RRIDs are assigned by an authoritative database, for example a model organism database, for each type of resource. To make it easier for authors to obtain RRIDs, resources were aggregated from the appropriate databases and their RRIDs made available in a central web portal (http://scicrunch.org/resources). RRIDs meet three key criteria: they are machine readable, free to generate and access, and are consistent across publishers and journals. The pilot was launched in February of 2014 and over 300 papers have appeared that report RRIDs. The number of journals participating has expanded from the original 25 to more than 40 with RRIDs appearing in 62 different journals to date. Here, we present an overview of the pilot project and its outcomes to date. We show that authors are able to identify resources and are supportive of the goals of the project. 
Identifiability of the resources post-pilot showed a dramatic improvement for all three resource types, suggesting that the project has had a significant impact on identifiability of research resources. PMID:26589523
The Resource Identification Initiative: A Cultural Shift in Publishing.
Bandrowski, Anita; Brush, Matthew; Grethe, Jeffery S; Haendel, Melissa A; Kennedy, David N; Hill, Sean; Hof, Patrick R; Martone, Maryann E; Pols, Maaike; Tan, Serena S; Washington, Nicole; Zudilova-Seinstra, Elena; Vasilevsky, Nicole
2016-04-01
A central tenet in support of research reproducibility is the ability to uniquely identify research resources, i.e., reagents, tools, and materials that are used to perform experiments. However, current reporting practices for research resources are insufficient to identify the exact resources that are reported or to answer basic questions such as "How did other studies use resource X?" To address this issue, the Resource Identification Initiative was launched as a pilot project to improve the reporting standards for research resources in the methods sections of papers and thereby improve identifiability and scientific reproducibility. The pilot engaged over 25 biomedical journal editors from most major publishers, as well as scientists and funding officials. Authors were asked to include Research Resource Identifiers (RRIDs) in their manuscripts prior to publication for three resource types: antibodies, model organisms, and tools (i.e., software and databases). RRIDs are assigned by an authoritative database, for example a model organism database, for each type of resource. To make it easier for authors to obtain RRIDs, resources were aggregated from the appropriate databases and their RRIDs made available in a central web portal ( http://scicrunch.org/resources ). RRIDs meet three key criteria: they are machine readable, free to generate and access, and are consistent across publishers and journals. The pilot was launched in February of 2014 and over 300 papers have appeared that report RRIDs. The number of journals participating has expanded from the original 25 to more than 40 with RRIDs appearing in 62 different journals to date. Here, we present an overview of the pilot project and its outcomes to date. We show that authors are able to identify resources and are supportive of the goals of the project. 
Identifiability of the resources post-pilot showed a dramatic improvement for all three resource types, suggesting that the project has had a significant impact on identifiability of research resources.
The Resource Identification Initiative: A cultural shift in publishing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bandrowski, Anita; Brush, Matthew; Grethe, Jeffery S.
A central tenet in support of research reproducibility is the ability to uniquely identify research resources, i.e., reagents, tools, and materials that are used to perform experiments. However, current reporting practices for research resources are insufficient to allow humans and algorithms to identify the exact resources that are reported or answer basic questions such as “What other studies used resource X?” To address this issue, the Resource Identification Initiative was launched as a pilot project to improve the reporting standards for research resources in the methods sections of papers and thereby improve identifiability and reproducibility. The pilot engaged over 25 biomedical journal editors from most major publishers, as well as scientists and funding officials. Authors were asked to include Research Resource Identifiers (RRIDs) in their manuscripts prior to publication for three resource types: antibodies, model organisms, and tools (including software and databases). RRIDs represent accession numbers assigned by an authoritative database, e.g., the model organism databases, for each type of resource. To make it easier for authors to obtain RRIDs, resources were aggregated from the appropriate databases and their RRIDs made available in a central web portal (www.scicrunch.org/resources). RRIDs meet three key criteria: they are machine readable, free to generate and access, and are consistent across publishers and journals. The pilot was launched in February of 2014 and over 300 papers have appeared that report RRIDs. The number of journals participating has expanded from the original 25 to more than 40. Here, we present an overview of the pilot project and its outcomes to date. We show that authors are generally accurate in performing the task of identifying resources and supportive of the goals of the project. 
We also show that identifiability of the resources pre- and post-pilot showed a dramatic improvement for all three resource types, suggesting that the project has had a significant impact on reproducibility relating to research resources.
Kemppainen, Petri; Knight, Christopher G; Sarma, Devojit K; Hlaing, Thaung; Prakash, Anil; Maung Maung, Yan Naung; Somboon, Pradya; Mahanta, Jagadish; Walton, Catherine
2015-09-01
Recent advances in sequencing allow population-genomic data to be generated for virtually any species. However, approaches to analyse such data lag behind the ability to generate it, particularly in nonmodel species. Linkage disequilibrium (LD, the nonrandom association of alleles from different loci) is a highly sensitive indicator of many evolutionary phenomena including chromosomal inversions, local adaptation and geographical structure. Here, we present linkage disequilibrium network analysis (LDna), which accesses information on LD shared between multiple loci genomewide. In LD networks, vertices represent loci, and connections between vertices represent the LD between them. We analysed such networks in two test cases: a new restriction-site-associated DNA sequence (RAD-seq) data set for Anopheles baimaii, a Southeast Asian malaria vector; and a well-characterized single nucleotide polymorphism (SNP) data set from 21 three-spined stickleback individuals. In each case, we readily identified five distinct LD network clusters (single-outlier clusters, SOCs), each comprising many loci connected by high LD. In A. baimaii, further population-genetic analyses supported the inference that each SOC corresponds to a large inversion, consistent with previous cytological studies. For sticklebacks, we inferred that each SOC was associated with a distinct evolutionary phenomenon: two chromosomal inversions, local adaptation, population-demographic history and geographic structure. LDna is thus a useful exploratory tool, able to give a global overview of LD associated with diverse evolutionary phenomena and identify loci potentially involved. LDna does not require a linkage map or reference genome, so it is applicable to any population-genomic data set, making it especially valuable for nonmodel species. © 2015 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.
Yan, Li; Liu, Song; Tang, Li; Hu, Qiang; Morrison, Carl D.; Ambrosone, Christine B.; Higgins, Michael J.; Sucheston-Campbell, Lara E.
2017-01-01
Background DNA from archival formalin-fixed and paraffin embedded (FFPE) tissue is an invaluable resource for genome-wide methylation studies although concerns about poor quality may limit its use. In this study, we compared DNA methylation profiles of breast tumors using DNA from fresh-frozen (FF) tissues and three types of matched FFPE samples. Results For 9/10 patients, correlation and unsupervised clustering analysis revealed that the FF and FFPE samples were consistently correlated with each other and clustered into distinct subgroups. More than 84% of the top 100 loci previously shown to differentiate ER+ and ER– tumors in FF tissues were also differentially methylated loci (DML) in FFPE samples. Weighted Correlation Gene Network Analyses (WCGNA) grouped the DML into 16 modules in FF tissue, with ~85% of the module membership preserved across tissue types. Materials and Methods Restored FFPE and matched FF samples were profiled using the Illumina Infinium HumanMethylation450K platform. Methylation levels (β-values) across all loci and the top 100 loci previously shown to differentiate tumors by estrogen receptor status (ER+ or ER−) in a larger FF study were compared between matched FF and FFPE samples using Pearson's correlation, hierarchical clustering and WCGNA. Positive predictive values and sensitivity levels for detecting differentially methylated loci (DML) in FF samples were calculated in an independent FFPE cohort. Conclusions FFPE breast tumor samples show lower overall detection of DML than FF samples; however, FFPE and FF DML compare favorably. These results support the emerging consensus that the 450K platform can be employed to investigate epigenetics in large sets of archival FFPE tissues. PMID:28118602
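The positive predictive value and sensitivity mentioned in the Methods can be computed from two sets of loci, as in the minimal sketch below. It assumes the FF-derived DML are treated as the reference set and the FFPE-derived DML as the calls being evaluated; the function name and the probe identifiers are hypothetical and do not come from the study.

```python
def ppv_and_sensitivity(reference_dml, test_dml):
    """Compare a test DML call set against a reference DML set.

    PPV         = true positives / all test calls
    sensitivity = true positives / all reference loci
    """
    reference, test = set(reference_dml), set(test_dml)
    true_positives = len(reference & test)
    ppv = true_positives / len(test) if test else float("nan")
    sensitivity = true_positives / len(reference) if reference else float("nan")
    return ppv, sensitivity


# Hypothetical probe IDs, for illustration only.
ff_calls = {"cg1", "cg2", "cg3", "cg4"}
ffpe_calls = {"cg2", "cg3", "cg5"}
ppv, sens = ppv_and_sensitivity(ff_calls, ffpe_calls)
print(round(ppv, 3), sens)  # 0.667 0.5
```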
Exploring Short Linear Motifs Using the ELM Database and Tools.
Gouw, Marc; Sámano-Sánchez, Hugo; Van Roey, Kim; Diella, Francesca; Gibson, Toby J; Dinkel, Holger
2017-06-27
The Eukaryotic Linear Motif (ELM) resource is dedicated to the characterization and prediction of short linear motifs (SLiMs). SLiMs are compact, degenerate peptide segments found in many proteins and essential to almost all cellular processes. However, despite their abundance, SLiMs remain largely uncharacterized. The ELM database is a collection of manually annotated SLiM instances curated from experimental literature. In this article we illustrate how to browse and search the database for curated SLiM data, and cover the different types of data integrated in the resource. We also cover how to use this resource in order to predict SLiMs in known as well as novel proteins, and how to interpret the results generated by the ELM prediction pipeline. The ELM database is a very rich resource, and in the following protocols we give helpful examples to demonstrate how this knowledge can be used to improve your own research. © 2017 by John Wiley & Sons, Inc. Copyright © 2017 John Wiley & Sons, Inc.
Bushakra, Jill M; Lewers, Kim S; Staton, Margaret E; Zhebentyayeva, Tetyana; Saski, Christopher A
2015-10-26
Due to a relatively high level of codominant inheritance and transferability within and among taxonomic groups, simple sequence repeat (SSR) markers are important elements in comparative mapping and delineation of genomic regions associated with traits of economic importance. Expressed sequence tags (ESTs) are a source of SSRs that can be used to develop markers to facilitate plant breeding and for more basic research across genera and higher plant orders. Leaf and meristem tissue from 'Heritage' red raspberry (Rubus idaeus) and 'Bristol' black raspberry (R. occidentalis) were utilized for RNA extraction. After conversion to cDNA and library construction, ESTs were sequenced, quality verified, assembled and scanned for SSRs. Primers flanking the SSRs were designed and a subset tested for amplification, polymorphism and transferability across species. ESTs containing SSRs were functionally annotated using the GenBank non-redundant (nr) database and further classified using the gene ontology database. To accelerate development of EST-SSRs in the genus Rubus (Rosaceae), 1149 and 2358 cDNA sequences were generated from red raspberry and black raspberry, respectively. The cDNA sequences were screened using rigorous filtering criteria which resulted in the identification of 121 and 257 SSR loci for red and black raspberry, respectively. Primers were designed from the surrounding sequences resulting in 131 and 288 primer pairs, respectively, as some sequences contained more than one SSR locus. Sequence analysis revealed that the SSR-containing genes span a diversity of functions and share more sequence identity with strawberry genes than with other Rosaceous species. This resource of Rubus-specific, gene-derived markers will facilitate the construction of linkage maps composed of transferable markers for studying and manipulating important traits in this economically important genus.
Defining the Human Macula Transcriptome and Candidate Retinal Disease Genes Using EyeSAGE
Rickman, Catherine Bowes; Ebright, Jessica N.; Zavodni, Zachary J.; Yu, Ling; Wang, Tianyuan; Daiger, Stephen P.; Wistow, Graeme; Boon, Kathy; Hauser, Michael A.
2009-01-01
Purpose To develop large-scale, high-throughput annotation of the human macula transcriptome and to identify and prioritize candidate genes for inherited retinal dystrophies, based on ocular-expression profiles using serial analysis of gene expression (SAGE). Methods Two human retina and two retinal pigment epithelium (RPE)/choroid SAGE libraries made from matched macula or midperipheral retina and adjacent RPE/choroid of morphologically normal 28- to 66-year-old donors and a human central retina longSAGE library made from 41- to 66-year-old donors were generated. Their transcription profiles were entered into a relational database, EyeSAGE, including microarray expression profiles of retina and publicly available normal human tissue SAGE libraries. EyeSAGE was used to identify retina- and RPE-specific and -associated genes, and candidate genes for retina and RPE disease loci. Differential and/or cell-type specific expression was validated by quantitative and single-cell RT-PCR. Results Cone photoreceptor-associated gene expression was elevated in the macula transcription profiles. Analysis of the longSAGE retina tags enhanced tag-to-gene mapping and revealed alternatively spliced genes. Analysis of candidate gene expression tables for the identified Bardet-Biedl syndrome disease gene (BBS5) in the BBS5 disease region table yielded BBS5 as the top candidate. Compelling candidates for inherited retina diseases were identified. Conclusions The EyeSAGE database, combining three different gene-profiling platforms including the authors’ multidonor-derived retina/RPE SAGE libraries and existing single-donor retina/RPE libraries, is a powerful resource for definition of the retina and RPE transcriptomes. It can be used to identify retina-specific genes, including alternatively spliced transcripts and to prioritize candidate genes within mapped retinal disease regions. PMID:16723438
Defining the human macula transcriptome and candidate retinal disease genes using EyeSAGE.
Bowes Rickman, Catherine; Ebright, Jessica N; Zavodni, Zachary J; Yu, Ling; Wang, Tianyuan; Daiger, Stephen P; Wistow, Graeme; Boon, Kathy; Hauser, Michael A
2006-06-01
To develop large-scale, high-throughput annotation of the human macula transcriptome and to identify and prioritize candidate genes for inherited retinal dystrophies, based on ocular-expression profiles using serial analysis of gene expression (SAGE). Two human retina and two retinal pigment epithelium (RPE)/choroid SAGE libraries made from matched macula or midperipheral retina and adjacent RPE/choroid of morphologically normal 28- to 66-year-old donors and a human central retina longSAGE library made from 41- to 66-year-old donors were generated. Their transcription profiles were entered into a relational database, EyeSAGE, including microarray expression profiles of retina and publicly available normal human tissue SAGE libraries. EyeSAGE was used to identify retina- and RPE-specific and -associated genes, and candidate genes for retina and RPE disease loci. Differential and/or cell-type specific expression was validated by quantitative and single-cell RT-PCR. Cone photoreceptor-associated gene expression was elevated in the macula transcription profiles. Analysis of the longSAGE retina tags enhanced tag-to-gene mapping and revealed alternatively spliced genes. Analysis of candidate gene expression tables for the identified Bardet-Biedl syndrome disease gene (BBS5) in the BBS5 disease region table yielded BBS5 as the top candidate. Compelling candidates for inherited retina diseases were identified. The EyeSAGE database, combining three different gene-profiling platforms including the authors' multidonor-derived retina/RPE SAGE libraries and existing single-donor retina/RPE libraries, is a powerful resource for definition of the retina and RPE transcriptomes. It can be used to identify retina-specific genes, including alternatively spliced transcripts and to prioritize candidate genes within mapped retinal disease regions.
The PDI genes of wheat and their syntenic relationship to the esp2 locus of rice.
Johnson, Joshua C; Appels, Rudi; Bhave, Mrinal
2006-04-01
The storage protein polymers in the endosperm, stabilised by disulphide bonds, determine a number of processing qualities of wheat dough. The enzyme protein disulphide isomerase (PDI), involved in the formation of disulphide bonds, is strongly suggested to play a role in the formation of wheat storage protein bodies. Reports of the rice mutant esp2 exhibiting aberrant storage protein deposition in conjunction with a lack of PDI expression provided strong indications of a direct role for PDI in storage protein deposition. The potential significance of wheat PDI prompted the present studies into exploring any orthology between wheat PDI genes and rice PDI and esp2 loci. By designing allele-specific (AS)-polymerase chain reaction (PCR) markers, two of the three wheat PDI genes could be genetically mapped to group 4 chromosomes and showed close association with GERMIN genes. Physical mapping led to localisation of wheat PDI genes to chromosomal "bins" on the proximal section of chromosome 4AL and distal sections of 4BS and 4DS. Identification of the putative PDI gene of rice and its comparison to the esp2 locus revealed that they were present at similar positions on the short arm of chromosome 11. Analysis of a large section of the PDI-containing section of rice chromosome 11S revealed a number of putative orthologues from The Institute for Genomic Research Triticum aestivum Gene Index database, of which five had been mapped, each localising to group 4 chromosomes, many in good agreement with our mapping results. The results strongly suggest a close linkage between the esp2 marker and the PDI gene of rice and an orthology between the PDI loci of rice and wheat and predict quantitative-trait loci involved in storage protein deposition at the PDI loci.
Zhang, Jing; Malo, Danielle; Mott, Richard; Panthier, Jean-Jacques; Montagutelli, Xavier; Jaubert, Jean
2018-04-27
Salmonella is a Gram-negative bacterium causing a wide range of clinical syndromes, from typhoid fever to diarrheic disease. Non-typhoidal Salmonella (NTS) serovars infect humans and animals, causing a substantial health burden worldwide. Susceptibility to salmonellosis varies between individuals under the control of host genes, as demonstrated by the identification of over 20 genetic loci in various mouse crosses. We have investigated the host response to S. Typhimurium infection in 35 Collaborative Cross (CC) strains, a genetic population which involves wild-derived strains that had not been previously assessed. One hundred and forty-eight mice from 35 CC strains were challenged intravenously with 1000 colony-forming units (CFUs) of S. Typhimurium. Bacterial load was measured in spleen and liver at day 4 post-infection. CC strains differed significantly (P < 0.0001) in spleen and liver bacterial loads, while sex and age had no effect. Two significant quantitative trait loci (QTLs) on chromosomes 8 and 10 and one suggestive QTL on chromosome 1 were found for spleen bacterial load, while two suggestive QTLs on chromosomes 6 and 17 were found for liver bacterial load. These QTLs are caused by distinct allelic patterns, principally involving alleles originating from the wild-derived founders. Using sequence variations between the eight CC founder strains combined with database mining for expression in target organs and known immune phenotypes, we were able to refine the QTL intervals and establish a list of the most promising candidate genes. Furthermore, we identified one strain, CC042/GeniUnc (CC042), as highly susceptible to S. Typhimurium infection. By exploring a broader genetic variation, the Collaborative Cross population has revealed novel loci of resistance to Salmonella Typhimurium. It also led to the identification of CC042 as an extremely susceptible strain.
Finding alternatives when a major database is gone
Hu, Estelle
2016-01-01
Question What to do when a major database ceases publication? Setting An urban, academic health sciences library with four campuses serves a university health sciences system, a college of medicine, and five other health sciences colleges. Methods Usage statistics of each e-book title in the resource were carefully analyzed. Purchase decisions were made based on the assessment of usage. Results Sustainable resources were acquired from other vendors, with perpetual access for library users. Conclusion This systematic process of finding alternative resources is an example of librarians' persistence in acquiring perpetual electronic resources when a major resource is cancelled. PMID:27076804
A HapMap harvest of insights into the genetics of common disease
Manolio, Teri A.; Brooks, Lisa D.; Collins, Francis S.
2008-01-01
The International HapMap Project was designed to create a genome-wide database of patterns of human genetic variation, with the expectation that these patterns would be useful for genetic association studies of common diseases. This expectation has been amply fulfilled with just the initial output of genome-wide association studies, identifying nearly 100 loci for nearly 40 common diseases and traits. These associations provided new insights into pathophysiology, suggesting previously unsuspected etiologic pathways for common diseases that will be of use in identifying new therapeutic targets and developing targeted interventions based on genetically defined risk. In addition, HapMap-based discoveries have shed new light on the impact of evolutionary pressures on the human genome, suggesting multiple loci important for adapting to disease-causing pathogens and new environments. In this review we examine the origin, development, and current status of the HapMap; its prospects for continued evolution; and its current and potential future impact on biomedical science. PMID:18451988
Tacutu, Robi; Craig, Thomas; Budovsky, Arie; Wuttke, Daniel; Lehmann, Gilad; Taranukha, Dmitri; Costa, Joana; Fraifeld, Vadim E.; de Magalhães, João Pedro
2013-01-01
The Human Ageing Genomic Resources (HAGR, http://genomics.senescence.info) is a freely available online collection of research databases and tools for the biology and genetics of ageing. HAGR now features several databases with high-quality manually curated data: (i) GenAge, a database of genes associated with ageing in humans and model organisms; (ii) AnAge, an extensive collection of longevity records and complementary traits for >4000 vertebrate species; and (iii) GenDR, a newly incorporated database, containing both gene mutations that interfere with dietary restriction-mediated lifespan extension and consistent gene expression changes induced by dietary restriction. Since its creation about 10 years ago, major efforts have been undertaken to maintain the quality of data in HAGR, while continuing to develop, improve and extend it. This article briefly describes the content of HAGR and details the major updates since its previous publications, in terms of both structure and content. The completely redesigned interface, more intuitive and more integrative of HAGR resources, is also presented. Altogether, we hope that through its improvements, the current version of HAGR will continue to provide users with the most comprehensive and accessible resources available today in the field of biogerontology. PMID:23193293
A comprehensive view of the web-resources related to sericulture
Singh, Deepika; Chetia, Hasnahana; Kabiraj, Debajyoti; Sharma, Swagata; Kumar, Anil; Sharma, Pragya; Deka, Manab; Bora, Utpal
2016-01-01
Recent progress in the field of sequencing and analysis has led to a tremendous spike in data and the development of data science tools. One of the outcomes of this scientific progress is the development of numerous databases which are gaining popularity in all disciplines of biology including sericulture. As economically important organisms, silkworms are studied extensively for their numerous applications in the field of textiles, biomaterials, biomimetics, etc. Similarly, host plants, pests, pathogens, etc. are also being probed to understand the seri-resources more efficiently. These studies have led to the generation of numerous seri-related databases which are extremely helpful for the scientific community. In this article, we have reviewed all the available online resources on silkworm and its related organisms, including databases as well as informative websites. We have studied their basic features and impact on research through citation count analysis, finally discussing the role of emerging sequencing and analysis technologies in the field of seri-data science. As an outcome of this review, a web portal named SeriPort has been created which will act as an index for the various sericulture-related databases and web resources available in cyberspace. Database URL: http://www.seriport.in/ PMID:27307138
E-MSD: an integrated data resource for bioinformatics.
Velankar, S; McNeil, P; Mittard-Runte, V; Suarez, A; Barrell, D; Apweiler, R; Henrick, K
2005-01-01
The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the worldwide Protein Data Bank (wwPDB) and to work towards the integration of various bioinformatics data resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt is the absence of up to date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. This information is vital for the reliable integration of the sequence family databases such as Pfam and Interpro with the structure-oriented databases of SCOP and CATH. This information has been made available to the eFamily group (http://www.efamily.org.uk/) and now forms the basis of the regular interchange of information between the member databases (MSD, UniProt, Pfam, Interpro, SCOP and CATH). This exchange of annotation information has enriched the structural information in the MSD database with annotation from wider sequence-oriented resources. This work was carried out under the 'Structure Integration with Function, Taxonomy and Sequences (SIFTS)' initiative (http://www.ebi.ac.uk/msd-srv/docs/sifts) in the MSD group.
Dutta, Sutapa; Kumawat, Giriraj; Singh, Bikram P; Gupta, Deepak K; Singh, Sangeeta; Dogra, Vivek; Gaikwad, Kishor; Sharma, Tilak R; Raje, Ranjeet S; Bandhopadhya, Tapas K; Datta, Subhojit; Singh, Mahendra N; Bashasab, Fakrudin; Kulwal, Pawan; Wanjari, K B; K Varshney, Rajeev; Cook, Douglas R; Singh, Nagendra K
2011-01-20
Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥ 18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. 
We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea.
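The SSR screening criteria stated in this abstract (homopolymeric repeats excluded, repeat length of at least 18 bp) can be sketched with a regular-expression scan. This is only an illustration under stated assumptions: the authors' actual pipeline is not described here, the function names and the example sequence are hypothetical, and compound repeats are not detected or excluded by this sketch.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'AGAG', 'AAA')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(sequence, min_length=18, motif_sizes=(2, 3, 4, 5, 6)):
    """Return (start, motif, repeat_count) for perfect tandem repeats of at least min_length bp."""
    sequence = sequence.upper()
    hits = []
    for k in motif_sizes:
        # A k-mer followed by one or more exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1+" % k)
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # skips homopolymers and di-repeats re-read as tetra/hexa motifs
            if len(match.group(0)) >= min_length:
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


# Hypothetical sequence: (AG)10, (ACG)6, a 20 bp poly-A run and a short (CT)4.
example = "TTGC" + "AG" * 10 + "CCTA" + "ACG" * 6 + "GGT" + "A" * 20 + "CT" * 4
print(find_ssrs(example))  # [(4, 'AG', 10), (28, 'ACG', 6)]
```

The poly-A run is dropped as a homopolymer and the (CT)4 repeat falls below the 18 bp cutoff, so only the dinucleotide and trinucleotide loci are reported.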
2011-01-01
Background Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. Results In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. 
Conclusion We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea. PMID:21251263
Cheng, Jiaowen; Zhao, Zicheng; Li, Bo; Qin, Cheng; Wu, Zhiming; Trejo-Saavedra, Diana L; Luo, Xirong; Cui, Junjie; Rivera-Bustamante, Rafael F; Li, Shuaicheng; Hu, Kailin
2016-01-07
The sequences of the full set of pepper genomes including nuclear, mitochondrial and chloroplast are now available for use. However, the overall distribution of simple sequence repeats (SSRs) in these genomes and their practical implications for molecular marker development in Capsicum have not yet been described. Here, an average of 868,047.50, 45.50 and 30.00 SSR loci were identified in the nuclear, mitochondrial and chloroplast genomes of pepper, respectively. Subsequently, systematic comparisons of various species, genome types, motif lengths, repeat numbers and classified types were executed and discussed. In addition, a local database composed of 113,500 in silico unique SSR primer pairs was built using a homemade bioinformatics workflow. As a pilot study, 65 polymorphic markers were validated among a wide collection of 21 Capsicum genotypes with allele number and polymorphic information content value per marker ranging from 2 to 6 and 0.05 to 0.64, respectively. Finally, a comparison of the clustering results with those of a previous study indicated the usability of the newly developed SSR markers. In summary, this first report on the comprehensive characterization of SSR motifs in pepper genomes and the very large set of SSR primer pairs will benefit various genetic studies in Capsicum.
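The polymorphic information content (PIC) values reported per marker can be reproduced from allele frequencies. The sketch below assumes the commonly used definition PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j²; the abstract does not state which formula or software the authors used, and the function name and allele labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pic(alleles):
    """Polymorphic information content from a list of observed allele calls at one marker."""
    counts = Counter(alleles)
    total = sum(counts.values())
    freqs = [count / total for count in counts.values()]
    sum_squares = sum(p * p for p in freqs)
    # Correction term over all unordered pairs of distinct alleles.
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1 - sum_squares - cross_term


# Two equally frequent alleles give 0.375; four equally frequent alleles give 0.703125.
print(pic(["a", "b"]))
print(pic(["a", "b", "c", "d"]))
```

A monomorphic marker gives a PIC of 0, and the value rises as alleles become more numerous and more evenly distributed.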
Cheng, Jiaowen; Zhao, Zicheng; Li, Bo; Qin, Cheng; Wu, Zhiming; Trejo-Saavedra, Diana L.; Luo, Xirong; Cui, Junjie; Rivera-Bustamante, Rafael F.; Li, Shuaicheng; Hu, Kailin
2016-01-01
The sequences of the full set of pepper genomes including nuclear, mitochondrial and chloroplast are now available for use. However, the overall distribution of simple sequence repeats (SSRs) in these genomes and their practical implications for molecular marker development in Capsicum have not yet been described. Here, an average of 868,047.50, 45.50 and 30.00 SSR loci were identified in the nuclear, mitochondrial and chloroplast genomes of pepper, respectively. Subsequently, systematic comparisons of various species, genome types, motif lengths, repeat numbers and classified types were executed and discussed. In addition, a local database composed of 113,500 in silico unique SSR primer pairs was built using a homemade bioinformatics workflow. As a pilot study, 65 polymorphic markers were validated among a wide collection of 21 Capsicum genotypes with allele number and polymorphic information content value per marker ranging from 2 to 6 and 0.05 to 0.64, respectively. Finally, a comparison of the clustering results with those of a previous study indicated the usability of the newly developed SSR markers. In summary, this first report on the comprehensive characterization of SSR motifs in pepper genomes and the very large set of SSR primer pairs will benefit various genetic studies in Capsicum. PMID:26739748
Tissue-specific patterns of allelically-skewed DNA methylation
Marzi, Sarah J.; Meaburn, Emma L.; Dempster, Emma L.; Lunnon, Katie; Paya-Cano, Jose L.; Smith, Rebecca G.; Volta, Manuela; Troakes, Claire; Schalkwyk, Leonard C.; Mill, Jonathan
2016-01-01
While DNA methylation is usually thought to be symmetrical across both alleles, there are some notable exceptions. Genomic imprinting and X chromosome inactivation are two well-studied sources of allele-specific methylation (ASM), but recent research has indicated a more complex pattern in which genotypic variation can be associated with allelically-skewed DNA methylation in cis. Given the known heterogeneity of DNA methylation across tissues and cell types we explored inter- and intra-individual variation in ASM across several regions of the human brain and whole blood from multiple individuals. Consistent with previous studies, we find widespread ASM with >4% of the ∼220,000 loci interrogated showing evidence of allelically-skewed DNA methylation. We identify ASM flanking known imprinted regions, and show that ASM sites are enriched in DNase I hypersensitivity sites and often located in an extended genomic context of intermediate DNA methylation. We also detect examples of genotype-driven ASM, some of which are tissue-specific. These findings contribute to our understanding of the nature of differential DNA methylation across tissues and have important implications for genetic studies of complex disease. As a resource to the community, ASM patterns across each of the tissues studied are available in a searchable online database: http://epigenetics.essex.ac.uk/ASMBrainBlood. PMID:26786711
IMGT/GeneInfo: enhancing V(D)J recombination database accessibility
Baum, Thierry-Pascal; Pasqual, Nicolas; Thuderoz, Florence; Hierle, Vivien; Chaume, Denys; Lefranc, Marie-Paule; Jouvin-Marche, Evelyne; Marche, Patrice-Noël; Demongeot, Jacques
2004-01-01
IMGT/GeneInfo is a user-friendly online information system that provides information on data resulting from the complex mechanisms of immunoglobulin (IG) and T cell receptor (TR) V(D)J recombinations. For the first time, it is possible to visualize all the rearrangement parameters on a single page. IMGT/GeneInfo is part of the international ImMunoGeneTics information system® (IMGT), a high-quality integrated knowledge resource specializing in IG, TR, major histocompatibility complex (MHC), and related proteins of the immune system of human and other vertebrate species. The IMGT/GeneInfo system was developed by the TIMC and ICH laboratories (with the collaboration of LIGM), and is the first example of an external system being incorporated into IMGT. In this paper, we report the first part of this work. IMGT/GeneInfo_TR deals with the human and mouse TRA/TRD and TRB loci of the TR. Data handling and visualization are complementary to the current data and tools in IMGT, and will subsequently allow the modelling of V(D)J gene use and thus the prediction of non-standard recombination profiles which may eventually be found in conditions such as leukaemias or lymphomas. Access to IMGT/GeneInfo is free and can be found at http://imgt.cines.fr/GeneInfo. PMID:14681357
Lovell, Peter V; Huizinga, Nicole A; Getachew, Abel; Mees, Brianna; Friedrich, Samantha R; Wirthlin, Morgan; Mello, Claudio V
2018-05-18
Zebra finches are a major model organism for investigating mechanisms of vocal learning, a trait that enables spoken language in humans. The development of cDNA collections with expressed sequence tags (ESTs) and microarrays has allowed for extensive molecular characterizations of circuitry underlying vocal learning and production. However, poor database curation can lead to errors in transcriptome and bioinformatics analyses, limiting the impact of these resources. Here we used genomic alignments and synteny analysis for orthology verification to curate and reannotate ~35% of the oligonucleotides and corresponding ESTs/cDNAs that make up Agilent microarrays for gene expression analysis in finches. We found that: (1) 5475 out of 43,084 oligos (a) failed to align to the zebra finch genome, (b) aligned to multiple loci, or (c) aligned to Chr_un only, and thus need to be flagged until a better genome assembly is available, or (d) reflect cloning artifacts; (2) out of 9635 valid oligos examined further, 3120 were incorrectly named, including 1533 with no known orthologs; and (3) 2635 oligos required a name update. The resulting curated dataset provides a reference for correcting gene identification errors in previous finch microarray studies, and avoiding such errors in future studies.
Plechakova, Olga; Tranchant-Dubreuil, Christine; Benedet, Fabrice; Couderc, Marie; Tinaut, Alexandra; Viader, Véronique; De Block, Petra; Hamon, Perla; Campa, Claudine; de Kochko, Alexandre; Hamon, Serge; Poncet, Valérie
2009-01-01
Background In the past few years, functional genomics information has been rapidly accumulating on Rubiaceae species and especially on those belonging to the Coffea genus (coffee trees). An increasing number of expressed sequence tag (EST) data and EST- or genomic-derived microsatellite markers have been generated, together with Conserved Ortholog Set (COS) markers. This considerably facilitates comparative genomics or map-based genetic studies through the common use of orthologous loci across different species. Similar genomic information is available for e.g. tomato or potato, members of the Solanaceae family. Since both Rubiaceae and Solanaceae belong to the Euasterids I (lamiids) integration of information on genetic markers would be possible and lead to more efficient analyses and discovery of key loci involved in important traits such as fruit development, quality, and maturation, or adaptation. Our goal was to develop a comprehensive web data source for integrated information on validated orthologous markers in Rubiaceae. Description MoccaDB is an online MySQL-PHP driven relational database that houses annotated and/or mapped microsatellite markers in Rubiaceae. In its current release, the database stores 638 markers that have been defined on 259 ESTs and 379 genomic sequences. Marker information was retrieved from 11 published works, and completed with original data on 132 microsatellite markers validated in our laboratory. DNA sequences were derived from three Coffea species/hybrids. Microsatellite markers were checked for similarity, in vitro tested for cross-amplification and diversity/polymorphism status in up to 38 Rubiaceae species belonging to the Cinchonoideae and Rubioideae subfamilies. Functional annotation was provided and some markers associated with described metabolic pathways were also integrated. Users can search the database for marker, sequence, map or diversity information through multi-option query forms. 
The retrieved data can be browsed and downloaded, along with protocols used, using a standard web browser. MoccaDB also integrates bioinformatics tools (CMap viewer and local BLAST) and hyperlinks to related external data sources (NCBI GenBank and PubMed, SOL Genomic Network database). Conclusion We believe that MoccaDB will be extremely useful for all researchers working in the areas of comparative and functional genomics and molecular evolution, in general, and population analysis and association mapping of Rubiaceae and Solanaceae species, in particular. PMID:19788737
NATIVE HEALTH DATABASES: NATIVE HEALTH RESEARCH DATABASE (NHRD)
The Native Health Databases contain bibliographic information and abstracts of health-related articles, reports, surveys, and other resource documents pertaining to the health and health care of American Indians, Alaska Natives, and Canadian First Nations. The databases provide i...
A Novel Approach: Chemical Relational Databases, and the ...
Mutagenicity and carcinogenicity databases are crucial resources for toxicologists and regulators involved in chemicals risk assessment. Until recently, existing public toxicity databases have been constructed primarily as
... Schwannomatosis Database Schwannomatosis Resources NF Registry International Schwannomatosis Database Because of the small number of people that ... and how to treat it, the International Schwannomatosis Database (ISD) project is proposing to bring together people ...
The Importance of Biological Databases in Biological Discovery.
Baxevanis, Andreas D; Bateman, Alex
2015-06-19
Biological databases play a central role in bioinformatics. They offer scientists the opportunity to access a wide variety of biologically relevant data, including the genomic sequences of an increasingly broad range of organisms. This unit provides a brief overview of major sequence databases and portals, such as GenBank, the UCSC Genome Browser, and Ensembl. Model organism databases, including WormBase, The Arabidopsis Information Resource (TAIR), and those made available through the Mouse Genome Informatics (MGI) resource, are also covered. Non-sequence-centric databases, such as Online Mendelian Inheritance in Man (OMIM), the Protein Data Bank (PDB), MetaCyc, and the Kyoto Encyclopedia of Genes and Genomes (KEGG), are also discussed. Copyright © 2015 John Wiley & Sons, Inc.
Lee, Taein; Cheng, Chun-Huai; Ficklin, Stephen; Yu, Jing; Humann, Jodi; Main, Dorrie
2017-01-01
Abstract Tripal is an open-source database platform primarily used for development of genomic, genetic and breeding databases. We report here on the release of the Chado Loader, Chado Data Display and Chado Search modules to extend the functionality of the core Tripal modules. These new extension modules provide additional tools for (1) data loading, (2) customized visualization and (3) advanced search functions for supported data types such as organism, marker, QTL/Mendelian Trait Loci, germplasm, map, project, phenotype, genotype and their respective metadata. The Chado Loader module provides data collection templates in Excel with defined metadata and data loaders with front end forms. The Chado Data Display module contains tools to visualize each data type and the metadata which can be used as is or customized as desired. The Chado Search module provides search and download functionality for the supported data types. Also included are the tools to visualize map and species summary. The use of materialized views in the Chado Search module enables better performance as well as flexibility of data modeling in Chado, allowing existing Tripal databases with different metadata types to utilize the module. These Tripal Extension modules are implemented in the Genome Database for Rosaceae (rosaceae.org), CottonGen (cottongen.org), Citrus Genome Database (citrusgenomedb.org), Genome Database for Vaccinium (vaccinium.org) and the Cool Season Food Legume Database (coolseasonfoodlegume.org). Database URL: https://www.citrusgenomedb.org/, https://www.coolseasonfoodlegume.org/, https://www.cottongen.org/, https://www.rosaceae.org/, https://www.vaccinium.org/
Database resources for the Tuberculosis community
Lew, Jocelyne M.; Mao, Chunhong; Shukla, Maulik; Warren, Andrew; Will, Rebecca; Kuznetsov, Dmitry; Xenarios, Ioannis; Robertson, Brian D.; Gordon, Stephen V.; Schnappinger, Dirk; Cole, Stewart T.; Sobral, Bruno
2013-01-01
Summary Access to online repositories for genomic and associated “-omics” datasets is now an essential part of everyday research activity. It is important therefore that the Tuberculosis community is aware of the databases and tools available to them online, as well as for the database hosts to know what the needs of the research community are. One of the goals of the Tuberculosis Annotation Jamboree, held in Washington DC on March 7th–8th 2012, was therefore to provide an overview of the current status of three key Tuberculosis resources, TubercuList (tuberculist.epfl.ch), TB Database (www.tbdb.org), and Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org). Here we summarize some key updates and upcoming features in TubercuList, and provide an overview of the PATRIC site and its online tools for pathogen RNA-Seq analysis. PMID:23332401
Núñez, Carolina; Baeta, Miriam; Ibarbia, Nerea; Ortueta, Urko; Jiménez-Moreno, Susana; Blazquez-Caeiro, José Luis; Builes, Juan José; Herrera, Rene J; Martínez-Jarreta, Begoña; de Pancorbo, Marian M
2017-04-01
A Y-STR multiplex system has been developed with the purpose of complementing the widely used 17 Y-STR haplotyping (AmpFlSTR Y Filer® PCR Amplification kit) routinely employed in forensic and population genetic studies. This new multiplex system includes six additional STR loci (DYS576, DYS481, DYS549, DYS533, DYS570, and DYS643) to reach the 23 Y-STR of the PowerPlex® Y23 System. In addition, this kit includes the DYS456 and DYS385 loci for traceability purposes. Male samples from 625 individuals from ten worldwide populations were genotyped, including three sample sets from populations previously published with the 17 Y-STR system to expand their current data. Validation studies demonstrated good performance of the panel set in terms of concordance, sensitivity, and stability in the presence of inhibitors and artificially degraded DNA. The results obtained for haplotype diversity and discrimination capacity with this multiplex system were high, providing further evidence of the suitability of this novel Y-STR system for forensic purposes. Thus, the use of this multiplex for samples previously genotyped with 17 Y-STRs will be an efficient and low-cost alternative to complete the set of 23 Y-STRs and improve allele databases for population and forensic purposes. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Moreno, Lilliana I; Brown, Alice L; Callaghan, Thomas F
2017-07-01
Rapid DNA platforms are fully integrated systems capable of producing and analyzing short tandem repeat (STR) profiles from reference sample buccal swabs in less than two hours. The technology requires minimal user interaction and experience making it possible for high quality profiles to be generated outside an accredited laboratory. The automated production of point of collection reference STR profiles could eliminate the time delay for shipment and analysis of arrestee samples at centralized laboratories. Furthermore, point of collection analysis would allow searching against profiles from unsolved crimes during the normal booking process once the infrastructure to immediately search the Combined DNA Index System (CODIS) database from the booking station is established. The DNAscan/ANDE™ Rapid DNA Analysis™ System developed by Network Biosystems was evaluated for robustness and reliability in the production of high quality reference STR profiles for database enrollment and searching applications. A total of 193 reference samples were assessed for concordance of the CODIS 13 loci. Studies to evaluate contamination, reproducibility, precision, stutter, peak height ratio, noise and sensitivity were also performed. The system proved to be robust, consistent and dependable. Results indicated an overall success rate of 75% for the 13 CODIS core loci and more importantly no incorrect calls were identified. The DNAscan/ANDE™ could be confidently used without human interaction in both laboratory and non-laboratory settings to generate reference profiles. Published by Elsevier B.V.
Lin, Hongli; Wang, Weisheng; Luo, Jiawei; Yang, Xuedong
2014-12-01
The aim of this study was to develop a personalized training system using the Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI) Database, because collecting, annotating, and marking a large number of appropriate computed tomography (CT) scans, and providing the capability of dynamically selecting suitable training cases based on the performance levels of trainees and the characteristics of cases are critical for developing an efficient training system. A novel approach is proposed to develop a personalized radiology training system for the interpretation of lung nodules in CT scans using the Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI) database, which provides a Content-Boosted Collaborative Filtering (CBCF) algorithm for predicting the difficulty level of each case of each trainee when selecting suitable cases to meet individual needs, and a diagnostic simulation tool to enable trainees to analyze and diagnose lung nodules with the help of an image processing tool and a nodule retrieval tool. Preliminary evaluation of the system shows that developing a personalized training system for interpretation of lung nodules is needed and useful to enhance the professional skills of trainees. The approach of developing personalized training systems using the LIDC/IDRI database is a feasible solution to the challenges of constructing specific training programs in terms of cost and training efficiency. Copyright © 2014 AUR. Published by Elsevier Inc. All rights reserved.
Outline for Research in Large Data Base Resources.
ERIC Educational Resources Information Center
Kahn, Paul
This paper uses a hypothetical application entitled "VAPORTRAILS" to examine how an integrated application can be used to solve the problems of search and retrieval from a range of qualitatively different databases, and the organization of the resulting information into a personal database resource. In addition, four general classes of databases…
HOED: Hypermedia Online Educational Database.
ERIC Educational Resources Information Center
Duval, E.; Olivie, H.
This paper presents HOED, a distributed hypermedia client-server system for educational resources. The aim of HOED is to provide a library facility for hyperdocuments that is accessible via the world wide web. Its main application domain is education. The HOED database not only holds the educational resources themselves, but also data describing…
Primate Info Net Related Databases NCRR PrimateLit: A bibliographic database for primatology Top of any problems with this service. We welcome your feedback. The PrimateLit database is no longer being Resources, National Institutes of Health. The database is a collaborative project of the Wisconsin Primate
An international aerospace information system - A cooperative opportunity
NASA Technical Reports Server (NTRS)
Blados, Walter R.; Cotter, Gladys A.
1992-01-01
This paper presents for consideration new possibilities for uniting the various aerospace database efforts toward a cooperative international aerospace database initiative that can optimize the cost-benefit equation for all members. The development of astronautics and aeronautics in individual nations has led to initiatives for national aerospace databases. Technological developments in information technology and science, as well as the reality of scarce resources, make it necessary to reconsider the mutually beneficial possibilities offered by cooperation and international resource sharing.
Zhu, H; Senalik, D; McCown, B H; Zeldin, E L; Speers, J; Hyman, J; Bassil, N; Hummer, K; Simon, P W; Zalapa, J E
2012-01-01
The American cranberry (Vaccinium macrocarpon Ait.) is a major commercial fruit crop in North America, but limited genetic resources have been developed for the species. Furthermore, the paucity of codominant DNA markers has hampered the advance of genetic research in cranberry and the Ericaceae family in general. Therefore, we used Roche 454 sequencing technology to perform low-coverage whole genome shotgun sequencing of the cranberry cultivar 'HyRed'. After de novo assembly, the obtained sequence covered 266.3 Mb of the estimated 540-590 Mb in the cranberry genome. A total of 107,244 SSR loci were detected with an overall density across the genome of 403 SSR/Mb. The AG repeat was the most frequent motif in cranberry, accounting for 35% of all SSRs, and together with AAG and AAAT accounted for 46% of all loci discovered. To validate the SSR loci, we designed 96 primer pairs using contig sequence data containing perfect SSR repeats, and studied the genetic diversity of 25 cranberry genotypes. We identified 48 polymorphic SSR loci with 2-15 alleles per locus for a total of 323 alleles in the 25 cranberry genotypes. Genetic clustering by principal coordinates and genetic structure analyses confirmed the heterogeneous nature of cranberries. The parentage composition of several hybrid cultivars was evident from the structure analyses. Whole genome shotgun 454 sequencing was a cost-effective and efficient way to identify numerous SSR repeats in the cranberry sequence for marker development.
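The density and coverage figures in this abstract follow from simple ratios. The sketch below reproduces them from the quoted numbers only; the function and variable names are hypothetical, and the mean-alleles figure is a derived illustration rather than a value reported in the abstract.

```python
# Figures quoted in the abstract above; names are illustrative.
ssr_loci = 107_244                # SSR loci detected
assembled_mb = 266.3              # assembled sequence, in Mb
genome_mb_range = (540.0, 590.0)  # estimated genome size, in Mb


def ssr_density(loci: int, megabases: float) -> float:
    """Return SSR loci per megabase of assembled sequence."""
    return loci / megabases


density = ssr_density(ssr_loci, assembled_mb)

# Fraction of the estimated genome represented in the assembly:
# the larger genome estimate gives the lower fraction.
coverage_low = assembled_mb / genome_mb_range[1]
coverage_high = assembled_mb / genome_mb_range[0]

# Primer validation: 48 of 96 primer pairs were polymorphic, with 323 alleles in total.
polymorphic_fraction = 48 / 96
mean_alleles = 323 / 48  # derived here, not reported in the abstract

print(f"density: {density:.0f} SSR/Mb")
print(f"genome fraction assembled: {coverage_low:.0%} to {coverage_high:.0%}")
print(f"polymorphic primer pairs: {polymorphic_fraction:.0%}, "
      f"mean alleles per polymorphic locus: {mean_alleles:.1f}")
```

The computed density rounds to the 403 SSR/Mb reported, and the assembly corresponds to a little under half of the estimated genome.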
Nassar, M K; Goraga, Z S; Brockmann, G A
2013-02-01
In this study, a genome scan was performed to detect genomic loci that affect fat deposition in white adipose tissues and muscles in 278 F2 males of reciprocal crosses between the genetically and phenotypically extreme inbred chicken lines New Hampshire (NHI) and White Leghorn (WL77). Genome-wide highly significant quantitative trait loci (QTL) influencing fat deposition in white adipose tissues were found on GGA2 and 4. The peak QTL positions for different visceral and subcutaneous white adipose tissues were located between 41.4 and 112.4 Mb on GGA2 and between 76.2 and 78.7 Mb on GGA4, which explained 4.2-10.4% and 4.3-11.6% respectively of the phenotypic F2 variances. Contrary to our expectations, the QTL allele descending from the lean line WL77 on GGA4 led to increased fat deposition. We suggest a transgressive action of the obesity allele only if it is not in the genetic background of the line WL77. Additional highly significant loci for subcutaneous adipose tissue mass were identified on GGA12 and 15. For intramuscular fat content, a suggestive QTL was located on GGA14. The analysed crosses provide a valuable resource for further fine mapping of fatness genes and subsequent gene discovery. © 2012 The Authors, Animal Genetics © 2012 Stichting International Foundation for Animal Genetics.
Jasinska, Anna J.; Zelaya, Ivette; Service, Susan K.; Peterson, Christine B.; Cantor, Rita M.; Choi, Oi-Wa; DeYoung, Joseph; Eskin, Eleazar; Fairbanks, Lynn A.; Fears, Scott; Furterer, Allison E.; Huang, Yu S.; Ramensky, Vasily; Schmitt, Christopher A.; Svardal, Hannes; Jorgensen, Matthew J.; Kaplan, Jay R.; Villar, Diego; Aken, Bronwen L.; Flicek, Paul; Nag, Rishi; Wong, Emily S.; Blangero, John; Dyer, Thomas D.; Bogomolov, Marina; Benjamini, Yoav; Weinstock, George M.; Dewar, Ken; Sabatti, Chiara; Wilson, Richard K.; Jentsch, J. David; Warren, Wesley; Coppola, Giovanni; Woods, Roger P.; Freimer, Nelson B.
2017-01-01
By analyzing multi-tissue gene expression and genome-wide genetic variation data in samples from a vervet monkey pedigree, we generated a transcriptome resource and produced the first catalogue of expression quantitative trait loci (eQTLs) in a non-human primate model. This catalogue contains more genome-wide significant eQTLs, per sample, than comparable human resources, and reveals sex and age-related expression patterns. Findings include a master regulatory locus that likely plays a role in immune function, and a locus regulating hippocampal long non-coding RNAs (lncRNAs), whose expression correlates with hippocampal volume. This resource will facilitate genetic investigation of quantitative traits, including brain and behavioral phenotypes relevant to neuropsychiatric disorders. PMID:29083405
FReD: the floral reflectance database--a web portal for analyses of flower colour.
Arnold, Sarah E J; Faruq, Samia; Savolainen, Vincent; McOwan, Peter W; Chittka, Lars
2010-12-10
Flower colour is of great importance in various fields relating to floral biology and pollinator behaviour. However, subjective human judgements of flower colour may be inaccurate and are irrelevant to the ecology and vision of the flower's pollinators. For precise, detailed information about the colours of flowers, a full reflectance spectrum for the flower of interest should be used rather than relying on such human assessments. The Floral Reflectance Database (FReD) has been developed to make an extensive collection of such data available to researchers. It is freely available at http://www.reflectance.co.uk. The database allows users to download spectral reflectance data for flower species collected from all over the world. These could, for example, be used in modelling interactions between pollinator vision and plant signals, or analyses of flower colours in various habitats. The database contains functions for calculating flower colour loci according to widely-used models of bee colour space, reflectance graphs of the spectra and an option to search for flowers with similar colours in bee colour space. The Floral Reflectance Database is a valuable new tool for researchers interested in the colours of flowers and their association with pollinator colour vision, containing raw spectral reflectance data for a large number of flower species.
bioDBnet - Biological Database Network
bioDBnet is a comprehensive resource of most of the biological databases available from different sites like NCBI, Uniprot, EMBL, Ensembl, Affymetrix. It provides a queryable interface to all the databases available, converts identifiers from one database into another and generates comprehensive reports.
2004-06-01
remote databases, has seen little vendor acceptance. Each database (Oracle, DB2, MySQL, etc.) has its own client-server protocol. Therefore each...existing standards – SQL, X.500/LDAP, FTP, etc. • View information dissemination as selective replication – State-oriented vs. message-oriented...allowing the application to start. The resource management system would serve as a broker to the resources, making sure that resources are not
DePriest, Adam D; Fiandalo, Michael V; Schlanger, Simon; Heemers, Frederike; Mohler, James L; Liu, Song; Heemers, Hannelore V
2016-01-01
Androgen receptor (AR) is a ligand-activated transcription factor that is the main target for treatment of non-organ-confined prostate cancer (CaP). Failure of life-prolonging AR-targeting androgen deprivation therapy is due to flexibility in steroidogenic pathways that control intracrine androgen levels and variability in the AR transcriptional output. Androgen biosynthesis enzymes, androgen transporters and AR-associated coregulators are attractive novel CaP treatment targets. These proteins, however, are characterized by multiple transcript variants and isoforms, are subject to genomic alterations, and are differentially expressed among CaPs. Determining their therapeutic potential requires evaluation of extensive, diverse datasets that are dispersed over multiple databases, websites and literature reports. Mining and integrating these datasets are cumbersome, time-consuming tasks and provide only snapshots of relevant information. To overcome this impediment to effective, efficient study of AR and potential drug targets, we developed the Regulators of Androgen Action Resource (RAAR), a non-redundant, curated and user-friendly searchable web interface. RAAR centralizes information on gene function, clinical relevance, and resources for 55 genes that encode proteins involved in biosynthesis, metabolism and transport of androgens and for 274 AR-associated coregulator genes. Data in RAAR are organized in two levels: (i) Information pertaining to production of androgens is contained in a 'pre-receptor level' database, and coregulator gene information is provided in a 'post-receptor level' database, and (ii) an 'other resources' database contains links to additional databases that are complementary to and useful to pursue further the information provided in RAAR. For each of its 329 entries, RAAR provides access to more than 20 well-curated publicly available databases, and thus, access to thousands of data points. 
Hyperlinks provide direct access to gene-specific entries in the respective database(s). RAAR is a novel, freely available resource that provides fast, reliable and easy access to integrated information that is needed to develop alternative CaP therapies. Database URL: http://www.lerner.ccf.org/cancerbio/heemers/RAAR/search/. © The Author(s) 2016. Published by Oxford University Press.
A SNP resource for Douglas-fir: de novo transcriptome assembly and SNP detection and validation.
Howe, Glenn T; Yu, Jianbin; Knaus, Brian; Cronn, Richard; Kolpak, Scott; Dolan, Peter; Lorenz, W Walter; Dean, Jeffrey F D
2013-02-28
Douglas-fir (Pseudotsuga menziesii), one of the most economically and ecologically important tree species in the world, also has one of the largest tree breeding programs. Although the coastal and interior varieties of Douglas-fir (vars. menziesii and glauca) are native to North America, the coastal variety is also widely planted for timber production in Europe, New Zealand, Australia, and Chile. Our main goal was to develop a SNP resource large enough to facilitate genomic selection in Douglas-fir breeding programs. To accomplish this, we developed a 454-based reference transcriptome for coastal Douglas-fir, annotated and evaluated the quality of the reference, identified putative SNPs, and then validated a sample of those SNPs using the Illumina Infinium genotyping platform. We assembled a reference transcriptome consisting of 25,002 isogroups (unique gene models) and 102,623 singletons from 2.76 million 454 and Sanger cDNA sequences from coastal Douglas-fir. We identified 278,979 unique SNPs by mapping the 454 and Sanger sequences to the reference, and by mapping four datasets of Illumina cDNA sequences from multiple seed sources, genotypes, and tissues. The Illumina datasets represented coastal Douglas-fir (64.00 and 13.41 million reads), interior Douglas-fir (80.45 million reads), and a Yakima population similar to interior Douglas-fir (8.99 million reads). We assayed 8067 SNPs on 260 trees using an Illumina Infinium SNP genotyping array. Of these SNPs, 5847 (72.5%) were called successfully and were polymorphic. Based on our validation efficiency, our SNP database may contain as many as ~200,000 true SNPs, and as many as ~69,000 SNPs that could be genotyped at ~20,000 gene loci using an Infinium II array, more SNPs than are needed to use genomic selection in tree breeding programs.
Ultimately, these genomic resources will enhance Douglas-fir breeding and allow us to better understand landscape-scale patterns of genetic variation and potential responses to climate change.
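The validation rate and the "~200,000 true SNPs" estimate can be related by simple proportional scaling. The sketch below assumes that the validated fraction of assayed SNPs applies uniformly to all putative SNPs; that scaling rule and the variable names are assumptions for illustration, since the abstract does not describe the authors' exact calculation.

```python
# Figures quoted in the abstract above; names are illustrative.
putative_snps = 278_979   # unique SNPs identified by read mapping
assayed_snps = 8_067      # SNPs placed on the Infinium array
validated_snps = 5_847    # called successfully and polymorphic

# Observed validation efficiency on the assayed sample.
validation_rate = validated_snps / assayed_snps

# Assumption: the same rate holds for every putative SNP in the database.
estimated_true_snps = putative_snps * validation_rate

print(f"validation rate: {validation_rate:.1%}")
print(f"estimated true SNPs: {estimated_true_snps:,.0f}")
```

Under this assumption the estimate lands close to the ~200,000 figure given in the abstract.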
A SNP resource for Douglas-fir: de novo transcriptome assembly and SNP detection and validation
2013-01-01
Background Douglas-fir (Pseudotsuga menziesii), one of the most economically and ecologically important tree species in the world, also has one of the largest tree breeding programs. Although the coastal and interior varieties of Douglas-fir (vars. menziesii and glauca) are native to North America, the coastal variety is also widely planted for timber production in Europe, New Zealand, Australia, and Chile. Our main goal was to develop a SNP resource large enough to facilitate genomic selection in Douglas-fir breeding programs. To accomplish this, we developed a 454-based reference transcriptome for coastal Douglas-fir, annotated and evaluated the quality of the reference, identified putative SNPs, and then validated a sample of those SNPs using the Illumina Infinium genotyping platform. Results We assembled a reference transcriptome consisting of 25,002 isogroups (unique gene models) and 102,623 singletons from 2.76 million 454 and Sanger cDNA sequences from coastal Douglas-fir. We identified 278,979 unique SNPs by mapping the 454 and Sanger sequences to the reference, and by mapping four datasets of Illumina cDNA sequences from multiple seed sources, genotypes, and tissues. The Illumina datasets represented coastal Douglas-fir (64.00 and 13.41 million reads), interior Douglas-fir (80.45 million reads), and a Yakima population similar to interior Douglas-fir (8.99 million reads). We assayed 8067 SNPs on 260 trees using an Illumina Infinium SNP genotyping array. Of these SNPs, 5847 (72.5%) were called successfully and were polymorphic. Conclusions Based on our validation efficiency, our SNP database may contain as many as ~200,000 true SNPs, and as many as ~69,000 SNPs that could be genotyped at ~20,000 gene loci using an Infinium II array—more SNPs than are needed to use genomic selection in tree breeding programs. 
Ultimately, these genomic resources will enhance Douglas-fir breeding and allow us to better understand landscape-scale patterns of genetic variation and potential responses to climate change. PMID:23445355
NASA Astrophysics Data System (ADS)
Viegas, F.; Malon, D.; Cranshaw, J.; Dimitrov, G.; Nowak, M.; Nairz, A.; Goossens, L.; Gallas, E.; Gamboa, C.; Wong, A.; Vinek, E.
2010-04-01
The TAG files store summary event quantities that allow a quick selection of interesting events. This data will be produced at a nominal rate of 200 Hz, and is uploaded into a relational database for access from websites and other tools. The estimated database volume is 6TB per year, making it the largest application running on the ATLAS relational databases, at CERN and at other voluntary sites. The sheer volume and high rate of production makes this application a challenge to data and resource management, in many aspects. This paper will focus on the operational challenges of this system. These include: uploading the data from files to the CERN's and remote sites' databases; distributing the TAG metadata that is essential to guide the user through event selection; controlling resource usage of the database, from the user query load to the strategy of cleaning and archiving of old TAG data.
Kılıç, Sefa; Sagitova, Dinara M; Wolfish, Shoshannah; Bely, Benoit; Courtot, Mélanie; Ciufo, Stacy; Tatusova, Tatiana; O'Donovan, Claire; Chibucos, Marcus C; Martin, Maria J; Erill, Ivan
2016-01-01
Domain-specific databases are essential resources for the biomedical community, leveraging expert knowledge to curate published literature and provide access to referenced data and knowledge. The limited scope of these databases, however, poses important challenges for their infrastructure, visibility, funding and usefulness to the broader scientific community. CollecTF is a community-oriented database documenting experimentally validated transcription factor (TF)-binding sites in the Bacteria domain. In its quest to become a community resource for the annotation of transcriptional regulatory elements in bacterial genomes, CollecTF aims to move away from the conventional data-repository paradigm of domain-specific databases. Through the adoption of well-established ontologies, identifiers and collaborations, CollecTF has progressively also become a portal for the annotation and submission of information on transcriptional regulatory elements to major biological sequence resources (RefSeq, UniProtKB and the Gene Ontology Consortium). This fundamental change in database conception capitalizes on the domain-specific knowledge of contributing communities to provide high-quality annotations, while leveraging the availability of stable information hubs to promote long-term access and provide high visibility to the data. As a submission portal, CollecTF generates TF-binding site information through direct annotation of RefSeq genome records, definition of TF-based regulatory networks in UniProtKB entries and submission of functional annotations to the Gene Ontology. As a database, CollecTF provides enhanced search and browsing, targeted data exports, binding motif analysis tools and integration with motif discovery and search platforms. This innovative approach will allow CollecTF to focus its limited resources on the generation of high-quality information and the provision of specialized access to the data. Database URL: http://www.collectf.org/. © The Author(s) 2016.
Published by Oxford University Press.
Resources | Division of Cancer Prevention
Manual of Operations Version 3, 12/13/2012 (PDF, 162KB) Database Sources Consortium for Functional Glycomics databases Design Studies Related to the Development of Distributed, Web-based European Carbohydrate Databases (EUROCarbDB) |
Perry, G.M.L.; King, T.L.; St. -Cyr, J.; Valcourt, M.; Bernatchez, L.
2005-01-01
The brook charr (Salvelinus fontinalis; Osteichthyes: Salmonidae) is a phenotypically diverse fish species inhabiting much of North America. But relatively few genetic diagnostic resources are available for this fish species. We isolated 41 microsatellites from S. fontinalis polymorphic in one or more species of salmonid fish. Thirty-seven were polymorphic in brook charr, 15 in the congener Arctic charr (Salvelinus alpinus) and 14 in the lake charr (Salvelinus namaycush). Polymorphism was also relatively high in Oncorhynchus, where 21 loci were polymorphic in rainbow trout (Oncorhynchus mykiss) and 16 in cutthroat trout (Oncorhynchus clarkii) but only seven and four microsatellite loci were polymorphic in the more distantly related lake whitefish (Coregonus clupeaformis) and Atlantic salmon (Salmo salar), respectively. One duplicated locus (Sfo228Lav) was polymorphic at both duplicates in S. fontinalis. © 2005 Blackwell Publishing Ltd.
Said, Joseph I; Knapka, Joseph A; Song, Mingzhou; Zhang, Jinfa
2015-08-01
A specialized database currently containing more than 2200 QTL is established, which allows graphic presentation, visualization and submission of QTL. In cotton, quantitative trait loci (QTL) studies focus on intraspecific Gossypium hirsutum and interspecific G. hirsutum × G. barbadense populations. These two populations are commercially important for the textile industry and are evaluated for fiber quality, yield, seed quality, resistance, physiological, and morphological trait QTL. Given the meta-analysis data available from the vast number of QTL studies in cotton, it will be beneficial to organize the data into a functional database for the cotton community. Here we provide a tool for cotton researchers to visualize previously identified QTL and submit their own QTL to the Cotton QTLdb database. The database provides the user with the option of selecting various QTL trait types from either the G. hirsutum or G. hirsutum × G. barbadense populations. Based on the user's QTL trait selection, graphical representations of chromosomes of the selected population are displayed in publication-ready images. The database also provides users with trait information on QTL, LOD scores, and explained phenotypic variances for all QTL selected. The CottonQTLdb database provides cotton geneticists and breeders with statistical data on previously identified cotton QTL and provides a visualization tool to view QTL positions on chromosomes. Currently the database (Release 1) contains 2274 QTLs, and succeeding QTL studies will be updated regularly by the curators and members of the cotton community that contribute their data to keep the database current. The database is accessible from http://www.cottonqtldb.org.
The Resource Identification Initiative: A cultural shift in publishing
Brush, Matthew; Grethe, Jeffery S.; Haendel, Melissa A; Kennedy, David N.; Hill, Sean; Hof, Patrick R.; Martone, Maryann E.; Pols, Maaike; Tan, Serena C.; Washington, Nicole; Zudilova‐Seinstra, Elena; Vasilevsky, Nicole
2016-01-01
A central tenet in support of research reproducibility is the ability to uniquely identify research resources, i.e., reagents, tools, and materials that are used to perform experiments. However, current reporting practices for research resources are insufficient to identify the exact resources that are reported or to answer basic questions such as “How did other studies use resource X?” To address this issue, the Resource Identification Initiative was launched as a pilot project to improve the reporting standards for research resources in the Methods sections of articles and thereby improve identifiability and scientific reproducibility. The pilot engaged over 25 biomedical journal editors from most major publishers, as well as scientists and funding officials. Authors were asked to include Research Resource Identifiers (RRIDs) in their articles prior to publication for three resource types: antibodies, model organisms, and tools (i.e., software and databases). RRIDs are assigned by an authoritative database, for example, a model organism database for each type of resource. To make it easier for authors to obtain RRIDs, resources were aggregated from the appropriate databases and their RRIDs made available in a central Web portal (http://scicrunch.org/resources). RRIDs meet three key criteria: they are machine‐readable, free to generate and access, and are consistent across publishers and journals. The pilot was launched in February of 2014 and over 300 articles have appeared that report RRIDs. The number of journals participating has expanded from the original 25 to more than 40, with RRIDs appearing in 62 different journals to date. Here we present an overview of the pilot project and its outcomes to date. We show that authors are able to identify resources and are supportive of the goals of the project. 
Identifiability of the resources post‐pilot showed a dramatic improvement for all three resource types, suggesting that the project has had a significant impact on identifiability of research resources. J. Comp. Neurol. 524:8–22, 2016. © 2015 The Authors The Journal of Comparative Neurology Published by Wiley Periodicals, Inc. PMID:26599696
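The abstract above stresses that RRIDs are machine-readable. As a rough illustration of what that enables, here is a minimal Python sketch that pulls RRID-like strings out of a Methods paragraph. The regular expression is a simplifying assumption (the literal "RRID:" followed by a letter prefix, an underscore and an alphanumeric local ID), not the initiative's official grammar, and the identifiers and vendor in the sample text are invented for the example.

```python
import re

# Assumed, simplified shape of an RRID: "RRID:" + letters + "_" + alphanumerics.
# Real RRIDs have more variants; this pattern is illustrative only.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z]+_[A-Za-z0-9]+)")


def extract_rrids(text):
    """Return the unique RRID-like strings in text, in order of first appearance."""
    found = []
    for match in RRID_PATTERN.finditer(text):
        rrid = "RRID:" + match.group(1)
        if rrid not in found:
            found.append(rrid)
    return found


# Made-up Methods text; the identifiers below are placeholders, not real records.
methods = (
    "Sections were stained with anti-GFAP (Example Vendor, RRID:AB_0000001) "
    "and images were analysed in ImageTool (RRID:SCR_0000002; RRID:AB_0000001)."
)
print(extract_rrids(methods))
```

A script of this kind could be used to count how many articles in a corpus report at least one identifier, which is the sort of question the pilot's evaluation asks.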
A Survey of Bioinformatics Database and Software Usage through Mining the Literature.
Duck, Geraint; Nenadic, Goran; Filannino, Michele; Brass, Andy; Robertson, David L; Stevens, Robert
2016-01-01
Computer-based resources are central to much, if not most, biological and medical research. However, while there is an ever expanding choice of bioinformatics resources to use, described within the biomedical literature, little work to date has provided an evaluation of the full range of availability or levels of usage of database and software resources. Here we use text mining to process the PubMed Central full-text corpus, identifying mentions of databases or software within the scientific literature. We provide an audit of the resources contained within the biomedical literature, and a comparison of their relative usage, both over time and between the sub-disciplines of bioinformatics, biology and medicine. We find that trends in resource usage differ between these domains. The bioinformatics literature emphasises novel resource development, while database and software usage within biology and medicine is more stable and conservative. Many resources are only mentioned in the bioinformatics literature, with a relatively small number making it out into general biology, and fewer still into the medical literature. In addition, many resources are seeing a steady decline in their usage (e.g., BLAST, SWISS-PROT), though some are instead seeing rapid growth (e.g., the GO, R). We find a striking imbalance in resource usage, with the top 5% of resource names (133 names) accounting for 47% of total usage, and over 70% of resources extracted being mentioned only once each. While these results highlight the dynamic and creative nature of bioinformatics research, they raise questions about software reuse, choice and the sharing of bioinformatics practice. Is it acceptable that so many resources are apparently never reused? Finally, our work is a step towards automated extraction of scientific method from text. We make the dataset generated by our study available under the CC0 license here: http://dx.doi.org/10.6084/m9.figshare.1281371.
E-MSD: an integrated data resource for bioinformatics
Velankar, S.; McNeil, P.; Mittard-Runte, V.; Suarez, A.; Barrell, D.; Apweiler, R.; Henrick, K.
2005-01-01
The Macromolecular Structure Database (MSD) group (http://www.ebi.ac.uk/msd/) continues to enhance the quality and consistency of macromolecular structure data in the worldwide Protein Data Bank (wwPDB) and to work towards the integration of various bioinformatics data resources. One of the major obstacles to the improved integration of structural databases such as MSD and sequence databases like UniProt is the absence of up to date and well-maintained mapping between corresponding entries. We have worked closely with the UniProt group at the EBI to clean up the taxonomy and sequence cross-reference information in the MSD and UniProt databases. This information is vital for the reliable integration of the sequence family databases such as Pfam and Interpro with the structure-oriented databases of SCOP and CATH. This information has been made available to the eFamily group (http://www.efamily.org.uk/) and now forms the basis of the regular interchange of information between the member databases (MSD, UniProt, Pfam, Interpro, SCOP and CATH). This exchange of annotation information has enriched the structural information in the MSD database with annotation from wider sequence-oriented resources. This work was carried out under the ‘Structure Integration with Function, Taxonomy and Sequences (SIFTS)’ initiative (http://www.ebi.ac.uk/msd-srv/docs/sifts) in the MSD group. PMID:15608192
RICD: a rice indica cDNA database resource for rice functional genomics.
Lu, Tingting; Huang, Xuehui; Zhu, Chuanrang; Huang, Tao; Zhao, Qiang; Xie, Kabing; Xiong, Lizhong; Zhang, Qifa; Han, Bin
2008-11-26
The Oryza sativa L. indica subspecies is the most widely cultivated rice. During the last few years, we have collected over 20,000 putative full-length cDNAs and over 40,000 ESTs isolated from various cDNA libraries of two indica varieties Guangluai 4 and Minghui 63. A database of the rice indica cDNAs was therefore built to provide a comprehensive web data source for searching and retrieving the indica cDNA clones. Rice Indica cDNA Database (RICD) is an online MySQL-PHP driven database with a user-friendly web interface. It allows investigators to query the cDNA clones by keyword, genome position, nucleotide or protein sequence, and putative function. It also provides a series of information, including sequences, protein domain annotations, similarity search results, SNPs and InDels information, and hyperlinks to gene annotation in both The Rice Annotation Project Database (RAP-DB) and The TIGR Rice Genome Annotation Resource, expression atlas in RiceGE and variation report in Gramene of each cDNA. The online rice indica cDNA database provides cDNA resource with comprehensive information to researchers for functional analysis of indica subspecies and for comparative genomics. The RICD database is available through our website http://www.ncgr.ac.cn/ricd.
Drill hole data for coal beds in the Powder River Basin, Montana and Wyoming
Haacke, Jon E.; Scott, David C.
2013-01-01
This report by the U.S. Geological Survey (USGS) of the Powder River Basin (PRB) of Montana and Wyoming is part of the U.S. Coal Resources and Reserves Assessment Project. Essential to that project was the creation of a comprehensive drill hole database that was used for coal bed correlation and for coal resource and reserve assessments in the PRB. This drill hole database was assembled using data from the USGS National Coal Resources Data System, several other Federal and State agencies, and selected mining companies. Additionally, USGS personnel manually entered lithologic picks into the database from geophysical logs of coalbed methane, oil, and gas wells. Of the 29,928 drill holes processed, records of 21,393 are in the public domain and are included in this report. The database contains location information, lithology, and coal bed names for each drill hole.
Database resources for the tuberculosis community.
Lew, Jocelyne M; Mao, Chunhong; Shukla, Maulik; Warren, Andrew; Will, Rebecca; Kuznetsov, Dmitry; Xenarios, Ioannis; Robertson, Brian D; Gordon, Stephen V; Schnappinger, Dirk; Cole, Stewart T; Sobral, Bruno
2013-01-01
Access to online repositories for genomic and associated "-omics" datasets is now an essential part of everyday research activity. It is important therefore that the Tuberculosis community is aware of the databases and tools available to them online, as well as for the database hosts to know what the needs of the research community are. One of the goals of the Tuberculosis Annotation Jamboree, held in Washington DC on March 7th-8th 2012, was therefore to provide an overview of the current status of three key Tuberculosis resources, TubercuList (tuberculist.epfl.ch), TB Database (www.tbdb.org), and Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org). Here we summarize some key updates and upcoming features in TubercuList, and provide an overview of the PATRIC site and its online tools for pathogen RNA-Seq analysis. Copyright © 2012 Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Weiner, Sharon A.
2009-01-01
Access to scholarly information in the disciplines of education and medicine occurred primarily through the simultaneous development of two bibliographic databases. The Education Resource Information Center (ERIC) originated as a resource designed to be comprehensive in its inclusion of peer-reviewed and unpublished literature for the entire…
Faculty Views of Open Web Resource Use by College Students
ERIC Educational Resources Information Center
Tomaiuolo, Nicholas G.
2005-01-01
This article assesses both the extent of students' use of open Web resources and library subscription databases and professors' satisfaction with that use as reported by a survey of 120 community college and university English faculty. It was concluded that although library budgets allocate significant funds to offer subscription databases,…
ERMes: Open Source Simplicity for Your E-Resource Management
ERIC Educational Resources Information Center
Doering, William; Chilton, Galadriel
2009-01-01
ERMes, the latest version of electronic resource management system (ERM), is a relational database; content in different tables connects to, and works with, content in other tables. ERMes requires Access 2007 (Windows) or Access 2008 (Mac) to operate as the database utilizes functionality not available in previous versions of Microsoft Access. The…
What's in a Title? Gender Micro-Inequities in a University Human Resources Database
ERIC Educational Resources Information Center
Saporu, Darlene F.; Herbers, Joan M.
2015-01-01
Men and women are perceived differently, and those perceptions can be damaging in a professional context. Unconscious bias expressed within work environments can introduce "micro-inequities" that impede career progression for women compared to men. This study examines title prefixes for faculty in the human resources database of a large…
Helping Patrons Find Locally Held Electronic Resources: An Interlibrary Loan Perspective
ERIC Educational Resources Information Center
Johnston, Pamela
2016-01-01
The University of North Texas Libraries provide extensive online access to academic journals through major vendor databases. As illustrated by interlibrary loan borrowing requests for items held in our databases, patrons often have difficulty navigating the available resources. In this study, the Interlibrary Loan staff used data gathered from the…
Distributed Structure-Searchable Toxicity (DSSTox) Database Network: Making Public Toxicity Data Resources More Accessible and Usable for Data Exploration and SAR Development
Many sources of public toxicity data are not currently linked to chemical structure, are not ...
Reiser, Leonore; Berardini, Tanya Z; Li, Donghui; Muller, Robert; Strait, Emily M; Li, Qian; Mezheritsky, Yarik; Vetushko, Andrey; Huala, Eva
2016-01-01
Databases and data repositories provide essential functions for the research community by integrating, curating, archiving and otherwise packaging data to facilitate discovery and reuse. Despite their importance, funding for maintenance of these resources is increasingly hard to obtain. Fueled by a desire to find long term, sustainable solutions to database funding, staff from the Arabidopsis Information Resource (TAIR), founded the nonprofit organization, Phoenix Bioinformatics, using TAIR as a test case for user-based funding. Subscription-based funding has been proposed as an alternative to grant funding but its application has been very limited within the nonprofit sector. Our testing of this model indicates that it is a viable option, at least for some databases, and that it is possible to strike a balance that maximizes access while still incentivizing subscriptions. One year after transitioning to subscription support, TAIR is self-sustaining and Phoenix is poised to expand and support additional resources that wish to incorporate user-based funding strategies. Database URL: www.arabidopsis.org. © The Author(s) 2016. Published by Oxford University Press. PMID:26989150
Carlson, Mary H.; Zientek, Michael L.; Causey, J. Douglas; Kayser, Helen Z.; Spanski, Gregory T.; Wilson, Anna B.; Van Gosen, Bradley S.; Trautwein, Charles M.
2007-01-01
This report compiles selected results from 13 U.S. Geological Survey (USGS) mineral resource assessment studies conducted in Idaho and Montana into consistent spatial databases that can be used in a geographic information system. The 183 spatial databases represent areas of mineral potential delineated in these studies and include attributes on mineral deposit type, level of mineral potential, certainty, and a reference. The assessments were conducted for five 1° x 2° quadrangles (Butte, Challis, Choteau, Dillon, and Wallace), several U.S. Forest Service (USFS) National Forests (including Challis, Custer, Gallatin, Helena, and Payette), and one Bureau of Land Management (BLM) Resource Area (Dillon). The data contained in the spatial databases are based on published information: no new interpretations are made. This digital compilation is part of an ongoing effort to provide mineral resource information formatted for use in spatial analysis. In particular, this is one of several reports prepared to address USFS needs for science information as forest management plans are revised in the Northern Rocky Mountains.
SSWAP: A Simple Semantic Web Architecture and Protocol for semantic web services
Gessler, Damian DG; Schiltz, Gary S; May, Greg D; Avraham, Shulamit; Town, Christopher D; Grant, David; Nelson, Rex T
2009-01-01
Background SSWAP (Simple Semantic Web Architecture and Protocol; pronounced "swap") is an architecture, protocol, and platform for using reasoning to semantically integrate heterogeneous disparate data and services on the web. SSWAP was developed as a hybrid semantic web services technology to overcome limitations found in both pure web service technologies and pure semantic web technologies. Results There are currently over 2400 resources published in SSWAP. Approximately two dozen are custom-written services for QTL (Quantitative Trait Loci) and mapping data for legumes and grasses (grains). The remaining are wrappers to Nucleic Acids Research Database and Web Server entries. As an architecture, SSWAP establishes how clients (users of data, services, and ontologies), providers (suppliers of data, services, and ontologies), and discovery servers (semantic search engines) interact to allow for the description, querying, discovery, invocation, and response of semantic web services. As a protocol, SSWAP provides the vocabulary and semantics to allow clients, providers, and discovery servers to engage in semantic web services. The protocol is based on the W3C-sanctioned first-order description logic language OWL DL. As an open source platform, a discovery server running at (as in to "swap info") uses the description logic reasoner Pellet to integrate semantic resources. The platform hosts an interactive guide to the protocol at , developer tools at , and a portal to third-party ontologies at (a "swap meet"). Conclusion SSWAP addresses the three basic requirements of a semantic web services architecture (i.e., a common syntax, shared semantic, and semantic discovery) while addressing three technology limitations common in distributed service systems: i.e., i) the fatal mutability of traditional interfaces, ii) the rigidity and fragility of static subsumption hierarchies, and iii) the confounding of content, structure, and presentation. 
SSWAP is novel by establishing the concept of a canonical yet mutable OWL DL graph that allows data and service providers to describe their resources, to allow discovery servers to offer semantically rich search engines, to allow clients to discover and invoke those resources, and to allow providers to respond with semantically tagged data. SSWAP allows for a mix-and-match of terms from both new and legacy third-party ontologies in these graphs. PMID:19775460
National Rehabilitation Information Center
... search the NARIC website or one of our databases Select a database or search for a webpage A NARIC webpage ... Projects conducting research and/or development (NIDILRR Program Database). Organizations, agencies, and online resources that support people ...
Environmental Health and Toxicology Resources of the United States National Library of Medicine
Hochstein, Colette; Arnesen, Stacey; Goshorn, Jeanne
2009-01-01
For over 40 years, the National Library of Medicine’s (NLM) Toxicology and Environmental Health Information Program (TEHIP) has worked to organize and to provide access to an extensive array of environmental health and toxicology resources. During these years, the TEHIP program has evolved from a handful of databases developed primarily for researchers to a broad range of products and services that also serve industry, students, and the general public. TEHIP’s resources include TOXNET®
Measuring health system resource use for economic evaluation: a comparison of data sources.
Pollicino, Christine; Viney, Rosalie; Haas, Marion
2002-01-01
A key challenge for evaluators and health system planners is the identification, measurement and valuation of resource use for economic evaluation. Accurately capturing all significant resource use is particularly difficult in the Australian context where there is no comprehensive database from which researchers can draw. Evaluators and health system planners need to consider different approaches to data collection for estimating resource use for economic evaluation, and the relative merits of the different data sources available. This paper illustrates the issues that arise in using different data sources using a sub-sample of the data being collected for an economic evaluation. Specifically, it compares the use of Australia's largest administrative database on resource use, the Health Insurance Commission database, with the use of patient-supplied data. The extent of agreement and discrepancies between the two data sources is investigated. Findings from this study and recommendations as to how to deal with different data sources are presented.
NASA Astrophysics Data System (ADS)
Huang, Pei; Wu, Sangyun; Feng, Aiping; Guo, Yacheng
2008-10-01
Coastal areas, with their concentrated populations, abundant resources, developed industry and active economies, are bound to become the forward positions and supporting regions for marine exploitation. In the 21st century, coastal zones face the following pressures: population growth and urbanization, sea-level rise and coastal erosion, shortage of freshwater and deterioration of water resources, and degradation of fishery resources, among others. The resources of coastal zones should therefore be planned and used rationally for the sustainable development of the economy and environment. This paper proposes a design for a coastal zone planning and management information system based on GIS and database technologies. With this system, the planning results for coastal zones can be queried and displayed conveniently through the system interface. It is concluded that the integrated application of GIS and database technologies provides a modern method for managing coastal zone resources and makes it possible to ensure their rational development and utilization, along with the sustainable development of the economy and environment.
HLA-DRB1, -DQA1 and -DQB1 genotyping of 180 Czech individuals from the Czech Republic pop 3.
Zajacova, Marta; Kotrbova-Kozak, Anna; Cerna, Marie
2016-04-01
One hundred and eighty Czech individuals from the Czech Republic pop 3 were genotyped at the HLA-DRB1, -DQA1 and -DQB1 loci using sequence-specific primers PCR methods. HLA-DRB1, -DQA1 and -DQB1 genotypes are consistent with expected Hardy-Weinberg (HW) proportions. These genotype data are available in the Allele Frequencies Net Database under identifier AFND. Copyright © 2016 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.
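The abstract above reports that genotypes are consistent with Hardy-Weinberg (HW) proportions. As a rough sketch of what such a check involves, the Python below computes allele frequencies from genotypes, derives the expected genotype counts (n·p² for homozygotes, 2n·p·q for heterozygotes) and sums a chi-square statistic. This is an assumption-laden illustration: the source does not say which test was used, exact tests are usually preferred for highly polymorphic HLA loci, and the genotypes and allele labels "A" and "B" are toy data.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hardy_weinberg_expected(genotypes):
    """Return (allele frequencies, expected genotype counts) under HW proportions."""
    n = len(genotypes)
    allele_counts = Counter(allele for genotype in genotypes for allele in genotype)
    freqs = {allele: count / (2 * n) for allele, count in allele_counts.items()}
    expected = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        product = freqs[a] * freqs[b]
        # Homozygotes: n * p^2; heterozygotes: n * 2pq.
        expected[(a, b)] = n * (product if a == b else 2 * product)
    return freqs, expected


def hw_chi_square(genotypes):
    """Chi-square statistic comparing observed genotype counts with HW expectations."""
    observed = Counter(tuple(sorted(genotype)) for genotype in genotypes)
    _, expected = hardy_weinberg_expected(genotypes)
    return sum((observed.get(g, 0) - e) ** 2 / e for g, e in expected.items())


# Toy biallelic data constructed to sit exactly at HW proportions (p = q = 0.5).
toy = [("A", "A")] * 25 + [("A", "B")] * 50 + [("B", "B")] * 25
freqs, expected = hardy_weinberg_expected(toy)
print(freqs, expected, hw_chi_square(toy))
```

A statistic near zero, as in the toy data, indicates no departure from HW proportions; turning the statistic into a P value would need the appropriate degrees of freedom for the number of alleles.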
Brant, Steven R.; Okou, David T.; Simpson, Claire L.; Cutler, David J.; Haritunians, Talin; Bradfield, Jonathan P.; Chopra, Pankaj; Prince, Jarod; Begum, Ferdouse; Kumar, Archana; Huang, Chengrui; Venkateswaran, Suresh; Datta, Lisa W.; Wei, Zhi; Thomas, Kelly; Herrinton, Lisa J.; Klapproth, Jan-Micheal A.; Quiros, Antonio J.; Seminerio, Jenifer; Liu, Zhenqiu; Alexander, Jonathan S.; Baldassano, Robert N.; Dudley-Brown, Sharon; Cross, Raymond K.; Dassopoulos, Themistocles; Denson, Lee A.; Dhere, Tanvi A.; Dryden, Gerald W.; Hanson, John S.; Hou, Jason K.; Hussain, Sunny Z.; Hyams, Jeffrey S.; Isaacs, Kim L.; Kader, Howard; Kappelman, Michael D.; Katz, Jeffry; Kellermayer, Richard; Kirschner, Barbara S.; Kuemmerle, John F.; Kwon, John H.; Lazarev, Mark; Li, Ellen; Mack, David; Mannon, Peter; Moulton, Dedrick E.; Newberry, Rodney D.; Osuntokun, Bankole O.; Patel, Ashish S.; Saeed, Shehzad A.; Targan, Stephan R.; Valentine, John F.; Wang, Ming-Hsi; Zonca, Martin; Rioux, John D.; Duerr, Richard H.; Silverberg, Mark S.; Cho, Judy H.; Hakonarson, Hakon; Zwick, Michael E.; McGovern, Dermot P.B.; Kugathasan, Subra
2016-01-01
Background & Aims: The inflammatory bowel diseases (IBD) ulcerative colitis (UC) and Crohn’s disease (CD) cause significant morbidity and are increasing in prevalence among all populations, including African Americans. More than 200 susceptibility loci have been identified in populations of predominantly European ancestry, but few loci have been associated with IBD in other ethnicities. Methods: We performed 2 high-density, genome-wide scans comprising 2345 cases of African Americans with IBD (1646 with CD, 583 with UC, and 116 inflammatory bowel disease unclassified [IBD-U]) and 5002 individuals without IBD (controls, identified from the Health Retirement Study and Kaiser Permanente database). Single-nucleotide polymorphisms (SNPs) associated at P < 5.0 × 10^-8 in meta-analysis with nominal evidence (P < .05) in each scan were considered to have genome-wide significance. Results: We detected SNPs at HLA-DRB1, and African-specific SNPs at ZNF649 and LSAMP, with associations of genome-wide significance for UC. We detected SNPs at USP25 with associations of genome-wide significance for IBD. No associations of genome-wide significance were detected for CD. In addition, 9 genes previously associated with IBD contained SNPs with significant evidence for replication (P < 1.6 × 10^-6): ADCY3, CXCR6, HLA-DRB1 to HLA-DQA1 (genome-wide significance on conditioning), IL12B, PTGER4, and TNC for IBD; IL23R, PTGER4, and SNX20 (in strong linkage disequilibrium with NOD2) for CD; and KCNQ2 (near TNFRSF6B) for UC. Several of these genes, such as TNC (near TNFSF15), CXCR6, and genes associated with IBD at the HLA locus, contained SNPs with unique association patterns with African-specific alleles. Conclusions: We performed a genome-wide association study of African Americans with IBD and identified loci associated with CD and UC in only this population; we also replicated loci identified in European populations. 
The detection of variants associated with IBD risk in only people of African descent demonstrates the importance of studying the genetics of IBD and other complex diseases in populations beyond those of European ancestry. PMID:27693347
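The significance rule stated in the Methods (a meta-analysis P below 5.0 × 10^-8 together with nominal evidence, P < .05, in each of the two scans) can be written as a small filter. The Python sketch below is only an illustration of that rule; the function name, the record layout, and the SNP labels and P values are hypothetical and are not taken from the study.

```python
GENOME_WIDE_P = 5.0e-8  # meta-analysis threshold quoted in the abstract
NOMINAL_P = 0.05        # per-scan nominal threshold quoted in the abstract


def is_genome_wide_significant(meta_p, scan_p_values):
    """Apply the two-part rule: meta-analysis P and every per-scan P must pass."""
    return meta_p < GENOME_WIDE_P and all(p < NOMINAL_P for p in scan_p_values)


# Invented example records: (label, meta-analysis P, [scan 1 P, scan 2 P]).
candidates = [
    ("snp_a", 2.1e-9, [3.0e-5, 1.2e-4]),  # passes both parts of the rule
    ("snp_b", 4.0e-8, [1.0e-7, 0.21]),    # fails: one scan is not nominally significant
    ("snp_c", 6.0e-7, [1.0e-4, 2.0e-3]),  # fails: meta-analysis P is above threshold
]
hits = [label for label, meta_p, scans in candidates
        if is_genome_wide_significant(meta_p, scans)]
print(hits)
```

Requiring nominal support in each scan guards against a meta-analysis signal that is driven entirely by one of the two data sets.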
The development of miniplex primer sets for the analysis of degraded DNA
NASA Astrophysics Data System (ADS)
McCord, Bruce; Opel, Kerry; Chung, Denise; Drabek, Jiri; Tatarek, Nancy; Meadows Jantz, Lee; Butler, John
2005-05-01
In this project, a new set of multiplexed PCR reactions has been developed for the analysis of degraded DNA. These DNA markers, known as Miniplexes, utilize primers that have shorter amplicons for use in short tandem repeat (STR) analysis of degraded DNA. In our work we have defined six of these new STR multiplexes, each of which consists of 3 to 4 reduced size STR loci, and each labeled with a different fluorescent dye. When compared to commercially available STR systems, reductions in size of up to 300 basepairs are possible. In addition, these newly designed amplicons consist of loci that are fully compatible with the national computer DNA database known as CODIS. To demonstrate compatibility with commercial STR kits, a concordance study of 532 DNA samples of Caucasian, African American, and Hispanic origin was undertaken. There was 99.77% concordance between allele calls with the two methods. Of these 532 samples, only 15 samples showed discrepancies at one of 12 loci. These occurred predominantly at 2 loci, vWA and D13S317. DNA sequencing revealed that these locations had deletions between the two primer binding sites. Uncommon deletions like these can be expected in certain samples and will not affect the utility of the Miniplexes as tools for degraded DNA analysis. The Miniplexes were also applied to enzymatically digested DNA to assess their potential in degraded DNA analysis. The results demonstrated a greatly improved efficiency in the analysis of degraded DNA when compared to commercial STR genotyping kits. A series of human skeletal remains that had been exposed to a variety of environmental conditions were also examined. Sixty-four percent of the samples generated full profiles when amplified with the Miniplexes, while only sixteen percent of the samples tested generated full profiles with a commercial kit. In addition, complete profiles were obtained for eleven of the twelve Miniplex loci which had amplicon size ranges less than 200 base pairs. 
These data clearly demonstrate that smaller PCR amplicons provide an attractive alternative to mitochondrial DNA for forensic analysis of degraded DNA.
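The 99.77% concordance figure quoted above can be reproduced from the other numbers in the abstract under one reading, which is an assumption here: all 532 samples were compared at all 12 loci, and each of the 15 discordant samples differed at exactly one locus. The short Python sketch below works through that arithmetic; the function name is illustrative.

```python
def concordance(samples, loci, discordant_comparisons):
    """Fraction of sample-by-locus comparisons on which the two methods agree."""
    comparisons = samples * loci
    return 1 - discordant_comparisons / comparisons


# Assumed reading of the abstract: 532 samples x 12 loci, 15 discordant comparisons.
total = 532 * 12
rate = concordance(532, 12, 15)
print(total, f"{rate:.2%}")  # 6384 comparisons, 99.77%
```

Counting per allele call instead (two alleles per locus) would give a different denominator, so the per-locus reading is simply the one that matches the published figure.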
Identification of Susceptibility Loci and Genes for Colorectal Cancer Risk.
Zeng, Chenjie; Matsuda, Koichi; Jia, Wei-Hua; Chang, Jiang; Kweon, Sun-Seog; Xiang, Yong-Bing; Shin, Aesun; Jee, Sun Ha; Kim, Dong-Hyun; Zhang, Ben; Cai, Qiuyin; Guo, Xingyi; Long, Jirong; Wang, Nan; Courtney, Regina; Pan, Zhi-Zhong; Wu, Chen; Takahashi, Atsushi; Shin, Min-Ho; Matsuo, Keitaro; Matsuda, Fumihiko; Gao, Yu-Tang; Oh, Jae Hwan; Kim, Soriul; Jung, Keum Ji; Ahn, Yoon-Ok; Ren, Zefang; Li, Hong-Lan; Wu, Jie; Shi, Jiajun; Wen, Wanqing; Yang, Gong; Li, Bingshan; Ji, Bu-Tian; Brenner, Hermann; Schoen, Robert E; Küry, Sébastien; Gruber, Stephen B; Schumacher, Fredrick R; Stenzel, Stephanie L; Casey, Graham; Hopper, John L; Jenkins, Mark A; Kim, Hyeong-Rok; Jeong, Jin-Young; Park, Ji Won; Tajima, Kazuo; Cho, Sang-Hee; Kubo, Michiaki; Shu, Xiao-Ou; Lin, Dongxin; Zeng, Yi-Xin; Zheng, Wei
2016-06-01
Known genetic factors explain only a small fraction of genetic variation in colorectal cancer (CRC). We conducted a genome-wide association study to identify risk loci for CRC. This discovery stage included 8027 cases and 22,577 controls of East-Asian ancestry. Promising variants were evaluated in studies including as many as 11,044 cases and 12,047 controls. Tumor-adjacent normal tissues from 188 patients were analyzed to evaluate correlations of risk variants with expression levels of nearby genes. Potential functionality of risk variants was evaluated using public genomic and epigenomic databases. We identified 4 loci associated with CRC risk; P values for the most significant variant in each locus ranged from 3.92 × 10^-8 to 1.24 × 10^-12: 6p21.1 (rs4711689), 8q23.3 (rs2450115, rs6469656), 10q24.3 (rs4919687), and 12p13.3 (rs11064437). We also identified 2 risk variants at loci previously associated with CRC: 10q25.2 (rs10506868) and 20q13.3 (rs6061231). These risk variants, conferring an approximate 10%-18% increase in risk per allele, are located either inside or near protein-coding genes that include transcription factor EB (lysosome biogenesis and autophagy), eukaryotic translation initiation factor 3, subunit H (initiation of translation), cytochrome P450, family 17, subfamily A, polypeptide 1 (steroidogenesis), splA/ryanodine receptor domain and SOCS box containing 2 (proteasome degradation), and ribosomal protein S2 (ribosome biogenesis). Gene expression analyses showed a significant association (P < .05) for rs4711689 with transcription factor EB, rs6469656 with eukaryotic translation initiation factor 3, subunit H, rs11064437 with splA/ryanodine receptor domain and SOCS box containing 2, and rs6061231 with ribosomal protein S2. 
We identified susceptibility loci and genes associated with CRC risk, linking CRC predisposition to steroid hormone, protein synthesis and degradation, and autophagy pathways and providing added insight into the mechanism of CRC pathogenesis. Copyright © 2016 AGA Institute. Published by Elsevier Inc. All rights reserved.
Yan, Xiuqin; Zhang, Xue; Lu, Min; He, Yong; An, Huaming
2015-04-25
Rosa roxburghii Tratt. is a well-known ornamental rose species native to China. In addition, the fruits of this species are valued for their nutritional and medicinal characteristics, especially their high ascorbic acid (AsA) levels. Nevertheless, AsA biosynthesis in R. roxburghii fruit has not been explored in detail because of a lack of genomic resources for this species. High-throughput transcriptomic sequencing generating large volumes of transcript sequence data can aid in gene discovery and molecular marker development. In this study, we generated more than 53 million clean reads using Illumina paired-end sequencing technology. De novo assembly yielded 106,590 unigenes, with an average length of 343 bp. On the basis of sequence similarity to known proteins, 9301 and 2393 unigenes were classified into Gene Ontology and Clusters of Orthologous Group categories, respectively. There were 7480 unigenes assigned to 124 pathways in the Kyoto Encyclopedia of Gene and Genome pathway database. BLASTx searches identified 498 unique putative transcripts encoding various transcription factors, some known to regulate fruit development. qRT-PCR validated the expressions of most of the genes encoding the main enzymes involved in ascorbate biosynthesis. In addition, 9131 potential simple sequence repeat (SSR) loci were identified among the unigenes. One hundred and two primer pairs were synthesized and 71 pairs produced an amplification product during initial screening. Among the amplified products, 30 were polymorphic in the 16 R. roxburghii germplasms tested. Our study was the first to produce a large volume of transcriptome data from R. roxburghii. The resulting sequence collection is a valuable resource for gene discovery and marker-assisted selective breeding in this rose species. Copyright © 2015 Elsevier B.V. All rights reserved.
USDA-ARS's Scientific Manuscript database
Tomato Functional Genomics Database (TFGD; http://ted.bti.cornell.edu) provides a comprehensive systems biology resource to store, mine, analyze, visualize and integrate large-scale tomato functional genomics datasets. The database is expanded from the previously described Tomato Expression Database...
Mutagenicity and carcinogenicity databases are crucial resources for toxicologists and regulators involved in chemicals risk assessment. Until recently, existing public toxicity databases have been constructed primarily as "look-up-tables" of existing data, and most often did no...
ERIC Educational Resources Information Center
Blackwell, Michael Lind
This study evaluates the "Education Resources Information Center" (ERIC), "Library and Information Science Abstracts" (LISA), and "Library Literature" (LL) databases, determining how long the databases take to enter records (indexing delay), how much duplication of effort exists among the three databases (indexing…
McQuilton, Peter; Gonzalez-Beltran, Alejandra; Rocca-Serra, Philippe; Thurston, Milo; Lister, Allyson; Maguire, Eamonn; Sansone, Susanna-Assunta
2016-01-01
BioSharing (http://www.biosharing.org) is a manually curated, searchable portal of three linked registries. These resources cover standards (terminologies, formats and models, and reporting guidelines), databases, and data policies in the life sciences, broadly encompassing the biological, environmental and biomedical sciences. Launched in 2011 and built by the same core team as the successful MIBBI portal, BioSharing harnesses community curation to collate and cross-reference resources across the life sciences from around the world. BioSharing makes these resources findable and accessible (the core of the FAIR principle). Every record is designed to be interlinked, providing a detailed description not only of the resource itself, but also of its relations with other life science infrastructures. Serving a variety of stakeholders, BioSharing cultivates a growing community, to which it offers diverse benefits. It is a resource for funding bodies and journal publishers to navigate the metadata landscape of the biological sciences; an educational resource for librarians and information advisors; a publicising platform for standard and database developers/curators; and a research tool for bench and computer scientists to plan their work. BioSharing is working with an increasing number of journals and other registries, for example linking standards and databases to training material and tools. Driven by an international Advisory Board, the BioSharing user-base has grown by over 40% (by unique IP address) in the last year, thanks to successful engagement with researchers, publishers, librarians, developers and other stakeholders via several routes, including a joint RDA/Force11 working group and a collaboration with the International Society for Biocuration. In this article, we describe BioSharing, with a particular focus on community-led curation. Database URL: https://www.biosharing.org. © The Author(s) 2016. Published by Oxford University Press.
ERIC Educational Resources Information Center
RESNA: Association for the Advancement of Rehabilitation Technology, Washington, DC.
This resource directory provides a selective listing of electronic networks, online databases, and bulletin boards that highlight technology-related services and products. For each resource, the following information is provided: name, address, and telephone number; description; target audience; hardware/software needs to access the system;…
USDA-ARS's Scientific Manuscript database
The Cool Season Food Legume Genome database (CSFL, www.coolseasonfoodlegume.org) is an online resource for genomics, genetics, and breeding research for chickpea, lentil, pea, and faba bean. The user-friendly and curated website allows for all publicly available map, marker, trait, gene, transcript, ger...
[Data validation methods and discussion on Chinese materia medica resource survey].
Zhang, Yue; Ma, Wei-Feng; Zhang, Xiao-Bo; Zhu, Shou-Dong; Guo, Lan-Ping; Wang, Xing-Xing
2013-07-01
Since the fourth national survey of Chinese materia medica resources began, 22 provinces have conducted pilot surveys. The survey teams have reported a very large volume of data, which places high demands on the construction of the database system. To ensure data quality, the data in the database system must be checked and validated. Data validation is an important method for ensuring the validity, integrity and accuracy of census data. This paper comprehensively introduces the data validation system of the database system for the fourth national survey of Chinese materia medica resources, and further improves the design ideas and programs of data validation. The purpose of this study is to help the survey work proceed smoothly.
Conversion of a traditional image archive into an image resource on compact disc.
Andrew, S M; Benbow, E W
1997-01-01
A traditional archive of pathology images organised on 35 mm slides was converted into a database of images stored on compact disc (CD-ROM), and textual descriptions were added to each image record. Students on a didactic pathology course found this resource useful as an aid to revision, despite relative computer illiteracy, and it is anticipated that students on a new problem based learning course, which incorporates experience with information technology, will benefit even more readily when they use the database as an educational resource. A text and image database on CD-ROM can be updated repeatedly, and the content manipulated to reflect the content and style of the courses it supports. PMID:9306931
2010-01-01
Background Quantitative models of biochemical and cellular systems are used to answer a variety of questions in the biological sciences. The number of published quantitative models is growing steadily thanks to increasing interest in the use of models as well as the development of improved software systems and the availability of better, cheaper computer hardware. To maximise the benefits of this growing body of models, the field needs centralised model repositories that will encourage, facilitate and promote model dissemination and reuse. Ideally, the models stored in these repositories should be extensively tested and encoded in community-supported and standardised formats. In addition, the models and their components should be cross-referenced with other resources in order to allow their unambiguous identification. Description BioModels Database http://www.ebi.ac.uk/biomodels/ is aimed at addressing exactly these needs. It is a freely-accessible online resource for storing, viewing, retrieving, and analysing published, peer-reviewed quantitative models of biochemical and cellular systems. The structure and behaviour of each simulation model distributed by BioModels Database are thoroughly checked; in addition, model elements are annotated with terms from controlled vocabularies as well as linked to relevant data resources. Models can be examined online or downloaded in various formats. Reaction network diagrams generated from the models are also available in several formats. BioModels Database also provides features such as online simulation and the extraction of components from large scale models into smaller submodels. Finally, the system provides a range of web services that external software systems can use to access up-to-date data from the database. Conclusions BioModels Database has become a recognised reference resource for systems biology. 
It is being used by the community in a variety of ways; for example, it is used to benchmark different simulation systems, and to study the clustering of models based upon their annotations. Model deposition to the database today is advised by several publishers of scientific journals. The models in BioModels Database are freely distributed and reusable; the underlying software infrastructure is also available from SourceForge https://sourceforge.net/projects/biomodels/ under the GNU General Public License. PMID:20587024
Marine and Hydrokinetic Data | Geospatial Data Science | NREL
... wave energy resource using a 51-month Wavewatch III hindcast database developed by the National ... Database ... The U.S. Department of Energy's Marine and Hydrokinetic Technology Database provides information ... database includes wave, tidal, current, and ocean thermal energy and contains information about energy ...
NASA Technical Reports Server (NTRS)
Baldwin, John; Zendejas, Silvino; Gutheinz, Sandy; Borden, Chester; Wang, Yeou-Fang
2009-01-01
Mission and Assets Database (MADB) Version 1.0 is an SQL database system with a Web user interface to centralize information. The database stores flight project support resource requirements, view periods, antenna information, schedule, and forecast results for use in mid-range and long-term planning of Deep Space Network (DSN) assets.
Complex Genetics of Behavior: BXDs in the Automated Home-Cage.
Loos, Maarten; Verhage, Matthijs; Spijker, Sabine; Smit, August B
2017-01-01
This chapter describes a use case for the genetic dissection and automated analysis of complex behavioral traits using the genetically diverse panel of BXD mouse recombinant inbred strains. Strains of the BXD resource differ widely in terms of gene and protein expression in the brain, as well as in their behavioral repertoire. A large mouse resource opens the possibility of studies to find the genes underlying distinct behavioral phenotypes; however, such a resource poses a challenge in behavioral phenotyping. To address the specifics of large-scale screening, we describe (1) how to assess mouse behavior systematically across a large genetic cohort, (2) how to dissect automation-derived longitudinal mouse behavior into quantitative parameters, and (3) how to map these quantitative traits to the genome, deriving loci underlying aspects of behavior.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Laramore, G.E.; Griffin, B.R.; Spence, A.
The purpose of this work is to establish and maintain a database for patients from the United States who have received BNCT in Japan for malignant gliomas of the brain. This database will serve as a resource for the DOE to aid in decisions relating to BNCT research in the United States, as well as assisting the design and implementation of clinical trials of BNCT for brain cancer patients in this country. The database will also serve as an information resource for patients with brain tumors and their families who are considering this form of therapy.
Incorporation of a Chemical Equilibrium Equation of State into LOCI-Chem
NASA Technical Reports Server (NTRS)
Cox, Carey F.
2005-01-01
Renewed interest in development of advanced high-speed transport, reentry vehicles and propulsion systems has led to a resurgence of research into high-speed aerodynamics. As this flow regime is typically dominated by hot reacting gaseous flow, efficient models for the characteristic chemical activity are necessary for accurate and cost-effective analysis and design of aerodynamic vehicles that transit this regime. The LOCI-Chem code recently developed by Ed Luke at Mississippi State University for NASA/MSFC and used by NASA/MSFC and SSC represents an important step in providing an accurate, efficient computational tool for the simulation of reacting flows through the use of finite-rate kinetics [3]. Finite-rate chemistry, however, requires the solution of N-1 additional species mass conservation equations with source terms involving reaction kinetics that are not fully understood. In the equilibrium limit, where the reaction rates approach infinity, these equations become very stiff. By assuming local chemical equilibrium, the set of governing equations is reduced back to the usual gas dynamic equations, and thus requires less computation, while still allowing for the inclusion of reacting flow phenomenology. The incorporation of a chemical equilibrium equation of state module into the LOCI-Chem code was the primary objective of the current research. The major goals of the project were: (1) the development of a chemical equilibrium composition solver, and (2) the incorporation of the chemical equilibrium solver into LOCI-Chem. Due to time and resource constraints, code optimization was not considered unless it was important to the proper functioning of the code.
Adaptive Genetic Divergence along Narrow Environmental Gradients in Four Stream Insects
Watanabe, Kozo; Kazama, So; Omura, Tatsuo; Monaghan, Michael T.
2014-01-01
A central question linking ecology with evolutionary biology is how environmental heterogeneity can drive adaptive genetic divergence among populations. We examined adaptive divergence of four stream insects from six adjacent catchments in Japan by combining field measures of habitat and resource components with genome scans of non-neutral Amplified Fragment Length Polymorphism (AFLP) loci. Neutral genetic variation was used to measure gene flow and non-neutral genetic variation was used to test for adaptive divergence. We identified the environmental characteristics contributing to divergence by comparing genetic distances at non-neutral loci between sites with Euclidean distances for each of 15 environmental variables. Comparisons were made using partial Mantel tests to control for geographic distance. In all four species, we found strong evidence for non-neutral divergence along environmental gradients at between 6 and 21 loci per species. The relative contribution of these environmental variables to each species' ecological niche was quantified as the specialization index, S, based on ecological data. In each species, the variable most significantly correlated with genetic distance at non-neutral loci was the same variable along which each species was most narrowly distributed (i.e., highest S). These were gradients of elevation (two species), chlorophyll-a, and ammonia-nitrogen. This adaptive divergence occurred in the face of ongoing gene flow (FST = 0.01–0.04), indicating that selection was strong enough to overcome homogenization at the landscape scale. Our results suggest that adaptive divergence is pronounced, occurs along different environmental gradients for different species, and may consistently occur along the narrowest components of species' niche. PMID:24681871
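The partial Mantel comparison described in this abstract can be illustrated with a minimal sketch. It assumes one common formulation, in which the statistic is the partial correlation between genetic and environmental distances given geographic distance, and significance comes from permuting site labels of the genetic matrix. The abstract does not state which software or variant the authors used, and the six-site data and all function names below are hypothetical.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairs(matrix, order):
    """Distances between all distinct site pairs, with sites relabelled by `order`."""
    return [matrix[order[i]][order[j]]
            for i, j in combinations(range(len(order)), 2)]


def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def partial_mantel(gen, env, geo, permutations=999, seed=0):
    """One-sided partial Mantel test of gen vs env, controlling for geo."""
    identity = list(range(len(gen)))
    y, z = pairs(env, identity), pairs(geo, identity)
    observed = partial_r(pairs(gen, identity), y, z)
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute rows and columns of gen together
        if partial_r(pairs(gen, order), y, z) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical data for six sites (not taken from the study).
env_values = [0.0, 2.0, 1.0, 4.0, 3.0, 6.0]
n = len(env_values)
env = [[abs(a - b) for b in env_values] for a in env_values]
geo = [[abs(i - j) for j in range(n)] for i in range(n)]
gen = [[0.1 * env[i][j] + 0.02 * geo[i][j] + 0.01 * ((i * j) % 3)
        for j in range(n)] for i in range(n)]

r, p = partial_mantel(gen, env, geo)
print(f"partial Mantel r = {r:.3f}, p = {p:.3f}")
```

Permuting whole sites rather than individual distances preserves the dependence among entries of a distance matrix, which is why ordinary correlation p-values are not used here.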
Noninvasive genome sampling in chimpanzees.
Kohn, Michael H
2010-12-01
The inevitable has happened: genomic technologies have been added to our noninvasive genetic sampling repertoire. In this issue of Molecular Ecology, Perry et al. (2010) demonstrate how DNA extraction from chimpanzee faeces, followed by a series of steps to enrich for target loci, can be coupled with next-generation sequencing. These authors collected sequence and single-nucleotide polymorphism (SNP) data at more than 600 genomic loci (chromosome 21 and the X) and the complete mitochondrial DNA. By design, each locus was 'deep sequenced' to enable SNP identification. To demonstrate the reliability of their data, the work included samples from six captive chimps, which allowed for a comparison between presumably genuine SNPs obtained from blood and potentially flawed SNPs deduced from faeces. Thus, with this method, anyone with the resources, skills and ambition to do genome sequencing of wild, elusive, or protected mammals can enjoy all of the benefits of noninvasive sampling. © 2010 Blackwell Publishing Ltd.
SplicePlot: a utility for visualizing splicing quantitative trait loci.
Wu, Eric; Nance, Tracy; Montgomery, Stephen B
2014-04-01
RNA sequencing has provided unprecedented resolution of alternative splicing and splicing quantitative trait loci (sQTL). However, there are few tools available for visualizing the genotype-dependent effects of splicing at a population level. SplicePlot is a simple command line utility that produces intuitive visualization of sQTLs and their effects. SplicePlot takes mapped RNA sequencing reads in BAM format and genotype data in VCF format as input and outputs publication-quality Sashimi plots, hive plots and structure plots, enabling better investigation and understanding of the role of genetics on alternative splicing and transcript structure. Source code and detailed documentation are available at http://montgomerylab.stanford.edu/spliceplot/index.html under Resources and at Github. SplicePlot is implemented in Python and is supported on Linux and Mac OS. A VirtualBox virtual machine running Ubuntu with SplicePlot already installed is also available.
Jairin, Jirapong; Kobayashi, Tetsuya; Yamagata, Yoshiyuki; Sanada-Morimura, Sachiyo; Mori, Kazuki; Tashiro, Kosuke; Kuhara, Satoru; Kuwazaki, Seigo; Urio, Masahiro; Suetsugu, Yoshitaka; Yamamoto, Kimiko; Matsumura, Masaya; Yasui, Hideshi
2013-01-01
In this study, we developed the first genetic linkage map for the major rice insect pest, the brown planthopper (BPH, Nilaparvata lugens). The linkage map was constructed by integrating linkage data from two backcross populations derived from three inbred BPH strains. The consensus map consists of 474 simple sequence repeats, 43 single-nucleotide polymorphisms, and 1 sequence-tagged site, for a total of 518 markers at 472 unique positions in 17 linkage groups. The linkage groups cover 1093.9 cM, with an average distance of 2.3 cM between loci. The average number of marker loci per linkage group was 27.8. The sex-linkage group was identified by exploiting X-linked and Y-specific markers. Our linkage map and the newly developed markers used to create it constitute an essential resource and a useful framework for future genetic analyses in BPH. PMID:23204257
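The summary figures in this abstract can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted above; the interpretation that the per-group average counts unique positions rather than all markers is an assumption drawn from the arithmetic, not something the abstract states.

```python
# Figures quoted in the abstract
ssr, snp, sts = 474, 43, 1
unique_positions = 472
linkage_groups = 17
map_length_cm = 1093.9

total_markers = ssr + snp + sts                       # 518 markers in total
mean_spacing_cm = map_length_cm / unique_positions    # about 2.3 cM between loci
mean_positions_per_group = unique_positions / linkage_groups  # about 27.8

# Assumption: the quoted 27.8 loci per group matches unique positions (472 / 17),
# whereas dividing all 518 markers by 17 groups would give about 30.5.
mean_markers_per_group = total_markers / linkage_groups

print(total_markers, round(mean_spacing_cm, 1),
      round(mean_positions_per_group, 1), round(mean_markers_per_group, 1))
```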
Chery, Joyce G; Sass, Chodon; Specht, Chelsea D
2017-09-01
We developed a bioinformatic pipeline that leverages a publicly available genome and published transcriptomes to design primers in conserved coding sequences flanking targeted introns of single-copy nuclear loci. Paullinieae (Sapindaceae) is used to demonstrate the pipeline. Transcriptome reads phylogenetically closer to the lineage of interest are aligned to the closest genome. Single-nucleotide polymorphisms are called, generating a "pseudoreference" closer to the lineage of interest. Several filters are applied to meet the criteria of single-copy nuclear loci with introns of a desired size. Primers are designed in conserved coding sequences flanking introns. Using this pipeline, we developed nine single-copy nuclear intron markers for Paullinieae. This pipeline is highly flexible and can be used for any group with available genomic and transcriptomic resources. This pipeline led to the development of nine variable markers for phylogenetic study without generating sequence data de novo.
Jin, Yuqing; Bi, Quanxin; Guan, Wenbin; Mao, Jian-Feng
2015-09-01
Metasequoia glyptostroboides is an endangered relict conifer species endemic to China. In this study, expressed sequence tag-simple sequence repeat (EST-SSR) markers were developed using transcriptome mining for future genetic and functional studies. We collected 97,565 unigene sequences generated by 454 pyrosequencing. A bioinformatics analysis identified 2087 unique and putative microsatellites, from which 96 novel microsatellite markers were developed. Fifty-three of the 96 primer sets successfully amplified clear fragments of the expected sizes; 23 of those loci were polymorphic. The number of alleles per locus ranged from two to eight, with an average of three, and the observed and expected heterozygosity values ranged from 0 to 1.0 and 0.117 to 0.813, respectively. These microsatellite loci will enrich the genetic resources to develop functional studies and conservation strategies for this endangered relict species.
Jin, Yuqing; Bi, Quanxin; Guan, Wenbin; Mao, Jian-Feng
2015-01-01
Premise of the study: Metasequoia glyptostroboides is an endangered relict conifer species endemic to China. In this study, expressed sequence tag–simple sequence repeat (EST-SSR) markers were developed using transcriptome mining for future genetic and functional studies. Methods and Results: We collected 97,565 unigene sequences generated by 454 pyrosequencing. A bioinformatics analysis identified 2087 unique and putative microsatellites, from which 96 novel microsatellite markers were developed. Fifty-three of the 96 primer sets successfully amplified clear fragments of the expected sizes; 23 of those loci were polymorphic. The number of alleles per locus ranged from two to eight, with an average of three, and the observed and expected heterozygosity values ranged from 0 to 1.0 and 0.117 to 0.813, respectively. Conclusions: These microsatellite loci will enrich the genetic resources to develop functional studies and conservation strategies for this endangered relict species. PMID:26421250
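For readers unfamiliar with the heterozygosity statistics reported in this abstract, the following minimal sketch shows how observed and expected heterozygosity are commonly computed for a single diploid locus. The genotypes are hypothetical, and the abstract does not say whether the authors applied a small-sample correction to the expected value; the uncorrected form (1 minus the sum of squared allele frequencies) is assumed here.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one diploid locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    """
    n = len(genotypes)
    # Observed: fraction of individuals carrying two different alleles.
    observed = sum(a != b for a, b in genotypes) / n
    # Expected: 1 - sum of squared allele frequencies (uncorrected form).
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * n
    expected = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return observed, expected


# Hypothetical genotypes (allele sizes in bp), not data from the study.
sample = [(150, 150), (150, 154), (154, 154), (150, 154)]
ho, he = heterozygosity(sample)
print(ho, he)
```

A locus where every individual is homozygous gives an observed value of 0, and one where every individual is heterozygous gives 1.0, which is how the reported range of 0 to 1.0 can arise.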
Castle, John C; Chalmers, Iain; Atkinson, Patricia; Badenoch, Douglas; Oxman, Andrew D; Austvoll-Dahlgren, Astrid; Nordheim, Lena; Krause, L Kendall; Schwartz, Lisa M; Woloshin, Steven; Burls, Amanda; Mosconi, Paola; Hoffmann, Tammy; Cusack, Leila; Albarqouni, Loai; Glasziou, Paul
2017-01-01
People are frequently confronted with untrustworthy claims about the effects of treatments. Uncritical acceptance of these claims can lead to poor, and sometimes dangerous, treatment decisions, and wasted time and money. Resources to help people learn to think critically about treatment claims are scarce, and they are widely scattered. Furthermore, very few learning-resources have been assessed to see if they improve knowledge and behavior. Our objectives were to develop the Critical thinking and Appraisal Resource Library (CARL). This library was to be in the form of a database containing learning resources for those who are responsible for encouraging critical thinking about treatment claims, and was to be made available online. We wished to include resources for groups we identified as 'intermediaries' of knowledge, i.e. teachers of schoolchildren, undergraduates and graduates, for example those teaching evidence-based medicine, or those communicating treatment claims to the public. In selecting resources, we wished to draw particular attention to those resources that had been formally evaluated, for example, by the creators of the resource or independent research groups. CARL was populated with learning-resources identified from a variety of sources: two previously developed but unmaintained inventories; systematic reviews of learning-interventions; online and database searches; and recommendations by members of the project group and its advisors. The learning-resources in CARL were organised by 'Key Concepts' needed to judge the trustworthiness of treatment claims, and were made available online by the James Lind Initiative in Testing Treatments interactive (TTi) English (www.testingtreatments.org/category/learning-resources). TTi English also incorporated the database of Key Concepts and the Claim Evaluation Tools developed through the Informed Healthcare Choices (IHC) project (informedhealthchoices.org). 
We have created a database of resources called CARL, which currently contains over 500 open-access learning-resources in a variety of formats: text, audio, video, webpages, cartoons, and lesson materials. These are aimed primarily at 'Intermediaries', that is, 'teachers', 'communicators', 'advisors', 'researchers', as well as for independent 'learners'. The resources included in CARL are currently accessible at www.testingtreatments.org/category/learning-resources. We hope that ready access to CARL will help to promote the critical thinking about treatment claims, needed to help improve healthcare choices.
Chalmers, Iain; Atkinson, Patricia; Badenoch, Douglas; Oxman, Andrew D.; Austvoll-Dahlgren, Astrid; Nordheim, Lena; Krause, L. Kendall; Schwartz, Lisa M.; Woloshin, Steven; Burls, Amanda; Mosconi, Paola; Hoffmann, Tammy; Cusack, Leila; Albarqouni, Loai; Glasziou, Paul
2017-01-01
Background People are frequently confronted with untrustworthy claims about the effects of treatments. Uncritical acceptance of these claims can lead to poor, and sometimes dangerous, treatment decisions, and wasted time and money. Resources to help people learn to think critically about treatment claims are scarce, and they are widely scattered. Furthermore, very few learning-resources have been assessed to see if they improve knowledge and behavior. Objectives Our objectives were to develop the Critical thinking and Appraisal Resource Library (CARL). This library was to be in the form of a database containing learning resources for those who are responsible for encouraging critical thinking about treatment claims, and was to be made available online. We wished to include resources for groups we identified as ‘intermediaries’ of knowledge, i.e. teachers of schoolchildren, undergraduates and graduates, for example those teaching evidence-based medicine, or those communicating treatment claims to the public. In selecting resources, we wished to draw particular attention to those resources that had been formally evaluated, for example, by the creators of the resource or independent research groups. Methods CARL was populated with learning-resources identified from a variety of sources: two previously developed but unmaintained inventories; systematic reviews of learning-interventions; online and database searches; and recommendations by members of the project group and its advisors. The learning-resources in CARL were organised by ‘Key Concepts’ needed to judge the trustworthiness of treatment claims, and were made available online by the James Lind Initiative in Testing Treatments interactive (TTi) English (www.testingtreatments.org/category/learning-resources). TTi English also incorporated the database of Key Concepts and the Claim Evaluation Tools developed through the Informed Healthcare Choices (IHC) project (informedhealthchoices.org). 
Results We have created a database of resources called CARL, which currently contains over 500 open-access learning-resources in a variety of formats: text, audio, video, webpages, cartoons, and lesson materials. These are aimed primarily at ‘Intermediaries’, that is, ‘teachers’, ‘communicators’, ‘advisors’, ‘researchers’, as well as for independent ‘learners’. The resources included in CARL are currently accessible at www.testingtreatments.org/category/learning-resources. Conclusions We hope that ready access to CARL will help to promote the critical thinking about treatment claims, needed to help improve healthcare choices. PMID:28738058
Veterans Administration Databases
The Veterans Administration Information Resource Center provides database and informatics experts, customer service, expert advice, information products, and web technology to VA researchers and others.
Miller, Adam D; Van Rooyen, Anthony; Sweeney, Oisín F; Whiterod, Nick S; Weeks, Andrew R
2013-07-01
The Glenelg spiny crayfish, Euastacus bispinosus, is an iconic freshwater invertebrate of south-eastern Australia and listed as 'endangered' under the Environment Protection and Biodiversity Conservation Act 1999, and 'vulnerable' under the International Union for Conservation of Nature's Red List. The species has suffered major population declines as a result of over-fishing, low environmental flows, the introduction of invasive fish species and habitat degradation. In order to develop an effective conservation strategy, patterns of gene flow, genetic structure and genetic diversity across the species' distribution need to be clearly understood. In this study we develop a suite of polymorphic microsatellite markers by next generation sequencing. A total of 15 polymorphic loci were identified and 10 characterized using 22 individuals from the lower Glenelg River. We observed low to moderate genetic variation across most loci (mean number of alleles per locus = 2.80; mean expected heterozygosity = 0.36) with no evidence of individual loci deviating significantly from Hardy-Weinberg equilibrium. Marker independence was confirmed with tests for linkage disequilibrium, and analyses indicated no evidence of null alleles across loci. Individuals from two additional sites (Crawford River, Victoria; Ewens Ponds Conservation Park, South Australia) were genotyped at all 10 loci and a preliminary investigation of genetic diversity and population structure was undertaken. Analyses indicate high levels of genetic differentiation among sample locations (FST = 0.49), while the Ewens Ponds population is genetically homogeneous, indicating a likely small founder group and ongoing inbreeding. Management actions will be needed to restore genetic diversity in this and possibly other at risk populations. These markers will provide a valuable resource for future population genetic assessments so that an effective framework can be developed for implementing conservation strategies for E. bispinosus.
The Universal Protein Resource (UniProt): an expanding universe of protein information.
Wu, Cathy H; Apweiler, Rolf; Bairoch, Amos; Natale, Darren A; Barker, Winona C; Boeckmann, Brigitte; Ferro, Serenella; Gasteiger, Elisabeth; Huang, Hongzhan; Lopez, Rodrigo; Magrane, Michele; Martin, Maria J; Mazumder, Raja; O'Donovan, Claire; Redaschi, Nicole; Suzek, Baris
2006-01-01
The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online at http://www.uniprot.org or downloaded at ftp://ftp.uniprot.org/pub/databases/.
Polvi, Anne; Linturi, Henna; Varilo, Teppo; Anttonen, Anna-Kaisa; Byrne, Myles; Fokkema, Ivo F A C; Almusa, Henrikki; Metzidis, Anthony; Avela, Kristiina; Aula, Pertti; Kestilä, Marjo; Muilu, Juha
2013-11-01
The Finnish Disease Heritage Database (FinDis) (http://findis.org) was originally published in 2004 as a centralized information resource for rare monogenic diseases enriched in the Finnish population. The FinDis database originally contained 405 causative variants for 30 diseases. At the time, the FinDis database was a comprehensive collection of data, but since then a large amount of new information has emerged, making the need to update the database evident. We collected information and updated the database to contain genes and causative variants for 35 diseases, including six more genes and more than 1,400 additional disease-causing variants. Information for causative variants for each gene is collected under the LOVD 3.0 platform, enabling easy updating. The FinDis portal provides a centralized resource and user interface to link information on each disease and gene with variant data in the LOVD 3.0 platform. The software written to achieve this has been open-sourced and made available on GitHub (http://github.com/findis-db), allowing biomedical institutions in other countries to present their national data in a similar way, and to both contribute to, and benefit from, standardized variation data. The updated FinDis portal provides a unique resource to assist patient diagnosis, research, and the development of new cures. © 2013 WILEY PERIODICALS, INC.
Database Resources of the BIG Data Center in 2018
Xu, Xingjian; Hao, Lili; Zhu, Junwei; Tang, Bixia; Zhou, Qing; Song, Fuhai; Chen, Tingting; Zhang, Sisi; Dong, Lili; Lan, Li; Wang, Yanqing; Sang, Jian; Hao, Lili; Liang, Fang; Cao, Jiabao; Liu, Fang; Liu, Lin; Wang, Fan; Ma, Yingke; Xu, Xingjian; Zhang, Lijuan; Chen, Meili; Tian, Dongmei; Li, Cuiping; Dong, Lili; Du, Zhenglin; Yuan, Na; Zeng, Jingyao; Zhang, Zhewen; Wang, Jinyue; Shi, Shuo; Zhang, Yadong; Pan, Mengyu; Tang, Bixia; Zou, Dong; Song, Shuhui; Sang, Jian; Xia, Lin; Wang, Zhennan; Li, Man; Cao, Jiabao; Niu, Guangyi; Zhang, Yang; Sheng, Xin; Lu, Mingming; Wang, Qi; Xiao, Jingfa; Zou, Dong; Wang, Fan; Hao, Lili; Liang, Fang; Li, Mengwei; Sun, Shixiang; Zou, Dong; Li, Rujiao; Yu, Chunlei; Wang, Guangyu; Sang, Jian; Liu, Lin; Li, Mengwei; Li, Man; Niu, Guangyi; Cao, Jiabao; Sun, Shixiang; Xia, Lin; Yin, Hongyan; Zou, Dong; Xu, Xingjian; Ma, Lina; Chen, Huanxin; Sun, Yubin; Yu, Lei; Zhai, Shuang; Sun, Mingyuan; Zhang, Zhang; Zhao, Wenming; Xiao, Jingfa; Bao, Yiming; Song, Shuhui; Hao, Lili; Li, Rujiao; Ma, Lina; Sang, Jian; Wang, Yanqing; Tang, Bixia; Zou, Dong; Wang, Fan
2018-01-01
Abstract The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn. PMID:29036542
DNA Identification of Skeletal Remains from World War II Mass Graves Uncovered in Slovenia
Marjanović, Damir; Durmić-Pašić, Adaleta; Bakal, Narcisa; Haverić, Sanin; Kalamujić, Belma; Kovačević, Lejla; Ramić, Jasmin; Pojskić, Naris; Škaro, Vedrana; Projić, Petar; Bajrović, Kasim; Hadžiselimović, Rifat; Drobnič, Katja; Huffine, Ed; Davoren, Jon; Primorac, Dragan
2007-01-01
Aim To present the joint effort of three institutions in the identification of human remains from World War II found in two mass graves in the area of Škofja Loka, Slovenia. Methods The remains of 27 individuals were found in two small and closely located mass graves. The DNA was isolated from bone and teeth samples using either standard phenol/chloroform alcohol extraction or an optimized Qiagen DNA extraction procedure. Some recovered samples required additional DNA purification methods, such as n-butanol treatment. The Quantifiler™ Human DNA Quantification Kit was used for DNA quantification. The PowerPlex 16 kit was used to simultaneously amplify 15 short tandem repeat (STR) loci. Matching probabilities were estimated using the DNA View program. Results Out of all processed samples, 15 remains were fully profiled at all 15 STR loci. The other 12 profiles were partial. The least successful profile included 13 loci. Also, 69 reference samples (buccal swabs) from potential living relatives were collected and profiled. Comparison of the victims' profiles against the reference sample database resulted in 4 strong matches. In addition, 5 other profiles were matched to certain reference samples with lower probability. Conclusion Our results show that more than 6 decades after the end of World War II, DNA analysis may significantly contribute to the identification of the remains from that period. Additional analysis of Y-STRs and mitochondrial DNA (mtDNA) markers will be performed in the second phase of the identification project. PMID:17696306
Mutations, mutation rates, and evolution at the hypervariable VNTR loci of Yersinia pestis.
Vogler, Amy J; Keys, Christine E; Allender, Christopher; Bailey, Ira; Girard, Jessica; Pearson, Talima; Smith, Kimothy L; Wagner, David M; Keim, Paul
2007-03-01
VNTRs are able to discriminate among closely related isolates of recently emerged clonal pathogens, including Yersinia pestis, the etiologic agent of plague, because of their great diversity. Diversity is driven largely by mutation, but little is known about VNTR mutation rates, factors affecting mutation rates, or the mutational mechanisms. The molecular epidemiological utility of VNTRs will be greatly enhanced when this foundational knowledge is available. Here, we measure mutation rates for 43 VNTR loci in Y. pestis using an in vitro generated population encompassing approximately 96,000 generations. We estimate the combined 43-locus rate and individual rates for 14 loci. A comparison of Y. pestis and Escherichia coli O157:H7 VNTR mutation rates and products revealed a similar relationship between diversity and mutation rate in these two species. Likewise, the relationship between repeat copy number and mutation rate is nearly identical between these species, suggesting a generalized relationship that may be applicable to other species. The single- versus multiple-repeat mutation ratios and the insertion versus deletion mutation ratios were also similar, providing support for a general model for the mutations associated with VNTRs. Finally, we use two small sets of Y. pestis isolates to show how this general model and our estimated mutation rates can be used to compare alternate phylogenies, and to evaluate the significance of genotype matches, near-matches, and mismatches found in empirical comparisons with a reference database.
sRNAdb: A small non-coding RNA database for gram-positive bacteria
2012-01-01
Background The class of small non-coding RNA molecules (sRNA) regulates gene expression by different mechanisms and enables bacteria to mount a physiological response due to adaptation to the environment or infection. Over the last decades the number of sRNAs has been increasing rapidly. Several databases like Rfam or fRNAdb were extended to include sRNAs as a class of its own. Furthermore new specialized databases like sRNAMap (gram-negative bacteria only) and sRNATarBase (target prediction) were established. To the best of the authors’ knowledge no database focusing on sRNAs from gram-positive bacteria is publicly available so far. Description In order to understand sRNA’s functional and phylogenetic relationships we have developed sRNAdb and provide tools for data analysis and visualization. The data compiled in our database is assembled from experiments as well as from bioinformatics analyses. The software enables comparison and visualization of gene loci surrounding the sRNAs of interest. To accomplish this, we use a client–server based approach. Offline versions of the database including analyses and visualization tools can easily be installed locally on the user’s computer. This feature facilitates customized local addition of unpublished sRNA candidates and related information such as promoters or terminators using tab-delimited files. Conclusion sRNAdb allows a user-friendly and comprehensive comparative analysis of sRNAs from available sequenced gram-positive prokaryotic replicons. Offline versions including analysis and visualization tools facilitate complex user specific bioinformatics analyses. PMID:22883983
Ingle, Danielle J; Valcanis, Mary; Kuzevski, Alex; Tauschek, Marija; Inouye, Michael; Stinear, Tim; Levine, Myron M; Robins-Browne, Roy M; Holt, Kathryn E
2016-07-01
The lipopolysaccharide (O) and flagellar (H) surface antigens of Escherichia coli are targets for serotyping that have traditionally been used to identify pathogenic lineages. These surface antigens are important for the survival of E. coli within mammalian hosts. However, traditional serotyping has several limitations, and public health reference laboratories are increasingly moving towards whole genome sequencing (WGS) to characterize bacterial isolates. Here we present a method to rapidly and accurately serotype E. coli isolates from raw, short read WGS data. Our approach bypasses the need for de novo genome assembly by directly screening WGS reads against a curated database of alleles linked to known and novel E. coli O-groups and H-types (the EcOH database) using the software package srst2. We validated the approach by comparing in silico results for 197 enteropathogenic E. coli isolates with those obtained by serological phenotyping in an independent laboratory. We then demonstrated the utility of our method to characterize isolates in public health and clinical settings, and to explore the genetic diversity of >1500 E. coli genomes from multiple sources. Importantly, we showed that transfer of O- and H-antigen loci between E. coli chromosomal backbones is common, with little evidence of constraints by host or pathotype, suggesting that E. coli 'strain space' may be virtually unlimited, even within specific pathotypes. Our findings show that serotyping is most useful when used in combination with strain genotyping to characterize microevolution events within an inferred population structure.
McIlroy, Simon Jon; Kirkegaard, Rasmus Hansen; McIlroy, Bianca; Nierychlo, Marta; Kristensen, Jannie Munk; Karst, Søren Michael; Albertsen, Mads
2017-01-01
Abstract Wastewater is increasingly viewed as a resource, with anaerobic digester technology being routinely implemented for biogas production. Characterising the microbial communities involved in wastewater treatment facilities and their anaerobic digesters is considered key to their optimal design and operation. Amplicon sequencing of the 16S rRNA gene allows high-throughput monitoring of these systems. The MiDAS field guide is a public resource providing amplicon sequencing protocols and an ecosystem-specific taxonomic database optimized for use with wastewater treatment facility samples. The curated taxonomy endeavours to provide a genus-level-classification for abundant phylotypes and the online field guide links this identity to published information regarding their ecology, function and distribution. This article describes the expansion of the database resources to cover the organisms of the anaerobic digester systems fed primary sludge and surplus activated sludge. The updated database includes descriptions of the abundant genus-level-taxa in influent wastewater, activated sludge and anaerobic digesters. Abundance information is also included to allow assessment of the role of emigration in the ecology of each phylotype. MiDAS is intended as a collaborative resource for the progression of research into the ecology of wastewater treatment, by providing a public repository for knowledge that is accessible to all interested in these biotechnologically important systems. Database URL: http://www.midasfieldguide.org PMID:28365734
ERIC Educational Resources Information Center
Kreie, Jennifer; Hashemi, Shohreh
2012-01-01
Data is a vital resource for businesses; therefore, it is important for businesses to manage and use their data effectively. Because of this, businesses value college graduates with an understanding of and hands-on experience working with databases, data warehouses and data analysis theories and tools. Faculty in many business disciplines try to…
DSSTox: New On-line Resource for Publishing Structure-Standardized Toxicity Databases
Ann M Richard1, Jamie Burch2, ClarLynda Williams3
1Natl. Health and Environ. Effects Res. Lab, US EPA, Res. Triangle Park, NC 27711; 2EPA-NC Central Univ Student COOP, US EPA, Res. Tri...
NetMap: a new tool in support of watershed science and resource management.
L. Benda; D. Miller; K. Andras; P. Bigelow; G. Reeves; D. Michael
2007-01-01
In this paper, we show how application of principles of river ecology can guide use of a comprehensive terrain database within geographic information system (GIS) to facilitate watershed analysis relevant to natural resource management. We present a unique arrangement of a terrain database, GIS, and principles of riverine ecology for the purpose of advancing watershed...
Mapping PDB chains to UniProtKB entries.
Martin, Andrew C R
2005-12-01
UniProtKB/SwissProt is the main resource for detailed annotations of protein sequences. This database provides a jumping-off point to many other resources through the links it provides. Among others, these include other primary databases, secondary databases, the Gene Ontology and OMIM. While a large number of links are provided to Protein Data Bank (PDB) files, obtaining a regularly updated mapping between UniProtKB entries and PDB entries at the chain or residue level is not straightforward. In particular, there is no regularly updated resource which allows a UniProtKB/SwissProt entry to be identified for a given residue of a PDB file. We have created a completely automatically maintained database which maps PDB residues to residues in UniProtKB/SwissProt and UniProtKB/TrEMBL entries. The protocol uses links from PDB to UniProtKB, from UniProtKB to PDB and a brute-force sequence scan to resolve PDB chains for which no annotated link is available. Finally the sequences from PDB and UniProtKB are aligned to obtain a residue-level mapping. The resource may be queried interactively or downloaded from http://www.bioinf.org.uk/pdbsws/.
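The final step described in this abstract, aligning a PDB chain sequence with a UniProtKB sequence to obtain a residue-level mapping, can be sketched as follows. This is only an illustrative Python outline, not the resource's actual protocol: it uses a plain Needleman-Wunsch global alignment with hypothetical scoring values, hypothetical function names and made-up sequences, and it numbers residues sequentially from 1 rather than using real PDB residue numbering.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def residue_map(pdb_seq, uniprot_seq):
    """Map 1-based positions in pdb_seq to 1-based positions in uniprot_seq."""
    aligned_pdb, aligned_up = global_align(pdb_seq, uniprot_seq)
    mapping = {}
    pdb_pos = up_pos = 0
    for x, y in zip(aligned_pdb, aligned_up):
        if x != "-":
            pdb_pos += 1
        if y != "-":
            up_pos += 1
        if x != "-" and y != "-":
            mapping[pdb_pos] = up_pos  # both residues present in this column
    return mapping


# Hypothetical sequences: the chain covers residues 2-6 of the full entry.
print(residue_map("KVFGR", "MKVFGRCE"))
```

In practice a substitution matrix and affine gap penalties would normally replace the flat scores used here; the sketch only shows how an alignment turns into a position-to-position lookup.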
Hendrickx, Diana M; Boyles, Rebecca R; Kleinjans, Jos C S; Dearry, Allen
2014-12-01
A joint US-EU workshop on enhancing data sharing and exchange in toxicogenomics was held at the National Institute of Environmental Health Sciences. Currently, efficient reuse of data is hampered by problems related to public data availability, data quality, database interoperability (the ability to exchange information), standardization and sustainability. At the workshop, experts from universities and research institutes presented databases, studies, organizations and tools that attempt to deal with these problems. Furthermore, a case study showing that combining toxicogenomics data from multiple resources leads to more accurate predictions in risk assessment was presented. All participants agreed that there is a need for a web portal describing the diverse, heterogeneous data resources relevant for toxicogenomics research. Furthermore, there was agreement that linking more data resources would improve toxicogenomics data analysis. To outline a roadmap to enhance interoperability between data resources, the participants recommend collecting user stories from the toxicogenomics research community on barriers in data sharing and exchange that currently hamper answering certain research questions. These user stories may guide the prioritization of steps to be taken for enhancing integration of toxicogenomics databases.
Hamilton, John P; Neeno-Eckwall, Eric C; Adhikari, Bishwo N; Perna, Nicole T; Tisserat, Ned; Leach, Jan E; Lévesque, C André; Buell, C Robin
2011-01-01
The Comprehensive Phytopathogen Genomics Resource (CPGR) provides a web-based portal for plant pathologists and diagnosticians to view the genome and transcriptome sequence status of 806 bacterial, fungal, oomycete, nematode, viral and viroid plant pathogens. Tools are available to search and analyze annotated genome sequences of 74 bacterial, fungal and oomycete pathogens. Oomycete and fungal genomes are obtained directly from GenBank, whereas bacterial genome sequences are downloaded from the A Systematic Annotation Package (ASAP) database that provides curation of genomes using comparative approaches. Curated lists of bacterial genes relevant to pathogenicity and avirulence are also provided. The Plant Pathogen Transcript Assemblies Database provides annotated assemblies of the transcribed regions of 82 eukaryotic genomes from publicly available single pass Expressed Sequence Tags. Data-mining tools are provided along with tools to create candidate diagnostic markers, an emerging use for genomic sequence data in plant pathology. The Plant Pathogen Ribosomal DNA (rDNA) database is a resource for pathogens that lack genome or transcriptome data sets and contains 131 755 rDNA sequences from GenBank for 17 613 species identified as plant pathogens and related genera. Database URL: http://cpgr.plantbiology.msu.edu.
FReD: The Floral Reflectance Database — A Web Portal for Analyses of Flower Colour
Savolainen, Vincent; McOwan, Peter W.; Chittka, Lars
2010-01-01
Background Flower colour is of great importance in various fields relating to floral biology and pollinator behaviour. However, subjective human judgements of flower colour may be inaccurate and are irrelevant to the ecology and vision of the flower's pollinators. For precise, detailed information about the colours of flowers, a full reflectance spectrum for the flower of interest should be used rather than relying on such human assessments. Methodology/Principal Findings The Floral Reflectance Database (FReD) has been developed to make an extensive collection of such data available to researchers. It is freely available at http://www.reflectance.co.uk. The database allows users to download spectral reflectance data for flower species collected from all over the world. These could, for example, be used in modelling interactions between pollinator vision and plant signals, or analyses of flower colours in various habitats. The database contains functions for calculating flower colour loci according to widely-used models of bee colour space, reflectance graphs of the spectra and an option to search for flowers with similar colours in bee colour space. Conclusions/Significance The Floral Reflectance Database is a valuable new tool for researchers interested in the colours of flowers and their association with pollinator colour vision, containing raw spectral reflectance data for a large number of flower species. PMID:21170326
Kamitsuji, Shigeo; Matsuda, Takashi; Nishimura, Koichi; Endo, Seiko; Wada, Chisa; Watanabe, Kenji; Hasegawa, Koichi; Hishigaki, Haretsugu; Masuda, Masatoshi; Kuwahara, Yusuke; Tsuritani, Katsuki; Sugiura, Kenkichi; Kubota, Tomoko; Miyoshi, Shinji; Okada, Kinya; Nakazono, Kazuyuki; Sugaya, Yuki; Yang, Woosung; Sawamoto, Taiji; Uchida, Wataru; Shinagawa, Akira; Fujiwara, Tsutomu; Yamada, Hisaharu; Suematsu, Koji; Tsutsui, Naohisa; Kamatani, Naoyuki; Liou, Shyh-Yuh
2015-06-01
The Japan Pharmacogenomics Data Science Consortium (JPDSC) has assembled a database for conducting pharmacogenomics (PGx) studies in Japanese subjects. The database contains the genotypes of 2.5 million single-nucleotide polymorphisms (SNPs) and 5 human leukocyte antigen loci from 2994 Japanese healthy volunteers, as well as 121 kinds of clinical information, including self-reports, physiological data, hematological data and biochemical data. In this article, the reliability of our data was evaluated by principal component analysis (PCA) and association analysis for hematological and biochemical traits by using genome-wide SNP data. PCA of the SNPs showed that all the samples were collected from the Japanese population and that the samples were separated into two major clusters by birthplace, Okinawa and other than Okinawa, as had been previously reported. Among 87 SNPs that have been reported to be associated with 18 hematological and biochemical traits in genome-wide association studies (GWAS), the associations of 56 SNPs were replicated using our database. Statistical power simulations showed that the sample size of the JPDSC control database is large enough to detect genetic markers having a relatively strong association even when the case sample size is small. The JPDSC database will be useful as control data for conducting PGx studies to explore genetic markers to improve the safety and efficacy of drugs either during clinical development or in post-marketing.
Assembly: a resource for assembled genomes at NCBI
Kitts, Paul A.; Church, Deanna M.; Thibaud-Nissen, Françoise; Choi, Jinna; Hem, Vichet; Sapojnikov, Victor; Smith, Robert G.; Tatusova, Tatiana; Xiang, Charlie; Zherikov, Andrey; DiCuccio, Michael; Murphy, Terence D.; Pruitt, Kim D.; Kimchi, Avi
2016-01-01
The NCBI Assembly database (www.ncbi.nlm.nih.gov/assembly/) provides stable accessioning and data tracking for genome assembly data. The model underlying the database can accommodate a range of assembly structures, including sets of unordered contig or scaffold sequences, bacterial genomes consisting of a single complete chromosome, or complex structures such as a human genome with modeled allelic variation. The database provides an assembly accession and version to unambiguously identify the set of sequences that make up a particular version of an assembly, and tracks changes to updated genome assemblies. The Assembly database reports metadata such as assembly names, simple statistical reports of the assembly (number of contigs and scaffolds, contiguity metrics such as contig N50, total sequence length and total gap length) as well as the assembly update history. The Assembly database also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Consortium (INSDC) and the assembly represented in the NCBI RefSeq project. Users can find assemblies of interest by querying the Assembly Resource directly or by browsing available assemblies for a particular organism. Links in the Assembly Resource allow users to easily download sequence and annotations for current versions of genome assemblies from the NCBI genomes FTP site. PMID:26578580
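The contiguity metric named in this abstract, contig N50, can be illustrated with a short Python sketch. It uses the common definition (the length L such that contigs of length at least L together contain at least half of the assembled bases); the function name and the contig lengths are hypothetical, and this is not code from the Assembly resource itself.

```python
def contig_n50(lengths):
    """Return the N50 of a list of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical assembly of seven contigs totalling 300 bases.
print(contig_n50([10, 20, 30, 40, 50, 70, 80]))
```

The same loop applied to scaffold lengths gives the scaffold N50, so one helper covers both statistics.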
Molecular and clinical studies of X-linked deafness among Pakistani families.
Waryah, Ali M; Ahmed, Zubair M; Bhinder, Munir A; Binder, Munir A; Choo, Daniel I; Sisk, Robert A; Shahzad, Mohsin; Khan, Shaheen N; Friedman, Thomas B; Riazuddin, Sheikh; Riazuddin, Saima
2011-07-01
There are 68 sex-linked syndromes that include hearing loss as one feature and five sex-linked nonsyndromic deafness loci listed in the OMIM database. The possibility of additional such sex-linked loci was explored by ascertaining three unrelated Pakistani families (PKDF536, PKDF1132 and PKDF740) segregating X-linked recessive deafness. Sequence analysis of POU3F4 (DFN3) in affected members of families PKDF536 and PKDF1132 revealed two novel nonsense mutations, p.Q136X and p.W114X, respectively. Family PKDF740 is segregating congenital blindness, mild-to-profound progressive hearing loss that is characteristic of Norrie disease (MIM#310600). Sequence analysis of NDP among affected members of this family revealed a novel single nucleotide deletion c.49delG causing a frameshift and premature truncation (p.V17fsX1) of the encoded protein. These mutations were not found in 150 normal DNA samples. Identification of pathogenic alleles causing X-linked recessive deafness will improve molecular diagnosis, genetic counseling and molecular epidemiology of hearing loss among Pakistanis.
Francez, Pablo Abdon da Costa; Ribeiro-Rodrigues, Elzemar Martins; dos Santos, Sidney Emanuel Batista
2012-01-01
Allelic frequencies of 48 informative insert-delete (INDEL) loci were obtained from a sample set of 130 unrelated individuals living in Macapá, a city located in the northern Amazon region, in Brazil. The values of heterozygosity (H), polymorphic information content (PIC), power of discrimination (PD), power of exclusion (PE), matching probability (MP) and typical paternity index (TPI) were calculated and showed the forensic efficiency of these genetic markers. Based on the allele frequency obtained for the population of Macapá, we estimated an interethnic admixture for the three parental groups (European, Native American and African) of, respectively, 50%, 21% and 29%. Comparing these allele frequencies with those of other Brazilian populations and the parental populations, statistically significant distances were found. The interpopulation genetic distance (F(ST) coefficients) to the present database ranged from F(ST)=0.0431 (p<0.00001) between Macapá and Belém to F(ST)=0.266 (p<0.00001) between Macapá and the Native American group. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
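Several of the forensic parameters listed in this abstract follow directly from allele frequencies. The Python sketch below computes expected heterozygosity (H), polymorphic information content (PIC), matching probability (MP) and power of discrimination (PD) for one locus. It assumes Hardy-Weinberg genotype proportions, which is an assumption of this sketch rather than a statement about how the study derived its values; the function name and the example frequencies are hypothetical, and PE and TPI are left out.

```python
from itertools import combinations


def forensic_stats(freqs):
    """Per-locus forensic statistics from allele frequencies.

    Assumes Hardy-Weinberg proportions for genotype frequencies.
    Returns a dict with H, PIC, MP and PD.
    """
    sum_sq = sum(p * p for p in freqs)
    # Expected heterozygosity: chance that two random alleles differ.
    het = 1.0 - sum_sq
    # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    pic = het - sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    # Genotype frequencies under Hardy-Weinberg: homozygotes, then heterozygotes.
    genotype_freqs = [p * p for p in freqs]
    genotype_freqs += [2 * p * q for p, q in combinations(freqs, 2)]
    # Matching probability: chance two random individuals share a genotype.
    mp = sum(g * g for g in genotype_freqs)
    return {"H": het, "PIC": pic, "MP": mp, "PD": 1.0 - mp}


# Hypothetical biallelic INDEL with equally frequent alleles.
for name, value in forensic_stats([0.5, 0.5]).items():
    print(name, round(value, 4))
```

For a biallelic marker these values peak when both alleles are equally frequent, which is why panels of many such loci are combined to reach useful discrimination power.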
Typing Clostridium difficile strains based on tandem repeat sequences
2009-01-01
Background Genotyping of epidemic Clostridium difficile strains is necessary to track their emergence and spread. Portability of genotyping data is desirable to facilitate inter-laboratory comparisons and epidemiological studies. Results This report presents results from a systematic screen for variation in repetitive DNA in the genome of C. difficile. We describe two tandem repeat loci, designated 'TR6' and 'TR10', which display extensive sequence variation that may be useful for sequence-based strain typing. Based on an investigation of 154 C. difficile isolates comprising 75 ribotypes, tandem repeat sequencing demonstrated excellent concordance with widely used PCR ribotyping and equal discriminatory power. Moreover, tandem repeat sequences enabled the reconstruction of the isolates' largely clonal population structure and evolutionary history. Conclusion We conclude that sequence analysis of the two repetitive loci introduced here may be highly useful for routine typing of C. difficile. Tandem repeat sequence typing resolves phylogenetic diversity to a level equivalent to PCR ribotypes. DNA sequences may be stored in databases accessible over the internet, obviating the need for the exchange of reference strains. PMID:19133124
Spontaneous CRISPR loci generation in vivo by non-canonical spacer integration
Nivala, Jeff; Shipman, Seth L.; Church, George M.
2018-01-01
The adaptation phase of CRISPR-Cas immunity depends on the precise integration of short segments of foreign DNA (spacers) into a specific genomic location within the CRISPR locus by the Cas1-Cas2 integration complex. Although off-target spacer integration outside of canonical CRISPR arrays has been described in vitro, no evidence of non-specific integration activity has been found in vivo. Here, we show that non-canonical off-target integrations can occur within bacterial chromosomes at locations that resemble the native CRISPR locus by characterizing hundreds of off-target integration locations within Escherichia coli. Considering whether such promiscuous Cas1-Cas2 activity could have an evolutionary role through the genesis of neo-CRISPR loci, we combed existing CRISPR databases and available genomes for evidence of off-target integration activity. This search uncovered several putative instances of naturally occurring off-target spacer integration events within the genomes of Yersinia pestis and Sulfolobus islandicus. These results are important in understanding alternative routes to CRISPR array genesis and evolution, as well as in the use of spacer acquisition in technological applications. PMID:29379209
Molecular and Clinical Studies of X-linked Deafness Among Pakistani Families
Waryah, Ali M.; Ahmed, Zubair M.; Choo, Daniel I.; Sisk, Robert A.; Binder, Munir A.; Shahzad, Mohsin; Khan, Shaheen N.; Friedman, Thomas B.; Riazuddin, Sheikh; Riazuddin, Saima
2011-01-01
There are 68 sex-linked syndromes that include hearing loss as one feature and five sex-linked nonsyndromic deafness loci listed in the OMIM database. The possibility of additional such sex-linked loci was explored by ascertaining three unrelated Pakistani families (PKDF536, PKDF1132, PKDF740) segregating X-linked recessive deafness. Sequence analysis of POU3F4 (DFN3) in affected members of families PKDF536 and PKDF1132 revealed two novel nonsense mutations, p.Q136X and p.W114X, respectively. Family PKDF740 is segregating congenital blindness, mild to profound progressive hearing loss that is characteristic of Norrie disease (MIM#310600). Sequence analysis of NDP among affected members of this family revealed a novel single nucleotide deletion c.49delG causing a frameshift and premature truncation (p.V17fsX1) of the encoded protein. These mutations were not found in 150 normal DNA samples. Identification of pathogenic alleles causing X-linked recessive deafness will improve molecular diagnosis, genetic counseling, and molecular epidemiology of hearing loss among Pakistanis. PMID:21633365
University Faculty Use of Computerized Databases: An Assessment of Needs and Resources.
ERIC Educational Resources Information Center
Borgman, Christine L.; And Others
1985-01-01
Results of survey indicate that: academic faculty are unaware of range of databases available; few recognize need for databases in research; most delegate searching to librarian or assistant, rather than perform searching themselves; and 39 database guides identified tended to be descriptive rather than evaluative. A comparison of the guides is…
University Real Estate Development Database: A Database-Driven Internet Research Tool
ERIC Educational Resources Information Center
Wiewel, Wim; Kunst, Kara
2008-01-01
The University Real Estate Development Database is an Internet resource developed by the University of Baltimore for the Lincoln Institute of Land Policy, containing over six hundred cases of university expansion outside of traditional campus boundaries. The University Real Estate Development database is a searchable collection of real estate…
Page, Robert B; Monaghan, James R; Samuels, Amy K; Smith, Jeramiah J; Beachy, Christopher K; Voss, S Randal
2007-02-01
Ambystomatid salamanders offer several advantages for endocrine disruption research, including genomic and bioinformatics resources, an accessible laboratory model (Ambystoma mexicanum), and natural lineages that are broadly distributed among North American habitats. We used microarray analysis to measure the relative abundance of transcripts isolated from A. mexicanum epidermis (skin) after exogenous application of thyroid hormone (TH). Only one gene had a >2-fold change in transcript abundance after 2 days of TH treatment. However, hundreds of genes showed significantly different transcript levels at days 12 and 28 in comparison to day 0. A list of 123 TH-responsive genes was identified using statistical, BLAST, and fold level criteria. Cluster analysis identified two groups of genes with similar transcription patterns: up-regulated versus down-regulated. Most notably, several keratins exhibited dramatic (1000 fold) increases or decreases in transcript abundance. Keratin gene expression changes coincided with morphological remodeling of epithelial tissues. This suggests that keratin loci can be developed as sensitive biomarkers to assay temporal disruptions of larval-to-adult gene expression programs. Our study has identified the first collection of loci that are regulated during TH-induced metamorphosis in a salamander, thus setting the stage for future investigations of TH disruption in the Mexican axolotl and other salamanders of the genus Ambystoma.
Mariette, Stéphanie; Wong Jun Tai, Fabienne; Roch, Guillaume; Barre, Aurélien; Chague, Aurélie; Decroocq, Stéphane; Groppi, Alexis; Laizet, Yec'han; Lambert, Patrick; Tricon, David; Nikolski, Macha; Audergon, Jean-Marc; Abbott, Albert G; Decroocq, Véronique
2016-01-01
In fruit tree species, many important traits have been characterized genetically by using single-family descent mapping in progenies segregating for the traits. However, most mapped loci have not been sufficiently resolved to the individual genes due to insufficient progeny sizes for high resolution mapping and the previous lack of whole-genome sequence resources of the study species. To address this problem for Plum Pox Virus (PPV) candidate resistance gene identification in Prunus species, we implemented a genome-wide association (GWA) approach in apricot. This study exploited the broad genetic diversity of the apricot (Prunus armeniaca) germplasm containing resistance to PPV, next-generation sequence-based genotyping, and the high-quality peach (Prunus persica) genome reference sequence for single nucleotide polymorphism (SNP) identification. The results of this GWA study validated previously reported PPV resistance quantitative trait loci (QTL) intervals, highlighted other potential resistance loci, and resolved each to a limited set of candidate genes for further study. This work substantiates the association genetics approach for resolution of QTL to candidate genes in apricot and suggests that this approach could simplify identification of other candidate genes for other marked trait intervals in this germplasm. © 2015 INRA, UMR 1332 BFP New Phytologist © 2015 New Phytologist Trust.
Genome-wide association study of rice grain width variation.
Zheng, Xiao-Ming; Gong, Tingting; Ou, Hong-Ling; Xue, Dayuan; Qiao, Weihua; Wang, Junrui; Liu, Sha; Yang, Qingwen; Olsen, Kenneth M
2018-04-01
Seed size is variable within many plant species, and understanding the underlying genetic factors can provide insights into mechanisms of local environmental adaptation. Here we make use of the abundant genomic and germplasm resources available for rice (Oryza sativa) to perform a large-scale genome-wide association study (GWAS) of grain width. Grain width varies widely within the crop and is also known to show climate-associated variation across populations of its wild progenitor. Using a filtered dataset of >1.9 million genome-wide SNPs in a sample of 570 cultivated and wild rice accessions, we performed GWAS with two complementary models, GLM and MLM. The models yielded 10 and 33 significant associations, respectively, and jointly yielded seven candidate locus regions, two of which have been previously identified. Analyses of nucleotide diversity and haplotype distributions at these loci revealed signatures of selection and patterns consistent with adaptive introgression of grain width alleles across rice variety groups. The results provide a 50% increase in the total number of rice grain width loci mapped to date and support a polygenic model whereby grain width is shaped by gene-by-environment interactions. These loci can potentially serve as candidates for studies of adaptive seed size variation in wild grass species.
Sserwadda, Ivan; Amujal, Marion; Namatovu, Norah
2018-01-01
HIV/AIDS, tuberculosis (TB), and malaria are 3 major global public health threats that undermine development in many resource-poor settings. The notion that positive selection during epidemics or longer periods of exposure to common infectious diseases may have had a major effect in modifying the constitution of the human genome is now being interrogated at a large scale in many populations around the world. This positive selection from infectious diseases increases power to detect associations in genome-wide association studies (GWASs). High-throughput sequencing (HTS) has transformed the management of infectious diseases and continues to enable large-scale functional characterization of host resistance/susceptibility alleles and loci, a paradigm shift from single candidate gene studies. Application of genome sequencing technologies and genomics has enabled us to interrogate the host-pathogen interface for improving human health. Human populations are constantly locked in evolutionary arms races with pathogens; therefore, identification of common infectious disease-associated genomic variants/markers is important for therapeutic and vaccine development and for screening susceptible individuals in a population. This review describes a range of host-pathogen genomic loci that have been associated with disease susceptibility and resistance patterns in the era of HTS. We further highlight potential opportunities for these genetic markers. PMID:29755620
Linkage disequilibrium matches forensic genetic records to disjoint genomic marker sets.
Edge, Michael D; Algee-Hewitt, Bridget F B; Pemberton, Trevor J; Li, Jun Z; Rosenberg, Noah A
2017-05-30
Combining genotypes across datasets is central in facilitating advances in genetics. Data aggregation efforts often face the challenge of record matching: the identification of dataset entries that represent the same individual. We show that records can be matched across genotype datasets that have no shared markers based on linkage disequilibrium between loci appearing in different datasets. Using two datasets for the same 872 people, one with 642,563 genome-wide SNPs and the other with 13 short tandem repeats (STRs) used in forensic applications, we find that 90-98% of forensic STR records can be connected to corresponding SNP records and vice versa. Accuracy increases to 99-100% when ∼30 STRs are used. Our method expands the potential of data aggregation, but it also suggests privacy risks intrinsic in maintenance of databases containing even small numbers of markers, including databases of forensic significance.
Brohée, Sylvain; Barriot, Roland; Moreau, Yves
2010-09-01
In recent years, the number of knowledge bases developed using Wiki technology has exploded. Unfortunately, next to their numerous advantages, classical Wikis present a critical limitation: the invaluable knowledge they gather is represented as free text, which hinders their computational exploitation. This is in sharp contrast with the current practice for biological databases, where the data is made available in a structured way. Here, we present WikiOpener, an extension for the classical MediaWiki engine that augments Wiki pages by allowing on-the-fly querying and formatting of resources external to the Wiki. Those resources may provide data extracted from databases or DAS tracks, or even results returned by local or remote bioinformatics analysis tools. This also implies that structured data can be edited via dedicated forms. Hence, this generic resource combines the structure of biological databases with the flexibility of collaborative Wikis. The source code and its documentation are freely available on the MediaWiki website: http://www.mediawiki.org/wiki/Extension:WikiOpener.
An international aerospace information system: A cooperative opportunity
NASA Technical Reports Server (NTRS)
Cotter, Gladys A.; Blados, Walter R.
1992-01-01
Scientific and technical information (STI) is a valuable resource which represents the results of large investments in research and development (R&D), and the expertise of a nation. NASA and its predecessor organizations have developed and managed the preeminent aerospace information system. We see information and information systems changing and becoming more international in scope. In Europe, consistent with joint R&D programs and a view toward a united Europe, we have seen the emergence of a European Aerospace Database concept. In addition, the development of aeronautics and astronautics in individual nations has also led to initiatives for national aerospace databases. Considering recent technological developments in information science and technology, as well as the reality of scarce resources in all nations, it is time to reconsider the mutually beneficial possibilities offered by cooperation and international resource sharing. The new possibilities offered through cooperation among the various aerospace database efforts toward an international aerospace database initiative which can optimize the cost/benefit equation for all participants are considered.
Relax with CouchDB - Into the non-relational DBMS era of Bioinformatics
Manyam, Ganiraju; Payton, Michelle A.; Roth, Jack A.; Abruzzo, Lynne V.; Coombes, Kevin R.
2012-01-01
With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. PMID:22609849
REFOLDdb: a new and sustainable gateway to experimental protocols for protein refolding.
Mizutani, Hisashi; Sugawara, Hideaki; Buckle, Ashley M; Sangawa, Takeshi; Miyazono, Ken-Ichi; Ohtsuka, Jun; Nagata, Koji; Shojima, Tomoki; Nosaki, Shohei; Xu, Yuqun; Wang, Delong; Hu, Xiao; Tanokura, Masaru; Yura, Kei
2017-04-24
More than 7000 papers related to "protein refolding" have been published to date, with approximately 300 reports each year during the last decade. Whilst some of these papers provide experimental protocols for protein refolding, a survey in the structural life science communities showed a necessity for a comprehensive database for refolding techniques. We therefore have developed a new resource, "REFOLDdb", that collects refolding techniques into a single, searchable repository to help researchers develop refolding protocols for proteins of interest. We based our resource on the existing REFOLD database, which has not been updated since 2009. We redesigned the data format to be more concise, allowing consistent representations among data entries compared with the original REFOLD database. The remodeled data architecture enhances the search efficiency and improves the sustainability of the database. After an exhaustive literature search we added experimental refolding protocols from reports published 2009 to early 2017. In addition to this new data, we fully converted and integrated existing REFOLD data into our new resource. REFOLDdb contains 1877 entries as of March 17th, 2017, and is freely available at http://p4d-info.nig.ac.jp/refolddb/. REFOLDdb is a unique database for the life sciences research community, providing annotated information for designing new refolding protocols and customizing existing methodologies. We envisage that this resource will find wide utility across broad disciplines that rely on the production of pure, active, recombinant proteins. Furthermore, the database also provides a useful overview of the recent trends and statistics in refolding technology development.
Harb, Omar S; Roos, David S
2015-01-01
Over the past 20 years, advances in high-throughput biological techniques and the availability of computational resources including fast Internet access have resulted in an explosion of large genome-scale data sets "big data." While such data are readily available for download and personal use and analysis from a variety of repositories, often such analysis requires access to seldom-available computational skills. As a result a number of databases have emerged to provide scientists with online tools enabling the interrogation of data without the need for sophisticated computational skills beyond basic knowledge of Internet browser utility. This chapter focuses on the Eukaryotic Pathogen Databases (EuPathDB: http://eupathdb.org) Bioinformatic Resource Center (BRC) and illustrates some of the available tools and methods.
Schlautman, Brandon; Covarrubias-Pazaran, Giovanny; Diaz-Garcia, Luis; Iorizzo, Massimo; Polashock, James; Grygleski, Edward; Vorsa, Nicholi; Zalapa, Juan
2017-01-01
The American cranberry (Vaccinium macrocarpon Ait.) is a recently domesticated, economically important, fruit crop with limited molecular resources. New genetic resources could accelerate genetic gain in cranberry through characterization of its genomic structure and by enabling molecular-assisted breeding strategies. To increase the availability of cranberry genomic resources, genotyping-by-sequencing (GBS) was used to discover and genotype thousands of single nucleotide polymorphisms (SNPs) within three interrelated cranberry full-sib populations. Additional simple sequence repeat (SSR) loci were added to the SNP datasets and used to construct bin maps for the parents of the populations, which were then merged to create the first high-density cranberry composite map containing 6073 markers (5437 SNPs and 636 SSRs) on 12 linkage groups (LGs) spanning 1124 cM. Interestingly, higher rates of recombination were observed in maternal than paternal gametes. The large number of markers in common (mean of 57.3) and the high degree of observed collinearity (mean Pair-wise Spearman rank correlations >0.99) between the LGs of the parental maps demonstrates the utility of GBS in cranberry for identifying polymorphic SNP loci that are transferable between pedigrees and populations in future trait-association studies. Furthermore, the high-density of markers anchored within the component maps allowed identification of segregation distortion regions, placement of centromeres on each of the 12 LGs, and anchoring of genomic scaffolds. Collectively, the results represent an important contribution to the current understanding of cranberry genomic structure and to the availability of molecular tools for future genetic research and breeding efforts in cranberry. PMID:28250016
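As a rough arithmetic check on the map figures reported in the cranberry abstract above, the following Python sketch recomputes the marker total and derives two summary numbers that the abstract does not itself report: the average map length per linkage group and the average spacing per marker. The spacing figure assumes markers are spread evenly along the map, which a bin map does not guarantee, so it is only an order-of-magnitude guide. Variable names are illustrative.

```python
# Figures copied from the abstract; names are hypothetical.
snp_markers = 5437
ssr_markers = 636
linkage_groups = 12
map_length_cm = 1124

# The composite map total should equal SNPs + SSRs.
total_markers = snp_markers + ssr_markers

# Derived averages (not reported in the abstract; assume even spread).
cm_per_linkage_group = map_length_cm / linkage_groups
cm_per_marker = map_length_cm / total_markers

print(total_markers)                   # 6073
print(round(cm_per_linkage_group, 1))  # 93.7
print(round(cm_per_marker, 3))         # 0.185
```

The total matches the 6073 markers stated in the abstract, which confirms the SNP and SSR counts are internally consistent.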
Rosset, Saharon; Aharoni, Ehud; Neuvirth, Hani
2014-07-01
Issues of publication bias, lack of replicability, and false discovery have long plagued the genetics community. Proper utilization of public and shared data resources presents an opportunity to ameliorate these problems. We present an approach to public database management that we term Quality Preserving Database (QPD). It enables perpetual use of the database for testing statistical hypotheses while controlling false discovery and avoiding publication bias on the one hand, and maintaining testing power on the other hand. We demonstrate it on a use case of a replication server for GWAS findings, underlining its practical utility. We argue that a shift to using QPD in managing current and future biological databases will significantly enhance the community's ability to make efficient and statistically sound use of the available data resources. © 2014 WILEY PERIODICALS, INC.
Rahman, Muhammad H; Rajora, Om P
2002-12-01
Accurate identification of Populus clones and cultivars is essential for effective selection, breeding, and genetic resource management programs. The unit of cultivation and breeding in poplars is a clone, and individual cultivars are normally represented by a single clone. Microsatellite DNA markers of 10 simple sequence repeat loci were used for genetic fingerprinting and differentiation of 96 clones/cultivars and varieties belonging to six Populus species (P. deltoides, P. nigra, P. balsamifera, P. trichocarpa, P. grandidentata, and P. maximowiczii) from three sections of the genus. All 96 clones/cultivars could be uniquely fingerprinted based on their single- or multilocus microsatellite genotypes. The five P. grandidentata clones could be differentiated based on their single-locus genotypes, while six clones of P. trichocarpa and 11 clones of P. maximowiczii could be identified by their two-locus genotypes. Twenty clones of P. deltoides and 25 clones of P. nigra could be differentiated by their multilocus genotypes employing three loci, and 29 clones of P. balsamifera required the use of multilocus genotypes at five loci for their genetic fingerprinting and differentiation. The loci PTR3, PTR5, and PTR7 were found to be the most informative for genetic fingerprinting and differentiation of the clones. The mean number of alleles per locus ranged from 2.9 in P. trichocarpa or P. grandidentata to 6.0 in P. balsamifera and 11.2 in 96 clones of the six species. The mean number of observed genotypes per locus ranged from 2.4 in P. grandidentata to 7.4 in P. balsamifera and 19.6 in 96 clones of the six species. The mean number of unique genotypes per locus ranged from 1.3 in P. grandidentata to 3.9 in P. deltoides and 8.8 in 96 clones of the six species. The power of discrimination of the microsatellite DNA markers in the 96 clones ranged from 0.726 for PTR4 to 0.939 for PTR7, with a mean of 0.832 over the 10 simple sequence repeat loci. Clones/cultivars from the same species showed higher microsatellite DNA similarities than the clones from the different species. A UPGMA cluster plot constructed from the microsatellite genotypic similarities separated the 96 clones into six major groups corresponding to their species. Populus nigra var. italica clones were genetically differentiated from the P. nigra var. nigra clones. Microsatellite DNA markers could be useful in genetic fingerprinting, identification, classification, certification, and registration of clones, cultivars, and varieties as well as genetic resource management and protection of plant breeders' rights in Populus.
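The Populus abstract above reports a per-locus "power of discrimination" without giving the formula. A minimal sketch follows, assuming the commonly used definition PD = 1 − Σ g_i², where g_i is the frequency of the i-th genotype at one locus; the abstract does not confirm that this exact definition was used. The genotype data and function name are hypothetical and serve only to illustrate the calculation.

```python
from collections import Counter

def power_of_discrimination(genotypes):
    """Return 1 - sum of squared genotype frequencies at a single locus.

    `genotypes` is a list of (allele, allele) pairs; allele order within
    a pair is ignored, so ("A", "B") and ("B", "A") count as one genotype.
    """
    normalized = [tuple(sorted(g)) for g in genotypes]
    counts = Counter(normalized)
    n = len(normalized)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# Hypothetical genotypes for eight clones at one locus (not from the paper).
example_locus = [
    ("A", "A"), ("A", "B"), ("B", "A"), ("B", "C"),
    ("C", "C"), ("A", "C"), ("A", "B"), ("B", "B"),
]
pd_value = power_of_discrimination(example_locus)
print(pd_value)  # 0.78125
```

Under this definition, a locus at which every clone carries a different genotype approaches PD = 1, while a locus with a single shared genotype gives PD = 0.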
Database of significant deposits of gold, silver, copper, lead, and zinc in the United States
Long, Keith R.; DeYoung, John H.; Ludington, Stephen
1998-01-01
It has long been recognized that the largest mineral deposits contain most of the known mineral endowment (Singer and DeYoung, 1980). Sometimes called giant or world-class deposits, these largest deposits account for a very large share of historic and current mineral production and resources in industrial society (Singer, 1995). For example, Singer (1995) shows that the largest 10 percent of the world’s gold deposits contain 86 percent of the gold discovered to date. Many mineral resource issues and investigations are more easily addressed if limited to the relatively small number of deposits that contain most of the known mineral resources. An estimate of known resources using just these deposits would normally be sufficient, because considering smaller deposits would not add significantly to the total estimate. Land-use planning should deal mainly with these deposits due to their relative scarcity, the large share of known resources they contain, and the fact that economies of scale allow minerals to be produced much more cheaply from larger deposits. Investigation of environmental and other hazards that result from mining operations can be limited to these largest deposits because they account for most of past and current production. The National Mineral Resource Assessment project of the U.S. Geological Survey (USGS) has compiled a database on the largest known deposits of gold, silver, copper, lead, and zinc in the United States to complement the 1996 national assessment of undiscovered deposits of these same metals (Ludington and Cox, 1996). The deposits in this database account for approximately 99 percent of domestic production of these metals and probably a similar share of identified resources. These data may be compared with results of the assessment of undiscovered resources to characterize the nation’s total mineral endowment for these metals. This database is a starting point for any national or regional mineral-resource or mineral-environmental investigation.
Demirci, F. Yesim; Wang, Xingbin; Kelly, Jennifer A.; Morris, David L.; Barmada, M. Michael; Feingold, Eleanor; Kao, Amy H.; Sivils, Kathy L.; Bernatsky, Sasha; Pineau, Christian; Clarke, Ann; Ramsey-Goldman, Rosalind; Vyse, Timothy J.; Gaffney, Patrick M.; Manzi, Susan; Kamboh, M. Ilyas
2016-01-01
Objective: Genome-wide association studies (GWASs) in individuals of European ancestry identified a number of systemic lupus erythematosus (SLE) susceptibility loci using earlier versions of high-density genotyping platforms. Follow-up studies on suggestive GWAS regions using larger samples and more markers identified additional SLE loci in European-descent subjects. Here we report the results of a multi-stage study that we performed to identify novel SLE loci. Methods: In Stage 1, we conducted a new GWAS of SLE in a North American case-control sample of European ancestry (n=1,166) genotyped on Affymetrix Genome-Wide Human SNP Array 6.0. In Stage 2, we further investigated top new suggestive GWAS hits by in silico evaluation and meta-analysis using an additional dataset of European-descent subjects (>2,500 individuals), followed by replication of top meta-analysis findings in another dataset of European-descent subjects (>10,000 individuals) in Stage 3. Results: As expected, our GWAS revealed most significant associations at the major histocompatibility complex locus (6p21), which easily surpassed genome-wide significance threshold (P<5×10^-8). Several other SLE signals/loci previously implicated in Caucasians and/or Asians were also supported in Stage 1 discovery sample and strongest signals were observed at 2q32/STAT4 (P=3.6×10^-7) and at 8p23/BLK (P=8.1×10^-6). Stage 2 meta-analyses identified a new genome-wide significant SLE locus at 12q12 (meta P=3.1×10^-8), which was replicated in Stage 3. Conclusion: Our multi-stage study identified and replicated a new SLE locus that warrants further follow-up in additional studies. Publicly available databases suggest that this new SLE signal falls within a functionally relevant genomic region and near biologically important genes. PMID:26316170
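To make the significance reasoning in the SLE abstract above concrete, the sketch below compares the three P values it reports against the genome-wide threshold of 5×10^-8. The dictionary labels and the helper name are illustrative; only the numeric values come from the abstract.

```python
# Conventional genome-wide significance threshold quoted in the abstract.
GENOME_WIDE_ALPHA = 5e-8

# P values as reported in the abstract; labels are illustrative.
reported_signals = {
    "2q32/STAT4 (Stage 1)": 3.6e-7,
    "8p23/BLK (Stage 1)": 8.1e-6,
    "12q12 (Stage 2 meta-analysis)": 3.1e-8,
}

def is_genome_wide_significant(p_value, alpha=GENOME_WIDE_ALPHA):
    """True when the P value falls strictly below the threshold."""
    return p_value < alpha

significant = [
    name for name, p in reported_signals.items()
    if is_genome_wide_significant(p)
]
print(significant)  # ['12q12 (Stage 2 meta-analysis)']
```

This matches the abstract's wording: the STAT4 and BLK signals were the strongest non-MHC signals in Stage 1 but did not reach the threshold there, whereas the 12q12 locus did so in the Stage 2 meta-analysis.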
Messina, Francesco; Finocchio, Andrea; Akar, Nejat; Loutradis, Aphrodite; Michalodimitrakis, Emmanuel I; Brdicka, Radim; Jodice, Carla; Novelletto, Andrea
2018-02-01
Tetranucleotide Short Tandem Repeats (STRs) for human identification and common use in forensic cases have recently been used to address the population genetics of the North-Eastern Mediterranean area. However, to gain confidence in the inferences made using STRs, this kind of analysis should be challenged with changes in three main aspects of the data, i.e. the sizes of the samples, their distance across space and the genetic background from which they are drawn. The aim of this study was to test the resilience of the gradients previously detected in the North-Eastern Mediterranean to the enlargement of the surveyed area and population set, using revised data. STR genotype profiles were obtained from a publicly available database (PopAffilietor databank) and a dataset was assembled including >7000 subjects from the Arabian Peninsula to Scandinavia, genotyped at eight loci. Spatial principal component analysis (sPCA) was applied and the frequency maps of the nine alleles which contributed most strongly to sPC1 were examined in detail. By far the greatest part of diversity was summarised by a single spatial principal component (sPC1), oriented along a SouthEast-to-NorthWest axis. The alleles with the top 5% squared loadings were TH01(9.3), D19S433(14), TH01(6), D19S433(15.2), FGA(20), FGA(24), D3S1358(14), FGA(21) and D2S1338(19). These results confirm a clinal pattern over the whole range for at least four loci (TH01, D19S433, FGA, D3S1358). Four of the eight STR loci (or even alleles) considered here can reproducibly capture continental arrangements of diversity. This would, in principle, allow for the exploitation of forensic data to clarify important aspects in the formation of local gene pools.
Ryynänen, Heikki J; Primmer, Craig R
2006-01-01
Background: Single nucleotide polymorphisms (SNPs) represent the most abundant type of DNA variation in the vertebrate genome, and their applications as genetic markers in numerous studies of molecular ecology and conservation of natural populations are emerging. Recent large-scale sequencing projects in several fish species have provided a vast amount of data in public databases, which can be utilized in novel SNP discovery in salmonids. However, the suggested duplicated nature of the salmonid genome may hamper SNP characterization if the primers designed in conserved gene regions amplify multiple loci. Results: Here we introduce a new intron-primed exon-crossing (IPEC) method in an attempt to overcome this duplication problem, and also evaluate different priming methods for SNP discovery in Atlantic salmon (Salmo salar) and other salmonids. A total of 69 loci with differing priming strategies were screened in S. salar, and 27 of these produced ~13 kb of high-quality sequence data consisting of 19 SNPs or indels (one per 680 bp). The SNP frequency and the overall nucleotide diversity (3.99 × 10^-4) in S. salar were lower than reported in a majority of other organisms, which may suggest a relatively young population history for Atlantic salmon. A subset of primers used in cross-species analyses revealed considerable variation in the SNP frequencies and nucleotide diversities in other salmonids. Conclusion: Sequencing success was significantly higher with the new IPEC primers; thus the total number of loci to screen in order to identify one potential polymorphic site was six times less with this new strategy. Given that duplication may hamper SNP discovery in some species, the IPEC method reported here is an alternative way of identifying novel polymorphisms in such cases. PMID:16872523
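The variant density quoted in the salmon abstract above ("one per 680 bp") can be reproduced approximately from its other figures. The sketch below treats "~13 kb" as exactly 13,000 bp, which is an assumption, since the abstract gives only the rounded length; the small gap between the computed and quoted values is consistent with that rounding.

```python
# "~13 kb" is taken as 13,000 bp here; the exact length is not given.
approx_sequence_bp = 13_000
variants_found = 19  # SNPs or indels, as reported

bp_per_variant = approx_sequence_bp / variants_found
print(round(bp_per_variant))  # 684

# Working backwards from the quoted density gives the implied length.
implied_sequence_bp = variants_found * 680
print(implied_sequence_bp)  # 12920
```

Both directions agree to within about 1%, so the quoted density and the approximate sequence length are mutually consistent.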
Pert, Petina L; Ens, Emilie J; Locke, John; Clarke, Philip A; Packer, Joanne M; Turpin, Gerry
2015-11-15
With growing international calls for the enhanced involvement of Indigenous peoples and their biocultural knowledge in managing conservation and the sustainable use of the physical environment, it is timely to review the available literature and develop cross-cultural approaches to the management of biocultural resources. Online spatial databases are becoming common tools for educating land managers about Indigenous Biocultural Knowledge (IBK), specifically to raise a broad awareness of issues, identify knowledge gaps and opportunities, and to promote collaboration. Here we describe a novel approach to the application of internet and spatial analysis tools that provide an overview of publicly available documented Australian IBK (AIBK) and outline the processes used to develop the online resource. By funding an AIBK working group, the Australian Centre for Ecological Analysis and Synthesis (ACEAS) provided a unique opportunity to bring together cross-cultural, cross-disciplinary and trans-organizational contributors who developed these resources. Without such an intentionally collaborative process, this unique tool would not have been developed. The tool developed through this process is derived from a spatial and temporal literature review, case studies and a compilation of methods, as well as other relevant AIBK papers. The online resource illustrates the depth and breadth of documented IBK and identifies opportunities for further work, partnerships and investment for the benefit of not only Indigenous Australians, but all Australians. The database currently includes links to over 1500 publicly available IBK documents, of which 568 are geo-referenced and were mapped. It is anticipated that as awareness of the online resource grows, more documents will be provided through the website to build the database. It is envisaged that this will become a well-used tool, integral to future natural and cultural resource management and maintenance. Copyright © 2015. Published by Elsevier B.V.
Second-Tier Database for Ecosystem Focus, 2003-2004 Annual Report.
DOE Office of Scientific and Technical Information (OSTI.GOV)
University of Washington, Columbia Basin Research, DART Project Staff,
2004-12-01
The Second-Tier Database for Ecosystem Focus (Contract 00004124) provides direct and timely public access to Columbia Basin environmental, operational, fishery and riverine data resources for federal, state, public and private entities essential to sound operational and resource management. The database also assists with juvenile and adult mainstem passage modeling supporting federal decisions affecting the operation of the FCRPS. The Second-Tier Database known as Data Access in Real Time (DART) integrates public data for effective access, consideration and application. DART also provides analysis tools and performance measures for evaluating the condition of Columbia Basin salmonid stocks. These services are critical to BPA's implementation of its fish and wildlife responsibilities under the Endangered Species Act (ESA).
The National Nonindigenous Aquatic Species Database
Neilson, Matthew E.; Fuller, Pamela L.
2012-01-01
The U.S. Geological Survey (USGS) Nonindigenous Aquatic Species (NAS) Program maintains a database that monitors, records, and analyzes sightings of nonindigenous aquatic plant and animal species throughout the United States. The program is based at the USGS Wetland and Aquatic Research Center in Gainesville, Florida. The initiative to maintain scientific information on nationwide occurrences of nonindigenous aquatic species began with the Aquatic Nuisance Species Task Force, created by Congress in 1990 to provide timely information to natural resource managers. Since then, the NAS database has been a clearinghouse of information for confirmed sightings of nonindigenous, also known as nonnative, aquatic species throughout the Nation. The database is used to produce email alerts, maps, summary graphs, publications, and other information products to support natural resource managers.
Wang, Shur-Jen; Laulederkind, Stanley J F; Hayman, G Thomas; Petri, Victoria; Smith, Jennifer R; Tutaj, Marek; Nigam, Rajni; Dwinell, Melinda R; Shimoyama, Mary
2016-08-01
Cardiovascular diseases are complex diseases caused by a combination of genetic and environmental factors. To facilitate progress in complex disease research, the Rat Genome Database (RGD) provides the community with a disease portal where genome objects and biological data related to cardiovascular diseases are systematically organized. The purpose of this study is to present biocuration at RGD, including disease, genetic, and pathway data. The RGD curation team uses controlled vocabularies/ontologies to organize data curated from the published literature or imported from disease and pathway databases. These organized annotations are associated with genes, strains, and quantitative trait loci (QTLs), thus linking functional annotations to genome objects. Screen shots from the web pages are used to demonstrate the organization of annotations at RGD. The human cardiovascular disease genes identified by annotations were grouped according to data sources and their annotation profiles were compared by in-house tools and other enrichment tools available to the public. The analysis results show that the imported cardiovascular disease genes from ClinVar and OMIM are functionally different from the RGD manually curated genes in terms of pathway and Gene Ontology annotations. The inclusion of disease genes from other databases enriches the collection of disease genes not only in quantity but also in quality. Copyright © 2016 the American Physiological Society.
WormQTLHD—a web database for linking human disease to natural variation data in C. elegans
van der Velde, K. Joeri; de Haan, Mark; Zych, Konrad; Arends, Danny; Snoek, L. Basten; Kammenga, Jan E.; Jansen, Ritsert C.; Swertz, Morris A.; Li, Yang
2014-01-01
Interactions between proteins are highly conserved across species. As a result, the molecular basis of multiple diseases affecting humans can be studied in model organisms that offer many alternative experimental opportunities. One such organism, Caenorhabditis elegans, has been used to produce much molecular quantitative genetics and systems biology data over the past decade. We present WormQTLHD (Human Disease), a database that quantitatively and systematically links expression Quantitative Trait Loci (eQTL) findings in C. elegans to gene–disease associations in man. WormQTLHD, available online at http://www.wormqtl-hd.org, is a user-friendly set of tools to reveal functionally coherent, evolutionarily conserved gene networks. These can be used to predict novel gene-to-gene associations and the functions of genes underlying the disease of interest. We created a new database that links C. elegans eQTL data sets to human diseases (34 337 gene–disease associations from OMIM, DGA, GWAS Central and NHGRI GWAS Catalogue) based on overlapping sets of orthologous genes associated with phenotypes in these two species. We utilized QTL results, high-throughput molecular phenotypes, classical phenotypes and genotype data covering different developmental stages and environments from the WormQTL database. All software is available as open source, built on MOLGENIS and xQTL workbench. PMID:24217915
WormQTLHD--a web database for linking human disease to natural variation data in C. elegans.
van der Velde, K Joeri; de Haan, Mark; Zych, Konrad; Arends, Danny; Snoek, L Basten; Kammenga, Jan E; Jansen, Ritsert C; Swertz, Morris A; Li, Yang
2014-01-01
Interactions between proteins are highly conserved across species. As a result, the molecular basis of multiple diseases affecting humans can be studied in model organisms that offer many alternative experimental opportunities. One such organism, Caenorhabditis elegans, has been used to produce much molecular quantitative genetics and systems biology data over the past decade. We present WormQTL(HD) (Human Disease), a database that quantitatively and systematically links expression Quantitative Trait Loci (eQTL) findings in C. elegans to gene-disease associations in man. WormQTL(HD), available online at http://www.wormqtl-hd.org, is a user-friendly set of tools to reveal functionally coherent, evolutionarily conserved gene networks. These can be used to predict novel gene-to-gene associations and the functions of genes underlying the disease of interest. We created a new database that links C. elegans eQTL data sets to human diseases (34 337 gene-disease associations from OMIM, DGA, GWAS Central and NHGRI GWAS Catalogue) based on overlapping sets of orthologous genes associated with phenotypes in these two species. We utilized QTL results, high-throughput molecular phenotypes, classical phenotypes and genotype data covering different developmental stages and environments from the WormQTL database. All software is available as open source, built on MOLGENIS and xQTL workbench.
A simple genetic architecture underlies morphological variation in dogs.
Boyko, Adam R; Quignon, Pascale; Li, Lin; Schoenebeck, Jeffrey J; Degenhardt, Jeremiah D; Lohmueller, Kirk E; Zhao, Keyan; Brisbin, Abra; Parker, Heidi G; vonHoldt, Bridgett M; Cargill, Michele; Auton, Adam; Reynolds, Andy; Elkahloun, Abdel G; Castelhano, Marta; Mosher, Dana S; Sutter, Nathan B; Johnson, Gary S; Novembre, John; Hubisz, Melissa J; Siepel, Adam; Wayne, Robert K; Bustamante, Carlos D; Ostrander, Elaine A
2010-08-10
Domestic dogs exhibit tremendous phenotypic diversity, including a greater variation in body size than any other terrestrial mammal. Here, we generate a high density map of canine genetic variation by genotyping 915 dogs from 80 domestic dog breeds, 83 wild canids, and 10 outbred African shelter dogs across 60,968 single-nucleotide polymorphisms (SNPs). Coupling this genomic resource with external measurements from breed standards and individuals as well as skeletal measurements from museum specimens, we identify 51 regions of the dog genome associated with phenotypic variation among breeds in 57 traits. The complex traits include average breed body size and external body dimensions and cranial, dental, and long bone shape and size with and without allometric scaling. In contrast to the results from association mapping of quantitative traits in humans and domesticated plants, we find that across dog breeds, a small number of quantitative trait loci (≤3) explain the majority of phenotypic variation for most of the traits we studied. In addition, many genomic regions show signatures of recent selection, with most of the highly differentiated regions being associated with breed-defining traits such as body size, coat characteristics, and ear floppiness. Our results demonstrate the efficacy of mapping multiple traits in the domestic dog using a database of genotyped individuals and highlight the important role human-directed selection has played in altering the genetic architecture of key traits in this important species.
A Simple Genetic Architecture Underlies Morphological Variation in Dogs
Schoenebeck, Jeffrey J.; Degenhardt, Jeremiah D.; Lohmueller, Kirk E.; Zhao, Keyan; Brisbin, Abra; Parker, Heidi G.; vonHoldt, Bridgett M.; Cargill, Michele; Auton, Adam; Reynolds, Andy; Elkahloun, Abdel G.; Castelhano, Marta; Mosher, Dana S.; Sutter, Nathan B.; Johnson, Gary S.; Novembre, John; Hubisz, Melissa J.; Siepel, Adam; Wayne, Robert K.; Bustamante, Carlos D.; Ostrander, Elaine A.
2010-01-01
Domestic dogs exhibit tremendous phenotypic diversity, including a greater variation in body size than any other terrestrial mammal. Here, we generate a high density map of canine genetic variation by genotyping 915 dogs from 80 domestic dog breeds, 83 wild canids, and 10 outbred African shelter dogs across 60,968 single-nucleotide polymorphisms (SNPs). Coupling this genomic resource with external measurements from breed standards and individuals as well as skeletal measurements from museum specimens, we identify 51 regions of the dog genome associated with phenotypic variation among breeds in 57 traits. The complex traits include average breed body size and external body dimensions and cranial, dental, and long bone shape and size with and without allometric scaling. In contrast to the results from association mapping of quantitative traits in humans and domesticated plants, we find that across dog breeds, a small number of quantitative trait loci (≤3) explain the majority of phenotypic variation for most of the traits we studied. In addition, many genomic regions show signatures of recent selection, with most of the highly differentiated regions being associated with breed-defining traits such as body size, coat characteristics, and ear floppiness. Our results demonstrate the efficacy of mapping multiple traits in the domestic dog using a database of genotyped individuals and highlight the important role human-directed selection has played in altering the genetic architecture of key traits in this important species. PMID:20711490
Geo-spatial Service and Application based on National E-government Network Platform and Cloud
NASA Astrophysics Data System (ADS)
Meng, X.; Deng, Y.; Li, H.; Yao, L.; Shi, J.
2014-04-01
With the acceleration of China's informatization process, our party and government have taken substantive strides in advancing the development and application of digital technology, which promotes the evolution of e-government and its informatization. Meanwhile, as a service mode based on innovative resources, cloud computing can connect huge pools of resources to provide a variety of IT services, and has become a relatively mature technical pattern through further study and massive practical application. Based on cloud computing technology and the national e-government network platform, the "National Natural Resources and Geospatial Database (NRGD)" project integrated and transformed natural resources and geospatial information dispersed across various sectors and regions, established a logically unified and physically dispersed fundamental database, and developed a national integrated information database system supporting the main e-government applications. Cross-sector e-government applications and services are realized to provide long-term, stable and standardized natural resources and geospatial fundamental information products and services for national e-government and public users.
GIGGLE: a search engine for large-scale integrated genome analysis.
Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya; Marth, Gabor T; Gertz, Jason; Quinlan, Aaron R
2018-02-01
GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.
GIGGLE: a search engine for large-scale integrated genome analysis
Layer, Ryan M; Pedersen, Brent S; DiSera, Tonya; Marth, Gabor T; Gertz, Jason; Quinlan, Aaron R
2018-01-01
GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation. PMID:29309061
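The core operation described in the abstract above, finding which interval files share loci with a set of query features, can be illustrated with a minimal sketch. This is not GIGGLE's own index or its significance statistic: it assumes half-open `(start, end)` intervals on a single chromosome, uses plain sorted lists with binary search, and ranks by raw overlap count as a stand-in for a significance score. The file names and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def count_overlaps(queries, intervals):
    """For each half-open query (qs, qe), count intervals (s, e) with s < qe and e > qs."""
    starts = sorted(s for s, _ in intervals)
    ends = sorted(e for _, e in intervals)
    # Intervals starting before the query ends, minus those that already ended
    # at or before the query start, leaves exactly the overlapping ones.
    return [bisect_left(starts, qe) - bisect_right(ends, qs) for qs, qe in queries]


def rank_files(queries, files):
    """Rank interval files by total overlap count (a stand-in for a significance score)."""
    totals = {name: sum(count_overlaps(queries, ivs)) for name, ivs in files.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical interval files, all on one chromosome.
files = {
    "a.bed": [(0, 10), (20, 30)],
    "b.bed": [(5, 8)],
    "c.bed": [(100, 200)],
}
ranking = rank_files([(7, 25)], files)
print(ranking)
```

Each query here costs two binary searches per file, which is adequate for illustration only; scaling to billions of intervals, as the abstract reports, requires a purpose-built index.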
A genomic scale map of genetic diversity in Trypanosoma cruzi
2012-01-01
Background Trypanosoma cruzi, the causal agent of Chagas Disease, affects more than 16 million people in Latin America. The clinical outcome of the disease results from a complex interplay between environmental factors and the genetic background of both the human host and the parasite. However, knowledge of the genetic diversity of the parasite is currently limited to a number of highly studied loci. The availability of a number of genomes from different evolutionary lineages of T. cruzi provides an unprecedented opportunity to look at the genetic diversity of the parasite at a genomic scale. Results Using a bioinformatic strategy, we have clustered T. cruzi sequence data available in the public domain and obtained multiple sequence alignments in which one or two alleles from the reference CL-Brener were included. These data cover 4 major evolutionary lineages (DTUs): TcI, TcII, TcIII, and the hybrid TcVI. Using this set of alignments we have identified 288,957 high quality single nucleotide polymorphisms and 1,480 indels. In a reduced re-sequencing study we were able to validate ~97% of high-quality SNPs identified in 47 loci. Analysis of how these changes affect encoded protein products showed a 0.77 ratio of synonymous to non-synonymous changes in the T. cruzi genome. We observed 113 changes that introduce or remove a stop codon, some causing significant functional changes, and a number of tri-allelic and tetra-allelic SNPs that could be exploited in strain typing assays. Based on an analysis of the observed nucleotide diversity we show that the T. cruzi genome contains a core set of genes that are under apparent purifying selection. Interestingly, orthologs of known druggable targets show statistically significant lower nucleotide diversity values. Conclusions This study provides the first look at the genetic diversity of T. cruzi at a genomic scale. The analysis covers an estimated ~60% of the genetic diversity present in the population, providing an essential resource for future studies on the development of new drugs and diagnostics for Chagas Disease. These data are available through the TcSNP database (http://snps.tcruzi.org). PMID:23270511
ERIC Educational Resources Information Center
Irwin, Gretchen; Wessel, Lark; Blackman, Harvey
2012-01-01
This case describes a database redesign project for the United States Department of Agriculture's National Animal Germplasm Program (NAGP). The case provides a valuable context for teaching and practicing database analysis, design, and implementation skills, and can be used as the basis for a semester-long team project. The case demonstrates the…
An Integrated Korean Biodiversity and Genetic Information Retrieval System
Lim, Jeongheui; Bhak, Jong; Oh, Hee-Mock; Kim, Chang-Bae; Park, Yong-Ha; Paek, Woon Kee
2008-01-01
Background On-line biodiversity information databases are growing quickly and being integrated into general bioinformatics systems due to advances in fast gene sequencing technologies and the Internet. These can reduce the cost and effort of performing biodiversity surveys and genetic searches, which allows scientists to spend more time researching and less time collecting and maintaining data. This will increase the rate of knowledge build-up and improve conservation. The biodiversity databases in Korea have been scattered among several institutes and local natural history museums with incompatible data types. Therefore, a comprehensive database and a nationwide web portal for biodiversity information are necessary in order to integrate diverse information resources, including molecular and genomic databases. Results The Korean Natural History Research Information System (NARIS) was built and serviced as the central biodiversity information system to collect and integrate the biodiversity data of various institutes and natural history museums in Korea. This database aims to be an integrated resource that contains additional biological information, such as genome sequences and molecular level diversity. Currently, twelve institutes and museums in Korea are integrated by the DiGIR (Distributed Generic Information Retrieval) protocol, with Darwin Core 2.0 format as its metadata standard for data exchange. Data quality control and statistical analysis functions have been implemented. In particular, integrating molecular and genetic information from the National Center for Biotechnology Information (NCBI) databases with NARIS was recently accomplished. NARIS can also be extended to accommodate other institutes abroad, and the whole system can be exported to establish local biodiversity management servers. Conclusion A Korean data portal, NARIS, has been developed to efficiently manage and utilize biodiversity data, which includes genetic resources. 
NARIS aims to be integral in maximizing bio-resource utilization for conservation, management, research, education, industrial applications, and integration with other bioinformation data resources. It can be found at . PMID:19091024
Whetzel, Patricia L.; Grethe, Jeffrey S.; Banks, Davis E.; Martone, Maryann E.
2015-01-01
The NIDDK Information Network (dkNET; http://dknet.org) was launched to serve the needs of basic and clinical investigators in metabolic, digestive and kidney disease by facilitating access to research resources that advance the mission of the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). By research resources, we mean the multitude of data, software tools, materials, services, projects and organizations available to researchers in the public domain. Most of these are accessed via web-accessible databases or web portals, each developed, designed and maintained by numerous different projects, organizations and individuals. While many of the large government funded databases, maintained by agencies such as the European Bioinformatics Institute and the National Center for Biotechnology Information, are well known to researchers, many more that have been developed by and for the biomedical research community are unknown or underutilized. At least part of the problem is the nature of dynamic databases, which are considered part of the “hidden” web, that is, content that is not easily accessed by search engines. dkNET was created specifically to address the challenge of connecting researchers to research resources via these types of community databases and web portals. dkNET functions as a “search engine for data”, searching across millions of database records contained in hundreds of biomedical databases developed and maintained by independent projects around the world. A primary focus of dkNET is centers and projects specifically created to provide high quality data and resources to NIDDK researchers. Through the novel data ingest process used in dkNET, additional data sources can easily be incorporated, allowing it to scale with the growth of digital data and the needs of the dkNET community. Here, we provide an overview of the dkNET portal and its functions. 
We show how dkNET can be used to address a variety of use cases that involve searching for research resources. PMID:26393351
Godown, Justin; Thurm, Cary; Dodd, Debra A; Soslow, Jonathan H; Feingold, Brian; Smith, Andrew H; Mettler, Bret A; Thompson, Bryn; Hall, Matt
2017-12-01
Large clinical, research, and administrative databases are increasingly utilized to facilitate pediatric heart transplant (HTx) research. Linking databases has proven to be a robust strategy across multiple disciplines to expand the possible analyses that can be performed while leveraging the strengths of each dataset. We describe a unique linkage of the Scientific Registry of Transplant Recipients (SRTR) database and the Pediatric Health Information System (PHIS) administrative database to provide a platform to assess resource utilization in pediatric HTx. All pediatric patients (1999-2016) who underwent HTx at a hospital enrolled in the PHIS database were identified. A linkage was performed between the SRTR and PHIS databases in a stepwise approach using indirect identifiers. To determine the feasibility of using these linked data to assess resource utilization, total and post-HTx hospital costs were assessed. A total of 3188 unique transplants were identified as being present in both databases and amenable to linkage. Linkage of SRTR and PHIS data was successful in 3057 (95.9%) patients, of whom 2896 (90.8%) had complete cost data. Median total and post-HTx hospital costs were $518,906 (IQR $324,199-$889,738), and $334,490 (IQR $235,506-$498,803) respectively with significant differences based on patient demographics and clinical characteristics at HTx. Linkage of the SRTR and PHIS databases is feasible and provides an invaluable tool to assess resource utilization. Our analysis provides contemporary cost data for pediatric HTx from the largest US sample reported to date. It also provides a platform for expanded analyses in the pediatric HTx population. Copyright © 2017 Elsevier Inc. All rights reserved.
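The percentages in the abstract above can be checked with a short calculation. The sketch below uses only the counts reported in the abstract; the variable and function names are illustrative. It suggests that both reported percentages share the 3188 identified transplants as their denominator, so the share of linked patients with complete cost data is somewhat higher than 90.8%.

```python
def pct(part, whole):
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


identified = 3188   # transplants present in both databases
linked = 3057       # successfully linked SRTR-PHIS records
with_cost = 2896    # linked records with complete cost data

linkage_rate = pct(linked, identified)          # matches the reported 95.9%
cost_of_identified = pct(with_cost, identified)  # matches the reported 90.8%
cost_of_linked = pct(with_cost, linked)          # share among linked patients only

print(linkage_rate, cost_of_identified, cost_of_linked)
```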
Riviere, Guillaume; Klopp, Christophe; Ibouniyamine, Nabihoudine; Huvet, Arnaud; Boudry, Pierre; Favrel, Pascal
2015-12-02
The Pacific oyster, Crassostrea gigas, is one of the most important aquaculture shellfish resources worldwide. Important efforts have been undertaken towards a better knowledge of its genome and transcriptome, which is now making C. gigas a model organism among lophotrochozoans, the under-described sister clade of ecdysozoans within protostomes. These massive sequencing efforts offer the opportunity to assemble gene expression data and make such a resource accessible and exploitable for the scientific community. Therefore, we undertook this assembly into an up-to-date publicly available transcriptome database: the GigaTON (Gigas TranscriptOme pipeliNe) database. We assembled 2204 million sequences obtained from 114 publicly available RNA-seq libraries that were produced from all embryo-larval development stages, adult organs, and different environmental stressors including heavy metals, temperature, salinity and exposure to air, and which were mostly generated as part of the Crassostrea gigas genome project. These data were analyzed in silico and resulted in 56621 newly assembled contigs that were deposited into a publicly available database, the GigaTON database. This database also provides powerful and user-friendly request tools to browse and retrieve information about annotation, expression level, UTRs, splice and polymorphism, and gene ontology associated with all the contigs within each library and between all libraries. The GigaTON database provides a convenient, potent and versatile interface to browse, retrieve, confront and compare massive transcriptomic information in an extensive range of conditions, tissues and developmental stages in Crassostrea gigas. To our knowledge, the GigaTON database constitutes the most extensive transcriptomic database to date in marine invertebrates, and thereby a new reference transcriptome in the oyster, a highly valuable resource to physiologists and evolutionary biologists.
Significant deposits of gold, silver, copper, lead, and zinc in the United States
Long, K.R.; DeYoung, J.H.; Ludington, S.
2000-01-01
Approximately 99 percent of past production and remaining identified resources of gold, silver, copper, lead, and zinc in the United States are accounted for by deposits that originally contained at least 2 metric tonnes (t) gold, 85 t silver, 50,000 t copper, 30,000 t lead, or 50,000 t zinc. The U.S. Geological Survey, beginning with the 1996 National Mineral Resource Assessment, is systematically compiling data on these deposits, collectively known as 'significant' deposits. As of December 31, 1996, the significant deposits database contained 1,118 entries corresponding to individual deposits or mining districts. Maintaining, updating and analyzing a database of this size is much easier than managing the more than 100,000 records in the Mineral Resource Data System and Minerals Availability System/Minerals Industry Location System, yet the significant deposits database accounts for almost all past production and remaining identified resources of these metals in the United States. About 33 percent of gold, 22 percent of silver, 42 percent of copper, 39 percent of lead, and 46 percent of zinc are contained in or were produced from deposits discovered after World War II. Even within a database of significant deposits, a disproportionate share of past production and remaining resources is accounted for by a very small number of deposits. The largest 10 producers for each metal account for one third of the gold, 60 percent of the silver, 68 percent of the copper, 85 percent of the lead, and 75 percent of the zinc produced in the United States. The 10 largest deposits in terms of identified remaining resources of each of the five metals contain 43 percent of the gold, 56 percent of the silver, 48 percent of the copper, 94 percent of the lead, and 72 percent of the zinc. Identified resources in significant deposits for each metal are less than the mean estimates of resources in undiscovered deposits from the 1996 U.S. National Mineral Resource Assessment. 
Identified resources are roughly the same magnitude as cumulative past production. Assuming that roughly the same proportion of resources in undiscovered deposits will occur in significant deposits, a substantial number of significant deposits remain to be discovered.
Squires, R. Burke; Noronha, Jyothi; Hunt, Victoria; García‐Sastre, Adolfo; Macken, Catherine; Baumgarth, Nicole; Suarez, David; Pickett, Brett E.; Zhang, Yun; Larsen, Christopher N.; Ramsey, Alvin; Zhou, Liwei; Zaremba, Sam; Kumar, Sanjeev; Deitrich, Jon; Klem, Edward; Scheuermann, Richard H.
2012-01-01
Please cite this paper as: Squires et al. (2012) Influenza research database: an integrated bioinformatics resource for influenza research and surveillance. Influenza and Other Respiratory Viruses 6(6), 404–416. Background The recent emergence of the 2009 pandemic influenza A/H1N1 virus has highlighted the value of free and open access to influenza virus genome sequence data integrated with information about other important virus characteristics. Design The Influenza Research Database (IRD, http://www.fludb.org) is a free, open, publicly‐accessible resource funded by the U.S. National Institute of Allergy and Infectious Diseases through the Bioinformatics Resource Centers program. IRD provides a comprehensive, integrated database and analysis resource for influenza sequence, surveillance, and research data, including user‐friendly interfaces for data retrieval, visualization and comparative genomics analysis, together with personal log in‐protected ‘workbench’ spaces for saving data sets and analysis results. IRD integrates genomic, proteomic, immune epitope, and surveillance data from a variety of sources, including public databases, computational algorithms, external research groups, and the scientific literature. Results To demonstrate the utility of the data and analysis tools available in IRD, two scientific use cases are presented. A comparison of hemagglutinin sequence conservation and epitope coverage information revealed highly conserved protein regions that can be recognized by the human adaptive immune system as possible targets for inducing cross‐protective immunity. Phylogenetic and geospatial analysis of sequences from wild bird surveillance samples revealed a possible evolutionary connection between influenza virus from Delaware Bay shorebirds and Alberta ducks. 
Conclusions The IRD provides a wealth of integrated data and information about influenza virus to support research of the genetic determinants dictating virus pathogenicity, host range restriction and transmission, and to facilitate development of vaccines, diagnostics, and therapeutics. PMID:22260278
Database Resources of the BIG Data Center in 2018.
2018-01-04
The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
The MAR databases: development and implementation of databases specific for marine metagenomics
Klemetsen, Terje; Raknes, Inge A; Fu, Juan; Agafonov, Alexander; Balasundaram, Sudhagar V; Tartari, Giacomo; Robertsen, Espen
2018-01-01
We introduce the marine databases MarRef, MarDB and MarCat (https://mmp.sfb.uit.no/databases/), which are publicly available resources that promote marine research and innovation. These data resources, which have been implemented in the Marine Metagenomics Portal (MMP) (https://mmp.sfb.uit.no/), are collections of richly annotated and manually curated contextual (metadata) and sequence databases representing three tiers of accuracy. While MarRef is a database for completely sequenced marine prokaryotic genomes, which represents a marine prokaryote reference genome database, MarDB includes all incompletely sequenced prokaryotic genomes regardless of level of completeness. The last database, MarCat, represents a gene (protein) catalog of uncultivable (and cultivable) marine genes and proteins derived from marine metagenomics samples. The first versions of MarRef and MarDB contain 612 and 3726 records, respectively. Each record is built up of 106 metadata fields including attributes for sampling, sequencing, assembly and annotation in addition to the organism and taxonomic information. Currently, MarCat contains 1227 records with 55 metadata fields. Ontologies and controlled vocabularies are used in the contextual databases to enhance consistency. The user-friendly web interface lets visitors browse, filter and search in the contextual databases and perform BLAST searches against the corresponding sequence databases. All contextual and sequence databases are freely accessible and downloadable from https://s1.sfb.uit.no/public/mar/. PMID:29106641
Integrating diverse databases into an unified analysis framework: a Galaxy approach
Blankenberg, Daniel; Coraor, Nathan; Von Kuster, Gregory; Taylor, James; Nekrutenko, Anton
2011-01-01
Recent technological advances have led to the ability to generate large amounts of data for model and non-model organisms. Whereas, in the past, there have been a relatively small number of central repositories that serve genomic data, an increasing number of distinct specialized data repositories and resources have been established. Here, we describe a generic approach that provides for the integration of a diverse spectrum of data resources into a unified analysis framework, Galaxy (http://usegalaxy.org). This approach allows the simplified coupling of external data resources with the data analysis tools available to Galaxy users, while leveraging the native data mining facilities of the external data resources. Database URL: http://usegalaxy.org PMID:21531983
AOP-DB Frontend: A user interface for the Adverse Outcome Pathways Database.
The EPA Adverse Outcome Pathway Database (AOP-DB) is a database resource that aggregates association relationships between AOPs, genes, chemicals, diseases, pathways, species orthology information, ontologies. The AOP-DB frontend is a simple yet powerful AOP-DB user interface in...
AOP-DB Frontend: A user interface for the Adverse Outcome Pathways Database
The EPA Adverse Outcome Pathway Database (AOP-DB) is a database resource that aggregates association relationships between AOPs, genes, chemicals, diseases, pathways, species orthology information, ontologies. The AOP-DB frontend is a simple yet powerful user interface in the for...
ExPASy: SIB bioinformatics resource portal.
Artimo, Panu; Jonnalagedda, Manohar; Arnold, Konstantin; Baratin, Delphine; Csardi, Gabor; de Castro, Edouard; Duvaud, Séverine; Flegel, Volker; Fortier, Arnaud; Gasteiger, Elisabeth; Grosdidier, Aurélien; Hernandez, Céline; Ioannidis, Vassilios; Kuznetsov, Dmitry; Liechti, Robin; Moretti, Sébastien; Mostaguir, Khaled; Redaschi, Nicole; Rossier, Grégoire; Xenarios, Ioannis; Stockinger, Heinz
2012-07-01
ExPASy (http://www.expasy.org) has a worldwide reputation as one of the main bioinformatics resources for proteomics. It has now evolved, becoming an extensible and integrative portal accessing many scientific resources, databases and software tools in different areas of life sciences. Scientists can henceforth seamlessly access a wide range of resources in many different domains, such as proteomics, genomics, phylogeny/evolution, systems biology, population genetics, transcriptomics, etc. The individual resources (databases, web-based and downloadable software tools) are hosted in a 'decentralized' way by different groups of the SIB Swiss Institute of Bioinformatics and partner institutions. Specifically, a single web portal provides a common entry point to a wide range of resources developed and operated by different SIB groups and external institutions. The portal features a search function across 'selected' resources. Additionally, the availability and usage of resources are monitored. The portal is aimed at both expert users and people who are not familiar with a specific domain in life sciences. The new web interface provides, in particular, visual guidance for newcomers to ExPASy.
An Index to PGE-Ni-Cr Deposits and Occurrences in Selected Mineral-Occurrence Databases
Causey, J. Douglas; Galloway, John P.; Zientek, Michael L.
2009-01-01
Databases of mineral deposits and occurrences are essential to conducting assessments of undiscovered mineral resources. In the USGS's (U.S. Geological Survey) global assessment of undiscovered resources of copper, potash, and the platinum-group elements (PGE), only a few mineral deposit types will be evaluated. For example, only porphyry-copper and sediment-hosted copper deposits will be considered for the copper assessment. To support the global assessment, the USGS prepared comprehensive compilations of the occurrences of these two deposit types in order to develop grade and tonnage models and delineate permissive areas for undiscovered deposits of those types. This publication identifies previously published databases and database records that describe PGE, nickel, and chromium deposits and occurrences. Nickel and chromium were included in this overview because of the close association of PGE with nickel and chromium mineralization. Users of this database will need to refer to the original databases for detailed information about the deposits and occurrences. This information will be used to develop a current and comprehensive global database of PGE deposits and occurrences.
Maccaferri, Marco; Zhang, Junli; Bulli, Peter; Abate, Zewdie; Chao, Shiaoman; Cantu, Dario; Bossolini, Eligio; Chen, Xianming; Pumphrey, Michael; Dubcovsky, Jorge
2015-01-01
New races of Puccinia striiformis f. sp. tritici (Pst), the causal pathogen of wheat stripe rust, show high virulence to previously deployed resistance genes and are responsible for large yield losses worldwide. To identify new sources of resistance we performed a genome-wide association study (GWAS) using a worldwide collection of 1000 spring wheat accessions. Adult plants were evaluated under field conditions in six environments in the western United States, and seedlings were tested with four Pst races. A single-nucleotide polymorphism (SNP) Infinium 9K-assay provided 4585 SNPs suitable for GWAS. High correlations among environments and high heritabilities were observed for stripe rust infection type and severity. Greater levels of Pst resistance were observed in a subpopulation from Southern Asia than in other groups. GWAS identified 97 loci that were significant for at least three environments, including 10 with an experiment-wise adjusted Bonferroni probability < 0.10. These 10 quantitative trait loci (QTL) explained 15% of the phenotypic variation in infection type, a percentage that increased to 45% when all QTL were considered. Three of these 10 QTL were mapped far from previously identified Pst resistance genes and QTL, and likely represent new resistance loci. The other seven QTL mapped close to known resistance genes and allelism tests will be required to test their relationships. In summary, this study provides an integrated view of stripe rust resistance resources in spring wheat and identifies new resistance loci that will be useful to diversify the current set of resistance genes deployed to control this devastating disease. PMID:25609748
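The Bonferroni adjustment mentioned in the abstract can be illustrated with a short sketch. It assumes the correction was applied across the 4585 SNPs used for GWAS; the abstract does not state the exact number of tests, and the function names are illustrative only.

```python
def bonferroni_adjust(p_value: float, n_tests: int) -> float:
    """Return the Bonferroni-adjusted p-value, capped at 1.0."""
    return min(1.0, p_value * n_tests)


def per_test_threshold(alpha: float, n_tests: int) -> float:
    """Raw p-value a single test must beat for an experiment-wise level alpha."""
    return alpha / n_tests


N_SNPS = 4585  # SNPs suitable for GWAS (from the abstract); assumed number of tests
ALPHA = 0.10   # experiment-wise level quoted in the abstract

threshold = per_test_threshold(ALPHA, N_SNPS)
print(f"per-SNP threshold: {threshold:.3g}")
```

Under that assumption, a locus needs a raw p-value of roughly 2 in 100,000 or smaller to reach the experiment-wise level of 0.10.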
NREL: Renewable Resource Data Center - Solar Resource Models and Tools
Solar Resource Models and Tools The Renewable Resource Data Center (RReDC) features the following -supplied hourly average measured global horizontal data. NSRDB Data Viewer Visualize, explore, and download solar resource data from the National Solar Radiation Database. PVWatts® Calculator PVWatts
SInCRe—structural interactome computational resource for Mycobacterium tuberculosis
Metri, Rahul; Hariharaputran, Sridhar; Ramakrishnan, Gayatri; Anand, Praveen; Raghavender, Upadhyayula S.; Ochoa-Montaño, Bernardo; Higueruelo, Alicia P.; Sowdhamini, Ramanathan; Chandra, Nagasuma R.; Blundell, Tom L.; Srinivasan, Narayanaswamy
2015-01-01
We have developed an integrated database for Mycobacterium tuberculosis H37Rv (Mtb) that collates information on protein sequences, domain assignments, functional annotation and 3D structural information along with protein–protein and protein–small molecule interactions. SInCRe (Structural Interactome Computational Resource) was developed out of the CamBan (Cambridge and Bangalore) collaboration. The motivation for developing this database is to provide an integrated platform that allows easy access to, and interpretation of, data and results obtained by all the groups in CamBan in the field of Mtb informatics. In-house algorithms and databases developed independently by various academic groups in CamBan are used to generate Mtb-specific datasets and are integrated in this database to provide a structural dimension to studies on tuberculosis. The SInCRe database readily provides information on identification of functional domains, genome-scale modelling of structures of Mtb proteins and characterization of the small-molecule binding sites within Mtb. The resource also provides structure-based function annotation, information on small-molecule binders including FDA (Food and Drug Administration)-approved drugs, protein–protein interactions (PPIs) and natural compounds that potentially bind to pathogen proteins and weaken or eliminate host–pathogen protein–protein interactions. Together they provide prerequisites for identification of off-target binding. Database URL: http://proline.biochem.iisc.ernet.in/sincre PMID:26130660
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reisman, D.J.
A variety of issues must be addressed in development of software for information resources. One is accessibility and use of information. Another is that to properly design, abstract, index, and do quality control on a database requires the effort of well-trained and knowledgeable personnel as well as substantial financial resources. Transferring data to other locations has inherent difficulties, including those related to incompatibility. The main issue in developing health risk assessment databases is the needs of the user.
Using TEI for an Endangered Language Lexical Resource: The Nxaʔamxcín Database-Dictionary Project
ERIC Educational Resources Information Center
Czaykowska-Higgins, Ewa; Holmes, Martin D.; Kell, Sarah M.
2014-01-01
This paper describes the evolution of a lexical resource project for Nxaʔamxcín, an endangered Salish language, from the project's inception in the 1990s, based on legacy materials recorded in the 1960s and 1970s, to its current form as an online database that is transformable into various print and web-based formats for varying uses. We…
ERIC Educational Resources Information Center
Gruner, Richard; Heron, Carol E.
1984-01-01
Examines usefulness of DIALOG as legal research tool through use of DIALOG's DIALINDEX database to identify those databases among almost 200 available that contain large numbers of records related to federal securities regulation. Eight databases selected for further study are detailed. Twenty-six footnotes, database statistics, and samples are…
bioNerDS: exploring bioinformatics’ database and software use through literature mining
2013-01-01
Background Biology-focused databases and software define bioinformatics and their use is central to computational biology. In such a complex and dynamic field, it is of interest to understand what resources are available, which are used, how much they are used, and for what they are used. While scholarly literature surveys can provide some insights, large-scale computer-based approaches to identify mentions of bioinformatics databases and software from primary literature would automate systematic cataloguing, facilitate the monitoring of usage, and provide the foundations for the recovery of computational methods for analysing biological data, with the long-term aim of identifying best/common practice in different areas of biology. Results We have developed bioNerDS, a named entity recogniser for the recovery of bioinformatics databases and software from primary literature. We identify such entities with an F-measure ranging from 63% to 91% at the mention level and 63-78% at the document level, depending on corpus. Not attaining a higher F-measure is mostly due to high ambiguity in resource naming, which is compounded by the on-going introduction of new resources. To demonstrate the software, we applied bioNerDS to full-text articles from BMC Bioinformatics and Genome Biology. General mention patterns reflect the remit of these journals, highlighting BMC Bioinformatics’s emphasis on new tools and Genome Biology’s greater emphasis on data analysis. The data also illustrate some shifts in resource usage: for example, the past decade has seen R and the Gene Ontology join BLAST and GenBank as the main components in bioinformatics processing. Conclusions We demonstrate the feasibility of automatically identifying resource names on a large scale from the scientific literature and show that the generated data can be used for exploration of bioinformatics database and software usage. For example, our results help to investigate the rate of change in resource usage and corroborate the suspicion that a vast majority of resources are created, but rarely (if ever) used thereafter. bioNerDS is available at http://bionerds.sourceforge.net/. PMID:23768135
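For readers unfamiliar with the F-measure quoted above, the following is a minimal sketch of how a mention-level score is typically computed. The mention tuples are made-up toy data, not bioNerDS output, and the exact matching criteria used by the authors are not given in the abstract.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple:
    """Compute precision, recall and balanced F-measure for exact-match mentions."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical mentions: (document id, character offset, resource name)
gold = {
    ("doc1", 10, "BLAST"),
    ("doc1", 42, "GenBank"),
    ("doc2", 5, "R"),
    ("doc2", 77, "Gene Ontology"),
}
predicted = {
    ("doc1", 10, "BLAST"),
    ("doc2", 5, "R"),
    ("doc2", 30, "PDB"),  # a false positive in this toy example
}

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```

A document-level score would use the same function with (document id, resource name) pairs, so that repeated mentions within one article count once.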
NASA Astrophysics Data System (ADS)
East, J. A., II
2016-12-01
The U.S. Geological Survey's (USGS) Eastern Energy Resources Science Center (EERSC) has an ongoing project which has mapped coal chemistry and stratigraphy since 1977. Over the years, the USGS has collected various forms of coal data and archived that data into the National Coal Resources Data System (NCRDS) database. NCRDS is a repository that houses data from the major coal basins in the United States and includes information on location, seam thickness, coal rank, geologic age, geographic region, geologic province, coalfield, and characteristics of the coal or lithology for that data point. These data points can be linked to the US Coal Quality Database (COALQUAL) to include ultimate, proximate, major, minor and trace-element data. Although coal is an inexpensive energy provider, the United States has shifted away from coal usage recently and branched out into other forms of non-renewable and renewable energy because of environmental concerns. NCRDS's primary method of data capture has been USGS field work coupled with cooperative agreements with state geological agencies and universities doing coal-related research. These agreements are on competitive five-year cycles that have evolved into larger scope research efforts including solid fuel resources such as coal-bed methane, shale gas and oil. Recently these efforts have expanded to include environmental impacts of the use of fossil fuels, which has allowed the USGS to enter into agreements with states for the Geologic CO2 Storage Resources Assessment as required by the Energy Independence and Security Act. In 2016 they expanded into research areas to include geothermal, conventional and unconventional oil and gas. The NCRDS and COALQUAL databases are now online for the public to use, and are in the process of being updated to include new data for other energy resources. Along with this expansion of scope, the database name will change to the National Energy Resources Data System (NERDS) in FY 2017.
e-MIR2: a public online inventory of medical informatics resources.
de la Calle, Guillermo; García-Remesal, Miguel; Nkumu-Mbomio, Nelida; Kulikowski, Casimir; Maojo, Victor
2012-08-02
Over the past years, the number of available informatics resources in medicine has grown exponentially. While specific inventories of such resources have already begun to be developed for Bioinformatics (BI), comparable inventories are as yet not available for the Medical Informatics (MI) field, so that locating and accessing them currently remains a difficult and time-consuming task. We have created a repository of MI resources from the scientific literature, providing free access to its contents through a web-based service. We define informatics resources as all those elements that constitute, serve to define or are used by informatics systems, ranging from architectures or development methodologies to terminologies, vocabularies, databases or tools. Relevant information describing the resources is automatically extracted from manuscripts published in top-ranked MI journals. We used a pattern matching approach to detect the resources' names and their main features. Detected resources are classified according to three different criteria: functionality, resource type and domain. To facilitate these tasks, we have built three different classification schemas by following a novel approach based on folksonomies and social tagging. We adopted the terminology most frequently used by MI researchers in their publications to create the concepts and hierarchical relationships belonging to the classification schemas. The classification algorithm identifies the categories associated with resources and annotates them accordingly. The database is then populated with this data after manual curation and validation. We have created an online repository of MI resources to assist researchers in locating and accessing the most suitable resources to perform specific tasks. The database contains 609 resources at the time of writing and is available at http://www.gib.fi.upm.es/eMIR2. 
We are continuing to expand the number of available resources by taking into account further publications as well as suggestions from users and resource developers.
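The pattern matching step described above can be pictured with a small sketch. The regular expressions and the sample sentence below are hypothetical illustrations; the abstract does not disclose the actual patterns used to build e-MIR2.

```python
import re

# Hypothetical cue phrases that often precede a resource name in an abstract.
NAME_PATTERN = re.compile(
    r"\b(?:we\s+(?:present|introduce|describe|have\s+developed)|called|named)\s+"
    r"(?P<name>[A-Za-z][\w-]*)",
    re.IGNORECASE,
)
URL_PATTERN = re.compile(r"https?://[^\s)]+")


def detect_resources(text: str) -> dict:
    """Return candidate resource names and URLs found in a piece of text."""
    names = [m.group("name") for m in NAME_PATTERN.finditer(text)]
    urls = [u.rstrip(".,;") for u in URL_PATTERN.findall(text)]
    return {"names": names, "urls": urls}


# Made-up sample sentence for illustration.
sample = (
    "We present e-MIR2, a repository of MI resources. "
    "It is available at http://www.gib.fi.upm.es/eMIR2."
)
result = detect_resources(sample)
print(result)
```

Candidates found this way would still need the manual curation and validation that the abstract mentions before entering the database.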
A Tactical Framework for Cyberspace Situational Awareness
2010-06-01
Command & Control 1. VOIP Telephone 2. Internet Chat 3. Web App (TBMCS) 4. Email 5. Web App (PEX) 6. Database (CAMS) 7. Database (ARMS) 8...Database (LogMod) 9. Resource (WWW) 10. Application (PFPS) Mission Planning 1. Application (PFPS) 2. Email 3. Web App (TBMCS) 4. Internet Chat...1. Web App (PEX) 2. Database (ARMS) 3. Web App (TBMCS) 4. Email 5. Database (CAMS) 6. VOIP Telephone 7. Application (PFPS) 8. Internet Chat 9
NCBI Epigenomics: What’s new for 2013
Fingerman, Ian M.; Zhang, Xuan; Ratzat, Walter; Husain, Nora; Cohen, Robert F.; Schuler, Gregory D.
2013-01-01
The Epigenomics resource at the National Center for Biotechnology Information (NCBI) has been created to serve as a comprehensive public repository for whole-genome epigenetic data sets (www.ncbi.nlm.nih.gov/epigenomics). We have constructed this resource by selecting the subset of epigenetics-specific data from the Gene Expression Omnibus (GEO) database and then subjecting them to further review and annotation. Associated data tracks can be viewed using popular genome browsers or downloaded for local analysis. We have performed extensive user testing throughout the development of this resource, and new features and improvements are continuously being implemented based on the results. We have made substantial usability improvements to user interfaces, enhanced functionality, made identification of data tracks of interest easier and created new tools for preliminary data analyses. Additionally, we have made efforts to enhance the integration between the Epigenomics resource and other NCBI databases, including the Gene database and PubMed. Data holdings have also increased dramatically since the initial publication describing the NCBI Epigenomics resource and currently consist of >3700 viewable and downloadable data tracks from 955 biological sources encompassing five well-studied species. This updated manuscript highlights these changes and improvements. PMID:23193265
Chaoui, Imane; Zozio, Thierry; Lahlou, Ouafae; Sabouni, Radia; Abid, Mohammed; El Aouad, Rajae; Akrim, Mohammed; Amzazi, Said; Rastogi, Nalin; El Mzibri, Mohammed
2014-01-01
In the present study, Mycobacterium tuberculosis complex (MTBC) clinical isolates from culture-positive TB patients in Morocco were studied by spoligotyping and 12-loci MIRU-VNTR typing methods to characterize prevalent genotypes (n = 219 isolates from 208 patients). Spoligotyping resulted in 39 unique patterns and 167 strains in 30 clusters (2-50 strains per cluster). Comparison with an international database showed that 29 of 39 unique patterns matched existing shared spoligotype international types (SITs). Nine shared types containing 10 strains were newly created (SIT 2891 to SIT 2899); this led to the description of 69 SITs with 206 strains and two orphan patterns. The most prevalent spoligotype was SIT42 (LAM; n = 50 or 24% of isolates). The distribution of strains according to major MTBC clades was as follows: LAM (46.1%) > Haarlem (26%) > ill-defined T superfamily (22.6%) and S clade (0.96%). In contrast, Beijing, CAS (Central Asian) and EAI (East-African Indian) strains were absent in this setting. Subsequent 12-loci MIRU typing resulted in a total of 25 SIT/MIT clusters (n = 66 isolates, 2-6 isolates per cluster), with a resulting recent transmission rate of 22.3%. The MIRU-VNTR patterns corresponded to 69 MITs for 138 strains and 46 orphan patterns. The most frequent patterns were MIT43 (n = 8), MIT9 (n = 7) and MIT42 (n = 7). HGDI analysis of the 12 MIRU loci showed that loci 10, 23 and 40 were highly discriminative in our setting. The results also underlined the usefulness of spoligotyping and MIRU-VNTR to detect mixed infections among some of our TB patients. Overall, the results showed that TB is almost exclusively transmitted in Morocco through evolutionary-modern MTBC lineages belonging to principal genetic groups 2/3 (Haarlem, LAM, T), with a high level of biodiversity seen by MIRU typing. This study provides a first global snapshot of MTBC population structure in Morocco, and validates the potential use of spoligotyping in conjunction with minisatellites for future investigations in Morocco, which should ideally include optimized 15- or 24-loci MIRU-VNTRs. Copyright © 2013 Elsevier B.V. All rights reserved.
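Two of the statistics in this abstract lend themselves to a short worked sketch. The abstract does not state its formulas, so the code assumes the commonly used "n - 1" estimate of recent transmission and the Hunter-Gaston discriminatory index (HGDI) in its standard form; the denominator of 184 (138 strains in MITs plus 46 orphan patterns) is an inference that happens to reproduce the reported 22.3%. The toy allele list is hypothetical.

```python
from collections import Counter


def hgdi(allele_calls: list) -> float:
    """Hunter-Gaston discriminatory index: 1 - sum(n_j * (n_j - 1)) / (N * (N - 1))."""
    n_total = len(allele_calls)
    if n_total < 2:
        raise ValueError("HGDI needs at least two isolates")
    counts = Counter(allele_calls)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n_total * (n_total - 1))


def recent_transmission_rate(n_clustered: int, n_clusters: int, n_total: int) -> float:
    """'n - 1' method: one isolate per cluster is treated as the source case."""
    return (n_clustered - n_clusters) / n_total


# Figures from the abstract: 66 clustered isolates in 25 clusters;
# total of 138 + 46 = 184 typed strains is an assumption (see lead-in).
rate = recent_transmission_rate(66, 25, 138 + 46)
print(f"recent transmission rate: {rate:.1%}")

# Hypothetical repeat counts at a single MIRU locus for eight isolates.
toy_locus = [2, 2, 3, 3, 3, 4, 5, 5]
print(f"HGDI for toy locus: {hgdi(toy_locus):.3f}")
```

An HGDI close to 1 means two randomly chosen isolates are very likely to differ at that locus, which is the sense in which loci 10, 23 and 40 are described as highly discriminative.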
Brant, Steven R; Okou, David T; Simpson, Claire L; Cutler, David J; Haritunians, Talin; Bradfield, Jonathan P; Chopra, Pankaj; Prince, Jarod; Begum, Ferdouse; Kumar, Archana; Huang, Chengrui; Venkateswaran, Suresh; Datta, Lisa W; Wei, Zhi; Thomas, Kelly; Herrinton, Lisa J; Klapproth, Jan-Micheal A; Quiros, Antonio J; Seminerio, Jenifer; Liu, Zhenqiu; Alexander, Jonathan S; Baldassano, Robert N; Dudley-Brown, Sharon; Cross, Raymond K; Dassopoulos, Themistocles; Denson, Lee A; Dhere, Tanvi A; Dryden, Gerald W; Hanson, John S; Hou, Jason K; Hussain, Sunny Z; Hyams, Jeffrey S; Isaacs, Kim L; Kader, Howard; Kappelman, Michael D; Katz, Jeffry; Kellermayer, Richard; Kirschner, Barbara S; Kuemmerle, John F; Kwon, John H; Lazarev, Mark; Li, Ellen; Mack, David; Mannon, Peter; Moulton, Dedrick E; Newberry, Rodney D; Osuntokun, Bankole O; Patel, Ashish S; Saeed, Shehzad A; Targan, Stephan R; Valentine, John F; Wang, Ming-Hsi; Zonca, Martin; Rioux, John D; Duerr, Richard H; Silverberg, Mark S; Cho, Judy H; Hakonarson, Hakon; Zwick, Michael E; McGovern, Dermot P B; Kugathasan, Subra
2017-01-01
The inflammatory bowel diseases (IBD) ulcerative colitis (UC) and Crohn's disease (CD) cause significant morbidity and are increasing in prevalence among all populations, including African Americans. More than 200 susceptibility loci have been identified in populations of predominantly European ancestry, but few loci have been associated with IBD in other ethnicities. We performed 2 high-density, genome-wide scans comprising 2345 cases of African Americans with IBD (1646 with CD, 583 with UC, and 116 inflammatory bowel disease unclassified) and 5002 individuals without IBD (controls, identified from the Health Retirement Study and Kaiser Permanente database). Single-nucleotide polymorphisms (SNPs) associated at P < 5.0 × 10⁻⁸ in meta-analysis with a nominal evidence (P < .05) in each scan were considered to have genome-wide significance. We detected SNPs at HLA-DRB1, and African-specific SNPs at ZNF649 and LSAMP, with associations of genome-wide significance for UC. We detected SNPs at USP25 with associations of genome-wide significance for IBD. No associations of genome-wide significance were detected for CD. In addition, 9 genes previously associated with IBD contained SNPs with significant evidence for replication (P < 1.6 × 10⁻⁶): ADCY3, CXCR6, HLA-DRB1 to HLA-DQA1 (genome-wide significance on conditioning), IL12B, PTGER4, and TNC for IBD; IL23R, PTGER4, and SNX20 (in strong linkage disequilibrium with NOD2) for CD; and KCNQ2 (near TNFRSF6B) for UC. Several of these genes, such as TNC (near TNFSF15), CXCR6, and genes associated with IBD at the HLA locus, contained SNPs with unique association patterns with African-specific alleles. We performed a genome-wide association study of African Americans with IBD and identified loci associated with UC in only this population; we also replicated IBD, CD, and UC loci identified in European populations. 
The detection of variants associated with IBD risk in only people of African descent demonstrates the importance of studying the genetics of IBD and other complex diseases in populations beyond those of European ancestry. Copyright © 2017 AGA Institute. Published by Elsevier Inc. All rights reserved.
PIPEMicroDB: microsatellite database and primer generation tool for pigeonpea genome
Sarika; Arora, Vasu; Iquebal, M. A.; Rai, Anil; Kumar, Dinesh
2013-01-01
Molecular markers play a significant role in improving crops for desirable characteristics, such as high yield, disease resistance and others that will benefit the crop in the long term. Pigeonpea (Cajanus cajan L.) is a legume recently sequenced by a global consortium led by ICRISAT (Hyderabad, India) and has been analysed for gene prediction, synteny maps, markers, etc. We present the PIgeonPEa Microsatellite DataBase (PIPEMicroDB) with an automated primer designing tool for the pigeonpea genome, based on chromosome-wise as well as location-wise search of primers. A total of 123 387 Short Tandem Repeats (STRs) were extracted from the publicly available pigeonpea genome using the MIcroSAtellite tool (MISA). The database is an online relational database based on a ‘three-tier architecture’ that catalogues microsatellite information in MySQL, with a user-friendly interface developed in PHP. Search for STRs may be customized by limiting their location on the chromosome as well as the number of markers in that range. This is a novel approach and has not been implemented in any existing marker database. The database has been further integrated with Primer3 for primer design of selected markers with left and right flanking regions of up to 500 bp. This will enable researchers to select markers of choice at a desired interval over the chromosome. Furthermore, one can use individual STRs of a targeted region of a chromosome to narrow down the location of a gene of interest or linked Quantitative Trait Loci (QTLs). Although it is an in silico approach, marker search based on the characteristics and location of STRs is expected to be beneficial for researchers. Database URL: http://cabindb.iasri.res.in/pigeonpea/ PMID:23396298
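To make the STR extraction and location-limited search more concrete, here is a minimal regex-based sketch. It is not MISA and not the PIPEMicroDB code; the minimum-repeat thresholds, function names and toy sequence are assumptions chosen for illustration.

```python
import re

# Assumed thresholds: unit size -> minimum number of repeats (illustrative only).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    size = len(motif)
    return all(
        motif != motif[:k] * (size // k) for k in range(1, size) if size % k == 0
    )


def find_strs(sequence: str, thresholds: dict = None) -> list:
    """Return perfect STRs as dicts with motif, repeat count and 1-based coordinates."""
    limits = MIN_REPEATS if thresholds is None else thresholds
    seq = sequence.upper()
    hits = []
    for unit, min_rep in sorted(limits.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append({
                    "motif": motif,
                    "repeats": len(match.group(0)) // unit,
                    "start": match.start() + 1,
                    "end": match.end(),
                })
    return sorted(hits, key=lambda h: h["start"])


def filter_by_region(hits: list, start: int, end: int, limit: int) -> list:
    """Keep at most `limit` STRs lying entirely inside [start, end]."""
    inside = [h for h in hits if h["start"] >= start and h["end"] <= end]
    return inside[:limit]


# Toy sequence: (AT)x6 and (GAA)x5 separated by short spacers.
toy = "GG" + "AT" * 6 + "CCGT" + "GAA" * 5 + "TT"
hits = find_strs(toy)
print(hits)
print(filter_by_region(hits, 1, 20, limit=5))
```

In a relational setup like the one described, the coordinates returned here would be stored per chromosome so that a range query plus a row limit can reproduce the location-wise search.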
Relax with CouchDB--into the non-relational DBMS era of bioinformatics.
Manyam, Ganiraju; Payton, Michelle A; Roth, Jack A; Abruzzo, Lynne V; Coombes, Kevin R
2012-07-01
With the proliferation of high-throughput technologies, genome-level data analysis has become common in molecular biology. Bioinformaticians are developing extensive resources to annotate and mine biological features from high-throughput data. The underlying database management systems for most bioinformatics software are based on a relational model. Modern non-relational databases offer an alternative that has flexibility, scalability, and a non-rigid design schema. Moreover, with an accelerated development pace, non-relational databases like CouchDB can be ideal tools to construct bioinformatics utilities. We describe CouchDB by presenting three new bioinformatics resources: (a) geneSmash, which collates data from bioinformatics resources and provides automated gene-centric annotations, (b) drugBase, a database of drug-target interactions with a web interface powered by geneSmash, and (c) HapMap-CN, which provides a web interface to query copy number variations from three SNP-chip HapMap datasets. In addition to the web sites, all three systems can be accessed programmatically via web services. Copyright © 2012 Elsevier Inc. All rights reserved.
Paganoni, Sabrina; Nicholson, Katharine; Chan, James; Shui, Amy; Schoenfeld, David; Sherman, Alexander; Berry, James; Cudkowicz, Merit; Atassi, Nazem
2018-03-01
Urate has been identified as a predictor of amyotrophic lateral sclerosis (ALS) survival in some but not all studies. Here we leverage the recent expansion of the Pooled Resource Open-Access ALS Clinical Trials (PRO-ACT) database to study the association between urate levels and ALS survival. Pooled data of 1,736 ALS participants from the PRO-ACT database were analyzed. Cox proportional hazards regression models were used to evaluate associations between urate levels at trial entry and survival. After adjustment for potential confounders (i.e., creatinine and body mass index), there was an 11% reduction in risk of reaching a survival endpoint during the study with each 1-mg/dL increase in uric acid levels (adjusted hazard ratio 0.89, 95% confidence interval 0.82-0.97, P < 0.01). Our pooled analysis provides further support for urate as a prognostic factor for survival in ALS and confirms the utility of the PRO-ACT database as a powerful resource for ALS epidemiological research. Muscle Nerve 57: 430-434, 2018. © 2017 Wiley Periodicals, Inc.
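The link between the reported hazard ratio and the "11% reduction" can be checked with a few lines of arithmetic. The sketch assumes the usual log-linear form of the Cox model, in which the hazard ratio for a k-unit increase is the per-unit ratio raised to the power k; the 2 mg/dL figure is an illustrative extrapolation, not a result from the study.

```python
import math

HR_PER_MG_DL = 0.89  # adjusted hazard ratio per 1 mg/dL of uric acid (from the abstract)


def risk_reduction_percent(hazard_ratio: float) -> float:
    """Percentage reduction in hazard implied by a hazard ratio below 1."""
    return (1.0 - hazard_ratio) * 100.0


def hazard_ratio_for_increase(hr_per_unit: float, delta: float) -> float:
    """Hazard ratio for a `delta`-unit increase under a log-linear Cox model."""
    return hr_per_unit ** delta


beta = math.log(HR_PER_MG_DL)  # implied Cox regression coefficient
print(f"coefficient: {beta:.4f}")
print(f"reduction per 1 mg/dL: {risk_reduction_percent(HR_PER_MG_DL):.0f}%")
print(f"illustrative HR for +2 mg/dL: {hazard_ratio_for_increase(HR_PER_MG_DL, 2):.3f}")
```

The same conversion applied to the confidence limits (0.82 and 0.97) gives the corresponding range of risk reductions, again under the log-linear assumption.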
Environment Online: The Greening of Databases. Part 2. Scientific and Technical Databases.
ERIC Educational Resources Information Center
Alston, Patricia Gayle
1991-01-01
This second in a series of articles about online sources of environmental information describes scientific and technical databases that are useful for searching environmental data. Topics covered include chemicals and hazardous substances; agriculture; pesticides; water; forestry, oil, and energy resources; air; environmental and occupational…
Chapter 4 - The LANDFIRE Prototype Project reference database
John F. Caratti
2006-01-01
This chapter describes the data compilation process for the Landscape Fire and Resource Management Planning Tools Prototype Project (LANDFIRE Prototype Project) reference database (LFRDB) and explains the reference data applications for LANDFIRE Prototype maps and models. The reference database formed the foundation for all LANDFIRE tasks. All products generated by the...
Use of Genomic Databases for Inquiry-Based Learning about Influenza
ERIC Educational Resources Information Center
Ledley, Fred; Ndung'u, Eric
2011-01-01
The genome projects of the past decades have created extensive databases of biological information with applications in both research and education. We describe an inquiry-based exercise that uses one such database, the National Center for Biotechnology Information Influenza Virus Resource, to advance learning about influenza. This database…