Sample records for kb sequence analysis

  1. Two sequence-ready contigs spanning the two copies of a 200-kb duplication on human 21q: partial sequence and polymorphisms.

    PubMed

    Potier, M; Dutriaux, A; Orti, R; Groet, J; Gibelin, N; Karadima, G; Lutfalla, G; Lynn, A; Van Broeckhoven, C; Chakravarti, A; Petersen, M; Nizetic, D; Delabar, J; Rossier, J

    1998-08-01

    Physical mapping across a duplication can be a tour de force if the region is larger than the size of a bacterial clone. This was the case of the 170- to 275-kb duplication present on the long arm of chromosome 21 in normal human at 21q11.1 (proximal region) and at 21q22.1 (distal region), which we described previously. We have constructed sequence-ready contigs of the two copies of the duplication of which all the clones are genuine representatives of one copy or the other. This required the identification of four duplicon polymorphisms that are copy-specific and nonallelic variations in the sequence of the STSs. Thirteen STSs were mapped inside the duplicated region and 5 outside but close to the boundaries. Among these STSs 10 were end clones from YACs, PACs, or cosmids, and the average interval between two markers in the duplicated region was 16 kb. Eight PACs and cosmids showing minimal overlaps were selected in both copies of the duplication. Comparative sequence analysis along the duplication showed three single-basepair changes between the two copies over 659 bp sequenced (4 STSs), suggesting that the duplication is recent (less than 4 mya). Two CpG islands were located in the duplication, but no genes were identified after a 36-kb cosmid from the proximal copy of the duplication was sequenced. The homology of this chromosome 21 duplicated region with the pericentromeric regions of chromosomes 13, 2, and 18 suggests that the mechanism involved is probably similar to pericentromeric-directed mechanisms described in interchromosomal duplications. Copyright 1998 Academic Press.

  2. Mapping PDB chains to UniProtKB entries.

    PubMed

    Martin, Andrew C R

    2005-12-01

    UniProtKB/SwissProt is the main resource for detailed annotations of protein sequences. This database provides a jumping-off point to many other resources through the links it provides. Among others, these include other primary databases, secondary databases, the Gene Ontology and OMIM. While a large number of links are provided to Protein Data Bank (PDB) files, obtaining a regularly updated mapping between UniProtKB entries and PDB entries at the chain or residue level is not straightforward. In particular, there is no regularly updated resource which allows a UniProtKB/SwissProt entry to be identified for a given residue of a PDB file. We have created a completely automatically maintained database which maps PDB residues to residues in UniProtKB/SwissProt and UniProtKB/trEMBL entries. The protocol uses links from PDB to UniProtKB, from UniProtKB to PDB and a brute-force sequence scan to resolve PDB chains for which no annotated link is available. Finally the sequences from PDB and UniProtKB are aligned to obtain a residue-level mapping. The resource may be queried interactively or downloaded from http://www.bioinf.org.uk/pdbsws/.
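
    The final alignment step described above can be illustrated with a minimal sketch. This is not the PDBSWS implementation: the Needleman-Wunsch scoring scheme (match +1, mismatch -1, gap -1), the function name `map_residues`, and the toy sequences are assumptions chosen only to show how a global alignment yields a residue-level mapping.

```python
def map_residues(pdb_seq, uniprot_seq, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences (Needleman-Wunsch, linear gap penalty).

    Returns a dict mapping 1-based positions in pdb_seq to 1-based positions
    in uniprot_seq. Residues aligned against a gap are omitted; aligned but
    mismatching residues are still included in the mapping.
    Scoring values are illustrative assumptions, not those of any real tool.
    """
    n, m = len(pdb_seq), len(uniprot_seq)

    # Fill the dynamic-programming score matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if pdb_seq[i - 1] == uniprot_seq[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in the UniProtKB sequence
                score[i][j - 1] + gap,      # gap in the PDB sequence
            )

    # Trace back from the bottom-right corner to recover aligned pairs.
    mapping = {}
    i, j = n, m
    while i > 0 and j > 0:
        sub = match if pdb_seq[i - 1] == uniprot_seq[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            mapping[i] = j
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return mapping


# Toy example: a PDB chain lacking the first and last residues of the
# (hypothetical) UniProtKB sequence.
print(map_residues("KVLA", "MKVLAT"))
```

    A production pipeline would more likely use a substitution matrix and affine gap penalties; the linear scheme here just keeps the sketch short.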

  3. PGen: large-scale genomic variations analysis workflow and browser in SoyKB.

    PubMed

    Liu, Yang; Khan, Saad M; Wang, Juexin; Rynge, Mats; Zhang, Yuanxun; Zeng, Shuai; Chen, Shiyuan; Maldonado Dos Santos, Joao V; Valliyodan, Babu; Calyam, Prasad P; Merchant, Nirav; Nguyen, Henry T; Xu, Dong; Joshi, Trupti

    2016-10-06

    With the advances in next-generation sequencing (NGS) technology and significant reductions in sequencing costs, it is now possible to sequence large collections of germplasm in crops for detecting genome-scale genetic variations and to apply the knowledge towards improvements in traits. To efficiently facilitate large-scale NGS resequencing data analysis of genomic variations, we have developed "PGen", an integrated and optimized workflow using the Extreme Science and Engineering Discovery Environment (XSEDE) high-performance computing (HPC) virtual system, iPlant cloud data storage resources and Pegasus workflow management system (Pegasus-WMS). The workflow allows users to identify single nucleotide polymorphisms (SNPs) and insertion-deletions (indels), perform SNP annotations and conduct copy number variation analyses on multiple resequencing datasets in a user-friendly and seamless way. We have developed both a Linux version in GitHub ( https://github.com/pegasus-isi/PGen-GenomicVariations-Workflow ) and a web-based implementation of the PGen workflow integrated within the Soybean Knowledge Base (SoyKB), ( http://soykb.org/Pegasus/index.php ). Using PGen, we identified 10,218,140 single-nucleotide polymorphisms (SNPs) and 1,398,982 indels from analysis of 106 soybean lines sequenced at 15X coverage. 297,245 non-synonymous SNPs and 3330 copy number variation (CNV) regions were identified from this analysis. SNPs identified using PGen from additional soybean resequencing projects adding to 500+ soybean germplasm lines in total have been integrated. These SNPs are being utilized for trait improvement using genotype to phenotype prediction approaches developed in-house. In order to browse and access NGS data easily, we have also developed an NGS resequencing data browser ( http://soykb.org/NGS_Resequence/NGS_index.php ) within SoyKB to provide easy access to SNP and downstream analysis results for soybean researchers. 
    PGen workflow has been optimized for the most…

  4. Second-generation sequencing of entire mitochondrial coding-regions (∼15.4 kb) holds promise for study of the phylogeny and taxonomy of human body lice and head lice.

    PubMed

    Xiong, H; Campelo, D; Pollack, R J; Raoult, D; Shao, R; Alem, M; Ali, J; Bilcha, K; Barker, S C

    2014-08-01

    The Illumina HiSeq platform was used to sequence the entire mitochondrial coding-regions of 20 body lice, Pediculus humanus Linnaeus, and head lice, P. capitis De Geer (Phthiraptera: Pediculidae), from eight towns and cities in five countries: Ethiopia, France, China, Australia and the U.S.A. These data (∼310 kb) were used to see how much more informative entire mitochondrial coding-region sequences were than partial mitochondrial coding-region sequences, and thus to guide the design of future studies of the phylogeny, origin, evolution and taxonomy of body lice and head lice. Phylogenies were compared from entire coding-region sequences (∼15.4 kb), entire cox1 (∼1.5 kb), partial cox1 (∼700 bp) and partial cytb (∼600 bp) sequences. On the one hand, phylogenies from entire mitochondrial coding-region sequences (∼15.4 kb) were much more informative than phylogenies from entire cox1 sequences (∼1.5 kb) and partial gene sequences (∼600 to ∼700 bp). For example, 19 branches had > 95% bootstrap support in our maximum likelihood tree from the entire mitochondrial coding-regions (∼15.4 kb) whereas the tree from 700 bp cox1 had only two branches with bootstrap support > 95%. On the other hand, partial cytb (∼600 bp) and partial cox1 (∼486 bp) sequences were sufficient to genotype lice to Clade A, B or C. The sequences of the mitochondrial genomes of the P. humanus, P. capitis and P. schaeffi Fahrenholz studied are in NCBI GenBank under the accession numbers KC660761-800, KC685631-6330, KC241882-97, EU219988-95, HM241895-8 and JX080388-407. © 2014 The Royal Entomological Society.

  5. Diagnostic screening identifies a wide range of mutations involving the SHOX gene, including a common 47.5 kb deletion 160 kb downstream with a variable phenotypic effect.

    PubMed

    Bunyan, David J; Baker, Kevin R; Harvey, John F; Thomas, N Simon

    2013-06-01

    Léri-Weill dyschondrosteosis (LWD) results from heterozygous mutations of the SHOX gene, with homozygosity or compound heterozygosity resulting in the more severe form, Langer mesomelic dysplasia (LMD). These mutations typically take the form of whole or partial gene deletions, point mutations within the coding sequence, or large (>100 kb) 3' deletions of downstream regulatory elements. We have analyzed the coding sequence of the SHOX gene and its downstream regulatory regions in a cohort of 377 individuals referred with symptoms of LWD, LMD or short stature. A causative mutation was identified in 68% of the probands with LWD or LMD (91/134). In addition, a 47.5 kb deletion was found 160 kb downstream of the SHOX gene in 17 of the 377 patients (12% of the LWD referrals, 4.5% of all referrals). In 14 of these 17 patients, this was the only potentially causative abnormality detected (13 had symptoms consistent with LWD and one had short stature only), but the other three 47.5 kb deletions were found in patients with an additional causative SHOX mutation (with symptoms of LWD rather than LMD). Parental samples were available on 14/17 of these families, and analysis of these showed a more variable phenotype ranging from apparently unaffected to LWD. Breakpoint sequence analysis has shown that the 47.5 kb deletion is identical in all 17 patients, most likely due to an ancient founder mutation rather than recurrence. This deletion was not seen in 471 normal controls (P<0.0001), providing further evidence for a phenotypic effect, albeit one with variable penetration. Copyright © 2013 Wiley Periodicals, Inc.
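
    The proportions and the case-control comparison quoted above can be checked with a short sketch. The abstract does not say which statistical test produced P<0.0001; the two-sided Fisher exact test used here, and the function name `fisher_exact_two_sided`, are assumptions for illustration only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Counts taken from the abstract.
pct_all_referrals = round(100 * 17 / 377, 1)  # deletion carriers among all referrals
pct_lwd_lmd_solved = round(100 * 91 / 134)    # probands with a causative mutation

# Deletion present/absent in referred patients (17/360) vs controls (0/471).
p_value = fisher_exact_two_sided(17, 360, 0, 471)
print(pct_all_referrals, pct_lwd_lmd_solved, p_value)
```

    Under this assumed test the p-value falls well below 0.0001, in line with the reported threshold.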

  6. Characterization of the 101-Kilobase-Pair Megaplasmid pKB1, Isolated from the Rubber-Degrading Bacterium Gordonia westfalica Kb1

    PubMed Central

    Bröker, Daniel; Arenskötter, Matthias; Legatzki, Antje; Nies, Dietrich H.; Steinbüchel, Alexander

    2004-01-01

    The complete sequence of the circular 101,016-bp megaplasmid pKB1 from the cis-1,4-polyisoprene-degrading bacterium Gordonia westfalica Kb1, which represents the first described extrachromosomal DNA of a member of this genus, was determined. Plasmid pKB1 harbors 105 open reading frames. The predicted products of 46 of these are significantly related to proteins of known function. Plasmid pKB1 is organized into three functional regions that are flanked by insertion sequence (IS) elements: (i) a replication and putative partitioning region, (ii) a putative metabolic region, and (iii) a large putative conjugative transfer region, which is interrupted by an additional IS element. Southern hybridization experiments revealed the presence of another copy of this conjugational transfer region on the bacterial chromosome. The origin of replication (oriV) of pKB1 was identified and used for construction of Escherichia coli-Gordonia shuttle vectors, which were also suitable for several other Gordonia species and related genera. The metabolic region included the heavy-metal resistance gene cadA, encoding a P-type ATPase. Expression of cadA in E. coli mediated resistance to cadmium, but not to zinc, and decreased the cellular content of cadmium in this host. When G. westfalica strain Kb1 was cured of plasmid pKB1, the resulting derivative strains exhibited slightly decreased cadmium resistance. Furthermore, they had lost the ability to use isoprene rubber as a sole source of carbon and energy, suggesting that genes essential for rubber degradation are encoded by pKB1. PMID:14679241

  7. Large-Scale Sequencing of Two Regions in Human Chromosome 7q22: Analysis of 650 kb of Genomic Sequence around the EPO and CUTL1 Loci Reveals 17 Genes

    PubMed Central

    Glöckner, Gernot; Scherer, Stephen; Schattevoy, Ruben; Boright, Andrew; Weber, Jacqueline; Tsui, Lap-Chee; Rosenthal, André

    1998-01-01

    We have sequenced and annotated two genomic regions located in the Giemsa negative band q22 of human chromosome 7. The first region defined by the erythropoietin (EPO) locus is 228 kb in length and contains 13 genes. Whereas 3 genes (GNB2, EPO, PCOLCE) were known previously on the mRNA level, we have been able to identify 10 novel genes using a newly developed automatic annotation tool RUMMAGE-DP, which comprises >26 different programs mainly for exon prediction, homology searches, and compositional and repeat analysis. For precise annotation we have also resequenced ESTs identified to the region and assembled them to build large cDNAs. In addition, we have investigated the differential splicing of genes. Using these tools we annotated 4 of the 10 genes as a zonadhesin, a transferrin homolog, a nucleoporin-like gene, and an actin gene. Two genes showed weak similarity to an insulin-like receptor and a neuronal protein with a leucine-rich amino-terminal domain. Four predicted genes (CDS1–CDS4) that have been confirmed on the mRNA level showed no similarity to known proteins and a potential function could not be assigned. The second region in 7q22 defined by the CUTL1 (CCAAT displacement protein and its splice variant) locus is 416 kb in length and contains three known genes, including PMSL12, APS, CUTL1, and a novel gene (CDS5). The CUTL1 locus, consisting of two splice variants (CDP and CASP), occupies >300 kb. Based on the G,C profile an isochore switch can be defined between the CUTL1 gene and the APS and PMSL12 genes. [Clones 37G3, 164c7, and 235f8 are deposited in GenBank under accession no. AF053356; clone 123e15, accession no. AF024533; 186d2, accession no. AF024534; 46f6, accession no. AF006752; 50h2, accession no. AF047825; and 76h2, accession no. AF030453] PMID:9799793

  8. Sequencing, annotation and comparative analysis of nine BACs of giant panda (Ailuropoda melanoleuca).

    PubMed

    Zheng, Yang; Cai, Jing; Li, JianWen; Li, Bo; Lin, Runmao; Tian, Feng; Wang, XiaoLing; Wang, Jun

    2010-01-01

    A 10-fold BAC library for giant panda was constructed and nine BACs were selected to generate finished sequences. These BACs could be used as a validation resource for the de novo assembly accuracy of the whole genome shotgun sequencing reads of giant panda newly generated by the Illumina GA sequencing technology. Complete Sanger sequencing, assembly, annotation and comparative analysis were carried out on the selected BACs, which have a combined length of 878 kb. Homologue search and de novo prediction methods were used to annotate genes and repeats. Twelve protein coding genes were predicted, seven of which could be functionally annotated. The seven genes have an average gene size of about 41 kb, an average coding size of about 1.2 kb and an average exon number of 6 per gene. In addition, seven tRNA genes were found. About 27 percent of the BAC sequence is composed of repeats. A phylogenetic tree was constructed using the neighbor-joining algorithm across five species, including giant panda, human, dog, cat and mouse, which reconfirms dog as the species most closely related to giant panda. Our results provide detailed sequence and structure information for new genes and repeats of giant panda, which will be helpful for further studies on the giant panda.

  9. Disease-Causing 7.4 kb Cis-Regulatory Deletion Disrupting Conserved Non-Coding Sequences and Their Interaction with the FOXL2 Promotor: Implications for Mutation Screening

    PubMed Central

    Dostie, Josée; Lemire, Edmond; Bouchard, Philippe; Field, Michael; Jones, Kristie; Lorenz, Birgit; Menten, Björn; Buysse, Karen; Pattyn, Filip; Friedli, Marc; Ucla, Catherine; Rossier, Colette; Wyss, Carine; Speleman, Frank; De Paepe, Anne; Dekker, Job; Antonarakis, Stylianos E.; De Baere, Elfride

    2009-01-01

    To date, the contribution of disrupted potentially cis-regulatory conserved non-coding sequences (CNCs) to human disease is most likely underestimated, as no systematic screens for putative deleterious variations in CNCs have been conducted. As a model for monogenic disease we studied the involvement of genetic changes of CNCs in the cis-regulatory domain of FOXL2 in blepharophimosis syndrome (BPES). Fifty-seven molecularly unsolved BPES patients underwent high-resolution copy number screening and targeted sequencing of CNCs. Apart from three larger distant deletions, a de novo deletion as small as 7.4 kb was found at 283 kb 5′ to FOXL2. The deletion appeared to be triggered by an H-DNA-induced double-stranded break (DSB). In addition, it disrupts a novel long non-coding RNA (ncRNA) PISRT1 and 8 CNCs. The regulatory potential of the deleted CNCs was substantiated by in vitro luciferase assays. Interestingly, Chromosome Conformation Capture (3C) of a 625 kb region surrounding FOXL2 in expressing cellular systems revealed physical interactions of three upstream fragments and the FOXL2 core promoter. Importantly, one of these contains the 7.4 kb deleted fragment. Overall, this study revealed the smallest distant deletion causing monogenic disease and impacts upon the concept of mutation screening in human disease and developmental disorders in particular. PMID:19543368

  10. RIKEN Integrated Sequence Analysis (RISA) System—384-Format Sequencing Pipeline with 384 Multicapillary Sequencer

    PubMed Central

    Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin; Hazama, Makoto; Nishine, Tsutomu; Harada, Akira; Yamamoto, Rintaro; Matsumoto, Hiroyuki; Sakaguchi, Sumito; Ikegami, Takashi; Kashiwagi, Katsuya; Fujiwake, Syuji; Inoue, Kouji; Togawa, Yoshiyuki; Izawa, Masaki; Ohara, Eiji; Watahiki, Masanori; Yoneda, Yuko; Ishikawa, Tomokazu; Ozawa, Kaori; Tanaka, Takumi; Matsuura, Shuji; Kawai, Jun; Okazaki, Yasushi; Muramatsu, Masami; Inoue, Yorinao; Kira, Akira; Hayashizaki, Yoshihide

    2000-01-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3′ end and 5′ end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can

  11. Complete telomere-to-telomere de novo assembly of the Plasmodium falciparum genome through long-read (>11 kb), single molecule, real-time sequencing

    PubMed Central

    Vembar, Shruthi Sridhar; Seetin, Matthew; Lambert, Christine; Nattestad, Maria; Schatz, Michael C.; Baybayan, Primo; Scherf, Artur; Smith, Melissa Laird

    2016-01-01

    The application of next-generation sequencing to estimate genetic diversity of Plasmodium falciparum, the most lethal malaria parasite, has proved challenging due to the skewed AT-richness [∼80.6% (A + T)] of its genome and the lack of technology to assemble highly polymorphic subtelomeric regions that contain clonally variant, multigene virulence families (Ex: var and rifin). To address this, we performed amplification-free, single molecule, real-time sequencing of P. falciparum genomic DNA and generated reads of average length 12 kb, with 50% of the reads between 15.5 and 50 kb in length. Next, using the Hierarchical Genome Assembly Process, we assembled the P. falciparum genome de novo and successfully compiled all 14 nuclear chromosomes telomere-to-telomere. We also accurately resolved centromeres [∼90–99% (A + T)] and subtelomeric regions and identified large insertions and duplications that add extra var and rifin genes to the genome, along with smaller structural variants such as homopolymer tract expansions. Overall, we show that amplification-free, long-read sequencing combined with de novo assembly overcomes major challenges inherent to studying the P. falciparum genome. Indeed, this technology may not only identify the polymorphic and repetitive subtelomeric sequences of parasite populations from endemic areas but may also evaluate structural variation linked to virulence, drug resistance and disease transmission. PMID:27345719
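
    The (A + T) percentages quoted above are simple base-composition statistics. The following minimal sketch shows how such a figure, and a windowed profile that would highlight extremely AT-rich stretches, could be computed. The function names, window size, and toy sequence are hypothetical and are not taken from the study.

```python
def at_fraction(seq):
    """Return the fraction of A and T among unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "AT" for b in bases) / len(bases)


def windowed_at(seq, window, step):
    """Return (start, AT fraction) for each full window along the sequence."""
    return [
        (start, at_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence (hypothetical): two fully AT windows followed by a GC window.
toy = "AAAATTTTGGCC"
print(at_fraction(toy))        # overall composition
print(windowed_at(toy, 4, 4))  # per-window profile
```

    On a real assembly, windows whose AT fraction exceeds a chosen cutoff (for example 0.9) could be flagged as candidate centromere-like regions, though any such cutoff would need to be justified against the data.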

  12. Ornithine aminotransferase (OAT): recombination between an X-linked OAT sequence (7.5 kb) and the Norrie disease locus.

    PubMed

    Ngo, J T; Bateman, J B; Spence, M A; Cortessis, V; Sparkes, R S; Kivlin, J D; Mohandas, T; Inana, G

    1990-01-01

    A human ornithine aminotransferase (OAT) locus has been mapped to Xp11.2, as has the Norrie disease locus. We used a cDNA probe to investigate a 3-generation UCLA family with Norrie disease; a 4.2-kb RFLP was detected and a maximum lod score of 0.602 at zero recombination fraction was calculated. We used the same probe to study a second multigeneration family with Norrie disease from Utah. A different RFLP of 7.5 kb in size was identified and a recombinational event between the OAT locus represented by this RFLP and the disease locus was observed. Linkage analysis of these two loci in this family revealed a maximum lod score of 1.88 at a recombination fraction of 0.10. Although both families have affected members with the same disease, the lod scores are reported separately because the 4.2- and 7.5-kb RFLPs may represent two different loci for the X-linked OAT.

  13. Cloning and sequence analysis of a cDNA clone coding for the mouse GM2 activator protein.

    PubMed Central

    Bellachioma, G; Stirling, J L; Orlacchio, A; Beccari, T

    1993-01-01

    A cDNA (1.1 kb) containing the complete coding sequence for the mouse GM2 activator protein was isolated from a mouse macrophage library using a cDNA for the human protein as a probe. There was a single ATG located 12 bp from the 5' end of the cDNA clone followed by an open reading frame of 579 bp. Northern blot analysis of mouse macrophage RNA showed that there was a single band with a mobility corresponding to a size of 2.3 kb. We deduce from this that the mouse mRNA, in common with the mRNA for the human GM2 activator protein, has a long 3' untranslated sequence of approx. 1.7 kb. Alignment of the mouse and human deduced amino acid sequences showed 68% identity overall and 75% identity for the sequence on the C-terminal side of the first 31 residues, which in the human GM2 activator protein contains the signal peptide. Hydropathicity plots showed great similarity between the mouse and human sequences even in regions of low sequence similarity. There is a single N-glycosylation site in the mouse GM2 activator protein sequence (Asn151-Phe-Thr) which differs in its location from the single site reported in the human GM2 activator protein sequence (Asn63-Val-Thr). PMID:7689829
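
    The deduction of the 3' untranslated length from the Northern blot size is a simple subtraction, sketched below. The sketch assumes that the 5' untranslated region of the mRNA is no longer than the 12 bp seen in the cDNA clone and ignores the poly(A) tail; both are simplifications for illustration, and the function name is hypothetical.

```python
def three_prime_utr_length(mrna_len, five_prime_len, orf_len):
    """Estimate 3' UTR length as mRNA length minus 5' leader minus ORF."""
    return mrna_len - five_prime_len - orf_len


# Values from the abstract: 2.3-kb transcript, ATG 12 bp from the 5' end,
# open reading frame of 579 bp.
utr = three_prime_utr_length(2300, 12, 579)
codons = 579 // 3
print(utr, codons)
```

    The result of roughly 1.7 kb matches the estimate in the abstract, and the 579-bp frame divides evenly into codons.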

  14. Analysis of complex repeat sequences within the spinal muscular atrophy (SMA) candidate region in 5q13

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Davies, K.E.; Morrison, K.E.; Daniels, R.I.

    1994-09-01

    We previously reported that the 400 kb interval flanked by the polymorphic loci D5S435 and D5S557 contains blocks of a chromosome 5 specific repeat. This interval also defines the SMA candidate region by genetic analysis of recombinant families. A YAC contig of 2-3 Mb encompassing this area has been constructed and a 5.5 kb conserved fragment, isolated from a YAC end clone within the above interval, was used to obtain cDNAs from both fetal and adult brain libraries. We describe the identification of cDNAs with stretches of high DNA sequence homology to exons of β-glucuronidase on human chromosome 7. The cDNAs map both to the candidate region and to an area of 5p using FISH and deletion hybrid analysis. Hybridization to bacteriophage and cosmid clones from the YACs localizes the β-glucuronidase related sequences within the 400 kb region of the YAC contig. The cDNAs show a polymorphic pattern on hybridization to genomic BamHI fragments in the size range of 10-250 kb. Further analysis using YAC fragmentation vectors is being used to determine how these β-glucuronidase related cDNAs are distributed within 5q13. Dinucleotide repeats within the region are being investigated to determine linkage disequilibrium with the disease locus.

  15. KB425796-A, a novel antifungal antibiotic produced by Paenibacillus sp. 530603.

    PubMed

    Kai, Hirohito; Yamashita, Midori; Takase, Shigehiro; Hashimoto, Michizane; Muramatsu, Hideyuki; Nakamura, Ikuko; Yoshikawa, Koji; Ezaki, Masami; Nitta, Kumiko; Watanabe, Masato; Inamura, Noriaki; Fujie, Akihiko

    2013-08-01

    The novel antifungal macrocyclic lipopeptidolactone, KB425796-A (1), was isolated from the fermentation broth of bacterial strain 530603, which was identified as a new Paenibacillus species based on morphological and physiological characteristics, and 16S rRNA sequences. KB425796-A (1) was isolated as white powder by solvent extraction, HP-20 and ODS-B column chromatography, and lyophilization, and was determined to have the molecular formula C79H115N19O18. KB425796-A (1) showed antifungal activities against Aspergillus fumigatus and the micafungin-resistant infectious fungi Trichosporon asahii, Rhizopus oryzae, Pseudallescheria boydii and Cryptococcus neoformans.

  16. Analysis of the Complete Mitochondrial Genome Sequence of the Diploid Cotton Gossypium raimondii by Comparative Genomics Approaches

    PubMed Central

    Paterson, Andrew H.; Wang, Xuelin; Xu, Yiqing; Wu, Dongyang; Qu, Yanshu; Jiang, Anna; Ye, Qiaolin

    2016-01-01

    Cotton is one of the most important economic crops, the primary source of natural fiber, and an important protein source for animal feed. The complete nuclear and chloroplast (cp) genome sequences of G. raimondii are already available, but the mitochondrial genome sequence is not. Here, we assembled the complete mitochondrial (mt) DNA sequence of G. raimondii into a circular genome with a length of 676,078 bp and performed comparative analyses with other higher plants. The genome contains 39 protein-coding genes, 6 rRNA genes, and 25 tRNA genes. We also identified four larger repeats (63.9 kb, 10.6 kb, 9.1 kb, and 2.5 kb) in this mt genome, which may be active in intramolecular recombination in the evolution of cotton. Strikingly, nearly all of the G. raimondii mt genome has been transferred to the nucleus on Chr1, and the transfer event must be very recent. Phylogenetic analysis reveals that G. raimondii, as a member of Malvaceae, is much closer to another cotton (G. barbadense) than to other rosids, and the clade formed by the two Gossypium species is sister to Brassicales. The G. raimondii mt genome may provide a crucial foundation for evolutionary analysis, molecular biology, and cytoplasmic male sterility in cotton and other higher plants. PMID:27847816

  17. Comparative Genome Sequence Analysis of the Bpa/Str Region in Mouse and Man

    PubMed Central

    Mallon, A.-M.; Platzer, M.; Bate, R.; Gloeckner, G.; Botcherby, M.R.M.; Nordsiek, G.; Strivens, M.A.; Kioschis, P.; Dangel, A.; Cunningham, D.; Straw, R.N.A.; Weston, P.; Gilbert, M.; Fernando, S.; Goodall, K.; Hunter, G.; Greystrong, J.S.; Clarke, D.; Kimberley, C.; Goerdes, M.; Blechschmidt, K.; Rump, A.; Hinzmann, B.; Mundy, C.R.; Miller, W.; Poustka, A.; Herman, G.E.; Rhodes, M.; Denny, P.; Rosenthal, A.; Brown, S.D.M.

    2000-01-01

    The progress of human and mouse genome sequencing programs presages the possibility of systematic cross-species comparison of the two genomes as a powerful tool for gene and regulatory element identification. As the opportunities to perform comparative sequence analysis emerge, it is important to develop parameters for such analyses and to examine the outcomes of cross-species comparison. Our analysis used gene prediction and a database search of 430 kb of genomic sequence covering the Bpa/Str region of the mouse X chromosome, and 745 kb of genomic sequence from the homologous human X chromosome region. We identified 11 genes in mouse and 13 genes and two pseudogenes in human. In addition, we compared the mouse and human sequences using pairwise alignment and searches for evolutionary conserved regions (ECRs) exceeding a defined threshold of sequence identity. This approach aided the identification of at least four further putative conserved genes in the region. Comparative sequencing revealed that this region is a mosaic in evolutionary terms, with considerably more rearrangement between the two species than realized previously from comparative mapping studies. Surprisingly, this region showed an extremely high LINE and low SINE content, low G+C content, and yet a relatively high gene density, in contrast to the low gene density usually associated with such regions. [The sequence data described in this paper have been submitted to EMBL under the following accession nos.: Mouse Genomic Sequence: Mouse contig A (AL021127), Mouse contig B (AL049866), BAC41M10 (AL136328), PAC303O11(AL136329). Human Genomic Sequence: Human contig 1 (U82671, U82670), Human contig 2 (U82695).] PMID:10854409

  18. Genomic organization of the 260 kb surrounding the waxy locus in a Japonica rice

    PubMed

    Nagano; Wu; Kawasaki; Kishima; Sano

    1999-12-01

    The present study was carried out to characterize the molecular organization in the vicinity of the waxy locus in rice. To determine the structural organization of the region surrounding waxy, contiguous clones covering a total of 260 kb were constructed using a bacterial artificial chromosome (BAC) library from the Shimokita variety of Japonica rice. This map also contains 200 overlapping subclones, which allowed construction of a fine physical map with a total of 64 HindIII sites. During the course of constructing the map, we noticed the presence of some repeated regions which might be related to transposable elements. We divided the 260-kb region into 60 segments (average size of 5.7 kb) to use as probes to determine their genomic organization. Hybridization patterns obtained by probing with these segments were classified into four types: class 1, a single or a few bands without a smeared background; class 2, a single or a few bands with a smeared background; class 3, multiple discrete bands without a smeared background; and class 4, only a smeared background. These classes constituted 6.5%, 20.9%, 3.7%, and 68.9% of the 260-kb region, respectively. The distribution of each class revealed that repetitive sequences are a major component in this region, as expected, and that unique sequence regions were mostly no longer than 6 kb due to interruption by repetitive sequences. We discuss how the map constructed here might be a powerful tool for characterization and comparison of the genome structures and the genes around the waxy locus in the Oryza species.

  19. Loss of retrovirus production in JB/RH melanoma cells transfected with H-2Kb and TAP-1 genes.

    PubMed

    Li, M; Xu, F; Muller, J; Huang, X; Hearing, V J; Gorelik, E

    1999-01-20

    JB/RH1 melanoma cells, as well as other melanomas of C57BL/6 mice (B16 and JB/MS), express a common melanoma-associated antigen (MAA) encoded by an ecotropic melanoma-associated retrovirus (MelARV). JB/RH1 cells do not express the H-2Kb molecules due to down-regulation of the H-2Kb and TAP-1 genes. When JB/RH1 cells were transfected with the H-2Kb gene and cotransfected with the TAP-1 gene, H-2Kb molecules appeared and immunogenicity increased, although the cells lost expression of the retrovirus-encoded MAA recognized by MM2-9B6 mAb. Loss of MAA was found to result from a complete and stable elimination of ecotropic MelARV production in the H-2Kb/TAP-1-transfected JB/RH1 cells. Northern blot analysis showed no differences in ecotropic retroviral messages in MelARV-producing and -nonproducing melanoma cells, suggesting that loss of MelARV production was not due to down-regulation of MelARV transcription. Southern blot analysis revealed several rearrangements in the proviral DNA of H-2Kb-positive JB/RH1 melanoma cells. Sequence analysis of the ecotropic proviral DNA from these cells showed numerous nucleotide substitutions, some of which resulted in the appearance of a novel intraviral PstI restriction site and the loss of a HindIII restriction site in the pol region. PCR amplification of the proviral DNAs indicates that an ecotropic provirus found in the H-2Kb-positive cells is novel and does not preexist in the parental H-2Kb-negative melanoma cells. Conversely, the ecotropic provirus of the parental JB/RH1 cells was not amplifiable from the H-2Kb-positive cells. Our data indicate that stable loss of retroviral production in the H-2Kb/TAP-1-transfected melanoma cells is probably due to the induction of recombination between a productive ecotropic MelARV and a defective nonecotropic provirus leading to the generation of a defective ecotropic provirus and the loss of MelARV production and expression of the retrovirus-encoded MAA.

  20. Recombination hot spot in 3.2-kb region of the Charcot-Marie Tooth type 1A repeat sequences: New tools for molecular diagnosis of hereditary neuropathy with liability to pressure palsies and of Charcot-Marie-Tooth type 1A

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lopes, J.; LeGuern, E.; Gouider, R.

    1996-06-01

    Charcot-Marie-Tooth type 1A (CMT1A) disease and hereditary neuropathy with liability to pressure palsies (HNPP) are autosomal dominant neuropathies, associated, respectively, with duplications and deletions of the same 1.5-Mb region on 17p11.2-p12. These two rearrangements are the reciprocal products of an unequal meiotic crossover between the two chromosome 17 homologues, caused by the misalignment of the CMT1A repeat sequences (CMT1A-REPs), the homologous sequences flanking the 1.5-Mb CMT1A/HNPP monomer unit. In order to map recombination breakpoints within the CMT1A-REPs, a 12.9-kb restriction map was constructed from cloned EcoRI fragments of the proximal and distal CMT1A-REPs. Only 3 of the 17 tested restriction sites were present in the proximal CMT1A-REP but absent in the distal CMT1A-REP, indicating a high degree of homology between these sequences. The rearrangements were mapped in four regions of the CMT1A-REPs by analysis of 76 CMT1A index cases and 38 HNPP patients, who were unrelated. A hot spot of crossover breakpoints located in a 3.2-kb region accounted for three-quarters of the rearrangements, detected after EcoRI/SacI digestion, by the presence of 3.2-kb and 7.8-kb junction fragments in CMT1A and HNPP patients, respectively. These junction fragments, which can be detected on classical Southern blots, permit molecular diagnosis. Other rearrangements can also be detected by gene dosage on the same Southern blots. 25 refs., 4 figs., 2 tabs.

  1. Origin of noncoding DNA sequences: molecular fossils of genome evolution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Naora, H.; Miyahara, K.; Curnow, R.N.

    The total amount of noncoding sequences on chromosomes of contemporary organisms varies significantly from species to species. The authors propose a hypothesis for the origin of these noncoding sequences that assumes that (i) an approx. 0.55-kilobase (kb)-long reading frame composed the primordial gene and (ii) a 20-kb-long single-stranded polynucleotide is the longest molecule (as a genome) that was polymerized at random and without a specific template in the primordial soup/cell. The statistical distribution of stop codons allows examination of the probability of generating reading frames of approx. 0.55 kb in this primordial polynucleotide. This analysis reveals that with three stop codons, a run of at least 0.55-kb equivalent length of nonstop codons would occur in 4.6% of 20-kb-long polynucleotide molecules. They attempt to estimate the total amount of noncoding sequences that would be present on the chromosomes of contemporary species assuming that present-day chromosomes retain the prototype primordial genome structure. Theoretical estimates thus obtained for most eukaryotes do not differ significantly from those reported for these specific organisms, with only a few exceptions. Furthermore, analysis of possible stop-codon distributions suggests that life on earth would not exist, at least in its present form, had two or four stop codons been selected early in evolution.
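
    The stop-codon argument in this abstract can be explored numerically. The sketch below is one possible model, not necessarily the authors': it assumes a single reading frame, 64 equally likely and independent codons, a 20-kb molecule taken as 6,666 codons, and a 0.55-kb reading frame taken as 183 codons. The function name and these roundings are illustrative choices. It computes exactly, by dynamic programming, the probability that a random molecule contains at least one run of that many consecutive non-stop codons.

```python
def prob_long_open_run(n_codons, run_len, p_nonstop):
    """Probability that n_codons independent codons contain at least one run
    of run_len consecutive non-stop codons.

    state[k] holds the probability that no qualifying run has occurred yet
    and the current trailing run of non-stop codons has length k.
    """
    state = [0.0] * run_len
    state[0] = 1.0
    hit = 0.0
    for _ in range(n_codons):
        new = [0.0] * run_len
        new[0] = sum(state) * (1.0 - p_nonstop)   # a stop codon resets the run
        for k in range(run_len - 1):
            new[k + 1] = state[k] * p_nonstop     # a non-stop codon extends it
        hit += state[run_len - 1] * p_nonstop     # the run reaches run_len
        state = new
    return hit


N_CODONS = 20_000 // 3   # assumed: one frame of a 20-kb molecule
RUN_LEN = 550 // 3       # assumed: 0.55 kb expressed in codons

# Compare genetic codes with two, three or four stop codons.
results = {}
for n_stops in (2, 3, 4):
    p_nonstop = (64 - n_stops) / 64
    results[n_stops] = prob_long_open_run(N_CODONS, RUN_LEN, p_nonstop)
    print(f"{n_stops} stop codons: {results[n_stops]:.2%}")
```

    Under these assumptions the three-stop-codon case comes out at a few percent, the same order as the 4.6% quoted in the abstract, while two and four stop codons give markedly higher and lower probabilities respectively.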

  2. Genome sequencing and analysis of a type A Clostridium perfringens isolate from a case of bovine clostridial abomasitis.

    PubMed

    Nowell, Victoria J; Kropinski, Andrew M; Songer, J Glenn; MacInnes, Janet I; Parreira, Valeria R; Prescott, John F

    2012-01-01

    Clostridium perfringens is a common inhabitant of the avian and mammalian gastrointestinal tracts and can behave commensally or pathogenically. Some enteric diseases caused by type A C. perfringens, including bovine clostridial abomasitis, remain poorly understood. To investigate the potential basis of virulence in strains causing this disease, we sequenced the genome of a type A C. perfringens isolate (strain F262) from a case of bovine clostridial abomasitis. The ∼3.34 Mbp chromosome of C. perfringens F262 is predicted to contain 3163 protein-coding genes, 76 tRNA genes, and an integrated plasmid sequence, Cfrag (∼18 kb). In addition, sequences of two complete circular plasmids, pF262C (4.8 kb) and pF262D (9.1 kb), and two incomplete plasmid fragments, pF262A (48.5 kb) and pF262B (50.0 kb), were identified. Comparison of the chromosome sequence of C. perfringens F262 to complete C. perfringens chromosomes, plasmids and phages revealed 261 unique genes. No novel toxin genes related to previously described clostridial toxins were identified: 60% of the 261 unique genes were hypothetical proteins. There was a two base pair deletion in virS, a gene reported to encode the main sensor kinase involved in virulence gene activation. Despite this frameshift mutation, C. perfringens F262 expressed perfringolysin O, alpha-toxin and the beta2-toxin, suggesting that another regulation system might contribute to the pathogenicity of this strain. Two complete plasmids, pF262C (4.8 kb) and pF262D (9.1 kb), unique to this strain of C. perfringens were identified.

  3. Genome Sequencing and Analysis of a Type A Clostridium perfringens Isolate from a Case of Bovine Clostridial Abomasitis

    PubMed Central

    Nowell, Victoria J.; Kropinski, Andrew M.; Songer, J. Glenn; MacInnes, Janet I.; Parreira, Valeria R.; Prescott, John F.

    2012-01-01

    Clostridium perfringens is a common inhabitant of the avian and mammalian gastrointestinal tracts and can behave commensally or pathogenically. Some enteric diseases caused by type A C. perfringens, including bovine clostridial abomasitis, remain poorly understood. To investigate the potential basis of virulence in strains causing this disease, we sequenced the genome of a type A C. perfringens isolate (strain F262) from a case of bovine clostridial abomasitis. The ∼3.34 Mbp chromosome of C. perfringens F262 is predicted to contain 3163 protein-coding genes, 76 tRNA genes, and an integrated plasmid sequence, Cfrag (∼18 kb). In addition, sequences of two complete circular plasmids, pF262C (4.8 kb) and pF262D (9.1 kb), and two incomplete plasmid fragments, pF262A (48.5 kb) and pF262B (50.0 kb), were identified. Comparison of the chromosome sequence of C. perfringens F262 to complete C. perfringens chromosomes, plasmids and phages revealed 261 unique genes. No novel toxin genes related to previously described clostridial toxins were identified: 60% of the 261 unique genes were hypothetical proteins. There was a two base pair deletion in virS, a gene reported to encode the main sensor kinase involved in virulence gene activation. Despite this frameshift mutation, C. perfringens F262 expressed perfringolysin O, alpha-toxin and the beta2-toxin, suggesting that another regulation system might contribute to the pathogenicity of this strain. Two complete plasmids, pF262C (4.8 kb) and pF262D (9.1 kb), unique to this strain of C. perfringens were identified. PMID:22412860

  4. A rare case of 46, XX SRY-negative male with approximately 74-kb duplication in a region upstream of SOX9.

    PubMed

    Xiao, Bing; Ji, Xing; Xing, Ya; Chen, Ying-Wei; Tao, Jiong

    2013-12-01

    The 46, XX male disorder of sex development (DSD) is a rare genetic condition. Here, we report the case of a 46, XX SRY-negative male with complete masculinization. The coding region and exon/intron boundaries of the DAX1, SOX9 and RSPO1 genes were sequenced, and no mutations were detected. Using whole genome array analysis and real-time PCR, we identified an approximately 74-kb duplication in a region approximately 510-584 kb upstream of SOX9 (chr17:69,533,305-69,606,825, hg19). Combined with the results of previous studies, the minimum critical region associated with gonadal development is a 67-kb region located 584-517 kb upstream of SOX9. The amplification of this region might lead to SOX9 overexpression, causing female-to-male sex reversal. Gonadal-specific enhancers in the region upstream of SOX9 may activate SOX9 expression through long-range regulation, thus triggering testicular differentiation. Copyright © 2013 Elsevier Masson SAS. All rights reserved.
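
    The interval sizes quoted in this abstract can be checked directly from the reported coordinates. The short sketch below assumes the hg19 coordinates are 1-based and inclusive; the helper name is illustrative.

```python
def span_bp(start, end):
    """Length in bp of a 1-based, inclusive genomic interval (assumed convention)."""
    return end - start + 1


# Duplication reported at chr17:69,533,305-69,606,825 (hg19).
dup_len = span_bp(69_533_305, 69_606_825)
print(dup_len, "bp, about", round(dup_len / 1000), "kb")

# Sizes implied by the distances upstream of SOX9, in kb.
dup_kb_from_offsets = 584 - 510   # duplicated region
critical_kb = 584 - 517           # minimum critical region
print(dup_kb_from_offsets, critical_kb)
```

    The coordinate span (73,521 bp) and the upstream offsets both agree with the "approximately 74-kb" duplication, and the 584-517 kb offsets give the stated 67-kb critical region.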

  5. The UniProtKB guide to the human proteome

    PubMed Central

    Breuza, Lionel; Poux, Sylvain; Estreicher, Anne; Famiglietti, Maria Livia; Magrane, Michele; Tognolli, Michael; Bridge, Alan; Baratin, Delphine; Redaschi, Nicole

    2016-01-01

    Advances in high-throughput and advanced technologies allow researchers to routinely perform whole genome and proteome analysis. For this purpose, they need high-quality resources providing comprehensive gene and protein sets for their organisms of interest. Using the example of the human proteome, we will describe the content of a complete proteome in the UniProt Knowledgebase (UniProtKB). We will show how manual expert curation of UniProtKB/Swiss-Prot is complemented by expert-driven automatic annotation to build a comprehensive, high-quality and traceable resource. We will also illustrate how the complexity of the human proteome is captured and structured in UniProtKB. Database URL: www.uniprot.org PMID:26896845

  6. SNPs in Entire Mitochondrial Genome Sequences (≈15.4 kb) and cox1 Sequences (≈486 bp) Resolve Body and Head Lice From Doubly Infected People From Ethiopia, China, Nepal, and Iran But Not France.

    PubMed

    Xiong, H; Campelo, D; Boutellis, A; Raoult, D; Alem, M; Ali, J; Bilcha, K; Shao, R; Pollack, R J; Barker, S C

    2014-11-01

    Some people host lice on the clothing as well as the head. Whether body lice and head lice are distinct species or merely variants of the same species remains contentious. We sought to ascertain the extent to which lice from these different habitats might interbreed on doubly infected people by comparing their entire mitochondrial genome sequences. Toward this end, we analyzed two sets of published genetic data from double-infections of body lice and head lice: 1) entire mitochondrial coding regions (≈15.4 kb) from body lice and head lice from seven doubly infected people from Ethiopia, China, and France; and 2) part of the cox1 gene (≈486 bp) from body lice and head lice from a further nine doubly infected people from China, Nepal, and Iran. These mitochondrial data, from 65 lice, revealed extraordinary variation in the number of single nucleotide polymorphisms between the individual body lice and individual head lice of double-infections: from 1.096 kb of 15.4 kb (7.6%) to 2 bps of 15.4 kb (0.01%). We detected coinfections of lice of Clades A and C on the scalp hair of three of the eight people from Nepal: one person of the two people from Kathmandu and two of the six people from Pokhara. Lice of Clades A and B coinfected the scalp hair of one person from Atherton, Far North Queensland, Australia. These findings argue for additional large-scale studies of the body lice and head lice of double-infected people. © 2014 Entomological Society of America.

  7. Chromosome arm-specific BAC end sequences permit comparative analysis of homoeologous chromosomes and genomes of polyploid wheat

    PubMed Central

    2012-01-01

    Background Bread wheat, one of the world’s staple food crops, has the largest, highly repetitive and polyploid genome among the cereal crops. The wheat genome holds the key to crop genetic improvement against challenges such as climate change, environmental degradation, and water scarcity. To unravel the complex wheat genome, the International Wheat Genome Sequencing Consortium (IWGSC) is pursuing a chromosome- and chromosome arm-based approach to physical mapping and sequencing. Here we report on the use of a BAC library made from flow-sorted telosomic chromosome 3A short arm (t3AS) for marker development and analysis of sequence composition and comparative evolution of homoeologous genomes of hexaploid wheat. Results The end-sequencing of 9,984 random BACs from a chromosome arm 3AS-specific library (TaaCsp3AShA) generated 11,014,359 bp of high quality sequence from 17,591 BAC-ends with an average length of 626 bp. The sequence represents 3.2% of t3AS with an average DNA sequence read every 19 kb. Overall, 79% of the sequence consisted of repetitive elements, 1.38% as coding regions (estimated 2,850 genes) and another 19% of unknown origin. Comparative sequence analysis suggested that 70-77% of the genes present in both 3A and 3B were syntenic with model species. Among the transposable elements, gypsy/sabrina (12.4%) was the most abundant repeat and was significantly more frequent in 3A compared to homoeologous chromosome 3B. Twenty novel repetitive sequences were also identified using de novo repeat identification. BESs were screened to identify simple sequence repeats (SSR) and transposable element junctions. A total of 1,057 SSRs were identified with a density of one per 10.4 kb, and 7,928 junctions between transposable elements (TE) and other sequences were identified with a density of one per 1.39 kb. With the objective of enhancing the marker density of chromosome 3AS, oligonucleotide primers were successfully designed from 758 SSRs and 695
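
    The read-length and marker-density figures in this abstract follow from the reported totals. The sketch below recomputes them as a consistency check; the variable names are illustrative.

```python
# Totals reported in the abstract.
total_bp = 11_014_359     # high-quality BAC-end sequence
bac_ends = 17_591         # number of BAC-end reads
ssrs = 1_057              # simple sequence repeats identified
te_junctions = 7_928      # transposable-element junctions identified

mean_read_bp = total_bp / bac_ends
kb_per_ssr = total_bp / ssrs / 1000
kb_per_junction = total_bp / te_junctions / 1000

print(f"mean read length: {mean_read_bp:.0f} bp")
print(f"one SSR per {kb_per_ssr:.1f} kb")
print(f"one TE junction per {kb_per_junction:.2f} kb")
```

    The results match the stated average read length of 626 bp and the densities of one SSR per 10.4 kb and one junction per 1.39 kb.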

  8. Molecular characterization of the breakpoints of a 12-kb deletion in the NF1 gene in a family showing germ-line mosaicism.

    PubMed Central

    Lázaro, C; Gaona, A; Lynch, M; Kruyer, H; Ravella, A; Estivill, X

    1995-01-01

    Neurofibromatosis type 1 (NF1) is caused by deletions, insertions, translocations, and point mutations in the NF1 gene, which spans 350 kb on the long arm of human chromosome 17. Although several point mutations have been described, large molecular abnormalities have rarely been characterized in detail. We describe here the molecular breakpoints of a 12-kb deletion of the NF1 gene, which is responsible for the NF1 phenotype in a kindred with two children affected because of germline mosaicism in the unaffected father, who has the mutation in 10% of his spermatozoa. The mutation spans introns 31-39, removing 12,021 nt and inserting 30 bp, of which 19 bp are a direct repetition of a sequence located in intron 31, just 4 bp before the 5' breakpoint. The 5' and 3' breakpoints contain the sequence TATTTTA, which could be involved in the generation of the deletion. The most plausible explanation for the mechanism involved in the generation of this 12-kb deletion is homologous/nonhomologous recombination. Since sperm of the father does not contain the corresponding insertion of the 12-kb deleted sequence, this deletion could have occurred within the NF1 chromosome through loop formation. RNA from lymphocytes of one of the NF1 patients showed similar levels of the mutated and normal transcripts, suggesting that the NF1-mRNA from mutations causing frame shifts of the reading frame or stop codons in this gene is not degraded during its processing. The mutation was not detected in fresh lymphocytes from the unaffected father by PCR analysis, supporting the case for true germ-line mosaicism. PMID:7485153

  9. Evolution and dynamics of megaplasmids with genome sizes larger than 100 kb in the Bacillus cereus group.

    PubMed

    Zheng, Jinshui; Peng, Donghai; Ruan, Lifang; Sun, Ming

    2013-12-02

    Plasmids play a crucial role in the evolution of bacterial genomes by mediating horizontal gene transfer. However, the origin and evolution of most plasmids remains unclear, especially for megaplasmids. Strains of the Bacillus cereus group contain up to 13 plasmids with genome sizes ranging from 2 kb to 600 kb, and thus can be used to study plasmid dynamics and evolution. This work studied the origin and evolution of 31 B. cereus group megaplasmids (>100 kb) focusing on the most conserved regions on plasmids, minireplicons. Sixty-five putative minireplicons were identified and classified to six types on the basis of proteins that are essential for replication. Twenty-nine of the 31 megaplasmids contained two or more minireplicons. Phylogenetic analysis of the protein sequences showed that different minireplicons on the same megaplasmid have different evolutionary histories. Therefore, we speculated that these megaplasmids are the results of fusion of smaller plasmids. All plasmids of a bacterial strain must be compatible. In megaplasmids of the B. cereus group, individual minireplicons of different megaplasmids in the same strain belong to different types or subtypes. Thus, the subtypes of each minireplicon they contain may determine the incompatibilities of megaplasmids. A broader analysis of all 1285 bacterial plasmids with putative known minireplicons whose complete genome sequences were available from GenBank revealed that 34% (443 plasmids) of the plasmids have two or more minireplicons. This indicates that plasmid fusion events are general among bacterial plasmids. Megaplasmids of B. cereus group are fusion of smaller plasmids, and the fusion of plasmids likely occurs frequently in the B. cereus group and in other bacterial taxa. Plasmid fusion may be one of the major mechanisms for formation of novel megaplasmids in the evolution of bacteria.

  10. Norrie disease: linkage analysis using a 4.2-kb RFLP detected by a human ornithine aminotransferase cDNA probe.

    PubMed

    Ngo, J T; Bateman, J B; Cortessis, V; Sparkes, R S; Mohandas, T; Inana, G; Spence, M A

    1989-05-01

    Previous study has shown that the usual DNA marker for Norrie disease, the L1.28 probe which identifies the DXS7 locus, can recombine with the disease locus. In this study, we used a human ornithine aminotransferase (OAT) cDNA which detects OAT-related DNA sequences mapped to the same region on the X chromosome as that of the L1.28 probe to investigate the family with Norrie disease who exhibited the recombinational event. When genomic DNA from this family was digested with the PvuII restriction endonuclease, we found a restriction fragment length polymorphism (RFLP) of 4.2 kb in size. This fragment was absent in the affected males and cosegregated with the disease locus; we calculated a lod score of 0.602, at theta = 0.00. No deletion could be detected by chromosomal analysis or on Southern blots with other enzymes. These results suggest that one of the OAT-related sequences on the X chromosome may be in close proximity to the Norrie disease locus and represent the first report which indicates that the OAT cDNA may be useful for the identification of carrier status and/or prenatal diagnosis.

  11. The Friedreich ataxia critical region spans a 150-kb interval on chromosome 9q13

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Montermini, L.; Zara, F.; Patel, P.I.

    1995-11-01

    By analysis of crossovers in key recombinant families and by homozygosity analysis of inbred families, the Friedreich ataxia (FRDA) locus was localized in a 300-kb interval between the X104 gene and the microsatellite marker FR8 (D9S888). By homology searches of the sequence databases, we identified X104 as the human tight junction protein ZO-2 gene. We generated a large-scale physical map of the FRDA region by pulsed-field gel electrophoresis analysis of genomic DNA and of three YAC clones derived from different libraries, and we constructed an uninterrupted cosmid contig spanning the FRDA locus. The cAMP-dependent protein kinase γ-catalytic subunit gene was identified within the critical FRDA interval, but it was excluded as candidate because of its biological properties and because of lack of mutations in FRDA patients. Six new polymorphic markers were isolated between FR2 (D9S886) and FR8 (D9S888), which were used for homozygosity analysis in a family in which parents of an affected child are distantly related. An ancient recombination involving the centromeric FRDA flanking markers had been previously demonstrated in this family. Homozygosity analysis indicated that the FRDA gene is localized in the telomeric 150 kb of the FR2-FR8 interval. 17 refs., 3 figs., 1 tab.

  12. Bacterial Artificial Chromosome Libraries for Mouse Sequencing and Functional Analysis

    PubMed Central

    Osoegawa, Kazutoyo; Tateno, Minako; Woon, Peng Yeong; Frengen, Eirik; Mammoser, Aaron G.; Catanese, Joseph J.; Hayashizaki, Yoshihide; de Jong, Pieter J.

    2000-01-01

    Bacterial artificial chromosome (BAC) and P1-derived artificial chromosome (PAC) libraries providing a combined 33-fold representation of the murine genome have been constructed using two different restriction enzymes for genomic digestion. A large-insert PAC library was prepared from the 129S6/SvEvTac strain in a bacterial/mammalian shuttle vector to facilitate functional gene studies. For genome mapping and sequencing, we prepared BAC libraries from the 129S6/SvEvTac and the C57BL/6J strains. The average insert sizes for the three libraries range between 130 kb and 200 kb. Based on the numbers of clones and the observed average insert sizes, we estimate each library to have slightly in excess of 10-fold genome representation. The average number of clones found after hybridization screening with 28 probes was in the range of 9–14 clones per marker. To explore the fidelity of the genomic representation in the three libraries, we analyzed three contigs, each established after screening with a single unique marker. New markers were established from the end sequences and screened against all the contig members to determine if any of the BACs and PACs are chimeric or rearranged. Only one chimeric clone and six potential deletions have been observed after extensive analysis of 113 PAC and BAC clones. Seventy-one of the 113 clones were conclusively nonchimeric because both end markers or sequences were mapped to the other confirmed contig members. We could not exclude chimerism for the remaining 41 clones because one or both of the insert termini did not contain unique sequence to design markers. The low rate of chimerism, ∼1%, and the low level of detected rearrangements support the anticipated usefulness of the BAC libraries for genome research. [The sequence data described in this paper have been submitted to the GenBank data library under accession numbers AQ797173–AQ797398.] PMID:10645956

  13. Soybean Knowledge Base (SoyKB): a Web Resource for Soybean Translational Genomics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Joshi, Trupti; Patil, Kapil; Fitzpatrick, Michael R.

    2012-01-17

    Background: Soybean Knowledge Base (SoyKB) is a comprehensive all-inclusive web resource for soybean translational genomics. SoyKB is designed to handle the management and integration of soybean genomics, transcriptomics, proteomics and metabolomics data along with annotation of gene function and biological pathway. It contains information on four entities, namely genes, microRNAs, metabolites and single nucleotide polymorphisms (SNPs). Methods: SoyKB has many useful tools such as Affymetrix probe ID search, gene family search, multiple gene/metabolite search supporting co-expression analysis, and protein 3D structure viewer as well as download and upload capacity for experimental data and annotations. It has four tiers of registration, which control different levels of access to public and private data. It allows users of certain levels to share their expertise by adding comments to the data. It has a user-friendly web interface together with genome browser and pathway viewer, which display data in an intuitive manner to the soybean researchers, producers and consumers. Conclusions: SoyKB addresses the increasing need of the soybean research community to have a one-stop-shop functional and translational omics web resource for information retrieval and analysis in a user-friendly way. SoyKB can be publicly accessed at http://soykb.org/.

  14. Whole Genome Sequencing Identifies a 78 kb Insertion from Chromosome 8 as the Cause of Charcot-Marie-Tooth Neuropathy CMTX3

    PubMed Central

    Brewer, Megan H.; Chaudhry, Rabia; Qi, Jessica; Kidambi, Aditi; Drew, Alexander P.; Ryan, Monique M.; Subramanian, Gopinath M.; Young, Helen K.; Zuchner, Stephan; Reddel, Stephen W.; Nicholson, Garth A.; Kennerson, Marina L.

    2016-01-01

    With the advent of whole exome sequencing, cases where no pathogenic coding mutations can be found are increasingly being observed in many diseases. In two large, distantly-related families that mapped to the Charcot-Marie-Tooth neuropathy CMTX3 locus at chromosome Xq26.3-q27.3, all coding mutations were excluded. Using whole genome sequencing we found a large DNA interchromosomal insertion within the CMTX3 locus. The 78 kb insertion originates from chromosome 8q24.3, segregates fully with the disease in the two families, and is absent from the general population as well as 627 neurologically normal chromosomes from in-house controls. Large insertions into chromosome Xq27.1 are known to cause a range of diseases and this is the first neuropathy phenotype caused by an interchromosomal insertion at this locus. The CMTX3 insertion represents an understudied pathogenic structural variation mechanism for inherited peripheral neuropathies. Our finding highlights the importance of considering all structural variation types when studying unsolved inherited peripheral neuropathy cases with no pathogenic coding mutations. PMID:27438001

  15. Cloning and sequence analysis of the meso-diaminopimelate decarboxylase gene from Bacillus methanolicus MGA3 and comparison to other decarboxylase genes.

    PubMed

    Mills, D A; Flickinger, M C

    1993-09-01

    The lysA gene of Bacillus methanolicus MGA3 was cloned by complementation of an auxotrophic Escherichia coli lysA22 mutant with a genomic library of B. methanolicus MGA3 chromosomal DNA. Subcloning localized the B. methanolicus MGA3 lysA gene into a 2.3-kb SmaI-SstI fragment. Sequence analysis of the 2.3-kb fragment indicated an open reading frame encoding a protein of 48,223 Da, which was similar to the meso-diaminopimelate (DAP) decarboxylase amino acid sequences of Bacillus subtilis (62%) and Corynebacterium glutamicum (40%). Amino acid sequence analysis indicated several regions of conservation among bacterial DAP decarboxylases, eukaryotic ornithine decarboxylases, and arginine decarboxylases, suggesting a common structural arrangement for positioning of substrate and the cofactor pyridoxal 5'-phosphate. The B. methanolicus MGA3 DAP decarboxylase was shown to be a dimer (M(r) 86,000) with a subunit molecular mass of approximately 50,000 Da. This decarboxylase is inhibited by lysine (Ki = 0.93 mM) with a Km of 0.8 mM for DAP. The inhibition pattern suggests that the activity of this enzyme in lysine-overproducing strains of B. methanolicus MGA3 may limit lysine synthesis.

  16. Sequence and Analysis of the Tomato JOINTLESS Locus

    PubMed Central

    Mao, Long; Begum, Dilara; Goff, Stephen A.; Wing, Rod A.

    2001-01-01

    A 119-kb bacterial artificial chromosome from the JOINTLESS locus on the tomato (Lycopersicon esculentum) chromosome 11 contained 15 putative genes. Repetitive sequences in this region include one copia-like LTR retrotransposon, 13 simple sequence repeats, three copies of a novel type III foldback transposon, and four putative short DNA repeats. Database searches showed that the foldback transposon and the short DNA repeats seemed to be associated preferably with genes. The predicted tomato genes were compared with the complete Arabidopsis genome. Eleven out of 15 tomato open reading frames were found to be colinear with segments on five Arabidopsis bacterial artificial chromosome/P1-derived artificial chromosome clones. The synteny patterns, however, did not reveal duplicated segments in Arabidopsis, where over half of the genome is duplicated. Our analysis indicated that the microsynteny between the tomato and Arabidopsis genomes was still conserved at a very small scale but was complicated by the large number of gene families in the Arabidopsis genome. PMID:11457984

  17. Large-Scale Concatenation cDNA Sequencing

    PubMed Central

    Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.

    1997-01-01

    A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174

  18. PMS2 gene mutational analysis: direct cDNA sequencing to circumvent pseudogene interference.

    PubMed

    Wimmer, Katharina; Wernstedt, Annekatrin

    2014-01-01

    The presence of highly homologous pseudocopies can compromise the mutation analysis of a gene of interest. In particular, when using PCR-based strategies, pseudogene co-amplification has to be effectively prevented. This is often achieved by using primers designed to be parental gene specific according to the reference sequence and by applying stringent PCR conditions. However, there are cases in which this approach is of limited utility. For example, it has been shown that the PMS2 gene exchanges sequences with one of its pseudogenes, named PMS2CL. This results in functional PMS2 alleles containing pseudogene-derived sequences at their 3'-end and in nonfunctional PMS2CL pseudogene alleles that contain gene-derived sequences. Hence, the paralogues cannot be distinguished according to the reference sequence. This shortcoming can be effectively circumvented by using direct cDNA sequencing. This approach is based on the selective amplification of PMS2 transcripts in two overlapping 1.6-kb RT-PCR products. In addition to avoiding pseudogene co-amplification and allele dropout, this method also has the advantage of effectively identifying deletions, splice mutations, and de novo retrotransposon insertions that escape detection by most DNA-based mutation analysis protocols.

  19. Machine Learned Replacement of N-Labels for Basecalled Sequences in DNA Barcoding.

    PubMed

    Ma, Eddie Y T; Ratnasingham, Sujeevan; Kremer, Stefan C

    2018-01-01

    This study presents a machine learning method that increases the number of identified bases in Sanger Sequencing. The system post-processes a KB basecalled chromatogram. It selects a recoverable subset of N-labels in the KB-called chromatogram to replace with basecalls (A,C,G,T). An N-label correction is defined given an additional read of the same sequence, and a human finished sequence. Corrections are added to the dataset when an alignment determines the additional read and human agree on the identity of the N-label. KB must also rate the replacement with quality value of in the additional read. Corrections are only available during system training. Developing the system, nearly 850,000 N-labels are obtained from Barcode of Life Datasystems, the premier database of genetic markers called DNA Barcodes. Increasing the number of correct bases improves reference sequence reliability, increases sequence identification accuracy, and assures analysis correctness. Keeping with barcoding standards, our system maintains an error rate of percent. Our system only applies corrections when it estimates low rate of error. Tested on this data, our automation selects and recovers: 79 percent of N-labels from COI (animal barcode); 80 percent from matK and rbcL (plant barcodes); and 58 percent from non-protein-coding sequences (across eukaryotes).

  20. Deep functional analysis of synII, a 770 kb synthetic yeast chromosome

    PubMed Central

    Gao, Feng; Gong, Jianhui; Abramczyk, Dariusz; Walker, Roy; Zhao, Hongcui; Chen, Shihong; Liu, Wei; Luo, Yisha; Müller, Carolin A.; Paul-Dubois-Taine, Adrien; Alver, Bonnie; Stracquadanio, Giovanni; Mitchell, Leslie A.; Luo, Zhouqing; Fan, Yanqun; Zhou, Baojin; Wen, Bo; Tan, Fengji; Wang, Yujia; Zi, Jin; Xie, Zexiong; Li, Bingzhi; Yang, Kun; Richardson, Sarah M.; Jiang, Hui; French, Christopher E.; Nieduszynski, Conrad A.; Koszul, Romain; Marston, Adele L.; Yuan, Yingjin; Wang, Jian; Bader, Joel S.; Dai, Junbiao; Boeke, Jef D.; Xu, Xun; Cai, Yizhi; Yang, Huanming

    2017-01-01

    Herein we report the successful design, construction and characterization of a 770 kb synthetic yeast chromosome II (synII). Our study incorporates characterization at multiple levels, including phenomics, transcriptomics, proteomics, chromosome segregation and replication analysis to provide a thorough and comprehensive analysis of a synthetic chromosome. Our “Trans-Omics” analyses reveal that a modest but potentially significant pervasive up-regulation of translational machinery observed in synII is mainly caused by the deletion of 13 tRNAs. By both complementation assays and SCRaMbLE, we targeted and debugged the origin of a growth defect at 37°C in glycerol medium, which is related to misregulation of the HOG response. Despite the subtle differences, the synII strain shows highly consistent biological processes comparable to the native strain. PMID:28280153

  1. A 3.0-kb deletion including an erythroid cell-specific regulatory element in intron 1 of the ABO blood group gene in an individual with the Bm phenotype.

    PubMed

    Sano, R; Kuboya, E; Nakajima, T; Takahashi, Y; Takahashi, K; Kubo, R; Kominato, Y; Takeshita, H; Yamao, H; Kishida, T; Isa, K; Ogasawara, K; Uchikawa, M

    2015-04-01

    We developed a sequence-specific primer PCR (SSP-PCR) for detection of a 5.8-kb deletion (B(m) 5.8) involving an erythroid cell-specific regulatory element in intron 1 of the ABO blood group gene. Using this SSP-PCR, we performed genetic analysis of 382 individuals with Bm or ABm. The 5.8-kb deletion was found in 380 individuals, and disruption of the GATA motif in the regulatory element was found in one individual. Furthermore, a novel 3.0-kb deletion involving the element (B(m) 3.0) was demonstrated in the remaining individual. Comparisons of single-nucleotide polymorphisms and microsatellites in intron 1 between B(m) 5.8 and B(m) 3.0 suggested that these deletions occurred independently. © 2014 International Society of Blood Transfusion.

  2. UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View.

    PubMed

    Boutet, Emmanuel; Lieberherr, Damien; Tognolli, Michael; Schneider, Michel; Bansal, Parit; Bridge, Alan J; Poux, Sylvain; Bougueleret, Lydie; Xenarios, Ioannis

    2016-01-01

    The Universal Protein Resource (UniProt, http://www.uniprot.org) consortium is an initiative of the SIB Swiss Institute of Bioinformatics (SIB), the European Bioinformatics Institute (EBI) and the Protein Information Resource (PIR) to provide the scientific community with a central resource for protein sequences and functional information. The UniProt consortium maintains the UniProt KnowledgeBase (UniProtKB), updated every 4 weeks, and several supplementary databases including the UniProt Reference Clusters (UniRef) and the UniProt Archive (UniParc). The Swiss-Prot section of the UniProt KnowledgeBase (UniProtKB/Swiss-Prot) contains publicly available expertly manually annotated protein sequences obtained from a broad spectrum of organisms. Plant protein entries are produced in the frame of the Plant Proteome Annotation Program (PPAP), with an emphasis on characterized proteins of Arabidopsis thaliana and Oryza sativa. High level annotations provided by UniProtKB/Swiss-Prot are widely used to predict annotation of newly available proteins through automatic pipelines. The purpose of this chapter is to present a guided tour of a UniProtKB/Swiss-Prot entry. We will also present some of the tools and databases that are linked to each entry.

  3. Analysis of sequences from field samples reveals the presence of the recently described pepper vein yellows virus (genus Polerovirus) in six additional countries.

    PubMed

    Knierim, Dennis; Tsai, Wen-Shi; Kenyon, Lawrence

    2013-06-01

    Polerovirus infection was detected by reverse transcription polymerase chain reaction (RT-PCR) in 29 pepper plants (Capsicum spp.) and one black nightshade plant (Solanum nigrum) sample collected from fields in India, Indonesia, Mali, Philippines, Thailand and Taiwan. At least two representative samples for each country were selected to generate a general polerovirus RT-PCR product of 1.4 kb length for sequencing. Sequence analysis of the partial genome sequences revealed the presence of pepper vein yellows virus (PeVYV) in all 13 samples. A 1990 Australian herbarium sample of pepper described by serological means as infected with capsicum yellows virus (CYV) was identified by sequence analysis of a partial CP sequence as probably infected with a potato leaf roll virus (PLRV) isolate.

  4. Chromosomal insertion and excision of a 30 kb unstable genetic element is responsible for phase variation of lipopolysaccharide and other virulence determinants in Legionella pneumophila.

    PubMed

    Lüneberg, E; Mayer, B; Daryab, N; Kooistra, O; Zähringer, U; Rohde, M; Swanson, J; Frosch, M

    2001-03-01

    We recently described the phase-variable expression of a virulence-associated lipopolysaccharide (LPS) epitope in Legionella pneumophila. In this study, the molecular mechanism for phase variation was investigated. We identified a 30 kb unstable genetic element as the molecular origin for LPS phase variation. Thirty putative genes were encoded on the 30 kb sequence, organized in two putative opposite transcription units. Some of the open reading frames (ORFs) shared homologies with bacteriophage genes, suggesting that the 30 kb element was of phage origin. In the virulent wild-type strain, the 30 kb element was located on the chromosome, whereas excision from the chromosome and replication as a high-copy plasmid resulted in the mutant phenotype, which is characterized by alteration of an LPS epitope and loss of virulence. Mapping and sequencing of the insertion site in the genome revealed that the chromosomal attachment site was located in an intergenic region flanked by genes of unknown function. As phage release could not be induced by mitomycin C, it is conceivable that the 30 kb element is a non-functional phage remnant. The protein encoded by ORF T on the 30 kb plasmid could be isolated by an outer membrane preparation, indicating that the genes encoded on the 30 kb element are expressed in the mutant phenotype. Therefore, it is conceivable that the phenotypic alterations seen in the mutant depend on high-copy replication of the 30 kb element and expression of the encoded genes. Excision of the 30 kb element from the chromosome was found to occur in a RecA-independent pathway, presumably by the involvement of RecE, RecT and RusA homologues that are encoded on the 30 kb element.

  5. Cloning and sequence analysis of the meso-diaminopimelate decarboxylase gene from Bacillus methanolicus MGA3 and comparison to other decarboxylase genes.

    PubMed Central

    Mills, D A; Flickinger, M C

    1993-01-01

    The lysA gene of Bacillus methanolicus MGA3 was cloned by complementation of an auxotrophic Escherichia coli lysA22 mutant with a genomic library of B. methanolicus MGA3 chromosomal DNA. Subcloning localized the B. methanolicus MGA3 lysA gene into a 2.3-kb SmaI-SstI fragment. Sequence analysis of the 2.3-kb fragment indicated an open reading frame encoding a protein of 48,223 Da, which was similar to the meso-diaminopimelate (DAP) decarboxylase amino acid sequences of Bacillus subtilis (62%) and Corynebacterium glutamicum (40%). Amino acid sequence analysis indicated several regions of conservation among bacterial DAP decarboxylases, eukaryotic ornithine decarboxylases, and arginine decarboxylases, suggesting a common structural arrangement for positioning of substrate and the cofactor pyridoxal 5'-phosphate. The B. methanolicus MGA3 DAP decarboxylase was shown to be a dimer (M(r) 86,000) with a subunit molecular mass of approximately 50,000 Da. This decarboxylase is inhibited by lysine (Ki = 0.93 mM) with a Km of 0.8 mM for DAP. The inhibition pattern suggests that the activity of this enzyme in lysine-overproducing strains of B. methanolicus MGA3 may limit lysine synthesis. PMID:8215365

  6. Complete Genome Sequence and Comparative Analysis of the Fish Pathogen Lactococcus garvieae

    PubMed Central

    Oshima, Kenshiro; Yoshizaki, Mariko; Kawanishi, Michiko; Nakaya, Kohei; Suzuki, Takehito; Miyauchi, Eiji; Ishii, Yasuo; Tanabe, Soichi; Murakami, Masaru; Hattori, Masahira

    2011-01-01

    Lactococcus garvieae causes fatal haemorrhagic septicaemia in fish such as yellowtail. The comparative analysis of genomes of a virulent strain Lg2 and a non-virulent strain ATCC 49156 of L. garvieae revealed that the two strains shared a high degree of sequence identity, but Lg2 had a 16.5-kb capsule gene cluster that is absent in ATCC 49156. The capsule gene cluster was composed of 15 genes, of which eight genes are highly conserved with those in exopolysaccharide biosynthesis gene cluster often found in Lactococcus lactis strains. Sequence analysis of the capsule gene cluster in the less virulent strain L. garvieae Lg2-S, Lg2-derived strain, showed that two conserved genes were disrupted by a single base pair deletion, respectively. These results strongly suggest that the capsule is crucial for virulence of Lg2. The capsule gene cluster of Lg2 may be a genomic island from several features such as the presence of insertion sequences flanked on both ends, different GC content from the chromosomal average, integration into the locus syntenic to other lactococcal genome sequences, and distribution in human gut microbiomes. The analysis also predicted other potential virulence factors such as haemolysin. The present study provides new insights into understanding of the virulence mechanisms of L. garvieae in fish. PMID:21829716

  7. A 11.7-kb deletion triggers intersexuality and polledness in goats.

    PubMed

    Pailhoux, E; Vigier, B; Chaffaux, S; Servel, N; Taourit, S; Furet, J P; Fellous, M; Grosclaude, F; Cribiu, E P; Cotinot, C; Vaiman, D

    2001-12-01

    Mammalian sex determination is governed by the presence of the sex determining region Y gene (SRY) on the Y chromosome. Familial cases of SRY-negative XX sex reversal are rare in humans, often hampering the discovery of new sex-determining genes. The mouse model is also insufficient to correctly apprehend the sex-determination cascade, as the human pathway is much more sensitive to gene dosage. Other species might therefore be considered in this respect. In goats, the polled intersex syndrome (PIS) mutation associates polledness and intersexuality. The sex reversal affects exclusively the XX individuals in a recessive manner, whereas the absence of horns is dominant in both sexes. The syndrome is caused by an autosomal gene located at chromosome band 1q43 (ref. 9), shown to be homologous to human chromosome band 3q23 (ref. 10). Through a positional cloning approach, we demonstrate that the mutation underlying PIS is the deletion of a critical 11.7-kb DNA element containing mainly repetitive sequences. This deletion affects the transcription of at least two genes: PISRT1, encoding a 1.5-kb mRNA devoid of open reading frame (ORF), and FOXL2, recently shown to be responsible for blepharophimosis ptosis epicanthus inversus syndrome (BPES) in humans. These two genes are located 20 and 200 kb telomeric from the deletion, respectively.

  8. Sequences in the intergenic spacer influence RNA Pol I transcription from the human rRNA promoter

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, W.M.; Sylvester, J.E.

    1994-09-01

    In most eucaryotic species, ribosomal genes are tandemly repeated about 100-5000 times per haploid genome. The 43 Kb human rDNA repeat consists of a 13 Kb coding region for the 18S, 5.8S, 28S ribosomal RNAs (rRNAs) and transcribed spacers separated by a 30 Kb intergenic spacer. For species such as frog, mouse and rat, sequences in the intergenic spacer other than the gene promoter have been shown to modulate transcription of the ribosomal gene. These sequences are spacer promoters, enhancers and the terminator for spacer transcription. We are addressing whether the human ribosomal gene promoter is similarly influenced. In-vitro transcription run-off assays have revealed that the 4.5 kb region (CBE), directly upstream of the gene promoter, has cis-stimulation and trans-competition properties. This suggests that the CBE fragment contains an enhancer(s) for ribosomal gene transcription. Further experiments have shown that a fragment (~1.6 kb) within the CBE fragment also has trans-competition function. Deletion subclones of this region are being tested to delineate the exact sequences responsible for these modulating activities. Previous sequence analysis and functional studies have revealed that CBE contains regions of DNA capable of adopting alternative structures such as bent DNA, Z-DNA, and triple-stranded DNA. Whether these structures are required for modulating transcription remains to be determined as does the specific DNA-protein interaction involved.

  9. Multiplexed resequencing analysis to identify rare variants in pooled DNA with barcode indexing using next-generation sequencer.

    PubMed

    Mitsui, Jun; Fukuda, Yoko; Azuma, Kyo; Tozaki, Hirokazu; Ishiura, Hiroyuki; Takahashi, Yuji; Goto, Jun; Tsuji, Shoji

    2010-07-01

    We have recently found that multiple rare variants of the glucocerebrosidase gene (GBA) confer a robust risk for Parkinson disease, supporting the 'common disease-multiple rare variants' hypothesis. To develop an efficient method of identifying rare variants in a large number of samples, we applied multiplexed resequencing using a next-generation sequencer to identification of rare variants of GBA. Sixteen sets of pooled DNAs from six pooled DNA samples were prepared. Each set of pooled DNAs was subjected to polymerase chain reaction to amplify the target gene (GBA) covering 6.5 kb, pooled into one tube with barcode indexing, and then subjected to extensive sequence analysis using the SOLiD System. Individual samples were also subjected to direct nucleotide sequence analysis. With the optimization of data processing, we were able to extract all the variants from 96 samples with acceptable rates of false-positive single-nucleotide variants.

  10. Haplotype estimation using sequencing reads.

    PubMed

    Delaneau, Olivier; Howie, Bryan; Cox, Anthony J; Zagury, Jean-François; Marchini, Jonathan

    2013-10-03

    High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high-coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb read with 4%-15% error per base), phasing performance was substantially improved. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
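
    The "mean distance between switch errors" reported above can be illustrated with a minimal sketch. It assumes one common definition (a switch error is a flip in phase agreement between consecutive heterozygous sites, and the mean distance is the spanned interval divided by the number of switches); it is not taken from the SHAPEIT2 evaluation code, and the function names and toy data are hypothetical.

```python
from typing import List, Optional


def switch_error_positions(positions: List[int],
                           true_hap: List[int],
                           inferred_hap: List[int]) -> List[int]:
    """Return the positions at which the inferred phase flips relative to truth.

    Assumes every site is heterozygous and that each haplotype list holds the
    0/1 alleles carried by one chromosome copy (illustrative convention).
    """
    if not (len(positions) == len(true_hap) == len(inferred_hap)):
        raise ValueError("inputs must have the same length")
    # True where the inferred copy carries the same allele as the true copy.
    agree = [t == i for t, i in zip(true_hap, inferred_hap)]
    # A switch error is recorded wherever agreement changes between neighbours.
    return [positions[k] for k in range(1, len(agree)) if agree[k] != agree[k - 1]]


def mean_distance_between_switches(positions: List[int],
                                   switches: List[int]) -> Optional[float]:
    """Spanned interval divided by the number of switch errors (None if no errors)."""
    if not switches:
        return None
    return (positions[-1] - positions[0]) / len(switches)


# Toy data (hypothetical): six heterozygous sites with base-pair coordinates.
positions = [100, 500, 900, 1500, 2100, 3000]
true_hap = [0, 1, 1, 0, 1, 0]
inferred_hap = [0, 1, 0, 1, 0, 0]

switches = switch_error_positions(positions, true_hap, inferred_hap)
mean_dist = mean_distance_between_switches(positions, switches)
print(switches, mean_dist)
```

    Under this definition, fewer phase flips over the same interval give a larger mean distance, which is the direction of improvement the abstract describes.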

  11. The Complete Sequence of a Human Parainfluenzavirus 4 Genome

    PubMed Central

    Yea, Carmen; Cheung, Rose; Collins, Carol; Adachi, Dena; Nishikawa, John; Tellier, Raymond

    2009-01-01

    Although the human parainfluenza virus 4 (HPIV4) has been known for a long time, its genome, alone among the human paramyxoviruses, has not been completely sequenced to date. In this study we obtained the first complete genomic sequence of HPIV4 from a clinical isolate named SKPIV4 obtained at the Hospital for Sick Children in Toronto (Ontario, Canada). The coding regions for the N, P/V, M, F and HN proteins show very high identities (95% to 97%) with previously available partial sequences for HPIV4B. The sequence for the L protein and the non-coding regions represent new information. A surprising feature of the genome is its length, more than 17 kb, making it the longest genome within the genus Rubulavirus, although the length is well within the known range of 15 kb to 19 kb for the subfamily Paramyxovirinae. The availability of a complete genomic sequence will facilitate investigations on a respiratory virus that is still not completely characterized. PMID:21994536

  12. Analysis of the 9p21.3 sequence associated with coronary artery disease reveals a tendency for duplication in a CAD patient

    PubMed Central

    Kouprina, Natalay; Noskov, Vladimir N.; Waterfall, Joshua J.; Walker, Robert L.; Meltzer, Paul S.; Topol, Eric J.; Larionov, Vladimir

    2018-01-01

    Tandem segmental duplications (SDs) greater than 10 kb are widespread in complex genomes. They provide material for gene divergence and evolutionary adaptation, while formation of specific de novo SDs is a hallmark of cancer and some human diseases. Most SDs map to distinct genomic regions termed ‘duplication blocks’. SD organization within these blocks is often poorly characterized as they are mosaics of ancestral duplicons juxtaposed with younger duplicons arising from more recent duplication events. Structural and functional analysis of SDs is further hampered as long repetitive DNA structures are underrepresented in existing BAC and YAC libraries. We applied Transformation-Associated Recombination (TAR) cloning, a versatile technique for large DNA manipulation, to selectively isolate the coronary artery disease (CAD) interval sequence within the 9p21.3 chromosome locus from a patient with coronary artery disease and normal individuals. Four tandem head-to-tail duplicons, each ∼50 kb long, were recovered in the patient but not in normal individuals. Sequence analysis revealed that the repeats varied by 10-15 SNPs between each other and by 82 SNPs between the human genome sequence (version hg19). SNP polymorphism within the junctions between repeats allowed two junction types to be distinguished, Type 1 and Type 2, which were found at a 2:1 ratio. The junction sequences contained an Alu element, a sequence previously shown to play a role in duplication. Knowledge of structural variation in the CAD interval from more patients could help link this locus to cardiovascular disease susceptibility, and may be relevant to other cases of regional amplification, including cancer. PMID:29632643

  13. Fine mapping suggests that the goat Polled Intersex Syndrome and the human Blepharophimosis Ptosis Epicanthus Syndrome map to a 100-kb homologous region.

    PubMed

    Schibler, L; Cribiu, E P; Oustry-Vaiman, A; Furet, J P; Vaiman, D

    2000-03-01

    To clone the goat Polled Intersex Syndrome (PIS) gene(s), a chromosome walk was performed from six entry points at 1q43. This enabled 91 BACs to be recovered from a recently constructed goat BAC library. Six BAC contigs of goat chromosome 1q43 (ICC1-ICC6) were thus constructed covering altogether 4.5 Mb. A total of 37 microsatellite sequences were isolated from this 4.5-Mb region (16 in this study), of which 33 were genotyped and mapped. ICC3 (1500 kb) was shown by genetic analysis to encompass the PIS locus in an approximately 400-kb interval without recombinants detected in the resource families (293 informative meioses). A strong linkage disequilibrium was detected among unrelated animals with the two central markers of the region, suggesting a probable location for PIS in approximately 100 kb. High-resolution comparative mapping with human data shows that this DNA segment is the homolog of the human region associated with Blepharophimosis Ptosis Epicanthus inversus Syndrome (BPES) gene located in 3q23. This finding suggests that homologous gene(s) could be responsible for the pathologies observed in humans and goats.

  14. DNA sequence analysis of the photosynthesis region of Rhodobacter sphaeroides 2.4.1.

    PubMed

    Choudhary, M; Kaplan, S

    2000-02-15

    This paper describes the DNA sequence of the photosynthesis region of Rhodobacter sphaeroides 2.4.1 (T). The photosynthesis gene cluster is located within an approximately 73 kb Ase I genomic DNA fragment containing the puf, puhA, cycA and puc operons. A total of 65 open reading frames (ORFs) have been identified, of which 61 showed significant similarity to genes/proteins of other organisms while only four did not reveal any significant sequence similarity to any gene/protein sequences in the database. The data were compared with the corresponding genes/ORFs from a different strain of R. sphaeroides and Rhodobacter capsulatus, a close relative of R. sphaeroides. A detailed analysis of the gene organization in the photosynthesis region revealed a similar gene order in both species with some notable differences located to the pucBAC = cycA region. In addition, photosynthesis gene regulatory protein (PpsR, FNR, IHF) binding motifs in upstream sequences of a number of photosynthesis genes have been identified and shown to differ between these two species. The difference in gene organization relative to pucBAC and cycA suggests that this region originated independently of the photosynthesis gene cluster of R. sphaeroides.

  15. Comparative Sequence Analysis of the X-Inactivation Center Region in Mouse, Human, and Bovine

    PubMed Central

    Chureau, Corinne; Prissette, Marine; Bourdet, Agnès; Barbe, Valérie; Cattolico, Laurence; Jones, Louis; Eggen, André; Avner, Philip; Duret, Laurent

    2002-01-01

    We have sequenced to high levels of accuracy 714-kb and 233-kb regions of the mouse and bovine X-inactivation centers (Xic), respectively, centered on the Xist gene. This has provided the basis for a fully annotated comparative analysis of the mouse Xic with the 2.3-Mb orthologous region in human and has allowed a three-way species comparison of the core central region, including the Xist gene. These comparisons have revealed conserved genes, both coding and noncoding, conserved CpG islands and, more surprisingly, conserved pseudogenes. The distribution of repeated elements, especially LINE repeats, in the mouse Xic region when compared to the rest of the genome does not support the hypothesis of a role for these repeat elements in the spreading of X inactivation. Interestingly, an asymmetric distribution of LINE elements on the two DNA strands was observed in the three species, not only within introns but also in intergenic regions. This feature is suggestive of important transcriptional activity within these intergenic regions. In silico prediction followed by experimental analysis has allowed four new genes, Cnbp2, Ftx, Jpx, and Ppnx, to be identified and novel, widespread, complex, and apparently noncoding transcriptional activity to be characterized in a region 5′ of Xist that was recently shown to attract histone modification early after the onset of X inactivation. [The sequence data described in this paper have been submitted to the EMBL data library under accession nos. AJ421478, AJ421479, AJ421480, and AJ421481. Online supplemental data are available at http://pbil.univ-lyon1.fr/datasets/Xic2002/data.html and www.genome.org.] PMID:12045143

  16. Structural analysis of two length variants of the rDNA intergenic spacer from Eruca sativa.

    PubMed

    Lakshmikumaran, M; Negi, M S

    1994-03-01

    Restriction enzyme analysis of the rRNA genes of Eruca sativa indicated the presence of many length variants within a single plant and also between different cultivars which is unusual for most crucifers studied so far. Two length variants of the rDNA intergenic spacer (IGS) from a single individual E. sativa (cv. Itsa) plant were cloned and characterized. The complete nucleotide sequences of both the variants (3 kb and 4 kb) were determined. The intergenic spacer contains three families of tandemly repeated DNA sequences denoted as A, B and C. However, the long (4 kb) variant shows the presence of an additional repeat, denoted as D, which is a duplication of a 224 bp sequence just upstream of the putative transcription initiation site. Repeat units belonging to the three different families (A, B and C) were in the size range of 22 to 30 bp. Such short repeat elements are present in the IGS of most of the crucifers analysed so far. Sequence analysis of the variants (3 kb and 4 kb) revealed that the length heterogeneity of the spacer is located at three different regions and is due to the varying copy numbers of repeat units belonging to families A and B. Length variation of the spacer is also due to the presence of a large duplication (D repeats) in the 4 kb variant which is absent in the 3 kb variant. The putative transcription initiation site was identified by comparisons with the rDNA sequences from other plant species.

  17. Fine Mapping Suggests that the Goat Polled Intersex Syndrome and the Human Blepharophimosis Ptosis Epicanthus Syndrome Map to a 100-kb Homologous Region

    PubMed Central

    Schibler, Laurent; Cribiu, Edmond P.; Oustry-Vaiman, Anne; Furet, Jean-Pierre; Vaiman, Daniel

    2000-01-01

    To clone the goat Polled Intersex Syndrome (PIS) gene(s), a chromosome walk was performed from six entry points at 1q43. This enabled 91 BACs to be recovered from a recently constructed goat BAC library. Six BAC contigs of goat chromosome 1q43 (ICC1–ICC6) were thus constructed covering altogether 4.5 Mb. A total of 37 microsatellite sequences were isolated from this 4.5-Mb region (16 in this study), of which 33 were genotyped and mapped. ICC3 (1500 kb) was shown by genetic analysis to encompass the PIS locus in a ∼400-kb interval without recombinants detected in the resource families (293 informative meioses). A strong linkage disequilibrium was detected among unrelated animals with the two central markers of the region, suggesting a probable location for PIS in ∼100 kb. High-resolution comparative mapping with human data shows that this DNA segment is the homolog of the human region associated with Blepharophimosis Ptosis Epicanthus inversus Syndrome (BPES) gene located in 3q23. This finding suggests that homologous gene(s) could be responsible for the pathologies observed in humans and goats. [The sequence data, PCR primers and PCR conditions for STS and microsatellites described in this paper have been submitted to the GenBank data library under accession nos. AQ666547–AQ666579, AQ686084–AQ686129, AQ793920–793931, AQ810429–AQ810527, G41201–G41228, and G54270–G54286.] PMID:10720572

  18. Construction of a plant-transformation-competent BIBAC library and genome sequence analysis of polyploid Upland cotton (Gossypium hirsutum L.)

    PubMed Central

    2013-01-01

    Background: Cotton, one of the world’s leading crops, is important to the world’s textile and energy industries, and is a model species for studies of plant polyploidization, cellulose biosynthesis and cell wall biogenesis. Here, we report the construction of a plant-transformation-competent binary bacterial artificial chromosome (BIBAC) library and comparative genome sequence analysis of polyploid Upland cotton (Gossypium hirsutum L.) with one of its diploid putative progenitor species, G. raimondii Ulbr. Results: We constructed the cotton BIBAC library in a vector competent for high-molecular-weight DNA transformation in different plant species through either Agrobacterium or particle bombardment. The library contains 76,800 clones with an average insert size of 135 kb, providing an approximate 99% probability of obtaining at least one positive clone from the library using a single-copy probe. The quality and utility of the library were verified by identifying BIBACs containing genes important for fiber development, fiber cellulose biosynthesis, seed fatty acid metabolism, cotton-nematode interaction, and bacterial blight resistance. In order to gain an insight into the Upland cotton genome and its relationship with G. raimondii, we sequenced nearly 10,000 BIBAC ends (BESs) randomly selected from the library, generating approximately one BES for every 250 kb along the Upland cotton genome. The retroelement Gypsy/DIRS1 family predominates in the Upland cotton genome, accounting for over 77% of all transposable elements. From the BESs, we identified 1,269 simple sequence repeats (SSRs), of which 1,006 were new, thus providing additional markers for cotton genome research. Surprisingly, comparative sequence analysis showed that Upland cotton is much more diverged from G. raimondii at the genomic sequence level than expected. There seems to be no significant difference between the relationships of the Upland cotton D- and A-subgenomes with the G. raimondii genome
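
    The library statistics quoted above (76,800 clones, 135-kb average insert, a high probability of recovering a single-copy sequence) can be related through the standard Clarke-Carbon estimate, P = 1 - (1 - f)^N, where f is the fraction of the genome in one insert. The sketch below is an illustration only: the genome size is an assumption inferred from the abstract's own figures (about 10,000 BESs at roughly one per 250 kb, hence about 2.5 Gb), and it does not claim to reproduce the authors' calculation.

```python
def library_coverage(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """Genome equivalents represented by the library (total insert DNA / genome size)."""
    return n_clones * insert_kb / genome_kb


def prob_at_least_one_clone(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """Clarke-Carbon estimate: P = 1 - (1 - f)^N with f = insert size / genome size."""
    f = insert_kb / genome_kb
    return 1.0 - (1.0 - f) ** n_clones


# Assumption: genome size inferred as ~10,000 BESs x 250 kb per BES = 2.5 Gb.
GENOME_KB = 10_000 * 250

coverage = library_coverage(76_800, 135, GENOME_KB)
p_hit = prob_at_least_one_clone(76_800, 135, GENOME_KB)
print(f"coverage: {coverage:.2f}x, P(at least one clone): {p_hit:.3f}")
```

    With this assumed genome size the estimate comes out slightly above four genome equivalents and a probability a little over 98%, in the neighborhood of the approximate 99% stated in the abstract; the exact value shifts with the genome size used.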

  19. Cloning and sequence analysis of the Antheraea pernyi nucleopolyhedrovirus gp64 gene.

    PubMed

    Wang, Wenbing; Zhu, Shanying; Wang, Liqun; Yu, Feng; Shen, Weide

    2005-12-01

    Frequent outbreaks of the purulence disease of Chinese oak silkworm are reported in Middle and Northeast China. The disease is produced by the pathogen Antheraea pernyi nucleopolyhedrovirus (AnpeNPV). To obtain molecular information of the virus, the polyhedra of AnpeNPV were purified and characterized. The genomic DNA of AnpeNPV was extracted and digested with HindIII. The genome size of AnpeNPV is estimated at 128 kb. Based on the analysis of DNA fragments digested with HindIII, 23 fragments were bigger than 564 bp. A genomic library was generated using HindIII and the positive clones were sequenced and analysed. The gp64 gene, encoding the baculovirus envelope protein GP64, was found in an insert. The nucleotide sequence analysis indicated that the AnpeNPV gp64 gene consists of a 1,530 nucleotide open reading frame (ORF), encoding a protein of 509 amino acids. Of the eight gp64 homologues, the AnpeNPV gp64 ORF shared the most sequence similarity with the gp64 gene of Anticarsia gemmatalis NPV, but not Bombyx mori NPV. The upstream region of the AnpeNPV gp64 ORF encoded the conserved transcriptional elements for early and late stage of the viral infection cycle. These results indicated that AnpeNPV belongs to group I NPV and was far removed in molecular phylogeny from the BmNPV.
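
    The ORF and protein lengths quoted above can be cross-checked with simple codon arithmetic. The sketch below is illustrative only: the helper name `orf_protein_length` is hypothetical, and it assumes (the abstract does not say so) that the 1,530-nucleotide figure includes the terminal stop codon.

```python
def orf_protein_length(orf_nt: int, includes_stop: bool = True) -> int:
    """Return the number of amino acids encoded by an ORF of orf_nt nucleotides.

    Assumption: when includes_stop is True, the last codon is a stop codon
    and does not contribute an amino acid.
    """
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_nt // 3
    return codons - 1 if includes_stop else codons


# AnpeNPV gp64: 1,530 nt is 510 codons, i.e. 509 amino acids plus one stop codon.
gp64_aa = orf_protein_length(1530)
print(gp64_aa)
```

    Under that assumption the result agrees with the 509 amino acids reported in the abstract.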

  20. Bacteriophage prevalence in the genus Azospirillum and analysis of the first genome sequence of an Azospirillum brasilense integrative phage.

    PubMed

    Boyer, Mickaël; Haurat, Jacqueline; Samain, Sylvie; Segurens, Béatrice; Gavory, Frédérick; González, Víctor; Mavingui, Patrick; Rohr, René; Bally, René; Wisniewski-Dyé, Florence

    2008-02-01

    The prevalence of bacteriophages was investigated in 24 strains of four species of plant growth-promoting rhizobacteria belonging to the genus Azospirillum. Upon induction by mitomycin C, the release of phage particles was observed in 11 strains from three species. Transmission electron microscopy revealed two distinct sizes of particles, depending on the identity of the Azospirillum species, typical of the Siphoviridae family. Pulsed-field gel electrophoresis and hybridization experiments carried out on phage-encapsidated DNAs revealed that all phages isolated from A. lipoferum and A. doebereinerae strains had a size of about 10 kb whereas all phages isolated from A. brasilense strains displayed genome sizes ranging from 62 to 65 kb. Strong DNA hybridizing signals were shown for most phages hosted by the same species whereas no homology was found between phages harbored by different species. Moreover, the complete sequence of the A. brasilense Cd bacteriophage (phiAb-Cd) genome was determined as a double-stranded DNA circular molecule of 62,337 bp that encodes 95 predicted proteins. Only 14 of the predicted proteins could be assigned functions, some of which were involved in DNA processing, phage morphogenesis, and bacterial lysis. In addition, the phiAb-Cd complete genome was mapped as a prophage on a 570-kb replicon of strain A. brasilense Cd, and a region of 27.3 kb of phiAb-Cd was found to be duplicated on the 130-kb pRhico plasmid previously sequenced from A. brasilense Sp7, the parental strain of A. brasilense Cd.

  1. Bacteriophage Prevalence in the Genus Azospirillum and Analysis of the First Genome Sequence of an Azospirillum brasilense Integrative Phage

    PubMed Central

    Boyer, Mickaël; Haurat, Jacqueline; Samain, Sylvie; Segurens, Béatrice; Gavory, Frédérick; González, Víctor; Mavingui, Patrick; Rohr, René; Bally, René; Wisniewski-Dyé, Florence

    2008-01-01

    The prevalence of bacteriophages was investigated in 24 strains of four species of plant growth-promoting rhizobacteria belonging to the genus Azospirillum. Upon induction by mitomycin C, the release of phage particles was observed in 11 strains from three species. Transmission electron microscopy revealed two distinct sizes of particles, depending on the identity of the Azospirillum species, typical of the Siphoviridae family. Pulsed-field gel electrophoresis and hybridization experiments carried out on phage-encapsidated DNAs revealed that all phages isolated from A. lipoferum and A. doebereinerae strains had a size of about 10 kb whereas all phages isolated from A. brasilense strains displayed genome sizes ranging from 62 to 65 kb. Strong DNA hybridizing signals were shown for most phages hosted by the same species whereas no homology was found between phages harbored by different species. Moreover, the complete sequence of the A. brasilense Cd bacteriophage (ΦAb-Cd) genome was determined as a double-stranded DNA circular molecule of 62,337 bp that encodes 95 predicted proteins. Only 14 of the predicted proteins could be assigned functions, some of which were involved in DNA processing, phage morphogenesis, and bacterial lysis. In addition, the ΦAb-Cd complete genome was mapped as a prophage on a 570-kb replicon of strain A. brasilense Cd, and a region of 27.3 kb of ΦAb-Cd was found to be duplicated on the 130-kb pRhico plasmid previously sequenced from A. brasilense Sp7, the parental strain of A. brasilense Cd. PMID:18065619

  2. Comparative sequence analysis of the X-inactivation center region in mouse, human, and bovine.

    PubMed

    Chureau, Corinne; Prissette, Marine; Bourdet, Agnès; Barbe, Valérie; Cattolico, Laurence; Jones, Louis; Eggen, André; Avner, Philip; Duret, Laurent

    2002-06-01

    We have sequenced to high levels of accuracy 714-kb and 233-kb regions of the mouse and bovine X-inactivation centers (Xic), respectively, centered on the Xist gene. This has provided the basis for a fully annotated comparative analysis of the mouse Xic with the 2.3-Mb orthologous region in human and has allowed a three-way species comparison of the core central region, including the Xist gene. These comparisons have revealed conserved genes, both coding and noncoding, conserved CpG islands and, more surprisingly, conserved pseudogenes. The distribution of repeated elements, especially LINE repeats, in the mouse Xic region when compared to the rest of the genome does not support the hypothesis of a role for these repeat elements in the spreading of X inactivation. Interestingly, an asymmetric distribution of LINE elements on the two DNA strands was observed in the three species, not only within introns but also in intergenic regions. This feature is suggestive of important transcriptional activity within these intergenic regions. In silico prediction followed by experimental analysis has allowed four new genes, Cnbp2, Ftx, Jpx, and Ppnx, to be identified and novel, widespread, complex, and apparently noncoding transcriptional activity to be characterized in a region 5' of Xist that was recently shown to attract histone modification early after the onset of X inactivation.

  3. Confirmation of a novel siadenovirus species detected in raptors: partial sequence and phylogenetic analysis.

    PubMed

    Kovács, Endre R; Benko, Mária

    2009-03-01

    Partial genome characterisation of a novel adenovirus, found recently in organ samples of multiple species of dead birds of prey, was carried out by sequence analysis of PCR-amplified DNA fragments. The virus, named raptor adenovirus 1 (RAdV-1), was originally detected by a nested PCR method with consensus primers targeting the adenoviral DNA polymerase gene. Phylogenetic analysis with the deduced amino acid sequence of the small PCR product implied that a new siadenovirus type was present in the samples. Since virus isolation attempts remained unsuccessful, further characterisation of this putative novel siadenovirus was carried out with the use of PCR on the infected organ samples. The DNA sequence of the central genome part of RAdV-1, encompassing nine full (pTP, 52K, pIIIa, III, pVII, pX, pVI, hexon, protease) and two partial (DNA polymerase and DBP) genes and exceeding 12 kb pairs in size, was determined. Phylogenetic tree reconstructions, based on several genes, unambiguously confirmed the preliminary classification of RAdV-1 as a new species within the genus Siadenovirus. Further study of RAdV-1 is of interest since it represents a rare adenovirus genus of yet undetermined host origin.

  4. Formation of a functional maize centromere after loss of centromeric sequences and gain of ectopic sequences.

    PubMed

    Zhang, Bing; Lv, Zhenling; Pang, Junling; Liu, Yalin; Guo, Xiang; Fu, Shulan; Li, Jun; Dong, Qianhua; Wu, Hua-Jun; Gao, Zhi; Wang, Xiu-Jie; Han, Fangpu

    2013-06-01

    The maize (Zea mays) B centromere is composed of B centromere-specific repeats (ZmBs), centromere-specific satellite repeats (CentC), and centromeric retrotransposons of maize (CRM). Here we describe a newly formed B centromere in maize, which has lost CentC sequences and has dramatically reduced CRM and ZmBs sequences, but still retains the molecular features of functional centromeres, such as CENH3, H2A phosphorylation at Thr-133, H3 phosphorylation at Ser-10, and Thr-3 immunostaining signals. This new centromere is stable and can be transmitted to offspring through meiosis. Anti-CENH3 chromatin immunoprecipitation sequencing revealed that a 723-kb region from the short arm of chromosome 9 (9S) was involved in the formation of the new centromere. The 723-kb region, which is gene poor and enriched for transposons, contains two abundant DNA motifs. Genes in the new centromere region are still transcribed. The original 723-kb region showed a higher DNA methylation level compared with native centromeres but was not significantly changed when it was involved in new centromere formation. Our results indicate that functional centromeres may be formed without the known centromere-specific sequences, yet the maintenance of a high DNA methylation level seems to be crucial for the proper function of a new centromere.

  5. Nanopore DNA Sequencing and Genome Assembly on the International Space Station.

    PubMed

    Castro-Wallace, Sarah L; Chiu, Charles Y; John, Kristen K; Stahl, Sarah E; Rubins, Kathleen H; McIntyre, Alexa B R; Dworkin, Jason P; Lupisella, Mark L; Smith, David J; Botkin, Douglas J; Stephenson, Timothy A; Juul, Sissel; Turner, Daniel J; Izquierdo, Fernando; Federman, Scot; Stryke, Doug; Somasekar, Sneha; Alexander, Noah; Yu, Guixia; Mason, Christopher E; Burton, Aaron S

    2017-12-21

    We evaluated the performance of the MinION DNA sequencer in-flight on the International Space Station (ISS), and benchmarked its performance off-Earth against the MinION, Illumina MiSeq, and PacBio RS II sequencing platforms in terrestrial laboratories. Samples contained equimolar mixtures of genomic DNA from lambda bacteriophage, Escherichia coli (strain K12, MG1655) and Mus musculus (female BALB/c mouse). Nine sequencing runs were performed aboard the ISS over a 6-month period, yielding a total of 276,882 reads with no apparent decrease in performance over time. From sequence data collected aboard the ISS, we constructed directed assemblies of the ~4.6 Mb E. coli genome, ~48.5 kb lambda genome, and a representative M. musculus sequence (the ~16.3 kb mitochondrial genome), at 100%, 100%, and 96.7% consensus pairwise identity, respectively; de novo assembly of the E. coli genome from raw reads yielded a single contig comprising 99.9% of the genome at 98.6% consensus pairwise identity. Simulated real-time analyses of in-flight sequence data using an automated bioinformatic pipeline and laptop-based genomic assembly demonstrated the feasibility of sequencing analysis and microbial identification aboard the ISS. These findings illustrate the potential for sequencing applications including disease diagnosis, environmental monitoring, and elucidating the molecular basis for how organisms respond to spaceflight.

  6. Whole genome sequence and comparative analysis of Borrelia burgdorferi MM1

    PubMed Central

    Jabbari, Neda; Reddy, Panga Jaipal; Hood, Leroy

    2018-01-01

    Lyme disease is caused by spirochaetes of the Borrelia burgdorferi sensu lato genospecies. Complete genome assemblies are available for fewer than ten strains of Borrelia burgdorferi sensu stricto, the primary cause of Lyme disease in North America. MM1 is a sensu stricto strain originally isolated in the midwestern United States. Aside from a small number of genes, the complete genome sequence of this strain has not been reported. Here we present the complete genome sequence of MM1 in relation to other sensu stricto strains and in terms of its Multi Locus Sequence Typing. Our results indicate that MM1 is a new sequence type which contains a conserved main chromosome and 15 plasmids. Our results include the first contiguous 28.5 kb assembly of lp28-8, a linear plasmid carrying the vls antigenic variation system, from a Borrelia burgdorferi sensu stricto strain. PMID:29889842

  7. Comparison of the nucleotide and amino acid sequences of the RsrI and EcoRI restriction endonucleases.

    PubMed

    Stephenson, F H; Ballard, B T; Boyer, H W; Rosenberg, J M; Greene, P J

    1989-12-21

    The RsrI endonuclease, a type-II restriction endonuclease (ENase) found in Rhodobacter sphaeroides, is an isoschizomer of the EcoRI ENase. A clone containing an 11-kb BamHI fragment was isolated from an R. sphaeroides genomic DNA library by hybridization with synthetic oligodeoxyribonucleotide probes based on the N-terminal amino acid (aa) sequence of RsrI. Extracts of E. coli containing a subclone of the 11-kb fragment display RsrI activity. Nucleotide sequence analysis reveals an 831-bp open reading frame encoding a polypeptide of 277 aa. A 50% identity exists within a 266-aa overlap between the deduced aa sequences of RsrI and EcoRI. Regions of 75-100% aa sequence identity correspond to key structural and functional regions of EcoRI. The type-II ENases have many common properties, and a common origin might have been expected. Nevertheless, this is the first demonstration of aa sequence similarity between ENases produced by different organisms.

  8. Growth characteristics of Lactobacillus brevis KB290 in the presence of bile.

    PubMed

    Kimoto-Nira, Hiromi; Suzuki, Shigenori; Suganuma, Hiroyuki; Moriya, Naoko; Suzuki, Chise

    2015-10-01

    Live Lactobacillus brevis KB290 have several probiotic activities, including immune stimulation and modulation of intestinal microbial balance. We investigated the adaptation of L. brevis KB290 to bile as a mechanism of intestinal survival. Strain KB290 was grown for 5 days at 37 °C in tryptone-yeast extract-glucose (TYG) broth supplemented with 0.5% sodium acetate (TYGA) containing 0.15%, 0.3%, or 0.5% bile. Growth was determined by absorbance at 620 nm or by dry weight. Growth was enhanced as the broth's bile concentration increased. Bile-enhanced growth was not observed in TYG broth or with xylose or fructose as the carbon source, although strain KB290 could assimilate these sugars. Compared with cells grown without bile, cells grown with bile had twice the cell yield (dry weight) and higher hydrophobicity, which may improve epithelial adhesion. Metabolite analysis revealed that bile induced more lactate production by glycolysis, thus enhancing growth efficiency. Scanning electron microscopy revealed that cells cultured without bile for 5 days in TYGA broth had a shortened rod shape and showed lysis and aggregation, unlike cells cultured for 1 day; cells grown with bile for 5 days had an intact rod shape and rarely appeared damaged. Cellular material leakage through autolysis was lower in the presence of bile than in its absence. Thus lysis of strain KB290 cells cultured for extended periods was suppressed in the presence of bile. This study identifies a new role for bile and sodium acetate in retaining an intact cell shape and enhancing cell yield, both of which are beneficial for intestinal survival. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. A novel model for DNA sequence similarity analysis based on graph theory.

    PubMed

    Qi, Xingqin; Wu, Qin; Zhang, Yusen; Fuller, Eddie; Zhang, Cun-Quan

    2011-01-01

    Determination of sequence similarity is one of the major steps in computational phylogenetic studies. As we know, during evolutionary history, not only DNA mutations for individual nucleotide but also subsequent rearrangements occurred. It has been one of major tasks of computational biologists to develop novel mathematical descriptors for similarity analysis such that various mutation phenomena information would be involved simultaneously. In this paper, different from traditional methods (eg, nucleotide frequency, geometric representations) as bases for construction of mathematical descriptors, we construct novel mathematical descriptors based on graph theory. In particular, for each DNA sequence, we will set up a weighted directed graph. The adjacency matrix of the directed graph will be used to induce a representative vector for DNA sequence. This new approach measures similarity based on both ordering and frequency of nucleotides so that much more information is involved. As an application, the method is tested on a set of 0.9-kb mtDNA sequences of twelve different primate species. All output phylogenetic trees with various distance estimations have the same topology, and are generally consistent with the reported results from early studies, which proves the new method's efficiency; we also test the new method on a simulated data set, which shows our new method performs better than traditional global alignment method when subsequent rearrangements happen frequently during evolutionary history.
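
    The general idea of turning a DNA sequence into a weighted directed graph and then into a comparison vector can be sketched as follows. This is not the authors' descriptor: the abstract does not give their edge weighting, so the sketch assumes the simplest possible choice, in which the four nucleotides are the vertices and the weight of edge (x, y) is the number of times x is immediately followed by y. All function names are hypothetical.

```python
from itertools import product
from math import sqrt

NUCLEOTIDES = "ACGT"
PAIRS = list(product(NUCLEOTIDES, repeat=2))  # 16 possible directed edges


def transition_graph(seq: str) -> dict:
    """Build edge weights: (x, y) -> count of x immediately followed by y.

    Assumed weighting for illustration only; characters outside ACGT are skipped.
    """
    weights = {pair: 0 for pair in PAIRS}
    seq = seq.upper()
    for x, y in zip(seq, seq[1:]):
        if (x, y) in weights:
            weights[(x, y)] += 1
    return weights


def representative_vector(seq: str) -> list:
    """Flatten the 4x4 adjacency matrix row by row and normalise it to sum to 1."""
    weights = transition_graph(seq)
    total = sum(weights.values()) or 1  # avoid division by zero for short input
    return [weights[pair] / total for pair in PAIRS]


def euclidean_distance(u: list, v: list) -> float:
    """Euclidean distance between two representative vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


d_same = euclidean_distance(representative_vector("ACGTACGT"),
                            representative_vector("ACGTACGT"))
d_diff = euclidean_distance(representative_vector("ACGTACGT"),
                            representative_vector("AAAAAAAA"))
print(d_same, d_diff)
```

    A pairwise distance matrix built this way could be passed to any standard tree-building method. Because the vector depends on adjacent pairs rather than single-base frequencies, it captures some ordering information, which is the property the abstract emphasises.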

  10. Construction, Characterization, and Preliminary BAC-End Sequence Analysis of a Bacterial Artificial Chromosome Library of the Tea Plant (Camellia sinensis)

    PubMed Central

    Lin, Jinke; Kudrna, Dave; Wing, Rod A.

    2011-01-01

    We describe the construction and characterization of a publicly available BAC library for the tea plant, Camellia sinensis. Using modified methods, the library was constructed with the aim of developing public molecular resources to advance tea plant genomics research. The library consists of a total of 401,280 clones with an average insert size of 135 kb, providing an approximate coverage of 13.5 haploid genome equivalents. No empty vector clones were observed in a random sampling of 576 BAC clones. Further analysis of 182 BAC-end sequences from randomly selected clones revealed a GC content of 40.35% and low chloroplast and mitochondrial contamination. Repetitive sequence analyses indicated that LTR retrotransposons were the most predominant sequence class (86.93%–87.24%), followed by DNA retrotransposons (11.16%–11.69%). Additionally, we found 25 simple sequence repeats (SSRs) that could potentially be used as genetic markers. PMID:21234344
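
    The coverage figure follows from the usual relation coverage = (number of clones × average insert size) / haploid genome size. The sketch below rearranges that relation to show the genome size implied by the numbers in the abstract. The function names are hypothetical, and the genome size is derived here rather than stated in the abstract.

```python
def library_coverage(n_clones: int, insert_kb: float, genome_mb: float) -> float:
    """Haploid genome equivalents represented by a clone library."""
    return n_clones * insert_kb / (genome_mb * 1000)


def implied_genome_mb(n_clones: int, insert_kb: float, coverage: float) -> float:
    """Genome size (Mb) implied by a stated library size and coverage."""
    return n_clones * insert_kb / 1000 / coverage


# Tea plant BAC library: 401,280 clones, 135-kb average insert, 13.5x coverage.
tea_genome_mb = implied_genome_mb(401_280, 135, 13.5)
print(round(tea_genome_mb, 1))
```

    With the abstract's values this gives an implied haploid genome size of roughly 4.0 Gb.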

  11. Identification of herpes simplex virus type 1 proteins encoded within the first 1.5 kb of the latency-associated transcript.

    PubMed

    Henderson, Gail; Jaber, Tareq; Carpenter, Dale; Wechsler, Steven L; Jones, Clinton

    2009-09-01

    Expression of the first 1.5 kb of the latency-associated transcript (LAT) that is encoded by herpes simplex virus type 1 (HSV-1) is sufficient for wild-type (wt) levels of reactivation from latency in small animal models. Peptide-specific immunoglobulin G (IgG) was generated against open reading frames (ORFs) that are located within the first 1.5 kb of LAT coding sequences. Cells stably transfected with LAT or trigeminal ganglionic neurons of mice infected with a LAT expressing virus appeared to express the L2 or L8 ORF. Only L2 ORF expression was readily detected in trigeminal ganglionic neurons of latently infected mice.

  12. Scanning the Effects of Ethyl Methanesulfonate on the Whole Genome of Lotus japonicus Using Second-Generation Sequencing Analysis

    PubMed Central

    Mohd-Yusoff, Nur Fatihah; Ruperao, Pradeep; Tomoyoshi, Nurain Emylia; Edwards, David; Gresshoff, Peter M.; Biswas, Bandana; Batley, Jacqueline

    2015-01-01

    Genetic structure can be altered by chemical mutagenesis, which is a common method applied in molecular biology and genetics. Second-generation sequencing provides a platform to reveal base alterations occurring in the whole genome due to mutagenesis. A model legume, Lotus japonicus ecotype Miyakojima, was chemically mutated with alkylating ethyl methanesulfonate (EMS) for the scanning of DNA lesions throughout the genome. Using second-generation sequencing, two individually mutated third-generation progeny (M3, named AM and AS) were sequenced and analyzed to identify single nucleotide polymorphisms and reveal the effects of EMS on nucleotide sequences in these mutant genomes. Single-nucleotide polymorphisms were found once every 208 kb (AS) and 202 kb (AM), with a bias toward G/C-to-A/T changes at a low percentage. Most mutations were intergenic. The mutation spectrum of the genomes was comparable in their individual chromosomes; however, each mutated genome has unique alterations, which are useful to identify causal mutations for their phenotypic changes. The data obtained demonstrate that whole genomic sequencing is applicable as a high-throughput tool to investigate genomic changes due to mutagenesis. The identification of these single-point mutations will facilitate the identification of phenotypically causative mutations in EMS-mutated germplasm. PMID:25660167

  13. Cytogenetic and Sequence Analyses of Mitochondrial DNA Insertions in Nuclear Chromosomes of Maize

    PubMed Central

    Lough, Ashley N.; Faries, Kaitlyn M.; Koo, Dal-Hoe; Hussain, Abid; Roark, Leah M.; Langewisch, Tiffany L.; Backes, Teresa; Kremling, Karl A. G.; Jiang, Jiming; Birchler, James A.; Newton, Kathleen J.

    2015-01-01

    The transfer of mitochondrial DNA (mtDNA) into nuclear genomes is a regularly occurring process that has been observed in many species. Few studies, however, have focused on the variation of nuclear-mtDNA sequences (NUMTs) within a species. This study examined mtDNA insertions within chromosomes of a diverse set of Zea mays ssp. mays (maize) inbred lines by the use of fluorescence in situ hybridization. A relatively large NUMT on the long arm of chromosome 9 (9L) was identified at approximately the same position in four inbred lines (B73, M825, HP301, and Oh7B). Further examination of the similarly positioned 9L NUMT in two lines, B73 and M825, indicated that the large size of these sites is due to the presence of a majority of the mitochondrial genome; however, only portions of this NUMT (∼252 kb total) were found in the publicly available B73 nuclear sequence for chromosome 9. Fiber-fluorescence in situ hybridization analysis estimated the size of the B73 9L NUMT to be ∼1.8 Mb and revealed that the NUMT is methylated. Two regions of mtDNA (2.4 kb and 3.3 kb) within the 9L NUMT are not present in the B73 mitochondrial NB genome; however, these 2.4-kb and 3.3-kb segments are present in other Zea mitochondrial genomes, including that of Zea mays ssp. parviglumis, a progenitor of domesticated maize. PMID:26333837

  14. Deletion of 2.7 kb near HOXD3 in an Arabian horse with occipitoatlantoaxial malformation

    PubMed Central

    Bordbari, MH; Penedo, MCT; Aleman, M.; Valberg, SJ; Mickelson, J.; Finno, CJ

    2017-01-01

    In the horse, the term occipitoatlantoaxial malformation (OAAM) is used to describe a developmental defect in which the first cervical vertebra (atlas) resembles the base of the skull (occiput) and the second cervical vertebra (axis) resembles the atlas. Affected individuals demonstrate an abnormal posture and varying degrees of ataxia. The homeobox (HOX) gene cluster is involved in the development of both the axial and appendicular skeleton. Hoxd3-null mice demonstrate a strikingly similar phenotype to Arabian foals with OAAM. Whole-genome sequencing was performed in an OAAM-affected horse (OAAM1) and seven unaffected Arabian horses. Visual inspection of the raw reads within the region of HOXD3 identified a 2.7-kb deletion located 4.4 kb downstream of the end of HOXD4 and 8.2 kb upstream of the start of HOXD3. A genotyping assay revealed that both parents of OAAM1 were heterozygous for the deletion. Additional genotyping identified two heterozygotes among 162 Arabians, and the deletion was not present in 371 horses of other breeds. Comparative genomics studies have revealed that this region is highly conserved across species and that the entire genomic region between Hoxd4 and Hoxd3 is transcribed in mice. Two additional Arabian foals diagnosed with OAAM (OAAM 2 and 3) were genotyped and did not have the 2.7-kb deletion. Closer examination of the phenotype in these cases revealed notable variation. OAAM3 also had facial malformations and a patent ductus arteriosus, and the actual malformation at the craniocervical junction differed. Genetic heterogeneity may exist across the HOXD locus in Arabian foals with OAAM. PMID:28111759

  15. Complete Genome Sequence of Escherichia coli Strain M8, Isolated from ob/ob Mice

    PubMed Central

    Siddharth, Jay; Membrez, Mathieu; Chakrabarti, Anirikh; Betrisey, Bertrand; Chou, Chieh Jason

    2017-01-01

    Escherichia coli is one of the common inhabitants of the mammalian gastrointestinal tract. We isolated a strain from an ob/ob mouse and performed whole-genome sequencing, which yielded a chromosome of ~5.1 Mb and three plasmids of ~160 kb, ~6 kb, and ~4 kb. PMID:28572322

  16. Genome sequencing and analysis of Yersinia pestis KIM D27, an avirulent strain exempt from select agent regulation.

    PubMed

    Losada, Liliana; Varga, John J; Hostetler, Jessica; Radune, Diana; Kim, Maria; Durkin, Scott; Schneewind, Olaf; Nierman, William C

    2011-04-29

    Yersinia pestis is the causative agent of the plague. Y. pestis KIM 10+ strain was passaged and selected for loss of the 102 kb pgm locus, resulting in an attenuated strain, KIM D27. In this study, whole genome sequencing was performed on KIM D27 in order to identify any additional differences. Initial assemblies of 454 data were highly fragmented, and various bioinformatic tools detected between 15 and 465 SNPs and INDELs when comparing both strains, the vast majority associated with A or T homopolymer sequences. Consequently, Illumina sequencing was performed to improve the quality of the assembly. Hybrid sequence assemblies were performed and a total of 56 validated SNP/INDELs and 5 repeat differences were identified in the D27 strain relative to the published KIM 10+ sequence. However, further analysis showed that 55 of these SNP/INDELs and 3 repeats were errors in the KIM 10+ reference sequence. We conclude that both 454 and Illumina sequencing were required to obtain the most accurate and rapid sequence results for Y. pestis KIM D27. SNP and INDEL calls were most accurate when both Newbler and CLC Genomics Workbench were employed. For purposes of obtaining high quality genome sequence differences between strains, any identified differences should be verified in both the new and reference genomes.

  17. Molecular analysis of the glucocerebrosidase gene locus

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Winfield, S.L.; Martin, B.M.; Fandino, A.

    1994-09-01

    Gaucher disease is due to a deficiency in the activity of the lysosomal enzyme glucocerebrosidase. Both the functional gene for this enzyme and a pseudogene are located in close proximity on chromosome 1q21. Analysis of the mutations present in patient samples has suggested interaction between the functional gene and the pseudogene in the origin of mutant genotypes. To investigate the involvement of regions flanking the functional gene and pseudogene in the origin of mutations found in Gaucher disease, a YAC clone containing DNA from this locus has been subcloned and characterized. The original YAC containing approximately 360 kb was truncated with the use of fragmentation plasmids to about 85 kb. A lambda library derived from this YAC was screened to obtain clones containing glucocerebrosidase sequences. PCR amplification was used to identify subclones containing 5', central, or 3' sequences of the functional gene or of the pseudogene. Clones spanning the entire distance from the last exon of the functional gene to intron 1 of the pseudogene, the 5' end of the functional gene and 16 kb of 5' flanking region and approximately 15 kb of 3' flanking region of the pseudogene were sequenced. Sequence data from 48 kb of intergenic and flanking regions of the glucocerebrosidase gene and its pseudogene has been generated. A large number of Alu sequences and several simple repeats have been found. Two of these repeats exhibit fragment length polymorphism. There is almost 100% homology between the 3' flanking regions of the functional gene and the pseudogene, extending to about 4 kb past the termination codons. A much lower degree of homology is observed in the 5' flanking region. Patient samples are currently being screened for polymorphisms in these flanking regions.

  18. Formation of a Functional Maize Centromere after Loss of Centromeric Sequences and Gain of Ectopic Sequences

    PubMed Central

    Zhang, Bing; Lv, Zhenling; Pang, Junling; Liu, Yalin; Guo, Xiang; Fu, Shulan; Li, Jun; Dong, Qianhua; Wu, Hua-Jun; Gao, Zhi; Wang, Xiu-Jie; Han, Fangpu

    2013-01-01

    The maize (Zea mays) B centromere is composed of B centromere–specific repeats (ZmBs), centromere-specific satellite repeats (CentC), and centromeric retrotransposons of maize (CRM). Here we describe a newly formed B centromere in maize, which has lost CentC sequences and has dramatically reduced CRM and ZmBs sequences, but still retains the molecular features of functional centromeres, such as CENH3, H2A phosphorylation at Thr-133, H3 phosphorylation at Ser-10, and Thr-3 immunostaining signals. This new centromere is stable and can be transmitted to offspring through meiosis. Anti-CENH3 chromatin immunoprecipitation sequencing revealed that a 723-kb region from the short arm of chromosome 9 (9S) was involved in the formation of the new centromere. The 723-kb region, which is gene poor and enriched for transposons, contains two abundant DNA motifs. Genes in the new centromere region are still transcribed. The original 723-kb region showed a higher DNA methylation level compared with native centromeres but was not significantly changed when it was involved in new centromere formation. Our results indicate that functional centromeres may be formed without the known centromere-specific sequences, yet the maintenance of a high DNA methylation level seems to be crucial for the proper function of a new centromere. PMID:23771890

  19. Facile Recovery of Individual High-Molecular-Weight, Low-Copy-Number Natural Plasmids for Genomic Sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Williams, L.E.; Detter, C.; Barrie, K.

    2006-06-01

    Sequencing of the large (>50 kb), low-copy-number (<5 per cell) plasmids that mediate horizontal gene transfer has been hindered by the difficulty and expense of isolating DNA from individual plasmids of this class. We report here that a kit method previously devised for purification of bacterial artificial chromosomes (BACs) can be adapted for effective preparation of individual plasmids up to 220 kb from wild gram-negative and gram-positive bacteria. Individual plasmid DNA recovered from less than 10 ml of Escherichia coli, Staphylococcus, and Corynebacterium cultures was of sufficient quantity and quality for construction of high-coverage libraries, as shown by sequencing five native plasmids ranging in size from 30 kb to 94 kb. We also report recommendations for vector screening to optimize plasmid sequence assembly, preliminary annotation of novel plasmid genomes, and insights on mobile genetic element biology derived from these sequences. Adaptation of this BAC method for large plasmid isolation removes one major technical hurdle to expanding our knowledge of the natural plasmid gene pool.

  20. Nucleotide Sequence and Genetic Structure of a Novel Carbaryl Hydrolase Gene (cehA) from Rhizobium sp. Strain AC100

    PubMed Central

    Hashimoto, Masayuki; Fukui, Mitsuru; Hayano, Kouichi; Hayatsu, Masahito

    2002-01-01

    Rhizobium sp. strain AC100, which is capable of degrading carbaryl (1-naphthyl-N-methylcarbamate), was isolated from soil treated with carbaryl. This bacterium hydrolyzed carbaryl to 1-naphthol and methylamine. Carbaryl hydrolase from the strain was purified to homogeneity, and its N-terminal sequence, molecular mass (82 kDa), and enzymatic properties were determined. The purified enzyme hydrolyzed 1-naphthyl acetate and 4-nitrophenyl acetate, indicating that the enzyme is an esterase. We then cloned the carbaryl hydrolase gene (cehA) from the plasmid DNA of the strain and determined the nucleotide sequence of the 10-kb region containing cehA. No homologous sequences were found by a database homology search using the nucleotide and deduced amino acid sequences of the cehA gene. Six open reading frames including the cehA gene were found in the 10-kb region, and sequencing analysis shows that the cehA gene is flanked by two copies of an insertion sequence-like sequence, suggesting that it forms part of a composite transposon. PMID:11872471

  1. De novo assembly of human genomes with massively parallel short read sequencing.

    PubMed

    Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue; Qian, Wubin; Fang, Xiaodong; Shi, Zhongbin; Li, Yingrui; Li, Shengting; Shan, Gao; Kristiansen, Karsten; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun

    2010-02-01

    Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the data are very short read length sequences, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
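
    The N50 figures quoted above follow a standard definition: sort the contig (or scaffold) lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. A minimal sketch of that calculation is given below; the function name and the toy lengths are illustrative assumptions and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in kb (made up for illustration only).
toy_contigs_kb = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs_kb))
```

    Because N50 depends only on the length distribution, the same function applies to contigs and scaffolds alike.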

  2. 'DNA Strider': a 'C' program for the fast analysis of DNA and protein sequences on the Apple Macintosh family of computers.

    PubMed Central

    Marck, C

    1988-01-01

    DNA Strider is a new integrated DNA and Protein sequence analysis program written with the C language for the Macintosh Plus, SE and II computers. It has been designed as an easy to learn and use program as well as a fast and efficient tool for the day-to-day sequence analysis work. The program consists of a multi-window sequence editor and of various DNA and Protein analysis functions. The editor may use 4 different types of sequences (DNA, degenerate DNA, RNA and one-letter coded protein) and can handle simultaneously 6 sequences of any type up to 32.5 kB each. Negative numbering of the bases is allowed for DNA sequences. All classical restriction and translation analysis functions are present and can be performed in any order on any open sequence or part of a sequence. The main feature of the program is that the same analysis function can be repeated several times on different sequences, thus generating multiple windows on the screen. Many graphic capabilities have been incorporated such as graphic restriction map, hydrophobicity profile and the CAI plot- codon adaptation index according to Sharp and Li. The restriction sites search uses a newly designed fast hexamer look-ahead algorithm. Typical runtime for the search of all sites with a library of 130 restriction endonucleases is 1 second per 10,000 bases. The circular graphic restriction map of the pBR322 plasmid can be therefore computed from its sequence and displayed on the Macintosh Plus screen within 2 seconds and its multiline restriction map obtained in a scrolling window within 5 seconds. PMID:2832831

  3. The transcriptional terminator sequences downstream of the covR gene terminate covR/S operon transcription to generate covR monocistronic transcripts in Streptococcus pyogenes.

    PubMed

    Chiang-Ni, Chuan; Tsou, Chih-Cheng; Lin, Yee-Shin; Chuang, Woei-Jer; Lin, Ming-T; Liu, Ching-Chuan; Wu, Jiunn-Jong

    2008-12-31

    CovR/S is an important two component regulatory system, which regulates about 15% of the gene expression in Streptococcus pyogenes. The covR/S locus was identified as an operon generating an RNA transcript around 2.5-kb in size. In this study, we found the covR/S operon produced three RNA transcripts (around 2.5-, 1.0-, and 0.8-kb in size). Using RNA transcriptional terminator sequence prediction and transcriptional terminator analysis, we identified two atypical rho-independent terminator sequences downstream of the covR gene and showed these terminator sequences terminate RNA transcription efficiently. These results indicate that covR/S operon generates covR/S transcript and monocistronic covR transcripts.

  4. Direct molecular regulation of the myogenic determination gene Myf5 by Pax3, with modulation by Six1/4 factors, is exemplified by the -111 kb-Myf5 enhancer.

    PubMed

    Daubas, Philippe; Buckingham, Margaret E

    2013-04-15

    The Myf5 gene plays an important role in myogenic determination during mouse embryo development. Multiple genomic regions of the Mrf4-Myf5 locus have been characterised as enhancer sequences responsible for the complex spatiotemporal expression of the Myf5 gene at the onset of myogenesis. These include an enhancer sequence, located at -111 kb upstream of the Myf5 transcription start site, which is responsible for Myf5 activation in ventral somitic domains (Ribas et al., 2011. Dev. Biol. 355, 372-380). We show that the -111 kb-Myf5 enhancer also directs transgene expression in some limb muscles, and is active at foetal as well as embryonic stages. We have carried out further characterisation of the regulation of this enhancer and show that the paired-box Pax3 transcription factor binds to it in vitro as in vivo, and that Pax binding sites are essential for its activity. This requirement is independent of the previously reported regulation by TEAD transcription factors. Six1/4 which, like Pax3, are important upstream regulators of myogenesis, also bind in vivo to sites in the -111 kb-Myf5 enhancer and modulate its activity. The -111 kb-Myf5 enhancer therefore shares common functional characteristics with another Myf5 regulatory sequence, the hypaxial and limb 145 bp-Myf5 enhancer, both being directly regulated in vivo by Pax3 and Six1/4 proteins. However, in the case of the -111 kb-Myf5 enhancer, Six has less effect and we conclude that Pax regulation plays a major role in controlling this aspect of the Myf5 gene expression at the onset of myogenesis in the embryo. Copyright © 2013 Elsevier Inc. All rights reserved.

  5. Microfluidic droplet enrichment for targeted sequencing

    PubMed Central

    Eastburn, Dennis J.; Huang, Yong; Pellegrino, Maurizio; Sciambi, Adam; Ptáček, Louis J.; Abate, Adam R.

    2015-01-01

    Targeted sequence enrichment enables better identification of genetic variation by providing increased sequencing coverage for genomic regions of interest. Here, we report the development of a new target enrichment technology that is highly differentiated from other approaches currently in use. Our method, MESA (Microfluidic droplet Enrichment for Sequence Analysis), isolates genomic DNA fragments in microfluidic droplets and performs TaqMan PCR reactions to identify droplets containing a desired target sequence. The TaqMan positive droplets are subsequently recovered via dielectrophoretic sorting, and the TaqMan amplicons are removed enzymatically prior to sequencing. We demonstrated the utility of this approach by generating an average 31.6-fold sequence enrichment across 250 kb of targeted genomic DNA from five unique genomic loci. Significantly, this enrichment enabled a more comprehensive identification of genetic polymorphisms within the targeted loci. MESA requires low amounts of input DNA, minimal prior locus sequence information and enriches the target region without PCR bias or artifacts. These features make it well suited for the study of genetic variation in a number of research and diagnostic applications. PMID:25873629

  6. The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads.

    PubMed

    Wang, Zhiwen; Hobson, Neil; Galindo, Leonardo; Zhu, Shilin; Shi, Daihu; McDill, Joshua; Yang, Linfeng; Hawkins, Simon; Neutelings, Godfrey; Datla, Raju; Lambert, Georgina; Galbraith, David W; Grassa, Christopher J; Geraldes, Armando; Cronk, Quentin C; Cullis, Christopher; Dash, Prasanta K; Kumar, Polumetla A; Cloutier, Sylvie; Sharpe, Andrew G; Wong, Gane K-S; Wang, Jun; Deyholos, Michael K

    2012-11-01

    Flax (Linum usitatissimum) is an ancient crop that is widely cultivated as a source of fiber, oil and medicinally relevant compounds. To accelerate crop improvement, we performed whole-genome shotgun sequencing of the nuclear genome of flax. Seven paired-end libraries ranging in size from 300 bp to 10 kb were sequenced using an Illumina genome analyzer. A de novo assembly, comprised exclusively of deep-coverage (approximately 94× raw, approximately 69× filtered) short-sequence reads (44-100 bp), produced a set of scaffolds with N50 = 694 kb, including contigs with N50 = 20.1 kb. The contig assembly contained 302 Mb of non-redundant sequence representing an estimated 81% genome coverage. Up to 96% of published flax ESTs aligned to the whole-genome shotgun scaffolds. However, comparisons with independently sequenced BACs and fosmids showed some mis-assembly of regions at the genome scale. A total of 43384 protein-coding genes were predicted in the whole-genome shotgun assembly, and up to 93% of published flax ESTs, and 86% of A. thaliana genes aligned to these predicted genes, indicating excellent coverage and accuracy at the gene level. Analysis of the synonymous substitution rates (Ks) observed within duplicate gene pairs was consistent with a recent (5-9 MYA) whole-genome duplication in flax. Within the predicted proteome, we observed enrichment of many conserved domains (Pfam-A) that may contribute to the unique properties of this crop, including agglutinin proteins. Together these results show that de novo assembly, based solely on whole-genome shotgun short-sequence reads, is an efficient means of obtaining nearly complete genome sequence information for some plant species. © 2012 The Authors. The Plant Journal © 2012 Blackwell Publishing Ltd.

  7. The 253-kb inversion and deep intronic mutations in UNC13D are present in North American patients with familial hemophagocytic lymphohistiocytosis 3.

    PubMed

    Qian, Yaping; Johnson, Judith A; Connor, Jessica A; Valencia, C Alexander; Barasa, Nathaniel; Schubert, Jeffery; Husami, Ammar; Kissell, Diane; Zhang, Ge; Weirauch, Matthew T; Filipovich, Alexandra H; Zhang, Kejian

    2014-06-01

    The mutations in UNC13D are responsible for familial hemophagocytic lymphohistiocytosis (FHL) type 3. A 253-kb inversion and two deep intronic mutations, c.118-308C > T and c.118-307G > A, in UNC13D were recently reported in European and Asian FHL3 patients. We sought to determine the prevalence of these three non-coding mutations in North American FHL patients and evaluate the significance of examining these new mutations in genetic testing. We performed DNA sequencing of UNC13D and targeted analysis of these three mutations in 1,709 North American patients with a suspected clinical diagnosis of hemophagocytic lymphohistiocytosis (HLH). The 253-kb inversion, intronic mutations c.118-308C > T and c.118-307G > A were found in 11, 15, and 4 patients, respectively, in which the genetic basis (bi-allelic mutations) explained 25 additional patients. Taken together with previously diagnosed FHL3 patients in our HLH patient registry, these three non-coding mutations were found in 31.6% (25/79) of the FHL3 patients. The 253-kb inversion, c.118-308C > T and c.118-307G > A accounted for 7.0%, 8.9%, and 1.3% of mutant alleles, respectively. Significantly, eight novel mutations in UNC13D are being reported in this study. To further evaluate the expression level of the newly reported intronic mutation c.118-307G > A, reverse transcription PCR and Western blot analysis revealed a significant reduction of both RNA and protein levels suggesting that the c.118-307G > A mutation affects transcription. These specified non-coding mutations were found in a significant number of North American patients and inclusion of them in mutation analysis will improve the molecular diagnosis of FHL3. © 2014 Wiley Periodicals, Inc.

  8. A 3.4-kb Copy-Number Deletion near EPAS1 Is Significantly Enriched in High-Altitude Tibetans but Absent from the Denisovan Sequence.

    PubMed

    Lou, Haiyi; Lu, Yan; Lu, Dongsheng; Fu, Ruiqing; Wang, Xiaoji; Feng, Qidi; Wu, Sijie; Yang, Yajun; Li, Shilin; Kang, Longli; Guan, Yaqun; Hoh, Boon-Peng; Chung, Yeun-Jun; Jin, Li; Su, Bing; Xu, Shuhua

    2015-07-02

    Tibetan high-altitude adaptation (HAA) has been studied extensively, and many candidate genes have been reported. Subsequent efforts targeting HAA functional variants, however, have not been that successful (e.g., no functional variant has been suggested for the top candidate HAA gene, EPAS1). With WinXPCNVer, a method developed in this study, we detected in microarray data a Tibetan-enriched deletion (TED) carried by 90% of Tibetans; 50% were homozygous for the deletion, whereas only 3% carried the TED and 0% carried the homozygous deletion in 2,792 worldwide samples (p < 10⁻¹⁵). We employed long PCR and Sanger sequencing technologies to determine the exact copy number and breakpoints of the TED in 70 additional Tibetan and 182 diverse samples. The TED had identical boundaries (chr2: 46,694,276-46,697,683; hg19) and was 80 kb downstream of EPAS1. Notably, the TED was in strong linkage disequilibrium (LD; r² = 0.8) with EPAS1 variants associated with reduced blood concentrations of hemoglobin. It was also in complete LD with the 5-SNP motif, which was suspected to be introgressed from Denisovans, but the deletion itself was absent from the Denisovan sequence. Correspondingly, we detected that footprints of positive selection for the TED occurred 12,803 (95% confidence interval = 12,075-14,725) years ago. We further whole-genome deep sequenced (>60×) seven Tibetans and verified the TED but failed to identify any other copy-number variations with comparable patterns, giving this TED top priority for further study. We speculate that the specific patterns of the TED resulted from its own functionality in HAA of Tibetans or LD with a functional variant of EPAS1. Copyright © 2015 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
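
    The reported boundaries can be checked against the 3.4-kb size given in the title. The sketch below assumes the hg19 coordinates are 1-based and inclusive at both ends, a convention the abstract does not state; the function name is illustrative.

```python
def interval_length(start, end):
    """Length in bp of a genomic interval, assuming 1-based inclusive coordinates."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# Boundaries of the Tibetan-enriched deletion as quoted in the abstract (chr2, hg19).
ted_start, ted_end = 46_694_276, 46_697_683

length_bp = interval_length(ted_start, ted_end)
print(length_bp)                    # 3408
print(round(length_bp / 1000, 1))   # 3.4
```

    Under a half-open convention the result would be 3,407 bp, which rounds to the same 3.4 kb.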

  9. A 3.4-kb Copy-Number Deletion near EPAS1 Is Significantly Enriched in High-Altitude Tibetans but Absent from the Denisovan Sequence

    PubMed Central

    Lou, Haiyi; Lu, Yan; Lu, Dongsheng; Fu, Ruiqing; Wang, Xiaoji; Feng, Qidi; Wu, Sijie; Yang, Yajun; Li, Shilin; Kang, Longli; Guan, Yaqun; Hoh, Boon-Peng; Chung, Yeun-Jun; Jin, Li; Su, Bing; Xu, Shuhua

    2015-01-01

    Tibetan high-altitude adaptation (HAA) has been studied extensively, and many candidate genes have been reported. Subsequent efforts targeting HAA functional variants, however, have not been that successful (e.g., no functional variant has been suggested for the top candidate HAA gene, EPAS1). With WinXPCNVer, a method developed in this study, we detected in microarray data a Tibetan-enriched deletion (TED) carried by 90% of Tibetans; 50% were homozygous for the deletion, whereas only 3% carried the TED and 0% carried the homozygous deletion in 2,792 worldwide samples (p < 10⁻¹⁵). We employed long PCR and Sanger sequencing technologies to determine the exact copy number and breakpoints of the TED in 70 additional Tibetan and 182 diverse samples. The TED had identical boundaries (chr2: 46,694,276–46,697,683; hg19) and was 80 kb downstream of EPAS1. Notably, the TED was in strong linkage disequilibrium (LD; r² = 0.8) with EPAS1 variants associated with reduced blood concentrations of hemoglobin. It was also in complete LD with the 5-SNP motif, which was suspected to be introgressed from Denisovans, but the deletion itself was absent from the Denisovan sequence. Correspondingly, we detected that footprints of positive selection for the TED occurred 12,803 (95% confidence interval = 12,075–14,725) years ago. We further whole-genome deep sequenced (>60×) seven Tibetans and verified the TED but failed to identify any other copy-number variations with comparable patterns, giving this TED top priority for further study. We speculate that the specific patterns of the TED resulted from its own functionality in HAA of Tibetans or LD with a functional variant of EPAS1. PMID:26073780

  10. Genome Sequencing and Analysis of Yersina pestis KIM D27, an Avirulent Strain Exempt from Select Agent Regulation

    PubMed Central

    Losada, Liliana; Varga, John J.; Hostetler, Jessica; Radune, Diana; Kim, Maria; Durkin, Scott; Schneewind, Olaf; Nierman, William C.

    2011-01-01

    Yersinia pestis is the causative agent of the plague. Y. pestis KIM 10+ strain was passaged and selected for loss of the 102 kb pgm locus, resulting in an attenuated strain, KIM D27. In this study, whole genome sequencing was performed on KIM D27 in order to identify any additional differences. Initial assemblies of 454 data were highly fragmented, and various bioinformatic tools detected between 15 and 465 SNPs and INDELs when comparing both strains, the vast majority associated with A or T homopolymer sequences. Consequently, Illumina sequencing was performed to improve the quality of the assembly. Hybrid sequence assemblies were performed and a total of 56 validated SNP/INDELs and 5 repeat differences were identified in the D27 strain relative to published KIM 10+ sequence. However, further analysis showed that 55 of these SNP/INDELs and 3 repeats were errors in the KIM 10+ reference sequence. We conclude that both 454 and Illumina sequencing were required to obtain the most accurate and rapid sequence results for Y. pestis KIM D27. SNP and INDEL calls were most accurate when both Newbler and CLC Genomics Workbench were employed. For purposes of obtaining high quality genome sequence differences between strains, any identified differences should be verified in both the new and reference genomes. PMID:21559501

  11. Sequencing and functional analysis of the nifENXorf1orf2 gene cluster of Herbaspirillum seropedicae.

    PubMed

    Klassen, G; Pedrosa, F O; Souza, E M; Yates, M G; Rigo, L U

    1999-12-01

    A 5.1-kb DNA fragment from the nifHDK region of H. seropedicae was isolated and sequenced. Sequence analysis showed the presence of nifENXorf1orf2 but nifTY were not present. No nif or consensus promoter was identified. Furthermore, orf1 expression occurred only under nitrogen-fixing conditions and no promoter activity was detected between nifK and nifE, suggesting that these genes are expressed from the upstream nifH promoter and are parts of a unique nif operon. Mutagenesis studies indicate that nifN was essential for nitrogenase activity whereas nifXorf1orf2 were not. High homology between the C-terminal region of the NifX and NifB proteins from H. seropedicae was observed. Since the NifX and NifY proteins are important for FeMo cofactor (FeMoco) synthesis, we propose that alternative proteins with similar activities exist in H. seropedicae.

  12. Comparative Maps of Human 19p13.3 and Mouse Chromosome 10 Allow Identification of Sequences at Evolutionary Breakpoints

    PubMed Central

    Puttagunta, Radhika; Gordon, Laurie A.; Meyer, Gary E.; Kapfhamer, David; Lamerdin, Jane E.; Kantheti, Prameela; Portman, Kathleen M.; Chung, Wendy K.; Jenne, Dieter E.; Olsen, Anne S.; Burmeister, Margit

    2000-01-01

    A cosmid/bacterial artificial chromosome (BAC) contiguous (contig) map of human chromosome (HSA) 19p13.3 has been constructed, and over 50 genes have been localized to the contig. Genes and anonymous ESTs from ≈4000 kb of human 19p13.3 were placed on the central mouse chromosome 10 map by genetic mapping and pulsed-field gel electrophoresis (PFGE) analysis. A region of ≈2500 kb of HSA 19p13.3 is collinear to mouse chromosome (MMU) 10. In contrast, the adjacent ≈1200 kb are inverted. Two genes are located in a 50-kb region after the inversion on MMU 10, followed by a region of homology to mouse chromosome 17. The synteny breakpoint and one of the inversion breakpoints have been localized to sequenced regions in human <5 kb in size. Both breakpoints are rich in simple tandem repeats, including (TCTG)n, (CT)n, and (GTCTCT)n, suggesting that simple repeat sequences may be involved in chromosome breaks during evolution. The overall size of the region in mouse is smaller, although no large regions are missing. Comparing the physical maps to the genetic maps showed that in contrast to the higher-than-average rate of genetic recombination in the gene-rich telomeric region on HSA 19p13.3, the average rate of recombination is lower than expected in the homologous mouse region. This might indicate that a hot spot of recombination may have been lost in mouse or gained in human during evolution, or that the position of sequences along the chromosome (telomeric compared to the middle of a chromosome) is important for recombination rates. PMID:10984455

  13. The nucleotide sequence and genome organization of Plasmopara halstedii virus.

    PubMed

    Heller-Dohmen, Marion; Göpfert, Jens C; Pfannstiel, Jens; Spring, Otmar

    2011-03-17

    Only very few viruses of Oomycetes have been studied in detail. Isometric virions were found in different isolates of the oomycete Plasmopara halstedii, the downy mildew pathogen of sunflower. However, complete nucleotide sequences and data on the genome organization were lacking. Viral RNA of different P. halstedii isolates was subjected to nucleotide sequencing and analysis of the viral genome. The N-terminal sequence of the viral coat protein was determined using Top-Down MALDI-TOF analysis. The complete nucleotide sequences of both single-stranded RNA segments (RNA1 and RNA2) were established. RNA1 consisted of 2793 nucleotides (nt) exclusive of its 3' poly(A) tract and a single open-reading frame (ORF1) of 2745 nt. ORF1 was framed by a 5' untranslated region (5' UTR) of 18 nt and a 3' untranslated region (3' UTR) of 30 nt. ORF1 contained motifs of RNA-dependent RNA polymerases (RdRp) and showed similarities to RdRp of Sclerophthora macrospora virus A (SmV A) and viruses within the Nodaviridae family. RNA2 consisted of 1526 nt exclusive of its 3' poly(A) tract and a second ORF (ORF2) of 1128 nt. ORF2 coded for the single viral coat protein (CP) and was framed by a 5' UTR of 164 nt and a 3' UTR of 234 nt. The deduced amino acid sequence of ORF2 was verified by nano-LC-ESI-MS/MS experiments. Top-Down MALDI-TOF analysis revealed the N-terminal sequence of the CP. The N-terminal sequence represented a region within ORF2 suggesting a proteolytic processing of the CP in vivo. The CP showed similarities to CP of SmV A and viruses within the Tombusviridae family. Fragments of RNA1 (ca. 1.9 kb) and RNA2 (ca. 1.4 kb) were used to analyze the nucleotide sequence variation of virions in different P. halstedii isolates. Viral sequence variation was 0.3% or less regardless of their host's pathotypes, the geographical origin and the sensitivity towards the fungicide metalaxyl. The results showed the presence of a single and new virus type in different P. halstedii isolates.
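
    The segment layout described above is internally consistent: for each RNA, the 5' UTR, ORF and 3' UTR lengths add up to the stated segment length, and each ORF length is a multiple of three. A small sketch of that check follows; the dictionary layout and function name are illustrative assumptions, while the numbers are those quoted in the abstract.

```python
# Lengths in nucleotides, as quoted in the abstract (poly(A) tracts excluded).
segments = {
    "RNA1": {"total": 2793, "utr5": 18, "orf": 2745, "utr3": 30},
    "RNA2": {"total": 1526, "utr5": 164, "orf": 1128, "utr3": 234},
}


def check_segment(seg):
    """Return True if the parts sum to the total and the ORF is in whole codons."""
    parts_sum = seg["utr5"] + seg["orf"] + seg["utr3"]
    return parts_sum == seg["total"] and seg["orf"] % 3 == 0


for name, seg in segments.items():
    # Codon count here includes the stop codon, assuming the ORF length does.
    print(name, check_segment(seg), seg["orf"] // 3)
```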

  14. Licochalcone A induces apoptosis in KB human oral cancer cells via a caspase-dependent FasL signaling pathway

    PubMed Central

    KIM, JAE-SUNG; PARK, MI-RA; LEE, SOOK-YOUNG; KIM, DO KYOUNG; MOON, SUNG-MIN; KIM, CHUN SUNG; CHO, SEUNG SIK; YOON, GOO; IM, HEE-JEONG; YOU, JAE-SEEK; OH, JI-SU; KIM, SU-GWAN

    2014-01-01

    Licochalcone A (Lico-A) is a natural phenol licorice compound with multiple bioactivities, including anti-inflammatory, anti-microbial, anti-fungal and osteogenesis-inducing properties. In the present study, we investigated the Lico-A-induced apoptotic effects and examined the associated apoptosis pathway in KB human oral cancer cells. Lico-A decreased the number of viable KB oral cancer cells. However, Lico-A did not have an effect on primary normal human oral keratinocytes. In addition, the IC50 value of Lico-A was determined to be ~50 μM following dose-dependent stimulation. KB oral cancer cells stimulated with Lico-A for 24 h showed chromatin condensation by DAPI staining, genomic DNA fragmentation by agarose gel electrophoresis and a gradually increased apoptotic cell population by FACS analysis. These data suggest that Lico-A induces apoptosis in KB oral cancer cells. Additionally, Lico-A-induced apoptosis in KB oral cancer cells was mediated by the expression of factor associated suicide ligand (FasL) and activated caspase-8 and −3 and poly(ADP-ribose) polymerase (PARP). Furthermore, in the KB oral cancer cells co-stimulation with a caspase inhibitor (Z-VAD-fmk) and Lico-A significantly abolished the apoptotic phenomena. Our findings demonstrated that Lico-A-induced apoptosis in KB oral cancer cells involves the extrinsic apoptotic signaling pathway, which involves a caspase-dependent FasL-mediated death receptor pathway. Our data suggest that Lico-A be developed as a chemotherapeutic agent for the management of oral cancer. PMID:24337492

  15. Cloning and sequencing of the pheP gene, which encodes the phenylalanine-specific transport system of Escherichia coli.

    PubMed Central

    Pi, J; Wookey, P J; Pittard, A J

    1991-01-01

    The phenylalanine-specific permease gene (pheP) of Escherichia coli has been cloned and sequenced. The gene was isolated on a 6-kb Sau3AI fragment from a chromosomal library, and its presence was verified by complementation of a mutant lacking the functional phenylalanine-specific permease. Subcloning from this fragment localized the pheP gene on a 2.7-kb HindIII-HindII fragment. The nucleotide sequence of this 2.7-kb region was determined. An open reading frame was identified which extends from a putative start point of translation (GTG at position 636) to a termination signal (TAA at position 2010). The assignment of the GTG as the initiation codon was verified by site-directed mutagenesis of the initiation codon and by introducing a chain termination mutation into the pheP-lacZ fusion construct. A single initiation site of transcription 30 bp upstream of the start point of translation was identified by the primer extension analysis. The pheP structural gene consists of 1,374 nucleotides specifying a protein of 458 amino acid residues. The PheP protein is very hydrophobic (71% nonpolar residues). A topological model predicted from the sequence analysis defines 12 transmembrane segments. This protein is highly homologous with the AroP (general aromatic transport) system of E. coli (59.6% identity) and to a lesser extent with the yeast permeases CAN1 (arginine), PUT4 (proline), and HIP1 (histidine) of Saccharomyces cerevisiae. Images PMID:1711024
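
    The coordinates and sizes in this abstract agree with one another if positions 636 and 2010 are read as the first bases of the GTG start codon and the TAA stop codon, respectively. That reading is an assumption, not something the abstract states. A minimal sketch of the arithmetic, with an illustrative function name:

```python
def coding_length(start_codon_pos, stop_codon_pos):
    """Coding length in nt from the first base of the start codon up to,
    but not including, the first base of the stop codon."""
    length = stop_codon_pos - start_codon_pos
    if length <= 0 or length % 3 != 0:
        raise ValueError("positions do not define an in-frame coding region")
    return length


# Positions quoted in the abstract: GTG at 636, TAA at 2010.
nt = coding_length(636, 2010)
print(nt)        # 1374 nucleotides
print(nt // 3)   # 458 amino acid residues
```

    Both values match the 1,374 nucleotides and 458 residues reported for the pheP structural gene.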

  16. Breed relationships facilitate fine-mapping studies: A 7.8-kb deletion cosegregates with Collie eye anomaly across multiple dog breeds

    PubMed Central

    Parker, Heidi G.; Kukekova, Anna V.; Akey, Dayna T.; Goldstein, Orly; Kirkness, Ewen F.; Baysac, Kathleen C.; Mosher, Dana S.; Aguirre, Gustavo D.; Acland, Gregory M.; Ostrander, Elaine A.

    2007-01-01

    The features of modern dog breeds that increase the ease of mapping common diseases, such as reduced heterogeneity and extensive linkage disequilibrium, may also increase the difficulty associated with fine mapping and identifying causative mutations. One way to address this problem is by combining data from multiple breeds segregating the same trait after initial linkage has been determined. The multibreed approach increases the number of potentially informative recombination events and reduces the size of the critical haplotype by taking advantage of shortened linkage disequilibrium distances found across breeds. In order to identify breeds that likely share a trait inherited from the same ancestral source, we have used cluster analysis to divide 132 breeds of dog into five primary breed groups. We then use the multibreed approach to fine-map Collie eye anomaly (cea), a complex disorder of ocular development that was initially mapped to a 3.9-cM region on canine chromosome 37. Combined genotypes from affected individuals from four breeds of a single breed group significantly narrowed the candidate gene region to a 103-kb interval spanning only four genes. Sequence analysis revealed that all affected dogs share a homozygous deletion of 7.8 kb in the NHEJ1 gene. This intronic deletion spans a highly conserved binding domain to which several developmentally important proteins bind. This work both establishes that the primary cea mutation arose as a single disease allele in a common ancestor of herding breeds as well as highlights the value of comparative population analysis for refining regions of linkage. PMID:17916641

  17. MiR-214 regulates oral cancer KB cell apoptosis through targeting RASSF5.

    PubMed

    Li, T K; Yin, K; Chen, Z; Bao, Y; Zhang, S X

    2017-03-08

    Ras association domain family member 5 (RASSF5), a member of the Ras association domain family, induces cell apoptosis by phosphorylating FOXO3a, which triggers target gene BIM (pro-apoptotic factor) activation. MiR-214 is overexpressed in oral cancer tissue, indicating its possible involvement in oral cancer pathogenesis. Bioinformatics analysis has revealed a complementary sequence between miR-214 and the 3'-UTR of RASSF5 mRNA. However, whether miR-214 regulates RASSF5 in oral cancer remains poorly understood. We aimed to investigate the role of miR-214 in RASSF5 expression regulation in oral cancer. Tumor and paracarcinoma tissues were obtained from 48 oral cancer patients to examine miR-214 and RASSF5 expression. The relationship between miR-214 and RASSF5 was investigated by dual luciferase reporter gene assay. Oral cancer KB cells were cultured in vitro and divided into inhibitor NC, miR-214 inhibitor, Scramble-pMD18, RASSF5-pMD18, and miR-214 inhibitor + RASSF5-pMD18 groups. Caspase 3 activity, cell apoptosis, and total protein expression were measured by spectrophotometry, flow cytometry, and western blot, respectively. MiR-214 expression was significantly increased, while that of RASSF5 decreased in oral cancer tumor tissues compared to paracarcinoma tissues. Luciferase assay showed that miR-214 suppressed RASSF5 expression by targeting its 3'-UTR. Down-regulation of miR-214 and/or enhancement of RASSF5 expression markedly increased FOXO3a phosphorylation, BIM expression, caspase 3 activity, and apoptosis. In conclusion, miR-214 expression was elevated and RASSF5 was down-regulated in oral cancer. Moreover, miR-214 regulated KB cell apoptosis through targeted inhibition of RASSF5 expression, FOXO3a phosphorylation, and BIM expression, suggesting its possible application as a novel therapeutic oral cancer target.

  18. Distribution and phylogenetic significance of the 71-kb inversion in the plastid genome in Funariidae (Bryophyta).

    PubMed

    Goffinet, Bernard; Wickett, Norman J; Werner, Olaf; Ros, Rosa Maria; Shaw, A Jonathan; Cox, Cymon J

    2007-04-01

    The recent assembly of the complete sequence of the plastid genome of the model taxon Physcomitrella patens (Funariaceae, Bryophyta) revealed that a 71-kb fragment, encompassing much of the large single copy region, is inverted. This inversion of 57% of the genome is the largest rearrangement detected in the plastid genomes of plants to date. Although initially considered diagnostic of Physcomitrella patens, the inversion was recently shown to characterize the plastid genome of two species from related genera within Funariaceae, but was lacking in another member of Funariidae. The phylogenetic significance of the inversion has remained ambiguous. Exemplars of all families included in Funariidae were surveyed. DNA sequences spanning the inversion break ends were amplified, using primers that anneal to genes on either side of the putative end points of the inversion. Primer combinations were designed to yield a product for either the inverted or the non-inverted architecture. The survey reveals that exemplars of eight genera of Funariaceae, the sole species of Disceliaceae and three generic representatives of Encalyptales all share the 71-kb inversion in the large single copy of the plastid genome. By contrast, the plastid genome of Gigaspermaceae (Funariales) is characterized by a gene order congruent with that described for other mosses, liverworts and hornworts, and hence it does not possess this inversion. The phylogenetic distribution of the inversion in the gene order supports a hypothesis only weakly supported by inferences from sequence data whereby Funariales are paraphyletic, with Funariaceae and Disceliaceae sharing a common ancestor with Encalyptales, and Gigaspermaceae sister to this combined clade. To reflect these relationships, Gigaspermaceae are excluded from Funariales and accommodated in their own order, Gigaspermales order nov., within Funariidae.

  19. Introducing the Forensic Research/Reference on Genetics knowledge base, FROG-kb.

    PubMed

    Rajeevan, Haseena; Soundararajan, Usha; Pakstis, Andrew J; Kidd, Kenneth K

    2012-09-01

    Online tools and databases based on multi-allelic short tandem repeat polymorphisms (STRPs) are actively used in forensic teaching, research, and investigations. The Fst value of each CODIS marker tends to be low across the populations of the world and most populations typically have all the common STRP alleles present diminishing the ability of these systems to discriminate ethnicity. Recently, considerable research is being conducted on single nucleotide polymorphisms (SNPs) to be considered for human identification and description. However, online tools and databases that can be used for forensic research and investigation are limited. The back end DBMS (Database Management System) for FROG-kb is Oracle version 10. The front end is implemented with specific code using technologies such as Java, Java Servlet, JSP, JQuery, and GoogleCharts. We present an open access web application, FROG-kb (Forensic Research/Reference on Genetics-knowledge base, http://frog.med.yale.edu), that is useful for teaching and research relevant to forensics and can serve as a tool facilitating forensic practice. The underlying data for FROG-kb are provided by the already extensively used and referenced ALlele FREquency Database, ALFRED (http://alfred.med.yale.edu). In addition to displaying data in an organized manner, computational tools that use the underlying allele frequencies with user-provided data are implemented in FROG-kb. These tools are organized by the different published SNP/marker panels available. This web tool currently has implemented general functions possible for two types of SNP panels, individual identification and ancestry inference, and a prediction function specific to a phenotype informative panel for eye color. The current online version of FROG-kb already provides new and useful functionality. We expect FROG-kb to grow and expand in capabilities and welcome input from the forensic community in identifying datasets and functionalities that will be most helpful.

  20. Introducing the Forensic Research/Reference on Genetics knowledge base, FROG-kb

    PubMed Central

    2012-01-01

    Background: Online tools and databases based on multi-allelic short tandem repeat polymorphisms (STRPs) are actively used in forensic teaching, research, and investigations. The Fst value of each CODIS marker tends to be low across the populations of the world and most populations typically have all the common STRP alleles present diminishing the ability of these systems to discriminate ethnicity. Recently, considerable research is being conducted on single nucleotide polymorphisms (SNPs) to be considered for human identification and description. However, online tools and databases that can be used for forensic research and investigation are limited. Methods: The back end DBMS (Database Management System) for FROG-kb is Oracle version 10. The front end is implemented with specific code using technologies such as Java, Java Servlet, JSP, JQuery, and GoogleCharts. Results: We present an open access web application, FROG-kb (Forensic Research/Reference on Genetics-knowledge base, http://frog.med.yale.edu), that is useful for teaching and research relevant to forensics and can serve as a tool facilitating forensic practice. The underlying data for FROG-kb are provided by the already extensively used and referenced ALlele FREquency Database, ALFRED (http://alfred.med.yale.edu). In addition to displaying data in an organized manner, computational tools that use the underlying allele frequencies with user-provided data are implemented in FROG-kb. These tools are organized by the different published SNP/marker panels available. This web tool currently has implemented general functions possible for two types of SNP panels, individual identification and ancestry inference, and a prediction function specific to a phenotype informative panel for eye color. Conclusion: The current online version of FROG-kb already provides new and useful functionality. We expect FROG-kb to grow and expand in capabilities and welcome input from the forensic community in identifying datasets and

  1. Inhibition of the cardiac inward rectifier potassium currents by KB-R7943.

    PubMed

    Abramochkin, Denis V; Alekseeva, Eugenia I; Vornanen, Matti

    2013-09-01

    KB-R7943 (2-[2-[4-(4-nitrobenzyloxy)phenyl]ethyl]isothiourea) was developed as a specific inhibitor of the sarcolemmal sodium-calcium exchanger (NCX) with potential experimental and therapeutic use. However, KB-R7943 is shown to be a potent blocker of several ion currents including inward and delayed rectifier K(+) currents of cardiomyocytes. To further characterize KB-R7943 as a blocker of the cardiac inward rectifiers we compared KB-R7943 sensitivity of the background inward rectifier (IK1) and the carbacholine-induced inward rectifier (IKACh) currents in mammalian (Rattus norvegicus; rat) and fish (Carassius carassius; crucian carp) cardiac myocytes. The basal IK1 of ventricular myocytes was blocked with apparent IC50-values of 4.6×10(-6) M and 3.5×10(-6) M for rat and fish, respectively. IKACh was almost an order of magnitude more sensitive to KB-R7943 than IK1 with IC50-values of 6.2×10(-7) M for rat and 2.5×10(-7) M for fish. The fish cardiac NCX current was half-maximally blocked at the concentration of 1.9-3×10(-6) M in both forward and reversed mode of operation. Thus, the sensitivity of three cardiac currents to KB-R7943 block increases in the order IK1 ~ INCX < IKACh. The ability of KB-R7943 to block inward rectifier potassium currents, in particular IKACh, should be taken into account when interpreting the data with this inhibitor from in vivo and in vitro experiments in both mammalian and fish models. © 2013.
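
    The "almost an order of magnitude" comparison above can be checked with a short calculation. The sketch below is illustrative only: it uses the IC50 values quoted in the abstract, and the names `IC50_M` and `fold_more_sensitive` are hypothetical helpers, not anything from the paper.

```python
# IC50 values (mol/L) as quoted in the abstract; keys are (species, current).
IC50_M = {
    ("rat", "IK1"): 4.6e-6,
    ("rat", "IKACh"): 6.2e-7,
    ("fish", "IK1"): 3.5e-6,
    ("fish", "IKACh"): 2.5e-7,
}


def fold_more_sensitive(species: str) -> float:
    """Return how many times lower the IKACh IC50 is than the IK1 IC50."""
    return IC50_M[(species, "IK1")] / IC50_M[(species, "IKACh")]


rat_fold = fold_more_sensitive("rat")
fish_fold = fold_more_sensitive("fish")

for species, fold in (("rat", rat_fold), ("fish", fish_fold)):
    print(f"{species}: IKACh is about {fold:.1f}-fold more sensitive than IK1")
```

    A lower IC50 means a more sensitive current, so the ratio IK1/IKACh expresses the fold difference directly.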

  2. An approach to describing and analysing bulk biological annotation quality: a case study using UniProtKB.

    PubMed

    Bell, Michael J; Gillespie, Colin S; Swan, Daniel; Lord, Phillip

    2012-09-15

    Annotations are a key feature of many biological databases, used to convey our knowledge of a sequence to the reader. Ideally, annotations are curated manually; however, manual curation is costly, time consuming and requires expert knowledge and training. Given these issues and the exponential increase of data, many databases implement automated annotation pipelines in an attempt to avoid un-annotated entries. Both manual and automated annotations vary in quality between databases and annotators, making assessment of annotation reliability problematic for users. The community lacks a generic measure for determining annotation quality and correctness, which we look at addressing within this article. Specifically we investigate word reuse within bulk textual annotations and relate this to Zipf's Principle of Least Effort. We use the UniProt Knowledgebase (UniProtKB) as a case study to demonstrate this approach since it allows us to compare annotation change, both over time and between automated and manually curated annotations. By applying power-law distributions to word reuse in annotation, we show clear trends in UniProtKB over time, which are consistent with existing studies of quality on free text English. Further, we show a clear distinction between manual and automated analysis and investigate cohorts of protein records as they mature. These results suggest that this approach holds distinct promise as a mechanism for judging annotation quality. Source code is available at the authors' website: http://homepages.cs.ncl.ac.uk/m.j.bell1/annotation.

  3. Construction of an 800-kb contig in the near-centromeric region of the rice blast resistance gene Pi-ta2 using a highly representative rice BAC library.

    PubMed

    Nakamura, S; Asakawa, S; Ohmido, N; Fukui, K; Shimizu, N; Kawasaki, S

    1997-05-01

    We constructed a rice Bacterial Artificial Chromosome (BAC) library from green leaf protoplasts of the cultivar Shimokita harboring the rice blast resistance gene Pi-ta. The average insert size of 155 kb and the library size of seven genome equivalents make it one of the most comprehensive BAC libraries available, and larger than many plant YAC libraries. The library clones were plated on seven high density membranes of microplate size, enabling efficient colony identification in colony hybridization experiments. Seven percent of clones carried chloroplast DNA. By probing with markers close to the blast resistance genes Pi-ta2 (closely linked to Pi-ta) and Pi-b, respectively located in the centromeric region of chromosome 12 and near the telomeric end of chromosome 2, on average 2.2 ± 1.3 and 8.0 ± 2.6 BAC clones/marker were isolated. Differences in chromosomal structures may contribute to this wide variation in yield. A contig of about 800 kb, consisting of 19 clones, was constructed in the Pi-ta2 region. This region had a high frequency of repetitive sequences. To circumvent this difficulty, we devised a "two-step walking" method. The contig spanned a 300 kb region between markers located at 0 cM and 0.3 cM from Pi-ta. The ratio of physical to genetic distances (> 1,000 kb/cM) was more than three times larger than the average of rice (300 kb/cM). The low recombination rate and high frequency of repetitive sequences may also be related to the near centromeric character of this region. Fluorescent in situ hybridization (FISH) with a BAC clone from the Pi-b region yielded very clear signals on the long arm of chromosome 2, while a clone from the Pi-ta2 region showed various cross-hybridizing signals near the centromeric regions of all chromosomes.
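
    The physical-to-genetic distance ratio quoted above follows from the marker positions. The sketch below reproduces the arithmetic under the assumption that the 300 kb interval and the 0 cM and 0.3 cM positions can be treated as exact, which gives the lower bound the abstract reports. The function name `kb_per_cm` is a hypothetical helper.

```python
# Figures from the abstract: a 300 kb interval between markers located at
# 0 cM and 0.3 cM from Pi-ta, and a rice-wide average of 300 kb/cM.
REGION_PHYSICAL_KB = 300.0
REGION_GENETIC_CM = 0.3 - 0.0
RICE_AVERAGE_KB_PER_CM = 300.0


def kb_per_cm(physical_kb: float, genetic_cm: float) -> float:
    """Return the physical-to-genetic distance ratio in kb per cM."""
    if genetic_cm <= 0:
        raise ValueError("genetic distance must be positive")
    return physical_kb / genetic_cm


region_ratio = kb_per_cm(REGION_PHYSICAL_KB, REGION_GENETIC_CM)
fold_over_average = region_ratio / RICE_AVERAGE_KB_PER_CM

print(f"Pi-ta2 region: about {region_ratio:.0f} kb/cM")
print(f"Fold over the rice average: about {fold_over_average:.1f}x")
```

    A higher kb/cM value means less recombination per unit of DNA, consistent with the near-centromeric location described in the abstract.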

  4. The complete sequence and structural analysis of human apolipoprotein B-100: relationship between apoB-100 and apoB-48 forms.

    PubMed Central

    Cladaras, C; Hadzopoulou-Cladaras, M; Nolte, R T; Atkinson, D; Zannis, V I

    1986-01-01

    formation of the LDL receptor-binding domain of apoB-100. Blotting analysis of intestinal RNA and hybridization of the blots with carboxy apoB cDNA probes produced a single 15-kb hybridization band whereas hybridization with amino terminal probes produced two hybridization bands of 15 and 8 kb. Our data indicate that both forms of apoB mRNA contain common sequences which extend from the amino terminal of apoB-100 to the vicinity of nucleotide residue 6300. These two messages may have resulted from differential splicing of the same primary apoB mRNA transcript. PMID:3030729

  5. Nucleotide sequence of the Varkud mitochondrial plasmid of Neurospora and synthesis of a hybrid transcript with a 5' leader derived from mitochondrial RNA.

    PubMed

    Akins, R A; Grant, D M; Stohl, L L; Bottorff, D A; Nargang, F E; Lambowitz, A M

    1988-11-05

    The Mauriceville and Varkud mitochondrial plasmids of Neurospora are closely related, closed circular DNAs (3.6 and 3.7 kb, respectively; 1 kb = 10(3) bases or base-pairs), whose characteristics suggest relationships to mitochondrial DNA introns and retrotransposons. Here, we characterized the structure of the Varkud plasmid, determined its complete nucleotide sequence and mapped its major transcripts. The Mauriceville and Varkud plasmids have more than 97% positional identity. Both plasmids contain a 710 amino acid open reading frame that encodes a reverse transcriptase-like protein. The amino acid sequence of this open reading frame is strongly conserved between the two plasmids (701/710 amino acids) as expected for a functionally important protein. Both plasmids have a 0.4 kb region that contains five PstI palindromes and a direct repeat of approximately 160 base-pairs. Comparison of sequences in this region suggests that the Varkud plasmid has diverged less from a common ancestor than has the Mauriceville plasmid. Two major transcripts of the Varkud plasmid were detected by Northern hybridization experiments: a full-length linear RNA of 3.7 kb and an additional prominent transcript of 4.9 kb, 1.2 kb longer than monomer plasmid. Remarkably, we find that the 4.9 kb transcript is a hybrid RNA consisting of the full-length 3.7 kb Varkud plasmid transcript plus a 5' leader of 1.2 kb that is derived from the 5' end of the mitochondrial small rRNA. This and other findings suggest that the Varkud plasmid, like certain RNA viruses, has a mechanism for joining heterologous RNAs to the 5' end of its major transcript, and that, under some circumstances, nucleotide sequences in mitochondria may be recombined at the RNA level.

  6. Sonication-based isolation and enrichment of Chlorella protothecoides chloroplasts for Illumina genome sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Angelova, Angelina; Park, Sang-Hycuk; Kyndt, John

    2013-09-01

    With the increasing world demand for biofuel, a number of oleaginous algal species are being considered as renewable sources of oil. Chlorella protothecoides Krüger synthesizes triacylglycerols (TAGs) as storage compounds that can be converted into renewable fuel utilizing an anabolic pathway that is poorly understood. The paucity of algal chloroplast genome sequences has been an important constraint to chloroplast transformation and for studying gene expression in TAGs pathways. In this study, the intact chloroplasts were released from algal cells using sonication followed by sucrose gradient centrifugation, resulting in a 2.36-fold enrichment of chloroplasts from C. protothecoides, based on qPCR analysis. The C. protothecoides chloroplast genome (cpDNA) was determined using the Illumina HiSeq 2000 sequencing platform and found to be 84,576 bp (about 84.6 kb) in size, with a GC content of 30.8%. This is the first report of an optimized protocol that uses a sonication step, followed by sucrose gradient centrifugation, to release and enrich intact chloroplasts from a microalga (C. protothecoides) of sufficient quality to permit chloroplast genome sequencing with high coverage, while minimizing nuclear genome contamination. The approach is expected to guide chloroplast isolation from other oleaginous algal species for a variety of uses that benefit from enrichment of chloroplasts, ranging from biochemical analysis to genomics studies.

  7. Isolation and characterization of Y chromosome sequences from the African malaria mosquito Anopheles gambiae.

    PubMed Central

    Krzywinski, Jaroslaw; Nusskern, Deborah R; Kern, Marcia K; Besansky, Nora J

    2004-01-01

    The karyotype of the African malaria mosquito Anopheles gambiae contains two pairs of autosomes and a pair of sex chromosomes. The Y chromosome, constituting approximately 10% of the genome, remains virtually unexplored, despite the recent completion of the A. gambiae genome project. Here we report the identification and characterization of Y chromosome sequences of total length approaching 150 kb. We developed 11 Y-specific PCR markers that consistently yielded male-specific products in specimens from both laboratory colony and natural populations. The markers are characterized by low sequence polymorphism in samples collected across Africa and by presence in more than one copy on the Y. Screening of the A. gambiae BAC library using these markers allowed detection of 90 Y-linked BAC clones. Analysis of the BAC sequences and other Y-derived fragments showed massive accumulation of a few transposable elements. Nevertheless, more complex sequences are apparently present on the Y; these include portions of an approximately 48-kb-long unmapped AAAB01008227 scaffold from the whole genome shotgun assembly. Anopheles Y appears not to harbor any of the genes identified in Drosophila Y. However, experiments suggest that one of the ORFs from the AAAB01008227 scaffold represents a fragment of a gene with male-specific expression. PMID:15082548

  8. Characterization of an Equine α-S2-Casein Variant Due to a 1.3 kb Deletion Spanning Two Coding Exons

    PubMed Central

    Brinkmann, Julia; Koudelka, Tomas; Keppler, Julia K.; Tholey, Andreas; Schwarz, Karin; Thaller, Georg; Tetens, Jens

    2015-01-01

    The production and consumption of mare’s milk in Europe has gained importance, mainly based on positive health effects and a lower allergenic potential as compared to cows’ milk. The allergenicity of milk is to a certain extent affected by different genetic variants. In classical dairy species, much research has been conducted into the genetic variability of milk proteins, but the knowledge in horses is scarce. Here, we characterize two major forms of equine αS2-casein arising from a genomic 1.3 kb in-frame deletion involving two coding exons, one of which represents an equid specific duplication. Findings at the DNA-level have been verified by cDNA sequencing from horse milk of mares with different genotypes. At the protein-level, we were able to show by SDS-PAGE and in-gel digestion with subsequent LC-MS analysis that both proteins are actually expressed. The comparison with published sequences of other equids revealed that the deletion has probably occurred before the ancestor of present-day asses and zebras diverged from the horse lineage. PMID:26444874

  9. Sequence Analysis of the Cryptic Plasmid pMG101 from Rhodopseudomonas palustris and Construction of Stable Cloning Vectors

    PubMed Central

    Inui, Masayuki; Roh, Jung Hyeob; Zahn, Kenneth; Yukawa, Hideaki

    2000-01-01

    A 15-kb cryptic plasmid was obtained from a natural isolate of Rhodopseudomonas palustris. The plasmid, designated pMG101, was able to replicate in R. palustris and in closely related strains of Bradyrhizobium japonicum and phototrophic Bradyrhizobium species. However, it was unable to replicate in the purple nonsulfur bacterium Rhodobacter sphaeroides and in Rhizobium species. The replication region of pMG101 was localized to a 3.0-kb SalI-XhoI fragment, and this fragment was stably maintained in R. palustris for over 100 generations in the absence of selection. The complete nucleotide sequence of this fragment revealed two open reading frames (ORFs), ORF1 and ORF2. The deduced amino acid sequence of ORF1 is similar to sequences of Par proteins, which mediate plasmid stability from certain plasmids, while ORF2 was identified as a putative rep gene, coding for an initiator of plasmid replication, based on homology with the Rep proteins of several other plasmids. The function of these sequences was studied by deletion mapping and gene disruptions of ORF1 and ORF2. pMG101-based Escherichia coli-R. palustris shuttle cloning vectors pMG103 and pMG105 were constructed and were stably maintained in R. palustris growing under nonselective conditions. The ability of plasmid pMG101 to replicate in R. palustris and its close phylogenetic relatives should enable broad application of these vectors within this group of α-proteobacteria. PMID:10618203

  10. Nucleotide sequence of the Kaposi sarcoma-associated herpesvirus (HHV8)

    PubMed Central

    Russo, James J.; Bohenzky, Roy A.; Chien, Ming-Cheng; Chen, Jing; Yan, Ming; Maddalena, Dawn; Parry, J. Preston; Peruzzi, Daniela; Edelman, Isidore S.; Chang, Yuan; Moore, Patrick S.

    1996-01-01

    The genome of the Kaposi sarcoma-associated herpesvirus (KSHV or HHV8) was mapped with cosmid and phage genomic libraries from the BC-1 cell line. Its nucleotide sequence was determined except for a 3-kb region at the right end of the genome that was refractory to cloning. The BC-1 KSHV genome consists of a 140.5-kb-long unique coding region flanked by multiple G+C-rich 801-bp terminal repeat sequences. A genomic duplication that apparently arose in the parental tumor is present in this cell culture-derived strain. At least 81 ORFs, including 66 with homology to herpesvirus saimiri ORFs, and 5 internal repeat regions are present in the long unique region. The virus encodes homologs to complement-binding proteins, three cytokines (two macrophage inflammatory proteins and interleukin 6), dihydrofolate reductase, bcl-2, interferon regulatory factors, interleukin 8 receptor, neural cell adhesion molecule-like adhesin, and a D-type cyclin, as well as viral structural and metabolic proteins. Terminal repeat analysis of virus DNA from a KS lesion suggests a monoclonal expansion of KSHV in the KS tumor. PMID:8962146

  11. Sequence Based Structural Characterization and Genetic Diversity Analysis of Full Length TLR4 CDS in Crossbred and Indigenous Cattle.

    PubMed

    Mishra, Chinmoy; Kumar, Subodh; Sonwane, Arvind Asaram; Yathish, H M; Chaudhary, Rajni

    2017-01-02

    The exploration of candidate genes for immune response in cattle may be vital for improving our understanding regarding the species specific response to pathogens. Toll-like receptor 4 (TLR4) is mostly involved in protection against the deleterious effects of Gram negative pathogens. Approximately 2.6 kb long cDNA sequence of TLR4 gene covering the entire coding region was characterized in two Indian milk cattle (Vrindavani and Tharparkar). The phylogenetic analysis confirmed that the bovine TLR4 was apparently evolved from an ancestral form that predated the appearance of vertebrates, and it is grouped with buffalo, yak, and mithun TLR4s. Sequence analysis revealed a 2526-nucleotide long open reading frame (ORF) encoding 841 amino acids, similar to other cattle breeds. The calculated molecular weight of the translated ORF was 96144 and 96040.9 Da; the isoelectric point was 6.35 and 6.42 in Vrindavani and Tharparkar cattle, respectively. The Simple Modular Architecture Research Tool (SMART) analysis identified 14 leucine rich repeats (LRR) motifs in bovine TLR4 protein. The deduced TLR4 amino acid sequence of Tharparkar had 4 different substitutions as compared to Bos taurus, Sahiwal, and Vrindavani. The signal peptide cleavage site was predicted to lie between the 16th and 17th amino acids of the mature peptide. The transmembrane helix was identified between amino acids 635 and 657 in the mature peptide.
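
    The relationship between the 2526-nucleotide ORF and the 841 encoded amino acids can be sketched as below. This assumes, as is conventional but not stated in the abstract, that the quoted ORF length includes the terminal stop codon. The function name `encoded_residues` is a hypothetical helper.

```python
ORF_LENGTH_NT = 2526  # ORF length quoted in the abstract


def encoded_residues(orf_length_nt: int, includes_stop: bool = True) -> int:
    """Return the number of amino acids encoded by an ORF of the given length.

    Assumption: when includes_stop is True, the final codon is a stop codon
    and does not contribute a residue.
    """
    if orf_length_nt <= 0 or orf_length_nt % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_length_nt // 3
    return codons - 1 if includes_stop else codons


residues = encoded_residues(ORF_LENGTH_NT)
print(f"{ORF_LENGTH_NT} nt -> {ORF_LENGTH_NT // 3} codons -> {residues} amino acids")
```

    Under that assumption the 842 codons yield 841 residues, matching the figure in the abstract.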

  12. Identification of a 3.0-kb Major Recombination Hotspot in Patients with Sotos Syndrome Who Carry a Common 1.9-Mb Microdeletion

    PubMed Central

    Visser, Remco; Shimokawa, Osamu; Harada, Naoki; Kinoshita, Akira; Ohta, Tohru; Niikawa, Norio; Matsumoto, Naomichi

    2005-01-01

    Sotos syndrome (SoS) is a congenital dysmorphic disorder characterized by overgrowth in childhood, distinctive craniofacial features, and mental retardation. Haploinsufficiency of the NSD1 gene owing to either intragenic mutations or microdeletions is known to be the major cause of SoS. The common ∼2.2-Mb microdeletion encompasses the whole NSD1 gene and neighboring genes and is flanked by low-copy repeats (LCRs). Here, we report the identification of a 3.0-kb major recombination hotspot within these LCRs, in which we mapped deletion breakpoints in 78.7% (37/47) of patients with SoS who carry the common microdeletion. The deletion size was subsequently refined to 1.9 Mb. Sequencing of breakpoint fragments from all 37 patients revealed junctions between a segment of the proximal LCR (PLCR-B) and the corresponding region of the distal LCR (DLCR-2B). PLCR-B and DLCR-2B are the only directly oriented regions, whereas the remaining regions of the PLCR and DLCR are in inverted orientation. The PLCR, with a size of 394.0 kb, and the DLCR, with a size of 429.8 kb, showed high overall homology (∼98.5%), with an increased sequence similarity (∼99.4%) within the 3.0-kb breakpoint cluster. Several recombination-associated motifs were identified in the hotspot and/or its vicinity. Interestingly, a 10-fold average increase of a translin motif, as compared with the normal distribution within the LCRs, was recognized. Furthermore, a heterozygous inversion of the interval between the LCRs was detected in all fathers of the children carrying a deletion in the paternally derived chromosome. The functional significance of these findings remains to be elucidated. Segmental duplications of the primate genome play a major role in chromosomal evolution. Evolutionary study showed that the duplication of the SoS LCRs occurred 23.3–47.6 million years ago, before the divergence of Old World monkeys. PMID:15580547

  13. The complete chloroplast genome sequence of the CAM epiphyte Spanish moss (Tillandsia usneoides, Bromeliaceae) and its comparative analysis.

    PubMed

    Poczai, Péter; Hyvönen, Jaakko

    2017-01-01

    Spanish moss (Tillandsia usneoides) is an epiphytic bromeliad widely distributed throughout tropical and warm temperate America. This plant is highly adapted to extreme environmental conditions. Striking features of this species include specialized trichomes (scales) covering the surface of its shoots aiding the absorption of water and nutrients directly from the atmosphere and a specific photosynthesis using crassulacean acid metabolism (CAM). Here we report the plastid genome of Spanish moss and present the comparison of genome organization and sequence evolution within Poales. The plastome of Spanish moss has a quadripartite structure consisting of a large single copy (LSC, 87,439 bp), two inverted regions (IRa and IRb, 26,803 bp) and a short single copy (SSC, 18,612 bp) region. The plastid genome had 37.2% GC content and 134 genes with 88 being unique protein-coding genes and 20 of these are duplicated in the IR, similar to other reported bromeliads. Our study shows that early diverging lineages of Poales do not have high substitution rates as compared to grasses, and plastid genomes of bromeliads show structural features considered to be ancestral in graminids. These include the loss of the introns in the clpP and rpoC1 genes and the complete loss or partial degradation of accD and ycf genes in the Graminid clade. Further structural rearrangements appeared in the graminids lacking in Spanish moss, which include a 28-kb inversion between the trnG-UCC-rps14 region and 6-kb in the trnG-UCC-psbD, followed by a third <1 kb inversion in the trnT sequence.
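
    The quadripartite structure described above implies a total plastome length that the abstract does not state directly. The sketch below adds up the regions, assuming the quoted 26,803 bp applies to each of the two inverted regions (IRa and IRb). The function name `plastome_length` is a hypothetical helper.

```python
# Region sizes (bp) as quoted in the abstract.
LSC_BP = 87_439
IR_BP = 26_803   # assumed to be the length of each of IRa and IRb
SSC_BP = 18_612


def plastome_length(lsc_bp: int, ir_bp: int, ssc_bp: int) -> int:
    """Return the total length of a quadripartite plastome: LSC + 2 x IR + SSC."""
    return lsc_bp + 2 * ir_bp + ssc_bp


total_bp = plastome_length(LSC_BP, IR_BP, SSC_BP)
shares = {
    "LSC": LSC_BP / total_bp,
    "IRa + IRb": 2 * IR_BP / total_bp,
    "SSC": SSC_BP / total_bp,
}

print(f"Total plastome length: {total_bp:,} bp")
for region, share in shares.items():
    print(f"{region}: {share:.1%} of the plastome")
```

    The inverted region is counted twice because it is present in two copies flanking the short single copy region.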

  14. The Mitochondrial Genome and a 60-kb Nuclear DNA Segment from Naegleria fowleri, the Causative Agent of Primary Amoebic Meningoencephalitis

    PubMed Central

    Herman, Emily K.; Greninger, Alexander L.; Visvesvara, Govinda S.; Marciano-Cabral, Francine; Dacks, Joel B.; Chiu, Charles Y.

    2013-01-01

    Naegleria fowleri is a unicellular eukaryote causing primary amoebic meningoencephalitis, a neuropathic disease killing 99% of those infected, usually within 7–14 days. N. fowleri is found globally in regions including the US and Australia. The genome of the related non-pathogenic species Naegleria gruberi has been sequenced, but the genetic basis for N. fowleri pathogenicity is unclear. To generate such insight, we sequenced and assembled the mitochondrial genome and a 60-kb segment of nuclear genome from N. fowleri. The mitochondrial genome is highly similar to its counterpart in N. gruberi in gene complement and organization, while distinct lack of synteny is observed for the nuclear segments. Even in this short (60-kb) segment, we identified examples of potential factors for pathogenesis, including ten novel N. fowleri-specific genes. We also identified a homologue of cathepsin B; proteases proposed to be involved in the pathogenesis of diverse eukaryotic pathogens, including N. fowleri. Finally, we demonstrate a likely case of horizontal gene transfer between N. fowleri and two unrelated amoebae, one of which causes granulomatous amoebic encephalitis. This initial look into the N. fowleri nuclear genome has revealed several examples of potential pathogenesis factors, improving our understanding of a neglected pathogen of increasing global importance. PMID:23360210

  15. Cloning, sequence analysis, and expression in Escherichia coli of a gene coding for a β-mannanase from the extremely thermophilic bacterium Caldocellum saccharolyticum

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Luethi, E.; Jasmat, N.B.; Grayling, R.A.

    1991-03-01

    A λ recombinant phage expressing β-mannanase activity in Escherichia coli has been isolated from a genomic library of the extremely thermophilic anaerobe Caldocellum saccharolyticum. The gene was cloned into pBR322 on a 5-kb BamHI fragment, and its location was obtained by deletion analysis. The sequence of a 2.1-kb fragment containing the mannanase gene has been determined. One open reading frame was found which could code for a protein of Mr 38,904. The mannanase gene (manA) was overexpressed in E. coli by cloning the gene downstream from the lacZ promoter of pUC18. The enzyme was most active at pH 6 and 80 °C and degraded locust bean gum, guar gum, Pinus radiata glucomannan, and konjak glucomannan. The noncoding region downstream from the mannanase gene showed strong homology to celB, a gene coding for a cellulase from the same organism, suggesting that the manA gene might have been inserted into its present position on the C. saccharolyticum genome by homologous recombination.

  16. A Comparison of the First Two Sequenced Chloroplast Genomes in Asteraceae: Lettuce and Sunflower

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Timme, Ruth E.; Kuehl, Jennifer V.; Boore, Jeffrey L.

    2006-01-20

    Asteraceae is the second largest family of plants, with over 20,000 species. For the past few decades, numerous phylogenetic studies have contributed to our understanding of the evolutionary relationships within this family, including comparisons of the fast evolving chloroplast gene, ndhF, rbcL, as well as non-coding DNA from the trnL intron plus the trnL-trnF intergenic spacer, matK, and, with lesser resolution, psbA-trnH. This culminated in a study by Panero and Funk in 2002 that used over 13,000 bp per taxon for the largest taxonomic revision of Asteraceae in over a hundred years. Still, some uncertainties remain, and it would be very useful to have more information on the relative rates of sequence evolution among various genes and on genome structure as a potential set of phylogenetic characters to help guide future phylogenetic structures. By way of contributing to this, we report the first two complete chloroplast genome sequences from members of the Asteraceae, those of Helianthus annuus and Lactuca sativa. These plants belong to two distantly related subfamilies, Asteroideae and Cichorioideae, respectively. In addition to these, there is only one other published chloroplast genome sequence for any plant within the larger group called Euasterids II, that of Panax ginseng (Araliaceae, 156,318 bps, AY582139). Early chloroplast genome mapping studies demonstrated that H. annuus and L. sativa share a 22 kb inversion relative to members of the subfamily Barnadesioideae. By comparison to outgroups, this inversion was shown to be derived, indicating that the Asteroideae and Cichorioideae are more closely related than either is to the Barnadesioideae. A later sequencing study found that taxa that share this 22 kb inversion also contain within this region a second, smaller, 3.3 kb inversion. These sequences also enable an analysis of patterns of shared repeats in the genomes at fine level and of RNA editing by comparison to available EST sequences. In addition

  17. Sequence characterization of S100A8 gene reveals structural differences of protein and transcriptional factor binding sites in water buffalo and yak.

    PubMed

    Kathiravan, P; Goyal, S; Kataria, R S; Mishra, B P; Jayakumar, S; Joshi, B K

    2011-01-01

    The present study was undertaken to characterize the structure of S100A8 gene and its promoter in water buffalo and yak. Sequence data of 2.067 kb, 2.071 kb, and 2.052 kb with respect to complete S100A8 gene including 5' flanking region was generated in river buffalo, swamp buffalo, and yak, respectively. BLAST analysis of coding DNA sequences (CDS) of S100A8 gene revealed 95% homology of buffalo sequence with cattle, 85% with pig and horse, 83% with dog, 72-73% with murines, and around 79% with primates and humans. Phylogenetic analysis of predicted CDS revealed distinct clustering of murines, primates, and domestic animals with bovines and bubalines forming a subcluster among farm animals. In silico translation of predicted CDS revealed a sequence of 89 amino acids with 7 amino acid changes between cattle and buffalo and 2 changes between cattle and yak. The search for Pfam family revealed the N-terminal calcium binding domain and the noncanonical EF hand domain in the carboxy terminus, with more variations being observed in the N-terminal domain among different species. Two amino acid changes observed in carboxy terminal EF hand domain resulted in altered secondary structure of yak S100A8 protein. Analysis of S100A8 gene promoter revealed 14 putative motifs for transcriptional factor binding sites. Two putative motifs viz. C/EBP and v-Myb were found to be absent in swamp buffalo as compared to river buffalo and cattle. Differences in the structure of S100A8 protein and the transcriptional factor binding sites identified in the present study need to be analyzed further for their functional significance in yak and swamp buffalo respectively. Copyright © Taylor & Francis Group, LLC

  18. Cloning, Sequencing, and Characterization of the cgmB Gene of Sinorhizobium meliloti Involved in Cyclic β-Glucan Biosynthesis

    PubMed Central

    Wang, Ping; Ingram-Smith, Cheryl; Hadley, Jill A.; Miller, Karen J.

    1999-01-01

    Periplasmic cyclic β-glucans of Rhizobium species provide important functions during plant infection and hypo-osmotic adaptation. In Sinorhizobium meliloti (also known as Rhizobium meliloti), these molecules are highly modified with phosphoglycerol and succinyl substituents. We have previously identified an S. meliloti Tn5 insertion mutant, S9, which is specifically impaired in its ability to transfer phosphoglycerol substituents to the cyclic β-glucan backbone (M. W. Breedveld, J. A. Hadley, and K. J. Miller, J. Bacteriol. 177:6346–6351, 1995). In the present study, we have cloned, sequenced, and characterized this mutation at the molecular level. By using the Tn5 flanking sequences (amplified by inverse PCR) as a probe, an S. meliloti genomic library was screened, and two overlapping cosmid clones which functionally complement S9 were isolated. A 3.1-kb HindIII-EcoRI fragment found in both cosmids was shown to fully complement mutant S9. Furthermore, when a plasmid containing this 3.1-kb fragment was used to transform Rhizobium leguminosarum bv. trifolii TA-1JH, a strain which normally synthesizes only neutral cyclic β-glucans, anionic glucans containing phosphoglycerol substituents were produced, consistent with the functional expression of an S. meliloti phosphoglycerol transferase gene. Sequence analysis revealed the presence of two major, overlapping open reading frames within the 3.1-kb fragment. Primer extension analysis revealed that one of these open reading frames, ORF1, was transcribed and its transcription was osmotically regulated. This novel locus of S. meliloti is designated the cgm (cyclic glucan modification) locus, and the product encoded by ORF1 is referred to as CgmB. PMID:10419956

  19. An exploration of the sequence of a 2.9-Mb region of the genome of Drosophila melanogaster: the Adh region.

    PubMed Central

    Ashburner, M; Misra, S; Roote, J; Lewis, S E; Blazej, R; Davis, T; Doyle, C; Galle, R; George, R; Harris, N; Hartzell, G; Harvey, D; Hong, L; Houston, K; Hoskins, R; Johnson, G; Martin, C; Moshrefi, A; Palazzolo, M; Reese, M G; Spradling, A; Tsang, G; Wan, K; Whitelaw, K; Celniker, S

    1999-01-01

    A contiguous sequence of nearly 3 Mb from the genome of Drosophila melanogaster has been sequenced from a series of overlapping P1 and BAC clones. This region covers 69 chromosome polytene bands on chromosome arm 2L, including the genetically well-characterized "Adh region." A computational analysis of the sequence predicts 218 protein-coding genes, 11 tRNAs, and 17 transposable element sequences. At least 38 of the protein-coding genes are arranged in clusters of from 2 to 6 closely related genes, suggesting extensive tandem duplication. The gene density is one protein-coding gene every 13 kb; the transposable element density is one element every 171 kb. Of 73 genes in this region identified by genetic analysis, 49 have been located on the sequence; P-element insertions have been mapped to 43 genes. Ninety-five (44%) of the known and predicted genes match a Drosophila EST, and 144 (66%) have clear similarities to proteins in other organisms. Genes known to have mutant phenotypes are more likely to be represented in cDNA libraries, and far more likely to have products similar to proteins of other organisms, than are genes with no known mutant phenotype. Over 650 chromosome aberration breakpoints map to this chromosome region, and their nonrandom distribution on the genetic map reflects variation in gene spacing on the DNA. This is the first large-scale analysis of the genome of D. melanogaster at the sequence level. In addition to the direct results obtained, this analysis has allowed us to develop and test methods that will be needed to interpret the complete sequence of the genome of this species. "Before beginning a Hunt, it is wise to ask someone what you are looking for before you begin looking for it." Milne 1926. PMID:10471707
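    The densities and percentages quoted in this abstract can be reproduced with simple arithmetic. The sketch below is only a consistency check, not the authors' method: it assumes the region length is the 2.9 Mb given in the record title and that both percentages use the 218 predicted protein-coding genes as the denominator. The function names are illustrative.

```python
# Figures taken from the abstract; REGION_KB assumes the 2.9-Mb length
# stated in the record title (the abstract itself says "nearly 3 Mb").
REGION_KB = 2900
PROTEIN_CODING_GENES = 218
TRANSPOSABLE_ELEMENTS = 17


def kb_per_feature(region_kb, count):
    """Average spacing, in kb, between features of one kind."""
    return region_kb / count


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


gene_spacing = kb_per_feature(REGION_KB, PROTEIN_CODING_GENES)
te_spacing = kb_per_feature(REGION_KB, TRANSPOSABLE_ELEMENTS)
# Assumption: both percentages are relative to the 218 predicted genes.
est_share = percent(95, PROTEIN_CODING_GENES)
homolog_share = percent(144, PROTEIN_CODING_GENES)

print(f"one gene every {gene_spacing:.1f} kb")
print(f"one transposable element every {te_spacing:.1f} kb")
print(f"EST matches: {est_share:.1f}%, protein similarities: {homolog_share:.1f}%")
```

    Under these assumptions the rounded values agree with the abstract's "one gene every 13 kb", "one element every 171 kb", 44% and 66%.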

  20. Sequences of two related multiple antibiotic resistance virulence plasmids sharing a unique IS26-related molecular signature isolated from different Escherichia coli pathotypes from different hosts.

    PubMed

    Venturini, Carola; Hassan, Karl A; Roy Chowdhury, Piklu; Paulsen, Ian T; Walker, Mark J; Djordjevic, Steven P

    2013-01-01

    Enterohemorrhagic Escherichia coli (EHEC) and atypical enteropathogenic E. coli (aEPEC) are important zoonotic pathogens that increasingly are becoming resistant to multiple antibiotics. Here we describe two plasmids, pO26-CRL125 (125 kb) from a human O26:H- EHEC, and pO111-CRL115 (115 kb) from a bovine O111 aEPEC, that impart resistance to ampicillin, kanamycin, neomycin, streptomycin, sulfathiazole, trimethoprim and tetracycline and both contain atypical class 1 integrons with an identical IS26-mediated deletion in their 3´-conserved segment. Complete sequence analysis showed that pO26-CRL125 and pO111-CRL115 are essentially identical except for a 9.7 kb fragment, present in the backbone of pO26-CRL125 but absent in pO111-CRL115, and several indels. The 9.7 kb fragment encodes IncI-associated genes involved in plasmid stability during conjugation, a putative transposase gene and three imperfect repeats. Contiguous sequence identical to regions within these pO26-CRL125 imperfect repeats was identified in pO111-CRL115 precisely where the 9.7 kb fragment is missing, suggesting it may be mobile. Sequences shared between the plasmids include a complete IncZ replicon, a unique toxin/antitoxin system, IncI stability and maintenance genes, a novel putative serine protease autotransporter, and an IncI1 transfer system including a unique shufflon. Both plasmids carry a derivative Tn21 transposon with an atypical class 1 integron comprising a dfrA5 gene cassette encoding resistance to trimethoprim, and 24 bp of the 3´-conserved segment followed by Tn6026, which encodes resistance to ampicillin, kanamycin, neomycin, streptomycin and sulfathiazole. The Tn21-derivative transposon is linked to a truncated Tn1721, encoding resistance to tetracycline, via a region containing the IncP-1α oriV. Absence of the 5 bp direct repeats flanking Tn3-family transposons indicates that homologous recombination events played a key role in the formation of this complex antibiotic resistance

  1. Cloning, sequencing, and expression of the gene coding for bile acid 7 alpha-hydroxysteroid dehydrogenase from Eubacterium sp. strain VPI 12708.

    PubMed Central

    Baron, S F; Franklund, C V; Hylemon, P B

    1991-01-01

    Southern blot analysis indicated that the gene encoding the constitutive, NADP-linked bile acid 7 alpha-hydroxysteroid dehydrogenase of Eubacterium sp. strain VPI 12708 was located on a 6.5-kb EcoRI fragment of the chromosomal DNA. This fragment was cloned into bacteriophage lambda gt11, and a 2.9-kb piece of this insert was subcloned into pUC19, yielding the recombinant plasmid pBH51. DNA sequence analysis of the 7 alpha-hydroxysteroid dehydrogenase gene in pBH51 revealed a 798-bp open reading frame, coding for a protein with a calculated molecular weight of 28,500. A putative promoter sequence and ribosome binding site were identified. The 7 alpha-hydroxysteroid dehydrogenase mRNA transcript in Eubacterium sp. strain VPI 12708 was about 0.94 kb in length, suggesting that it is monocistronic. An Escherichia coli DH5 alpha transformant harboring pBH51 had approximately 30-fold greater levels of 7 alpha-hydroxysteroid dehydrogenase mRNA, immunoreactive protein, and specific activity than Eubacterium sp. strain VPI 12708. The 7 alpha-hydroxysteroid dehydrogenase purified from the pBH51 transformant was similar in subunit molecular weight, specific activity, and kinetic properties to that from Eubacterium sp. strain VPI 12708, and it reacted with antiserum raised against the authentic enzyme on Western immunoblots. Alignment of the amino acid sequence of the 7 alpha-hydroxysteroid dehydrogenase with those of 10 other pyridine nucleotide-linked alcohol/polyol dehydrogenases revealed six conserved amino acid residues in the N-terminal regions thought to function in coenzyme binding. PMID:1856160

  2. Discovery of human inversion polymorphisms by comparative analysis of human and chimpanzee DNA sequence assemblies.

    PubMed

    Feuk, Lars; MacDonald, Jeffrey R; Tang, Terence; Carson, Andrew R; Li, Martin; Rao, Girish; Khaja, Razi; Scherer, Stephen W

    2005-10-01

    With a draft genome-sequence assembly for the chimpanzee available, it is now possible to perform genome-wide analyses to identify, at a submicroscopic level, structural rearrangements that have occurred between chimpanzees and humans. The goal of this study was to investigate chromosomal regions that are inverted between the chimpanzee and human genomes. Using the net alignments for the builds of the human and chimpanzee genome assemblies, we identified a total of 1,576 putative regions of inverted orientation, covering more than 154 mega-bases of DNA. The DNA segments are distributed throughout the genome and range from 23 base pairs to 62 mega-bases in length. For the 66 inversions more than 25 kilobases (kb) in length, 75% were flanked on one or both sides by (often unrelated) segmental duplications. Using PCR and fluorescence in situ hybridization we experimentally validated 23 of 27 (85%) semi-randomly chosen regions; the largest novel inversion confirmed was 4.3 mega-bases at human Chromosome 7p14. Gorilla was used as an out-group to assign ancestral status to the variants. All experimentally validated inversion regions were then assayed against a panel of human samples and three of the 23 (13%) regions were found to be polymorphic in the human genome. These polymorphic inversions include 730 kb (at 7p22), 13 kb (at 7q11), and 1 kb (at 16q24) fragments with a 5%, 30%, and 48% minor allele frequency, respectively. Our results suggest that inversions are an important source of variation in primate genome evolution. The finding of at least three novel inversion polymorphisms in humans indicates this type of structural variation may be a more common feature of our genome than previously realized.

  3. Non-contiguous genome sequence of Mycobacterium simiae strain DSM 44165(T.).

    PubMed

    Sassi, Mohamed; Robert, Catherine; Raoult, Didier; Drancourt, Michel

    2013-01-01

    Mycobacterium simiae is a non-tuberculosis mycobacterium causing pulmonary infections in both immunocompetent and immunocompromised patients. We announce the draft genome sequence of M. simiae DSM 44165(T). The 5,782,968-bp long genome with 65.15% GC content (one chromosome, no plasmid) contains 5,727 open reading frames (33% with unknown function and 11 ORFs larger than 5,000 bp), three rRNA operons, 52 tRNA, one 66-bp tmRNA matching with tmRNA tags from Mycobacterium avium, Mycobacterium tuberculosis, Mycobacterium bovis, Mycobacterium microti, Mycobacterium marinum, and Mycobacterium africanum and 389 DNA repetitive sequences. Comparing ORFs and size distribution between M. simiae and five other Mycobacterium species, M. simiae clustered with M. abscessus and M. smegmatis. A 40-kb prophage was predicted in addition to two prophage-like elements, 7 kb and 18 kb in size, but no mycobacteriophage was seen after the observation of 10^6 M. simiae cells. Fifteen putative CRISPRs were found. Three genes were predicted to encode resistance to aminoglycosides, betalactams and macrolide-lincosamide-streptogramin B. A total of 163 CAZYmes were annotated. M. simiae contains ESX-1 to ESX-5 genes encoding for a type-VII secretion system. Availability of the genome sequence may help depict the unique properties of this environmental, opportunistic pathogen.

  4. The complete chloroplast genome sequence of the CAM epiphyte Spanish moss (Tillandsia usneoides, Bromeliaceae) and its comparative analysis

    PubMed Central

    Hyvönen, Jaakko

    2017-01-01

    Spanish moss (Tillandsia usneoides) is an epiphytic bromeliad widely distributed throughout tropical and warm temperate America. This plant is highly adapted to extreme environmental conditions. Striking features of this species include specialized trichomes (scales) covering the surface of its shoots aiding the absorption of water and nutrients directly from the atmosphere and a specific photosynthesis using crassulacean acid metabolism (CAM). Here we report the plastid genome of Spanish moss and present the comparison of genome organization and sequence evolution within Poales. The plastome of Spanish moss has a quadripartite structure consisting of a large single copy (LSC, 87,439 bp), two inverted regions (IRa and IRb, 26,803 bp) and a short single copy (SSC, 18,612 bp) region. The plastid genome had 37.2% GC content and 134 genes with 88 being unique protein-coding genes and 20 of these are duplicated in the IR, similar to other reported bromeliads. Our study shows that early diverging lineages of Poales do not have high substitution rates as compared to grasses, and plastid genomes of bromeliads show structural features considered to be ancestral in graminids. These include the loss of the introns in the clpP and rpoC1 genes and the complete loss or partial degradation of accD and ycf genes in the Graminid clade. Further structural rearrangements appeared in the graminids lacking in Spanish moss, which include a 28-kb inversion between the trnG-UCC–rps14 region and 6-kb in the trnG-UCC–psbD, followed by a third <1-kb inversion in the trnT sequence. PMID:29095905
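    The abstract gives the sizes of the four plastome regions but not the total. As a rough sketch, the total follows from the quadripartite layout (one LSC, one SSC, two IR copies), assuming the 26,803 bp figure is the length of each inverted region rather than the two combined. The function name is illustrative.

```python
# Region sizes quoted in the abstract, in base pairs.
LSC_BP = 87_439   # large single copy
IR_BP = 26_803    # assumed to be the length of EACH of IRa and IRb
SSC_BP = 18_612   # short single copy


def plastome_length(lsc_bp, ir_bp, ssc_bp):
    """Total length of a quadripartite plastome: LSC + IRa + SSC + IRb."""
    return lsc_bp + 2 * ir_bp + ssc_bp


total_bp = plastome_length(LSC_BP, IR_BP, SSC_BP)
ir_fraction = 2 * IR_BP / total_bp

print(f"implied plastome length: {total_bp:,} bp")
print(f"fraction of the plastome in the two IR copies: {ir_fraction:.1%}")
```

    Under that assumption the implied plastome length is 159,657 bp.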

  5. Draft genome sequence of ramie, Boehmeria nivea (L.) Gaudich.

    PubMed

    Luan, Ming-Bao; Jian, Jian-Bo; Chen, Ping; Chen, Jun-Hui; Chen, Jian-Hua; Gao, Qiang; Gao, Gang; Zhou, Ju-Hong; Chen, Kun-Mei; Guang, Xuan-Min; Chen, Ji-Kang; Zhang, Qian-Qian; Wang, Xiao-Fei; Fang, Long; Sun, Zhi-Min; Bai, Ming-Zhou; Fang, Xiao-Dong; Zhao, Shan-Cen; Xiong, He-Ping; Yu, Chun-Ming; Zhu, Ai-Guo

    2018-05-01

    Ramie, Boehmeria nivea (L.) Gaudich, family Urticaceae, is a plant native to eastern Asia, and one of the world's oldest fibre crops. It is also used as animal feed and for the phytoremediation of heavy metal-contaminated farmlands. Thus, the genome sequence of ramie was determined to explore the molecular basis of its fibre quality, protein content and phytoremediation. To further understand the ramie genome, different paired-end and mate-pair libraries were combined to generate 134.31 Gb of raw DNA sequences using the Illumina whole-genome shotgun sequencing approach. The highly heterozygous B. nivea genome was assembled using the Platanus Genome Assembler, which is an effective tool for the assembly of highly heterozygous genome sequences. The final length of the draft genome of this species was approximately 341.9 Mb (contig N50 = 22.62 kb, scaffold N50 = 1,126.36 kb). Based on ramie genome annotations, 30,237 protein-coding genes were predicted, and the repetitive element content was 46.3%. The completeness of the final assembly was evaluated by benchmarking universal single-copy orthologous genes (BUSCO); 90.5% of the 1,440 expected embryophytic genes were identified as complete, and 4.9% were identified as fragmented. Phylogenetic analysis based on single-copy gene families and one-to-one orthologous genes placed ramie with mulberry and cannabis, within the clade of urticalean rosids. Genome information of ramie will be a valuable resource for the conservation of endangered Boehmeria species and for future studies on the biogeography and characteristic evolution of members of Urticaceae. © 2018 John Wiley & Sons Ltd.
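    The contig and scaffold N50 values reported above are a standard assembly statistic: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly. The sketch below shows the usual definition on a made-up list of lengths; it is not the authors' pipeline, and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Lengths are sorted from longest to shortest and summed until the
    running total reaches at least half of the overall total; the length
    that crosses this threshold is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one sequence length")


# Hypothetical contig lengths in kb, for illustration only.
toy_contigs_kb = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs_kb))
```

    For the toy list the total is 300 kb, and the two longest contigs (80 + 70) already reach the 150 kb halfway mark, so the N50 is 70 kb.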

  6. Autonomous replication and addition of telomerelike sequences to DNA microinjected into Paramecium tetraurelia macronuclei.

    PubMed Central

    Gilley, D; Preer, J R; Aufderheide, K J; Polisky, B

    1988-01-01

    Paramecium tetraurelia can be transformed by microinjection of cloned serotype A gene sequences into the macronucleus. Transformants are detected by their ability to express serotype A surface antigen from the injected templates. After injection, the DNA is converted from a supercoiled form to a linear form by cleavage at nonrandom sites. The linear form appears to replicate autonomously as a unit-length molecule and is present in transformants at high copy number. The injected DNA is further processed by the addition of paramecium-type telomeric sequences to the termini of the linear DNA. To examine the fate of injected linear DNA molecules, plasmid pSA14SB DNA containing the A gene was cleaved into two linear pieces, a 14-kilobase (kb) piece containing the A gene and flanking sequences and a 2.2-kb piece consisting of the procaryotic vector. In transformants expressing the A gene, we observed that two linear DNA species were present which correspond to the two species injected. Both species had Paramecium telomerelike sequences added to their termini. For the 2.2-kb DNA, we show that the site of addition of the telomerelike sequences is directly at one terminus and within one nucleotide of the other terminus. These results indicate that injected procaryotic DNA is capable of autonomous replication in Paramecium macronuclei and that telomeric addition in the macronucleus does not require specific recognition sequences. PMID:3211128

  7. Analysis of BAC end sequences in oak, a keystone forest tree species, providing insight into the composition of its genome

    PubMed Central

    2011-01-01

    Background: One of the key goals of oak genomics research is to identify genes of adaptive significance. This information may help to improve the conservation of adaptive genetic variation and the management of forests to increase their health and productivity. Deep-coverage large-insert genomic libraries are a crucial tool for attaining this objective. We report herein the construction of a BAC library for Quercus robur, its characterization and an analysis of BAC end sequences. Results: The EcoRI library generated consisted of 92,160 clones, 7% of which had no insert. Levels of chloroplast and mitochondrial contamination were below 3% and 1%, respectively. Mean clone insert size was estimated at 135 kb. The library represents 12 haploid genome equivalents, and the likelihood of finding a particular oak sequence of interest is greater than 99%. Genome coverage was confirmed by PCR screening of the library with 60 unique genetic loci sampled from the genetic linkage map. In total, about 20,000 high-quality BAC end sequences (BESs) were generated by sequencing 15,000 clones. Roughly 5.88% of the combined BAC end sequence length corresponded to known retroelements while ab initio repeat detection methods identified 41 additional repeats. Collectively, characterized and novel repeats account for roughly 8.94% of the genome. Further analysis of the BESs revealed 1,823 putative genes suggesting at least 29,340 genes in the oak genome. BESs were aligned with the genome sequences of Arabidopsis thaliana, Vitis vinifera and Populus trichocarpa. One putative collinear microsyntenic region encoding an alcohol acyl transferase protein was observed between oak and chromosome 2 of V. vinifera. Conclusions: This BAC library provides a new resource for genomic studies, including SSR marker development, physical mapping, comparative genomics and genome sequencing. BES analysis provided insight into the structure of the oak genome. These sequences will be used in the assembly of a
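    The link between "12 haploid genome equivalents" and a ">99%" chance of recovering a given sequence can be illustrated with the Clarke and Carbon library formula, P = 1 - (1 - f)^N, where f is the fraction of the genome carried by one clone and N is the number of clones. The abstract does not say which formula the authors used, so this is only a plausible sketch. It further assumes that the 12x coverage refers to the insert-bearing clones (92,160 minus the 7% without insert) and ignores organellar contamination.

```python
import math

TOTAL_CLONES = 92_160
EMPTY_FRACTION = 0.07        # clones with no insert, from the abstract
GENOME_EQUIVALENTS = 12.0    # library coverage, from the abstract


def clarke_carbon_probability(n_clones, fraction_per_clone):
    """P(a given sequence is in the library) = 1 - (1 - f)^N."""
    return 1.0 - (1.0 - fraction_per_clone) ** n_clones


def coverage_approximation(coverage):
    """Large-library approximation of the same quantity: 1 - exp(-coverage)."""
    return 1.0 - math.exp(-coverage)


# Assumption: only insert-bearing clones contribute to the 12x coverage.
insert_clones = round(TOTAL_CLONES * (1.0 - EMPTY_FRACTION))
fraction_per_clone = GENOME_EQUIVALENTS / insert_clones

p_exact = clarke_carbon_probability(insert_clones, fraction_per_clone)
p_approx = coverage_approximation(GENOME_EQUIVALENTS)

print(f"insert-bearing clones: {insert_clones}")
print(f"Clarke-Carbon probability: {p_exact:.6f}")
print(f"1 - exp(-coverage):        {p_approx:.6f}")
```

    Both expressions give a probability well above the 99% quoted in the abstract, which is what one would expect at 12-fold coverage.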

  8. Regulatory sequence analysis tools.

    PubMed

    van Helden, Jacques

    2003-07-01

    The web resource Regulatory Sequence Analysis Tools (RSAT) (http://rsat.ulb.ac.be/rsat) offers a collection of software tools dedicated to the prediction of regulatory sites in non-coding DNA sequences. These tools include sequence retrieval, pattern discovery, pattern matching, genome-scale pattern matching, feature-map drawing, random sequence generation and other utilities. Alternative formats are supported for the representation of regulatory motifs (strings or position-specific scoring matrices) and several algorithms are proposed for pattern discovery. RSAT currently holds >100 fully sequenced genomes and these data are regularly updated from GenBank.

  9. The mitochondrial genome and a 60-kb nuclear DNA segment from Naegleria fowleri, the causative agent of primary amoebic meningoencephalitis.

    PubMed

    Herman, Emily K; Greninger, Alexander L; Visvesvara, Govinda S; Marciano-Cabral, Francine; Dacks, Joel B; Chiu, Charles Y

    2013-01-01

    Naegleria fowleri is a unicellular eukaryote causing primary amoebic meningoencephalitis, a neuropathic disease killing 99% of those infected, usually within 7-14 days. Naegleria fowleri is found globally in regions including the US and Australia. The genome of the related nonpathogenic species Naegleria gruberi has been sequenced, but the genetic basis for N. fowleri pathogenicity is unclear. To generate such insight, we sequenced and assembled the mitochondrial genome and a 60-kb segment of nuclear genome from N. fowleri. The mitochondrial genome is highly similar to its counterpart in N. gruberi in gene complement and organization, while distinct lack of synteny is observed for the nuclear segments. Even in this short (60-kb) segment, we identified examples of potential factors for pathogenesis, including ten novel N. fowleri-specific genes. We also identified a homolog of cathepsin B; proteases proposed to be involved in the pathogenesis of diverse eukaryotic pathogens, including N. fowleri. Finally, we demonstrate a likely case of horizontal gene transfer between N. fowleri and two unrelated amoebae, one of which causes granulomatous amoebic encephalitis. This initial look into the N. fowleri nuclear genome has revealed several examples of potential pathogenesis factors, improving our understanding of a neglected pathogen of increasing global importance. © 2013 The Author(s) Journal of Eukaryotic Microbiology © 2013 International Society of Protistologists.

  10. Identification of two small RNAs within the first 1.5-kb of the herpes simplex virus type 1-encoded latency-associated transcript.

    PubMed

    Peng, Weiping; Vitvitskaia, Olga; Carpenter, Dale; Wechsler, Steven L; Jones, Clinton

    2008-01-01

    The herpes simplex virus type 1 (HSV-1) latency-associated transcript (LAT) is abundantly expressed in latently infected neurons. In the rabbit or mouse ocular models of infection, expression of the first 1.5 kb of LAT coding sequences is sufficient for and necessary for wild-type levels of spontaneous reactivation from latency. The antiapoptosis functions of LAT, which maps to the same 1.5 kb of LAT, are important for the latency-reactivation cycle because replacement of LAT with other antiapoptosis genes (the baculovirus IAP gene or the bovine herpesvirus type 1 latency-related gene) restores wild-type levels of reactivation to a LAT null mutant. A recent study identified a micro-RNA within LAT that can inhibit apoptosis (Gupta et al, Nature 442: 82-85). In this study, the authors analyzed the first 1.5 kb of LAT for additional small RNAs that may have regulatory functions. Two LAT-specific small RNAs were detected in productively infected human neuroblastoma cells within the first 1.5 kb of LAT, in a region that is important for inhibiting apoptosis. Although these small RNAs possess extensive secondary structure and a stem-loop structure, bands migrating near 23 bases were not detected suggesting these small RNAs are not true micro-RNAs. Both of the small LAT-specific RNAs have the potential to base pair with the ICP4 mRNA. These two small LAT RNAs may play a role in the latency-reactivation cycle by reducing apoptosis and/or by reducing ICP4 RNA expression.

  11. The human MCP-2 gene (SCYA8): Cloning, sequence analysis, tissue expression, and assignment to the CC chemokine gene contig on chromosome 17q11.2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Van Coillie, E.; Fiten, P.; Van Damme, J.

    1997-03-01

    Monocyte chemotactic proteins (MCPs) form a subfamily of chemokines that recruit leukocytes to sites of inflammation and that may contribute to tumor-associated leukocyte infiltration and to the antiviral state against HIV infection. With the use of degenerate primers that were based on CC chemokine consensus sequences, the known MIP-1α/LD78α, MCP-1, and MCP-3 genes and the previously unidentified eotaxin and MCP-2 genes were isolated from a YAC contig from human chromosome 17q11.2. The amplified genomic MCP-2 fragment was used to isolate an MCP-2 cosmid from which the gene sequence was determined. The MCP-2 gene shares with the MCP-1 and MCP-3 genes a conserved intron-exon structure and a coding nucleotide sequence homology of 77%. By Northern blot analysis the 1.0-kb MCP-2 mRNA was predominantly detectable in the small intestine, peripheral blood, heart, placenta, lung, skeletal muscle, ovary, colon, spinal cord, pancreas, and thymus. Transcripts of 1.5 and 2.4 kb were found in the testis, the small intestine, and the colon. The isolation of the MCP-2 gene from the chemokine contig localized it on YAC clones of chromosome 17q11.2, which also contain the eotaxin, MCP-1, MCP-3, and NCC-1/MCP-4 genes. The combination of using degenerate primer PCR and YACs illustrates that novel genes can efficiently be isolated from gene cluster contigs with less redundancy and effort than the isolation of novel ESTs. 42 refs., 5 figs., 2 tabs.

  12. Sequence analysis of DBL2β domain of vargene of Indonesian Plasmodium falciparum

    NASA Astrophysics Data System (ADS)

    Sulistyaningsih, E.; Romadhon, B. D.; Palupi, I.; Hidayah, F.; Dewi, R.; Prasetyo, A.

    2018-03-01

    Malaria is a major health problem in tropical countries including Indonesia. The most deadly agent is Plasmodium falciparum. In P. falciparum infection, PfEMP1 is thought to play an important role in the pathogenesis of malaria. PfEMP1 is encoded by the var gene family; it is a polymorphic protein whose extracellular portion consists of three distinct binding domains: Duffy binding-like (DBL), cysteine-rich interdomain regions (CIDR) and C2. PfEMP1 varies in domain composition and binding specificity. The study explored the characteristics of Indonesian DBL2β-var genes and investigated their role in the malaria outcome. Twenty blood samples from clinically mild to severe malaria patients in Jember, East Java were collected for DNA extraction. Diagnosis was confirmed by Giemsa-stained thick blood smear. PCR was conducted using specific primers targeting the full length of DBL2β and yielded a single band of approximately 1.7 kb in one sample. This band was observed only in a severe malaria sample. Sequence analysis directly from the PCR product showed 74-99% similarity with previous sequences in GenBank. In conclusion, the DBL2β domain of the var gene of Indonesian isolates was 1603 nucleotides in length and there was a possible association of the existence of the DBL2β domain with the severity of malaria outcome.

  13. Regulation of iron assimilation: nucleotide sequence analysis of an iron-regulated promoter from a fluorescent pseudomonad.

    PubMed

    O'Sullivan, D J; O'Gara, F

    1991-08-01

    An iron-regulated promoter was cloned on a 2.1 kb BglII fragment from Pseudomonas sp. strain M114 and fused to the lacZ reporter gene. Iron-regulated lacZ expression from the resulting construct (pSP1) in strain M114 was mediated via the Fur-like repressor which also regulates siderophore production in this strain. A 390 bp StuI-PstI internal fragment contained the necessary information for iron-regulated promoter expression. This fragment was sequenced and the initiation point for transcription was determined by primer extension analysis. The region directly upstream of the transcription start point contained no significant homology to known promoter consensus sequences. However, the -16 to -25 bp region contained homology to four other iron-regulated pseudomonad promoters. Deletion of bases downstream from the transcriptional start did not affect the iron-regulated expression of the promoter. The -37 and -43 bp regions exhibited some homology to the 19 bp Escherichia coli Fur-binding consensus sequence. When expressed in E. coli (via a cloned trans-acting factor from strain M114) lacZ expression from pSP1 was found to be regulated by iron. A region of greater than 77 bases but less than 131 upstream from the transcriptional start was found to be necessary for promoter activity, further suggesting that a transcriptional activator may be required for expression.

  14. Enhancing genome assemblies by integrating non-sequence based data

    PubMed Central

    2011-01-01

    Introduction: Many genome projects were underway before the advent of high-throughput sequencing and have thus been supported by a wealth of genome information from other technologies. Such information frequently takes the form of linkage and physical maps, both of which can provide a substantial amount of data useful in de novo sequencing projects. Furthermore, the recent abundance of genome resources enables the use of conserved synteny maps identified in related species to further enhance genome assemblies. Methods: The tammar wallaby (Macropus eugenii) is a model marsupial mammal with a low coverage genome. However, we have access to extensive comparative maps containing over 14,000 markers constructed through the physical mapping of conserved loci, chromosome painting and comprehensive linkage maps. Using a custom Bioperl pipeline, information from the maps was aligned to assembled tammar wallaby contigs using BLAT. This data was used to construct pseudo paired-end libraries with intervals ranging from 5-10 MB. We then used Bambus (a program designed to scaffold eukaryotic genomes by ordering and orienting contigs through the use of paired-end data) to scaffold our libraries. To determine how map data compares to sequence based approaches to enhance assemblies, we repeated the experiment using a 0.5× coverage of unique reads from 4 KB and 8 KB Illumina paired-end libraries. Finally, we combined both the sequence and non-sequence-based data to determine how a combined approach could further enhance the quality of the low coverage de novo reconstruction of the tammar wallaby genome. Results: Using the map data alone, we were able to order 2.2% of the initial contigs into scaffolds, and increase the N50 scaffold size to 39 KB (36 KB in the original assembly). Using only the 0.5× paired-end sequence based data, 53% of the initial contigs were assigned to scaffolds. Combining both data sets resulted in a further 2% increase in the number of initial contigs integrated

  15. Enhancing genome assemblies by integrating non-sequence based data.

    PubMed

    Heider, Thomas N; Lindsay, James; Wang, Chenwei; O'Neill, Rachel J; Pask, Andrew J

    2011-05-28

    Many genome projects were underway before the advent of high-throughput sequencing and have thus been supported by a wealth of genome information from other technologies. Such information frequently takes the form of linkage and physical maps, both of which can provide a substantial amount of data useful in de novo sequencing projects. Furthermore, the recent abundance of genome resources enables the use of conserved synteny maps identified in related species to further enhance genome assemblies. The tammar wallaby (Macropus eugenii) is a model marsupial mammal with a low coverage genome. However, we have access to extensive comparative maps containing over 14,000 markers constructed through the physical mapping of conserved loci, chromosome painting and comprehensive linkage maps. Using a custom Bioperl pipeline, information from the maps was aligned to assembled tammar wallaby contigs using BLAT. This data was used to construct pseudo paired-end libraries with intervals ranging from 5-10 MB. We then used Bambus (a program designed to scaffold eukaryotic genomes by ordering and orienting contigs through the use of paired-end data) to scaffold our libraries. To determine how map data compares to sequence based approaches to enhance assemblies, we repeated the experiment using a 0.5× coverage of unique reads from 4 KB and 8 KB Illumina paired-end libraries. Finally, we combined both the sequence and non-sequence-based data to determine how a combined approach could further enhance the quality of the low coverage de novo reconstruction of the tammar wallaby genome. Using the map data alone, we were able to order 2.2% of the initial contigs into scaffolds, and increase the N50 scaffold size to 39 KB (36 KB in the original assembly). Using only the 0.5× paired-end sequence based data, 53% of the initial contigs were assigned to scaffolds. Combining both data sets resulted in a further 2% increase in the number of initial contigs integrated into a scaffold (55% total

  16. Can Inferred Provenance and Its Visualisation Be Used to Detect Erroneous Annotation? A Case Study Using UniProtKB

    PubMed Central

    Bell, Michael J.; Collison, Matthew; Lord, Phillip

    2013-01-01

    A constant influx of new data poses a challenge in keeping the annotation in biological databases current. Most biological databases contain significant quantities of textual annotation, which often contains the richest source of knowledge. Many databases reuse existing knowledge; during the curation process annotations are often propagated between entries. However, this is often not made explicit. Therefore, it can be hard, potentially impossible, for a reader to identify where an annotation originated from. Within this work we attempt to identify annotation provenance and track its subsequent propagation. Specifically, we exploit annotation reuse within the UniProt Knowledgebase (UniProtKB), at the level of individual sentences. We describe a visualisation approach for the provenance and propagation of sentences in UniProtKB which enables a large-scale statistical analysis. Initially levels of sentence reuse within UniProtKB were analysed, showing that reuse is heavily prevalent, which enables the tracking of provenance and propagation. By analysing sentences throughout UniProtKB, a number of interesting propagation patterns were identified, covering over sentences. Over sentences remain in the database after they have been removed from the entries where they originally occurred. Analysing a subset of these sentences suggests that approximately are erroneous, whilst appear to be inconsistent. These results suggest that being able to visualise sentence propagation and provenance can aid in the determination of the accuracy and quality of textual annotation. Source code and supplementary data are available from the authors' website at http://homepages.cs.ncl.ac.uk/m.j.bell1/sentence_analysis/. PMID:24143170

  17. Cloning, sequencing, and analysis of the griseusin polyketide synthase gene cluster from Streptomyces griseus.

    PubMed Central

    Yu, T W; Bibb, M J; Revill, W P; Hopwood, D A

    1994-01-01

    A fragment of DNA was cloned from the Streptomyces griseus K-63 genome by using genes (act) for the actinorhodin polyketide synthase (PKS) of Streptomyces coelicolor as a probe. Sequencing of a 5.4-kb segment of the cloned DNA revealed a set of five gris open reading frames (ORFs), corresponding to the act PKS genes, in the following order: ORF1 for a ketosynthase, ORF2 for a chain length-determining factor, ORF3 for an acyl carrier protein, ORF5 for a ketoreductase, and ORF4 for a cyclase-dehydrase. Replacement of the gris genes with a marker gene in the S. griseus genome by using a single-stranded suicide vector propagated in Escherichia coli resulted in loss of the ability to produce griseusins A and B, showing that the five gris genes do indeed encode the type II griseusin PKS. These genes, encoding a PKS that is programmed differently from those for other aromatic PKSs so far available, will provide further valuable material for analysis of the programming mechanism by the construction and analysis of strains carrying hybrid PKS. PMID:8169211

  18. Complete genome sequence of the phenanthrene-degrading soil bacterium Delftia acidovorans Cs1-4

    DOE PAGES

    Shetty, Ameesha R.; de Gannes, Vidya; Obi, Chioma C.; ...

    2015-08-15

    Polycyclic aromatic hydrocarbons (PAH) are ubiquitous environmental pollutants and microbial biodegradation is an important means of remediation of PAH-contaminated soil. Delftia acidovorans Cs1-4 (formerly Delftia sp. Cs1-4) was isolated by using phenanthrene as the sole carbon source from PAH-contaminated soil in Wisconsin. Its full genome sequence was determined to gain insights into the mechanisms underlying biodegradation of PAH. Three genomic libraries were constructed and sequenced: an Illumina GAii shotgun library (916,416,493 reads), a 454 Titanium standard library (770,171 reads) and one paired-end 454 library (average insert size of 8 kb, 508,092 reads). The initial assembly contained 40 contigs in two scaffolds. The 454 Titanium standard data and the 454 paired end data were assembled together and the consensus sequences were computationally shredded into 2 kb overlapping shreds. Illumina sequencing data was assembled, and the consensus sequence was computationally shredded into 1.5 kb overlapping shreds. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks. A total of 182 additional reactions were needed to close gaps and to raise the quality of the finished sequence. The final assembly is based on 253.3 Mb of 454 draft data (averaging 38.4 X coverage) and 590.2 Mb of Illumina draft data (averaging 89.4 X coverage). The genome of strain Cs1-4 consists of a single circular chromosome of 6,685,842 bp (66.7 %G+C) containing 6,028 predicted genes; 5,931 of these genes were protein-encoding and 4,425 gene products were assigned to a putative function. Genes encoding phenanthrene degradation were localized to a 232 kb genomic island (termed the phn island), which contained near its 3’ end a bacteriophage P4-like integrase, an enzyme often associated with chromosomal integration of mobile genetic elements. Other biodegradation pathways reconstructed from the genome sequence included: benzoate (by the acetyl
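
    The abstract above mentions that consensus sequences were "computationally shredded" into 2 kb and 1.5 kb overlapping shreds. One plausible reading is a sliding window that cuts a long sequence into fixed-length, overlapping pseudo-reads, sketched below. The abstract does not state the amount of overlap or the tool used, so the `overlap` parameter, its default, and the function name are assumptions made purely for illustration.

```python
def shred(sequence, shred_len=2000, overlap=500):
    """Cut a sequence into overlapping fixed-length pieces.

    Returns a list of (start_offset, piece) tuples. The overlap value is a
    hypothetical default; the source abstract does not specify one.
    The final piece may be shorter than shred_len.
    """
    if shred_len <= 0 or not 0 <= overlap < shred_len:
        raise ValueError("require shred_len > 0 and 0 <= overlap < shred_len")
    if not sequence:
        return []
    step = shred_len - overlap
    shreds = []
    start = 0
    while True:
        shreds.append((start, sequence[start:start + shred_len]))
        if start + shred_len >= len(sequence):
            break  # this piece already reaches the end of the sequence
        start += step
    return shreds


# Toy example with tiny numbers so the windows are easy to follow.
for offset, piece in shred("ACGTACGTAC", shred_len=4, overlap=2):
    print(offset, piece)
```

    With a window of 4 and an overlap of 2, the toy sequence yields pieces starting at offsets 0, 2, 4 and 6, so every base away from the sequence ends is covered by two shreds.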

  19. Complete genome sequence of the phenanthrene-degrading soil bacterium Delftia acidovorans Cs1-4

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shetty, Ameesha R.; de Gannes, Vidya; Obi, Chioma C.

    Polycyclic aromatic hydrocarbons (PAH) are ubiquitous environmental pollutants and microbial biodegradation is an important means of remediation of PAH-contaminated soil. Delftia acidovorans Cs1-4 (formerly Delftia sp. Cs1-4) was isolated by using phenanthrene as the sole carbon source from PAH-contaminated soil in Wisconsin. Its full genome sequence was determined to gain insights into the mechanisms underlying biodegradation of PAH. Three genomic libraries were constructed and sequenced: an Illumina GAii shotgun library (916,416,493 reads), a 454 Titanium standard library (770,171 reads) and one paired-end 454 library (average insert size of 8 kb, 508,092 reads). The initial assembly contained 40 contigs in two scaffolds. The 454 Titanium standard data and the 454 paired end data were assembled together and the consensus sequences were computationally shredded into 2 kb overlapping shreds. Illumina sequencing data was assembled, and the consensus sequence was computationally shredded into 1.5 kb overlapping shreds. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks. A total of 182 additional reactions were needed to close gaps and to raise the quality of the finished sequence. The final assembly is based on 253.3 Mb of 454 draft data (averaging 38.4 X coverage) and 590.2 Mb of Illumina draft data (averaging 89.4 X coverage). The genome of strain Cs1-4 consists of a single circular chromosome of 6,685,842 bp (66.7 %G+C) containing 6,028 predicted genes; 5,931 of these genes were protein-encoding and 4,425 gene products were assigned to a putative function. Genes encoding phenanthrene degradation were localized to a 232 kb genomic island (termed the phn island), which contained near its 3’ end a bacteriophage P4-like integrase, an enzyme often associated with chromosomal integration of mobile genetic elements. Other biodegradation pathways reconstructed from the genome sequence included: benzoate (by the acetyl

  20. Sequence-based analysis of pQBR103; a representative of a unique, transfer-proficient mega plasmid resident in the microbial community of sugar beet

    PubMed Central

    Tett, Adrian; Spiers, Andrew J; Crossman, Lisa C; Ager, Duane; Ciric, Lena; Dow, J Maxwell; Fry, John C; Harris, David; Lilley, Andrew; Oliver, Anna; Parkhill, Julian; Quail, Michael A; Rainey, Paul B; Saunders, Nigel J; Seeger, Kathy; Snyder, Lori AS; Squares, Rob; Thomas, Christopher M; Turner, Sarah L; Zhang, Xue-Xian; Field, Dawn; Bailey, Mark J

    2009-01-01

    The plasmid pQBR103 was found within Pseudomonas populations colonizing the leaf and root surfaces of sugar beet plants growing at Wytham, Oxfordshire, UK. At 425 kb it is the largest self-transmissible plasmid yet sequenced from the phytosphere. It is known to enhance the competitive fitness of its host, and parts of the plasmid are known to be actively transcribed in the plant environment. Analysis of the complete sequence of this plasmid predicts a coding sequence (CDS)-rich genome containing 478 CDSs and an exceptional degree of genetic novelty; 80% of predicted coding sequences cannot be ascribed a function and 60% are orphans. Of those to which function could be assigned, 40% bore greatest similarity to sequences from Pseudomonas spp., and the majority of the remainder showed similarity to other γ-proteobacterial genera and plasmids. pQBR103 has identifiable regions presumed responsible for replication and partitioning, but despite being tra+ lacks the full complement of any previously described conjugal transfer functions. The DNA sequence provided few insights into the functional significance of plant-induced transcriptional regions, but suggests that 14% of CDSs may be expressed (11 CDSs with functional annotation and 54 without), further highlighting the ecological importance of these novel CDSs. Comparative analysis indicates that pQBR103 shares significant regions of sequence with other plasmids isolated from sugar beet plants grown at the same geographic location. These plasmid sequences indicate there is more novelty in the mobile DNA pool accessible to phytosphere pseudomonas than is currently appreciated or understood. PMID:18043644

  1. Emergence of Sequence Type 779 Methicillin-Resistant Staphylococcus aureus Harboring a Novel Pseudo Staphylococcal Cassette Chromosome mec (SCCmec)-SCC-SCCCRISPR Composite Element in Irish Hospitals

    PubMed Central

    Kinnevey, Peter M.; Shore, Anna C.; Brennan, Grainne I.; Sullivan, Derek J.; Ehricht, Ralf; Monecke, Stefan; Slickers, Peter

    2013-01-01

    Methicillin-resistant Staphylococcus aureus (MRSA) has been a major cause of nosocomial infection in Irish hospitals for 4 decades, and replacement of predominant MRSA clones has occurred several times. An MRSA isolate recovered in 2006 as part of a larger study of sporadic MRSA exhibited a rare spa (t878) and multilocus sequence (ST779) type and was nontypeable by PCR- and DNA microarray-based staphylococcal cassette chromosome mec (SCCmec) element typing. Whole-genome sequencing revealed the presence of a novel 51-kb composite island (CI) element with three distinct domains, each flanked by direct repeat and inverted repeat sequences, including (i) a pseudo SCCmec element (16.3 kb) carrying mecA with a novel mec class region, a fusidic acid resistance gene (fusC), and two copper resistance genes (copB and copC) but lacking ccr genes; (ii) an SCC element (17.5 kb) carrying a novel ccrAB4 allele; and (iii) an SCC element (17.4 kb) carrying a novel ccrC allele and a clustered regularly interspaced short palindromic repeat (CRISPR) region. The novel CI was subsequently identified by PCR in an additional 13 t878/ST779 MRSA isolates, six from bloodstream infections, recovered between 2006 and 2011 in 11 hospitals. Analysis of open reading frames (ORFs) carried by the CI showed amino acid sequence similarity of 44 to 100% to ORFs from S. aureus and coagulase-negative staphylococci (CoNS). These findings provide further evidence of genetic transfer between S. aureus and CoNS and show how this contributes to the emergence of novel SCCmec elements and MRSA strains. Ongoing surveillance of this MRSA strain is warranted and will require updating of currently used SCCmec typing methods. PMID:23147725

  2. OC-2-KB: A software pipeline to build an evidence-based obesity and cancer knowledge base.

    PubMed

    Lossio-Ventura, Juan Antonio; Hogan, William; Modave, François; Guo, Yi; He, Zhe; Hicks, Amanda; Bian, Jiang

    2017-11-01

    Obesity has been linked to several types of cancer. Access to adequate health information activates people's participation in managing their own health, which ultimately improves their health outcomes. Nevertheless, the existing online information about the relationship between obesity and cancer is heterogeneous and poorly organized. A formal knowledge representation can help better organize and deliver quality health information. Currently, there are several efforts in the biomedical domain to convert unstructured data to structured data and store them in Semantic Web knowledge bases (KB). In this demo paper, we present OC-2-KB (Obesity and Cancer to Knowledge Base), a system that is tailored to guide the automatic KB construction for managing obesity and cancer knowledge from free-text scientific literature (i.e., PubMed abstracts) in a systematic way. OC-2-KB has two important modules, which perform the acquisition of entities and the extraction and then classification of relationships among these entities. We tested the OC-2-KB system on a data set with 23 manually annotated obesity and cancer PubMed abstracts and created a preliminary KB with 765 triples. We conducted a preliminary evaluation on this sample of triples and reported our evaluation results.

  3. Sequence analysis of the PIP5K locus in Eimeria maxima provides further evidence for eimerian genome plasticity and segmental organization.

    PubMed

    Song, B K; Pan, M Z; Lau, Y L; Wan, K L

    2014-07-29

    Commercial flocks infected by Eimeria species parasites, including Eimeria maxima, have an increased risk of developing clinical or subclinical coccidiosis, an intestinal enteritis associated with increased mortality rates in poultry. Currently, infection control is largely based on chemotherapy or live vaccines; however, drug resistance is common and vaccines are relatively expensive. The development of new cost-effective intervention measures will benefit from unraveling the complex genetic mechanisms that underlie host-parasite interactions, including the identification and characterization of genes encoding proteins such as phosphatidylinositol 4-phosphate 5-kinase (PIP5K). We previously identified a PIP5K coding sequence within the E. maxima genome. In this study, we analyzed two bacterial artificial chromosome clones presenting a ~145-kb E. maxima (Weybridge strain) genomic region spanning the PIP5K gene locus. Sequence analysis revealed that ~95% of the simple sequence repeats detected were located within regions comparable to the previously described feature-rich segments of the Eimeria tenella genome. Comparative sequence analysis with the orthologous E. maxima (Houghton strain) region revealed a moderate level of conserved synteny. Unique segmental organizations and telomere-like repeats were also observed in both genomes. A number of incomplete transposable elements were detected and further scrutiny of these elements in both orthologous segments revealed interesting nesting events, which may play a role in facilitating genome plasticity in E. maxima. The current analysis provides more detailed information about the genome organization of E. maxima and may help to reveal genotypic differences that are important for expression of traits related to pathogenicity and virulence.

  4. Identification and characterization of large DNA deletions affecting oil quality traits in soybean seeds through transcriptome sequencing analysis.

    PubMed

    Goettel, Wolfgang; Ramirez, Martha; Upchurch, Robert G; An, Yong-Qiang Charles

    2016-08-01

    Identification and characterization of a 254-kb genomic deletion on a duplicated chromosome segment that resulted in a low level of palmitic acid in soybean seeds using transcriptome sequencing. A large number of soybean genotypes varying in seed oil composition and content have been identified. Understanding the molecular mechanisms underlying these variations is important for breeders to effectively utilize them as a genetic resource. Through design and application of a bioinformatics approach, we identified nine co-regulated gene clusters by comparing seed transcriptomes of nine soybean genotypes varying in oil composition and content. We demonstrated that four gene clusters in the genotypes M23, Jack and N0304-303-3 coincided with large-scale genome rearrangements. The co-regulated gene clusters in M23 and Jack mapped to a previously described 164-kb deletion and a copy number amplification of the Rhg1 locus, respectively. The coordinately down-regulated gene clusters in N0304-303-3 were caused by a 254-kb deletion containing 19 genes including a fatty acyl-ACP thioesterase B gene (FATB1a). This deletion was associated with reduced palmitic acid content in seeds and was the molecular cause of a previously reported nonfunctional FATB1a allele, fap nc. The M23 and N0304-303-3 deletions were located in duplicated genome segments retained from the Glycine-specific whole genome duplication that occurred 13 million years ago. The homoeologous genes in these duplicated regions shared a strong similarity in both their encoded protein sequences and transcript accumulation levels, suggesting that they may have conserved and important functions in seeds. The functional conservation of homoeologous genes may result in genetic redundancy and gene dosage effects for their associated seed traits, explaining why the large deletion did not cause lethal effects or completely eliminate palmitic acid in N0304-303-3.

  5. Dominant Sequences of Human Major Histocompatibility Complex Conserved Extended Haplotypes from HLA-DQA2 to DAXX

    PubMed Central

    Larsen, Charles E.; Alford, Dennis R.; Trautwein, Michael R.; Jalloh, Yanoh K.; Tarnacki, Jennifer L.; Kunnenkeri, Sushruta K.; Fici, Dolores A.; Yunis, Edmond J.; Awdeh, Zuheir L.; Alper, Chester A.

    2014-01-01

    We resequenced and phased 27 kb of DNA within 580 kb of the MHC class II region in 158 population chromosomes, most of which were conserved extended haplotypes (CEHs) of European descent or contained their centromeric fragments. We determined the single nucleotide polymorphism and deletion-insertion polymorphism alleles of the dominant sequences from HLA-DQA2 to DAXX for these CEHs. Nine of 13 CEHs remained sufficiently intact to possess a dominant sequence extending at least to DAXX, 230 kb centromeric to HLA-DPB1. We identified the regions centromeric to HLA-DQB1 within which single instances of eight “common” European MHC haplotypes previously sequenced by the MHC Haplotype Project (MHP) were representative of those dominant CEH sequences. Only two MHP haplotypes had a dominant CEH sequence throughout the centromeric and extended class II region and one MHP haplotype did not represent a known European CEH anywhere in the region. We identified the centromeric recombination transition points of other MHP sequences from CEH representation to non-representation. Several CEH pairs or groups shared sequence identity in small blocks but had significantly different (although still conserved for each separate CEH) sequences in surrounding regions. These patterns partly explain strong calculated linkage disequilibrium over only short (tens to hundreds of kilobases) distances in the context of a finite number of observed megabase-length CEHs comprising half a population's haplotypes. Our results provide a clearer picture of European CEH class II allelic structure and population haplotype architecture, improved regional CEH markers, and raise questions concerning regional recombination hotspots. PMID:25299700

  6. Analysis of conserved noncoding DNA in Drosophila reveals similar constraints in intergenic and intronic sequences.

    PubMed

    Bergman, C M; Kreitman, M

    2001-08-01

    Comparative genomic approaches to gene and cis-regulatory prediction are based on the principle that differential DNA sequence conservation reflects variation in functional constraint. Using this principle, we analyze noncoding sequence conservation in Drosophila for 40 loci with known or suspected cis-regulatory function encompassing >100 kb of DNA. We estimate the fraction of noncoding DNA conserved in both intergenic and intronic regions and describe the length distribution of ungapped conserved noncoding blocks. On average, 22%-26% of noncoding sequences surveyed are conserved in Drosophila, with median block length approximately 19 bp. We show that point substitution in conserved noncoding blocks exhibits transition bias as well as lineage effects in base composition, and occurs more than an order of magnitude more frequently than insertion/deletion (indel) substitution. Overall, patterns of noncoding DNA structure and evolution differ remarkably little between intergenic and intronic conserved blocks, suggesting that the effects of transcription per se contribute minimally to the constraints operating on these sequences. The results of this study have implications for the development of alignment and prediction algorithms specific to noncoding DNA, as well as for models of cis-regulatory DNA sequence evolution.
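
    The abstract above summarizes conservation as the fraction of noncoding sequence conserved and the length distribution of ungapped conserved blocks. The sketch below shows one simplified way such figures could be derived from a pairwise alignment, by treating each maximal run of identical, gap-free columns as a block. The study's actual block definition and alignment method are not given in the abstract, so this criterion, the function name, and the toy alignment are assumptions for illustration only.

```python
from statistics import median


def conserved_blocks(aln_a, aln_b, min_len=1):
    """Return lengths of maximal runs of identical, ungapped alignment columns.

    aln_a and aln_b are two rows of a pairwise alignment (same length,
    '-' marks a gap). This is a simplified stand-in for a real
    conserved-block definition.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    blocks = []
    run = 0
    for a, b in zip(aln_a, aln_b):
        if a == b and a != "-":
            run += 1
        else:
            if run >= min_len:
                blocks.append(run)
            run = 0
    if run >= min_len:
        blocks.append(run)
    return blocks


# Hypothetical 14-column alignment (not data from the study).
row_a = "ACGTAC--GTTAGC"
row_b = "ACGAACTTGTTAGA"
blocks = conserved_blocks(row_a, row_b)
print(blocks)                    # [3, 2, 5]
print(median(blocks))            # 3
print(sum(blocks) / len(row_a))  # fraction of columns inside conserved blocks
```

    Raising `min_len` filters out short chance matches, which is one reason a reported median block length depends on the threshold chosen.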

  7. Recombination-dependent replication and gene conversion homogenize repeat sequences and diversify plastid genome structure.

    PubMed

    Ruhlman, Tracey A; Zhang, Jin; Blazier, John C; Sabir, Jamal S M; Jansen, Robert K

    2017-04-01

    There is a misinterpretation in the literature regarding the variable orientation of the small single copy region of plastid genomes (plastomes). The common phenomenon of small and large single copy inversion, hypothesized to occur through intramolecular recombination between inverted repeats (IR) in a circular, single unit-genome, in fact, more likely occurs through recombination-dependent replication (RDR) of linear plastome templates. If RDR can be primed through both intra- and intermolecular recombination, then this mechanism could not only create inversion isomers of so-called single copy regions, but also an array of alternative sequence arrangements. We used Illumina paired-end and PacBio single-molecule real-time (SMRT) sequences to characterize repeat structure in the plastome of Monsonia emarginata (Geraniaceae). We used OrgConv and inspected nucleotide alignments to infer ancestral nucleotides and identify gene conversion among repeats and mapped long (>1 kb) SMRT reads against the unit-genome assembly to identify alternative sequence arrangements. Although M. emarginata lacks the canonical IR, we found that large repeats (>1 kilobase; kb) represent ∼22% of the plastome nucleotide content. Among the largest repeats (>2 kb), we identified GC-biased gene conversion and mapping filtered, long SMRT reads to the M. emarginata unit-genome assembly revealed alternative, substoichiometric sequence arrangements. We offer a model based on RDR and gene conversion between long repeated sequences in the M. emarginata plastome and provide support that both intra- and intermolecular recombination between large repeats, particularly in repeat-rich plastomes, varies unit-genome structure while homogenizing the nucleotide sequence of repeats. © 2017 Botanical Society of America.

  8. Anonymous marker loci within 400 kb of HLA-A generate haplotypes in linkage disequilibrium with the hemochromatosis gene (HFE)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yaouanq, J.; Perichon, M.; Treut, A.L.

    1994-02-01

    The hemochromatosis gene (HFE) maps to 6p21.3 and is less than 1 cM from the HLA class I gene; however, the precise physical location of the gene has remained elusive and controversial. The unambiguous identification of a crossover event within hemochromatosis families is very difficult; it is particularly hampered by the variability of the phenotypic expression as well as by the sex- and age-related penetrance of the disease. For these considerations, traditional linkage analysis could prove of limited value in further refining the extrapolated physical position of HFE. The authors therefore embarked upon a linkage-disequilibrium analysis of HFE and normal chromosomes for the Brittany population. In this report, 66 hemochromatosis families yielding 151 hemochromatosis chromosomes and 182 normal chromosomes were RFLP-typed with a battery of probes, including two newly derived polymorphic markers from the 6.7 and HLA-F loci located 150 and 250 kb telomeric to HLA-A, respectively. The results suggest a strong peak of existing linkage disequilibrium focused within the i82-to-6.7 interval (approximately 250 kb). The zone of linkage disequilibrium is flanked by the i97 locus, positioned 30 kb proximal to i82, and the HLA-F gene, found 250 kb distal to HLA-A, markers of which display no significant association with HFE. These data support the possibility that HFE resides within the 400-kb expanse of DNA between i97 and HLA-F. Alternatively, the very tight association of HLA-A3 and allele 1 of the 6.7 locus, both of which are comprised by the major ancestral or founder HFE haplotype in Brittany, supports the possibility that the disease gene may reside immediately telomeric to the 6.7 locus within the linkage-disequilibrium zone. Additionally, hemochromatosis haplotypes possessing HLA-A11 and the low-frequency HLA-F polymorphism (allele 2) are supportive of a separate founder chromosome containing a second, independently arising mutant allele. 69 refs., 1 fig., 5

  9. Molecular Cloning and Sequence Analysis of the Sta58 Major Antigen Gene of Rickettsia tsutsugamushi: Sequence homology and Antigenic Comparison of Sta58 to the 60-Kilodalton Family of Stress Proteins

    DTIC Science & Technology

    1990-05-01

    Sta58 antigen and the Sta56 strain- GroES, C. burnetii HtpA, Mycobacterium tuberculosis 12- specific major antigen of R. tsutsugamushi (strain Karp...kb HindIII fragment carrying the gene for the Sta58 tuberculosis, and Mycobacterium smegmatis (65-kDa anti- protein was subjected to DNA sequence...the Hsp60 and Hsp10 proteins. R. tsu., R. tsutsugamushi; M. lep., Mycobacterium leprae; C. bur., C. burnetii; Synech., Synechococcus strain 6301; T

  10. Complete Genome Sequences of Salmonella enterica Serovars Anatum and Anatum var. 15+, Isolated from Retail Ground Turkey

    PubMed Central

    Marasini, Daya; Abo-Shama, Usama H.

    2016-01-01

    The complete genome sequences of two isolates of Salmonella enterica serovars Anatum and Anatum var. 15+ revealed the presence of two plasmids of 112 kb and 3 kb in size in each. The chromosome of Salmonella Anatum (4.83 Mb) was slightly smaller than that of Salmonella Anatum var. 15+ (4.88 Mb). PMID:26798111

  11. PMS2 inactivation by a complex rearrangement involving an HERV retroelement and the inverted 100-kb duplicon on 7p22.1.

    PubMed

    Vogt, Julia; Wernstedt, Annekatrin; Ripperger, Tim; Pabst, Brigitte; Zschocke, Johannes; Kratz, Christian; Wimmer, Katharina

    2016-11-01

    Biallelic PMS2 mutations are responsible for more than half of all cases of constitutional mismatch repair deficiency (CMMRD), a recessively inherited childhood cancer predisposition syndrome. The mismatch repair gene PMS2 is partly embedded within one copy of an inverted 100-kb low-copy repeat (LCR) on 7p22.1. In an individual with CMMRD syndrome, PMS2 was found to be homozygously inactivated by a complex chromosomal rearrangement, which separates the 5'-part from the 3'-part of the gene. The rearrangement involves sequences of the inverted 100-kb LCR and a human endogenous retrovirus element and may be associated with an inversion that is indistinguishable from the known inversion polymorphism affecting the ~0.7-Mb sequence intervening the LCR. Its formation is best explained by a replication-based mechanism (RBM) such as fork stalling and template switching/microhomology-mediated break-induced replication (FoSTeS/MMBIR). This finding supports the hypothesis that the inverted LCR can not only facilitate the formation of the non-allelic homologous recombination-mediated inversion polymorphism but it also promotes the occurrence of more complex rearrangements that can be associated with a large inversion, as well, but are mediated by a RBM. This further suggests that among the inversion polymorphism on 7p22.1, more complex rearrangements might be hidden. Furthermore, as the locus is embedded in a common fragile site (CFS) region, this rearrangement also supports the recently raised hypothesis that CFS sequence motifs may facilitate replication-based rearrangement mechanisms.

  12. PMS2 inactivation by a complex rearrangement involving an HERV retroelement and the inverted 100-kb duplicon on 7p22.1

    PubMed Central

    Vogt, Julia; Wernstedt, Annekatrin; Ripperger, Tim; Pabst, Brigitte; Zschocke, Johannes; Kratz, Christian; Wimmer, Katharina

    2016-01-01

    Biallelic PMS2 mutations are responsible for more than half of all cases of constitutional mismatch repair deficiency (CMMRD), a recessively inherited childhood cancer predisposition syndrome. The mismatch repair gene PMS2 is partly embedded within one copy of an inverted 100-kb low-copy repeat (LCR) on 7p22.1. In an individual with CMMRD syndrome, PMS2 was found to be homozygously inactivated by a complex chromosomal rearrangement, which separates the 5′-part from the 3′-part of the gene. The rearrangement involves sequences of the inverted 100-kb LCR and a human endogenous retrovirus element and may be associated with an inversion that is indistinguishable from the known inversion polymorphism affecting the ~0.7-Mb sequence intervening the LCR. Its formation is best explained by a replication-based mechanism (RBM) such as fork stalling and template switching/microhomology-mediated break-induced replication (FoSTeS/MMBIR). This finding supports the hypothesis that the inverted LCR can not only facilitate the formation of the non-allelic homologous recombination-mediated inversion polymorphism but it also promotes the occurrence of more complex rearrangements that can be associated with a large inversion, as well, but are mediated by a RBM. This further suggests that among the inversion polymorphism on 7p22.1, more complex rearrangements might be hidden. Furthermore, as the locus is embedded in a common fragile site (CFS) region, this rearrangement also supports the recently raised hypothesis that CFS sequence motifs may facilitate replication-based rearrangement mechanisms. PMID:27329736

  13. Physical mapping and BAC-end sequence analysis provide initial insights into the flax (Linum usitatissimum L.) genome

    PubMed Central

    2011-01-01

    Background: Flax (Linum usitatissimum L.) is an important source of oil rich in omega-3 fatty acids, which have proven health benefits and utility as an industrial raw material. Flax seeds also contain lignans which are associated with reducing the risk of certain types of cancer. Its bast fibres have broad industrial applications. However, genomic tools needed for molecular breeding were nonexistent. Hence a project, Total Utilization Flax GENomics (TUFGEN) was initiated. We report here the first genome-wide physical map of flax and the generation and analysis of BAC-end sequences (BES) from 43,776 clones, providing initial insights into the genome. Results: The physical map consists of 416 contigs spanning ~368 Mb, assembled from 32,025 fingerprints, representing roughly 54.5% to 99.4% of the estimated haploid genome (370-675 Mb). The N50 size of the contigs was estimated to be ~1,494 kb. The longest contig was ~5,562 kb comprising 437 clones. There were 96 contigs containing more than 100 clones. Approximately 54.6 Mb representing 8-14.8% of the genome was obtained from 80,337 BES. Annotation revealed that a large part of the genome consists of ribosomal DNA (~13.8%), followed by known transposable elements at 6.1%. Furthermore, ~7.4% of sequence was identified to harbour novel repeat elements. Homology searches against flax-ESTs and NCBI-ESTs suggested that ~5.6% of the transcriptome is unique to flax. A total of 4064 putative genomic SSRs were identified and are being developed as novel markers for their use in molecular breeding. Conclusion: The first genome-wide physical map of flax constructed with BAC clones provides a framework for accessing target loci with economic importance for marker development and positional cloning. Analysis of the BES has provided insights into the uniqueness of the flax genome. Compared to other plant genomes, the proportion of rDNA was found to be very high whereas the proportion of known transposable elements was low. The SSRs

  14. Physical mapping and BAC-end sequence analysis provide initial insights into the flax (Linum usitatissimum L.) genome.

    PubMed

    Ragupathy, Raja; Rathinavelu, Rajkumar; Cloutier, Sylvie

    2011-05-09

    Flax (Linum usitatissimum L.) is an important source of oil rich in omega-3 fatty acids, which have proven health benefits and utility as an industrial raw material. Flax seeds also contain lignans which are associated with reducing the risk of certain types of cancer. Its bast fibres have broad industrial applications. However, genomic tools needed for molecular breeding were nonexistent. Hence a project, Total Utilization Flax GENomics (TUFGEN) was initiated. We report here the first genome-wide physical map of flax and the generation and analysis of BAC-end sequences (BES) from 43,776 clones, providing initial insights into the genome. The physical map consists of 416 contigs spanning ~368 Mb, assembled from 32,025 fingerprints, representing roughly 54.5% to 99.4% of the estimated haploid genome (370-675 Mb). The N50 size of the contigs was estimated to be ~1,494 kb. The longest contig was ~5,562 kb comprising 437 clones. There were 96 contigs containing more than 100 clones. Approximately 54.6 Mb representing 8-14.8% of the genome was obtained from 80,337 BES. Annotation revealed that a large part of the genome consists of ribosomal DNA (~13.8%), followed by known transposable elements at 6.1%. Furthermore, ~7.4% of sequence was identified to harbour novel repeat elements. Homology searches against flax-ESTs and NCBI-ESTs suggested that ~5.6% of the transcriptome is unique to flax. A total of 4064 putative genomic SSRs were identified and are being developed as novel markers for their use in molecular breeding. The first genome-wide physical map of flax constructed with BAC clones provides a framework for accessing target loci with economic importance for marker development and positional cloning. Analysis of the BES has provided insights into the uniqueness of the flax genome. Compared to other plant genomes, the proportion of rDNA was found to be very high whereas the proportion of known transposable elements was low. The SSRs identified from BES will be
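
    The coverage ranges quoted in the flax abstract follow from dividing the mapped or sequenced length by the two ends of the estimated genome size. The sketch below reproduces that arithmetic in Python and adds a generic N50 helper. The function and variable names are illustrative, and the contig lengths passed to the N50 helper are made-up values, since the abstract does not list individual contig sizes.

```python
# Figures quoted in the abstract; the names are illustrative only.
MAP_SPAN_MB = 368.0               # total span of the 416 contigs
BES_TOTAL_MB = 54.6               # total length of the BAC-end sequences
GENOME_MB_RANGE = (370.0, 675.0)  # estimated haploid genome size


def coverage_range(covered_mb, genome_range):
    """Return (low %, high %) coverage for a (smallest, largest) genome estimate."""
    smallest, largest = genome_range
    return 100.0 * covered_mb / largest, 100.0 * covered_mb / smallest


def n50(lengths):
    """Return the N50: the contig length at which the running total, taken
    from longest to shortest, first reaches half of the summed length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths supplied")


map_low, map_high = coverage_range(MAP_SPAN_MB, GENOME_MB_RANGE)
bes_low, bes_high = coverage_range(BES_TOTAL_MB, GENOME_MB_RANGE)
print(f"physical map: {map_low:.1f}% to {map_high:.1f}% of the genome")
print(f"BES:          {bes_low:.1f}% to {bes_high:.1f}% of the genome")

# Hypothetical contig lengths (kb), only to show how an N50 is obtained.
print("toy N50:", n50([5, 4, 3, 2, 1]))
```

    The computed ranges agree with the reported 54.5% to 99.4% and 8% to 14.8% to within about 0.1 percentage point.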

  15. Cloning and sequence analysis of a cDNA encoding the alpha-subunit of mouse beta-N-acetylhexosaminidase and comparison with the human enzyme.

    PubMed Central

    Beccari, T; Hoade, J; Orlacchio, A; Stirling, J L

    1992-01-01

    cDNAs encoding the mouse beta-N-acetylhexosaminidase alpha-subunit were isolated from a mouse testis library. The longest of these (1.7 kb) was sequenced and showed 83% similarity with the human alpha-subunit cDNA sequence. The 5' end of the coding sequence was obtained from a genomic DNA clone. Alignment of the human and mouse sequences showed that all three putative N-glycosylation sites are conserved, but that the mouse alpha-subunit has an additional site towards the C-terminus. All eight cysteines in the human sequence are conserved in the mouse. There are an additional two cysteines in the mouse alpha-subunit signal peptide. All amino acids affected in Tay-Sachs-disease mutations are conserved in the mouse. PMID:1379046

  16. Complete genome sequence and phenotype microarray analysis of Cronobacter sakazakii SP291: a persistent isolate cultured from a powdered infant formula production facility.

    PubMed

    Yan, Qiongqiong; Power, Karen A; Cooney, Shane; Fox, Edward; Gopinath, Gopal R; Grim, Christopher J; Tall, Ben D; McCusker, Matthew P; Fanning, Séamus

    2013-01-01

    Outbreaks of human infection linked to the powdered infant formula (PIF) food chain and associated with the bacterium Cronobacter are of concern to public health. These bacteria are regarded as opportunistic pathogens linked to life-threatening infections predominantly in neonates with an underdeveloped immune system. Monitoring the microbiological ecology of PIF production sites is an important step in attempting to limit the risk of contamination in the finished food product. Cronobacter species, like other microorganisms, can adapt to the production environment. These organisms are known for their desiccation tolerance, a phenotype that can aid their survival in the production site and PIF itself. In evaluating the genome data currently available for Cronobacter species, no sequence information has been published describing a Cronobacter sakazakii isolate found to persist in a PIF production facility. Here we report on the complete genome sequence of one such isolate, Cronobacter sakazakii SP291, along with its phenotypic characteristics. The genome of C. sakazakii SP291 consists of a 4.3-Mb chromosome (56.9% GC) and three plasmids, denoted as pSP291-1 [118.1 kb (57.2% GC)], pSP291-2 [52.1 kb (49.2% GC)], and pSP291-3 [4.4 kb (54.0% GC)]. When C. sakazakii SP291 was compared to the reference C. sakazakii ATCC BAA-894, which is also of PIF origin, the annotated genome data identified two interesting functional categories, comprising genes related to the bacterial stress response and resistance to antimicrobial and toxic compounds. Using a phenotypic microarray (PM), we provided a full metabolic profile comparing C. sakazakii SP291 and the previously sequenced C. sakazakii ATCC BAA-894. These data extend our understanding of the genome of this important neonatal pathogen and provide further insights into the genotypes associated with features that can contribute to its persistence in the PIF environment.

  17. The 80-kb DNA duplication on BTA1 is the only remaining candidate mutation for the polled phenotype of Friesian origin

    PubMed Central

    2014-01-01

    Background The absence of horns, called polled phenotype, is the favored trait in modern cattle husbandry. To date, polled cattle are obtained primarily by dehorning calves. Dehorning is a practice that raises animal welfare issues, which can be addressed by selecting for genetically hornless cattle. In the past 20 years, there have been many studies worldwide to identify unique genetic markers in complete association with the polled trait in cattle and recently, two different alleles at the POLLED locus, both resulting in the absence of horns, were reported: (1) the Celtic allele, which is responsible for the polled phenotype in most breeds and for which a single candidate mutation was detected and (2) the Friesian allele, which is responsible for the polled phenotype predominantly in the Holstein-Friesian breed and in a few other breeds, but for which five candidate mutations were identified in a 260-kb haplotype. Further studies based on genome-wide sequencing and high-density SNP (single nucleotide polymorphism) genotyping confirmed the existence of the Celtic and Friesian variants and narrowed down the causal Friesian haplotype to an interval of 145 kb. Results Almost 6000 animals were genetically tested for the polled trait and we detected a recombinant animal which enabled us to reduce the Friesian POLLED haplotype to a single causal mutation, namely an 80-kb duplication. Moreover, our results clearly disagree with the recently reported perfect co-segregation of the POLLED mutation and a SNP at position 1 390 292 bp on bovine chromosome 1 in the Holstein-Friesian population. Conclusion We conclude that the 80-kb duplication, as the only remaining variant within the shortened Friesian haplotype, represents the most likely causal mutation for the polled phenotype of Friesian origin. PMID:24993890

  18. Comparative analysis of the complete sequence of the plastid genome of Parthenium argentatum and identification of DNA barcodes to differentiate Parthenium species and lines

    PubMed Central

    2009-01-01

    Background Parthenium argentatum (guayule) is an industrial crop that produces latex, which was recently commercialized as a source of latex rubber safe for people with Type I latex allergy. The complete plastid genome of P. argentatum was sequenced. The sequence provides important information useful for genetic engineering strategies. Comparison to the sequences of plastid genomes from three other members of the Asteraceae, Lactuca sativa, Guizotia abyssinica and Helianthus annuus, revealed details of the evolution of the four genomes. Chloroplast-specific DNA barcodes were developed for identification of Parthenium species and lines. Results The complete plastid genome of P. argentatum is 152,803 bp. Based on the overall comparison of individual protein coding genes with those in L. sativa, G. abyssinica and H. annuus, we demonstrate that the P. argentatum chloroplast genome sequence is most closely related to that of H. annuus. Similar to chloroplast genomes in G. abyssinica, L. sativa and H. annuus, the plastid genome of P. argentatum has a large 23 kb inversion with a smaller 3.4 kb inversion nested within it. Using the matK and psbA-trnH spacer chloroplast DNA barcodes, three of the four Parthenium species tested, P. tomentosum, P. hysterophorus and P. schottii, can be differentiated from P. argentatum. In addition, we identified lines within P. argentatum. Conclusion The genome sequence of the P. argentatum chloroplast will enrich the sequence resources of plastid genomes in commercial crops. The availability of the complete plastid genome sequence may facilitate transformation efficiency by using the precise sequence of endogenous flanking sequences and regulatory elements in chloroplast transformation vectors. The DNA barcoding study forms the foundation for genetic identification of commercially significant lines of P. argentatum that are important for producing latex. PMID:19917140

  19. Sequence Ready Characterization of the Pericentromeric Region of 19p12

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Evan E. Eichler

    2006-08-31

    Current mapping and sequencing strategies have been inadequate within the proximal portion of 19p12 due, in part, to the presence of a recently expanded ZNF (zinc-finger) gene family and the presence of large (25-50 kb) inverted beta-satellite repeat structures which bracket this tandemly duplicated gene family. The virtual absence of classically defined “unique” sequence within the region has hampered efforts to identify and characterize a suitable minimal tiling path of clones which can be used as templates required for finished sequencing of the region. The goal of this proposal is to develop and implement a novel sequence-anchor strategy to generate a contiguous BAC map of the most proximal portion of chromosome 19p12 for the purpose of complete sequence characterization. The target region will be an estimated 4.5 Mb of DNA extending from STS marker D19S450 (the beginning of the ZNF gene cluster) to the centromeric (alpha-satellite) junction of 19p11. The approach will entail 1) pre-selection of 19p12 BAC and cosmid clones (NIH approved library) utilizing both 19p12-unique and 19p12-specific repeat probes (Eichler et al., 1998); 2) the generation of a BAC/cosmid end-sequence map across the region with a density of one marker every 8 kb; 3) the development of a second generation of STSs (sequence-tagged sites) which will be used to identify and verify clonal overlap at the level of the sequence; 4) incorporation of these sequence-anchored overlapping clones into existing cosmid/BAC restriction maps developed at Livermore National Laboratory; and 5) validation of the organization of this region utilizing high-resolution FISH techniques (extended chromatin analysis) on monochromosomal 19 somatic cell hybrids and parental cell lines of source material. The data generated will be used in the selection of the most parsimonious tiling path of BAC clones to be sequenced as part of the JGI effort on chromosome 19 and should serve as a model for the

  20. Cytotoxic agents for KB and SiHa cells from n-hexane fraction of Cissampelos pareira and its chemical composition.

    PubMed

    Bala, Manju; Pratap, Kunal; Verma, Praveen Kumar; Padwad, Yogendra; Singh, Bikram

    2015-01-01

    Eleven constituents were characterised by gas chromatography-mass spectrometry analysis, and five molecules were isolated using column chromatography. The in vitro study of the extract and isolated molecules against KB and SiHa cell lines revealed oleanolic acid (1) and oleic acid (2) as potent cytotoxic molecules with potential anticancer activity. The IC50 values of n-hexane extract (CPHF), oleanolic acid (1) and oleic acid (2) were >300, 56.08 and 70.7 μg/mL (μM), respectively, against KB cell lines and >300, 47.24 and 80.2 μg/mL (μM), respectively, against SiHa cell lines.

  1. Megabase sequencing of human genome by ordered-shotgun-sequencing (OSS) strategy

    NASA Astrophysics Data System (ADS)

    Chen, Ellson Y.

    1997-05-01

    So far we have used OSS strategy to sequence over 2 megabases DNA in large-insert clones from regions of human X chromosomes with different characteristic levels of GC content. The method starts by randomly fragmenting a BAC, YAC or PAC to 8-12 kb pieces and subcloning those into lambda phage. Insert-ends of these clones are sequenced and overlapped to create a partial map. Complete sequencing is then done on a minimal tiling path of selected subclones, recursively focusing on those at the edges of contigs to facilitate mergers of clones across the entire target. To reduce manual labor, PCR processes have been adapted to prepare sequencing templates throughout the entire operation. The streamlined process can thus lend itself to further automation. The OSS approach is suitable for large- scale genomic sequencing, providing considerable flexibility in the choice of subclones or regions for more or less intensive sequencing. For example, subclones containing contaminating host cell DNA or cloning vector can be recognized and ignored with minimal sequencing effort; regions overlapping a neighboring clone already sequenced need not be redone; and segments containing tandem repeats or long repetitive sequences can be spotted early on and targeted for additional attention.
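
    The "minimal tiling path" step in the OSS abstract can be pictured as an interval-cover problem. The Python sketch below is only an illustration under simplifying assumptions: it supposes that every subclone already has known start and end coordinates on the target, whereas the abstract describes a partial map built up recursively from insert-end overlaps. The function name, the coordinates, and the greedy selection rule are all hypothetical and are not taken from the original work.

```python
def minimal_tiling_path(clones, target_start, target_end):
    """Greedy interval cover.

    clones is a list of (start, end) tuples. Returns a smallest list of
    clones whose union covers [target_start, target_end], or raises
    ValueError if the clones leave a gap.
    """
    ordered = sorted(clones)
    path = []
    covered_to = target_start
    i = 0
    while covered_to < target_end:
        best = None
        # Among clones that start inside the covered region,
        # keep the one that reaches furthest to the right.
        while i < len(ordered) and ordered[i][0] <= covered_to:
            if best is None or ordered[i][1] > best[1]:
                best = ordered[i]
            i += 1
        if best is None or best[1] <= covered_to:
            raise ValueError(f"gap in coverage at position {covered_to}")
        path.append(best)
        covered_to = best[1]
    return path


# Hypothetical 10-12 kb subclones (coordinates in kb) over a 40 kb target.
subclones = [(0, 10), (2, 12), (8, 20), (11, 22), (18, 30), (21, 33), (29, 40)]
tiling = minimal_tiling_path(subclones, 0, 40)
print(tiling)
```

    Choosing the furthest-reaching clone at each step keeps the number of fully sequenced subclones small, which matches the stated aim of avoiding regions that a neighbouring clone already covers.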

  2. RSAT 2015: Regulatory Sequence Analysis Tools

    PubMed Central

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-01-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632

  3. Genome-wide linkage and copy number variation analysis reveals 710 kb duplication on chromosome 1p31.3 responsible for autosomal dominant omphalocele

    PubMed Central

    Radhakrishna, Uppala; Nath, Swapan K; McElreavey, Ken; Ratnamala, Uppala; Sun, Celi; Maiti, Amit K; Gagnebin, Maryline; Béna, Frédérique; Newkirk, Heather L; Sharp, Andrew J; Everman, David B; Murray, Jeffrey C; Schwartz, Charles E; Antonarakis, Stylianos E; Butler, Merlin G

    2017-01-01

    Background Omphalocele is a congenital birth defect characterised by the presence of internal organs located outside of the ventral abdominal wall. The purpose of this study was to identify the underlying genetic mechanisms of a large autosomal dominant Caucasian family with omphalocele. Methods and findings A genetic linkage study was conducted in a large family with an autosomal dominant transmission of an omphalocele using a genome-wide single nucleotide polymorphism (SNP) array. The analysis revealed significant evidence of linkage (non-parametric NPL = 6.93, p=0.0001; parametric logarithm of odds (LOD) = 2.70 under a fully penetrant dominant model) at chromosome band 1p31.3. Haplotype analysis narrowed the locus to a 2.74 Mb region between markers rs2886770 (63014807 bp) and rs1343981 (65757349 bp). Molecular characterisation of this interval using array comparative genomic hybridisation followed by quantitative microsphere hybridisation analysis revealed a 710 kb duplication located at 63.5–64.2 Mb. All affected individuals who had an omphalocele and shared the haplotype were positive for this duplicated region, while the duplication was absent from all normal individuals of this family. Multipoint linkage analysis using the duplication as a marker yielded a maximum LOD score of 3.2 at 1p31.3 under a dominant model. The 710 kb duplication at 1p31.3 band contains seven known genes including FOXD3, ALG6, ITGB3BP, KIAA1799, DLEU2L, PGM1, and the proximal portion of ROR1. Importantly, this duplication is absent from the database of genomic variants. Conclusions The present study suggests that development of an omphalocele in this family is controlled by overexpression of one or more genes in the duplicated region. To the authors’ knowledge, this is the first reported association of an inherited omphalocele condition with a chromosomal rearrangement. PMID:22499347
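
    The interval sizes quoted in the omphalocele abstract can be checked directly from the marker coordinates it gives. The short Python sketch below does that arithmetic; the variable names are illustrative, and the duplication boundaries are the rounded 63.5 and 64.2 Mb values from the abstract, so the computed length is only approximate.

```python
# Coordinates quoted in the abstract; the names are illustrative only.
LEFT_MARKER_BP = 63_014_807    # rs2886770
RIGHT_MARKER_BP = 65_757_349   # rs1343981
DUP_START_MB, DUP_END_MB = 63.5, 64.2  # rounded duplication boundaries

# Size of the linked haplotype interval between the two flanking SNPs.
interval_bp = RIGHT_MARKER_BP - LEFT_MARKER_BP
interval_mb = interval_bp / 1_000_000

# Approximate duplication length from the rounded Mb boundaries.
dup_kb = (DUP_END_MB - DUP_START_MB) * 1000

# The duplication should lie inside the linked interval.
dup_inside = (LEFT_MARKER_BP <= DUP_START_MB * 1_000_000
              and DUP_END_MB * 1_000_000 <= RIGHT_MARKER_BP)

print(f"linked interval: {interval_bp} bp ({interval_mb:.2f} Mb)")
print(f"duplication: about {dup_kb:.0f} kb, inside interval: {dup_inside}")
```

    The interval comes to 2,742,542 bp, matching the reported 2.74 Mb, and the rounded boundaries give roughly 700 kb, consistent with the reported 710 kb duplication.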

  4. The amiodarone derivative KB130015 activates hERG1 potassium channels via a novel mechanism

    PubMed Central

    Gessner, Guido; Macianskiene, Regina; Starkus, John G.; Schönherr, Roland; Heinemann, Stefan H.

    2010-01-01

    Human ether à go-go related gene (hERG1) potassium channels underlie the repolarizing IKr current in the heart. Since they are targets of various drugs with cardiac side effects we tested whether the amiodarone derivative 2-methyl-3-(3,5-diiodo-4-carboxymethoxybenzyl)benzofuran (KB130015) blocks hERG1 channels like its parent compound. Using patch-clamp and two-electrode voltage-clamp techniques we found that KB130015 blocks native and recombinant hERG1 channels at high voltages, but it activates them at low voltages. The activating effect has an apparent EC50 value of 12 μM and is brought about by an about 4-fold acceleration of activation kinetics and a shift in voltage-dependent activation by −16 mV. Channel activation was not use-dependent and was independent of inactivation gating. KB130015 presumably binds to the hERG1 pore from the cytosolic side and functionally competes with hERG1 block by amiodarone, E4031 (N-[4-[[1-[2-(6-methyl-2-pyridinyl)ethyl] -4-piperidinyl] carbonyl] phenyl] methanesulfonamide dihydrochloride), and sertindole. Vice versa, amiodarone attenuates hERG1 activation by KB130015. Based on synergic channel activation by mallotoxin and KB130015 we conclude that the hERG1 pore contains at least two sites for activators that are functionally coupled among each other and to the cavity-blocker site. KB130015 and amiodarone may serve as lead structures for the identification of hERG1 pore-interacting drugs favoring channel activation vs. block. PMID:20097192

  5. Detection of α-thalassemia-1 Southeast Asian and Thai Type Deletions and β-thalassemia 3.5-kb Deletion by Single-tube Multiplex Real-time PCR with SYBR Green1 and High-resolution Melting Analysis

    PubMed Central

    Wiengkum, Thanatcha; Srithep, Sarinee; Chainoi, Isarapong; Singboottra, Panthong; Wongwiwatthananukit, Sanchai

    2011-01-01

    Background Prevention and control of thalassemia requires simple, rapid, and accurate screening tests for carrier couples who are at risk of conceiving fetuses with severe thalassemia. Methods Single-tube multiplex real-time PCR with SYBR Green1 and high-resolution melting (HRM) analysis were used for the identification of α-thalassemia-1 Southeast Asian (SEA) and Thai type deletions and β-thalassemia 3.5-kb gene deletion. The results were compared with those obtained using conventional gap-PCR. DNA samples were derived from 28 normal individuals, 11 individuals with α-thalassemia-1 SEA type deletion, 2 with α-thalassemia-1 Thai type deletion, and 2 with heterozygous β-thalassemia 3.5-kb gene deletion. Results HRM analysis indicated that the amplified fragments from α-thalassemia-1 SEA type deletion, α-thalassemia-1 Thai type deletion, β-thalassemia 3.5-kb gene deletion, and the wild-type β-globin gene had specific peak heights at mean melting temperature (Tm) values of 86.89℃, 85.66℃, 77.24℃, and 74.92℃, respectively. The results obtained using single-tube multiplex real-time PCR with SYBR Green1 and HRM analysis showed 100% consistency with those obtained using conventional gap-PCR. Conclusions Single-tube multiplex real-time PCR with SYBR Green1 and HRM analysis is a potential alternative for routine clinical screening of the common types of α- and β-thalassemia large gene deletions, since it is simple, cost-effective, and highly accurate. PMID:21779184

  6. Nucleotide sequences of Herpes Simplex Virus type 1 (HSV-1) affecting virus entry, cell fusion, and production of glycoprotein gB (VP7)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    DeLuca, N.; Bzik, D.J.; Bond, V.C.

    1982-10-30

    The tsB5 strain of Herpes Simplex Virus type 1 (HSV-1) contains at least two mutations; one mutation specifies the syncytial phenotype and the other confers temperature sensitivity for virus growth. These functions are known to be located between the prototypic map coordinates 0.30 and 0.42. In this study it was demonstrated that tsB5 enters human embryonic lung (HEL) cells more rapidly than KOS, another strain of HSV-1. The EcoRI restriction fragment F from the KOS strain (map coordinates 0.315 to 0.421) was mapped with eight restriction endonucleases, and 16 recombinant plasmids were constructed which contained varying portions of the KOS genome. Recombinant viruses were generated by marker-rescue and marker-transfer cotransfection procedures, using intact DNA from one strain and a recombinant plasmid containing DNA from the other strain. The region of the crossover between the two nonisogenic strains was inferred by the identification of restriction sites in the recombinants that were characteristic of the parental strains. The recombinants were subjected to phenotypic analysis. Syncytium formation, rate of virus entry, and the production of gB were all separable by the crossovers that produced the recombinants. The KOS sequences which rescue the syncytial phenotype of tsB5 were localized to 1.5 kb (map coordinates 0.345 to 0.355), and the temperature-sensitive mutation was localized to 1.2 kb (0.360 to 0.368), giving an average separation between the mutations of 2.5 kb on the 150-kb genome. DNA sequences that specify a functional domain for virus entry were localized to the nucleotide sequences between the two mutations. All three functions could be encoded by the virus gene specifying the gB glycoprotein.

  7. Pressure derivatives of elastic moduli of fused quartz to 10 kb

    USGS Publications Warehouse

    Peselnick, L.; Meister, R.; Wilson, W.H.

    1967-01-01

    Measurements of the longitudinal and shear moduli were made on fused quartz to 10 kb at 24.5°C. The anomalous behavior of the bulk modulus K at low pressure, ∂K/∂P 0, at higher pressures. The pressure derivative of the rigidity modulus ∂G/∂P remains constant and negative for the pressure range covered. A 15-kb hydrostatic pressure vessel is described for use with ultrasonic pulse instrumentation for precise measurements of elastic moduli and density changes with pressure. The placing of the transducer outside the pressure medium, and the use of C-ring pressure seals result in ease of operation and simplicity of design. © 1967.

  8. Narrowing the wingless-2 mutation to a 227 kb candidate region on chicken chromosome 12

    PubMed Central

    Webb, A E; Youngworth, I A; Kaya, M; Gitter, C L; O’Hare, E A; May, B; Cheng, H H; Delany, M E

    2018-01-01

    ABSTRACT Wingless-2 (wg-2) is an autosomal recessive mutation in chicken that results in an embryonic lethal condition. Affected individuals exhibit a multisystem syndrome characterized by absent wings, truncated legs, and craniofacial, kidney, and feather malformations. Previously, work focused on phenotype description, establishing the autosomal recessive pattern of Mendelian inheritance and placing the mutation on an inbred genetic background to create the congenic line UCD Wingless-2.331. The research described in this paper employed the complementary tools of breeding, genetics, and genomics to map the chromosomal location of the mutation and successively narrow the size of the region for analysis of the causative element. Specifically, the wg-2 mutation was initially mapped to a 7 Mb region of chromosome 12 using an Illumina 3 K SNP array. Subsequent SNP genotyping and exon sequencing combined with analysis from improved genome assemblies narrowed the region of interest to a maximum size of 227 kb. Within this region, 3 validated and 3 predicted candidate genes are found, and these are described. The wg-2 mutation is a valuable resource to contribute to an improved understanding of the developmental pathways involved in chicken and avian limb development as well as serving as a model for human development, as the resulting syndrome shares features with human congenital disorders. PMID:29562287

  9. RSAT 2015: Regulatory Sequence Analysis Tools.

    PubMed

    Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

    2015-07-01

    RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Repeated sequence sets in mitochondrial DNA molecules of root knot nematodes (Meloidogyne): nucleotide sequences, genome location and potential for host-race identification.

    PubMed Central

    Okimoto, R; Chamberlin, H M; Macfarlane, J L; Wolstenholme, D R

    1991-01-01

    Within a 7 kb segment of the mtDNA molecule of the root knot nematode, Meloidogyne javanica, that lacks standard mitochondrial genes, are three sets of strictly tandemly arranged, direct repeat sequences: approximately 36 copies of a 102 ntp sequence that contains a TaqI site; 11 copies of a 63 ntp sequence, and 5 copies of an 8 ntp sequence. The 7 kb repeat-containing segment is bounded by putative tRNAasp and tRNAf-met genes and the arrangement of sequences within this segment is: the tRNAasp gene; a unique 1,528 ntp segment that contains two highly stable hairpin-forming sequences; the 102 ntp repeat set; the 8 ntp repeat set; a unique 1,068 ntp segment; the 63 ntp repeat set; and the tRNAf-met gene. The nucleotide sequences of the 102 ntp copies and the 63 ntp copies have been conserved among the species examined. Data from Southern hybridization experiments indicate that 102 ntp and 63 ntp repeats occur in the mtDNAs of three, two and two races of M. incognita, M. hapla and M. arenaria, respectively. Nucleotide sequences of the M. incognita Race-3 102 ntp repeat were found to be either identical or highly similar to those of the M. javanica 102 ntp repeat. Differences in migration distance and number of 102 ntp repeat-containing bands seen in Southern hybridization autoradiographs of restriction-digested mtDNAs of M. javanica and the different host races of M. incognita, M. hapla and M. arenaria are sufficient to distinguish the different host races of each species. PMID:2027769
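
    The component sizes listed in the Meloidogyne abstract can be summed to check that they account for the quoted 7 kb segment. The Python sketch below does this; the variable names are illustrative, the copy number of the 102 ntp repeat is the approximate value of 36 given in the abstract, and the two flanking tRNA genes are left out because their lengths are not stated.

```python
# (repeat unit length in ntp, copy number), in the order given in the abstract.
REPEAT_SETS = [
    (102, 36),  # 102 ntp repeat, approximately 36 copies
    (8, 5),     # 8 ntp repeat
    (63, 11),   # 63 ntp repeat
]
UNIQUE_SEGMENTS_NTP = [1528, 1068]  # the two unique segments

# Total length contributed by the tandem repeat sets.
repeat_total = sum(unit * copies for unit, copies in REPEAT_SETS)

# Repeats plus unique segments, excluding the bounding tRNA genes.
segment_total = repeat_total + sum(UNIQUE_SEGMENTS_NTP)

print(f"repeats: {repeat_total} ntp")
print(f"repeats + unique segments: {segment_total} ntp")
```

    The components add up to 7,001 ntp, in line with the 7 kb figure.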

  11. Characterization of bovine ruminal epithelial bacterial communities using 16S rRNA sequencing, PCR-DGGE, and qRT-PCR analysis.

    PubMed

    Li, Meiju; Zhou, Mi; Adamowicz, Elizabeth; Basarab, John A; Guan, Le Luo

    2012-02-24

    Currently, knowledge regarding the ecology and function of bacteria attached to the epithelial tissue of the rumen wall is limited. In this study, the diversity of the bacterial community attached to the rumen epithelial tissue was compared to the rumen content bacterial community using 16S rRNA gene sequencing, PCR-DGGE, and qRT-PCR analysis. Sequence analysis of 2785 randomly selected clones from six 16S rDNA (∼1.4 kb) libraries showed that the community structures of three rumen content libraries clustered together and were separated from the rumen tissue libraries. The diversity index of each library revealed that ruminal content bacterial communities (4.12/4.42/4.88) were higher than ruminal tissue communities (2.90/2.73/3.23), based on 97% similarity. The phylum Firmicutes was predominant in the ruminal tissue communities, while the phylum Bacteroidetes was predominant in the ruminal content communities. The phyla Fibrobacteres, Planctomycetes, and Verrucomicrobia were only detected in the ruminal content communities. PCR-DGGE analysis of the bacterial profiles of the rumen content and ruminal epithelial tissue samples from 22 steers further confirmed that there is a distinct bacterial community that inhabits the rumen epithelium. The distinctive epimural bacterial communities suggest that Firmicutes, together with other epithelial-specific species, may have additional functions other than food digestion. Copyright © 2011 Elsevier B.V. All rights reserved.

  12. Genome sequence and comparative analysis of a putative entomopathogenic Serratia isolated from Caenorhabditis briggsae.

    PubMed

    Abebe-Akele, Feseha; Tisa, Louis S; Cooper, Vaughn S; Hatcher, Philip J; Abebe, Eyualem; Thomas, W Kelley

    2015-07-18

    Entomopathogenic associations between nematodes in the genera Steinernema and Heterorhabditis with their cognate bacteria from the bacterial genera Xenorhabdus and Photorhabdus, respectively, are extensively studied for their potential as biological control agents against invasive insect species. These two highly coevolved associations were results of convergent evolution. Given the natural abundance of bacteria, nematodes and insects, it is surprising that only these two associations with no intermediate forms are widely studied in the entomopathogenic context. Discovering analogous systems involving novel bacterial and nematode species would shed light on the evolutionary processes involved in the transition from free living organisms to obligatory partners in entomopathogenicity. We report the complete genome sequence of a new member of the enterobacterial genus Serratia that forms a putative entomopathogenic complex with Caenorhabditis briggsae. Analysis of the 5.04 Mb chromosomal genome predicts 4599 protein coding genes, seven sets of ribosomal RNA genes, 84 tRNA genes and a 64.8 kb plasmid encoding 74 genes. Comparative genomic analysis with three of the previously sequenced Serratia species, S. marcescens DB11 and S. proteamaculans 568, and Serratia sp. AS12, revealed that these four representatives of the genus share a core set of ~3100 genes and extensive structural conservation. The newly identified species shares a more recent common ancestor with S. marcescens with 99% sequence identity in rDNA sequence and orthology across 85.6% of predicted genes. Of the 39 genes/operons implicated in the virulence, symbiosis, recolonization, immune evasion and bioconversion, 21 (53.8%) were present in Serratia while 33 (84.6%) and 35 (89%) were present in Xenorhabdus and Photorhabdus EPN bacteria respectively. The majority of unique sequences in Serratia sp. SCBI (South African Caenorhabditis briggsae Isolate) are found in ~29 genomic islands of 5 to 65 genes and are

  13. Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

    PubMed Central

    Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

    2010-01-01

    Despite its economic, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing the complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, clustering and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% of hits extending to the start codons of full length cDNAs, suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. The accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with the possibility to add sequences from the public domain. We anticipate that our results will provide a home base for genomic studies of camel and other comparative studies, enabling a starting point for whole genome sequencing of the organism. PMID:20502665

  14. VenomKB, a new knowledge base for facilitating the validation of putative venom therapies

    PubMed Central

    Romano, Joseph D.; Tatonetti, Nicholas P.

    2015-01-01

    Animal venoms have been used for therapeutic purposes since the dawn of recorded history. Only a small fraction, however, have been tested for pharmaceutical utility. Modern computational methods enable the systematic exploration of novel therapeutic uses for venom compounds. Unfortunately, there is currently no comprehensive resource describing the clinical effects of venoms to support this computational analysis. We present VenomKB, a new publicly accessible knowledge base and website that aims to act as a repository for emerging and putative venom therapies. Presently, it consists of three database tables: (1) Manually curated records of putative venom therapies supported by scientific literature, (2) automatically parsed MEDLINE articles describing compounds that may be venom derived, and their effects on the human body, and (3) automatically retrieved records from the new Semantic Medline resource that describe the effects of venom compounds on mammalian anatomy. Data from VenomKB may be selectively retrieved in a variety of popular data formats, are open-source, and will be continually updated as venom therapies become better understood. PMID:26601758

  15. Purification and sequence analysis of 4-methyl-5-nitrocatechol oxygenase from Burkholderia sp. strain DNT.

    PubMed Central

    Haigler, B E; Suen, W C; Spain, J C

    1996-01-01

    4-Methyl-5-nitrocatechol (MNC) is an intermediate in the degradation of 2,4-dinitrotoluene by Burkholderia sp. strain DNT. In the presence of NADPH and oxygen, MNC monooxygenase catalyzes the removal of the nitro group from MNC to form 2-hydroxy-5-methylquinone. The gene (dntB) encoding MNC monooxygenase has been previously cloned and characterized. In order to examine the properties of MNC monooxygenase and to compare it with other enzymes, we sequenced the gene encoding the MNC monooxygenase and purified the enzyme from strain DNT. dntB was localized within a 2.2-kb ApaI DNA fragment. Sequence analysis of this fragment revealed an open reading frame of 1,644 bp with an N-terminal amino acid sequence identical to that of purified MNC monooxygenase from strain DNT. Comparison of the derived amino acid sequences with those of other genes showed that DntB contains the highly conserved ADP and flavin adenine dinucleotide (FAD) binding motifs characteristic of flavoprotein hydroxylases. MNC monooxygenase was purified to homogeneity from strain DNT by anion exchange and gel filtration chromatography. Sodium dodecyl sulfate-polyacrylamide gel electrophoresis revealed a single protein with a molecular weight of 60,200, which is consistent with the size determined from the gene sequence. The native molecular weight determined by gel filtration was 65,000, which indicates that the native enzyme is a monomer. It used either NADH or NADPH as electron donors, and NADPH was the preferred cofactor. The purified enzyme contained 1 mol of FAD per mol of protein, which is also consistent with the detection of an FAD binding motif in the amino acid sequence of DntB. MNC monooxygenase has a narrow substrate specificity. MNC and 4-nitrocatechol are good substrates whereas 3-methyl-4-nitrophenol, 3-methyl-4-nitrocatechol, 4-nitrophenol, 3-nitrophenol, and 4-chlorocatechol were not. These studies suggest that MNC monooxygenase is a flavoprotein that shares some properties with

  16. A comparative genomics strategy for targeted discovery of single-nucleotide polymorphisms and conserved-noncoding sequences in orphan crops.

    PubMed

    Feltus, F A; Singh, H P; Lohithaswa, H C; Schulze, S R; Silva, T D; Paterson, A H

    2006-04-01

    Completed genome sequences provide templates for the design of genome analysis tools in orphan species lacking sequence information. To demonstrate this principle, we designed 384 PCR primer pairs to conserved exonic regions flanking introns, using Sorghum/Pennisetum expressed sequence tag alignments to the Oryza genome. Conserved-intron scanning primers (CISPs) amplified single-copy loci at 37% to 80% success rates in taxa that sample much of the approximately 50-million years of Poaceae divergence. While the conserved nature of exons fostered cross-taxon amplification, the lesser evolutionary constraints on introns enhanced single-nucleotide polymorphism detection. For example, in eight rice (Oryza sativa) genotypes, polymorphism averaged 12.1 per kb in introns but only 3.6 per kb in exons. Curiously, among 124 CISPs evaluated across Oryza, Sorghum, Pennisetum, Cynodon, Eragrostis, Zea, Triticum, and Hordeum, 23 (18.5%) seemed to be subject to rigid intron size constraints that were independent of per-nucleotide DNA sequence variation. Furthermore, we identified 487 conserved-noncoding sequence motifs in 129 CISP loci. A large CISP set (6,062 primer pairs, amplifying introns from 1,676 genes) designed using an automated pipeline showed generally higher abundance in recombinogenic than in nonrecombinogenic regions of the rice genome, thus providing relatively even distribution along genetic maps. CISPs are an effective means to explore poorly characterized genomes for both DNA polymorphism and noncoding sequence conservation on a genome-wide or candidate gene basis, and also provide anchor points for comparative genomics across a diverse range of species.
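    The per-kb polymorphism figures quoted above (12.1 per kb in introns versus 3.6 per kb in exons) are a simple normalization of polymorphic-site counts by aligned length. A minimal sketch of that arithmetic follows; the function name and the site counts are hypothetical and chosen only to reproduce the reported densities, not taken from the study.

```python
def polymorphisms_per_kb(n_polymorphic_sites: int, aligned_length_bp: int) -> float:
    """Return the number of polymorphic sites per 1,000 aligned base pairs."""
    if aligned_length_bp <= 0:
        raise ValueError("aligned_length_bp must be positive")
    return 1000.0 * n_polymorphic_sites / aligned_length_bp


# Hypothetical counts, picked only so the densities match the abstract.
intron_density = polymorphisms_per_kb(121, 10_000)  # 12.1 per kb
exon_density = polymorphisms_per_kb(36, 10_000)     # 3.6 per kb

# Ratio of intron to exon polymorphism density.
print(round(intron_density / exon_density, 2))
```

    With these densities, introns carry roughly three times as many polymorphisms per kb as exons, which is the contrast the abstract draws.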

  17. Coinheritance of hemoglobin D-Punjab and β0-thalassemia 3.4 kb deletion in a Thai girl

    PubMed Central

    Panyasai, Sitthichai; Rahad, Sarinna; Pornprasert, Sakorn

    2017-01-01

    Hemoglobin (Hb) D-Punjab [β121(GH4) Glu→Gln; HBB: c.364G>C] and β0-thalassemia 3.4 kb deletion are very rare in the Thai population. For the first time, the coinheritance of HbD-Punjab with β0-thalassemia 3.4 kb deletion was reported in a 7-year-old Thai girl. She had mild anemia (Hb 115.0 g/L and mean corpuscular hemoglobin 18.1 pg) with red blood cell microcytosis (mean corpuscular volume 52.5 fL). By capillary electrophoresis (CE), HbD-Punjab was found at a migration position of 180 s with a value of 81.9%, while the level of HbA2 was 7.3%. Based on the elevated HbA2, molecular analysis for detection of β0-thalassemia mutations was performed. The 490 bp amplified fragment from the β0-thalassemia 3.4 kb deletion was observed. Thus, the coinheritance of HbD-Punjab with β0-thalassemia can be found in the Thai population. The HbA2 measured on CE is a reliable parameter for differentiating the homozygote of HbD-Punjab from the compound heterozygote of HbD-Punjab and β0-thalassemia. PMID:28970692

  18. A Role for the NF-kb/Rel Transcription Factors in Human Breast Cancer

    DTIC Science & Technology

    1998-07-01

    binding proteins present in a series of nuclear extracts from cell lines and from breast tumor tissues as well as normal mammary epithelium. Finally, we...RelA is nuclear in several examples. Our recent data on nuclear extracts of breast tumors shows that there is a significant increase in NF-KB binding...Figure 2 in the appendix). Additionally, immunoblotting of nuclear extracts versus adjacent tissue controls showed that NF-KB p50, p52 and c-Rel were

  19. Sequence Data for Clostridium autoethanogenum using Three Generations of Sequencing Technologies

    DOE PAGES

    Utturkar, Sagar M.; Klingeman, Dawn Marie; Bruno-Barcena, José M.; ...

    2015-04-14

    During the past decade, DNA sequencing output has been mostly dominated by the second generation sequencing platforms which are characterized by low cost, high throughput and shorter read lengths for example, Illumina. The emergence and development of so called third generation sequencing platforms such as PacBio has permitted exceptionally long reads (over 20 kb) to be generated. Due to read length increases, algorithm improvements and hybrid assembly approaches, the concept of one chromosome, one contig and automated finishing of microbial genomes is now a realistic and achievable task for many microbial laboratories. In this paper, we describe high quality sequence datasets which span three generations of sequencing technologies, containing six types of data from four NGS platforms and originating from a single microorganism, Clostridium autoethanogenum. The dataset reported here will be useful for the scientific community to evaluate upcoming NGS platforms, enabling comparison of existing and novel bioinformatics approaches and will encourage interest in the development of innovative experimental and computational methods for NGS data.

  20. Combining Next Generation Sequencing with Bulked Segregant Analysis to Fine Map a Stem Moisture Locus in Sorghum (Sorghum bicolor L. Moench).

    PubMed

    Han, Yucui; Lv, Peng; Hou, Shenglin; Li, Suying; Ji, Guisu; Ma, Xue; Du, Ruiheng; Liu, Guoqing

    2015-01-01

    Sorghum is one of the most promising bioenergy crops. Stem juice yield, together with stem sugar concentration, determines sugar yield in sweet sorghum. Bulked segregant analysis (BSA) is a gene mapping technique for identifying genomic regions containing genetic loci affecting a trait of interest that when combined with deep sequencing could effectively accelerate the gene mapping process. In this study, a dry stem sorghum landrace was characterized and the stem water controlling locus, qSW6, was fine mapped using QTL analysis and the combined BSA and deep sequencing technologies. Results showed that: (i) In sorghum variety Jiliang 2, stem water content was around 80% before flowering stage. It dropped to 75% during grain filling with little difference between different internodes. In landrace G21, stem water content keeps dropping after the flag leaf stage. The drop from 71% at flowering time progressed to 60% at grain filling time. Large differences exist between different internodes with the lowest (51%) at the 7th and 8th internodes at dough stage. (ii) A quantitative trait locus (QTL) controlling stem water content mapped on chromosome 6 between SSR markers Ch6-2 and gpsb069 explained about 34.7-56.9% of the phenotypic variation for the 5th to 10th internodes, respectively. (iii) BSA and deep sequencing analysis narrowed the associated region to 339 kb containing 38 putative genes. The results could help reveal molecular mechanisms underlying juice yield of sorghum and thus to improve total sugar yield.

  1. Analysis for complete genomic sequence of HLA-B and HLA-C alleles in the Chinese Han population.

    PubMed

    Zhu, F; He, Y; Zhang, W; He, J; He, J; Xu, X; Lv, H; Yan, L

    2011-08-01

    In the present study, we have determined the complete genomic sequence and analysed the intron polymorphism of partial HLA-B and HLA-C alleles in the Chinese Han population. Over 3.0 kb DNA fragments of HLA-B and HLA-C loci were amplified by polymerase chain reaction from partial 5' untranslated region to 3' noncoding region respectively, and then the amplified products were sequenced. Full-length nucleotide sequences of 14 HLA-B alleles and 10 HLA-C alleles were obtained and have been submitted to GenBank and IMGT/HLA database. Two novel alleles, HLA-B*52:01:01:02 and HLA-B*59:01:01:02, were identified, and the complete genomic sequence of HLA-B*52:01:01:01 was reported for the first time. In total, 157 and 167 polymorphic positions were found in the full-length genomic sequence of HLA-B and HLA-C loci respectively. Our results suggested that many single nucleotide polymorphisms existed in the exon and intron regions, and the data can provide useful information for understanding the evolution of HLA-B and HLA-C alleles. © 2011 Blackwell Publishing Ltd.

  2. Kilo-sequencing: an ordered strategy for rapid DNA sequence data acquisition.

    PubMed Central

    Barnes, W M; Bevan, M

    1983-01-01

    A strategy for rapid DNA sequence acquisition in an ordered, nonrandom manner, while retaining all of the conveniences of the dideoxy method with M13 transducing phage DNA template, is described. Target DNA 3 to 14 kb in size can be stably carried by our M13 vectors. Suitable targets are stretches of DNA which lack an enzyme recognition site which is unique on our cloning vectors and adjacent to the sequencing primer; current sites that are so useful when lacking are Pst, Xba, HindIII, BglII, EcoRI. By an in vitro procedure, we cut RF DNA once randomly and once specifically, to create thousands of deletions which start at the unique restriction site adjacent to the dideoxy sequencing primer and extend various distances across the target DNA. Phage carrying a desired size of deletions, whose DNA as template will give rise to DNA sequence data in a desired location along the target DNA, may be purified by electrophoresis alive on agarose gels. Phage running in the same location on the agarose gel thus conveniently give rise to nucleotide sequence data from the same kilobase of target DNA. PMID:6298723

  3. DNA sequence responsible for the amplification of adjacent genes.

    PubMed

    Pasion, S G; Hartigan, J A; Kumar, V; Biswas, D K

    1987-10-01

    A 10.3-kb DNA fragment in the 5'-flanking region of the rat prolactin (rPRL) gene was isolated from F1BGH(1)2C1, a strain of rat pituitary tumor cells (GH cells) that produces prolactin in response to 5-bromodeoxyuridine (BrdU). Following transfection and integration into genomic DNA of recipient mouse L cells, this DNA induced amplification of the adjacent thymidine kinase gene from Herpes simplex virus type 1 (HSV1TK). We confirmed the ability of this "Amplicon" sequence to induce amplification of other linked or unlinked genes in DNA-mediated gene transfer studies. When transferred into the mouse L cells with the 10.3-5'rPRL gene sequence of BrdU-responsive cells, both the human growth hormone and the HSV1TK genes are amplified in response to 5-bromodeoxyuridine. This observation is substantiated by BrdU-induced amplification of the cotransferred bacterial Neo gene. Cotransfection studies reveal that the BrdU-induced amplification capability is associated with a 4-kb DNA sequence in the 5'-flanking region of the rPRL gene of BrdU-responsive cells. These results demonstrate that genes of heterologous origin, linked or unlinked, and selected or unselected, can be coamplified when located within the amplification boundary of the Amplicon sequence.

  4. Large-Scale Collection and Analysis of Full-Length cDNAs from Brachypodium distachyon and Integration with Pooideae Sequence Resources

    PubMed Central

    Mochida, Keiichi; Uehara-Yamaguchi, Yukiko; Takahashi, Fuminori; Yoshida, Takuhiro; Sakurai, Tetsuya; Shinozaki, Kazuo

    2013-01-01

    A comprehensive collection of full-length cDNAs is essential for correct structural gene annotation and functional analyses of genes. We constructed a mixed full-length cDNA library from 21 different tissues of Brachypodium distachyon Bd21, and obtained 78,163 high quality expressed sequence tags (ESTs) from both ends of ca. 40,000 clones (including 16,079 contigs). We updated gene structure annotations of Brachypodium genes based on full-length cDNA sequences in comparison with the latest publicly available annotations. About 10,000 non-redundant gene models were supported by full-length cDNAs; ca. 6,000 showed some transcription unit modifications. We also found ca. 580 novel gene models, including 362 newly identified in Bd21. Using the updated transcription start sites, we searched a total of 580 plant cis-motifs in the −3 kb promoter regions and determined a genome-wide Brachypodium promoter architecture. Furthermore, we integrated the Brachypodium full-length cDNAs and updated gene structures with available sequence resources in wheat and barley in a web-accessible database, the RIKEN Brachypodium FL cDNA database. The database represents a “one-stop” information resource for all genomic information in the Pooideae, facilitating functional analysis of genes in this model grass plant and seamless knowledge transfer to the Triticeae crops. PMID:24130698

  5. Evaluation of three read-depth based CNV detection tools using whole-exome sequencing data.

    PubMed

    Yao, Ruen; Zhang, Cheng; Yu, Tingting; Li, Niu; Hu, Xuyun; Wang, Xiumin; Wang, Jian; Shen, Yiping

    2017-01-01

    Whole exome sequencing (WES) has been widely accepted as a robust and cost-effective approach for clinical genetic testing of small sequence variants. Detection of copy number variants (CNV) within WES data has become possible through the development of various algorithms and software programs that utilize read-depth as the main information. The aim of this study was to evaluate three commonly used, WES read-depth based CNV detection programs using high-resolution chromosomal microarray analysis (CMA) as a standard. Paired CMA and WES data were acquired for 45 samples. A total of 219 CNVs (size ranged from 2.3 kb to 35 Mb) identified on three CMA platforms (Affymetrix, Agilent and Illumina) were used as standards. CNVs were called from WES data using XHMM, CoNIFER, and CNVnator with modified settings. All three software packages detected an elevated proportion of small variants (< 20 kb) compared to CMA. XHMM and CoNIFER had poor detection sensitivity (22.2 and 14.6%), which correlated with the number of capturing probes involved. CNVnator detected most variants and had better sensitivity (87.7%); however, it suffered from an overwhelming detection of small CNVs below 20 kb, which required further confirmation. Size estimation of variants was exaggerated by CNVnator and understated by XHMM and CoNIFER. Low concordance of CNVs detected by three different read-depth based programs indicates the immature status of WES-based CNV detection. Low sensitivity and uncertain specificity of WES-based CNV detection in comparison with CMA based CNV detection suggests that CMA will continue to play an important role in detecting clinical grade CNV in the NGS era, which is largely based on WES.
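    The sensitivity figures above are fractions of the CMA reference CNVs that a WES caller recovers. The abstract does not say how a call was matched to a reference CNV, so the sketch below assumes a 50% reciprocal-overlap rule, a common convention but an assumption here; the function names and the example intervals are likewise hypothetical.

```python
from typing import List, Tuple

# A CNV is (chromosome, start, end) with half-open coordinates.
CNV = Tuple[str, int, int]


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Return the smaller of the two overlap fractions between a and b."""
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def sensitivity(reference: List[CNV], calls: List[CNV], min_overlap: float = 0.5) -> float:
    """Fraction of reference CNVs matched by at least one call.

    min_overlap is an assumed matching threshold, not the study's criterion.
    """
    if not reference:
        raise ValueError("reference set is empty")
    detected = sum(
        1
        for ref in reference
        if any(reciprocal_overlap(ref, call) >= min_overlap for call in calls)
    )
    return detected / len(reference)


# Hypothetical reference (array) CNVs and caller output.
reference = [("chr1", 1_000, 21_000), ("chr2", 5_000, 9_000), ("chr3", 0, 50_000)]
calls = [("chr1", 2_000, 20_000), ("chr3", 40_000, 90_000)]

# Only the chr1 CNV meets the overlap threshold, so 1 of 3 is detected.
print(round(sensitivity(reference, calls), 3))
```

    The chr3 call overlaps its reference CNV by only 20%, which illustrates how misestimated variant sizes can lower measured sensitivity even when a caller flags the right region.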

  6. ESTuber db: an online database for Tuber borchii EST sequences.

    PubMed

    Lazzari, Barbara; Caprera, Andrea; Cosentino, Cristian; Stella, Alessandra; Milanesi, Luciano; Viotti, Angelo

    2007-03-08

    The ESTuber database (http://www.itb.cnr.it/estuber) includes 3,271 Tuber borchii expressed sequence tags (EST). The dataset consists of 2,389 sequences from an in-house prepared cDNA library from truffle vegetative hyphae, and 882 sequences downloaded from GenBank and representing four libraries from white truffle mycelia and ascocarps at different developmental stages. An automated pipeline was prepared to process EST sequences using public software integrated by in-house developed Perl scripts. Data were collected in a MySQL database, which can be queried via a php-based web interface. Sequences included in the ESTuber db were clustered and annotated against three databases: the GenBank nr database, the UniProtKB database and a third in-house prepared database of fungi genomic sequences. An algorithm was implemented to infer statistical classification among Gene Ontology categories from the ontology occurrences deduced from the annotation procedure against the UniProtKB database. Ontologies were also deduced from the annotation of more than 130,000 EST sequences from five filamentous fungi, for intra-species comparison purposes. Further analyses were performed on the ESTuber db dataset, including tandem repeats search and comparison of the putative protein dataset inferred from the EST sequences to the PROSITE database for protein patterns identification. All the analyses were performed both on the complete sequence dataset and on the contig consensus sequences generated by the EST assembly procedure. The resulting web site is a resource of data and links related to truffle expressed genes. The Sequence Report and Contig Report pages are the web interface core structures which, together with the Text search utility and the Blast utility, allow easy access to the data stored in the database.

  7. Genome Sequences of Akhmeta Virus, an Early Divergent Old World Orthopoxvirus.

    PubMed

    Gao, Jinxin; Gigante, Crystal; Khmaladze, Ekaterine; Liu, Pengbo; Tang, Shiyuyun; Wilkins, Kimberly; Zhao, Kun; Davidson, Whitni; Nakazawa, Yoshinori; Maghlakelidze, Giorgi; Geleishvili, Marika; Kokhreidze, Maka; Carroll, Darin S; Emerson, Ginny; Li, Yu

    2018-05-12

    Annotated whole genome sequences of three isolates of the Akhmeta virus (AKMV), a novel species of orthopoxvirus (OPXV), isolated from the Akhmeta and Vani regions of the country Georgia, are presented and discussed. The AKMV genome is similar in genomic content and structure to that of the cowpox virus (CPXV), but a lower sequence identity was found between AKMV and Old World OPXVs than between other known species of Old World OPXVs. Phylogenetic analysis showed that AKMV diverged prior to other Old World OPXV. AKMV isolates formed a monophyletic clade in the OPXV phylogeny, yet the sequence variability between AKMV isolates was higher than between the monkeypox virus strains in the Congo basin and West Africa. An AKMV isolate from Vani contained approximately six kb sequence in the left terminal region that shared a higher similarity with CPXV than with other AKMV isolates, whereas the rest of the genome was most similar to AKMV, suggesting recombination between AKMV and CPXV in a region containing several host range and virulence genes.

  8. Accurate, multi-kb reads resolve complex populations and detect rare microorganisms.

    PubMed

    Sharon, Itai; Kertesz, Michael; Hug, Laura A; Pushkarev, Dmitry; Blauwkamp, Timothy A; Castelle, Cindy J; Amirebrahimi, Mojgan; Thomas, Brian C; Burstein, David; Tringe, Susannah G; Williams, Kenneth H; Banfield, Jillian F

    2015-04-01

    Accurate evaluation of microbial communities is essential for understanding global biogeochemical processes and can guide bioremediation and medical treatments. Metagenomics is most commonly used to analyze microbial diversity and metabolic potential, but assemblies of the short reads generated by current sequencing platforms may fail to recover heterogeneous strain populations and rare organisms. Here we used short (150-bp) and long (multi-kb) synthetic reads to evaluate strain heterogeneity and study microorganisms at low abundance in complex microbial communities from terrestrial sediments. The long-read data revealed multiple (probably dozens of) closely related species and strains from previously undescribed Deltaproteobacteria and Aminicenantes (candidate phylum OP8). Notably, these are the most abundant organisms in the communities, yet short-read assemblies achieved only partial genome coverage, mostly in the form of short scaffolds (N50 = ∼ 2200 bp). Genome architecture and metabolic potential for these lineages were reconstructed using a new synteny-based method. Analysis of long-read data also revealed thousands of species whose abundances were <0.1% in all samples. Most of the organisms in this "long tail" of rare organisms belong to phyla that are also represented by abundant organisms. Genes encoding glycosyl hydrolases are significantly more abundant than expected in rare genomes, suggesting that rare species may augment the capability for carbon turnover and confer resilience to changing environmental conditions. Overall, the study showed that a diversity of closely related strains and rare organisms account for a major portion of the communities. These are probably common features of many microbial communities and can be effectively studied using a combination of long and short reads. © 2015 Sharon et al.; Published by Cold Spring Harbor Laboratory Press.
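    The scaffold N50 quoted above (about 2200 bp) is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch of the standard calculation, with hypothetical scaffold lengths:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first length where the cumulative sum reaches half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical scaffold lengths in bp (total 1500; 500 + 400 = 900 >= 750).
print(n50([100, 200, 300, 400, 500]))  # 400
```

    Because N50 is weighted by length, a few long scaffolds can raise it considerably, so it is usually read alongside the scaffold count and total assembly size.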

  9. Draft sequencing and analysis of the genome of pufferfish Takifugu flavidus.

    PubMed

    Gao, Yang; Gao, Qiang; Zhang, Huan; Wang, Lingling; Zhang, Fuchong; Yang, Chuanyan; Song, Linsheng

    2014-12-01

    The pufferfish Takifugu flavidus is an important economic species due to its outstanding flavour and high market value. It has been regarded as an excellent model of genetic study for decades as well. In the present study, three mate-pair libraries of T. flavidus genome were sequenced by the SOLiD 4 next-generation sequencing platform, and the draft genome was constructed with the short reads using an assisted assembly strategy. The draft consists of 50,947 scaffolds with an N50 value of 305.7 kb, and the average GC content was 45.2%. The combined length of repetitive sequences was 26.5 Mb, which accounted for 6.87% of the genome, indicating that the compactness of the T. flavidus genome was close to that of the T. rubripes genome. A total of 1,253 non-coding RNA genes and 30,285 protein-encoding genes were assigned to the genome. There were 132, 775 and 394 presumptive genes playing roles in the colour pattern variation, the relatively slow growth and the lipid metabolism, respectively. Among them, genes involved in the microtubule-dependent transport system, angiogenesis, decapentaplegic pathway and lipid mobilization were significantly expanded in the T. flavidus genome. This draft genome provides a valuable resource for understanding and improving both fundamental and applied research with pufferfish in the future. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
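    Two of the summary statistics above follow from simple arithmetic: GC content is the share of G and C among called bases, and a repeat length of 26.5 Mb at 6.87% of the genome implies an assembled size of roughly 386 Mb. That size is derived here from the two reported figures and is not stated in the abstract; the function names and the toy sequence are hypothetical.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C among A/C/G/T bases (N and gaps ignored)."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def implied_genome_size_mb(repeat_mb: float, repeat_percent: float) -> float:
    """Genome size implied by a repeat length and its percentage of the genome."""
    if repeat_percent <= 0:
        raise ValueError("repeat_percent must be positive")
    return repeat_mb / (repeat_percent / 100.0)


# Toy sequence: 4 of the 6 called bases are G or C.
print(round(gc_content("ATGCGCNN"), 1))  # 66.7

# Figures from the abstract: 26.5 Mb of repeats at 6.87% of the genome.
print(round(implied_genome_size_mb(26.5, 6.87), 1))  # 385.7
```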

  10. A robust method to analyze copy number alterations of less than 100 kb in single cells using oligonucleotide array CGH.

    PubMed

    Möhlendick, Birte; Bartenhagen, Christoph; Behrens, Bianca; Honisch, Ellen; Raba, Katharina; Knoefel, Wolfram T; Stoecklein, Nikolas H

    2013-01-01

    Comprehensive genome wide analyses of single cells have become increasingly important in cancer research, but remain a technically challenging task. Here, we provide a protocol for array comparative genomic hybridization (aCGH) of single cells. The protocol is based on an established adapter-linker PCR (WGAM) and allowed us to detect copy number alterations as small as 56 kb in single cells. In addition we report on factors influencing the success of single cell aCGH downstream of the amplification method, including the characteristics of the reference DNA, the labeling technique, the amount of input DNA, reamplification, the aCGH resolution, and data analysis. In comparison with two other commercially available non-linear single cell amplification methods, WGAM showed a very good performance in aCGH experiments. Finally, we demonstrate that cancer cells that were processed and identified by the CellSearch® System and that were subsequently isolated from the CellSearch® cartridge as single cells by fluorescence activated cell sorting (FACS) could be successfully analyzed using our WGAM-aCGH protocol. We believe that even in the era of next-generation sequencing, our single cell aCGH protocol will be a useful and (cost-) effective approach to study copy number alterations in single cells at a resolution comparable to those reported currently for single cell digital karyotyping based on next generation sequencing data.

  11. A 590 kb deletion caused by non-allelic homologous recombination between two LINE-1 elements in a patient with mesomelia-synostosis syndrome.

    PubMed

    Kohmoto, Tomohiro; Naruto, Takuya; Watanabe, Miki; Fujita, Yuji; Ujiro, Sae; Okamoto, Nana; Horikawa, Hideaki; Masuda, Kiyoshi; Imoto, Issei

    2017-04-01

    Mesomelia-synostoses syndrome (MSS) is a rare, autosomal-dominant, syndromal osteochondrodysplasia characterized by mesomelic limb shortening, acral synostoses, and multiple congenital malformations due to a non-recurrent deletion at 8q13 that always encompasses two coding genes, SULF1 and SLCO5A1. To date, five unrelated patients have been reported worldwide, and MSS was previously proposed to not be a genomic disorder associated with deletions recurring from non-allelic homologous recombination (NAHR) in at least two analyzed cases. We conducted targeted gene panel sequencing and subsequent array-based copy number analysis in an 11-year-old undiagnosed Japanese female patient with multiple congenital anomalies that included mesomelic limb shortening and detected a novel 590 kb deletion at 8q13 encompassing the same gene set as reported previously, resulting in the diagnosis of MSS. Breakpoint sequences of the deleted region in our case demonstrated the first LINE-1s (L1s)-mediated unequal NAHR event utilizing two distant L1 elements as homology substrates in this disease, which may represent a novel causative mechanism of the 8q13 deletion, expanding the range of mechanisms involved in the chromosomal rearrangements responsible for MSS. © 2017 Wiley Periodicals, Inc.

  12. Complete genome sequence and phenotype microarray analysis of Cronobacter sakazakii SP291: a persistent isolate cultured from a powdered infant formula production facility

    PubMed Central

    Yan, Qiongqiong; Power, Karen A.; Cooney, Shane; Fox, Edward; Gopinath, Gopal R.; Grim, Christopher J.; Tall, Ben D.; McCusker, Matthew P.; Fanning, Séamus

    2013-01-01

    Outbreaks of human infection linked to the powdered infant formula (PIF) food chain and associated with the bacterium Cronobacter are of concern to public health. These bacteria are regarded as opportunistic pathogens linked to life-threatening infections predominantly in neonates, who have an underdeveloped immune system. Monitoring the microbiological ecology of PIF production sites is an important step in attempting to limit the risk of contamination in the finished food product. Cronobacter species, like other microorganisms, can adapt to the production environment. These organisms are known for their desiccation tolerance, a phenotype that can aid their survival in the production site and PIF itself. In evaluating the genome data currently available for Cronobacter species, no sequence information has been published describing a Cronobacter sakazakii isolate found to persist in a PIF production facility. Here we report on the complete genome sequence of one such isolate, Cronobacter sakazakii SP291, along with its phenotypic characteristics. The genome of C. sakazakii SP291 consists of a 4.3-Mb chromosome (56.9% GC) and three plasmids, denoted as pSP291-1, [118.1-kb (57.2% GC)], pSP291-2, [52.1-kb (49.2% GC)], and pSP291-3, [4.4-kb (54.0% GC)]. When C. sakazakii SP291 was compared to the reference C. sakazakii ATCC BAA-894, which is also of PIF origin, the annotated genome data identified two interesting functional categories, comprising genes related to the bacterial stress response and resistance to antimicrobial and toxic compounds. Using a phenotypic microarray (PM), we provided a full metabolic profile comparing C. sakazakii SP291 and the previously sequenced C. sakazakii ATCC BAA-894. These data extend our understanding of the genome of this important neonatal pathogen and provide further insights into the genotypes associated with features that can contribute to its persistence in the PIF environment. PMID:24032028

  13. A novel tandem repeat sequence located on human chromosome 4p: isolation and characterization.

    PubMed

    Kogi, M; Fukushige, S; Lefevre, C; Hadano, S; Ikeda, J E

    1997-06-01

    In an effort to analyze the genomic region of the distal half of human chromosome 4p, to where Huntington disease and other diseases have been mapped, we have isolated the cosmid clone (CRS447) that was likely to contain a region with specific repeat sequences. Clone CRS447 was subjected to detailed analysis, including chromosome mapping, restriction mapping, and DNA sequencing. Chromosome mapping by both a human-CHO hybrid cell panel and FISH revealed that CRS447 was predominantly located in the 4p15.1-15.3 region. CRS447 was shown to consist of tandem repeats of 4.7-kb units present on chromosome 4p. A single EcoRI unit was subcloned (pRS447), and the complete sequence was determined as 4752 nucleotides. When pRS447 was used as a probe, the number of copies of this repeat per haploid genome was estimated to be 50-70. Sequence analysis revealed that it contained two internal CA repeats and one putative ORF. Database search established that this sequence was unreported. However, two homologous STS markers were found in the database. We concluded that CRS447/pRS447 is a novel tandem repeat sequence that is mainly specific to human chromosome 4p.

  14. Molecular Characterization of Transgene Integration by Next-Generation Sequencing in Transgenic Cattle

    PubMed Central

    Zhang, Ran; Yin, Yinliang; Zhang, Yujun; Li, Kexin; Zhu, Hongxia; Gong, Qin; Wang, Jianwu; Hu, Xiaoxiang; Li, Ning

    2012-01-01

    As the number of transgenic livestock increases, reliable detection and molecular characterization of transgene integration sites and copy number are crucial not only for interpreting the relationship between the integration site and the specific phenotype but also for commercial and economic demands. However, the ability of conventional PCR techniques to detect incomplete and multiple integration events is limited, making it technically challenging to characterize transgenes. Next-generation sequencing has enabled cost-effective, routine and widespread high-throughput genomic analysis. Here, we demonstrate the use of next-generation sequencing to extensively characterize cattle harboring a 150-kb human lactoferrin transgene that was initially analyzed by chromosome walking without success. Using this approach, the sites upstream and downstream of the target gene integration site in the host genome were identified at the single nucleotide level. The sequencing result was verified by event-specific PCR for the integration sites and FISH for the chromosomal location. Sequencing depth analysis revealed that multiple copies of the incomplete target gene and the vector backbone were present in the host genome. Upon integration, complex recombination was also observed between the target gene and the vector backbone. These findings indicate that next-generation sequencing is a reliable and accurate approach for the molecular characterization of the transgene sequence, integration sites and copy number in transgenic species. PMID:23185606

  15. A Comparative Genomics Strategy for Targeted Discovery of Single-Nucleotide Polymorphisms and Conserved-Noncoding Sequences in Orphan Crops

    PubMed Central

    Feltus, F.A.; Singh, H.P.; Lohithaswa, H.C.; Schulze, S.R.; Silva, T.D.; Paterson, A.H.

    2006-01-01

    Completed genome sequences provide templates for the design of genome analysis tools in orphan species lacking sequence information. To demonstrate this principle, we designed 384 PCR primer pairs to conserved exonic regions flanking introns, using Sorghum/Pennisetum expressed sequence tag alignments to the Oryza genome. Conserved-intron scanning primers (CISPs) amplified single-copy loci at 37% to 80% success rates in taxa that sample much of the approximately 50-million years of Poaceae divergence. While the conserved nature of exons fostered cross-taxon amplification, the lesser evolutionary constraints on introns enhanced single-nucleotide polymorphism detection. For example, in eight rice (Oryza sativa) genotypes, polymorphism averaged 12.1 per kb in introns but only 3.6 per kb in exons. Curiously, among 124 CISPs evaluated across Oryza, Sorghum, Pennisetum, Cynodon, Eragrostis, Zea, Triticum, and Hordeum, 23 (18.5%) seemed to be subject to rigid intron size constraints that were independent of per-nucleotide DNA sequence variation. Furthermore, we identified 487 conserved-noncoding sequence motifs in 129 CISP loci. A large CISP set (6,062 primer pairs, amplifying introns from 1,676 genes) designed using an automated pipeline showed generally higher abundance in recombinogenic than in nonrecombinogenic regions of the rice genome, thus providing relatively even distribution along genetic maps. CISPs are an effective means to explore poorly characterized genomes for both DNA polymorphism and noncoding sequence conservation on a genome-wide or candidate gene basis, and also provide anchor points for comparative genomics across a diverse range of species. PMID:16607031

  16. A Deletion of More than 800 kb Is the Most Recurrent Mutation in Chilean Patients with SHOX Gene Defects.

    PubMed

    Poggi, Helena; Vera, Alejandra; Avalos, Carolina; Lagos, Marcela; Mellado, Cecilia; Aracena, Mariana; Aravena, Teresa; Garcia, Hernan; Godoy, Claudia; Cattani, Andreina; Reyes, Loreto; Lacourt, Patricia; Rumie, Hana; Mericq, Veronica; Arriaza, Marta; Martinez-Aguayo, Alejandro

    2015-01-01

    Deletions in the SHOX gene are the most frequent genetic cause of Leri-Weill syndrome and Langer mesomelic dysplasia, which are also present in idiopathic short stature. This study describes the molecular and clinical findings observed in 23 of 45 non-consanguineous Chilean patients with different phenotypes related to SHOX deficiency. Multiplex ligation-dependent probe amplification was used to detect the deletions; the SHOX coding region and deletion-flanking areas were sequenced to identify point mutations and single-nucleotide polymorphisms (SNPs). The main genetic defects identified in 21 patients consisted of deletions; one of them, a large deletion of >800 kb, was found in 8 patients. Also, a smaller deletion of >350 kb was observed in 4 patients. Although we could not precisely determine the deletion breakpoint, we were able to identify a common haplotype in 7 of the 8 patients with the larger deletion based on 22 informative SNPs. These results suggest that the large deletion-bearing allele has a common ancestor and was either introduced by European immigrants or had originated in our Amerindian population. This study allowed us to identify one recurrent deletion in Chilean patients; also, it contributed to expanding our knowledge about the genetic background of our population. © 2015 S. Karger AG, Basel.

  17. Sequence Analysis of IncA/C and IncI1 Plasmids Isolated from Multidrug-Resistant Salmonella Newport Using Single-Molecule Real-Time Sequencing.

    PubMed

    Cao, Guojie; Allard, Marc; Hoffmann, Maria; Muruvanda, Tim; Luo, Yan; Payne, Justin; Meng, Kevin; Zhao, Shaohua; McDermott, Patrick; Brown, Eric; Meng, Jianghong

    2018-06-01

    Multidrug-resistant (MDR) plasmids play an important role in disseminating antimicrobial resistance genes. To elucidate the antimicrobial resistance gene compositions in A/C incompatibility complex (IncA/C) plasmids carried by animal-derived MDR Salmonella Newport, and to investigate the spread mechanism of IncA/C plasmids, this study characterizes the complete nucleotide sequences of IncA/C plasmids by comparative analysis. Complete nucleotide sequencing of plasmids and chromosomes of six MDR Salmonella Newport strains was performed using PacBio RSII. Open reading frames were assigned using prokaryotic genome annotation pipeline (PGAP). To understand genomic diversity and evolutionary relationships among Salmonella Newport IncA/C plasmids, we included three complete IncA/C plasmid sequences with similar backbones from Salmonella Newport and Escherichia coli: pSN254, pAM04528, and peH4H, and additional 200 draft chromosomes. With the exception of canine isolate CVM22462, which contained an additional IncI1 plasmid, each of the six MDR Salmonella Newport strains contained only the IncA/C plasmid. These IncA/C plasmids (including references) ranged in size from 80.1 (pCVM21538) to 176.5 kb (pSN254) and carried various resistance genes. Resistance genes floR, tetA, tetR, strA, strB, sul, and mer were identified in all IncA/C plasmids. Additionally, bla CMY-2 and sugE were present in all IncA/C plasmids, excepting pCVM21538. Plasmid pCVM22462 was capable of being transferred by conjugation. The IncI1 plasmid pCVM22462b in CVM22462 carried bla CMY-2 and sugE. Our data showed that MDR Salmonella Newport strains carrying similar IncA/C plasmids clustered together in the phylogenetic tree using chromosome sequences and the IncA/C plasmids from animal-derived Salmonella Newport contained diverse resistance genes. In the current study, we analyzed genomic diversities and phylogenetic relationships among MDR Salmonella Newport using complete plasmids and chromosome

  18. De novo sequencing, assembly and analysis of salivary gland transcriptome of Haemaphysalis flava and identification of sialoprotein genes.

    PubMed

    Xu, Xing-Li; Cheng, Tian-Yin; Yang, Hu; Yan, Fen; Yang, Ya

    2015-06-01

    Saliva plays an important role in feeding and pathogen transmission, and the identification and analysis of tick salivary gland (SG) proteins is an active area of anti-tick research. Herein, we present the first description of the SG transcriptome of Haemaphysalis flava using next-generation sequencing (NGS). A total of over 143 million high-quality reads were assembled into 54,357 unigenes, of which 20,145 (37.06%) had significant similarities to proteins in the Swiss-Prot database. A total of 13,513 annotated sequences were associated with GO terms. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showed that 14,280 unigenes were assigned to 279 KEGG pathways in total. Reads per kb per million reads (RPKM) analysis showed that there were 3035 down-regulated unigenes and 2260 up-regulated unigenes in the engorged ticks (ET) compared with the semi-engorged ticks (SET). Several important genes associated with blood feeding and ingestion, encoding secreted salivary proteins including cysteine, longipain, 4D8, calreticulin, metalloproteases, serine protease inhibitor, enolase, heat shock protein and AV422, were identified in SG. The qRT-PCR results confirmed that the expression patterns of these genes (except for the longipain gene) were consistent with the RNA-seq results. This de novo assembly of the SG transcriptome of H. flava not only provides more opportunities for screening and cloning functional genes, but also forms a solid basis for further insight into the changes of salivary proteins during blood-feeding. Copyright © 2015 Elsevier B.V. All rights reserved.
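
    The RPKM measure named in this abstract normalizes a read count by transcript length (in kb) and by sequencing depth (in millions of mapped reads). The sketch below shows the standard formula only; the function name `rpkm` and the example numbers are hypothetical and are not taken from the study.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count / (length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (length_bp * total_mapped_reads)
    """
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (length_bp * total_mapped_reads)


# Hypothetical unigene: 500 reads on a 2,000-bp sequence, 10 million mapped reads.
example_rpkm = rpkm(500, 2_000, 10_000_000)
print(example_rpkm)
```

    Comparing RPKM values for the same unigene between two libraries gives the fold change used to call it up- or down-regulated; the thresholds applied in the study are not stated in the abstract.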

  19. Genetic mapping of the LOBED LEAF 1 (ClLL1) gene to a 127.6-kb region in watermelon (Citrullus lanatus L.)

    PubMed Central

    Wei, Chunhua; Chen, Xiner; Wang, Zhongyuan; Liu, Qiyan; Li, Hao; Zhang, Yong; Ma, Jianxiang; Yang, Jianqiang

    2017-01-01

    The lobed leaf character is a unique morphologic trait in crops, featuring many potential advantages for agricultural productivity. Although the majority of watermelon varieties feature lobed leaves, the genetic factors responsible for lobed leaf formation remain elusive. The F2:3 leaf shape segregating population offers the opportunity to study the underlying mechanism of lobed leaf formation in watermelon. Genetic analysis revealed that a single dominant allele (designated ClLL1) controlled the lobed leaf trait. A large-sized F3:4 population derived from F2:3 individuals was used to map ClLL1. A total of 5,966 reliable SNPs and indels were identified genome-wide via a combination of BSA and RNA-seq. Using the validated SNP and indel markers, the location of ClLL1 was narrowed down to a 127.6-kb region between markers W08314 and W07061, containing 23 putative ORFs. Expression analysis via qRT-PCR revealed differential expression patterns (fold-changes above 2-fold or below 0.5-fold) of three ORFs (ORF3, ORF11, and ORF18) between lobed and non-lobed leaf plants. Based on gene annotation and expression analysis, ORF18 (encoding an uncharacterized protein) and ORF22 (encoding a homeobox-leucine zipper-like protein) were considered as most likely candidate genes. Furthermore, sequence analysis revealed no polymorphisms in cDNA sequences of ORF18; however, two notable deletions were identified in ORF22. This study is the first report to map a leaf shape gene in watermelon and will facilitate cloning and functional characterization of ClLL1 in future studies. PMID:28704497

  20. Genetic mapping of the LOBED LEAF 1 (ClLL1) gene to a 127.6-kb region in watermelon (Citrullus lanatus L.).

    PubMed

    Wei, Chunhua; Chen, Xiner; Wang, Zhongyuan; Liu, Qiyan; Li, Hao; Zhang, Yong; Ma, Jianxiang; Yang, Jianqiang; Zhang, Xian

    2017-01-01

    The lobed leaf character is a unique morphologic trait in crops, featuring many potential advantages for agricultural productivity. Although the majority of watermelon varieties feature lobed leaves, the genetic factors responsible for lobed leaf formation remain elusive. The F2:3 leaf shape segregating population offers the opportunity to study the underlying mechanism of lobed leaf formation in watermelon. Genetic analysis revealed that a single dominant allele (designated ClLL1) controlled the lobed leaf trait. A large-sized F3:4 population derived from F2:3 individuals was used to map ClLL1. A total of 5,966 reliable SNPs and indels were identified genome-wide via a combination of BSA and RNA-seq. Using the validated SNP and indel markers, the location of ClLL1 was narrowed down to a 127.6-kb region between markers W08314 and W07061, containing 23 putative ORFs. Expression analysis via qRT-PCR revealed differential expression patterns (fold-changes above 2-fold or below 0.5-fold) of three ORFs (ORF3, ORF11, and ORF18) between lobed and non-lobed leaf plants. Based on gene annotation and expression analysis, ORF18 (encoding an uncharacterized protein) and ORF22 (encoding a homeobox-leucine zipper-like protein) were considered as most likely candidate genes. Furthermore, sequence analysis revealed no polymorphisms in cDNA sequences of ORF18; however, two notable deletions were identified in ORF22. This study is the first report to map a leaf shape gene in watermelon and will facilitate cloning and functional characterization of ClLL1 in future studies.

  1. Population-genetic analysis of HvABCG31 promoter sequence in wild barley (Hordeum vulgare ssp. spontaneum)

    PubMed Central

    2012-01-01

    Background: The cuticle is an important adaptive structure whose origin played a crucial role in the transition of plants from aqueous to terrestrial conditions. HvABCG31/Eibi1 is an ABCG transporter gene, involved in cuticle formation that was recently identified in wild barley (Hordeum vulgare ssp. spontaneum). To study the genetic variation of HvABCG31 in different habitats, its 2 kb promoter region was sequenced from 112 wild barley accessions collected from five natural populations from southern and northern Israel. The sites included three mesic and two xeric habitats, and differed in annual rainfall, soil type, and soil water capacity. Results: Phylogenetic analysis of the aligned HvABCG31 promoter sequences clustered the majority of accessions (69 out of 71) from the three northern mesic populations into one cluster, while all 21 accessions from the Dead Sea area, a xeric southern population, and two isolated accessions (one from a xeric population at Mitzpe Ramon and one from the xeric ‘African Slope’ of “Evolution Canyon”) formed the second cluster. The southern arid populations included six haplotypes, but they differed from the consensus sequence at a large number of positions, while the northern mesic populations included 15 haplotypes that were, on average, more similar to the consensus sequence. Most of the haplotypes (20 of 22) were unique to a population. Interestingly, higher genetic variation occurred within populations (54.2%) than among populations (45.8%). Analysis of the promoter region detected a large number of transcription factor binding sites: 121–128 and 121–134 sites in the two southern arid populations, and 123–128, 125–128, and 123–125 sites in the three northern mesic populations. Three types of TFBSs were significantly enriched: those related to GA (gibberellin), Dof (DNA binding with one finger), and light. Conclusions: Drought stress and adaptive natural selection may have been important determinants in the observed

  2. Joint Sequence Analysis: Association and Clustering

    ERIC Educational Resources Information Center

    Piccarreta, Raffaella

    2017-01-01

    In its standard formulation, sequence analysis aims at finding typical patterns in a set of life courses represented as sequences. Recently, some proposals have been introduced to jointly analyze sequences defined on different domains (e.g., work career, partnership, and parental histories). We introduce measures to evaluate whether a set of…

  3. Characterization of a 3.3-kb plasmid of Escherichia coli O157:H7 and evaluation of stability of genetically engineered derivatives of this plasmid expressing green fluorescence.

    PubMed

    Sharma, Vijay K; Stanton, Thaddeus B

    2008-12-10

    Enterohemorrhagic Escherichia coli (EHEC) O157:H7 (strain 86-24) harbors a 3.3-kb plasmid (pSP70) that does not encode a selectable phenotype. A 1.1-kb fragment of DNA encoding kanamycin resistance (Kan(r)) was inserted by in vitro transposon mutagenesis at a random location on pSP70 to construct pSP70-Kan(r) that conferred Kan(r) to the host E. coli strain. Oligonucleotides complementary to 5' and 3' ends of the fragment encoding Kan(r) were used for initiating nucleotide sequencing from the plus and minus strands of pSP70, and thereafter primer walking was used to determine nucleotide sequence of pSP70. Analysis of nucleotide sequence revealed that pSP70 contained 3306 base pairs in its genome and that the genome was almost 100% identical to nucleotide sequences of small plasmids identified in EHEC O157:H7 isolates from Germany and Japan. A DNA cassette encoding a green fluorescent protein (GFP), ampicillin resistance (Amp(r)), and a double transcriptional terminator (DT) was cloned in pSP70 either at the BamHI site (created by deletion of mobA by PCR) or at the NsiI site located downstream of mobA to generate pSP70 ΔmobA-GFP/Amp(r)/DT (pSM431) and pSP70-GFP/Amp(r)/DT (pSM433), respectively. Introduction of pSM431 or pSM433 into EHEC O157:H7 yielded ampicillin-resistant colonies that glowed green under UV illumination. Consecutive subcultures of EHEC O157:H7 carrying pSM431 or pSM433 under conditions simulating the environment of bovine intestine (no selective antibiotic, incubation temperature of 39 °C, with or without oxygen) demonstrated that these plasmids were highly stable, as greater than 95% of the isolates recovered from these subcultures were positive for green fluorescence. These findings indicate that EHEC O157:H7 carrying pSM431 or pSM433 would be useful for studying persistence and shedding of this important food-borne pathogen in cattle.

  4. Genomic insights from whole genome sequencing of four clonal outbreak Campylobacter jejuni assessed within the global C. jejuni population.

    PubMed

    Clark, Clifford G; Berry, Chrystal; Walker, Matthew; Petkau, Aaron; Barker, Dillon O R; Guan, Cai; Reimer, Aleisha; Taboada, Eduardo N

    2016-12-03

    Whole genome sequencing (WGS) is useful for determining clusters of human cases, investigating outbreaks, and defining the population genetics of bacteria. It also provides information about other aspects of bacterial biology, including classical typing results, virulence, and adaptive strategies of the organism. Cell culture invasion and protein expression patterns of four related multilocus sequence type 21 (ST21) C. jejuni isolates from a significant Canadian water-borne outbreak were previously associated with the presence of a CJIE1 prophage. Whole genome sequencing was used to examine the genetic diversity among these isolates and confirm that previous observations could be attributed to differential prophage carriage. Moreover, we sought to determine the presence of genome sequences that could be used as surrogate markers to delineate outbreak-associated isolates. Differential carriage of the CJIE1 prophage was identified as the major genetic difference among the four outbreak isolates. High quality single-nucleotide variant (hqSNV) and core genome multilocus sequence typing (cgMLST) clustered these isolates within expanded datasets consisting of additional C. jejuni strains. The number and location of homopolymeric tract regions was identical in all four outbreak isolates but differed from all other C. jejuni examined. Comparative genomics and PCR amplification enabled the identification of large chromosomal inversions of approximately 93 kb and 388 kb within the outbreak isolates associated with transducer-like proteins containing long nucleotide repeat sequences. The 93-kb inversion was characteristic of the outbreak-associated isolates, and the gene content of this inverted region displayed high synteny with the reference strain. The four outbreak isolates were clonally derived and differed mainly in the presence of the CJIE1 prophage, validating earlier findings linking the prophage to phenotypic differences in virulence assays and protein expression

  5. Enhanced Optical Breakdown in KB Cells Labeled with Folate-Targeted Silver/Dendrimer Composite Nanodevices

    PubMed Central

    Tse, Christine; Zohdy, Marwa J.; Ye, Jing Yong; O'Donnell, Matthew; Lesniak, Wojciech; Balogh, Lajos

    2010-01-01

    Enhanced optical breakdown of KB cells (a human oral epidermoid cancer cell known to overexpress folate receptors) targeted with silver/dendrimer composite nanodevices (CNDs) is described. CNDs {(Ag0)25-PAMAM_E5.(NH2)42(NGly)74(NFA)2.7} were fabricated by reactive encapsulation, using a biocompatible template of dendrimer-folic acid (FA) conjugates. Preferential uptake of the folate-targeted CNDs (of various treatment concentrations and surface functionality) by KB cells was visualized with confocal microscopy and transmission electron microscopy (TEM). Intracellular laser-induced optical breakdown (LIOB) threshold and dynamics were detected and characterized by high-frequency ultrasonic monitoring of resulting transient bubble events. When irradiated with a near-infrared (NIR), femtosecond laser, the CND-targeted KB cells acted as well-confined activators of laser energy, enhancing nonlinear energy absorption, exhibiting a significant reduction in breakdown threshold, and thus selectively promoting intracellular LIOB. PMID:20883823

  6. RSAT: regulatory sequence analysis tools.

    PubMed

    Thomas-Chollier, Morgane; Sand, Olivier; Turatsinze, Jean-Valéry; Janky, Rekin's; Defrance, Matthieu; Vervisch, Eric; Brohée, Sylvain; van Helden, Jacques

    2008-07-01

    The regulatory sequence analysis tools (RSAT, http://rsat.ulb.ac.be/rsat/) is a software suite that integrates a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. The suite includes programs for sequence retrieval, pattern discovery, phylogenetic footprint detection, pattern matching, genome scanning and feature map drawing. Random controls can be performed with random gene selections or by generating random sequences according to a variety of background models (Bernoulli, Markov). Beyond the original word-based pattern-discovery tools (oligo-analysis and dyad-analysis), we recently added a battery of tools for matrix-based detection of cis-acting elements, with some original features (adaptive background models, Markov-chain estimation of P-values) that do not exist in other matrix-based scanning tools. The web server offers an intuitive interface, where each program can be accessed either separately or connected to the other tools. In addition, the tools are now available as web services, enabling their integration in programmatic workflows. Genomes are regularly updated from various genome repositories (NCBI and EnsEMBL) and 682 organisms are currently supported. Since 1998, the tools have been used by several hundreds of researchers from all over the world. Several predictions made with RSAT were validated experimentally and published.
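
    To illustrate the two background-model families mentioned in this abstract (Bernoulli and Markov), the sketch below generates random DNA under each. It is a generic illustration and not RSAT code: all function names are hypothetical, the Markov model is limited to order 1, and the training string is a toy example.

```python
import random
from collections import defaultdict


def bernoulli_sequence(length, freqs, rng):
    """Draw each base independently according to fixed base frequencies."""
    letters = list(freqs)
    weights = [freqs[b] for b in letters]
    return "".join(rng.choices(letters, weights=weights, k=length))


def train_markov1(training_seq):
    """Count first-order transitions (previous base -> next base)."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(training_seq, training_seq[1:]):
        counts[prev][nxt] += 1
    return counts


def markov1_sequence(length, counts, rng, alphabet="ACGT"):
    """Draw each base conditioned on the preceding base."""
    if length <= 0:
        return ""
    seq = [rng.choice(alphabet)]
    while len(seq) < length:
        row = counts.get(seq[-1])
        if row:
            letters = list(row)
            weights = [row[b] for b in letters]
            seq.append(rng.choices(letters, weights=weights)[0])
        else:
            # No transition observed from this base: fall back to uniform.
            seq.append(rng.choice(alphabet))
    return "".join(seq)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(bernoulli_sequence(20, uniform, rng))
print(markov1_sequence(20, train_markov1("ACGTACGTAC"), rng))
```

    A Bernoulli model preserves only base composition, whereas a Markov model also preserves short-range dependencies between neighboring bases, which is why the two can give different random controls for the same motif.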

  7. Complete Chloroplast Genome Sequences of Mongolia Medicine Artemisia frigida and Phylogenetic Relationships with Other Plants

    PubMed Central

    Liu, Yue; Huo, Naxin; Dong, Lingli; Wang, Yi; Zhang, Shuixian; Young, Hugh A.; Feng, Xiaoxiao; Gu, Yong Qiang

    2013-01-01

    Background: Artemisia frigida Willd. is an important Mongolian traditional medicinal plant with pharmacological functions of stanch and detumescence. However, there is little sequence and genomic information available for Artemisia frigida, which makes phylogenetic identification, evolutionary studies, and genetic improvement of its value very difficult. We report the complete chloroplast genome sequence of Artemisia frigida based on 454 pyrosequencing. Methodology/Principal Findings: The complete chloroplast genome of Artemisia frigida is 151,076 bp, including a large single copy (LSC) region of 82,740 bp, a small single copy (SSC) region of 18,394 bp and a pair of inverted repeats (IRs) of 24,971 bp each. The genome contains 114 unique genes and 18 duplicated genes. The chloroplast genome of Artemisia frigida contains a small 3.4 kb inversion within a large 23 kb inversion in the LSC region, a unique feature in Asteraceae. The gene order in the SSC region of Artemisia frigida is inverted compared with the other 6 Asteraceae species with the chloroplast genomes sequenced. This inversion is likely caused by an intramolecular recombination event that occurred only in Artemisia frigida. The existence of rich SSR loci in the Artemisia frigida chloroplast genome provides a rare opportunity to study population genetics of this Mongolian medicinal plant. Phylogenetic analysis demonstrates a sister relationship between Artemisia frigida and four other species in Asteraceae, including Ageratina adenophora, Helianthus annuus, Guizotia abyssinica and Lactuca sativa, based on 61 protein-coding sequences. Furthermore, Artemisia frigida was placed in the tribe Anthemideae in the subfamily Asteroideae (Asteraceae) based on ndhF and trnL-F sequence comparisons. Conclusion: The chloroplast genome sequence of Artemisia frigida was assembled and analyzed in this study, representing the first plastid genome sequenced in the Anthemideae tribe. This complete chloroplast genome sequence will be

  8. PISMA: A Visual Representation of Motif Distribution in DNA Sequences.

    PubMed

    Alcántara-Silva, Rogelio; Alvarado-Hermida, Moisés; Díaz-Contreras, Gibrán; Sánchez-Barrios, Martha; Carrera, Samantha; Galván, Silvia Carolina

    2017-01-01

    Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypotheses, PISMA aims at identifying motifs on DNA sequences, counting them, and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code-like scheme, a gene-map-like scheme, and a transcript scheme. We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. PISMA is developed in Java; it is executable in any type of hardware and in diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf.
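
    The kind of motif counting and bar-code-style display described in this abstract can be sketched in a few lines. This is an independent illustration in Python, not PISMA's Java implementation; the function names are hypothetical, and the length limits simply mirror the ranges quoted above (motifs of 2 to 10 bases, sequences up to 10 kb).

```python
def motif_positions(sequence, motif):
    """Return 0-based start positions of every (possibly overlapping) match."""
    seq, mot = sequence.upper(), motif.upper()
    if not 2 <= len(mot) <= 10:
        raise ValueError("motif length must be between 2 and 10 bases")
    if len(seq) > 10_000:
        raise ValueError("sequence must not exceed 10 kb")
    positions = []
    start = seq.find(mot)
    while start != -1:
        positions.append(start)
        start = seq.find(mot, start + 1)  # step by 1 to allow overlaps
    return positions


def barcode(sequence, motif):
    """Text track with '|' at each motif start and '.' elsewhere."""
    track = ["."] * len(sequence)
    for pos in motif_positions(sequence, motif):
        track[pos] = "|"
    return "".join(track)


dna = "ACGCGTTCG"
print(motif_positions(dna, "CG"))  # CpG start positions
print(dna)
print(barcode(dna, "CG"))
```

    Printing the sequence above its track makes clusters and gaps of a motif such as CpG visible at a glance, which is the idea behind the graphical schemes the abstract describes.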

  9. Calibrating genomic and allelic coverage bias in single-cell sequencing.

    PubMed

    Zhang, Cheng-Zhong; Adalsteinsson, Viktor A; Francis, Joshua; Cornils, Hauke; Jung, Joonil; Maire, Cecile; Ligon, Keith L; Meyerson, Matthew; Love, J Christopher

    2015-04-16

    Artifacts introduced in whole-genome amplification (WGA) make it difficult to derive accurate genomic information from single-cell genomes and require different analytical strategies from bulk genome analysis. Here, we describe statistical methods to quantitatively assess the amplification bias resulting from whole-genome amplification of single-cell genomic DNA. Analysis of single-cell DNA libraries generated by different technologies revealed universal features of the genome coverage bias predominantly generated at the amplicon level (1-10 kb). The magnitude of coverage bias can be accurately calibrated from low-pass sequencing (∼0.1 × ) to predict the depth-of-coverage yield of single-cell DNA libraries sequenced at arbitrary depths. We further provide a benchmark comparison of single-cell libraries generated by multi-strand displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC). Finally, we develop statistical models to calibrate allelic bias in single-cell whole-genome amplification and demonstrate a census-based strategy for efficient and accurate variant detection from low-input biopsy samples.

  10. Calibrating genomic and allelic coverage bias in single-cell sequencing

    PubMed Central

    Francis, Joshua; Cornils, Hauke; Jung, Joonil; Maire, Cecile; Ligon, Keith L.; Meyerson, Matthew; Love, J. Christopher

    2016-01-01

    Artifacts introduced in whole-genome amplification (WGA) make it difficult to derive accurate genomic information from single-cell genomes and require different analytical strategies from bulk genome analysis. Here, we describe statistical methods to quantitatively assess the amplification bias resulting from whole-genome amplification of single-cell genomic DNA. Analysis of single-cell DNA libraries generated by different technologies revealed universal features of the genome coverage bias predominantly generated at the amplicon level (1–10 kb). The magnitude of coverage bias can be accurately calibrated from low-pass sequencing (~0.1 ×) to predict the depth-of-coverage yield of single-cell DNA libraries sequenced at arbitrary depths. We further provide a benchmark comparison of single-cell libraries generated by multi-strand displacement amplification (MDA) and multiple annealing and looping-based amplification cycles (MALBAC). Finally, we develop statistical models to calibrate allelic bias in single-cell whole-genome amplification and demonstrate a census-based strategy for efficient and accurate variant detection from low-input biopsy samples. PMID:25879913

  11. Comparative sequence analysis of a region on human chromosome 13q14, frequently deleted in B-cell chronic lymphocytic leukemia, and its homologous region on mouse chromosome 14.

    PubMed

    Kapanadze, B; Makeeva, N; Corcoran, M; Jareborg, N; Hammarsund, M; Baranova, A; Zabarovsky, E; Vorontsova, O; Merup, M; Gahrton, G; Jansson, M; Yankovsky, N; Einhorn, S; Oscier, D; Grandér, D; Sangfelt, O

    2000-12-15

    Previous studies have indicated the presence of a putative tumor suppressor gene on human chromosome 13q14, commonly deleted in patients with B-cell chronic lymphocytic leukemia (B-CLL). We have recently identified a minimally deleted region encompassing parts of two adjacent genes, termed LEU1 and LEU2 (leukemia-associated genes 1 and 2), and several additional transcripts. In addition, 50 kb centromeric to this region we have identified another gene, LEU5/RFP2. To elucidate further the complex genomic organization of this region, we have identified, mapped, and sequenced the homologous region in the mouse. Fluorescence in situ hybridization analysis demonstrated that the region maps to mouse chromosome 14. The overall organization and gene order in this region were found to be highly conserved in the mouse. Sequence comparison between the human deletion hotspot region and its homologous mouse region revealed a high degree of sequence conservation with an overall score of 74%. However, our data also show that in terms of transcribed sequences, only two of those, human LEU2 and LEU5/RFP2, are clearly conserved, strengthening the case for these genes as putative candidate B-CLL tumor suppressor genes.

  12. Draft genome sequence of the silver pomfret fish, Pampus argenteus.

    PubMed

    AlMomin, Sabah; Kumar, Vinod; Al-Amad, Sami; Al-Hussaini, Mohsen; Dashti, Talal; Al-Enezi, Khaznah; Akbar, Abrar

    2016-01-01

    Silver pomfret, Pampus argenteus, is a fish species from coastal waters. Despite its high commercial value, this edible fish has not been sequenced. Hence, its genetic and genomic studies have been limited. We report the first draft genome sequence of the silver pomfret obtained using a Next Generation Sequencing (NGS) technology. We assembled 38.7 Gb of nucleotides into scaffolds of 350 Mb with N50 of about 1.5 kb, using high quality paired end reads. These scaffolds represent 63.7% of the estimated silver pomfret genome length. The newly sequenced and assembled genome has 11.06% repetitive DNA regions, and this percentage is comparable to that of the tilapia genome. The genome analysis predicted 16 322 genes. About 91% of these genes showed homology with known proteins. Many gene clusters were annotated to protein and fatty-acid metabolism pathways that may be important in the context of the meat texture and immune system developmental processes. The reference genome can pave the way for the identification of many other genomic features that could improve breeding and population-management strategies, and it can also help characterize the genetic diversity of P. argenteus.
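    The N50 quoted in this abstract is a standard contiguity statistic: the scaffold length at which the running total of lengths, sorted longest first, reaches half of the total assembled bases. A minimal sketch of that calculation follows; the function name and the toy lengths are illustrative and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


# Hypothetical toy assembly: total 30, half 15, reached at the second scaffold.
toy_scaffolds = [10, 8, 6, 4, 2]
print(n50(toy_scaffolds))
```

    An N50 of about 1.5 kb on a 350 Mb assembly therefore means that half of the assembled bases sit in scaffolds of roughly 1.5 kb or longer.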

  13. Prediction of water-rock interaction to 50 kb and 1,000 °C with equations of state for aqueous species

    NASA Astrophysics Data System (ADS)

    Sverjensky, D. A.; Harrison, B. W.; Azzolini, D.

    2012-12-01

    Comprehensive quantitative theoretical evaluation of water-rock interactions under deep crustal and upper mantle conditions has long been restricted to a pressure of 5.0 kb - too low to address mantle metasomatism in subduction zones or the origin of diamond. The reason for this restriction is the lack of information on the dielectric constant of water (ɛH2O) needed for the revised Helgeson-Kirkham-Flowers (HKF) equations for aqueous species [1]. Equation of state coefficients are available for hundreds of aqueous species in SUPCRT92 [2], but calculations can only be made to 5.0 kb. One way around this involves empirical extrapolation of equilibrium constants as functions of the logarithm of the density of water (ρH2O) [3]. However, this approach is best suited to simple systems. In order to model water-rock interactions, scores of equilibrium constants involving minerals and aqueous species must be known and internal consistency maintained. In the present study, the applicability of the HKF equations for aqueous species was extended to 50 kb by developing estimates of ɛH2O. We used a statistical mechanically-based equation for ɛ of a hard-sphere fluid applicable to water and other fluids [4]. It was calibrated with experimental data [5] and data from a comprehensive analysis of the literature [6] and extrapolated to a density of 1.1 g cm⁻³. Values of ln(ɛH2O) were found to be linear with ln(ρH2O) enabling estimation of ɛH2O to 50 kb. Values of ρH2O were computed with a comprehensive evaluation [7] chosen because it is closely consistent with experimental data at less than 10 kb [8] as well as fluid inclusion studies to 40 kb [9]. Standard Gibbs free energies of water as a function of temperature and pressure were also calculated using volumes from [7]. The resulting dielectric constants were tested at 727 °C and 50 kb by comparison with the results of molecular dynamics [10] and ab initio quantum chemical calculations [11]. Additional testing was carried

  14. Auditory sequence analysis and phonological skill

    PubMed Central

    Grube, Manon; Kumar, Sukhbinder; Cooper, Freya E.; Turton, Stuart; Griffiths, Timothy D.

    2012-01-01

    This work tests the relationship between auditory and phonological skill in a non-selected cohort of 238 school students (age 11) with the specific hypothesis that sound-sequence analysis would be more relevant to phonological skill than the analysis of basic, single sounds. Auditory processing was assessed across the domains of pitch, time and timbre; a combination of six standard tests of literacy and language ability was used to assess phonological skill. A significant correlation between general auditory and phonological skill was demonstrated, plus a significant, specific correlation between measures of phonological skill and the auditory analysis of short sequences in pitch and time. The data support a limited but significant link between auditory and phonological ability with a specific role for sound-sequence analysis, and provide a possible new focus for auditory training strategies to aid language development in early adolescence. PMID:22951739

  15. ChloroKB: A Web Application for the Integration of Knowledge Related to Chloroplast Metabolic Network.

    PubMed

    Gloaguen, Pauline; Bournais, Sylvain; Alban, Claude; Ravanel, Stéphane; Seigneurin-Berny, Daphné; Matringe, Michel; Tardif, Marianne; Kuntz, Marcel; Ferro, Myriam; Bruley, Christophe; Rolland, Norbert; Vandenbrouck, Yves; Curien, Gilles

    2017-06-01

    Higher plants, as autotrophic organisms, are effective sources of molecules. They hold great promise for metabolic engineering, but the behavior of plant metabolism at the network level is still incompletely described. Although structural models (stoichiometry matrices) and pathway databases are extremely useful, they cannot describe the complexity of the metabolic context, and new tools are required to visually represent integrated biocurated knowledge for use by both humans and computers. Here, we describe ChloroKB, a Web application (http://chlorokb.fr/) for visual exploration and analysis of the Arabidopsis (Arabidopsis thaliana) metabolic network in the chloroplast and related cellular pathways. The network was manually reconstructed through extensive biocuration to provide transparent traceability of experimental data. Proteins and metabolites were placed in their biological context (spatial distribution within cells, connectivity in the network, participation in supramolecular complexes, and regulatory interactions) using CellDesigner software. The network contains 1,147 reviewed proteins (559 localized exclusively in plastids, 68 in at least one additional compartment, and 520 outside the plastid), 122 proteins awaiting biochemical/genetic characterization, and 228 proteins for which genes have not yet been identified. The visual presentation is intuitive and browsing is fluid, providing instant access to the graphical representation of integrated processes and to a wealth of refined qualitative and quantitative data. ChloroKB will be a significant support for structural and quantitative kinetic modeling, for biological reasoning, when comparing novel data with established knowledge, for computer analyses, and for educational purposes. ChloroKB will be enhanced by continuous updates following contributions from plant researchers. © 2017 American Society of Plant Biologists. All Rights Reserved.

  16. Molecular Cloning and Analysis of L(1)ogre, a Locus of Drosophila Melanogaster with Prominent Effects on the Postembryonic Development of the Central Nervous System

    PubMed Central

    Watanabe, T.; Kankel, D. R.

    1990-01-01

    Previous genetic studies have shown that wild-type function of the l(1)ogre (lethal (1) optic ganglion reduced) locus is essential for the generation and/or maintenance of the postembryonic neuroblasts including those from which the optic lobe is descended. In the present study molecular isolation and characterization of the l(1)ogre locus was carried out to study the structure and expression of this gene in order to gain information about the nature of l(1)ogre function and its relevance to the development of the central nervous system. About 70 kilobases (kb) of genomic DNA were isolated that spanned the region where l(1)ogre was known to reside. Southern analysis of a l(1)ogre mutation and subsequent P element-mediated DNA transformation mapped the l(1)ogre(+) function within a genomic fragment of 12.5 kb. Northern analyses showed that a 2.9-kb message transcribed from this 12.5-kb region represented l(1)ogre. A 2.15-kb portion of a corresponding cDNA clone was sequenced. An open reading frame (ORF) of 1,086 base pairs was found, and a protein sequence of 362 amino acids with one highly hydrophobic segment was deduced from conceptual translation of this ORF. PMID:1963867
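    The ORF length and the deduced protein length reported above are linked by the triplet code: each codon of three bases specifies one amino acid. The short sketch below checks that arithmetic; the helper name is hypothetical, and whether the 1,086 bp figure includes the stop codon is not stated in the abstract, so the function simply counts codons.

```python
def codons_in_orf(orf_length_bp):
    """Return the number of complete codons in an ORF of the given length.

    Raises ValueError when the length is not a multiple of three,
    because such a span cannot be read in a single frame end to end.
    """
    if orf_length_bp <= 0 or orf_length_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return orf_length_bp // 3


# Figures from the abstract: a 1,086 bp ORF and a 362-residue protein.
print(codons_in_orf(1086))
```

    With the abstract's figures, 1,086 / 3 gives 362 codons, which matches the 362 amino acids reported.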

  17. Insights into the Evolution of Mitochondrial Genome Size from Complete Sequences of Citrullus lanatus and Cucurbita pepo (Cucurbitaceae)

    PubMed Central

    Alverson, Andrew J.; Wei, XiaoXin; Rice, Danny W.; Stern, David B.; Barry, Kerrie; Palmer, Jeffrey D.

    2010-01-01

    The mitochondrial genomes of seed plants are unusually large and vary in size by at least an order of magnitude. Much of this variation occurs within a single family, the Cucurbitaceae, whose genomes range from an estimated 390 to 2,900 kb in size. We sequenced the mitochondrial genomes of Citrullus lanatus (watermelon: 379,236 nt) and Cucurbita pepo (zucchini: 982,833 nt)—the two smallest characterized cucurbit mitochondrial genomes—and determined their RNA editing content. The relatively compact Citrullus mitochondrial genome actually contains more and longer genes and introns, longer segmental duplications, and more discernibly nuclear-derived DNA. The large size of the Cucurbita mitochondrial genome reflects the accumulation of unprecedented amounts of both chloroplast sequences (>113 kb) and short repeated sequences (>370 kb). A low mutation rate has been hypothesized to underlie increases in both genome size and RNA editing frequency in plant mitochondria. However, despite its much larger genome, Cucurbita has a significantly higher synonymous substitution rate (and presumably mutation rate) than Citrullus but comparable levels of RNA editing. The evolution of mutation rate, genome size, and RNA editing are apparently decoupled in Cucurbitaceae, reflecting either simple stochastic variation or governance by different factors. PMID:20118192

  18. High Quality Maize Centromere 10 Sequence Reveals Evidence of Frequent Recombination Events

    PubMed Central

    Wolfgruber, Thomas K.; Nakashima, Megan M.; Schneider, Kevin L.; Sharma, Anupma; Xie, Zidian; Albert, Patrice S.; Xu, Ronghui; Bilinski, Paul; Dawe, R. Kelly; Ross-Ibarra, Jeffrey; Birchler, James A.; Presting, Gernot G.

    2016-01-01

    The ancestral centromeres of maize contain long stretches of the tandemly arranged CentC repeat. The abundance of tandem DNA repeats and centromeric retrotransposons (CR) has presented a significant challenge to completely assembling centromeres using traditional sequencing methods. Here, we report a nearly complete assembly of the 1.85 Mb maize centromere 10 from inbred B73 using PacBio technology and BACs from the reference genome project. The error rates estimated from overlapping BAC sequences are 7 × 10⁻⁶ and 5 × 10⁻⁵ for mismatches and indels, respectively. The number of gaps in the region covered by the reassembly was reduced from 140 in the reference genome to three. Three expressed genes are located between 92 and 477 kb from the inferred ancestral CentC cluster, which lies within the region of highest centromeric repeat density. The improved assembly increased the count of full-length CR from 5 to 55 and revealed a 22.7 kb segmental duplication that occurred approximately 121,000 years ago. Our analysis provides evidence of frequent recombination events in the form of partial retrotransposons, deletions within retrotransposons, chimeric retrotransposons, segmental duplications including higher order CentC repeats, a deleted CentC monomer, centromere-proximal inversions, and insertion of mitochondrial sequences. Double-strand DNA break (DSB) repair is the most plausible mechanism for these events and may be the major driver of centromere repeat evolution and diversity. In many cases examined here, DSB repair appears to be mediated by microhomology, suggesting that tandem repeats may have evolved to efficiently repair frequent DSBs in centromeres. PMID:27047500

  19. Enhanced Salt Tolerance Conferred by the Complete 2.3 kb cDNA of the Rice Vacuolar Na+/H+ Antiporter Gene Compared to 1.9 kb Coding Region with 5′ UTR in Transgenic Lines of Rice

    PubMed Central

    Amin, U. S. M.; Biswas, Sudip; Elias, Sabrina M.; Razzaque, Samsad; Haque, Taslima; Malo, Richard; Seraj, Zeba I.

    2016-01-01

    Soil salinity is one of the most challenging problems that restricts the normal growth and production of rice worldwide. It has therefore become very important to produce more salt-tolerant rice varieties. This study shows constitutive over-expression of the vacuolar Na+/H+ antiporter gene (OsNHX1) from the rice landrace (Pokkali) and attainment of an enhanced level of salinity tolerance in transgenic rice plants. It also shows that inclusion of the complete un-translated regions (UTRs) of the alternatively spliced OsNHX1 gene provides a higher level of tolerance to the transgenic rice. Two separate transformation events of the OsNHX1 gene were performed, one with a 1.9 kb region containing the 5′ UTR with CDS and the other with a 2.3 kb region including the 5′ UTR, CDS, and 3′ UTR. The transgenic plants with these two different constructs were advanced to the T3 generation, and physiological and molecular screening of homozygous plants was conducted at seedling and reproductive stages under salinity (NaCl) stress. Both transgenic lines were observed to be tolerant compared to WT plants at both physiological stages. However, the transgenic lines containing the CDS with both the 5′ and 3′ UTR were significantly more tolerant than the transgenic lines containing the OsNHX1 gene without the 3′ UTR. At the seedling stage at 12 dS/m stress, the chlorophyll content was significantly higher (P < 0.05) and the electrolyte leakage significantly lower (P < 0.05) in the order 2.3 kb > 1.9 kb > WT lines. Yield in g/plant in the best line from the 2.3 kb plants was significantly higher (P < 0.01) than in the best 1.9 kb line and WT plants at a stress of 6 dS/m. Transformation with the complete transcripts rather than the CDS may therefore provide a more durable level of tolerance. PMID:26834778

  20. SMRT sequencing of the Vitis vinifera cv. ‘Flame seedless’ genome using a SMRTbell-free library preparation from Swift Biosciences

    USDA-ARS's Scientific Manuscript database

    Single Molecule Real-Time (SMRT) sequencing provides advantages to the sequencing of complex genomes. The long reads generated are superior for resolving complex genomic regions and provide highly contiguous de novo assemblies. Current SMRTbell libraries generate average read lengths of 10-15kb. How...

  1. Minimap2: pairwise alignment for nucleotide sequences.

    PubMed

    Li, Heng

    2018-05-10

    Recent advances in sequencing technologies promise ultra-long reads of ∼100 kilobases (kb) on average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 megabases (Mb) in length. Existing alignment programs are unable to process such data at scale, or do so inefficiently, which presses for the development of new alignment algorithms. Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, ≥1 kb genomic reads at error rate ∼15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs concave gap cost for long insertions and deletions (INDELs) and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and is ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment. https://github.com/lh3/minimap2. hengli@broadinstitute.org.
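    The abstract mentions a concave gap cost for long INDELs without giving its form. One common way to obtain a concave cost is a two-piece affine function, taking the cheaper of a "short gap" line and a "long gap" line. The sketch below illustrates that idea only; the function name and all four parameter values are assumptions chosen for illustration and are not taken from the abstract.

```python
def two_piece_gap_cost(length, open1=4, ext1=2, open2=24, ext2=1):
    """Cost of a gap of `length` bases under a two-piece affine model.

    The cost is the minimum of two affine lines:
      short-gap line: open1 + length * ext1  (cheap to open, costly to extend)
      long-gap line:  open2 + length * ext2  (costly to open, cheap to extend)
    Taking the minimum makes the cost concave in the gap length, so long
    insertions and deletions are penalised less per base than short ones.
    All default parameter values here are illustrative assumptions.
    """
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return min(open1 + length * ext1, open2 + length * ext2)


# Short gaps follow the first line, long gaps the second.
for gap_length in (1, 20, 100):
    print(gap_length, two_piece_gap_cost(gap_length))
```

    With these illustrative values the two lines cross at a gap length of 20, so a 100-base gap costs 124 rather than the 204 that the short-gap line alone would charge.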

  2. Comparative analysis of the complete genome of an epidemic hospital sequence type 203 clone of vancomycin-resistant Enterococcus faecium

    PubMed Central

    2013-01-01

    Background: In this report we have explored the genomic and microbiological basis for a sustained increase in bloodstream infections at a major Australian hospital caused by Enterococcus faecium multi-locus sequence type (ST) 203, an outbreak strain that has largely replaced a predecessor ST17 sequence type. Results: To establish a ST203 reference sequence we fully assembled and annotated the genome of Aus0085, a 2009 vancomycin-resistant Enterococcus faecium (VREfm) bloodstream isolate, and the first example of a completed ST203 genome. Aus0085 has a 3.2 Mb genome, comprising a 2.9 Mb circular chromosome and six circular plasmids (2 kb–130 kb). Twelve percent of the 3222 coding sequences (CDS) in Aus0085 are not present in ST17 E. faecium Aus0004 and ST18 E. faecium TX16. Extending this comparison to an additional 12 ST17 and 14 ST203 E. faecium hospital isolate genomes revealed only six genomic regions spanning 41 kb that were present in all ST203 and absent from all ST17 genomes. The 40 CDS have predicted functions that include ion transport, riboflavin metabolism and two phosphotransferase systems. Comparison of the vancomycin resistance-conferring Tn1549 transposon between Aus0004 and Aus0085 revealed differences in transposon length and insertion site, and van locus sequence variation that correlated with a higher vancomycin MIC in Aus0085. Additional phenotype comparisons between ST17 and ST203 isolates showed that while there were no differences in biofilm-formation and killing of Galleria mellonella, ST203 isolates grew significantly faster and out-competed ST17 isolates in growth assays. Conclusions: Here we have fully assembled and annotated the first ST203 genome, and then characterized the genomic differences between ST17 and ST203 E. faecium. We also show that ST203 E. faecium are faster growing and can out-compete ST17 E. faecium. While a causal genetic basis for these phenotype differences is not provided here, this study revealed conserved genetic

  3. A prokaryotic viral sequence is expressed and conserved in mammalian brain.

    PubMed

    Yeh, Yang-Hui; Gunasekharan, Vignesh; Manuelidis, Laura

    2017-07-03

    A natural and permanent transfer of prokaryotic viral sequences to mammals has not been reported by others. Circular "SPHINX" DNAs <5 kb were previously isolated from nuclease-protected cytoplasmic particles in rodent neuronal cell lines and brain. Two of these DNAs were sequenced after Φ29 polymerase amplification, and they revealed significant but imperfect homology to segments of commensal Acinetobacter phage viruses. These findings were surprising because the brain is isolated from environmental microorganisms. The 1.76-kb DNA sequence (SPHINX 1.8), with an iteron before its ORF, was evaluated here for its expression in neural cells and brain. A rabbit affinity purified antibody generated against a peptide without homology to mammalian sequences labeled a nonglycosylated ∼41-kDa protein (spx1) on Western blots, and the signal was efficiently blocked by the competing peptide. Spx1 was resistant to limited proteinase K digestion, but was unrelated to the expression of host prion protein or its pathologic amyloid form. Remarkably, spx1 concentrated in selected brain synapses, such as those on anterior motor horn neurons that integrate many complex neural inputs. SPHINX 1.8 appears to be involved in tissue-specific differentiation, including essential functions that preserve its propagation during mammalian evolution, possibly via maternal inheritance. The data here indicate that mammals can share and exchange a larger world of prokaryotic viruses than previously envisioned.

  4. Transposon-like properties of the major, long repetitive sequence family in the genome of Physarum polycephalum

    PubMed Central

    Pearston, Douglas H.; Gordon, Mairi; Hardman, Norman

    1985-01-01

    A family of long, highly-repetitive sequences, referred to previously as `HpaII-repeats', dominates the genome of the eukaryotic slime mould Physarum polycephalum. These sequences are found exclusively in scrambled clusters. They account for about one-half of the total complement of repetitive DNA in Physarum, and represent the major sequence component found in hypermethylated, 20-50 kb segments of Physarum genomic DNA that fail to be cleaved using the restriction endonuclease HpaII. The structure of this abundant repetitive element was investigated by analysing cloned segments derived from the hypermethylated genomic DNA compartment. We show that the `HpaII-repeat' forms part of a larger repetitive DNA structure, ∼8.6 kb in length, with several structural features in common with recognised eukaryotic transposable genetic elements. Scrambled clusters of the sequence probably arise as a result of transposition-like events, during which the element preferentially recombines in either orientation with target sites located in other copies of the same repeated sequence. The target sites for transposition/recombination are not related in sequence but in all cases studied they are potentially capable of promoting the formation of small `cruciforms' or `Z-DNA' structures which might be recognised during the recombination process. PMID:16453652

  5. Structural and transcription analysis of two homologous genes for the P700 chlorophyll a-apoproteins in Chlamydomonas reinhardii: evidence for in vivo trans-splicing

    PubMed Central

    Kück, Ulrich; Choquet, Yves; Schneider, Michel; Dron, Michel; Bennoun, Pierre

    1987-01-01

    The two homologous genes for the P700 chlorophyll a-apoproteins (ps1A1 and ps1A2) are encoded by the plastom in the green alga Chlamydomonas reinhardii. The structure and organization of the two genes were determined by comparison with the homologous genes from maize using data from heterologous hybridizations as well as from DNA and RNA sequencing. While the ps1A2 (736 codons) gene shows a continuous gene organization, the ps1A1 (754 codons) gene possesses some unusual features. The discontinuous gene is split into three separate exons which are scattered around the circular chloroplast genome. Exon 1 (86 bp) is separated by ∼50 kb from exon 2 (198 bp), which is located ∼90 kb apart from exon 3 (1984 bp). All exons are flanked by intronic sequences of group II. Transcription analysis reveals that the ps1A2 gene hybridizes with a 2.8-kb transcript, while all exon regions of the ps1A1 gene are homologous to a mature mRNA of 2.7 kb. From our data we conclude that the three distantly separated exonic sequences of the ps1A1 gene constitute a functional gene which probably operates by a trans-splicing mechanism. PMID:16453785

  6. Nucleotide Sequence Diversity and Linkage Disequilibrium of Four Nuclear Loci in Foxtail Millet (Setaria italica).

    PubMed

    He, Shui-Lian; Yang, Yang; Morrell, Peter L; Yi, Ting-Shuang

    2015-01-01

    Foxtail millet (Setaria italica (L.) Beauv) is one of the earliest domesticated grains, which has been cultivated in northern China by 8,700 years before present (YBP) and across Eurasia by 4,000 YBP. Owing to a small genome and diploid nature, foxtail millet is a tractable model crop for studying functional genomics of millets and bioenergy grasses. In this study, we examined nucleotide sequence diversity, geographic structure, and levels of linkage disequilibrium at four nuclear loci (ADH1, G3PDH, IGS1 and TPI1) in representative samples of 311 landrace accessions across its cultivated range. Higher levels of nucleotide sequence and haplotype diversity were observed in samples from China relative to other sampled regions. Genetic assignment analysis classified the accessions into seven clusters based on nucleotide sequence polymorphisms. Intralocus LD decayed rapidly to half the initial value within ~1.2 kb or less.

  7. Large-scale oscillation of structure-related DNA sequence features in human chromosome 21

    NASA Astrophysics Data System (ADS)

    Li, Wentian; Miramontes, Pedro

    2006-08-01

    Human chromosome 21 is the only chromosome in the human genome that exhibits oscillation of the (G+C) content with a cycle length of hundreds of kilobases (kb) (500 kb near the right telomere). We aim at establishing the existence of a similar periodicity in structure-related sequence features in order to relate this (G+C)% oscillation to other biological phenomena. The following quantities are shown to oscillate with the same 500 kb periodicity in human chromosome 21: binding energy calculated by two sets of dinucleotide-based thermodynamic parameters, AA/TT and AAA/TTT bi- and tri-nucleotide density, 5'-TA-3' dinucleotide density, and signal for 10- or 11-base periodicity of AA/TT or AAA/TTT. These intrinsic quantities are related to structural features of the double helix of DNA molecules, such as base-pair binding, untwisting or unwinding, stiffness, and a putative tendency for nucleosome formation.

  8. Generation of sequence signatures from DNA amplification fingerprints with mini-hairpin and microsatellite primers.

    PubMed

    Caetano-Anollés, G; Gresshoff, P M

    1996-06-01

    DNA amplification fingerprinting (DAF) with mini-hairpins harboring arbitrary "core" sequences at their 3' termini was used to fingerprint a variety of templates, including PCR products and whole genomes, to establish genetic relationships between plant taxa at the interspecific and intraspecific level, and to identify closely related fungal isolates and plant accessions. No correlation was observed between the sequence of the arbitrary core, the stability of the mini-hairpin structure and DAF efficiency. Mini-hairpin primers with short arbitrary cores and primers complementary to simple sequence repeats present in microsatellites were also used to generate arbitrary signatures from amplification profiles (ASAP). The ASAP strategy is a dual-step amplification procedure that uses at least one primer in each fingerprinting stage. ASAP was able to reproducibly amplify DAF products (representing about 10-15 kb of sequence) following careful optimization of amplification parameters such as primer and template concentration. Avoidance of primer sequences partially complementary to DAF product termini was necessary in order to produce distinct fingerprints. This allowed the combinatorial use of oligomers in nucleic acid screening, with numerous ASAP fingerprinting reactions based on a limited number of primer sequences. Mini-hairpin primers and ASAP analysis significantly increased detection of polymorphic DNA, separating closely related bermudagrass (Cynodon) cultivars and detecting putatively linked markers in bulked segregant analysis of the soybean (Glycine max) supernodulation (nitrate-tolerant symbiosis) locus.

  9. Modern Computational Techniques for the HMMER Sequence Analysis

    PubMed Central

    2013-01-01

    This paper focuses on the latest research and critical reviews on modern computing architectures and on software- and hardware-accelerated algorithms for bioinformatics data analysis, with an emphasis on one of the most important sequence analysis applications, hidden Markov models (HMM). We show a detailed performance comparison of sequence analysis tools on various computing platforms recently developed in the bioinformatics community. The characteristics of sequence analysis, such as its data- and compute-intensive nature, make it very attractive to optimize and parallelize using both traditional software approaches and innovative hardware acceleration technologies. PMID:25937944

  10. Pipeline for large-scale microdroplet bisulfite PCR-based sequencing allows the tracking of hepitype evolution in tumors.

    PubMed

    Herrmann, Alexander; Haake, Andrea; Ammerpohl, Ole; Martin-Guerrero, Idoia; Szafranski, Karol; Stemshorn, Kathryn; Nothnagel, Michael; Kotsopoulos, Steve K; Richter, Julia; Warner, Jason; Olson, Jeff; Link, Darren R; Schreiber, Stefan; Krawczak, Michael; Platzer, Matthias; Nürnberg, Peter; Siebert, Reiner; Hampe, Jochen

    2011-01-01

    Cytosine methylation provides an epigenetic level of cellular plasticity that is important for development, differentiation and cancerogenesis. We adapted microdroplet PCR to bisulfite-treated target DNA in combination with second-generation sequencing to simultaneously assess DNA sequence and methylation. We show measurement of methylation status in a wide range of target sequences (total 34 kb) with an average coverage of 95% (median 100%) and good correlation to the opposite strand (rho = 0.96) and to pyrosequencing (rho = 0.87). Data from lymphoma and colorectal cancer samples for SNRPN (imprinted gene), FGF6 (demethylated in the cancer samples) and HS3ST2 (methylated in the cancer samples) serve as a proof of principle showing the integration of SNP data and phased DNA-methylation information into "hepitypes" and thus the analysis of DNA methylation phylogeny in the somatic evolution of cancer.

  11. Complete sequences of IncHI1 plasmids carrying blaCTX-M-1 and qnrS1 in equine Escherichia coli provide new insights into plasmid evolution.

    PubMed

    Dolejska, Monika; Villa, Laura; Minoia, Marco; Guardabassi, Luca; Carattoli, Alessandra

    2014-09-01

    To determine the structure of two multidrug-resistant IncHI1 plasmids carrying blaCTX-M-1 in Escherichia coli isolates disseminated in an equine clinic in the Czech Republic. A complete nucleotide sequencing of 239 kb IncHI1 (pEQ1) and 287 kb IncHI1/X1 (pEQ2) plasmids was performed using the 454-Genome Sequencer FLX system. The sequences were compared using bioinformatic tools with other sequenced IncHI1 plasmids. A comparative analysis of pEQ1 and pEQ2 identified high nucleotide identity with the IncHI1 type 2 plasmids. A novel 24 kb module containing an operon involved in short-chain fructooligosaccharide uptake and metabolism was found in the pEQ backbones. The role of the pEQ plasmids in the metabolism of short-chain fructooligosaccharides was demonstrated by studying the growth of E. coli cells in the presence of these sugars. The module containing the blaCTX-M-1 gene was formed by a truncated macrolide resistance cluster and flanked by IS26 as previously observed in IncI1 and IncN plasmids. The IncHI1 plasmid changed size and gained the quinolone resistance gene qnrS1 as a result of IS26-mediated fusion with an IncX1 plasmid. Our data highlight the structure and evolution of IncHI1 from equine E. coli. A plasmid-mediated sugar metabolic element could play a key role in strain fitness, contributing to the successful dissemination and maintenance of these plasmids in the intestinal microflora of horses. © The Author 2014. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy.

  12. De novo Assembly of a 40 Mb Eukaryotic Genome from Short Sequence Reads: Sordaria macrospora, a Model Organism for Fungal Morphogenesis

    PubMed Central

    Nowrousian, Minou; Stajich, Jason E.; Chu, Meiling; Engh, Ines; Espagne, Eric; Halliday, Karen; Kamerewerd, Jens; Kempken, Frank; Knab, Birgit; Kuo, Hsiao-Che; Osiewacz, Heinz D.; Pöggeler, Stefanie; Read, Nick D.; Seiler, Stephan; Smith, Kristina M.; Zickler, Denise; Kück, Ulrich; Freitag, Michael

    2010-01-01

    Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30–90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in ∼4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for comparative
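    The sequencing yield quoted in this abstract can be cross-checked from the fold-coverage figures, since total bases sequenced is approximately genome size multiplied by fold coverage. The sketch below does that arithmetic; the function name is hypothetical, and using the 40 Mb draft assembly size as the genome size is an assumption made for illustration.

```python
def total_bases_sequenced(genome_size_bp, fold_coverages):
    """Approximate total sequenced bases from genome size and fold coverage.

    Each entry in `fold_coverages` is the depth contributed by one
    sequencing run or platform; the depths add up across runs.
    """
    return genome_size_bp * sum(fold_coverages)


# Figures from the abstract: 85-fold Solexa plus 10-fold 454 coverage,
# applied to the 40 Mb draft assembly size (an assumption, see above).
genome_size_bp = 40_000_000
yield_bp = total_bases_sequenced(genome_size_bp, [85, 10])
print(yield_bp / 1e9, "Gb")
```

    Under that assumption the combined 95-fold coverage corresponds to 3.8 Gb, consistent with the roughly 4 Gb of sequence reported.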

  13. De novo assembly of a 40 Mb eukaryotic genome from short sequence reads: Sordaria macrospora, a model organism for fungal morphogenesis.

    PubMed

    Nowrousian, Minou; Stajich, Jason E; Chu, Meiling; Engh, Ines; Espagne, Eric; Halliday, Karen; Kamerewerd, Jens; Kempken, Frank; Knab, Birgit; Kuo, Hsiao-Che; Osiewacz, Heinz D; Pöggeler, Stefanie; Read, Nick D; Seiler, Stephan; Smith, Kristina M; Zickler, Denise; Kück, Ulrich; Freitag, Michael

    2010-04-08

    Filamentous fungi are of great importance in ecology, agriculture, medicine, and biotechnology. Thus, it is not surprising that genomes for more than 100 filamentous fungi have been sequenced, most of them by Sanger sequencing. While next-generation sequencing techniques have revolutionized genome resequencing, e.g. for strain comparisons, genetic mapping, or transcriptome and ChIP analyses, de novo assembly of eukaryotic genomes still presents significant hurdles, because of their large size and stretches of repetitive sequences. Filamentous fungi contain few repetitive regions in their 30-90 Mb genomes and thus are suitable candidates to test de novo genome assembly from short sequence reads. Here, we present a high-quality draft sequence of the Sordaria macrospora genome that was obtained by a combination of Illumina/Solexa and Roche/454 sequencing. Paired-end Solexa sequencing of genomic DNA to 85-fold coverage and an additional 10-fold coverage by single-end 454 sequencing resulted in approximately 4 Gb of DNA sequence. Reads were assembled to a 40 Mb draft version (N50 of 117 kb) with the Velvet assembler. Comparative analysis with Neurospora genomes increased the N50 to 498 kb. The S. macrospora genome contains even fewer repeat regions than its closest sequenced relative, Neurospora crassa. Comparison with genomes of other fungi showed that S. macrospora, a model organism for morphogenesis and meiosis, harbors duplications of several genes involved in self/nonself-recognition. Furthermore, S. macrospora contains more polyketide biosynthesis genes than N. crassa. Phylogenetic analyses suggest that some of these genes may have been acquired by horizontal gene transfer from a distantly related ascomycete group. Our study shows that, for typical filamentous fungi, de novo assembly of genomes from short sequence reads alone is feasible, that a mixture of Solexa and 454 sequencing substantially improves the assembly, and that the resulting data can be used for
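
    The N50 values and the sequencing yield quoted in this abstract can be reproduced with simple arithmetic. The sketch below is illustrative only: the function names and the toy contig lengths are assumptions, not part of the study, and N50 is taken in its usual sense (the contig length at which the cumulative sum of length-sorted contigs first reaches half of the assembly).

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths.

    Contigs are sorted from longest to shortest; the N50 is the length of the
    contig at which the running total first reaches half of the assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


def expected_bases(fold_coverage, genome_size_bp):
    """Raw sequence needed to cover a genome of the given size at the given depth."""
    return fold_coverage * genome_size_bp


# Hypothetical toy assembly (lengths in kb), not data from the paper.
toy_contigs_kb = [500, 300, 120, 117, 80, 40, 20]
print(n50(toy_contigs_kb))                 # 300

# 85-fold Solexa plus 10-fold 454 coverage of a ~40 Mb genome.
print(expected_bases(85 + 10, 40_000_000))  # 3800000000, i.e. roughly 4 Gb
```

    The second figure agrees with the approximately 4 Gb of sequence reported for the combined Solexa and 454 runs.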

  14. De-novo RNA Sequencing and Metabolite Profiling to Identify Genes Involved in Anthocyanin Biosynthesis in Korean Black Raspberry (Rubus coreanus Miquel)

    PubMed Central

    Rim, Yeonggil; Kumar, Ritesh; Han, Xiao; Lee, Sang Yeol; Lee, Choong Hwan; Kim, Jae-Yean

    2014-01-01

    Ripe Korean black raspberry (Rubus coreanus Miquel, KB) is usually consumed as fresh fruit, whereas unripe KB has been widely used as a source of traditional herbal medicine. This stage-specific use has been attributed to the changing metabolite profile during fruit ripening, but the molecular and biochemical changes during fruit maturation remain poorly understood. To analyze the biochemical changes during ripening at the molecular level, we first sequenced, assembled, and annotated the transcriptome of KB fruits. Over 4.86 Gb of normalized cDNA prepared from fruits was sequenced using Illumina HiSeq™ 2000 and assembled into 43,723 unigenes. Second, we report that alterations in anthocyanins and proanthocyanidins are the major factors underlying the differences between these fruit stages. In addition, up-regulation of F3′H1, DFR4 and LDOX1 resulted in the accumulation of cyanidin derivatives during the ripening process of KB, indicating a positive relationship between the expression of anthocyanin biosynthetic genes and anthocyanin accumulation. Furthermore, the ability of the RcMCHI2 (R. coreanus Miquel chalcone flavanone isomerase 2) gene to complement the Arabidopsis transparent testa 5 mutant supported the feasibility of using our transcriptome library as a gene resource for improving plant nutrition and pigmentation. Taken together, the datasets obtained from the transcriptome library and metabolic profiling should help define the gene-metabolite relationships in this non-model plant. PMID:24505466

  15. Sequence Analysis of APOA5 Among the Kuwaiti Population Identifies Association of rs2072560, rs2266788, and rs662799 With TG and VLDL Levels

    PubMed Central

    Jasim, Anfal A.; Al-Bustan, Suzanne A.; Al-Kandari, Wafa; Al-Serri, Ahmad; AlAskar, Huda

    2018-01-01

    Common variants of Apolipoprotein A5 (APOA5) have been associated with lipid levels yet very few studies have reported full sequence data from various ethnic groups. The purpose of this study was to analyse the full APOA5 gene sequence to identify variants in 100 healthy Kuwaitis of Arab ethnicities and assess their association with variation in lipid levels in a cohort of 733 samples. Sanger method was used in the direct sequencing of the full 3.7 Kb APOA5 and multiple sequence alignment was used to identify variants. The complete APOA5 sequence in Kuwaiti Arabs has been deposited in GenBank (KJ401315). A total of 20 reported single nucleotide polymorphisms (SNPs) were identified. Two novel SNPs were also identified: a synonymous 2197G>A polymorphism at genomic position 116661525 and a 3′ UTR 3222 C>T polymorphism at genomic position 116660500 based on human genome assembly GRCh37/hg:19. Five SNPs along with the two novel SNPs were selected for validation in the cohort. Association of those SNPs with lipid levels was tested and minor alleles of three SNPs (rs2072560, rs2266788, and rs662799) were found significantly associated with TG and VLDL levels. This is the first study to report the full APOA5 sequence and SNPs in an Arab ethnic group. Analysis of the variants identified and comparison to other populations suggests a distinctive genetic component in Arabs. The positive association observed for rs2072560 and rs2266788 with TG and VLDL levels confirms their role in lipid metabolism. PMID:29686695

  16. Sequence Analysis of APOA5 Among the Kuwaiti Population Identifies Association of rs2072560, rs2266788, and rs662799 With TG and VLDL Levels.

    PubMed

    Jasim, Anfal A; Al-Bustan, Suzanne A; Al-Kandari, Wafa; Al-Serri, Ahmad; AlAskar, Huda

    2018-01-01

    Common variants of Apolipoprotein A5 (APOA5) have been associated with lipid levels yet very few studies have reported full sequence data from various ethnic groups. The purpose of this study was to analyse the full APOA5 gene sequence to identify variants in 100 healthy Kuwaitis of Arab ethnicities and assess their association with variation in lipid levels in a cohort of 733 samples. Sanger method was used in the direct sequencing of the full 3.7 Kb APOA5 and multiple sequence alignment was used to identify variants. The complete APOA5 sequence in Kuwaiti Arabs has been deposited in GenBank (KJ401315). A total of 20 reported single nucleotide polymorphisms (SNPs) were identified. Two novel SNPs were also identified: a synonymous 2197G>A polymorphism at genomic position 116661525 and a 3' UTR 3222 C>T polymorphism at genomic position 116660500 based on human genome assembly GRCh37/hg:19. Five SNPs along with the two novel SNPs were selected for validation in the cohort. Association of those SNPs with lipid levels was tested and minor alleles of three SNPs (rs2072560, rs2266788, and rs662799) were found significantly associated with TG and VLDL levels. This is the first study to report the full APOA5 sequence and SNPs in an Arab ethnic group. Analysis of the variants identified and comparison to other populations suggests a distinctive genetic component in Arabs. The positive association observed for rs2072560 and rs2266788 with TG and VLDL levels confirms their role in lipid metabolism.

  17. Genome Calligrapher: A Web Tool for Refactoring Bacterial Genome Sequences for de Novo DNA Synthesis.

    PubMed

    Christen, Matthias; Deutsch, Samuel; Christen, Beat

    2015-08-21

    Recent advances in synthetic biology have resulted in an increasing demand for the de novo synthesis of large-scale DNA constructs. Any process improvement that enables fast and cost-effective streamlining of digitized genetic information into fabricable DNA sequences holds great promise to study, mine, and engineer genomes. Here, we present Genome Calligrapher, a computer-aided design web tool intended for whole genome refactoring of bacterial chromosomes for de novo DNA synthesis. By applying a neutral recoding algorithm, Genome Calligrapher optimizes GC content and removes obstructive DNA features known to interfere with the synthesis of double-stranded DNA and the higher order assembly into large DNA constructs. Subsequent bioinformatics analysis revealed that synthesis constraints are prevalent among bacterial genomes. However, a low level of codon replacement is sufficient for refactoring bacterial genomes into easy-to-synthesize DNA sequences. To test the algorithm, 168 kb of synthetic DNA comprising approximately 20 percent of the synthetic essential genome of the cell-cycle bacterium Caulobacter crescentus was streamlined and then ordered from a commercial supplier of low-cost de novo DNA synthesis. The successful assembly into eight 20 kb segments indicates that the Genome Calligrapher algorithm can be efficiently used to refactor difficult-to-synthesize DNA. Genome Calligrapher is broadly applicable to recode biosynthetic pathways, DNA sequences, and whole bacterial genomes, thus offering new opportunities to use synthetic biology tools to explore the functionality of microbial diversity. The Genome Calligrapher web tool can be accessed at https://christenlab.ethz.ch/GenomeCalligrapher.

  18. Structural and functional analysis of mouse Msx1 gene promoter: sequence conservation with human MSX1 promoter points at potential regulatory elements.

    PubMed

    Gonzalez, S M; Ferland, L H; Robert, B; Abdelhay, E

    1998-06-01

    Vertebrate Msx genes are related to one of the most divergent homeobox genes of Drosophila, the muscle segment homeobox (msh) gene, and are expressed in a well-defined pattern at sites of tissue interactions. This pattern of expression is conserved in vertebrates as diverse as quail, zebrafish, and mouse in a range of sites including neural crest, appendages, and craniofacial structures. In the present work, we performed structural and functional analyses in order to identify potential cis-acting elements that may be regulating Msx1 gene expression. To this end, a 4.9-kb segment of the 5'-flanking region was sequenced and analyzed for transcription-factor binding sites. Four regions showing a high concentration of these sites were identified. Transfection assays with fragments of regulatory sequences driving the expression of the bacterial lacZ reporter gene showed that a region of 4 kb upstream of the transcription start site contains positive and negative elements responsible for controlling gene expression. Interestingly, a fragment of 130 bp seems to contain the minimal elements necessary for gene expression, as its removal completely abolishes gene expression in cultured cells. These results are reinforced by comparison of this region with the human Msx1 gene promoter, which shows extensive conservation, including many consensus binding sites, suggesting a regulatory role for them.

  19. Extracellular proteins of Vibrio cholerae: molecular cloning, nucleotide sequence and characterization of the deoxyribonuclease (DNase) together with its periplasmic localization in Escherichia coli K-12.

    PubMed

    Focareta, T; Manning, P A

    1987-01-01

    The gene encoding the extracellular DNase of Vibrio cholerae was cloned into Escherichia coli K-12. A maximal coding region of 1.2 kb and a minimal region of 0.6 kb were determined by transposon mutagenesis and deletion analysis. The nucleotide sequence of this region contained a single open reading frame of 690 bp corresponding to a protein of Mr 26,389 with a typical N-terminal signal sequence of 18 aa which, when removed, would give a mature protein of Mr 24,163. This is in good agreement with the size of 24 kDa, calculated directly by Coomassie blue staining following sodium dodecyl sulphate-polyacrylamide gel electrophoresis and indirectly via a DNA-hydrolysis assay. The protein is located in the periplasmic space of E. coli K-12 unlike in V. cholerae where it is excreted into the extracellular medium. The introduction of the DNase gene into a periplasmic (tolA) leaky mutant of E. coli K-12 facilitates the release of the protein, further confirming the periplasmic location.
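
    The sizes quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is an illustration rather than anything from the paper: the function name is hypothetical, and it assumes that the 690 bp open reading frame includes the stop codon, which the abstract does not state.

```python
def orf_summary(orf_bp, signal_aa, precursor_mr, mature_mr, includes_stop=True):
    """Derive codon and residue counts from an ORF length and reported masses.

    includes_stop is an assumption: if True, one codon is treated as the stop
    codon and does not contribute an amino acid.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    precursor_aa = codons - 1 if includes_stop else codons
    mature_aa = precursor_aa - signal_aa
    return {
        "codons": codons,
        "precursor_aa": precursor_aa,
        "mature_aa": mature_aa,
        # Mean mass per residue, a rough sanity check on the reported Mr values.
        "mean_residue_mass_precursor": precursor_mr / precursor_aa,
        "mean_residue_mass_mature": mature_mr / mature_aa,
    }


summary = orf_summary(orf_bp=690, signal_aa=18, precursor_mr=26389, mature_mr=24163)
for key, value in summary.items():
    print(key, round(value, 1))
```

    Under that assumption the ORF corresponds to 230 codons, a 229-residue precursor and a 211-residue mature protein, with a mean residue mass of about 115 Da in both cases, which is in a plausible range for a protein and consistent with the reported 24 kDa band.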

  20. Genetic variation in RPS6KA1, RPS6KA2, RPS6KB1, RPS6KB2, and PDK1 and risk of colon or rectal cancer

    PubMed Central

    Slattery, Martha L.; Lundgreen, Abbie; Herrick, Jennifer S.; Wolff, Roger K.

    2010-01-01

    RPS6KA1, RPS6KA2, RPS6KB1, RPS6KB2, and PDK1 are involved in several pathways central to the carcinogenic process, including regulation of cell growth, insulin, and inflammation. We evaluated genetic variation in these candidate genes to obtain a better understanding of their association with colon and rectal cancer. We used data from two population-based case-control studies of colon (n=1574 cases, 1940 controls) and rectal (n=791 cases, 999 controls) cancer. We observed that genetic variation in RPS6KA1, RPS6KA2, and RPS6KB2 was associated with risk of developing colon cancer, while only genetic variation in RPS6KA2 was associated with altered risk of rectal cancer. These genes also interacted significantly with other genes operating in similar mechanisms, including Akt1, FRAP1, NFκB1, and PIK3CA. Assessment of tumor markers indicated that these genes and this pathway may contribute importantly to CIMP+ tumors and tumors with KRAS2 mutations. Our findings implicate these candidate genes in the etiology of colon and rectal cancer and provide information on how these genes operate with other genes in the pathway. Our data further suggest that this pathway may lead to CIMP+ and KRAS2-mutated tumors. PMID:21035469

  1. SOBA: sequence ontology bioinformatics analysis.

    PubMed

    Moore, Barry; Fan, Guozhen; Eilbeck, Karen

    2010-07-01

    The advent of cheaper, faster sequencing technologies has pushed the task of sequence annotation from the exclusive domain of large-scale multi-national sequencing projects to that of research laboratories and small consortia. The bioinformatics burden placed on these laboratories, some with very little programming experience, can be daunting. Fortunately, there exist software libraries and pipelines designed with these groups in mind, to ease the transition from an assembled genome to an annotated and accessible genome resource. We have developed the Sequence Ontology Bioinformatics Analysis (SOBA) tool to provide a simple statistical and graphical summary of an annotated genome. We envisage its use during annotation jamborees, in genome comparison, and by developers for rapid feedback during annotation software development and testing. SOBA also provides annotation consistency feedback to ensure correct use of terminology within annotations, and guides users to add new terms to the Sequence Ontology when required. SOBA is available at http://www.sequenceontology.org/cgi-bin/soba.cgi.

  2. rsfMRI effects of KB220Z™ on Neural Pathways in Reward Circuitry of Abstinent Genotyped Heroin Addicts

    PubMed Central

    Blum, Kenneth; Liu, Yijun; Wang, Wei; Wang, Yarong; Zhang, Yi; Oscar-Berman, Marlene; Smolen, Andrew; Febo, Marcelo; Han, David; Simpatico, Thomas; Cronjé, Frans J; Demetrovics, Zsolt; Gold, Mark S.

    2016-01-01

    Recently, Willuhn et al. reported that cocaine use, and even non-substance-related addictive behavior, increases as dopaminergic function is reduced. Chronic cocaine exposure has been associated with decreases in D2/D3 receptors, also associated with lower activation to cues in occipital cortex and cerebellum in a recent PET study from Volkow’s group. Therefore, treatment strategies, like dopamine agonist therapy, that might conserve dopamine function may be an interesting approach to relapse prevention in psychoactive drug and behavioral addictions. To this aim, we evaluated the effect of KB220Z™ on reward circuitry of ten heroin addicts undergoing protracted abstinence, an average of 16.9 months. In a randomized placebo-controlled crossover study of KB220Z™, five subjects completed a triple-blinded experiment in which the subject, the person administering the treatment and the person evaluating the response to treatment were blinded as to which treatment any particular subject was receiving. In addition, nine subjects total were genotyped utilizing the GARSRX™ test. We preliminarily report that KB220Z™ induced an increase in BOLD activation in caudate-accumbens-dopaminergic pathways compared to placebo following one-hour acute administration. Furthermore, KB220Z™ also reduced resting state activity in the putamen of abstinent heroin addicts. In the second phase of this pilot study of all ten abstinent heroin-dependent subjects, three brain regions of interest (ROIs) were observed to be significantly activated from resting state by KB220Z compared to placebo (P < 0.05). Increased functional connectivity was observed in a putative network that included the dorsal anterior cingulate, medial frontal gyrus, nucleus accumbens, posterior cingulate, occipital cortical areas and cerebellum. These results and other qEEG study results suggest a putative anti-craving/anti-relapse role for KB220Z in addiction by direct or indirect dopaminergic interaction. Due to

  3. The complete genomic sequence of egg drop syndrome virus strain AAV-2.

    PubMed

    Jin, Q; Zeng, L; Yang, F; Li, M; Hou, Y

    1999-12-01

    To characterize the genome of the Chinese strain AAV-2 of egg drop syndrome virus (EDSV-76), part of the restriction endonuclease physical map was analyzed and a complete genomic library was constructed. On this basis, the complete genome nucleotide sequence (32,838 bp in length, including terminal structures) was determined. Data analysis shows that, compared with other adenoviruses, strain AAV-2 differs markedly in genomic structure and in the distribution of open reading frames (ORFs). There are no clear E1, E3 and E4 regions in the AAV-2 genome. Two segments located at the two ends of the genome (1.1 kb and 8.3 kb in length, respectively) have no homology with other adenovirus genomes. In addition, the AAV-2 genome lacks ORFs encoding E1A, pV and pIX, which are common ORFs encoding early and late proteins in adenoviruses. This reveals, for the first time, differences between EDSV-76, the sole standard strain of group III avian adenoviruses, and the other avian adenoviruses. It will aid research on avian adenoviruses and on adenoviruses in general.

  4. Image sequence analysis workstation for multipoint motion analysis

    NASA Astrophysics Data System (ADS)

    Mostafavi, Hassan

    1990-08-01

    This paper describes an application-specific engineering workstation designed and developed to analyze motion of objects from video sequences. The system combines the software and hardware environment of a modern graphic-oriented workstation with the digital image acquisition, processing and display techniques. In addition to automation and increase in throughput of data reduction tasks, the objective of the system is to provide less invasive methods of measurement by offering the ability to track objects that are more complex than reflective markers. Grey level image processing and spatial/temporal adaptation of the processing parameters is used for location and tracking of more complex features of objects under uncontrolled lighting and background conditions. The applications of such an automated and noninvasive measurement tool include analysis of the trajectory and attitude of rigid bodies such as human limbs, robots, aircraft in flight, etc. The system's key features are: 1) acquisition and storage of image sequences by digitizing and storing real-time video; 2) computer-controlled movie loop playback, freeze frame display, and digital image enhancement; 3) multiple leading edge tracking in addition to object centroids at up to 60 fields per second from both live input video or a stored image sequence; 4) model-based estimation and tracking of the six degrees of freedom of a rigid body; 5) field-of-view and spatial calibration; 6) image sequence and measurement data base management; and 7) offline analysis software for trajectory plotting and statistical analysis.

  5. ChloroKB: A Web Application for the Integration of Knowledge Related to Chloroplast Metabolic Network

    PubMed Central

    Gloaguen, Pauline; Alban, Claude; Ravanel, Stéphane; Seigneurin-Berny, Daphné; Matringe, Michel; Ferro, Myriam; Bruley, Christophe; Rolland, Norbert; Vandenbrouck, Yves

    2017-01-01

    Higher plants, as autotrophic organisms, are effective sources of molecules. They hold great promise for metabolic engineering, but the behavior of plant metabolism at the network level is still incompletely described. Although structural models (stoichiometry matrices) and pathway databases are extremely useful, they cannot describe the complexity of the metabolic context, and new tools are required to visually represent integrated biocurated knowledge for use by both humans and computers. Here, we describe ChloroKB, a Web application (http://chlorokb.fr/) for visual exploration and analysis of the Arabidopsis (Arabidopsis thaliana) metabolic network in the chloroplast and related cellular pathways. The network was manually reconstructed through extensive biocuration to provide transparent traceability of experimental data. Proteins and metabolites were placed in their biological context (spatial distribution within cells, connectivity in the network, participation in supramolecular complexes, and regulatory interactions) using CellDesigner software. The network contains 1,147 reviewed proteins (559 localized exclusively in plastids, 68 in at least one additional compartment, and 520 outside the plastid), 122 proteins awaiting biochemical/genetic characterization, and 228 proteins for which genes have not yet been identified. The visual presentation is intuitive and browsing is fluid, providing instant access to the graphical representation of integrated processes and to a wealth of refined qualitative and quantitative data. ChloroKB will be a significant support for structural and quantitative kinetic modeling, for biological reasoning, when comparing novel data with established knowledge, for computer analyses, and for educational purposes. ChloroKB will be enhanced by continuous updates following contributions from plant researchers. PMID:28442501

  6. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    PubMed Central

    Matochko, Wadim L.; Derda, Ratmir

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by Illumina method. Low theoretical complexity of this phage library, as compared to complexity of long genetic reads and genomes, allowed us to describe this library using convenient linear vector and operator framework. We describe a phage library as N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of a N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is a N × N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071
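
    The vector-and-operator description in this abstract can be illustrated with a toy simulation. The sketch below is a simplified reading of the framework, not the authors' implementation: the function name, the per-copy sampling probability p, and the choice to model censorship as a diagonal entry that scales the chance of each copy being read are all assumptions. Diagonal matrices are stored as plain lists of their diagonal entries.

```python
import random


def sequence_library(counts, censor, p, rng):
    """Draw one random realization of sampling combined with censorship.

    counts : copy number of each sequence (the frequency vector).
    censor : diagonal of a censorship matrix; 1.0 means no bias,
             0.0 means the sequence is never read (assumed model).
    p      : probability that any single copy is sampled (assumed model).
    """
    observed = []
    for n_i, c_i in zip(counts, censor):
        keep_prob = p * c_i
        # Each copy is read independently with probability keep_prob.
        observed.append(sum(1 for _ in range(n_i) if rng.random() < keep_prob))
    return observed


rng = random.Random(0)
library = [1000, 1000, 1000, 10]      # hypothetical copy numbers
unbiased = [1.0, 1.0, 1.0, 1.0]       # diagonal of an identity matrix
censored = [1.0, 0.1, 0.0, 1.0]       # second clone downsampled, third eliminated

print(sequence_library(library, unbiased, 0.5, rng))
print(sequence_library(library, censored, 0.5, rng))

# Peptide-level diversity of a 7-mer library over 20 amino acids.
print(20 ** 7)
```

    With an identity diagonal every clone is sampled at the same rate, whereas a censorship entry below 1 reduces the observed copy number of that clone and an entry of 0 removes it from the reads entirely.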

  7. Identification of Small Exonic CNV from Whole-Exome Sequence Data and Application to Autism Spectrum Disorder

    PubMed Central

    Poultney, Christopher S.; Goldberg, Arthur P.; Drapeau, Elodie; Kou, Yan; Harony-Nicolas, Hala; Kajiwara, Yuji; De Rubeis, Silvia; Durand, Simon; Stevens, Christine; Rehnström, Karola; Palotie, Aarno; Daly, Mark J.; Ma’ayan, Avi; Fromer, Menachem; Buxbaum, Joseph D.

    2013-01-01

    Copy number variation (CNV) is an important determinant of human diversity and plays important roles in susceptibility to disease. Most studies of CNV carried out to date have made use of chromosome microarray and have had a lower size limit for detection of about 30 kilobases (kb). With the emergence of whole-exome sequencing studies, we asked whether such data could be used to reliably call rare exonic CNV in the size range of 1–30 kilobases (kb), making use of the eXome Hidden Markov Model (XHMM) program. By using both transmission information and validation by molecular methods, we confirmed that small CNV encompassing as few as three exons can be reliably called from whole-exome data. We applied this approach to an autism case-control sample (n = 811, mean per-target read depth = 161) and observed a significant increase in the burden of rare (MAF ≤1%) 1–30 kb CNV, 1–30 kb deletions, and 1–10 kb deletions in ASD. CNV in the 1–30 kb range frequently hit just a single gene, and we were therefore able to carry out enrichment and pathway analyses, where we observed enrichment for disruption of genes in cytoskeletal and autophagy pathways in ASD. In summary, our results showed that XHMM provided an effective means to assess small exonic CNV from whole-exome data, indicated that rare 1–30 kb exonic deletions could contribute to risk in up to 7% of individuals with ASD, and implicated a candidate pathway in developmental delay syndromes. PMID:24094742

  8. KB3D Reference Manual. Version 1.a

    NASA Technical Reports Server (NTRS)

    Munoz, Cesar; Siminiceanu, Radu; Carreno, Victor A.; Dowek, Gilles

    2005-01-01

    This paper is a reference manual describing the implementation of the KB3D conflict detection and resolution algorithm. The algorithm has been implemented in the Java and C++ programming languages. The reference manual gives a short overview of the detection and resolution functions, the structural implementation of the program, inputs and outputs to the program, and describes how the program is used. Inputs to the program can be rectangular coordinates or geodesic coordinates. The reference manual also gives examples of conflict scenarios and the resolution outputs the program produces.

  9. Development of simple sequence repeat markers and diversity analysis in alfalfa (Medicago sativa L.).

    PubMed

    Wang, Zan; Yan, Hongwei; Fu, Xinnian; Li, Xuehui; Gao, Hongwen

    2013-04-01

    Efficient and robust molecular markers are essential for molecular breeding in plants. Compared to dominant and bi-allelic markers, multi-allelic simple sequence repeat (SSR) markers are particularly informative and superior for genetic linkage and QTL mapping in autotetraploid species like alfalfa. The objective of this study was to develop SSR markers directly from alfalfa expressed sequence tags (ESTs). A total of 12,371 alfalfa ESTs were retrieved from the National Center for Biotechnology Information. A total of 774 SSRs were identified in 716 ESTs. On average, one SSR was found per 7.7 kb of EST sequence. Tri-nucleotide repeats (48.8 %) were the most abundant motif type, followed by di- (26.1 %), tetra- (11.5 %), penta- (9.7 %), and hexanucleotide (3.9 %) repeats. One hundred EST-SSR primer pairs were successfully designed and 29 exhibited polymorphism among 28 alfalfa accessions. The allele number per marker ranged from two to 21 with an average of 6.8. The PIC values ranged from 0.195 to 0.896 with an average of 0.608, indicating a high level of polymorphism of the EST-SSR markers. Based on the 29 EST-SSR markers, an assessment of genetic diversity found that Medicago sativa ssp. sativa was clearly different from the other subspecies. The EST-SSR markers also showed high transferability to related species.
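
    Mining SSRs from EST sequences, as described in this abstract, is commonly done by scanning for short units repeated in tandem. The sketch below shows one way to do this with regular expressions. It is an illustration only: the minimum repeat counts, the function names and the example sequence are hypothetical, the abstract does not state which search criteria or software were used, and only perfect repeats with 2 to 6 nt units are reported.

```python
import re

# Hypothetical minimum number of tandem copies per unit length (not from the abstract).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(unit):
    """True if the unit is not itself built from a shorter repeated unit (e.g. 'ATAT', 'AAA')."""
    k = len(unit)
    return not any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq):
    """Return sorted (start, unit, copies) tuples for perfect SSRs with 2-6 nt units."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-nt unit followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if is_primitive(unit):
                hits.append((match.start(), unit, len(match.group(0)) // k))
    return sorted(hits)


# Hypothetical EST fragment containing an (AT)7 and a (GAA)5 repeat.
est = "GGC" + "AT" * 7 + "CCGT" + "GAA" * 5 + "TTC"
print(find_ssrs(est))   # [(3, 'AT', 7), (21, 'GAA', 5)]
```

    Dividing the total length of the scanned ESTs by the number of hits gives a density figure comparable to the "one SSR per 7.7 kb" statistic reported above.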

  10. PISMA: A Visual Representation of Motif Distribution in DNA Sequences

    PubMed Central

    Alcántara-Silva, Rogelio; Alvarado-Hermida, Moisés; Díaz-Contreras, Gibrán; Sánchez-Barrios, Martha; Carrera, Samantha; Galván, Silvia Carolina

    2017-01-01

    Background: Because the graphical presentation and analysis of motif distribution can provide insights for experimental hypothesis, PISMA aims at identifying motifs on DNA sequences, counting and showing them graphically. The motif length ranges from 2 to 10 bases, and the DNA sequences range up to 10 kb. The motif distribution is shown as a bar-code–like, as a gene-map–like, and as a transcript scheme. Results: We obtained graphical schemes of the CpG site distribution from 91 human papillomavirus genomes. Also, we present 2 analyses: one of DNA motifs associated with either methylation-resistant or methylation-sensitive CpG islands and another analysis of motifs associated with exosome RNA secretion. Availability and Implementation: PISMA is developed in Java; it is executable in any type of hardware and in diverse operating systems. PISMA is freely available to noncommercial users. The English version and the User Manual are provided in Supplementary Files 1 and 2, and a Spanish version is available at www.biomedicas.unam.mx/wp-content/software/pisma.zip and www.biomedicas.unam.mx/wp-content/pdf/manual/pisma.pdf. PMID:28469418
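
    The kind of motif counting and bar-code-like display described in this abstract can be sketched in a few lines. This is not PISMA's code (PISMA is a Java program whose internals are not described here): the function names and the text rendering are assumptions, and only the stated limits (motifs of 2 to 10 bases, sequences up to 10 kb) are taken from the abstract.

```python
def motif_positions(seq, motif):
    """Return 0-based start positions of every (possibly overlapping) motif occurrence."""
    seq, motif = seq.upper(), motif.upper()
    if not 2 <= len(motif) <= 10:
        raise ValueError("motif length must be between 2 and 10 bases")
    if len(seq) > 10_000:
        raise ValueError("sequence must not exceed 10 kb")
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)   # step by one to allow overlaps
    return positions


def barcode(seq_len, positions, width=60):
    """Render a text 'bar code': '|' marks each bin that contains a motif start."""
    track = ["-"] * width
    for pos in positions:
        track[pos * width // seq_len] = "|"
    return "".join(track)


# Hypothetical 10-base sequence scanned for CpG sites.
seq = "ACGCGTTCGA"
hits = motif_positions(seq, "CG")
print(len(hits), hits)                       # 3 [1, 3, 7]
print(barcode(len(seq), hits, width=10))     # -|-|---|--
```

    Applied to a viral genome and the motif CG, the same two functions would give a count and a coarse distribution track similar in spirit to the CpG schemes mentioned in the Results.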

  11. Nucleotide sequence of the coat protein gene of Lettuce big-vein virus.

    PubMed

    Sasaya, T; Ishikawa, K; Koganezawa, H

    2001-06-01

    A sequence of 1425 nt was established that included the complete coat protein (CP) gene of Lettuce big-vein virus (LBVV). The LBVV CP gene encodes a 397 amino acid protein with a predicted M(r) of 44486. Antisera raised against synthetic peptides corresponding to N-terminal or C-terminal parts of the LBVV CP reacted in Western blot analysis with a protein with an M(r) of about 48000. RNA extracted from purified particles of LBVV by using proteinase K, SDS and phenol migrated in gels as two single-stranded RNA species of approximately 7.3 kb (ss-1) and 6.6 kb (ss-2). After denaturation by heat and annealing at room temperature, the RNA migrated as four species, ss-1, ss-2 and two additional double-stranded RNAs (ds-1 and ds-2). The Northern blot hybridization analysis using riboprobes from a full-length clone of the LBVV CP gene indicated that ss-2 has a negative-sense nature and contains the LBVV CP gene. Moreover, ds-2 is a double-stranded form of ss-2. Database searches showed that the LBVV CP most resembled the nucleocapsid proteins of rhabdoviruses. These results indicate that it would be appropriate to classify LBVV as a negative-sense single-stranded RNA virus rather than as a double-stranded RNA virus.

  12. The Composite 259-kb Plasmid of Martelella mediterranea DSM 17316T–A Natural Replicon with Functional RepABC Modules from Rhodobacteraceae and Rhizobiaceae

    PubMed Central

    Bartling, Pascal; Brinkmann, Henner; Bunk, Boyke; Overmann, Jörg; Göker, Markus; Petersen, Jörn

    2017-01-01

    A multipartite genome organization with a chromosome and many extrachromosomal replicons (ECRs) is characteristic for Alphaproteobacteria. The best investigated ECRs of terrestrial rhizobia are the symbiotic plasmids for legume root nodulation and the tumor-inducing (Ti) plasmid of Agrobacterium tumefaciens. RepABC plasmids represent the most abundant alphaproteobacterial replicon type. The currently known homologous replication modules of rhizobia and Rhodobacteraceae are phylogenetically distinct. In this study, we surveyed type-strain genomes from the One Thousand Microbial Genomes (KMG-I) project and identified a roseobacter-specific RepABC-type operon in the draft genome of the marine rhizobium Martelella mediterranea DSM 17316T. PacBio genome sequencing demonstrated the presence of three circular ECRs with sizes of 593, 259, and 170-kb. The rhodobacteral RepABC module is located together with a rhizobial equivalent on the intermediate sized plasmid pMM259, which likely originated in the fusion of a pre-existing rhizobial ECR with a conjugated roseobacter plasmid. Further evidence for horizontal gene transfer (HGT) is given by the presence of a roseobacter-specific type IV secretion system on the 259-kb plasmid and the rhodobacteracean origin of 62% of the genes on this plasmid. Functionality tests documented that the genuine rhizobial RepABC module from the Martelella 259-kb plasmid is only maintained in A. tumefaciens C58 (Rhizobiaceae) but not in Phaeobacter inhibens DSM 17395 (Rhodobacteraceae). Unexpectedly, the roseobacter-like replication system is functional and stably maintained in both host strains, thus providing evidence for a broader host range than previously proposed. In conclusion, pMM259 is the first example of a natural plasmid that likely mediates genetic exchange between roseobacters and rhizobia. PMID:28983283

  13. Information theory applications for biological sequence analysis.

    PubMed

    Vinga, Susana

    2014-05-01

    Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
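
    As a small illustration of the entropy concept mentioned in this abstract, the following Python sketch computes the Shannon entropy of the k-mer distribution of a sequence. It is a plain plug-in estimate, not any of the specific estimators covered by the review; the function name `kmer_entropy` is hypothetical.

```python
import math
from collections import Counter


def kmer_entropy(sequence, k=1):
    """Shannon entropy (in bits) of the overlapping k-mer frequencies of a sequence."""
    if k < 1 or len(sequence) < k:
        raise ValueError("sequence must contain at least one k-mer")
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    # H = -sum(p * log2(p)) over the observed k-mers.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

    With k = 1 and a four-letter DNA alphabet, the value ranges from 0 bits (a homopolymer) to 2 bits (all four bases equally frequent).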

  14. Terminal region sequence variations in variola virus DNA.

    PubMed

    Massung, R F; Loparev, V N; Knight, J C; Totmenin, A V; Chizhikov, V E; Parsons, J M; Safronov, P F; Gutorov, V V; Shchelkunov, S N; Esposito, J J

    1996-07-15

    Genome DNA terminal region sequences were determined for a Brazilian alastrim variola minor virus strain Garcia-1966 that was associated with an 0.8% case-fatality rate and African smallpox strains Congo-1970 and Somalia-1977 associated with variola major (9.6%) and minor (0.4%) mortality rates, respectively. A base sequence identity of ≥98.8% was determined after aligning 30 kb of the left- or right-end region sequences with cognate sequences previously determined for Asian variola major strains India-1967 (31% death rate) and Bangladesh-1975 (18.5% death rate). The deduced amino acid sequences of putative proteins of ≥65 amino acids also showed relatively high identity, although the Asian and African viruses were clearly more related to each other than to alastrim virus. Alastrim virus contained only 10 of 70 proteins that were 100% identical to homologs in Asian strains, and 7 alastrim-specific proteins were noted.

  15. Biological Control Potential of Bacillus amyloliquefaciens KB3 Isolated from the Feces of Allomyrina dichotoma Larvae

    PubMed Central

    Nam, Hyo-Song; Yang, Hyun-Ju; Oh, Byung Jun; Anderson, Anne J.; Kim, Young Cheol

    2016-01-01

    Most biocontrol agents for plant diseases have been isolated from sources such as soils and plants. As an alternative source, we examined the feces of tertiary larvae of the herbivorous rhino beetle, Allomyrina dichotoma, for the presence of biocontrol-active microbes. The initial screen was performed to detect antifungal activity against two common fungal plant pathogens. The strain with the strongest antifungal activity was identified as Bacillus amyloliquefaciens KB3. The inhibitory activity of this strain correlated with lipopeptide production, including iturin A and surfactin. Production of these surfactants in the KB3 isolate varied with the culture phase and growth medium used. In planta biocontrol activities of cell-free culture filtrates of KB3 were similar to those of the commercial biocontrol agent, B. subtilis QST-713. These results support the presence of microbes with the potential to inhibit fungal growth, such as plant pathogens, in diverse ecological niches. PMID:27298603

  16. The genome sequence of sweet cherry (Prunus avium) for use in genomics-assisted breeding.

    PubMed

    Shirasawa, Kenta; Isuzugawa, Kanji; Ikenaga, Mitsunobu; Saito, Yutaro; Yamamoto, Toshiya; Hirakawa, Hideki; Isobe, Sachiko

    2017-10-01

    We determined the genome sequence of sweet cherry (Prunus avium) using next-generation sequencing technology. The total length of the assembled sequences was 272.4 Mb, consisting of 10,148 scaffold sequences with an N50 length of 219.6 kb. The sequences covered 77.8% of the 352.9 Mb sweet cherry genome, as estimated by k-mer analysis, and included >96.0% of the core eukaryotic genes. We predicted 43,349 complete and partial protein-encoding genes. A high-density consensus map with 2,382 loci was constructed using double-digest restriction site-associated DNA sequencing. Comparing the genetic maps of sweet cherry and peach revealed high synteny between the two genomes; thus the scaffolds were integrated into pseudomolecules using map- and synteny-based strategies. Whole-genome resequencing of six modern cultivars found 1,016,866 SNPs and 162,402 insertions/deletions, out of which 0.7% were deleterious. The sequence variants, as well as simple sequence repeats, can be used as DNA markers. The genomic information helps us to identify agronomically important genes and will accelerate genetic studies and breeding programs for sweet cherries. Further information on the genomic sequences and DNA markers is available in DBcherry (http://cherry.kazusa.or.jp (8 May 2017, date last accessed)). © The Author 2017. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
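
    The N50 statistic quoted in this abstract can be computed as in the following Python sketch. It uses the common definition (the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly); the abstract does not say which tool the authors used, and the function name `n50` is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths given")
```

    Applied to the 10,148 scaffold lengths of an assembly like the one described, a function of this kind would yield the reported N50 figure; the individual scaffold lengths are not given in the abstract, so no value is reproduced here.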

  17. Induction of necrosis and apoptosis to KB cancer cells by sanguinarine is associated with reactive oxygen species production and mitochondrial membrane depolarization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chang, M.-C.; Chan, C.-P.; Wang, Y.-J.

    2007-01-15

    Sanguinarine is a benzophenanthridine alkaloid present in the root of Sanguinaria canadensis L. and Chelidonium majus L. In this study, sanguinarine (2 and 3 μM) exhibited cytotoxicity to KB cancer cells by decreasing MTT reduction to 83% and 52% of control after 24 h of exposure. Sanguinarine also inhibited the colony forming capacity (> 52-58%) and growth of KB cancer cells at concentrations higher than 0.5-1 μM. Short-term exposure to sanguinarine (> 0.5 μM) effectively suppressed the adhesion of KB cells to collagen and fibronectin (FN). Sanguinarine (2 and 3 μM) induced evident apoptosis as indicated by an increase in sub-G0/G1 populations, which was detected after 6 h of exposure. Only a slight increase in cells arresting in S-phase and G2/M was noted. Induction of KB cell apoptosis and necrosis by sanguinarine (2 and 3 μM) was further confirmed by Annexin V-PI dual staining flow cytometry and the presence of DNA fragmentation. The cytotoxicity by sanguinarine was accompanied by an increase in production of reactive oxygen species (ROS) and depolarization of mitochondrial membrane potential as indicated by single cell flow cytometric analysis of DCF and rhodamine fluorescence. NAC (1 and 3 mM) and catalase (2000 U/ml) prevented the sanguinarine-induced ROS production and cytotoxicity, whereas dimethylthiourea (DMT) showed no marked preventive effect. These results suggest that sanguinarine has anticarcinogenic properties with induction of ROS production and mitochondrial membrane depolarization, which mediate cancer cell death.

  18. Identification of Cannabis sativa L. using the 1-kbTHCA synthase-fluorescence in situ hybridization probe.

    PubMed

    Jeangkhwoa, Pattraporn; Bandhaya, Achirapa; Umpunjun, Puangpaka; Chuenboonngarm, Ngarmnij; Panvisavas, Nathinee

    2017-03-01

    This study reports a successful application of the fluorescence in situ hybridization (FISH) technique in the identification of Cannabis sativa L. cells recovered from fresh and dried powdered plant materials. Two biotin-16-dUTP-labeled FISH probes were designed from the Cannabis-specific tetrahydrocannabinolic acid synthase (THCAS) gene and the ITS region of the 45S rRNA gene. Specificity of probe-target hybridization was tested against the target and 4 non-target plant species, i.e., Humulus lupulus, Mitragyna speciosa, Papaver sp., and Nicotiana tabacum. The 1-kb THCA synthase hybridization probe gave Cannabis-specific hybridization signals, unlike the 700-bp Cannabis-ITS hybridization probe. Probe-target hybridization was also confirmed against 20 individual Cannabis plant samples. The 1-kb THCA synthase and 700-bp Cannabis-ITS hybridization probes clearly and reproducibly showed 2 hybridization signals per cell. The 1-kb THCA synthase probe did not give any FISH signal when tested against H. lupulus, a closely related member of the Cannabaceae family. It was also shown that the 1-kb THCA synthase FISH probe can be applied to identify small amounts of dried powdered Cannabis material with the addition of a rehydration step prior to the experimental process. This study provides an alternative identification method for Cannabis traces. Copyright © 2016. Published by Elsevier B.V.

  19. Sequence analysis by iterated maps, a review.

    PubMed

    Almeida, Jonas S

    2014-05-01

    Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage now appears to be set for a recasting of IMs with a central role in processing nextgen sequencing results.
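
    The Chaos Game Representation named in this abstract can be sketched in a few lines of Python. The corner assignment below (A, C, G, T on the corners of the unit square) and the starting point at the center are conventions assumed for illustration, and the names `CORNERS` and `chaos_game` are hypothetical; the review itself does not prescribe this code.

```python
# Assumed corner layout of the unit square; other layouts are equally valid.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game(sequence):
    """Return the list of (x, y) points visited by the iterated map."""
    x, y = 0.5, 0.5  # start at the center of the square
    points = []
    for base in sequence.upper():
        cx, cy = CORNERS[base]
        # Each step moves halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

    Because each step halves the distance to a corner, two sequences that end in the same k bases land in the same sub-square of side 2^-k, whatever preceded that suffix. This is the property that lets a single map be read at any k-mer resolution.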

  20. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans

    PubMed Central

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-01-01

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. PMID:26199191

  1. Influence of flanking sequences on variability in expression levels of an introduced gene in transgenic tobacco plants.

    PubMed Central

    Dean, C; Jones, J; Favreau, M; Dunsmuir, P; Bedbrook, J

    1988-01-01

    The petunia rbcS gene SSU301 was introduced into tobacco using Agrobacterium tumefaciens-mediated transformation. The time at which rbcS expression was maximal after transfer of the tobacco plants to the greenhouse was determined. The expression level of the SSU301 gene varied up to 9 fold between individual tobacco plants which had been standardized physiologically as much as possible. The presence of adjacent pUC plasmid sequences did not affect the expression of the SSU301 gene. In an attempt to reduce the between-transformant variability in expression, the SSU301 gene was introduced into tobacco surrounded by 10 kb of 5' and 13 kb of 3' DNA sequences which normally flank SSU301 in petunia. The longer flanking regions did not reduce the between-transformant variability of SSU301 gene expression. PMID:3174450

  2. Identification, cloning, and sequencing of a fragment of Amsacta moorei entomopoxvirus DNA containing the spheroidin gene and three vaccinia virus-related open reading frames.

    PubMed Central

    Hall, R L; Moyer, R W

    1991-01-01

    Entomopoxvirus virions are frequently contained within crystalline occlusion bodies, which are composed of primarily a single protein, spheroidin, which is analogous to the polyhedrin protein of baculovirus. The spheroidin gene of Amsacta moorei entomopoxvirus was identified following the microsequencing of polypeptides generated from cyanogen bromide treatment of spheroidin and the subsequent synthesis of oligonucleotide hybridization probes. DNA sequencing of a 6.8-kb region of DNA containing the spheroidin gene showed that the spheroidin protein is derived from a 3.0-kb open reading frame potentially encoding a protein of 115 kDa. Three copies of the heptanucleotide, TTTTTNT, a sequence associated with early gene transcription in the vertebrate poxviruses, and four in-frame translational termination signals were found within 60 bp upstream of the putative spheroidin gene promoter (TAAATG). The spheroidin gene promoter region contains the sequence TAAATG, which is found in many late promoters of the vertebrate poxviruses and which serves as the site of transcriptional initiation, as shown by primer extension. Primer extension experiments also showed that spheroidin gene transcripts contain 5' poly(A) sequences typical of vertebrate poxvirus late transcripts. The 92 bases upstream of the initiating TAAATG are unusually A + T rich and contain only 7 G or C residues. An analysis of open reading frames around the spheroidin gene suggests that the colinear core of "essential genes" typical of the vertebrate poxviruses is absent in A. moorei entomopoxvirus. PMID:1942245

  3. Comparative Sequence Analysis of Multidrug-Resistant IncA/C Plasmids from Salmonella enterica.

    PubMed

    Hoffmann, Maria; Pettengill, James B; Gonzalez-Escalona, Narjol; Miller, John; Ayers, Sherry L; Zhao, Shaohua; Allard, Marc W; McDermott, Patrick F; Brown, Eric W; Monday, Steven R

    2017-01-01

    Determinants of multidrug resistance (MDR) are often encoded on mobile elements, such as plasmids, transposons, and integrons, which have the potential to transfer among foodborne pathogens, as well as to other virulent pathogens, increasing the threats these traits pose to human and veterinary health. Our understanding of MDR among Salmonella has been limited by the lack of closed plasmid genomes for comparisons across resistance phenotypes, due to difficulties in effectively separating the DNA of these high-molecular weight, low-copy-number plasmids from chromosomal DNA. To resolve this problem, we demonstrate an efficient protocol for isolating, sequencing and closing IncA/C plasmids from Salmonella sp. using single molecule real-time sequencing on a Pacific Biosciences (Pacbio) RS II Sequencer. We obtained six Salmonella enterica isolates from poultry, representing six different serovars, each exhibiting the MDR-Ampc resistance profile. Salmonella plasmids were obtained using a modified mini preparation and transformed with Escherichia coli DH10Br. A Qiagen Large-Construct kit™ was used to recover highly concentrated and purified plasmid DNA that was sequenced using PacBio technology. These six closed IncA/C plasmids ranged in size from 104 to 191 kb and shared a stable, conserved backbone containing 98 core genes, with only six differences among those core genes. The plasmids encoded a number of antimicrobial resistance genes, including those for quaternary ammonium compounds and mercury. We then compared our six IncA/C plasmid sequences: first with 14 IncA/C plasmids derived from S. enterica available at the National Center for Biotechnology Information (NCBI), and then with an additional 38 IncA/C plasmids derived from different taxa. These comparisons allowed us to build an evolutionary picture of how antimicrobial resistance may be mediated by this common plasmid backbone. Our project provides detailed genetic information about resistance genes in

  4. Anti and Androgenic Activities in MDA-KB2 Cells: A Comparison of Performance in 96 Well Versus HTS Assays

    EPA Science Inventory

    We developed the MDA-kb2 cell line to screen androgen agonists/antagonists (Wilson et al., ToxSci 66:69, 2002). MDA-kb2 has been used to quantify anti- and androgenic activities of chemicals, mixtures, combustion by-products, oil dispersants and waste, source and drinking water s...

  5. Complete Genome Sequence of the Quality Control Strain Staphylococcus aureus subsp. aureus ATCC 25923

    PubMed Central

    Treangen, Todd J.; Maybank, Rosslyn A.; Enke, Sana; Friss, Mary Beth; Diviak, Lynn F.; Karaolis, David K. R.; Koren, Sergey; Ondov, Brian; Phillippy, Adam M.; Bergman, Nicholas H.

    2014-01-01

    Staphylococcus aureus subsp. aureus ATCC 25923 is commonly used as a control strain for susceptibility testing to antibiotics and as a quality control strain for commercial products. We present the completed genome sequence for the strain, consisting of the chromosome and a 27.5-kb plasmid. PMID:25377701

  6. The 2.1-kb inverted repeat DNA sequences flank the mat2,3 silent region in two species of Schizosaccharomyces and are involved in epigenetic silencing in Schizosaccharomyces pombe.

    PubMed Central

    Singh, Gurjeet; Klar, Amar J S

    2002-01-01

    The mat2,3 region of the fission yeast Schizosaccharomyces pombe exhibits a phenomenon of transcriptional silencing. This region is flanked by two identical DNA sequence elements, 2.1 kb in length, present in inverted orientation: IRL on the left and IRR on the right of the silent region. The repeats do not encode any ORF. The inverted repeat DNA region is also present in a newly identified related species, which we named S. kambucha. Interestingly, the left and right repeats share perfect identity within a species, but show approximately 2% bases interspecies variation. Deletion of IRL results in variegated expression of markers inserted in the silent region, while deletion of the IRR causes their derepression. When deletions of these repeats were genetically combined with mutations in different trans-acting genes previously shown to cause a partial defect in silencing, only mutations in clr1 and clr3 showed additive defects in silencing with the deletion of IRL. The rate of mat1 switching is also affected by deletion of repeats. The IRL or IRR deletion did not cause significant derepression of the mat2 or mat3 loci. These results implicate repeats for maintaining full repression of the mat2,3 region, for efficient mat1 switching, and further support the notion that multiple pathways cooperate to silence the mat2,3 domain. PMID:12399374

  7. Illumina Synthetic Long Read Sequencing Allows Recovery of Missing Sequences even in the “Finished” C. elegans Genome

    PubMed Central

    Li, Runsheng; Hsieh, Chia-Ling; Young, Amanda; Zhang, Zhihong; Ren, Xiaoliang; Zhao, Zhongying

    2015-01-01

    Most next-generation sequencing platforms permit acquisition of high-throughput DNA sequences, but the relatively short read length limits their use in genome assembly or finishing. Illumina has recently released a technology called Synthetic Long-Read Sequencing that can produce reads of unusual length, i.e., predominately around 10 Kb. However, a systematic assessment of their use in genome finishing and assembly is still lacking. We evaluate the promise and deficiency of the long reads in these aspects using isogenic C. elegans genome with no gap. First, the reads are highly accurate and capable of recovering most types of repetitive sequences. However, the presence of tandem repetitive sequences prevents pre-assembly of long reads in the relevant genomic region. Second, the reads are able to reliably detect missing but not extra sequences in the C. elegans genome. Third, the reads of smaller size are more capable of recovering repetitive sequences than those of bigger size. Fourth, at least 40 Kbp missing genomic sequences are recovered in the C. elegans genome using the long reads. Finally, an N50 contig size of at least 86 Kbp can be achieved with 24×reads but with substantial mis-assembly errors, highlighting a need for novel assembly algorithm for the long reads. PMID:26039588

  8. Identification of small exonic CNV from whole-exome sequence data and application to autism spectrum disorder.

    PubMed

    Poultney, Christopher S; Goldberg, Arthur P; Drapeau, Elodie; Kou, Yan; Harony-Nicolas, Hala; Kajiwara, Yuji; De Rubeis, Silvia; Durand, Simon; Stevens, Christine; Rehnström, Karola; Palotie, Aarno; Daly, Mark J; Ma'ayan, Avi; Fromer, Menachem; Buxbaum, Joseph D

    2013-10-03

    Copy number variation (CNV) is an important determinant of human diversity and plays important roles in susceptibility to disease. Most studies of CNV carried out to date have made use of chromosome microarray and have had a lower size limit for detection of about 30 kilobases (kb). With the emergence of whole-exome sequencing studies, we asked whether such data could be used to reliably call rare exonic CNV in the size range of 1-30 kilobases (kb), making use of the eXome Hidden Markov Model (XHMM) program. By using both transmission information and validation by molecular methods, we confirmed that small CNV encompassing as few as three exons can be reliably called from whole-exome data. We applied this approach to an autism case-control sample (n = 811, mean per-target read depth = 161) and observed a significant increase in the burden of rare (MAF ≤1%) 1-30 kb CNV, 1-30 kb deletions, and 1-10 kb deletions in ASD. CNV in the 1-30 kb range frequently hit just a single gene, and we were therefore able to carry out enrichment and pathway analyses, where we observed enrichment for disruption of genes in cytoskeletal and autophagy pathways in ASD. In summary, our results showed that XHMM provided an effective means to assess small exonic CNV from whole-exome data, indicated that rare 1-30 kb exonic deletions could contribute to risk in up to 7% of individuals with ASD, and implicated a candidate pathway in developmental delay syndromes. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  9. Glaucarubinone sensitizes KB cells to paclitaxel by inhibiting ABC transporters via ROS-dependent and p53-mediated activation of apoptotic signaling pathways

    PubMed Central

    Karthikeyan, Subburayan; Hoti, Sugeerappa Laxmanappa; Nazeer, Yasin; Hegde, Harsha Vasudev

    2016-01-01

    Multidrug resistance (MDR) is considered to be the major contributor to failure of chemotherapy in oral squamous cell carcinoma (SCC). This study was aimed to explore the effects and mechanisms of glaucarubinone (GLU), one of the major quassinoids from Simarouba glauca DC, in potentiating cytotoxicity of paclitaxel (PTX), an anticancer drug in KB cells. Our data showed that the administration of GLU pre-treatment significantly enhanced PTX anti-proliferative effect in ABCB1 over-expressing KB cells. The Rh 123 drug efflux studies revealed that there was a significant transport function inhibition by GLU-PTX treatment. Interestingly, it was also found that this enhanced anticancer efficacy of GLU was associated with PTX-induced cell arrest in the G2/M phase of cell cycle. Further, the combined treatment of GLU-PTX had significant decrease in the expression levels of P-gp, MRPs, and BCRP in resistant KB cells at both mRNA and protein levels. Furthermore, the combination treatments showed significant reactive oxygen species (ROS) production, chromatin condensation and reduced mitochondrial membrane potential in resistant KB cells. The results from DNA fragmentation analysis also demonstrated the GLU induced apoptosis in KB cells and its synergy with PTX. Importantly, GLU and/or PTX triggered apoptosis through the activation of pro-apoptotic proteins such as p53, Bax, and caspase-9. Our findings demonstrated for the first time that GLU causes cell death in human oral cancer cells via the ROS-dependent suppression of MDR transporters and p53-mediated activation of the intrinsic mitochondrial pathway of apoptosis. Additionally, the present study also focussed on investigation of the protective effect of GLU and combination drugs in human normal blood lymphocytes. 
    The normal blood lymphocyte assay indicated that GLU is able to induce selective toxicity in cancer cells, and in silico molecular docking studies support the choice of GLU as an ABC inhibitor to enhance PTX efficacy.

  10. Molecular Analysis of VanA Outbreak of Enterococcus faecium in Two Warsaw Hospitals: The Importance of Mobile Genetic Elements

    PubMed Central

    Wardal, Ewa; Markowska, Katarzyna; Żabicka, Dorota; Wróblewska, Marta; Giemza, Małgorzata; Mik, Ewa; Połowniak-Pracka, Hanna; Woźniak, Agnieszka; Hryniewicz, Waleria; Sadowy, Ewa

    2014-01-01

    Vancomycin-resistant Enterococcus faecium represents a growing threat in hospital-acquired infections. Two outbreaks of this pathogen from neighboring Warsaw hospitals have been analyzed in this study. Pulsed-field gel electrophoresis (PFGE) of SmaI-digested DNA, multilocus VNTR analysis (MLVA), and multilocus sequence typing (MLST) revealed a clonal variability of isolates which belonged to three main lineages (17, 18, and 78) of nosocomial E. faecium. All isolates were multidrug resistant and carried several resistance, virulence, and plasmid-specific genes. Almost all isolates shared the same variant of the Tn1546 transposon, characterized by the presence of insertion sequence ISEf1 and a point mutation in the vanA gene. In the majority of cases, this transposon was located on 50 kb or 100 kb pRUM-related plasmids, which lacked, however, the axe-txe toxin-antitoxin genes. The 100 kb plasmid was easily transferred by conjugation and was found in various clonal backgrounds in both institutions, while the 50 kb plasmid was not transferable and occurred solely in MT159/ST78 strains that disseminated clonally in one institution. Although molecular data indicated the spread of VRE between two institutions or a potential common source of this alert pathogen, epidemiological investigations did not reveal the possible route by which outbreak strains disseminated. PMID:25003118

  11. The isolation of cDNAs from OATL1 at Xp11.2 using a 480-kb YAC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Geraghty, M.T.; Brody, L.C.; Martin, L.S.

    1993-05-01

    Using an ornithine-δ-aminotransferase (OAT) cDNA, the authors identified five YACs that cover two nonadjacent OAT-related loci in Xp11.2-p11.3, designated OATL1 (distal) and OATL2 (proximal). Because several retinal degenerative disorders map to this region, they used YAC2 (480 kb), which covers the most distal part of OATL1, as a probe to screen a retinal cDNA library. From 8 × 10^4 plaques screened, they isolated 13 clones. Two were OAT cDNAs. The remaining 11 were divided into eight groups by cross-hybridization. Groups 1-4 contain cDNAs that originate from single-copy X-linked genes in YAC2. Each has an open reading frame of >500 bp and detects one or more transcripts on a Northern blot. The gene for each was sublocalized and ordered in YAC2. The cDNAs in groups 5-8 contained two or more Alu sequences, had no open reading frames, and did not detect transcripts. The cDNAs from groups 1-4 provide expressed sequence tags and identify candidate genes for the genetic disorders that map to this region. 28 refs., 5 figs., 1 tab.

  12. A genome-wide BAC-end sequence survey provides first insights into sweetpotato (Ipomoea batatas (L.) Lam.) genome composition.

    PubMed

    Si, Zengzhi; Du, Bing; Huo, Jinxi; He, Shaozhen; Liu, Qingchang; Zhai, Hong

    2016-11-21

    Sweetpotato, Ipomoea batatas (L.) Lam., is an important food crop widely grown in the world. However, little is known about the genome of this species because it is a highly heterozygous hexaploid. Gaining a more in-depth knowledge of sweetpotato genome is therefore necessary and imperative. In this study, the first bacterial artificial chromosome (BAC) library of sweetpotato was constructed. Clones from the BAC library were end-sequenced and analyzed to provide genome-wide information about this species. The BAC library contained 240,384 clones with an average insert size of 101 kb and had a 7.93-10.82 × coverage of the genome, and the probability of isolating any single-copy DNA sequence from the library was more than 99%. Both ends of 8310 BAC clones randomly selected from the library were sequenced to generate 11,542 high-quality BAC-end sequences (BESs), with an accumulative length of 7,595,261 bp and an average length of 658 bp. Analysis of the BESs revealed that 12.17% of the sweetpotato genome were known repetitive DNA, including 7.37% long terminal repeat (LTR) retrotransposons, 1.15% Non-LTR retrotransposons and 1.42% Class II DNA transposons etc., 18.31% of the genome were identified as sweetpotato-unique repetitive DNA and 10.00% of the genome were predicted to be coding regions. In total, 3,846 simple sequences repeats (SSRs) were identified, with a density of one SSR per 1.93 kb, from which 288 SSRs primers were designed and tested for length polymorphism using 20 sweetpotato accessions, 173 (60.07%) of them produced polymorphic bands. Sweetpotato BESs had significant hits to the genome sequences of I. trifida and more matches to the whole-genome sequences of Solanum lycopersicum than those of Vitis vinifera, Theobroma cacao and Arabidopsis thaliana. The first BAC library for sweetpotato has been successfully constructed. The high quality BESs provide first insights into sweetpotato genome composition, and have significant hits to the genome
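
    The library statistics in this abstract (fold coverage and the probability of recovering any single-copy sequence) can be related with a short Python sketch. It assumes the widely used Clarke and Carbon formula, P = 1 - (1 - I/G)^N; the abstract does not state which formula the authors applied. The genome size used in the usage note is also an assumption, chosen so that the coverage matches the lower bound quoted in the abstract. The function names are hypothetical.

```python
def library_coverage(n_clones, insert_bp, genome_bp):
    """Fold coverage of the genome: total cloned DNA divided by genome size."""
    return n_clones * insert_bp / genome_bp


def recovery_probability(n_clones, insert_bp, genome_bp):
    """Clarke-Carbon probability that a given single-copy sequence is in the library."""
    # Each clone misses the target with probability (1 - I/G); N independent clones.
    return 1.0 - (1.0 - insert_bp / genome_bp) ** n_clones
```

    With the 240,384 clones and 101 kb average insert from the abstract, and an assumed genome size of 3.06 Gb, `library_coverage` gives roughly 7.9-fold coverage and `recovery_probability` exceeds 0.99, in line with the figures reported.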

  13. Comparative sequence analysis of the potato cyst nematode resistance locus H1 reveals a major lack of co-linearity between three haplotypes in potato (Solanum tuberosum ssp.).

    PubMed

    Finkers-Tomczak, Anna; Bakker, Erin; de Boer, Jan; van der Vossen, Edwin; Achenbach, Ute; Golas, Tomasz; Suryaningrat, Suwardi; Smant, Geert; Bakker, Jaap; Goverse, Aska

    2011-02-01

    The H1 locus confers resistance to the potato cyst nematode Globodera rostochiensis pathotypes 1 and 4. It is positioned at the distal end of chromosome V of the diploid Solanum tuberosum genotype SH83-92-488 (SH) on an introgression segment derived from S. tuberosum ssp. andigena. Markers from a high-resolution genetic map of the H1 locus (Bakker et al. in Theor Appl Genet 109:146-152, 2004) were used to screen a BAC library to construct a physical map covering a 341-kb region of the resistant haplotype coming from SH. For comparison, physical maps were also generated of the two haplotypes from the diploid susceptible genotype RH89-039-16 (S. tuberosum ssp. tuberosum/S. phureja), spanning syntenic regions of 700 and 319 kb. Gene predictions on the genomic segments resulted in the identification of a large cluster consisting of variable numbers of the CC-NB-LRR type of R genes for each haplotype. Furthermore, the regions were interspersed with numerous transposable elements and genes coding for an extensin-like protein and an amino acid transporter. Comparative analysis revealed a major lack of gene order conservation in the sequences of the three closely related haplotypes. Our data provide insight in the evolutionary mechanisms shaping the H1 locus and will facilitate the map-based cloning of the H1 resistance gene.

  14. Cloning, sequencing, and characterization of the Bacillus subtilis biotin biosynthetic operon.

    PubMed

    Bower, S; Perkins, J B; Yocum, R R; Howitt, C L; Rahaim, P; Pero, J

    1996-07-01

    A 10-kb region of the Bacillus subtilis genome that contains genes involved in biotin biosynthesis was cloned and sequenced. DNA sequence analysis indicated that B. subtilis contains homologs of the Escherichia coli and Bacillus sphaericus bioA, bioB, bioD, and bioF genes. These four genes and a homolog of the B. sphaericus bioW gene are arranged in a single operon in the order bioWAFDB and are followed by two additional genes, bioI and orf2. bioI and orf2 show no similarity to any other known biotin biosynthetic genes. The bioI gene encodes a protein with similarity to cytochrome P-450s and was able to complement mutations in either bioC or bioH of E. coli. Mutations in bioI caused B. subtilis to grow poorly in the absence of biotin. The bradytroph phenotype of bioI mutants was overcome by pimelic acid, suggesting that the product of bioI functions at a step prior to pimelic acid synthesis. The B. subtilis bio operon is preceded by a putative vegetative promoter sequence and contains just downstream a region of dyad symmetry with homology to the bio regulatory region of B. sphaericus. Analysis of a bioW-lacZ translational fusion indicated that expression of the biotin operon is regulated by biotin and the B. subtilis birA gene.

  15. Cloning, sequencing, and characterization of the Bacillus subtilis biotin biosynthetic operon.

    PubMed Central

    Bower, S; Perkins, J B; Yocum, R R; Howitt, C L; Rahaim, P; Pero, J

    1996-01-01

    A 10-kb region of the Bacillus subtilis genome that contains genes involved in biotin biosynthesis was cloned and sequenced. DNA sequence analysis indicated that B. subtilis contains homologs of the Escherichia coli and Bacillus sphaericus bioA, bioB, bioD, and bioF genes. These four genes and a homolog of the B. sphaericus bioW gene are arranged in a single operon in the order bioWAFDB and are followed by two additional genes, bioI and orf2. bioI and orf2 show no similarity to any other known biotin biosynthetic genes. The bioI gene encodes a protein with similarity to cytochrome P-450s and was able to complement mutations in either bioC or bioH of E. coli. Mutations in bioI caused B. subtilis to grow poorly in the absence of biotin. The bradytroph phenotype of bioI mutants was overcome by pimelic acid, suggesting that the product of bioI functions at a step prior to pimelic acid synthesis. The B. subtilis bio operon is preceded by a putative vegetative promoter sequence and contains just downstream a region of dyad symmetry with homology to the bio regulatory region of B. sphaericus. Analysis of a bioW-lacZ translational fusion indicated that expression of the biotin operon is regulated by biotin and the B. subtilis birA gene. PMID:8763940

  16. Deletion of a 760 kb region at 4p16 determines the prenatal and postnatal growth retardation characteristic of Wolf-Hirschhorn syndrome.

    PubMed

    Concolino, Daniela; Rossi, Elena; Strisciuglio, Pietro; Iembo, Maria Antonietta; Giorda, Roberto; Ciccone, Roberto; Tenconi, Romano; Zuffardi, Orsetta

    2007-10-01

    Recently the genotype/phenotype map of Wolf-Hirschhorn syndrome (WHS) has been refined, using small 4p deletions covering or flanking the critical region in patients showing only some of the WHS malformations. Accordingly, prenatal-onset growth retardation and failure to thrive have been found to result from haploinsufficiency for a 4p gene located between 0.4 and 1.3 Mb, whereas microcephaly results from haploinsufficiency of at least two different 4p regions, one of 2.2-2.38 Mb and a second one of 1.9-1.28 Mb. We defined the deletion size of a ring chromosome (r(4)) in a girl with prenatal onset growth retardation, severe failure to thrive and true microcephaly but without the WHS facial gestalt and mental retardation. A high-resolution comparative genome hybridisation array revealed a 760 kb 4p terminal deletion. This case, together with a familial 4p deletion involving the distal 400 kb reported in normal women, may narrow the critical region for short stature on 4p to 360-760 kb. This region is also likely to contain a gene for microcephaly. "In silico" analysis of all genes within the critical region failed to reveal any strikingly suggestive expression pattern; all genes remain candidates for short stature and microcephaly.

  17. Allelic association of sequence variants in the herpes virus entry mediator-B gene (PVRL2) with the severity of multiple sclerosis.

    PubMed

    Schmidt, S; Pericak-Vance, M A; Sawcer, S; Barcellos, L F; Hart, J; Sims, J; Prokop, A M; van der Walt, J; DeLoa, C; Lincoln, R R; Oksenberg, J R; Compston, A; Hauser, S L; Haines, J L; Gregory, S G

    2006-07-01

    Discrepant findings have been reported regarding an association of the apolipoprotein E (APOE) gene with the clinical course of multiple sclerosis (MS). To resolve these discrepancies, we examined common sequence variation in six candidate genes residing in a 380-kb genomic region surrounding and including the APOE locus for an association with MS severity. We genotyped at least three polymorphisms in each of six candidate genes in 1,540 Caucasian MS families (729 single-case and multiple-case families from the United States, 811 single-case families from the UK). By applying the quantitative transmission/disequilibrium test to a recently proposed MS severity score, the only statistically significant (P=0.003) association with MS severity was found for an intronic variant in the Herpes Virus Entry Mediator-B Gene PVRL2. Additional genotyping extended the association to a 16.6 kb block spanning intron 1 to intron 2 of the gene. Sequencing of PVRL2 failed to identify variants with an obvious functional role. In conclusion, the analysis of a very large data set suggests that genetic polymorphisms in PVRL2 may influence MS severity and supports the possibility that viral factors may contribute to the clinical course of MS, consistent with previous reports.

  18. mESAdb: microRNA Expression and Sequence Analysis Database

    PubMed Central

    Kaya, Koray D.; Karakülah, Gökhan; Yakıcıer, Cengiz M.; Acar, Aybar C.; Konu, Özlen

    2011-01-01

    microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data. PMID:21177657

  19. mESAdb: microRNA expression and sequence analysis database.

    PubMed

    Kaya, Koray D; Karakülah, Gökhan; Yakicier, Cengiz M; Acar, Aybar C; Konu, Ozlen

    2011-01-01

    microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data.

  20. Overview of recurrent chromosomal losses in retinoblastoma detected by low coverage next generation sequencing

    PubMed Central

    García-Chequer, A.J.; Méndez-Tenorio, A.; Olguín-Ruiz, G.; Sánchez-Vallejo, C.; Isa, P.; Arias, C.F.; Torres, J.; Hernández-Angeles, A.; Ramírez-Ortiz, M.A.; Lara, C.; Cabrera-Muñoz, M.L.; Sadowinski-Pine, S.; Bravo-Ortiz, J.C.; Ramón-García, G.; Diegopérez-Ramírez, J.; Ramírez-Reyes, G.; Casarrubias-Islas, R.; Ramírez, J.; Orjuela, M.A.; Ponce-Castañeda, M.V.

    2016-01-01

    Genes are frequently lost or gained in malignant tumors and the analysis of these changes can be informative about the underlying tumor biology. Retinoblastoma is a pediatric intraocular malignancy, and since deletions in chromosome 13 have been described in this tumor, we performed genome wide sequencing with the Illumina platform to test whether recurrent losses could be detected in low coverage data from DNA pools of Rb cases. An in silico reference profile for each pool was created from the human genome sequence GRCh37p5; a chromosome integrity score and a graphics 40 Kb window analysis approach allowed us to identify with high resolution previously reported non random recurrent losses in all chromosomes of these tumors. We also found a pattern of gains and losses associated with clear and dark cytogenetic bands respectively. We further analyzed a pool of medulloblastoma and found a more stable genomic profile and previously reported losses in this tumor. This approach facilitates identification of recurrent deletions from many patients that may be biologically relevant for tumor development. PMID:26883451
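
    The abstract does not give the details of its 40 Kb window analysis or of the chromosome integrity score. As a rough illustration of the general idea only, the sketch below bins read start positions from a sample and from a reference profile into fixed 40 kb windows on one chromosome and reports a depth-normalised log2 ratio per window. The function names, the pseudocount and the normalisation are all assumptions made for this example and are not the authors' method.

```python
import math
from collections import Counter

WINDOW_BP = 40_000  # window size named in the abstract


def window_counts(positions, window_bp=WINDOW_BP):
    """Count read start positions (0-based, one chromosome) per fixed window."""
    return Counter(pos // window_bp for pos in positions)


def log2_ratio_by_window(sample_positions, reference_positions,
                         window_bp=WINDOW_BP, pseudocount=1.0):
    """Depth-normalised log2(sample / reference) per window index.

    Assumes both inputs are non-empty. The pseudocount (an assumption of
    this sketch) avoids division by zero in windows with no reads.
    """
    sample = window_counts(sample_positions, window_bp)
    reference = window_counts(reference_positions, window_bp)
    sample_total = sum(sample.values())
    reference_total = sum(reference.values())
    ratios = {}
    for w in sorted(set(sample) | set(reference)):
        s = (sample[w] + pseudocount) / sample_total
        r = (reference[w] + pseudocount) / reference_total
        ratios[w] = math.log2(s / r)
    return ratios


if __name__ == "__main__":
    # Toy data: the second window has half the sample reads of the first.
    reference = list(range(0, 80_000, 400))
    sample = list(range(0, 40_000, 400)) + list(range(40_000, 80_000, 800))
    for window, ratio in log2_ratio_by_window(sample, reference).items():
        print(window * WINDOW_BP, round(ratio, 3))
```

    In this toy input the window with fewer sample reads gets a negative log2 ratio, which is the kind of signal a recurrent loss would produce across pooled cases.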

  1. Genome sequencing and analysis of a highly virulent Vibrio parahaemolyticus strain isolated from the marine environment

    NASA Astrophysics Data System (ADS)

    Parks, M. C.; Moreno, E.

    2016-02-01

    Vibrio parahaemolyticus [Vp] is a Gram-negative bacterium and a natural inhabitant of coastal marine ecosystems worldwide. Vp is also a coincidental pathogen of humans. Virulent strains are commonly identified by the presence of the thermostable direct (tdh) or tdh-related (trh) hemolysin genes. However, virulence is multifaceted and many clinical Vp isolates do not carry tdh or trh. In this study, we sequenced and assembled the draft genome of a tdh- and trh-negative environmental isolate (805) shown previously to be highly virulent in zebrafish. To investigate potential mechanisms of virulence, we compared 805 to the clinical V. parahaemolyticus type strain (RIMD2210633). Pairwise comparison revealed the presence of multiple genomic regions including an IncF conjugative pilus (1.3 Kb) and a colicin V plasmid (1.49 Kb). These features are homologous to genomic regions present in clinical V. vulnificus and V. cholerae strains. Genome comparison also revealed the presence of five toxin-antitoxin systems. Isolate 805 likely attained these new features through the lateral acquisition of mobile genomic material - a hypothesis supported by the aberrant GC content of these regions. Colicin V plasmids are a diverse group of IncF plasmids found in invasive bacterial strains. Similarly, an abundance of toxin-antitoxin systems have been linked to virulence in Gram-negative bacteria. Current efforts are focused on characterizing 142 coding features present in 805 but absent from the type strain.

  2. Genome-wide patterns of copy number variation in the diversified chicken genomes using next-generation sequencing.

    PubMed

    Yi, Guoqiang; Qu, Lujiang; Liu, Jianfeng; Yan, Yiyuan; Xu, Guiyun; Yang, Ning

    2014-11-07

    Copy number variation (CNV) is important and widespread in the genome, and is a major cause of disease and phenotypic diversity. Herein, we performed a genome-wide CNV analysis in 12 diversified chicken genomes based on whole genome sequencing. A total of 8,840 CNV regions (CNVRs) covering 98.2 Mb and representing 9.4% of the chicken genome were identified, ranging in size from 1.1 to 268.8 kb with an average of 11.1 kb. Sequencing-based predictions were confirmed at a high validation rate by two independent approaches, including array comparative genomic hybridization (aCGH) and quantitative PCR (qPCR). The Pearson's correlation coefficients between sequencing and aCGH results ranged from 0.435 to 0.755, and qPCR experiments revealed a positive validation rate of 91.71% and a false negative rate of 22.43%. In total, 2,214 (25.0%) predicted CNVRs span 2,216 (36.4%) RefSeq genes associated with specific biological functions. Besides two previously reported copy number variable genes EDN3 and PRLR, we also found some promising genes with potential in phenotypic variation. Two genes, FZD6 and LIMS1, related to disease susceptibility/resistance are covered by CNVRs. The highly duplicated SOCS2 may lead to higher bone mineral density. Entire or partial duplication of some genes like POPDC3 may have great economic importance in poultry breeding. Our results based on extensive genetic diversity provide a more refined chicken CNV map and genome-wide gene copy number estimates, and warrant future CNV association studies for important traits in chickens.

  3. Development, characterization and cross species amplification of polymorphic microsatellite markers from expressed sequence tags of turmeric (Curcuma longa L.).

    PubMed

    Siju, S; Dhanya, K; Syamkumar, S; Sasikumar, B; Sheeja, T E; Bhat, A I; Parthasarathy, V A

    2010-02-01

    Expressed sequence tags (ESTs) from turmeric (Curcuma longa L.) were used for the screening of type and frequency of Class I (hypervariable) simple sequence repeats (SSRs). A total of 231 microsatellite repeats were detected from 12,593 EST sequences of turmeric after redundancy elimination. The average density of Class I SSRs amounts to one SSR per 17.96 kb of EST. Mononucleotides were the most abundant class of microsatellite repeat in turmeric ESTs followed by trinucleotides. A robust set of 17 polymorphic EST-SSRs was developed and used for evaluating 20 turmeric accessions. The number of alleles detected ranged from 3 to 8 per locus. The developed markers were also evaluated in 13 related species of C. longa, confirming a high rate (100%) of cross species transferability. The polymorphic microsatellite markers generated from this study could be used for genetic diversity analysis and resolving the taxonomic confusion prevailing in the genus.
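
    A screen of this kind can be sketched with regular expressions. The example below is illustrative only: the abstract does not name the software or thresholds used, so the 20-bp minimum tract length (a common working definition of Class I SSRs), the 1-6 bp motif range and the function name are assumptions.

```python
import re


def find_ssrs(seq, min_len=20, max_motif=6):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Assumptions of this sketch: only perfect repeats are reported, motifs are
    1-6 bp, and a tract must span at least `min_len` bp to be kept.
    """
    seq = seq.upper()
    hits = []
    for m in range(1, max_motif + 1):
        # Group 2 is the motif; \2+ requires at least one further copy.
        pattern = re.compile(r"(([ACGT]{%d})\2+)" % m)
        for match in pattern.finditer(seq):
            tract, motif = match.group(1), match.group(2)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "AGAG"), so each tract is reported once.
            if any(motif == motif[:k] * (m // k)
                   for k in range(1, m) if m % k == 0):
                continue
            if len(tract) >= min_len:
                hits.append((match.start(), motif, len(tract) // m))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GGC" + "A" * 22 + "CCT" + "AG" * 10 + "TTC"
    for start, motif, count in find_ssrs(demo):
        print(start, motif, count)
```

    Dividing the total screened length by the number of hits gives a density in the same form as the abstract's "one SSR per N kb" figure.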

  4. Sequence Analysis and Initial Characterization of Two Isozymes of Hydroxylaminobenzene Mutase from Pseudomonas pseudoalcaligenes JS45

    PubMed Central

    Davis, John K.; Paoli, George C.; He, Zhongqi; Nadeau, Lloyd J.; Somerville, Charles C.; Spain, Jim C.

    2000-01-01

    Pseudomonas pseudoalcaligenes JS45 grows on nitrobenzene by a partially reductive pathway in which the intermediate hydroxylaminobenzene is enzymatically rearranged to 2-aminophenol by hydroxylaminobenzene mutase (HAB mutase). The properties of the enzyme, the reaction mechanism, and the evolutionary origin of the gene(s) encoding the enzyme are unknown. In this study, two open reading frames (habA and habB), each encoding an HAB mutase enzyme, were cloned from a P. pseudoalcaligenes JS45 genomic library and sequenced. The open reading frames encoding HabA and HabB are separated by 2.5 kb and are divergently transcribed. The deduced amino acid sequences of HabA and HabB are 44% identical. The HAB mutase specific activities in crude extracts of Escherichia coli clones synthesizing either HabA or HabB were similar to the specific activities of extracts of strain JS45 grown on nitrobenzene. HAB mutase activity in E. coli extracts containing HabB withstood heating at 85°C for 10 min, but extracts containing HabA were inactivated when they were heated at temperatures above 60°C. HAB mutase activity in extracts of P. pseudoalcaligenes JS45 grown on nitrobenzene exhibited intermediate temperature stability. Although both the habA gene and the habB gene conferred HAB mutase activity when they were separately cloned and expressed in E. coli, reverse transcriptase PCR analysis indicated that only habA is transcribed in P. pseudoalcaligenes JS45. A mutant strain derived from strain JS45 in which the habA gene was disrupted was unable to grow on nitrobenzene, which provided physiological evidence that HabA is involved in the degradation of nitrobenzene. A strain in which habB was disrupted grew on nitrobenzene. Gene Rv3078 of Mycobacterium tuberculosis H37Rv encodes a protein whose deduced amino acid sequence is 52% identical to the HabB amino acid sequence. E. coli containing M. tuberculosis gene Rv3078 cloned into pUC18 exhibited low levels of HAB mutase activity

  5. Complete Genome Sequence of the Quality Control Strain Staphylococcus aureus subsp. aureus ATCC 25923.

    PubMed

    Treangen, Todd J; Maybank, Rosslyn A; Enke, Sana; Friss, Mary Beth; Diviak, Lynn F; Karaolis, David K R; Koren, Sergey; Ondov, Brian; Phillippy, Adam M; Bergman, Nicholas H; Rosovitz, M J

    2014-11-06

    Staphylococcus aureus subsp. aureus ATCC 25923 is commonly used as a control strain for susceptibility testing to antibiotics and as a quality control strain for commercial products. We present the completed genome sequence for the strain, consisting of the chromosome and a 27.5-kb plasmid. Copyright © 2014 Treangen et al.

  6. Comparative Analysis of Vertebrate Dystrophin Loci Indicate Intron Gigantism as a Common Feature

    PubMed Central

    Pozzoli, Uberto; Elgar, Greg; Cagliani, Rachele; Riva, Laura; Comi, Giacomo P.; Bresolin, Nereo; Bardoni, Alessandra; Sironi, Manuela

    2003-01-01

    The human DMD gene is the largest known to date, spanning > 2000 kb on the X chromosome. The gene size is mainly accounted for by huge intronic regions. We sequenced 190 kb of Fugu rubripes (pufferfish) genomic DNA corresponding to the complete dystrophin gene (FrDMD) and provide the first report of gene structure and sequence comparison among dystrophin genomic sequences from different vertebrate organisms. Almost all intron positions and phases are conserved between FrDMD and its mammalian counterparts, and the predicted protein product of the Fugu gene displays 55% identity and 71% similarity to human dystrophin. In analogy to the human gene, FrDMD presents several-fold longer than average intronic regions. Analysis of intron sequences of the human and murine genes revealed that they are extremely conserved in size and that a similar fraction of total intron length is represented by repetitive elements; moreover, our data indicate that intron expansion through repeat accumulation in the two orthologs is the result of independent insertional events. The hypothesis that intron length might be functionally relevant to the DMD gene regulation is proposed and substantiated by the finding that dystrophin intron gigantism is common to the three vertebrate genes. [Supplemental material is available online at www.genome.org.] PMID:12727896

  7. Effects and mechanism of GA-13315 on the proliferation and apoptosis of KB cells in oral cancer.

    PubMed

    Shen, Shan; Tang, Jingxia

    2017-08-01

    The present study describes the effects and mechanism of GA-13315 on the proliferation and apoptosis of KB cells in oral cancer. Oral cancer is twice as common in men as in women. More than 90% of oral cancers in men and 85% in women are linked to lifestyle and environmental factors. PPP2R2B methylation may be associated with survival and prognosis in patients with gliomas. In tumor cell proliferation and apoptosis, the mechanism of PPP2R2B remains unclear. In the present study, we found that PPP2R2B expression of H1299 cells is significantly decreased after treatment with GA-13315. KB cells were isolated from patients with oral cancer and treated with GA-13315 (5 µM). Cells without GA-13315 treatment served as the control group. An MTT experiment was performed to detect the post-treatment cell growth between the groups. Flow cytometry was used to detect cell apoptosis. Western blot analysis and quantitative polymerase chain reaction methods were used for detecting the expression of PPP2R2B. Compared with the control group, the cell proliferation of the treatment group slowed after treatment with GA-13315. The difference was statistically significant (P<0.05). Western blotting showed that the PPP2R2B expression of cells was reduced after treatment with GA-13315. Compared with the control group, the difference was statistically significant (P<0.05). According to results from the Transwell migration assay, the invasiveness of the oral cancer KB cells was weakened after treatment with GA-13315. GA-13315 can accelerate the apoptosis of oral cancer cells and presents a dose correlation. The biological effect is exerted through the decrease of PPP2R2B.

  8. Analysis of sequence repeats of proteins in the PDB.

    PubMed

    Mary Rajathei, David; Selvaraj, Samuel

    2013-12-01

    Internal repeats in protein sequences play a significant role in the evolution of protein structure and function. Applications of different bioinformatics tools help in the identification and characterization of these repeats. In the present study, we analyzed sequence repeats in a non-redundant set of proteins available in the Protein Data Bank (PDB). We used RADAR for detecting internal repeats in a protein, PDBeFOLD for assessing structural similarity, PDBsum for finding functional involvement and Pfam for domain assignment of the repeats in a protein. Through the analysis of sequence repeats, we found that the identity of the sequence repeats falls in the range of 20-40% and that the superimposed structures of most of the sequence repeats maintain similar overall folding. Analysis of sequence repeats at the functional level reveals that most of the sequence repeats are involved in the function of the protein through functionally involved residues in the repeat regions. We also found that sequence repeats in single and two domain proteins often contained conserved sequence motifs for the function of the domain. Copyright © 2013 Elsevier Ltd. All rights reserved.

  9. Genomic Investigation Reveals Highly Conserved, Mosaic, Recombination Events Associated with Capsular Switching among Invasive Neisseria meningitidis Serogroup W Sequence Type (ST)-11 Strains.

    PubMed

    Mustapha, Mustapha M; Marsh, Jane W; Krauland, Mary G; Fernandez, Jorge O; de Lemos, Ana Paula S; Dunning Hotopp, Julie C; Wang, Xin; Mayer, Leonard W; Lawrence, Jeffrey G; Hiller, N Luisa; Harrison, Lee H

    2016-07-03

    Neisseria meningitidis is an important cause of meningococcal disease globally. Sequence type (ST)-11 clonal complex (cc11) is a hypervirulent meningococcal lineage historically associated with serogroup C capsule and is believed to have acquired the W capsule through a C to W capsular switching event. We studied the sequence of the capsule gene cluster (cps) and adjoining genomic regions of 524 invasive W cc11 strains isolated globally. We identified recombination breakpoints corresponding to two distinct recombination events within W cc11: an 8.4-kb recombinant region likely acquired from W cc22, including the sialic acid/glycosyl-transferase gene csw, resulted in a C→W change in capsular phenotype, and a 13.7-kb recombinant segment likely acquired from the Y cc23 lineage includes 4.5 kb of cps genes and 8.2 kb downstream of the cps cluster, resulting in allelic changes in capsule translocation genes. A vast majority of W cc11 strains (497/524, 94.8%) retain both recombination events as evidenced by sharing identical or very closely related capsular allelic profiles. These data suggest that the W cc11 capsular switch involved two separate recombination events and that current global W cc11 meningococcal disease is caused by strains bearing this mosaic capsular switch. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  10. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    PubMed

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications for sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information such as total number of nucleotide bases and ATGC base contents along with their respective percentages, and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user friendly environment for sequence analysis. It is freely available at http://www.cemb.edu.pk/sw.html. Abbreviations: RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.
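
    For readers unfamiliar with the Nussinov algorithm mentioned above, the sketch below shows the textbook base-pair maximisation recurrence applied to DNA, together with two of the routine utilities the abstract lists (reverse complement and base composition). This is not RDNAnalyzer's code, which is written in C# and is described as extending the algorithm in ways the abstract does not specify. The function names, the Watson-Crick-only pairing rule and the minimum loop length of 3 are assumptions made for illustration.

```python
from collections import Counter

# Assumption: only Watson-Crick pairs are allowed.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (classic Nussinov recurrence).

    `min_loop` is the minimum number of unpaired bases enclosed by a pair.
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                      # case: j is unpaired
            for k in range(i, j - min_loop):         # case: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def base_percentages(seq):
    """Percentage of each of A, C, G, T in a non-empty sequence."""
    counts = Counter(seq)
    return {base: 100.0 * counts[base] / len(seq) for base in "ACGT"}


if __name__ == "__main__":
    hairpin = "GGGAAATCCC"
    print(nussinov_max_pairs(hairpin))
    print(reverse_complement("ATGC"))
    print(base_percentages(hairpin))
```

    The table fill takes O(n^3) time and O(n^2) memory, which is fine for short sequences; recovering the actual pairing would need a traceback step that is omitted here.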

  11. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis

    PubMed Central

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications for sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information such as total number of nucleotide bases and ATGC base contents along with their respective percentages, and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user friendly environment for sequence analysis. It is freely available. Availability: http://www.cemb.edu.pk/sw.html. Abbreviations: RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language. PMID:23055611

  12. Towards a transcription map spanning a 250 kb area within the DiGeorge syndrome chromosome region

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wong, W.; Emanuel, B.S.; Siegert, J.

    1994-09-01

    DiGeorge syndrome (DGS) and velocardiofacial syndrome (VCFS) are congenital anomalies affecting predominantly the thymus, parathyroid glands, heart and craniofacial development. Detection of 22q11.2 deletions in the majority of DGS and VCFS patients implicates 22q11 haploinsufficiency in the etiology of these disorders. The VCFS/DGS critical region lies within the proximal portion of a commonly deleted 1.2 Mb region in 22q11. A 250 kb cosmid contig covering this critical region and containing D22S74 (N25) has been established. From this contig, eleven cosmids with minimal overlap were biotinylated by nick translation, and hybridized to PCR-amplified cDNAs prepared from different tissues. The use of cDNAs from a variety of tissues increases the likelihood of identifying low abundance transcripts and tissue-specific expressed sequences. A DGCR-specific cDNA sublibrary consisting of 670 cDNA clones has been constructed. To date, 49 cDNA clones from this sublibrary have been identified with single copy probes and cosmids containing putative CpG islands. Based on sequence analysis, 25 of the clones contain regions of homology to several cDNAs which map within the proximal contig. LAN is a novel partial cDNA isolated from a fetal brain library probed with one of the cosmids in the proximal contig. Using LAN as a probe, we have found 19 positive clones in the DGCR-specific cDNA sublibrary (4 clones from fetal brain, 14 from adult skeletal muscle and one from fetal liver). Some of the LAN-positive clones extend the partial cDNA in the 5′ direction and will be useful in assembling a full length transcript. This resource will be used to develop a complete transcriptional map of the critical region in order to identify candidate gene(s) involved in the etiology of DGS/VCFS and to determine the relationship between the transcriptional and physical maps of 22q11.

  13. NexGen Production – Sequencing and Analysis

    ScienceCinema

    Muzny, Donna

    2018-01-16

    Donna Muzny of the Baylor College of Medicine Human Genome Sequencing Center discusses next generation sequencing platforms and evaluating pipeline performance on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  14. The Large Mitochondrial Genome of Symbiodinium minutum Reveals Conserved Noncoding Sequences between Dinoflagellates and Apicomplexans.

    PubMed

    Shoguchi, Eiichi; Shinzato, Chuya; Hisata, Kanako; Satoh, Nori; Mungpakdee, Sutada

    2015-07-20

    Even though mitochondrial genomes, which characterize eukaryotic cells, were first discovered more than 50 years ago, mitochondrial genomics remains an important topic in molecular biology and genome sciences. The Phylum Alveolata comprises three major groups (ciliates, apicomplexans, and dinoflagellates), the mitochondrial genomes of which have diverged widely. Even though the gene content of dinoflagellate mitochondrial genomes is reportedly comparable to that of apicomplexans, the highly fragmented and rearranged genome structures of dinoflagellates have frustrated whole genomic analysis. Consequently, noncoding sequences and gene arrangements of dinoflagellate mitochondrial genomes have not been well characterized. Here we report that the continuous assembled genome (∼326 kb) of the dinoflagellate, Symbiodinium minutum, is AT-rich (∼64.3%) and that it contains three protein-coding genes. Based upon in silico analysis, the remaining 99% of the genome comprises transcriptomic noncoding sequences. RNA edited sites and unique, possible start and stop codons clarify conserved regions among dinoflagellates. Our massive transcriptome analysis shows that almost all regions of the genome are transcribed, including 27 possible fragmented ribosomal RNA genes and 12 uncharacterized small RNAs that are similar to mitochondrial RNA genes of the malarial parasite, Plasmodium falciparum. Gene map comparisons show that gene order is only slightly conserved between S. minutum and P. falciparum. However, small RNAs and intergenic sequences share sequence similarities with P. falciparum, suggesting that the function of noncoding sequences has been preserved despite development of very different genome structures. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  15. Separation of 1-23-kb complementary DNA strands by urea-agarose gel electrophoresis.

    PubMed

    Hegedüs, Eva; Kókai, Endre; Kotlyar, Alexander; Dombrádi, Viktor; Szabó, Gábor

    2009-09-01

    Double-stranded (ds), as well as denatured, single-stranded (ss) DNA samples can be analyzed on urea-agarose gels. Here we report that after denaturation by heat in the presence of 8 M urea, the two strands of the same ds DNA fragment of approximately 1-20-kb size migrate differently in 1 M urea containing agarose gels. The two strands are readily distinguished on Southern blots by ss-specific probes. The different migration of the two strands could be attributed to their different, base composition-dependent conformation impinging on the electrophoretic mobility of the ss molecules. This phenomenon can be exploited for the efficient preparation of strand-specific probes and for the separation of the complementary DNA strands for subsequent analysis, offering a new tool for various cell biological research areas.

  16. First Report of cfr-Carrying Plasmids in the Pandemic Sequence Type 22 Methicillin-Resistant Staphylococcus aureus Staphylococcal Cassette Chromosome mec Type IV Clone

    PubMed Central

    Shore, Anna C.; Lazaris, Alexandros; Kinnevey, Peter M.; Brennan, Orla M.; Brennan, Gráinne I.; O'Connell, Brian; Feßler, Andrea T.; Schwarz, Stefan

    2016-01-01

    Linezolid is often the drug of last resort for serious methicillin-resistant Staphylococcus aureus (MRSA) infections. Linezolid resistance is mediated by mutations in 23S rRNA and genes for ribosomal proteins; cfr, encoding phenicol, lincosamide, oxazolidinone, pleuromutilin, and streptogramin A (PhLOPSA) resistance; its homologue cfr(B); or optrA, conferring oxazolidinone and phenicol resistance. Linezolid resistance is rare in S. aureus, and cfr is even rarer. This study investigated the clonality and linezolid resistance mechanisms of two MRSA isolates from patients in separate Irish hospitals. Isolates were subjected to cfr PCR, PhLOPSA susceptibility testing, 23S rRNA PCR and sequencing, DNA microarray profiling, spa typing, pulsed-field gel electrophoresis (PFGE), plasmid curing, and conjugative transfer. Whole-genome sequencing was used for single-nucleotide variant (SNV) analysis, multilocus sequence typing, L protein mutation identification, cfr plasmid sequence analysis, and optrA and cfr(B) detection. Isolates M12/0145 and M13/0401 exhibited linezolid MICs of 64 and 16 mg/liter, respectively, and harbored identical 23S rRNA and L22 mutations, but M12/0145 exhibited the mutation in 2/6 23S rRNA alleles, compared to 1/5 in M13/0401. Both isolates were sequence type 22 MRSA staphylococcal cassette chromosome mec type IV (ST22-MRSA-IV)/spa type t032 isolates, harbored cfr, exhibited the PhLOPSA phenotype, and lacked optrA and cfr(B). They differed by five PFGE bands and 603 SNVs. Isolate M12/0145 harbored cfr and fexA on a 41-kb conjugative pSCFS3-type plasmid, whereas M13/0401 harbored cfr and lsa(B) on a novel 27-kb plasmid. This is the first report of cfr in the pandemic ST22-MRSA-IV clone. Different cfr plasmids and mutations associated with linezolid resistance in genotypically distinct ST22-MRSA-IV isolates highlight that prudent management of linezolid use is essential. PMID:26953212

  17. TaxKB: a knowledge base for new taxane-related drug discovery.

    PubMed

    Murugan, Kasi; Shanmugasamy, Sangeetha; Al-Sohaibani, Saleh; Vignesh, Naga; Palanikannan, Kandavel; Vimala, Antonydhason; Kumar, Gopal Ramesh

    2015-01-01

    Taxanes are naturally occurring compounds which belong to a powerful group of chemotherapeutic drugs with anticancer properties. Their current use, clinical efficacy, and unique mechanism of action indicate their potential for cancer drug discovery and development, thereby promising to reduce the high economic burden associated with cancer worldwide. Extensive research has been carried out on taxanes with the aim of combating drug resistance, side effects, and limited natural supply, and of increasing the therapeutic index of these molecules. These efforts have led to the isolation of many naturally occurring compounds belonging to this family (more than 350 different kinds), the synthesis of semisynthetic analogs of the naturally existing molecules (>500), and the characterization of many (>1000) of them. A web-based database system on clinically exploitable taxanes, providing a link between the structure and the pharmacological property of these molecules, could help to reduce the druggability gap for these molecules. Taxane knowledge base (TaxKB, http://bioinfo.au-kbc.org.in/taxane/Taxkb/) is an online multi-tier relational database that currently holds data on 42 parameters of 250 natural and 503 semisynthetic analogs of taxanes. This database provides researchers with much-needed information necessary for drug development. TaxKB enables the user to search data on the structure, drug-likeness, and physicochemical properties of both natural and synthetic taxanes with a "General Search" option in addition to a "Parameter Specific Search." It displays the 2D structure and allows the user to download the 3D structure (a PDB file) of taxanes, which can be viewed with any molecular visualization tool. The ultimate aim of TaxKB is to provide information on Absorption, Distribution, Metabolism, and Excretion/Toxicity (ADME/T) as well as data on bioavailability and target interaction properties of candidate anticancer taxanes, ahead of expensive clinical

  18. Scalable Kernel Methods and Algorithms for General Sequence Analysis

    ERIC Educational Resources Information Center

    Kuksa, Pavel

    2011-01-01

    Analysis of large-scale sequential data has become an important task in machine learning and pattern recognition, inspired in part by numerous scientific and technological applications such as the document and text classification or the analysis of biological sequences. However, current computational methods for sequence comparison still lack…

  19. Fluorescent Inhibitors as Tools To Characterize Enzymes: Case Study of the Lipid Kinase Phosphatidylinositol 4-Kinase IIIβ (PI4KB).

    PubMed

    Humpolickova, Jana; Mejdrová, Ivana; Matousova, Marika; Nencka, Radim; Boura, Evzen

    2017-01-12

    The lipid kinase phosphatidylinositol 4-kinase IIIβ (PI4KB) is an essential host factor for many positive-sense single-stranded RNA (+RNA) viruses including the human pathogens hepatitis C virus (HCV), severe acute respiratory syndrome (SARS), coxsackie viruses, and rhinoviruses. Inhibitors of PI4KB are considered to be potential broad-spectrum virostatics, and it is therefore critical to develop a biochemical understanding of the kinase. Here, we present highly potent and selective fluorescent inhibitors that we show to be useful chemical biology tools, especially in the determination of dissociation constants. Moreover, we show that the coumarin-labeled inhibitor can be used to image PI4KB in cells using fluorescence-lifetime imaging microscopy (FLIM).

  20. Cloning and characterization of an autonomous replication sequence from Coxiella burnetii.

    PubMed Central

    Suhan, M; Chen, S Y; Thompson, H A; Hoover, T A; Hill, A; Williams, J C

    1994-01-01

    A Coxiella burnetii chromosomal fragment capable of functioning as an origin for the replication of a kanamycin resistance (Kanr) plasmid was isolated by use of origin search methods utilizing an Escherichia coli host. The 5.8-kb fragment was subcloned into phagemid vectors and was deleted progressively by an exonuclease III-S1 technique. Plasmids containing progressively shorter DNA fragments were then tested for their capability to support replication by transformation of an E. coli polA strain. A minimal autonomous replication sequence (ARS) was delimited to 403 bp. Sequencing of the entire 5.8-kb region revealed that the minimal ARS contained two consensus DnaA boxes, three A + T-rich 21-mers, a transcriptional promoter leading rightwards, and potential integration host factor and factor of inversion stimulation binding sites. Database comparisons of deduced amino acid sequences revealed that open reading frames located around the ARS were homologous to genes often, but not always, found near bacterial chromosomal origins; these included identities with rpmH and rnpA in E. coli and identities with the 9K protein and 60K membrane protein in E. coli and Pseudomonas species. These and direct hybridization data suggested that the ARS was chromosomal and not associated with the resident plasmid QpH1. Two-dimensional agarose gel electrophoresis did not reveal the presence of initiating intermediates, indicating that the ARS did not initiate chromosome replication during laboratory growth of C. burnetii. PMID:8071197

  1. Alu sequence involvement in transcriptional insulation of the keratin 18 gene in transgenic mice.

    PubMed Central

    Thorey, I S; Ceceña, G; Reynolds, W; Oshima, R G

    1993-01-01

    The human keratin 18 (K18) gene is expressed in a variety of adult simple epithelial tissues, including liver, intestine, lung, and kidney, but is not normally found in skin, muscle, heart, spleen, or most of the brain. Transgenic animals derived from the cloned K18 gene express the transgene in appropriate tissues at levels directly proportional to the copy number and independently of the sites of integration. We have investigated in transgenic mice the dependence of K18 gene expression on the distal 5' and 3' flanking sequences and upon the RNA polymerase III promoter of an Alu repetitive DNA transcription unit immediately upstream of the K18 promoter. Integration site-independent expression of tandemly duplicated K18 transgenes requires the presence of either an 825-bp fragment of the 5' flanking sequence or the 3.5-kb 3' flanking sequence. Mutation of the RNA polymerase III promoter of the Alu element within the 825-bp fragment abolishes copy number-dependent expression in kidney but does not abolish integration site-independent expression when assayed in the absence of the 3' flanking sequence of the K18 gene. The characteristics of integration site-independent expression and copy number-dependent expression are separable. In addition, the formation of the chromatin state of the K18 gene, which likely restricts the tissue-specific expression of this gene, is not dependent upon the distal flanking sequences of the 10-kb K18 gene but rather may depend on internal regulatory regions of the gene. PMID:7692231

  2. Cloning and sequencing of a gene encoding a novel extracellular neutral proteinase from Streptomyces sp. strain C5 and expression of the gene in Streptomyces lividans 1326.

    PubMed Central

    Lampel, J S; Aphale, J S; Lampel, K A; Strohl, W R

    1992-01-01

    The gene encoding a novel milk protein-hydrolyzing proteinase was cloned on a 6.56-kb SstI fragment from Streptomyces sp. strain C5 genomic DNA into Streptomyces lividans 1326 by using the plasmid vector pIJ702. The gene encoding the small neutral proteinase (snpA) was located within a 2.6-kb BamHI-SstI restriction fragment that was partially sequenced. The molecular mass of the deduced amino acid sequence of the mature protein was determined to be 15,740, which corresponds very closely with the relative molecular mass of the purified protein (15,500) determined by sodium dodecyl sulfate-polyacrylamide gel electrophoresis. The N-terminal amino acid sequence of the purified neutral proteinase was determined, and the DNA encoding this sequence was found to be located within the sequenced DNA. The deduced amino acid sequence contains a conserved zinc binding site, although secondary ligand binding and active sites typical of thermolysinlike metalloproteinases are absent. The combination of its small size, deduced amino acid sequence, and substrate and inhibition profile indicate that snpA encodes a novel neutral proteinase. PMID:1569011

  3. Laser Desorption Mass Spectrometry for DNA Sequencing and Analysis

    NASA Astrophysics Data System (ADS)

    Chen, C. H. Winston; Taranenko, N. I.; Golovlev, V. V.; Isola, N. R.; Allman, S. L.

    1998-03-01

    Rapid DNA sequencing and/or analysis is critically important for biomedical research. In the past, gel electrophoresis has been the primary tool to achieve DNA analysis and sequencing. However, gel electrophoresis is a time-consuming and labor-intensive process. Recently, we have developed and used laser desorption mass spectrometry (LDMS) to achieve sequencing of ss-DNA longer than 100 nucleotides. With LDMS, we succeeded in sequencing DNA in seconds instead of the hours or days required by gel electrophoresis. In addition to sequencing, we also applied LDMS to the detection of DNA probes for hybridization. LDMS was also used to detect short tandem repeats for forensic applications. Clinical applications for disease diagnosis, such as cystic fibrosis caused by base deletion and point mutation, have also been demonstrated. Experimental details will be presented at the meeting.

  4. Miniature Transposable Sequences Are Frequently Mobilized in the Bacterial Plant Pathogen Pseudomonas syringae pv. phaseolicola

    PubMed Central

    Bardaji, Leire; Añorga, Maite; Jackson, Robert W.; Martínez-Bilbao, Alejandro; Yanguas-Casás, Natalia; Murillo, Jesús

    2011-01-01

    Mobile genetic elements are widespread in Pseudomonas syringae, and often associate with virulence genes. Genome reannotation of the model bean pathogen P. syringae pv. phaseolicola 1448A identified seventeen types of insertion sequences and two miniature inverted-repeat transposable elements (MITEs) with a biased distribution, representing 2.8% of the chromosome, 25.8% of the 132-kb virulence plasmid and 2.7% of the 52-kb plasmid. Employing an entrapment vector containing sacB, we estimated that transposition frequency oscillated between 2.6×10^-5 and 1.1×10^-6, depending on the clone, although it was stable for each clone after consecutive transfers in culture media. Transposition frequency was similar for bacteria grown in rich or minimal media, and from cells recovered from compatible and incompatible plant hosts, indicating that growth conditions do not influence transposition in strain 1448A. Most of the entrapped insertions contained a full-length IS801 element, with the remaining insertions corresponding to sequences smaller than any transposable element identified in strain 1448A, and collectively identified as miniature sequences. From these, fragments of 229, 360 and 679-nt of the right end of IS801 ended in a consensus tetranucleotide and likely resulted from one-ended transposition of IS801. An average 0.7% of the insertions analyzed consisted of IS801 carrying a fragment of variable size from gene PSPPH_0008/PSPPH_0017, showing that IS801 can mobilize DNA in vivo. Retrospective analysis of complete plasmids and genomes of P. syringae suggests, however, that most fragments of IS801 are likely the result of reorganizations rather than one-ended transpositions, and that this element might preferentially contribute to genome flexibility by generating homologous regions of recombination. A further miniature sequence previously found to affect host range specificity and virulence, designated MITEPsy1 (100-nt), represented an average 2.4% of the total

  5. Miniature transposable sequences are frequently mobilized in the bacterial plant pathogen Pseudomonas syringae pv. phaseolicola.

    PubMed

    Bardaji, Leire; Añorga, Maite; Jackson, Robert W; Martínez-Bilbao, Alejandro; Yanguas-Casás, Natalia; Murillo, Jesús

    2011-01-01

    Mobile genetic elements are widespread in Pseudomonas syringae, and often associate with virulence genes. Genome reannotation of the model bean pathogen P. syringae pv. phaseolicola 1448A identified seventeen types of insertion sequences and two miniature inverted-repeat transposable elements (MITEs) with a biased distribution, representing 2.8% of the chromosome, 25.8% of the 132-kb virulence plasmid and 2.7% of the 52-kb plasmid. Employing an entrapment vector containing sacB, we estimated that transposition frequency oscillated between 2.6×10(-5) and 1.1×10(-6), depending on the clone, although it was stable for each clone after consecutive transfers in culture media. Transposition frequency was similar for bacteria grown in rich or minimal media, and from cells recovered from compatible and incompatible plant hosts, indicating that growth conditions do not influence transposition in strain 1448A. Most of the entrapped insertions contained a full-length IS801 element, with the remaining insertions corresponding to sequences smaller than any transposable element identified in strain 1448A, and collectively identified as miniature sequences. From these, fragments of 229, 360 and 679-nt of the right end of IS801 ended in a consensus tetranucleotide and likely resulted from one-ended transposition of IS801. An average 0.7% of the insertions analyzed consisted of IS801 carrying a fragment of variable size from gene PSPPH_0008/PSPPH_0017, showing that IS801 can mobilize DNA in vivo. Retrospective analysis of complete plasmids and genomes of P. syringae suggests, however, that most fragments of IS801 are likely the result of reorganizations rather than one-ended transpositions, and that this element might preferentially contribute to genome flexibility by generating homologous regions of recombination. A further miniature sequence previously found to affect host range specificity and virulence, designated MITEPsy1 (100-nt), represented an average 2.4% of the total

  6. Sequencing, Assembly and Analysis of Human Microbial Communities

    ScienceCinema

    Petrosino, Joe

    2018-02-02

    Joe Petrosino of Baylor College of Medicine discusses using next generation sequencing technologies to study human microbial communities associated with health and disease on June 4, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  7. Contiguous 22.1-kb deletion embracing AVPR2 and ARHGAP4 genes at novel breakpoints leads to nephrogenic diabetes insipidus in a Chinese pedigree.

    PubMed

    Bai, Ying; Chen, Yibing; Kong, Xiangdong

    2018-02-02

    It has been reported that mutations in arginine vasopressin type 2 receptor (AVPR2) cause congenital X-linked nephrogenic diabetes insipidus (NDI). However, only a few cases of AVPR2 deletion have been documented in China. An NDI pedigree was included in this study, including the proband and his mother. All NDI patients had polyuria, polydipsia, and growth retardation. PCR mapping, long range PCR and Sanger sequencing were used to identify genetic causes of NDI. A novel 22,110 bp deletion comprising the AVPR2 and ARHGAP4 genes was identified by PCR mapping, long range PCR and Sanger sequencing. The deletion may have arisen from the 4-bp homologous sequence (TTTT) at the junctions of both the 5' and 3' breakpoints. The gross deletion co-segregates with NDI. After analyzing available data on putative clinical signs of AVPR2 and ARHGAP4 deletion, we reconsider the potential role of AVPR2 deletion in short stature. We identified a novel 22.1-kb deletion leading to X-linked NDI in a Chinese pedigree, which adds to current knowledge of AVPR2 mutations.

  8. Analysis of noise-induced temporal correlations in neuronal spike sequences

    NASA Astrophysics Data System (ADS)

    Reinoso, José A.; Torrent, M. C.; Masoller, Cristina

    2016-11-01

    We investigate temporal correlations in sequences of noise-induced neuronal spikes, using a symbolic method of time-series analysis. We focus on the sequence of time-intervals between consecutive spikes (inter-spike-intervals, ISIs). The analysis method, known as ordinal analysis, transforms the ISI sequence into a sequence of ordinal patterns (OPs), which are defined in terms of the relative ordering of consecutive ISIs. The ISI sequences are obtained from extensive simulations of two neuron models (FitzHugh-Nagumo, FHN, and integrate-and-fire, IF), with correlated noise. We find that, as the noise strength increases, temporal order gradually emerges, revealed by the existence of more frequent ordinal patterns in the ISI sequence. While in the FHN model the most frequent OP depends on the noise strength, in the IF model it is independent of the noise strength. In both models, the correlation time of the noise affects the OP probabilities but does not modify the most probable pattern.
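    The ordinal transformation described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the function name, the default pattern length of 3, and the tie-breaking rule (earlier interval ranks lower) are assumptions made here for the example.

```python
from collections import Counter
from itertools import permutations


def ordinal_pattern_probabilities(intervals, length=3):
    """Relative frequency of each ordinal pattern in a sequence of intervals.

    A pattern is the tuple of window positions sorted by increasing value,
    so (0, 1, 2) means three consecutive intervals in increasing order.
    Ties are broken by position (an assumption of this sketch).
    """
    if length < 2 or len(intervals) < length:
        raise ValueError("need at least `length` intervals and length >= 2")
    counts = Counter()
    for start in range(len(intervals) - length + 1):
        window = intervals[start:start + length]
        pattern = tuple(sorted(range(length), key=lambda i: window[i]))
        counts[pattern] += 1
    total = sum(counts.values())
    # Report every possible pattern, including those never observed.
    return {p: counts.get(p, 0) / total for p in permutations(range(length))}


if __name__ == "__main__":
    # Hypothetical inter-spike intervals, for illustration only.
    isis = [0.8, 1.1, 0.9, 1.4, 1.6, 1.2, 1.0, 1.3]
    for pattern, prob in ordinal_pattern_probabilities(isis).items():
        print(pattern, round(prob, 3))
```

    For a sequence with no temporal order, all patterns of a given length would be expected to appear with roughly equal probability, so a pattern that is markedly more frequent than the others indicates the kind of ordering the abstract reports.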

  9. Design and Analysis of Single-Cell Sequencing Experiments.

    PubMed

    Grün, Dominic; van Oudenaarden, Alexander

    2015-11-05

    Recent advances in single-cell sequencing hold great potential for exploring biological systems with unprecedented resolution. Sequencing the genome of individual cells can reveal somatic mutations and allows the investigation of clonal dynamics. Single-cell transcriptome sequencing can elucidate the cell type composition of a sample. However, single-cell sequencing comes with major technical challenges and yields complex data output. In this Primer, we provide an overview of available methods and discuss experimental design and single-cell data analysis. We hope that these guidelines will enable a growing number of researchers to leverage the power of single-cell sequencing. Copyright © 2015 Elsevier Inc. All rights reserved.

  10. Thermodynamically balanced inside-out (TBIO) PCR-based gene synthesis: a novel method of primer design for high-fidelity assembly of longer gene sequences

    PubMed Central

    Gao, Xinxin; Yo, Peggy; Keith, Andrew; Ragan, Timothy J.; Harris, Thomas K.

    2003-01-01

    A novel thermodynamically-balanced inside-out (TBIO) method of primer design was developed and compared with a thermodynamically-balanced conventional (TBC) method of primer design for PCR-based gene synthesis of codon-optimized gene sequences for the human protein kinase B-2 (PKB2; 1494 bp), p70 ribosomal S6 subunit protein kinase-1 (S6K1; 1622 bp) and phosphoinositide-dependent protein kinase-1 (PDK1; 1712 bp). Each of the 60mer TBIO primers coded for identical nucleotide regions that the 60mer TBC primers covered, except that half of the TBIO primers were reverse complement sequences. In addition, the TBIO and TBC primers contained identical regions of temperature-optimized primer overlaps. The TBC method was optimized to generate sequential overlapping fragments (∼0.4–0.5 kb) for each of the gene sequences, and simultaneous and sequential combinations of overlapping fragments were tested for their ability to be assembled under an array of PCR conditions. However, no fully synthesized gene sequences could be obtained by this approach. In contrast, the TBIO method generated an initial central fragment (∼0.4–0.5 kb), which could be gel purified and used for further inside-out bidirectional elongation by additional increments of 0.4–0.5 kb. By using the newly developed TBIO method of PCR-based gene synthesis, error-free synthetic genes for the human protein kinases PKB2, S6K1 and PDK1 were obtained with little or no corrective mutagenesis. PMID:14602936

  11. Whole Genome Sequence Analysis Using JSpecies Tool Establishes Clonal Relationships between Listeria monocytogenes Strains from Epidemiologically Unrelated Listeriosis Outbreaks

    DOE PAGES

    Burall, Laurel S.; Grim, Christopher J.; Mammel, Mark K.; ...

    2016-03-07

    In an effort to build a comprehensive genomic approach to food safety challenges, the FDA has implemented a whole genome sequencing effort, GenomeTrakr, which involves the sequencing and analysis of genomes of foodborne pathogens. As a part of this effort, we routinely sequence whole genomes of Listeria monocytogenes (Lm) isolates associated with human listeriosis outbreaks, as well as those isolated through other sources. To rapidly establish genetic relatedness of these genomes, we evaluated tetranucleotide frequency analysis via the JSpecies program to provide a cursory analysis of strain relatedness. The JSpecies tetranucleotide (tetra) analysis plots standardized (z-score) tetramer word frequencies of two strains against each other and uses linear regression analysis to determine similarity (r²). This tool was able to validate the close relationships between outbreak related strains from four different outbreaks. Included in this study was the analysis of Lm strains isolated during the recent caramel apple outbreak and stone fruit incident in 2014. We identified that many of the isolates from these two outbreaks shared a common 4b variant (4bV) serotype, also designated as IVb-v1, using a qPCR protocol developed in our laboratory. The 4bV serotype is characterized by the presence of a 6.3 kb DNA segment normally found in serotype 1/2a, 3a, 1/2c and 3c strains but not in serotype 4b or 1/2b strains. We decided to compare these strains at a genomic level using the JSpecies Tetra tool. Specifically, we compared several 4bV and 4b isolates and identified a high level of similarity between the stone fruit and apple 4bV strains, but not the 4b strains co-identified in the caramel apple outbreak or other 4b or 4bV strains in our collection. This finding was further substantiated by a SNP-based analysis. Additionally, we were able to identify close relatedness between isolates from clinical cases from 1993–1994 and a single case from 2011 as well as links
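    The idea of comparing two genomes by regressing their standardized tetramer frequencies can be illustrated with a short sketch. This is not the JSpecies implementation: here the z-scores are computed simply across the 256 tetramer frequencies of each sequence, which is an assumption of this example and may differ from how JSpecies derives its z-scores. The function names and the toy sequences are also hypothetical.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotide words, in a fixed order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetramer_zscores(sequence):
    """Z-scores of overlapping tetramer frequencies for one sequence.

    Simplified standardization (assumption): subtract the mean frequency and
    divide by the standard deviation across the 256 tetramers.
    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid tetramers")
    freqs = [counts[t] / total for t in TETRAMERS]
    mean = sum(freqs) / len(freqs)
    sd = sqrt(sum((f - mean) ** 2 for f in freqs) / len(freqs))
    if sd == 0:
        raise ValueError("all tetramers are equally frequent")
    return [(f - mean) / sd for f in freqs]


def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Toy sequences for illustration only; real input would be whole genomes.
    genome_a = "ACGTTGCAAGGCTTAACCGGTATCG" * 40
    genome_b = "ACGTTGCAAGGCTTAACCGGTATCGTTAGC" * 40
    print(r_squared(tetramer_zscores(genome_a), tetramer_zscores(genome_b)))
```

    With a simple linear regression, r² equals the squared Pearson correlation, so a value close to 1 means the two tetramer profiles are nearly proportional, which is the sense of "similarity" used in the abstract.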

  12. Whole Genome Sequence Analysis Using JSpecies Tool Establishes Clonal Relationships between Listeria monocytogenes Strains from Epidemiologically Unrelated Listeriosis Outbreaks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Burall, Laurel S.; Grim, Christopher J.; Mammel, Mark K.

    In an effort to build a comprehensive genomic approach to food safety challenges, the FDA has implemented a whole genome sequencing effort, GenomeTrakr, which involves the sequencing and analysis of genomes of foodborne pathogens. As a part of this effort, we routinely sequence whole genomes of Listeria monocytogenes (Lm) isolates associated with human listeriosis outbreaks, as well as those isolated through other sources. To rapidly establish genetic relatedness of these genomes, we evaluated tetranucleotide frequency analysis via the JSpecies program to provide a cursory analysis of strain relatedness. The JSpecies tetranucleotide (tetra) analysis plots standardized (z-score) tetramer word frequencies of two strains against each other and uses linear regression analysis to determine similarity (r²). This tool was able to validate the close relationships between outbreak related strains from four different outbreaks. Included in this study was the analysis of Lm strains isolated during the recent caramel apple outbreak and stone fruit incident in 2014. We identified that many of the isolates from these two outbreaks shared a common 4b variant (4bV) serotype, also designated as IVb-v1, using a qPCR protocol developed in our laboratory. The 4bV serotype is characterized by the presence of a 6.3 kb DNA segment normally found in serotype 1/2a, 3a, 1/2c and 3c strains but not in serotype 4b or 1/2b strains. We decided to compare these strains at a genomic level using the JSpecies Tetra tool. Specifically, we compared several 4bV and 4b isolates and identified a high level of similarity between the stone fruit and apple 4bV strains, but not the 4b strains co-identified in the caramel apple outbreak or other 4b or 4bV strains in our collection. This finding was further substantiated by a SNP-based analysis. Additionally, we were able to identify close relatedness between isolates from clinical cases from 1993–1994 and a single case from 2011 as well as links

  13. Exome sequencing and SNP analysis detect novel compound heterozygosity in fatty acid hydroxylase-associated neurodegeneration

    PubMed Central

    Pierson, Tyler Mark; Simeonov, Dimitre R; Sincan, Murat; Adams, David A; Markello, Thomas; Golas, Gretchen; Fuentes-Fajardo, Karin; Hansen, Nancy F; Cherukuri, Praveen F; Cruz, Pedro; Blackstone, Craig; Tifft, Cynthia; Boerkoel, Cornelius F; Gahl, William A

    2012-01-01

    Fatty acid hydroxylase-associated neurodegeneration due to fatty acid 2-hydroxylase deficiency presents with a wide range of phenotypes including spastic paraplegia, leukodystrophy, and/or brain iron deposition. All previously described families with this disorder were consanguineous, with homozygous mutations in the probands. We describe a 10-year-old male, from a non-consanguineous family, with progressive spastic paraplegia, dystonia, ataxia, and cognitive decline associated with a sural axonal neuropathy. The use of high-throughput sequencing techniques combined with SNP array analyses revealed a novel paternally derived missense mutation and an overlapping novel maternally derived ∼28-kb genomic deletion in FA2H. This patient provides further insight into the consistent features of this disorder and expands our understanding of its phenotypic presentation. The presence of a sural nerve axonal neuropathy had not been previously associated with this disorder and so may extend the phenotype. PMID:22146942

  14. Long-read sequencing data analysis for yeasts.

    PubMed

    Yue, Jia-Xing; Liti, Gianni

    2018-06-01

    Long-read sequencing technologies have become increasingly popular due to their strengths in resolving complex genomic regions. As a leading model organism with small genome size and great biotechnological importance, the budding yeast Saccharomyces cerevisiae has many isolates currently being sequenced with long reads. However, analyzing long-read sequencing data to produce high-quality genome assembly and annotation remains challenging. Here, we present a modular computational framework named long-read sequencing data analysis for yeasts (LRSDAY), the first one-stop solution that streamlines this process. Starting from the raw sequencing reads, LRSDAY can produce chromosome-level genome assembly and comprehensive genome annotation in a highly automated manner with minimal manual intervention, which is not possible using any alternative tool available to date. The annotated genomic features include centromeres, protein-coding genes, tRNAs, transposable elements (TEs), and telomere-associated elements. Although tailored for S. cerevisiae, we designed LRSDAY to be highly modular and customizable, making it adaptable to virtually any eukaryotic organism. When applying LRSDAY to an S. cerevisiae strain, it takes ∼41 h to generate a complete and well-annotated genome from ∼100× Pacific Biosciences (PacBio) running the basic workflow with four threads. Basic experience working within the Linux command-line environment is recommended for carrying out the analysis using LRSDAY.

  15. High-throughput sequencing: a failure mode analysis.

    PubMed

    Yang, George S; Stott, Jeffery M; Smailus, Duane; Barber, Sarah A; Balasundaram, Miruna; Marra, Marco A; Holt, Robert A

    2005-01-04

    Basic manufacturing principles are becoming increasingly important in high-throughput sequencing facilities where there is a constant drive to increase quality, increase efficiency, and decrease operating costs. While high-throughput centres report failure rates typically on the order of 10%, the causes of sporadic sequencing failures are seldom analyzed in detail and have not, in the past, been formally reported. Here we report the results of a failure mode analysis of our production sequencing facility based on detailed evaluation of 9,216 ESTs generated from two cDNA libraries. Two categories of failures are described: process-related failures (failures due to equipment or sample handling) and template-related failures (failures that are revealed by close inspection of electropherograms and are likely due to properties of the template DNA sequence itself). Preventative action based on a detailed understanding of failure modes is likely to improve the performance of other production sequencing pipelines.

  16. PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability

    PubMed Central

    Kirby, Jacqueline C; Speltz, Peter; Rasmussen, Luke V; Basford, Melissa; Gottesman, Omri; Peissig, Peggy L; Pacheco, Jennifer A; Tromp, Gerard; Pathak, Jyotishman; Carrell, David S; Ellis, Stephen B; Lingren, Todd; Thompson, Will K; Savova, Guergana; Haines, Jonathan; Roden, Dan M; Harris, Paul A

    2016-01-01

    Objective: Health care generated data have become an important source for clinical and genomic research. Often, investigators create and iteratively refine phenotype algorithms to achieve high positive predictive values (PPVs) or sensitivity, thereby identifying valid cases and controls. These algorithms achieve the greatest utility when validated and shared by multiple health care systems. Materials and Methods: We report the current status and impact of the Phenotype KnowledgeBase (PheKB, http://phekb.org), an online environment supporting the workflow of building, sharing, and validating electronic phenotype algorithms. We analyze the most frequent components used in algorithms and their performance at authoring institutions and secondary implementation sites. Results: As of June 2015, PheKB contained 30 finalized phenotype algorithms and 62 algorithms in development spanning a range of traits and diseases. Phenotypes have had over 3500 unique views in a 6-month period and have been reused by other institutions. International Classification of Disease codes were the most frequently used component, followed by medications and natural language processing. Among algorithms with published performance data, the median PPV was nearly identical when evaluated at the authoring institutions (n = 44; case 96.0%, control 100%) compared to implementation sites (n = 40; case 97.5%, control 100%). Discussion: These results demonstrate that a broad range of algorithms to mine electronic health record data from different health systems can be developed with high PPV, and algorithms developed at one site are generally transportable to others. Conclusion: By providing a central repository, PheKB enables improved development, transportability, and validity of algorithms for research-grade phenotypes using health care generated data. PMID:27026615

  17. Genetic analysis of biodegradation of tetralin by a Sphingomonas strain

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hernaez, M.J.; Santero, E.; Reineke, W.

    Tetralin (1,2,3,4-tetrahydronaphthalene) is produced for industrial purposes from naphthalene by catalytic hydrogenation or from anthracene by cracking. A strain designated TFA which very efficiently utilizes tetralin has been isolated from the Rhine river. The strain has been identified as Sphingomonas macrogoltabidus, based on 16S rDNA sequence similarity. Genetic analysis of tetralin biodegradation has been performed by insertion mutagenesis and by physical analysis and analysis of complementation between the mutants. The genes involved in tetralin utilization are clustered in a region of 9 kb, comprising at least five genes grouped in two divergently transcribed operons.

  18. Genome Sequence of the Freshwater Yangtze Finless Porpoise.

    PubMed

    Yuan, Yuan; Zhang, Peijun; Wang, Kun; Liu, Mingzhong; Li, Jing; Zheng, Jingsong; Wang, Ding; Xu, Wenjie; Lin, Mingli; Dong, Lijun; Zhu, Chenglong; Qiu, Qiang; Li, Songhai

    2018-04-16

    The Yangtze finless porpoise (Neophocaena asiaeorientalis ssp. asiaeorientalis) is a subspecies of the narrow-ridged finless porpoise (N. asiaeorientalis). In total, 714.28 gigabases (Gb) of raw reads were generated by whole-genome sequencing of the Yangtze finless porpoise, using an Illumina HiSeq 2000 platform. After filtering the low-quality and duplicated reads, we assembled a draft genome of 2.22 Gb, with contig N50 and scaffold N50 values of 46.69 kilobases (kb) and 1.71 megabases (Mb), respectively. We identified 887.63 Mb of repetitive sequences and predicted 18,479 protein-coding genes in the assembled genome. The phylogenetic tree showed a relationship between the Yangtze finless porpoise and the Yangtze River dolphin, which diverged approximately 20.84 million years ago. In comparisons with the genomes of 10 other mammals, we detected 44 species-specific gene families, 164 expanded gene families, and 313 positively selected genes in the Yangtze finless porpoise genome. The assembled genome sequence and underlying sequence data are available at the National Center for Biotechnology Information under BioProject accession number PRJNA433603.
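
    The contig and scaffold N50 values quoted above follow a standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. A minimal sketch of that calculation is given below; the function name and the example lengths are hypothetical and are not taken from the porpoise assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

    Comparing `2 * running` with `total` keeps the arithmetic in integers, which avoids rounding issues when the total length is odd.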

  19. Genome Sequence of the Freshwater Yangtze Finless Porpoise

    PubMed Central

    Yuan, Yuan; Zhang, Peijun; Wang, Kun; Liu, Mingzhong; Li, Jing; Zheng, Jinsong; Wang, Ding; Xu, Wenjie; Lin, Mingli; Dong, Lijun; Zhu, Chenglong; Qiu, Qiang

    2018-01-01

    The Yangtze finless porpoise (Neophocaena asiaeorientalis ssp. asiaeorientalis) is a subspecies of the narrow-ridged finless porpoise (N. asiaeorientalis). In total, 714.28 gigabases (Gb) of raw reads were generated by whole-genome sequencing of the Yangtze finless porpoise, using an Illumina HiSeq 2000 platform. After filtering the low-quality and duplicated reads, we assembled a draft genome of 2.22 Gb, with contig N50 and scaffold N50 values of 46.69 kilobases (kb) and 1.71 megabases (Mb), respectively. We identified 887.63 Mb of repetitive sequences and predicted 18,479 protein-coding genes in the assembled genome. The phylogenetic tree showed a relationship between the Yangtze finless porpoise and the Yangtze River dolphin, which diverged approximately 20.84 million years ago. In comparisons with the genomes of 10 other mammals, we detected 44 species-specific gene families, 164 expanded gene families, and 313 positively selected genes in the Yangtze finless porpoise genome. The assembled genome sequence and underlying sequence data are available at the National Center for Biotechnology Information under BioProject accession number PRJNA433603. PMID:29659530

  20. Categorizing accident sequences in the external radiotherapy for risk analysis

    PubMed Central

    2013-01-01

    Purpose: This study identifies accident sequences from past accidents to support the application of risk analysis to external radiotherapy. Materials and Methods: This study reviews 59 accident cases from two retrospective safety analyses that extensively collected incidents in external radiotherapy. These two accident analysis reports are examined to identify accident sequences, including initiating events, failures of safety measures, and consequences. The accidents are classified by treatment stage and source of error for initiating events, type of failure in the safety measures, type of undesirable consequence, and number of affected patients. The accident sequences are then grouped into categories on the basis of similarity of progression; as a result, the cases fall into 14 groups of accident sequences. Results: The results indicate that risk analysis needs to pay attention not only to the planning stage but also to the calibration stage that is carried out prior to the main treatment process. They also show that human error is the largest contributor to initiating events as well as to the failure of safety measures. This study also illustrates an event tree analysis for an accident sequence initiated in the calibration stage. Conclusion: This study is expected to provide insights into accident sequences for prospective risk analysis through the review of past experience. PMID:23865005

  1. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

    2004-05-11

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  2. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  3. DSAP: deep-sequencing small RNA analysis pipeline.

    PubMed

    Huang, Po-Jung; Liu, Yi-Chung; Lee, Chi-Ching; Lin, Wei-Chen; Gan, Richie Ruei-Chi; Lyu, Ping-Chiang; Tang, Petrus

    2010-07-01

    DSAP is an automated multiple-task web service designed to provide a total solution to analyzing deep-sequencing small RNA datasets generated by next-generation sequencing technology. DSAP uses a tab-delimited file as an input format, which holds the unique sequence reads (tags) and their corresponding number of copies generated by the Solexa sequencing platform. The input data will go through four analysis steps in DSAP: (i) cleanup: removal of adaptors and poly-A/T/C/G/N nucleotides; (ii) clustering: grouping of cleaned sequence tags into unique sequence clusters; (iii) non-coding RNA (ncRNA) matching: sequence homology mapping against a transcribed sequence library from the ncRNA database Rfam (http://rfam.sanger.ac.uk/); and (iv) known miRNA matching: detection of known miRNAs in miRBase (http://www.mirbase.org/) based on sequence homology. The expression levels corresponding to matched ncRNAs and miRNAs are summarized in multi-color clickable bar charts linked to external databases. DSAP is also capable of displaying miRNA expression levels from different jobs using a log(2)-scaled color matrix. Furthermore, a cross-species comparative function is also provided to show the distribution of identified miRNAs in different species as deposited in miRBase. DSAP is available at http://dsap.cgu.edu.tw.
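
    The first two steps described above (cleanup and clustering of a tab-delimited tag/count file) can be illustrated with a short sketch. This is only an assumption-laden illustration, not DSAP's implementation: the function name, the adapter sequence, and the example tags are hypothetical, and the poly-A/T/C/G/N filtering is omitted.

```python
from collections import Counter


def cluster_tags(lines, adapter):
    """Trim a 3' adapter from each tag and sum the copy numbers of identical tags.

    `lines` holds tab-delimited records of the form "<tag>\t<copies>".
    """
    clusters = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tag, copies = line.split("\t")
        pos = tag.find(adapter)
        if pos != -1:
            tag = tag[:pos]  # cleanup: drop the adapter and everything after it
        if tag:
            clusters[tag] += int(copies)  # clustering: merge identical cleaned tags
    return clusters


# Hypothetical input records and adapter, for illustration only.
records = ["ACGTACGACTGTAGG\t5", "ACGTACGA\t3", "GGGG\t2"]
print(cluster_tags(records, adapter="CTGTAGG"))
```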

  4. Comparative genomic analysis of the false killer whale (Pseudorca crassidens) LMBR1 locus.

    PubMed

    Kim, Dae-Won; Choi, Sang-Haeng; Kim, Ryong Nam; Kim, Sun-Hong; Paik, Sang-Gi; Nam, Seong-Hyeuk; Kim, Dong-Wook; Kim, Aeri; Kang, Aram; Park, Hong-Seog

    2010-09-01

    The sequencing and comparative genomic analysis of LMBR1 loci in mammals or other species, including human, would be very important in understanding evolutionary genetic changes underlying the evolution of limb development. In this regard, comparative genomic annotation of the false killer whale LMBR1 locus could shed new light on the evolution of limb development. We sequenced two false killer whale BAC clones, corresponding to 156 kb and 144 kb, respectively, harboring the tightly linked RNF32, LMBR1, and NOM1 genes. Our annotation of the false killer whale LMBR1 gene showed that it consists of 17 exons (1473 bp), in contrast to 18 exons (1596 bp) in human, and it displays 93.1% and 95.6% nucleotide and amino acid sequence similarity, respectively, compared with the human gene. In particular, we discovered that exon 10, deleted in the false killer whale LMBR1 gene, is present only in primates, and this fact strongly implies that exon 10 might be crucial in determining primate-specific limb development. ZRS and TFBS sequences have been well conserved across 11 species, suggesting that these regions could be involved in an important function of limb development and limb patterning. The neighboring gene RNF32 showed several lineage-conserved exons, such as exons 2 through 9 conserved in eutherian mammals, exons 3 through 9 conserved in mammals, and exons 5 through 9 conserved in vertebrates. The other neighboring gene, NOM1, had undergone a substitution (ATG→GTA) at the start codon, giving rise to a 36 bp shorter N-terminal sequence compared with the human sequence. Our comparative analysis of the false killer whale LMBR1 genomic locus provides important clues regarding the genetic regions that may play crucial roles in limb development and patterning.

  5. Sequence information gain based motif analysis.

    PubMed

    Maynou, Joan; Pairó, Erola; Marco, Santiago; Perera, Alexandre

    2015-11-09

    The detection of regulatory regions in candidate sequences is essential for the understanding of the regulation of a particular gene and the mechanisms involved. This paper proposes a novel methodology based on information theoretic metrics for finding regulatory sequences in promoter regions. This methodology (SIGMA) has been tested on genomic sequence data for Homo sapiens and Mus musculus. SIGMA has been compared with different publicly available alternatives for motif detection, such as MEME/MAST, Biostrings (Bioconductor package), MotifRegressor, and previous work such as Qresiduals projections or information theoretic based detectors. Comparative results, in the form of Receiver Operating Characteristic curves, show how, in 70% of the studied Transcription Factor Binding Sites, the SIGMA detector has a better performance and behaves more robustly than the methods compared, while having a similar computational time. The performance of SIGMA can be explained by its parametric simplicity in the modelling of the non-linear co-variability in the binding motif positions. Sequence Information Gain based Motif Analysis is a generalisation of a non-linear model of the cis-regulatory sequences detection based on Information Theory. This generalisation allows us to detect transcription factor binding sites with maximum performance disregarding the covariability observed in the positions of the training set of sequences. SIGMA is freely available to the public at http://b2slab.upc.edu.

  6. ReadXplorer—visualization and analysis of mapped sequences

    PubMed Central

    Hilker, Rolf; Stadermann, Kai Bernd; Doppmeier, Daniel; Kalinowski, Jörn; Stoye, Jens; Straube, Jasmin; Winnebald, Jörn; Goesmann, Alexander

    2014-01-01

    Motivation: Fast algorithms and well-arranged visualizations are required for the comprehensive analysis of the ever-growing size of genomic and transcriptomic next-generation sequencing data. Results: ReadXplorer is a software offering straightforward visualization and extensive analysis functions for genomic and transcriptomic DNA sequences mapped on a reference. A unique specialty of ReadXplorer is the quality classification of the read mappings. It is incorporated in all analysis functions and displayed in ReadXplorer's various synchronized data viewers for (i) the reference sequence, its base coverage as (ii) normalizable plot and (iii) histogram, (iv) read alignments and (v) read pairs. ReadXplorer's analysis capability covers RNA secondary structure prediction, single nucleotide polymorphism and deletion–insertion polymorphism detection, genomic feature and general coverage analysis. Especially for RNA-Seq data, it offers differential gene expression analysis, transcription start site and operon detection as well as RPKM value and read count calculations. Furthermore, ReadXplorer can combine or superimpose coverage of different datasets. Availability and implementation: ReadXplorer is available as open-source software at http://www.readxplorer.org along with a detailed manual. Contact: rhilker@mikrobio.med.uni-giessen.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24790157
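
    The RPKM values mentioned above are conventionally defined as reads per kilobase of feature length per million mapped reads. The sketch below applies that textbook definition; it is not ReadXplorer's code, and the function name and example numbers are hypothetical.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    # 1e9 = 1e3 (bp per kb) * 1e6 (reads per million)
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2,000 bp feature in a 10-million-read library.
print(rpkm(500, 2_000, 10_000_000))
```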

  7. Berberine induces FasL-related apoptosis through p38 activation in KB human oral cancer cells

    PubMed Central

    KIM, JAE-SUNG; OH, DAHYE; YIM, MIN-JI; PARK, JIN-JU; KANG, KYEONG-ROK; CHO, IN-A; MOON, SUNG-MIN; OH, JI-SU; YOU, JAE-SEEK; KIM, CHUN SUNG; KIM, DO KYUNG; LEE, SOOK-YOUNG; LEE, GYEONG-JE; IM, HEE-JEONG; KIM, SU-GWAN

    2015-01-01

    In the present study, we examined the anticancer properties of berberine in KB oral cancer cells with a specific focus on its cellular mechanism. Berberine did not affect the cell viability of the primary human normal oral keratinocytes that were used as a control. However, the viability of KB cells was found to decrease significantly in the presence of berberine in a dose-dependent manner. Furthermore, in KB cells, berberine induced the fragmentation of genomic DNA, changes in cell morphology, and nuclear condensation. In addition, caspase-3 and -7 activation, and an increase in apoptosis were observed. Berberine was also found to upregulate significantly the expression of the death receptor ligand, FasL. In turn, this upregulation triggered the activation of pro-apoptotic factors such as caspase-8, -9 and -3 and poly(ADP-ribose) polymerase (PARP). Furthermore, pro-apoptotic factors such as Bax, Bad and Apaf-1 were also significantly upregulated by berberine. Anti-apoptotic factors such as Bcl-2 and Bcl-xL were downregulated. Z-VAD-FMK, a cell-permeable pan-caspase inhibitor, suppressed the activation of caspase-3 and PARP. These results clearly indicate that berberine-induced cell death of KB oral cancer cells was mediated by both extrinsic death receptor-dependent and intrinsic mitochondrial-dependent apoptotic signaling pathways. In addition, berberine-induced upregulation of FasL was shown to be mediated by the p38 MAPK signaling pathway. We also found that berberine-induced migration suppression was mediated by downregulation of MMP-2 and MMP-9 through phosphorylation of p38 MAPK. In summary, berberine has the potential to be used as a chemotherapeutic agent, with limited side-effects, for the management of oral cancer. PMID:25634589

  8. Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods.

    PubMed

    Dal Molin, Alessandra; Baruzzo, Giacomo; Di Camillo, Barbara

    2017-01-01

    The sequencing of the transcriptomes of single cells, or single-cell RNA-sequencing, has now become the dominant technology for the identification of novel cell types and for the study of stochastic gene expression. In recent years, various tools for analyzing single-cell RNA-sequencing data have been proposed, many of them with the purpose of performing differential expression analysis. In this work, we compare four different tools for single-cell RNA-sequencing differential expression, together with two popular methods originally developed for the analysis of bulk RNA-sequencing data, but largely applied to single-cell data. We discuss results obtained on two real and one synthetic dataset, along with considerations about the perspectives of single-cell differential expression analysis. In particular, we explore the methods' performance in four different scenarios, mimicking different unimodal or bimodal distributions of the data, as characteristic of single-cell transcriptomics. We observed marked differences between the selected methods in terms of precision and recall, the number of detected differentially expressed genes and the overall performance. Globally, the results obtained in our study suggest that it is difficult to identify a best performing tool and that efforts are needed to improve the methodologies for single-cell RNA-sequencing data analysis and gain better accuracy of results.

  9. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, M.S.

    1998-08-18

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device. 27 figs.

  10. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2003-08-19

    A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.

  11. CAFE: aCcelerated Alignment-FrEe sequence analysis.

    PubMed

    Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A; Waterman, Michael S; Sun, Fengzhu

    2017-07-03

    Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmap, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
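
    As a point of reference for the "measures based solely on the k-mer frequencies" mentioned above, the sketch below computes a Euclidean distance between the k-mer frequency vectors of two sequences. It is an illustration only: it does not implement CVTree, $d_2^*$ or $d_2^S$ (which additionally need a background model), it is not CAFE's code, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(seq, k):
    """Return the relative frequency of every k-mer observed in `seq`."""
    if k <= 0 or len(seq) < k:
        raise ValueError("sequence must be at least k long and k must be positive")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def euclidean_kmer_distance(seq_a, seq_b, k):
    """Euclidean distance between the k-mer frequency vectors of two sequences."""
    freq_a = kmer_frequencies(seq_a, k)
    freq_b = kmer_frequencies(seq_b, k)
    # k-mers missing from one sequence contribute a frequency of zero.
    kmers = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) ** 2 for w in kmers))


print(euclidean_kmer_distance("ACGTACGT", "ACGTTGCA", k=2))
```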

  12. Complete genome sequence of Rhodothermus marinus type strain (R-10).

    PubMed

    Nolan, Matt; Tindall, Brian J; Pomrenke, Helga; Lapidus, Alla; Copeland, Alex; Glavina Del Rio, Tijana; Lucas, Susan; Chen, Feng; Tice, Hope; Cheng, Jan-Fang; Saunders, Elizabeth; Han, Cliff; Bruce, David; Goodwin, Lynne; Chain, Patrick; Pitluck, Sam; Ovchinikova, Galina; Pati, Amrita; Ivanova, Natalia; Mavromatis, Konstantinos; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D; Brettin, Thomas; Göker, Markus; Bristow, James; Eisen, Jonathan A; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C; Klenk, Hans-Peter; Detter, John C

    2009-12-29

    Rhodothermus marinus Alfredsson et al. 1995 is the type species of the genus and is of phylogenetic interest because the Rhodothermaceae represent the deepest lineage in the phylum Bacteroidetes. R. marinus R-10(T) is a Gram-negative, non-motile, non-spore-forming bacterium isolated from marine hot springs off the coast of Iceland. Strain R-10(T) is strictly aerobic and requires slightly halophilic conditions for growth. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first complete genome sequence of the genus Rhodothermus, and only the second sequence from members of the family Rhodothermaceae. The 3,386,737 bp genome (including a 125 kb plasmid) with its 2914 protein-coding and 48 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  13. A Novel Partial Sequence Alignment Tool for Finding Large Deletions

    PubMed Central

    Aruk, Taner; Ustek, Duran; Kursun, Olcay

    2012-01-01

    Finding large deletions in genome sequences has become increasingly useful in bioinformatics, such as in clinical research and diagnosis. Although there are a number of publicly available next generation sequencing mapping and sequence alignment programs, these software packages do not correctly align fragments containing deletions larger than one kb. We present a fast alignment software package, BinaryPartialAlign, that can be used by wet lab scientists to find long structural variations in their experiments. For BinaryPartialAlign, we make use of the Smith-Waterman (SW) algorithm with a binary-search-based approach for alignment with large gaps that we called partial alignment. BinaryPartialAlign implementation is compared with other straight-forward applications of SW. Simulation results on mtDNA fragments demonstrate the effectiveness (runtime and accuracy) of the proposed method. PMID:22566777
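
    For orientation, the plain Smith-Waterman local alignment score that the abstract builds on can be sketched as follows. This is the textbook recurrence with a linear gap penalty, not the binary-search-based partial alignment of BinaryPartialAlign; the function name and the scoring parameters are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences `a` and `b`."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

    With a fixed per-base gap penalty like this one, a deletion of a kilobase or more is penalized so heavily that the alignment simply stops at the gap, which is consistent with the limitation the abstract describes.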

  14. The mitochondrial genome sequences of the round goby and the sand goby reveal patterns of recent evolution in gobiid fish.

    PubMed

    Adrian-Kalchhauser, Irene; Svensson, Ola; Kutschera, Verena E; Alm Rosenblad, Magnus; Pippel, Martin; Winkler, Sylke; Schloissnig, Siegfried; Blomberg, Anders; Burkhardt-Holm, Patricia

    2017-02-16

    Vertebrate mitochondrial genomes are optimized for fast replication and low cost of RNA expression. Accordingly, they are devoid of introns, are transcribed as polycistrons and contain very little intergenic sequences. Usually, vertebrate mitochondrial genomes measure between 16.5 and 17 kilobases (kb). During genome sequencing projects for two novel vertebrate models, the invasive round goby and the sand goby, we found that the sand goby genome is exceptionally small (16.4 kb), while the mitochondrial genome of the round goby is much larger than expected for a vertebrate. It is 19 kb in size and is thus one of the largest fish and even vertebrate mitochondrial genomes known to date. The expansion is attributable to a sequence insertion downstream of the putative transcriptional start site. This insertion carries traces of repeats from the control region, but is mostly novel. To get more information about this phenomenon, we gathered all available mitochondrial genomes of Gobiidae and of nine gobioid species, performed phylogenetic analyses, analysed gene arrangements, and compared gobiid mitochondrial genome sizes, ecological information and other species characteristics with respect to the mitochondrial phylogeny. This allowed us amongst others to identify a unique arrangement of tRNAs among Ponto-Caspian gobies. Our results indicate that the round goby mitochondrial genome may contain novel features. Since mitochondrial genome organisation is tightly linked to energy metabolism, these features may be linked to its invasion success. Also, the unique tRNA arrangement among Ponto-Caspian gobies may be helpful in studying the evolution of this highly adaptive and invasive species group. Finally, we find that the phylogeny of gobiids can be further refined by the use of longer stretches of linked DNA sequence.

  15. AFEAP cloning: a precise and efficient method for large DNA sequence assembly.

    PubMed

    Zeng, Fanli; Zang, Jinping; Zhang, Suhua; Hao, Zhimin; Dong, Jingao; Lin, Yibin

    2017-11-14

    Recent development of DNA assembly technologies has spurred myriad advances in synthetic biology, but new tools are always required for complicated scenarios. Here, we have developed an alternative DNA assembly method named AFEAP cloning (Assembly of Fragment Ends After PCR), which allows scarless, modular, and reliable construction of biological pathways and circuits from basic genetic parts. The AFEAP method requires two rounds of PCR followed by ligation of the sticky ends of DNA fragments. The first PCR yields linear DNA fragments and is followed by a second asymmetric (one primer) PCR and subsequent annealing that inserts overlapping overhangs at both sides of each DNA fragment. The overlapping overhangs of the neighboring DNA fragments are annealed and the nick is sealed by T4 DNA ligase, followed by bacterial transformation to yield the desired plasmids. We characterized the capability and limitations of the newly developed AFEAP cloning and demonstrated its application to assemble DNA in varying scenarios. Under the optimized conditions, AFEAP cloning allows assembly of an 8 kb plasmid from 1-13 fragments with high accuracy (between 80 and 100%), and 8.0, 11.6, 19.6, 28, and 35.6 kb plasmids from five fragments at 91.67, 91.67, 88.33, 86.33, and 81.67% fidelity, respectively. AFEAP cloning is also capable of constructing a bacterial artificial chromosome (BAC, 200 kb) with a fidelity of 46.7%. AFEAP cloning provides a powerful, efficient, seamless, and sequence-independent DNA assembly tool for multiple fragments up to 13 and large DNA up to 200 kb that expands the synthetic biologist's toolbox.

  16. Spectroscopic studies of the binding of Cu(II) complexes of oxicam NSAIDs to alternating G-C and homopolymeric G-C sequences

    NASA Astrophysics Data System (ADS)

    Chakraborty, Sreeja; Bose, Madhuparna; Sarkar, Munna

    2014-03-01

    Drugs belonging to the non-steroidal anti-inflammatory drug (NSAID) group are not only used as anti-inflammatory, analgesic and anti-pyretic agents, but also show anti-cancer effects. Complexing them with a bioactive metal like copper enhances their anti-cancer effects compared to the bare drugs, although the exact mechanism of action is not yet fully understood. For the first time, it was shown by our group that Cu(II)-NSAIDs can directly bind to the DNA backbone. The ability of the copper complexes of the NSAIDs meloxicam and piroxicam to bind to the DNA backbone could be a possible molecular mechanism behind their enhanced anticancer effects. Elucidating base sequence specific interaction of Cu(II)-NSAIDs with DNA will provide information on their possible binding sites in the genome sequence. In this work, we present how these complexes respond to differences in structure and hydration pattern of GC rich sequences. For this, binding studies of Cu(II) complexes of piroxicam [Cu(II)-(Px)2 (L)2] and meloxicam [Cu(II)-(Mx)2 (L)] with alternating GC (polydG-dC) and homopolymeric GC (polydG-polydC) sequences were carried out using a combination of spectroscopic techniques that include UV-Vis absorption, fluorescence and circular dichroism (CD) spectroscopy. The Cu(II)-NSAIDs show strong binding affinity to both polydG-dC and polydG-polydC. Cu(II)-meloxicam reverses its role from a strong binder of polydG-dC (Kb = 11.5 × 10³ M⁻¹) to a weak binder of polydG-polydC (Kb = 5.02 × 10³ M⁻¹), while Cu(II)-piroxicam changes from a strong binder of polydG-polydC (Kb = 8.18 × 10³ M⁻¹) to a weak one of polydG-dC (Kb = 2.18 × 10³ M⁻¹); this points to the sensitivity of these complexes to changes in the backbone structures/hydration. Changes in the profiles of UV absorption band and CD difference spectra, upon complex binding to polynucleotides and the results of competitive binding assay using ethidium bromide (EtBr) fluorescence indicate different binding modes in each

  17. galaxie--CGI scripts for sequence identification through automated phylogenetic analysis.

    PubMed

    Nilsson, R Henrik; Larsson, Karl-Henrik; Ursing, Björn M

    2004-06-12

    The prevalent use of similarity searches like BLAST to identify sequences and species implicitly assumes the reference database to be of extensive sequence sampling. This is often not the case, restraining the correctness of the outcome as a basis for sequence identification. Phylogenetic inference outperforms similarity searches in retrieving correct phylogenies and consequently sequence identities, and a project was initiated to design a freely available script package for sequence identification through automated Web-based phylogenetic analysis. Three CGI scripts were designed to facilitate qualified sequence identification from a Web interface. Query sequences are aligned to pre-made alignments or to alignments made by ClustalW with entries retrieved from a BLAST search. The subsequent phylogenetic analysis is based on the PHYLIP package for inferring neighbor-joining and parsimony trees. The scripts are highly configurable. A service installation and a version for local use are found at http://andromeda.botany.gu.se/galaxiewelcome.html and http://galaxie.cgb.ki.se
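
    As an illustration of the distance-based step that typically precedes neighbor-joining, the sketch below computes pairwise p-distances over a toy alignment and reports the reference closest to the query. It is not the galaxie code: the p-distance is a deliberately simple stand-in for the distance models offered by PHYLIP, and the sequence names and bases are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for base_a, base_b in zip(seq_a, seq_b):
        if base_a == "-" or base_b == "-":
            continue
        compared += 1
        if base_a != base_b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


# Hypothetical toy alignment: one query and two reference entries.
query = "ACGTACGTAC"
references = {
    "ref_a": "ACGTACGTAA",
    "ref_b": "ACGAACGTTT",
}

# Distance from the query to every reference, then the closest one.
distances = {name: p_distance(query, seq) for name, seq in references.items()}
closest = min(distances, key=distances.get)
print(distances)
print(closest)
```

    A tree-building method would take the full pairwise matrix as input; picking the single closest reference, as done here, is only the simplest possible readout.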

  18. Initial sequencing and comparative analysis of the mouse genome

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan

    2002-12-15

    The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.

  19. Determination of the melon chloroplast and mitochondrial genome sequences reveals that the largest reported mitochondrial genome in plants contains a significant amount of DNA having a nuclear origin

    PubMed Central

    2011-01-01

    Background The melon belongs to the Cucurbitaceae family, whose economic importance among vegetable crops is second only to Solanaceae. The melon has a small genome size (454 Mb), which makes it suitable for molecular and genetic studies. Despite similar nuclear and chloroplast genome sizes, cucurbits show great variation when their mitochondrial genomes are compared. The melon possesses the largest plant mitochondrial genome, as much as eight times larger than that of other cucurbits. Results The nucleotide sequences of the melon chloroplast and mitochondrial genomes were determined. The chloroplast genome (156,017 bp) included 132 genes, with 98 single-copy genes dispersed between the small (SSC) and large (LSC) single-copy regions and 17 duplicated genes in the inverted repeat regions (IRa and IRb). A comparison of the cucumber and melon chloroplast genomes showed differences in only approximately 5% of nucleotides, mainly due to short indels and SNPs. Additionally, 2.74 Mb of mitochondrial sequence, accounting for 95% of the estimated mitochondrial genome size, were assembled into five scaffolds and four additional unscaffolded contigs. In total, 84% of the mitochondrial genome is contained in a single scaffold. The gene-coding region accounted for 1.7% (45,926 bp) of the total sequence, including 51 protein-coding genes, 4 conserved ORFs, 3 rRNA genes and 24 tRNA genes. Despite the differences observed in the mitochondrial genome sizes of cucurbit species, Citrullus lanatus (379 kb), Cucurbita pepo (983 kb) and Cucumis melo (2,740 kb) share 120 kb of sequence, including the predicted protein-coding regions. Nevertheless, melon contained a high number of repetitive sequences and a high content of DNA of nuclear origin, which represented 42% and 47% of the total sequence, respectively. Conclusions Whereas the size and gene organisation of chloroplast genomes are similar among the cucurbit species, mitochondrial genomes show a wide variety of sizes, with a non

  20. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    1999-10-26

    A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).

  1. Computer-aided visualization and analysis system for sequence evaluation

    DOEpatents

    Chee, Mark S.

    2001-06-05

    A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).

  2. A DNA sequence analysis package for the IBM personal computer.

    PubMed Central

    Lagrimini, L M; Brentano, S T; Donelson, J E

    1984-01-01

    We present here a collection of DNA sequence analysis programs, called "PC Sequence" (PCS), which are designed to run on the IBM Personal Computer (PC). These programs are written in IBM PC compiled BASIC and take full advantage of the IBM PC's speed, error handling, and graphics capabilities. For a modest initial expense in hardware any laboratory can use these programs to quickly perform computer analysis on DNA sequences. They are written with the novice user in mind and require very little training or previous experience with computers. Also provided are a text editing program for creating and modifying DNA sequence files and a communications program which enables the PC to communicate with and collect information from mainframe computers and DNA sequence databases. PMID:6546433

  3. Novel 5.712 kb mitochondrial DNA deletion in a patient with Pearson syndrome: a case report.

    PubMed

    Park, Joonhong; Ryu, Hyejin; Jang, Woori; Chae, Hyojin; Kim, Myungshin; Kim, Yonggoo; Kim, Jiyeon; Lee, Jae Wook; Chung, Nack-Gyun; Cho, Bin; Suh, Byung Kyu

    2015-05-01

    Pearson marrow-pancreas syndrome (PS) is a progressive multi-organ disorder caused by deletions and duplications of mitochondrial DNA (mtDNA). PS is often fatal in infancy, and the majority of patients with PS succumb to the disease before reaching three years of age, due to septicemia, metabolic acidosis or hepatocellular insufficiency. The present report describes the case of a four-month-old infant with severe normocytic normochromic anemia, vacuolization of hematopoietic precursors and metabolic acidosis. After extensive clinical investigation, the patient was diagnosed with PS, which was confirmed by molecular analysis of mtDNA. The molecular analysis detected a novel large-scale (5.712 kb) deletion spanning nucleotides 8,011 to 13,722 of mtDNA, which lacked direct repeats at the deletion boundaries. The present report is, to the best of our knowledge, the first case reported in South Korea.
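
    The reported deletion size follows from the coordinates, assuming both boundary nucleotides (8,011 and 13,722) are counted as deleted. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def inclusive_span_length(start: int, end: int) -> int:
    """Number of positions in a span that includes both its start and its end."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Coordinates taken from the abstract; inclusive counting is an assumption.
deletion_bp = inclusive_span_length(8011, 13722)
print(deletion_bp)         # 5712
print(deletion_bp / 1000)  # 5.712 (kb)
```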

  4. PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability.

    PubMed

    Kirby, Jacqueline C; Speltz, Peter; Rasmussen, Luke V; Basford, Melissa; Gottesman, Omri; Peissig, Peggy L; Pacheco, Jennifer A; Tromp, Gerard; Pathak, Jyotishman; Carrell, David S; Ellis, Stephen B; Lingren, Todd; Thompson, Will K; Savova, Guergana; Haines, Jonathan; Roden, Dan M; Harris, Paul A; Denny, Joshua C

    2016-11-01

    Health care generated data have become an important source for clinical and genomic research. Often, investigators create and iteratively refine phenotype algorithms to achieve high positive predictive values (PPVs) or sensitivity, thereby identifying valid cases and controls. These algorithms achieve the greatest utility when validated and shared by multiple health care systems. Materials and Methods We report the current status and impact of the Phenotype KnowledgeBase (PheKB, http://phekb.org), an online environment supporting the workflow of building, sharing, and validating electronic phenotype algorithms. We analyze the most frequent components used in algorithms and their performance at authoring institutions and secondary implementation sites. As of June 2015, PheKB contained 30 finalized phenotype algorithms and 62 algorithms in development spanning a range of traits and diseases. Phenotypes have had over 3500 unique views in a 6-month period and have been reused by other institutions. International Classification of Disease codes were the most frequently used component, followed by medications and natural language processing. Among algorithms with published performance data, the median PPV was nearly identical when evaluated at the authoring institutions (n = 44; case 96.0%, control 100%) compared to implementation sites (n = 40; case 97.5%, control 100%). These results demonstrate that a broad range of algorithms to mine electronic health record data from different health systems can be developed with high PPV, and algorithms developed at one site are generally transportable to others. By providing a central repository, PheKB enables improved development, transportability, and validity of algorithms for research-grade phenotypes using health care generated data. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved.
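
    For readers unfamiliar with the metric, positive predictive value is the fraction of algorithm-flagged cases that review confirms, TP / (TP + FP). The sketch below uses invented counts for three imaginary sites and is not PheKB data; the function and variable names are assumptions.

```python
from statistics import median


def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = TP / (TP + FP): share of flagged records that are real cases."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no flagged records, PPV is undefined")
    return true_positives / flagged


# Hypothetical (TP, FP) counts from chart review at three sites.
site_counts = {
    "site_a": (96, 4),
    "site_b": (45, 5),
    "site_c": (50, 0),
}

site_ppv = {
    site: positive_predictive_value(tp, fp)
    for site, (tp, fp) in site_counts.items()
}
median_ppv = median(site_ppv.values())
print(site_ppv)    # {'site_a': 0.96, 'site_b': 0.9, 'site_c': 1.0}
print(median_ppv)  # 0.96
```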

  5. Whole-Genome Sequence of Coxiella burnetii Nine Mile RSA439 (Phase II, Clone 4), a Laboratory Workhorse Strain

    PubMed Central

    Beare, Paul A.; Moses, Abraham S.; Martens, Craig A.; Heinzen, Robert A.

    2017-01-01

    Here, we report the whole-genome sequence of Coxiella burnetii Nine Mile RSA439 (phase II, clone 4), a laboratory strain used extensively to investigate the biology of this intracellular bacterial pathogen. The genome consists of a 1.97-Mb chromosome and a 37.32-kb plasmid. PMID:28596399

  6. Development of a good-quality speech coder for transmission over noisy channels at 2.4 kb/s

    NASA Astrophysics Data System (ADS)

    Viswanathan, V. R.; Berouti, M.; Higgins, A.; Russell, W.

    1982-03-01

    This report describes the development, study, and experimental results of a 2.4 kb/s speech coder called harmonic deviations (HDV) vocoder, which transmits good-quality speech over noisy channels with bit-error rates of up to 1%. The HDV coder is based on the linear predictive coding (LPC) vocoder, and it transmits additional information over and above the data transmitted by the LPC vocoder, in the form of deviations between the speech spectrum and the LPC all-pole model spectrum at a selected set of frequencies. At the receiver, the spectral deviations are used to generate the excitation signal for the all-pole synthesis filter. The report describes and compares several methods for extracting the spectral deviations from the speech signal and for encoding them. To limit the bit-rate of the HDV coder to 2.4 kb/s the report discusses several methods including orthogonal transformation and minimum-mean-square-error scalar quantization of log area ratios, two-stage vector-scalar quantization, and variable frame rate transmission. The report also presents the results of speech-quality optimization of the HDV coder at 2.4 kb/s.

  7. Complete genome sequence of a new bipartite begomovirus infecting fluted pumpkin (Telfairia occidentalis) plants in Cameroon.

    PubMed

    Leke, Walter N; Khatabi, Behnam; Fondong, Vincent N; Brown, Judith K

    2016-08-01

    The complete genome sequence was determined and characterized for a previously unreported bipartite begomovirus from fluted pumpkin (Telfairia occidentalis, family Cucurbitaceae) plants displaying mosaic symptoms in Cameroon. The DNA-A and DNA-B components were ~2.7 kb and ~2.6 kb in size, and the arrangement of viral coding regions on the genomic components was like those characteristic of other known bipartite begomoviruses originating in the Old World. While the DNA-A component was more closely related to that of chayote yellow mosaic virus (ChaYMV), at 78 %, the DNA-B component was more closely related to that of soybean chlorotic blotch virus (SbCBV), at 64 %. This newly discovered bipartite Old World virus is herein named telfairia mosaic virus (TelMV).

  8. Mitochondrial genome sequencing helps show the evolutionary mechanism of mitochondrial genome formation in Brassica

    PubMed Central

    2011-01-01

    Background Angiosperm mitochondrial genomes are more complex than those of other organisms. Analyses of the mitochondrial genome sequences of at least 11 angiosperm species have shown several common properties; these cannot easily explain, however, how the diverse mitotypes evolved within each genus or species. We analyzed the evolutionary relationships of Brassica mitotypes by sequencing. Results We sequenced the mitotypes of cam (Brassica rapa), ole (B. oleracea), jun (B. juncea), and car (B. carinata) and analyzed them together with two previously sequenced mitotypes of B. napus (pol and nap). The sizes of whole single circular genomes of cam, jun, ole, and car are 219,747 bp, 219,766 bp, 360,271 bp, and 232,241 bp, respectively. The mitochondrial genome of ole is the largest as a result of the duplication of a 141.8 kb segment. The jun mitotype is the result of an inherited cam mitotype, and pol is also derived from the cam mitotype with evolutionary modifications. Genes with known functions are conserved in all mitotypes, but clear variation in open reading frames (ORFs) with unknown functions among the six mitotypes was observed. Sequence relationship analysis showed that there has been genome compaction and inheritance in the course of Brassica mitotype evolution. Conclusions We have sequenced four Brassica mitotypes, compared six Brassica mitotypes and suggested a mechanism for mitochondrial genome formation in Brassica, including evolutionary events such as inheritance, duplication, rearrangement, genome compaction, and mutation. PMID:21988783
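
    The size figures are roughly self-consistent: removing one copy of the duplicated segment from ole leaves a genome close in size to cam. A minimal sketch of that check, treating the rounded 141.8 kb as 141,800 bp (an assumption) and using hypothetical variable names:

```python
# Genome sizes in bp as reported in the abstract.
genome_bp = {"cam": 219_747, "jun": 219_766, "ole": 360_271, "car": 232_241}

# The abstract gives 141.8 kb; the exact bp count is not stated, so this is rounded.
DUPLICATED_SEGMENT_BP = 141_800

# Size of ole if one copy of the duplicated segment is removed.
ole_without_duplicate = genome_bp["ole"] - DUPLICATED_SEGMENT_BP
print(ole_without_duplicate)                     # 218471
print(genome_bp["cam"] - ole_without_duplicate)  # 1276
```

    The remaining gap of about 1.3 kb is within what rounding of the segment size and other sequence differences could plausibly account for; the abstract itself does not break it down.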

  9. Melting relations in the Fe-rich portion of the system Fe-FeS at 30 kb pressure

    USGS Publications Warehouse

    Brett, R.; Bell, P.M.

    1969-01-01

    The melting relations of Fe-FeS mixtures covering the composition range from Fe to Fe67S33 have been determined at 30 kb pressure. The phase relations are similar to those at low pressure. The eutectic has a composition of Fe72.9S27.1 and a temperature of 990 °C. Solubility of S in Fe at elevated temperatures at 30 kb is of the same order of magnitude as at low pressure. Sulfur may have significantly lowered the melting point of iron in the upper mantle during the period of coalescence of metal prior to core formation in the primitive earth. © 1969.

  10. New energy transfer dyes for DNA sequencing.

    PubMed Central

    Lee, L G; Spurgeon, S L; Heiner, C R; Benson, S C; Rosenblum, B B; Menchen, S M; Graham, R J; Constantinescu, A; Upadhya, K G; Cassel, J M

    1997-01-01

    We have synthesized a set of four energy transfer dyes and demonstrated their use in automated DNA sequencing. The donor dyes are the 5- or 6-carboxy isomers of 4'-aminomethylfluorescein and the acceptor dyes are a novel set of four 4,7-dichloro-substituted rhodamine dyes which have narrower emission spectra than the standard, unsubstituted rhodamines. A rigid amino acid linker, 4-aminomethylbenzoic acid, was used to separate the dyes. The brightness of each dye in an automated sequencing instrument equipped with a dual line argon ion laser (488 and 514 nm excitation) was 2-2.5 times greater than the standard dye-primers with a 2 times reduction in multicomponent noise. The overall improvement in signal-to-noise was 4- to 5-fold. The utility of the new dye set was demonstrated by sequencing of a BAC DNA with an 80 kb insert. Measurement of the extinction coefficients and the relative quantum yields of the dichlororhodamine components of the energy transfer dyes showed their values were reduced by 20-25% compared with the dichlororhodamine dyes alone. PMID:9207029

  11. Analysis of xylem formation in pine by cDNA sequencing

    NASA Technical Reports Server (NTRS)

    Allona, I.; Quinn, M.; Shoop, E.; Swope, K.; St Cyr, S.; Carlis, J.; Riedl, J.; Retzel, E.; Campbell, M. M.; Sederoff, R.

    1998-01-01

    Secondary xylem (wood) formation is likely to involve some genes expressed rarely or not at all in herbaceous plants. Moreover, environmental and developmental stimuli influence secondary xylem differentiation, producing morphological and chemical changes in wood. To increase our understanding of xylem formation, and to provide material for comparative analysis of gymnosperm and angiosperm sequences, ESTs were obtained from immature xylem of loblolly pine (Pinus taeda L.). A total of 1,097 single-pass sequences were obtained from 5' ends of cDNAs made from gravistimulated tissue from bent trees. Cluster analysis detected 107 groups of similar sequences, ranging in size from 2 to 20 sequences. A total of 361 sequences fell into these groups, whereas 736 sequences were unique. About 55% of the pine EST sequences show similarity to previously described sequences in public databases. About 10% of the recognized genes encode factors involved in cell wall formation. Sequences similar to cell wall proteins, most known lignin biosynthetic enzymes, and several enzymes of carbohydrate metabolism were found. A number of putative regulatory proteins also are represented. Expression patterns of several of these genes were studied in various tissues and organs of pine. Sequencing novel genes expressed during xylem formation will provide a powerful means of identifying mechanisms controlling this important differentiation pathway.

  12. Complete Sequencing of pNDM-HK Encoding NDM-1 Carbapenemase from a Multidrug-Resistant Escherichia coli Strain Isolated in Hong Kong

    PubMed Central

    Ho, Pak Leung; Lo, Wai U.; Yeung, Man Kiu; Lin, Chi Ho; Chow, Kin Hung; Ang, Irene; Tong, Amy Hin Yan; Bao, Jessie Yun-Juan; Lok, Si; Lo, Janice Yee Chi

    2011-01-01

    Background The emergence of plasmid-mediated carbapenemases, such as NDM-1 in Enterobacteriaceae, is a major public health issue. Since they mediate resistance to virtually all β-lactam antibiotics and there is often co-resistance to other antibiotic classes, the therapeutic options for infections caused by these organisms are very limited. Methodology We characterized the first NDM-1 producing E. coli isolate recovered in Hong Kong. The plasmid encoding the metallo-β-lactamase gene was sequenced. Principal Findings The plasmid, pNDM-HK, readily transferred to E. coli J53 at high frequencies. It belongs to the broad host range IncL/M incompatibility group and is 88803 bp in size. Sequence alignment showed that pNDM-HK has a 55 kb backbone which shared 97% homology with pEL60 originating from the plant pathogen, Erwinia amylovora in Lebanon and a 28.9 kb variable region. The plasmid backbone includes the mucAB genes mediating ultraviolet light resistance. The 28.9 kb region has a composite transposon-like structure which includes intact or truncated genes associated with resistance to β-lactams (bla TEM-1, bla NDM-1, Δbla DHA-1), aminoglycosides (aacC2, armA), sulphonamides (sul1) and macrolides (mel, mph2). It also harbors the following mobile elements: IS26, ISCR1, tnpU, tnpAcp2, tnpD, ΔtnpATn1 and insL. Certain blocks within the 28.9 kb variable region had homology with the corresponding sequences in the widely disseminated plasmids, pCTX-M3, pMUR050 and pKP048 originating from bacteria in Poland in 1996, in Spain in 2002 and in China in 2006, respectively. Significance The genetic support of NDM-1 gene suggests that it has evolved through complex pathways. The association with broad host range plasmid and multiple mobile genetic elements explains its observed horizontal mobility in multiple bacterial taxa. PMID:21445317

  13. Multilevel analysis of sports video sequences

    NASA Astrophysics Data System (ADS)

    Han, Jungong; Farin, Dirk; de With, Peter H. N.

    2006-01-01

    We propose a fully automatic and flexible framework for analysis and summarization of tennis broadcast video sequences, using visual features and specific game-context knowledge. Our framework can analyze a tennis video sequence at three levels, which provides a broad range of different analysis results. The proposed framework includes novel pixel-level and object-level tennis video processing algorithms, such as a moving-player detection taking both the color and the court (playing-field) information into account, and a player-position tracking algorithm based on a 3-D camera model. Additionally, we employ scene-level models for detecting events, like service, base-line rally and net-approach, based on a number of real-world visual features. The system can summarize three forms of information: (1) all court-view playing frames in a game, (2) the moving trajectory and real-speed of each player, as well as relative position between the player and the court, (3) the semantic event segments in a game. The proposed framework is flexible in choosing the level of analysis that is desired. It is effective because the framework makes use of several visual cues obtained from the real-world domain to model important events like service, thereby increasing the accuracy of the scene-level analysis. The paper presents attractive experimental results highlighting the system efficiency and analysis capabilities.

  14. Validation of Genotyping-By-Sequencing Analysis in Populations of Tetraploid Alfalfa by 454 Sequencing

    PubMed Central

    Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François

    2015-01-01

    Genotyping-by-sequencing (GBS) is a relatively low-cost high throughput genotyping technology based on next generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa spp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids. PMID:26115486

  15. Genome sequence of the Japanese oak silk moth, Antheraea yamamai: the first draft genome in the family Saturniidae

    PubMed Central

    Kim, Seong-Ryul; Kwak, Woori; Kim, Hyaekang; Kim, Kee-Young; Kim, Su-Bae; Choi, Kwang-Ho; Kim, Seong-Wan; Hwang, Jae-Sam; Kim, Minjee; Kim, Iksoo; Goo, Tae-Won

    2018-01-01

    Background Antheraea yamamai, also known as the Japanese oak silk moth, is a wild species of silk moth. Silk produced by A. yamamai, referred to as tensan silk, shows different characteristics such as thickness, compressive elasticity, and chemical resistance compared with common silk produced from the domesticated silkworm, Bombyx mori. Its unique characteristics have led to its use in many research fields including biotechnology and medical science, and the scientific as well as economic importance of the wild silk moth continues to gradually increase. However, no genomic information for the wild silk moth, including A. yamamai, is currently available. Findings In order to construct the A. yamamai genome, a total of 147 Gb of sequence was generated using Illumina and PacBio sequencing platforms, providing 210-fold coverage based on the 700-Mb estimated genome size of A. yamamai. The assembled genome of A. yamamai was 656 Mb (>2 kb) with 3675 scaffolds, and the N50 length of assembly was 739 kb with a 34.07% GC ratio. Identified repeat elements covered 37.33% of the total genome, and the completeness of the constructed genome assembly was estimated to be 96.7% by Benchmarking Universal Single-Copy Orthologs v2 analysis. A total of 15 481 genes were identified using Evidence Modeler based on the gene prediction results obtained from 3 different methods (ab initio, RNA-seq-based, known-gene-based) and manual curation. Conclusions Here we present the genome sequence of A. yamamai, the first genome sequence of the wild silk moth. These results provide valuable genomic information, which will help enrich our understanding of the molecular mechanisms relating to not only specific phenotypes such as wild silk itself but also the genomic evolution of Saturniidae. PMID:29186418
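
    Two of the figures above rest on simple calculations. Sequencing depth is total bases divided by estimated genome size, and N50 is the scaffold length at which the cumulative length of scaffolds, taken longest first, reaches half the assembly. The sketch below reproduces the 210-fold figure and computes N50 for a made-up list of scaffold lengths; the function names and toy lengths are assumptions, not values from the paper.

```python
def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Average sequencing depth: bases generated per base of genome."""
    return total_bases / genome_size


def n50(lengths):
    """Length of the scaffold at which the running total, longest first,
    first reaches at least half of the summed scaffold length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# 147 Gb of sequence over an estimated 700-Mb genome (figures from the abstract).
print(fold_coverage(147_000_000_000, 700_000_000))  # 210.0

# Hypothetical scaffold lengths: total 290, half 145, reached at the 2nd scaffold.
print(n50([80, 70, 50, 40, 30, 20]))  # 70
```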

  16. Comparative sequence analysis of the potato cyst nematode resistance locus H1 reveals a major lack of co-linearity between three haplotypes in potato (Solanum tuberosum ssp.)

    PubMed Central

    Bakker, Erin; de Boer, Jan; van der Vossen, Edwin; Achenbach, Ute; Golas, Tomasz; Suryaningrat, Suwardi; Smant, Geert; Bakker, Jaap; Goverse, Aska

    2010-01-01

    The H1 locus confers resistance to the potato cyst nematode Globodera rostochiensis pathotypes 1 and 4. It is positioned at the distal end of chromosome V of the diploid Solanum tuberosum genotype SH83-92-488 (SH) on an introgression segment derived from S. tuberosum ssp. andigena. Markers from a high-resolution genetic map of the H1 locus (Bakker et al. in Theor Appl Genet 109:146–152, 2004) were used to screen a BAC library to construct a physical map covering a 341-kb region of the resistant haplotype coming from SH. For comparison, physical maps were also generated of the two haplotypes from the diploid susceptible genotype RH89-039-16 (S. tuberosum ssp. tuberosum/S. phureja), spanning syntenic regions of 700 and 319 kb. Gene predictions on the genomic segments resulted in the identification of a large cluster consisting of variable numbers of the CC-NB-LRR type of R genes for each haplotype. Furthermore, the regions were interspersed with numerous transposable elements and genes coding for an extensin-like protein and an amino acid transporter. Comparative analysis revealed a major lack of gene order conservation in the sequences of the three closely related haplotypes. Our data provide insight into the evolutionary mechanisms shaping the H1 locus and will facilitate the map-based cloning of the H1 resistance gene. Electronic supplementary material The online version of this article (doi:10.1007/s00122-010-1472-9) contains supplementary material, which is available to authorized users. PMID:21049265

  17. Isolation and sequence analysis of the Pseudomonas syringae pv. tomato gene encoding a 2,3-diphosphoglycerate-independent phosphoglyceromutase.

    PubMed

    Morris, V L; Jackson, D P; Grattan, M; Ainsworth, T; Cuppels, D A

    1995-04-01

    Pseudomonas syringae pv. tomato DC3481, a Tn5-induced mutant of the tomato pathogen DC3000, cannot grow and elicit disease symptoms on tomato seedlings. It also cannot grow on minimal medium containing malate, citrate, or succinate, three of the major organic acids found in tomatoes. We report here that this mutant also cannot use, as a sole carbon and/or energy source, a wide variety of hexoses and intermediates of hexose catabolism. Uptake studies have shown that DC3481 is not deficient in transport. A 3.8-kb EcoRI fragment of DC3000 DNA, which complements the Tn5 mutation, has been cloned and sequenced. The deduced amino acid sequences of two of the three open reading frames (ORFs) present on this fragment, ORF2 and ORF3, had no significant homology with sequences in the GenBank databases. However, the 510-amino-acid sequence of ORF1, the site of the Tn5 insertion, strongly resembled the deduced amino acid sequences of the Bacillus subtilis and Zea mays genes encoding 2,3-diphosphoglycerate (DPG)-independent phosphoglyceromutase (PGM) (52% identity and 72% similarity and 37% identity and 57% similarity, respectively). PGMs not requiring the cofactor DPG are usually found in plants and algae. Enzyme assays confirmed that P. syringae PGM activity required an intact ORF1. Not only is DC3481 the first PGM-deficient pseudomonad mutant to be described, but the P. syringae pgm gene is the first gram-negative bacterial gene identified that appears to code for a DPG-independent PGM. PGM activity appears essential for the growth and pathogenicity of P. syringae pv. tomato on its host plant.

  18. Isolation and sequence analysis of the Pseudomonas syringae pv. tomato gene encoding a 2,3-diphosphoglycerate-independent phosphoglyceromutase.

    PubMed Central

    Morris, V L; Jackson, D P; Grattan, M; Ainsworth, T; Cuppels, D A

    1995-01-01

    Pseudomonas syringae pv. tomato DC3481, a Tn5-induced mutant of the tomato pathogen DC3000, cannot grow and elicit disease symptoms on tomato seedlings. It also cannot grow on minimal medium containing malate, citrate, or succinate, three of the major organic acids found in tomatoes. We report here that this mutant also cannot use, as a sole carbon and/or energy source, a wide variety of hexoses and intermediates of hexose catabolism. Uptake studies have shown that DC3481 is not deficient in transport. A 3.8-kb EcoRI fragment of DC3000 DNA, which complements the Tn5 mutation, has been cloned and sequenced. The deduced amino acid sequences of two of the three open reading frames (ORFs) present on this fragment, ORF2 and ORF3, had no significant homology with sequences in the GenBank databases. However, the 510-amino-acid sequence of ORF1, the site of the Tn5 insertion, strongly resembled the deduced amino acid sequences of the Bacillus subtilis and Zea mays genes encoding 2,3-diphosphoglycerate (DPG)-independent phosphoglyceromutase (PGM) (52% identity and 72% similarity and 37% identity and 57% similarity, respectively). PGMs not requiring the cofactor DPG are usually found in plants and algae. Enzyme assays confirmed that P. syringae PGM activity required an intact ORF1. Not only is DC3481 the first PGM-deficient pseudomonad mutant to be described, but the P. syringae pgm gene is the first gram-negative bacterial gene identified that appears to code for a DPG-independent PGM. PGM activity appears essential for the growth and pathogenicity of P. syringae pv. tomato on its host plant. PMID:7896694

  19. Whole-Genome Sequence of Coxiella burnetii Nine Mile RSA439 (Phase II, Clone 4), a Laboratory Workhorse Strain.

    PubMed

    Millar, Jess A; Beare, Paul A; Moses, Abraham S; Martens, Craig A; Heinzen, Robert A; Raghavan, Rahul

    2017-06-08

    Here, we report the whole-genome sequence of Coxiella burnetii Nine Mile RSA439 (phase II, clone 4), a laboratory strain used extensively to investigate the biology of this intracellular bacterial pathogen. The genome consists of a 1.97-Mb chromosome and a 37.32-kb plasmid. Copyright © 2017 Millar et al.

  20. Analysis of the DNA sequence of a 15,500 bp fragment near the left telomere of chromosome XV from Saccharomyces cerevisiae reveals a putative sugar transporter, a carboxypeptidase homologue and two new open reading frames.

    PubMed

    Gamo, F J; Lafuente, M J; Casamayor, A; Ariño, J; Aldea, M; Casas, C; Herrero, E; Gancedo, C

    1996-06-15

    We report the sequence of a 15.5 kb DNA segment located near the left telomere of chromosome XV of Saccharomyces cerevisiae. The sequence contains nine open reading frames (ORFs) longer than 300 bp. Three of them are internal to other ones. One corresponds to the gene LGT3 that encodes a putative sugar transporter. Three adjacent ORFs were separated by two stop codons in frame. These ORFs presented homology with the gene CPS1 that encodes carboxypeptidase S. The stop codons were not found in the same sequence derived from another yeast strain. Two other ORFs without significant homology in databases were also found. One of them, O0420, is very rich in serine and threonine and presents a series of repeated or similar amino acid stretches along the sequence.

  1. The simplest possible design for a KB microfocus mirror system?

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Collins, S. P., E-mail: steve.collins@diamond.ac.uk; Scott, S. M.; Hawkins, D. M.

    2016-07-27

    We report a design for a Kirkpatrick-Baez (KB) microfocussing mirror system. The main components are described, with emphasis on a ‘tripod’ manipulator, where we outline the required coordinate transformation calculations. The merit of this device lies in its simplicity of design, minimal degrees of freedom, and speed and ease of setup on a beamline. Test results and an example of the mirrors in use on Diamond Beamline I16, showing a high-resolution polar domain map of KTiOPO4 with a spot size of 1.25 µm × 1.5 µm, are presented.

  2. A 405-kb cosmid contig and HindIII restriction map of the progressive myoclonus epilepsy type 1 (EPM1) candidate region in 21q22.3

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lafreniere, R.G.; Rouleau, G.A.; De Jong, P.J.

    1995-09-01

    As a step toward identifying the molecular defect in patients afflicted with progressive myoclonus epilepsy type 1 (EPM1), we have assembled a cosmid contig of the candidate EPM1 region in 21q22.3. The contig constitutes a collection of 87 different cosmids spanning 405 kb based on a derived HindIII restriction map. Potential CpG-rich islands have been identified based on the restriction map generated from eight different rare-cutting enzymes. This contig contains the genetic material required for the isolation of expressed sequences and the identification of the gene defective in EPM1 and possibly other disorders mapping to this region. 15 refs., 1 fig.

  3. Finished Genome Sequence of Bacillus cereus Strain 03BB87, a Clinical Isolate with B. anthracis Virulence Genes

    DOE PAGES

    Johnson, Shannon L.; Minogue, Timothy D.; Teshima, Hazuki; ...

    2015-01-15

    Bacillus cereus strain 03BB87, a blood culture isolate, originated in a 56-year-old male muller operator with a fatal case of pneumonia in 2003. Here we present the finished genome sequence of that pathogen, including a 5.46-Mb chromosome and two plasmids (209 and 52 Kb, respectively).

  4. The ergot alkaloid gene cluster in Claviceps purpurea: extension of the cluster sequence and intra species evolution.

    PubMed

    Haarmann, Thomas; Machado, Caroline; Lübbe, Yvonne; Correia, Telmo; Schardl, Christopher L; Panaccione, Daniel G; Tudzynski, Paul

    2005-06-01

    The genomic region of Claviceps purpurea strain P1 containing the ergot alkaloid gene cluster [Tudzynski, P., Hölter, K., Correia, T., Arntz, C., Grammel, N., Keller, U., 1999. Evidence for an ergot alkaloid gene cluster in Claviceps purpurea. Mol. Gen. Genet. 261, 133-141] was explored by chromosome walking, and additional genes probably involved in the ergot alkaloid biosynthesis have been identified. The putative cluster sequence (extending over 68.5 kb) contains 4 different nonribosomal peptide synthetase (NRPS) genes and several putative oxidases. Northern analysis showed that most of the genes were co-regulated (repressed by high phosphate), and identified probable flanking genes by lack of co-regulation. Comparison of the cluster sequences of strain P1, an ergotamine producer, with that of strain ECC93, an ergocristine producer, showed high conservation of most of the cluster genes, but significant variation in the NRPS modules, strongly suggesting that evolution of these chemical races of C. purpurea is determined by evolution of NRPS module specificity.

  5. SEPT9 Mutations and a Conserved 17q25 Sequence in Sporadic and Hereditary Brachial Plexus Neuropathy

    PubMed Central

    Klein, Christopher J.; Wu, Yanhong; Cunningham, Julie M.; Windebank, Anthony J.; Dyck, P. James B.; Friedenberg, Scott M.; Klein, Diane M.; Dyck, Peter J.

    2009-01-01

    Background: The clinical characteristics of sporadic brachial plexus neuropathy (S-BPN) and hereditary brachial plexus neuropathy (H-BPN) are similar. At times of attack inflammation in brachial plexus nerves has been identified in both conditions. SEPT-9 mutations (Arg88Trp, Ser93Phe, 5UTR-131G to C) occur in some families with H-BPN. These mutations were not found in American H-BPN kindreds with a conserved 500 Kb sequence of DNA at 17q25 (the location of SEPT-9) where a founder mutation has been suggested. Objective: To study 17q25 and SEPT-9 in S-BPN (56 patients) and H-BPN (13 kindreds). Methods: Allele analysis at 17q25, SEPT-9 DNA sequencing and mRNA analysis from lymphoblast cultures. Results: A conserved 17q25 sequence was found in 5 of 13 H-BPN kindreds and one S-BPN patient. This conserved sequence was not found in the family with a SEPT-9 mutation (Arg88Trp) or controls (182). SEPT-9 mRNA expression did not differ between forms of H-BPN and controls. No known mutations of SEPT-9 were found in S-BPN. Conclusions/Relevance: Rare S-BPN patients have the same conserved 17q25 sequence found in many American H-BPN kindreds. BPN patients with this conserved sequence do not appear to have SEPT-9 mutations or alterations of its mRNA expression levels in lymphoblast cultures. BPN patients with this conserved sequence may have the most common genetic cause in the Americas by a founder effect mutation. PMID:19204161

  6. EventThread: Visual Summarization and Stage Analysis of Event Sequence Data.

    PubMed

    Guo, Shunan; Xu, Ke; Zhao, Rongwen; Gotz, David; Zha, Hongyuan; Cao, Nan

    2018-01-01

    Event sequence data such as electronic health records, a person's academic records, or car service records, are ordered series of events which have occurred over a period of time. Analyzing collections of event sequences can reveal common or semantically important sequential patterns. For example, event sequence analysis might reveal frequently used care plans for treating a disease, typical publishing patterns of professors, and the patterns of service that result in a well-maintained car. It is challenging, however, to visually explore large numbers of event sequences, or sequences with large numbers of event types. Existing methods focus on extracting explicitly matching patterns of events using statistical analysis to create stages of event progression over time. However, these methods fail to capture latent clusters of similar but not identical evolutions of event sequences. In this paper, we introduce a novel visualization system named EventThread which clusters event sequences into threads based on tensor analysis and visualizes the latent stage categories and evolution patterns by interactively grouping the threads by similarity into time-specific clusters. We demonstrate the effectiveness of EventThread through usage scenarios in three different application domains and via interviews with an expert user.

  7. [Phylogenetic analysis of genomes of Vibrio cholerae strains isolated on the territory of Rostov region].

    PubMed

    Kuleshov, K V; Markelov, M L; Dedkov, V G; Vodop'ianov, A S; Kermanov, A V; Pisanov, R V; Kruglikov, V D; Mazrukho, A B; Maleev, V V; Shipulin, G A

    2013-01-01

    The aim was to determine the origin of two Vibrio cholerae strains isolated in the Rostov region using whole-genome sequencing data. Toxigenic strain 2011EL-301 (V. cholerae O1 El Tor Inaba No. 301; ctxAB+, tcpA+) and nontoxigenic strain P-18785 (V. cholerae O1 Ogawa; ctxAB-, tcpA+) were studied. Sequencing was carried out on the MiSeq platform. Phylogenetic analysis was based on a comparison of the conserved part of the two genomes with 54 previously sequenced genomes. The genome of strain 2011EL-301 was represented by 164 contigs with an average coverage of 100 and an N50 of 132 kb; the genome of strain P-18785 was represented by 159 contigs with a coverage of 69 and an N50 of 83 kb. The contigs obtained for strain 2011EL-301 were deposited in the DDBJ/EMBL/GenBank databases under accession code AJFN02000000, and those for strain P-18785 under ANHS00000000. A total of 716 protein-coding orthologous genes were detected. Based on the phylogenetic analysis, strain P-18785 belongs to the PG-1 subgroup (a group of predecessor strains of the 7th pandemic). Strain 2011EL-301 belongs to the group of 7th-pandemic strains and falls into the cluster of later isolates associated with cases of cholera in South Africa and with cases of cholera imported to the USA from Pakistan. The data obtained make it possible to establish phylogenetic connections with V. cholerae strains isolated earlier.
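
    The N50 values quoted in this record summarize assembly contiguity: N50 is the length of the contig at which the running total, taken from the longest contig downward, first reaches half of the total assembled length. A minimal sketch follows; the function name and the toy contig lengths are hypothetical and are not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (any unit)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)   # longest contig first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # half the assembly is now covered
            return length


# Hypothetical toy assembly (lengths in kb), for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

    For the toy list the total is 300, so the running sum reaches 150 at the second contig and the function returns 70.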

  8. Noncoding sequence classification based on wavelet transform analysis: part I

    NASA Astrophysics Data System (ADS)

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in the human genome can be divided into coding and noncoding sequences. Coding sequences are those that are read during transcription. The identification of coding sequences has been widely reported in the literature due to its much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA Elements) and Epigenomic Roadmap projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.

  9. Genome sequencing of adzuki bean (Vigna angularis) provides insight into high starch and low fat accumulation and domestication.

    PubMed

    Yang, Kai; Tian, Zhixi; Chen, Chunhai; Luo, Longhai; Zhao, Bo; Wang, Zhuo; Yu, Lili; Li, Yisong; Sun, Yudong; Li, Weiyu; Chen, Yan; Li, Yongqiang; Zhang, Yueyang; Ai, Danjiao; Zhao, Jinyang; Shang, Cheng; Ma, Yong; Wu, Bin; Wang, Mingli; Gao, Li; Sun, Dongjing; Zhang, Peng; Guo, Fangfang; Wang, Weiwei; Li, Yuan; Wang, Jinlong; Varshney, Rajeev K; Wang, Jun; Ling, Hong-Qing; Wan, Ping

    2015-10-27

    Adzuki bean (Vigna angularis), an important legume crop, is grown in more than 30 countries of the world. The seed of adzuki bean, as an important source of starch, digestible protein, mineral elements, and vitamins, is widely used in foods for at least a billion people. Here, we generated a high-quality draft genome sequence of adzuki bean by whole-genome shotgun sequencing. The assembled contig sequences reached 450 Mb (83% of the genome) with an N50 of 38 kb, and the total scaffold sequences were 466.7 Mb with an N50 of 1.29 Mb. Of them, 372.9 Mb of scaffold sequences were assigned to the 11 chromosomes of adzuki bean by using a single nucleotide polymorphism genetic map. A total of 34,183 protein-coding genes were predicted. Functional analysis revealed that significant differences in starch and fat content between adzuki bean and soybean were likely due to transcriptional abundance, rather than copy number variations, of the genes related to starch and oil synthesis. We detected strong selection signals in domestication by the population analysis of 50 accessions including 11 wild, 11 semiwild, 17 landraces, and 11 improved varieties. In addition, the semiwild accessions were shown to have a closer relationship to the cultigen accessions than the wild type, suggesting that the semiwild adzuki bean might be a preliminary landrace and play some roles in the adzuki bean domestication. The genome sequence of adzuki bean will facilitate the identification of agronomically important genes and accelerate the improvement of adzuki bean.

  10. Genome sequencing of adzuki bean (Vigna angularis) provides insight into high starch and low fat accumulation and domestication

    PubMed Central

    Yang, Kai; Tian, Zhixi; Chen, Chunhai; Luo, Longhai; Zhao, Bo; Wang, Zhuo; Yu, Lili; Li, Yisong; Sun, Yudong; Li, Weiyu; Chen, Yan; Li, Yongqiang; Zhang, Yueyang; Ai, Danjiao; Zhao, Jinyang; Shang, Cheng; Ma, Yong; Wu, Bin; Wang, Mingli; Gao, Li; Sun, Dongjing; Zhang, Peng; Guo, Fangfang; Wang, Weiwei; Li, Yuan; Wang, Jinlong; Varshney, Rajeev K.; Wang, Jun; Ling, Hong-Qing; Wan, Ping

    2015-01-01

    Adzuki bean (Vigna angularis), an important legume crop, is grown in more than 30 countries of the world. The seed of adzuki bean, as an important source of starch, digestible protein, mineral elements, and vitamins, is widely used in foods for at least a billion people. Here, we generated a high-quality draft genome sequence of adzuki bean by whole-genome shotgun sequencing. The assembled contig sequences reached 450 Mb (83% of the genome) with an N50 of 38 kb, and the total scaffold sequences were 466.7 Mb with an N50 of 1.29 Mb. Of them, 372.9 Mb of scaffold sequences were assigned to the 11 chromosomes of adzuki bean by using a single nucleotide polymorphism genetic map. A total of 34,183 protein-coding genes were predicted. Functional analysis revealed that significant differences in starch and fat content between adzuki bean and soybean were likely due to transcriptional abundance, rather than copy number variations, of the genes related to starch and oil synthesis. We detected strong selection signals in domestication by the population analysis of 50 accessions including 11 wild, 11 semiwild, 17 landraces, and 11 improved varieties. In addition, the semiwild accessions were shown to have a closer relationship to the cultigen accessions than the wild type, suggesting that the semiwild adzuki bean might be a preliminary landrace and play some roles in the adzuki bean domestication. The genome sequence of adzuki bean will facilitate the identification of agronomically important genes and accelerate the improvement of adzuki bean. PMID:26460024

  11. [Development of laboratory sequence analysis software based on WWW and UNIX].

    PubMed

    Huang, Y; Gu, J R

    2001-01-01

    Sequence analysis tools based on WWW and UNIX were developed in our laboratory to meet the needs of molecular genetics research in our laboratory. General principles of computer analysis of DNA and protein sequences were also briefly discussed in this paper.

  12. Paracetamol - toxicity and microbial utilization. Pseudomonas moorei KB4 as a case study for exploring degradation pathway.

    PubMed

    Żur, Joanna; Wojcieszyńska, Danuta; Hupert-Kocurek, Katarzyna; Marchlewicz, Ariel; Guzik, Urszula

    2018-09-01

    Paracetamol, a widely used analgesic and antipyretic drug, is currently one of the most emerging pollutants worldwide. Despite its wide prevalence in the literature, only a few bacterial strains able to degrade this compound have been described. In this study, we isolated six new bacterial strains able to remove paracetamol. The isolated strains were identified as members of the Pseudomonas, Bacillus, Acinetobacter and Sphingomonas genera and characterized phenotypically and biochemically using standard methods. Of the isolated strains, Pseudomonas moorei KB4 was able to utilize 50 mg L-1 of paracetamol. As the main degradation products, p-aminophenol and hydroquinone were identified. Based on the measurements of specific activity of acyl amidohydrolase, deaminase and hydroquinone 1,2-dioxygenase and the results of liquid chromatography analyses, we proposed a mechanism of paracetamol degradation by KB4 strain under co-metabolic conditions with glucose. Additionally, toxicity bioassays and the influence of various environmental factors, including pH, temperature, heavy metals at no-observed-effective-concentrations, and the presence of aromatic compounds on the efficiency and mechanism of paracetamol degradation by KB4 strain were determined. This comprehensive study of paracetamol biodegradation will be helpful in designing treatment systems for wastewaters contaminated with paracetamol. Copyright © 2018 Elsevier Ltd. All rights reserved.

  13. Identification, characterization and functional analysis of regulatory region of nanos gene from half-smooth tongue sole (Cynoglossus semilaevis).

    PubMed

    Huang, Jinqiang; Li, Yongjuan; Shao, Changwei; Wang, Na; Chen, Songlin

    2017-06-20

    The nanos gene encodes an RNA-binding zinc finger protein, which is required in the development and maintenance of germ cells. However, there is very limited information about nanos in flatfish, which impedes its application in fish breeding. In this study, we report the molecular cloning, characterization and functional analysis of the 3'-untranslated region of the nanos gene (Csnanos) from half-smooth tongue sole (Cynoglossus semilaevis), which is an economically important flatfish in China. The 1233-bp cDNA sequence, 1709-bp genomic sequence and flanking sequences (2.8-kb 5'- and 1.6-kb 3'-flanking regions) of Csnanos were cloned and characterized. Sequence analysis revealed that CsNanos shares low homology with Nanos in other species, but the zinc finger domain of CsNanos is highly similar. Phylogenetic analysis indicated that CsNanos belongs to the Nanos2 subfamily. Csnanos expression was widely detected in various tissues, but the expression level was higher in testis and ovary. During early development and sex differentiation, Csnanos expression exhibited a clear sexually dimorphic pattern, suggesting its different roles in the migration and differentiation of primordial germ cells (PGCs). Higher expression levels of Csnanos mRNA in normal females and males than in neomales indicated that the nanos gene may play key roles in maintaining the differentiation of gonad. Moreover, medaka PGCs were successfully labeled by the microinjection of synthesized mRNA consisting of green fluorescence protein and the 3'-untranslated region of Csnanos. These findings provide new insights into nanos gene expression and function, and lay the foundation for further study of PGC development and applications in tongue sole breeding. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. Nucleotide sequence of a cluster of early and late genes in a conserved segment of the vaccinia virus genome.

    PubMed Central

    Plucienniczak, A; Schroeder, E; Zettlmeissl, G; Streeck, R E

    1985-01-01

    The nucleotide sequence of a 7.6 kb vaccinia DNA segment from a genomic region conserved among different orthopoxviruses has been determined. This segment contains a tight cluster of 12 partly overlapping open reading frames, most of which can be correlated with previously identified early and late proteins and mRNAs. Regulatory signals used by vaccinia virus have been studied. Presumptive promoter regions are rich in A and T and carry the consensus sequences TATA and AATAA spaced at 20-24 base pairs. Tandem repeats of a CTATTC consensus sequence are proposed to be involved in the termination of early transcription. PMID:2987815

  15. DNA sequence analysis of ARS elements from chromosome III of Saccharomyces cerevisiae: identification of a new conserved sequence.

    PubMed Central

    Palzkill, T G; Oliver, S G; Newlon, C S

    1986-01-01

    Four fragments of Saccharomyces cerevisiae chromosome III DNA which carry ARS elements have been sequenced. Each fragment contains multiple copies of sequences that have at least 10 out of 11 bases of homology to a previously reported 11 bp core consensus sequence. A survey of these new ARS sequences and previously reported sequences revealed the presence of an additional 11 bp conserved element located on the 3' side of the T-rich strand of the core consensus. Subcloning analysis as well as deletion and transposon insertion mutagenesis of ARS fragments support a role for 3' conserved sequence in promoting ARS activity. PMID:3529036
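
    The criterion in this record, at least 10 of 11 bases matching an 11 bp core consensus, amounts to a sliding-window near-match scan. The sketch below illustrates that idea only. The abstract does not spell out the consensus, so the degenerate string WTTTAYRTTTW used here is an assumption (a commonly cited form of the ARS core consensus), and the function name and toy sequence are hypothetical.

```python
# IUPAC codes needed for the assumed consensus (W = A/T, Y = C/T, R = A/G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "Y": "CT", "R": "AG"}

# Assumption: not stated in the abstract; swap in the consensus you need.
ASSUMED_CONSENSUS = "WTTTAYRTTTW"


def near_matches(seq, consensus, min_matches):
    """Return (position, window, score) for windows matching the consensus
    at no fewer than min_matches positions."""
    hits = []
    width = len(consensus)
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        # Count positions where the base is allowed by the consensus code.
        score = sum(base in IUPAC[code] for base, code in zip(window, consensus))
        if score >= min_matches:
            hits.append((i, window, score))
    return hits


# Hypothetical toy sequence: one perfect site and one 10-of-11 site.
toy = "GGATTTATATTTAGGCTTTACGTTTTCC"
for hit in near_matches(toy, ASSUMED_CONSENSUS, 10):
    print(hit)
```

    The scan covers one strand only; a fuller search would also scan the reverse complement.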

  16. Exploitation of the diverse insertion sequence element content of dairy Lactobacillus helveticus starters as a rapid method to identify different strains.

    PubMed

    Kaleta, Pawel; Callanan, Michael J; O'Callaghan, John; Fitzgerald, Gerald F; Beresford, Thomas P; Ross, R Paul

    2009-10-01

    The species Lactobacillus helveticus is a commonly used thermophilic starter and/or adjunct culture for Swiss and Cheddar cheese manufacture. Its use is normally associated with flavour improvement which is known to be associated with culture traits such as rapid autolysis and high proteolytic activity. The genome of the commercial strain, DPC4571, was recently sequenced and found to be rich in IS sequences in terms of both abundance (213 intact) and diversity (21 types). Given this unique diversity for a lactic acid bacterium, we investigated whether PCR-based IS fingerprinting could be used as a discriminatory tool to distinguish between different strains of Lb. helveticus. A set of ten primers targeting five of the most numerous groups (ISL1201, ISLhe65, ISLhe2, ISLhe15 and ISL2) of IS elements was designed. Multiplex-PCR with all primers resulted in 1-12 discrete amplicons for each strain tested. The resultant fingerprints (in the 0.5 kb-3 kb range) were found to be strain specific and reproducible. This approach thus provides a valuable method to distinguish between Lb. helveticus strains while giving some indication of the relative abundance of IS sequences in each strain.

  17. Quantiprot - a Python package for quantitative analysis of protein sequences.

    PubMed

    Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold

    2017-07-17

    The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties define a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates the distribution of n-grams and computes the Zipf's law coefficient. We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared to actually observed sequences.
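
    One of the quantities named in this record, the distribution of n-grams in a sequence, can be illustrated in plain Python. This is a minimal sketch that deliberately does not use or imitate Quantiprot's own interface; the function name and the toy peptide are hypothetical.

```python
from collections import Counter


def ngram_frequencies(seq, n):
    """Relative frequency of each overlapping n-gram in seq."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    # An empty dict is returned when seq is shorter than n.
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical toy peptide, for illustration only.
freqs = ngram_frequencies("MKKLLKK", 2)
for gram, freq in sorted(freqs.items()):
    print(gram, round(freq, 3))
```

    Frequency vectors of this kind are one way to place sequences in a common feature space without aligning them.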

  18. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    PubMed

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale by a PC.
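
    As a rough illustration of what a recurrence-time statistic can look like, the sketch below records, for every word of length k, the distance between successive occurrences and builds a histogram of those distances. This is one simple reading of the term, assumed for illustration; it is not the authors' implementation, and the function name and toy sequence are hypothetical.

```python
from collections import Counter


def recurrence_times(seq, k):
    """Histogram of gaps between successive occurrences of each k-mer."""
    last_seen = {}      # k-mer -> start position of its latest occurrence
    gaps = Counter()    # gap length -> number of times it was observed
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            gaps[i - last_seen[word]] += 1
        last_seen[word] = i
    return gaps


# Hypothetical toy sequence containing a period-3 tandem repeat.
hist = recurrence_times("ATGATGATGCCC", 3)
print(dict(hist))
```

    In the toy sequence every repeated 3-mer recurs at a distance of 3, so the histogram has a single peak at gap 3; periodicities and repeats in real sequences would appear as peaks in the same way.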

  19. Infrared thermal facial image sequence registration analysis and verification

    NASA Astrophysics Data System (ADS)

    Chen, Chieh-Li; Jian, Bo-Lin

    2015-03-01

    To study the emotional responses of subjects to the International Affective Picture System (IAPS), infrared thermal facial image sequence is preprocessed for registration before further analysis such that the variance caused by minor and irregular subject movements is reduced. Without affecting the comfort level and inducing minimal harm, this study proposes an infrared thermal facial image sequence registration process that will reduce the deviations caused by the unconscious head shaking of the subjects. A fixed image for registration is produced through the localization of the centroid of the eye region as well as image translation and rotation processes. Thermal image sequencing will then be automatically registered using the two-stage genetic algorithm proposed. The deviation before and after image registration will be demonstrated by image quality indices. The results show that the infrared thermal image sequence registration process proposed in this study is effective in localizing facial images accurately, which will be beneficial to the correlation analysis of psychological information related to the facial area.

  20. Structural and genetic analysis of a mutant of Rhodobacter sphaeroides WS8 deficient in hook length control.

    PubMed Central

    González-Pedrajo, B; Ballado, T; Campos, A; Sockett, R E; Camarena, L; Dreyfus, G

    1997-01-01

    Motility in the photosynthetic bacterium Rhodobacter sphaeroides is achieved by the unidirectional rotation of a single subpolar flagellum. In this study, transposon mutagenesis was used to obtain nonmotile flagellar mutants from this bacterium. We report here the isolation and characterization of a mutant that shows a polyhook phenotype. Morphological characterization of the mutant was done by electron microscopy. Polyhooks were obtained by shearing and were used to purify the hook protein monomer (FlgE). The apparent molecular mass of the hook protein was 50 kDa. N-terminal amino acid sequencing and comparisons with the hook proteins of other flagellated bacteria indicated that the Rhodobacter hook protein has consensus sequences common to axial flagellar components. A 25-kb fragment from an R. sphaeroides WS8 cosmid library restored wild-type flagellation and motility to the mutant. Using DNA adjacent to the inserted transposon as a probe, we identified a 4.6-kb SalI restriction fragment that contained the gene responsible for the polyhook phenotype. Nucleotide sequence analysis of this region revealed an open reading frame with a deduced amino acid sequence that was 23.4% identical to that of FliK of Salmonella typhimurium, the polypeptide responsible for hook length control in that enteric bacterium. The relevance of a gene homologous to fliK in the uniflagellated bacterium R. sphaeroides is discussed. PMID:9352903

  1. High-Resolution Melting Analysis for Rapid Detection of Sequence Type 131 Escherichia coli.

    PubMed

    Harrison, Lucas B; Hanson, Nancy D

    2017-06-01

    Escherichia coli isolates belonging to the sequence type 131 (ST131) clonal complex have been associated with the global distribution of fluoroquinolone and β-lactam resistance. Whole-genome sequencing and multilocus sequence typing identify sequence type but are expensive when evaluating large numbers of samples. This study was designed to develop a cost-effective screening tool using high-resolution melting (HRM) analysis to differentiate ST131 from non-ST131 E. coli in large sample populations in the absence of sequence analysis. The method was optimized using DNA from 12 E. coli isolates. Singleplex PCR was performed using 10 ng of DNA, Type-it HRM buffer, and multilocus sequence typing primers and was followed by multiplex PCR. The amplicon sizes ranged from 630 to 737 bp. Melt temperature peaks were determined by performing HRM analysis at 0.1°C resolution from 50 to 95°C on a Rotor-Gene Q 5-plex HRM system. Derivative melt curves were compared between sequence types and analyzed by principal component analysis. A blinded study of 191 E. coli isolates of ST131 and unknown sequence types validated this methodology. This methodology returned 99.2% specificity (124 true negatives and 1 false positive) and 100% sensitivity (66 true positives and 0 false negatives). This HRM methodology distinguishes ST131 from non-ST131 E. coli without sequence analysis. The analysis can be accomplished in about 3 h in any laboratory with an HRM-capable instrument and principal component analysis software. Therefore, this assay is a fast and cost-effective alternative to sequencing-based ST131 identification. Copyright © 2017 Harrison and Hanson.
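
    The specificity and sensitivity figures in this record follow directly from the reported counts (124 true negatives, 1 false positive, 66 true positives, 0 false negatives, 191 isolates in total). A short worked check, with hypothetical function names:

```python
def sensitivity(tp, fn):
    """Fraction of true ST131 isolates that the assay calls positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-ST131 isolates that the assay calls negative."""
    return tn / (tn + fp)


# Counts as reported in the abstract.
tp, fn, tn, fp = 66, 0, 124, 1

print(f"isolates    = {tp + fn + tn + fp}")            # 191
print(f"sensitivity = {sensitivity(tp, fn):.1%}")      # 100.0%
print(f"specificity = {specificity(tn, fp):.1%}")      # 99.2%
```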

  2. Melting relations and elemental distribution of portion of the system Fe-S-Si-O to 32 KB with planetary application

    NASA Technical Reports Server (NTRS)

    Huang, W. L.

    1980-01-01

    The melting relations and the distribution of K and Cs in portions of the system were determined at high pressures. Ferrosilite is stable as a primary phase at high pressures because of the incongruent melting of ferrosilite to quartz plus liquid, and the boundary between the one- and two-liquid fields on the join Fe(1-x)O-FeS-SiO2 shifts away from silica with increasing pressure. Potassium (K) was found to have limited solubility in metal sulfide liquids at pressures up to 45 kb. The speculation that K may dissolve significantly in metal-metal sulfide liquids after undergoing a first-order isomorphic transition was tested by determining the distribution of Cs between sulfide and silicate liquids as an analog of K. At 45 kb and 1400 C, and at 27 kb and 1300 C, only limited amounts of Cs were detected in quenched sulfide liquids, even at pressures beyond the isomorphic transition of Cs.

  3. Analysis and Visualization Tool for Targeted Amplicon Bisulfite Sequencing on Ion Torrent Sequencers

    PubMed Central

    Pabinger, Stephan; Ernst, Karina; Pulverer, Walter; Kallmeyer, Rainer; Valdes, Ana M.; Metrustry, Sarah; Katic, Denis; Nuzzo, Angelo; Kriegner, Albert; Vierlinger, Klemens; Weinhaeusel, Andreas

    2016-01-01

    Targeted sequencing of PCR amplicons generated from bisulfite deaminated DNA is a flexible, cost-effective way to study methylation of a sample at single CpG resolution and perform subsequent multi-target, multi-sample comparisons. Currently, no platform specific protocol, support, or analysis solution is provided to perform targeted bisulfite sequencing on a Personal Genome Machine (PGM). Here, we present a novel tool, called TABSAT, for analyzing targeted bisulfite sequencing data generated on Ion Torrent sequencers. The workflow starts with raw sequencing data, performs quality assessment, and uses a tailored version of Bismark to map the reads to a reference genome. The pipeline visualizes results as lollipop plots and is able to deduce specific methylation-patterns present in a sample. The obtained profiles are then summarized and compared between samples. In order to assess the performance of the targeted bisulfite sequencing workflow, 48 samples were used to generate 53 different Bisulfite-Sequencing PCR amplicons from each sample, resulting in 2,544 amplicon targets. We obtained a mean coverage of 282X using 1,196,822 aligned reads. Next, we compared the sequencing results of these targets to the methylation level of the corresponding sites on an Illumina 450k methylation chip. The calculated average Pearson correlation coefficient of 0.91 confirms the sequencing results with one of the industry-leading CpG methylation platforms and shows that targeted amplicon bisulfite sequencing provides an accurate and cost-efficient method for DNA methylation studies, e.g., to provide platform-independent confirmation of Illumina Infinium 450k methylation data. TABSAT offers a novel way to analyze data generated by Ion Torrent instruments and can also be used with data from the Illumina MiSeq platform. It can be easily accessed via the Platomics platform, which offers a web-based graphical user interface along with sample and parameter storage. TABSAT is freely

  4. Sequence, molecular properties, and chromosomal mapping of mouse lumican

    NASA Technical Reports Server (NTRS)

    Funderburgh, J. L.; Funderburgh, M. L.; Hevelone, N. D.; Stech, M. E.; Justice, M. J.; Liu, C. Y.; Kao, W. W.; Conrad, G. W.; Spooner, B. S. (Principal Investigator)

    1995-01-01

    PURPOSE. Lumican is a major proteoglycan of vertebrate cornea. This study characterizes mouse lumican, its molecular form, cDNA sequence, and chromosomal localization. METHODS. Lumican sequence was determined from cDNA clones selected from a mouse corneal cDNA expression library using a bovine lumican cDNA probe. Tissue expression and size of lumican mRNA were determined using Northern hybridization. Glycosidase digestion followed by Western blot analysis provided characterization of molecular properties of purified mouse corneal lumican. Chromosomal mapping of the lumican gene (Lcn) used Southern hybridization of a panel of genomic DNAs from an interspecific murine backcross. RESULTS. Mouse lumican is a 338-amino acid protein with high-sequence identity to bovine and chicken lumican proteins. The N-terminus of the lumican protein contains consensus sequences for tyrosine sulfation. A 1.9-kb lumican mRNA is present in cornea and several other tissues. Antibody against bovine lumican reacted with recombinant mouse lumican expressed in Escherichia coli and also detected high molecular weight proteoglycans in extracts of mouse cornea. Keratanase digestion of corneal proteoglycans released lumican protein, demonstrating the presence of sulfated keratan sulfate chains on mouse corneal lumican in vivo. The lumican gene (Lcn) was mapped to the distal region of mouse chromosome 10. The Lcn map site is in the region of a previously identified developmental mutant, eye blebs, affecting corneal morphology. CONCLUSIONS. This study demonstrates sulfated keratan sulfate proteoglycan in mouse cornea and describes the tools (antibodies and cDNA) necessary to investigate the functional role of this important corneal molecule using naturally occurring and induced mutants of the murine lumican gene.

  5. Characterization of a hydroxyurea-resistant human KB cell line with supersensitivity to 6-thioguanine.

    PubMed

    Yen, Y; Grill, S P; Dutschman, G E; Chang, C N; Zhou, B S; Cheng, Y C

    1994-07-15

    Hydroxyurea (HU) is currently used in the clinic for the treatment of chronic myelogenous leukemia, head and neck carcinoma, and sarcoma. One of its drawbacks, however, is the development of HU resistance. To study this problem, we developed a HU-resistant human KB cell line which exhibits a 15-fold resistance to HU. The characterization of this HU-resistant phenotype revealed a gene amplification of the M2 subunit of ribonucleotide reductase (RR), increased levels of M2 mRNA and protein, and a 3-fold increase of RR activity. This HU-resistant cell line also expressed a "collateral sensitivity" to 6-thioguanine (6-TG), with a 10-fold decrease in the dose inhibiting cell growth by 50% as compared to the KB parental line. The mechanism responsible for this supersensitivity to 6-TG is believed to be related to an increasingly efficient conversion of 6-TG to its triphosphate form, which is subsequently incorporated into DNA. After passage of the resistant cells in the absence of HU, the cell line reverts. The revertant cells lose their resistance to HU and concomitantly their sensitivity to 6-TG. This phenomenon is due to the return of RR to levels comparable to that of the KB parental cell line. These observations and their relevance to cancer chemotherapy will be discussed in this paper. Our results suggest that a clinical protocol could be designed which would allow for a lower dose of 6-TG to be used by taking advantage of the increased RR activity in HU-refractory cancer patients. Two drugs which display collateral sensitivity are known as a "Ying-Yang" pair. Alternate treatment with two different Ying-Yang pairs is the rationale for the "Ying-Yang Ping-Pong" theory in cancer treatment. This rationale allows for effective cancer chemotherapy with reduced toxicity.

  6. Multilocus sequence analysis and rpoB sequencing of Mycobacterium abscessus (sensu lato) strains.

    PubMed

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-02-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536(T), M. massiliense CIP 108297(T), and M. bolletii CIP 108541(T)) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the
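
    The mean (range) divergence values between clusters can be illustrated with a simple uncorrected pairwise distance. This is a minimal sketch, assuming the sequences are already aligned and concatenated; the abstract does not state which distance measure the authors used, and the toy sequences below are invented for illustration.

```python
from itertools import product


def p_distance(a, b):
    """Percent of differing sites between two aligned, equal-length sequences.

    Sites with a gap ('-') or an N in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * diffs / compared


def between_cluster_divergence(cluster_a, cluster_b):
    """Return (mean, min, max) percent divergence over all cross-cluster pairs."""
    values = [p_distance(a, b) for a, b in product(cluster_a, cluster_b)]
    return sum(values) / len(values), min(values), max(values)


# Invented toy alignment, NOT data from the study.
cluster_a = ["ACGTACGTAC", "ACGTACGTAT"]
cluster_b = ["ACGTTCGTAC"]
mean_div, low, high = between_cluster_divergence(cluster_a, cluster_b)
print(f"{mean_div:.2f}% ({low:.2f} to {high:.2f}%)")
```

    In a multilocus scheme, the per-gene alignments would be joined in a fixed gene order before calling `p_distance`, so that every strain contributes one concatenated sequence.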

  7. Multilocus Sequence Analysis and rpoB Sequencing of Mycobacterium abscessus (Sensu Lato) Strains

    PubMed Central

    Macheras, Edouard; Roux, Anne-Laure; Bastian, Sylvaine; Leão, Sylvia Cardoso; Palaci, Moises; Sivadon-Tardy, Valérie; Gutierrez, Cristina; Richter, Elvira; Rüsch-Gerdes, Sabine; Pfyffer, Gaby; Bodmer, Thomas; Cambau, Emmanuelle; Gaillard, Jean-Louis; Heym, Beate

    2011-01-01

    Mycobacterium abscessus, Mycobacterium bolletii, and Mycobacterium massiliense (Mycobacterium abscessus sensu lato) are closely related species that currently are identified by the sequencing of the rpoB gene. However, recent studies show that rpoB sequencing alone is insufficient to discriminate between these species, and some authors have questioned their current taxonomic classification. We studied here a large collection of M. abscessus (sensu lato) strains by partial rpoB sequencing (752 bp) and multilocus sequence analysis (MLSA). The final MLSA scheme developed was based on the partial sequences of eight housekeeping genes: argH, cya, glpK, gnd, murC, pgm, pta, and purH. The strains studied included the three type strains (M. abscessus CIP 104536T, M. massiliense CIP 108297T, and M. bolletii CIP 108541T) and 120 isolates recovered between 1997 and 2007 in France, Germany, Switzerland, and Brazil. The rpoB phylogenetic tree confirmed the existence of three main clusters, each comprising the type strain of one species. However, divergence values between the M. massiliense and M. bolletii clusters all were below 3% and between the M. abscessus and M. massiliense clusters were from 2.66 to 3.59%. The tree produced using the concatenated MLSA gene sequences (4,071 bp) also showed three main clusters, each comprising the type strain of one species. The M. abscessus cluster had a bootstrap value of 100% and was mostly compact. Bootstrap values for the M. massiliense and M. bolletii branches were much lower (71 and 61%, respectively), with the M. massiliense cluster having a fuzzy aspect. Mean (range) divergence values were 2.17% (1.13 to 2.58%) between the M. abscessus and M. massiliense clusters, 2.37% (1.5 to 2.85%) between the M. abscessus and M. bolletii clusters, and 2.28% (0.86 to 2.68%) between the M. massiliense and M. bolletii clusters. Adding the rpoB sequence to the MLSA-concatenated sequence (total sequence, 4,823 bp) had little effect on the clustering

  8. High Throughput Sequence Analysis for Disease Resistance in Maize

    USDA-ARS's Scientific Manuscript database

    Preliminary results of a computational analysis of high throughput sequencing data from Zea mays and the fungus Aspergillus are reported. The Illumina Genome Analyzer was used to sequence RNA samples from two strains of Z. mays (Va35 and Mp313) collected over a time course as well as several specie...

  9. DNAApp: a mobile application for sequencing data analysis

    PubMed Central

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-01-01

    Summary: There have been numerous applications developed for decoding and visualization of ab1 DNA sequencing files for Windows and MAC platforms, yet none exists for the increasingly popular smartphone operating systems. The ability to decode sequencing files cannot easily be carried out using browser accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing files on Android and iOS. In addition to in-built analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that would facilitate the harnessing of online Web tools for a full range of analysis. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps would raise productivity and facilitate the high demand for analyzing sequencing data in biomedical research. Availability and implementation: The Android version of DNAApp is available in Google Play Store as ‘DNAApp’, and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab; www.bii.a-star.edu.sg/research/trd/apd.php The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. Contact: samuelg@bii.a-star.edu.sg PMID:25095882
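
    The three built-in analyses named in the abstract (reverse complementation, protein translation, and searching for specific sequences) are simple string operations. The sketch below shows one way they might be written; it is not DNAApp's code, all function names are hypothetical, and it assumes the standard genetic code and an already decoded base-call string rather than a raw ab1 file.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate frame 1; unknown codons become 'X', a trailing partial codon is dropped."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_motif(seq, motif):
    """Return 0-based start positions of every (possibly overlapping) match."""
    positions, start = [], seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


read = "ATGGCCTAA"  # invented example read
print(reverse_complement(read), translate(read), find_motif(read, "GCC"))
```

    Translating the reverse strand is then just `translate(reverse_complement(read))`.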

  10. DNAApp: a mobile application for sequencing data analysis.

    PubMed

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-11-15

    There have been numerous applications developed for decoding and visualization of ab1 DNA sequencing files for Windows and MAC platforms, yet none exists for the increasingly popular smartphone operating systems. The ability to decode sequencing files cannot easily be carried out using browser accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing files on Android and iOS. In addition to in-built analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that would facilitate the harnessing of online Web tools for a full range of analysis. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps would raise productivity and facilitate the high demand for analyzing sequencing data in biomedical research. The Android version of DNAApp is available in Google Play Store as 'DNAApp', and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab; www.bii.a-star.edu.sg/research/trd/apd.php The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. samuelg@bii.a-star.edu.sg. © The Author 2014. Published by Oxford University Press.

  11. Genome sequencing of the sweetpotato whitefly Bemisia tabaci MED/Q.

    PubMed

    Xie, Wen; Chen, Chunhai; Yang, Zezhong; Guo, Litao; Yang, Xin; Wang, Dan; Chen, Ming; Huang, Jinqun; Wen, Yanan; Zeng, Yang; Liu, Yating; Xia, Jixing; Tian, Lixia; Cui, Hongying; Wu, Qingjun; Wang, Shaoli; Xu, Baoyun; Li, Xianchun; Tan, Xinqiu; Ghanim, Murad; Qiu, Baoli; Pan, Huipeng; Chu, Dong; Delatte, Helene; Maruthi, M N; Ge, Feng; Zhou, Xueping; Wang, Xiaowei; Wan, Fanghao; Du, Yuzhou; Luo, Chen; Yan, Fengming; Preisser, Evan L; Jiao, Xiaoguo; Coates, Brad S; Zhao, Jinyang; Gao, Qiang; Xia, Jinquan; Yin, Ye; Liu, Yong; Brown, Judith K; Zhou, Xuguo Joe; Zhang, Youjun

    2017-05-01

    The sweetpotato whitefly Bemisia tabaci is a highly destructive agricultural and ornamental crop pest. It damages host plants through both phloem feeding and vectoring plant pathogens. Introductions of B. tabaci are difficult to quarantine and eradicate because of its high reproductive rates, broad host plant range, and insecticide resistance. A total of 791 Gb of raw DNA sequence from whole genome shotgun sequencing, and 13 BAC pooling libraries were generated by Illumina sequencing using different combinations of mate-pair and pair-end libraries. Assembly gave a final genome with a scaffold N50 of 437 kb, and a total length of 658 Mb. Annotation of repetitive elements and coding regions resulted in 265.0 Mb TEs (40.3%) and 20 786 protein-coding genes with putative gene family expansions, respectively. Phylogenetic analysis based on orthologs across 14 arthropod taxa suggested that MED/Q is clustered into a hemipteran clade containing A. pisum and is a sister lineage to a clade containing both R. prolixus and N. lugens. Genome completeness, as estimated using the CEGMA and Benchmarking Universal Single-Copy Orthologs pipelines, reached 96% and 79%. These MED/Q genomic resources lay a foundation for future 'pan-genomic' comparisons of invasive vs. noninvasive, invasive vs. invasive, and native vs. exotic Bemisia, which, in return, will open up new avenues of investigation into whitefly biology, evolution, and management. © The Author 2017. Published by Oxford University Press.
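
    The scaffold N50 reported above is a length-weighted median: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of that definition follows; the scaffold lengths are invented and are not taken from the B. tabaci assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Invented scaffold lengths in kb (total 300 kb).
scaffolds_kb = [80, 70, 50, 40, 30, 20, 10]
print(n50(scaffolds_kb))
```

    With these toy lengths, the two longest scaffolds (80 + 70 kb) reach exactly half of the 300 kb total, so the N50 is 70 kb.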

  12. Identification and analysis of the bacterial endosymbiont specialized for production of the chemotherapeutic natural product ET-743

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schofield, Michael M.; Jain, Sunit; Porat, Daphne

    Ecteinascidin 743 (ET-743, Yondelis) is a clinically approved chemotherapeutic natural product isolated from the Caribbean mangrove tunicate Ecteinascidia turbinata. Researchers have long suspected that a microorganism may be the true producer of the anti-cancer drug, but its genome has remained elusive due to our inability to culture the bacterium in the laboratory using standard techniques. Here, we sequenced and assembled the complete genome of the ET-743 producer, Candidatus Endoecteinascidia frumentensis, directly from metagenomic DNA isolated from the tunicate. Analysis of the ~631 kb microbial genome revealed strong evidence of an endosymbiotic lifestyle and extreme genome reduction. Phylogenetic analysis suggested that the producer of the anti-cancer drug is taxonomically distinct from other sequenced microorganisms and could represent a new family of Gammaproteobacteria. The complete genome has also greatly expanded our understanding of ET-743 production and revealed new biosynthetic genes dispersed across more than 173 kb of the small genome. The gene cluster’s architecture and its preservation demonstrate that the drug is likely essential to the interactions of the microorganism with its mangrove tunicate host. Taken together, these studies elucidate the lifestyle of a unique and pharmaceutically important microorganism and highlight the wide diversity of bacteria capable of making potent natural products.

  13. Identification and analysis of the bacterial endosymbiont specialized for production of the chemotherapeutic natural product ET-743

    DOE PAGES

    Schofield, Michael M.; Jain, Sunit; Porat, Daphne; ...

    2015-07-21

    Ecteinascidin 743 (ET-743, Yondelis) is a clinically approved chemotherapeutic natural product isolated from the Caribbean mangrove tunicate Ecteinascidia turbinata. Researchers have long suspected that a microorganism may be the true producer of the anti-cancer drug, but its genome has remained elusive due to our inability to culture the bacterium in the laboratory using standard techniques. Here, we sequenced and assembled the complete genome of the ET-743 producer, Candidatus Endoecteinascidia frumentensis, directly from metagenomic DNA isolated from the tunicate. Analysis of the ~631 kb microbial genome revealed strong evidence of an endosymbiotic lifestyle and extreme genome reduction. Phylogenetic analysis suggested that the producer of the anti-cancer drug is taxonomically distinct from other sequenced microorganisms and could represent a new family of Gammaproteobacteria. The complete genome has also greatly expanded our understanding of ET-743 production and revealed new biosynthetic genes dispersed across more than 173 kb of the small genome. The gene cluster’s architecture and its preservation demonstrate that the drug is likely essential to the interactions of the microorganism with its mangrove tunicate host. Taken together, these studies elucidate the lifestyle of a unique and pharmaceutically important microorganism and highlight the wide diversity of bacteria capable of making potent natural products.

  14. Whole-genome sequence-based analysis of thyroid function.

    PubMed

    Taylor, Peter N; Porcu, Eleonora; Chew, Shelby; Campbell, Purdey J; Traglia, Michela; Brown, Suzanne J; Mullin, Benjamin H; Shihab, Hashem A; Min, Josine; Walter, Klaudia; Memari, Yasin; Huang, Jie; Barnes, Michael R; Beilby, John P; Charoen, Pimphen; Danecek, Petr; Dudbridge, Frank; Forgetta, Vincenzo; Greenwood, Celia; Grundberg, Elin; Johnson, Andrew D; Hui, Jennie; Lim, Ee M; McCarthy, Shane; Muddyman, Dawn; Panicker, Vijay; Perry, John R B; Bell, Jordana T; Yuan, Wei; Relton, Caroline; Gaunt, Tom; Schlessinger, David; Abecasis, Goncalo; Cucca, Francesco; Surdulescu, Gabriela L; Woltersdorf, Wolfram; Zeggini, Eleftheria; Zheng, Hou-Feng; Toniolo, Daniela; Dayan, Colin M; Naitza, Silvia; Walsh, John P; Spector, Tim; Davey Smith, George; Durbin, Richard; Richards, J Brent; Sanna, Serena; Soranzo, Nicole; Timpson, Nicholas J; Wilson, Scott G

    2015-03-06

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N=2,287). Using additional whole-genome sequence and deeply imputed data sets, we report meta-analysis results for common variants (MAF≥1%) associated with TSH and FT4 (N=16,335). For TSH, we identify a novel variant in SYN2 (MAF=23.5%, P=6.15 × 10^-9) and a new independent variant in PDE8B (MAF=10.4%, P=5.94 × 10^-14). For FT4, we report a low-frequency variant near B4GALT6/SLC25A52 (MAF=3.2%, P=1.27 × 10^-9) tagging a rare TTR variant (MAF=0.4%, P=2.14 × 10^-11). All common variants explain ≥20% of the variance in TSH and FT4. Analysis of rare variants (MAF<1%) using sequence kernel association testing reveals a novel association with FT4 in NRG1. Our results demonstrate that increased coverage in whole-genome sequence association studies identifies novel variants associated with thyroid function.

  15. A disputed evidence on obesity: comparison of the effects of Rcan2(-/-) and Rps6kb1(-/-) mutations on growth and body weight in C57BL/6J mice.

    PubMed

    Zhao, Jing; Li, Shi-Wei; Gong, Qian-Qian; Ding, Ling-Cui; Jin, Ye-Cheng; Zhang, Jian; Gao, Jian-Gang; Sun, Xiao-Yang

    2016-09-01

    It is widely accepted that body weight and adipose mass are tightly regulated by homeostatic mechanisms, in which leptin plays a critical role through hypothalamic pathways, and obesity is a result of homeostatic disorder. However, in C57BL/6J mice, we found that Rcan2 increases food intake and plays an important role in the development of age- and diet-induced obesity through a leptin-independent mechanism. RCAN2 was initially identified as a thyroid hormone (T3)-responsive gene in human fibroblasts. Expression of RCAN2 is regulated by T3 through the PI3K-Akt/PKB-mTOR-Rps6kb1 signaling pathway. Intriguingly, both Rcan2(-/-) and Rps6kb1(-/-) mutations were reported to result in lean phenotypes in mice. In this study we compared the effects of these two mutations on growth and body weight in C57BL/6J mice. We observed reduced body weight and lower fat mass in both Rcan2(-/-) and Rps6kb1(-/-) mice compared to the wild-type mice, and we reported other differences unique to either the Rcan2(-/-) or Rps6kb1(-/-) mice. Firstly, loss of Rcan2 does not directly alter body length; however, Rcan2(-/-) mice exhibit reduced food intake. In contrast, Rps6kb1(-/-) mice exhibit abnormal embryonic development, which leads to smaller body size and reduced food intake in adulthood. Secondly, when fed a normal chow diet, Rcan2(-/-) mice weigh significantly more than Rps6kb1(-/-) mice, but both Rcan2(-/-) and Rps6kb1(-/-) mice develop similar amounts of epididymal fat. On a high-fat diet, Rcan2(-/-) mice gain body weight and fat mass at slower rates than Rps6kb1(-/-) mice. Finally, using the double-knockout mice (Rcan2(-/-) Rps6kb1(-/-)), we demonstrate that concurrent loss of Rcan2 and Rps6kb1 has an additive effect on body weight reduction in C57BL/6J mice. Our data suggest that Rcan2 and Rps6kb1 mutations both affect growth and body weight of mice, though likely through different mechanisms.

  16. Identification and characterization of Serpulina hyodysenteriae by restriction enzyme analysis and Southern blot analysis.

    PubMed Central

    Sotiropoulos, C; Coloe, P J; Smith, S C

    1994-01-01

    Chromosomal DNA restriction enzyme analysis and Southern blot hybridization were used to characterize Serpulina hyodysenteriae strains. When chromosomal DNAs from selected strains (reference serotypes) of S. hyodysenteriae were digested with the restriction endonuclease Sau3A and hybridized with a 1.1-kb S. hyodysenteriae-specific DNA probe, a common 3-kb band was always detected in S. hyodysenteriae strains but was absent from Serpulina innocens strains. When the chromosomal DNA was digested with the restriction endonuclease Asp 700 and hybridized with two S. hyodysenteriae-specific DNA probes (0.75 and 1.1 kb of DNA), distinct hybridization patterns for each S. hyodysenteriae reference strain and the Australian isolate S. hyodysenteriae 5380 were detected. Neither the 1.1-kb nor the 0.75-kb DNA probe hybridized with Asp 700- or Sau3A-digested S. innocens chromosomal DNA. The presence of the 3-kb Sau3A DNA fragment in S. hyodysenteriae reference strains from diverse geographical locations shows that this fragment is conserved among S. hyodysenteriae strains and can be used as a species-specific marker. Restriction endonuclease analysis and Southern blot hybridization with these well-defined DNA probes are reliable and accurate methods for species-specific and strain-specific identification of S. hyodysenteriae. PMID:7914209

  17. Inhibition of the reverse mode of the Na+/Ca2+ exchange by KB-R7943 augments arrhythmogenicity in the canine heart during rapid heart rates.

    PubMed

    Shinada, Takuro; Hirayama, Yoshiyuki; Maruyama, Mitsunori; Ohara, Toshihiko; Yashima, Masaaki; Kobayashi, Yoshinori; Atarashi, Hirotsugu; Takano, Teruo

    2005-07-01

    To test the hypothesis that the reverse mode of the Na+/Ca2+ exchange augmented by a rapid heart rate has an antiarrhythmic effect by shortening the action potential duration, we examined the effects of KB-R7943 (2-[2-[4-(4-nitrobenzyloxy)phenyl]ethyl] isothiourea methanesulfonate), a selective inhibitor of the reverse mode of the Na+/Ca2+ exchange, to attenuate this effect. We recorded the electrocardiogram, monophasic action potential (MAP), and left ventricular pressure in canine beating hearts. In comparison to the control, KB-R7943 significantly increased the QTc value and MAP duration. MAP alternans and left ventricular pressure alternans were observed after changing the cycle length to 300 milliseconds in the control studies. KB-R7943 magnified both types of alternans and produced spatially discordant alternans between right and left ventricles. Early after-depolarizations and nonsustained ventricular tachycardia occurred in the presence of KB-R7943. Our data suggest that the reverse mode of the Na+/Ca2+ exchange may contribute to suppression of arrhythmias by abbreviating action potential duration under pathophysiological conditions. This conclusion is based on further confirmation by future studies of the specificity of KB-R7943 for block of the reverse mode of the Na+/Ca2+ exchange.

  18. RSAT 2018: regulatory sequence analysis tools 20th anniversary.

    PubMed

    Nguyen, Nga Thi Thuy; Contreras-Moreira, Bruno; Castro-Mondragon, Jaime A; Santana-Garcia, Walter; Ossio, Raul; Robles-Espinoza, Carla Daniela; Bahin, Mathieu; Collombet, Samuel; Vincens, Pierre; Thieffry, Denis; van Helden, Jacques; Medina-Rivera, Alejandra; Thomas-Chollier, Morgane

    2018-05-02

    RSAT (Regulatory Sequence Analysis Tools) is a suite of modular tools for the detection and the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, including from genome-wide datasets like ChIP-seq/ATAC-seq, (ii) motif scanning, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations, (v) comparative genomics. Six public servers jointly support 10 000 genomes from all kingdoms. Six novel or refactored programs have been added since the 2015 NAR Web Software Issue, including updated programs to analyse regulatory variants (retrieve-variation-seq, variation-scan, convert-variations), along with tools to extract sequences from a list of coordinates (retrieve-seq-bed), to select motifs from motif collections (retrieve-matrix), and to extract orthologs based on Ensembl Compara (get-orthologs-compara). Three use cases illustrate the integration of new and refactored tools to the suite. This Anniversary update gives a 20-year perspective on the software suite. RSAT is well-documented and available through Web sites, SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services, virtual machines and stand-alone programs at http://www.rsat.eu/.

  19. Exponential Megapriming PCR (EMP) Cloning—Seamless DNA Insertion into Any Target Plasmid without Sequence Constraints

    PubMed Central

    Ulrich, Alexander; Andersen, Kasper R.; Schwartz, Thomas U.

    2012-01-01

    We present a fast, reliable and inexpensive restriction-free cloning method for seamless DNA insertion into any plasmid without sequence limitation. Exponential megapriming PCR (EMP) cloning requires two consecutive PCR steps and can be carried out in one day. We show that EMP cloning has a higher efficiency than restriction-free (RF) cloning, especially for long inserts above 2.5 kb. EMP further enables simultaneous cloning of multiple inserts. PMID:23300917

  20. Exponential megapriming PCR (EMP) cloning--seamless DNA insertion into any target plasmid without sequence constraints.

    PubMed

    Ulrich, Alexander; Andersen, Kasper R; Schwartz, Thomas U

    2012-01-01

    We present a fast, reliable and inexpensive restriction-free cloning method for seamless DNA insertion into any plasmid without sequence limitation. Exponential megapriming PCR (EMP) cloning requires two consecutive PCR steps and can be carried out in one day. We show that EMP cloning has a higher efficiency than restriction-free (RF) cloning, especially for long inserts above 2.5 kb. EMP further enables simultaneous cloning of multiple inserts.

  1. A novel donor splice site in intron 11 of the CFTR gene, created by mutation 1811+1.6kbA→G, produces a new exon: High frequency in Spanish cystic fibrosis chromosomes and association with severe phenotype

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chillon, M.; Casals, T.; Gimenez, J.

    1995-03-01

    mRNA analysis of the cystic fibrosis transmembrane conductance regulator (CFTR) gene in tissues of cystic fibrosis (CF) patients has allowed us to detect a cryptic exon. The new exon involves 49 base pairs between exons 11 and 12 and is due to a point mutation (1811+1.6kbA→G) that creates a new donor splice site in intron 11. Semiquantitative mRNA analysis showed that 1811+1.6kbA→G mRNA was 5-10-fold less abundant than ΔF508 mRNA. Mutation 1811+1.6kbA→G was found in 21 Spanish and 1 German CF chromosome(s), making it the fourth-most-frequent mutation (2%) in the Spanish population. Individuals with genotype ΔF508/1811+1.6kbA→G have only 1%-3% of normal CFTR mRNA. This loss of 97% of normal CFTR mRNA must be responsible for the pancreatic insufficiency and for the severe CF phenotype in these patients. 30 refs., 3 figs., 2 tabs.

  2. SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform.

    PubMed

    Lin, Jie; Wei, Jing; Adjeroh, Donald; Jiang, Bing-Hua; Jiang, Yue

    2018-05-02

    Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number field. Then, the series of complex numbers formed are transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence is turned into a feature vector with numeric values, which can then be used for clustering and/or classification. Using two different types of applications, namely, clustering and classification, we compared SSAW against the state-of-the-art alignment-free sequence analysis methods. SSAW demonstrates competitive or superior performance in terms of standard indicators, such as accuracy, F-score, precision, and recall. The running time was significantly better in most cases. These make SSAW a suitable method for sequence analysis, especially given the rapidly increasing volumes of sequence data required by most modern applications.

  3. Mitochondrial sequence analysis for forensic identification using pyrosequencing technology.

    PubMed

    Andréasson, H; Asp, A; Alderborn, A; Gyllensten, U; Allen, M

    2002-01-01

    Over recent years, requests for mtDNA analysis in the field of forensic medicine have notably increased, and the results of such analyses have proved to be very useful in forensic cases where nuclear DNA analysis cannot be performed. Traditionally, mtDNA has been analyzed by DNA sequencing of the two hypervariable regions, HVI and HVII, in the D-loop. DNA sequence analysis using the conventional Sanger sequencing is very robust but time consuming and labor intensive. By contrast, mtDNA analysis based on the pyrosequencing technology provides fast and accurate results from the human mtDNA present in many types of evidence materials in forensic casework. The assay has been developed to determine polymorphic sites in the mitochondrial D-loop as well as the coding region to further increase the discrimination power of mtDNA analysis. The pyrosequencing technology for analysis of mtDNA polymorphisms has been tested with regard to sensitivity, reproducibility, and success rate when applied to control samples and actual casework materials. The results show that the method is very accurate and sensitive; the results are easily interpreted and provide a high success rate on casework samples. The panel of pyrosequencing reactions for the mtDNA polymorphisms were chosen to result in an optimal discrimination power in relation to the number of bases determined.

  4. A 21.7 kb DNA segment on the left arm of yeast chromosome XIV carries WHI3, GCR2, SPX18, SPX19, an homologue to the heat shock gene SSB1 and 8 new open reading frames of unknown function.

    PubMed

    Jonniaux, J L; Coster, F; Purnelle, B; Goffeau, A

    1994-12-01

    We report the amino acid sequence of 13 open reading frames (ORF > 299 bp) located on a 21.7 kb DNA segment from the left arm of chromosome XIV of Saccharomyces cerevisiae. Five open reading frames had been entirely or partially sequenced previously: WHI3, GCR2, SPX19, SPX18 and a heat shock gene similar to SSB1. The products of 8 other ORFs are new putative proteins among which N1394 is probably a membrane protein. N1346 contains a leucine zipper pattern and the corresponding ORF presents an HAP (global regulator of respiratory genes) upstream activating sequence in the promoting region. N1386 shares homologies with the DNA structure-specific recognition protein family SSRPs and the corresponding ORF is preceded by an MCB (MluI cell cycle box) upstream activating factor.
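
    The ORF > 299 bp criterion above can be illustrated with a minimal sketch. It assumes an ORF runs from an ATG to the next in-frame stop codon and scans the forward strand only; the abstract does not state the authors' exact definition or tooling, and the function name `find_orfs` is hypothetical.

```python
# Minimal forward-strand ORF scan (illustrative assumption, not the authors' method).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_len=300):
    """Return (start, end, frame) tuples for ATG-to-stop ORFs of at least min_len bp.

    start is 0-based, end is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG of the ORF currently open in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and look for the next ATG
    return orfs
```

    A complete scan would also process the reverse complement, since genes can lie on either strand.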

  5. Software for rapid time dependent ChIP-sequencing analysis (TDCA).

    PubMed

    Myschyshyn, Mike; Farren-Dai, Marco; Chuang, Tien-Jui; Vocadlo, David

    2017-11-25

    Chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) and associated methods are widely used to define the genome wide distribution of chromatin associated proteins, post-translational epigenetic marks, and modifications found on DNA bases. An area of emerging interest is to study time dependent changes in the distribution of such proteins and marks by using serial ChIP-seq experiments performed in a time resolved manner. Despite such time resolved studies becoming increasingly common, software to facilitate analysis of such data in a robust automated manner is limited. We have designed software called Time-Dependent ChIP-Sequencing Analyser (TDCA), which is the first program to automate analysis of time-dependent ChIP-seq data by fitting to sigmoidal curves. We provide users with guidance for experimental design of TDCA for modeling of time course (TC) ChIP-seq data using two simulated data sets. Furthermore, we demonstrate that this fitting strategy is widely applicable by showing that automated analysis of three previously published TC data sets accurately recapitulates key findings reported in these studies. Using each of these data sets, we highlight how biologically relevant findings can be readily obtained by exploiting TDCA to yield intuitive parameters that describe behavior at either a single locus or sets of loci. TDCA enables customizable analysis of user input aligned DNA sequencing data, coupled with graphical outputs in the form of publication-ready figures that describe behavior at either individual loci or sets of loci sharing common traits defined by the user. TDCA accepts sequencing data as standard binary alignment map (BAM) files and loci of interest in browser extensible data (BED) file format. TDCA accurately models the number of sequencing reads, or coverage, at loci from TC ChIP-seq studies or conceptually related TC sequencing experiments. 
TC experiments are reduced to intuitive parametric values that facilitate biologically

  6. Coupling detrended fluctuation analysis for multiple warehouse-out behavioral sequences

    NASA Astrophysics Data System (ADS)

    Yao, Can-Zhong; Lin, Ji-Nan; Zheng, Xu-Zhou

    2017-01-01

    Interaction patterns among different warehouses could make the warehouse-out behavioral sequences less predictable. We first apply coupling detrended fluctuation analysis to the warehouse-out quantity, and find that the multivariate sequences exhibit significant coupling multifractal characteristics regardless of the types of steel products. Second, we track the sources of multifractal warehouse-out sequences by shuffling and surrogating the original ones, and we find that the fat-tail distribution contributes more to multifractal features than the long-term memory, regardless of the types of steel products. From the perspective of warehouse contribution, some warehouses steadily contribute more to multifractality than others. Finally, based on multiscale multifractal analysis, we propose a Hurst surface structure to investigate coupling multifractality, and show that multiple behavioral sequences exhibit significant coupling multifractal features that emerge within, and are usually restricted to, relatively large time-scale intervals.

  7. The Sequence of Two Bacteriophages with Hypermodified Bases Reveals Novel Phage-Host Interactions.

    PubMed

    Kropinski, Andrew M; Turner, Dann; Nash, John H E; Ackermann, Hans-Wolfgang; Lingohr, Erika J; Warren, Richard A; Ehrlich, Kenneth C; Ehrlich, Melanie

    2018-04-24

    Bacteriophages SP-15 and ΦW-14 are members of the Myoviridae infecting Bacillus subtilis and Delftia (formerly Pseudomonas) acidovorans, respectively. What links them is that in both cases, approximately 50% of the thymine residues are replaced by hypermodified bases. The consequence of this is that the physico-chemical properties of the DNA are radically altered (melting temperature (Tm), buoyant density and susceptibility to restriction endonucleases). Using 454 pyrosequencing technology, we sequenced the genomes of both viruses. Phage ΦW-14 possesses a 157-kb genome (56.3% GC) specifying 236 proteins, while SP-15 is larger at 222 kb (38.6 mol % G + C) and encodes 318 proteins. In both cases, the phages can be considered genomic singletons since they do not possess BLASTn homologs. While no obvious genes were identified as being responsible for the modified base in ΦW-14, SP-15 contains a cluster of genes obviously involved in carbohydrate metabolism.

  8. Structural analysis of the HLA-A/HLA-F subregion: Precise localization of two new multigene families closely associated with the HLA class I sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pichon, L.; Carn, G.; Bouric, P.

    1996-03-01

    Positional cloning strategies for the hemochromatosis gene have previously concentrated on a target area restricted to a maximum genomic expanse of 400 kb around the HLA-A and HLA-F loci. Recently, the candidate region has been extended to 2-3 Mb on the distal side of the MHC. In this study, 10 coding sequences [hemochromatosis candidate genes (HCG) I to X] were isolated by cDNA selection using YACs covering the HLA-A/HLA-F subregion. Two of these (HCG II and HCG IV) belong to multigene families, as well as other sequences already described in this region, i.e., P5, pMC 6.7, and HLA class I. Fingerprinting of the four YACs overlapping the region was performed and allowed partial localization of the different multigene family sequences on each YAC without defining their exact positions. Fingerprinting on cosmids isolated from the ICRF chromosome 6-specific cosmid library allowed more precise localization of the redundant sequences in all of the multigene families and revealed their apparent organization in clusters. Further examination of these intertwined sequences demonstrated that this structural organization resulted from a succession of complex phenomena, including duplications and contractions. This study presents a precise description of the structural organization of the HLA-A/HLA-F region and a determination of the sequences involved in the megabase size polymorphism observed among the A3, A24, and A31 haplotypes. 29 refs., 2 figs., 2 tabs.

  9. CAFE: aCcelerated Alignment-FrEe sequence analysis

    PubMed Central

    Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A.; Waterman, Michael S.

    2017-01-01

    Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmap, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. PMID:28472388
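
    As an illustration of what a measure "based solely on the k-mer frequencies" can look like, the sketch below computes a Euclidean distance between normalized k-mer frequency vectors. It is a generic example under stated assumptions, not CAFE's code or API; the function names and the default k = 3 are hypothetical.

```python
from collections import Counter
from math import sqrt

def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping any window with a non-ACGT character."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

def kmer_frequencies(seq, k):
    """Convert k-mer counts into relative frequencies that sum to 1."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}

def euclidean_dissimilarity(seq_a, seq_b, k=3):
    """Euclidean distance between the two k-mer frequency vectors."""
    fa = kmer_frequencies(seq_a, k)
    fb = kmer_frequencies(seq_b, k)
    return sqrt(sum((fa.get(m, 0.0) - fb.get(m, 0.0)) ** 2 for m in set(fa) | set(fb)))
```

    Measures such as $d_2^*$ and $d_2^S$ additionally normalize the counts against a background sequence model, which is where their extra computational cost comes from.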

  10. Integrated databanks access and sequence/structure analysis services at the PBIL.

    PubMed

    Perrière, Guy; Combet, Christophe; Penel, Simon; Blanchet, Christophe; Thioulouse, Jean; Geourjon, Christophe; Grassot, Julien; Charavay, Céline; Gouy, Manolo; Duret, Laurent; Deléage, Gilbert

    2003-07-01

    The World Wide Web server of the PBIL (Pôle Bioinformatique Lyonnais) provides on-line access to sequence databanks and to many tools for nucleic acid and protein sequence analysis. This server allows users to query nucleotide sequence banks in the EMBL and GenBank formats and protein sequence banks in the SWISS-PROT and PIR formats. The query engine on which our data bank access is based is the ACNUC system. It makes it possible to build complex queries to access functional zones of biological interest and to retrieve large sequence sets. Of special interest are the unique features provided by this system to query the data banks of gene families developed at the PBIL. The server also provides access to a wide range of sequence analysis methods: similarity search programs, multiple alignments, protein structure prediction and multivariate statistics. An original feature of this server is the integration of these two aspects: sequence retrieval and sequence analysis. Indeed, thanks to the introduction of re-usable lists, it is possible to perform treatments on large sets of data. The PBIL server can be reached at: http://pbil.univ-lyon1.fr.

  11. Automated sequence analysis and editing software for HIV drug resistance testing.

    PubMed

    Struck, Daniel; Wallis, Carole L; Denisov, Gennady; Lambert, Christine; Servais, Jean-Yves; Viana, Raquel V; Letsoalo, Esrom; Bronze, Michelle; Aitken, Sue C; Schuurman, Rob; Stevens, Wendy; Schmit, Jean Claude; Rinke de Wit, Tobias; Perez Bercoff, Danielle

    2012-05-01

    Access to antiretroviral treatment in resource-limited settings is inevitably paralleled by the emergence of HIV drug resistance. Monitoring treatment efficacy and HIV drug resistance testing are therefore of increasing importance in resource-limited settings. Yet low-cost technologies and procedures suited to the particular context and constraints of such settings are still lacking. The ART-A (Affordable Resistance Testing for Africa) consortium brought together public and private partners to address this issue. The aim was to develop automated sequence analysis and editing software to support high throughput automated sequencing. The ART-A Software was designed to automatically process and edit ABI chromatograms or FASTA files from HIV-1 isolates. The ART-A Software performs the basecalling, assigns quality values, aligns query sequences against a set reference, infers a consensus sequence, identifies the HIV type and subtype, translates the nucleotide sequence to amino acids and reports insertions/deletions, premature stop codons, ambiguities and mixed calls. The results can be automatically exported to Excel to identify mutations. Automated analysis was compared to manual analysis using a panel of 1624 PR-RT sequences generated in 3 different laboratories. Discrepancies between manual and automated sequence analysis were 0.69% at the nucleotide level and 0.57% at the amino acid level (668,047 AA analyzed), and discordances at major resistance mutations were recorded in 62 cases (4.83% of differences, 0.04% of all AA) for PR and 171 (6.18% of differences, 0.03% of all AA) cases for RT. The ART-A Software is a time-sparing tool for pre-analyzing HIV and viral quasispecies sequences in high throughput laboratories and highlighting positions requiring attention. Copyright © 2012 Elsevier B.V. All rights reserved.

  12. Genome survey of pistachio (Pistacia vera L.) by next generation sequencing: Development of novel SSR markers and genetic diversity in Pistacia species.

    PubMed

    Ziya Motalebipour, Elmira; Kafkas, Salih; Khodaeiaminjan, Mortaza; Çoban, Nergiz; Gözel, Hatice

    2016-12-07

    Pistachio (Pistacia vera L.) is one of the most important nut crops in the world. There are about 11 wild species in the genus Pistacia, and they have importance as rootstock seed sources for cultivated P. vera and forest trees. Published information on the pistachio genome is limited. Therefore, a genome survey is necessary to obtain knowledge on the genome structure of pistachio by next generation sequencing. Simple sequence repeat (SSR) markers are useful tools for germplasm characterization, genetic diversity analysis, and genetic linkage mapping, and may help to elucidate genetic relationships among pistachio cultivars and species. To explore the genome structure of pistachio, a genome survey was performed using the Illumina platform at approximately 40× coverage depth in the P. vera cv. Siirt. The K-mer analysis indicated that pistachio has a genome that is about 600 Mb in size and is highly heterozygous. The assembly of 26.77 Gb Illumina data produced 27,069 scaffolds at N50 = 3.4 kb with a total of 513.5 Mb. A total of 59,280 SSR motifs were detected, at a frequency of one per 8.67 kb. A total of 206 SSRs were used to characterize 24 P. vera cultivars and 20 wild Pistacia genotypes (four genotypes from each of five wild Pistacia species) belonging to P. atlantica, P. integerrima, P. chinensis, P. terebinthus, and P. lentiscus. Overall, 135 SSR loci amplified in all 44 cultivars and genotypes; 41 were polymorphic in six Pistacia species. The novel SSR loci developed from cultivated pistachio were highly transferable to wild Pistacia species. The results from a genome survey of pistachio suggest that the genome size of pistachio is about 600 Mb with a high heterozygosity rate. This information will help to design whole genome sequencing strategies for pistachio. The newly developed novel polymorphic SSRs in this study may help germplasm characterization, genetic diversity, and genetic linkage mapping studies in the genus Pistacia.
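
    Two of the assembly statistics quoted above can be made concrete with a short sketch: the standard N50 definition, and the mean spacing between SSR motifs. The function names are hypothetical, and the scaffold lengths in the usage note are made-up toy values, not data from the study.

```python
def n50(lengths):
    """Return the N50: the largest length L such that sequences of length >= L
    together contain at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")

def mean_spacing_kb(total_bp, n_features):
    """Average distance between features, in kb, assuming an even spread."""
    if n_features <= 0:
        raise ValueError("n_features must be positive")
    return total_bp / n_features / 1000.0

# Toy scaffold lengths (hypothetical), then the totals reported in the abstract.
toy_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
ssr_spacing = mean_spacing_kb(513_500_000, 59_280)
```

    With the reported totals (513.5 Mb, 59,280 motifs) the spacing comes out at roughly 8.66 kb, close to the 8.67 kb in the abstract; the small gap is presumably due to rounding of the published totals.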

  13. Community analysis of a full-scale anaerobic bioreactor treating paper mill wastewater.

    PubMed

    Roest, Kees; Heilig, Hans G H J; Smidt, Hauke; de Vos, Willem M; Stams, Alfons J M; Akkermans, Antoon D L

    2005-03-01

    To get insight into the microbial community of an Upflow Anaerobic Sludge Blanket reactor treating paper mill wastewater, conventional microbiological methods were combined with 16S rRNA gene analyses. Particular attention was paid to microorganisms able to degrade propionate or butyrate in the presence or absence of sulphate. Serial enrichment dilutions allowed estimating the number of microorganisms per ml sludge that could use butyrate with or without sulphate (10^5), propionate without sulphate (10^6), or propionate and sulphate (10^8). Quantitative RNA dot-blot hybridisation indicated that Archaea were two times more abundant in the microbial community of anaerobic sludge than Bacteria. The microbial community composition was further characterised by 16S rRNA-gene-targeted Denaturing Gradient Gel Electrophoresis (DGGE) fingerprinting, and via cloning and sequencing of dominant amplicons from the bacterial and archaeal patterns. Most of the nearly full length (approximately 1.45 kb) bacterial 16S rRNA gene sequences showed less than 97% similarity to sequences present in public databases, in contrast to the archaeal clones (approximately 1.3 kb) that were highly similar to known sequences. While Methanosaeta was found as the most abundant genus, Crenarchaeote relatives were also identified. The microbial community was relatively stable over a period of 3 years (samples taken in July 1999, May 2001, March 2002 and June 2002) as indicated by the high similarity index calculated from DGGE profiles (81.9 ± 2.7% for Bacteria and 75.1 ± 3.1% for Archaea). 16S rRNA gene sequence analysis indicated the presence of unknown and yet uncultured microorganisms, but also showed that known sulphate-reducing bacteria and syntrophic fatty acid-oxidising microorganisms dominated the enrichments.

  14. Localization of an Ataxia-Telangiectasia Gene to an ∼500-kb Interval on Chromosome 11q23.1: Linkage Analysis of 176 Families by an International Consortium

    PubMed Central

    Lange, Ethan; Borresen, Anna-Lise; Chen, Xiaoguang; Chessa, Luciana; Chiplunkar, Sujata; Concannon, Patrick; Dandekar, Sugandha; Gerken, Steven; Lange, Kenneth; Liang, Teresa; McConville, Carmel; Polakow, Jeff; Porras, Oscar; Rotman, Galit; Sanal, Ozden; Sheikhavandi, Sepideh; Shiloh, Yosef; Sobel, Eric; Taylor, Malcolm; Telatar, Milhan; Teraoka, Sharon; Tolun, Aslihan; Udar, Nitin; Uhrhammer, Nancy; Vanagaite, Lina; Wang, Zhijun; Wapelhorst, Beth; Wright, Jocyndra; Yang, Huan-Ming; Yang, Lan; Ziv, Yael; Gatti, Richard A.

    1995-01-01

    We describe a 20-point linkage analysis map of chromosome 11q22-23 that is based on genotyping 249 families (59 CEPH and 190 A-T). Monte Carlo linkage analyses of 176 ataxia-telangiectasia (A-T) families localize the major A-T locus to the region between S1819(A4) and S1818(A2). When seven nonlinking families were excluded from subsequent analyses, a 2-lod support interval of ∼500 kb was identified between S1819(A4) and S1294. No recombinants were observed between A-T and markers S384, B7, S535, or S1294. Only 17 of the international consortium families have been assigned to complementation groups. The available evidence favors either a cluster of A-T genes on chromosome 11 or intragenic defects in a single gene. PMID:7611279

  15. CSReport: A New Computational Tool Designed for Automatic Analysis of Class Switch Recombination Junctions Sequenced by High-Throughput Sequencing.

    PubMed

    Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie

    2017-05-15

    B cells ensure humoral immune responses due to the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. Amount and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to support analysis of CSR recombination junctions sequenced with a HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.

  16. Prx1 and 3.2 kb Col1a1 promoters target distinct bone cell populations in transgenic mice

    PubMed Central

    Ouyang, Zhufeng; Chen, Zhijun; Ishikawa, Masakazu; Yue, Xiuzhen; Kawanami, Aya; Leahy, Patrick; Greenfield, Edward M.; Murakami, Shunichi

    2014-01-01

    Bones consist of a number of cell types including osteoblasts and their precursor cells at various stages of differentiation. To analyze cellular organization within the bone, we generated Col1a1CreER-DsRed transgenic mice that express, in osteoblasts, CreER and DsRed under the control of a mouse 3.2 kb Col1a1 promoter. We further crossed Col1a1CreER-DsRed mice with Prx1CreER-GFP mice that express CreER and GFP in osteochondro progenitor cells under the control of a 2.4 kb Prx1 promoter. Since the 3.2 kb Col1a1 promoter becomes active in osteoblasts at early stages of differentiation, and Prx1CreER-GFP-expressing periosteal cells show endogenous Col1a1 expression, we expected to find a cell population in which both the 2.4 kb Prx1 promoter and the 3.2 kb Col1a1 promoter are active. However, our histological and flow cytometric analyses demonstrated that these transgenes are expressed in distinct cell populations. In the periosteum of long bones, Col1a1CreER-DsRed is expressed in the innermost layer directly lining the bone surface, while Prx1CreER-GFP-expressing cells are localized immediately outside of the Col1a1CreER-DsRed-expressing osteoblasts. In the calvaria, Prx1CreER-GFP-expressing cells are also localized in the cranial suture mesenchyme. Our experiments further showed that Col1a1CreER-DsRed-expressing cells lack chondrogenic potential, while the Prx1CreER-GFP-expressing cells show both chondrogenic and osteogenic potential. Our results indicate that Col1a1CreER-DsRed-expressing cells are committed osteoblasts, while Prx1CreER-GFP-expressing cells are osteochondro progenitor cells. The Prx1CreER-GFP and Col1a1CreER-DsRed transgenes will offer novel approaches for analyzing lineage commitment and early stages of osteoblast differentiation under physiologic and pathologic conditions. PMID:24513582

  17. Analysis of the entire genomes of fifteen torque teno midi virus variants classifiable into a third group of genus Anellovirus.

    PubMed

    Ninomiya, M; Takahashi, M; Shimosegawa, T; Okamoto, H

    2007-01-01

    Recently, we identified a novel human virus with a circular DNA genome of 3.2 kb, tentatively designated as torque teno midi virus (TTMDV), with a genomic organization resembling those of torque teno virus (TTV) of 3.8-3.9 kb and torque teno mini virus (TTMV) of 2.8-2.9 kb. To investigate the extent of genomic variability of TTMDV genomes, the full-length sequence was determined for 15 TTMDV isolates obtained from viremic individuals in Japan. The 15 TTMDV isolates comprised 3175-3230 bases and shared 67.0-90.3% identities with each other, and were only 68.4-73.0% identical to the 3 reported TTMDV isolates over the entire genome. TTMDV possessed a genomic organization with four open reading frames (ORF1-ORF4) with characteristic sequence motifs and stem and loop structures with high GC content, similar to TTV and TTMV. The total of 18 TTMDV genomes differed by up to 60.7% from each other in the amino acid sequence of ORF1 (658-677 amino acids), but segregated phylogenetically into the same cluster, which was distantly related to the TTVs and TTMVs. These results indicate that TTMDV with a circular DNA genome of 3.2 kb, has an extremely high degree of genomic variability, and is classifiable into a third group in the genus Anellovirus.

  18. Progressive myoclonus epilepsy EPM1 locus maps to a 175-kb interval in distal 21q

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Virtaneva, K.; Miao, J.; Traeskelin, A.L.

    1996-06-01

    The EPM1 locus responsible for progressive myoclonus epilepsy of Unverricht-Lundborg type (MIM 254800) maps to a region in distal chromosome 21q where positional cloning has been hampered by the lack of physical and genetic mapping resolution. We here report the use of a recently constituted contig of cosmid, BAC, and P1 clones that allowed new polymorphic markers to be positioned. These were typed in 53 unrelated disease families from an isolated Finnish population in which a putative single ancestral EPM1 mutation has segregated for an estimated 100 generations. By thus exploiting historical recombinations in haplotype analysis, EPM1 could be assigned to the ∼175-kb interval between the markers D21S2040 and D21S1259. 26 refs., 2 figs., 4 tabs.

  19. SPAR: small RNA-seq portal for analysis of sequencing experiments.

    PubMed

    Kuksa, Pavel P; Amlie-Wolf, Alexandre; Katanic, Živadin; Valladares, Otto; Wang, Li-San; Leung, Yuk Yee

    2018-05-04

    The introduction of new high-throughput small RNA sequencing protocols that generate large-scale genomics datasets along with increasing evidence of the significant regulatory roles of small non-coding RNAs (sncRNAs) have highlighted the urgent need for tools to analyze and interpret large amounts of small RNA sequencing data. However, it remains challenging to systematically and comprehensively discover and characterize sncRNA genes and specifically-processed sncRNA products from these datasets. To fill this gap, we present Small RNA-seq Portal for Analysis of sequencing expeRiments (SPAR), a user-friendly web server for interactive processing, analysis, annotation and visualization of small RNA sequencing data. SPAR supports sequencing data generated from various experimental protocols, including smRNA-seq, short total RNA sequencing, microRNA-seq, and single-cell small RNA-seq. Additionally, SPAR includes publicly available reference sncRNA datasets from our DASHR database and from ENCODE across 185 human tissues and cell types to produce highly informative small RNA annotations across all major small RNA types and other features such as co-localization with various genomic features, precursor transcript cleavage patterns, and conservation. SPAR allows the user to compare the input experiment against reference ENCODE/DASHR datasets. SPAR currently supports analyses of human (hg19, hg38) and mouse (mm10) sequencing data. SPAR is freely available at https://www.lisanwanglab.org/SPAR.

  20. Noncoding sequence classification based on wavelet transform analysis: part II

    NASA Astrophysics Data System (ADS)

    Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez-Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

    2017-09-01

    DNA sequences in human genome can be divided into the coding and noncoding ones. We hypothesize that the characteristic periodicities of the noncoding sequences are related to their function. We describe the procedure to identify these characteristic periodicities using the wavelet analysis. Our results show that three groups of noncoding sequences, each one with different biological function, may be differentiated by their wavelet coefficients within specific frequency range.

  1. Analysis of the putative regulatory region of the gastric inhibitory polypeptide receptor gene in food-dependent Cushing's syndrome.

    PubMed

    Antonini, S R; N'Diaye, N; Baldacchino, V; Hamet, P; Tremblay, J; Lacroix, A

    2004-07-01

    Gastric inhibitory polypeptide (GIP)-dependent Cushing's syndrome (CS) results from the ectopic expression of non-mutated GIP receptor (hGIPR) in the adrenal cortex. We evaluated whether mutations or polymorphisms in the regulatory region of the GIPR gene could lead to this aberrant expression. We studied 9.0kb upstream and 1.3kb downstream of the GIPR gene putative promoter (pProm) by sequencing leukocyte DNA from controls and from adrenal tissues of GIP- and non-GIP-dependent CS patients. The putative proximal promoter region (800 bp) and the first exon and intron of the hGIPR gene were sequenced on adrenal DNA from nine GIP-dependent CS, as well as on leukocyte DNA of nine normal controls. Three variations found in this region were found in all patients and controls; at position -4/-5, an insertion of a T was seen in four out of nine patients and in five out of nine controls. Transient transfection studies conducted in rat GC and mouse Y1 cells showed that the TT allele confers loss of 40% in the promoter activity. The analysis of the 8-kb distal pProm region revealed eight distal single nucleotide polymorphisms (SNPs) without probable association with the disease, since frequencies in patients and controls were very similar. In conclusion, mutations or SNPs in the regulatory region of the GIPR gene are unlikely to underlie GIP-dependent CS. Copyright 2004 Elsevier Ltd.

  2. Construction and Evaluation of Normalized cDNA Libraries Enriched with Full-Length Sequences for Rapid Discovery of New Genes from Sisal (Agave sisalana Perr.) Different Developmental Stages

    PubMed Central

    Zhou, Wen-Zhao; Zhang, Yan-Mei; Lu, Jun-Ying; Li, Jun-Feng

    2012-01-01

    To provide a resource of sisal-specific expressed sequence data and facilitate this powerful approach in new gene research, the preparation of normalized cDNA libraries enriched with full-length sequences is necessary. Four libraries were produced with RNA pooled from multiple Agave sisalana tissues to increase the efficiency of normalization and maximize the number of independent genes, using the SMART™ method and duplex-specific nuclease (DSN). This procedure kept the proportion of full-length cDNAs in the subtracted/normalized libraries and dramatically enhanced the discovery of new genes. Sequencing of 3875 cDNA clones from the libraries revealed 3320 unigenes with an average insert length of about 1.2 kb, indicating that the non-redundancy of the libraries was about 85.7%. These unigene functions were predicted by comparing their sequences to functional domain databases and extensively annotated with Gene Ontology (GO) terms. Comparative analysis of sisal unigenes and other plant genomes revealed that four putative MADS-box genes and a knotted-like homeobox (knox) gene were obtained from a total of 1162 full-length transcripts. Furthermore, real-time PCR showed that the characteristics of their transcripts mainly depended on the tight expression regulation of a number of genes during leaf and flower development. Analysis of individual library sequence data indicated that the pooled-tissue approach was highly effective in discovering new genes and preparing libraries for efficient deep sequencing. PMID:23202944
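
    The non-redundancy figure quoted in this abstract follows from the clone and unigene counts. A minimal sketch of the arithmetic, assuming non-redundancy is defined as unigenes divided by sequenced clones (the abstract does not state the formula, and the function name is illustrative):

```python
def non_redundancy(unigenes: int, clones: int) -> float:
    """Percentage of sequenced clones that represent distinct unigenes.

    Assumed definition: 100 * unigenes / clones.
    """
    if clones <= 0:
        raise ValueError("clones must be positive")
    return 100.0 * unigenes / clones


# Counts taken from the abstract: 3320 unigenes from 3875 sequenced clones.
pct = non_redundancy(3320, 3875)
print(f"{pct:.1f}%")
```

    Under that assumed definition the counts reproduce the reported value of about 85.7%.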

  3. Targeted Analysis of Whole Genome Sequence Data to Diagnose Genetic Cardiomyopathy

    DOE PAGES

    Golbus, Jessica R.; Puckelwartz, Megan J.; Dellefave-Castillo, Lisa; ...

    2014-09-01

    Background—Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of more than 50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift towards comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Methods and Results—Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1-14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and/or segregation analysis. Three of three previously identified primary mutations were detected by this analysis. In six subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and/or had additional pathological correlation to provide evidence for causality. For two subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. We conclude that these pilot data demonstrate that ~30-40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes.

  4. Sequence variations of the partially dominant DELLA gene Rht-B1c in wheat and their functional impacts

    PubMed Central

    Ma, Zhengqiang

    2013-01-01

    Rht-B1c, allelic to the DELLA protein-encoding gene Rht-B1a, is a natural mutation documented in common wheat (Triticum aestivum). It confers variation to a number of traits related to cell and plant morphology, seed dormancy, and photosynthesis. The present study was conducted to examine the sequence variations of Rht-B1c and their functional impacts. The results showed that Rht-B1c was partially dominant or co-dominant for plant height, and exhibited an increased dwarfing effect. At the sequence level, Rht-B1c differed from Rht-B1a by one 2kb Veju retrotransposon insertion, three coding region single nucleotide polymorphisms (SNPs), one 197bp insertion, and four SNPs in the 1kb upstream sequence. Haplotype investigations, association analyses, transient expression assays, and expression profiling showed that the Veju insertion was primarily responsible for the extreme dwarfing effect. It was found that the Veju insertion changed processing of the Rht-B1c transcripts and resulted in DELLA motif primary structure disruption. Expression assays showed that Rht-B1c caused reduction of total Rht-1 transcript levels, and up-regulation of GATA-like transcription factors and genes positively regulated by these factors, suggesting that one way in which Rht-1 proteins affect plant growth and development is through GATA-like transcription factor regulation. PMID:23918966

  5. Alternative polyadenylation of the gene transcripts encoding a rat DNA polymerase beta.

    PubMed

    Konopiński, R; Nowak, R; Siedlecki, J A

    1996-10-17

    Rat cells produce two different transcripts of DNA polymerase beta (beta-Pol). The low-molecular-weight transcript (1.4 kb) was already sequenced. We report here the cloning and sequencing of the full-length cDNA, corresponding to the high-molecular-weight (HMW) transcript (4.0 kb) of beta-Pol. Sequence data strongly suggest that both transcripts are produced from a single gene by alternative polyadenylation. The HMW transcript contains the entire 1.4 kb transcript sequence and additional 2.2 kb on the 3' end. The 3' UTR of the HMW transcript contains some regulatory sequences which are not present in the 1.4-kb transcript. The A + U-rich fragment and (GU)21 sequence are believed to influence the stability of the mRNA. The functional significance of the A-rich region locally destabilizing double-stranded secondary structure remains unknown.

  6. Comparative Sequence and X-Inactivation Analyses of a Domain of Escape in Human Xp11.2 and the Conserved Segment in Mouse

    PubMed Central

    Tsuchiya, Karen D.; Greally, John M.; Yi, Yajun; Noel, Kevin P.; Truong, Jean-Pierre; Disteche, Christine M.

    2004-01-01

    We have performed X-inactivation and sequence analyses on 350 kb of sequence from human Xp11.2, a region shown previously to contain a cluster of genes that escape X inactivation, and we compared this region with the region of conserved synteny in mouse. We identified several new transcripts from this region in human and in mouse, which defined the full extent of the domain escaping X inactivation in both species. In human, escape from X inactivation involves an uninterrupted 235-kb domain of multiple genes. Despite highly conserved gene content and order between the two species, Smcx is the only mouse gene from the conserved segment that escapes inactivation. As repetitive sequences are believed to facilitate spreading of X inactivation along the chromosome, we compared the repetitive sequence composition of this region between the two species. We found that long terminal repeats (LTRs) were decreased in the human domain of escape, but not in the majority of the conserved mouse region adjacent to Smcx in which genes were subject to X inactivation, suggesting that these repeats might be excluded from escape domains to prevent spreading of silencing. Our findings indicate that genomic context, as well as gene-specific regulatory elements, interact to determine expression of a gene from the inactive X-chromosome. PMID:15197169

  7. KB-R7943 reduces 4-aminopyridine-induced epileptiform activity in adult rats after neuronal damage induced by neonatal monosodium glutamate treatment.

    PubMed

    Hernandez-Ojeda, Mariana; Ureña-Guerrero, Monica E; Gutierrez-Barajas, Paola E; Cardenas-Castillo, Jazmin A; Camins, Antoni; Beas-Zarate, Carlos

    2017-05-09

    Neonatal monosodium glutamate (MSG) treatment triggers excitotoxicity and induces a degenerative process that affects several brain regions in a way that could lead to epileptogenesis. Na+/Ca2+ exchangers (NCX1-3) are implicated in Ca2+ brain homeostasis; normally, they extrude Ca2+ to control cell inflammation, but after damage and in epilepsy, they introduce Ca2+ by acting in the reverse mode, amplifying the damage. Changes in NCX3 expression in the hippocampus have been reported immediately after neonatal MSG treatment. In this study, the expression level of NCX1-3 in the entorhinal cortex (EC) and hippocampus (Hp), and the effects of blockade of NCXs on the seizures induced by 4-aminopyridine (4-AP), were analysed in adult rats after neonatal MSG treatment. KB-R7943 was applied as an NCX blocker, although it is more selective for NCX3 in reverse mode. Neonatal MSG treatment was applied to newborn male rats at postnatal days (PD) 1, 3, 5, and 7 (4 g/kg of body weight, s.c.). Western blot analysis was performed on total protein extracts from the EC and Hp to estimate the expression level of NCX1-3 proteins relative to the expression of β-actin, used as a constitutive protein. Electrographic activity of the EC and Hp was acquired before and after intracerebroventricular (i.c.v.) infusion of 4-AP (3 nmol) and KB-R7943 (62.5 pmol), alone or in combination. All experiments were performed at PD60. Behavioural alterations were also recorded. Neonatal MSG treatment significantly increased the expression of NCX3 protein in both studied regions, and NCX1 protein only in the EC. The 4-AP-induced epileptiform activity was significantly higher in MSG-treated rats than in controls, and KB-R7943 co-administered with 4-AP reduced the epileptiform activity more prominently in MSG-treated rats than in controls. The long-term effects of neonatal MSG treatment include increases in functional expression of NCXs (mainly of NCX3) in the EC and Hp, which seems to contribute to

  8. A conjugative 38 kB plasmid is present in multiple subspecies of Xylella fastidiosa.

    PubMed

    Rogers, Elizabeth E; Stenger, Drake C

    2012-01-01

    A ≈38 kB plasmid (pXF-RIV5) was present in the Riv5 strain of Xylella fastidiosa subsp. multiplex isolated from ornamental plum in southern California. The complete nucleotide sequence of pXF-RIV5 is almost identical to that of pXFAS01 from X. fastidiosa subsp. fastidiosa strain M23; the two plasmids vary at only 6 nucleotide positions. BLAST searches and phylogenetic analyses indicate pXF-RIV5 and pXFAS01 share some similarity to chromosomal and plasmid (pXF51) sequences of X. fastidiosa subsp. pauca strain 9a5c and more distant similarity to plasmids from a wide variety of bacteria. Both pXF-RIV5 and pXFAS01 encode homologues of a complete Type IV secretion system involved in conjugation and DNA transfer among bacteria. Mating pair formation proteins (Trb) from Yersinia pseudotuberculosis IP31758 are the most closely related non-X. fastidiosa proteins to most of the Trb proteins encoded by pXF-RIV5 and pXFAS01. Unlike many bacterial conjugative plasmids, pXF-RIV5 and pXFAS01 do not carry homologues of known accessory modules that confer selective advantage on host bacteria. However, both plasmids encode seven hypothetical proteins of unknown function and possess a small transposon-associated region encoding a putative transposase and associated factor. Vegetative replication of pXF-RIV5 and pXFAS01 appears to be under control of RepA protein and both plasmids have an origin of DNA replication (oriV) similar to that of pRP4 and pR751 from Escherichia coli. In contrast, conjugative plasmids commonly encode TrfA and have an oriV similar to those found in IncP-1 incompatibility group plasmids. The presence of nearly identical plasmids in single strains from two distinct subspecies of X. fastidiosa is indicative of recent horizontal transfer, probably subsequent to the introduction of subspecies fastidiosa to the United States in the late 19th century.

  9. A Conjugative 38 kB Plasmid Is Present in Multiple Subspecies of Xylella fastidiosa

    PubMed Central

    Rogers, Elizabeth E.; Stenger, Drake C.

    2012-01-01

    A ∼38 kB plasmid (pXF-RIV5) was present in the Riv5 strain of Xylella fastidiosa subsp. multiplex isolated from ornamental plum in southern California. The complete nucleotide sequence of pXF-RIV5 is almost identical to that of pXFAS01 from X. fastidiosa subsp. fastidiosa strain M23; the two plasmids vary at only 6 nucleotide positions. BLAST searches and phylogenetic analyses indicate pXF-RIV5 and pXFAS01 share some similarity to chromosomal and plasmid (pXF51) sequences of X. fastidiosa subsp. pauca strain 9a5c and more distant similarity to plasmids from a wide variety of bacteria. Both pXF-RIV5 and pXFAS01 encode homologues of a complete Type IV secretion system involved in conjugation and DNA transfer among bacteria. Mating pair formation proteins (Trb) from Yersinia pseudotuberculosis IP31758 are the most closely related non-X. fastidiosa proteins to most of the Trb proteins encoded by pXF-RIV5 and pXFAS01. Unlike many bacterial conjugative plasmids, pXF-RIV5 and pXFAS01 do not carry homologues of known accessory modules that confer selective advantage on host bacteria. However, both plasmids encode seven hypothetical proteins of unknown function and possess a small transposon-associated region encoding a putative transposase and associated factor. Vegetative replication of pXF-RIV5 and pXFAS01 appears to be under control of RepA protein and both plasmids have an origin of DNA replication (oriV) similar to that of pRP4 and pR751 from Escherichia coli. In contrast, conjugative plasmids commonly encode TrfA and have an oriV similar to those found in IncP-1 incompatibility group plasmids. The presence of nearly identical plasmids in single strains from two distinct subspecies of X. fastidiosa is indicative of recent horizontal transfer, probably subsequent to the introduction of subspecies fastidiosa to the United States in the late 19th century. PMID:23251694

  10. BioNano genome mapping of individual chromosomes supports physical mapping and sequence assembly in complex plant genomes.

    PubMed

    Staňková, Helena; Hastie, Alex R; Chan, Saki; Vrána, Jan; Tulpová, Zuzana; Kubaláková, Marie; Visendi, Paul; Hayashi, Satomi; Luo, Mingcheng; Batley, Jacqueline; Edwards, David; Doležel, Jaroslav; Šimková, Hana

    2016-07-01

    The assembly of a reference genome sequence of bread wheat is challenging due to its specific features such as the genome size of 17 Gbp, polyploid nature and prevalence of repetitive sequences. BAC-by-BAC sequencing based on chromosomal physical maps, adopted by the International Wheat Genome Sequencing Consortium as the key strategy, reduces problems caused by the genome complexity and polyploidy, but the repeat content still hampers the sequence assembly. Availability of a high-resolution genomic map to guide sequence scaffolding and validate physical map and sequence assemblies would be highly beneficial to obtaining an accurate and complete genome sequence. Here, we chose the short arm of chromosome 7D (7DS) as a model to demonstrate for the first time that it is possible to couple chromosome flow sorting with genome mapping in nanochannel arrays and create a de novo genome map of a wheat chromosome. We constructed a high-resolution chromosome map composed of 371 contigs with an N50 of 1.3 Mb. Long DNA molecules achieved by our approach facilitated chromosome-scale analysis of repetitive sequences and revealed a ~800-kb array of tandem repeats intractable to current DNA sequencing technologies. Anchoring 7DS sequence assemblies obtained by clone-by-clone sequencing to the 7DS genome map provided a valuable tool to improve the BAC-contig physical map and validate sequence assembly on a chromosome-arm scale. Our results indicate that creating genome maps for the whole wheat genome in a chromosome-by-chromosome manner is feasible and that they will be an affordable tool to support the production of improved pseudomolecules. © 2016 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
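
    N50, as quoted for the 371-contig map in this abstract, is the length L such that contigs of length L or longer together cover at least half of the total assembled length. A minimal sketch of that standard definition; the function name and the toy lengths are illustrative assumptions, not data from the 7DS map:

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    summed length of all contigs.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example (made-up lengths): total = 32, and the two longest
# contigs (8 + 8 = 16) already cover half of it.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # prints 8
```

    Comparing `2 * running` against `total` keeps the calculation in integers and avoids rounding issues when the total length is odd.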

  11. Ordered shotgun sequencing of a 135 kb Xq25 YAC containing ANT2 and four possible genes, including three confirmed by EST matches.

    PubMed Central

    Chen, C N; Su, Y; Baybayan, P; Siruno, A; Nagaraja, R; Mazzarella, R; Schlessinger, D; Chen, E

    1996-01-01

    Ordered shotgun sequencing (OSS) has been successfully carried out with an Xq25 YAC substrate. yWXD703 DNA was subcloned into lambda phage and sequences of insert ends of the lambda subclones were used to generate a map to select a minimum tiling path of clones to be completely sequenced. The sequence of 135 038 nt contains the entire ANT2 cDNA as well as four other candidates suggested by computer-assisted analyses. One of the putative genes is homologous to a gene implicated in Graves' disease and it, ANT2 and two others are confirmed by EST matches. The results suggest that OSS can be applied to YACs in accord with earlier simulations and further indicate that the sequence of the YAC accurately reflects the sequence of uncloned human DNA. PMID:8918809

  12. Mapping-by-sequencing of Ligon-lintless-1 (Li1) reveals a cluster of neighboring genes with correlated expression in developing fibers of Upland cotton (Gossypium hirsutum L.).

    PubMed

    Thyssen, Gregory N; Fang, David D; Turley, Rickie B; Florane, Christopher; Li, Ping; Naoumkina, Marina

    2015-09-01

    Mapping-by-sequencing and SNP marker analysis were used to fine map the Ligon-lintless-1 (Li1) short fiber mutation in tetraploid cotton to a 255-kb region that contains 16 annotated proteins. The Ligon-lintless-1 (Li1) mutant of cotton (Gossypium hirsutum L.) has been studied as a model for cotton fiber development since its identification in 1929; however, the causative mutation has not been identified yet. Here we report the fine genetic mapping of the mutation to a 255-kb region that contains only 16 annotated genes in the reference Gossypium raimondii genome. We took advantage of the incompletely dominant dwarf vegetative phenotype to identify 100 mutants (Li1/Li1) and 100 wild-type (li1/li1) homozygotes from a mapping population of 2567 F2 plants, which we bulked and deep sequenced. Since only homozygotes were sequenced, we were able to use a high stringency in SNP calling to rapidly narrow down the region harboring the Li1 locus, and designed subgenome-specific SNP markers to test the population. We characterized the expression of all sixteen genes in the region by RNA sequencing of elongating fibers and by RT-qPCR at seven time points spanning fiber development. One of the most highly expressed genes found in this interval in wild-type fiber cells is 40-fold under-expressed at the day of anthesis (DOA) in the mutant fiber cells. This gene is a major facilitator superfamily protein, part of the large family of proteins that includes auxin and sugar transporters. Interestingly, nearly all genes in this region were most highly expressed at DOA and showed a high degree of co-expression. Further characterization is required to determine if transport of hormones or carbohydrates is involved in both the dwarf and lintless phenotypes of Li1 plants.

  13. Ultraaccurate genome sequencing and haplotyping of single human cells.

    PubMed

    Chu, Wai Keung; Edge, Peter; Lee, Ho Suk; Bansal, Vikas; Bafna, Vineet; Huang, Xiaohua; Zhang, Kun

    2017-11-21

    Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, which is dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10^-8 and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs. Copyright © 2017 the Author(s). Published by PNAS.

  14. An Imaging And Graphics Workstation For Image Sequence Analysis

    NASA Astrophysics Data System (ADS)

    Mostafavi, Hassan

    1990-01-01

    This paper describes an application-specific engineering workstation designed and developed to analyze imagery sequences from a variety of sources. The system combines the software and hardware environment of the modern graphic-oriented workstations with the digital image acquisition, processing and display techniques. The objective is to achieve automation and high throughput for many data reduction tasks involving metric studies of image sequences. The applications of such an automated data reduction tool include analysis of the trajectory and attitude of aircraft, missile, stores and other flying objects in various flight regimes including launch and separation as well as regular flight maneuvers. The workstation can also be used in an on-line or off-line mode to study three-dimensional motion of aircraft models in simulated flight conditions such as wind tunnels. The system's key features are: 1) Acquisition and storage of image sequences by digitizing real-time video or frames from a film strip; 2) computer-controlled movie loop playback, slow motion and freeze frame display combined with digital image sharpening, noise reduction, contrast enhancement and interactive image magnification; 3) multiple leading edge tracking in addition to object centroids at up to 60 fields per second from both live input video or a stored image sequence; 4) automatic and manual field-of-view and spatial calibration; 5) image sequence data base generation and management, including the measurement data products; 6) off-line analysis software for trajectory plotting and statistical analysis; 7) model-based estimation and tracking of object attitude angles; and 8) interface to a variety of video players and film transport sub-systems.

  15. Network Analysis of Sequence-Function Relationships and Exploration of Sequence Space of TEM β-Lactamases.

    PubMed

    Zeil, Catharina; Widmann, Michael; Fademrecht, Silvia; Vogel, Constantin; Pleiss, Jürgen

    2016-05-01

    The Lactamase Engineering Database (www.LacED.uni-stuttgart.de) was developed to facilitate the classification and analysis of TEM β-lactamases. The current version contains 474 TEM variants. Two hundred fifty-nine variants form a large scale-free network of highly connected point mutants. The network was divided into three subnetworks which were enriched by single phenotypes: one network with predominantly 2be and two networks with 2br phenotypes. Fifteen positions were found to be highly variable, contributing to the majority of the observed variants. Since it is expected that a considerable fraction of the theoretical sequence space is functional, the currently sequenced 474 variants represent only the tip of the iceberg of functional TEM β-lactamase variants which form a huge natural reservoir of highly interconnected variants. Almost 50% of the variants are part of a quartet. Thus, two single mutations that result in functional enzymes can be combined into a functional protein. Most of these quartets consist of the same phenotype, or the mutations are additive with respect to the phenotype. By predicting quartets from triplets, 3,916 unknown variants were constructed. Eighty-seven variants complement multiple quartets and therefore have a high probability of being functional. The construction of a TEM β-lactamase network and subsequent analyses by clustering and quartet prediction are valuable tools to gain new insights into the viable sequence space of TEM β-lactamases and to predict their phenotype. The highly connected sequence space of TEM β-lactamases is ideally suited to network analysis and demonstrates the strengths of network analysis over tree reconstruction methods. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
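
    The quartet idea in this abstract can be illustrated in code. A quartet consists of a variant, its two single mutants, and the corresponding double mutant; when only three members are known (a triplet), the fourth can be proposed. The sketch below is a hypothetical reconstruction, not the authors' implementation: it assumes each variant is represented as a set of mutation labels relative to a reference, the function name is invented, the labels are placeholders, and it does not check whether two mutations affect the same residue.

```python
from collections import Counter
from itertools import combinations


def predict_from_triplets(known_variants):
    """Propose the missing fourth member of each quartet.

    Each variant is a set of mutation labels (assumed representation).
    Returns a Counter mapping every proposed variant to the number of
    triplets it would complete.
    """
    known = {frozenset(v) for v in known_variants}
    support = Counter()
    for corner in known:
        # Known variants that differ from the corner by exactly one mutation.
        neighbours = [v for v in known if len(corner ^ v) == 1]
        for n1, n2 in combinations(neighbours, 2):
            # Apply to n1 the single change that turns the corner into n2.
            fourth = n1 ^ (corner ^ n2)
            if fourth not in known:
                support[fourth] += 1
    return support


# Placeholder labels; the empty set stands for the reference sequence.
known = [set(), {"E104K"}, {"G238S"}, {"M182T"}, {"E104K", "M182T"}]
for variant, n_triplets in predict_from_triplets(known).items():
    print(sorted(variant), n_triplets)
```

    Variants proposed by more than one triplet correspond to the abstract's notion of variants that complement multiple quartets. In any triplet exactly one member is a single mutation away from the other two, so each triplet is counted once.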

  16. A world of opportunities with nanopore sequencing.

    PubMed

    Leggett, Richard M; Clark, Matthew D

    2017-11-28

    Oxford Nanopore Technologies' MinION sequencer was launched in pre-release form in 2014 and represents an exciting new sequencing paradigm. The device offers multi-kilobase reads and a streamed mode of operation that allows processing of reads as they are generated. Crucially, it is an extremely compact device that is powered from the USB port of a laptop computer, enabling it to be taken out of the lab and facilitating previously impossible in-field sequencing experiments to be undertaken. Many of the initial publications concerning the platform focused on provision of tools to access and analyse the new sequence formats and then demonstrating the assembly of microbial genomes. More recently, as throughput and accuracy have increased, it has been possible to begin work involving more complex genomes and metagenomes. With the release of the high-throughput GridION X5 and PromethION platforms, the sequencing of large genomes will become more cost efficient, and enable the leveraging of extremely long (>100 kb) reads for resolution of complex genomic structures. This review provides a brief overview of nanopore sequencing technology, describes the growing range of nanopore bioinformatics tools, and highlights some of the most influential publications that have emerged over the last 2 years. Finally, we look to the future and the potential the platform has to disrupt work in human, microbiome, and plant genomics. © The Author 2017. Published by Oxford University Press on behalf of the Society for Experimental Biology. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  17. PRADA: pipeline for RNA sequencing data analysis.

    PubMed

    Torres-García, Wandaliz; Zheng, Siyuan; Sivachenko, Andrey; Vegesna, Rahulsimham; Wang, Qianghu; Yao, Rong; Berger, Michael F; Weinstein, John N; Getz, Gad; Verhaak, Roel G W

    2014-08-01

    Technological advances in high-throughput sequencing necessitate improved computational tools for processing and analyzing large-scale datasets in a systematic automated manner. For that purpose, we have developed PRADA (Pipeline for RNA-Sequencing Data Analysis), a flexible, modular and highly scalable software platform that provides many different types of information available by multifaceted analysis starting from raw paired-end RNA-seq data: gene expression levels, quality metrics, detection of unsupervised and supervised fusion transcripts, detection of intragenic fusion variants, homology scores and fusion frame classification. PRADA uses a dual-mapping strategy that increases sensitivity and refines the analytical endpoints. PRADA has been used extensively and successfully in the glioblastoma and renal clear cell projects of The Cancer Genome Atlas program. Availability: http://sourceforge.net/projects/prada/ Contact: gadgetz@broadinstitute.org or rverhaak@mdanderson.org Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  18. A high HIV-1 strain variability in London, UK, revealed by full-genome analysis: Results from the ICONIC project.

    PubMed

    Yebra, Gonzalo; Frampton, Dan; Gallo Cassarino, Tiziano; Raffle, Jade; Hubb, Jonathan; Ferns, R Bridget; Waters, Laura; Tong, C Y William; Kozlakidis, Zisis; Hayward, Andrew; Kellam, Paul; Pillay, Deenan; Clark, Duncan; Nastouli, Eleni; Leigh Brown, Andrew J

    2018-01-01

    The ICONIC project has developed an automated high-throughput pipeline to generate HIV nearly full-length genomes (NFLG, i.e. from gag to nef) from next-generation sequencing (NGS) data. The pipeline was applied to 420 HIV samples collected at University College London Hospitals NHS Trust and Barts Health NHS Trust (London) and sequenced using an Illumina MiSeq at the Wellcome Trust Sanger Institute (Cambridge). Consensus genomes were generated and subtyped using COMET, and unique recombinants were studied with jpHMM and SimPlot. Maximum-likelihood phylogenetic trees were constructed using RAxML to identify transmission networks using the Cluster Picker. The pipeline generated sequences of at least 1Kb of length (median = 7.46Kb, IQR = 4.01Kb) for 375 out of the 420 samples (89%), with 174 (46.4%) being NFLG. A total of 365 sequences (169 of them NFLG) corresponded to unique subjects and were included in the down-stream analyses. The most frequent HIV subtypes were B (n = 149, 40.8%) and C (n = 77, 21.1%) and the circulating recombinant form CRF02_AG (n = 32, 8.8%). We found 14 different CRFs (n = 66, 18.1%) and multiple URFs (n = 32, 8.8%) that involved recombination between 12 different subtypes/CRFs. The most frequent URFs were B/CRF01_AE (4 cases) and A1/D, B/C, and B/CRF02_AG (3 cases each). Most URFs (19/26, 73%) lacked breakpoints in the PR+RT pol region, rendering them undetectable if only that was sequenced. Twelve (37.5%) of the URFs could have emerged within the UK, whereas the rest were probably imported from sub-Saharan Africa, South East Asia and South America. For 2 URFs we found highly similar pol sequences circulating in the UK. We detected 31 phylogenetic clusters using the full dataset: 25 pairs (mostly subtypes B and C), 4 triplets and 2 quadruplets. Some of these were not consistent across different genes due to inter- and intra-subtype recombination. Clusters involved 70 sequences, 19.2% of the dataset. 
The initial analysis of genome sequences

  19. A high HIV-1 strain variability in London, UK, revealed by full-genome analysis: Results from the ICONIC project

    PubMed Central

    Frampton, Dan; Gallo Cassarino, Tiziano; Raffle, Jade; Hubb, Jonathan; Ferns, R. Bridget; Waters, Laura; Tong, C. Y. William; Kozlakidis, Zisis; Hayward, Andrew; Kellam, Paul; Pillay, Deenan; Clark, Duncan; Nastouli, Eleni; Leigh Brown, Andrew J.

    2018-01-01

    Background & methods The ICONIC project has developed an automated high-throughput pipeline to generate HIV nearly full-length genomes (NFLG, i.e. from gag to nef) from next-generation sequencing (NGS) data. The pipeline was applied to 420 HIV samples collected at University College London Hospitals NHS Trust and Barts Health NHS Trust (London) and sequenced using an Illumina MiSeq at the Wellcome Trust Sanger Institute (Cambridge). Consensus genomes were generated and subtyped using COMET, and unique recombinants were studied with jpHMM and SimPlot. Maximum-likelihood phylogenetic trees were constructed using RAxML to identify transmission networks using the Cluster Picker. Results The pipeline generated sequences of at least 1Kb of length (median = 7.46Kb, IQR = 4.01Kb) for 375 out of the 420 samples (89%), with 174 (46.4%) being NFLG. A total of 365 sequences (169 of them NFLG) corresponded to unique subjects and were included in the down-stream analyses. The most frequent HIV subtypes were B (n = 149, 40.8%) and C (n = 77, 21.1%) and the circulating recombinant form CRF02_AG (n = 32, 8.8%). We found 14 different CRFs (n = 66, 18.1%) and multiple URFs (n = 32, 8.8%) that involved recombination between 12 different subtypes/CRFs. The most frequent URFs were B/CRF01_AE (4 cases) and A1/D, B/C, and B/CRF02_AG (3 cases each). Most URFs (19/26, 73%) lacked breakpoints in the PR+RT pol region, rendering them undetectable if only that was sequenced. Twelve (37.5%) of the URFs could have emerged within the UK, whereas the rest were probably imported from sub-Saharan Africa, South East Asia and South America. For 2 URFs we found highly similar pol sequences circulating in the UK. We detected 31 phylogenetic clusters using the full dataset: 25 pairs (mostly subtypes B and C), 4 triplets and 2 quadruplets. Some of these were not consistent across different genes due to inter- and intra-subtype recombination. Clusters involved 70 sequences, 19.2% of the dataset. 
Conclusions

  20. Inhibition of spontaneous activity of rabbit atrioventricular node cells by KB-R7943 and inhibitors of sarcoplasmic reticulum Ca2+ ATPase

    PubMed Central

    Cheng, Hongwei; Smith, Godfrey L.; Hancox, Jules C.; Orchard, Clive H.

    2011-01-01

    The atrioventricular node (AVN) can act as a subsidiary cardiac pacemaker if the sinoatrial node fails. In this study, we investigated the effects of the Na–Ca exchange (NCX) inhibitor KB-R7943, and inhibition of the sarcoplasmic reticulum calcium ATPase (SERCA), using thapsigargin or cyclopiazonic acid (CPA), on spontaneous action potentials (APs) and [Ca2+]i transients from cells isolated from the rabbit AVN. Spontaneous [Ca2+]i transients were monitored from undialysed AVN cells at 37 °C using Fluo-4. In separate experiments, spontaneous APs and ionic currents were recorded using the whole-cell patch clamp technique. Rapid application of 5 μM KB-R7943 slowed or stopped spontaneous APs and [Ca2+]i transients. However, in voltage clamp experiments in addition to blocking NCX current (INCX) KB-R7943 partially inhibited L-type calcium current (ICa,L). Rapid reduction of external [Na+] also abolished spontaneous activity. Inhibition of SERCA (using 2.5 μM thapsigargin or 30 μM CPA) also slowed or stopped spontaneous APs and [Ca2+]i transients. Our findings are consistent with the hypothesis that sarcoplasmic reticulum (SR) Ca2+ release influences spontaneous activity in AVN cells, and that this occurs via [Ca2+]i-activated INCX; however, the inhibitory action of KB-R7943 on ICa,L means that care is required in the interpretation of data obtained using this compound. PMID:21163524

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kennedy, M.A.; Morris, C.M.; Fitzgerald, P.H.

    The human kappa deleting element (Kde) mediates loss of CK and JK genes in B cells. A probe for Kde detects two genomic sequences on Southern blots. The Kde is located 24 kb 3′ to CK, but the position of the homologous sequence is unknown. The authors in situ hybridized m141-2 to metaphase cells of JC11, a B-cell line bearing a t(2;14)(p11;q32) in which the chromosome 2 breakpoint is within JK or the VK-JK intron. Three peaks of labelled sites were obtained. Southern analysis of BamH1 digested DNA showed that Kde (14 kb) and the homologous sequence (3 kb) were both intact. Kde accounts for hybridization to 14q+ and the 2p- signal presumably derives from the related sequence. This locates the sequence homologous to Kde upstream from JK, possibly within the VK cluster, and may reflect transposition or some other duplicative event as proposed for the evolution of other regions of the kappa locus.

  2. Regularized rare variant enrichment analysis for case-control exome sequencing data.

    PubMed

    Larson, Nicholas B; Schaid, Daniel J

    2014-02-01

    Rare variants have recently garnered an immense amount of attention in genetic association analysis. However, unlike methods traditionally used for single marker analysis in GWAS, rare variant analysis often requires some method of aggregation, since single marker approaches are poorly powered for typical sequencing study sample sizes. Advancements in sequencing technologies have rendered next-generation sequencing platforms a realistic alternative to traditional genotyping arrays. Exome sequencing in particular not only provides base-level resolution of genetic coding regions, but also a natural paradigm for aggregation via genes and exons. Here, we propose the use of penalized regression in combination with variant aggregation measures to identify rare variant enrichment in exome sequencing data. In contrast to marginal gene-level testing, we simultaneously evaluate the effects of rare variants in multiple genes, focusing on gene-based least absolute shrinkage and selection operator (LASSO) and exon-based sparse group LASSO models. By using gene membership as a grouping variable, the sparse group LASSO can be used as a gene-centric analysis of rare variants while also providing a penalized approach toward identifying specific regions of interest. We apply extensive simulations to evaluate the performance of these approaches with respect to specificity and sensitivity, comparing these results to multiple competing marginal testing methods. Finally, we discuss our findings and outline future research. © 2013 WILEY PERIODICALS, INC.

  3. [Complete genome sequencing and sequence analysis of BCG Tice].

    PubMed

    Wang, Zhiming; Pan, Yuanlong; Wu, Jun; Zhu, Baoli

    2012-10-04

    The objective of this study was to obtain the complete genome sequence of Bacillus Calmette-Guérin Tice (BCG Tice), in order to provide more information about the molecular biology of BCG Tice and to support the design of more rational vaccines against tuberculosis. We assembled the high-throughput sequencing data with SOAPdenovo software, obtaining many contigs and scaffolds. Many sequence gaps and physical gaps remained as a result of regional low coverage and low quality. We designed primers at the ends of contigs and performed PCR amplification in order to link these contigs and scaffolds. By using various enzymes for PCR amplification, adjusting the PCR reaction conditions, and sequencing constructed clones, we closed all the gaps. We obtained the complete genome sequence of BCG Tice and submitted it to GenBank at the National Center for Biotechnology Information (NCBI). The genome of BCG Tice is 4,334,064 base pairs in length, with a GC content of 65.65%. The problems encountered and strategies used during the finishing step of BCG Tice sequencing are described here, in the hope of offering some experience to others involved in the finishing step of genome sequencing. The microarray data were verified by our results.

  4. FAST: FAST Analysis of Sequences Toolbox

    PubMed Central

    Lawrence, Travis J.; Kauffman, Kyle T.; Amrine, Katherine C. H.; Carper, Dana L.; Lee, Raymond S.; Becich, Peter J.; Canales, Claudia J.; Ardell, David H.

    2015-01-01

    FAST (FAST Analysis of Sequences Toolbox) provides simple, powerful open source command-line tools to filter, transform, annotate and analyze biological sequence data. Modeled after the GNU (GNU's Not Unix) Textutils such as grep, cut, and tr, FAST tools such as fasgrep, fascut, and fastr make it easy to rapidly prototype expressive bioinformatic workflows in a compact and generic command vocabulary. Compact combinatorial encoding of data workflows with FAST commands can simplify the documentation and reproducibility of bioinformatic protocols, supporting better transparency in biological data science. Interface self-consistency and conformity with conventions of GNU, Matlab, Perl, BioPerl, R, and GenBank help make FAST easy and rewarding to learn. FAST automates numerical, taxonomic, and text-based sorting, selection and transformation of sequence records and alignment sites based on content, index ranges, descriptive tags, annotated features, and in-line calculated analytics, including composition and codon usage. Automated content- and feature-based extraction of sites and support for molecular population genetic statistics make FAST useful for molecular evolutionary analysis. FAST is portable, easy to install and secure thanks to the relative maturity of its Perl and BioPerl foundations, with stable releases posted to CPAN. Development as well as a publicly accessible Cookbook and Wiki are available on the FAST GitHub repository at https://github.com/tlawrence3/FAST. The default data exchange format in FAST is Multi-FastA (specifically, a restriction of BioPerl FastA format). Sanger and Illumina 1.8+ FastQ formatted files are also supported. FAST makes it easier for non-programmer biologists to interactively investigate and control biological data at the speed of thought. PMID:26042145

  5. A ZFY-like sequence in fish, with comments on the evolution of the ZFY family of genes in vertebrates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zimmerer, E.J.; Threlkeld, L.

    1995-08-01

    ZFY-like genes have been observed in a variety of vertebrate species. Although originally implicated as the primary testis-determining gene in humans and other placental mammals, more recent evidence indicates a role(s) outside that of testis determination. In this study, DNA from five species of fish, Carassius auratus, Rivulus marmoratus, Xiphophorus maculatus, X. milleri, and X. nigrensis, was subjected to Southern blot analysis using a PCR-amplified fragment of mouse ZFY-like sequence as a probe. Restriction fragment patterns were not polymorphic between sexes in any one species but showed a different pattern for each species. With one exception, Rivulus, a 3.1-kb band from the EcoRI digestion was common to all. Sequence and open reading frame analysis of this fragment showed a strong homology to other known vertebrate ZFY-like genes. Of particular interest in this gene is a novel third finger domain similar to one human and one alligator ZFY-like gene. Our studies and others provide evidence for a family of vertebrate ZFY genes, with those having this novel third finger being representative of the ancestral condition. 30 refs., 3 figs., 3 tabs.

  6. Cloning and sequence analysis of Hemonchus contortus HC58cDNA.

    PubMed

    Muleke, Charles I; Ruofeng, Yan; Lixin, Xu; Xinwen, Bo; Xiangrui, Li

    2007-06-01

    The complete coding sequence of Hemonchus contortus HC58cDNA was generated by rapid amplification of cDNA ends and polymerase chain reaction using primers based on the 5' and 3' ends of the parasite mRNA, accession no. AF305964. The HC58cDNA gene was 851 bp long, with an open reading frame of 717 bp encoding a 239-amino-acid precursor of an approximately 27 kDa protein. Analysis of the amino acid sequence revealed conserved residues of cysteine, histidine, asparagine, the occluding loop pattern, the hemoglobinase motif and the glutamine of the oxyanion hole characteristic of cathepsin B-like proteases (CBL). Comparison of the predicted amino acid sequences showed that the protein shared 33.5-58.7% identity with cathepsin B homologues in the papain clan CA family (family C1). Phylogenetic analysis revealed close evolutionary proximity of the protein sequence to counterpart sequences in the CBL, suggesting that HC58cDNA was a member of the papain family.

  7. Partial Shotgun Sequencing of the Boechera stricta Genome Reveals Extensive Microsynteny and Promoter Conservation with Arabidopsis

    PubMed Central

    Windsor, Aaron J.; Schranz, M. Eric; Formanová, Nataša; Gebauer-Jung, Steffi; Bishop, John G.; Schnabelrauch, Domenica; Kroymann, Juergen; Mitchell-Olds, Thomas

    2006-01-01

    Comparative genomics provides insight into the evolutionary dynamics that shape discrete sequences as well as whole genomes. To advance comparative genomics within the Brassicaceae, we have end sequenced 23,136 medium-sized insert clones from Boechera stricta, a wild relative of Arabidopsis (Arabidopsis thaliana). A significant proportion of these sequences, 18,797, are nonredundant and display highly significant similarity (BLASTn e-value ≤ 10⁻³⁰) to low copy number Arabidopsis genomic regions, including more than 9,000 annotated coding sequences. We have used this dataset to identify orthologous gene pairs in the two species and to perform a global comparison of DNA regions 5′ to annotated coding regions. On average, the 500 nucleotides upstream to coding sequences display 71.4% identity between the two species. In a similar analysis, 61.4% identity was observed between 5′ noncoding sequences of Brassica oleracea and Arabidopsis, indicating that regulatory regions are not as diverged among these lineages as previously anticipated. By mapping the B. stricta end sequences onto the Arabidopsis genome, we have identified nearly 2,000 conserved blocks of microsynteny (bracketing 26% of the Arabidopsis genome). A comparison of fully sequenced B. stricta inserts to their homologous Arabidopsis genomic regions indicates that indel polymorphisms >5 kb contribute substantially to the genome size difference observed between the two species. Further, we demonstrate that microsynteny inferred from end-sequence data can be applied to the rapid identification and cloning of genomic regions of interest from nonmodel species. These results suggest that among diploid relatives of Arabidopsis, small- to medium-scale shotgun sequencing approaches can provide rapid and cost-effective benefits to evolutionary and/or functional comparative genomic frameworks. PMID:16607030

  8. A simple, rapid, high-fidelity and cost-effective PCR-based two-step DNA synthesis method for long gene sequences.

    PubMed

    Xiong, Ai-Sheng; Yao, Quan-Hong; Peng, Ri-He; Li, Xian; Fan, Hui-Qin; Cheng, Zong-Ming; Li, Yi

    2004-07-07

    Chemical synthesis of DNA sequences provides a powerful tool for modifying genes and for studying gene function, structure and expression. Here, we report a simple, high-fidelity and cost-effective PCR-based two-step DNA synthesis (PTDS) method for synthesis of long segments of DNA. The method involves two steps. (i) Synthesis of individual fragments of the DNA of interest: ten to twelve 60mer oligonucleotides with 20 bp overlap are mixed and a PCR reaction is carried out with high-fidelity DNA polymerase Pfu to produce DNA fragments that are approximately 500 bp in length. (ii) Synthesis of the entire sequence of the DNA of interest: five to ten PCR products from the first step are combined and used as the template for a second PCR reaction using high-fidelity DNA polymerase pyrobest, with the two outermost oligonucleotides as primers. Compared with the previously published methods, the PTDS method is rapid (5-7 days) and suitable for synthesizing long segments of DNA (5-6 kb) with high G + C contents, repetitive sequences or complex secondary structures. Thus, the PTDS method provides an alternative tool for synthesizing and assembling long genes with complex structures. Using the newly developed PTDS method, we have successfully obtained several genes of interest with sizes ranging from 1.0 to 5.4 kb.
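
    The fragment size quoted for the first step can be checked with simple arithmetic. The sketch below assumes a plain end-to-end tiling in which each adjacent pair of oligonucleotides shares exactly the stated overlap; the abstract does not spell out the layout, so the function name and this tiling model are illustrative assumptions rather than the authors' design procedure.

```python
def assembled_length(n_oligos: int, oligo_len: int, overlap: int) -> int:
    """Length of a fragment tiled from n_oligos oligonucleotides of oligo_len
    bases each, where every adjacent pair shares `overlap` bases.
    (Assumed tiling model, not taken from the paper.)"""
    if n_oligos < 1 or not 0 <= overlap < oligo_len:
        raise ValueError("need n_oligos >= 1 and 0 <= overlap < oligo_len")
    # Each oligo after the first contributes only its non-overlapping part.
    return n_oligos * oligo_len - (n_oligos - 1) * overlap


# Ten to twelve 60-mers with 20 bp overlaps, as described in step (i).
for n in (10, 11, 12):
    print(n, "oligos ->", assembled_length(n, 60, 20), "bp")
```

    Under this model, ten to twelve 60-mers give fragments of 420 to 500 bp, in line with the "approximately 500 bp" reported for the first-step products.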

  9. Gene Unprediction with Spurio: A tool to identify spurious protein sequences.

    PubMed

    Höps, Wolfram; Jeffryes, Matt; Bateman, Alex

    2018-01-01

    We now have access to the sequences of tens of millions of proteins. These protein sequences are essential for modern molecular biology and computational biology. The vast majority of protein sequences are derived from gene prediction tools and have no experimental supporting evidence for their translation.  Despite the increasing accuracy of gene prediction tools there likely exists a large number of spurious protein predictions in the sequence databases.  We have developed the Spurio tool to help identify spurious protein predictions in prokaryotes.  Spurio searches the query protein sequence against a prokaryotic nucleotide database using tblastn and identifies homologous sequences. The tblastn matches are used to score the query sequence's likelihood of being a spurious protein prediction using a Gaussian process model. The most informative feature is the appearance of stop codons within the presumed translation of homologous DNA sequences. Benchmarking shows that the Spurio tool is able to distinguish spurious from true proteins. However, transposon proteins are prone to be predicted as spurious because of the frequency of degraded homologs found in the DNA sequence databases. Our initial experiments suggest that less than 1% of the proteins in the UniProtKB sequence database are likely to be spurious and that Spurio is able to identify over 60 times more spurious proteins than the AntiFam resource. The Spurio software and source code is available under an MIT license at the following URL: https://bitbucket.org/bateman-group/spurio.
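
    The feature described as most informative, stop codons inside the presumed translation of homologous DNA, can be illustrated with a small counter. This is only a sketch of that single feature under simple assumptions (forward strand, one fixed reading frame, the standard stop codons TAA, TAG and TGA); it is not Spurio's code and does not reproduce its tblastn search or Gaussian process scoring. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons


def count_inframe_stops(dna: str, frame: int = 0, ignore_terminal: bool = True) -> int:
    """Count stop codons in one forward reading frame of `dna`.

    A stop in the final complete codon is treated as a normal terminator
    and skipped when ignore_terminal is True.
    """
    seq = dna.upper()
    # Split into complete codons starting at the chosen frame offset.
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if ignore_terminal and codons and codons[-1] in STOP_CODONS:
        codons = codons[:-1]
    return sum(codon in STOP_CODONS for codon in codons)


print(count_inframe_stops("ATGAAATAA"))     # intact reading frame
print(count_inframe_stops("ATGTAGAAATAA"))  # internal TAG interrupts the frame
```

    A homologous region whose translation is interrupted by internal stops is the kind of evidence the abstract describes as pointing towards a spurious prediction.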

  10. Deletions involving long-range conserved nongenic sequences upstream and downstream of FOXL2 as a novel disease-causing mechanism in blepharophimosis syndrome.

    PubMed

    Beysen, D; Raes, J; Leroy, B P; Lucassen, A; Yates, J R W; Clayton-Smith, J; Ilyina, H; Brooks, S Sklower; Christin-Maitre, S; Fellous, M; Fryns, J P; Kim, J R; Lapunzina, P; Lemyre, E; Meire, F; Messiaen, L M; Oley, C; Splitt, M; Thomson, J; Van de Peer, Y; Veitia, R A; De Paepe, A; De Baere, E

    2005-08-01

    The expression of a gene requires not only a normal coding sequence but also intact regulatory regions, which can be located at large distances from the target genes, as demonstrated for an increasing number of developmental genes. In previous mutation studies of the role of FOXL2 in blepharophimosis syndrome (BPES), we identified intragenic mutations in 70% of our patients. Three translocation breakpoints upstream of FOXL2 in patients with BPES suggested a position effect. Here, we identified novel microdeletions outside of FOXL2 in cases of sporadic and familial BPES. Specifically, four rearrangements, with an overlap of 126 kb, are located 230 kb upstream of FOXL2, telomeric to the reported translocation breakpoints. Moreover, the shortest region of deletion overlap (SRO) contains several conserved nongenic sequences (CNGs) harboring putative transcription-factor binding sites and representing potential long-range cis-regulatory elements. Interestingly, the human region orthologous to the 12-kb sequence deleted in the polled intersex syndrome in goat, which is an animal model for BPES, is contained in this SRO, providing evidence of human-goat conservation of FOXL2 expression and of the mutational mechanism. Surprisingly, in a fifth family with BPES, one rearrangement was found downstream of FOXL2. In addition, we report nine novel rearrangements encompassing FOXL2 that range from partial gene deletions to submicroscopic deletions. Overall, genomic rearrangements encompassing or outside of FOXL2 account for 16% of all molecular defects found in our families with BPES. In summary, this is the first report of extragenic deletions in BPES, providing further evidence of potential long-range cis-regulatory elements regulating FOXL2 expression. It contributes to the enlarging group of developmental diseases caused by defective distant regulation of gene expression. Finally, we demonstrate that CNGs are candidate regions for genomic rearrangements in developmental

  11. Deletions Involving Long-Range Conserved Nongenic Sequences Upstream and Downstream of FOXL2 as a Novel Disease-Causing Mechanism in Blepharophimosis Syndrome

    PubMed Central

    Beysen, D.; Raes, J.; Leroy, B. P.; Lucassen, A.; Yates, J. R. W.; Clayton-Smith, J.; Ilyina, H.; Brooks, S. Sklower; Christin-Maitre, S.; Fellous, M.; Fryns, J. P.; Kim, J. R.; Lapunzina, P.; Lemyre, E.; Meire, F.; Messiaen, L. M.; Oley, C.; Splitt, M.; Thomson, J.; Peer, Y. Van de; Veitia, R. A.; De Paepe, A.; De Baere, E.

    2005-01-01

    The expression of a gene requires not only a normal coding sequence but also intact regulatory regions, which can be located at large distances from the target genes, as demonstrated for an increasing number of developmental genes. In previous mutation studies of the role of FOXL2 in blepharophimosis syndrome (BPES), we identified intragenic mutations in 70% of our patients. Three translocation breakpoints upstream of FOXL2 in patients with BPES suggested a position effect. Here, we identified novel microdeletions outside of FOXL2 in cases of sporadic and familial BPES. Specifically, four rearrangements, with an overlap of 126 kb, are located 230 kb upstream of FOXL2, telomeric to the reported translocation breakpoints. Moreover, the shortest region of deletion overlap (SRO) contains several conserved nongenic sequences (CNGs) harboring putative transcription-factor binding sites and representing potential long-range cis-regulatory elements. Interestingly, the human region orthologous to the 12-kb sequence deleted in the polled intersex syndrome in goat, which is an animal model for BPES, is contained in this SRO, providing evidence of human-goat conservation of FOXL2 expression and of the mutational mechanism. Surprisingly, in a fifth family with BPES, one rearrangement was found downstream of FOXL2. In addition, we report nine novel rearrangements encompassing FOXL2 that range from partial gene deletions to submicroscopic deletions. Overall, genomic rearrangements encompassing or outside of FOXL2 account for 16% of all molecular defects found in our families with BPES. In summary, this is the first report of extragenic deletions in BPES, providing further evidence of potential long-range cis-regulatory elements regulating FOXL2 expression. It contributes to the enlarging group of developmental diseases caused by defective distant regulation of gene expression. Finally, we demonstrate that CNGs are candidate regions for genomic rearrangements in developmental

  12. Insights from the Genome Sequence of Acidovorax citrulli M6, a Group I Strain of the Causal Agent of Bacterial Fruit Blotch of Cucurbits.

    PubMed

    Eckshtain-Levi, Noam; Shkedy, Dafna; Gershovits, Michael; Da Silva, Gustavo M; Tamir-Ariel, Dafna; Walcott, Ron; Pupko, Tal; Burdman, Saul

    2016-01-01

    Acidovorax citrulli is a seedborne bacterium that causes bacterial fruit blotch of cucurbit plants including watermelon and melon. A. citrulli strains can be divided into two major groups based on DNA fingerprint analyses and biochemical properties. Group I strains have been generally isolated from non-watermelon cucurbits, while group II strains are closely associated with watermelon. In the present study, we report the genome sequence of M6, a group I model A. citrulli strain, isolated from melon. We used comparative genome analysis to investigate differences between the genome of strain M6 and the genome of the group II model strain AAC00-1. The draft genome sequence of A. citrulli M6 harbors 139 contigs, with an overall approximate size of 4.85 Mb. The genome of M6 is ∼500 Kb shorter than that of strain AAC00-1. Comparative analysis revealed that this size difference is mainly explained by eight fragments, ranging from ∼35-120 Kb and distributed throughout the AAC00-1 genome, which are absent in the M6 genome. In agreement with this finding, while AAC00-1 was found to possess 532 open reading frames (ORFs) that are absent in strain M6, only 123 ORFs in M6 were absent in AAC00-1. Most of these M6 ORFs are hypothetical proteins and most of them were also detected in two group I strains that were recently sequenced, tw6 and pslb65. Further analyses by PCR assays and coverage analyses with other A. citrulli strains support the notion that some of these fragments or significant portions of them are discriminative between groups I and II strains of A. citrulli. Moreover, GC content, effective number of codon values and cluster of orthologs' analyses indicate that these fragments were introduced into group II strains by horizontal gene transfer events. Our study reports the genome sequence of a model group I strain of A. citrulli, one of the most important pathogens of cucurbits. It also provides the first comprehensive comparison at the genomic level between the

  13. Insights from the Genome Sequence of Acidovorax citrulli M6, a Group I Strain of the Causal Agent of Bacterial Fruit Blotch of Cucurbits

    PubMed Central

    Eckshtain-Levi, Noam; Shkedy, Dafna; Gershovits, Michael; Da Silva, Gustavo M.; Tamir-Ariel, Dafna; Walcott, Ron; Pupko, Tal; Burdman, Saul

    2016-01-01

    Acidovorax citrulli is a seedborne bacterium that causes bacterial fruit blotch of cucurbit plants including watermelon and melon. A. citrulli strains can be divided into two major groups based on DNA fingerprint analyses and biochemical properties. Group I strains have been generally isolated from non-watermelon cucurbits, while group II strains are closely associated with watermelon. In the present study, we report the genome sequence of M6, a group I model A. citrulli strain, isolated from melon. We used comparative genome analysis to investigate differences between the genome of strain M6 and the genome of the group II model strain AAC00-1. The draft genome sequence of A. citrulli M6 harbors 139 contigs, with an overall approximate size of 4.85 Mb. The genome of M6 is ∼500 Kb shorter than that of strain AAC00-1. Comparative analysis revealed that this size difference is mainly explained by eight fragments, ranging from ∼35–120 Kb and distributed throughout the AAC00-1 genome, which are absent in the M6 genome. In agreement with this finding, while AAC00-1 was found to possess 532 open reading frames (ORFs) that are absent in strain M6, only 123 ORFs in M6 were absent in AAC00-1. Most of these M6 ORFs are hypothetical proteins and most of them were also detected in two group I strains that were recently sequenced, tw6 and pslb65. Further analyses by PCR assays and coverage analyses with other A. citrulli strains support the notion that some of these fragments or significant portions of them are discriminative between groups I and II strains of A. citrulli. Moreover, GC content, effective number of codon values and cluster of orthologs’ analyses indicate that these fragments were introduced into group II strains by horizontal gene transfer events. Our study reports the genome sequence of a model group I strain of A. citrulli, one of the most important pathogens of cucurbits. It also provides the first comprehensive comparison at the genomic level between

  14. Sequence quality analysis tool for HIV type 1 protease and reverse transcriptase.

    PubMed

    Delong, Allison K; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W; Kantor, Rami

    2012-08-01

    Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802 PR and 44,432 RT sequences) from the published literature (http://hivdb.Stanford.edu). Nucleic acid sequences are read into SQUAT, identified, aligned, and translated. Nucleic acid sequences are flagged if they have >five 1-2-base insertions; >one 3-base insertion; >one deletion; >six PR or >18 RT ambiguous bases; >three consecutive PR or >four RT nucleic acid mutations; >zero stop codons; >three PR or >six RT ambiguous amino acids; >three consecutive PR or >four RT amino acid mutations; >zero unique amino acids; or <0.5% or >15% genetic distance from another submitted sequence. Thresholds are user modifiable. SQUAT output includes a summary report with detailed comments for troubleshooting of flagged sequences, histograms of pairwise genetic distances, neighbor joining phylogenetic trees, and aligned nucleic and amino acid sequences. SQUAT is a stand-alone, free, web-independent tool to ensure use of high-quality HIV PR/RT sequences in interpretation and reporting of drug resistance, while increasing awareness and expertise and facilitating troubleshooting of potentially problematic sequences.
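
    A few of the thresholds listed above can be expressed as a rule check. The sketch below is written in Python for illustration only (SQUAT itself runs in R) and covers just three of the listed rules; the class, field and function names are hypothetical, and reading the genetic-distance rule as "any pairwise distance outside 0.5% to 15%" is an assumption.

```python
from dataclasses import dataclass
from typing import List

# Ambiguous-base limits quoted in the abstract: >six for PR, >18 for RT.
AMBIGUOUS_BASE_LIMIT = {"PR": 6, "RT": 18}


@dataclass
class SequenceSummary:
    region: str                 # "PR" or "RT"
    ambiguous_bases: int        # count of ambiguous nucleotide calls
    stop_codons: int            # stop codons found in the translation
    distances_pct: List[float]  # pairwise distances (%) to other submitted sequences


def quality_flags(s: SequenceSummary) -> List[str]:
    """Return the names of the (partial) quality rules that the sequence violates."""
    flags = []
    if s.stop_codons > 0:
        flags.append("stop codon")
    if s.ambiguous_bases > AMBIGUOUS_BASE_LIMIT[s.region]:
        flags.append("ambiguous bases")
    if any(d < 0.5 for d in s.distances_pct):
        flags.append("distance < 0.5%")
    if any(d > 15.0 for d in s.distances_pct):
        flags.append("distance > 15%")
    return flags


print(quality_flags(SequenceSummary("PR", 7, 0, [2.0, 3.1])))
print(quality_flags(SequenceSummary("RT", 7, 0, [2.0])))
```

    The same count of seven ambiguous bases is flagged for a PR sequence but not for an RT sequence, which reflects the region-specific limits in the abstract.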

  15. Cloning and sequence analysis of the human brain beta-adrenergic receptor. Evolutionary relationship to rodent and avian beta-receptors and porcine muscarinic receptors.

    PubMed

    Chung, F Z; Lentes, K U; Gocayne, J; Fitzgerald, M; Robinson, D; Kerlavage, A R; Fraser, C M; Venter, J C

    1987-01-26

    Two cDNA clones, lambda-CLFV-108 and lambda-CLFV-119, encoding the beta-adrenergic receptor, have been isolated from a human brain stem cDNA library. One human genomic clone, LCV-517 (20 kb), was characterized by restriction mapping and partial sequencing. The human brain beta-receptor consists of 413 amino acids with a calculated Mr of 46480. The gene contains three potential glucocorticoid receptor-binding sites. The beta-receptor expressed in human brain showed homology with rodent (88%) and avian (52%) beta-receptors and with porcine muscarinic cholinergic receptors (31%), supporting our proposal [(1984) Proc. Natl. Acad. Sci. USA 81, 272-276] that adrenergic and muscarinic cholinergic receptors are structurally related. This represents the first cloning of a neurotransmitter receptor gene from human brain.

  16. The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis

    PubMed Central

    Rampp, Markus; Soddemann, Thomas; Lederer, Hermann

    2006-01-01

    We describe a versatile and extensible integrated bioinformatics toolkit for the analysis of biological sequences over the Internet. The web portal offers convenient interactive access to a growing pool of chainable bioinformatics software tools and databases that are centrally installed and maintained by the RZG. Currently, supported tasks comprise sequence similarity searches in public or user-supplied databases, computation and validation of multiple sequence alignments, phylogenetic analysis and protein–structure prediction. Individual tools can be seamlessly chained into pipelines allowing the user to conveniently process complex workflows without the necessity to take care of any format conversions or tedious parsing of intermediate results. The toolkit is part of the Max-Planck Integrated Gene Analysis System (MIGenAS) of the Max Planck Society available at (click ‘Start Toolkit’). PMID:16844980

  17. Analysis of Metagenomic Sequences: From Megabases to Terabases

    ScienceCinema

    Krypides, Nikos

    2018-05-04

    Nikos Krypides of the DOE Joint Genome Institute discusses metagenomics and the challenge of dealing with terabases of data on June 4, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.

  18. Disruption of a -35kb enhancer impairs CTCF binding and MLH1 expression in colorectal cells.

    PubMed

    Liu, Qing; Thoms, Julie A; Nunez, Andrea C; Huang, Yizhou; Knezevic, Kathy; Packham, Deborah; Poulos, Rebecca C; Williams, Rachel; Beck, Dominik; Hawkins, Nicholas J; Ward, Robyn L; Wong, Jason W H; Hesson, Luke B; Sloane, Mathew A; Pimanda, John

    2018-06-13

    MLH1 is a major tumour suppressor gene involved in the pathogenesis of Lynch syndrome and various sporadic cancers. Despite their potential pathogenic importance, genomic regions capable of regulating MLH1 expression over long distances have yet to be identified. Here we use chromosome conformation capture (3C) to screen a 650-kb region flanking the MLH1 locus to identify interactions between the MLH1 promoter and distal regions in MLH1 expressing and non-expressing cells. Putative enhancers were functionally validated using luciferase reporter assays, chromatin immunoprecipitation and CRISPR-Cas9 mediated deletion of endogenous regions. To evaluate whether germline variants in the enhancer might contribute to impaired MLH1 expression in patients with suspected Lynch syndrome, we also screened germline DNA from a cohort of 74 patients with no known coding mutations or epimutations at the MLH1 promoter. A 1.8kb DNA fragment, 35kb upstream of the MLH1 transcription start site enhances MLH1 gene expression in colorectal cells. The enhancer was bound by CTCF and CRISPR-Cas9 mediated deletion of a core binding region impairs endogenous MLH1 expression. 5.4% of suspected Lynch syndrome patients have a rare single nucleotide variant (G>A; rs143969848; 2.5% in gnomAD European, non-Finnish) within a highly conserved CTCF binding motif, which disrupts enhancer activity in SW620 colorectal carcinoma cells. A CTCF bound region within the MLH1 -35 enhancer regulates MLH1 expression in colorectal cells and is worthy of scrutiny in future genetic screening strategies for suspected Lynch syndrome associated with loss of MLH1 expression. Copyright ©2018, American Association for Cancer Research.

  19. DNA sequence analysis of the composite plasmid pTC conferring virulence and antimicrobial resistance for porcine enterotoxigenic Escherichia coli.

    PubMed

    Fekete, Péter Z; Brzuszkiewicz, Elzbieta; Blum-Oehler, Gabriele; Olasz, Ferenc; Szabó, Mónika; Gottschalk, Gerhard; Hacker, Jörg; Nagy, Béla

    2012-01-01

    In this study the plasmid pTC, a 90 kb self-conjugative virulence plasmid of the porcine enterotoxigenic Escherichia coli (ETEC) strain EC2173 encoding the STa and STb heat-stable enterotoxins and tetracycline resistance, has been sequenced in two steps. As a result we identified five main distinct regions of pTC: (i) the maintenance region responsible for the extreme stability of the plasmid, (ii) the TSL (toxin-specific locus comprising the estA and estB genes) which is unique and characteristic for pTC, (iii) a Tn10 transposon, encoding tetracycline resistance, (iv) the tra (plasmid transfer) region, and (v) the colE1-like origin of replication. It is concluded that pTC is a self-transmissible composite plasmid harbouring antibiotic resistance and virulence genes. pTC belongs to a group of large conjugative E. coli plasmids represented by NR1 with a widespread tra backbone which might have evolved from a common ancestor. This is the first report of a completely sequenced animal ETEC virulence plasmid containing an antimicrobial resistance locus, thereby representing a selection advantage for spread of pathogenicity in the presence of antimicrobials leading to increased disease potential. Copyright © 2011. Published by Elsevier GmbH.

  20. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    PubMed Central

    Leonard, Guy; Stevens, Jamie R.; Richards, Thomas A.

    2009-01-01

    The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends. PMID:19812722
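    The relabelling step described in this abstract can be illustrated with a minimal Python sketch. It is not the REFGEN code: the header layout ("accession description [Genus species]"), the function names and the example accessions are all assumptions made for illustration.

```python
import re


def relabel_fasta(lines, make_label):
    """Return FASTA lines with each header replaced by make_label(header)."""
    out = []
    for line in lines:
        if line.startswith(">"):
            out.append(">" + make_label(line[1:].strip()))
        else:
            out.append(line.rstrip("\n"))
    return out


def species_accession_label(header):
    # Assumed header layout: "<accession> <description> [Genus species]".
    accession = header.split()[0]
    match = re.search(r"\[([^\]]+)\]", header)
    species = match.group(1).replace(" ", "_") if match else "unknown"
    return f"{species}_{accession}"


# Hypothetical records; the accessions are invented for the example.
example = [
    ">XP_000001.1 hypothetical protein [Homo sapiens]",
    "MKTAYIAKQR",
    ">XP_000002.1 hypothetical protein [Mus musculus]",
    "MKTAYLAKQR",
]
relabelled = relabel_fasta(example, species_accession_label)
```

    Passing the label function as a parameter mirrors the "user-defined combination" idea: a different rule (accession only, species only) can be swapped in without touching the file handling.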

  1. Antiproliferative activity of flower hexane extract obtained from Mentha spicata associated with Mentha rotundifolia against the MCF7, KB, and NIH/3T3 cell lines.

    PubMed

    Nedel, Fernanda; Begnini, Karine; Carvalho, Pedro Henrique de Azambuja; Lund, Rafael Guerra; Beira, Fátima T A; Del Pino, Francisco Augusto B

    2012-11-01

    This study assessed the antiproliferative effect in vitro of the flower hexane extract obtained from Mentha spicata associated with Mentha rotundifolia against the human breast adenocarcinoma (MCF-7), human mouth epidermal carcinoma (KB), and mouse embryonic fibroblast (NIH 3T3) cell lines, using sulforhodamine B (SRB) assay. A cell density of 2×10(4)/well was seeded in 96-well plates, and samples at different concentrations ranging from 10 to 500 mg/mL were tested. The optical density was determined in an ELISA multiplate reader (Thermo Plate TP-Reader). Results demonstrated that the hexane extract presented antiproliferative activity against both the tumor cell lines KB and MCF-7, presenting a GI(50) (MCF-7=13.09 mg/mL), TGI (KB=37.76 mg/mL), and IL(50) (KB=291.07 mg/mL). Also, the hexane extract presented antiproliferative activity toward NIH 3T3 cells GI(50) (183.65 mg/mL), TGI (280.54 mg/mL), and IL(50) (384.59 mg/mL). The results indicate that the flower hexane extract obtained from M. spicata associated with M. rotundifolia presents an antineoplastic activity against KB and MCF-7, although an antiproliferative effect at a high concentration of the extract was observed toward NIH 3T3.

  2. Meta sequence analysis of human blood peptides and their parent proteins.

    PubMed

    Bowden, Peter; Pendrak, Voitek; Zhu, Peihong; Marshall, John G

    2010-04-18

    Sequence analysis of the blood peptides and their qualities will be key to understanding the mechanisms that contribute to error in LC-ESI-MS/MS. Analysis of peptides and their proteins at the level of sequences is much more direct and informative than the comparison of disparate accession numbers. A portable database of all blood peptide and protein sequences with descriptor fields and gene ontology terms might be useful for designing immunological or MRM assays from human blood. The results of twelve studies of human blood peptides and/or proteins identified by LC-MS/MS and correlated against a disparate array of genetic libraries were parsed and matched to proteins from the human ENSEMBL, SwissProt and RefSeq databases by SQL. The reported peptide and protein sequences were organized into an SQL database with full protein sequences and up to five unique peptides in order of prevalence along with the peptide count for each protein. Structured query language or BLAST was used to acquire descriptive information in current databases. Sampling error at the level of peptides is the largest source of disparity between groups. Chi Square analysis of peptide to protein distributions confirmed the significant agreement between groups on identified proteins. Copyright 2010. Published by Elsevier B.V.

  3. High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

    ScienceCinema

    Athavale, Ajay

    2018-01-04

    Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.

  4. The European Classical Swine Fever Virus Database: Blueprint for a Pathogen-Specific Sequence Database with Integrated Sequence Analysis Tools

    PubMed Central

    Postel, Alexander; Schmeiser, Stefanie; Zimmermann, Bernd; Becher, Paul

    2016-01-01

    Molecular epidemiology has become an indispensable tool in the diagnosis of diseases and in tracing the infection routes of pathogens. Due to advances in conventional sequencing and the development of high throughput technologies, the field of sequence determination is in the process of being revolutionized. Platforms for sharing sequence information and providing standardized tools for phylogenetic analyses are becoming increasingly important. The database (DB) of the European Union (EU) and World Organisation for Animal Health (OIE) Reference Laboratory for classical swine fever offers one of the world’s largest semi-public virus-specific sequence collections combined with a module for phylogenetic analysis. The classical swine fever (CSF) DB (CSF-DB) became a valuable tool for supporting diagnosis and epidemiological investigations of this highly contagious disease in pigs with high socio-economic impacts worldwide. The DB has been re-designed and now allows for the storage and analysis of traditionally used, well established genomic regions and of larger genomic regions including complete viral genomes. We present an application example for the analysis of highly similar viral sequences obtained in an endemic disease situation and introduce the new geographic “CSF Maps” tool. The concept of this standardized and easy-to-use DB with an integrated genetic typing module is suited to serve as a blueprint for similar platforms for other human or animal viruses. PMID:27827988

  5. Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer

    DTIC Science & Technology

    2016-09-01

    AWARD NUMBER: W81XWH-14-1-0080 TITLE: Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer. PRINCIPAL INVESTIGATOR...PREPARED FOR: U.S. Army Medical Research and Materiel Command Fort Detrick, Maryland 21702-5012 DISTRIBUTION STATEMENT: Approved for Public Release...SUBTITLE Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer. 5a. CONTRACT NUMBER 5b. GRANT NUMBER W81XWH-14-1-0080 GRANT11489

  6. Cloning and sequencing of the cDNA species for mammalian dimeric dihydrodiol dehydrogenases.

    PubMed Central

    Arimitsu, E; Aoki, S; Ishikura, S; Nakanishi, K; Matsuura, K; Hara, A

    1999-01-01

    Cynomolgus and Japanese monkey kidneys, dog and pig livers and rabbit lens contain dimeric dihydrodiol dehydrogenase (EC 1.3.1.20) associated with high carbonyl reductase activity. Here we have isolated cDNA species for the dimeric enzymes by reverse transcriptase-PCR from human intestine in addition to the above five animal tissues. The amino acid sequences deduced from the monkey, pig and dog cDNA species perfectly matched the partial sequences of peptides digested from the respective enzymes of these animal tissues, and active recombinant proteins were expressed in a bacterial system from the monkey and human cDNA species. Northern blot analysis revealed the existence of a single 1.3 kb mRNA species for the enzyme in these animal tissues. The human enzyme shared 94%, 85%, 84% and 82% amino acid identity with the enzymes of the two monkey strains (their sequences were identical), the dog, the pig and the rabbit respectively. The sequences of the primate enzymes consisted of 335 amino acid residues and lacked one amino acid compared with the other animal enzymes. In contrast with previous reports that other types of dihydrodiol dehydrogenase, carbonyl reductases and enzymes with either activity belong to the aldo-keto reductase family or the short-chain dehydrogenase/reductase family, dimeric dihydrodiol dehydrogenase showed no sequence similarity with the members of the two protein families. The dimeric enzyme aligned with low degrees of identity (14-25%) with several prokaryotic proteins, in which 47 residues are strictly or highly conserved. Thus dimeric dihydrodiol dehydrogenase has a primary structure distinct from the previously known mammalian enzymes and is suggested to constitute a novel protein family with the prokaryotic proteins. PMID:10477285

  7. DELIMINATE--a fast and efficient method for loss-less compression of genomic sequences: sequence analysis.

    PubMed

    Mohammed, Monzoorul Haque; Dutta, Anirban; Bose, Tungadri; Chadaram, Sudha; Mande, Sharmila S

    2012-10-01

    An unprecedented quantity of genome sequence data is currently being generated using next-generation sequencing platforms. This has necessitated the development of novel bioinformatics approaches and algorithms that not only facilitate a meaningful analysis of these data but also aid in efficient compression, storage, retrieval and transmission of huge volumes of the generated data. We present a novel compression algorithm (DELIMINATE) that can rapidly compress genomic sequence data in a loss-less fashion. Validation results indicate relatively higher compression efficiency of DELIMINATE when compared with popular general purpose compression algorithms, namely, gzip, bzip2 and lzma. Linux, Windows and Mac implementations (both 32 and 64-bit) of DELIMINATE are freely available for download at: http://metagenomics.atc.tcs.com/compression/DELIMINATE. Contact: sharmila@atc.tcs.com. Supplementary data are available at Bioinformatics online.
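    For context on the baselines named in this abstract, the sketch below measures how far the general-purpose compressors in the Python standard library shrink a nucleotide string. It does not implement DELIMINATE. The test sequence is random and therefore an assumption: real genomes contain repeats and usually compress differently.

```python
import bz2
import gzip
import lzma
import random


def compression_ratios(data):
    """Return original_size / compressed_size for each general-purpose codec."""
    codecs = (("gzip", gzip.compress), ("bzip2", bz2.compress), ("lzma", lzma.compress))
    return {name: len(data) / len(compress(data)) for name, compress in codecs}


# Synthetic 100 kb sequence over A, C, G, T (seeded for repeatability).
rng = random.Random(0)
sequence = "".join(rng.choice("ACGT") for _ in range(100_000)).encode("ascii")
ratios = compression_ratios(sequence)
```

    A four-letter alphabet carries at most 2 bits per base, so a ratio near 4 against one byte per base is the ceiling for a sequence without exploitable structure.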

  8. Comparative and Evolutionary Analyses of Meloidogyne spp. Based on Mitochondrial Genome Sequences

    PubMed Central

    García, Laura Evangelina; Sánchez-Puerta, M. Virginia

    2015-01-01

    Molecular taxonomy and evolution of nematodes have been recently the focus of several studies. Mitochondrial sequences were proposed as an alternative for precise identification of Meloidogyne species, to study intraspecific variability and to follow maternal lineages. We characterized the mitochondrial genomes (mtDNAs) of the root knot nematodes M. floridensis, M. hapla and M. incognita. These were AT rich (81–83%) and highly compact, encoding 12 proteins, 2 rRNAs, and 22 tRNAs. Comparisons with published mtDNAs of M. chitwoodi, M. incognita (another strain) and M. graminicola revealed that they share protein and rRNA gene order but differ in the order of tRNAs. The mtDNAs of M. floridensis and M. incognita were strikingly similar (97–100% identity for all coding regions). In contrast, M. floridensis, M. chitwoodi, M. hapla and M. graminicola showed 65–84% nucleotide identity for coding regions. Variable mitochondrial sequences are potentially useful for evolutionary and taxonomic studies. We developed a molecular taxonomic marker by sequencing a highly-variable ~2 kb mitochondrial region, nad5-cox1, from 36 populations of root-knot nematodes to elucidate relationships within the genus Meloidogyne. Isolates of five species formed monophyletic groups and showed little intraspecific variability. We also present a thorough analysis of the mitochondrial region cox2-rrnS. Phylogenies based on either mitochondrial region had good discrimination power but could not discriminate between M. arenaria, M. incognita and M. floridensis. PMID:25799071

  9. Optical mapping and sequencing of the Escherichia coli KO11 genome reveal extensive chromosomal rearrangements, and multiple tandem copies of the Zymomonas mobilis pdc and adhB genes.

    PubMed

    Turner, Peter C; Yomano, Lorraine P; Jarboe, Laura R; York, Sean W; Baggett, Christy L; Moritz, Brélan E; Zentz, Emily B; Shanmugam, K T; Ingram, Lonnie O

    2012-04-01

    Escherichia coli KO11 (ATCC 55124) was engineered in 1990 to produce ethanol by chromosomal insertion of the Zymomonas mobilis pdc and adhB genes into E. coli W (ATCC 9637). KO11FL, our current laboratory version of KO11, and its parent E. coli W were sequenced, and contigs assembled into genomic sequences using optical NcoI restriction maps as templates. E. coli W contained plasmids pRK1 (102.5 kb) and pRK2 (5.4 kb), but KO11FL only contained pRK2. KO11FL optical maps made with AflII and with BamHI showed a tandem repeat region, consisting of at least 20 copies of a 10-kb unit. The repeat region was located at the insertion site for the pdc, adhB, and chloramphenicol-resistance genes. Sequence coverage of these genes was about 25-fold higher than average, consistent with amplification of the foreign genes that were inserted as circularized DNA. Selection for higher levels of chloramphenicol resistance originally produced strains with higher pdc and adhB expression, and hence improved fermentation performance, by increasing the gene copy number. Sequence data for an earlier version of KO11, ATCC 55124, indicated that multiple copies of pdc adhB were present. Comparison of the W and KO11FL genomes showed large inversions and deletions in KO11FL, mostly enabled by IS10, which is absent from W but present at 30 sites in KO11FL. The early KO11 strain ATCC 55124 had no rearrangements, contained only one IS10, and lacked most accumulated single nucleotide polymorphisms (SNPs) present in KO11FL. Despite rearrangements and SNPs in KO11FL, fermentation performance was equal to that of ATCC 55124.

  10. Differentiation of mycoplasmalike organisms (MLOs) in European fruit trees by PCR using specific primers derived from the sequence of a chromosomal fragment of the apple proliferation MLO.

    PubMed Central

    Jarausch, W; Saillard, C; Dosba, F; Bové, J M

    1994-01-01

    A 1.8-kb chromosomal DNA fragment of the mycoplasmalike organism (MLO) associated with apple proliferation was sequenced. Three putative open reading frames were observed on this fragment. The protein encoded by open reading frame 2 shows significant homologies with bacterial nitroreductases. From the nucleotide sequence four primer pairs for PCR were chosen to specifically amplify DNA from MLOs associated with European diseases of fruit trees. Primer pairs specific for (i) Malus-affecting MLOs, (ii) Malus- and Prunus-affecting MLOs, and (iii) Malus-, Prunus-, and Pyrus-affecting MLOs were obtained. Restriction enzyme analysis of the amplification products revealed restriction fragment length polymorphisms between Malus-, Prunus-, and Pyrus-affecting MLOs as well as between different isolates of the apple proliferation MLO. No amplification with either primer pair could be obtained with DNA from 12 different MLOs experimentally maintained in periwinkle. PMID:7916180

  11. A novel bioinformatics method for efficient knowledge discovery by BLSOM from big genomic sequence data.

    PubMed

    Bai, Yu; Iwasaki, Yuki; Kanaya, Shigehiko; Zhao, Yue; Ikemura, Toshimichi

    2014-01-01

    With the remarkable increase in genomic sequence data from a wide range of species, novel tools are needed for comprehensive analyses of big sequence data. The Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data such as oligonucleotide composition on one map. By modifying the conventional SOM, we have previously developed the Batch-Learning SOM (BLSOM), which allows classification of sequence fragments according to species, depending solely on the oligonucleotide composition. In the present study, we introduce the oligonucleotide BLSOM used for characterization of vertebrate genome sequences. We first analyzed pentanucleotide compositions in 100 kb sequences derived from a wide range of vertebrate genomes and then the compositions in the human and mouse genomes in order to investigate an efficient method for detecting differences between closely related genomes. BLSOM can recognize the species-specific key combination of oligonucleotide frequencies in each genome, which is called a "genome signature," and the regions specifically enriched in transcription-factor-binding sequences. Because the classification and visualization power is very high, BLSOM is an efficient and powerful tool for extracting a wide range of information from massive amounts of genomic sequences (i.e., big sequence data).
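    The input to such a map is one composition vector per sequence window. The Python sketch below computes pentanucleotide frequencies for non-overlapping windows. It covers only this feature-extraction step, not the BLSOM training itself, and the function names, the handling of ambiguous bases and the default window size are assumptions.

```python
from itertools import product

K = 5  # pentanucleotides
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def kmer_frequencies(seq):
    """Return the 1024-element pentanucleotide frequency vector of seq."""
    counts = [0] * len(KMERS)
    seq = seq.upper()
    for i in range(len(seq) - K + 1):
        idx = INDEX.get(seq[i:i + K])
        if idx is not None:  # skip k-mers containing N or other symbols
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def window_vectors(genome, window=100_000):
    """One frequency vector per non-overlapping window of the given length."""
    return [
        kmer_frequencies(genome[start:start + window])
        for start in range(0, len(genome) - window + 1, window)
    ]


vec = kmer_frequencies("ACGTACGTAC")
```

    Each vector has 4^5 = 1024 entries, which is the high-dimensional input the abstract refers to.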

  12. Evaluation of Linkage Disequilibrium Pattern and Association Study on Seed Oil Content in Brassica napus Using ddRAD Sequencing.

    PubMed

    Wu, Zhikun; Wang, Bo; Chen, Xun; Wu, Jiangsheng; King, Graham J; Xiao, Yingjie; Liu, Kede

    2016-01-01

    High-density genetic markers are the prerequisite for understanding linkage disequilibrium (LD) and genome-wide association studies (GWASs) of complex traits in crops. To evaluate the LD pattern in oilseed rape, we sequenced a previous association panel containing 189 B. napus inbred lines using double-digested restriction-site associated DNA (ddRAD) and genotyped 19,327 RAD tags. A total of 15,921 RAD tags were assigned to a published genetic linkage map and the majority (71.1%) of these tags was uniquely mapped to the draft reference genome "Darmor-bzh." The distance of LD decay was 1,214 kb across the genome at the background level (r2 = 0.26), with the distances of LD decay being 405 kb and 2,111 kb in the A and C subgenomes, respectively. A total of 361 haplotype blocks with length > 100 kb were identified in the entire genome. The association panel could be classified into two groups, P1 and P2, which are essentially consistent with the geographical origins of varieties. A large number of group-specific haplotypes were identified, reflecting that varieties in the P1 and P2 groups experienced distinct selection in breeding programs to adapt their different growth habitats. GWAS repeatedly detected two loci significantly associated with oil content of seeds based on the developed SNPs, suggesting that the high-density SNPs were useful for understanding the genetic determinants of complex traits in GWAS.
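    The r2 statistic quoted for LD decay can be made concrete with a short Python sketch. It assumes biallelic markers coded 0/1 in fully homozygous (inbred) lines, so that each line contributes one haplotype. The function name and the toy genotype vectors are illustrative and not taken from the study.

```python
def ld_r2(a, b):
    """Squared correlation r^2 between two 0/1-coded loci.

    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), with D = pAB - pA * pB.
    Returns None when either locus is monomorphic (r^2 is undefined).
    """
    n = len(a)
    p_a = sum(a) / n
    p_b = sum(b) / n
    p_ab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    return d * d / denom


# Toy data: four lines, two markers.
complete_ld = ld_r2([0, 0, 1, 1], [0, 0, 1, 1])
no_ld = ld_r2([0, 1, 0, 1], [0, 0, 1, 1])
```

    An LD decay distance is then typically read off as the physical distance at which the average pairwise r2 falls to a chosen background threshold.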

  13. Genome sequence and analysis of Lactobacillus helveticus

    PubMed Central

    Cremonesi, Paola; Chessa, Stefania; Castiglioni, Bianca

    2013-01-01

    The microbiological characterization of lactobacilli is historically well developed, but the genomic analysis is recent. Because of the widespread use of Lactobacillus helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genome of five L. helveticus strains was sequenced to completion and compared with other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its characteristics of rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continues to explode, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones. PMID:23335916

  14. Library preparation and data analysis packages for rapid genome sequencing.

    PubMed

    Pomraning, Kyle R; Smith, Kristina M; Bredeweg, Erin L; Connolly, Lanelle R; Phatale, Pallavi A; Freitag, Michael

    2012-01-01

    High-throughput sequencing (HTS) has quickly become a valuable tool for comparative genetics and genomics and is now regularly carried out in laboratories that are not connected to large sequencing centers. Here we describe an updated version of our protocol for constructing single- and paired-end Illumina sequencing libraries, beginning with purified genomic DNA. The present protocol can also be used for "multiplexing," i.e. the analysis of several samples in a single flowcell lane by generating "barcoded" or "indexed" Illumina sequencing libraries in a way that is independent from Illumina-supported methods. To analyze sequencing results, we suggest several independent approaches but end users should be aware that this is a quickly evolving field and that currently many alignment (or "mapping") and counting algorithms are being developed and tested.

  15. Sequence Diversity Diagram for comparative analysis of multiple sequence alignments.

    PubMed

    Sakai, Ryo; Aerts, Jan

    2014-01-01

    The sequence logo is a graphical representation of a set of aligned sequences, commonly used to depict conservation of amino acid or nucleotide sequences. Although it effectively communicates the amount of information present at every position, this visual representation falls short when the domain task is to compare between two or more sets of aligned sequences. We present a new visual presentation called a Sequence Diversity Diagram and validate our design choices with a case study. Our software was developed using the open-source program called Processing. It loads multiple sequence alignment FASTA files and a configuration file, which can be modified as needed to change the visualization. The redesigned figure improves on the visual comparison of two or more sets, and it additionally encodes information on sequential position conservation. In our case study of the adenylate kinase lid domain, the Sequence Diversity Diagram reveals unexpected patterns and new insights, for example the identification of subgroups within the protein subfamily. Our future work will integrate this visual encoding into interactive visualization tools to support higher level data exploration tasks.

  16. A basic analysis toolkit for biological sequences

    PubMed Central

    Giancarlo, Raffaele; Siragusa, Alessandro; Siragusa, Enrico; Utro, Filippo

    2007-01-01

    This paper presents a software library, nicknamed BATS, for some basic sequence analysis tasks. Namely, local alignments, via approximate string matching, and global alignments, via longest common subsequence and alignments with affine and concave gap cost functions. Moreover, it also supports filtering operations to select strings from a set and establish their statistical significance, via z-score computation. None of the algorithms is new, but although they are generally regarded as fundamental for sequence analysis, they have not been implemented in a single and consistent software package, as we do here. Therefore, our main contribution is to fill this gap between algorithmic theory and practice by providing an extensible and easy to use software library that includes algorithms for the mentioned string matching and alignment problems. The library consists of C/C++ library functions as well as Perl library functions. It can be interfaced with Bioperl and can also be used as a stand-alone system with a GUI. The software is available at under the GNU GPL. PMID:17877802
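    Two of the tasks listed in this abstract, longest common subsequence and z-score significance, can be sketched in a few lines of Python. This is not the BATS API. The function names are hypothetical, and the null model (shuffling the second string) is an assumption chosen for illustration.

```python
import random
import statistics


def lcs_length(a, b):
    """Length of the longest common subsequence, by row-wise dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, bj in enumerate(b, 1):
            if ch == bj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_z_score(a, b, shuffles=200, seed=0):
    """z-score of the observed LCS length against shuffled versions of b.

    Returns None if the shuffled scores have zero spread.
    """
    rng = random.Random(seed)
    observed = lcs_length(a, b)
    letters = list(b)
    null = []
    for _ in range(shuffles):
        rng.shuffle(letters)
        null.append(lcs_length(a, "".join(letters)))
    sd = statistics.pstdev(null)
    if sd == 0:
        return None
    return (observed - statistics.mean(null)) / sd


z = lcs_z_score("ACGTACGTACGT", "ACGTACGTACGT", shuffles=50)
```

    Keeping only two rows of the table bounds memory by the length of the second string, while the running time remains proportional to the product of the two lengths.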

  17. Discovery of Pod Shatter-Resistant Associated SNPs by Deep Sequencing of a Representative Library Followed by Bulk Segregant Analysis in Rapeseed

    PubMed Central

    Huang, Shunmou; Yang, Hongli; Zhan, Gaomiao; Wang, Xinfa; Liu, Guihua; Wang, Hanzhong

    2012-01-01

    Background: Single nucleotide polymorphisms (SNPs) are an important class of genetic marker for target gene mapping. As of yet, there is no rapid and effective method to identify SNPs linked with agronomic traits in rapeseed and other crop species. Methodology/Principal Findings: We demonstrate a novel method for identifying SNP markers in rapeseed by deep sequencing a representative library and performing bulk segregant analysis. With this method, SNPs associated with rapeseed pod shatter-resistance were discovered. Firstly, a reduced representation of the rapeseed genome was used. Genomic fragments ranging from 450–550 bp were prepared from the susceptible bulk (ten F2 plants with the silique shattering resistance index, SSRI <0.10) and the resistance bulk (ten F2 plants with SSRI >0.90), and Solexa sequencing produced 90 bp reads. Approximately 50 million of these sequence reads were assembled into contigs to a depth of 20-fold coverage. Secondly, 60,396 ‘simple SNPs’ were identified, and the statistical significance was evaluated using Fisher's exact test. The 70 associated SNPs with a –log10 p value over 16 were selected for further analysis. The distribution of these SNPs showed a tight cluster, which consisted of 14 associated SNPs within a 396 kb region on chromosome A09. Our evidence indicates that this region contains a major quantitative trait locus (QTL). Finally, two associated SNPs from this region were mapped on a major QTL region. Conclusions/Significance: 70 associated SNPs were discovered and a major QTL for rapeseed pod shatter-resistance was found on chromosome A09 using our novel method. The associated SNP markers were used for mapping of the QTL, and may be useful for improving pod shatter-resistance in rapeseed through marker-assisted selection and map-based cloning. This approach will accelerate the discovery of major QTLs and the cloning of functional genes for important agronomic traits in rapeseed and other crop
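    The significance test named in this abstract can be sketched for a single SNP as follows. The 2×2 layout (reads carrying each allele in each bulk) and all counts are assumptions made for illustration, and the code is a plain two-sided Fisher's exact test in Python rather than the authors' pipeline.

```python
from math import comb, log10


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(p, 1.0)


# Hypothetical read counts: rows = resistant / susceptible bulk,
# columns = reads with allele 1 / allele 2.
p_value = fisher_exact_two_sided(10, 0, 0, 10)
score = -log10(p_value)
```

    A threshold on the –log10 p value, as used in the abstract, is then applied per SNP. With only 20 reads in total the score above stays near 5, so a cut-off of 16 implies much deeper coverage per site.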

  18. Use of deep whole-genome sequencing data to identify structure risk variants in breast cancer susceptibility genes.

    PubMed

    Guo, Xingyi; Shi, Jiajun; Cai, Qiuyin; Shu, Xiao-Ou; He, Jing; Wen, Wanqing; Allen, Jamie; Pharoah, Paul; Dunning, Alison; Hunter, David J; Kraft, Peter; Easton, Douglas F; Zheng, Wei; Long, Jirong

    2018-03-01

    Functional disruptions of susceptibility genes by large genomic structure variant (SV) deletions in germlines are known to be associated with cancer risk. However, few studies have been conducted to systematically search for SV deletions in breast cancer susceptibility genes. We analysed deep (> 30x) whole-genome sequencing (WGS) data generated in blood samples from 128 breast cancer patients of Asian and European descent with either a strong family history of breast cancer or early cancer onset disease. To identify SV deletions in known or suspected breast cancer susceptibility genes, we used multiple SV calling tools including Genome STRiP, Delly, Manta, BreakDancer and Pindel. SV deletions were detected by at least three of these bioinformatics tools in five genes. Specifically, we identified heterozygous deletions covering a fraction of the coding regions of BRCA1 (with approximately 80kb in two patients), and TP53 genes (with ∼1.6 kb in two patients), and of intronic regions (∼1 kb) of the PALB2 (one patient), PTEN (three patients) and RAD51C genes (one patient). We confirmed the presence of these deletions using real-time quantitative PCR (qPCR). Our study identified novel SV deletions in breast cancer susceptibility genes and the identification of such SV deletions may improve clinical testing.
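    The "detected by at least three tools" criterion can be sketched as a consensus filter in Python. The abstract does not say how calls from different tools were matched, so the 50% reciprocal-overlap rule below is an assumption, as are the function names and the invented coordinates.

```python
def reciprocal_overlap(a, b, min_frac=0.5):
    """True if two (chrom, start, end) calls cover min_frac of each other."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return overlap >= min_frac * (a[2] - a[1]) and overlap >= min_frac * (b[2] - b[1])


def supported_deletions(calls_by_tool, min_tools=3):
    """Keep one representative call per deletion backed by >= min_tools tools."""
    supported = []
    for calls in calls_by_tool.values():
        for call in calls:
            backers = {
                tool
                for tool, other_calls in calls_by_tool.items()
                if any(reciprocal_overlap(call, other) for other in other_calls)
            }
            already_kept = any(reciprocal_overlap(call, kept) for kept in supported)
            if len(backers) >= min_tools and not already_kept:
                supported.append(call)
    return supported


# Tool names are used only as labels; the coordinates are invented.
calls = {
    "Delly": [("chr17", 1000, 81000)],
    "Manta": [("chr17", 1200, 80500)],
    "Pindel": [("chr17", 900, 82000)],
    "BreakDancer": [("chr13", 500, 1500)],
    "GenomeSTRiP": [],
}
consensus = supported_deletions(calls)
```

    Requiring agreement between several callers trades sensitivity for specificity, which suits a setting where each retained deletion is confirmed experimentally afterwards.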

  19. SAFOD Brittle Microstructure and Mechanics Knowledge Base (BM2KB)

    NASA Astrophysics Data System (ADS)

    Babaie, Hassan A.; Broda Cindi, M.; Hadizadeh, Jafar; Kumar, Anuj

    2013-07-01

    Scientific drilling near Parkfield, California has established the San Andreas Fault Observatory at Depth (SAFOD), which provides the solid earth community with short range geophysical and fault zone material data. The BM2KB ontology was developed in order to formalize the knowledge about brittle microstructures in the fault rocks sampled from the SAFOD cores. A knowledge base, instantiated from this domain ontology, stores and presents the observed microstructural and analytical data with respect to implications for brittle deformation and mechanics of faulting. These data can be searched on the knowledge base's Web interface by selecting a set of terms (classes, properties) from different drop-down lists that are dynamically populated from the ontology. In addition to this general search, a query can also be conducted to view data contributed by a specific investigator. A search by sample is done using the EarthScope SAFOD Core Viewer that allows a user to locate samples on high resolution images of core sections belonging to different runs and holes. The class hierarchy of the BM2KB ontology was initially designed using the Unified Modeling Language (UML), which was used as a visual guide to develop the ontology in OWL applying the Protégé ontology editor. Various Semantic Web technologies such as the RDF, RDFS, and OWL ontology languages, SPARQL query language, and Pellet reasoning engine, were used to develop the ontology. An interactive Web application interface was developed through Jena, a Java-based framework, with AJAX technology, JSP pages, and Java servlets, and deployed via an Apache Tomcat server. The interface allows the registered user to submit data related to their research on a sample of the SAFOD core. The submitted data, after initial review by the knowledge base administrator, are added to the extensible knowledge base and become available in subsequent queries to all types of users. The interface facilitates inference capabilities in the

  20. Sequence analysis of PROTEOLYSIS 6 from Solanum lycopersicum

    NASA Astrophysics Data System (ADS)

    Roslan, Nur Farhana; Chew, Bee Lyn; Goh, Hoe-Han; Isa, Nurulhikma Md

    2018-04-01

    The N-end rule pathway is a protein degradation pathway that relates the half-life of a protein to the identity of its N-terminal residues. Destabilizing N-terminal residues are created by enzymatic reactions or chemical modifications. The destabilized substrate is recognized by the PROTEOLYSIS 6 (PRT6) protein, an E3 ligase, which results in degradation of the substrate by the proteasome. PRT6 has been studied in Arabidopsis thaliana and barley but not yet in fleshy fruit plants. Hence, this study was carried out in tomato, which is known as the model for fleshy fruit plants. BLASTX analysis identified Solyc09g010830 as the gene encoding PRT6 in tomato, based on its sequence similarity with PRT6 in A. thaliana. In silico gene expression analysis shows that the PRT6 gene is highly expressed in tomato fruits at the breaker +5 stage. Co-expression analysis shows that PRT6 may be involved not only in abiotic stresses but also in biotic stresses. The objective is to analyze the sequence and characterize the PRT6 gene in tomato.

  1. Genome sequence of the olive tree, Olea europaea.

    PubMed

    Cruz, Fernando; Julca, Irene; Gómez-Garrido, Jèssica; Loska, Damian; Marcet-Houben, Marina; Cano, Emilio; Galán, Beatriz; Frias, Leonor; Ribeca, Paolo; Derdak, Sophia; Gut, Marta; Sánchez-Fernández, Manuel; García, Jose Luis; Gut, Ivo G; Vargas, Pablo; Alioto, Tyler S; Gabaldón, Toni

    2016-06-27

    The Mediterranean olive tree (Olea europaea subsp. europaea) was one of the first trees to be domesticated and is currently of major agricultural importance in the Mediterranean region as the source of olive oil. The molecular bases underlying the phenotypic differences among domesticated cultivars, or between domesticated olive trees and their wild relatives, remain poorly understood. Both wild and cultivated olive trees have 46 chromosomes (2n). A total of 543 Gb of raw DNA sequence from whole genome shotgun sequencing, and a fosmid library containing 155,000 clones from a 1,000+ year-old olive tree (cv. Farga) were generated by Illumina sequencing using different combinations of mate-pair and paired-end libraries. Assembly gave a final genome with a scaffold N50 of 443 kb, and a total length of 1.31 Gb, which represents 95 % of the estimated genome length (1.38 Gb). In addition, the associated fungus Aureobasidium pullulans was partially sequenced. Genome annotation, assisted by RNA sequencing from leaf, root, and fruit tissues at various stages, resulted in 56,349 unique protein coding genes, suggesting recent genomic expansion. Genome completeness, as estimated using the CEGMA pipeline, reached 98.79 %. The assembled draft genome of O. europaea will provide a valuable resource for the study of the evolution and domestication processes of this important tree, and allow determination of the genetic bases of key phenotypic traits. Moreover, it will enhance breeding programs and the formation of new varieties.
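
    The scaffold N50 and the assembled fraction quoted in this record can be illustrated with a short Python sketch. It assumes the common definition of N50 (the length of the scaffold at which the cumulative length, counted from the longest scaffold down, first reaches half of the assembly). The function name n50 and the toy scaffold lengths are hypothetical and are not taken from the olive assembly; only the 1.31 Gb and 1.38 Gb figures come from the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Hypothetical scaffold lengths in kb, for illustration only.
toy_scaffolds = [800, 600, 443, 300, 200, 100]
print("toy N50 (kb):", n50(toy_scaffolds))

# Assembled fraction from the figures in the abstract: 1.31 Gb of 1.38 Gb.
assembled_percent = round(1.31 / 1.38 * 100)
print("assembled fraction (%):", assembled_percent)
```

    Half of the toy assembly (2,443 kb in total) is reached within the second-longest scaffold, so the sketch reports 600 kb; the second calculation reproduces the 95 % figure given in the abstract.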

  2. Evolution Analysis of Simple Sequence Repeats in Plant Genome.

    PubMed

    Qin, Zhen; Wang, Yanping; Wang, Qingmei; Li, Aixian; Hou, Fuyun; Zhang, Liming

    2015-01-01

    Simple sequence repeats (SSRs) are widespread units on genome sequences, and play many important roles in plants. In order to reveal the evolution of plant genomes, we investigated the evolutionary regularities of SSRs during the evolution of plant species and the plant kingdom by analysis of twelve sequenced plant genome sequences. First, in the twelve studied plant genomes, the main SSRs were those which contain repeats of combinations of 1-3 nucleotides. Second, in mononucleotide SSRs, the A/T percentage gradually increased along with the evolution of plants (except for P. patens). With the increase of SSR repeat number the percentage of A/T in C. reinhardtii had no significant change, while the percentage of A/T in terrestrial plant species gradually declined. Third, in dinucleotide SSRs, the percentage of AT/TA increased along with the evolution of the plant kingdom and the repeat number increased in terrestrial plant species. This trend was more obvious in dicotyledons than in monocotyledons. The percentage of CG/GC showed the opposite pattern to the AT/TA. Fourth, in trinucleotide SSRs, the percentages of combinations including two or three A/T were in a rising trend along with the evolution of the plant kingdom; meanwhile with the increase of SSR repeat number in plant species, different species chose different combinations as dominant SSRs. SSRs in C. reinhardtii, P. patens, Z. mays and A. thaliana showed their specific patterns related to evolutionary position or specific changes of genome sequences. The results showed that SSRs not only had the general pattern in the evolution of the plant kingdom, but also were associated with the evolution of the specific genome sequence. The study of the evolutionary regularities of SSRs provided new insights for the analysis of plant genome evolution.
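
    As a rough illustration of how SSRs built from 1-3 nucleotide units can be located and classified by unit length, the following Python sketch scans a sequence for perfect tandem repeats with a regular expression. It is not the authors' pipeline: the function name find_ssrs, the minimum of five repeat copies, and the toy sequence are all assumptions made for this example.

```python
import re


def find_ssrs(seq, max_unit=3, min_repeats=5):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    max_unit and min_repeats are illustrative thresholds, not values
    taken from the study summarized above.
    """
    seq = seq.upper()
    # The lazy group picks the shortest repeat unit; the backreference
    # then requires at least (min_repeats - 1) further copies of it.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Toy sequence holding one mono-, one di- and one trinucleotide SSR.
toy = "GG" + "A" * 6 + "C" + "AT" * 5 + "T" + "CAG" * 5 + "T"
for start, unit, copies in find_ssrs(toy):
    print(start, unit, copies)
```

    Grouping the hits by len(unit) and by base composition of the unit would give per-class percentages of the kind compared across species in the abstract.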

  3. A functional U-statistic method for association analysis of sequencing data.

    PubMed

    Jadhav, Sneha; Tong, Xiaoran; Lu, Qing

    2017-11-01

    Although sequencing studies hold great promise for uncovering novel variants predisposing to human diseases, the high dimensionality of the sequencing data brings tremendous challenges to data analysis. Moreover, for many complex diseases (e.g., psychiatric disorders) multiple related phenotypes are collected. These phenotypes can be different measurements of an underlying disease, or measurements characterizing multiple related diseases for studying common genetic mechanisms. Although jointly analyzing these phenotypes could potentially increase the power of identifying disease-associated genes, the different types of phenotypes pose challenges for association analysis. To address these challenges, we propose a nonparametric method, the functional U-statistic method (FU), for multivariate analysis of sequencing data. It first constructs smooth functions from individuals' sequencing data, and then tests the association of these functions with multiple phenotypes by using a U-statistic. The method provides a general framework for analyzing various types of phenotypes (e.g., binary and continuous phenotypes) with unknown distributions. Fitting the genetic variants within a gene using a smoothing function also allows us to capture complexities of gene structure (e.g., linkage disequilibrium, LD), which could potentially increase the power of association analysis. Through simulations, we compared our method to the multivariate outcome score test (MOST), and found that our test attained better performance than MOST. In a real data application, we applied our method to the sequencing data from the Minnesota Twin Study (MTS) and found potential associations of several nicotine receptor subunit (CHRN) genes, including CHRNB3, with nicotine dependence and/or alcohol dependence. © 2017 WILEY PERIODICALS, INC.

  4. CLCNKB mutations causing mild Bartter syndrome profoundly alter the pH and Ca2+ dependence of ClC-Kb channels.

    PubMed

    Andrini, Olga; Keck, Mathilde; L'Hoste, Sébastien; Briones, Rodolfo; Mansour-Hendili, Lamisse; Grand, Teddy; Sepúlveda, Francisco V; Blanchard, Anne; Lourdel, Stéphane; Vargas-Poussou, Rosa; Teulon, Jacques

    2014-09-01

    ClC-Kb, a member of the ClC family of Cl(-) channels/transporters, plays a major role in the absorption of NaCl in the distal nephron. CLCNKB mutations cause Bartter syndrome type 3, a hereditary renal salt-wasting tubulopathy. Here, we investigate the functional consequences of a Val to Met substitution at position 170 (V170M, α helix F), which was detected in eight patients displaying a mild phenotype. Conductance and surface expression were reduced by ~40-50 %. The regulation of channel activity by external H(+) and Ca(2+) is a characteristic property of ClC-Kb. Inhibition by external H(+) was dramatically altered, with pKH shifting from 7.6 to 6.0. Stimulation by external Ca(2+) on the other hand was no longer detectable at pH 7.4, but was still present at acidic pH values. Functionally, these regulatory modifications partly counterbalance the reduced surface expression by rendering V170M hyperactive. Pathogenic Met170 seems to interact with another methionine on α helix H (Met227) since diverse mutations at this site partly removed pH sensitivity alterations of V170M ClC-Kb. Exploring other disease-associated mutations, we found that a Pro to Leu substitution at position 124 (α helix D, Simon et al., Nat Genet 1997, 17:171-178) had functional consequences similar to those of V170M. In conclusion, we report here for the first time that ClC-Kb disease-causing mutations located around the selectivity filter can result in both reduced surface expression and hyperactivity in heterologous expression systems. This interplay must be considered when analyzing the mild phenotype of patients with type 3 Bartter syndrome.

  5. Genome-wide gene–gene interaction analysis for next-generation sequencing

    PubMed Central

    Zhao, Jinying; Zhu, Yun; Xiong, Momiao

    2016-01-01

    The critical barrier in interaction analysis for next-generation sequencing (NGS) data is that the traditional pairwise interaction analysis that is suitable for common variants is difficult to apply to rare variants because of their prohibitive computational time, large number of tests and low power. The great challenges for successful detection of interactions with NGS data are (1) the demands in the paradigm of changes in interaction analysis; (2) severe multiple testing; and (3) heavy computations. To meet these challenges, we shift the paradigm of interaction analysis between two SNPs to interaction analysis between two genomic regions. In other words, we take a gene as a unit of analysis and use functional data analysis techniques as dimensional reduction tools to develop a novel statistic to collectively test interaction between all possible pairs of SNPs within two genome regions. By intensive simulations, we demonstrate that the functional logistic regression for interaction analysis has the correct type 1 error rates and higher power to detect interaction than the currently used methods. The proposed method was applied to a coronary artery disease dataset from the Wellcome Trust Case Control Consortium (WTCCC) study and the Framingham Heart Study (FHS) dataset, and the early-onset myocardial infarction (EOMI) exome sequence datasets with European origin from the NHLBI's Exome Sequencing Project. We discovered that 6 of 27 pairs of significantly interacted genes in the FHS were replicated in the independent WTCCC study and 24 pairs of significantly interacted genes after applying Bonferroni correction in the EOMI study. PMID:26173972

  6. Identification of a precursor genomic segment that provided a sequence unique to glycophorin B and E genes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Onda, M.; Kudo, S.; Fukuda, M.

    Human glycophorin A, B, and E (GPA, GPB, and GPE) genes belong to a gene family located at the long arm of chromosome 4. These three genes are homologous from the 5'-flanking sequence to the Alu sequence, which is 1 kb downstream from the exon encoding the transmembrane domain. Analysis of the Alu sequence and flanking direct repeat sequences suggested that the GPA gene most closely resembles the ancestral gene, whereas the GPB and GPE genes arose by homologous recombination within the Alu sequence, acquiring 3' sequences from an unrelated precursor genomic segment. Here the authors describe the identification of this putative precursor genomic segment. A human genomic library was screened by using the sequence of the 3' region of the GPB gene as a probe. The genomic clones isolated were found to contain an Alu sequence that appeared to be involved in the recombination. Downstream from the Alu sequence, the nucleotide sequence of the precursor genomic segment is almost identical to that of the GPB or GPE gene. In contrast, the upstream sequence of the genomic segment differs entirely from that of the GPA, GPB, and GPE genes. Conservation of the direct repeats flanking the Alu sequence of the genomic segment strongly suggests that the sequence of this genomic segment has been maintained during evolution. This identified genomic segment was found to reside downstream from the GPA gene by both gene mapping and in situ chromosomal localization. The precursor genomic segment was also identified in the orangutan genome, which is known to lack GPB and GPE genes. These results indicate that one of the duplicated ancestral glycophorin genes acquired a unique 3' sequence by unequal crossing-over through its Alu sequence and the further downstream Alu sequence present in the duplicated gene. Further duplication and divergence of this gene yielded the GPB and GPE genes. 37 refs., 5 figs.

  7. Genome Sequence of Cronobacter sakazakii BAA-894 and Comparative Genomic Hybridization Analysis with Other Cronobacter Species

    PubMed Central

    Kucerova, Eva; Clifton, Sandra W.; Xia, Xiao-Qin; Long, Fred; Porwollik, Steffen; Fulton, Lucinda; Fronick, Catrina; Minx, Patrick; Kyung, Kim; Warren, Wesley; Fulton, Robert; Feng, Dongyan; Wollam, Aye; Shah, Neha; Bhonagiri, Veena; Nash, William E.; Hallsworth-Pepin, Kymberlie; Wilson, Richard K.

    2010-01-01

    Background The genus Cronobacter (formerly called Enterobacter sakazakii) is composed of five species; C. sakazakii, C. malonaticus, C. turicensis, C. muytjensii, and C. dublinensis. The genus includes opportunistic human pathogens, and the first three species have been associated with neonatal infections. The most severe diseases are caused in neonates and include fatal necrotizing enterocolitis and meningitis. The genetic basis of the diversity within the genus is unknown, and few virulence traits have been identified. Methodology/Principal Findings We report here the first sequence of a member of this genus, C. sakazakii strain BAA-894. The genome of Cronobacter sakazakii strain BAA-894 comprises a 4.4 Mb chromosome (57% GC content) and two plasmids; 31 kb (51% GC) and 131 kb (56% GC). The genome was used to construct a 387,000 probe oligonucleotide tiling DNA microarray covering the whole genome. Comparative genomic hybridization (CGH) was undertaken on five other C. sakazakii strains, and representatives of the four other Cronobacter species. Among 4,382 annotated genes inspected in this study, about 55% of genes were common to all C. sakazakii strains and 43% were common to all Cronobacter strains, with 10–17% absence of genes. Conclusions/Significance CGH highlighted 15 clusters of genes in C. sakazakii BAA-894 that were divergent or absent in more than half of the tested strains; six of these are of probable prophage origin. Putative virulence factors were identified in these prophage and in other variable regions. A number of genes unique to Cronobacter species associated with neonatal infections (C. sakazakii, C. malonaticus and C. turicensis) were identified. These included a copper and silver resistance system known to be linked to invasion of the blood-brain barrier by neonatal meningitic strains of Escherichia coli. In addition, genes encoding for multidrug efflux pumps and adhesins were identified that were unique to C. sakazakii strains from

  8. BAsE-Seq: a method for obtaining long viral haplotypes from short sequence reads.

    PubMed

    Hong, Lewis Z; Hong, Shuzhen; Wong, Han Teng; Aw, Pauline P K; Cheng, Yan; Wilm, Andreas; de Sessions, Paola F; Lim, Seng Gee; Nagarajan, Niranjan; Hibberd, Martin L; Quake, Stephen R; Burkholder, William F

    2014-01-01

    We present a method for obtaining long haplotypes, of over 3 kb in length, using a short-read sequencer, Barcode-directed Assembly for Extra-long Sequences (BAsE-Seq). BAsE-Seq relies on transposing a template-specific barcode onto random segments of the template molecule and assembling the barcoded short reads into complete haplotypes. We applied BAsE-Seq on mixed clones of hepatitis B virus and accurately identified haplotypes occurring at frequencies greater than or equal to 0.4%, with >99.9% specificity. Applying BAsE-Seq to a clinical sample, we obtained over 9,000 viral haplotypes, which provided an unprecedented view of hepatitis B virus population structure during chronic infection. BAsE-Seq is readily applicable for monitoring quasispecies evolution in viral diseases.
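
    The core idea described in this record, grouping short reads by a template-specific barcode and merging each group into one long haplotype, can be sketched in Python as follows. This is a deliberately simplified toy and not the BAsE-Seq implementation: it assumes reads are already aligned to template coordinates and arrive as hypothetical (barcode, start, bases) tuples, and it calls each position by simple majority vote.

```python
from collections import Counter, defaultdict


def assemble_by_barcode(reads, template_length):
    """Build one consensus haplotype per barcode from aligned short reads."""
    # columns[barcode][position] counts the bases observed at that position.
    columns = defaultdict(lambda: defaultdict(Counter))
    for barcode, start, bases in reads:
        for offset, base in enumerate(bases):
            columns[barcode][start + offset][base] += 1

    haplotypes = {}
    for barcode, by_pos in columns.items():
        # Majority base at covered positions, "N" where no read covers.
        haplotypes[barcode] = "".join(
            by_pos[pos].most_common(1)[0][0] if pos in by_pos else "N"
            for pos in range(template_length)
        )
    return haplotypes


# Hypothetical reads from two template molecules of length 6.
toy_reads = [
    ("bc1", 0, "ACGT"),
    ("bc1", 2, "GTAA"),
    ("bc2", 0, "ACCT"),
    ("bc2", 4, "AA"),
]
toy_haplotypes = assemble_by_barcode(toy_reads, 6)
print(toy_haplotypes)

# Haplotype frequencies across barcodes (one barcode per template molecule).
frequencies = Counter(toy_haplotypes.values())
print(frequencies)
```

    Counting identical haplotypes across barcodes, as in the last step, is what would allow low-frequency haplotypes in a mixed viral population to be reported.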

  9. Transcriptome Analysis and Development of SSR Molecular Markers in Glycyrrhiza uralensis Fisch.

    PubMed Central

    Liu, Yaling; Zhang, Pengfei; Song, Meiling; Hou, Junling; Qing, Mei; Wang, Wenquan; Liu, Chunsheng

    2015-01-01

    Licorice is an important traditional Chinese medicine with clinical and industrial applications. Genetic resources of licorice are insufficient for analysis of molecular biology and genetic functions; as such, transcriptome sequencing must be conducted for functional characterization and development of molecular markers. In this study, transcriptome sequencing on the Illumina HiSeq 2500 sequencing platform generated a total of 5.41 Gb clean data. De novo assembly yielded a total of 46,641 unigenes. Comparison analysis using BLAST showed that the annotations of 29,614 unigenes were conserved. Further study revealed 773 genes related to biosynthesis of secondary metabolites of licorice, 40 genes involved in biosynthesis of the terpenoid backbone, and 16 genes associated with biosynthesis of glycyrrhizic acid. Analysis of unigenes larger than 1 Kb with a length of 11,702 nt presented 7,032 simple sequence repeats (SSR). Sixty-four of 69 randomly designed and synthesized SSR primer pairs were successfully amplified, and 33 primer pairs were polymorphic in Glycyrrhiza uralensis Fisch., Glycyrrhiza inflata Bat., Glycyrrhiza glabra L. and Glycyrrhiza pallidiflora Maxim. This study not only presents the molecular biology data of licorice but also provides a basis for genetic diversity research and molecular marker-assisted breeding of licorice. PMID:26571372

  10. Low-Resolution Electromagnetic Tomography (LORETA) of changed Brain Function Provoked by Pro-Dopamine Regulator (KB220z) in one Adult ADHD case.

    PubMed

    Steinberg, Bruce; Blum, Kenneth; McLaughlin, Thomas; Lubar, Joel; Febo, Marcelo; Braverman, Eric R; Badgaiyan, Rajendra D

    Attention Deficit-Hyperactivity Disorder (ADHD) often continues into adulthood. Recent neuroimaging studies found lowered baseline dopamine tone in the brains of affected individuals that may place them at risk for Substance Use Disorder (SUD). This is an observational case study of the potential for novel management of Adult ADHD with a non-addictive glutaminergic-dopaminergic optimization complex KB220z. Low-resolution electromagnetic tomography (LORETA) was used to evaluate the effects of KB220z on a 72-year-old male with ADHD, at baseline and one hour following administration. The resultant z-scores, averaged across Eyes Closed, Eyes Open and Working Memory conditions, increased for each frequency band, in the anterior, dorsal and posterior cingulate regions, as well as the right dorsolateral prefrontal cortex during Working Memory, with KB220z. These scores are consistent with other human and animal neuroimaging studies that demonstrated increased connectivity volumes in reward circuitry and may offer a new approach to ADHD treatment. However, larger randomized trials to confirm these results are required.

  11. Low-Resolution Electromagnetic Tomography (LORETA) of changed Brain Function Provoked by Pro-Dopamine Regulator (KB220z) in one Adult ADHD case

    PubMed Central

    Steinberg, Bruce; Blum, Kenneth; McLaughlin, Thomas; Lubar, Joel; Febo, Marcelo; Braverman, Eric R.; Badgaiyan, Rajendra D

    2016-01-01

    Attention Deficit-Hyperactivity Disorder (ADHD) often continues into adulthood. Recent neuroimaging studies found lowered baseline dopamine tone in the brains of affected individuals that may place them at risk for Substance Use Disorder (SUD). This is an observational case study of the potential for novel management of Adult ADHD with a non-addictive glutaminergic-dopaminergic optimization complex KB220z. Low-resolution electromagnetic tomography (LORETA) was used to evaluate the effects of KB220z on a 72-year-old male with ADHD, at baseline and one hour following administration. The resultant z-scores, averaged across Eyes Closed, Eyes Open and Working Memory conditions, increased for each frequency band, in the anterior, dorsal and posterior cingulate regions, as well as the right dorsolateral prefrontal cortex during Working Memory, with KB220z. These scores are consistent with other human and animal neuroimaging studies that demonstrated increased connectivity volumes in reward circuitry and may offer a new approach to ADHD treatment. However, larger randomized trials to confirm these results are required. PMID:27610420

  12. Cloning and sequencing the genes encoding goldfish and carp ependymin.

    PubMed

    Adams, D S; Shashoua, V E

    1994-04-20

    Ependymins (EPNs) are brain glycoproteins thought to function in optic nerve regeneration and long-term memory consolidation. To date, epn genes have been characterized in two orders of teleost fish. In this study, polymerase chain reactions (PCR) were used to amplify the complete 1.6-kb epn genes, gf-I and cc-I, from genomic DNA of Cypriniformes, goldfish and carp, respectively. Amplified bands were cloned and sequenced. Each gene consists of six exons and five introns. The exon portion of gf-I encodes a predicted 215-amino-acid (aa) protein previously characterized as GF-I, while cc-I encodes a predicted 215-aa protein 95% homologous to GF-I.

  13. Relationships among genera of the Saccharomycotina from multigene sequence analysis

    USDA-ARS's Scientific Manuscript database

    Most known species of the subphylum Saccharomycotina (budding ascomycetous yeasts) have now been placed in phylogenetically defined clades following multigene sequence analysis. Terminal clades, which are usually well supported from bootstrap analysis, are viewed as phylogenetically circumscribed ge...

  14. Control of photosynthetic membrane assembly in Rhodobacter sphaeroides mediated by puhA and flanking sequences.

    PubMed Central

    Sockett, R E; Donohue, T J; Varga, A R; Kaplan, S

    1989-01-01

    A reaction center H- strain (RCH-) of Rhodobacter sphaeroides, PUHA1, was made by in vitro deletion of an XhoI restriction endonuclease fragment from the puhA gene coupled with insertion of a kanamycin resistance gene cartridge. The resulting construct was delivered to R. sphaeroides wild-type 2.4.1, with the defective puhA gene replacing the wild-type copy by recombination, followed by selection for kanamycin resistance. When grown under conditions known to induce intracytoplasmic membrane development, PUHA1 synthesized a pigmented intracytoplasmic membrane. Spectral analysis of this membrane showed that it was deficient in B875 spectral complexes as well as functional reaction centers and that the level of B800-850 spectral complexes was greater than in the wild type. The RCH- strain was photosynthetically incompetent, but photosynthetic growth was restored by complementation with a 1.45-kilobase (kb) BamHI restriction endonuclease fragment containing the puhA gene carried in trans on plasmid pRK404. B875 spectral complexes were not restored by complementation with the 1.45-kb BamHI restriction endonuclease fragment containing the puhA gene but were restored along with photosynthetic competence by complementation with DNA from a cosmid carrying the puhA gene, as well as a flanking DNA sequence. Interestingly, B875 spectral complexes, but not photosynthetic competence, were restored to PUHA1 by introduction in trans of a 13-kb BamHI restriction endonuclease fragment carrying genes encoding the puf operon region of the DNA. The effect of the puhA deletion was further investigated by an examination of the levels of specific mRNA species derived from the puf and puc operons, as well as by determinations of the relative abundances of polypeptides associated with various spectral complexes by immunological methods. The roles of puhA and other genetic components in photosynthetic gene expression and membrane assembly are discussed. PMID:2644200

  15. Sirius PSB: a generic system for analysis of biological sequences.

    PubMed

    Koh, Chuan Hock; Lin, Sharene; Jedd, Gregory; Wong, Limsoon

    2009-12-01

    Computational tools are essential components of modern biological research. For example, BLAST searches can be used to identify related proteins based on sequence homology, or when a new genome is sequenced, prediction models can be used to annotate functional sites such as transcription start sites, translation initiation sites and polyadenylation sites and to predict protein localization. Here we present Sirius Prediction Systems Builder (PSB), a new computational tool for sequence analysis, classification and searching. Sirius PSB has four main operations: (1) building a classifier, (2) deploying a classifier, (3) searching for proteins similar to query proteins, and (4) preliminary and post-prediction analysis. Sirius PSB supports all these operations via a simple and interactive graphical user interface. Besides being a convenient tool, Sirius PSB has also introduced two novelties in sequence analysis. Firstly, a genetic algorithm is used to identify interesting features in the feature space. Secondly, instead of the conventional method of searching for similar proteins via sequence similarity, we introduced searching via features' similarity. To demonstrate the capabilities of Sirius PSB, we have built two prediction models - one for the recognition of Arabidopsis polyadenylation sites and another for the subcellular localization of proteins. Both systems are competitive against current state-of-the-art models based on evaluation of public datasets. More notably, the time and effort required to build each model is greatly reduced with the assistance of Sirius PSB. Furthermore, we show that under certain conditions when BLAST is unable to find related proteins, Sirius PSB can identify functionally related proteins based on their biophysical similarities. Sirius PSB and its related supplements are available at: http://compbio.ddns.comp.nus.edu.sg/~sirius.

  16. Characterization of a linear DNA plasmid from the filamentous fungal plant pathogen Glomerella musae [Anamorph: Colletotrichum musae (Berk. and Curt.) Arx].

    USGS Publications Warehouse

    Freeman, S.; Redman, R.S.; Grantham, G.; Rodriguez, R.J.

    1997-01-01

    A 7.4-kilobase (kb) DNA plasmid was isolated from Glomerella musae isolate 927 and designated pGML1. Exonuclease treatments indicated that pGML1 was a linear plasmid with blocked 5' termini. Cell-fractionation experiments combined with sequence-specific PCR amplification revealed that pGML1 resided in mitochondria. The pGML1 plasmid hybridized to cesium chloride-fractionated nuclear DNA but not to A + T-rich mitochondrial DNA. An internal 7.0-kb section of pGML1 was cloned and did not hybridize with either nuclear or mitochondrial DNA from G. musae. Sequence analysis revealed identical terminal inverted repeats (TIR) of 520 bp at the ends of the cloned 7.0-kb section of pGML1. The occurrence of pGML1 did not correspond with the pathogenicity of G. musae on banana fruit. Four additional isolates of G. musae possessed extrachromosomal DNA fragments similar in size and sequence to pGML1.

  17. The Complete Nucleotide Sequence of the Human Immunoglobulin Heavy Chain Variable Region Locus

    PubMed Central

    Matsuda, Fumihiko; Ishii, Kazuo; Bourvagnet, Patrice; Kuma, Kei-ichi; Hayashida, Hidenori; Miyata, Takashi; Honjo, Tasuku

    1998-01-01

    The complete nucleotide sequence of the 957-kb DNA of the human immunoglobulin heavy chain variable (VH) region locus was determined and 43 novel VH segments were identified. The region contains 123 VH segments classifiable into seven different families, of which 79 are pseudogenes. Of the 44 VH segments with an open reading frame, 39 are expressed as heavy chain proteins and 1 as mRNA, while the remaining 4 are not found in immunoglobulin cDNAs. Combinatorial diversity of VH region was calculated to be ∼6,000. Conservation of the promoter and recombination signal sequences was observed to be higher in functional VH segments than in pseudogenes. Phylogenetic analysis of 114 VH segments clearly showed clustering of the VH segments of each family. However, an independent branch in the tree contained a single VH, V4-44.1P, sharing similar levels of homology to human VH families and to those of other vertebrates. Comparison between different copies of homologous units that appear repeatedly across the locus clearly demonstrates that dynamic DNA reorganization of the locus took place at least eight times between 133 and 10 million years ago. One nonimmunoglobulin gene of unknown function was identified in the intergenic region. PMID:9841928

  18. Sakharov at KB-11. The path of a genius

    NASA Astrophysics Data System (ADS)

    Ilkaev, Radii I.

    2012-02-01

    21 May 2011 would have marked the 90th birthday of Andrei Dmitrievich Sakharov, a towering 20th-century figure in science and human thought, whose ideas, research contributions, and life example exerted enormous influence on the history of the second half of the 20th century and, in particular, on the history of Russia. Whether as a scientist or a private person (including his public activities and exceptional attitude to human personality), he always displayed creativity and a freedom of spirit, thought, and action. Sakharov's life and creative work make him a model scientist and citizen for many and undoubtedly provide a legacy for the development of science and society in the 21st century. In this paper, some of Sakharov's key ideas and achievements relating to his KB-11 period are exemplified, and how they influence present day research and technology, notably as employed for affording national security, is examined.

  19. Bayesian Correlation Analysis for Sequence Count Data

    PubMed Central

    Lau, Nelson; Perkins, Theodore J.

    2016-01-01

    Evaluating the similarity of different measured variables is a fundamental task of statistics, and a key part of many bioinformatics algorithms. Here we propose a Bayesian scheme for estimating the correlation between different entities’ measurements based on high-throughput sequencing data. These entities could be different genes or miRNAs whose expression is measured by RNA-seq, different transcription factors or histone marks whose expression is measured by ChIP-seq, or even combinations of different types of entities. Our Bayesian formulation accounts for both measured signal levels and uncertainty in those levels, due to varying sequencing depth in different experiments and to varying absolute levels of individual entities, both of which affect the precision of the measurements. In comparison with a traditional Pearson correlation analysis, we show that our Bayesian correlation analysis retains high correlations when measurement confidence is high, but suppresses correlations when measurement confidence is low—especially for entities with low signal levels. In addition, we consider the influence of priors on the Bayesian correlation estimate. Perhaps surprisingly, we show that naive, uniform priors on entities’ signal levels can lead to highly biased correlation estimates, particularly when different experiments have widely varying sequencing depths. However, we propose two alternative priors that provably mitigate this problem. We also prove that, like traditional Pearson correlation, our Bayesian correlation calculation constitutes a kernel in the machine learning sense, and thus can be used as a similarity measure in any kernel-based machine learning algorithm. We demonstrate our approach on two RNA-seq datasets and one miRNA-seq dataset. PMID:27701449

  20. Congruence analysis of point clouds from unstable stereo image sequences

    NASA Astrophysics Data System (ADS)

    Jepping, C.; Bethmann, F.; Luhmann, T.

    2014-06-01

    This paper deals with the correction of exterior orientation parameters of stereo image sequences over deformed free-form surfaces without control points. Such an imaging situation can occur, for example, during photogrammetric car crash test recordings where onboard high-speed stereo cameras are used to measure 3D surfaces. As a result of such measurements, 3D point clouds of deformed surfaces are generated for a complete stereo sequence. The first objective of this research focusses on the development and investigation of methods for the detection of corresponding spatial and temporal tie points within the stereo image sequences (by stereo image matching and 3D point tracking) that are robust enough for a reliable handling of occlusions and other disturbances that may occur. The second objective of this research is the analysis of object deformations in order to detect stable areas (congruence analysis). For this purpose a RANSAC-based method for congruence analysis has been developed. This process is based on the sequential transformation of randomly selected point groups from one epoch to another by using a 3D similarity transformation. The paper gives a detailed description of the congruence analysis. The approach has been tested successfully on synthetic and real image data.