Qu, Wen; Cingolani, Pablo; Zeeberg, Barry R; Ruden, Douglas M
2017-01-01
Deep sequencing of cDNAs made from spliced mRNAs indicates that most coding genes in many animals and plants have pre-mRNA transcripts that are alternatively spliced. In pre-mRNAs, in addition to invariant exons that are present in almost all mature mRNA products, there are at least 6 additional types of exons, such as exons from alternative promoters or with alternative polyA sites, mutually exclusive exons, skipped exons, or exons with alternative 5' or 3' splice sites. Our bioinformatics-based hypothesis is that, in analogy to the genetic code, there is an "alternative-splicing code" in introns and flanking exon sequences, analogous to the genetic code, that directs alternative splicing of many of the 36 types of introns. In humans, we identified 42 different consensus sequences that are each present in at least 100 human introns. 37 of the 42 top consensus sequences are significantly enriched or depleted in at least one of the 36 types of introns. We further supported our hypothesis by showing that 96 out of 96 analyzed human disease mutations that affect RNA splicing, and change alternative splicing from one class to another, can be partially explained by a mutation altering a consensus sequence from one type of intron to that of another type of intron. Some of the alternative splicing consensus sequences, and presumably their small-RNA or protein targets, are evolutionarily conserved from 50 plant to animal species. We also noticed the set of introns within a gene usually share the same splicing codes, thus arguing that one sub-type of splicesosome might process all (or most) of the introns in a given gene. Our work sheds new light on a possible mechanism for generating the tremendous diversity in protein structure by alternative splicing of pre-mRNAs.
Consensus generation and variant detection by Celera Assembler.
Denisov, Gennady; Walenz, Brian; Halpern, Aaron L; Miller, Jason; Axelrod, Nelson; Levy, Samuel; Sutton, Granger
2008-04-15
We present an algorithm to identify allelic variation given a Whole Genome Shotgun (WGS) assembly of haploid sequences, and to produce a set of haploid consensus sequences rather than a single consensus sequence. Existing WGS assemblers take a column-by-column approach to consensus generation, and produce a single consensus sequence which can be inconsistent with the underlying haploid alleles, and inconsistent with any of the aligned sequence reads. Our new algorithm uses a dynamic windowing approach. It detects alleles by simultaneously processing the portions of aligned reads spanning a region of sequence variation, assigns reads to their respective alleles, phases adjacent variant alleles and generates a consensus sequence corresponding to each confirmed allele. This algorithm was used to produce the first diploid genome sequence of an individual human. It can also be applied to assemblies of multiple diploid individuals and hybrid assemblies of multiple haploid organisms. Being applied to the individual human genome assembly, the new algorithm detects exactly two confirmed alleles and reports two consensus sequences in 98.98% of the total number 2,033311 detected regions of sequence variation. In 33,269 out of 460,373 detected regions of size >1 bp, it fixes the constructed errors of a mosaic haploid representation of a diploid locus as produced by the original Celera Assembler consensus algorithm. Using an optimized procedure calibrated against 1 506 344 known SNPs, it detects 438 814 new heterozygous SNPs with false positive rate 12%. The open source code is available at: http://wgs-assembler.cvs.sourceforge.net/wgs-assembler/
2014-01-01
Background Ambiscript is a graphically-designed nucleic acid notation that uses symbol symmetries to support sequence complementation, highlight biologically-relevant palindromes, and facilitate the analysis of consensus sequences. Although the original Ambiscript notation was designed to easily represent consensus sequences for multiple sequence alignments, the notation’s black-on-white ambiguity characters are unable to reflect the statistical distribution of nucleotides found at each position. We now propose a color-augmented ambigraphic notation to encode the frequency of positional polymorphisms in these consensus sequences. Results We have implemented this color-coding approach by creating an Adobe Flash® application ( http://www.ambiscript.org) that shades and colors modified Ambiscript characters according to the prevalence of the encoded nucleotide at each position in the alignment. The resulting graphic helps viewers perceive biologically-relevant patterns in multiple sequence alignments by uniquely combining color, shading, and character symmetries to highlight palindromes and inverted repeats in conserved DNA motifs. Conclusion Juxtaposing an intuitive color scheme over the deliberate character symmetries of an ambigraphic nucleic acid notation yields a highly-functional nucleic acid notation that maximizes information content and successfully embodies key principles of graphic excellence put forth by the statistician and graphic design theorist, Edward Tufte. PMID:24447494
Rogan, P K; Schneider, T D
1995-01-01
Predicting the effects of nucleotide substitutions in human splice sites has been based on analysis of consensus sequences. We used a graphic representation of sequence conservation and base frequency, the sequence logo, to demonstrate that a change in a splice acceptor of hMSH2 (a gene associated with familial nonpolyposis colon cancer) probably does not reduce splicing efficiency. This confirms a population genetic study that suggested that this substitution is a genetic polymorphism. The information theory-based sequence logo is quantitative and more sensitive than the corresponding splice acceptor consensus sequence for detection of true mutations. Information analysis may potentially be used to distinguish polymorphisms from mutations in other types of transcriptional, translational, or protein-coding motifs.
Brain cDNA clone for human cholinesterase
DOE Office of Scientific and Technical Information (OSTI.GOV)
McTiernan, C.; Adkins, S.; Chatonnet, A.
1987-10-01
A cDNA library from human basal ganglia was screened with oligonucleotide probes corresponding to portions of the amino acid sequence of human serum cholinesterase. Five overlapping clones, representing 2.4 kilobases, were isolated. The sequenced cDNA contained 207 base pairs of coding sequence 5' to the amino terminus of the mature protein in which there were four ATG translation start sites in the same reading frame as the protein. Only the ATG coding for Met-(-28) lay within a favorable consensus sequence for functional initiators. There were 1722 base pairs of coding sequence corresponding to the protein found circulating in human serum.more » The amino acid sequence deduced from the cDNA exactly matched the 574 amino acid sequence of human serum cholinesterase, as previously determined by Edman degradation. Therefore, our clones represented cholinesterase rather than acetylcholinesterase. It was concluded that the amino acid sequences of cholinesterase from two different tissues, human brain and human serum, were identical. Hybridization of genomic DNA blots suggested that a single gene, or very few genes coded for cholinesterase.« less
Pujar, Shashikant; O’Leary, Nuala A; Farrell, Catherine M; Mudge, Jonathan M; Wallin, Craig; Diekhans, Mark; Barnes, If; Bennett, Ruth; Berry, Andrew E; Cox, Eric; Davidson, Claire; Goldfarb, Tamara; Gonzalez, Jose M; Hunt, Toby; Jackson, John; Joardar, Vinita; Kay, Mike P; Kodali, Vamsi K; McAndrews, Monica; McGarvey, Kelly M; Murphy, Michael; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Seal, Ruth L; Webb, David; Zhu, Sophia; Aken, Bronwen L; Bult, Carol J; Frankish, Adam; Pruitt, Kim D
2018-01-01
Abstract The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. PMID:29126148
Trung, Le Quang; VAN Puyvelde, Karolien; Triest, Ludwig
2008-03-01
Consensus primers, based on exon sequences of the cyp73 gene family coding for cinnamate 4-hydroxylase (C4H) of the lignin biosynthesis pathway, were designed for the tetraploid willow species Salix alba and Salix fragilis. Diagnostic alleles at species level were observed among introns of three cyp73 genes and allowed unambiguous detection of the first generation and introgressed hybrids in populations. Progeny analysis of a female S. alba with a male introgressed hybrid confirmed the codominant inheritance of each intron. Sequences of the diagnostic alleles of both species were similar to those found in the hybrids. © 2007 The Authors.
AMS 4.0: consensus prediction of post-translational modifications in protein sequences.
Plewczynski, Dariusz; Basu, Subhadip; Saha, Indrajit
2012-08-01
We present here the 2011 update of the AutoMotif Service (AMS 4.0) that predicts the wide selection of 88 different types of the single amino acid post-translational modifications (PTM) in protein sequences. The selection of experimentally confirmed modifications is acquired from the latest UniProt and Phospho.ELM databases for training. The sequence vicinity of each modified residue is represented using amino acids physico-chemical features encoded using high quality indices (HQI) obtaining by automatic clustering of known indices extracted from AAindex database. For each type of the numerical representation, the method builds the ensemble of Multi-Layer Perceptron (MLP) pattern classifiers, each optimising different objectives during the training (for example the recall, precision or area under the ROC curve (AUC)). The consensus is built using brainstorming technology, which combines multi-objective instances of machine learning algorithm, and the data fusion of different training objects representations, in order to boost the overall prediction accuracy of conserved short sequence motifs. The performance of AMS 4.0 is compared with the accuracy of previous versions, which were constructed using single machine learning methods (artificial neural networks, support vector machine). Our software improves the average AUC score of the earlier version by close to 7 % as calculated on the test datasets of all 88 PTM types. Moreover, for the selected most-difficult sequence motifs types it is able to improve the prediction performance by almost 32 %, when compared with previously used single machine learning methods. Summarising, the brainstorming consensus meta-learning methodology on the average boosts the AUC score up to around 89 %, averaged over all 88 PTM types. Detailed results for single machine learning methods and the consensus methodology are also provided, together with the comparison to previously published methods and state-of-the-art software tools. The source code and precompiled binaries of brainstorming tool are available at http://code.google.com/p/automotifserver/ under Apache 2.0 licensing.
Ghosh, Jayadri Sekhar; Bhattacharya, Samik; Pal, Amita
2017-06-01
The unavailability of the reproductive structure and unpredictability of vegetative characters for the identification and phylogenetic study of bamboo prompted the application of molecular techniques for greater resolution and consensus. We first employed internal transcribed spacer (ITS1, 5.8S rRNA and ITS2) sequences to construct the phylogenetic tree of 21 tropical bamboo species. While the sequence alone could grossly reconstruct the traditional phylogeny amongst the 21-tropical species studied, some anomalies were encountered that prompted a further refinement of the phylogenetic analyses. Therefore, we integrated the secondary structure of the ITS sequences to derive individual sequence-structure matrix to gain more resolution on the phylogenetic reconstruction. The results showed that ITS sequence-structure is the reliable alternative to the conventional phenotypic method for the identification of bamboo species. The best-fit topology obtained by the sequence-structure based phylogeny over the sole sequence based one underscores closer clustering of all the studied Bambusa species (Sub-tribe Bambusinae), while Melocanna baccifera, which belongs to Sub-Tribe Melocanneae, disjointedly clustered as an out-group within the consensus phylogenetic tree. In this study, we demonstrated the dependability of the combined (ITS sequence+structure-based) approach over the only sequence-based analysis for phylogenetic relationship assessment of bamboo.
Zeng, Lu; Kortschak, R Daniel; Raison, Joy M; Bertozzi, Terry; Adelson, David L
2018-01-01
Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package.
Zeng, Lu; Kortschak, R. Daniel; Raison, Joy M.
2018-01-01
Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then filtered for protein coding genes and then annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of family: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we generate not only consensus sequences, but keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics, 2) consensus sequences represent low-divergence, recently/currently active TE families, 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package. PMID:29538441
Pujar, Shashikant; O'Leary, Nuala A; Farrell, Catherine M; Loveland, Jane E; Mudge, Jonathan M; Wallin, Craig; Girón, Carlos G; Diekhans, Mark; Barnes, If; Bennett, Ruth; Berry, Andrew E; Cox, Eric; Davidson, Claire; Goldfarb, Tamara; Gonzalez, Jose M; Hunt, Toby; Jackson, John; Joardar, Vinita; Kay, Mike P; Kodali, Vamsi K; Martin, Fergal J; McAndrews, Monica; McGarvey, Kelly M; Murphy, Michael; Rajput, Bhanu; Rangwala, Sanjida H; Riddick, Lillian D; Seal, Ruth L; Suner, Marie-Marthe; Webb, David; Zhu, Sophia; Aken, Bronwen L; Bruford, Elspeth A; Bult, Carol J; Frankish, Adam; Murphy, Terence; Pruitt, Kim D
2018-01-04
The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.
CHROMA: consensus-based colouring of multiple alignments for publication.
Goodstadt, L; Ponting, C P
2001-09-01
CHROMA annotates multiple protein sequence alignments by consensus to produce formatted and coloured text suitable for incorporation into other documents for publication. The package is designed to be flexible and reliable, and has a simple-to-use graphical user interface running under Microsoft Windows. Both the executables and source code for CHROMA running under Windows and Linux (portable command-line only) are freely available at http://www.lg.ndirect.co.uk/chroma. Software enquiries should be directed to CHROMA@lg.ndirect.co.uk.
Giardina, P; Cannio, R; Martirani, L; Marzullo, L; Palmieri, G; Sannia, G
1995-01-01
The gene (pox1) encoding a phenol oxidase from Pleurotus ostreatus, a lignin-degrading basidiomycete, was cloned and sequenced, and the corresponding pox1 cDNA was also synthesized and sequenced. The isolated gene consists of 2,592 bp, with the coding sequence being interrupted by 19 introns and flanked by an upstream region in which putative CAAT and TATA consensus sequences could be identified at positions -174 and -84, respectively. The isolation of a second cDNA (pox2 cDNA), showing 84% similarity, and of the corresponding truncated genomic clones demonstrated the existence of a multigene family coding for isoforms of laccase in P. ostreatus. PCR amplifications of specific regions on the DNA of isolated monokaryons proved that the two genes are not allelic forms. The POX1 amino acid sequence deduced was compared with those of other known laccases from different fungi. PMID:7793961
Interactive web-based identification and visualization of transcript shared sequences.
Azhir, Alaleh; Merino, Louis-Henri; Nauen, David W
2018-05-12
We have developed TraC (Transcript Consensus), a web-based tool for detecting and visualizing shared sequences among two or more mRNA transcripts such as splice variants. Results including exon-exon boundaries are returned in a highly intuitive, data-rich, interactive plot that permits users to explore the similarities and differences of multiple transcript sequences. The online tool (http://labs.pathology.jhu.edu/nauen/trac/) is free to use. The source code is freely available for download (https://github.com/nauenlab/TraC). Copyright © 2018 Elsevier Inc. All rights reserved.
Quick, Josh; Grubaugh, Nathan D; Pullan, Steven T; Claro, Ingra M; Smith, Andrew D; Gangavarapu, Karthik; Oliveira, Glenn; Robles-Sikisaka, Refugio; Rogers, Thomas F; Beutler, Nathan A; Burton, Dennis R; Lewis-Ximenez, Lia Laura; de Jesus, Jaqueline Goes; Giovanetti, Marta; Hill, Sarah; Black, Allison; Bedford, Trevor; Carroll, Miles W; Nunes, Marcio; Alcantara, Luiz Carlos; Sabino, Ester C; Baylis, Sally A; Faria, Nuno; Loose, Matthew; Simpson, Jared T; Pybus, Oliver G; Andersen, Kristian G; Loman, Nicholas J
2018-01-01
Genome sequencing has become a powerful tool for studying emerging infectious diseases; however, genome sequencing directly from clinical samples without isolation remains challenging for viruses such as Zika, where metagenomic sequencing methods may generate insufficient numbers of viral reads. Here we present a protocol for generating coding-sequence complete genomes comprising an online primer design tool, a novel multiplex PCR enrichment protocol, optimised library preparation methods for the portable MinION sequencer (Oxford Nanopore Technologies) and the Illumina range of instruments, and a bioinformatics pipeline for generating consensus sequences. The MinION protocol does not require an internet connection for analysis, making it suitable for field applications with limited connectivity. Our method relies on multiplex PCR for targeted enrichment of viral genomes from samples containing as few as 50 genome copies per reaction. Viral consensus sequences can be achieved starting with clinical samples in 1-2 days following a simple laboratory workflow. This method has been successfully used by several groups studying Zika virus evolution and is facilitating an understanding of the spread of the virus in the Americas. PMID:28538739
Mechanisms of radiation-induced gene responses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Woloschak, G.E.; Paunesku, T.
1996-10-01
In the process of identifying genes differentially expressed in cells exposed ultraviolet radiation, we have identified a transcript having a 26-bp region that is highly conserved in a variety of species including Bacillus circulans, yeast, pumpkin, Drosophila, mouse, and man. When the 5` region (flanking region or UTR) of a gene, the sequence is predominantly in +/+ orientation with respect to the coding DNA strand; while in the coding region and the 3` region (UTR), the sequence is most frequently in the +/-orientation with respect to the coding DNA strand. In two genes, the element is split into two parts;more » however, in most cases, it is found only once but with a minimum of 11 consecutive nucleotides precisely depicting the original sequence. The element is found in a large number of different genes with diverse functions (from human ras p21 to B. circulans chitonase). Gel shift assays demonstrated the presence of a protein in HeLa cell extracts that binds to the sense and antisense single-stranded consensus oligomers, as well as to the double- stranded oligonucleotide. When double-stranded oligomer was used, the size shift demonstrated as additional protein-oligomer complex larger than the one bound to either sense or antisense single-stranded consensus oligomers alone. It is speculated either that this element binds to protein(s) important in maintaining DNA is a single-stranded orientation for transcription or, alternatively that this element is important in the transcription-coupled DNA repair process.« less
Takagi, M; Kobayashi, N; Sugimoto, M; Fujii, T; Watari, J; Yano, K
1987-01-01
The expression of a LEU gene from Candida maltosa (designated as C-LEU2) isolated previously (Kawamura et al. 1983) was shown to be regulated, when transferred into Saccharomyces cerevisiae, by leucine and threonine in the medium, as in the case of LEU2 gene of S. cerevisiae. The coding region together with the regulatory region was subcloned and the nucleotide sequence was determined. When the sequence of the coding region was compared with that of LEU2, the homology was 72% for base pairs and 76% for deduced amino acids. Comparison of the regulatory region of C-LEU2 with those of LEU1 and LEU2 suggested a few short consensus sequences which are involved in regulation of gene expression by leucine and threonine in the medium.
Grötzinger, Stefan W.; Alam, Intikhab; Ba Alawi, Wail; Bajic, Vladimir B.; Stingl, Ulrich; Eppinger, Jörg
2014-01-01
Reliable functional annotation of genomic data is the key-step in the discovery of novel enzymes. Intrinsic sequencing data quality problems of single amplified genomes (SAGs) and poor homology of novel extremophile's genomes pose significant challenges for the attribution of functions to the coding sequences identified. The anoxic deep-sea brine pools of the Red Sea are a promising source of novel enzymes with unique evolutionary adaptation. Sequencing data from Red Sea brine pool cultures and SAGs are annotated and stored in the Integrated Data Warehouse of Microbial Genomes (INDIGO) data warehouse. Low sequence homology of annotated genes (no similarity for 35% of these genes) may translate into false positives when searching for specific functions. The Profile and Pattern Matching (PPM) strategy described here was developed to eliminate false positive annotations of enzyme function before progressing to labor-intensive hyper-saline gene expression and characterization. It utilizes InterPro-derived Gene Ontology (GO)-terms (which represent enzyme function profiles) and annotated relevant PROSITE IDs (which are linked to an amino acid consensus pattern). The PPM algorithm was tested on 15 protein families, which were selected based on scientific and commercial potential. An initial list of 2577 enzyme commission (E.C.) numbers was translated into 171 GO-terms and 49 consensus patterns. A subset of INDIGO-sequences consisting of 58 SAGs from six different taxons of bacteria and archaea were selected from six different brine pool environments. Those SAGs code for 74,516 genes, which were independently scanned for the GO-terms (profile filter) and PROSITE IDs (pattern filter). Following stringent reliability filtering, the non-redundant hits (106 profile hits and 147 pattern hits) are classified as reliable, if at least two relevant descriptors (GO-terms and/or consensus patterns) are present. Scripts for annotation, as well as for the PPM algorithm, are available through the INDIGO website. PMID:24778629
The GENCODE exome: sequencing the complete human exome
Coffey, Alison J; Kokocinski, Felix; Calafato, Maria S; Scott, Carol E; Palta, Priit; Drury, Eleanor; Joyce, Christopher J; LeProust, Emily M; Harrow, Jen; Hunt, Sarah; Lehesjoki, Anna-Elina; Turner, Daniel J; Hubbard, Tim J; Palotie, Aarno
2011-01-01
Sequencing the coding regions, the exome, of the human genome is one of the major current strategies to identify low frequency and rare variants associated with human disease traits. So far, the most widely used commercial exome capture reagents have mainly targeted the consensus coding sequence (CCDS) database. We report the design of an extended set of targets for capturing the complete human exome, based on annotation from the GENCODE consortium. The extended set covers an additional 5594 genes and 10.3 Mb compared with the current CCDS-based sets. The additional regions include potential disease genes previously inaccessible to exome resequencing studies, such as 43 genes linked to ion channel activity and 70 genes linked to protein kinase activity. In total, the new GENCODE exome set developed here covers 47.9 Mb and performed well in sequence capture experiments. In the sample set used in this study, we identified over 5000 SNP variants more in the GENCODE exome target (24%) than in the CCDS-based exome sequencing. PMID:21364695
WebLogo: A Sequence Logo Generator
Crooks, Gavin E.; Hon, Gary; Chandonia, John-Marc; Brenner, Steven E.
2004-01-01
WebLogo generates sequence logos, graphical representations of the patterns within a multiple sequence alignment. Sequence logos provide a richer and more precise description of sequence similarity than consensus sequences and can rapidly reveal significant features of the alignment otherwise difficult to perceive. Each logo consists of stacks of letters, one stack for each position in the sequence. The overall height of each stack indicates the sequence conservation at that position (measured in bits), whereas the height of symbols within the stack reflects the relative frequency of the corresponding amino or nucleic acid at that position. WebLogo has been enhanced recently with additional features and options, to provide a convenient and highly configurable sequence logo generator. A command line interface and the complete, open WebLogo source code are available for local installation and customization. PMID:15173120
Regulation of the alpha-glucuronidase-encoding gene ( aguA) from Aspergillus niger.
de Vries, R P; van de Vondervoort, P J I; Hendriks, L; van de Belt, M; Visser, J
2002-09-01
The alpha-glucuronidase gene aguA from Aspergillus niger was cloned and characterised. Analysis of the promoter region of aguA revealed the presence of four putative binding sites for the major carbon catabolite repressor protein CREA and one putative binding site for the transcriptional activator XLNR. In addition, a sequence motif was detected which differed only in the last nucleotide from the XLNR consensus site. A construct in which part of the aguA coding region was deleted still resulted in production of a stable mRNA upon transformation of A. niger. The putative XLNR binding sites and two of the putative CREA binding sites were mutated individually in this construct and the effects on expression were examined in A. niger transformants. Northern analysis of the transformants revealed that the consensus XLNR site is not actually functional in the aguA promoter, whereas the sequence that diverges from the consensus at a single position is functional. This indicates that XLNR is also able to bind to the sequence GGCTAG, and the XLNR binding site consensus should therefore be changed to GGCTAR. Both CREA sites are functional, indicating that CREA has a strong influence on aguA expression. A detailed expression analysis of aguA in four genetic backgrounds revealed a second regulatory system involved in activation of aguA gene expression. This system responds to the presence of glucuronic and galacturonic acids, and is not dependent on XLNR.
Cloning and sequence analysis of the invertase gene INV 1 from the yeast Pichia anomala.
Pérez, J A; Rodríguez, J; Rodríguez, L; Ruiz, T
1996-02-01
A genomic library from the yeast Pichia anomala has been constructed and employed to clone the gene encoding the sucrose-hydrolysing enzyme invertase by complementation of a sucrose non-fermenting mutant of Saccharomyces cerevisiae. The cloned gene, INV1, was sequenced and found to encode a polypeptide of 550 amino acids which contained a 22 amino-acid signal sequence and ten potential glycosylation sites. The amino-acid sequence shows significant identity with other yeast invertases and also with Kluyveromyces marxianus inulinase, a yeast beta-fructofuranosidase which has a different substrate specificity. The nucleotide sequences of the 5' and 3' non-coding regions were found to contain several consensus motifs probably involved in the initiation and termination of gene transcription.
Nowacka-Woszuk, Joanna; Switonski, Marek
2009-01-01
The sex determination process is under the control of several genes of which two (SRY and SOX9), encoding transcription factors, play a crucial role. It is well-known that mutations at these genes may cause the development of an intersexual phenotype. The aim of this study was to conduct a comparative analysis of the coding sequence and 5'-flanking regions of both genes in four species of the family Canidae (the dog, red fox, arctic fox and Chinese raccoon dog). Similarity of the coding sequence of the SOX9 gene among the studied species was higher (99.7-99.9%) than in the case of the SRY gene (96.7-97.3%). Only single nucleotide changes were found in the compared coding sequences, whereas in the 5'-flanking region of both genes nucleotide substitutions, as well as insertions and deletions were observed. None of the changes detected in the 5'-flanking region occurred within the potential consensus sequences for transcription factors. No polymorphism was found for either of these genes in any of the analyzed species.
Montandon, P E; Vasserot, A; Stutz, E
1986-01-01
We retrieved a 1.6 kbp intron separating two exons of the psb C gene which codes for the 44 kDa reaction center protein of photosystem II. This intron is 3 to 4 times the size of all previously sequenced Euglena gracilis chloroplast introns. It contains an open reading frame of 458 codons potentially coding for a basic protein of 54 kDa of yet unknown function. The intron boundaries follow consensus sequences established for chloroplast introns related to class II and nuclear pre-mRNA introns. Its 3'-terminal segment has structural features similar to class II mitochondrial introns with an invariant base A as possible branch point for lariat formation.
Polyomavirus BK non-coding control region rearrangements in health and disease.
Sharma, Preety M; Gupta, Gaurav; Vats, Abhay; Shapiro, Ron; Randhawa, Parmjeet S
2007-08-01
BK virus is an increasingly recognized pathogen in transplanted patients. DNA sequencing of this virus shows considerable genomic variability. To understand the clinical significance of rearrangements in the non-coding control region (NCCR) of BK virus (BKV), we report a meta-analysis of 507 sequences, including 40 sequences generated in our own laboratory, for associations between rearrangements and disease, tissue tropism, geographic origin, and viral genotype. NCCR rearrangements were less frequent in (a) asymptomatic BKV viruria compared to patients viral nephropathy (1.7% vs. 22.5%), and (b) viral genotype 1 compared to other genotypes (2.4% vs. 11.2%). Rearrangements were commoner in malignancy (78.6%), and Norwegians (45.7%), and less common in East Indians (0%), and Japanese (4.3%). A surprising number of rearranged sequences were reported from mononuclear cells of healthy subjects, whereas most plasma sequences were archetypal. This difference could not be related to potential recombinase activity in lymphocytes, as consensus recombination signal sequences could not be found in the NCCR region. NCCR rearrangements are neither required nor a sufficient condition to produce clinical disease. BKV nephropathy and hemorrhagic cystitis are not associated with any unique NCCR configuration or nucleotide sequence.
Splicing factor SFRS1 recognizes a functionally diverse landscape of RNA transcripts.
Sanford, Jeremy R; Wang, Xin; Mort, Matthew; Vanduyn, Natalia; Cooper, David N; Mooney, Sean D; Edenberg, Howard J; Liu, Yunlong
2009-03-01
Metazoan genes are encrypted with at least two superimposed codes: the genetic code to specify the primary structure of proteins and the splicing code to expand their proteomic output via alternative splicing. Here, we define the specificity of a central regulator of pre-mRNA splicing, the conserved, essential splicing factor SFRS1. Cross-linking immunoprecipitation and high-throughput sequencing (CLIP-seq) identified 23,632 binding sites for SFRS1 in the transcriptome of cultured human embryonic kidney cells. SFRS1 was found to engage many different classes of functionally distinct transcripts including mRNA, miRNA, snoRNAs, ncRNAs, and conserved intergenic transcripts of unknown function. The majority of these diverse transcripts share a purine-rich consensus motif corresponding to the canonical SFRS1 binding site. The consensus site was not only enriched in exons cross-linked to SFRS1 in vivo, but was also enriched in close proximity to splice sites. mRNAs encoding RNA processing factors were significantly overrepresented, suggesting that SFRS1 may broadly influence the post-transcriptional control of gene expression in vivo. Finally, a search for the SFRS1 consensus motif within the Human Gene Mutation Database identified 181 mutations in 82 different genes that disrupt predicted SFRS1 binding sites. This comprehensive analysis substantially expands the known roles of human SR proteins in the regulation of a diverse array of RNA transcripts.
Barley whole exome capture: a tool for genomic research in the genus Hordeum and beyond
Mascher, Martin; Richmond, Todd A; Gerhardt, Daniel J; Himmelbach, Axel; Clissold, Leah; Sampath, Dharanya; Ayling, Sarah; Steuernagel, Burkhard; Pfeifer, Matthias; D'Ascenzo, Mark; Akhunov, Eduard D; Hedley, Pete E; Gonzales, Ana M; Morrell, Peter L; Kilian, Benjamin; Blattner, Frank R; Scholz, Uwe; Mayer, Klaus FX; Flavell, Andrew J; Muehlbauer, Gary J; Waugh, Robbie; Jeddeloh, Jeffrey A; Stein, Nils
2013-01-01
Advanced resources for genome-assisted research in barley (Hordeum vulgare) including a whole-genome shotgun assembly and an integrated physical map have recently become available. These have made possible studies that aim to assess genetic diversity or to isolate single genes by whole-genome resequencing and in silico variant detection. However such an approach remains expensive given the 5 Gb size of the barley genome. Targeted sequencing of the mRNA-coding exome reduces barley genomic complexity more than 50-fold, thus dramatically reducing this heavy sequencing and analysis load. We have developed and employed an in-solution hybridization-based sequence capture platform to selectively enrich for a 61.6 megabase coding sequence target that includes predicted genes from the genome assembly of the cultivar Morex as well as publicly available full-length cDNAs and de novo assembled RNA-Seq consensus sequence contigs. The platform provides a highly specific capture with substantial and reproducible enrichment of targeted exons, both for cultivated barley and related species. We show that this exome capture platform provides a clear path towards a broader and deeper understanding of the natural variation residing in the mRNA-coding part of the barley genome and will thus constitute a valuable resource for applications such as mapping-by-sequencing and genetic diversity analyzes. PMID:23889683
Quick, Joshua; Grubaugh, Nathan D; Pullan, Steven T; Claro, Ingra M; Smith, Andrew D; Gangavarapu, Karthik; Oliveira, Glenn; Robles-Sikisaka, Refugio; Rogers, Thomas F; Beutler, Nathan A; Burton, Dennis R; Lewis-Ximenez, Lia Laura; de Jesus, Jaqueline Goes; Giovanetti, Marta; Hill, Sarah C; Black, Allison; Bedford, Trevor; Carroll, Miles W; Nunes, Marcio; Alcantara, Luiz Carlos; Sabino, Ester C; Baylis, Sally A; Faria, Nuno R; Loose, Matthew; Simpson, Jared T; Pybus, Oliver G; Andersen, Kristian G; Loman, Nicholas J
2017-06-01
Genome sequencing has become a powerful tool for studying emerging infectious diseases; however, genome sequencing directly from clinical samples (i.e., without isolation and culture) remains challenging for viruses such as Zika, for which metagenomic sequencing methods may generate insufficient numbers of viral reads. Here we present a protocol for generating coding-sequence-complete genomes, comprising an online primer design tool, a novel multiplex PCR enrichment protocol, optimized library preparation methods for the portable MinION sequencer (Oxford Nanopore Technologies) and the Illumina range of instruments, and a bioinformatics pipeline for generating consensus sequences. The MinION protocol does not require an Internet connection for analysis, making it suitable for field applications with limited connectivity. Our method relies on multiplex PCR for targeted enrichment of viral genomes from samples containing as few as 50 genome copies per reaction. Viral consensus sequences can be achieved in 1-2 d by starting with clinical samples and following a simple laboratory workflow. This method has been successfully used by several groups studying Zika virus evolution and is facilitating an understanding of the spread of the virus in the Americas. The protocol can be used to sequence other viral genomes using the online Primal Scheme primer designer software. It is suitable for sequencing either RNA or DNA viruses in the field during outbreaks or as an inexpensive, convenient method for use in the lab.
Aggregating and Predicting Sequence Labels from Crowd Annotations
Nguyen, An T.; Wallace, Byron C.; Li, Junyi Jessy; Nenkova, Ani; Lease, Matthew
2017-01-01
Despite sequences being core to NLP, scant work has considered how to handle noisy sequence labels from multiple annotators for the same text. Given such annotations, we consider two complementary tasks: (1) aggregating sequential crowd labels to infer a best single set of consensus annotations; and (2) using crowd annotations as training data for a model that can predict sequences in unannotated text. For aggregation, we propose a novel Hidden Markov Model variant. To predict sequences in unannotated text, we propose a neural approach using Long Short Term Memory. We evaluate a suite of methods across two different applications and text genres: Named-Entity Recognition in news articles and Information Extraction from biomedical abstracts. Results show improvement over strong baselines. Our source code and data are available online1. PMID:29093611
Human Splice-Site Prediction with Deep Neural Networks.
Naito, Tatsuhiko
2018-04-18
Accurate splice-site prediction is essential to delineate gene structures from sequence data. Several computational techniques have been applied to create a system to predict canonical splice sites. For classification tasks, deep neural networks (DNNs) have achieved record-breaking results and often outperformed other supervised learning techniques. In this study, a new method of splice-site prediction using DNNs was proposed. The proposed system receives an input sequence data and returns an answer as to whether it is splice site. The length of input is 140 nucleotides, with the consensus sequence (i.e., "GT" and "AG" for the donor and acceptor sites, respectively) in the middle. Each input sequence model is applied to the pretrained DNN model that determines the probability that an input is a splice site. The model consists of convolutional layers and bidirectional long short-term memory network layers. The pretraining and validation were conducted using the data set tested in previously reported methods. The performance evaluation results showed that the proposed method can outperform the previous methods. In addition, the pattern learned by the DNNs was visualized as position frequency matrices (PFMs). Some of PFMs were very similar to the consensus sequence. The trained DNN model and the brief source code for the prediction system are uploaded. Further improvement will be achieved following the further development of DNNs.
Investigating intra-host and intra-herd sequence diversity of foot-and-mouth disease virus.
King, David J; Freimanis, Graham L; Orton, Richard J; Waters, Ryan A; Haydon, Daniel T; King, Donald P
2016-10-01
Due to the poor-fidelity of the enzymes involved in RNA genome replication, foot-and-mouth disease (FMD) virus samples comprise of unique polymorphic populations. In this study, deep sequencing was utilised to characterise the diversity of FMD virus (FMDV) populations in 6 infected cattle present on a single farm during the series of outbreaks in the UK in 2007. A novel RT-PCR method was developed to amplify a 7.6kb nucleotide fragment encompassing the polyprotein coding region of the FMDV genome. Illumina sequencing of each sample identified the fine polymorphic structures at each nucleotide position, from consensus level changes to variants present at a 0.24% frequency. These data were used to investigate population dynamics of FMDV at both herd and host levels, evaluate the impact of host on the viral swarm structure and to identify transmission links with viruses recovered from other farms in the same series of outbreaks. In 7 samples, from 6 different animals, a total of 5 consensus level variants were identified, in addition to 104 sub-consensus variants of which 22 were shared between 2 or more animals. Further analysis revealed differences in swarm structures from samples derived from the same animal suggesting the presence of distinct viral populations evolving independently at different lesion sites within the same infected animal. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Non-Coding RNA Analysis Using the Rfam Database.
Kalvari, Ioanna; Nawrocki, Eric P; Argasinska, Joanna; Quinones-Olvera, Natalia; Finn, Robert D; Bateman, Alex; Petrov, Anton I
2018-06-01
Rfam is a database of non-coding RNA families in which each family is represented by a multiple sequence alignment, a consensus secondary structure, and a covariance model. Using a combination of manual and literature-based curation and a custom software pipeline, Rfam converts descriptions of RNA families found in the scientific literature into computational models that can be used to annotate RNAs belonging to those families in any DNA or RNA sequence. Valuable research outputs that are often locked up in figures and supplementary information files are encapsulated in Rfam entries and made accessible through the Rfam Web site. The data produced by Rfam have a broad application, from genome annotation to providing training sets for algorithm development. This article gives an overview of how to search and navigate the Rfam Web site, and how to annotate sequences with RNA families. The Rfam database is freely available at http://rfam.org. © 2018 by John Wiley & Sons, Inc. Copyright © 2018 John Wiley & Sons, Inc.
Cross-species inference of long non-coding RNAs greatly expands the ruminant transcriptome.
Bush, Stephen J; Muriuki, Charity; McCulloch, Mary E B; Farquhar, Iseabail L; Clark, Emily L; Hume, David A
2018-04-24
mRNA-like long non-coding RNAs (lncRNAs) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue-specificity and/or at specific developmental stages. Thus, in many cases lncRNA detection by RNA-sequencing (RNA-seq) is compromised by stochastic sampling. To account for this and create a catalogue of ruminant lncRNAs, we compared de novo assembled lncRNAs derived from large RNA-seq datasets in transcriptional atlas projects for sheep and goats with previous lncRNAs assembled in cattle and human. We then combined the novel lncRNAs with the sheep transcriptional atlas to identify co-regulated sets of protein-coding and non-coding loci. Few lncRNAs could be reproducibly assembled from a single dataset, even with deep sequencing of the same tissues from multiple animals. Furthermore, there was little sequence overlap between lncRNAs that were assembled from pooled RNA-seq data. We combined positional conservation (synteny) with cross-species mapping of candidate lncRNAs to identify a consensus set of ruminant lncRNAs and then used the RNA-seq data to demonstrate detectable and reproducible expression in each species. In sheep, 20 to 30% of lncRNAs were located close to protein-coding genes with which they are strongly co-expressed, which is consistent with the evolutionary origin of some ncRNAs in enhancer sequences. Nevertheless, most of the lncRNAs are not co-expressed with neighbouring protein-coding genes. Alongside substantially expanding the ruminant lncRNA repertoire, the outcomes of our analysis demonstrate that stochastic sampling can be partly overcome by combining RNA-seq datasets from related species. This has practical implications for the future discovery of lncRNAs in other species.
Mirarchi, Ferdinando L; Cooney, Timothy E; Venkat, Arvind; Wang, David; Pope, Thaddeus M; Fant, Abra L; Terman, Stanley A; Klauer, Kevin M; Williams-Murphy, Monica; Gisondi, Michael A; Clemency, Brian; Doshi, Ankur A; Siegel, Mari; Kraemer, Mary S; Aberger, Kate; Harman, Stephanie; Ahuja, Neera; Carlson, Jestin N; Milliron, Melody L; Hart, Kristopher K; Gilbertson, Chelsey D; Wilson, Jason W; Mueller, Larissa; Brown, Lori; Gordon, Bradley D
2017-06-01
End-of-life interventions should be predicated on consensus understanding of patient wishes. Written documents are not always understood; adding a video testimonial/message (VM) might improve clarity. Goals of this study were to (1) determine baseline rates of consensus in assigning code status and resuscitation decisions in critically ill scenarios and (2) determine whether adding a VM increases consensus. We randomly assigned 2 web-based survey links to 1366 faculty and resident physicians at institutions with graduate medical education programs in emergency medicine, family practice, and internal medicine. Each survey asked for code status interpretation of stand-alone Physician Orders for Life-Sustaining Treatment (POLST) and living will (LW) documents in 9 scenarios. Respondents assigned code status and resuscitation decisions to each scenario. For 1 of 2 surveys, a VM was included to help clarify patient wishes. Response rate was 54%, and most were male emergency physicians who lacked formal advanced planning document interpretation training. Consensus was not achievable for stand-alone POLST or LW documents (68%-78% noted "DNR"). Two of 9 scenarios attained consensus for code status (97%-98% responses) and treatment decisions (96%-99%). Adding a VM significantly changed code status responses by 9% to 62% (P ≤ 0.026) in 7 of 9 scenarios with 4 achieving consensus. Resuscitation responses changed by 7% to 57% (P ≤ 0.005) with 4 of 9 achieving consensus with VMs. For most scenarios, consensus was not attained for code status and resuscitation decisions with stand-alone LW and POLST documents. Adding VMs produced significant impacts toward achieving interpretive consensus.
Roca, Alberto I
2014-01-01
The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org.
Duret, Laurent; Cohen, Jean; Jubin, Claire; Dessen, Philippe; Goût, Jean-François; Mousset, Sylvain; Aury, Jean-Marc; Jaillon, Olivier; Noël, Benjamin; Arnaiz, Olivier; Bétermier, Mireille; Wincker, Patrick; Meyer, Eric; Sperling, Linda
2008-01-01
Ciliates are the only unicellular eukaryotes known to separate germinal and somatic functions. Diploid but silent micronuclei transmit the genetic information to the next sexual generation. Polyploid macronuclei express the genetic information from a streamlined version of the genome but are replaced at each sexual generation. The macronuclear genome of Paramecium tetraurelia was recently sequenced by a shotgun approach, providing access to the gene repertoire. The 72-Mb assembly represents a consensus sequence for the somatic DNA, which is produced after sexual events by reproducible rearrangements of the zygotic genome involving elimination of repeated sequences, precise excision of unique-copy internal eliminated sequences (IES), and amplification of the cellular genes to high copy number. We report use of the shotgun sequencing data (>106 reads representing 13× coverage of a completely homozygous clone) to evaluate variability in the somatic DNA produced by these developmental genome rearrangements. Although DNA amplification appears uniform, both of the DNA elimination processes produce sequence heterogeneity. The variability that arises from IES excision allowed identification of hundreds of putative new IESs, compared to 42 that were previously known, and revealed cases of erroneous excision of segments of coding sequences. We demonstrate that IESs in coding regions are under selective pressure to introduce premature termination of translation in case of excision failure. PMID:18256234
DOE Office of Scientific and Technical Information (OSTI.GOV)
Prody, C.A.; Zevin-Sonkin, D.; Gnatt, A.
1987-06-01
To study the primary structure and regulation of human cholinesterases, oligodeoxynucleotide probes were prepared according to a consensus peptide sequence present in the active site of both human serum pseudocholinesterase and Torpedo electric organ true acetylcholinesterase. Using these probes, the authors isolated several cDNA clones from lambdagt10 libraries of fetal brain and liver origins. These include 2.4-kilobase cDNA clones that code for a polypeptide containing a putative signal peptide and the N-terminal, active site, and C-terminal peptides of human BtChoEase, suggesting that they code either for BtChoEase itself or for a very similar but distinct fetal form of cholinesterase. Inmore » RNA blots of poly(A)/sup +/ RNA from the cholinesterase-producing fetal brain and liver, these cDNAs hybridized with a single 2.5-kilobase band. Blot hybridization to human genomic DNA revealed that these fetal BtChoEase cDNA clones hybridize with DNA fragments of the total length of 17.5 kilobases, and signal intensities indicated that these sequences are not present in many copies. Both the cDNA-encoded protein and its nucleotide sequence display striking homology to parallel sequences published for Torpedo AcChoEase. These finding demonstrate extensive homologies between the fetal BtChoEase encoded by these clones and other cholinesterases of various forms and species.« less
Gomes, S L; Gober, J W; Shapiro, L
1990-01-01
Caulobacter crescentus has a single dnaK gene that is highly homologous to the hsp70 family of heat shock genes. Analysis of the cloned and sequenced dnaK gene has shown that the deduced amino acid sequence could encode a protein of 67.6 kilodaltons that is 68% identical to the DnaK protein of Escherichia coli and 49% identical to the Drosophila and human hsp70 protein family. A partial open reading frame 165 base pairs 3' to the end of dnaK encodes a peptide of 190 amino acids that is 59% identical to DnaJ of E. coli. Northern blot analysis revealed a single 4.0-kilobase mRNA homologous to the cloned fragment. Since the dnaK coding region is 1.89 kilobases, dnaK and dnaJ may be transcribed as a polycistronic message. S1 mapping and primer extension experiments showed that transcription initiated at two sites 5' to the dnaK coding sequence. A single start site of transcription was identified during heat shock at 42 degrees C, and the predicted promoter sequence conformed to the consensus heat shock promoters of E. coli. At normal growth temperature (30 degrees C), a different start site was identified 3' to the heat shock start site that conformed to the E. coli sigma 70 promoter consensus sequence. S1 protection assays and analysis of expression of the dnaK gene fused to the lux transcription reporter gene showed that expression of dnaK is temporally controlled under normal physiological conditions and that transcription occurs just before the initiation of DNA replication. Thus, in both human cells (I. K. L. Milarski and R. I. Morimoto, Proc. Natl. Acad. Sci. USA 83:9517-9521, 1986) and in a simple bacterium, the transcription of a hsp70 gene is temporally controlled as a function of the cell cycle under normal growth conditions. Images PMID:2345134
Nucleotide sequences of two genomic DNAs encoding peroxidase of Arabidopsis thaliana.
Intapruk, C; Higashimura, N; Yamamoto, K; Okada, N; Shinmyo, A; Takano, M
1991-02-15
The peroxidase (EC 1.11.1.7)-encoding gene of Arabidopsis thaliana was screened from a genomic library using a cDNA encoding a neutral isozyme of horseradish, Armoracia rusticana, peroxidase (HRP) as a probe, and two positive clones were isolated. From the comparison with the sequences of the HRP-encoding genes, we concluded that two clones contained peroxidase-encoding genes, and they were named prxCa and prxEa. Both genes consisted of four exons and three introns; the introns had consensus nucleotides, GT and AG, at the 5' and 3' ends, respectively. The lengths of each putative exon of the prxEa gene were the same as those of the HRP-basic-isozyme-encoding gene, prxC3, and coded for 349 amino acids (aa) with a sequence homology of 89% to that encoded by prxC3. The prxCa gene was very close to the HRP-neutral-isozyme-encoding gene, prxC1b, and coded for 354 aa with 91% homology to that encoded by prxC1b. The aa sequence homology was 64% between the two peroxidases encoded by prxCa and prxEa.
Genomic Structure of the Luciferase Gene from the Bioluminescent Beetle, Nyctophila cf. Caucasica
Day, John C.; Chaichi, Mohammad J.; Najafil, Iraj; Whiteley, Andrew S.
2006-01-01
The gene coding for beetle luciferase, the enzyme responsible for bioluminescence in over two thousand coleopteran species has, to date, only been characterized from one Palearctic species of Lampyridae. Here we report the characterization of the luciferase gene from a female beetle of an Iranian lampyrid species, Nyctophila cf. caucasica (Coleoptera:Lampyridae). The luciferase gene was composed of seven exons, coding for 547 amino acids, separated by six introns spanning 1976 bp of genomic DNA. The deduced amino acid sequences of the luciferase gene of N. caucasica showed 98.9% homology to that of the Palearctic species Lampyris noctiluca. Analysis of the 810 bp upstream region of the luciferase gene revealed three TATA boxes and several other consensus transcriptional factor recognition sequences presenting evidence for a putative core promoter region conserved in Lampyrinae from -190 through to -155 upstream of the luciferase start codon. Along with the core promoter region the luciferase gene was compared with orthologous sequences from other lampyrid species and found to have greatest identity to Lampyris turkistanicus and Lampyris noctiluca. The significant sequence identity to the former is discussed in relation to taxonomic issues of Iranian lampyrids. PMID:20298115
Specific DNA binding of the two chicken Deformed family homeodomain proteins, Chox-1.4 and Chox-a.
Sasaki, H; Yokoyama, E; Kuroiwa, A
1990-01-01
The cDNA clones encoding two chicken Deformed (Dfd) family homeobox containing genes Chox-1.4 and Chox-a were isolated. Comparison of their amino acid sequences with another chicken Dfd family homeodomain protein and with those of mouse homologues revealed that strong homologies are located in the amino terminal regions and around the homeodomains. Although homologies in other regions were relatively low, some short conserved sequences were also identified. E. coli-made full length proteins were purified and used for the production of specific antibodies and for DNA binding studies. The binding profiles of these proteins to the 5'-leader and 5'-upstream sequences of Chox-1.4 and Chox-a coding regions were analyzed by immunoprecipitation and DNase I footprint assays. These two Chox proteins bound to the same sites in the 5'-flanking sequences of their coding regions with various affinities and their binding affinities to each site were nearly the same. The consensus sequences of the high and low affinity binding sites were TAATGA(C/G) and CTAATTTT, respectively. A clustered binding site was identified in the 5'-upstream of the Chox-a gene, suggesting that this clustered binding site works as a cis-regulatory element for auto- and/or cross-regulation of Chox-a gene expression. Images PMID:1970866
Haas, Brian J; Salzberg, Steven L; Zhu, Wei; Pertea, Mihaela; Allen, Jonathan E; Orvis, Joshua; White, Owen; Buell, C Robin; Wortman, Jennifer R
2008-01-01
EVidenceModeler (EVM) is presented as an automated eukaryotic gene structure annotation tool that reports eukaryotic gene structures as a weighted consensus of all available evidence. EVM, when combined with the Program to Assemble Spliced Alignments (PASA), yields a comprehensive, configurable annotation system that predicts protein-coding genes and alternatively spliced isoforms. Our experiments on both rice and human genome sequences demonstrate that EVM produces automated gene structure annotation approaching the quality of manual curation. PMID:18190707
Khrustalev, Vladislav Victorovich
2009-01-01
Guanine is the most mutable nucleotide in HIV genes because of frequently occurring G to A transitions, which are caused by cytosine deamination in viral DNA minus strands catalyzed by APOBEC enzymes. Distribution of guanine between three codon positions should influence the probability for G to A mutation to be nonsynonymous (to occur in first or second codon position). We discovered that nucleotide sequences of env genes coding for third variable regions (V3 loops) of gp120 from HIV1 and HIV2 have different kinds of guanine usage biases. In the HIV1 reference strain and 100 additionally analyzed HIV1 strains the guanine usage bias in V3 loop coding regions (2G>1G>3G) should lead to elevated nonsynonymous G to A transitions occurrence rates. In the HIV2 reference strain and 100 other HIV2 strains guanine usage bias in V3 loop coding regions (3G>2G>1G) should protect V3 loops from hypermutability. According to the HIV1 and HIV2 V3 alignment, insertion of the sequence enriched with 2G (21 codons in length) occurred during the evolution of HIV1 predecessor, while insertion of the different sequence enriched with 3G (19 codons in length) occurred during the evolution of HIV2 predecessor. The higher is the level of 3G in the V3 coding region, the lower should be the immune escaping mutation occurrence rates. This hypothesis was tested in this study by comparing the guanine usage in V3 loop coding regions from HIV1 fast and slow progressors. All calculations have been performed by our algorithms "VVK In length", "VVK Dinucleotides" and "VVK Consensus" (www.barkovsky.hotmail.ru).
Molecular cloning of crustins from the hemocytes of Brazilian penaeid shrimps.
Rosa, Rafael Diego; Bandeira, Paula Terra; Barracco, Margherita Anna
2007-09-01
Crustins are antimicrobial peptides initially identified in the hemocytes of the crab Carcinus maenas (11.5-kDa peptide or carcinin) and recently also recognized in penaeid shrimps and other crustacean species. The aim of this study was to identify sequences encoding for crustins from the hemocytes of four Brazilian penaeid species: Farfantepenaeus paulensis, Farfantepenaeus subtilis, Farfantepenaeus brasiliensis and Litopenaeus schmitti. Using primers based on consensus nucleotide alignment of crustins from different crustaceans, cDNA sequences coding for crustins in all indigenous penaeid species were amplified. The obtained four crustin sequences encoded for peptides containing a hydrophobic N-terminal region rich in glycine repeats and a C-terminal part with 12 cysteine residues and a conserved whey acidic protein domain. All obtained crustin sequences showed high amino acidic similarity among each other and with crustins from litopenaeid shrimps (76-98%). This is the first report of crustins in native Brazilian penaeid shrimps.
Aggarwal, Gautam; Worthey, E A; McDonagh, Paul D; Myler, Peter J
2003-06-07
Seattle Biomedical Research Institute (SBRI) as part of the Leishmania Genome Network (LGN) is sequencing chromosomes of the trypanosomatid protozoan species Leishmania major. At SBRI, chromosomal sequence is annotated using a combination of trained and untrained non-consensus gene-prediction algorithms with ARTEMIS, an annotation platform with rich and user-friendly interfaces. Here we describe a methodology used to import results from three different protein-coding gene-prediction algorithms (GLIMMER, TESTCODE and GENESCAN) into the ARTEMIS sequence viewer and annotation tool. Comparison of these methods, along with the CODONUSAGE algorithm built into ARTEMIS, shows the importance of combining methods to more accurately annotate the L. major genomic sequence. An improvised and powerful tool for gene prediction has been developed by importing data from widely-used algorithms into an existing annotation platform. This approach is especially fruitful in the Leishmania genome project where there is large proportion of novel genes requiring manual annotation.
Archaebacterial rhodopsin sequences: Implications for evolution
NASA Technical Reports Server (NTRS)
Lanyi, J. K.
1991-01-01
It was proposed over 10 years ago that the archaebacteria represent a separate kingdom which diverged very early from the eubacteria and eukaryotes. It follows that investigations of archaebacterial characteristics might reveal features of early evolution. So far, two genes, one for bacteriorhodopsin and another for halorhodopsin, both from Halobacterium halobium, have been sequenced. We cloned and sequenced the gene coding for the polypeptide of another one of these rhodopsins, a halorhodopsin in Natronobacterium pharaonis. Peptide sequencing of cyanogen bromide fragments, and immuno-reactions of the protein and synthetic peptides derived from the C-terminal gene sequence, confirmed that the open reading frame was the structural gene for the pharaonis halorhodopsin polypeptide. The flanking DNA sequences of this gene, as well as those of other bacterial rhodopsins, were compared to previously proposed archaebacterial consensus sequences. In pairwise comparisons of the open reading frame with DNA sequences for bacterio-opsin and halo-opsin from Halobacterium halobium, silent divergences were calculated. These indicate very considerable evolutionary distance between each pair of genes, even in the dame organism. In spite of this, three protein sequences show extensive similarities, indicating strong selective pressures.
2014-01-01
Background The 2013 BioVis Contest provided an opportunity to evaluate different paradigms for visualizing protein multiple sequence alignments. Such data sets are becoming extremely large and thus taxing current visualization paradigms. Sequence Logos represent consensus sequences but have limitations for protein alignments. As an alternative, ProfileGrids are a new protein sequence alignment visualization paradigm that represents an alignment as a color-coded matrix of the residue frequency occurring at every homologous position in the aligned protein family. Results The JProfileGrid software program was used to analyze the BioVis contest data sets to generate figures for comparison with the Sequence Logo reference images. Conclusions The ProfileGrid representation allows for the clear and effective analysis of protein multiple sequence alignments. This includes both a general overview of the conservation and diversity sequence patterns as well as the interactive ability to query the details of the protein residue distributions in the alignment. The JProfileGrid software is free and available from http://www.ProfileGrid.org. PMID:25237393
Takai, T; Nishita, Y; Iguchi-Ariga, S M; Ariga, H
1994-01-01
We have previously reported the human cDNA encoding MSSP-1, a sequence-specific double- and single-stranded DNA binding protein [Negishi, Nishita, Saëgusa, Kakizaki, Galli, Kihara, Tamai, Miyajima, Iguchi-Ariga and Ariga (1994) Oncogene, 9, 1133-1143]. MSSP-1 binds to a DNA replication origin/transcriptional enhancer of the human c-myc gene and has turned out to be identical with Scr2, a human protein which complements the defect of cdc2 kinase in S.pombe [Kataoka and Nojima (1994) Nucleic Acid Res., 22, 2687-2693]. We have cloned the cDNA for MSSP-2, another member of the MSSP family of proteins. The MSSP-2 cDNA shares highly homologous sequences with MSSP-1 cDNA, except for the insertion of 48 bp coding 16 amino acids near the C-terminus. Like MSSP-1, MSSP-2 has RNP-1 consensus sequences. The results of the experiments using bacterially expressed MSSP-2, and its deletion mutants, as histidine fusion proteins suggested that the binding specificity of MSSP-2 to double- and single-stranded DNA is the same as that of MSSP-1, and that the RNP consensus sequences are required for the DNA binding of the protein. MSSP-2 stimulated the DNA replication of an SV40-derived plasmid containing the binding sequence for MSSP-1 or -2. MSSP-2 is hence suggested to play an important role in regulation of DNA replication. Images PMID:7838710
NASA Astrophysics Data System (ADS)
Prasetyo, Afiono Agung; Dharmawan, Ruben; Sari, Yulia; Sariyatun, Ratna
2017-02-01
Human immunodeficiency virus type 1 (HIV-1) remains a cause of global health problem. Continuous studies of HIV-1 genetic and immunological profiles are important to find strategies against the virus. This study aimed to conduct analysis of sequence conservation, HLA-E-restricted peptide, and best-defined CTL/CD8+ epitopes in p24 (capsid) of HIV-1 subtype B worldwide. The p24-coding sequences from 3,557 HIV subtype B isolates were aligned using MUSCLE and analysed. Some highly conserved regions (sequence conservation ≥95%) were observed. Two considerably long series of sequences with conservation of 100% was observed at base 349-356 and 550-557 of p24 (HXB2 numbering). The consensus from all aligned isolates was precisely the same as consensus B in the Los Alamos HIV Database. The HLA-E-restricted peptide in amino acid (aa) 14-22 of HIV-1 p24 (AISPRTLNA) was found in 55.9% (1,987/3,557) of HIV-1 subtype B worldwide. Forty-four best-defined CTL/CD8+ epitopes were observed, in which VKNWMTETL epitope (aa 181-189 of p24) restricted by B*4801 was the most frequent, as found in 94.9% of isolates. The results of this study would contribute information about HIV-1 subtype B and benefits for further works willing to develop diagnostic and therapeutic strategies against the virus.
Prody, C A; Zevin-Sonkin, D; Gnatt, A; Goldberg, O; Soreq, H
1987-01-01
To study the primary structure and regulation of human cholinesterases, oligodeoxynucleotide probes were prepared according to a consensus peptide sequence present in the active site of both human serum pseudocholinesterase (BtChoEase; EC 3.1.1.8) and Torpedo electric organ "true" acetylcholinesterase (AcChoEase; EC 3.1.1.7). Using these probes, we isolated several cDNA clones from lambda gt10 libraries of fetal brain and liver origins. These include 2.4-kilobase cDNA clones that code for a polypeptide containing a putative signal peptide and the N-terminal, active site, and C-terminal peptides of human BtChoEase, suggesting that they code either for BtChoEase itself or for a very similar but distinct fetal form of cholinesterase. In RNA blots of poly(A)+ RNA from the cholinesterase-producing fetal brain and liver, these cDNAs hybridized with a single 2.5-kilobase band. Blot hybridization to human genomic DNA revealed that these fetal BtChoEase cDNA clones hybridize with DNA fragments of the total length of 17.5 kilobases, and signal intensities indicated that these sequences are not present in many copies. Both the cDNA-encoded protein and its nucleotide sequence display striking homology to parallel sequences published for Torpedo AcChoEase. These findings demonstrate extensive homologies between the fetal BtChoEase encoded by these clones and other cholinesterases of various forms and species. Images PMID:3035536
Singh, Vinod Kumar; Krishnamachari, Annangarachari
2016-09-01
Genome-wide experimental studies in Saccharomyces cerevisiae reveal that autonomous replicating sequence (ARS) requires an essential consensus sequence (ACS) for replication activity. Computational studies identified thousands of ACS like patterns in the genome. However, only a few hundreds of these sites act as replicating sites and the rest are considered as dormant or evolving sites. In a bid to understand the sequence makeup of replication sites, a content and context-based analysis was performed on a set of replicating ACS sequences that binds to origin-recognition complex (ORC) denoted as ORC-ACS and non-replicating ACS sequences (nrACS), that are not bound by ORC. In this study, DNA properties such as base composition, correlation, sequence dependent thermodynamic and DNA structural profiles, and their positions have been considered for characterizing ORC-ACS and nrACS. Analysis reveals that ORC-ACS depict marked differences in nucleotide composition and context features in its vicinity compared to nrACS. Interestingly, an A-rich motif was also discovered in ORC-ACS sequences within its nucleosome-free region. Profound changes in the conformational features, such as DNA helical twist, inclination angle and stacking energy between ORC-ACS and nrACS were observed. Distribution of ACS motifs in the non-coding segments points to the locations of ORC-ACS which are found far away from the adjacent gene start position compared to nrACS thereby enabling an accessible environment for ORC-proteins. Our attempt is novel in considering the contextual view of ACS and its flanking region along with nucleosome positioning in the S. cerevisiae genome and may be useful for any computational prediction scheme.
Link, Gerhard
1984-01-01
A nuclease-treated plastid extract from mustard (Sinapis alba L.) allows efficient transcription of cloned plastid DNA templates. In this in vitro system, the major runoff transcript of the truncated gene for the 32 000 mol. wt. photosystem II protein was accurately initiated from a site close to or identical with the in vivo start site. By using plasmids with deletions in the 5'-flanking region of this gene as templates, a DNA region required for efficient and selective initiation was detected ˜28-35 nucleotides upstream of the transcription start site. This region contains the sequence element TTGACA, which matches the consensus sequence for prokaryotic `−35' promoter elements. In the absence of this region, a region ˜13-27 nucleotides upstream of the start site still enables a basic level of specific transcription. This second region contains the sequence element TATATAA, which matches the consensus sequence for the `TATA' box of genes transcribed by RNA polymerase II (or B). The region between the `TATA'-like element and the transcription start site is not sufficient but may be required for specific transcription of the plastid gene. This latter region contains the sequence element TATACT, which resembles the prokaryotic `−10' (Pribnow) box. Based on the structural and transcriptional features of the 5' upstream region, a `promoter switch' mechanism is proposed, which may account for the developmentally regulated expression of this plastid gene. ImagesFig. 1.Fig. 2.Fig. 3.Fig. 4.Figure 5. PMID:16453540
Analysis of the regulatory region of the protease III (ptr) gene of Escherichia coli K-12.
Claverie-Martin, F; Diaz-Torres, M R; Kushner, S R
1987-01-01
The ptr gene of Escherichia coli encodes protease III (Mr 110,000) and a 50-kDa polypeptide, both of which are found in the periplasmic space. The gene is physically located between the recC and recB loci on the E. coli chromosome. The nucleotide sequence of a 1167-bp EcoRV-ClaI fragment of chromosomal DNA containing the promoter region and 885 bp of the ptr coding sequence has been determined. S1 nuclease mapping analysis showed that the major 5' end of the ptr mRNA was localized 127 bp upstream from the ATG start codon. The open reading frame (ORF), preceded by a Shine-Dalgarno sequence, extends to the end of the sequenced DNA. Downstream from the -35 and -10 regions is a sequence that strongly fits the consensus sequence of known nitrogen-regulated promoters. A signal peptide of 23 amino acids residues is present at the N terminus of the derived amino acid sequence. The cleavage site as well as the ORF were confirmed by sequencing the N terminus of mature protease III.
Jurka, Jerzy W.
1997-01-01
Enhanced homologous recombination is obtained by employing a consensus sequence which has been found to be associated with integration of repeat sequences, such as Alu and ID. The consensus sequence or sequence having a single transition mutation determines one site of a double break which allows for high efficiency of integration at the site. By introducing single or double stranded DNA having the consensus sequence flanking region joined to a sequence of interest, one can reproducibly direct integration of the sequence of interest at one or a limited number of sites. In this way, specific sites can be identified and homologous recombination achieved at the site by employing a second flanking sequence associated with a sequence proximal to the 3'-nick.
A Molecular Phylogeny of Hemiptera Inferred from Mitochondrial Genome Sequences
Song, Nan; Liang, Ai-Ping; Bu, Cui-Ping
2012-01-01
Classically, Hemiptera is comprised of two suborders: Homoptera and Heteroptera. Homoptera includes Cicadomorpha, Fulgoromorpha and Sternorrhyncha. However, according to previous molecular phylogenetic studies based on 18S rDNA, Fulgoromorpha has a closer relationship to Heteroptera than to other hemipterans, leaving Homoptera as paraphyletic. Therefore, the position of Fulgoromorpha is important for studying phylogenetic structure of Hemiptera. We inferred the evolutionary affiliations of twenty-five superfamilies of Hemiptera using mitochondrial protein-coding genes and rRNAs. We sequenced three mitogenomes, from Pyrops candelaria, Lycorma delicatula and Ricania marginalis, representing two additional families in Fulgoromorpha. Pyrops and Lycorma are representatives of an additional major family Fulgoridae in Fulgoromorpha, whereas Ricania is a second representative of the highly derived clade Ricaniidae. The organization and size of these mitogenomes are similar to those of the sequenced fulgoroid species. Our consensus phylogeny of Hemiptera largely supported the relationships (((Fulgoromorpha,Sternorrhyncha),Cicadomorpha),Heteroptera), and thus supported the classic phylogeny of Hemiptera. Selection of optimal evolutionary models (exclusion and inclusion of two rRNA genes or of third codon positions of protein-coding genes) demonstrated that rapidly evolving and saturated sites should be removed from the analyses. PMID:23144967
To Clone or Not To Clone: Method Analysis for Retrieving Consensus Sequences In Ancient DNA Samples
Winters, Misa; Barta, Jodi Lynn; Monroe, Cara; Kemp, Brian M.
2011-01-01
The challenges associated with the retrieval and authentication of ancient DNA (aDNA) evidence are principally due to post-mortem damage which makes ancient samples particularly prone to contamination from “modern” DNA sources. The necessity for authentication of results has led many aDNA researchers to adopt methods considered to be “gold standards” in the field, including cloning aDNA amplicons as opposed to directly sequencing them. However, no standardized protocol has emerged regarding the necessary number of clones to sequence, how a consensus sequence is most appropriately derived, or how results should be reported in the literature. In addition, there has been no systematic demonstration of the degree to which direct sequences are affected by damage or whether direct sequencing would provide disparate results from a consensus of clones. To address this issue, a comparative study was designed to examine both cloned and direct sequences amplified from ∼3,500 year-old ancient northern fur seal DNA extracts. Majority rules and the Consensus Confidence Program were used to generate consensus sequences for each individual from the cloned sequences, which exhibited damage at 31 of 139 base pairs across all clones. In no instance did the consensus of clones differ from the direct sequence. This study demonstrates that, when appropriate, cloning need not be the default method, but instead, should be used as a measure of authentication on a case-by-case basis, especially when this practice adds time and cost to studies where it may be superfluous. PMID:21738625
Amexis, Georgios; Rubin, Steven; Chatterjee, Nando; Carbone, Kathryn; Chumakov, Kostantin
2003-06-01
A single clinical isolate of mumps virus designated 88-1961 was obtained from a patient hospitalized with a clinical history of upper respiratory tract infection, parotitis, severe headache, fever and lymphadenopathy. We have sequenced the full-length genome of 88-1961 and compared it against all available full-length sequences of mumps virus. Based upon its nucleotide sequence of the SH gene 88-1961 was identified as a genotype H mumps strain. The overall extent of nucleotide and amino acid differences between each individual gene and protein of 88-1961 and the full-length mumps samples showed that the missense to silent ratios were unevenly distributed. Upon evaluation of the consensus sequence of 88-1961, four positions were found to be clearly heterogeneous at the nucleotide level (NP 315C/T, NP 318C/T, F 271A/C, and HN 855C/T). Sequence analysis revealed that the amino acid sequences for the NP, M, and the L protein were the most conserved, whereas the SH protein exhibited the highest variability among the compared mumps genotypes A, B, and G. No identifying molecular patterns in the non-coding (intergenic) or coding regions of 88-1961 were found when we compared it against relatively virulent (Urabe AM9 B, Glouc1/UK96, 87-1004 and 87-1005) and non-virulent mumps strains (Jeryl Lynn and all Urabe Am9 A substrains). Copyright 2003 Wiley-Liss, Inc.
RNA-Seq analysis and transcriptome assembly for blackberry (Rubus sp. Var. Lochness) fruit.
Garcia-Seco, Daniel; Zhang, Yang; Gutierrez-Mañero, Francisco J; Martin, Cathie; Ramos-Solano, Beatriz
2015-01-22
There is an increasing interest in berries, especially blackberries in the diet, because of recent reports of their health benefits due to their high content of flavonoids. A broad range of genomic tools are available for other Rosaceae species but these tools are still lacking in the Rubus genus, thus limiting gene discovery and the breeding of improved varieties. De novo RNA-seq of ripe blackberries grown under field conditions was performed using Illumina Hiseq 2000. Almost 9 billion nucleotide bases were sequenced in total. Following assembly, 42,062 consensus sequences were detected. For functional annotation, 33,040 (NR), 32,762 (NT), 21,932 (Swiss-Prot), 20,134 (KEGG), 13,676 (COG), 24,168 (GO) consensus sequences were annotated using different databases; in total 34,552 annotated sequences were identified. For protein prediction analysis, the number of coding DNA sequences (CDS) that mapped to the protein database was 32,540. Non redundant (NR), annotation showed that 25,418 genes (73.5%) has the highest similarity with Fragaria vesca subspecies vesca. Reanalysis was undertaken by aligning the reads with this reference genome for a deeper analysis of the transcriptome. We demonstrated that de novo assembly, using Trinity and later annotation with Blast using different databases, were complementary to alignment to the reference sequence using SOAPaligner/SOAP2. The Fragaria reference genome belongs to a species in the same family as blackberry (Rosaceae) but to a different genus. Since blackberries are tetraploids, the possibility of artefactual gene chimeras resulting from mis-assembly was tested with one of the genes sequenced by RNAseq, Chalcone Synthase (CHS). cDNAs encoding this protein were cloned and sequenced. Primers designed to the assembled sequences accurately distinguished different contigs, at least for chalcone synthase genes. We prepared and analysed transcriptome data from ripe blackberries, for which prior genomic information was limited. This new sequence information will improve the knowledge of this important and healthy fruit, providing an invaluable new tool for biological research.
Fernández-Caballero Rico, Jose Ángel; Chueca Porcuna, Natalia; Álvarez Estévez, Marta; Mosquera Gutiérrez, María Del Mar; Marcos Maeso, María Ángeles; García, Federico
2018-02-01
To show how to generate a consensus sequence from the information of massive parallel sequences data obtained from routine HIV anti-retroviral resistance studies, and that may be suitable for molecular epidemiology studies. Paired Sanger (Trugene-Siemens) and next-generation sequencing (NGS) (454 GSJunior-Roche) HIV RT and protease sequences from 62 patients were studied. NGS consensus sequences were generated using Mesquite, using 10%, 15%, and 20% thresholds. Molecular evolutionary genetics analysis (MEGA) was used for phylogenetic studies. At a 10% threshold, NGS-Sanger sequences from 17/62 patients were phylogenetically related, with a median bootstrap-value of 88% (IQR83.5-95.5). Association increased to 36/62 sequences, median bootstrap 94% (IQR85.5-98)], using a 15% threshold. Maximum association was at the 20% threshold, with 61/62 sequences associated, and a median bootstrap value of 99% (IQR98-100). A safe method is presented to generate consensus sequences from HIV-NGS data at 20% threshold, which will prove useful for molecular epidemiological studies. Copyright © 2016 Elsevier España, S.L.U. and Sociedad Española de Enfermedades Infecciosas y Microbiología Clínica. All rights reserved.
[Identification of Tibetan medicine "Dida" of Gentianaceae using DNA barcoding].
Liu, Chuan; Zhang, Yu-Xin; Liu, Yue; Chen, Yi-Long; Fan, Gang; Xiang, Li; Xu, Jiang; Zhang, Yi
2016-02-01
The ITS2 barcode was used toidentify Tibetan medicine "Dida", and tosecure its quality and safety in medication. A total of 13 species, 151 experimental samples for the study from the Tibetan Plateau, including Gentianaceae Swertia, Halenia, Gentianopsis, Comastoma, Lomatogonium ITS2 sequences were amplified, and purified PCR products were sequenced. Sequence assembly and consensus sequence generation were performed using the CodonCode Aligner V3.7.1. The Kimura 2-Parameter (K2P) distances were calculated using MEGA 6.0. The neighbor-joining (NJ) phylogenetic trees were constructed. There are 31 haplotypes among 231 bp after alignment of all ITS2 sequence haplotypes, and the average G±C content of 61.40%. The NJ tree strongly supported that every species clustered into their own clade and high identification success rate, except that Swertia bifolia and Swertia wolfangiana could not be distinguished from each other based on the sequence divergences. DNA barcoding could be used as a fast and accurate identification method to distinguish Tibetan medicine "Dida" to ensure its safe use. Copyright© by the Chinese Pharmaceutical Association.
A comparative analysis of exome capture.
Parla, Jennifer S; Iossifov, Ivan; Grabill, Ian; Spector, Mona S; Kramer, Melissa; McCombie, W Richard
2011-09-29
Human exome resequencing using commercial target capture kits has been and is being used for sequencing large numbers of individuals to search for variants associated with various human diseases. We rigorously evaluated the capabilities of two solution exome capture kits. These analyses help clarify the strengths and limitations of those data as well as systematically identify variables that should be considered in the use of those data. Each exome kit performed well at capturing the targets they were designed to capture, which mainly corresponds to the consensus coding sequences (CCDS) annotations of the human genome. In addition, based on their respective targets, each capture kit coupled with high coverage Illumina sequencing produced highly accurate nucleotide calls. However, other databases, such as the Reference Sequence collection (RefSeq), define the exome more broadly, and so not surprisingly, the exome kits did not capture these additional regions. Commercial exome capture kits provide a very efficient way to sequence select areas of the genome at very high accuracy. Here we provide the data to help guide critical analyses of sequencing data derived from these products.
Aboussekhra, A; Chanet, R; Zgaga, Z; Cassier-Chauvat, C; Heude, M; Fabre, F
1989-09-25
A new type of radiation-sensitive mutant of S. cerevisiae is described. The recessive radH mutation sensitizes to the lethal effect of UV radiations haploids in the G1 but not in the G2 mitotic phase. Homozygous diploids are as sensitive as G1 haploids. The UV-induced mutagenesis is depressed, while the induction of gene conversion is increased. The mutation is believed to channel the repair of lesions engaged in the mutagenic pathway into a recombination process, successful if the events involve sister-chromatids but lethal if they involve homologous chromosomes. The sequence of the RADH gene reveals that it may code for a DNA helicase, with a Mr of 134 kDa. All the consensus domains of known DNA helicases are present. Besides these consensus regions, strong homologies with the Rep and UvrD helicases of E. coli were found. The RadH putative helicase appears to belong to the set of proteins involved in the error-prone repair mechanism, at least for UV-induced lesions, and could act in coordination with the Rev3 error-prone DNA polymerase.
Transcriptome Analysis and Comparison of Marmota monax and Marmota himalayana.
Liu, Yanan; Wang, Baoju; Wang, Lu; Vikash, Vikash; Wang, Qin; Roggendorf, Michael; Lu, Mengji; Yang, Dongliang; Liu, Jia
2016-01-01
The Eastern woodchuck (Marmota monax) is a classical animal model for studying hepatitis B virus (HBV) infection and hepatocellular carcinoma (HCC) in humans. Recently, we found that Marmota himalayana, an Asian animal species closely related to Marmota monax, is susceptible to woodchuck hepatitis virus (WHV) infection and can be used as a new mammalian model for HBV infection. However, the lack of genomic sequence information of both Marmota models strongly limited their application breadth and depth. To address this major obstacle of the Marmota models, we utilized Illumina RNA-Seq technology to sequence the cDNA libraries of liver and spleen samples of two Marmota monax and four Marmota himalayana. In total, over 13 billion nucleotide bases were sequenced and approximately 1.5 billion clean reads were obtained. Following assembly, 106,496 consensus sequences of Marmota monax and 78,483 consensus sequences of Marmota himalayana were detected. For functional annotation, in total 73,603 Unigenes of Marmota monax and 78,483 Unigenes of Marmota himalayana were identified using different databases (NR, NT, Swiss-Prot, KEGG, COG, GO). The Unigenes were aligned by blastx to protein databases to decide the coding DNA sequences (CDS) and in total 41,247 CDS of Marmota monax and 34,033 CDS of Marmota himalayana were predicted. The single nucleotide polymorphisms (SNPs) and the simple sequence repeats (SSRs) were also analyzed for all Unigenes obtained. Moreover, a large-scale transcriptome comparison was performed and revealed a high similarity in transcriptome sequences between the two marmota species. Our study provides an extensive amount of novel sequence information for Marmota monax and Marmota himalayana. This information may serve as a valuable genomics resource for further molecular, developmental and comparative evolutionary studies, as well as for the identification and characterization of functional genes that are involved in WHV infection and HCC development in the woodchuck model.
Transcriptome Analysis and Comparison of Marmota monax and Marmota himalayana
Wang, Lu; Vikash, Vikash; Wang, Qin; Roggendorf, Michael; Lu, Mengji; Yang, Dongliang; Liu, Jia
2016-01-01
The Eastern woodchuck (Marmota monax) is a classical animal model for studying hepatitis B virus (HBV) infection and hepatocellular carcinoma (HCC) in humans. Recently, we found that Marmota himalayana, an Asian animal species closely related to Marmota monax, is susceptible to woodchuck hepatitis virus (WHV) infection and can be used as a new mammalian model for HBV infection. However, the lack of genomic sequence information of both Marmota models strongly limited their application breadth and depth. To address this major obstacle of the Marmota models, we utilized Illumina RNA-Seq technology to sequence the cDNA libraries of liver and spleen samples of two Marmota monax and four Marmota himalayana. In total, over 13 billion nucleotide bases were sequenced and approximately 1.5 billion clean reads were obtained. Following assembly, 106,496 consensus sequences of Marmota monax and 78,483 consensus sequences of Marmota himalayana were detected. For functional annotation, in total 73,603 Unigenes of Marmota monax and 78,483 Unigenes of Marmota himalayana were identified using different databases (NR, NT, Swiss-Prot, KEGG, COG, GO). The Unigenes were aligned by blastx to protein databases to decide the coding DNA sequences (CDS) and in total 41,247 CDS of Marmota monax and 34,033 CDS of Marmota himalayana were predicted. The single nucleotide polymorphisms (SNPs) and the simple sequence repeats (SSRs) were also analyzed for all Unigenes obtained. Moreover, a large-scale transcriptome comparison was performed and revealed a high similarity in transcriptome sequences between the two marmota species. Our study provides an extensive amount of novel sequence information for Marmota monax and Marmota himalayana. This information may serve as a valuable genomics resource for further molecular, developmental and comparative evolutionary studies, as well as for the identification and characterization of functional genes that are involved in WHV infection and HCC development in the woodchuck model. PMID:27806133
SSMART: Sequence-structure motif identification for RNA-binding proteins.
Munteanu, Alina; Mukherjee, Neelanjan; Ohler, Uwe
2018-06-11
RNA-binding proteins (RBPs) regulate every aspect of RNA metabolism and function. There are hundreds of RBPs encoded in the eukaryotic genomes, and each recognize its RNA targets through a specific mixture of RNA sequence and structure properties. For most RBPs, however, only a primary sequence motif has been determined, while the structure of the binding sites is uncharacterized. We developed SSMART, an RNA motif finder that simultaneously models the primary sequence and the structural properties of the RNA targets sites. The sequence-structure motifs are represented as consensus strings over a degenerate alphabet, extending the IUPAC codes for nucleotides to account for secondary structure preferences. Evaluation on synthetic data showed that SSMART is able to recover both sequence and structure motifs implanted into 3'UTR-like sequences, for various degrees of structured/unstructured binding sites. In addition, we successfully used SSMART on high-throughput in vivo and in vitro data, showing that we not only recover the known sequence motif, but also gain insight into the structural preferences of the RBP. Availability: SSMART is freely available at https://ohlerlab.mdc-berlin.de/software/SSMART_137/. Supplementary data are available at Bioinformatics online.
Galeano, Carlos H.; Fernandez, Andrea C.; Franco-Herrera, Natalia; Cichy, Karen A.; McClean, Phillip E.; Vanderleyden, Jos; Blair, Matthew W.
2011-01-01
Map-based cloning and fine mapping to find genes of interest and marker assisted selection (MAS) requires good genetic maps with reproducible markers. In this study, we saturated the linkage map of the intra-gene pool population of common bean DOR364×BAT477 (DB) by evaluating 2,706 molecular markers including SSR, SNP, and gene-based markers. On average the polymorphism rate was 7.7% due to the narrow genetic base between the parents. The DB linkage map consisted of 291 markers with a total map length of 1,788 cM. A consensus map was built using the core mapping populations derived from inter-gene pool crosses: DOR364×G19833 (DG) and BAT93×JALO EEP558 (BJ). The consensus map consisted of a total of 1,010 markers mapped, with a total map length of 2,041 cM across 11 linkage groups. On average, each linkage group on the consensus map contained 91 markers of which 83% were single copy markers. Finally, a synteny analysis was carried out using our highly saturated consensus maps compared with the soybean pseudo-chromosome assembly. A total of 772 marker sequences were compared with the soybean genome. A total of 44 syntenic blocks were identified. The linkage group Pv6 presented the most diverse pattern of synteny with seven syntenic blocks, and Pv9 showed the most consistent relations with soybean with just two syntenic blocks. Additionally, a co-linear analysis using common bean transcript map information against soybean coding sequences (CDS) revealed the relationship with 787 soybean genes. The common bean consensus map has allowed us to map a larger number of markers, to obtain a more complete coverage of the common bean genome. Our results, combined with synteny relationships provide tools to increase marker density in selected genomic regions to identify closely linked polymorphic markers for indirect selection, fine mapping or for positional cloning. PMID:22174773
Embedding strategies for effective use of information from multiple sequence alignments.
Henikoff, S.; Henikoff, J. G.
1997-01-01
We describe a new strategy for utilizing multiple sequence alignment information to detect distant relationships in searches of sequence databases. A single sequence representing a protein family is enriched by replacing conserved regions with position-specific scoring matrices (PSSMs) or consensus residues derived from multiple alignments of family members. In comprehensive tests of these and other family representations, PSSM-embedded queries produced the best results overall when used with a special version of the Smith-Waterman searching algorithm. Moreover, embedding consensus residues instead of PSSMs improved performance with readily available single sequence query searching programs, such as BLAST and FASTA. Embedding PSSMs or consensus residues into a representative sequence improves searching performance by extracting multiple alignment information from motif regions while retaining single sequence information where alignment is uncertain. PMID:9070452
Reformation of Regulatory Technical Standards for Nuclear Power Generation Equipments in Japan
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mikio Kurihara; Masahiro Aoki; Yu Maruyama
2006-07-01
Comprehensive reformation of the regulatory system has been introduced in Japan in order to apply recent technical progress in a timely manner. 'The Technical Standards for Nuclear Power Generation Equipments', known as the Ordinance No.622) of the Ministry of International Trade and Industry, which is used for detailed design, construction and operating stage of Nuclear Power Plants, was being modified to performance specifications with the consensus codes and standards being used as prescriptive specifications, in order to facilitate prompt review of the Ordinance with response to technological innovation. The activities on modification were performed by the Nuclear and Industrial Safetymore » Agency (NISA), the regulatory body in Japan, with support of the Japan Nuclear Energy Safety Organization (JNES), a technical support organization. The revised Ordinance No.62 was issued on July 1, 2005 and is enforced from January 1 2006. During the period from the issuance to the enforcement, JNES carried out to prepare enforceable regulatory guide which complies with each provisions of the Ordinance No.62, and also made technical assessment to endorse the applicability of consensus codes and standards, in response to NISA's request. Some consensus codes and standards were re-assessed since they were already used in regulatory review of the construction plan submitted by licensee. Other consensus codes and standards were newly assessed for endorsement. In case that proper consensus code or standards were not prepared, details of regulatory requirements were described in the regulatory guide as immediate measures. At the same time, appropriate standards developing bodies were requested to prepare those consensus code or standards. Supplementary note which provides background information on the modification, applicable examples etc. was prepared for convenience to the users of the Ordinance No. 62. This paper shows the activities on modification and the results, following the NISA's presentation at ICONE-13 that introduced the framework of the performance specifications and the modification process of the Ordinance NO. 62. (authors)« less
Multiple splicing defects in an intronic false exon.
Sun, H; Chasin, L A
2000-09-01
Splice site consensus sequences alone are insufficient to dictate the recognition of real constitutive splice sites within the typically large transcripts of higher eukaryotes, and large numbers of pseudoexons flanked by pseudosplice sites with good matches to the consensus sequences can be easily designated. In an attempt to identify elements that prevent pseudoexon splicing, we have systematically altered known splicing signals, as well as immediately adjacent flanking sequences, of an arbitrarily chosen pseudoexon from intron 1 of the human hprt gene. The substitution of a 5' splice site that perfectly matches the 5' consensus combined with mutation to match the CAG/G sequence of the 3' consensus failed to get this model pseudoexon included as the central exon in a dhfr minigene context. Provision of a real 3' splice site and a consensus 5' splice site and removal of an upstream inhibitory sequence were necessary and sufficient to confer splicing on the pseudoexon. This activated context also supported the splicing of a second pseudoexon sequence containing no apparent enhancer. Thus, both the 5' splice site sequence and the polypyrimidine tract of the pseudoexon are defective despite their good agreement with the consensus. On the other hand, the pseudoexon body did not exert a negative influence on splicing. The introduction into the pseudoexon of a sequence selected for binding to ASF/SF2 or its replacement with beta-globin exon 2 only partially reversed the effect of the upstream negative element and the defective polypyrimidine tract. These results support the idea that exon-bridging enhancers are not a prerequisite for constitutive exon definition and suggest that intrinsically defective splice sites and negative elements play important roles in distinguishing the real splicing signal from the vast number of false splicing signals.
Isolation and characterization of the gene coding for Escherichia coli arginyl-tRNA synthetase.
Eriani, G; Dirheimer, G; Gangloff, J
1989-01-01
The gene coding for Escherichia coli arginyl-tRNA synthetase (argS) was isolated as a fragment of 2.4 kb after analysis and subcloning of recombinant plasmids from the Clarke and Carbon library. The clone bearing the gene overproduces arginyl-tRNA synthetase by a factor 100. This means that the enzyme represents more than 20% of the cellular total protein content. Sequencing revealed that the fragment contains a unique open reading frame of 1734 bp flanked at its 5' and 3' ends respectively by 247 bp and 397 bp. The length of the corresponding protein (577 aa) is well consistent with earlier Mr determination (about 70 kd). Primer extension analysis of the ArgRS mRNA by reverse transcriptase, located its 5' end respectively at 8 and 30 nucleotides downstream of a TATA and a TTGAC like element (CTGAC) and 60 nucleotides upstream of the unusual translation initiation codon GUG; nuclease S1 analysis located the 3'-end at 48 bp downstream of the translation termination codon. argS has a codon usage pattern typical for highly expressed E. coli genes. With the exception of the presence of a HVGH sequence similar to the HIGH consensus element, ArgRS has no relevant sequence homologies with other aminoacyl-tRNA synthetases. Images PMID:2668891
Jerjos, Michael; Hohman, Baily; Lauterbur, M. Elise; Kistler, Logan
2017-01-01
Abstract Several taxonomically distinct mammalian groups—certain microbats and cetaceans (e.g., dolphins)—share both morphological adaptations related to echolocation behavior and strong signatures of convergent evolution at the amino acid level across seven genes related to auditory processing. Aye-ayes (Daubentonia madagascariensis) are nocturnal lemurs with a specialized auditory processing system. Aye-ayes tap rapidly along the surfaces of trees, listening to reverberations to identify the mines of wood-boring insect larvae; this behavior has been hypothesized to functionally mimic echolocation. Here we investigated whether there are signals of convergence in auditory processing genes between aye-ayes and known mammalian echolocators. We developed a computational pipeline (Basic Exon Assembly Tool) that produces consensus sequences for regions of interest from shotgun genomic sequencing data for nonmodel organisms without requiring de novo genome assembly. We reconstructed complete coding region sequences for the seven convergent echolocating bat–dolphin genes for aye-ayes and another lemur. We compared sequences from these two lemurs in a phylogenetic framework with those of bat and dolphin echolocators and appropriate nonecholocating outgroups. Our analysis reaffirms the existence of amino acid convergence at these loci among echolocating bats and dolphins; some methods also detected signals of convergence between echolocating bats and both mice and elephants. However, we observed no significant signal of amino acid convergence between aye-ayes and echolocating bats and dolphins, suggesting that aye-aye tap-foraging auditory adaptations represent distinct evolutionary innovations. These results are also consistent with a developing consensus that convergent behavioral ecology does not reliably predict convergent molecular evolution. PMID:28810710
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, S.I.; Wirth, D.F.
1988-06-01
The 5' ends of Leishmania mRNAs contain an identical 35-nucleotide sequence termed the spliced leader (SL) or 5' mini-exon. The SL sequence is at the 5' end of an 85-nucleotide primary transcript that contains a consensus eucaryotic 5' intron-exon splice junction immediately 3' to the SL. The SL is added to protein-coding genes immediately 3' to a consensus eucaryotic 3' intron-exon splice junction. The authors' previous work demonstrated possible intermediates in discontinuous mRNA processing that contain the 50 nucleotides of the SL primary transcript 3' to the SL, the SL intron sequence (SLIS). These RNAs have a 5' terminus atmore » the splice junction of the SL and the SLIS. The authors examined a Leishmania nuclear extract for these RNAs in ribonucleoprotein (RNP) particles. Density centrifugation analysis showed that the SL RNA is predominately in RNP complexes at 60S, while the SLIS-containing RNAs are in complexes at 40S. They also demonstrated that the SLIS can be released from polyadenylated RNA by incubation with a HeLa cell extract containing debranching enzymatic activity. These data suggested that Leishmania enriettii mRNAs are assembled by bimolecular or trans splicing as has been recently demonstrated for Trypanosoma brucei. Furthermore, they determined the partial sequence of the Leishmania U2 equivalent RNA and demonstrated that it cosediments with the SL RNA at 60S in a nuclear extract. These RNP particles may be analogous to so-called spliceosomes that have been demonstrated in other systems.« less
Kiryu, Hisanori; Kin, Taishin; Asai, Kiyoshi
2007-02-15
Recent transcriptomic studies have revealed the existence of a considerable number of non-protein-coding RNA transcripts in higher eukaryotic cells. To investigate the functional roles of these transcripts, it is of great interest to find conserved secondary structures from multiple alignments on a genomic scale. Since multiple alignments are often created using alignment programs that neglect the special conservation patterns of RNA secondary structures for computational efficiency, alignment failures can cause potential risks of overlooking conserved stem structures. We investigated the dependence of the accuracy of secondary structure prediction on the quality of alignments. We compared three algorithms that maximize the expected accuracy of secondary structures as well as other frequently used algorithms. We found that one of our algorithms, called McCaskill-MEA, was more robust against alignment failures than others. The McCaskill-MEA method first computes the base pairing probability matrices for all the sequences in the alignment and then obtains the base pairing probability matrix of the alignment by averaging over these matrices. The consensus secondary structure is predicted from this matrix such that the expected accuracy of the prediction is maximized. We show that the McCaskill-MEA method performs better than other methods, particularly when the alignment quality is low and when the alignment consists of many sequences. Our model has a parameter that controls the sensitivity and specificity of predictions. We discussed the uses of that parameter for multi-step screening procedures to search for conserved secondary structures and for assigning confidence values to the predicted base pairs. The C++ source code that implements the McCaskill-MEA algorithm and the test dataset used in this paper are available at http://www.ncrna.org/papers/McCaskillMEA/. Supplementary data are available at Bioinformatics online.
Li, Fan; Ma, Liying; Feng, Yi; Hu, Jing; Ni, Na; Ruan, Yuhua; Shao, Yiming
2017-06-01
HIV-1 transmission in intravenous drug users (IDUs) has been characterized by high genetic multiplicity and suggests a greater challenge for HIV-1 infection blocking. We investigated a total of 749 sequences of full-length gp160 gene obtained by single genome sequencing (SGS) from 22 HIV-1 early infected IDUs in Xinjiang province, northwest China, and generated a transmitted and founder virus (T/F virus) consensus sequence (IDU.CON). The T/F virus was classified as subtype CRF07_BC and predicted to be CCR5-tropic virus. The variable region (V1, V2, and V4 loop) of IDU.CON showed length variation compared with the heterosexual T/F virus consensus sequence (HSX.CON) and homosexual T/F virus consensus sequence (MSM.CON). A total of 26 N-linked glycosylation sites were discovered in the IDU.CON sequence, which is less than that of MSM.CON and HSX.CON. Characterization of T/F virus from IDUs highlights the genetic make-up and complexity of virus near the moment of transmission or in early infection preceding systemic dissemination and is important toward the development of an effective HIV-1 preventive methods, including vaccines.
Nucleotide sequence of the gag gene and gag-pol junction of feline leukemia virus.
Laprevotte, I; Hampe, A; Sherr, C J; Galibert, F
1984-01-01
The nucleotide sequence of the gag gene of feline leukemia virus and its flanking sequences were determined and compared with the corresponding sequences of two strains of feline sarcoma virus and with that of the Moloney strain of murine leukemia virus. A high degree of nucleotide sequence homology between the feline leukemia virus and murine leukemia virus gag genes was observed, suggesting that retroviruses of domestic cats and laboratory mice have a common, proximal evolutionary progenitor. The predicted structure of the complete feline leukemia virus gag gene precursor suggests that the translation of nonglycosylated and glycosylated gag gene polypeptides is initiated at two different AUG codons. These initiator codons fall in the same reading frame and are separated by a 222-base-pair segment which encodes an amino terminal signal peptide. The nucleotide sequence predicts the order of amino acids in each of the individual gag-coded proteins (p15, p12, p30, p10), all of which derive from the gag gene precursor. Stable stem-and-loop secondary structures are proposed for two regions of viral RNA. The first falls within sequences at the 5' end of the viral genome, together with adjacent palindromic sequences which may play a role in dimer linkage of RNA subunits. The second includes coding sequences at the gag-pol junction and is proposed to be involved in translation of the pol gene product. Sequence analysis of the latter region shows that the gag and pol genes are translated in different reading frames. Classical consensus splice donor and acceptor sequences could not be localized to regions which would permit synthesis of the expected gag-pol precursor protein. Alternatively, we suggest that the pol gene product (RNA-dependent DNA polymerase) could be translated by a frameshift suppressing mechanism which could involve cleavage modification of stems and loops in a manner similar to that observed in tRNA processing. PMID:6328019
Nelson, Christopher S; Fuller, Chris K; Fordyce, Polly M; Greninger, Alexander L; Li, Hao; DeRisi, Joseph L
2013-07-01
The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein's DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2 binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2's-binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved.
Nelson, Christopher S.; Fuller, Chris K.; Fordyce, Polly M.; Greninger, Alexander L.; Li, Hao; DeRisi, Joseph L.
2013-01-01
The transcription factor forkhead box P2 (FOXP2) is believed to be important in the evolution of human speech. A mutation in its DNA-binding domain causes severe speech impairment. Humans have acquired two coding changes relative to the conserved mammalian sequence. Despite intense interest in FOXP2, it has remained an open question whether the human protein’s DNA-binding specificity and chromatin localization are conserved. Previous in vitro and ChIP-chip studies have provided conflicting consensus sequences for the FOXP2-binding site. Using MITOMI 2.0 microfluidic affinity assays, we describe the binding site of FOXP2 and its affinity profile in base-specific detail for all substitutions of the strongest binding site. We find that human and chimp FOXP2 have similar binding sites that are distinct from previously suggested consensus binding sites. Additionally, through analysis of FOXP2 ChIP-seq data from cultured neurons, we find strong overrepresentation of a motif that matches our in vitro results and identifies a set of genes with FOXP2 binding sites. The FOXP2-binding sites tend to be conserved, yet we identified 38 instances of evolutionarily novel sites in humans. Combined, these data present a comprehensive portrait of FOXP2’s-binding properties and imply that although its sequence specificity has been conserved, some of its genomic binding sites are newly evolved. PMID:23625967
DOE Office of Scientific and Technical Information (OSTI.GOV)
Burkhardt E.; Adham, I.M.; Brosig, B.
1994-03-01
Leydig insulin-like protein (LEY I-L) is a member of the insulin-like hormone superfamily. The LEY I-L gene (designated INSL3) is expressed exclusively in prenatal and postnatal Leydig cells. The authors report here the cloning and nucleotide sequence of porcine and human LEY I-L genes including the 5[prime] regions. Both genes consist of two exons and one intron. The organization of the LEY I-L gene is similar to that of insulin and relaxin. The transcription start site in the porcine and human LEY I-L gene is localized 13 and 14 bp upstream of the translation start site, respectively. Alignment of themore » 5[prime] flanking regions of both genes reveals that the first 107 nucleotides upstream of the transcription start site exhibit an overall sequence similarity of 80%. This conserved region contains a consensus TATAA box, a CAAT-like element (GAAT), and a consensus SP1 sequence (GGGCGG) at equivalent positions in both genes and therefore may play a role in regulation of expression of the LEY I-L gene. The porcine and human genome contains a single copy of the LEY I-L gene. By in situ hybridization, the human gene was assigned to bands p13.2-p12 of the short arm of chromosome 19. 25 refs., 6 figs.« less
Structure and genomic organization of the human B1 receptor gene for kinins (BDKRB1).
Bachvarov, D R; Hess, J F; Menke, J G; Larrivée, J F; Marceau, F
1996-05-01
Two subtypes of mammalian bradykinin receptors, B1 and B2 (BDKRB1 and BDKRB2), have been defined based on their pharmacological properties. The B1 type kinin receptors have weak affinity for intact BK or Lys-BK but strong affinity for kinin metabolites without the C-terminal arginine (e.g., des-Arg9-BK and Lys-des-Arg9-BK, also called des-Arg10-kallidin), which are generated by kininase I. The B1 receptor expression is up-regulated following tissue injury and inflammation (hyperemia, exudation, hyperalgesia, etc.). In the present study, we have cloned and sequenced the gene encoding human B1 receptor from a human genomic library. The human B1 receptor gene contains three exons separated by two introns. The first and the second exon are noncoding, while the coding region and the 3'-flanking region are located entirely on the third exon. The exon-intron arrangement of the human B1 receptor gene shows significant similarity with the genes encoding the B2 receptor subtype in human, mouse, and rat. Sequence analysis of the 5'-flanking region revealed the presence of a consensus TATA box and of numerous candidate transcription factor binding sequences. Primer extension experiments have shown the existence of multiple transcription initiation sites situated downstream and upstream from the consensus TATA box. Genomic Southern blot analysis indicated that the human B1 receptor is encoded by a single-copy gene.
Zhou, Gaofeng; Jian, Jianbo; Wang, Penghao; Li, Chengdao; Tao, Ye; Li, Xuan; Renshaw, Daniel; Clements, Jonathan; Sweetingham, Mark; Yang, Huaan
2018-01-01
An ultra-high density genetic map containing 34,574 sequence-defined markers was developed in Lupinus angustifolius. Markers closely linked to nine genes of agronomic traits were identified. A physical map was improved to cover 560.5 Mb genome sequence. Lupin (Lupinus angustifolius L.) is a recently domesticated legume grain crop. In this study, we applied the restriction-site associated DNA sequencing (RADseq) method to genotype an F 9 recombinant inbred line population derived from a wild type × domesticated cultivar (W × D) cross. A high density linkage map was developed based on the W × D population. By integrating sequence-defined DNA markers reported in previous mapping studies, we established an ultra-high density consensus genetic map, which contains 34,574 markers consisting of 3508 loci covering 2399 cM on 20 linkage groups. The largest gap in the entire consensus map was 4.73 cM. The high density W × D map and the consensus map were used to develop an improved physical map, which covered 560.5 Mb of genome sequence data. The ultra-high density consensus linkage map, the improved physical map and the markers linked to genes of breeding interest reported in this study provide a common tool for genome sequence assembly, structural genomics, comparative genomics, functional genomics, QTL mapping, and molecular plant breeding in lupin.
Listorti, Valeria; Laconi, Andrea; Catelli, Elena; Cecchinato, Mattia; Lupini, Caterina; Naylor, Clive J
2017-10-09
IBV genotype QX causes sufficient disease in Europe for several commercial companies to have started developing live attenuated vaccines. Here, one of those vaccines (L1148) was fully consensus sequenced alongside its progenitor field strain (1148-A) to determine vaccine markers, thereby enabling detection on farms. Twenty-eight single nucleotide substitutions were associated with the 1148-A attenuation, of which any combination can identify vaccine L1148 in the field. Sixteen substitutions resulted in amino acid coding changes of which half were in spike. One change in the 1b gene altered the normally highly conserved final 5 nucleotides of the transcription regulatory sequence of the S gene, common to all IBV QX genes. No mutations can currently be associated with the attenuation process. Field vaccination strategies would greatly benefit by such comparative sequence data being mandatorily submitted to regulators prior to vaccine release following a successful registration process. Copyright © 2017. Published by Elsevier Ltd.
Structure of the coding region and mRNA variants of the apyrase gene from pea (Pisum sativum)
NASA Technical Reports Server (NTRS)
Shibata, K.; Abe, S.; Davies, E.
2001-01-01
Partial amino acid sequences of a 49 kDa apyrase (ATP diphosphohydrolase, EC 3.6.1.5) from the cytoskeletal fraction of etiolated pea stems were used to derive oligonucleotide DNA primers to generate a cDNA fragment of pea apyrase mRNA by RT-PCR and these primers were used to screen a pea stem cDNA library. Two almost identical cDNAs differing in just 6 nucleotides within the coding regions were found, and these cDNA sequences were used to clone genomic fragments by PCR. Two nearly identical gene fragments containing 8 exons and 7 introns were obtained. One of them (H-type) encoded the mRNA sequence described by Hsieh et al. (1996) (DDBJ/EMBL/GenBank Z32743), while the other (S-type) differed by the same 6 nucleotides as the mRNAs, suggesting that these genes may be alleles. The six nucleotide differences between these two alleles were found solely in the first exon, and these mutation sites had two types of consensus sequences. These mRNAs were found with varying lengths of 3' untranslated regions (3'-UTR). There are some similarities between the 3'-UTR of these mRNAs and those of actin and actin binding proteins in plants. The putative roles of the 3'-UTR and alternative polyadenylation sites are discussed in relation to their possible role in targeting the mRNAs to different subcellular compartments.
Allison, Kimberly H; Reisch, Lisa M; Carney, Patricia A; Weaver, Donald L; Schnitt, Stuart J; O’Malley, Frances P; Geller, Berta M; Elmore, Joann G
2015-01-01
Aims To gain a better understanding of the reasons for diagnostic variability, with the aim of reducing the phenomenon. Methods and results In preparation for a study on the interpretation of breast specimens (B-PATH), a panel of three experienced breast pathologists reviewed 336 cases to develop consensus reference diagnoses. After independent assessment, cases coded as diagnostically discordant were discussed at consensus meetings. By the use of qualitative data analysis techniques, transcripts of 16 h of consensus meetings for a subset of 201 cases were analysed. Diagnostic variability could be attributed to three overall root causes: (i) pathologist-related; (ii) diagnostic coding/study methodology-related; and (iii) specimen-related. Most pathologist-related root causes were attributable to professional differences in pathologists’ opinions about whether the diagnostic criteria for a specific diagnosis were met, most frequently in cases of atypia. Diagnostic coding/study methodology-related root causes were primarily miscategorizations of descriptive text diagnoses, which led to the development of a standardized electronic diagnostic form (BPATH-Dx). Specimen-related root causes included artefacts, limited diagnostic material, and poor slide quality. After re-review and discussion, a consensus diagnosis could be assigned in all cases. Conclusions Diagnostic variability is related to multiple factors, but consensus conferences, standardized electronic reporting formats and comments on suboptimal specimen quality can be used to reduce diagnostic variability. PMID:24511905
The influence of ignoring secondary structure on divergence time estimates from ribosomal RNA genes.
Dohrmann, Martin
2014-02-01
Genes coding for ribosomal RNA molecules (rDNA) are among the most popular markers in molecular phylogenetics and evolution. However, coevolution of sites that code for pairing regions (stems) in the RNA secondary structure can make it challenging to obtain accurate results from such loci. While the influence of ignoring secondary structure on multiple sequence alignment and tree topology has been investigated in numerous studies, its effect on molecular divergence time estimates is still poorly known. Here, I investigate this issue in Bayesian Markov Chain Monte Carlo (BMCMC) and penalized likelihood (PL) frameworks, using empirical datasets from dragonflies (Odonata: Anisoptera) and glass sponges (Porifera: Hexactinellida). My results indicate that highly biased inferences under substitution models that ignore secondary structure only occur if maximum-likelihood estimates of branch lengths are used as input to PL dating, whereas in a BMCMC framework and in PL dating based on Bayesian consensus branch lengths, the effect is far less severe. I conclude that accounting for coevolution of paired sites in molecular dating studies is not as important as previously suggested, as long as the estimates are based on Bayesian consensus branch lengths instead of ML point estimates. This finding is especially relevant for studies where computational limitations do not allow the use of secondary-structure specific substitution models, or where accurate consensus structures cannot be predicted. I also found that the magnitude and direction (over- vs. underestimating node ages) of bias in age estimates when secondary structure is ignored was not distributed randomly across the nodes of the phylogenies, a phenomenon that requires further investigation. Copyright © 2013 Elsevier Inc. All rights reserved.
Kuhn, Jens H.; Andersen, Kristian G.; Bào, Yīmíng; Bavari, Sina; Becker, Stephan; Bennett, Richard S.; Bergman, Nicholas H.; Blinkova, Olga; Bradfute, Steven; Brister, J. Rodney; Bukreyev, Alexander; Chandran, Kartik; Chepurnov, Alexander A.; Davey, Robert A.; Dietzgen, Ralf G.; Doggett, Norman A.; Dolnik, Olga; Dye, John M.; Enterlein, Sven; Fenimore, Paul W.; Formenty, Pierre; Freiberg, Alexander N.; Garry, Robert F.; Garza, Nicole L.; Gire, Stephen K.; Gonzalez, Jean-Paul; Griffiths, Anthony; Happi, Christian T.; Hensley, Lisa E.; Herbert, Andrew S.; Hevey, Michael C.; Hoenen, Thomas; Honko, Anna N.; Ignatyev, Georgy M.; Jahrling, Peter B.; Johnson, Joshua C.; Johnson, Karl M.; Kindrachuk, Jason; Klenk, Hans-Dieter; Kobinger, Gary; Kochel, Tadeusz J.; Lackemeyer, Matthew G.; Lackner, Daniel F.; Leroy, Eric M.; Lever, Mark S.; Mühlberger, Elke; Netesov, Sergey V.; Olinger, Gene G.; Omilabu, Sunday A.; Palacios, Gustavo; Panchal, Rekha G.; Park, Daniel J.; Patterson, Jean L.; Paweska, Janusz T.; Peters, Clarence J.; Pettitt, James; Pitt, Louise; Radoshitzky, Sheli R.; Ryabchikova, Elena I.; Saphire, Erica Ollmann; Sabeti, Pardis C.; Sealfon, Rachel; Shestopalov, Aleksandr M.; Smither, Sophie J.; Sullivan, Nancy J.; Swanepoel, Robert; Takada, Ayato; Towner, Jonathan S.; van der Groen, Guido; Volchkov, Viktor E.; Volchkova, Valentina A.; Wahl-Jensen, Victoria; Warren, Travis K.; Warfield, Kelly L.; Weidmann, Manfred; Nichol, Stuart T.
2014-01-01
Sequence determination of complete or coding-complete genomes of viruses is becoming common practice for supporting the work of epidemiologists, ecologists, virologists, and taxonomists. Sequencing duration and costs are rapidly decreasing, sequencing hardware is under modification for use by non-experts, and software is constantly being improved to simplify sequence data management and analysis. Thus, analysis of virus disease outbreaks on the molecular level is now feasible, including characterization of the evolution of individual virus populations in single patients over time. The increasing accumulation of sequencing data creates a management problem for the curators of commonly used sequence databases and an entry retrieval problem for end users. Therefore, utilizing the data to their fullest potential will require setting nomenclature and annotation standards for virus isolates and associated genomic sequences. The National Center for Biotechnology Information’s (NCBI’s) RefSeq is a non-redundant, curated database for reference (or type) nucleotide sequence records that supplies source data to numerous other databases. Building on recently proposed templates for filovirus variant naming [
Collins, Richard A; Stajich, Jason E; Field, Deborah J; Olive, Joan E; DeAbreu, Diane M
2015-05-01
When we expressed a small (0.9 kb) nonprotein-coding transcript derived from the mitochondrial VS plasmid in the nucleus of Neurospora we found that it was efficiently spliced at one or more of eight 5' splice sites and ten 3' splice sites, which are present apparently by chance in the sequence. Further experimental and bioinformatic analyses of other mitochondrial plasmids, random sequences, and natural nuclear genes in Neurospora and other fungi indicate that fungal spliceosomes recognize a wide range of 5' splice site and branchpoint sequences and predict introns to be present at high frequency in random sequence. In contrast, analysis of intronless fungal nuclear genes indicates that branchpoint, 5' splice site and 3' splice site consensus sequences are underrepresented compared with random sequences. This underrepresentation of splicing signals is sufficient to deplete the nuclear genome of splice sites at locations that do not comprise biologically relevant introns. Thus, the splicing machinery can recognize a wide range of splicing signal sequences, but splicing still occurs with great accuracy, not because the splicing machinery distinguishes correct from incorrect introns, but because incorrect introns are substantially depleted from the genome. © 2015 Collins et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Bankoff, Richard J; Jerjos, Michael; Hohman, Baily; Lauterbur, M Elise; Kistler, Logan; Perry, George H
2017-07-01
Several taxonomically distinct mammalian groups-certain microbats and cetaceans (e.g., dolphins)-share both morphological adaptations related to echolocation behavior and strong signatures of convergent evolution at the amino acid level across seven genes related to auditory processing. Aye-ayes (Daubentonia madagascariensis) are nocturnal lemurs with a specialized auditory processing system. Aye-ayes tap rapidly along the surfaces of trees, listening to reverberations to identify the mines of wood-boring insect larvae; this behavior has been hypothesized to functionally mimic echolocation. Here we investigated whether there are signals of convergence in auditory processing genes between aye-ayes and known mammalian echolocators. We developed a computational pipeline (Basic Exon Assembly Tool) that produces consensus sequences for regions of interest from shotgun genomic sequencing data for nonmodel organisms without requiring de novo genome assembly. We reconstructed complete coding region sequences for the seven convergent echolocating bat-dolphin genes for aye-ayes and another lemur. We compared sequences from these two lemurs in a phylogenetic framework with those of bat and dolphin echolocators and appropriate nonecholocating outgroups. Our analysis reaffirms the existence of amino acid convergence at these loci among echolocating bats and dolphins; some methods also detected signals of convergence between echolocating bats and both mice and elephants. However, we observed no significant signal of amino acid convergence between aye-ayes and echolocating bats and dolphins, suggesting that aye-aye tap-foraging auditory adaptations represent distinct evolutionary innovations. These results are also consistent with a developing consensus that convergent behavioral ecology does not reliably predict convergent molecular evolution. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Fujibuchi, Wataru; Anderson, John S. J.; Landsman, David
2001-01-01
Consensus pattern and matrix-based searches designed to predict cis-acting transcriptional regulatory sequences have historically been subject to large numbers of false positives. We sought to decrease false positives by incorporating expression profile data into a consensus pattern-based search method. We have systematically analyzed the expression phenotypes of over 6000 yeast genes, across 121 expression profile experiments, and correlated them with the distribution of 14 known regulatory elements over sequences upstream of the genes. Our method is based on a metric we term probabilistic element assessment (PEA), which is a ranking of potential sites based on sequence similarity in the upstream regions of genes with similar expression phenotypes. For eight of the 14 known elements that we examined, our method had a much higher selectivity than a naïve consensus pattern search. Based on our analysis, we have developed a web-based tool called PROSPECT, which allows consensus pattern-based searching of gene clusters obtained from microarray data. PMID:11574681
Chutiwitoonchai, Nopporn; Kakisaka, Michinori; Yamada, Kazunori; Aida, Yoko
2014-01-01
The assembly of influenza virus progeny virions requires machinery that exports viral genomic ribonucleoproteins from the cell nucleus. Currently, seven nuclear export signal (NES) consensus sequences have been identified in different viral proteins, including NS1, NS2, M1, and NP. The present study examined the roles of viral NES consensus sequences and their significance in terms of viral replication and nuclear export. Mutation of the NP-NES3 consensus sequence resulted in a failure to rescue viruses using a reverse genetics approach, whereas mutation of the NS2-NES1 and NS2-NES2 sequences led to a strong reduction in viral replication kinetics compared with the wild-type sequence. While the viral replication kinetics for other NES mutant viruses were also lower than those of the wild-type, the difference was not so marked. Immunofluorescence analysis after transient expression of NP-NES3, NS2-NES1, or NS2-NES2 proteins in host cells showed that they accumulated in the cell nucleus. These results suggest that the NP-NES3 consensus sequence is mostly required for viral replication. Therefore, each of the hydrophobic (Φ) residues within this NES consensus sequence (Φ1, Φ2, Φ3, or Φ4) was mutated, and its viral replication and nuclear export function were analyzed. No viruses harboring NP-NES3 Φ2 or Φ3 mutants could be rescued. Consistent with this, the NP-NES3 Φ2 and Φ3 mutants showed reduced binding affinity with CRM1 in a pull-down assay, and both accumulated in the cell nucleus. Indeed, a nuclear export assay revealed that these mutant proteins showed lower nuclear export activity than the wild-type protein. Moreover, the Φ2 and Φ3 residues (along with other Φ residues) within the NP-NES3 consensus were highly conserved among different influenza A viruses, including human, avian, and swine. Taken together, these results suggest that the Φ2 and Φ3 residues within the NP-NES3 protein are important for its nuclear export function during viral replication.
Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.
Rehm, Charlotte; Wurmthaler, Lena A; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S
2015-01-01
In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1-5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6-9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes show putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria.
Investigation of a Quadruplex-Forming Repeat Sequence Highly Enriched in Xanthomonas and Nostoc sp.
Rehm, Charlotte; Wurmthaler, Lena A.; Li, Yuanhao; Frickey, Tancred; Hartig, Jörg S.
2015-01-01
In prokaryotes simple sequence repeats (SSRs) with unit sizes of 1–5 nucleotides (nt) are causative for phase and antigenic variation. Although an increased abundance of heptameric repeats was noticed in bacteria, reports about SSRs of 6–9 nt are rare. In particular G-rich repeat sequences with the propensity to fold into G-quadruplex (G4) structures have received little attention. In silico analysis of prokaryotic genomes show putative G4 forming sequences to be abundant. This report focuses on a surprisingly enriched G-rich repeat of the type GGGNATC in Xanthomonas and cyanobacteria such as Nostoc. We studied in detail the genomes of Xanthomonas campestris pv. campestris ATCC 33913 (Xcc), Xanthomonas axonopodis pv. citri str. 306 (Xac), and Nostoc sp. strain PCC7120 (Ana). In all three organisms repeats are spread all over the genome with an over-representation in non-coding regions. Extensive variation of the number of repetitive units was observed with repeat numbers ranging from two up to 26 units. However a clear preference for four units was detected. The strong bias for four units coincides with the requirement of four consecutive G-tracts for G4 formation. Evidence for G4 formation of the consensus repeat sequences was found in biophysical studies utilizing CD spectroscopy. The G-rich repeats are preferably located between aligned open reading frames (ORFs) and are under-represented in coding regions or between divergent ORFs. The G-rich repeats are preferentially located within a distance of 50 bp upstream of an ORF on the anti-sense strand or within 50 bp from the stop codon on the sense strand. Analysis of whole transcriptome sequence data showed that the majority of repeat sequences are transcribed. The genetic loci in the vicinity of repeat regions show increased genomic stability. In conclusion, we introduce and characterize a special class of highly abundant and wide-spread quadruplex-forming repeat sequences in bacteria. PMID:26695179
Palzkill, T G; Oliver, S G; Newlon, C S
1986-01-01
Four fragments of Saccharomyces cerevisiae chromosome III DNA which carry ARS elements have been sequenced. Each fragment contains multiple copies of sequences that have at least 10 out of 11 bases of homology to a previously reported 11 bp core consensus sequence. A survey of these new ARS sequences and previously reported sequences revealed the presence of an additional 11 bp conserved element located on the 3' side of the T-rich strand of the core consensus. Subcloning analysis as well as deletion and transposon insertion mutagenesis of ARS fragments support a role for 3' conserved sequence in promoting ARS activity. PMID:3529036
The Functional Human C-Terminome
Hedden, Michael; Lyon, Kenneth F.; Brooks, Steven B.; David, Roxanne P.; Limtong, Justin; Newsome, Jacklyn M.; Novakovic, Nemanja; Rajasekaran, Sanguthevar; Thapar, Vishal; Williams, Sean R.; Schiller, Martin R.
2016-01-01
All translated proteins end with a carboxylic acid commonly called the C-terminus. Many short functional sequences (minimotifs) are located on or immediately proximal to the C-terminus. However, information about the function of protein C-termini has not been consolidated into a single source. Here, we built a new “C-terminome” database and web system focused on human proteins. Approximately 3,600 C-termini in the human proteome have a minimotif with an established molecular function. To help evaluate the function of the remaining C-termini in the human proteome, we inferred minimotifs identified by experimentation in rodent cells, predicted minimotifs based upon consensus sequence matches, and predicted novel highly repetitive sequences in C-termini. Predictions can be ranked by enrichment scores or Gene Evolutionary Rate Profiling (GERP) scores, a measurement of evolutionary constraint. By searching for new anchored sequences on the last 10 amino acids of proteins in the human proteome with lengths between 3–10 residues and up to 5 degenerate positions in the consensus sequences, we have identified new consensus sequences that predict instances in the majority of human genes. All of this information is consolidated into a database that can be accessed through a C-terminome web system with search and browse functions for minimotifs and human proteins. A known consensus sequence-based predicted function is assigned to nearly half the proteins in the human proteome. Weblink: http://cterminome.bio-toolkit.com. PMID:27050421
Mull, Hillary J; Graham, Laura A; Morris, Melanie S; Rosen, Amy K; Richman, Joshua S; Whittle, Jeffery; Burns, Edith; Wagner, Todd H; Copeland, Laurel A; Wahl, Tyler; Jones, Caroline; Hollis, Robert H; Itani, Kamal M F; Hawn, Mary T
2018-04-18
Postoperative readmission data are used to measure hospital performance, yet the extent to which these readmissions reflect surgical quality is unknown. To establish expert consensus on whether reasons for postoperative readmission are associated with the quality of surgery in the index admission. In a modified Delphi process, a panel of 14 experts in medical and surgical readmissions comprising physicians and nonphysicians from Veterans Affairs (VA) and private-sector institutions reviewed 30-day postoperative readmissions from fiscal years 2008 through 2014 associated with inpatient surgical procedures performed at a VA medical center between October 1, 2007, and September 30, 2014. The consensus process was conducted from January through May 2017. Reasons for readmission were grouped into categories based on International Classification of Diseases, Ninth Revision (ICD-9) diagnosis codes. Panelists were given the proportion of readmissions coded by each reason and median (interquartile range) days to readmission. They answered the question, "Does the readmission reason reflect possible surgical quality of care problems in the index admission?" on a scale of 1 (never related) to 5 (directly related) in 3 rounds of consensus building. The consensus process was completed in May 2017 and data were analyzed in June 2017. Consensus on proportion of ICD-9-coded readmission reasons that reflected quality of surgical procedure. In 3 Delphi rounds, the 14 panelists achieved consensus on 50 reasons for readmission; 12 panelists also completed group telephone calls between rounds 1 and 2. Readmissions with diagnoses of infection, sepsis, pneumonia, hemorrhage/hematoma, anemia, ostomy complications, acute renal failure, fluid/electrolyte disorders, or venous thromboembolism were considered associated with surgical quality and accounted for 25 521 of 39 664 readmissions (64% of readmissions; 7.5% of 340 858 index surgical procedures). The proportion of readmissions considered to be not associated with surgical quality varied by procedure, ranging from to 21% (613 of 2331) of readmissions after lower-extremity amputations to 47% (745 of 1598) of readmissions after cholecystectomy. One-third of postoperative readmissions are unlikely to reflect problems with surgical quality. Future studies should test whether restricting readmissions to those with specific ICD-9 codes might yield a more useful quality measure.
Soldà, Giulia; Merlino, Giuseppe; Fina, Emanuela; Brini, Elena; Moles, Anna; Cappelletti, Vera; Daidone, Maria Grazia
2016-01-01
Numerous studies have reported the existence of tumor-promoting cells (TPC) with self-renewal potential and a relevant role in drug resistance. However, pathways and modifications involved in the maintenance of such tumor subpopulations are still only partially understood. Sequencing-based approaches offer the opportunity for a detailed study of TPC including their transcriptome modulation. Using microarrays and RNA sequencing approaches, we compared the transcriptional profiles of parental MCF7 breast cancer cells with MCF7-derived TPC (i.e. MCFS). Data were explored using different bioinformatic approaches, and major findings were experimentally validated. The different analytical pipelines (Lifescope and Cufflinks based) yielded similar although not identical results. RNA sequencing data partially overlapped microarray results and displayed a higher dynamic range, although overall the two approaches concordantly predicted pathway modifications. Several biological functions were altered in TPC, ranging from production of inflammatory cytokines (i.e., IL-8 and MCP-1) to proliferation and response to steroid hormones. More than 300 non-coding RNAs were defined as differentially expressed, and 2,471 potential splicing events were identified. A consensus signature of genes up-regulated in TPC was derived and was found to be significantly associated with insensitivity to fulvestrant in a public breast cancer patient dataset. Overall, we obtained a detailed portrait of the transcriptome of a breast cancer TPC line, highlighted the role of non-coding RNAs and differential splicing, and identified a gene signature with a potential as a context-specific biomarker in patients receiving endocrine treatment. PMID:26556871
A new open reading frame in the genome of the cyanobacterium Synechocystis sp. PCC 6803
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lysenko, E.S.; Ogarkova, O.A.; Tarasov, V.A.
1995-02-01
A new open reading frame ORF242, coding for a 26.47-kDa polypeptide, was found in a DNA fragment of the cyanobacterium Synechocystis 6803, transforming a photosynthetic mutant to photoautotrophy and having homology with plant chloroplast DNA. In the 5{prime} flanking region of ORF242, consensus sequences characteristic of a functioning gene were found. One copy of ORF242 is present in the Synechocystis 6803 genome. Insertion inactivation of ORF242 does not lead to a decrease in photosynthetic activity in cells of cyanobacteria but may influence the ratio between active complexes of photosystems I and II. 22 refs., 6 figs., 2 tabs.
Isolation and characterization of target sequences of the chicken CdxA homeobox gene.
Margalit, Y; Yarus, S; Shapira, E; Gruenbaum, Y; Fainsod, A
1993-01-01
The DNA binding specificity of the chicken homeodomain protein CDXA was studied. Using a CDXA-glutathione-S-transferase fusion protein, DNA fragments containing the binding site for this protein were isolated. The sources of DNA were oligonucleotides with random sequence and chicken genomic DNA. The DNA fragments isolated were sequenced and tested in DNA binding assays. Sequencing revealed that most DNA fragments are AT rich which is a common feature of homeodomain binding sites. By electrophoretic mobility shift assays it was shown that the different target sequences isolated bind to the CDXA protein with different affinities. The specific sequences bound by the CDXA protein in the genomic fragments isolated, were determined by DNase I footprinting. From the footprinted sequences, the CDXA consensus binding site was determined. The CDXA protein binds the consensus sequence A, A/T, T, A/T, A, T, A/G. The CAUDAL binding site in the ftz promoter is also included in this consensus sequence. When tested, some of the genomic target sequences were capable of enhancing the transcriptional activity of reporter plasmids when introduced into CDXA expressing cells. This study determined the DNA sequence specificity of the CDXA protein and it also shows that this protein can further activate transcription in cells in culture. Images PMID:7909943
Hunt, C; Morimoto, R I
1985-01-01
We have determined the nucleotide sequence of the human hsp70 gene and 5' flanking region. The hsp70 gene is transcribed as an uninterrupted primary transcript of 2440 nucleotides composed of a 5' noncoding leader sequence of 212 nucleotides, a 3' noncoding region of 242 nucleotides, and a continuous open reading frame of 1986 nucleotides that encodes a protein with predicted molecular mass of 69,800 daltons. Upstream of the 5' terminus are the canonical TATAAA box, the sequence ATTGG that corresponds in the inverted orientation to the CCAAT motif, and the dyad sequence CTGGAAT/ATTCCCG that shares homology in 12 of 14 positions with the consensus transcription regulatory sequence common to Drosophila heat shock genes. Comparison of the predicted amino acid sequences of human hsp70 with the published sequences of Drosophila hsp70 and Escherichia coli dnaK reveals that human hsp70 is 73% identical to Drosophila hsp70 and 47% identical to E. coli dnaK. Surprisingly, the nucleotide sequences of the human and Drosophila genes are 72% identical and human and E. coli genes are 50% identical, which is more highly conserved than necessary given the degeneracy of the genetic code. The lack of accumulated silent nucleotide substitutions leads us to propose that there may be additional information in the nucleotide sequence of the hsp70 gene or the corresponding mRNA that precludes the maximum divergence allowed in the silent codon positions. PMID:3931075
Singh, Aditya; Bhatia, Prateek
2016-12-01
Sanger sequencing platforms, such as applied biosystems instruments, generate chromatogram files. Generally, for 1 region of a sequence, we use both forward and reverse primers to sequence that area, in that way, we have 2 sequences that need to be aligned and a consensus generated before mutation detection studies. This work is cumbersome and takes time, especially if the gene is large with many exons. Hence, we devised a rapid automated command system to filter, build, and align consensus sequences and also optionally extract exonic regions, translate them in all frames, and perform an amino acid alignment starting from raw sequence data within a very short time. In full capabilities of Automated Mutation Analysis Pipeline (ASAP), it is able to read "*.ab1" chromatogram files through command line interface, convert it to the FASTQ format, trim the low-quality regions, reverse-complement the reverse sequence, create a consensus sequence, extract the exonic regions using a reference exonic sequence, translate the sequence in all frames, and align the nucleic acid and amino acid sequences to reference nucleic acid and amino acid sequences, respectively. All files are created and can be used for further analysis. ASAP is available as Python 3.x executable at https://github.com/aditya-88/ASAP. The version described in this paper is 0.28.
Focus Group Research on the Implications of Adopting the Unified English Braille Code
ERIC Educational Resources Information Center
Wetzel, Robin; Knowlton, Marie
2006-01-01
Five focus groups explored concerns about adopting the Unified English Braille Code. The consensus was that while the proposed changes to the literary braille code would be minor, those to the mathematics braille code would be much more extensive. The participants emphasized that "any code that reduces the number of individuals who can access…
Dynamical and topological aspects of consensus formation in complex networks
NASA Astrophysics Data System (ADS)
Chacoma, A.; Mato, G.; Kuperman, M. N.
2018-04-01
The present work analyzes a particular scenario of consensus formation, where the individuals navigate across an underlying network defining the topology of the walks. The consensus, associated to a given opinion coded as a simple message, is generated by interactions during the agent's walk and manifest itself in the collapse of the various opinions into a single one. We analyze how the topology of the underlying networks and the rules of interaction between the agents promote or inhibit the emergence of this consensus. We find that non-linear interaction rules are required to form consensus and that consensus is more easily achieved in networks whose degree distribution is narrower.
CircularLogo: A lightweight web application to visualize intra-motif dependencies.
Ye, Zhenqing; Ma, Tao; Kalmbach, Michael T; Dasari, Surendra; Kocher, Jean-Pierre A; Wang, Liguo
2017-05-22
The sequence logo has been widely used to represent DNA or RNA motifs for more than three decades. Despite its intelligibility and intuitiveness, the traditional sequence logo is unable to display the intra-motif dependencies and therefore is insufficient to fully characterize nucleotide motifs. Many methods have been developed to quantify the intra-motif dependencies, but fewer tools are available for visualization. We developed CircularLogo, a web-based interactive application, which is able to not only visualize the position-specific nucleotide consensus and diversity but also display the intra-motif dependencies. Applying CircularLogo to HNF6 binding sites and tRNA sequences demonstrated its ability to show intra-motif dependencies and intuitively reveal biomolecular structure. CircularLogo is implemented in JavaScript and Python based on the Django web framework. The program's source code and user's manual are freely available at http://circularlogo.sourceforge.net . CircularLogo web server can be accessed from http://bioinformaticstools.mayo.edu/circularlogo/index.html . CircularLogo is an innovative web application that is specifically designed to visualize and interactively explore intra-motif dependencies.
Knierim, Dennis; Maiss, Edgar; Kenyon, Lawrence; Winter, Stephan; Menzel, Wulf
2015-10-01
Luffa aphid-borne yellows virus (LABYV) was proposed as the name for a previously undescribed polerovirus based on partial genome sequences obtained from samples of cucurbit plants collected in Thailand between 2008 and 2013. In this study, we determined the first full-length genome sequence of LABYV. Based on phylogenetic analysis and genome properties, it is clear that this virus represents a distinct species in the genus Polerovirus. Analysis of sequences from sample TH24, which was collected in 2010 from a luffa plant in Thailand, reveals the presence of two different full-length genome consensus sequences.
Del Piccolo, Lidia; de Haes, Hanneke; Heaven, Cathy; Jansen, Jesse; Verheul, William; Bensing, Jozien; Bergvik, Svein; Deveugele, Myriam; Eide, Hilde; Fletcher, Ian; Goss, Claudia; Humphris, Gerry; Kim, Young-Mi; Langewitz, Wolf; Mazzi, Maria Angela; Mjaaland, Trond; Moretti, Francesca; Nübling, Matthias; Rimondini, Michela; Salmon, Peter; Sibbern, Tonje; Skre, Ingunn; van Dulmen, Sandra; Wissow, Larry; Young, Bridget; Zandbelt, Linda; Zimmermann, Christa; Finset, Arnstein
2011-02-01
To present a method to classify health provider responses to patient cues and concerns according to the VR-CoDES-CC (Del Piccolo et al. (2009) [2] and Zimmermann et al. (submitted for publication) [3]). The system permits sequence analysis and a detailed description of how providers handle patient's expressions of emotion. The Verona-CoDES-P system has been developed based on consensus views within the "Verona Network of Sequence Analysis". The different phases of the creation process are described in detail. A reliability study has been conducted on 20 interviews from a convenience sample of 104 psychiatric consultations. The VR-CoDES-P has two main classes of provider responses, corresponding to the degree of explicitness (yes/no) and space (yes/no) that is given by the health provider to each cue/concern expressed by the patient. The system can be further subdivided into 17 individual categories. Statistical analyses showed that the VR-CoDES-P is reliable (agreement 92.86%, Cohen's kappa 0.90 (±0.04) p<0.0001). Once validity and reliability are tested in different settings, the system should be applied to investigate the relationship between provider responses to patients' expression of emotions and outcome variables. Research employing the VR-CoDES-P should be applied to develop research-based approaches to maximize appropriate responses to patients' indirect and overt expressions of emotional needs. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
René, P; Lenne, F; Ventura, M A; Bertagna, X; de Keyzer, Y
2000-01-04
In the pituitary, vasopressin triggers ACTH release through a specific receptor subtype, termed V3 or V1b. We cloned the V3 cDNA and showed that its expression was almost exclusive to pituitary corticotrophs and some corticotroph tumors. To study the determinants of this tissue specificity, we have now cloned the gene for the human (h) V3 receptor and characterized its structure. It is composed of two exons, spanning 10kb, with the coding region interrupted between transmembrane domains 6 and 7. We established that the transcription initiation site is located 498 nucleotides upstream of the initiator codon and showed that two polyadenylation sites may be used, while the most frequent is the most downstream. Sequence analysis of the promoter region showed no TATA box but identified consensus binding motifs for Sp1, CREB, and half sites of the estrogen receptor binding site. However comparison with another corticotroph-specific gene, proopiomelanocortin, did not identify common regulatory elements in the two promoters except for a short GC-rich region. Unexpectedly, hV3 gene analysis revealed that a formerly cloned 'artifactual' hV3 cDNA indeed corresponded to a spliced antisense transcript, overlapping the 5' part of the coding sequence in exon 1 and the promoter region. This transcript, hV3rev, was detected in normal pituitary and in many corticotroph tumors expressing hV3 sense mRNA and may therefore play a role in hV3 gene expression.
Auvray, F; Coddeville, M; Ritzenthaler, P; Dupont, L
1997-01-01
Bacteriophage mv4 is a temperate phage infecting Lactobacillus delbrueckii subsp. bulgaricus. During lysogenization, the phage integrates its genome into the host chromosome at the 3' end of a tRNA(Ser) gene through a site-specific recombination process (L. Dupont et al., J. Bacteriol., 177:586-595, 1995). A nonreplicative vector (pMC1) based on the mv4 integrative elements (attP site and integrase-coding int gene) is able to integrate into the chromosome of a wide range of bacterial hosts, including Lactobacillus plantarum, Lactobacillus casei (two strains), Lactococcus lactis subsp. cremoris, Enterococcus faecalis, and Streptococcus pneumoniae. Integrative recombination of pMC1 into the chromosomes of all of these species is dependent on the int gene product and occurs specifically at the pMC1 attP site. The isolation and sequencing of pMC1 integration sites from these bacteria showed that in lactobacilli, pMC1 integrated into the conserved tRNA(Ser) gene. In the other bacterial species where this tRNA gene is less or not conserved; secondary integration sites either in potential protein-coding regions or in intergenic DNA were used. A consensus sequence was deduced from the analysis of the different integration sites. The comparison of these sequences demonstrated the flexibility of the integrase for the bacterial integration site and suggested the importance of the trinucleotide CCT at the 5' end of the core in the strand exchange reaction. PMID:9068626
Wheat CBF gene family: identification of polymorphisms in the CBF coding sequence.
Mohseni, Sara; Che, Hua; Djillali, Zakia; Dumont, Estelle; Nankeu, Joseph; Danyluk, Jean
2012-12-01
Expression of cold-regulated genes needed for protection against freezing stress is mediated, in part, by the CBF transcription factor family. Previous studies with temperate cereals suggested that the CBF gene family in wheat was large, and that CBF genes were at the base of an important low temperature tolerance trait. Therefore, the goal of our study was to identify the CBF repertoire in the freezing-tolerant hexaploid wheat cultivar Norstar, and then to examine if the coding region of CBF genes in two spring cultivars contain polymorphisms that could affect the protein sequence and structure. Our analyses reveal that hexaploid wheat contains a complex CBF family consisting of at least 65 CBF genes of which 60 are known to be expressed in the cultivar Norstar. They represent 27 paralogous genes with 1-3 homeologous copies for the A, B, and D genomes. The cultivar Norstar contains two pseudogenes and at least 24 additional proteins having sequences and (or) structures that deviate from the consensus in the conserved AP2 DNA-binding and (or) C-terminal activation-domains. This suggests that in cultivars such as Norstar, low temperature tolerance may be increased through breeding of additional optimal alleles. The examination of the CBF repertoire present in the two spring cultivars, Chinese Spring and Manitou, reveals that they have additional polymorphisms affecting conserved positions in these domains. Understanding the effects of these polymorphisms will provide additional information for the selection of optimum CBF alleles in Triticeae breeding programs.
Srivastava, Rishi; Singh, Mohar; Bajaj, Deepak; Parida, Swarup K.
2016-01-01
Development and large-scale genotyping of user-friendly informative genome/gene-derived InDel markers in natural and mapping populations is vital for accelerating genomics-assisted breeding applications of chickpea with minimal resource expenses. The present investigation employed a high-throughput whole genome next-generation resequencing strategy in low and high pod number parental accessions and homozygous individuals constituting the bulks from each of two inter-specific mapping populations [(Pusa 1103 × ILWC 46) and (Pusa 256 × ILWC 46)] to develop non-erroneous InDel markers at a genome-wide scale. Comparing these high-quality genomic sequences, 82,360 InDel markers with reference to kabuli genome and 13,891 InDel markers exhibiting differentiation between low and high pod number parental accessions and bulks of aforementioned mapping populations were developed. These informative markers were structurally and functionally annotated in diverse coding and non-coding sequence components of genome/genes of kabuli chickpea. The functional significance of regulatory and coding (frameshift and large-effect mutations) InDel markers for establishing marker-trait linkages through association/genetic mapping was apparent. The markers detected a greater amplification (97%) and intra-specific polymorphic potential (58–87%) among a diverse panel of cultivated desi, kabuli, and wild accessions even by using a simpler cost-efficient agarose gel-based assay implicating their utility in large-scale genetic analysis especially in domesticated chickpea with narrow genetic base. Two high-density inter-specific genetic linkage maps generated using aforesaid mapping populations were integrated to construct a consensus 1479 InDel markers-anchored high-resolution (inter-marker distance: 0.66 cM) genetic map for efficient molecular mapping of major QTLs governing pod number and seed yield per plant in chickpea. Utilizing these high-density genetic maps as anchors, three major genomic regions harboring each of pod number and seed yield robust QTLs (15–28% phenotypic variation explained) were identified on chromosomes 2, 4, and 6. The integration of genetic and physical maps at these QTLs mapped on chromosomes scaled-down the long major QTL intervals into high-resolution short pod number and seed yield robust QTL physical intervals (0.89–2.94 Mb) which were essentially got validated in multiple genetic backgrounds of two chickpea mapping populations. The genome-wide InDel markers including natural allelic variants and genomic loci/genes delineated at major six especially in one colocalized novel congruent robust pod number and seed yield robust QTLs mapped on a high-density consensus genetic map were found most promising in chickpea. These functionally relevant molecular tags can drive marker-assisted genetic enhancement to develop high-yielding cultivars with increased seed/pod number and yield in chickpea. PMID:27695461
Bowen, David M; Lewis, Jessica A; Lu, Wenzhe; Schein, Catherine H
2012-09-14
Designing proteins that reflect the natural variability of a pathogen is essential for developing novel vaccines and drugs. Flaviviruses, including Dengue (DENV) and West Nile (WNV), evolve rapidly and can "escape" neutralizing monoclonal antibodies by mutation. Designing antigens that represent many distinct strains is important for DENV, where infection with a strain from one of the four serotypes may lead to severe hemorrhagic disease on subsequent infection with a strain from another serotype. Here, a DENV physicochemical property (PCP)-consensus sequence was derived from 671 unique sequences from the Flavitrack database. PCP-consensus proteins for domain 3 of the envelope protein (EdomIII) were expressed from synthetic genes in Escherichia coli. The ability of the purified consensus proteins to bind polyclonal antibodies generated in response to infection with strains from each of the four DENV serotypes was determined. The initial consensus protein bound antibodies from DENV-1-3 in ELISA and Western blot assays. This sequence was altered in 3 steps to incorporate regions of maximum variability, identified as significant changes in the PCPs, characteristic of DENV-4 strains. The final protein was recognized by antibodies against all four serotypes. Two amino acids essential for efficient binding to all DENV antibodies are part of a discontinuous epitope previously defined for a neutralizing monoclonal antibody. The PCP-consensus method can significantly reduce the number of experiments required to define a multivalent antigen, which is particularly important when dealing with pathogens that must be tested at higher biosafety levels. Copyright © 2012 Elsevier Ltd. All rights reserved.
Castillo, Jonathan; Stueve, Theresa R.; Marconett, Crystal N.
2017-01-01
Previously thought of as junk transcripts and pseudogene remnants, long non-coding RNAs (lncRNAs) have come into their own over the last decade as an essential component of cellular activity, regulating a plethora of functions within multicellular organisms. lncRNAs are now known to participate in development, cellular homeostasis, immunological processes, and the development of disease. With the advent of next generation sequencing technology, hundreds of thousands of lncRNAs have been identified. However, movement beyond mere discovery to the understanding of molecular processes has been stymied by the complicated genomic structure, tissue-restricted expression, and diverse regulatory roles lncRNAs play. In this review, we will focus on lncRNAs involved in lung cancer, the most common cause of cancer-related death in the United States and worldwide. We will summarize their various methods of discovery, provide consensus rankings of deregulated lncRNAs in lung cancer, and describe in detail the limited functional analysis that has been undertaken so far. PMID:29113413
Fine-tuning structural RNA alignments in the twilight zone.
Bremges, Andreas; Schirmer, Stefanie; Giegerich, Robert
2010-04-30
A widely used method to find conserved secondary structure in RNA is to first construct a multiple sequence alignment, and then fold the alignment, optimizing a score based on thermodynamics and covariance. This method works best around 75% sequence similarity. However, in a "twilight zone" below 55% similarity, the sequence alignment tends to obscure the covariance signal used in the second phase. Therefore, while the overall shape of the consensus structure may still be found, the degree of conservation cannot be estimated reliably. Based on a combination of available methods, we present a method named planACstar for improving structure conservation in structural alignments in the twilight zone. After constructing a consensus structure by alignment folding, planACstar abandons the original sequence alignment, refolds the sequences individually, but consistent with the consensus, aligns the structures, irrespective of sequence, by a pure structure alignment method, and derives an improved sequence alignment from the alignment of structures, to be re-submitted to alignment folding, etc.. This circle may be iterated as long as structural conservation improves, but normally, one step suffices. Employing the tools ClustalW, RNAalifold, and RNAforester, we find that for sequences with 30-55% sequence identity, structural conservation can be improved by 10% on average, with a large variation, measured in terms of RNAalifold's own criterion, the structure conservation index.
Time for change: a roadmap to guide the implementation of the World Anti-Doping Code 2015
Dvorak, Jiri; Baume, Norbert; Botré, Francesco; Broséus, Julian; Budgett, Richard; Frey, Walter O; Geyer, Hans; Harcourt, Peter Rex; Ho, Dave; Howman, David; Isola, Victor; Lundby, Carsten; Marclay, François; Peytavin, Annie; Pipe, Andrew; Pitsiladis, Yannis P; Reichel, Christian; Robinson, Neil; Rodchenkov, Grigory; Saugy, Martial; Sayegh, Souheil; Segura, Jordi; Thevis, Mario; Vernec, Alan; Viret, Marjolaine; Vouillamoz, Marc; Zorzoli, Mario
2014-01-01
A medical and scientific multidisciplinary consensus meeting was held from 29 to 30 November 2013 on Anti-Doping in Sport at the Home of FIFA in Zurich, Switzerland, to create a roadmap for the implementation of the 2015 World Anti-Doping Code. The consensus statement and accompanying papers set out the priorities for the antidoping community in research, science and medicine. The participants achieved consensus on a strategy for the implementation of the 2015 World Anti-Doping Code. Key components of this strategy include: (1) sport-specific risk assessment, (2) prevalence measurement, (3) sport-specific test distribution plans, (4) storage and reanalysis, (5) analytical challenges, (6) forensic intelligence, (7) psychological approach to optimise the most deterrent effect, (8) the Athlete Biological Passport (ABP) and confounding factors, (9) data management system (Anti-Doping Administration & Management System (ADAMS), (10) education, (11) research needs and necessary advances, (12) inadvertent doping and (13) management and ethics: biological data. True implementation of the 2015 World Anti-Doping Code will depend largely on the ability to align thinking around these core concepts and strategies. FIFA, jointly with all other engaged International Federations of sports (Ifs), the International Olympic Committee (IOC) and World Anti-Doping Agency (WADA), are ideally placed to lead transformational change with the unwavering support of the wider antidoping community. The outcome of the consensus meeting was the creation of the ad hoc Working Group charged with the responsibility of moving this agenda forward. PMID:24764550
Time for change: a roadmap to guide the implementation of the World Anti-Doping Code 2015.
Dvorak, Jiri; Baume, Norbert; Botré, Francesco; Broséus, Julian; Budgett, Richard; Frey, Walter O; Geyer, Hans; Harcourt, Peter Rex; Ho, Dave; Howman, David; Isola, Victor; Lundby, Carsten; Marclay, François; Peytavin, Annie; Pipe, Andrew; Pitsiladis, Yannis P; Reichel, Christian; Robinson, Neil; Rodchenkov, Grigory; Saugy, Martial; Sayegh, Souheil; Segura, Jordi; Thevis, Mario; Vernec, Alan; Viret, Marjolaine; Vouillamoz, Marc; Zorzoli, Mario
2014-05-01
A medical and scientific multidisciplinary consensus meeting was held from 29 to 30 November 2013 on Anti-Doping in Sport at the Home of FIFA in Zurich, Switzerland, to create a roadmap for the implementation of the 2015 World Anti-Doping Code. The consensus statement and accompanying papers set out the priorities for the antidoping community in research, science and medicine. The participants achieved consensus on a strategy for the implementation of the 2015 World Anti-Doping Code. Key components of this strategy include: (1) sport-specific risk assessment, (2) prevalence measurement, (3) sport-specific test distribution plans, (4) storage and reanalysis, (5) analytical challenges, (6) forensic intelligence, (7) psychological approach to optimise the most deterrent effect, (8) the Athlete Biological Passport (ABP) and confounding factors, (9) data management system (Anti-Doping Administration & Management System (ADAMS), (10) education, (11) research needs and necessary advances, (12) inadvertent doping and (13) management and ethics: biological data. True implementation of the 2015 World Anti-Doping Code will depend largely on the ability to align thinking around these core concepts and strategies. FIFA, jointly with all other engaged International Federations of sports (Ifs), the International Olympic Committee (IOC) and World Anti-Doping Agency (WADA), are ideally placed to lead transformational change with the unwavering support of the wider antidoping community. The outcome of the consensus meeting was the creation of the ad hoc Working Group charged with the responsibility of moving this agenda forward.
Sidell, Neil; Mathad, Raveendra I.; Shu, Feng-jue; Zhang, Zhenjiang; Kallen, Caleb B.; Yang, Danzhou
2011-01-01
DNA-intercalating molecules can impair DNA replication, DNA repair, and gene transcription. We previously demonstrated that XR5944, a DNA bis-intercalator, specifically blocks binding of estrogen receptor-α (ERα) to the consensus estrogen response element (ERE). The consensus ERE sequence is AGGTCAnnnTGACCT, where nnn is known as the tri-nucleotide spacer. Recent work has shown that the tri-nucleotide spacer can modulate ERα-ERE binding affinity and ligand-mediated transcriptional responses. To further understand the mechanism by which XR5944 inhibits ERα-ERE binding, we tested its ability to interact with consensus EREs with variable tri-nucleotide spacer sequences and with natural but non-consensus ERE sequences using one dimensional nuclear magnetic resonance (1D 1H NMR) titration studies. We found that the tri-nucleotide spacer sequence significantly modulates the binding of XR5944 to EREs. Of the sequences that were tested, EREs with CGG and AGG spacers showed the best binding specificity with XR5944, while those spaced with TTT demonstrated the least specific binding. The binding stoichiometry of XR5944 with EREs was 2:1, which can explain why the spacer influences the drug-DNA interaction; each XR5944 spans four nucleotides (including portions of the spacer) when intercalating with DNA. To validate our NMR results, we conducted functional studies using reporter constructs containing consensus EREs with tri-nucleotide spacers CGG, CTG, and TTT. Results of reporter assays in MCF-7 cells indicated that XR5944 was significantly more potent in inhibiting the activity of CGG- than TTT-spaced EREs, consistent with our NMR results. Taken together, these findings predict that the anti-estrogenic effects of XR5944 will depend not only on ERE half-site composition but also on the tri-nucleotide spacer sequence of EREs located in the promoters of estrogen-responsive genes. PMID:21333738
Bryant, D A; de Lorimier, R; Lambert, D H; Dubbs, J M; Stirewalt, V L; Stevens, S E; Porter, R D; Tam, J; Jay, E
1985-01-01
The genes for the alpha- and beta-subunit apoproteins of allophycocyanin (AP) were isolated from the cyanelle genome of Cyanophora paradoxa and subjected to nucleotide sequence analysis. The AP beta-subunit apoprotein gene was localized to a 7.8-kilobase-pair Pst I restriction fragment from cyanelle DNA by hybridization with a tetradecameric oligonucleotide probe. Sequence analysis using that oligonucleotide and its complement as primers for the dideoxy chain-termination sequencing method confirmed the presence of both AP alpha- and beta-subunit genes on this restriction fragment. Additional oligonucleotide primers were synthesized as sequencing progressed and were used to determine rapidly the nucleotide sequence of a 1336-base-pair region of this cloned fragment. This strategy allowed the sequencing to be completed without a detailed restriction map and without extensive and time-consuming subcloning. The sequenced region contains two open reading frames whose deduced amino acid sequences are 81-85% homologous to cyanobacterial and red algal AP subunits whose amino acid sequences have been determined. The two open reading frames are in the same orientation and are separated by 39 base pairs. AP alpha is 5' to AP beta and both coding sequences are preceded by a polypurine, Shine-Dalgarno-type sequence. Sequences upstream from AP alpha closely resemble the Escherichia coli consensus promoter sequences and also show considerable homology to promoter sequences for several chloroplast-encoded psbA genes. A 56-base-pair palindromic sequence downstream from the AP beta gene could play a role in the termination of transcription or translation. The allophycocyanin apoprotein subunit genes are located on the large single-copy region of the cyanelle genome. PMID:2987916
Ali, S; Azfer, M A; Bashamboo, A; Mathur, P K; Malik, P K; Mathur, V B; Raha, A K; Ansari, S
1999-03-04
We have cloned and sequenced a 906bp EcoRI repeat DNA fraction from Rhinoceros unicornis genome. The contig pSS(R)2 is AT rich with 340 A (37.53%), 187 C (20.64%), 173 G (19.09%) and 206 T (22.74%). The sequence contains MALT box, NF-E1, Poly-A signal, lariat consensus sequences, TATA box, translational initiation sequences and several stop codons. Translation of the contig showed seven different types of protein motifs, among which, EGF-like domain cysteine pattern signatures and Bowman-Birk serine protease inhibitor family signatures were prominent. The presence of eukaryotic transcriptional elements, protein signatures and analysis of subset sequences in the 5' region from 1 to 165nt indicating coding potential (test code value=0.97) suggest possible regulatory and/or functional role(s) of these sequences in the rhino genome. Translation of the complementary strand from 906 to 706nt and 190 to 2nt showed proteins of more than 7kDa rich in non-polar residues. This suggests that pSS(R)2 is either a part of, or adjacent to, a functional gene. The contig contains mostly non-consecutive simple repeat units from 2 to 17nt with varying frequencies, of which four base motifs were found to be predominant. Zoo-blot hybridization revealed that pSS(R)2 sequences are unique to R. unicornis genome because they do not cross-hybridize, even with the genomic DNA of South African black rhino Diceros bicornis. Southern blot analysis of R. unicornis genomic DNA with pSS(R)2 and other synthetic oligo probes revealed a high level of genetic homogeneity, which was also substantiated by microsatellite associated sequence amplification (MASA). Owing to its uniqueness, the pSS(R)2 probe has a potential application in the area of conservation biology for unequivocal identification of horn or other body tissues of R. unicornis. The evolutionary aspect of this repeat fraction in the context of comparative genome analysis is discussed.
[Discussion on the botanical origin of Isatidis radix and Isatidis folium based on DNA barcoding].
Sun, Zhi-Ying; Pang, Xiao-Hui
2013-12-01
This paper aimed to investigate the botanical origins of Isatidis Radix and Isatidis Folium, and clarify the confusion of its classification. The second internal transcribed spacer (ITS2) of ribosomal DNA, the chloroplast matK gene of 22 samples from some major production areas were amplified and sequenced. Sequence assembly and consensus sequence generation were performed using the CodonCode Aligner. Phylogenetic study was performed using MEGA 4.0 software in accordance with the Kimura 2-Parameter (K2P) model, and the phylogenetic tree was constructed using the neighbor-joining methods. The results showed that the length of ITS2 sequence of the botanical origins of Isatidis Radix and Isatidis Folium was 191 bp. The sequence showed that some samples had several SNP sites, and some samples had heterozygosis sites. In the NJ tree, based on ITS2 sequence, the studied samples were separated into two groups, and one of them was gathered with Isatis tinctoria L. The studied samples also were divided into two groups obviously based on the chloroplast matK gene. In conclusion, our results support that the botanical origins of Isatidis Radix and Isatidis Folium are Isatis indigotica Fortune, and Isatis indigotica and Isatis tinctoria are two distinct species. This study doesn't support the opinion about the combination of these two species in Flora of China.
NASA Technical Reports Server (NTRS)
Funderburgh, J. L.; Funderburgh, M. L.; Brown, S. J.; Vergnes, J. P.; Hassell, J. R.; Mann, M. M.; Conrad, G. W.; Spooner, B. S. (Principal Investigator)
1993-01-01
Amino acid sequence from tryptic peptides of three different bovine corneal keratan sulfate proteoglycan (KSPG) core proteins (designated 37A, 37B, and 25) showed similarities to the sequence of a chicken KSPG core protein lumican. Bovine lumican cDNA was isolated from a bovine corneal expression library by screening with chicken lumican cDNA. The bovine cDNA codes for a 342-amino acid protein, M(r) 38,712, containing amino acid sequences identified in the 37B KSPG core protein. The bovine lumican is 68% identical to chicken lumican, with an 83% identity excluding the N-terminal 40 amino acids. Location of 6 cysteine and 4 consensus N-glycosylation sites in the bovine sequence were identical to those in chicken lumican. Bovine lumican had about 50% identity to bovine fibromodulin and 20% identity to bovine decorin and biglycan. About two-thirds of the lumican protein consists of a series of 10 amino acid leucine-rich repeats that occur in regions of calculated high beta-hydrophobic moment, suggesting that the leucine-rich repeats contribute to beta-sheet formation in these proteins. Sequences obtained from 37A and 25 core proteins were absent in bovine lumican, thus predicting a unique primary structure and separate mRNA for each of the three bovine KSPG core proteins.
Forlano, M D; Teixeira, K R S; Scofield, A; Elisei, C; Yotoko, K S C; Fernandes, K R; Linhares, G F C; Ewing, S A; Massard, C L
2007-04-10
To characterize phylogenetically the species which causes canine hepatozoonosis at two rural areas of Rio de Janeiro State, Brazil, we used universal or Hepatozoon spp. primer sets for the 18S SSU rRNA coding region. DNA extracts were obtained from blood samples of thirteen dogs naturally infected, from four experimentally infected, and from five puppies infected by vertical transmission from a dam, that was experimentally infected. DNA of sporozoites of Hepatozoon americanum was used as positive control. The amplification of DNA extracts from blood of dogs infected with sporozoites of Hepatozoon spp. was observed in the presence of primers to 18S SSU rRNA gene of Hepatozoon spp., whereas DNA of H. americanum sporozoites was amplified in the presence of either universal or Hepatozoon spp.-specific primer sets; the amplified products were approximately 600bp in size. Cloned PCR products obtained from DNA extracts of blood from two dogs experimentally infected with Hepatozoon sp. were sequenced. The consensus sequence, derived from six sequence data sets, were blasted against sequences of 18S SSU rRNA of Hepatozoon spp. available at GenBank and aligned to homologous sequences to perform the phylogenetic analysis. This analysis clearly showed that our sequence clustered, independently of H. americanum sequences, within a group comprising other Hepatozoon canis sequences. Our results confirmed the hypothesis that the agent causing hepatozoonosis in the areas studied in Brazil is H. canis, supporting previous reports that were based on morphological and morphometric analyses.
An Interdisciplinary Code of Ethics for Adult Education.
ERIC Educational Resources Information Center
Connelly, Robert J.; Light, Kathleen M.
1991-01-01
Proposes five basic principles of a code of ethics for adult educators: social responsibility, an inclusive philosophy of education, pluralism as a strength but consensus as a goal, respect for learners, and respect for fellow educators. The wisdom of developing such a code is addressed. (SK)
A mirror code for protein-cholesterol interactions in the two leaflets of biological membranes
NASA Astrophysics Data System (ADS)
Fantini, Jacques; di Scala, Coralie; Evans, Luke S.; Williamson, Philip T. F.; Barrantes, Francisco J.
2016-02-01
Cholesterol controls the activity of a wide range of membrane receptors through specific interactions and identifying cholesterol recognition motifs is therefore critical for understanding signaling receptor function. The membrane-spanning domains of the paradigm neurotransmitter receptor for acetylcholine (AChR) display a series of cholesterol consensus domains (referred to as “CARC”). Here we use a combination of molecular modeling, lipid monolayer/mutational approaches and NMR spectroscopy to study the binding of cholesterol to a synthetic CARC peptide. The CARC-cholesterol interaction is of high affinity, lipid-specific, concentration-dependent, and sensitive to single-point mutations. The CARC motif is generally located in the outer membrane leaflet and its reverse sequence CRAC in the inner one. Their simultaneous presence within the same transmembrane domain obeys a “mirror code” controlling protein-cholesterol interactions in the outer and inner membrane leaflets. Deciphering this code enabled us to elaborate guidelines for the detection of cholesterol-binding motifs in any membrane protein. Several representative examples of neurotransmitter receptors and ABC transporters with the dual CARC/CRAC motifs are presented. The biological significance and potential clinical applications of the mirror code are discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stenger, Drake C., E-mail: drake.stenger@ars.usda.
Population structure of Homalodisca coagulata Virus-1 (HoCV-1) among and within field-collected insects sampled from a single point in space and time was examined. Polymorphism in complete consensus sequences among single-insect isolates was dominated by synonymous substitutions. The mutant spectrum of the C2 helicase region within each single-insect isolate was unique and dominated by nonsynonymous singletons. Bootstrapping was used to correct the within-isolate nonsynonymous:synonymous arithmetic ratio (N:S) for RT-PCR error, yielding an N:S value ~one log-unit greater than that of consensus sequences. Probability of all possible single-base substitutions for the C2 region predicted N:S values within 95% confidence limits of themore » corrected within-isolate N:S when the only constraint imposed was viral polymerase error bias for transitions over transversions. These results indicate that bottlenecks coupled with strong negative/purifying selection drive consensus sequences toward neutral sequence space, and that most polymorphism within single-insect isolates is composed of newly-minted mutations sampled prior to selection. -- Highlights: •Sampling protocol minimized differential selection/history among isolates. •Polymorphism among consensus sequences dominated by negative/purifying selection. •Within-isolate N:S ratio corrected for RT-PCR error by bootstrapping. •Within-isolate mutant spectrum dominated by new mutations yet to undergo selection.« less
Watanabe, K; Yoshioka, K; Ito, H; Ishigami, M; Takagi, K; Utsunomiya, S; Kobayashi, M; Kishimoto, H; Yano, M; Kakumu, S
1999-11-10
Hypervariable region 1 (HVR1) proteins of hepatitis C virus (HCV) have been reported to react broadly with sera of patients with HCV infection. However, the variability of the broad reactivity of individual HVR1 proteins has not been elucidated. We assessed the reactivity of 25 different HVR1 proteins (genotype 1b) with sera of 81 patients with HCV infection (genotype 1b) by Western blot. HVR1 proteins reacted with 2-60 sera. The number of sera reactive with each HVR1 protein significantly correlated with the number of amino acid residues identical to the consensus sequence defined by Puntoriero et al. (G. Puntoriero, A. Lahm, S. Zucchelli, B. B. Ercole, R. Tafi, M. Penzzanera, M. U. Mondelli, R. Cortese, A. Tramontano, G. Galfre', and A. Nicosia. 1998. EMBO J. 17, 3521-3533. ) (r = 0.561, P < 0.005). The most widely reactive HVR1 protein, 12-22, had a sequence similar to the consensus sequence. The peptide with C-terminal 13-amino-acids sequence of HVR1 protein 12-22 (NH2-CSFTSLFTPGPSQK) was injected into rabbits as an immunogen. The rabbit immune sera reacted with 9 of 25 HVR1 proteins of genotype 1b including HVR1 protein 12-22 and with 3 of 12 proteins of genotype 2a. These results indicate that the HVR1 protein broadly reactive with patients' sera has a sequence similar to the consensus sequence, can induce broadly reactive sera, and could be one of the candidate immunogens in a prophylactic vaccine against HCV. Copyright 1999 Academic Press.
Mitochondrial Genomes of Kinorhyncha: trnM Duplication and New Gene Orders within Animals.
Popova, Olga V; Mikhailov, Kirill V; Nikitin, Mikhail A; Logacheva, Maria D; Penin, Aleksey A; Muntyan, Maria S; Kedrova, Olga S; Petrov, Nikolai B; Panchin, Yuri V; Aleoshin, Vladimir V
2016-01-01
Many features of mitochondrial genomes of animals, such as patterns of gene arrangement, nucleotide content and substitution rate variation are extensively used in evolutionary and phylogenetic studies. Nearly 6,000 mitochondrial genomes of animals have already been sequenced, covering the majority of animal phyla. One of the groups that escaped mitogenome sequencing is phylum Kinorhyncha-an isolated taxon of microscopic worm-like ecdysozoans. The kinorhynchs are thought to be one of the early-branching lineages of Ecdysozoa, and their mitochondrial genomes may be important for resolving evolutionary relations between major animal taxa. Here we present the results of sequencing and analysis of mitochondrial genomes from two members of Kinorhyncha, Echinoderes svetlanae (Cyclorhagida) and Pycnophyes kielensis (Allomalorhagida). Their mitochondrial genomes are circular molecules approximately 15 Kbp in size. The kinorhynch mitochondrial gene sequences are highly divergent, which precludes accurate phylogenetic inference. The mitogenomes of both species encode a typical metazoan complement of 37 genes, which are all positioned on the major strand, but the gene order is distinct and unique among Ecdysozoa or animals as a whole. We predict four types of start codons for protein-coding genes in E. svetlanae and five in P. kielensis with a consensus DTD in single letter code. The mitochondrial genomes of E. svetlanae and P. kielensis encode duplicated methionine tRNA genes that display compensatory nucleotide substitutions. Two distant species of Kinorhyncha demonstrate similar patterns of gene arrangements in their mitogenomes. Both genomes have duplicated methionine tRNA genes; the duplication predates the divergence of two species. The kinorhynchs share a few features pertaining to gene order that align them with Priapulida. Gene order analysis reveals that gene arrangement specific of Priapulida may be ancestral for Scalidophora, Ecdysozoa, and even Protostomia.
Mitochondrial Genomes of Kinorhyncha: trnM Duplication and New Gene Orders within Animals
Popova, Olga V.; Mikhailov, Kirill V.; Nikitin, Mikhail A.; Logacheva, Maria D.; Penin, Aleksey A.; Muntyan, Maria S.; Kedrova, Olga S.; Petrov, Nikolai B.; Panchin, Yuri V.
2016-01-01
Many features of mitochondrial genomes of animals, such as patterns of gene arrangement, nucleotide content and substitution rate variation are extensively used in evolutionary and phylogenetic studies. Nearly 6,000 mitochondrial genomes of animals have already been sequenced, covering the majority of animal phyla. One of the groups that escaped mitogenome sequencing is phylum Kinorhyncha—an isolated taxon of microscopic worm-like ecdysozoans. The kinorhynchs are thought to be one of the early-branching lineages of Ecdysozoa, and their mitochondrial genomes may be important for resolving evolutionary relations between major animal taxa. Here we present the results of sequencing and analysis of mitochondrial genomes from two members of Kinorhyncha, Echinoderes svetlanae (Cyclorhagida) and Pycnophyes kielensis (Allomalorhagida). Their mitochondrial genomes are circular molecules approximately 15 Kbp in size. The kinorhynch mitochondrial gene sequences are highly divergent, which precludes accurate phylogenetic inference. The mitogenomes of both species encode a typical metazoan complement of 37 genes, which are all positioned on the major strand, but the gene order is distinct and unique among Ecdysozoa or animals as a whole. We predict four types of start codons for protein-coding genes in E. svetlanae and five in P. kielensis with a consensus DTD in single letter code. The mitochondrial genomes of E. svetlanae and P. kielensis encode duplicated methionine tRNA genes that display compensatory nucleotide substitutions. Two distant species of Kinorhyncha demonstrate similar patterns of gene arrangements in their mitogenomes. Both genomes have duplicated methionine tRNA genes; the duplication predates the divergence of two species. The kinorhynchs share a few features pertaining to gene order that align them with Priapulida. Gene order analysis reveals that gene arrangement specific of Priapulida may be ancestral for Scalidophora, Ecdysozoa, and even Protostomia. PMID:27755612
Sweeney, Angela; Greenwood, Kathryn E; Williams, Sally; Wykes, Til; Rose, Diana S
2013-12-01
Health research is frequently conducted in multi-disciplinary teams, with these teams increasingly including service user researchers. Whilst it is common for service user researchers to be involved in data collection--most typically interviewing other service users--it is less common for service user researchers to be involved in data analysis and interpretation. This means that a unique and significant perspective on the data is absent. This study aims to use an empirical report of a study on Cognitive Behavioural Therapy for psychosis (CBTp) to demonstrate the value of multiple coding in enabling service users voices to be heard in team-based qualitative data analysis. The CBTp study employed multiple coding to analyse service users' discussions of CBT for psychosis (CBTp) from the perspectives of a service user researcher, clinical researcher and psychology assistant. Multiple coding was selected to enable multiple perspectives to analyse and interpret data, to understand and explore differences and to build multi-disciplinary consensus. Multiple coding enabled the team to understand where our views were commensurate and incommensurate and to discuss and debate differences. Through the process of multiple coding, we were able to build strong consensus about the data from multiple perspectives, including that of the service user researcher. Multiple coding is an important method for understanding and exploring multiple perspectives on data and building team consensus. This can be contrasted with inter-rater reliability which is only appropriate in limited circumstances. We conclude that multiple coding is an appropriate and important means of hearing service users' voices in qualitative data analysis. © 2012 John Wiley & Sons Ltd.
RhoA Regulation of Cardiomyocyte Differentiation
Kaarbø, Mari; Crane, Denis I.; Murrell, Wayne G.
2013-01-01
Earlier findings from our laboratory implicated RhoA in heart developmental processes. To investigate factors that potentially regulate RhoA expression, RhoA gene organisation and promoter activity were analysed. Comparative analysis indicated strict conservation of both gene organisation and coding sequence of the chick, mouse, and human RhoA genes. Bioinformatics analysis of the derived promoter region of mouse RhoA identified putative consensus sequence binding sites for several transcription factors involved in heart formation and organogenesis generally. Using luciferase reporter assays, RhoA promoter activity was shown to increase in mouse-derived P19CL6 cells that were induced to differentiate into cardiomyocytes. Overexpression of a dominant negative mutant of mouse RhoA (mRhoAN19) blocked this cardiomyocyte differentiation of P19CL6 cells and led to the accumulation of the cardiac transcription factors SRF and GATA4 and the early cardiac marker cardiac α-actin. Taken together, these findings indicate a fundamental role for RhoA in the differentiation of cardiomyocytes. PMID:23935420
Ancient DNA sequence revealed by error-correcting codes.
Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo
2015-07-10
A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.
Ancient DNA sequence revealed by error-correcting codes
Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo
2015-01-01
A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228
Fine-tuning structural RNA alignments in the twilight zone
2010-01-01
Background A widely used method to find conserved secondary structure in RNA is to first construct a multiple sequence alignment, and then fold the alignment, optimizing a score based on thermodynamics and covariance. This method works best around 75% sequence similarity. However, in a "twilight zone" below 55% similarity, the sequence alignment tends to obscure the covariance signal used in the second phase. Therefore, while the overall shape of the consensus structure may still be found, the degree of conservation cannot be estimated reliably. Results Based on a combination of available methods, we present a method named planACstar for improving structure conservation in structural alignments in the twilight zone. After constructing a consensus structure by alignment folding, planACstar abandons the original sequence alignment, refolds the sequences individually, but consistent with the consensus, aligns the structures, irrespective of sequence, by a pure structure alignment method, and derives an improved sequence alignment from the alignment of structures, to be re-submitted to alignment folding, etc.. This circle may be iterated as long as structural conservation improves, but normally, one step suffices. Conclusions Employing the tools ClustalW, RNAalifold, and RNAforester, we find that for sequences with 30-55% sequence identity, structural conservation can be improved by 10% on average, with a large variation, measured in terms of RNAalifold's own criterion, the structure conservation index. PMID:20433706
Yang, Yang; Stanković, Vladimir; Xiong, Zixiang; Zhao, Wei
2009-03-01
Following recent works on the rate region of the quadratic Gaussian two-terminal source coding problem and limit-approaching code designs, this paper examines multiterminal source coding of two correlated, i.e., stereo, video sequences to save the sum rate over independent coding of both sequences. Two multiterminal video coding schemes are proposed. In the first scheme, the left sequence of the stereo pair is coded by H.264/AVC and used at the joint decoder to facilitate Wyner-Ziv coding of the right video sequence. The first I-frame of the right sequence is successively coded by H.264/AVC Intracoding and Wyner-Ziv coding. An efficient stereo matching algorithm based on loopy belief propagation is then adopted at the decoder to produce pixel-level disparity maps between the corresponding frames of the two decoded video sequences on the fly. Based on the disparity maps, side information for both motion vectors and motion-compensated residual frames of the right sequence are generated at the decoder before Wyner-Ziv encoding. In the second scheme, source splitting is employed on top of classic and Wyner-Ziv coding for compression of both I-frames to allow flexible rate allocation between the two sequences. Experiments with both schemes on stereo video sequences using H.264/AVC, LDPC codes for Slepian-Wolf coding of the motion vectors, and scalar quantization in conjunction with LDPC codes for Wyner-Ziv coding of the residual coefficients give a slightly lower sum rate than separate H.264/AVC coding of both sequences at the same video quality.
Is DNA code periodicity only due to CUF-codons usage frequency?
Zoltowski, Mariusz
2007-01-01
The triplet code for proteins and functional RNA has been either from the universal pattern of ancient RNA (-H1) [1], with a key role of an uneven codon usage frequency (CUF) in the periodic patterns origination, or a reading frame monitoring device (RFMD -H2) [2- 4]. H1 has lately been upheld [1] but in a single sequence sensitive way [1]. Since H1 and H2 are not mutually exclusive [2, 3, 4], a single sequence-wise sensitive approach by a resonant recognition model (RRM) has become the attempt described in this paper to challenge H1 and H2 in eukaryotes case as a novelty. In the RRM model [5, 6, 7] two bio-molecules interact favorably provided they both obey a common frequency and opposite phases consensus in their delocalized electron energy (DEE-) distributions [5]. Hence it has been possible to learn how well the DEE-s of the mRNA and of the ribosome match each other at 1/3 Hz - that applied to both the original and the CUF preserving randomly shuffled genomic data across the well known Bursét and Guigo collection of 570 coding vertebrates' genes. The matching of RRM patterns reduces to harmonics phase comparison of the relevant DEE-s, a task by a digital phase locked loop (DPLL) [8, 9, and 10]. The DPLL phase control to meet the RRM phase matching case is quantified into a small number of classes to describe the mRNA-ribosome interaction in a categorical way.
Cartwright, Joseph F; Anderson, Karin; Longworth, Joseph; Lobb, Philip; James, David C
2018-06-01
High-fidelity replication of biologic-encoding recombinant DNA sequences by engineered mammalian cell cultures is an essential pre-requisite for the development of stable cell lines for the production of biotherapeutics. However, immortalized mammalian cells characteristically exhibit an increased point mutation frequency compared to mammalian cells in vivo, both across their genomes and at specific loci (hotspots). Thus unforeseen mutations in recombinant DNA sequences can arise and be maintained within producer cell populations. These may affect both the stability of recombinant gene expression and give rise to protein sequence variants with variable bioactivity and immunogenicity. Rigorous quantitative assessment of recombinant DNA integrity should therefore form part of the cell line development process and be an essential quality assurance metric for instances where synthetic/multi-component assemblies are utilized to engineer mammalian cells, such as the assessment of recombinant DNA fidelity or the mutability of single-site integration target loci. Based on Pacific Biosciences (Menlo Park, CA) single molecule real-time (SMRT™) circular consensus sequencing (CCS) technology we developed a rDNA sequence analysis tool to process the multi-parallel sequencing of ∼40,000 single recombinant DNA molecules. After statistical filtering of raw sequencing data, we show that this analytical method is capable of detecting single point mutations in rDNA to a minimum single mutation frequency of 0.0042% (<1/24,000 bases). Using a stable CHO transfectant pool harboring a randomly integrated 5 kB plasmid construct encoding GFP we found that 28% of recombinant plasmid copies contained at least one low frequency (<0.3%) point mutation. These mutations were predominantly found in GC base pairs (85%) and that there was no positional bias in mutation across the plasmid sequence. There was no discernable difference between the mutation frequencies of coding and non-coding DNA. The putative ratio of non-synonymous and synonymous changes within the open reading frames (ORFs) in the plasmid sequence indicates that natural selection does not impact upon the prevalence of these mutations. Here we have demonstrated the abundance of mutations that fall outside of the reported range of detection of next generation sequencing (NGS) and second generation sequencing (SGS) platforms, providing a methodology capable of being utilized in cell line development platforms to identify the fidelity of recombinant genes throughout the production process. © 2018 Wiley Periodicals, Inc.
Hamilton, P T; Reeve, J N
1985-01-01
DNA fragments cloned from the methanogenic archaebacterium Methanobrevibacter smithii which complement mutations in the purE and proC genes of E. coli have been sequenced. Sequence analyses, transposon mutagenesis and expression in E. coli minicells indicate that purE and proC complementations result from the synthesis of M. smithii polypeptides with molecular weights of 36,697 and 27,836 respectively. The encoding genes appear to be located in operons. The M. smithii genome contains 69% A/T basepairs (bp) which is reflected in unusual codon usages and intergenic regions containing approximately 85% A/T bp. An insertion element, designated ISM1, was found within the cloned M. smithii DNA located adjacent to the proC complementing region. ISM1 is 1381 bp in length, has 29 bp terminal inverted repeat sequences and contains one major ORF encoded in 87% of the ISM1 sequence. ISM1 is mobile, present in approximately 10 copies per genome and integration duplicates 8 bp at the site of insertion. The duplicated sequences show homology with sequences within the 29 bp terminal repeat sequence of ISM1. Comparison of our data with sequences from halophilic archaebacteria suggests that 5'GAANTTTCA and 5'TTTTAATATAAA may be consensus promoter sequences for archaebacteria. These sequences closely resemble the consensus sequences which precede Drosophila heat-shock genes (Pelham 1982; Davidson et al. 1983). Methanogens appear to employ the eubacterial system of mRNA: 16SrRNA hybridization to ensure initiation of translation; the consensus ribosome binding sequence is 5'AGGTGA.
Woodman, Jenny; Allister, Janice; Rafi, Imran; de Lusignan, Simon; Belsey, Jonathan; Petersen, Irene; Gilbert, Ruth
2012-01-01
Background Information is lacking on how concerns about child maltreatment are recorded in primary care records. Aim To determine how the recording of child maltreatment concerns can be improved. Design and setting Development of a quality improvement intervention involving: clinical audit, a descriptive survey, telephone interviews, a workshop, database analyses, and consensus development in UK general practice. Method Descriptive analyses and incidence estimates were carried out based on 11 study practices and 442 practices in The Health Improvement Network (THIN). Telephone interviews, a workshop, and a consensus development meeting were conducted with lead GPs from 11 study practices. Results The rate of children with at least one maltreatment-related code was 8.4/1000 child years (11 study practices, 2009–2010), and 8.0/1000 child years (THIN, 2009–2010). Of 25 patients with known maltreatment, six had no maltreatment-related codes recorded, but all had relevant free text, scanned documents, or codes. When stating their reasons for undercoding maltreatment concerns, GPs cited damage to the patient relationship, uncertainty about which codes to use, and having concerns about recording information on other family members in the child’s records. Consensus recommendations are to record the code ‘child is cause for concern’ as a red flag whenever maltreatment is considered, and to use a list of codes arranged around four clinical concepts, with an option for a templated short data entry form. Conclusion GPs under-record maltreatment-related concerns in children’s electronic medical records. As failure to use codes makes it impossible to search or audit these cases, an approach designed to be simple and feasible to implement in UK general practice was recommended. PMID:22781996
Babor, Thomas F; Xuan, Ziming; Damon, Donna
2013-10-01
This study evaluated the use of a modified Delphi technique in combination with a previously developed alcohol advertising rating procedure to detect content violations in the U.S. Beer Institute Code. A related aim was to estimate the minimum number of raters needed to obtain reliable evaluations of code violations in television commercials. Six alcohol ads selected for their likelihood of having code violations were rated by community and expert participants (N = 286). Quantitative rating scales were used to measure the content of alcohol advertisements based on alcohol industry self-regulatory guidelines. The community group participants represented vulnerability characteristics that industry codes were designed to protect (e.g., age <21); experts represented various health-related professions, including public health, human development, alcohol research, and mental health. Alcohol ads were rated on 2 occasions separated by 1 month. After completing Time 1 ratings, participants were randomized to receive feedback from 1 group or the other. Findings indicate that (i) ratings at Time 2 had generally reduced variance, suggesting greater consensus after feedback, (ii) feedback from the expert group was more influential than that of the community group in developing group consensus, (iii) the expert group found significantly fewer violations than the community group, (iv) experts representing different professional backgrounds did not differ among themselves in the number of violations identified, and (v) a rating panel composed of at least 15 raters is sufficient to obtain reliable estimates of code violations. The Delphi technique facilitates consensus development around code violations in alcohol ad content and may enhance the ability of regulatory agencies to monitor the content of alcoholic beverage advertising when combined with psychometric-based rating procedures. Copyright © 2013 by the Research Society on Alcoholism.
Babor, Thomas F.; Xuan, Ziming; Damon, Donna
2013-01-01
Background This study evaluated the use of a modified Delphi technique in combination with a previously developed alcohol advertising rating procedure to detect content violations in the US Beer Institute code. A related aim was to estimate the minimum number of raters needed to obtain reliable evaluations of code violations in television commercials. Methods Six alcohol ads selected for their likelihood of having code violations were rated by community and expert participants (N=286). Quantitative rating scales were used to measure the content of alcohol advertisements based on alcohol industry self-regulatory guidelines. The community group participants represented vulnerability characteristics that industry codes were designed to protect (e.g., age < 21); experts represented various health-related professions, including public health, human development, alcohol research and mental health. Alcohol ads were rated on two occasions separated by one month. After completing Time 1 ratings, participants were randomized to receive feedback from one group or the other. Results Findings indicate that (1) ratings at Time 2 had generally reduced variance, suggesting greater consensus after feedback, (2) feedback from the expert group was more influential than that of the community group in developing group consensus, (3) the expert group found significantly fewer violations than the community group, (4) experts representing different professional backgrounds did not differ among themselves in the number of violations identified; (5) a rating panel composed of at least 15 raters is sufficient to obtain reliable estimates of code violations. Conclusions The Delphi Technique facilitates consensus development around code violations in alcohol ad content and may enhance the ability of regulatory agencies to monitor the content of alcoholic beverage advertising when combined with psychometric-based rating procedures. PMID:23682927
El-Assaad, Atlal; Dawy, Zaher; Nemer, Georges; Hajj, Hazem; Kobeissy, Firas H
2017-01-01
Degradomics is a novel discipline that involves determination of the proteases/substrate fragmentation profile, called the substrate degradome, and has been recently applied in different disciplines. A major application of degradomics is its utility in the field of biomarkers where the breakdown products (BDPs) of different protease have been investigated. Among the major proteases assessed, calpain and caspase proteases have been associated with the execution phases of the pro-apoptotic and pro-necrotic cell death, generating caspase/calpain-specific cleaved fragments. The distinction between calpain and caspase protein fragments has been applied to distinguish injury mechanisms. Advanced proteomics technology has been used to identify these BDPs experimentally. However, it has been a challenge to identify these BDPs with high precision and efficiency, especially if we are targeting a number of proteins at one time. In this chapter, we present a novel bioinfromatic detection method that identifies BDPs accurately and efficiently with validation against experimental data. This method aims at predicting the consensus sequence occurrences and their variants in a large set of experimentally detected protein sequences based on state-of-the-art sequence matching and alignment algorithms. After detection, the method generates all the potential cleaved fragments by a specific protease. This space and time-efficient algorithm is flexible to handle the different orientations that the consensus sequence and the protein sequence can take before cleaving. It is O(mn) in space complexity and O(Nmn) in time complexity, with N number of protein sequences, m length of the consensus sequence, and n length of each protein sequence. Ultimately, this knowledge will subsequently feed into the development of a novel tool for researchers to detect diverse types of selected BDPs as putative disease markers, contributing to the diagnosis and treatment of related disorders.
Chen, J; Chen, J; Adams, M J
2001-01-01
A universal primer (Sprimer: 5'-GGX AAY AAY AGY GGX CAZ CC-3', X = A, G, C or T; Y = T or C; Z = A or G), designed from the consensus sequences that code for the conserved sequence GNNSGQP in the NIb region of members of the family Potyviridae, was used to amplify by RT-PCR the 3'-terminal genome regions from infected plant samples representing 21 different viruses in the family. Sequencing of some of the fragments (c. 1.7 kb) showed that the type strain (ATTC PV-107) of Oat necrotic mottle virus is not a distinct species in the genus Rymovirus, but is synonymous with Brome streak mosaic virus (genus Tritimovirus) and that Celery mosaic virus is a distinct member of the genus Potyvirus not closely related to any other sequenced species. Potyviruses infecting crops in China were also investigated, showing that viruses on cowpea and maize in Hangzhou, Zhejiang province were respectively Bean common mosaic virus and Sugarcane mosaic virus and that one on garlic in Nanjing, Jiangsu province was Onion yellow dwarf virus. Fragments were also sequenced from Chinese isolates of Lettuce mosaic virus and Soybean mosaic virus (from Hangzhou), Turnip mosaic virus (2 different isolates from Zhejiang province) and RNA1 of Wheat yellow mosaic virus (from Rongcheng, Shandong province).
Osipiuk, J; Joachimiak, A
1997-09-12
We propose that the dnaK operon of Thermus thermophilus HB8 is composed of three functionally linked genes: dnaK, grpE, and dnaJ. The dnaK and dnaJ gene products are most closely related to their cyanobacterial homologs. The DnaK protein sequence places T. thermophilus in the plastid Hsp70 subfamily. In contrast, the grpE translated sequence is most similar to GrpE from Clostridium acetobutylicum, a Gram-positive anaerobic bacterium. A single promoter region, with homology to the Escherichia coli consensus promoter sequences recognized by the sigma70 and sigma32 transcription factors, precedes the postulated operon. This promoter is heat-shock inducible. The dnaK mRNA level increased more than 30 times upon 10 min of heat shock (from 70 degrees C to 85 degrees C). A strong transcription terminating sequence was found between the dnaK and grpE genes. The individual genes were cloned into pET expression vectors and the thermophilic proteins were overproduced at high levels in E. coli and purified to homogeneity. The recombinant T. thermophilus DnaK protein was shown to have a weak ATP-hydrolytic activity, with an optimum at 90 degrees C. The ATPase was stimulated by the presence of GrpE and DnaJ. Another open reading frame, coding for ClpB heat-shock protein, was found downstream of the dnaK operon.
Unusual Intron Conservation near Tissue-Regulated Exons Found by Splicing Microarrays
Sugnet, Charles W; Srinivasan, Karpagam; Clark, Tyson A; O'Brien, Georgeann; Cline, Melissa S; Wang, Hui; Williams, Alan; Kulp, David; Blume, John E; Haussler, David; Ares, Manuel
2006-01-01
Alternative splicing contributes to both gene regulation and protein diversity. To discover broad relationships between regulation of alternative splicing and sequence conservation, we applied a systems approach, using oligonucleotide microarrays designed to capture splicing information across the mouse genome. In a set of 22 adult tissues, we observe differential expression of RNA containing at least two alternative splice junctions for about 40% of the 6,216 alternative events we could detect. Statistical comparisons identify 171 cassette exons whose inclusion or skipping is different in brain relative to other tissues and another 28 exons whose splicing is different in muscle. A subset of these exons is associated with unusual blocks of intron sequence whose conservation in vertebrates rivals that of protein-coding exons. By focusing on sets of exons with similar regulatory patterns, we have identified new sequence motifs implicated in brain and muscle splicing regulation. Of note is a motif that is strikingly similar to the branchpoint consensus but is located downstream of the 5′ splice site of exons included in muscle. Analysis of three paralogous membrane-associated guanylate kinase genes reveals that each contains a paralogous tissue-regulated exon with a similar tissue inclusion pattern. While the intron sequences flanking these exons remain highly conserved among mammalian orthologs, the paralogous flanking intron sequences have diverged considerably, suggesting unusually complex evolution of the regulation of alternative splicing in multigene families. PMID:16424921
Catts, Stanley V; Frost, Aaron D J; O'Toole, Brian I; Carr, Vaughan J; Lewin, Terry; Neil, Amanda L; Harris, Meredith G; Evans, Russell W; Crissman, Belinda R; Eadie, Kathy
2011-01-01
Clinical practice improvement carried out in a quality assurance framework relies on routinely collected data using clinical indicators. Herein we describe the development, minimum training requirements, and inter-rater agreement of indicators that were used in an Australian multi-site evaluation of the effectiveness of early psychosis (EP) teams. Surveys of clinician opinion and face-to-face consensus-building meetings were used to select and conceptually define indicators. Operationalization of definitions was achieved by iterative refinement until clinicians could be quickly trained to code indicators reliably. Calculation of percentage agreement with expert consensus coding was based on ratings of paper-based clinical vignettes embedded in a 2-h clinician training package. Consensually agreed upon conceptual definitions for seven clinical indicators judged most relevant to evaluating EP teams were operationalized for ease-of-training. Brief training enabled typical clinicians to code indicators with acceptable percentage agreement (60% to 86%). For indicators of suicide risk, psychosocial function, and family functioning this level of agreement was only possible with less precise 'broad range' expert consensus scores. Estimated kappa values indicated fair to good inter-rater reliability (kappa > 0.65). Inspection of contingency tables (coding category by health service) and modal scores across services suggested consistent, unbiased coding across services. Clinicians are able to agree upon what information is essential to routinely evaluate clinical practice. Simple indicators of this information can be designed and coding rules can be reliably applied to written vignettes after brief training. The real world feasibility of the indicators remains to be tested in field trials.
Kozutsumi, Daisuke; Tsunematsu, Masako; Yamaji, Taketo; Kino, Kohsuke
2007-01-01
Cry-consensus peptide is a linearly linked peptide of T-cell epitopes for the management of Japanese cedar (JC) pollinosis and is expected to become a new drug for immunotherapy. However, the mechanism of T-cell epitopes in allergic diseases is not well understood, and thus, a simple in vitro procedure for evaluation of its biological activity is desired. Peripheral blood mononuclear cells (PBMC) were isolated from 27 JC pollinosis patients and 10 healthy subjects, and cultured in vitro for 4 days in the presence of Cry-consensus peptide and (3)H-thymidine. The relationship between growth stimulation (stimulation index; SI) and antigen-specific IgE levels in serum was also investigated in JC pollinosis patients. Moreover, to confirm the importance of the primary sequence in Cry-consensus peptide, heat-treated Cry-consensus peptide and a mixture of the amino acids of which Cry-consensus peptide is composed, and their (3)H-thymidine uptake was compared with Cry-consensus peptide. Finally, whether Cry-consensus peptide stimulates PBMCs from healthy subjects was investigated. The mean SI of JC patients showed a good correlation with Cry-consensus peptide concentration in the culture medium; however, the SI was independent of the anti-Cry j 1 IgE level. Heat-denatured Cry-consensus peptide retained a PBMC proliferation stimulatory effect comparable to the original Cry-consensus peptide, while the mixture of amino acids constituting Cry-consensus peptide did not stimulate PBMC proliferation. PBMCs from healthy subjects did not respond to Cry-consensus peptide at all. These data indicate that the PBMC response of patients suffering from JC pollinosis to Cry-consensus peptide is specific for the sequence of T cell epitopes thereof and may be useful for the evaluation of the efficacy of Cry-consensus peptide in vivo.
BCL2 oncogene translocation is mediated by a chi-like consensus
1992-01-01
Examination of 64 translocations involving the major breakpoint region (mbr) of the BCL2 oncogene and the immunoglobulin heavy chain locus identified three short (14, 16, and 18 bp) segments within the mbr at which translocations occurred with very high frequency. Each of these clusters was associated with a 15-bp region of sequence homology, the principal one containing an octamer related to chi, the procaryotic activator of recombination. The presence of short deletions and N nucleotide additions at the breakpoints, as well as involvement of JH and DH coding regions, suggested that these sequences served as signals capable of interacting with the VDJ recombinase complex, even though no homology with the traditional heptamer/spacer/nonamer (IgRSS) existed. Furthermore, the BCL2 signal sequences were employed in a bidirectional fashion and could mediate recombination of one mbr region with another. Segments homologous to the BCL2 signal sequences flanked individual members of the XP family of diversity gene segments, which were themselves highly overrepresented in the reciprocal products (18q-) of BCL2 translocation. We propose that the chi-like signal sequences of BCL2 represent a distinct class of recognition sites for the recombinase complex, responsible for initiating interactions between regions of DNA separated by great distances, and that BCL2 translocation begins by a recombination event between mbr and DXP chi signals. Since recombinant joints containing chi, not IgRSS, occur in brain cells expressing RAG-1 (Matsuoka, M., F. Nagawa, K. Okazaki, L. Kingsbury, K. Yoshida, U. Muller, D. T. Larue, J. A. Winer, and H. Sakano. 1991. Science [Wash. DC]. 254:81; reference 1), we further suggest that the product of this gene could mediate both BCL2 translocation and the first step of normal DJ assembly through the creation of chi joints, rather than signal or coding joints. PMID:1588282
Plucienniczak, A; Schroeder, E; Zettlmeissl, G; Streeck, R E
1985-01-01
The nucleotide sequence of a 7.6 kb vaccinia DNA segment from a genomic region conserved among different orthopox virus has been determined. This segment contains a tight cluster of 12 partly overlapping open reading frames most of which can be correlated with previously identified early and late proteins and mRNAs. Regulatory signals used by vaccinia virus have been studied. Presumptive promoter regions are rich in A, T and carry the consensus sequences TATA and AATAA spaced at 20-24 base pairs. Tandem repeats of a CTATTC consensus sequence are proposed to be involved in the termination of early transcription. PMID:2987815
Bioinformatic flowchart and database to investigate the origins and diversity of Clan AA peptidases
Llorens, Carlos; Futami, Ricardo; Renaud, Gabriel; Moya, Andrés
2009-01-01
Background Clan AA of aspartic peptidases relates the family of pepsin monomers evolutionarily with all dimeric peptidases encoded by eukaryotic LTR retroelements. Recent findings describing various pools of single-domain nonviral host peptidases, in prokaryotes and eukaryotes, indicate that the diversity of clan AA is larger than previously thought. The ensuing approach to investigate this enzyme group is by studying its phylogeny. However, clan AA is a difficult case to study due to the low similarity and different rates of evolution. This work is an ongoing attempt to investigate the different clan AA families to understand the cause of their diversity. Results In this paper, we describe in-progress database and bioinformatic flowchart designed to characterize the clan AA protein domain based on all possible protein families through ancestral reconstructions, sequence logos, and hidden markov models (HMMs). The flowchart includes the characterization of a major consensus sequence based on 6 amino acid patterns with correspondence with Andreeva's model, the structural template describing the clan AA peptidase fold. The set of tools is work in progress we have organized in a database within the GyDB project, referred to as Clan AA Reference Database . Conclusion The pre-existing classification combined with the evolutionary history of LTR retroelements permits a consistent taxonomical collection of sequence logos and HMMs. This set is useful for gene annotation but also a reference to evaluate the diversity of, and the relationships among, the different families. Comparisons among HMMs suggest a common ancestor for all dimeric clan AA peptidases that is halfway between single-domain nonviral peptidases and those coded by Ty3/Gypsy LTR retroelements. Sequence logos reveal how all clan AA families follow similar protein domain architecture related to the peptidase fold. In particular, each family nucleates a particular consensus motif in the sequence position related to the flap. The different motifs constitute a network where an alanine-asparagine-like variable motif predominates, instead of the canonical flap of the HIV-1 peptidase and closer relatives. Reviewers This article was reviewed by Daniel H. Haft, Vladimir Kapitonov (nominated by Jerry Jurka), and Ben M. Dunn (nominated by Claus Wilke). PMID:19173708
R2R--software to speed the depiction of aesthetic consensus RNA secondary structures.
Weinberg, Zasha; Breaker, Ronald R
2011-01-04
With continuing identification of novel structured noncoding RNAs, there is an increasing need to create schematic diagrams showing the consensus features of these molecules. RNA structural diagrams are typically made either with general-purpose drawing programs like Adobe Illustrator, or with automated or interactive programs specific to RNA. Unfortunately, the use of applications like Illustrator is extremely time consuming, while existing RNA-specific programs produce figures that are useful, but usually not of the same aesthetic quality as those produced at great cost in Illustrator. Additionally, most existing RNA-specific applications are designed for drawing single RNA molecules, not consensus diagrams. We created R2R, a computer program that facilitates the generation of aesthetic and readable drawings of RNA consensus diagrams in a fraction of the time required with general-purpose drawing programs. Since the inference of a consensus RNA structure typically requires a multiple-sequence alignment, the R2R user annotates the alignment with commands directing the layout and annotation of the RNA. R2R creates SVG or PDF output that can be imported into Adobe Illustrator, Inkscape or CorelDRAW. R2R can be used to create consensus sequence and secondary structure models for novel RNA structures or to revise models when new representatives for known RNA classes become available. Although R2R does not currently have a graphical user interface, it has proven useful in our efforts to create 100 schematic models of distinct noncoding RNA classes. R2R makes it possible to obtain high-quality drawings of the consensus sequence and structural models of many diverse RNA structures with a more practical amount of effort. R2R software is available at http://breaker.research.yale.edu/R2R and as an Additional file.
Klein, Wolfgang; Westendorf, Carolin; Schmidt, Antje; Conill-Cortés, Mercè; Rutz, Claudia; Blohs, Marcus; Beyermann, Michael; Protze, Jonas; Krause, Gerd; Krause, Eberhard; Schülein, Ralf
2015-01-01
The cyclodepsipeptide cotransin was described to inhibit the biosynthesis of a small subset of proteins by a signal sequence-discriminatory mechanism at the Sec61 protein-conducting channel. However, it was not clear how selective cotransin is, i.e. how many proteins are sensitive. Moreover, a consensus motif in signal sequences mediating cotransin sensitivity has yet not been described. To address these questions, we performed a proteomic study using cotransin-treated human hepatocellular carcinoma cells and the stable isotope labelling by amino acids in cell culture technique in combination with quantitative mass spectrometry. We used a saturating concentration of cotransin (30 micromolar) to identify also less-sensitive proteins and to discriminate the latter from completely resistant proteins. We found that the biosynthesis of almost all secreted proteins was cotransin-sensitive under these conditions. In contrast, biosynthesis of the majority of the integral membrane proteins was cotransin-resistant. Cotransin sensitivity of signal sequences was neither related to their length nor to their hydrophobicity. Instead, in the case of signal anchor sequences, we identified for the first time a conformational consensus motif mediating cotransin sensitivity. PMID:25806945
Conservation of CD44 exon v3 functional elements in mammals
Vela, Elena; Hilari, Josep M; Delclaux, María; Fernández-Bellon, Hugo; Isamat, Marcos
2008-01-01
Background The human CD44 gene contains 10 variable exons (v1 to v10) that can be alternatively spliced to generate hundreds of different CD44 protein isoforms. Human CD44 variable exon v3 inclusion in the final mRNA depends on a multisite bipartite splicing enhancer located within the exon itself, which we have recently described, and provides the protein domain responsible for growth factor binding to CD44. Findings We have analyzed the sequence of CD44v3 in 95 mammalian species to report high conservation levels for both its splicing regulatory elements (the 3' splice site and the exonic splicing enhancer), and the functional glycosaminglycan binding site coded by v3. We also report the functional expression of CD44v3 isoforms in peripheral blood cells of different mammalian taxa with both consensus and variant v3 sequences. Conclusion CD44v3 mammalian sequences maintain all functional splicing regulatory elements as well as the GAG binding site with the same relative positions and sequence identity previously described during alternative splicing of human CD44. The sequence within the GAG attachment site, which in turn contains the Y motif of the exonic splicing enhancer, is more conserved relative to the rest of exon. Amplification of CD44v3 sequence from mammalian species but not from birds, fish or reptiles, may lead to classify CD44v3 as an exclusive mammalian gene trait. PMID:18710510
Distribution and sequence homogeneity of an abundant satellite DNA in the beetle, Tenebrio molitor.
Davis, C A; Wyatt, G R
1989-01-01
The mealworm beetle, Tenebrio molitor, contains an unusually abundant and homogeneous satellite DNA which constitutes up to 60% of its genome. The satellite DNA is shown to be present in all of the chromosomes by in situ hybridization. 18 dimers of the repeat unit were cloned and sequenced. The consensus sequence is 142 nt long and lacks any internal repeat structure. Monomers of the sequence are very similar, showing on average a 2% divergence from the calculated consensus. Variant nucleotides are scattered randomly throughout the sequence although some variants are more common than others. Neighboring repeat units are no more alike than randomly chosen ones. The results suggest that some mechanism, perhaps gene conversion, is acting to maintain the homogeneity of the satellite DNA despite its abundance and distribution on all of the chromosomes. Images PMID:2762148
Mutations in PIGY: expanding the phenotype of inherited glycosylphosphatidylinositol deficiencies
Ilkovski, Biljana; Pagnamenta, Alistair T.; O'Grady, Gina L.; Kinoshita, Taroh; Howard, Malcolm F.; Lek, Monkol; Thomas, Brett; Turner, Anne; Christodoulou, John; Sillence, David; Knight, Samantha J.L.; Popitsch, Niko; Keays, David A.; Anzilotti, Consuelo; Goriely, Anne; Waddell, Leigh B.; Brilot, Fabienne; North, Kathryn N.; Kanzawa, Noriyuki; Macarthur, Daniel G.; Taylor, Jenny C.; Kini, Usha; Murakami, Yoshiko; Clarke, Nigel F.
2015-01-01
Glycosylphosphatidylinositol (GPI)-anchored proteins are ubiquitously expressed in the human body and are important for various functions at the cell surface. Mutations in many GPI biosynthesis genes have been described to date in patients with multi-system disease and together these constitute a subtype of congenital disorders of glycosylation. We used whole exome sequencing in two families to investigate the genetic basis of disease and used RNA and cellular studies to investigate the functional consequences of sequence variants in the PIGY gene. Two families with different phenotypes had homozygous recessive sequence variants in the GPI biosynthesis gene PIGY. Two sisters with c.137T>C (p.Leu46Pro) PIGY variants had multi-system disease including dysmorphism, seizures, severe developmental delay, cataracts and early death. There were significantly reduced levels of GPI-anchored proteins (CD55 and CD59) on the surface of patient-derived skin fibroblasts (∼20–50% compared with controls). In a second, consanguineous family, two siblings had moderate development delay and microcephaly. A homozygous PIGY promoter variant (c.-540G>A) was detected within a 7.7 Mb region of autozygosity. This variant was predicted to disrupt a SP1 consensus binding site and was shown to be associated with reduced gene expression. Mutations in PIGY can occur in coding and non-coding regions of the gene and cause variable phenotypes. This article contributes to understanding of the range of disease phenotypes and disease genes associated with deficiencies of the GPI-anchor biosynthesis pathway and also serves to highlight the potential importance of analysing variants detected in 5′-UTR regions despite their typically low coverage in exome data. PMID:26293662
Mutations in PIGY: expanding the phenotype of inherited glycosylphosphatidylinositol deficiencies.
Ilkovski, Biljana; Pagnamenta, Alistair T; O'Grady, Gina L; Kinoshita, Taroh; Howard, Malcolm F; Lek, Monkol; Thomas, Brett; Turner, Anne; Christodoulou, John; Sillence, David; Knight, Samantha J L; Popitsch, Niko; Keays, David A; Anzilotti, Consuelo; Goriely, Anne; Waddell, Leigh B; Brilot, Fabienne; North, Kathryn N; Kanzawa, Noriyuki; Macarthur, Daniel G; Taylor, Jenny C; Kini, Usha; Murakami, Yoshiko; Clarke, Nigel F
2015-11-01
Glycosylphosphatidylinositol (GPI)-anchored proteins are ubiquitously expressed in the human body and are important for various functions at the cell surface. Mutations in many GPI biosynthesis genes have been described to date in patients with multi-system disease and together these constitute a subtype of congenital disorders of glycosylation. We used whole exome sequencing in two families to investigate the genetic basis of disease and used RNA and cellular studies to investigate the functional consequences of sequence variants in the PIGY gene. Two families with different phenotypes had homozygous recessive sequence variants in the GPI biosynthesis gene PIGY. Two sisters with c.137T>C (p.Leu46Pro) PIGY variants had multi-system disease including dysmorphism, seizures, severe developmental delay, cataracts and early death. There were significantly reduced levels of GPI-anchored proteins (CD55 and CD59) on the surface of patient-derived skin fibroblasts (∼20-50% compared with controls). In a second, consanguineous family, two siblings had moderate development delay and microcephaly. A homozygous PIGY promoter variant (c.-540G>A) was detected within a 7.7 Mb region of autozygosity. This variant was predicted to disrupt a SP1 consensus binding site and was shown to be associated with reduced gene expression. Mutations in PIGY can occur in coding and non-coding regions of the gene and cause variable phenotypes. This article contributes to understanding of the range of disease phenotypes and disease genes associated with deficiencies of the GPI-anchor biosynthesis pathway and also serves to highlight the potential importance of analysing variants detected in 5'-UTR regions despite their typically low coverage in exome data. © The Author 2015. Published by Oxford University Press.
Improving performance of DS-CDMA systems using chaotic complex Bernoulli spreading codes
NASA Astrophysics Data System (ADS)
Farzan Sabahi, Mohammad; Dehghanfard, Ali
2014-12-01
The most important goal of spreading spectrum communication system is to protect communication signals against interference and exploitation of information by unintended listeners. In fact, low probability of detection and low probability of intercept are two important parameters to increase the performance of the system. In Direct Sequence Code Division Multiple Access (DS-CDMA) systems, these properties are achieved by multiplying the data information in spreading sequences. Chaotic sequences, with their particular properties, have numerous applications in constructing spreading codes. Using one-dimensional Bernoulli chaotic sequence as spreading code is proposed in literature previously. The main feature of this sequence is its negative auto-correlation at lag of 1, which with proper design, leads to increase in efficiency of the communication system based on these codes. On the other hand, employing the complex chaotic sequences as spreading sequence also has been discussed in several papers. In this paper, use of two-dimensional Bernoulli chaotic sequences is proposed as spreading codes. The performance of a multi-user synchronous and asynchronous DS-CDMA system will be evaluated by applying these sequences under Additive White Gaussian Noise (AWGN) and fading channel. Simulation results indicate improvement of the performance in comparison with conventional spreading codes like Gold codes as well as similar complex chaotic spreading sequences. Similar to one-dimensional Bernoulli chaotic sequences, the proposed sequences also have negative auto-correlation. Besides, construction of complex sequences with lower average cross-correlation is possible with the proposed method.
Shahin, Arwa; Smulders, Marinus J. M.; van Tuyl, Jaap M.; Arens, Paul; Bakker, Freek T.
2014-01-01
Next Generation Sequencing (NGS) may enable estimating relationships among genotypes using allelic variation of multiple nuclear genes simultaneously. We explored the potential and caveats of this strategy in four genetically distant Lilium cultivars to estimate their genetic divergence from transcriptome sequences using three approaches: POFAD (Phylogeny of Organisms from Allelic Data, uses allelic information of sequence data), RAxML (Randomized Accelerated Maximum Likelihood, tree building based on concatenated consensus sequences) and Consensus Network (constructing a network summarizing among gene tree conflicts). Twenty six gene contigs were chosen based on the presence of orthologous sequences in all cultivars, seven of which also had an orthologous sequence in Tulipa, used as out-group. The three approaches generated the same topology. Although the resolution offered by these approaches is high, in this case there was no extra benefit in using allelic information. We conclude that these 26 genes can be widely applied to construct a species tree for the genus Lilium. PMID:25368628
Federal Register 2010, 2011, 2012, 2013, 2014
2012-06-28
... Process To Develop Consumer Data Privacy Code of Conduct Concerning Mobile Application Transparency AGENCY... convene the first meeting of a privacy multistakeholder process concerning mobile application transparency... concerning mobile application transparency. Stakeholders will engage in an open, transparent, consensus...
Hu, H M; Chuang, C K; Lee, M J; Tseng, T C; Tang, T K
2000-11-01
We previously reported two novel testis-specific serine/threonine kinases, Aie1 (mouse) and AIE2 (human), that share high amino acid identities with the kinase domains of fly aurora and yeast Ipl1. Here, we report the entire intron-exon organization of the Aie1 gene and analyze the expression patterns of Aie1 mRNA during testis development. The mouse Aie1 gene spans approximately 14 kb and contains seven exons. The sequences of the exon-intron boundaries of the Aie1 gene conform to the consensus sequences (GT/AG) of the splicing donor and acceptor sites of most eukaryotic genes. Comparative genomic sequencing revealed that the gene structure is highly conserved between mouse Aie1 and human AIE2. However, much less homology was found in the sequence outside the kinase-coding domains. The Aie1 locus was mapped to mouse chromosome 7A2-A3 by fluorescent in situ hybridization. Northern blot analysis indicates that Aie1 mRNA likely is expressed at a low level on day 14 and reaches its plateau on day 21 in the developing postnatal testis. RNA in situ hybridization indicated that the expression of the Aie1 transcript was restricted to meiotically active germ cells, with the highest levels detected in spermatocytes at the late pachytene stage. These findings suggest that Aie1 plays a role in spermatogenesis.
Jin, Ke; Xue, Chenyi; Wu, Xiaoli; Qian, Jinyi; Zhu, Yong; Yang, Zhen; Yonezawa, Takahiro; Crabbe, M James C; Cao, Ying; Hasegawa, Masami; Zhong, Yang; Zheng, Yufang
2011-01-01
The giant panda has an interesting bamboo diet unlike the other species in the order of Carnivora. The umami taste receptor gene T1R1 has been identified as a pseudogene during its genome sequencing project and confirmed using a different giant panda sample. The estimated mutation time for this gene is about 4.2 Myr. Such mutation coincided with the giant panda's dietary change and also reinforced its herbivorous life style. However, as this gene is preserved in herbivores such as cow and horse, we need to look for other reasons behind the giant panda's diet switch. Since taste is part of the reward properties of food related to its energy and nutrition contents, we did a systematic analysis on those genes involved in the appetite-reward system for the giant panda. We extracted the giant panda sequence information for those genes and compared with the human sequence first and then with seven other species including chimpanzee, mouse, rat, dog, cat, horse, and cow. Orthologs in panda were further analyzed based on the coding region, Kozak consensus sequence, and potential microRNA binding of those genes. Our results revealed an interesting dopamine metabolic involvement in the panda's food choice. This finding suggests a new direction for molecular evolution studies behind the panda's dietary switch.
Jin, Ke; Xue, Chenyi; Wu, Xiaoli; Qian, Jinyi; Zhu, Yong; Yang, Zhen; Yonezawa, Takahiro; Crabbe, M. James C.; Cao, Ying; Hasegawa, Masami; Zhong, Yang; Zheng, Yufang
2011-01-01
Background The giant panda has an interesting bamboo diet unlike the other species in the order of Carnivora. The umami taste receptor gene T1R1 has been identified as a pseudogene during its genome sequencing project and confirmed using a different giant panda sample. The estimated mutation time for this gene is about 4.2 Myr. Such mutation coincided with the giant panda's dietary change and also reinforced its herbivorous life style. However, as this gene is preserved in herbivores such as cow and horse, we need to look for other reasons behind the giant panda's diet switch. Methodology/Principal Findings Since taste is part of the reward properties of food related to its energy and nutrition contents, we did a systematic analysis on those genes involved in the appetite-reward system for the giant panda. We extracted the giant panda sequence information for those genes and compared with the human sequence first and then with seven other species including chimpanzee, mouse, rat, dog, cat, horse, and cow. Orthologs in panda were further analyzed based on the coding region, Kozak consensus sequence, and potential microRNA binding of those genes. Conclusions/Significance Our results revealed an interesting dopamine metabolic involvement in the panda's food choice. This finding suggests a new direction for molecular evolution studies behind the panda's dietary switch. PMID:21818345
Miyazaki, Saori; Sato, Yutaka; Asano, Tomoya; Nagamura, Yoshiaki; Nonomura, Ken-Ichi
2015-10-01
Post-transcriptional gene regulation by RNA recognition motif (RRM) proteins through binding to cis-elements in the 3'-untranslated region (3'-UTR) is widely used in eukaryotes to complete various biological processes. Rice MEIOSIS ARRESTED AT LEPTOTENE2 (MEL2) is the RRM protein that functions in the transition to meiosis in proper timing. The MEL2 RRM preferentially associated with the U-rich RNA consensus, UUAGUU[U/A][U/G][A/U/G]U, dependently on sequences and proportionally to MEL2 protein amounts in vitro. The consensus sequences were located in the putative looped structures of the RNA ligand. A genome-wide survey revealed a tendency of MEL2-binding consensus appearing in 3'-UTR of rice genes. Of 249 genes that conserved the consensus in their 3'-UTR, 13 genes spatiotemporally co-expressed with MEL2 in meiotic flowers, and included several genes whose function was supposed in meiosis; such as Replication protein A and OsMADS3. The proteome analysis revealed that the amounts of small ubiquitin-related modifier-like protein and eukaryotic translation initiation factor3-like protein were dramatically altered in mel2 mutant anthers. Taken together with transcriptome and gene ontology results, we propose that the rice MEL2 is involved in the translational regulation of key meiotic genes on 3'-UTRs to achieve the faithful transition of germ cells to meiosis.
Liakhovetskiĭ, V A; Bobrova, E V; Skopin, G N
2012-01-01
Transposition errors during the reproduction of a hand movement sequence make it possible to receive important information on the internal representation of this sequence in the motor working memory. Analysis of such errors showed that learning to reproduce sequences of the left-hand movements improves the system of positional coding (coding ofpositions), while learning of the right-hand movements improves the system of vector coding (coding of movements). Learning of the right-hand movements after the left-hand performance involved the system of positional coding "imposed" by the left hand. Learning of the left-hand movements after the right-hand performance activated the system of vector coding. Transposition errors during learning to reproduce movement sequences can be explained by neural network using either vector coding or both vector and positional coding.
Informational structure of genetic sequences and nature of gene splicing
NASA Astrophysics Data System (ADS)
Trifonov, E. N.
1991-10-01
Only about 1/20 of DNA of higher organisms codes for proteins, by means of classical triplet code. The rest of DNA sequences is largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence ``talks'' to various bimolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, sequence specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus, being a composite structure in which one had the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of the Gnomic, language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their ``ontogenetic'' way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications is to resolve such conflicts. New data are presented strongly indicating that the gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.
R2R - software to speed the depiction of aesthetic consensus RNA secondary structures
2011-01-01
Background With continuing identification of novel structured noncoding RNAs, there is an increasing need to create schematic diagrams showing the consensus features of these molecules. RNA structural diagrams are typically made either with general-purpose drawing programs like Adobe Illustrator, or with automated or interactive programs specific to RNA. Unfortunately, the use of applications like Illustrator is extremely time consuming, while existing RNA-specific programs produce figures that are useful, but usually not of the same aesthetic quality as those produced at great cost in Illustrator. Additionally, most existing RNA-specific applications are designed for drawing single RNA molecules, not consensus diagrams. Results We created R2R, a computer program that facilitates the generation of aesthetic and readable drawings of RNA consensus diagrams in a fraction of the time required with general-purpose drawing programs. Since the inference of a consensus RNA structure typically requires a multiple-sequence alignment, the R2R user annotates the alignment with commands directing the layout and annotation of the RNA. R2R creates SVG or PDF output that can be imported into Adobe Illustrator, Inkscape or CorelDRAW. R2R can be used to create consensus sequence and secondary structure models for novel RNA structures or to revise models when new representatives for known RNA classes become available. Although R2R does not currently have a graphical user interface, it has proven useful in our efforts to create 100 schematic models of distinct noncoding RNA classes. Conclusions R2R makes it possible to obtain high-quality drawings of the consensus sequence and structural models of many diverse RNA structures with a more practical amount of effort. R2R software is available at http://breaker.research.yale.edu/R2R and as an Additional file. PMID:21205310
Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.
Chin, Chen-Shan; Alexander, David H; Marks, Patrick; Klammer, Aaron A; Drake, James; Heiner, Cheryl; Clum, Alicia; Copeland, Alex; Huddleston, John; Eichler, Evan E; Turner, Stephen W; Korlach, Jonas
2013-06-01
We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.
NASA Astrophysics Data System (ADS)
Kraljić, K.; Strüngmann, L.; Fimmel, E.; Gumbel, M.
2018-01-01
The genetic code is degenerated and it is assumed that redundancy provides error detection and correction mechanisms in the translation process. However, the biological meaning of the code's structure is still under current research. This paper presents a Genetic Code Analysis Toolkit (GCAT) which provides workflows and algorithms for the analysis of the structure of nucleotide sequences. In particular, sets or sequences of codons can be transformed and tested for circularity, comma-freeness, dichotomic partitions and others. GCAT comes with a fertile editor custom-built to work with the genetic code and a batch mode for multi-sequence processing. With the ability to read FASTA files or load sequences from GenBank, the tool can be used for the mathematical and statistical analysis of existing sequence data. GCAT is Java-based and provides a plug-in concept for extensibility. Availability: Open source Homepage:http://www.gcat.bio/
Provencher, Cathy; LaPointe, Gisèle; Sirois, Stéphane; Van Calsteren, Marie-Rose; Roy, Denis
2003-01-01
A primer design strategy named CODEHOP (consensus-degenerate hybrid oligonucleotide primer) for amplification of distantly related sequences was used to detect the priming glycosyltransferase (GT) gene in strains of the Lactobacillus casei group. Each hybrid primer consisted of a short 3′ degenerate core based on four highly conserved amino acids and a longer 5′ consensus clamp region based on six sequences of the priming GT gene products from exopolysaccharide (EPS)-producing bacteria. The hybrid primers were used to detect the priming GT gene of 44 commercial isolates and reference strains of Lactobacillus rhamnosus, L. casei, Lactobacillus zeae, and Streptococcus thermophilus. The priming GT gene was detected in the genome of both non-EPS-producing (EPS−) and EPS-producing (EPS+) strains of L. rhamnosus. The sequences of the cloned PCR products were similar to those of the priming GT gene of various gram-negative and gram-positive EPS+ bacteria. Specific primers designed from the L. rhamnosus RW-9595M GT gene were used to sequence the end of the priming GT gene in selected EPS+ strains of L. rhamnosus. Phylogenetic analysis revealed that Lactobacillus spp. form a distinctive group apart from other lactic acid bacteria for which GT genes have been characterized to date. Moreover, the sequences show a divergence existing among strains of L. rhamnosus with respect to the terminal region of the priming GT gene. Thus, the PCR approach with consensus-degenerate hybrid primers designed with CODEHOP is a practical approach for the detection of similar genes containing conserved motifs in different bacterial genomes. PMID:12788729
Wellehan, James F. X.; Johnson, April J.; Harrach, Balázs; Benkö, Mária; Pessier, Allan P.; Johnson, Calvin M.; Garner, Michael M.; Childress, April; Jacobson, Elliott R.
2004-01-01
A consensus nested-PCR method was designed for investigation of the DNA polymerase gene of adenoviruses. Gene fragments were amplified and sequenced from six novel adenoviruses from seven lizard species, including four species from which adenoviruses had not previously been reported. Host species included Gila monster, leopard gecko, fat-tail gecko, blue-tongued skink, Tokay gecko, bearded dragon, and mountain chameleon. This is the first sequence information from lizard adenoviruses. Phylogenetic analysis indicated that these viruses belong to the genus Atadenovirus, supporting the reptilian origin of atadenoviruses. This PCR method may be useful for obtaining templates for initial sequencing of novel adenoviruses. PMID:15542689
Wellehan, James F X; Johnson, April J; Harrach, Balázs; Benkö, Mária; Pessier, Allan P; Johnson, Calvin M; Garner, Michael M; Childress, April; Jacobson, Elliott R
2004-12-01
A consensus nested-PCR method was designed for investigation of the DNA polymerase gene of adenoviruses. Gene fragments were amplified and sequenced from six novel adenoviruses from seven lizard species, including four species from which adenoviruses had not previously been reported. Host species included Gila monster, leopard gecko, fat-tail gecko, blue-tongued skink, Tokay gecko, bearded dragon, and mountain chameleon. This is the first sequence information from lizard adenoviruses. Phylogenetic analysis indicated that these viruses belong to the genus Atadenovirus, supporting the reptilian origin of atadenoviruses. This PCR method may be useful for obtaining templates for initial sequencing of novel adenoviruses.
SEQassembly: A Practical Tools Program for Coding Sequences Splicing
NASA Astrophysics Data System (ADS)
Lee, Hongbin; Yang, Hang; Fu, Lei; Qin, Long; Li, Huili; He, Feng; Wang, Bo; Wu, Xiaoming
CDS (Coding Sequences) is a portion of mRNA sequences, which are composed by a number of exon sequence segments. The construction of CDS sequence is important for profound genetic analysis such as genotyping. A program in MATLAB environment is presented, which can process batch of samples sequences into code segments under the guide of reference exon models, and splice these code segments of same sample source into CDS according to the exon order in queue file. This program is useful in transcriptional polymorphism detection and gene function study.
Genetic Diversity in Oxytocin Ligands and Receptors in New World Monkeys
Ren, Dongren; Lu, Guoqing; Moriyama, Hideaki; Mustoe, Aaryn C.; Harrison, Emily B.; French, Jeffrey A.
2015-01-01
Oxytocin (OXT) is an important neurohypophyseal hormone that influences wide spectrum of reproductive and social processes. Eutherian mammals possess a highly conserved sequence of OXT (Cys-Tyr-Ile-Gln-Asn-Cys-Pro-Leu-Gly). However, in this study, we sequenced the coding region for OXT in 22 species covering all New World monkeys (NWM) genera and clades, and characterize five OXT variants, including consensus mammalian Leu8-OXT, major variant Pro8-OXT, and three previously unreported variants: Ala8-OXT, Thr8-OXT, and Phe2-OXT. Pro8-OXT shows clear structural and physicochemical differences from Leu8-OXT. We report multiple predicted amino acid substitutions in the G protein-coupled OXT receptor (OXTR), especially in the critical N-terminus, which is crucial for OXT recognition and binding. Genera with same Pro8-OXT tend to cluster together on a phylogenetic tree based on OXTR sequence, and we demonstrate significant coevolution between OXT and OXTR. NWM species are characterized by high incidence of social monogamy, and we document an association between OXTR phylogeny and social monogamy. Our results demonstrate remarkable genetic diversity in the NWM OXT/OXTR system, which can provide a foundation for molecular, pharmacological, and behavioral studies of the role of OXT signaling in regulating complex social phenotypes. PMID:25938568
Identification and application of self-binding zipper-like sequences in SARS-CoV spike protein.
Zhang, Si Min; Liao, Ying; Neo, Tuan Ling; Lu, Yanning; Liu, Ding Xiang; Vahlne, Anders; Tam, James P
2018-05-22
Self-binding peptides containing zipper-like sequences, such as the Leu/Ile zipper sequence within the coiled coil regions of proteins and the cross-β spine steric zippers within the amyloid-like fibrils, could bind to the protein-of-origin through homophilic sequence-specific zipper motifs. These self-binding sequences represent opportunities for the development of biochemical tools and/or therapeutics. Here, we report on the identification of a putative self-binding β-zipper-forming peptide within the severe acute respiratory syndrome-associated coronavirus spike (S) protein and its application in viral detection. Peptide array scanning of overlapping peptides covering the entire length of S protein identified 34 putative self-binding peptides of six clusters, five of which contained octapeptide core consensus sequences. The Cluster I consensus octapeptide sequence GINITNFR was predicted by the Eisenberg's 3D profile method to have high amyloid-like fibrillation potential through steric β-zipper formation. Peptide C6 containing the Cluster I consensus sequence was shown to oligomerize and form amyloid-like fibrils. Taking advantage of this, C6 was further applied to detect the S protein expression in vitro by fluorescence staining. Meanwhile, the coiled-coil-forming Leu/Ile heptad repeat sequences within the S protein were under-represented during peptide array scanning, in agreement with that long peptide lengths were required to attain high helix-mediated interaction avidity. The data suggest that short β-zipper-like self-binding peptides within the S protein could be identified through combining the peptide scanning and predictive methods, and could be exploited as biochemical detection reagents for viral infection. Copyright © 2018. Published by Elsevier Ltd.
Correlation approach to identify coding regions in DNA sequences
NASA Technical Reports Server (NTRS)
Ossadnik, S. M.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.
1994-01-01
Recently, it was observed that noncoding regions of DNA sequences possess long-range power-law correlations, whereas coding regions typically display only short-range correlations. We develop an algorithm based on this finding that enables investigators to perform a statistical analysis on long DNA sequences to locate possible coding regions. The algorithm is particularly successful in predicting the location of lengthy coding regions. For example, for the complete genome of yeast chromosome III (315,344 nucleotides), at least 82% of the predictions correspond to putative coding regions; the algorithm correctly identified all coding regions larger than 3000 nucleotides, 92% of coding regions between 2000 and 3000 nucleotides long, and 79% of coding regions between 1000 and 2000 nucleotides. The predictive ability of this new algorithm supports the claim that there is a fundamental difference in the correlation property between coding and noncoding sequences. This algorithm, which is not species-dependent, can be implemented with other techniques for rapidly and accurately locating relatively long coding regions in genomic sequences.
Nucleotide sequence of the gene encoding the nitrogenase iron protein of Thiobacillus ferrooxidans
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pretorius, I.M.; Rawlings, D.E.; O'Neill, E.G.
1987-01-01
The DNA sequence was determined for the cloned Thiobacillus ferrooxidans nifH and part of the nifD genes. The DNA chains were radiolabeled with (..cap alpha..-/sup 32/P)dCTP (3000 Ci/mmol) or (..cap alpha..-/sup 35/S)dCTP (400 Ci/mmol). A putative T. ferrooxidans nifH promoter was identified whose sequences showed perfect consensus with those of the Klebsiella pneumoniae nif promoter. Two putative consensus upstream activator sequences were also identified. The amino acid sequence was deduced from the DNA sequence. In a comparison of nifH DNA sequences from T. ferrooxidans and eight other nitrogen-fixing microbes, a Rhizobium sp. isolated from Parasponia andersonii showed the greatest homologymore » (74%) and Clostridium pasteurianum (nifH1) showed the least homology (54%). In the comparison of the amino acid sequences of the Fe proteins, the Rhizobium sp. and Rhizobium japonicum showed the greatest homology (both 86%) and C. pasteurianum (nifH1 gene product) demonstrated the least homology (56%) to the T. ferrooxidans Fe protein.« less
Statistical properties of DNA sequences
NASA Technical Reports Server (NTRS)
Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.
1995-01-01
We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
Cellulases and coding sequences
Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong
2001-02-20
The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.
Cellulases and coding sequences
Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong
2001-01-01
The present invention provides three fungal cellulases, their coding sequences, recombinant DNA molecules comprising the cellulase coding sequences, recombinant host cells and methods for producing same. The present cellulases are from Orpinomyces PC-2.
Biclustering as a method for RNA local multiple sequence alignment.
Wang, Shu; Gutell, Robin R; Miranker, Daniel P
2007-12-15
Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in multiple sequence alignment (MSA) is that the alignment of sequences is often intended to reveal groups of conserved functional subsequences. Simultaneously, the grouping of the sequences can impact the alignment; precisely the kind of dual situation biclustering is intended to address. We define a representation of the MSA problem enabling the application of biclustering algorithms. We develop a computer program for local MSA, BlockMSA, that combines biclustering with divide-and-conquer. BlockMSA simultaneously finds groups of similar sequences and locally aligns subsequences within them. Further alignment is accomplished by dividing both the set of sequences and their contents. The net result is both a multiple sequence alignment and a hierarchical clustering of the sequences. BlockMSA was tested on the subsets of the BRAliBase 2.1 benchmark suite that display high variability and on an extension to that suite to larger problem sizes. Also, alignments were evaluated of two large datasets of current biological interest, T box sequences and Group IC1 Introns. The results were compared with alignments computed by ClustalW, MAFFT, MUCLE and PROBCONS alignment programs using Sum of Pairs (SPS) and Consensus Count. Results for the benchmark suite are sensitive to problem size. On problems of 15 or greater sequences, BlockMSA is consistently the best. On none of the problems in the test suite are there appreciable differences in scores among BlockMSA, MAFFT and PROBCONS. On the T box sequences, BlockMSA does the most faithful job of reproducing known annotations. MAFFT and PROBCONS do not. On the Intron sequences, BlockMSA, MAFFT and MUSCLE are comparable at identifying conserved regions. BlockMSA is implemented in Java. Source code and supplementary datasets are available at http://aug.csres.utexas.edu/msa/
Reid-Bayliss, Kate S; Loeb, Lawrence A
2017-08-29
Transcriptional mutagenesis (TM) due to misincorporation during RNA transcription can result in mutant RNAs, or epimutations, that generate proteins with altered properties. TM has long been hypothesized to play a role in aging, cancer, and viral and bacterial evolution. However, inadequate methodologies have limited progress in elucidating a causal association. We present a high-throughput, highly accurate RNA sequencing method to measure epimutations with single-molecule sensitivity. Accurate RNA consensus sequencing (ARC-seq) uniquely combines RNA barcoding and generation of multiple cDNA copies per RNA molecule to eliminate errors introduced during cDNA synthesis, PCR, and sequencing. The stringency of ARC-seq can be scaled to accommodate the quality of input RNAs. We apply ARC-seq to directly assess transcriptome-wide epimutations resulting from RNA polymerase mutants and oxidative stress.
USDA-ARS?s Scientific Manuscript database
Lipase (lip) and lipase-specific foldase (lif) genes of a biodegradable polyhydroxyalkanoate- (PHA-) synthesizing Pseudomonas resinovorans NRRL B-2649 were cloned using primers based on consensus sequences, followed by PCR-based genome walking. Sequence analyses showed a putative Lip gene-product (...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chain, P; Garcia, E
2003-02-06
The goal of this proposed effort was to assess the difficulty in identifying and characterizing virulence candidate genes in an organism for which very limited data exists. This was accomplished by first addressing the finishing phase of draft-sequenced F. tularensis genomes and conducting comparative analyses to determine the coding potential of each genome; to discover the differences in genome structure and content, and to identify potential genes whose products may be involved in the F. tularensis virulence process. The project was divided into three parts: (1) Genome finishing: This part involves determining the order and orientation of the consensus sequencesmore » of contigs obtained from Phrap assemblies of random draft genomic sequences. This tedious process consists of linking contig ends using information embedded in each sequence file that relates the sequence to the original cloned insert. Since inserts are sequenced from both ends, we can establish a link between these paired-ends in different contigs and thus order and orient contigs. Since these genomes carry numerous copies of insertion sequences, these repeated elements ''confuse'' the Phrap assembly program. It is thus necessary to break these contigs apart at the repeated sequences and individually join the proper flanking regions using paired-end information, or using results of comparisons against a similar genome. Larger repeated elements such as the small subunit ribosomal RNA operon require verification with PCR. Tandem repeats require manual intervention and typically rely on single nucleotide polymorphisms to be resolved. Remaining gaps require PCR reactions and sequencing. Once the genomes have been ''closed'', low quality regions are addressed by resequencing reactions. (2) Genome analysis: The final consensus sequences are processed by combining the results of three gene modelers: Glimmer, Critica and Generation. The final gene models are submitted to a battery of homology searches and domain prediction programs in order to annotate them (e.g. BLAST, Pfam, TIGRfam, COG, KEGG, InterPro, TMhmm, SignalP). The genome structure is also assessed in terms of G+C content, GC bias (GC skew), and locations of repeated regions (e.g. IS elements) and phage-like genes. (3) Comparative genomics: The results of the various genome analyses are compared between the finished (or almost finished) genomes. Here, we have compared the F. tularensis genomes from the extremely lethal strain Schu4 (subsp. tularensis), the vaccine strain LVS (subsp. holartica), and strain UT01-4992 of the less virulent, opportunistic subsp. novicida. Regions present in the highly virulent strain that are absent from the other less virulent strains may provide insight into what factors are required for the high level of virulence.« less
Cofactor specificity switch in Shikimate dehydrogenase by rational design and consensus engineering.
García-Guevara, Fernando; Bravo, Iris; Martínez-Anaya, Claudia; Segovia, Lorenzo
2017-08-01
Consensus engineering has been used to design more stable variants using the most frequent amino acid at each site of a multiple sequence alignment; sometimes consensus engineering modifies function, but efforts have mainly been focused on studying stability. Here we constructed a consensus Rossmann domain for the Shikimate dehydrogenase enzyme; separately we decided to switch the cofactor specificity through rational design in the Escherichia coli Shikimate dehydrogenase enzyme and then analyzed the effect of consensus mutations on top of our design. We found that consensus mutations closest to the 2' adenine moiety increased the activity in our design. Consensus engineering has been shown to result in more stable proteins and our findings suggest it could also be used as a complementary tool for increasing or modifying enzyme activity during design. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Presence of a consensus DNA motif at nearby DNA sequence of the mutation susceptible CG nucleotides.
Chowdhury, Kaushik; Kumar, Suresh; Sharma, Tanu; Sharma, Ankit; Bhagat, Meenakshi; Kamai, Asangla; Ford, Bridget M; Asthana, Shailendra; Mandal, Chandi C
2018-01-10
Complexity in tissues affected by cancer arises from somatic mutations and epigenetic modifications in the genome. The mutation susceptible hotspots present within the genome indicate a non-random nature and/or a position specific selection of mutation. An association exists between the occurrence of mutations and epigenetic DNA methylation. This study is primarily aimed at determining mutation status, and identifying a signature for predicting mutation prone zones of tumor suppressor (TS) genes. Nearby sequences from the top five positions having a higher mutation frequency in each gene of 42 TS genes were selected from a cosmic database and were considered as mutation prone zones. The conserved motifs present in the mutation prone DNA fragments were identified. Molecular docking studies were done to determine putative interactions between the identified conserved motifs and enzyme methyltransferase DNMT1. Collective analysis of 42 TS genes found GC as the most commonly replaced and AT as the most commonly formed residues after mutation. Analysis of the top 5 mutated positions of each gene (210 DNA segments for 42 TS genes) identified that CG nucleotides of the amino acid codons (e.g., Arginine) are most susceptible to mutation, and found a consensus DNA "T/AGC/GAGGA/TG" sequence present in these mutation prone DNA segments. Similar to TS genes, analysis of 54 oncogenes not only found CG nucleotides of the amino acid Arg as the most susceptible to mutation, but also identified the presence of similar consensus DNA motifs in the mutation prone DNA fragments (270 DNA segments for 54 oncogenes) of oncogenes. Docking studies depicted that, upon binding of DNMT1 methylates to this consensus DNA motif (C residues of CpG islands), mutation was likely to occur. Thus, this study proposes that DNMT1 mediated methylation in chromosomal DNA may decrease if a foreign DNA segment containing this consensus sequence along with CG nucleotides is exogenously introduced to dividing cancer cells. Copyright © 2017 Elsevier B.V. All rights reserved.
A first report and complete genome sequence of alfalfa enamovirus from Sudan
USDA-ARS?s Scientific Manuscript database
A full genome sequence of a viral pathogen, provisionally named alfalfa enamovirus 2 (AEV-2), was reconstructed from short reads obtained by Illumina RNA sequencing of alfalfa sample originating from Sudan. Ambiguous nucleotides in the resultant consensus assembly and identity of the predicted virus...
A consensus linkage map of lentil based on DArT markers from three RIL mapping populations.
Ates, Duygu; Aldemir, Secil; Alsaleh, Ahmad; Erdogmus, Semih; Nemli, Seda; Kahriman, Abdullah; Ozkan, Hakan; Vandenberg, Albert; Tanyolac, Bahattin
2018-01-01
Lentil (Lens culinaris ssp. culinaris Medikus) is a diploid (2n = 2x = 14), self-pollinating grain legume with a haploid genome size of about 4 Gbp and is grown throughout the world with current annual production of 4.9 million tonnes. A consensus map of lentil (Lens culinaris ssp. culinaris Medikus) was constructed using three different lentils recombinant inbred line (RIL) populations, including "CDC Redberry" x "ILL7502" (LR8), "ILL8006" x "CDC Milestone" (LR11) and "PI320937" x "Eston" (LR39). The lentil consensus map was composed of 9,793 DArT markers, covered a total of 977.47 cM with an average distance of 0.10 cM between adjacent markers and constructed 7 linkage groups representing 7 chromosomes of the lentil genome. The consensus map had no gap larger than 12.67 cM and only 5 gaps were found to be between 12.67 cM and 6.0 cM (on LG3 and LG4). The localization of the SNP markers on the lentil consensus map were in general consistent with their localization on the three individual genetic linkage maps and the lentil consensus map has longer map length, higher marker density and shorter average distance between the adjacent markers compared to the component linkage maps. This high-density consensus map could provide insight into the lentil genome. The consensus map could also help to construct a physical map using a Bacterial Artificial Chromosome library and map based cloning studies. Sequence information of DArT may help localization of orientation scaffolds from Next Generation Sequencing data.
A consensus linkage map of lentil based on DArT markers from three RIL mapping populations
Ates, Duygu; Aldemir, Secil; Alsaleh, Ahmad; Erdogmus, Semih; Nemli, Seda; Kahriman, Abdullah; Ozkan, Hakan; Vandenberg, Albert
2018-01-01
Background Lentil (Lens culinaris ssp. culinaris Medikus) is a diploid (2n = 2x = 14), self-pollinating grain legume with a haploid genome size of about 4 Gbp and is grown throughout the world with current annual production of 4.9 million tonnes. Materials and methods A consensus map of lentil (Lens culinaris ssp. culinaris Medikus) was constructed using three different lentils recombinant inbred line (RIL) populations, including “CDC Redberry” x “ILL7502” (LR8), “ILL8006” x “CDC Milestone” (LR11) and “PI320937” x “Eston” (LR39). Results The lentil consensus map was composed of 9,793 DArT markers, covered a total of 977.47 cM with an average distance of 0.10 cM between adjacent markers and constructed 7 linkage groups representing 7 chromosomes of the lentil genome. The consensus map had no gap larger than 12.67 cM and only 5 gaps were found to be between 12.67 cM and 6.0 cM (on LG3 and LG4). The localization of the SNP markers on the lentil consensus map were in general consistent with their localization on the three individual genetic linkage maps and the lentil consensus map has longer map length, higher marker density and shorter average distance between the adjacent markers compared to the component linkage maps. Conclusion This high-density consensus map could provide insight into the lentil genome. The consensus map could also help to construct a physical map using a Bacterial Artificial Chromosome library and map based cloning studies. Sequence information of DArT may help localization of orientation scaffolds from Next Generation Sequencing data. PMID:29351563
Hall, L; Laird, J E; Craig, R K
1984-01-01
Nucleotide sequence analysis of cloned guinea-pig casein B cDNA sequences has identified two casein B variants related to the bovine and rat alpha s1 caseins. Amino acid homology was largely confined to the known bovine or predicted rat phosphorylation sites and within the 'signal' precursor sequence. Comparison of the deduced nucleotide sequence of the guinea-pig and rat alpha s1 casein mRNA species showed greater sequence conservation in the non-coding than in the coding regions, suggesting a functional and possibly regulatory role for the non-coding regions of casein mRNA. The results provide insight into the evolution of the casein genes, and raise questions as to the role of conserved nucleotide sequences within the non-coding regions of mRNA species. Images Fig. 1. PMID:6548375
DNA barcode goes two-dimensions: DNA QR code web server.
Liu, Chang; Shi, Linchun; Xu, Xiaolan; Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin
2012-01-01
The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, "DNA barcode" actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interests, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications.
Lichenase and coding sequences
Li, Xin-Liang; Ljungdahl, Lars G.; Chen, Huizhong
2000-08-15
The present invention provides a fungal lichenase, i.e., an endo-1,3-1,4-.beta.-D-glucanohydrolase, its coding sequence, recombinant DNA molecules comprising the lichenase coding sequences, recombinant host cells and methods for producing same. The present lichenase is from Orpinomyces PC-2.
Veldman, G M; Klootwijk, J; van Heerikhuizen, H; Planta, R J
1981-01-01
We have determined the nucleotide sequence of part of a cloned yeast ribosomal RNA operon extending from the 5.8S RNA gene downstream into the 5' -terminal region of the 26S RNA gene. We mapped the pertinent processing sites, viz. the 5' end of 26S rRNA and the 3'ends of 5.8S rRNA and its immediate precursor, 7S RNA. At the 3' end of 7S RNA we find the sequence UCGUUU which is very similar to the type I consensus sequence UCAUUA/U present at the 3' ends of 17S, 5.8S and 26S rRNA as well as 18S precursor rRNA in yeast. At the 5' end of the 26S RNA gene we find a sequence of thirteen nucleotides which is homologous to the type II sequence present at the 5' termini of both the 17S and the 5.8S RNA gene. These findings further support the suggestion put forward earlier (G.M. Veldman et al. (1980) Nucl. Acids Res. 8, 2907-2920) that both consensus sequences are involved in the recognition of precursor rRNA by the processing nuclease(s). We discuss a model for the processing of yeast rRNA in which a processing enzyme sequentially recognizes several combinations of a type I and a type II consensus sequence. We also describe the existence of a significant base complementarity between sequences in the 5' -terminal region of 26S rRNA and the 3' -terminal region of 5.8S rRNA. We suggest that base pairing between these sequences contributes to the binding between 5.8S and 26S rRNA. Images PMID:7312619
Consensus statement: Virus taxonomy in the age of metagenomics.
Simmonds, Peter; Adams, Mike J; Benkő, Mária; Breitbart, Mya; Brister, J Rodney; Carstens, Eric B; Davison, Andrew J; Delwart, Eric; Gorbalenya, Alexander E; Harrach, Balázs; Hull, Roger; King, Andrew M Q; Koonin, Eugene V; Krupovic, Mart; Kuhn, Jens H; Lefkowitz, Elliot J; Nibert, Max L; Orton, Richard; Roossinck, Marilyn J; Sabanadzovic, Sead; Sullivan, Matthew B; Suttle, Curtis A; Tesh, Robert B; van der Vlugt, René A; Varsani, Arvind; Zerbini, F Murilo
2017-03-01
The number and diversity of viral sequences that are identified in metagenomic data far exceeds that of experimentally characterized virus isolates. In a recent workshop, a panel of experts discussed the proposal that, with appropriate quality control, viruses that are known only from metagenomic data can, and should be, incorporated into the official classification scheme of the International Committee on Taxonomy of Viruses (ICTV). Although a taxonomy that is based on metagenomic sequence data alone represents a substantial departure from the traditional reliance on phenotypic properties, the development of a robust framework for sequence-based virus taxonomy is indispensable for the comprehensive characterization of the global virome. In this Consensus Statement article, we consider the rationale for why metagenomic sequence data should, and how it can, be incorporated into the ICTV taxonomy, and present proposals that have been endorsed by the Executive Committee of the ICTV.
Brylinski, Michal; Konieczny, Leszek; Kononowicz, Andrzej; Roterman, Irena
2008-03-21
The well-known procedure implemented in ClustalW oriented on the sequence comparison was applied to structure comparison. The consensus sequence as well as consensus structure has been defined for proteins belonging to serpine family. The structure of early stage intermediate was the object for similarity search. The high values of W(sequence) appeared to be accordant with high values of W(structure) making possible structure comparison using common criteria for sequence and structure comparison. Since the early stage structural form has been created according to limited conformational sub-space which does not include the beta-structure (this structure is mediated by C7eq structural form), is particularly important to see, that the C7eq structural form may be treated as the seed for beta-structure present in the final native structure of protein. The applicability of ClustalW procedure to structure comparison makes these two comparisons unified.
Kuhn, Jens H.; Andersen, Kristian G.; Baize, Sylvain; Bào, Yīmíng; Bavari, Sina; Berthet, Nicolas; Blinkova, Olga; Brister, J. Rodney; Clawson, Anna N.; Fair, Joseph; Gabriel, Martin; Garry, Robert F.; Gire, Stephen K.; Goba, Augustine; Gonzalez, Jean-Paul; Günther, Stephan; Happi, Christian T.; Jahrling, Peter B.; Kapetshi, Jimmy; Kobinger, Gary; Kugelman, Jeffrey R.; Leroy, Eric M.; Maganga, Gael Darren; Mbala, Placide K.; Moses, Lina M.; Muyembe-Tamfum, Jean-Jacques; N’Faly, Magassouba; Nichol, Stuart T.; Omilabu, Sunday A.; Palacios, Gustavo; Park, Daniel J.; Paweska, Janusz T.; Radoshitzky, Sheli R.; Rossi, Cynthia A.; Sabeti, Pardis C.; Schieffelin, John S.; Schoepp, Randal J.; Sealfon, Rachel; Swanepoel, Robert; Towner, Jonathan S.; Wada, Jiro; Wauquier, Nadia; Yozwiak, Nathan L.; Formenty, Pierre
2014-01-01
In 2014, Ebola virus (EBOV) was identified as the etiological agent of a large and still expanding outbreak of Ebola virus disease (EVD) in West Africa and a much more confined EVD outbreak in Middle Africa. Epidemiological and evolutionary analyses confirmed that all cases of both outbreaks are connected to a single introduction each of EBOV into human populations and that both outbreaks are not directly connected. Coding-complete genomic sequence analyses of isolates revealed that the two outbreaks were caused by two novel EBOV variants, and initial clinical observations suggest that neither of them should be considered strains. Here we present consensus decisions on naming for both variants (West Africa: “Makona”, Middle Africa: “Lomela”) and provide database-compatible full, shortened, and abbreviated names that are in line with recently established filovirus sub-species nomenclatures. PMID:25421896
Exon–intron organization of genes in the slime mold Physarum polycephalum
Trzcinska-Danielewicz, Joanna; Fronk, Jan
2000-01-01
The slime mold Physarum polycephalum is a morphologically simple organism with a large and complex genome. The exon–intron organization of its genes exhibits features typical for protists and fungi as well as those characteristic for the evolutionarily more advanced species. This indicates that both the taxonomic position as well as the size of the genome shape the exon–intron organization of an organism. The average gene has 3.7 introns which are on average 138 bp, with a rather narrow size distribution. Introns are enriched in AT base pairs by 13% relative to exons. The consensus sequences at exon–intron boundaries resemble those found for other species, with minor differences between short and long introns. A unique feature of P.polycephalum introns is the strong preference for pyrimidines in the coding strand throughout their length, without a particular enrichment at the 3′-ends. PMID:10982858
SURVEY AND SUMMARY: exon-intron organization of genes in the slime mold Physarum polycephalum.
Trzcinska-Danielewicz, J; Fronk, J
2000-09-15
The slime mold Physarum polycephalum is a morphologically simple organism with a large and complex genome. The exon-intron organization of its genes exhibits features typical for protists and fungi as well as those characteristic for the evolutionarily more advanced species. This indicates that both the taxonomic position as well as the size of the genome shape the exon-intron organization of an organism. The average gene has 3.7 introns which are on average 138 bp, with a rather narrow size distribution. Introns are enriched in AT base pairs by 13% relative to exons. The consensus sequences at exon-intron boundaries resemble those found for other species, with minor differences between short and long introns. A unique feature of P.polycephalum introns is the strong preference for pyrimidines in the coding strand throughout their length, without a particular enrichment at the 3'-ends.
FRAGS: estimation of coding sequence substitution rates from fragmentary data
Swart, Estienne C; Hide, Winston A; Seoighe, Cathal
2004-01-01
Background Rates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate more accurately the coding sequence divergence of pairs of organisms. However the use of different data sources, alignment protocols and methods to estimate substitution rates leads to widely varying estimates of key parameters that define the coding sequence divergence of orthologous genes. Although complete genome sequence data are not available for all organisms, fragmentary sequence data can provide accurate estimates of substitution rates provided that an appropriate and consistent methodology is used and that differences in the estimates obtainable from different data sources are taken into account. Results We have developed FRAGS, an application framework that uses existing, freely available software components to construct in-frame alignments and estimate coding substitution rates from fragmentary sequence data. Coding sequence substitution estimates for human and chimpanzee sequences, generated by FRAGS, reveal that methodological differences can give rise to significantly different estimates of important substitution parameters. The estimated substitution rates were also used to infer upper-bounds on the amount of sequencing error in the datasets that we have analysed. Conclusion We have developed a system that performs robust estimation of substitution rates for orthologous sequences from a pair of organisms. Our system can be used when fragmentary genomic or transcript data is available from one of the organisms and the other is a completely sequenced genome within the Ensembl database. As well as estimating substitution statistics our system enables the user to manage and query alignment and substitution data. PMID:15005802
Chan, Vincy; Thurairajah, Pravheen; Colantonio, Angela
2015-02-04
Although healthcare administrative data are commonly used for traumatic brain injury (TBI) research, there is currently no consensus or consistency on the International Classification of Diseases Version 10 (ICD-10) codes used to define TBI among children and youth internationally. This study systematically reviewed the literature to explore the range of ICD-10 codes that are used to define TBI in this population. The identification of the range of ICD-10 codes to define this population in administrative data is crucial, as it has implications for policy, resource allocation, planning of healthcare services, and prevention strategies. The databases MEDLINE, MEDLINE In-Process, Embase, PsychINFO, CINAHL, SPORTDiscus, and Cochrane Database of Systematic Reviews were systematically searched. Grey literature was searched using Grey Matters and Google. Reference lists of included articles were also searched for relevant studies. Two reviewers independently screened all titles and abstracts using pre-defined inclusion and exclusion criteria. A full text screen was conducted on articles that met the first screen inclusion criteria. All full text articles that met the pre-defined inclusion criteria were included for analysis in this systematic review. A total of 1,326 publications were identified through the predetermined search strategy and 32 articles/reports met all eligibility criteria for inclusion in this review. Five articles specifically examined children and youth aged 19 years or under with TBI. ICD-10 case definitions ranged from the broad injuries to the head codes (ICD-10 S00 to S09) to concussion only (S06.0). There was overwhelming consensus on the inclusion of ICD-10 code S06, intracranial injury, while codes S00 (superficial injury of the head), S03 (dislocation, sprain, and strain of joints and ligaments of head), and S05 (injury of eye and orbit) were only used by articles that examined head injury, none of which specifically examined children and youth. This review provides evidence for discussion on how best to use ICD codes for different goals. This is an important first step in reaching an appropriate definition and can inform future work on reaching consensus on the ICD-10 codes to define TBI for this vulnerable population.
Spherical: an iterative workflow for assembling metagenomic datasets.
Hitch, Thomas C A; Creevey, Christopher J
2018-01-24
The consensus emerging from the study of microbiomes is that they are far more complex than previously thought, requiring better assemblies and increasingly deeper sequencing. However, current metagenomic assembly techniques regularly fail to incorporate all, or even the majority in some cases, of the sequence information generated for many microbiomes, negating this effort. This can especially bias the information gathered and the perceived importance of the minor taxa in a microbiome. We propose a simple but effective approach, implemented in Python, to address this problem. Based on an iterative methodology, our workflow (called Spherical) carries out successive rounds of assemblies with the sequencing reads not yet utilised. This approach also allows the user to reduce the resources required for very large datasets, by assembling random subsets of the whole in a "divide and conquer" manner. We demonstrate the accuracy of Spherical using simulated data based on completely sequenced genomes and the effectiveness of the workflow at retrieving lost information for taxa in three published metagenomics studies of varying sizes. Our results show that Spherical increased the amount of reads utilized in the assembly by up to 109% compared to the base assembly. The additional contigs assembled by the Spherical workflow resulted in a significant (P < 0.05) changes in the predicted taxonomic profile of all datasets analysed. Spherical is implemented in Python 2.7 and freely available for use under the MIT license. Source code and documentation is hosted publically at: https://github.com/thh32/Spherical .
Fast and accurate de novo genome assembly from long uncorrected reads
Vaser, Robert; Sović, Ivan; Nagarajan, Niranjan
2017-01-01
The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error-correction and consensus-generation steps to obtain high-quality assemblies. We show that the error-correction step can be omitted and that high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order alignment–based, stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore data sets, we show that Racon coupled with miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster. PMID:28100585
Visual pattern image sequence coding
NASA Technical Reports Server (NTRS)
Silsbee, Peter; Bovik, Alan C.; Chen, Dapang
1990-01-01
The visual pattern image coding (VPIC) configurable digital image-coding process is capable of coding with visual fidelity comparable to the best available techniques, at compressions which (at 30-40:1) exceed all other technologies. These capabilities are associated with unprecedented coding efficiencies; coding and decoding operations are entirely linear with respect to image size and entail a complexity that is 1-2 orders of magnitude faster than any previous high-compression technique. The visual pattern image sequence coding to which attention is presently given exploits all the advantages of the static VPIC in the reduction of information from an additional, temporal dimension, to achieve unprecedented image sequence coding performance.
Sequence specificity of the human mRNA N6-adenosine methylase in vitro.
Harper, J E; Miceli, S M; Roberts, R J; Manley, J L
1990-01-01
N6-adenosine methylation is a frequent modification of mRNAs and their precursors, but little is known about the mechanism of the reaction or the function of the modification. To explore these questions, we developed conditions to examine N6-adenosine methylase activity in HeLa cell nuclear extracts. Transfer of the methyl group from S-[3H methyl]-adenosylmethionine to unlabeled random copolymer RNA substrates of varying ribonucleotide composition revealed a substrate specificity consistent with a previously deduced consensus sequence, Pu[G greater than A]AC[A/C/U]. 32-P labeled RNA substrates of defined sequence were used to examine the minimum sequence requirements for methylation. Each RNA was 20 nucleotides long, and contained either the core consensus sequence GGACU, or some variation of this sequence. RNAs containing GGACU, either in single or multiple copies, were good substrates for methylation, whereas RNAs containing single base substitutions within the GGACU sequence gave dramatically reduced methylation. These results demonstrate that the N6-adenosine methylase has a strict sequence specificity, and that there is no requirement for extended sequences or secondary structures for methylation. Recognition of this sequence does not require an RNA component, as micrococcal nuclease pretreatment of nuclear extracts actually increased methylation efficiency. Images PMID:2216767
Baig, Tayyba T.; Lanchy, Jean-Marc; Lodmell, J. Stephen
2009-01-01
The packaging signal (ψ) of human immunodeficiency virus type 2 (HIV-2) is present in the 5′ noncoding region of RNA and contains a 10-nucleotide palindrome (pal; 5′-392-GGAGUGCUCC) located upstream of the dimerization signal stem-loop 1 (SL1). pal has been shown to be functionally important in vitro and in vivo. We previously showed that the 3′ side of pal (GCUCC-3′) is involved in base-pairing interactions with a sequence downstream of SL1 to make an extended SL1, which is important for replication in vivo and the regulation of dimerization in vitro. However, the role of the 5′ side of pal (5′-GGAGU) was less clear. Here, we characterized this role using an in vivo SELEX approach. We produced a population of HIV-2 DNA genomes with random sequences within the 5′ side of pal and transfected these into COS-7 cells. Viruses from COS-7 cells were used to infect C8166 permissive cells. After several weeks of serial passage in C8166 cells, surviving viruses were sequenced. On the 5′ side of pal there was a striking convergence toward a GGRGN consensus sequence. Individual clones with consensus and nonconsensus sequences were tested in infectivity and packaging assays. Analysis of individuals that diverged from the consensus sequence showed normal viral RNA and protein synthesis but had replication defects and impaired RNA packaging. These findings clearly indicate that the GGRG motif is essential for viral replication and genomic RNA packaging. PMID:18971263
CapZyme-Seq Comprehensively Defines Promoter-Sequence Determinants for RNA 5' Capping with NAD.
Vvedenskaya, Irina O; Bird, Jeremy G; Zhang, Yuanchao; Zhang, Yu; Jiao, Xinfu; Barvík, Ivan; Krásný, Libor; Kiledjian, Megerditch; Taylor, Deanne M; Ebright, Richard H; Nickels, Bryce E
2018-05-03
Nucleoside-containing metabolites such as NAD + can be incorporated as 5' caps on RNA by serving as non-canonical initiating nucleotides (NCINs) for transcription initiation by RNA polymerase (RNAP). Here, we report CapZyme-seq, a high-throughput-sequencing method that employs NCIN-decapping enzymes NudC and Rai1 to detect and quantify NCIN-capped RNA. By combining CapZyme-seq with multiplexed transcriptomics, we determine efficiencies of NAD + capping by Escherichia coli RNAP for ∼16,000 promoter sequences. The results define preferred transcription start site (TSS) positions for NAD + capping and define a consensus promoter sequence for NAD + capping: HRRASWW (TSS underlined). By applying CapZyme-seq to E. coli total cellular RNA, we establish that sequence determinants for NCIN capping in vivo match the NAD + -capping consensus defined in vitro, and we identify and quantify NCIN-capped small RNAs (sRNAs). Our findings define the promoter-sequence determinants for NCIN capping with NAD + and provide a general method for analysis of NCIN capping in vitro and in vivo. Copyright © 2018 Elsevier Inc. All rights reserved.
Valliere-Douglass, John F; Kodama, Paul; Mujacic, Mirna; Brady, Lowell J; Wang, Wes; Wallace, Alison; Yan, Boxu; Reddy, Pranhitha; Treuheit, Michael J; Balland, Alain
2009-11-20
We report that N-linked oligosaccharide structures can be present on an asparagine residue not adhering to the consensus site motif NX(S/T), where X is not proline, described in the literature. We have observed oligosaccharides on a non-consensus asparaginyl residue in the C(H)1 constant domain of IgG1 and IgG2 antibodies. The initial findings were obtained from characterization of charge variant populations evident in a recombinant human antibody of the IgG2 subclass. HPLC-MS results indicated that cation-exchange chromatography acidic variant populations were enriched in antibody with a second glycosylation site, in addition to the well documented canonical glycosylation site located in the C(H)2 domain. Subsequent tryptic and chymotryptic peptide map data indicated that the second glycosylation site was associated with the amino acid sequence TVSWN(162)SGAL in the C(H)1 domain of the antibody. This highly atypical modification is present at levels of 0.5-2.0% on most of the recombinant antibodies that have been tested and has also been observed in IgG1 antibodies derived from human donors. Site-directed mutagenesis of the C(H)1 domain sequence in a recombinant-human IgG1 antibody resulted in an increase in non-consensus glycosylation to 3.15%, a greater than 4-fold increase over the level observed in the wild type, by changing the -1 and +1 amino acids relative to the asparagine residue at position 162. We believe that further understanding of the phenomenon of non-consensus glycosylation can be used to gain fundamental insights into the fidelity of the cellular glycosylation machinery.
Dickinson, Louise; Ahmed, Hashim U; Allen, Clare; Barentsz, Jelle O; Carey, Brendan; Futterer, Jurgen J; Heijmink, Stijn W; Hoskin, Peter J; Kirkham, Alex; Padhani, Anwar R; Persad, Raj; Puech, Philippe; Punwani, Shonit; Sohaib, Aslam S; Tombal, Bertrand; Villers, Arnauld; van der Meulen, Jan; Emberton, Mark
2011-04-01
Multiparametric magnetic resonance imaging (mpMRI) may have a role in detecting clinically significant prostate cancer in men with raised serum prostate-specific antigen levels. Variations in technique and the interpretation of images have contributed to inconsistency in its reported performance characteristics. Our aim was to make recommendations on a standardised method for the conduct, interpretation, and reporting of prostate mpMRI for prostate cancer detection and localisation. A consensus meeting of 16 European prostate cancer experts was held that followed the UCLA-RAND Appropriateness Method and facilitated by an independent chair. Before the meeting, 520 items were scored for "appropriateness" by panel members, discussed face to face, and rescored. Agreement was reached in 67% of 260 items related to imaging sequence parameters. T2-weighted, dynamic contrast-enhanced, and diffusion-weighted MRI were the key sequences incorporated into the minimum requirements. Consensus was also reached on 54% of 260 items related to image interpretation and reporting, including features of malignancy on individual sequences. A 5-point scale was agreed on for communicating the probability of malignancy, with a minimum of 16 prostatic regions of interest, to include a pictorial representation of suspicious foci. Limitations relate to consensus methodology. Dominant personalities are known to affect the opinions of the group and were countered by a neutral chairperson. Consensus was reached on a number of areas related to the conduct, interpretation, and reporting of mpMRI for the detection, localisation, and characterisation of prostate cancer. Before optimal dissemination of this technology, these outcomes will require formal validation in prospective trials. Copyright © 2010 European Association of Urology. Published by Elsevier B.V. All rights reserved.
Bobrova, E V; Liakhovetskiĭ, V A; Borshchevskaia, E R
2011-01-01
The dependence of errors during reproduction of a sequence of hand movements without visual feedback on the previous right- and left-hand performance ("prehistory") and on positions in space of sequence elements (random or ordered by the explicit rule) was analyzed. It was shown that the preceding information about the ordered positions of the sequence elements was used during right-hand movements, whereas left-hand movements were performed with involvement of the information about the random sequence. The data testify to a central mechanism of the analysis of spatial structure of sequence elements. This mechanism activates movement coding specific for the left hemisphere (vector coding) in case of an ordered sequence structure and positional coding specific for the right hemisphere in case of a random sequence structure.
A filtering method to generate high quality short reads using illumina paired-end technology.
Eren, A Murat; Vineis, Joseph H; Morrison, Hilary G; Sogin, Mitchell L
2013-01-01
Consensus between independent reads improves the accuracy of genome and transcriptome analyses, however lack of consensus between very similar sequences in metagenomic studies can and often does represent natural variation of biological significance. The common use of machine-assigned quality scores on next generation platforms does not necessarily correlate with accuracy. Here, we describe using the overlap of paired-end, short sequence reads to identify error-prone reads in marker gene analyses and their contribution to spurious OTUs following clustering analysis using QIIME. Our approach can also reduce error in shotgun sequencing data generated from libraries with small, tightly constrained insert sizes. The open-source implementation of this algorithm in Python programming language with user instructions can be obtained from https://github.com/meren/illumina-utils.
Discrete Cosine Transform Image Coding With Sliding Block Codes
NASA Astrophysics Data System (ADS)
Divakaran, Ajay; Pearlman, William A.
1989-11-01
A transform trellis coding scheme for images is presented. A two dimensional discrete cosine transform is applied to the image followed by a search on a trellis structured code. This code is a sliding block code that utilizes a constrained size reproduction alphabet. The image is divided into blocks by the transform coding. The non-stationarity of the image is counteracted by grouping these blocks in clusters through a clustering algorithm, and then encoding the clusters separately. Mandela ordered sequences are formed from each cluster i.e identically indexed coefficients from each block are grouped together to form one dimensional sequences. A separate search ensues on each of these Mandela ordered sequences. Padding sequences are used to improve the trellis search fidelity. The padding sequences absorb the error caused by the building up of the trellis to full size. The simulations were carried out on a 256x256 image ('LENA'). The results are comparable to any existing scheme. The visual quality of the image is enhanced considerably by the padding and clustering.
DNA Barcode Goes Two-Dimensions: DNA QR Code Web Server
Li, Huan; Xing, Hang; Liang, Dong; Jiang, Kun; Pang, Xiaohui; Song, Jingyuan; Chen, Shilin
2012-01-01
The DNA barcoding technology uses a standard region of DNA sequence for species identification and discovery. At present, “DNA barcode” actually refers to DNA sequences, which are not amenable to information storage, recognition, and retrieval. Our aim is to identify the best symbology that can represent DNA barcode sequences in practical applications. A comprehensive set of sequences for five DNA barcode markers ITS2, rbcL, matK, psbA-trnH, and CO1 was used as the test data. Fifty-three different types of one-dimensional and ten two-dimensional barcode symbologies were compared based on different criteria, such as coding capacity, compression efficiency, and error detection ability. The quick response (QR) code was found to have the largest coding capacity and relatively high compression ratio. To facilitate the further usage of QR code-based DNA barcodes, a web server was developed and is accessible at http://qrfordna.dnsalias.org. The web server allows users to retrieve the QR code for a species of interests, convert a DNA sequence to and from a QR code, and perform species identification based on local and global sequence similarities. In summary, the first comprehensive evaluation of various barcode symbologies has been carried out. The QR code has been found to be the most appropriate symbology for DNA barcode sequences. A web server has also been constructed to allow biologists to utilize QR codes in practical DNA barcoding applications. PMID:22574113
Bernardes, Juliana; Zaverucha, Gerson; Vaquero, Catherine; Carbone, Alessandra
2016-01-01
Traditional protein annotation methods describe known domains with probabilistic models representing consensus among homologous domain sequences. However, when relevant signals become too weak to be identified by a global consensus, attempts for annotation fail. Here we address the fundamental question of domain identification for highly divergent proteins. By using high performance computing, we demonstrate that the limits of state-of-the-art annotation methods can be bypassed. We design a new strategy based on the observation that many structural and functional protein constraints are not globally conserved through all species but might be locally conserved in separate clades. We propose a novel exploitation of the large amount of data available: 1. for each known protein domain, several probabilistic clade-centered models are constructed from a large and differentiated panel of homologous sequences, 2. a decision-making protocol combines outcomes obtained from multiple models, 3. a multi-criteria optimization algorithm finds the most likely protein architecture. The method is evaluated for domain and architecture prediction over several datasets and statistical testing hypotheses. Its performance is compared against HMMScan and HHblits, two widely used search methods based on sequence-profile and profile-profile comparison. Due to their closeness to actual protein sequences, clade-centered models are shown to be more specific and functionally predictive than the broadly used consensus models. Based on them, we improved annotation of Plasmodium falciparum protein sequences on a scale not previously possible. We successfully predict at least one domain for 72% of P. falciparum proteins against 63% achieved previously, corresponding to 30% of improvement over the total number of Pfam domain predictions on the whole genome. The method is applicable to any genome and opens new avenues to tackle evolutionary questions such as the reconstruction of ancient domain duplications, the reconstruction of the history of protein architectures, and the estimation of protein domain age. Website and software: http://www.lcqb.upmc.fr/CLADE. PMID:27472895
Ivancic-Jelecki, Jelena; Slovic, Anamarija; Šantak, Maja; Tešović, Goran; Forcic, Dubravko
2016-07-29
The canonical genome organization of measles virus (MV) is characterized by total size of 15 894 nucleotides (nts) and defined length of every genomic region, both coding and non-coding. Only rarely have reports of strains possessing non-canonical genomic properties (possessing indels, with or without the change of total genome length) been published. The observed mutations are mutually compensatory in a sense that the total genome length remains polyhexameric. Although programmed and highly precise pseudo-templated nucleotide additions during transcription are inherent to polymerases of all viruses belonging to family Paramyxoviridae, a similar mechanism that would serve to non-randomly correct genome length, if an indel has occurred during replication, has so far not been described in the context of a complete virus genome. We compiled all complete MV genomic sequences (64 in total) available in open access sequence databases. Multiple sequence comparisons and phylogenetic analyses were performed with the aim of exploring whether non-recombinant and non-evolutionary linked measles strains that show deviations from canonical genome organization possess a common genetic characteristic. In 11 MV sequences we detected deviations from canonical genome organization due to short indels located within homopolymeric stretches or next to them. In nine out of 11 identified non-canonical MV sequences, a common feature was observed: one mutation, either an insertion or a deletion, was located in a 28 nts long region in F gene 5' untranslated region (positions 5051-5078 in genomic cDNA of canonical strains). This segment is composed of five tandemly linked homopolymeric stretches, its consensus sequence is G6-7C7-8A6-7G1-3C5-6. Although none of the mononucleotide repeats within this segment has fixed length, the total number of nts in canonical strains is always 28. These nine non-canonical strains, as well as the tenth (not mutated in 5051-5078 segment), can be grouped in three clusters, based on their passage histories/epidemiological data/genetic similarities. There are no indications that the 3 clusters are evolutionary linked, other than the fact that they all belong to clade D. A common narrow genomic region was found to be mutated in different, non-related, wild type strains suggesting that this region might have a function in non-random genome length corrections occurring during MV replication.
CRITICA: coding region identification tool invoking comparative analysis
NASA Technical Reports Server (NTRS)
Badger, J. H.; Olsen, G. J.; Woese, C. R. (Principal Investigator)
1999-01-01
Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in/pub/critica) and on the World Wide Web (http:/(/)rdpwww.life.uiuc.edu).
2011-01-01
Background Big sagebrush (Artemisia tridentata) is one of the most widely distributed and ecologically important shrub species in western North America. This species serves as a critical habitat and food resource for many animals and invertebrates. Habitat loss due to a combination of disturbances followed by establishment of invasive plant species is a serious threat to big sagebrush ecosystem sustainability. Lack of genomic data has limited our understanding of the evolutionary history and ecological adaptation in this species. Here, we report on the sequencing of expressed sequence tags (ESTs) and detection of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers in subspecies of big sagebrush. Results cDNA of A. tridentata sspp. tridentata and vaseyana were normalized and sequenced using the 454 GS FLX Titanium pyrosequencing technology. Assembly of the reads resulted in 20,357 contig consensus sequences in ssp. tridentata and 20,250 contigs in ssp. vaseyana. A BLASTx search against the non-redundant (NR) protein database using 29,541 consensus sequences obtained from a combined assembly resulted in 21,436 sequences with significant blast alignments (≤ 1e-15). A total of 20,952 SNPs and 119 polymorphic SSRs were detected between the two subspecies. SNPs were validated through various methods including sequence capture. Validation of SNPs in different individuals uncovered a high level of nucleotide variation in EST sequences. EST sequences of a third, tetraploid subspecies (ssp. wyomingensis) obtained by Illumina sequencing were mapped to the consensus sequences of the combined 454 EST assembly. Approximately one-third of the SNPs between sspp. tridentata and vaseyana identified in the combined assembly were also polymorphic within the two geographically distant ssp. wyomingensis samples. Conclusion We have produced a large EST dataset for Artemisia tridentata, which contains a large sample of the big sagebrush leaf transcriptome. SNP mapping among the three subspecies suggest the origin of ssp. wyomingensis via mixed ancestry. A large number of SNP and SSR markers provide the foundation for future research to address questions in big sagebrush evolution, ecological genetics, and conservation using genomic approaches. PMID:21767398
Jiu-Sheng, Li; Ze-Jiang, Zhao; Jian-Quan, Yao
2017-11-27
In order to extend to 3-bit encoding, we propose notched-wheel structures as polarization insensitive coding metasurfaces to control terahertz wave reflection and suppress backward scattering. By using a coding sequence of "00110011…" along x-axis direction and 16 × 16 random coding sequence, we investigate the polarization insensitive properties of the coding metasurfaces. By designing the coding sequences of the basic coding elements, the terahertz wave reflection can be flexibly manipulated. Additionally, radar cross section (RCS) reduction in the backward direction is less than -10dB in a wide band. The present approach can offer application for novel terahertz manipulation devices.
Noncoding sequence classification based on wavelet transform analysis: part I
NASA Astrophysics Data System (ADS)
Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.
2017-09-01
DNA sequences in human genome can be divided into the coding and noncoding ones. Coding sequences are those that are read during the transcription. The identification of coding sequences has been widely reported in literature due to its much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA elements) and Epigenomic Roadmap Project projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.
Converting Panax ginseng DNA and chemical fingerprints into two-dimensional barcode.
Cai, Yong; Li, Peng; Li, Xi-Wen; Zhao, Jing; Chen, Hai; Yang, Qing; Hu, Hao
2017-07-01
In this study, we investigated how to convert the Panax ginseng DNA sequence code and chemical fingerprints into a two-dimensional code. In order to improve the compression efficiency, GATC2Bytes and digital merger compression algorithms are proposed. HPLC chemical fingerprint data of 10 groups of P. ginseng from Northeast China and the internal transcribed spacer 2 (ITS2) sequence code as the DNA sequence code were ready for conversion. In order to convert such data into a two-dimensional code, the following six steps were performed: First, the chemical fingerprint characteristic data sets were obtained through the inflection filtering algorithm. Second, precompression processing of such data sets is undertaken. Third, precompression processing was undertaken with the P. ginseng DNA (ITS2) sequence codes. Fourth, the precompressed chemical fingerprint data and the DNA (ITS2) sequence code were combined in accordance with the set data format. Such combined data can be compressed by Zlib, an open source data compression algorithm. Finally, the compressed data generated a two-dimensional code called a quick response code (QR code). Through the abovementioned converting process, it can be found that the number of bytes needed for storing P. ginseng chemical fingerprints and its DNA (ITS2) sequence code can be greatly reduced. After GTCA2Bytes algorithm processing, the ITS2 compression rate reaches 75% and the chemical fingerprint compression rate exceeds 99.65% via filtration and digital merger compression algorithm processing. Therefore, the overall compression ratio even exceeds 99.36%. The capacity of the formed QR code is around 0.5k, which can easily and successfully be read and identified by any smartphone. P. ginseng chemical fingerprints and its DNA (ITS2) sequence code can form a QR code after data processing, and therefore the QR code can be a perfect carrier of the authenticity and quality of P. ginseng information. This study provides a theoretical basis for the development of a quality traceability system of traditional Chinese medicine based on a two-dimensional code.
Federal Register 2010, 2011, 2012, 2013, 2014
2012-06-11
... Regulatory Review B. Paperwork Reduction Act C. Regulatory Flexibility Act D. Unfunded Mandates Reform Act E... preamble. APA Administrative Procedure Act CAA Clean Air Act CFR Code of Federal Regulations D.C. District... Authority Rule U.S. United States U.S.C. United States Code VCS Voluntary Consensus Standards VOC Volatile...
Hyder, S M; Stancel, G M; Nawaz, Z; McDonnell, D P; Loose-Mitchell, D S
1992-09-05
We have used transient transfection assays with reporter plasmids expressing chloramphenicol acetyltransferase, linked to regions of mouse c-fos, to identify a specific estrogen response element (ERE) in this protooncogene. This element is located in the untranslated 3'-flanking region of the c-fos gene, 5 kilobases (kb) downstream from the c-fos promoter and 1.5 kb downstream of the poly(A) signal. This element confers estrogen responsiveness to chloramphenicol acetyltransferase reporters linked to both the herpes simplex virus thymidine kinase promoter and the homologous c-fos promoter. Deletion analysis localized the response element to a 200-base pair fragment which contains the element GGTCACCACAGCC that resembles the consensus ERE sequence GGTCACAGTGACC originally identified in Xenopus vitellogenin A2 gene. A synthetic 36-base pair oligodeoxynucleotide containing this c-fos sequence conferred estrogen inducibility to the thymidine kinase promoter. The corresponding sequence also induced reporter activity when present in the c-fos gene fragment 3 kb from the thymidine kinase promoter. Gel-shift experiments demonstrated that synthetic oligonucleotides containing either the consensus ERE or the c-fos element bind human estrogen receptor obtained from a yeast expression system. However, the mobility of the shifted band is faster for the fos-ERE-complex than the consensus ERE complex suggesting that the three-dimensional structure of the protein-DNA complexes is different or that other factors are differentially involved in the two reactions. When the 5'-GGTCA sequence present in the c-fos ERE is mutated to 5'-TTTCA, transcriptional activation and receptor binding activities are both lost. Mutation of the CAGCC-3' element corresponding to the second half-site of the c-fos sequence also led to the loss of receptor binding activity, suggesting that both half-sites of this element are involved in this function. The estrogen induction mediated by either the c-fos or the consensus ERE was blunted by the antiestrogen tamoxifen. Based on these studies, we believe the 3'-fos ERE sequence we have identified may be a major cis-acting element involved in the physiological regulation of the gene by estrogens in vivo.
Sequences of Zika Virus Genomes from a Pediatric Cohort in Nicaragua.
Oldfield, Lauren M; Fedorova, Nadia; Puri, Vinita; Shrivastava, Susmita; Amedeo, Paolo; Durbin, Alan; Rocchi, Iara; Williams, Torrey; Shabman, Reed S; Tan, Gene S; Balmaseda, Angel; Kuan, Guillermina; Saborio, Saira; Gordon, Aubree; Harris, Eva; Pickett, Brett E
2018-06-14
We report here the whole-genome sequence of 11 Zika virus (ZIKV) samples from six pediatric patients in Nicaragua. Serum samples were collected, and ZIKV was isolated in tissue culture. Both serum and virus isolates were sequenced. The consensus ZIKV genomes are greater than 99% identical to each other. Copyright © 2018 Oldfield et al.
Yuryev, A.; Corden, J. L.
1996-01-01
The largest subunit of RNA polymerase II contains a repetitive C-terminal domain (CTD) consisting of tandem repeats of the consensus sequence Tyr(1)Ser(2)Pro(3)Thr(4) Ser(5)Pro(6) Ser(7). Substitution of nonphosphorylatable amino acids at positions two or five of the Saccharomyces cerevisiae CTD is lethal. We developed a selection ssytem for isolating suppressors of this lethal phenotype and cloned a gene, SCA1 (suppressor of CTD alanine), which complements recessive suppressors of lethal multiple-substitution mutations. A partial deletion of SCA1 (sca1Δ::hisG) suppresses alanine or glutamate substitutions at position two of the consensus CTD sequence, and a lethal CTD truncation mutation, but SCA1 deletion does not suppress alanine or glutamate substitutions at position five. SCA1 is identical to SRB9, a suppressor of a cold-sensitive CTD truncation mutation. Strains carrying dominant SRB mutations have the same suppression properties as a sca1Δ::hisG strain. These results reveal a functional difference between positions two and five of the consensus CTD heptapeptide repeat. The ability of SCA1 and SRB mutant alleles to suppress CTD truncation mutations suggest that substitutions at position two, but not at position five, cause a defect in RNA polymerase II function similar to that introduced by CTD truncation. PMID:8725217
2013-01-01
Background Cucumber is an important vegetable crop that is susceptible to many pathogens, but no disease resistance (R) genes have been cloned. The availability of whole genome sequences provides an excellent opportunity for systematic identification and characterization of the nucleotide binding and leucine-rich repeat (NB-LRR) type R gene homolog (RGH) sequences in the genome. Cucumber has a very narrow genetic base making it difficult to construct high-density genetic maps. Development of a consensus map by synthesizing information from multiple segregating populations is a method of choice to increase marker density. As such, the objectives of the present study were to identify and characterize NB-LRR type RGHs, and to develop a high-density, integrated cucumber genetic-physical map anchored with RGH loci. Results From the Gy14 draft genome, 70 NB-containing RGHs were identified and characterized. Most RGHs were in clusters with uneven distribution across seven chromosomes. In silico analysis indicated that all 70 RGHs had EST support for gene expression. Phylogenetic analysis classified 58 RGHs into two clades: CNL and TNL. Comparative analysis revealed high-degree sequence homology and synteny in chromosomal locations of these RGH members between the cucumber and melon genomes. Fifty-four molecular markers were developed to delimit 67 of the 70 RGHs, which were integrated into a genetic map through linkage analysis. A 1,681-locus cucumber consensus map including 10 gene loci and spanning 730.0 cM in seven linkage groups was developed by integrating three component maps with a bin-mapping strategy. Physically, 308 scaffolds with 193.2 Mbp total DNA sequences were anchored onto this consensus map that covered 52.6% of the 367 Mbp cucumber genome. Conclusions Cucumber contains relatively few NB-LRR RGHs that are clustered and unevenly distributed in the genome. All RGHs seem to be transcribed and shared significant sequence homology and synteny with the melon genome suggesting conservation of these RGHs in the Cucumis lineage. The 1,681-locus consensus genetic-physical map developed and the RGHs identified and characterized herein are valuable genomics resources that may have many applications such as quantitative trait loci identification, map-based gene cloning, association mapping, marker-assisted selection, as well as assembly of a more complete cucumber genome. PMID:23531125
Toward a Consensus in Ethics Education for the Doctor of Nursing Practice.
Laabs, Carolyn A
2015-01-01
The purpose of this study was to begin to develop a consensus as to the essential content and methods of ethics education for advanced practice nurses. An online Delphi technique was used to survey ethics experts to determine whether items were essential, desirable, or unnecessary to ethics education for students in doctor of nursing practice programs. Only the American Nurses Association Code of Ethics and ethics terminology were deemed essential foundational knowledge.
ConsPred: a rule-based (re-)annotation framework for prokaryotic genomes.
Weinmaier, Thomas; Platzer, Alexander; Frank, Jeroen; Hellinger, Hans-Jörg; Tischler, Patrick; Rattei, Thomas
2016-11-01
The rapidly growing number of available prokaryotic genome sequences requires fully automated and high-quality software solutions for their initial and re-annotation. Here we present ConsPred, a prokaryotic genome annotation framework that performs intrinsic gene predictions, homology searches, predictions of non-coding genes as well as CRISPR repeats and integrates all evidence into a consensus annotation. ConsPred achieves comprehensive, high-quality annotations based on rules and priorities, similar to decision-making in manual curation and avoids conflicting predictions. Parameters controlling the annotation process are configurable by the user. ConsPred has been used in the institutions of the authors for longer than 5 years and can easily be extended and adapted to specific needs. The ConsPred algorithm for producing a consensus from the varying scores of multiple gene prediction programs approaches manual curation in accuracy. Its rule-based approach for choosing final predictions avoids overriding previous manual curations. ConsPred is implemented in Java, Perl and Shell and is freely available under the Creative Commons license as a stand-alone in-house pipeline or as an Amazon Machine Image for cloud computing, see https://sourceforge.net/projects/conspred/. thomas.rattei@univie.ac.atSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Is a Genome a Codeword of an Error-Correcting Code?
Kleinschmidt, João H.; Silva-Filho, Márcio C.; Bim, Edson; Herai, Roberto H.; Yamagishi, Michel E. B.; Palazzo, Reginaldo
2012-01-01
Since a genome is a discrete sequence, the elements of which belong to a set of four letters, the question as to whether or not there is an error-correcting code underlying DNA sequences is unavoidable. The most common approach to answering this question is to propose a methodology to verify the existence of such a code. However, none of the methodologies proposed so far, although quite clever, has achieved that goal. In a recent work, we showed that DNA sequences can be identified as codewords in a class of cyclic error-correcting codes known as Hamming codes. In this paper, we show that a complete intron-exon gene, and even a plasmid genome, can be identified as a Hamming code codeword as well. Although this does not constitute a definitive proof that there is an error-correcting code underlying DNA sequences, it is the first evidence in this direction. PMID:22649495
Genetic dissection of the consensus sequence for the class 2 and class 3 flagellar promoters
Wozniak, Christopher E.; Hughes, Kelly T.
2008-01-01
Summary Computational searches for DNA binding sites often utilize consensus sequences. These search models make assumptions that the frequency of a base pair in an alignment relates to the base pair’s importance in binding and presume that base pairs contribute independently to the overall interaction with the DNA binding protein. These two assumptions have generally been found to be accurate for DNA binding sites. However, these assumptions are often not satisfied for promoters, which are involved in additional steps in transcription initiation after RNA polymerase has bound to the DNA. To test these assumptions for the flagellar regulatory hierarchy, class 2 and class 3 flagellar promoters were randomly mutagenized in Salmonella. Important positions were then saturated for mutagenesis and compared to scores calculated from the consensus sequence. Double mutants were constructed to determine how mutations combined for each promoter type. Mutations in the binding site for FlhD4C2, the activator of class 2 promoters, better satisfied the assumptions for the binding model than did mutations in the class 3 promoter, which is recognized by the σ28 transcription factor. These in vivo results indicate that the activator sites within flagellar promoters can be modeled using simple assumptions but that the DNA sequences recognized by the flagellar sigma factor require more complex models. PMID:18486950
Pasion, S G; Hines, J C; Ou, X; Mahmood, R; Ray, D S
1996-01-01
Gene expression in trypanosomatids appears to be regulated largely at the posttranscriptional level and involves maturation of mRNA precursors by trans splicing of a 39-nucleotide miniexon sequence to the 5' end of the mRNA and cleavage and polyadenylation at the 3' end of the mRNA. To initiate the identification of sequences involved in the periodic expression of DNA replication genes in trypanosomatids, we have mapped splice acceptor sites in the 5' flanking region of the TOP2 gene, which encodes the kinetoplast DNA topoisomerase, and have carried out deletion analysis of this region on a plasmid-encoded TOP2 gene. Block deletions within the 5' untranslated region (UTR) identified two regions (-608 to -388 and -387 to -186) responsible for periodic accumulation of the mRNA. Deletion of one or the other of these sequences had no effect on periodic expression of the mRNA, while deletion of both regions resulted in constitutive expression of the mRNA throughout the cell cycle. Subcloning of these sequences into the 5' UTR of a construct lacking both regions of the TOP2 5' UTR has shown that an octamer consensus sequence present in the 5' UTR of the TOP2, RPA1, and DHFR-TS mRNAs is required for normal cycling of the TOP2 mRNA. Mutation of the consensus octamer sequence in the TOP2 5' UTR in a plasmid construct containing only a single consensus octamer and that shows normal cycling of the plasmid-encoded TOP2 mRNA resulted in substantial reduction of the cycling of the mRNA level. These results imply a negative regulation of TOP2 mRNA during the cell cycle by a mechanism involving redundant elements containing one or more copies of a conserved octamer sequence within the 5' UTR of TOP2 mRNA. PMID:8943327
Pasion, S G; Hines, J C; Ou, X; Mahmood, R; Ray, D S
1996-12-01
Gene expression in trypanosomatids appears to be regulated largely at the posttranscriptional level and involves maturation of mRNA precursors by trans splicing of a 39-nucleotide miniexon sequence to the 5' end of the mRNA and cleavage and polyadenylation at the 3' end of the mRNA. To initiate the identification of sequences involved in the periodic expression of DNA replication genes in trypanosomatids, we have mapped splice acceptor sites in the 5' flanking region of the TOP2 gene, which encodes the kinetoplast DNA topoisomerase, and have carried out deletion analysis of this region on a plasmid-encoded TOP2 gene. Block deletions within the 5' untranslated region (UTR) identified two regions (-608 to -388 and -387 to -186) responsible for periodic accumulation of the mRNA. Deletion of one or the other of these sequences had no effect on periodic expression of the mRNA, while deletion of both regions resulted in constitutive expression of the mRNA throughout the cell cycle. Subcloning of these sequences into the 5' UTR of a construct lacking both regions of the TOP2 5' UTR has shown that an octamer consensus sequence present in the 5' UTR of the TOP2, RPA1, and DHFR-TS mRNAs is required for normal cycling of the TOP2 mRNA. Mutation of the consensus octamer sequence in the TOP2 5' UTR in a plasmid construct containing only a single consensus octamer and that shows normal cycling of the plasmid-encoded TOP2 mRNA resulted in substantial reduction of the cycling of the mRNA level. These results imply a negative regulation of TOP2 mRNA during the cell cycle by a mechanism involving redundant elements containing one or more copies of a conserved octamer sequence within the 5' UTR of TOP2 mRNA.
Logan, Grace; Freimanis, Graham L; King, David J; Valdazo-González, Begoña; Bachanek-Bankowska, Katarzyna; Sanderson, Nicholas D; Knowles, Nick J; King, Donald P; Cottam, Eleanor M
2014-09-30
Next-Generation Sequencing (NGS) is revolutionizing molecular epidemiology by providing new approaches to undertake whole genome sequencing (WGS) in diagnostic settings for a variety of human and veterinary pathogens. Previous sequencing protocols have been subject to biases such as those encountered during PCR amplification and cell culture, or are restricted by the need for large quantities of starting material. We describe here a simple and robust methodology for the generation of whole genome sequences on the Illumina MiSeq. This protocol is specific for foot-and-mouth disease virus (FMDV) or other polyadenylated RNA viruses and circumvents both the use of PCR and the requirement for large amounts of initial template. The protocol was successfully validated using five FMDV positive clinical samples from the 2001 epidemic in the United Kingdom, as well as a panel of representative viruses from all seven serotypes. In addition, this protocol was successfully used to recover 94% of an FMDV genome that had previously been identified as cell culture negative. Genome sequences from three other non-FMDV polyadenylated RNA viruses (EMCV, ERAV, VESV) were also obtained with minor protocol amendments. We calculated that a minimum coverage depth of 22 reads was required to produce an accurate consensus sequence for FMDV O. This was achieved in 5 FMDV/O/UKG isolates and the type O FMDV from the serotype panel with the exception of the 5' genomic termini and area immediately flanking the poly(C) region. We have developed a universal WGS method for FMDV and other polyadenylated RNA viruses. This method works successfully from a limited quantity of starting material and eliminates the requirement for genome-specific PCR amplification. This protocol has the potential to generate consensus-level sequences within a routine high-throughput diagnostic environment.
Pestoides F, an atypical Yersinia pestis strain from the former Soviet Union.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garcia, Emilio; Worsham, Patricia; Bearden, S.
2007-01-01
Unlike the classical Yersinia pestis strains, members of an atypical group of Y. pestis from Central Asia, denominated Y. pestis subspecies caucasica (also known as one of several pestoides types), are distinguished by a number of characteristics including their ability to ferment rhamnose and melibiose, their lack of the small plasmid encoding the plasminogen activator (pla) and pesticin, and their exceptionally large variants of the virulence plasmid pMT (encoding murine toxin and capsular antigen). We have obtained the entire genome sequence of Y. pestis Pestoides F, an isolate from the former Soviet Union that has enabled us to carryout amore » comprehensive genome-wide comparison of this organism's genomic content against the six published sequences of Y. pestis and their Y. pseudotuberculosis ancestor. Based on classical glycerol fermentation (+ve) and nitrate reduction (+ve) Y. pestis Pestoides F is an isolate that belongs to the biovar antiqua. This strain is unusual in other characteristics such as the fact that it carries a non-consensus V antigen (lcrV) sequence, and that unlike other Pla(-) strains, Pestoides F retains virulence by the parenteral and aerosol routes. The chromosome of Pestoides F is 4,517,345 bp in size comprising some 3,936 predicted coding sequences, while its pCD and pMT plasmids are 71,507 bp and 137,010 bp in size respectively. Comparison of chromosome-associated genes in Pestoides F with those in the other sequenced Y. pestis strains reveals differences ranging from strain-specific rearrangements, insertions, deletions, single nucleotide polymorphisms, and a unique distribution of insertion sequences. There is a single approximately 7 kb unique region in the chromosome not found in any of the completed Y. pestis strains sequenced to date, but which is present in the Y. pseudotuberculosis ancestor. Taken together, these findings are consistent with Pestoides F being derived from the most ancient lineage of Y. pestis yet sequenced.« less
Pestoides F, and Atypical Yersinia pestis Strain from the Former Soviet Union
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garcia, E; Worsham, P; Bearden, S
2007-01-05
Unlike the classical Yersinia pestis strains, members of an atypical group of Y. pestis from Central Asia, denominated Y. pestis subspecies caucasica (also known as one of several pestoides types), are distinguished by a number of characteristics including their ability to ferment rhamnose and melibiose, their lacking the small plasmid encoding the plasminogen activator (pla) and pesticin, and their exceptionally large variants of the virulence plasmid pMT (encoding murine toxin and capsular antigen). We have obtained the entire genome sequence of Y. pestis Pestoides F, an isolate from the former Soviet Union that has enabled us to carryout a comprehensivemore » genome-wide comparison of this organism's genomic content against the six published sequences of Y. pestis and their Y. pseudotuberculosis ancestor. Based on classical glycerol fermentation (+ve) and nitrate reduction (+ve) Y. pestis Pestoides F is an isolate that belongs to the biovar antiqua. This strain is unusual in other characteristics such as the fact that it carries a non-consensus V antigen (lcrV) sequence, and that unlike other Pla{sup -} strains, Pestoides F retains virulence by the parenteral and aerosol routes. The chromosome of Pestoides F is 4,517,345 bp in size comprising some 3,936 predicted coding sequences, while its pCD and pMT plasmids are 71,507 bp and 137,010 bp in size respectively. Comparison of chromosome-associated genes in Pestoides F with those in the other sequenced Y. pestis strains, reveals a series of differences ranging from strain-specific rearrangements, insertions, deletions, single nucleotide polymorphisms, and a unique distribution of insertion sequences. There is a single {approx}7 kb unique region in the chromosome not found in any of the completed Y. pestis strains sequenced to date, but which is present in the Y. pseudotuberculosis ancestor. Taken together, these findings are consistent with Pestoides F being derived from the most ancient lineage of Y. pestis yet sequenced.« less
Duda, Anja; Stange, Annett; Lüftenegger, Daniel; Stanke, Nicole; Westphal, Dana; Pietschmann, Thomas; Eastman, Scott W; Linial, Maxine L; Rethwilm, Axel; Lindemann, Dirk
2004-12-01
Analogous to cellular glycoproteins, viral envelope proteins contain N-terminal signal sequences responsible for targeting them to the secretory pathway. The prototype foamy virus (PFV) envelope (Env) shows a highly unusual biosynthesis. Its precursor protein has a type III membrane topology with both the N and C terminus located in the cytoplasm. Coexpression of FV glycoprotein and interaction of its leader peptide (LP) with the viral capsid is essential for viral particle budding and egress. Processing of PFV Env into the particle-associated LP, surface (SU), and transmembrane (TM) subunits occur posttranslationally during transport to the cell surface by yet-unidentified cellular proteases. Here we provide strong evidence that furin itself or a furin-like protease and not the signal peptidase complex is responsible for both processing events. N-terminal protein sequencing of the SU and TM subunits of purified PFV Env-immunoglobulin G immunoadhesin identified furin consensus sequences upstream of both cleavage sites. Mutagenesis analysis of two overlapping furin consensus sequences at the PFV LP/SU cleavage site in the wild-type protein confirmed the sequencing data and demonstrated utilization of only the first site. Fully processed SU was almost completely absent in viral particles of mutants having conserved arginine residues replaced by alanines in the first furin consensus sequence, but normal processing was observed upon mutation of the second motif. Although these mutants displayed a significant loss in infectivity as a result of reduced particle release, no correlation to processing inhibition was observed, since another mutant having normal LP/SU processing had a similar defect.
cWINNOWER algorithm for finding fuzzy dna motifs
NASA Technical Reports Server (NTRS)
Liang, S.; Samanta, M. P.; Biegel, B. A.
2004-01-01
The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if a clique consisting of a sufficiently large number of mutated copies of the motif (i.e., the signals) is present in the DNA sequence. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum detectable clique size qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12,000 for (l, d) = (15, 4). Copyright Imperial College Press.
cWINNOWER Algorithm for Finding Fuzzy DNA Motifs
NASA Technical Reports Server (NTRS)
Liang, Shoudan
2003-01-01
The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if multiple mutated copies of the motif (i.e., the signals) are present in the DNA sequence in sufficient abundance. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum number of detectable motifs qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc, by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12000 for (l,d) = (15,4).
CODEHOP (COnsensus-DEgenerate Hybrid Oligonucleotide Primer) PCR primer design
Rose, Timothy M.; Henikoff, Jorja G.; Henikoff, Steven
2003-01-01
We have developed a new primer design strategy for PCR amplification of distantly related gene sequences based on consensus-degenerate hybrid oligonucleotide primers (CODEHOPs). An interactive program has been written to design CODEHOP PCR primers from conserved blocks of amino acids within multiply-aligned protein sequences. Each CODEHOP consists of a pool of related primers containing all possible nucleotide sequences encoding 3–4 highly conserved amino acids within a 3′ degenerate core. A longer 5′ non-degenerate clamp region contains the most probable nucleotide predicted for each flanking codon. CODEHOPs are used in PCR amplification to isolate distantly related sequences encoding the conserved amino acid sequence. The primer design software and the CODEHOP PCR strategy have been utilized for the identification and characterization of new gene orthologs and paralogs in different plant, animal and bacterial species. In addition, this approach has been successful in identifying new pathogen species. The CODEHOP designer (http://blocks.fhcrc.org/codehop.html) is linked to BlockMaker and the Multiple Alignment Processor within the Blocks Database World Wide Web (http://blocks.fhcrc.org). PMID:12824413
NASA Astrophysics Data System (ADS)
Bhooplapur, Sharad; Akbulut, Mehmetkan; Quinlan, Franklyn; Delfyett, Peter J.
2010-04-01
A novel scheme for recognition of electronic bit-sequences is demonstrated. Two electronic bit-sequences that are to be compared are each mapped to a unique code from a set of Walsh-Hadamard codes. The codes are then encoded in parallel on the spectral phase of the frequency comb lines from a frequency-stabilized mode-locked semiconductor laser. Phase encoding is achieved by using two independent spatial light modulators based on liquid crystal arrays. Encoded pulses are compared using interferometric pulse detection and differential balanced photodetection. Orthogonal codes eight bits long are compared, and matched codes are successfully distinguished from mismatched codes with very low error rates, of around 10-18. This technique has potential for high-speed, high accuracy recognition of bit-sequences, with applications in keyword searches and internet protocol packet routing.
Two Perspectives on the Origin of the Standard Genetic Code
NASA Astrophysics Data System (ADS)
Sengupta, Supratim; Aggarwal, Neha; Bandhu, Ashutosh Vishwa
2014-12-01
The origin of a genetic code made it possible to create ordered sequences of amino acids. In this article we provide two perspectives on code origin by carrying out simulations of code-sequence coevolution in finite populations with the aim of examining how the standard genetic code may have evolved from more primitive code(s) encoding a small number of amino acids. We determine the efficacy of the physico-chemical hypothesis of code origin in the absence and presence of horizontal gene transfer (HGT) by allowing a diverse collection of code-sequence sets to compete with each other. We find that in the absence of horizontal gene transfer, natural selection between competing codes distinguished by differences in the degree of physico-chemical optimization is unable to explain the structure of the standard genetic code. However, for certain probabilities of the horizontal transfer events, a universal code emerges having a structure that is consistent with the standard genetic code.
An algebraic hypothesis about the primeval genetic code architecture.
Sánchez, Robersy; Grau, Ricardo
2009-09-01
A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D,A,C,G,U}, where symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneration of a primeval genetic code with five bases and the gradual origin and improvement of a primeval DNA repair system could make possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairing G identical with C and A=U and the non-specific base pairing of the hypothetical ancestral base D used to define the sum and product operations are enough features to determine the coding constraints of the primeval and the modern genetic code, as well as, the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. Besides, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the called period-3 property of the present coding DNA sequences could also exist in the ancient coding DNA sequences. The phylogenetic analyses achieved with metrics defined in the N-dimensional vector space (B(3))(N) of DNA sequences and with the new evolutionary model presented here also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.
Pressure system recertification at NASA-Langley Research Center
NASA Technical Reports Server (NTRS)
Hudson, C. M.; Ramsey, J. W., Jr.
1983-01-01
Langley Research Center pressure systems are being recertified to ensure safe operation of these systems. The procedures for recertifying these pressure systems are reviewed. Generally, the analysis and inspection requirements outlined in the appropriate national consensus codes are followed. In some instances where the requirements of these codes are not met. The systems are analyzed further, repaired, modified and/or tested to demonstrate their structural integrity.
Mapping and Sequencing the Human Genome
DOE R&D Accomplishments Database
1988-01-01
Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research. Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome. If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project. As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Q Zhai; M Landesman; H Robinson
2011-12-31
Retroviral Gag proteins contain short late-domain motifs that recruit cellular ESCRT pathway proteins to facilitate virus budding. ALIX-binding late domains often contain the core consensus sequence YPX{sub n}L (where X{sub n} can vary in sequence and length). However, some simian immunodeficiency virus (SIV) Gag proteins lack this consensus sequence, yet still bind ALIX. We mapped divergent, ALIX-binding late domains within the p6{sup Gag} proteins of SIV{sub MAC239} ({sub 40}SREK{und P}YKE{und VT}ED{und L}LHLNSLF{sub 59}) and SIV{sub agmTan-1} ({sub 24}AAG{und A}YDP{und AR}KL{und L}EQYAKK{sub 41}). Crystal structures revealed that anchoring tyrosines (in lightface) and nearby hydrophobic residues (underlined) contact the ALIX V domain,more » revealing how lentiviruses employ a diverse family of late-domain sequences to bind ALIX and promote virus budding.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhai, Q.; Robinson, H.; Landesman, M. B.
2011-01-01
Retroviral Gag proteins contain short late-domain motifs that recruit cellular ESCRT pathway proteins to facilitate virus budding. ALIX-binding late domains often contain the core consensus sequence YPX{sub n}L (where X{sub n} can vary in sequence and length). However, some simian immunodeficiency virus (SIV) Gag proteins lack this consensus sequence, yet still bind ALIX. We mapped divergent, ALIX-binding late domains within the p6{sup Gag} proteins of SIV{sub mac239} ({sub 40}SREK{und P}YKE{und VT}ED{und L}LHLNSLF{sub 59}) and SIV{sub agmTan-1} ({sub 24}AAG{und A}YDP{und AR}KL{und L}EQYAKK{sub 41}). Crystal structures revealed that anchoring tyrosines (in lightface) and nearby hydrophobic residues (underlined) contact the ALIX V domain,more » revealing how lentiviruses employ a diverse family of late-domain sequences to bind ALIX and promote virus budding.« less
Freimuth, P; Anderson, C W
1993-03-01
The sequence of a 1158-base pair fragment of the human adenovirus serotype 12 (Ad12) genome was determined. This segment encodes the precursors for virion components Mu and VI. Both Ad12 precursors contain two sequences that conform to a consensus sequence motif for cleavage by the endoproteinase of adenovirus 2 (Ad2). Analysis of the amino terminus of VI and of the peptide fragments found in Ad12 virions demonstrated that these sites are cleaved during Ad12 maturation. This observation suggests that the recognition motif for adenovirus endoproteinases is highly conserved among human serotypes. The adenovirus 2 endoproteinase polypeptide requires additional co-factors for activity (C. W. Anderson, Protein Expression Purif., 1993, 4, 8-15). Synthetic Ad12 or Ad2 pVI carboxy-terminal peptides each permitted efficient cleavage of an artificial endoproteinase substrate by recombinant Ad2 endoproteinase polypeptide.
NASA Astrophysics Data System (ADS)
Lestari, D.; Bustamam, A.; Novianti, T.; Ardaneswari, G.
2017-07-01
DNA sequence can be defined as a succession of letters, representing the order of nucleotides within DNA, using a permutation of four DNA base codes including adenine (A), guanine (G), cytosine (C), and thymine (T). The precise code of the sequences is determined using DNA sequencing methods and technologies, which have been developed since the 1970s and currently become highly developed, advanced and highly throughput sequencing technologies. So far, DNA sequencing has greatly accelerated biological and medical research and discovery. However, in some cases DNA sequencing could produce any ambiguous and not clear enough sequencing results that make them quite difficult to be determined whether these codes are A, T, G, or C. To solve these problems, in this study we can introduce other representation of DNA codes namely Quaternion Q = (PA, PT, PG, PC), where PA, PT, PG, PC are the probability of A, T, G, C bases that could appear in Q and PA + PT + PG + PC = 1. Furthermore, using Quaternion representations we are able to construct the improved scoring matrix for global sequence alignment processes, by applying a dot product method. Moreover, this scoring matrix produces better and higher quality of the match and mismatch score between two DNA base codes. In implementation, we applied the Needleman-Wunsch global sequence alignment algorithm using Octave, to analyze our target sequence which contains some ambiguous sequence data. The subject sequences are the DNA sequences of Streptococcus pneumoniae families obtained from the Genebank, meanwhile the target DNA sequence are received from our collaborator database. As the results we found the Quaternion representations improve the quality of the sequence alignment score and we can conclude that DNA sequence target has maximum similarity with Streptococcus pneumoniae.
USDA-ARS?s Scientific Manuscript database
The soybean Consensus Map 4.0 facilitated the anchoring of 95.6% of the soybean whole genome sequence developed by the Joint Genome Institute, Department of Energy but only properly oriented 66% of the sequence scaffolds. To find additional single nucleotide polymorphism (SNP) markers for additiona...
Berger, Cordula; Parson, Walther
2009-06-01
The degradation state of some biological traces recovered from the crime scene requires the amplification of very short fragments to attain a useful mitochondrial (mt)DNA sequence. We have previously introduced two mini-multiplex assays that amplify 10 overlapping control region (CR) fragments in two separate multiplex PCRs, which brought successful CR consensus sequences from even highly degraded DNA extracts. This procedure requires a total of 20 sequencing reactions per sample, which is laborious and cost intensive. For only moderately degraded samples that we encounter more frequently with typical mtDNA casework material, we developed two new multiplex assays that use a subset of the mini-amplicon primers but embrace larger fragments (midis) and require only 10 sequencing reactions to build a double-stranded CR consensus sequence. We used a preceding mtDNA quantitation step by real-time PCR with two different target fragments (143 and 283 bp) that roughly correspond to the average fragment sizes of the different multiplex approaches to estimate size-dependent mtDNA quantities and to aid the choice of the appropriate PCR multiplexes with respect to quality of the results and required costs.
Yang, V W; Marks, J A; Davis, B P; Jeffries, T W
1994-01-01
This paper describes the first high-efficiency transformation system for the xylose-fermenting yeast Pichia stipitis. The system includes integrating and autonomously replicating plasmids based on the gene for orotidine-5'-phosphate decarboxylase (URA3) and an autonomous replicating sequence (ARS) element (ARS2) isolated from P. stipitis CBS 6054. Ura- auxotrophs were obtained by selecting for resistance to 5-fluoroorotic acid and were identified as ura3 mutants by transformation with P. stipitis URA3. P. stipitis URA3 was cloned by its homology to Saccharomyces cerevisiae URA3, with which it is 69% identical in the coding region. P. stipitis ARS elements were cloned functionally through plasmid rescue. These sequences confer autonomous replication when cloned into vectors bearing the P. stipitis URA3 gene. P. stipitis ARS2 has features similar to those of the consensus ARS of S. cerevisiae and other ARS elements. Circular plasmids bearing the P. stipitis URA3 gene with various amounts of flanking sequences produced 600 to 8,600 Ura+ transformants per micrograms of DNA by electroporation. Most transformants obtained with circular vectors arose without integration of vector sequences. One vector yielded 5,200 to 12,500 Ura+ transformants per micrograms of DNA after it was linearized at various restriction enzyme sites within the P. stipitis URA3 insert. Transformants arising from linearized vectors produced stable integrants, and integration events were site specific for the genomic ura3 in 20% of the transformants examined. Plasmids bearing the P. stipitis URA3 gene and ARS2 element produced more than 30,000 transformants per micrograms of plasmid DNA. Autonomously replicating plasmids were stable for at least 50 generations in selection medium and were present at an average of 10 copies per nucleus. Images PMID:7811063
Sugimura; Sawabe; Ezura
2000-01-01
The alginate lyase-coding genes of Vibrio halioticoli IAM 14596(T), which was isolated from the gut of the abalone Haliotis discus hannai, were cloned using plasmid vector pUC 18, and expressed in Escherichia coli. Three alginate lyase-positive clones, pVHB, pVHC, and pVHE, were obtained, and all clones expressed the enzyme activity specific for polyguluronate. Three genes, alyVG1, alyVG2, and alyVG3, encoding polyguluronate lyase were sequenced: alyVG1 from pVHB was composed of a 1056-bp open reading frame (ORF) encoding 352 amino acid residues; alyVG2 gene from pVHC was composed of a 993-bp ORF encoding 331 amino acid residues; and alyVG3 gene from pVHE was composed of a 705-bp ORF encoding 235 amino acid residues. Comparison of nucleotide and deduced amino acid sequences among AlyVG1, AlyVG2, and AlyVG3 revealed low homologies. The identity value between AlyVG1 and AlyVG2 was 18.7%, and that between AlyVG2 and AlyVG3 was 17.0%. A higher identity value (26.0%) was observed between AlyVG1 and AlyVG3. Sequence comparison among known polyguluronate lyases including AlyVG1, AlyVG2, and AlyVG3 also did not reveal an identical region in these sequences. However, AlyVG1 showed the highest identity value (36.2%) and the highest similarity (73.3%) to AlyA from Klebsiella pneumoniae. A consensus region comprising nine amino acid (YFKAGXYXQ) in the carboxy-terminal region previously reported by Mallisard and colleagues was observed only in AlyVG1 and AlyVG2.
Persistence of opinion in the Sznajd consensus model: computer simulation
NASA Astrophysics Data System (ADS)
Stauffer, D.; de Oliveira, P. M. C.
2002-12-01
The density of never changed opinions during the Sznajd consensus-finding process decays with time t as 1/t^θ. We find θ simeq 3/8 for a chain, compatible with the exact Ising result of Derrida et al. In higher dimensions, however, the exponent differs from the Ising θ. With simultaneous updating of sublattices instead of the usual random sequential updating, the number of persistent opinions decays roughly exponentially. Some of the simulations used multi-spin coding.
Circular codes revisited: a statistical approach.
Gonzalez, D L; Giannerini, S; Rosa, R
2011-04-21
In 1996 Arquès and Michel [1996. A complementary circular code in the protein coding genes. J. Theor. Biol. 182, 45-58] discovered the existence of a common circular code in eukaryote and prokaryote genomes. Since then, circular code theory has provoked great interest and underwent a rapid development. In this paper we discuss some theoretical issues related to the synchronization properties of coding sequences and circular codes with particular emphasis on the problem of retrieval and maintenance of the reading frame. Motivated by the theoretical discussion, we adopt a rigorous statistical approach in order to try to answer different questions. First, we investigate the covering capability of the whole class of 216 self-complementary, C(3) maximal codes with respect to a large set of coding sequences. The results indicate that, on average, the code proposed by Arquès and Michel has the best covering capability but, still, there exists a great variability among sequences. Second, we focus on such code and explore the role played by the proportion of the bases by means of a hierarchy of permutation tests. The results show the existence of a sort of optimization mechanism such that coding sequences are tailored as to maximize or minimize the coverage of circular codes on specific reading frames. Such optimization clearly relates the function of circular codes with reading frame synchronization. Copyright © 2011 Elsevier Ltd. All rights reserved.
Turco, Gina; Schnable, James C.; Pedersen, Brent; Freeling, Michael
2013-01-01
Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize. PMID:23874343
Nanopore DNA Sequencing and Genome Assembly on the International Space Station.
Castro-Wallace, Sarah L; Chiu, Charles Y; John, Kristen K; Stahl, Sarah E; Rubins, Kathleen H; McIntyre, Alexa B R; Dworkin, Jason P; Lupisella, Mark L; Smith, David J; Botkin, Douglas J; Stephenson, Timothy A; Juul, Sissel; Turner, Daniel J; Izquierdo, Fernando; Federman, Scot; Stryke, Doug; Somasekar, Sneha; Alexander, Noah; Yu, Guixia; Mason, Christopher E; Burton, Aaron S
2017-12-21
We evaluated the performance of the MinION DNA sequencer in-flight on the International Space Station (ISS), and benchmarked its performance off-Earth against the MinION, Illumina MiSeq, and PacBio RS II sequencing platforms in terrestrial laboratories. Samples contained equimolar mixtures of genomic DNA from lambda bacteriophage, Escherichia coli (strain K12, MG1655) and Mus musculus (female BALB/c mouse). Nine sequencing runs were performed aboard the ISS over a 6-month period, yielding a total of 276,882 reads with no apparent decrease in performance over time. From sequence data collected aboard the ISS, we constructed directed assemblies of the ~4.6 Mb E. coli genome, ~48.5 kb lambda genome, and a representative M. musculus sequence (the ~16.3 kb mitochondrial genome), at 100%, 100%, and 96.7% consensus pairwise identity, respectively; de novo assembly of the E. coli genome from raw reads yielded a single contig comprising 99.9% of the genome at 98.6% consensus pairwise identity. Simulated real-time analyses of in-flight sequence data using an automated bioinformatic pipeline and laptop-based genomic assembly demonstrated the feasibility of sequencing analysis and microbial identification aboard the ISS. These findings illustrate the potential for sequencing applications including disease diagnosis, environmental monitoring, and elucidating the molecular basis for how organisms respond to spaceflight.
NASA Astrophysics Data System (ADS)
Sui, Xin; Yang, Yongqing; Xu, Xianyun; Zhang, Shuai; Zhang, Lingzhong
2018-02-01
This paper investigates the consensus of multi-agent systems with probabilistic time-varying delays and packet losses via sampled-data control. On the one hand, a Bernoulli-distributed white sequence is employed to model random packet losses among agents. On the other hand, a switched system is used to describe packet dropouts in a deterministic way. Based on the special property of the Laplacian matrix, the consensus problem can be converted into a stabilization problem of a switched system with lower dimensions. Some mean square consensus criteria are derived in terms of constructing an appropriate Lyapunov function and using linear matrix inequalities (LMIs). Finally, two numerical examples are given to show the effectiveness of the proposed method.
USDA-ARS?s Scientific Manuscript database
Background: Vertebrate immune systems generate diverse repertoires of antibodies capable of mediating response to a variety of antigens. Next generation sequencing methods provide unique approaches to a number of immuno-based research areas including antibody discovery and engineering, disease surve...
Xiong, H; Campelo, D; Pollack, R J; Raoult, D; Shao, R; Alem, M; Ali, J; Bilcha, K; Barker, S C
2014-08-01
The Illumina Hiseq platform was used to sequence the entire mitochondrial coding-regions of 20 body lice, Pediculus humanus Linnaeus, and head lice, P. capitis De Geer (Phthiraptera: Pediculidae), from eight towns and cities in five countries: Ethiopia, France, China, Australia and the U.S.A. These data (∼310 kb) were used to see how much more informative entire mitochondrial coding-region sequences were than partial mitochondrial coding-region sequences, and thus to guide the design of future studies of the phylogeny, origin, evolution and taxonomy of body lice and head lice. Phylogenies were compared from entire coding-region sequences (∼15.4 kb), entire cox1 (∼1.5 kb), partial cox1 (∼700 bp) and partial cytb (∼600 bp) sequences. On the one hand, phylogenies from entire mitochondrial coding-region sequences (∼15.4 kb) were much more informative than phylogenies from entire cox1 sequences (∼1.5 kb) and partial gene sequences (∼600 to ∼700 bp). For example, 19 branches had > 95% bootstrap support in our maximum likelihood tree from the entire mitochondrial coding-regions (∼15.4 kb) whereas the tree from 700 bp cox1 had only two branches with bootstrap support > 95%. Yet, by contrast, partial cytb (∼600 bp) and partial cox1 (∼486 bp) sequences were sufficient to genotype lice to Clade A, B or C. The sequences of the mitochondrial genomes of the P. humanus, P. capitis and P. schaeffi Fahrenholz studied are in NCBI GenBank under the accession numbers KC660761-800, KC685631-6330, KC241882-97, EU219988-95, HM241895-8 and JX080388-407. © 2014 The Royal Entomological Society.
Effective Identification of Similar Patients Through Sequential Matching over ICD Code Embedding.
Nguyen, Dang; Luo, Wei; Venkatesh, Svetha; Phung, Dinh
2018-04-11
Evidence-based medicine often involves the identification of patients with similar conditions, which are often captured in ICD (International Classification of Diseases (World Health Organization 2013)) code sequences. With no satisfying prior solutions for matching ICD-10 code sequences, this paper presents a method which effectively captures the clinical similarity among routine patients who have multiple comorbidities and complex care needs. Our method leverages the recent progress in representation learning of individual ICD-10 codes, and it explicitly uses the sequential order of codes for matching. Empirical evaluation on a state-wide cancer data collection shows that our proposed method achieves significantly higher matching performance compared with state-of-the-art methods ignoring the sequential order. Our method better identifies similar patients in a number of clinical outcomes including readmission and mortality outlook. Although this paper focuses on ICD-10 diagnosis code sequences, our method can be adapted to work with other codified sequence data.
Liang, Yunyun; Liu, Sanyang; Zhang, Shengli
2015-01-01
Prediction of protein structural classes for low-similarity sequences is useful for understanding fold patterns, regulation, functions, and interactions of proteins. It is well known that feature extraction is significant to prediction of protein structural class and it mainly uses protein primary sequence, predicted secondary structure sequence, and position-specific scoring matrix (PSSM). Currently, prediction solely based on the PSSM has played a key role in improving the prediction accuracy. In this paper, we propose a novel method called CSP-SegPseP-SegACP by fusing consensus sequence (CS), segmented PsePSSM, and segmented autocovariance transformation (ACT) based on PSSM. Three widely used low-similarity datasets (1189, 25PDB, and 640) are adopted in this paper. Then a 700-dimensional (700D) feature vector is constructed and the dimension is decreased to 224D by using principal component analysis (PCA). To verify the performance of our method, rigorous jackknife cross-validation tests are performed on 1189, 25PDB, and 640 datasets. Comparison of our results with the existing PSSM-based methods demonstrates that our method achieves the favorable and competitive performance. This will offer an important complementary to other PSSM-based methods for prediction of protein structural classes for low-similarity sequences.
Aires-de-Sousa, João; Aires-de-Sousa, Luisa
2003-01-01
We propose representing individual positions in DNA sequences by virtual potentials generated by other bases of the same sequence. This is a compact representation of the neighbourhood of a base. The distribution of the virtual potentials over the whole sequence can be used as a representation of the entire sequence (SEQREP code). It is a flexible code, with a length independent of the sequence size, does not require previous alignment, and is convenient for processing by neural networks or statistical techniques. To evaluate its biological significance, the SEQREP code was used for training Kohonen self-organizing maps (SOMs) in two applications: (a) detection of Alu sequences, and (b) classification of sequences encoding for HIV-1 envelope glycoprotein (env) into subtypes A-G. It was demonstrated that SOMs clustered sequences belonging to different classes into distinct regions. For independent test sets, very high rates of correct predictions were obtained (97% in the first application, 91% in the second). Possible areas of application of SEQREP codes include functional genomics, phylogenetic analysis, detection of repetitions, database retrieval, and automatic alignment. Software for representing sequences by SEQREP code, and for training Kohonen SOMs is made freely available from http://www.dq.fct.unl.pt/qoa/jas/seqrep. Supplementary material is available at http://www.dq.fct.unl.pt/qoa/jas/seqrep/bioinf2002
Zhou, Carol L Ecale
2015-01-01
In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and a need was evident for creating such a tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure. This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign was demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
Complete Coding Genome Sequence for Mogiana Tick Virus, a Jingmenvirus Isolated from Ticks in Brazil
2017-05-04
and capable of infecting a wide range of animal hosts (1–5). Here, we report the complete coding genome sequence (i.e., only missing portions of...segmented nature of the genome was not under- stood. Therefore, only the two genome segments with detectable sequence homolo- gies to flaviviruses were...originally reported (2). We revisited the data set of Maruyama et al. (2) and assembled the complete coding sequences for all four genome segments. We
78 FR 49412 - Personal Flotation Devices Labeling and Standards
Federal Register 2010, 2011, 2012, 2013, 2014
2013-08-14
...The Coast Guard proposes to remove references to type codes in its regulations on the carriage and labeling of Coast Guard-approved personal flotation devices (PFDs). PFD type codes are unique to Coast Guard approval and are not well understood by the public. Removing these type codes from our regulations would facilitate future incorporation by reference of new industry consensus standards for PFD labeling that will more effectively convey safety information, and is a step toward harmonization of our regulations with PFD requirements in Canada and in other countries.
Quantized phase coding and connected region labeling for absolute phase retrieval.
Chen, Xiangcheng; Wang, Yuwei; Wang, Yajun; Ma, Mengchao; Zeng, Chunnian
2016-12-12
This paper proposes an absolute phase retrieval method for complex object measurement based on quantized phase-coding and connected region labeling. A specific code sequence is embedded into quantized phase of three coded fringes. Connected regions of different codes are labeled and assigned with 3-digit-codes combining the current period and its neighbors. Wrapped phase, more than 36 periods, can be restored with reference to the code sequence. Experimental results verify the capability of the proposed method to measure multiple isolated objects.
Functional interrogation of non-coding DNA through CRISPR genome editing
Canver, Matthew C.; Bauer, Daniel E.; Orkin, Stuart H.
2017-01-01
Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. PMID:28288828
Bobrova, E V; Bogacheva, I N; Lyakhovetskii, V A; Fabinskaja, A A; Fomina, E V
2017-01-01
In order to test the hypothesis of hemisphere specialization for different types of information coding (the right hemisphere, for positional coding; the left one, for vector coding), we analyzed the errors of right and left-handers during a task involving the memorization of sequences of movements by the left or the right hand, which activates vector coding by changing the order of movements in memorized sequences. The task was first performed by the right or the left hand, then by the opposite hand. It was found that both'right- and left-handers use the information about the previous movements of the dominant hand, but not of the non-dom" inant one. After changing the hand, right-handers use the information about previous movements of the second hand, while left-handers do not. We compared our results with the data of previous experiments, in which positional coding was activated, and concluded that both right- and left-handers use vector coding for memorizing the sequences of their dominant hands and positional coding for memorizing the sequences of non-dominant hand. No similar patterns of errors were found between right- and left-handers after changing the hand, which suggests that in right- and left-handersthe skills are transferred in different ways depending on the type of coding.
Kim, Seongman; Dai, Gan; O’Callaghan, Dennis J.; Kim, Seong Kee
2012-01-01
The immediate-early protein (IEP), the major regulatory protein encoded by the IE gene of equine herpesvirus 1 (EHV-1), plays a crucial role as both transcription activator and repressor during a productive lytic infection. To investigate the mechanism by which the EHV-1 IEP inhibits its own promoter, IE promoter-luciferase reporter plasmids containing wild-type and mutant IEP-binding site (IEBS) were constructed and used for luciferase reporter assays. The IEP inhibited transcription from its own promoter in the presence of a consensus IEBS (5’-ATCGT-3’) located near the transcription initiation site but did not inhibit when the consensus sequence was deleted. To determine whether the distance between the TATA box and the IEBS affects transcriptional repression, the IEBS was displaced from the original site by the insertion of synthetic DNA sequences. Luciferase reporter assays revealed that the IEP is able to repress its own promoter when the IEBS is located within 26-bp from the TATA box. We also found that the proper orientation and position of the IEBS were required for the repression by the IEP. Interestingly, the level of repression was significantly reduced when a consensus TATA sequence was deleted from the promoter region, indicating that the IEP efficiently inhibits its own promoter in a TATA box-dependent manner. Taken together, these results suggest that the EHV-1 IEP delicately modulates autoregulation of its gene through the consensus IEBS that is near the transcription initiation site and the TATA box. PMID:22265772
Kim, Seongman; Dai, Gan; O'Callaghan, Dennis J; Kim, Seong Kee
2012-04-01
The immediate-early protein (IEP), the major regulatory protein encoded by the IE gene of equine herpesvirus 1 (EHV-1), plays a crucial role as both transcription activator and repressor during a productive lytic infection. To investigate the mechanism by which the EHV-1 IEP inhibits its own promoter, IE promoter-luciferase reporter plasmids containing wild-type and mutant IEP-binding site (IEBS) were constructed and used for luciferase reporter assays. The IEP inhibited transcription from its own promoter in the presence of a consensus IEBS (5'-ATCGT-3') located near the transcription initiation site but did not inhibit when the consensus sequence was deleted. To determine whether the distance between the TATA box and the IEBS affects transcriptional repression, the IEBS was displaced from the original site by the insertion of synthetic DNA sequences. Luciferase reporter assays revealed that the IEP is able to repress its own promoter when the IEBS is located within 26-bp from the TATA box. We also found that the proper orientation and position of the IEBS were required for the repression by the IEP. Interestingly, the level of repression was significantly reduced when a consensus TATA sequence was deleted from the promoter region, indicating that the IEP efficiently inhibits its own promoter in a TATA box-dependent manner. Taken together, these results suggest that the EHV-1 IEP delicately modulates autoregulation of its gene through the consensus IEBS that is near the transcription initiation site and the TATA box. Copyright © 2012. Published by Elsevier B.V.
Kortenhoeven, Cornell; Joubert, Fourie; Bastos, Armanda D S; Abolnik, Celia
2015-02-22
Extensive focus is placed on the comparative analyses of consensus genotypes in the study of West Nile virus (WNV) emergence. Few studies account for genetic change in the underlying WNV quasispecies population variants. These variants are not discernable in the consensus genome at the time of emergence, and the maintenance of mutation-selection equilibria of population variants is greatly underestimated. The emergence of lineage 1 WNV strains has been studied extensively, but recent epidemics caused by lineage 2 WNV strains in Hungary, Austria, Greece and Italy emphasizes the increasing importance of this lineage to public health. In this study we explored the quasispecies dynamics of minority variants that contribute to cell-tropism and host determination, i.e. the ability to infect different cell types or cells from different species from Next Generation Sequencing (NGS) data of a historic lineage 2 WNV strain. Minority variants contributing to host cell membrane association persist in the viral population without contributing to the genetic change in the consensus genome. Minority variants are shown to maintain a stable mutation-selection equilibrium under positive selection, particularly in the capsid gene region. This study is the first to infer positive selection and the persistence of WNV haplotype variants that contribute to viral fitness without accompanying genetic change in the consensus genotype, documented solely from NGS sequence data. The approach used in this study streamlines the experimental design seeking viral minority variants accurately from NGS data whilst minimizing the influence of associated sequence error.
Schmitz, Matthew; Forst, Linda
2016-02-15
Inclusion of information about a patient's work, industry, and occupation, in the electronic health record (EHR) could facilitate occupational health surveillance, better health outcomes, prevention activities, and identification of workers' compensation cases. The US National Institute for Occupational Safety and Health (NIOSH) has developed an autocoding system for "industry" and "occupation" based on 1990 Bureau of Census codes; its effectiveness requires evaluation in conjunction with promoting the mandatory addition of these variables to the EHR. The objective of the study was to evaluate the intercoder reliability of NIOSH's Industry and Occupation Computerized Coding System (NIOCCS) when applied to data collected in a community survey conducted under the Affordable Care Act; to determine the proportion of records that are autocoded using NIOCCS. Standard Occupational Classification (SOC) codes are used by several federal agencies in databases that capture demographic, employment, and health information to harmonize variables related to work activities among these data sources. There are 359 industry and occupation responses that were hand coded by 2 investigators, who came to a consensus on every code. The same variables were autocoded using NIOCCS at the high and moderate criteria level. Kappa was .84 for agreement between hand coders and between the hand coder consensus code versus NIOCCS high confidence level codes for the first 2 digits of the SOC code. For 4 digits, NIOCCS coding versus investigator coding ranged from kappa=.56 to .70. In this study, NIOCCS was able to achieve production rates (ie, to autocode) 31%-36% of entered variables at the "high confidence" level and 49%-58% at the "medium confidence" level. Autocoding (production) rates are somewhat lower than those reported by NIOSH. Agreement between manually coded and autocoded data are "substantial" at the 2-digit level, but only "fair" to "good" at the 4-digit level. This work serves as a baseline for performance of NIOCCS by investigators in the field. Further field testing will clarify NIOCCS effectiveness in terms of ability to assign codes and coding accuracy and will clarify its value as inclusion of these occupational variables in the EHR is promoted.
Sechi, Leonardo A.; Zanetti, Stefania; Dupré, Ilaria; Delogu, Giovanni; Fadda, Giovanni
1998-01-01
The presence of enterobacterial repetitive intergenic consensus (ERIC) sequences was demonstrated for the first time in the genome of Mycobacterium tuberculosis; these sequences have been found in transcribed regions of the chromosomes of gram-negative bacteria. In this study genetic diversity among clinical isolates of M. tuberculosis was determined by PCR with ERIC primers (ERIC-PCR). The study isolates comprised 71 clinical isolates collected from Sardinia, Italy. ERIC-PCR was able to identify 59 distinct profiles. The results obtained were compared with IS6110 and PCR-GTG fingerprinting. We found that the level of differentiation obtained by ERIC-PCR is greater than that obtained by IS6110 fingerprinting and comparable to that obtained by PCR-GTG. This method of fingerprinting is rapid and sensitive and can be applied to the study of the epidemiology of M. tuberculosis infections, especially when IS6110 fingerprinting is not of any help. PMID:9431935
Castrignanò, Tiziana; Canali, Alessandro; Grillo, Giorgio; Liuni, Sabino; Mignone, Flavio; Pesole, Graziano
2004-01-01
The identification and characterization of genome tracts that are highly conserved across species during evolution may contribute significantly to the functional annotation of whole-genome sequences. Indeed, such sequences are likely to correspond to known or unknown coding exons or regulatory motifs. Here, we present a web server implementing a previously developed algorithm that, by comparing user-submitted genome sequences, is able to identify statistically significant conserved blocks and assess their coding or noncoding nature through the measure of a coding potential score. The web tool, available at http://www.caspur.it/CSTminer/, is dynamically interconnected with the Ensembl genome resources and produces a graphical output showing a map of detected conserved sequences and annotated gene features. PMID:15215464
Ho, Cynthia K. Y.; Raghwani, Jayna; Koekkoek, Sylvie; Liang, Richard H.; Van der Meer, Jan T. M.; Van Der Valk, Marc; De Jong, Menno; Pybus, Oliver G.
2016-01-01
ABSTRACT In contrast to other available next-generation sequencing platforms, PacBio single-molecule, real-time (SMRT) sequencing has the advantage of generating long reads albeit with a relatively higher error rate in unprocessed data. Using this platform, we longitudinally sampled and sequenced the hepatitis C virus (HCV) envelope genome region (1,680 nucleotides [nt]) from individuals belonging to a cluster of sexually transmitted cases. All five subjects were coinfected with HIV-1 and a closely related strain of HCV genotype 4d. In total, 50 samples were analyzed by using SMRT sequencing. By using 7 passes of circular consensus sequencing, the error rate was reduced to 0.37%, and the median number of sequences was 612 per sample. A further reduction of insertions was achieved by alignment against a sample-specific reference sequence. However, in vitro recombination during PCR amplification could not be excluded. Phylogenetic analysis supported close relationships among HCV sequences from the four male subjects and subsequent transmission from one subject to his female partner. Transmission was characterized by a strong genetic bottleneck. Viral genetic diversity was low during acute infection and increased upon progression to chronicity but subsequently fluctuated during chronic infection, caused by the alternate detection of distinct coexisting lineages. SMRT sequencing combines long reads with sufficient depth for many phylogenetic analyses and can therefore provide insights into within-host HCV evolutionary dynamics without the need for haplotype reconstruction using statistical algorithms. IMPORTANCE Next-generation sequencing has revolutionized the study of genetically variable RNA virus populations, but for phylogenetic and evolutionary analyses, longer sequences than those generated by most available platforms, while minimizing the intrinsic error rate, are desired. Here, we demonstrate for the first time that PacBio SMRT sequencing technology can be used to generate full-length HCV envelope sequences at the single-molecule level, providing a data set with large sequencing depth for the characterization of intrahost viral dynamics. The selection of consensus reads derived from at least 7 full circular consensus sequencing rounds significantly reduced the intrinsic high error rate of this method. We used this method to genetically characterize a unique transmission cluster of sexually transmitted HCV infections, providing insight into the distinct evolutionary pathways in each patient over time and identifying the transmission-associated genetic bottleneck as well as fluctuations in viral genetic diversity over time, accompanied by dynamic shifts in viral subpopulations. PMID:28077634
USDA-ARS?s Scientific Manuscript database
Lipase gene (lip) of a biodegradable polyhydroxyalkanoate- (PHA-) synthesizing bacterium P. resinovorans NRRL B-2649 was cloned, sequenced and characterized by using consensus primers and PCR-based genome walking method. The ORF of the putative Lip (314 amino acids) and its active site (Ser111, Asp...
A FRET Biosensor for ROCK Based on a Consensus Substrate Sequence Identified by KISS Technology.
Li, Chunjie; Imanishi, Ayako; Komatsu, Naoki; Terai, Kenta; Amano, Mutsuki; Kaibuchi, Kozo; Matsuda, Michiyuki
2017-01-11
Genetically-encoded biosensors based on Förster/fluorescence resonance energy transfer (FRET) are versatile tools for studying the spatio-temporal regulation of signaling molecules within not only the cells but also tissues. Perhaps the hardest task in the development of a FRET biosensor for protein kinases is to identify the kinase-specific substrate peptide to be used in the FRET biosensor. To solve this problem, we took advantage of kinase-interacting substrate screening (KISS) technology, which deduces a consensus substrate sequence for the protein kinase of interest. Here, we show that a consensus substrate sequence for ROCK identified by KISS yielded a FRET biosensor for ROCK, named Eevee-ROCK, with high sensitivity and specificity. By treating HeLa cells with inhibitors or siRNAs against ROCK, we show that a substantial part of the basal FRET signal of Eevee-ROCK was derived from the activities of ROCK1 and ROCK2. Eevee-ROCK readily detected ROCK activation by epidermal growth factor, lysophosphatidic acid, and serum. When cells stably-expressing Eevee-ROCK were time-lapse imaged for three days, ROCK activity was found to increase after the completion of cytokinesis, concomitant with the spreading of cells. Eevee-ROCK also revealed a gradual increase in ROCK activity during apoptosis. Thus, Eevee-ROCK, which was developed from a substrate sequence predicted by the KISS technology, will pave the way to a better understanding of the function of ROCK in a physiological context.
Li, Jie; Overall, Christopher C.; Johnson, Rudd C.; ...
2015-09-21
The alternative sigma factor σ E functions to maintain bacterial homeostasis and membrane integrity in response to extracytoplasmic stress by regulating thousands of genes both directly and indirectly. The transcriptional regulatory network governed by σ E in Salmonella and E. coli has been examined using microarray, however a genome-wide analysis of σ E–binding sites inSalmonella has not yet been reported. We infected macrophages with Salmonella Typhimurium over a select time course. Using chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-seq), 31 σ E–binding sites were identified. Seventeen sites were new, which included outer membrane proteins, a quorum-sensing protein, a cellmore » division factor, and a signal transduction modulator. The consensus sequence identified for σ E in vivo binding was similar to the one previously reported, except for a conserved G and A between the -35 and -10 regions. One third of the σ E–binding sites did not contain the consensus sequence, suggesting there may be alternative mechanisms by which σ E modulates transcription. By dissecting direct and indirect modes of σ E-mediated regulation, we found that σ E activates gene expression through recognition of both canonical and reversed consensus sequence. Lastly, new σ E regulated genes ( greA, luxS, ompA and ompX) are shown to be involved in heat shock and oxidative stress responses.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Jie; Overall, Christopher C.; Johnson, Rudd C.
The alternative sigma factor σ E functions to maintain bacterial homeostasis and membrane integrity in response to extracytoplasmic stress by regulating thousands of genes both directly and indirectly. The transcriptional regulatory network governed by σ E in Salmonella and E. coli has been examined using microarray, however a genome-wide analysis of σ E–binding sites inSalmonella has not yet been reported. We infected macrophages with Salmonella Typhimurium over a select time course. Using chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-seq), 31 σ E–binding sites were identified. Seventeen sites were new, which included outer membrane proteins, a quorum-sensing protein, a cellmore » division factor, and a signal transduction modulator. The consensus sequence identified for σ E in vivo binding was similar to the one previously reported, except for a conserved G and A between the -35 and -10 regions. One third of the σ E–binding sites did not contain the consensus sequence, suggesting there may be alternative mechanisms by which σ E modulates transcription. By dissecting direct and indirect modes of σ E-mediated regulation, we found that σ E activates gene expression through recognition of both canonical and reversed consensus sequence. Lastly, new σ E regulated genes ( greA, luxS, ompA and ompX) are shown to be involved in heat shock and oxidative stress responses.« less
Primer development to obtain complete coding sequence of HA and NA genes of influenza A/H3N2 virus.
Agustiningsih, Agustiningsih; Trimarsanto, Hidayat; Setiawaty, Vivi; Artika, I Made; Muljono, David Handojo
2016-08-30
Influenza is an acute respiratory illness and has become a serious public health problem worldwide. The need to study the HA and NA genes in influenza A virus is essential since these genes frequently undergo mutations. This study describes the development of primer sets for RT-PCR to obtain complete coding sequence of Hemagglutinin (HA) and Neuraminidase (NA) genes of influenza A/H3N2 virus from Indonesia. The primers were developed based on influenza A/H3N2 sequence worldwide from Global Initiative on Sharing All Influenza Data (GISAID) and further tested using Indonesian influenza A/H3N2 archived samples of influenza-like illness (ILI) surveillance from 2008 to 2009. An optimum RT-PCR condition was acquired for all HA and NA fragments designed to cover complete coding sequence of HA and NA genes. A total of 71 samples were successfully sequenced for complete coding sequence both of HA and NA genes out of 145 samples of influenza A/H3N2 tested. The developed primer sets were suitable for obtaining complete coding sequences of HA and NA genes of Indonesian samples from 2008 to 2009.
Kantarski, Traci; Larson, Steve; Zhang, Xiaofei; DeHaan, Lee; Borevitz, Justin; Anderson, James; Poland, Jesse
2017-01-01
Development of the first consensus genetic map of intermediate wheatgrass gives insight into the genome and tools for molecular breeding. Intermediate wheatgrass (Thinopyrum intermedium) has been identified as a candidate for domestication and improvement as a perennial grain, forage, and biofuel crop and is actively being improved by several breeding programs. To accelerate this process using genomics-assisted breeding, efficient genotyping methods and genetic marker reference maps are needed. We present here the first consensus genetic map for intermediate wheatgrass (IWG), which confirms the species' allohexaploid nature (2n = 6x = 42) and homology to Triticeae genomes. Genotyping-by-sequencing was used to identify markers that fit expected segregation ratios and construct genetic maps for 13 heterogeneous parents of seven full-sib families. These maps were then integrated using a linear programming method to produce a consensus map with 21 linkage groups containing 10,029 markers, 3601 of which were present in at least two populations. Each of the 21 linkage groups contained between 237 and 683 markers, cumulatively covering 5061 cM (2891 cM--Kosambi) with an average distance of 0.5 cM between each pair of markers. Through mapping the sequence tags to the diploid (2n = 2x = 14) barley reference genome, we observed high colinearity and synteny between these genomes, with three homoeologous IWG chromosomes corresponding to each of the seven barley chromosomes, and mapped translocations that are known in the Triticeae. The consensus map is a valuable tool for wheat breeders to map important disease-resistance genes within intermediate wheatgrass. These genomic tools can help lead to rapid improvement of IWG and development of high-yielding cultivars of this perennial grain that would facilitate the sustainable intensification of agricultural systems.
GATA: A graphic alignment tool for comparative sequenceanalysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nix, David A.; Eisen, Michael B.
2005-01-01
Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dotplot analysis is often used to estimate non-coding sequence relatedness. Yet dotmore » plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.« less
Phylogenetic Network for European mtDNA
Finnilä, Saara; Lehtonen, Mervi S.; Majamaa, Kari
2001-01-01
The sequence in the first hypervariable segment (HVS-I) of the control region has been used as a source of evolutionary information in most phylogenetic analyses of mtDNA. Population genetic inference would benefit from a better understanding of the variation in the mtDNA coding region, but, thus far, complete mtDNA sequences have been rare. We determined the nucleotide sequence in the coding region of mtDNA from 121 Finns, by conformation-sensitive gel electrophoresis and subsequent sequencing and by direct sequencing of the D loop. Furthermore, 71 sequences from our previous reports were included, so that the samples represented all the mtDNA haplogroups present in the Finnish population. We found a total of 297 variable sites in the coding region, which allowed the compilation of unambiguous phylogenetic networks. The D loop harbored 104 variable sites, and, in most cases, these could be localized within the coding-region networks, without discrepancies. Interestingly, many homoplasies were detected in the coding region. Nucleotide variation in the rRNA and tRNA genes was 6%, and that in the third nucleotide positions of structural genes amounted to 22% of that in the HVS-I. The complete networks enabled the relationships between the mtDNA haplogroups to be analyzed. Phylogenetic networks based on the entire coding-region sequence in mtDNA provide a rich source for further population genetic studies, and complete sequences make it easier to differentiate between disease-causing mutations and rare polymorphisms. PMID:11349229
Ishikawa, Sohta A; Inagaki, Yuji; Hashimoto, Tetsuo
2012-01-01
In phylogenetic analyses of nucleotide sequences, 'homogeneous' substitution models, which assume the stationarity of base composition across a tree, are widely used, albeit individual sequences may bear distinctive base frequencies. In the worst-case scenario, a homogeneous model-based analysis can yield an artifactual union of two distantly related sequences that achieved similar base frequencies in parallel. Such potential difficulty can be countered by two approaches, 'RY-coding' and 'non-homogeneous' models. The former approach converts four bases into purine and pyrimidine to normalize base frequencies across a tree, while the heterogeneity in base frequency is explicitly incorporated in the latter approach. The two approaches have been applied to real-world sequence data; however, their basic properties have not been fully examined by pioneering simulation studies. Here, we assessed the performances of the maximum-likelihood analyses incorporating RY-coding and a non-homogeneous model (RY-coding and non-homogeneous analyses) on simulated data with parallel convergence to similar base composition. Both RY-coding and non-homogeneous analyses showed superior performances compared with homogeneous model-based analyses. Curiously, the performance of RY-coding analysis appeared to be significantly affected by a setting of the substitution process for sequence simulation relative to that of non-homogeneous analysis. The performance of a non-homogeneous analysis was also validated by analyzing a real-world sequence data set with significant base heterogeneity.
Palindromic repetitive DNA elements with coding potential in Methanocaldococcus jannaschii.
Suyama, Mikita; Lathe, Warren C; Bork, Peer
2005-10-10
We have identified 141 novel palindromic repetitive elements in the genome of euryarchaeon Methanocaldococcus jannaschii. The total length of these elements is 14.3kb, which corresponds to 0.9% of the total genomic sequence and 6.3% of all extragenic regions. The elements can be divided into three groups (MJRE1-3) based on the sequence similarity. The low sequence identity within each of the groups suggests rather old origin of these elements in M. jannaschii. Three MJRE2 elements were located within the protein coding regions without disrupting the coding potential of the host genes, indicating that insertion of repeats might be a widespread mechanism to enhance sequence diversity in coding regions.
A conserved mechanism for replication origin recognition and binding in archaea.
Majerník, Alan I; Chong, James P J
2008-01-15
To date, methanogens are the only group within the archaea where firing DNA replication origins have not been demonstrated in vivo. In the present study we show that a previously identified cluster of ORB (origin recognition box) sequences do indeed function as an origin of replication in vivo in the archaeon Methanothermobacter thermautotrophicus. Although the consensus sequence of ORBs in M. thermautotrophicus is somewhat conserved when compared with ORB sequences in other archaea, the Cdc6-1 protein from M. thermautotrophicus (termed MthCdc6-1) displays sequence-specific binding that is selective for the MthORB sequence and does not recognize ORBs from other archaeal species. Stabilization of in vitro MthORB DNA binding by MthCdc6-1 requires additional conserved sequences 3' to those originally described for M. thermautotrophicus. By testing synthetic sequences bearing mutations in the MthORB consensus sequence, we show that Cdc6/ORB binding is critically dependent on the presence of an invariant guanine found in all archaeal ORB sequences. Mutation of a universally conserved arginine residue in the recognition helix of the winged helix domain of archaeal Cdc6-1 shows that specific origin sequence recognition is dependent on the interaction of this arginine residue with the invariant guanine. Recognition of a mutated origin sequence can be achieved by mutation of the conserved arginine residue to a lysine or glutamine residue. Thus despite a number of differences in protein and DNA sequences between species, the mechanism of origin recognition and binding appears to be conserved throughout the archaea.
Tins, B; Cassar-Pullicino, V; Haddaway, M; Nachtrab, U
2012-01-01
Objectives The bulk of spinal imaging is still performed with conventional two-dimensional sequences. This study assesses the suitability of three-dimensional sampling perfection with application-optimised contrasts using a different flip angle evolutions (SPACE) sequence for routine spinal imaging. Methods 62 MRI examinations of the spine were evaluated by 2 examiners in consensus for the depiction of anatomy and presence of artefact. We noted pathologies that might be missed using the SPACE sequence only or the SPACE and a sagittal T1 weighted sequence. The reference standards were sagittal and axial T1 weighted and T2 weighted sequences. At a later date the evaluation was repeated by one of the original examiners and an additional examiner. Results There was good agreement of the single evaluations and consensus evaluation for the conventional sequences: κ>0.8, confidence interval (CI)>0.6–1.0. For the SPACE sequence, depiction of anatomy was very good for 84% of cases, with high interobserver agreement, but there was poor interobserver agreement for other cases. For artefact assessment of SPACE, κ=0.92, CI=0.92–1.0. The SPACE sequence was superior to conventional sequences for depiction of anatomy and artefact resistance. The SPACE sequence occasionally missed bone marrow oedema. In conjunction with sagittal T1 weighted sequences, no abnormality was missed. The isotropic SPACE sequence was superior to conventional sequences in imaging difficult anatomy such as in scoliosis and spondylolysis. Conclusion The SPACE sequence allows excellent assessment of anatomy owing to high spatial resolution and resistance to artefact. The sensitivity for bone marrow abnormalities is limited. PMID:22374284
Tins, B; Cassar-Pullicino, V; Haddaway, M; Nachtrab, U
2012-08-01
The bulk of spinal imaging is still performed with conventional two-dimensional sequences. This study assesses the suitability of three-dimensional sampling perfection with application-optimised contrasts using a different flip angle evolutions (SPACE) sequence for routine spinal imaging. 62 MRI examinations of the spine were evaluated by 2 examiners in consensus for the depiction of anatomy and presence of artefact. We noted pathologies that might be missed using the SPACE sequence only or the SPACE and a sagittal T(1) weighted sequence. The reference standards were sagittal and axial T(1) weighted and T(2) weighted sequences. At a later date the evaluation was repeated by one of the original examiners and an additional examiner. There was good agreement of the single evaluations and consensus evaluation for the conventional sequences: κ>0.8, confidence interval (CI)>0.6-1.0. For the SPACE sequence, depiction of anatomy was very good for 84% of cases, with high interobserver agreement, but there was poor interobserver agreement for other cases. For artefact assessment of SPACE, κ=0.92, CI=0.92-1.0. The SPACE sequence was superior to conventional sequences for depiction of anatomy and artefact resistance. The SPACE sequence occasionally missed bone marrow oedema. In conjunction with sagittal T(1) weighted sequences, no abnormality was missed. The isotropic SPACE sequence was superior to conventional sequences in imaging difficult anatomy such as in scoliosis and spondylolysis. The SPACE sequence allows excellent assessment of anatomy owing to high spatial resolution and resistance to artefact. The sensitivity for bone marrow abnormalities is limited.
Functional interrogation of non-coding DNA through CRISPR genome editing.
Canver, Matthew C; Bauer, Daniel E; Orkin, Stuart H
2017-05-15
Methodologies to interrogate non-coding regions have lagged behind coding regions despite comprising the vast majority of the genome. However, the rapid evolution of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing has provided a multitude of novel techniques for laboratory investigation including significant contributions to the toolbox for studying non-coding DNA. CRISPR-mediated loss-of-function strategies rely on direct disruption of the underlying sequence or repression of transcription without modifying the targeted DNA sequence. CRISPR-mediated gain-of-function approaches similarly benefit from methods to alter the targeted sequence through integration of customized sequence into the genome as well as methods to activate transcription. Here we review CRISPR-based loss- and gain-of-function techniques for the interrogation of non-coding DNA. Copyright © 2017 Elsevier Inc. All rights reserved.
Vellas, B; Fielding, R A; Bens, C; Bernabei, R; Cawthon, P M; Cederholm, T; Cruz-Jentoft, A J; Del Signore, S; Donahue, S; Morley, J; Pahor, M; Reginster, J-Y; Rodriguez Mañas, L; Rolland, Y; Roubenoff, R; Sinclair, A; Cesari, M
2018-01-01
Establishment of an ICD-10-CM code for sarcopenia in 2016 was an important step towards reaching international consensus on the need for a nosological framework of age-related skeletal muscle decline. The International Conference on Frailty and Sarcopenia Research Task Force met in April 2017 to discuss the meaning, significance, and barriers to the implementation of the new code as well as strategies to accelerate development of new therapies. Analyses by the Sarcopenia Definitions and Outcomes Consortium are underway to develop quantitative definitions of sarcopenia. A consensus conference is planned to evaluate this analysis. The Task Force also discussed lessons learned from sarcopenia trials that could be applied to future trials, as well as lessons from the osteoporosis field, a clinical condition with many constructs similar to sarcopenia and for which ad hoc treatments have been developed and approved by regulatory agencies.
Optimization of miRNA-seq data preprocessing.
Tam, Shirley; Tsao, Ming-Sound; McPherson, John D
2015-11-01
The past two decades of microRNA (miRNA) research has solidified the role of these small non-coding RNAs as key regulators of many biological processes and promising biomarkers for disease. The concurrent development in high-throughput profiling technology has further advanced our understanding of the impact of their dysregulation on a global scale. Currently, next-generation sequencing is the platform of choice for the discovery and quantification of miRNAs. Despite this, there is no clear consensus on how the data should be preprocessed before conducting downstream analyses. Often overlooked, data preprocessing is an essential step in data analysis: the presence of unreliable features and noise can affect the conclusions drawn from downstream analyses. Using a spike-in dilution study, we evaluated the effects of several general-purpose aligners (BWA, Bowtie, Bowtie 2 and Novoalign), and normalization methods (counts-per-million, total count scaling, upper quartile scaling, Trimmed Mean of M, DESeq, linear regression, cyclic loess and quantile) with respect to the final miRNA count data distribution, variance, bias and accuracy of differential expression analysis. We make practical recommendations on the optimal preprocessing methods for the extraction and interpretation of miRNA count data from small RNA-sequencing experiments. © The Author 2015. Published by Oxford University Press.
Masuda, Isao; Matsuzaki, Motomichi; Kita, Kiyoshi
2010-10-01
Diverse mitochondrial (mt) genetic systems have evolved independently of the more uniform nuclear system and often employ modified genetic codes. The organization and genetic system of dinoflagellate mt genomes are particularly unusual and remain an evolutionary enigma. We determined the sequence of full-length cytochrome c oxidase subunit 1 (cox1) mRNA of the earliest diverging dinoflagellate Perkinsus and show that this gene resides in the mt genome. Apparently, this mRNA is not translated in a single reading frame with standard codon usage. Our examination of the nucleotide sequence and three-frame translation of the mRNA suggest that the reading frame must be shifted 10 times, at every AGG and CCC codon, to yield a consensus COX1 protein. We suggest two possible mechanisms for these translational frameshifts: a ribosomal frameshift in which stalled ribosomes skip the first bases of these codons or specialized tRNAs recognizing non-triplet codons, AGGY and CCCCU. Regardless of the mechanism, active and efficient machinery would be required to tolerate the frameshifts predicted in Perkinsus mitochondria. To our knowledge, this is the first evidence of translational frameshifts in protist mitochondria and, by far, is the most extensive case in mitochondria.
A Code Division Multiple Access Communication System for the Low Frequency Band.
1983-04-01
frequency channels spread-spectrum communication / complex sequences, orthogonal codes impulsive noise 20. ABSTRACT (Continue an reverse side It...their transmissions with signature sequences. Our LF/CDMA scheme is different in that each user’s signature sequence set consists of M orthogonal ...signature sequences. Our LF/CDMA scheme is different in that each user’s signature sequence set consists of M orthogonal sequences and thus log 2 M
Stevenson, Clare E. M.; Assaad, Aoun; Chandra, Govind; Le, Tung B. K.; Greive, Sandra J.; Bibb, Mervyn J.; Lawson, David M.
2013-01-01
Consistent with their complex lifestyles and rich secondary metabolite profiles, the genomes of streptomycetes encode a plethora of transcription factors, the vast majority of which are uncharacterized. Herein, we use Surface Plasmon Resonance (SPR) to identify and delineate putative operator sites for SCO3205, a MarR family transcriptional regulator from Streptomyces coelicolor that is well represented in sequenced actinomycete genomes. In particular, we use a novel SPR footprinting approach that exploits indirect ligand capture to vastly extend the lifetime of a standard streptavidin SPR chip. We define two operator sites upstream of sco3205 and a pseudopalindromic consensus sequence derived from these enables further potential operator sites to be identified in the S. coelicolor genome. We evaluate each of these through SPR and test the importance of the conserved bases within the consensus sequence. Informed by these results, we determine the crystal structure of a SCO3205-DNA complex at 2.8 Å resolution, enabling molecular level rationalization of the SPR data. Taken together, our observations support a DNA recognition mechanism involving both direct and indirect sequence readout. PMID:23748564
Kress, W John; Erickson, David L
2007-06-06
A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination.
Rapid Multi-Locus Sequence Typing Using Microfluidic Biochips
2010-05-12
Sequence Types. The evolutionary history of all the B. cereus MLST concatenated Sequence Types (545 taxa, 2,394 nucleotide positions) was inferred using...the Neighbor-Joining method [28]. The bootstrap consensus tree inferred from 100 replicates was taken to represent the evolutionary history of the... Chlamydia (manuscript in preparation) and performed pilot studies on Staphylococcus aureus and Streptoccus pneumoniae (Data S4 and Text S2). Another potential
GeneSilico protein structure prediction meta-server.
Kurowski, Michal A; Bujnicki, Janusz M
2003-07-01
Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta.
GeneSilico protein structure prediction meta-server
Kurowski, Michal A.; Bujnicki, Janusz M.
2003-01-01
Rigorous assessments of protein structure prediction have demonstrated that fold recognition methods can identify remote similarities between proteins when standard sequence search methods fail. It has been shown that the accuracy of predictions is improved when refined multiple sequence alignments are used instead of single sequences and if different methods are combined to generate a consensus model. There are several meta-servers available that integrate protein structure predictions performed by various methods, but they do not allow for submission of user-defined multiple sequence alignments and they seldom offer confidentiality of the results. We developed a novel WWW gateway for protein structure prediction, which combines the useful features of other meta-servers available, but with much greater flexibility of the input. The user may submit an amino acid sequence or a multiple sequence alignment to a set of methods for primary, secondary and tertiary structure prediction. Fold-recognition results (target-template alignments) are converted into full-atom 3D models and the quality of these models is uniformly assessed. A consensus between different FR methods is also inferred. The results are conveniently presented on-line on a single web page over a secure, password-protected connection. The GeneSilico protein structure prediction meta-server is freely available for academic users at http://genesilico.pl/meta. PMID:12824313
The sequence specificity of UV-induced DNA damage in a systematically altered DNA sequence.
Khoe, Clairine V; Chung, Long H; Murray, Vincent
2018-06-01
The sequence specificity of UV-induced DNA damage was investigated in a specifically designed DNA plasmid using two procedures: end-labelling and linear amplification. Absorption of UV photons by DNA leads to dimerisation of pyrimidine bases and produces two major photoproducts, cyclobutane pyrimidine dimers (CPDs) and pyrimidine(6-4)pyrimidone photoproducts (6-4PPs). A previous study had determined that two hexanucleotide sequences, 5'-GCTC*AC and 5'-TATT*AA, were high intensity UV-induced DNA damage sites. The UV clone plasmid was constructed by systematically altering each nucleotide of these two hexanucleotide sequences. One of the main goals of this study was to determine the influence of single nucleotide alterations on the intensity of UV-induced DNA damage. The sequence 5'-GCTC*AC was designed to examine the sequence specificity of 6-4PPs and the highest intensity 6-4PP damage sites were found at 5'-GTTC*CC nucleotides. The sequence 5'-TATT*AA was devised to investigate the sequence specificity of CPDs and the highest intensity CPD damage sites were found at 5'-TTTT*CG nucleotides. It was proposed that the tetranucleotide DNA sequence, 5'-YTC*Y (where Y is T or C), was the consensus sequence for the highest intensity UV-induced 6-4PP adduct sites; while it was 5'-YTT*C for the highest intensity UV-induced CPD damage sites. These consensus tetranucleotides are composed entirely of consecutive pyrimidines and must have a DNA conformation that is highly productive for the absorption of UV photons. Crown Copyright © 2018. Published by Elsevier B.V. All rights reserved.
Event-triggered consensus tracking of multi-agent systems with Lur'e nonlinear dynamics
NASA Astrophysics Data System (ADS)
Huang, Na; Duan, Zhisheng; Wen, Guanghui; Zhao, Yu
2016-05-01
In this paper, distributed consensus tracking problem for networked Lur'e systems is investigated based on event-triggered information interactions. An event-triggered control algorithm is designed with the advantages of reducing controller update frequency and sensor energy consumption. By using tools of ?-procedure and Lyapunov functional method, some sufficient conditions are derived to guarantee that consensus tracking is achieved under a directed communication topology. Meanwhile, it is shown that Zeno behaviour of triggering time sequences is excluded for the proposed event-triggered rule. Finally, some numerical simulations on coupled Chua's circuits are performed to illustrate the effectiveness of the theoretical algorithms.
Streamlined Genome Sequence Compression using Distributed Source Coding
Wang, Shuang; Jiang, Xiaoqian; Chen, Feng; Cui, Lijuan; Cheng, Samuel
2014-01-01
We aim at developing a streamlined genome sequence compression algorithm to support alternative miniaturized sequencing devices, which have limited communication, storage, and computation power. Existing techniques that require heavy client (encoder side) cannot be applied. To tackle this challenge, we carefully examined distributed source coding theory and developed a customized reference-based genome compression protocol to meet the low-complexity need at the client side. Based on the variation between source and reference, our protocol will pick adaptively either syndrome coding or hash coding to compress subsequences of changing code length. Our experimental results showed promising performance of the proposed method when compared with the state-of-the-art algorithm (GRS). PMID:25520552
ERIC Educational Resources Information Center
Fyfe, Emily R.; DeCaro, Marci S.; Rittle-Johnson, Bethany
2014-01-01
Background: The sequencing of learning materials greatly influences the knowledge that learners construct. Recently, learning theorists have focused on the sequencing of instruction in relation to solving related problems. The general consensus suggests explicit instruction should be provided; however, when to provide instruction remains unclear.…
Sampled-Data Consensus of Linear Multi-agent Systems With Packet Losses.
Zhang, Wenbing; Tang, Yang; Huang, Tingwen; Kurths, Jurgen
In this paper, the consensus problem is studied for a class of multi-agent systems with sampled data and packet losses, where random and deterministic packet losses are considered, respectively. For random packet losses, a Bernoulli-distributed white sequence is used to describe packet dropouts among agents in a stochastic way. For deterministic packet losses, a switched system with stable and unstable subsystems is employed to model packet dropouts in a deterministic way. The purpose of this paper is to derive consensus criteria, such that linear multi-agent systems with sampled-data and packet losses can reach consensus. By means of the Lyapunov function approach and the decomposition method, the design problem of a distributed controller is solved in terms of convex optimization. The interplay among the allowable bound of the sampling interval, the probability of random packet losses, and the rate of deterministic packet losses are explicitly derived to characterize consensus conditions. The obtained criteria are closely related to the maximum eigenvalue of the Laplacian matrix versus the second minimum eigenvalue of the Laplacian matrix, which reveals the intrinsic effect of communication topologies on consensus performance. Finally, simulations are given to show the effectiveness of the proposed results.In this paper, the consensus problem is studied for a class of multi-agent systems with sampled data and packet losses, where random and deterministic packet losses are considered, respectively. For random packet losses, a Bernoulli-distributed white sequence is used to describe packet dropouts among agents in a stochastic way. For deterministic packet losses, a switched system with stable and unstable subsystems is employed to model packet dropouts in a deterministic way. The purpose of this paper is to derive consensus criteria, such that linear multi-agent systems with sampled-data and packet losses can reach consensus. By means of the Lyapunov function approach and the decomposition method, the design problem of a distributed controller is solved in terms of convex optimization. The interplay among the allowable bound of the sampling interval, the probability of random packet losses, and the rate of deterministic packet losses are explicitly derived to characterize consensus conditions. The obtained criteria are closely related to the maximum eigenvalue of the Laplacian matrix versus the second minimum eigenvalue of the Laplacian matrix, which reveals the intrinsic effect of communication topologies on consensus performance. Finally, simulations are given to show the effectiveness of the proposed results.
Examination of the catalytic fitness of the hammerhead ribozyme by in vitro selection.
Tang, J; Breaker, R R
1997-01-01
We have designed a self-cleaving ribozyme construct that is rendered inactive during preparative in vitro transcription by allosteric interactions with ATP. This allosteric ribozyme was constructed by joining a hammerhead domain to an ATP-binding RNA aptamer, thereby creating a ribozyme whose catalytic rate can be controlled by ATP. Upon purification by PAGE, the engineered ribozyme undergoes rapid self-cleavage when incubated in the absence of ATP. This strategy of "allosteric delay" was used to prepare intact hammerhead ribozymes that would otherwise self-destruct during transcription. Using a similar strategy, we have prepared a combinatorial pool of RNA in order to assess the catalytic fitness of ribozymes that carry the natural consensus sequence for the hammerhead. Using in vitro selection, this comprehensive RNA pool was screened for sequence variants of the hammerhead ribozyme that also display catalytic activity. We find that sequences that comprise the core of naturally occurring hammerhead dominate the population of selected RNAs, indicating that the natural consensus sequence of this ribozyme is optimal for catalytic function. PMID:9257650
Singh, Reema; Schilde, Christina; Schaap, Pauline
2016-11-17
Dictyostelia are a well-studied group of organisms with colonial multicellularity, which are members of the mostly unicellular Amoebozoa. A phylogeny based on SSU rDNA data subdivided all Dictyostelia into four major groups, but left the position of the root and of six group-intermediate taxa unresolved. Recent phylogenies inferred from 30 or 213 proteins from sequenced genomes, positioned the root between two branches, each containing two major groups, but lacked data to position the group-intermediate taxa. Since the positions of these early diverging taxa are crucial for understanding the evolution of phenotypic complexity in Dictyostelia, we sequenced six representative genomes of early diverging taxa. We retrieved orthologs of 47 housekeeping proteins with an average size of 890 amino acids from six newly sequenced and eight published genomes of Dictyostelia and unicellular Amoebozoa and inferred phylogenies from single and concatenated protein sequence alignments. Concatenated alignments of all 47 proteins, and four out of five subsets of nine concatenated proteins all produced the same consensus phylogeny with 100% statistical support. Trees inferred from just two out of the 47 proteins, individually reproduced the consensus phylogeny, highlighting that single gene phylogenies will rarely reflect correct species relationships. However, sets of two or three concatenated proteins again reproduced the consensus phylogeny, indicating that a small selection of genes suffices for low cost classification of as yet unincorporated or newly discovered dictyostelid and amoebozoan taxa by gene amplification. The multi-locus consensus phylogeny shows that groups 1 and 2 are sister clades in branch I, with the group-intermediate taxon D. polycarpum positioned as outgroup to group 2. Branch II consists of groups 3 and 4, with the group-intermediate taxon Polysphondylium violaceum positioned as sister to group 4, and the group-intermediate taxon Dictyostelium polycephalum branching at the base of that whole clade. Given the data, the approximately unbiased test rejects all alternative topologies favoured by SSU rDNA and individual proteins with high statistical support. The test also rejects monophyletic origins for the genera Acytostelium, Polysphondylium and Dictyostelium. The current position of Acytostelium ellipticum in the consensus phylogeny indicates that somatic cells were lost twice in Dictyostelia.
Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R
2015-01-01
In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced.
Dasenko, Mark A.
2015-01-01
In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles being more successfully sequenced. PMID:26716693
Edwards, W. Barry
2013-01-01
The aim of this study was to identify potential ligands of PSMA suitable for further development as novel PSMA-targeted peptides using phage display technology. The human PSMA protein was immobilized as a target followed by incubation with a 15-mer phage display random peptide library. After one round of prescreening and two rounds of screening, high-stringency screening at the third round of panning was performed to identify the highest affinity binders. Phages which had a specific binding activity to PSMA in human prostate cancer cells were isolated and the DNA corresponding to the 15-mers were sequenced to provide three consensus sequences: GDHSPFT, SHFSVGS and EVPRLSLLAVFL as well as other sequences that did not display consensus. Two of the peptide sequences deduced from DNA sequencing of binding phages, SHSFSVGSGDHSPFT and GRFLTGGTGRLLRIS were labeled with 5-carboxyfluorescein and shown to bind and co-internalize with PSMA on human prostate cancer cells by fluorescence microscopy. The high stringency requirements yielded peptides with affinities KD∼1 µM or greater which are suitable starting points for affinity maturation. While these values were less than anticipated, the high stringency did yield peptide sequences that apparently bound to different surfaces on PSMA. These peptide sequences could be the basis for further development of peptides for prostate cancer tumor imaging and therapy. PMID:23935860
Pietan, Lucas L.; Spradling, Theresa A.
2016-01-01
In animals, mitochondrial DNA (mtDNA) typically occurs as a single circular chromosome with 13 protein-coding genes and 22 tRNA genes. The various species of lice examined previously, however, have shown mitochondrial genome rearrangements with a range of chromosome sizes and numbers. Our research demonstrates that the mitochondrial genomes of two species of chewing lice found on pocket gophers, Geomydoecus aurei and Thomomydoecus minor, are fragmented with the 1,536 base-pair (bp) cytochrome-oxidase subunit I (cox1) gene occurring as the only protein-coding gene on a 1,916–1,964 bp minicircular chromosome in the two species, respectively. The cox1 gene of T. minor begins with an atypical start codon, while that of G. aurei does not. Components of the non-protein coding sequence of G. aurei and T. minor include a tRNA (isoleucine) gene, inverted repeat sequences consistent with origins of replication, and an additional non-coding region that is smaller than the non-coding sequence of other lice with such fragmented mitochondrial genomes. Sequences of cox1 minichromosome clones for each species reveal extensive length and sequence heteroplasmy in both coding and noncoding regions. The highly variable non-gene regions of G. aurei and T. minor have little sequence similarity with one another except for a 19-bp region of phylogenetically conserved sequence with unknown function. PMID:27589589
2004-12-09
We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.
A Partial Least Squares Based Procedure for Upstream Sequence Classification in Prokaryotes.
Mehmood, Tahir; Bohlin, Jon; Snipen, Lars
2015-01-01
The upstream region of coding genes is important for several reasons, for instance locating transcription factor, binding sites, and start site initiation in genomic DNA. Motivated by a recently conducted study, where multivariate approach was successfully applied to coding sequence modeling, we have introduced a partial least squares (PLS) based procedure for the classification of true upstream prokaryotic sequence from background upstream sequence. The upstream sequences of conserved coding genes over genomes were considered in analysis, where conserved coding genes were found by using pan-genomics concept for each considered prokaryotic species. PLS uses position specific scoring matrix (PSSM) to study the characteristics of upstream region. Results obtained by PLS based method were compared with Gini importance of random forest (RF) and support vector machine (SVM), which is much used method for sequence classification. The upstream sequence classification performance was evaluated by using cross validation, and suggested approach identifies prokaryotic upstream region significantly better to RF (p-value < 0.01) and SVM (p-value < 0.01). Further, the proposed method also produced results that concurred with known biological characteristics of the upstream region.
Golay sequences coded coherent optical OFDM for long-haul transmission
NASA Astrophysics Data System (ADS)
Qin, Cui; Ma, Xiangrong; Hua, Tao; Zhao, Jing; Yu, Huilong; Zhang, Jian
2017-09-01
We propose to use binary Golay sequences in coherent optical orthogonal frequency division multiplexing (CO-OFDM) to improve the long-haul transmission performance. The Golay sequences are generated by binary Reed-Muller codes, which have low peak-to-average power ratio and certain error correction capability. A low-complexity decoding algorithm for the Golay sequences is then proposed to recover the signal. Under same spectral efficiency, the QPSK modulated OFDM with binary Golay sequences coding with and without discrete Fourier transform (DFT) spreading (DFTS-QPSK-GOFDM and QPSK-GOFDM) are compared with the normal BPSK modulated OFDM with and without DFT spreading (DFTS-BPSK-OFDM and BPSK-OFDM) after long-haul transmission. At a 7% forward error correction code threshold (Q2 factor of 8.5 dB), it is shown that DFTS-QPSK-GOFDM outperforms DFTS-BPSK-OFDM by extending the transmission distance by 29% and 18%, in non-dispersion managed and dispersion managed links, respectively.
Sawada, Akihisa; Croom-Carter, Deborah; Kondo, Osamu; Yasui, Masahiro; Koyama-Sato, Maho; Inoue, Masami; Kawa, Keisei; Rickinson, Alan B; Tierney, Rosemary J
2011-05-01
Polymorphisms in Epstein-Barr virus (EBV) latent genes can identify virus strains from different human populations and individual strains within a population. An Asian EBV signature has been defined almost exclusively from Chinese viruses, with little information from other Asian countries. Here we sequenced polymorphic regions of the EBNA1, 2, 3A, 3B, 3C and LMP1 genes of 31 Japanese strains from control donors and EBV-associated T/NK-cell lymphoproliferative disease (T/NK-LPD) patients. Though identical to Chinese strains in their dominant EBNA1 and LMP1 alleles, Japanese viruses were subtly different at other loci. Thus, while Chinese viruses mainly fall into two families with strongly linked 'Wu' or 'Li' alleles at EBNA2 and EBNA3A/B/C, Japanese viruses all have the consensus Wu EBNA2 allele but fall into two families at EBNA3A/B/C. One family has variant Li-like sequences at EBNA3A and 3B and the consensus Li sequence at EBNA3C; the other family has variant Wu-like sequences at EBNA3A, variants of a low frequency Chinese allele 'Sp' at EBNA3B and a consensus Sp sequence at EBNA3C. Thus, EBNA3A/B/C allelotypes clearly distinguish Japanese from Chinese strains. Interestingly, most Japanese viruses also lack those immune-escape mutations in the HLA-A11 epitope-encoding region of EBNA3B that are so characteristic of viruses from the highly A11-positive Chinese population. Control donor-derived and T/NK-LPD-derived strains were similarly distributed across allelotypes and, by using allelic polymorphisms to track virus strains in patients pre- and post-haematopoietic stem-cell transplant, we show that a single strain can induce both T/NK-LPD and B-cell-lymphoproliferative disease in the same patient.
Lagares, Antonio; Hozbor, Daniela F.; Niehaus, Karsten; Otero, Augusto J. L. Pich; Lorenzen, Jens; Arnold, Walter; Pühler, Alfred
2001-01-01
The genetic characterization of a 5.5-kb chromosomal region of Sinorhizobium meliloti 2011 that contains lpsB, a gene required for the normal development of symbiosis with Medicago spp., is presented. The nucleotide sequence of this DNA fragment revealed the presence of six genes: greA and lpsB, transcribed in the forward direction; and lpsE, lpsD, lpsC, and lrp, transcribed in the reverse direction. Except for lpsB, none of the lps genes were relevant for nodulation and nitrogen fixation. Analysis of the transcriptional organization of lpsB showed that greA and lpsB are part of separate transcriptional units, which is in agreement with the finding of a DNA stretch homologous to a “nonnitrogen” promoter consensus sequence between greA and lpsB. The opposite orientation of lpsB with respect to its first downstream coding sequence, lpsE, indicated that the altered LPS and the defective symbiosis of lpsB mutants are both consequences of a primary nonpolar defect in a single gene. Global sequence comparisons revealed that the greA-lpsB and lrp genes of S. meliloti have a genetic organization similar to that of their homologous loci in R. leguminosarum bv. viciae. In particular, high sequence similarity was found between the translation product of lpsB and a core-related biosynthetic mannosyltransferase of R. leguminosarum bv. viciae encoded by the lpcC gene. The functional relationship between these two genes was demonstrated in genetic complementation experiments in which the S. meliloti lpsB gene restored the wild-type LPS phenotype when introduced into lpcC mutants of R. leguminosarum. These results support the view that S. meliloti lpsB also encodes a mannosyltransferase that participates in the biosynthesis of the LPS core. Evidence is provided for the presence of other lpsB-homologous sequences in several members of the family Rhizobiaceae. PMID:11157937
Engineering large-scale agent-based systems with consensus
NASA Technical Reports Server (NTRS)
Bokma, A.; Slade, A.; Kerridge, S.; Johnson, K.
1994-01-01
The paper presents the consensus method for the development of large-scale agent-based systems. Systems can be developed as networks of knowledge based agents (KBA) which engage in a collaborative problem solving effort. The method provides a comprehensive and integrated approach to the development of this type of system. This includes a systematic analysis of user requirements as well as a structured approach to generating a system design which exhibits the desired functionality. There is a direct correspondence between system requirements and design components. The benefits of this approach are that requirements are traceable into design components and code thus facilitating verification. The use of the consensus method with two major test applications showed it to be successful and also provided valuable insight into problems typically associated with the development of large systems.
BASiNET-BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification.
Ito, Eric Augusto; Katahira, Isaque; Vicente, Fábio Fernandes da Rocha; Pereira, Luiz Filipe Protasio; Lopes, Fabrício Martins
2018-06-05
With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data in particular de novo sequencing was rapidly produced at relatively low costs. In this context, computational tools are increasingly important to assist in the identification of relevant information to understand the functioning of organisms. This work introduces BASiNET, an alignment-free tool for classifying biological sequences based on the feature extraction from complex network measurements. The method initially transform the sequences and represents them as complex networks. Then it extracts topological measures and constructs a feature vector that is used to classify the sequences. The method was evaluated in the classification of coding and non-coding RNAs of 13 species and compared to the CNCI, PLEK and CPC2 methods. BASiNET outperformed all compared methods in all adopted organisms and datasets. BASiNET have classified sequences in all organisms with high accuracy and low standard deviation, showing that the method is robust and non-biased by the organism. The proposed methodology is implemented in open source in R language and freely available for download at https://cran.r-project.org/package=BASiNET.
Van Lent, Sarah; Creasy, Heather Huot; Myers, Garry S A; Vanrompay, Daisy
2016-01-01
Variation is a central trait of the polymorphic membrane protein (Pmp) family. The number of pmp coding sequences differs between Chlamydia species, but it is unknown whether the number of pmp coding sequences is constant within a Chlamydia species. The level of conservation of the Pmp proteins has previously only been determined for Chlamydia trachomatis. As different Pmp proteins might be indispensible for the pathogenesis of different Chlamydia species, this study investigated the conservation of Pmp proteins both within and across C. trachomatis,C. pneumoniae,C. abortus, and C. psittaci. The pmp coding sequences were annotated in 16 C. trachomatis, 6 C. pneumoniae, 2 C. abortus, and 16 C. psittaci genomes. The number and organization of polymorphic membrane coding sequences differed within and across the analyzed Chlamydia species. The length of coding sequences of pmpA,pmpB, and pmpH was conserved among all analyzed genomes, while the length of pmpE/F and pmpG, and remarkably also of the subtype pmpD, differed among the analyzed genomes. PmpD, PmpA, PmpH, and PmpA were the most conserved Pmp in C. trachomatis,C. pneumoniae,C. abortus, and C. psittaci, respectively. PmpB was the most conserved Pmp across the 4 analyzed Chlamydia species. © 2016 S. Karger AG, Basel.
TOWARD THE DEVELOPMENT OF A CONSENSUS MATERIALS DATABASE FOR PRESSURE TECHNOLGY APPLICATIONS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Swindeman, Robert W; Ren, Weiju
The ASME construction code books specify materials and fabrication procedures that are acceptable for pressure technology applications. However, with few exceptions, the materials properties provided in the ASME code books provide no statistics or other information pertaining to material variability. Such information is central to the prediction and prevention of failure events. Many sources of materials data exist that provide variability information but such sources do not necessarily represent a consensus of experts with respect to the reported trends that are represented. Such a need has been identified by the ASME Standards Technology, LLC and initial steps have been takenmore » to address these needs: however, these steps are limited to project-specific applications only, such as the joint DOE-ASME project on materials for Generation IV nuclear reactors. In contrast to light-water reactor technology, the experience base for the Generation IV nuclear reactors is somewhat lacking and heavy reliance must be placed on model development and predictive capability. The database for model development is being assembled and includes existing code alloys such as alloy 800H and 9Cr-1Mo-V steel. Ownership and use rights are potential barriers that must be addressed.« less
Westbrook, Jared W.; Chhatre, Vikram E.; Wu, Le-Shin; Chamala, Srikar; Neves, Leandro Gomide; Muñoz, Patricio; Martínez-García, Pedro J.; Neale, David B.; Kirst, Matias; Mockaitis, Keithanne; Nelson, C. Dana; Peter, Gary F.; Echt, Craig S.
2015-01-01
A consensus genetic map for Pinus taeda (loblolly pine) and Pinus elliottii (slash pine) was constructed by merging three previously published P. taeda maps with a map from a pseudo-backcross between P. elliottii and P. taeda. The consensus map positioned 3856 markers via genotyping of 1251 individuals from four pedigrees. It is the densest linkage map for a conifer to date. Average marker spacing was 0.6 cM and total map length was 2305 cM. Functional predictions of mapped genes were improved by aligning expressed sequence tags used for marker discovery to full-length P. taeda transcripts. Alignments to the P. taeda genome mapped 3305 scaffold sequences onto 12 linkage groups. The consensus genetic map was used to compare the genome-wide linkage disequilibrium in a population of distantly related P. taeda individuals (ADEPT2) used for association genetic studies and a multiple-family pedigree used for genomic selection (CCLONES). The prevalence and extent of LD was greater in CCLONES as compared to ADEPT2; however, extended LD with LGs or between LGs was rare in both populations. The average squared correlations, r2, between SNP alleles less than 1 cM apart were less than 0.05 in both populations and r2 did not decay substantially with genetic distance. The consensus map and analysis of linkage disequilibrium establish a foundation for comparative association mapping and genomic selection in P. taeda and P. elliottii. PMID:26068575
Kress, W. John; Erickson, David L.
2007-01-01
Background A useful DNA barcode requires sufficient sequence variation to distinguish between species and ease of application across a broad range of taxa. Discovery of a DNA barcode for land plants has been limited by intrinsically lower rates of sequence evolution in plant genomes than that observed in animals. This low rate has complicated the trade-off in finding a locus that is universal and readily sequenced and has sufficiently high sequence divergence at the species-level. Methodology/Principal Findings Here, a global plant DNA barcode system is evaluated by comparing universal application and degree of sequence divergence for nine putative barcode loci, including coding and non-coding regions, singly and in pairs across a phylogenetically diverse set of 48 genera (two species per genus). No single locus could discriminate among species in a pair in more than 79% of genera, whereas discrimination increased to nearly 88% when the non-coding trnH-psbA spacer was paired with one of three coding loci, including rbcL. In silico trials were conducted in which DNA sequences from GenBank were used to further evaluate the discriminatory power of a subset of these loci. These trials supported the earlier observation that trnH-psbA coupled with rbcL can correctly identify and discriminate among related species. Conclusions/Significance A combination of the non-coding trnH-psbA spacer region and a portion of the coding rbcL gene is recommended as a two-locus global land plant barcode that provides the necessary universality and species discrimination. PMID:17551588
Cenik, Can; Chua, Hon Nian; Singh, Guramrit; Akef, Abdalla; Snyder, Michael P; Palazzo, Alexander F; Moore, Melissa J; Roth, Frederick P
2017-03-01
Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: Genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5 ' proximal- i ntron- m inus-like-coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the exon junction complex (EJC) at noncanonical 5' proximal positions. Finally, N 1 -methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ∼20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N 1 -methyladenosines in the early coding region, and enrichment for noncanonical binding by the EJC. © 2017 Cenik et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Radford, Devon R; Leon-Velarde, Carlos G; Chen, Shu; Hamidi Oskouei, Amir M; Balamurugan, Sampathkumar
2018-03-29
The genomes of two strains of Salmonella enterica subsp. enterica serovar Cubana and serovar Muenchen, isolated from dry hazelnuts and chia seeds, respectively, were sequenced using the Illumina MiSeq platform, assembled de novo using the overlap-layout-consensus method, and aligned to their respective most identical sequence genome scaffolds using MUMMER and BLAST searches. Copyright © 2018 Radford et al.
2014-01-01
Background Next-generation DNA sequencing (NGS) technologies have made huge impacts in many fields of biological research, but especially in evolutionary biology. One area where NGS has shown potential is for high-throughput sequencing of complete mtDNA genomes (of humans and other animals). Despite the increasing use of NGS technologies and a better appreciation of their importance in answering biological questions, there remain significant obstacles to the successful implementation of NGS-based projects, especially for new users. Results Here we present an ‘A to Z’ protocol for obtaining complete human mitochondrial (mtDNA) genomes – from DNA extraction to consensus sequence. Although designed for use on humans, this protocol could also be used to sequence small, organellar genomes from other species, and also nuclear loci. This protocol includes DNA extraction, PCR amplification, fragmentation of PCR products, barcoding of fragments, sequencing using the 454 GS FLX platform, and a complete bioinformatics pipeline (primer removal, reference-based mapping, output of coverage plots and SNP calling). Conclusions All steps in this protocol are designed to be straightforward to implement, especially for researchers who are undertaking next-generation sequencing for the first time. The molecular steps are scalable to large numbers (hundreds) of individuals and all steps post-DNA extraction can be carried out in 96-well plate format. Also, the protocol has been assembled so that individual ‘modules’ can be swapped out to suit available resources. PMID:24460871
Bioinformatics prediction of siRNAs as potential antiviral agents against dengue viruses
Villegas-Rosales, Paula M; Méndez-Tenorio, Alfonso; Ortega-Soto, Elizabeth; Barrón, Blanca L
2012-01-01
Dengue virus (DENV 1-4) represents the major emerging arthropod-borne viral infection in the world. Currently, there is neither an available vaccine nor a specific treatment. Hence, there is a need of antiviral drugs for these viral infections; we describe the prediction of short interfering RNA (siRNA) as potential therapeutic agents against the four DENV serotypes. Our strategy was to carry out a series of multiple alignments using ClustalX program to find conserved sequences among the four DENV serotype genomes to obtain a consensus sequence for siRNAs design. A highly conserved sequence among the four DENV serotypes, located in the encoding sequence for NS4B and NS5 proteins was found. A total of 2,893 complete DENV genomes were downloaded from the NCBI, and after a depuration procedure to identify identical sequences, 220 complete DENV genomes were left. They were edited to select the NS4B and NS5 sequences, which were aligned to obtain a consensus sequence. Three different servers were used for siRNA design, and the resulting siRNAs were aligned to identify the most prevalent sequences. Three siRNAs were chosen, one targeted the genome region that codifies for NS4B protein and the other two; the region for NS5 protein. Predicted secondary structure for DENV genomes was used to demonstrate that the siRNAs were able to target the viral genome forming double stranded structures, necessary to activate the RNA silencing machinery. PMID:22829722
Patricios, Jon S; Hislop, Michael David; Aubry, Mark; Bloomfield, Paul; Broderick, Carolyn; Clifton, Patrick; Ellenbogen, Richard G; Falvey, Éanna Cian; Grand, Julie; Hack, Dallas; Harcourt, Peter Rex; Hughes, David; McGuirk, Nathan; Meeuwisse, Willem; Miller, Jeffrey; Parsons, John T; Richiger, Simona; Sills, Allen; Moran, Kevin B; Shute, Jenny; Raftery, Martin
2018-01-01
The 2017 Berlin Concussion in Sport Group Consensus Statement provides a global summary of best practice in concussion prevention, diagnosis and management, underpinned by systematic reviews and expert consensus. Due to their different settings and rules, individual sports need to adapt concussion guidelines according to their specific regulatory environment. At the same time, consistent application of the Berlin Consensus Statement’s themes across sporting codes is likely to facilitate superior and uniform diagnosis and management, improve concussion education and highlight collaborative research opportunities. This document summarises the approaches discussed by medical representatives from the governing bodies of 10 different contact and collision sports in Dublin, Ireland in July 2017. Those sports are: American football, Australian football, basketball, cricket, equestrian sports, football/soccer, ice hockey, rugby league, rugby union and skiing. This document had been endorsed by 11 sport governing bodies/national federations at the time of being published. PMID:29500252
Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks
2017-01-01
Whole-genome sequencing of pathogens from host samples becomes more and more routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications make simplifying assumptions that often break up the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package phybreak. The method performs well in tests of both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times place more confidence in the inferred transmission trees. PMID:28545083
Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks.
Klinkenberg, Don; Backer, Jantien A; Didelot, Xavier; Colijn, Caroline; Wallinga, Jacco
2017-05-01
Whole-genome sequencing of pathogens from host samples becomes more and more routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications make simplifying assumptions that often break up the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package phybreak. The method performs well in tests of both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times place more confidence in the inferred transmission trees.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Van Coillie, E.; Fiten, P.; Van Damme, J.
1997-03-01
Monocyte chemotactic proteins (MCPs) form a subfamily of chemokines that recruit leukocytes to sites of inflammation and that may contribute to tumor-associated leukocyte infiltration and to the antiviral state against HIV infection. With the use of degenerate primers that were based on CC chemokine consensus sequences, the known MIP-1{alpha}/LD78{alpha}, MCP-1, and MCP-3 genes and the previously unidentified eotaxin and MCP-2 genes were isolated from a YAC contig from human chromosome 17q11.2. The amplified genomic MCP-2 fragment was used to isolate an MCP-2 cosmid from which the gene sequence was determined. The MCP-2 gene shares with the MCP-1 and MCP-3 genesmore » a conserved intron-exon structure and a coding nucleotide sequence homology of 77%. By Northern blot analysis the 1.0-kb MCP-2 mRNA was predominantly detectable in the small intestine, peripheral blood, heart, placenta, lung, skeletal muscle, ovary, colon, spinal cord, pancreas, and thymus. Transcripts of 1.5 and 2.4 kb were found in the testis, the small intestine, and the colon. The isolation of the MCP-2 gene from the chemokine contig localized it on YAC clones of chromosome 17q11.2, which also contain the eotaxin, MCP-1, MCP-3, and NCC-1/MCP-4 genes. The combination of using degenerate primer PCR and YACs illustrates that novel genes can efficiently be isolated from gene cluster contigs with less redundancy and effort than the isolation of novel ESTs. 42 refs., 5 figs., 2 tabs.« less
On the Concept of Cis-regulatory Information: From Sequence Motifs to Logic Functions
NASA Astrophysics Data System (ADS)
Tarpine, Ryan; Istrail, Sorin
The regulatory genome is about the “system level organization of the core genomic regulatory apparatus, and how this is the locus of causality underlying the twin phenomena of animal development and animal evolution” (E.H. Davidson. The Regulatory Genome: Gene Regulatory Networks in Development and Evolution, Academic Press, 2006). Information processing in the regulatory genome is done through regulatory states, defined as sets of transcription factors (sequence-specific DNA binding proteins which determine gene expression) that are expressed and active at the same time. The core information processing machinery consists of modular DNA sequence elements, called cis-modules, that interact with transcription factors. The cis-modules “read” the information contained in the regulatory state of the cell through transcription factor binding, “process” it, and directly or indirectly communicate with the basal transcription apparatus to determine gene expression. This endowment of each gene with the information-receiving capacity through their cis-regulatory modules is essential for the response to every possible regulatory state to which it might be exposed during all phases of the life cycle and in all cell types. We present here a set of challenges addressed by our CYRENE research project aimed at studying the cis-regulatory code of the regulatory genome. The CYRENE Project is devoted to (1) the construction of a database, the cis-Lexicon, containing comprehensive information across species about experimentally validated cis-regulatory modules; and (2) the software development of a next-generation genome browser, the cis-Browser, specialized for the regulatory genome. The presentation is anchored on three main computational challenges: the Gene Naming Problem, the Consensus Sequence Bottleneck Problem, and the Logic Function Inference Problem.
Alcántara, Cristina; Sarmiento-Rubiano, Luz Adriana; Monedero, Vicente; Deutscher, Josef; Pérez-Martínez, Gaspar; Yebra, María J.
2008-01-01
Sequence analysis of the five genes (gutRMCBA) downstream from the previously described sorbitol-6-phosphate dehydrogenase-encoding Lactobacillus casei gutF gene revealed that they constitute a sorbitol (glucitol) utilization operon. The gutRM genes encode putative regulators, while the gutCBA genes encode the EIIC, EIIBC, and EIIA proteins of a phosphoenolpyruvate-dependent sorbitol phosphotransferase system (PTSGut). The gut operon is transcribed as a polycistronic gutFRMCBA messenger, the expression of which is induced by sorbitol and repressed by glucose. gutR encodes a transcriptional regulator with two PTS-regulated domains, a galactitol-specific EIIB-like domain (EIIBGat domain) and a mannitol/fructose-specific EIIA-like domain (EIIAMtl domain). Its inactivation abolished gut operon transcription and sorbitol uptake, indicating that it acts as a transcriptional activator. In contrast, cells carrying a gutB mutation expressed the gut operon constitutively, but they failed to transport sorbitol, indicating that EIIBCGut negatively regulates GutR. A footprint analysis showed that GutR binds to a 35-bp sequence upstream from the gut promoter. A sequence comparison with the presumed promoter region of gut operons from various firmicutes revealed a GutR consensus motif that includes an inverted repeat. The regulation mechanism of the L. casei gut operon is therefore likely to be operative in other firmicutes. Finally, gutM codes for a conserved protein of unknown function present in all sequenced gut operons. A gutM mutant, the first constructed in a firmicute, showed drastically reduced gut operon expression and sorbitol uptake, indicating a regulatory role also for GutM. PMID:18676710
Links, Matthew G; Dumonceaux, Tim J; Hemmingsen, Sean M; Hill, Janet E
2012-01-01
Barcoding with molecular sequences is widely used to catalogue eukaryotic biodiversity. Studies investigating the community dynamics of microbes have relied heavily on gene-centric metagenomic profiling using two genes (16S rRNA and cpn60) to identify and track Bacteria. While there have been criteria formalized for barcoding of eukaryotes, these criteria have not been used to evaluate gene targets for other domains of life. Using the framework of the International Barcode of Life we evaluated DNA barcodes for Bacteria. Candidates from the 16S rRNA gene and the protein coding cpn60 gene were evaluated. Within complete bacterial genomes in the public domain representing 983 species from 21 phyla, the largest difference between median pairwise inter- and intra-specific distances ("barcode gap") was found from cpn60. Distribution of sequence diversity along the ∼555 bp cpn60 target region was remarkably uniform. The barcode gap of the cpn60 universal target facilitated the faithful de novo assembly of full-length operational taxonomic units from pyrosequencing data from a synthetic microbial community. Analysis supported the recognition of both 16S rRNA and cpn60 as DNA barcodes for Bacteria. The cpn60 universal target was found to have a much larger barcode gap than 16S rRNA suggesting cpn60 as a preferred barcode for Bacteria. A large barcode gap for cpn60 provided a robust target for species-level characterization of data. The assembly of consensus sequences for barcodes was shown to be a reliable method for the identification and tracking of novel microbes in metagenomic studies.
A DS-UWB Cognitive Radio System Based on Bridge Function Smart Codes
NASA Astrophysics Data System (ADS)
Xu, Yafei; Hong, Sheng; Zhao, Guodong; Zhang, Fengyuan; di, Jinshan; Zhang, Qishan
This paper proposes a direct-sequence UWB Gaussian pulse of cognitive radio systems based on bridge function smart sequence matrix and the Gaussian pulse. As the system uses the spreading sequence code, that is the bridge function smart code sequence, the zero correlation zones (ZCZs) which the bridge function sequences' auto-correlation functions had, could reduce multipath fading of the pulse interference. The Modulated channel signal was sent into the IEEE 802.15.3a UWB channel. We analysis the ZCZs's inhibition to the interference multipath interference (MPI), as one of the main system sources interferences. The simulation in SIMULINK/MATLAB is described in detail. The result shows the system has better performance by comparison with that employing Walsh sequence square matrix, and it was verified by the formula in principle.
Nemire, Kenneth; Johnson, Daniel A; Vidal, Keith
2016-01-01
Walkway codes and standards are often created through consensus by committees based on a number of factors, including historical precedence, common practice, cost, and empirical data. The authors maintain that in the formulation of codes and standards that impact pedestrian safety, the results of pertinent scientific research should be given significant weight. This article examines many elements of common walkway codes and standards related to changes in level, stairways, stair handrails, and slip resistance. It identifies which portions are based on or supported by empirical data; and which could benefit from additional scientific research. This article identifies areas in which additional research, codes, and standards may be beneficial to enhance pedestrian safety. Copyright © 2015 Elsevier Ltd and The Ergonomics Society. All rights reserved.
Fletcher, Simon P; Ali, Iraj K; Kaminski, Ann; Digard, Paul; Jackson, Richard J
2002-01-01
Classical swine fever virus (CSFV) is a member of the pestivirus family, which shares many features in common with hepatitis C virus (HCV). It is shown here that CSFV has an exceptionally efficient cis-acting internal ribosome entry segment (IRES), which, like that of HCV, is strongly influenced by the sequences immediately downstream of the initiation codon, and is optimal with viral coding sequences in this position. Constructs that retained 17 or more codons of viral coding sequence exhibited full IRES activity, but with only 12 codons, activity was approximately 66% of maximum in vitro (though close to maximum in transfected BHK cells), whereas with just 3 codons or fewer, the activity was only approximately 15% of maximum. The minimal coding region elements required for high activity were exchanged between HCV and CSFV. Although maximum activity was observed in each case with the homologous combination of coding region and 5' UTR, the heterologous combinations were sufficiently active to rule out a highly specific functional interplay between the 5' UTR and coding sequences. On the other hand, inversion of the coding sequences resulted in low IRES activity, particularly with the HCV coding sequences. RNA structure probing showed that the efficiency of internal initiation of these chimeric constructs correlated most closely with the degree of single-strandedness of the region around and immediately downstream of the initiation codon. The low activity IRESs could not be rescued by addition of supplementary eIF4A (the initiation factor with ATP-dependent RNA helicase activity). The extreme sensitivity to secondary structure around the initiation codon is likely to be due to the fact that the eIF4F complex (which has eIF4A as one of its subunits) is not required for and does not participate in initiation on these IRESs. PMID:12515388
ERIC Educational Resources Information Center
Morey, Candice C.; Miron, Monica D.
2016-01-01
Among models of working memory, there is not yet a consensus about how to describe functions specific to storing verbal or visual-spatial memories. We presented aural-verbal and visual-spatial lists simultaneously and sometimes cued one type of information after presentation, comparing accuracy in conditions with and without informative…
Homologues of insulinase, a new superfamily of metalloendopeptidases.
Rawlings, N D; Barrett, A J
1991-01-01
On the basis of a statistical analysis of an alignment of the amino acid sequences, a new superfamily of metalloendopeptidases is proposed, consisting of human insulinase, Escherichia coli protease III and mitochondrial processing endopeptidases from Saccharomyces and Neurospora. These enzymes do not contain the 'HEXXH' consensus sequence found in all previously recognized zinc metalloendopeptidases. PMID:2025223
2012-01-01
Background Detecting the borders between coding and non-coding regions is an essential step in the genome annotation. And information entropy measures are useful for describing the signals in genome sequence. However, the accuracies of previous methods of finding borders based on entropy segmentation method still need to be improved. Methods In this study, we first applied a new recursive entropic segmentation method on DNA sequences to get preliminary significant cuts. A 22-symbol alphabet is used to capture the differential composition of nucleotide doublets and stop codon patterns along three phases in both DNA strands. This process requires no prior training datasets. Results Comparing with the previous segmentation methods, the experimental results on three bacteria genomes, Rickettsia prowazekii, Borrelia burgdorferi and E.coli, show that our approach improves the accuracy for finding the borders between coding and non-coding regions in DNA sequences. Conclusions This paper presents a new segmentation method in prokaryotes based on Jensen-Rényi divergence with a 22-symbol alphabet. For three bacteria genomes, comparing to A12_JR method, our method raised the accuracy of finding the borders between protein coding and non-coding regions in DNA sequences. PMID:23282225
2016-01-01
Background Inclusion of information about a patient’s work, industry, and occupation, in the electronic health record (EHR) could facilitate occupational health surveillance, better health outcomes, prevention activities, and identification of workers’ compensation cases. The US National Institute for Occupational Safety and Health (NIOSH) has developed an autocoding system for “industry” and “occupation” based on 1990 Bureau of Census codes; its effectiveness requires evaluation in conjunction with promoting the mandatory addition of these variables to the EHR. Objective The objective of the study was to evaluate the intercoder reliability of NIOSH’s Industry and Occupation Computerized Coding System (NIOCCS) when applied to data collected in a community survey conducted under the Affordable Care Act; to determine the proportion of records that are autocoded using NIOCCS. Methods Standard Occupational Classification (SOC) codes are used by several federal agencies in databases that capture demographic, employment, and health information to harmonize variables related to work activities among these data sources. There are 359 industry and occupation responses that were hand coded by 2 investigators, who came to a consensus on every code. The same variables were autocoded using NIOCCS at the high and moderate criteria level. Results Kappa was .84 for agreement between hand coders and between the hand coder consensus code versus NIOCCS high confidence level codes for the first 2 digits of the SOC code. For 4 digits, NIOCCS coding versus investigator coding ranged from kappa=.56 to .70. In this study, NIOCCS was able to achieve production rates (ie, to autocode) 31%-36% of entered variables at the “high confidence” level and 49%-58% at the “medium confidence” level. Autocoding (production) rates are somewhat lower than those reported by NIOSH. Agreement between manually coded and autocoded data are “substantial” at the 2-digit level, but only “fair” to “good” at the 4-digit level. Conclusions This work serves as a baseline for performance of NIOCCS by investigators in the field. Further field testing will clarify NIOCCS effectiveness in terms of ability to assign codes and coding accuracy and will clarify its value as inclusion of these occupational variables in the EHR is promoted. PMID:26878932
Common Viral Integration Sites Identified in Avian Leukosis Virus-Induced B-Cell Lymphomas
Justice, James F.; Morgan, Robin W.
2015-01-01
ABSTRACT Avian leukosis virus (ALV) induces B-cell lymphoma and other neoplasms in chickens by integrating within or near cancer genes and perturbing their expression. Four genes—MYC, MYB, Mir-155, and TERT—have previously been identified as common integration sites in these virus-induced lymphomas and are thought to play a causal role in tumorigenesis. In this study, we employ high-throughput sequencing to identify additional genes driving tumorigenesis in ALV-induced B-cell lymphomas. In addition to the four genes implicated previously, we identify other genes as common integration sites, including TNFRSF1A, MEF2C, CTDSPL, TAB2, RUNX1, MLL5, CXorf57, and BACH2. We also analyze the genome-wide ALV integration landscape in vivo and find increased frequency of ALV integration near transcriptional start sites and within transcripts. Previous work has shown ALV prefers a weak consensus sequence for integration in cultured human cells. We confirm this consensus sequence for ALV integration in vivo in the chicken genome. PMID:26670384
Nonspatial Sequence Coding in CA1 Neurons
Allen, Timothy A.; Salz, Daniel M.; McKenzie, Sam
2016-01-01
The hippocampus is critical to the memory for sequences of events, a defining feature of episodic memory. However, the fundamental neuronal mechanisms underlying this capacity remain elusive. While considerable research indicates hippocampal neurons can represent sequences of locations, direct evidence of coding for the memory of sequential relationships among nonspatial events remains lacking. To address this important issue, we recorded neural activity in CA1 as rats performed a hippocampus-dependent sequence-memory task. Briefly, the task involves the presentation of repeated sequences of odors at a single port and requires rats to identify each item as “in sequence” or “out of sequence”. We report that, while the animals' location and behavior remained constant, hippocampal activity differed depending on the temporal context of items—in this case, whether they were presented in or out of sequence. Some neurons showed this effect across items or sequence positions (general sequence cells), while others exhibited selectivity for specific conjunctions of item and sequence position information (conjunctive sequence cells) or for specific probe types (probe-specific sequence cells). We also found that the temporal context of individual trials could be accurately decoded from the activity of neuronal ensembles, that sequence coding at the single-cell and ensemble level was linked to sequence memory performance, and that slow-gamma oscillations (20–40 Hz) were more strongly modulated by temporal context and performance than theta oscillations (4–12 Hz). These findings provide compelling evidence that sequence coding extends beyond the domain of spatial trajectories and is thus a fundamental function of the hippocampus. SIGNIFICANCE STATEMENT The ability to remember the order of life events depends on the hippocampus, but the underlying neural mechanisms remain poorly understood. Here we addressed this issue by recording neural activity in hippocampal region CA1 while rats performed a nonspatial sequence memory task. We found that hippocampal neurons code for the temporal context of items (whether odors were presented in the correct or incorrect sequential position) and that this activity is linked with memory performance. The discovery of this novel form of temporal coding in hippocampal neurons advances our fundamental understanding of the neurobiology of episodic memory and will serve as a foundation for our cross-species, multitechnique approach aimed at elucidating the neural mechanisms underlying memory impairments in aging and dementia. PMID:26843637
Ulloa, Mauricio; Hulse-Kemp, Amanda M; De Santiago, Luis M; Stelly, David M; Burke, John J
2017-01-01
High-density linkage maps are vital to supporting the correct placement of scaffolds and gene sequences on chromosomes and fundamental to contemporary organismal research and scientific approaches to genetic improvement, especially in paleopolyploids with exceptionally complex genomes, eg, upland cotton ( Gossypium hirsutum L., "2n = 52"). Three independently developed intraspecific upland mapping populations were analyzed to generate 3 high-density genetic linkage single-nucleotide polymorphism (SNP) maps and a consensus map using the CottonSNP63K array. The populations consisted of a previously reported F 2 , a recombinant inbred line (RIL), and reciprocal RIL population, from "Phytogen 72" and "Stoneville 474" cultivars. The cluster file provided 7417 genotyped SNP markers, resulting in 26 linkage groups corresponding to the 26 chromosomes (c) of the allotetraploid upland cotton (AD) 1 arisen from the merging of 2 genomes ("A" Old World and "D" New World). Patterns of chromosome-specific recombination were largely consistent across mapping populations. The high-density genetic consensus map included 7244 SNP markers that spanned 3538 cM and comprised 3824 SNP bins, of which 1783 and 2041 were in the A t and D t subgenomes with 1825 and 1713 cM map lengths, respectively. Subgenome average distances were nearly identical, indicating that subgenomic differences in bin number arose due to the high numbers of SNPs on the D t subgenome. Examination of expected recombination frequency or crossovers (COs) on the chromosomes within each population of the 2 subgenomes revealed that COs were also not affected by the SNPs or SNP bin number in these subgenomes. Comparative alignment analyses identified historical ancestral A t -subgenomic translocations of c02 and c03, as well as of c04 and c05. The consensus map SNP sequences aligned with high congruency to the NBI assembly of Gossypium hirsutum . However, the genomic comparisons revealed evidence of additional unconfirmed possible duplications, inversions and translocations, and unbalance SNP sequence homology or SNP sequence/loci genomic dominance, or homeolog loci bias of the upland tetraploid A t and D t subgenomes. The alignments indicated that 364 SNP-associated previously unintegrated scaffolds can be placed in pseudochromosomes of the NBI G hirsutum assembly. This is the first intraspecific SNP genetic linkage consensus map assembled in G hirsutum with a core of reproducible mendelian SNP markers assayed on different populations and it provides further knowledge of chromosome arrangement of genic and nongenic SNPs. Together, the consensus map and RIL populations provide a synergistically useful platform for localizing and identifying agronomically important loci for improvement of the cotton crop.
Simonini, Sara; Roig-Villanova, Irma; Gregis, Veronica; Colombo, Bilitis; Colombo, Lucia; Kater, Martin M.
2012-01-01
BASIC PENTACYSTEINE (BPC) transcription factors have been identified in a large variety of plant species. In Arabidopsis thaliana there are seven BPC genes, which, except for BPC5, are expressed ubiquitously. BPC genes are functionally redundant in a wide range of developmental processes. Recently, we reported that BPC1 binds to guanine and adenine (GA)–rich consensus sequences in the SEEDSTICK (STK) promoter in vitro and induces conformational changes. Here we show by chromatin immunoprecipitation experiments that in vivo BPCs also bind to the consensus boxes, and when these were mutated, expression from the STK promoter was derepressed, resulting in ectopic expression in the inflorescence. We also reveal that SHORT VEGETATIVE PHASE (SVP) is a direct regulator of STK. SVP is a floral meristem identity gene belonging to the MADS box gene family. The SVP-APETALA1 (AP1) dimer recruits the SEUSS (SEU)-LEUNIG (LUG) transcriptional cosuppressor to repress floral homeotic gene expression in the floral meristem. Interestingly, we found that GA consensus sequences in the STK promoter to which BPCs bind are essential for recruitment of the corepressor complex to this promoter. Our data suggest that we have identified a new regulatory mechanism controlling plant gene expression that is probably generally used, when considering BPCs’ wide expression profile and the frequent presence of consensus binding sites in plant promoters. PMID:23054472
Ohno, S
1984-01-01
Three outstanding properties uniquely qualify repeats of base oligomers as the primordial coding sequences of all polypeptide chains. First, when compared with randomly generated base sequences in general, they are more likely to have long open reading frames. Second, periodical polypeptide chains specified by such repeats are more likely to assume either alpha-helical or beta-sheet secondary structures than are polypeptide chains of random sequence. Third, provided that the number of bases in the oligomeric unit is not a multiple of 3, these internally repetitious coding sequences are impervious to randomly sustained base substitutions, deletions, and insertions. This is because the recurring periodicity of their polypeptide chains is given by three consecutive copies of the oligomeric unit translated in three different reading frames. Accordingly, when one reading frame is open, the other two are automatically open as well, all three being capable of coding for polypeptide chains of identical periodicity. Under this circumstance, a frame shift due to the deletion or insertion of a number of bases that is not a multiple of 3 fails to alter the down-stream amino acid sequence, and even a base change causing premature chain-termination can silence only one of the three potential coding units. Newly arisen coding sequences in modern organisms are oligomeric repeats, and most of the older genes retain various vestiges of their original internal repetitions. Some of the genes (e.g., oncogenes) have even inherited the property of being impervious to randomly sustained base changes.
Cerveau, Nicolas; Jackson, Daniel J
2016-12-09
Next-generation sequencing (NGS) technologies are arguably the most revolutionary technical development to join the list of tools available to molecular biologists since PCR. For researchers working with nonconventional model organisms one major problem with the currently dominant NGS platform (Illumina) stems from the obligatory fragmentation of nucleic acid material that occurs prior to sequencing during library preparation. This step creates a significant bioinformatic challenge for accurate de novo assembly of novel transcriptome data. This challenge becomes apparent when a variety of modern assembly tools (of which there is no shortage) are applied to the same raw NGS dataset. With the same assembly parameters these tools can generate markedly different assembly outputs. In this study we present an approach that generates an optimized consensus de novo assembly of eukaryotic coding transcriptomes. This approach does not represent a new assembler, rather it combines the outputs of a variety of established assembly packages, and removes redundancy via a series of clustering steps. We test and validate our approach using Illumina datasets from six phylogenetically diverse eukaryotes (three metazoans, two plants and a yeast) and two simulated datasets derived from metazoan reference genome annotations. All of these datasets were assembled using three currently popular assembly packages (CLC, Trinity and IDBA-tran). In addition, we experimentally demonstrate that transcripts unique to one particular assembly package are likely to be bioinformatic artefacts. For all eight datasets our pipeline generates more concise transcriptomes that in fact possess more unique annotatable protein domains than any of the three individual assemblers we employed. Another measure of assembly completeness (using the purpose built BUSCO databases) also confirmed that our approach yields more information. Our approach yields coding transcriptome assemblies that are more likely to be closer to biological reality than any of the three individual assembly packages we investigated. This approach (freely available as a simple perl script) will be of use to researchers working with species for which there is little or no reference data against which the assembly of a transcriptome can be performed.
Modeling repetitive, non‐globular proteins
Basu, Koli; Campbell, Robert L.; Guo, Shuaiqi; Sun, Tianjun
2016-01-01
Abstract While ab initio modeling of protein structures is not routine, certain types of proteins are more straightforward to model than others. Proteins with short repetitive sequences typically exhibit repetitive structures. These repetitive sequences can be more amenable to modeling if some information is known about the predominant secondary structure or other key features of the protein sequence. We have successfully built models of a number of repetitive structures with novel folds using knowledge of the consensus sequence within the sequence repeat and an understanding of the likely secondary structures that these may adopt. Our methods for achieving this success are reviewed here. PMID:26914323
Badaut, Cyril; Bertin, Gwladys; Rustico, Tatiana; Fievet, Nadine; Massougbodji, Achille; Gaye, Alioune; Deloron, Philippe
2010-01-01
Background Placental malaria is a disease linked to the sequestration of Plasmodium falciparum infected red blood cells (IRBC) in the placenta, leading to reduced materno-fetal exchanges and to local inflammation. One of the virulence factors of P. falciparum involved in cytoadherence to chondroitin sulfate A, its placental receptor, is the adhesive protein VAR2CSA. Its localisation on the surface of IRBC makes it accessible to the immune system. VAR2CSA contains six DBL domains. The DBL6ε domain is the most variable. High variability constitutes a means for the parasite to evade the host immune response. The DBL6ε domain could constitute a very attractive basis for a vaccine candidate but its reported variability necessitates, for antigenic characterisations, identifying and classifying commonalities across isolates. Methodology/Principal Findings Local alignment analysis of the DBL6ε domain had revealed that it is not as variable as previously described. Variability is concentrated in seven regions present on the surface of the DBL6ε domain. The main goal of our work is to classify and group variable sequences that will simplify further research to determine dominant epitopes. Firstly, variable sequences were grouped following their average percent pairwise identity (APPI). Groups comprising many variable sequences sharing low variability were found. Secondly, ELISA experiments following the IgG recognition of a recombinant DBL6ε domain, and of peptides mimicking its seven variable blocks, allowed to determine an APPI cut-off and to isolate groups represented by a single consensus sequence. Conclusions/Significance A new sequence approach is used to compare variable regions in sequences that have extensive segmental gene relationship. Using this approach, the VAR2CSA DBL6 domain is composed of 7 variable blocks with limited polymorphism. Each variable block is composed of a limited number of consensus types. Based on peptide based ELISA, variable blocks with 85% or greater sequence identity are expected to be recognized equally well by antibody and can be considered the same consensus type. Therefore, the analysis of the antibody response against the classified small number of sequences should be helpful to determine epitopes. PMID:20585655
Nanoplatforms for highly sensitive fluorescence detection of cancer-related proteases.
Wang, Hongwang; Udukala, Dinusha N; Samarakoon, Thilani N; Basel, Matthew T; Kalita, Mausam; Abayaweera, Gayani; Manawadu, Harshi; Malalasekera, Aruni; Robinson, Colette; Villanueva, David; Maynez, Pamela; Bossmann, Leonie; Riedy, Elizabeth; Barriga, Jenny; Wang, Ni; Li, Ping; Higgins, Daniel A; Zhu, Gaohong; Troyer, Deryl L; Bossmann, Stefan H
2014-02-01
Numerous proteases are known to be necessary for cancer development and progression including matrix metalloproteinases (MMPs), tissue serine proteases, and cathepsins. The goal of this research is to develop an Fe/Fe3O4 nanoparticle-based system for clinical diagnostics, which has the potential to measure the activity of cancer-associated proteases in biospecimens. Nanoparticle-based "light switches" for measuring protease activity consist of fluorescent cyanine dyes and porphyrins that are attached to Fe/Fe3O4 nanoparticles via consensus sequences. These consensus sequences can be cleaved in the presence of the correct protease, thus releasing a fluorescent dye from the Fe/Fe3O4 nanoparticle, resulting in highly sensitive (down to 1 × 10(-16) mol l(-1) for 12 proteases), selective, and fast nanoplatforms (required time: 60 min).
Carlson, Jonathan M.; Chan, Benjamin; Chopera, Denis R.; Brumme, Chanson J.; Markle, Tristan J.; Martin, Eric; Shahid, Aniqa; Anmole, Gursev; Mwimanzi, Philip; Nassab, Pauline; Penney, Kali A.; Rahman, Manal A.; Milloy, M.-J.; Schechter, Martin T.; Markowitz, Martin; Carrington, Mary; Walker, Bruce D.; Wagner, Theresa; Buchbinder, Susan; Fuchs, Jonathan; Koblin, Beryl; Mayer, Kenneth H.; Harrigan, P. Richard; Brockman, Mark A.; Poon, Art F. Y.; Brumme, Zabrina L.
2014-01-01
HLA-restricted immune escape mutations that persist following HIV transmission could gradually spread through the viral population, thereby compromising host antiviral immunity as the epidemic progresses. To assess the extent and phenotypic impact of this phenomenon in an immunogenetically diverse population, we genotypically and functionally compared linked HLA and HIV (Gag/Nef) sequences from 358 historic (1979–1989) and 382 modern (2000–2011) specimens from four key cities in the North American epidemic (New York, Boston, San Francisco, Vancouver). Inferred HIV phylogenies were star-like, with approximately two-fold greater mean pairwise distances in modern versus historic sequences. The reconstructed epidemic ancestral (founder) HIV sequence was essentially identical to the North American subtype B consensus. Consistent with gradual diversification of a “consensus-like” founder virus, the median “background” frequencies of individual HLA-associated polymorphisms in HIV (in individuals lacking the restricting HLA[s]) were ∼2-fold higher in modern versus historic HIV sequences, though these remained notably low overall (e.g. in Gag, medians were 3.7% in the 2000s versus 2.0% in the 1980s). HIV polymorphisms exhibiting the greatest relative spread were those restricted by protective HLAs. Despite these increases, when HIV sequences were analyzed as a whole, their total average burden of polymorphisms that were “pre-adapted” to the average host HLA profile was only ∼2% greater in modern versus historic eras. Furthermore, HLA-associated polymorphisms identified in historic HIV sequences were consistent with those detectable today, with none identified that could explain the few HIV codons where the inferred epidemic ancestor differed from the modern consensus. Results are therefore consistent with slow HIV adaptation to HLA, but at a rate unlikely to yield imminent negative implications for cellular immunity, at least in North America. Intriguingly, temporal changes in protein activity of patient-derived Nef (though not Gag) sequences were observed, suggesting functional implications of population-level HIV evolution on certain viral proteins. PMID:24762668
Comparison of simple sequence repeats in 19 Archaea.
Trivedi, S
2006-12-05
All organisms that have been studied until now have been found to have differential distribution of simple sequence repeats (SSRs), with more SSRs in intergenic than in coding sequences. SSR distribution was investigated in Archaea genomes where complete chromosome sequences of 19 Archaea were analyzed with the program SPUTNIK to find di- to penta-nucleotide repeats. The number of repeats was determined for the complete chromosome sequences and for the coding and non-coding sequences. Different from what has been found for other groups of organisms, there is an abundance of SSRs in coding regions of the genome of some Archaea. Dinucleotide repeats were rare and CG repeats were found in only two Archaea. In general, trinucleotide repeats are the most abundant SSR motifs; however, pentanucleotide repeats are abundant in some Archaea. Some of the tetranucleotide and pentanucleotide repeat motifs are organism specific. In general, repeats are short and CG-rich repeats are present in Archaea having a CG-rich genome. Among the 19 Archaea, SSR density was not correlated with genome size or with optimum growth temperature. Pentanucleotide density had an inverse correlation with the CG content of the genome.
Association of Amine-Receptor DNA Sequence Variants with Associative Learning in the Honeybee.
Lagisz, Malgorzata; Mercer, Alison R; de Mouzon, Charlotte; Santos, Luana L S; Nakagawa, Shinichi
2016-03-01
Octopamine- and dopamine-based neuromodulatory systems play a critical role in learning and learning-related behaviour in insects. To further our understanding of these systems and resulting phenotypes, we quantified DNA sequence variations at six loci coding octopamine-and dopamine-receptors and their association with aversive and appetitive learning traits in a population of honeybees. We identified 79 polymorphic sequence markers (mostly SNPs and a few insertions/deletions) located within or close to six candidate genes. Intriguingly, we found that levels of sequence variation in the protein-coding regions studied were low, indicating that sequence variation in the coding regions of receptor genes critical to learning and memory is strongly selected against. Non-coding and upstream regions of the same genes, however, were less conserved and sequence variations in these regions were weakly associated with between-individual differences in learning-related traits. While these associations do not directly imply a specific molecular mechanism, they suggest that the cross-talk between dopamine and octopamine signalling pathways may influence olfactory learning and memory in the honeybee.
Pastor, D; Amaya, W; García-Olcina, R; Sales, S
2007-07-01
We present a simple theoretical model of and the experimental verification for vanishing of the autocorrelation peak due to wavelength detuning on the coding-decoding process of coherent direct sequence optical code multiple access systems based on a superstructured fiber Bragg grating. Moreover, the detuning vanishing effect has been explored to take advantage of this effect and to provide an additional degree of multiplexing and/or optical code tuning.
NASA Astrophysics Data System (ADS)
Leukhin, Anatolii N.
2005-08-01
The algebraic solution of a 'complex' problem of synthesis of phase-coded (PC) sequences with the zero level of side lobes of the cyclic autocorrelation function (ACF) is proposed. It is shown that the solution of the synthesis problem is connected with the existence of difference sets for a given code dimension. The problem of estimating the number of possible code combinations for a given code dimension is solved. It is pointed out that the problem of synthesis of PC sequences is related to the fundamental problems of discrete mathematics and, first of all, to a number of combinatorial problems, which can be solved, as the number factorisation problem, by algebraic methods by using the theory of Galois fields and groups.
Evaluating the protein coding potential of exonized transposable element sequences
Piriyapongsa, Jittima; Rutledge, Mark T; Patel, Sanil; Borodovsky, Mark; Jordan, I King
2007-01-01
Background Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analysis of complete genome sequences have turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: i-by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and ii-through a probabilistic codon-frequency based analysis of the protein coding potential of TE-derived exons. Results We compared the ability of three classes of sequence similarity search methods to detect TE-derived sequences among data sets of experimentally characterized proteins: 1-a profile-based hidden Markov model (HMM) approach, 2-BLAST methods and 3-RepeatMasker. Profile based methods are more sensitive and more selective than the other methods evaluated. However, the application of profile-based search methods to the detection of TE-derived sequences among well-curated experimentally characterized protein data sets did not turn up many more cases than had been previously detected and nowhere near as many cases as recent genome-wide searches have. We observed that the different search methods used were complementary in the sense that they yielded largely non-overlapping sets of hits and differed in their ability to recover known cases of TE-derived CDS. The probabilistic analysis of TE-derived exon sequences indicates that these sequences have low protein coding potential on average. In particular, non-autonomous TEs that do not encode protein sequences, such as Alu elements, are frequently exonized but unlikely to encode protein sequences. Conclusion The exaptation of the numerous TE sequences found in exons as bona fide protein coding sequences may prove to be far less common than has been suggested by the analysis of complete genomes. We hypothesize that many exonized TE sequences actually function as post-transcriptional regulators of gene expression, rather than coding sequences, which may act through a variety of double stranded RNA related regulatory pathways. Indeed, their relatively high copy numbers and similarity to sequences dispersed throughout the genome suggests that exonized TE sequences could serve as master regulators with a wide scope of regulatory influence. Reviewers: This article was reviewed by Itai Yanai, Kateryna D. Makova, Melissa Wilson (nominated by Kateryna D. Makova) and Cedric Feschotte (nominated by John M. Logsdon Jr.). PMID:18036258
Torres-Escobar, Ascención; Juárez-Rodríguez, María Dolores; Lamont, Richard J.
2013-01-01
Autoinducer-2 (AI-2) is required for biofilm formation and virulence of the oral pathogen Aggregatibacter actinomycetemcomitans, and we previously showed that lsrB codes for a receptor for AI-2. The lsrB gene is expressed as part of the lsrACDBFG operon, which is divergently transcribed from an adjacent lsrRK operon. In Escherichia coli, lsrRK encodes a repressor and AI-2 kinase that function to regulate lsrACDBFG. To determine if lsrRK controls lsrACDBFG expression and influences biofilm growth of A. actinomycetemcomitans, we first defined the promoters for each operon. Transcriptional reporter plasmids containing the 255-bp lsrACDBFG-lsrRK intergenic region (IGR) fused to lacZ showed that essential elements of lsrR promoter reside 89 to 255 bp upstream from the lsrR start codon. Two inverted repeat sequences that represent potential binding sites for LsrR and two sequences resembling the consensus cyclic AMP receptor protein (CRP) binding site were identified in this region. Using electrophoretic mobility shift assay (EMSA), purified LsrR and CRP proteins were shown to bind probes containing these sequences. Surprisingly, the 255-bp IGR did not contain the lsrA promoter. Instead, a fragment encompassing nucleotides +1 to +159 of lsrA together with the 255-bp IGR was required to promote lsrA transcription. This suggests that a region within the lsrA coding sequence influences transcription, or alternatively that the start codon of A. actinomycetemcomitans lsrA has been incorrectly annotated. Transformation of ΔlsrR, ΔlsrK, ΔlsrRK, and Δcrp deletion mutants with lacZ reporters containing the lsrA or lsrR promoter showed that LsrR negatively regulates and CRP positively regulates both lsrACDBFG and lsrRK. However, in contrast to what occurs in E. coli, deletion of lsrK had no effect on the transcriptional activity of the lsrA or lsrR promoters, suggesting that another kinase may be capable of phosphorylating AI-2 in A. actinomycetemcomitans. Finally, biofilm formation of the ΔlsrR, ΔlsrRK, and Δcrp mutants was significantly reduced relative to that of the wild type, indicating that proper regulation of the lsr locus is required for optimal biofilm growth by A. actinomycetemcomitans. PMID:23104800
Wright, Imogen A.; Travers, Simon A.
2014-01-01
The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. PMID:24861618
Genomic Sequence around Butterfly Wing Development Genes: Annotation and Comparative Analysis
Conceição, Inês C.; Long, Anthony D.; Gruber, Jonathan D.; Beldade, Patrícia
2011-01-01
Background Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. Methodology/Principal Findings We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). Conclusions The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high conservation of non-coding sequence around the genes wingless and Ecdysone receptor, both involved in multiple developmental processes including wing pattern formation. PMID:21909358
Speech processing using conditional observable maximum likelihood continuity mapping
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogden, John; Nix, David
A computer implemented method enables the recognition of speech and speech characteristics. Parameters are initialized of first probability density functions that map between the symbols in the vocabulary of one or more sequences of speech codes that represent speech sounds and a continuity map. Parameters are also initialized of second probability density functions that map between the elements in the vocabulary of one or more desired sequences of speech transcription symbols and the continuity map. The parameters of the probability density functions are then trained to maximize the probabilities of the desired sequences of speech-transcription symbols. A new sequence ofmore » speech codes is then input to the continuity map having the trained first and second probability function parameters. A smooth path is identified on the continuity map that has the maximum probability for the new sequence of speech codes. The probability of each speech transcription symbol for each input speech code can then be output.« less
SequenceL: Automated Parallel Algorithms Derived from CSP-NT Computational Laws
NASA Technical Reports Server (NTRS)
Cooke, Daniel; Rushton, Nelson
2013-01-01
With the introduction of new parallel architectures like the cell and multicore chips from IBM, Intel, AMD, and ARM, as well as the petascale processing available for highend computing, a larger number of programmers will need to write parallel codes. Adding the parallel control structure to the sequence, selection, and iterative control constructs increases the complexity of code development, which often results in increased development costs and decreased reliability. SequenceL is a high-level programming language that is, a programming language that is closer to a human s way of thinking than to a machine s. Historically, high-level languages have resulted in decreased development costs and increased reliability, at the expense of performance. In recent applications at JSC and in industry, SequenceL has demonstrated the usual advantages of high-level programming in terms of low cost and high reliability. SequenceL programs, however, have run at speeds typically comparable with, and in many cases faster than, their counterparts written in C and C++ when run on single-core processors. Moreover, SequenceL is able to generate parallel executables automatically for multicore hardware, gaining parallel speedups without any extra effort from the programmer beyond what is required to write the sequen tial/singlecore code. A SequenceL-to-C++ translator has been developed that automatically renders readable multithreaded C++ from a combination of a SequenceL program and sample data input. The SequenceL language is based on two fundamental computational laws, Consume-Simplify- Produce (CSP) and Normalize-Trans - pose (NT), which enable it to automate the creation of parallel algorithms from high-level code that has no annotations of parallelism whatsoever. In our anecdotal experience, SequenceL development has been in every case less costly than development of the same algorithm in sequential (that is, single-core, single process) C or C++, and an order of magnitude less costly than development of comparable parallel code. Moreover, SequenceL not only automatically parallelizes the code, but since it is based on CSP-NT, it is provably race free, thus eliminating the largest quality challenge the parallelized software developer faces.
Amexis, Georgios; Oeth, Paul; Abel, Kenneth; Ivshina, Anna; Pelloquin, Francois; Cantor, Charles R.; Braun, Andreas; Chumakov, Konstantin
2001-01-01
RNA viruses exist as quasispecies, heterogeneous and dynamic mixtures of mutants having one or more consensus sequences. An adequate description of the genomic structure of such viral populations must include the consensus sequence(s) plus a quantitative assessment of sequence heterogeneities. For example, in quality control of live attenuated viral vaccines, the presence of even small quantities of mutants or revertants may indicate incomplete or unstable attenuation that may influence vaccine safety. Previously, we demonstrated the monitoring of oral poliovirus vaccine with the use of mutant analysis by PCR and restriction enzyme cleavage (MAPREC). In this report, we investigate genetic variation in live attenuated mumps virus vaccine by using both MAPREC and a platform (DNA MassArray) based on matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry. Mumps vaccines prepared from the Jeryl Lynn strain typically contain at least two distinct viral substrains, JL1 and JL2, which have been characterized by full length sequencing. We report the development of assays for characterizing sequence variants in these substrains and demonstrate their use in quantitative analysis of substrains and sequence variations in mixed virus cultures and mumps vaccines. The results obtained from both the MAPREC and MALDI-TOF methods showed excellent correlation. This suggests the potential utility of MALDI-TOF for routine quality control of live viral vaccines and for assessment of genetic stability and quantitative monitoring of genetic changes in other RNA viruses of clinical interest. PMID:11593021
ANN modeling of DNA sequences: new strategies using DNA shape code.
Parbhane, R V; Tambe, S S; Kulkarni, B D
2000-09-01
Two new encoding strategies, namely, wedge and twist codes, which are based on the DNA helical parameters, are introduced to represent DNA sequences in artificial neural network (ANN)-based modeling of biological systems. The performance of the new coding strategies has been evaluated by conducting three case studies involving mapping (modeling) and classification applications of ANNs. The proposed coding schemes have been compared rigorously and shown to outperform the existing coding strategies especially in situations wherein limited data are available for building the ANN models.
The primitive code and repeats of base oligomers as the primordial protein-encoding sequence.
Ohno, S; Epplen, J T
1983-01-01
Even if the prebiotic self-replication of nucleic acids and the subsequent emergence of primitive, enzyme-independent tRNAs are accepted as plausible, the origin of life by spontaneous generation still appears improbable. This is because the just-emerged primitive translational machinery had to cope with base sequences that were not preselected for their coding potentials. Particularly if the primitive mitochondria-like code with four chain-terminating base triplets preceded the universal code, the translation of long, randomly generated, base sequences at this critical stage would have merely resulted in the production of short oligopeptides instead of long polypeptide chains. We present the base sequence of a mouse transcript containing tetranucleotide repeats conserved during evolution. Even if translated in accordance with the primitive mitochondria-like code, this transcript in its three reading frames can yield 245-, 246-, and 251-residue-long tetrapeptidic periodical polypeptides that are already acquiring longer periodicities. We contend that the first set of base sequences translated at the beginning of life were such oligonucleotide repeats. By quickly acquiring longer periodicities, their products must have soon gained characteristic secondary structures--alpha-helical or beta-sheet or both. PMID:6574491
American Society for Colposcopy and Cervical Pathology
... Colposcopy Standards Recommendations Patient Resources Journal Membership Member Benefits Join/Renew Member Resources Careers About History Bylaws ... MD, MS, Thomas C. Wright, Jr, MD ASCCP Mobile App Updated Consensus Guidelines for Managing Abnormal Cervical Cancer Screening Tests and Cancer ... * Email: * Enter code: * Message: Thank you Your ...
Klomp, Johanna M; Ouwerkerk-Noordam, Elisabeth; Boon, Mathilde E; van Haaften, Maarten; Heintz, A Peter M
2009-01-01
To evaluate cytologic diagnoses of dysbacteriosis and Gardnerella infection and to obtain insight into the diagnostic problems of Gardnerella. One hundred randomly selected samples of each of 3 diagnostic series were rescreened by 2 pathologists, resulting in 2 rescreening diagnoses and a consensus diagnosis. A smear was considered unequivocal when the original O code and the O code of the consensus diagnoses were equal and discordant when the flora diagnoses of the 2 pathologists differed. Discordance was highest in the dysbacteriotic series (20%) and lowest in the healthy group (4%). Unequivocal diagnoses were established in 65% of the dysbacteriotic smears, 80% of the Gardnerella smears and 93% of the healthy smears. Misclassification of Gardnerella occurred in the presence of clusters of bacteria mixed with spermatozoa. Blue mountain cells in Gardnerella infection can be identified unequivocally in cervical smears. Because of the clinical importance of treating Gardnerella, such advantageous spin-offs of cervical screening should be exploited.
Accurate Typing of Human Leukocyte Antigen Class I Genes by Oxford Nanopore Sequencing.
Liu, Chang; Xiao, Fangzhou; Hoisington-Lopez, Jessica; Lang, Kathrin; Quenzel, Philipp; Duffy, Brian; Mitra, Robi David
2018-04-03
Oxford Nanopore Technologies' MinION has expanded the current DNA sequencing toolkit by delivering long read lengths and extreme portability. The MinION has the potential to enable expedited point-of-care human leukocyte antigen (HLA) typing, an assay routinely used to assess the immunologic compatibility between organ donors and recipients, but the platform's high error rate makes it challenging to type alleles with accuracy. We developed and validated accurate typing of HLA by Oxford nanopore (Athlon), a bioinformatic pipeline that i) maps nanopore reads to a database of known HLA alleles, ii) identifies candidate alleles with the highest read coverage at different resolution levels that are represented as branching nodes and leaves of a tree structure, iii) generates consensus sequences by remapping the reads to the candidate alleles, and iv) calls the final diploid genotype by blasting consensus sequences against the reference database. Using two independent data sets generated on the R9.4 flow cell chemistry, Athlon achieved a 100% accuracy in class I HLA typing at the two-field resolution. Copyright © 2018 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
Zhang, Lin-Lin; Tan, Mei-Juan; Liu, Guang-Lei; Chi, Zhe; Wang, Guang-Yuan; Chi, Zhen-Ming
2015-04-01
The INU1 gene encoding an exo-inulinase from the marine-derived yeast Candida membranifaciens subsp. flavinogenie W14-3 was cloned and characterized. It had an open reading frame of 1,536 bp long encoding an inulinase. The coding region of it was not interrupted by any intron. The cloned gene encoded 512 amino acid residues of a protein with a putative signal peptide of 23 amino acids and a calculated molecular mass of 57.8 kDa. The protein sequence deduced from the inulinase gene contained the inulinase consensus sequences (WMNDPNGL), (RDP), ECP FS and Q. The protein also had six conserved putative N-glycosylation sites. The deduced inulinase from the yeast strain W14-3 was found to be closely related to that from Candida kutaonensis sp. nov. KRF1, Kluyveromyces marxianus, and Cryptococcus aureus G7a. The inulinase gene with its signal peptide encoding sequence was subcloned into the pMIRSC11 expression vector and expressed in Saccharomyces sp. W0. The recombinant yeast strain W14-3-INU-112 obtained could produce 16.8 U/ml of inulinase activity and 12.5 % (v/v) ethanol from 250 g/l of inulin within 168 h. The monosaccharides were detected after the hydrolysis of inulin with the crude inulinase (the yeast culture). All the results indicated that the cloned gene and the recombinant yeast strain W14-3-INU-112 had potential applications in biotechnology.
Kuzuoka, M; Takahashi, T; Guron, C; Raghow, R
1994-05-01
Detailed molecular organization of the coding and upstream regulatory regions of the murine homeodomain-containing gene, Msx-1, is reported. The protein-encoding portion of the gene is contained in two exons, 590 and 1214 bp in length, separated by a 2107-bp intron; the homeodomain is located in the second exon. The two-exon organization of the murine Msx-1 gene resembles a number of other homeodomain-containing genes. The 5'-(GTAAGT) and 3'-(CCCTAG) splicing junctions and the mRNA polyadenylation signal (UAUAA) of the murine Msx-1 gene are also characteristic of other vertebrate genes. By nuclease protection and primer extension assays, the start of transcription of the Msx-1 gene was located 256 bp upstream of the first AUG. Computer analysis of the promoter proximal 1280-bp sequence revealed a number of potentially important cis-regulatory sequences; these include the recognition elements for Ap-1, Ap-2, Ap-3, Sp-1, a possible binding site for RAR:RXR, and a number of TCF-1 consensus motifs. Importantly, a perfect reverse complement of (C/G)TTAATTG, which was recently shown to be an optimal binding sequence for the homeodomain of Msx-1 protein (K.M. Catron, N. Iler, and C. Abate (1993) Mol. Cell. Biol. 13:2354-2365), was also located in the murine Msx-1 promoter. Binding of bacterially expressed Msx-1 homeodomain polypeptide to Msx-1-specific oligonucleotide was experimentally demonstrated, raising a distinct possibility of autoregulation of this developmentally regulated gene.
Governing sexual behaviour through humanitarian codes of conduct.
Matti, Stephanie
2015-10-01
Since 2001, there has been a growing consensus that sexual exploitation and abuse of intended beneficiaries by humanitarian workers is a real and widespread problem that requires governance. Codes of conduct have been promoted as a key mechanism for governing the sexual behaviour of humanitarian workers and, ultimately, preventing sexual exploitation and abuse (PSEA). This article presents a systematic study of PSEA codes of conduct adopted by humanitarian non-governmental organisations (NGOs) and how they govern the sexual behaviour of humanitarian workers. It draws on Foucault's analytics of governance and speech act theory to examine the findings of a survey of references to codes of conduct made on the websites of 100 humanitarian NGOs, and to analyse some features of the organisation-specific PSEA codes identified. © 2015 The Author(s). Disasters © Overseas Development Institute, 2015.
Genomics dataset of unidentified disclosed isolates.
Rekadwad, Bhagwan N
2016-09-01
Analysis of DNA sequences is necessary for higher hierarchical classification of the organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset is chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. The quick response codes were generated. AT/GC content of the DNA sequences analysis was carried out. The QR is helpful for quick identification of isolates. AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage code and enzyme code studied under the restriction digestion study, which helpful for performing studies using short DNA sequences was reported. The dataset disclosed here is the new revelatory data for exploration of unique DNA sequences for evaluation, identification, comparison and analysis.
Hazes, Bart
2014-02-28
Protein-coding DNA sequences and their corresponding amino acid sequences are routinely used to study relationships between sequence, structure, function, and evolution. The rapidly growing size of sequence databases increases the power of such comparative analyses but it makes it more challenging to prepare high quality sequence data sets with control over redundancy, quality, completeness, formatting, and labeling. Software tools for some individual steps in this process exist but manual intervention remains a common and time consuming necessity. CDSbank is a database that stores both the protein-coding DNA sequence (CDS) and amino acid sequence for each protein annotated in Genbank. CDSbank also stores Genbank feature annotation, a flag to indicate incomplete 5' and 3' ends, full taxonomic data, and a heuristic to rank the scientific interest of each species. This rich information allows fully automated data set preparation with a level of sophistication that aims to meet or exceed manual processing. Defaults ensure ease of use for typical scenarios while allowing great flexibility when needed. Access is via a free web server at http://hazeslab.med.ualberta.ca/CDSbank/. CDSbank presents a user-friendly web server to download, filter, format, and name large sequence data sets. Common usage scenarios can be accessed via pre-programmed default choices, while optional sections give full control over the processing pipeline. Particular strengths are: extract protein-coding DNA sequences just as easily as amino acid sequences, full access to taxonomy for labeling and filtering, awareness of incomplete sequences, and the ability to take one protein sequence and extract all synonymous CDS or identical protein sequences in other species. Finally, CDSbank can also create labeled property files to, for instance, annotate or re-label phylogenetic trees.
Domier, L L; Latorre, I J; Steinlage, T A; McCoppin, N; Hartman, G L
2003-10-01
The variability of North American and Asian strains and isolates of Soybean mosaic virus was investigated. First, polymerase chain reaction (PCR) products representing the coat protein (CP)-coding regions of 38 SMVs were analyzed for restriction fragment length polymorphisms (RFLP). Second, the nucleotide and predicted amino acid sequence variability of the P1-coding region of 18 SMVs and the helper component/protease (HC/Pro) and CP-coding regions of 25 SMVs were assessed. The CP nucleotide and predicted amino acid sequences were the most similar and predicted phylogenetic relationships similar to those obtained from RFLP analysis. Neither RFLP nor sequence analyses of the CP-coding regions grouped the SMVs by geographical origin. The P1 and HC/Pro sequences were more variable and separated the North American and Asian SMV isolates into two groups similar to previously reported differences in pathogenic diversity of the two sets of SMV isolates. The P1 region was the most informative of the three regions analyzed. To assess the biological relevance of the sequence differences in the HC/Pro and CP coding regions, the transmissibility of 14 SMV isolates by Aphis glycines was tested. All field isolates of SMV were transmitted efficiently by A. glycines, but the laboratory isolates analyzed were transmitted poorly. The amino acid sequences from most, but not all, of the poorly transmitted isolates contained mutations in the aphid transmission-associated DAG and/or KLSC amino acid sequence motifs of CP and HC/Pro, respectively.
Wentz, Travis G.; Muruvanda, Tim; Thirunavukkarasu, Nagarajan; Hoffmann, Maria; Allard, Marc W.; Hodge, David R.; Pillai, Segaran P.; Hammack, Thomas S.; Brown, Eric W.
2017-01-01
ABSTRACT Clostridial neurotoxins, including botulinum and tetanus neurotoxins, are among the deadliest known bacterial toxins. Until recently, the horizontal mobility of this toxin gene family appeared to be limited to the genus Clostridium. We report here the closed genome sequence of Chryseobacterium piperi, a Gram-negative bacterium containing coding sequences with homology to clostridial neurotoxin family proteins. PMID:29192076
Schwientek, Patrick; Neshat, Armin; Kalinowski, Jörn; Klein, Andreas; Rückert, Christian; Schneiker-Bekel, Susanne; Wendler, Sergej; Stoye, Jens; Pühler, Alfred
2014-11-20
Actinoplanes sp. SE50/110 is the producer of the alpha-glucosidase inhibitor acarbose, which is an economically relevant and potent drug in the treatment of type-2 diabetes mellitus. In this study, we present the detection of transcription start sites on this genome by sequencing enriched 5'-ends of primary transcripts. Altogether, 1427 putative transcription start sites were initially identified. With help of the annotated genome sequence, 661 transcription start sites were found to belong to the leader region of protein-coding genes with the surprising result that roughly 20% of these genes rank among the class of leaderless transcripts. Next, conserved promoter motifs were identified for protein-coding genes with and without leader sequences. The mapped transcription start sites were finally used to improve the annotation of the Actinoplanes sp. SE50/110 genome sequence. Concerning protein-coding genes, 41 translation start sites were corrected and 9 novel protein-coding genes could be identified. In addition to this, 122 previously undetermined non-coding RNA (ncRNA) genes of Actinoplanes sp. SE50/110 were defined. Focusing on antisense transcription start sites located within coding genes or their leader sequences, it was discovered that 96 of those ncRNA genes belong to the class of antisense RNA (asRNA) genes. The remaining 26 ncRNA genes were found outside of known protein-coding genes. Four chosen examples of prominent ncRNA genes, namely the transfer messenger RNA gene ssrA, the ribonuclease P class A RNA gene rnpB, the cobalamin riboswitch RNA gene cobRS, and the selenocysteine-specific tRNA gene selC, are presented in more detail. This study demonstrates that sequencing of enriched 5'-ends of primary transcripts and the identification of transcription start sites are valuable tools for advanced genome annotation of Actinoplanes sp. SE50/110 and most probably also for other bacteria. Copyright © 2014 Elsevier B.V. All rights reserved.
Rouanet, Carine; Reverchon, Sylvie; Rodionov, Dmitry A; Nasser, William
2004-07-16
In Erwinia chrysanthemi, production of pectic enzymes is modulated by a complex network involving several regulators. One of them, PecS, which belongs to the MarR family, also controls the synthesis of various other virulence factors, such as cellulases and indigoidine. Here, the PecS consensus-binding site is defined by combining a systematic evolution of ligands by an exponential enrichment approach and mutational analyses. The consensus consists of a 23-base pair palindromic-like sequence (C(-11)G(-10)A(-9)N(-8)W(-7)T(-6)C(-5)G(-4)T(-3)A(-2))T(-1)A(0)T(1)(T(2)A(3)C(4)G(5)A(6)N(7)N(8)N(9)C(10)G(11)). Mutational experiments revealed that (i) the palindromic organization is required for the binding of PecS, (ii) the very conserved part of the consensus (-6 to 6) allows for a specific interaction with PecS, but the presence of the relatively degenerated bases located apart significantly increases PecS affinity, (iii) the four bases G, A, T, and C are required for efficient binding of PecS, and (iv) the presence of several binding sites on the same promoter increases the affinity of PecS. This consensus is detected in the regions involved in PecS binding on the previously characterized target genes. This variable consensus is in agreement with the observation that the members of the MarR family are able to bind various DNA targets as dimers by means of a winged helix DNA-binding motif. Binding of PecS on a promoter region containing the defined consensus results in a repression of gene transcription in vitro. Preliminary scanning of the E. chrysanthemi genome sequence with the consensus revealed the presence of strong PecS-binding sites in the intergenic region between fliE and fliFGHIJKLMNOPQR which encode proteins involved in the biogenesis of flagellum. Accordingly, PecS directly represses fliE expression. Thus, PecS seems to control the synthesis of virulence factors required for the key steps of plant infection.
Motion Detection in Ultrasound Image-Sequences Using Tensor Voting
NASA Astrophysics Data System (ADS)
Inba, Masafumi; Yanagida, Hirotaka; Tamura, Yasutaka
2008-05-01
Motion detection in ultrasound image sequences using tensor voting is described. We have been developing an ultrasound imaging system adopting a combination of coded excitation and synthetic aperture focusing techniques. In our method, frame rate of the system at distance of 150 mm reaches 5000 frame/s. Sparse array and short duration coded ultrasound signals are used for high-speed data acquisition. However, many artifacts appear in the reconstructed image sequences because of the incompleteness of the transmitted code. To reduce the artifacts, we have examined the application of tensor voting to the imaging method which adopts both coded excitation and synthetic aperture techniques. In this study, the basis of applying tensor voting and the motion detection method to ultrasound images is derived. It was confirmed that velocity detection and feature enhancement are possible using tensor voting in the time and space of simulated ultrasound three-dimensional image sequences.
Novel aromatic ring-hydroxylating dioxygenase genes from coastal marine sediments of Patagonia
Lozada, Mariana; Riva Mercadal, Juan P; Guerrero, Leandro D; Di Marzio, Walter D; Ferrero, Marcela A; Dionisi, Hebe M
2008-01-01
Background Polycyclic aromatic hydrocarbons (PAHs), widespread pollutants in the marine environment, can produce adverse effects in marine organisms and can be transferred to humans through seafood. Our knowledge of PAH-degrading bacterial populations in the marine environment is still very limited, and mainly originates from studies of cultured bacteria. In this work, genes coding catabolic enzymes from PAH-biodegradation pathways were characterized in coastal sediments of Patagonia with different levels of PAH contamination. Results Genes encoding for the catalytic alpha subunit of aromatic ring-hydroxylating dioxygenases (ARHDs) were amplified from intertidal sediment samples using two different primer sets. Products were cloned and screened by restriction fragment length polymorphism analysis. Clones representing each restriction pattern were selected in each library for sequencing. A total of 500 clones were screened in 9 gene libraries, and 193 clones were sequenced. Libraries contained one to five different ARHD gene types, and this number was correlated with the number of PAHs found in the samples above the quantification limit (r = 0.834, p < 0.05). Overall, eight different ARHD gene types were detected in the sediments. In five of them, their deduced amino acid sequences formed deeply rooted branches with previously described ARHD peptide sequences, exhibiting less than 70% identity to them. They contain consensus sequences of the Rieske type [2Fe-2S] cluster binding site, suggesting that these gene fragments encode for ARHDs. On the other hand, three gene types were closely related to previously described ARHDs: archetypical nahAc-like genes, phnAc-like genes as identified in Alcaligenes faecalis AFK2, and phnA1-like genes from marine PAH-degraders from the genus Cycloclasticus. Conclusion These results show the presence of hitherto unidentified ARHD genes in this sub-Antarctic marine environment exposed to anthropogenic contamination. This information can be used to study the geographical distribution and ecological significance of bacterial populations carrying these genes, and to design molecular assays to monitor the progress and effectiveness of remediation technologies. PMID:18366740
Urvoas, Agathe; Guellouz, Asma; Valerio-Lepiniec, Marie; Graille, Marc; Durand, Dominique; Desravines, Danielle C; van Tilbeurgh, Herman; Desmadril, Michel; Minard, Philippe
2010-11-26
Repeat proteins have a modular organization and a regular architecture that make them attractive models for design and directed evolution experiments. HEAT repeat proteins, although very common, have not been used as a scaffold for artificial proteins, probably because they are made of long and irregular repeats. Here, we present and validate a consensus sequence for artificial HEAT repeat proteins. The sequence was defined from the structure-based sequence analysis of a thermostable HEAT-like repeat protein. Appropriate sequences were identified for the N- and C-caps. A library of genes coding for artificial proteins based on this sequence design, named αRep, was assembled using new and versatile methodology based on circular amplification. Proteins picked randomly from this library are expressed as soluble proteins. The biophysical properties of proteins with different numbers of repeats and different combinations of side chains in hypervariable positions were characterized. Circular dichroism and differential scanning calorimetry experiments showed that all these proteins are folded cooperatively and are very stable (T(m) >70 °C). Stability of these proteins increases with the number of repeats. Detailed gel filtration and small-angle X-ray scattering studies showed that the purified proteins form either monomers or dimers. The X-ray structure of a stable dimeric variant structure was solved. The protein is folded with a highly regular topology and the repeat structure is organized, as expected, as pairs of alpha helices. In this protein variant, the dimerization interface results directly from the variable surface enriched in aromatic residues located in the randomized positions of the repeats. The dimer was crystallized both in an apo and in a PEG-bound form, revealing a very well defined binding crevice and some structure flexibility at the interface. This fortuitous binding site could later prove to be a useful binding site for other low molecular mass partners. Copyright © 2010 Elsevier Ltd. All rights reserved.
Martínez, Lidia Mayorga; Orozco, Aurea; Villalobos, Patricia; Valverde-R, Carlos
2008-05-01
Thyroid hormone bioactivity is finely regulated at the cellular level by the peripheral iodothyronine deiodinases (D). The study of thyroid function in fish has been restricted mainly to teleosts, whereas the study and characterization of Ds have been overlooked in chondrichthyes. Here we report the cloning and operational characterization of both the native and the recombinant hepatic type 3 iodothyronine deiodinase in the tropical shark Chiloscyllium punctatum. Native and recombinant sD3 show identical catalytic activities: a strong preference for T3-inner-ring deiodination, a requirement for a high concentration of DTT, a sequential reaction mechanism, and resistance to PTU inhibition. The cloned cDNA contains 1298 nucleotides [excluding the poly(A) tail] and encodes a predicted protein of 259 amino acids. The triplet TGA coding for selenocysteine (Sec) is at position 123. The consensus selenocysteine insertion sequence (SECIS) was identified 228 bp upstream of the poly(A) tail and corresponds to form 2. The deduced amino acid sequence was 77% and 72% identical to other D3 cDNAs in fishes and other vertebrates, respectively. As in the case of other piscivore teleost species, shark expresses hepatic D3 through adulthood. This characteristic may be associated with the alimentary strategy in which the protection from an exogenous overload of thyroid hormones could be of physiological importance for thyroidal homeostasis.
Molecular characteristic and physiological role of DOPA-decarboxylase.
Guenter, Joanna; Lenartowski, Robert
2016-12-31
The enzyme DOPA decarboxylase (aromatic-L-amino-acid decarboxylase, DDC) plays an important role in the dopaminergic system and participates in the uptake and decarboxylation of amine precursors in the peripheral tissues. Apart from catecholamines, DDC catalyses the biosynthesis of serotonin and trace amines. It has been shown that the DDC amino acid sequence is highly evolutionarily conserved across many species. The activity of holoenzyme is regulated by stimulation/blockade of membrane receptors, phosphorylation of serine residues, and DDC interaction with regulatory proteins. A single gene codes for DDC both in neuronal and non-neuronal tissue, but synthesized isoforms of mRNA differ in the 5' UTR and in the presence of alternative exons. Tissue-specific expression of the DDC gene is controlled by two spatially distinct promoters - neuronal and non-neuronal. Several consensus sequences recognized by the HNF and POU family proteins have been mapped in the neuronal DDC promoter. Since DDC is located close to the imprinted gene cluster, its expression can be subjected to tightly controlled epigenetic regulation. Perturbations in DDC expression result in a range of neurodegenerative and psychiatric disorders and correlate with neoplasia. Apart from the above issues, the role of DDC in prostate cancer, bipolar affective disorder, Parkinson's disease and DDC deficiency is discussed in our review. Moreover, novel and prospective clinical treatments based on gene therapy and stem cells for the diseases mentioned above are described.
Favipiravir can evoke lethal mutagenesis and extinction of foot-and-mouth disease virus.
de Avila, Ana Isabel; Moreno, Elena; Perales, Celia; Domingo, Esteban
2017-04-02
Antiviral agents are increasingly considered an option for veterinary medicine. An understanding of their mechanism of activity is important to plan their administration either as monotherapy or in combination with other agents. Previous studies have shown that the broad spectrum antiviral agent favipiravir (T-705) and its derivatives T-1105 and T-1106 are efficient inhibitors of foot-and-mouth disease virus (FMDV) replication in cell culture and in vivo. However, no mechanism for their activity against FMDV has been proposed. In the present study we show that favipiravir (T-705) can act as a lethal mutagen for FMDV in cell culture. Evidence includes virus extinction associated with increase in mutation frequency in the mutant spectrum of 860 residues of the 3D (polymerase)-coding region, and a decrease of specific infectivity while the consensus nucleotide sequence of the region analyzed remained invariant. The mutational spectrum evoked by favipiravir differs from that observed with other viruses in that no predominant transition type is observed, indicating that a movement towards A,U- or G,C-rich regions of sequence space is not a prerequisite for virus extinction. We discuss prospects for the use of favipiravir to assist in the control of FMDV, and its possible broader use in veterinary medicine as an extension of its current status as antiviral agent for human influenza virus. Copyright © 2017 Elsevier B.V. All rights reserved.
Li, You-Hai; Han, Wen-Jin; Gui, Xi-Wu; Wei, Tao; Tang, Shuang-Yan; Jin, Jian-Ming
2016-08-02
Tentoxin, a cyclic tetrapeptide produced by several Alternaria species, inhibits the F₁-ATPase activity of chloroplasts, resulting in chlorosis in sensitive plants. In this study, we report two clustered genes, encoding a putative non-ribosome peptide synthetase (NRPS) TES and a cytochrome P450 protein TES1, that are required for tentoxin biosynthesis in Alternaria alternata strain ZJ33, which was isolated from blighted leaves of Eupatorium adenophorum. Using a pair of primers designed according to the consensus sequences of the adenylation domain of NRPSs, two fragments containing putative adenylation domains were amplified from A. alternata ZJ33, and subsequent PCR analyses demonstrated that these fragments belonged to the same NRPS coding sequence. With no introns, TES consists of a single 15,486 base pair open reading frame encoding a predicted 5161 amino acid protein. Meanwhile, the TES1 gene is predicted to contain five introns and encode a 506 amino acid protein. The TES protein is predicted to be comprised of four peptide synthase modules with two additional N-methylation domains, and the number and arrangement of the modules in TES were consistent with the number and arrangement of the amino acid residues of tentoxin, respectively. Notably, both TES and TES1 null mutants generated via homologous recombination failed to produce tentoxin. This study provides the first evidence concerning the biosynthesis of tentoxin in A. alternata.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schriner, J.E.; Yi, W.; Hofmann, S.L.
Palmitoyl-protein thioesterase (PPT) is a small glycoprotein that removes palmitate groups from cysteine residues in lipid-modified proteins. We recently reported mutations in PPT in patients with infantile neuronal ceroid lipofuscinosis (INCL), a severe neurodegenerative disorder. INCL is characterized by the accumulation of proteolipid storage material in brain and other tissues, suggesting that the disease is a consequence of abnormal catabolism of acylated proteins. In the current paper, we report the sequence of the human PPT cDNA and the structure of the human PPT gene. The cDNA predicts a protein of 306 amino acids that contains a 25-amino-acid signal peptide, threemore » N-linked glycosylation sites, and consensus motifs characteristic of thioesterases. Northern analysis of a human tissue blot revealed ubiquitous expression of a single 2.5-kb mRNA, with highest expression in lung, brain, and heart. The human PPT gene spans 25 kb and is composed of seven coding exons and a large eighth exon, containing the entire 3{prime}-untranslated region of 1388 bp. An Alu repeat and promoter elements corresponding to putative binding sites for several general transcription factors were identified in the 1060 nucleotides upstream of the transcription start site. The human PPT cDNA sequence and gene structure will provide the means for the identification of further causative mutations in INCL and facilitate genetic screening in selected high-risk populations. 31 refs., 5 figs., 1 tab.« less
Performance and limitations of administrative data in the identification of AKI.
Grams, Morgan E; Waikar, Sushrut S; MacMahon, Blaithin; Whelton, Seamus; Ballew, Shoshana H; Coresh, Josef
2014-04-01
Billing codes are frequently used to identify AKI events in epidemiologic research. The goals of this study were to validate billing code-identified AKI against the current AKI consensus definition and to ascertain whether sensitivity and specificity vary by patient characteristic or over time. The study population included 10,056 Atherosclerosis Risk in Communities study participants hospitalized between 1996 and 2008. Billing code-identified AKI was compared with the 2012 Kidney Disease Improving Global Outcomes (KDIGO) creatinine-based criteria (AKIcr) and an approximation of the 2012 KDIGO creatinine- and urine output-based criteria (AKIcr_uop) in a subset with available outpatient data. Sensitivity and specificity of billing code-identified AKI were evaluated over time and according to patient age, race, sex, diabetes status, and CKD status in 546 charts selected for review, with estimates adjusted for sampling technique. A total of 34,179 hospitalizations were identified; 1353 had a billing code for AKI. The sensitivity of billing code-identified AKI was 17.2% (95% confidence interval [95% CI], 13.2% to 21.2%) compared with AKIcr (n=1970 hospitalizations) and 11.7% (95% CI, 8.8% to 14.5%) compared with AKIcr_uop (n=1839 hospitalizations). Specificity was >98% in both cases. Sensitivity was significantly higher in the more recent time period (2002-2008) and among participants aged 65 years and older. Billing code-identified AKI captured a more severe spectrum of disease than did AKIcr and AKIcr_uop, with a larger proportion of patients with stage 3 AKI (34.9%, 19.7%, and 11.5%, respectively) and higher in-hospital mortality (41.2%, 18.7%, and 12.8%, respectively). The use of billing codes to identify AKI has low sensitivity compared with the current KDIGO consensus definition, especially when the urine output criterion is included, and results in the identification of a more severe phenotype. Epidemiologic studies using billing codes may benefit from a high specificity, but the variation in sensitivity may result in bias, particularly when trends over time are the outcome of interest.
The primary structure of the Saccharomyces cerevisiae gene for 3-phosphoglycerate kinase.
Hitzeman, R A; Hagie, F E; Hayflick, J S; Chen, C Y; Seeburg, P H; Derynck, R
1982-01-01
The DNA sequence of the gene for the yeast glycolytic enzyme, 3-phosphoglycerate kinase (PGK), has been obtained by sequencing part of a 3.1 kbp HindIII fragment obtained from the yeast genome. The structural gene sequence corresponds to a reading frame of 1251 bp coding for 416 amino acids with no intervening DNA sequences. The amino acid sequence is approximately 65 percent homologous with human and horse PGK protein sequences and is in general agreement with the published protein sequence for yeast PGK. As for other highly expressed structural genes in yeast, the coding sequence is highly codon biased with 95 percent of the amino acids coded for by a select 25 codons (out of 61 possible). Besides structural DNA sequence, 291 bp of 5'-flanking sequence and 286 bp of 3'-flanking sequence were determined. Transcription starts 36 nucleotides upstream from the translational start and stops 86-93 nucleotides downstream from the translational stop. These results suggest a non-polyadenylated mRNA length of 1373 to 1380 nucleotides, which is consistent with the observed length of 1500 nucleotides for polyadenylated PGK mRNA. A sequence TATATATAAA is found at 145 nucleotides upstream from the translational start. This sequence resembles the TATAAA box that is possibly associated with RNA polymerase II binding. Images PMID:6296791
Lenoir, A; Pélissier, T; Bousquet-Antonelli, C; Deragon, J M
2005-01-01
Brassica oleracea and Arabidopsis thaliana belong to the Brassicaceae(Cruciferae) family and diverged 16 to 19 million years ago. Although the genome size of B. oleracea (approximately 600 million base pairs) is more than four times that of A. thaliana (approximately 130 million base pairs), their gene content is believed to be very similar with more than 85% sequence identity in the coding region. Therefore, this important difference in genome size is likely to reflect a different rate of non-coding DNA accumulation. Transposable elements (TEs) constitute a major fraction of non-coding DNA in plant species. A different rate in TE accumulation between two closely related species can result in significant genome size variations in a short evolutionary period. Short interspersed elements (SINEs) are non-autonomous retroposons that have invaded the genome of most eukaryote species. Several SINE families are present in B. oleracea and A. thaliana and we found that two of them (called RathE1 and RathE2) are present in both species. In this study, the tempo of evolution of RathE1 and RathE2 SINE families in both species was compared. We observed that most B. oleracea RathE2 SINEs are "young" (close to the consensus sequence) and abundant while elements from this family are more degenerated and much less abundant in A. thaliana. However, the situation is different for the RathE1 SINE family for which the youngest elements are found in A. thaliana. Surprisingly, no SINE was found to occupy the same (orthologous) genomic locus in both species suggesting that either these SINE families were not amplified at a significant rate in the common ancestor of the two species or that older elements were lost and only the recent (lineage-specific) insertions remain. To test this latter hypothesis, loci containing a recently inserted SINE in the A. thaliana col-0 ecotype were selected and characterized in several other A. thaliana ecotypes. In addition to the expected SINE containing allele and the pre-integrative allele (i.e. the "empty" allele), we observed in the different ecotypes, alleles with truncated portions of the SINE (up to the complete loss of the element) and of the immediate genomic flanking sequences. The absence of SINEs in orthologous positions between B. oleracea and A. thaliana and the presence in recently diverged A. thaliana ecotypes of alleles containing severely truncated SINEs suggest a very high rate of SINE loss in these species.
Laqueyrerie, A; Militzer, P; Romain, F; Eiglmeier, K; Cole, S; Marchal, G
1995-10-01
Effective protection against a virulent challenge with Mycobacterium tuberculosis is induced mainly by previous immunization with living attenuated mycobacteria, and it has been hypothesized that secreted proteins serve as major targets in the specific immune response. To identify and purify molecules present in culture medium filtrate which are dominant antigens during effective vaccination, a two-step selection procedure was used to select antigens able to interact with T lymphocytes and/or antibodies induced by immunization with living bacteria and to counterselect antigens interacting with the immune effectors induced by immunization with dead bacteria. A Mycobacterium bovis BCG 45/47-kDa antigen complex, present in BCG culture filtrate, has been previously identified and isolated (F. Romain, A. Laqueyrerie, P. Militzer, P. Pescher, P. Chavarot, M. Lagranderie, G. Auregan, M. Gheorghiu, and G. Marchal, Infect. Immun. 61:742-750, 1993). Since the cognate antibodies recognize the very same antigens present in M. tuberculosis culture medium filtrates, a project was undertaken to clone, express, and sequence the corresponding gene of M. tuberculosis. An M. tuberculosis shuttle cosmid library was transferred in Mycobacterium smegmatis and screened with a competitive enzyme-linked immunosorbent assay to detect the clones expressing the proteins. A clone containing a 40-kb DNA insert was selected, and by means of subcloning in Escherichia coli, a 2-kb fragment that coded for the molecules was identified. An open reading frame in the 2,061-nucleotide sequence codes for a secreted protein with a consensus signal peptide of 39 amino acids and a predicted molecular mass of 28,779 Da. The gene was referred to as apa because of the high percentages of proline (21.7%) and alanine (19%) in the purified protein. Southern hybridization analysis of digested total genomic DNA from M. tuberculosis (reference strains H37Rv and H37Ra) indicated that the apa gene was present as a single copy on the genome. The N-terminal identity or homology of the M. tuberculosis and M. bovis BCG purified molecules and their similar global and deduced amino acid compositions demonstrated the perfect correspondence between the molecular and chemical analyses. The presence of a high percentage of proline (21.7%) was confirmed and explained the apparent higher molecular mass (45/47 kDa) determined by sodium dodecyl sulfate-polyacrylamide gel electrophoresis resulting from the increased rigidity of molecules due to proline residues.(ABSTRACT TRUNCATED AT 400 WORDS)
Optimal treatment sequence in COPD: Can a consensus be found?
Ferreira, J; Drummond, M; Pires, N; Reis, G; Alves, C; Robalo-Cordeiro, C
2016-01-01
There is currently no consensus on the treatment sequence in chronic obstructive pulmonary disease (COPD), although it is recognized that early diagnosis is of paramount importance to start treatment in the early stages of the disease. Although it is fairly consensual that initial treatment should be with an inhaled short-acting beta agonist, a short-acting muscarinic antagonist, a long-acting beta-agonist or a long-acting muscarinic antagonist. As the disease progresses, several therapeutic options are available, and which to choose at each disease stage remains controversial. When and in which patients to use dual bronchodilation? When to use inhaled corticosteroids? And triple therapy? Are the existing non-inhaled therapies, such as mucolytic agents, antibiotics, phosphodiesterase-4 inhibitors, methylxanthines and immunostimulating agents, useful? If so, which patients would benefit? Should co-morbidities be taken into account when choosing COPD therapy for a patient? This paper reviews current guidelines and available evidence and proposes a therapeutic scheme for COPD patients. We also propose a treatment algorithm in the hope that it will help physicians to decide the best approach for their patients. The authors conclude that, at present, a full consensus on optimal treatment sequence in COPD cannot be found, mainly due to disease heterogeneity and lack of biomarkers to guide treatment. For the time being, and although some therapeutic approaches are consensual, treatment of COPD should be patient-oriented. Copyright © 2015 Sociedade Portuguesa de Pneumologia. Published by Elsevier España, S.L.U. All rights reserved.
Sampled-data consensus in switching networks of integrators based on edge events
NASA Astrophysics Data System (ADS)
Xiao, Feng; Meng, Xiangyu; Chen, Tongwen
2015-02-01
This paper investigates the event-driven sampled-data consensus in switching networks of multiple integrators and studies both the bidirectional interaction and leader-following passive reaction topologies in a unified framework. In these topologies, each information link is modelled by an edge of the information graph and assigned a sequence of edge events, which activate the mutual data sampling and controller updates of the two linked agents. Two kinds of edge-event-detecting rules are proposed for the general asynchronous data-sampling case and the synchronous periodic event-detecting case. They are implemented in a distributed fashion, and their effectiveness in reducing communication costs and solving consensus problems under a jointly connected topology condition is shown by both theoretical analysis and simulation examples.
Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics
NASA Technical Reports Server (NTRS)
Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.
1995-01-01
We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) a n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior while the coding regions show logarithmic behavior over a wide interval, while (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis.
Buldyrev, S V; Goldberger, A L; Havlin, S; Mantegna, R N; Matsa, M E; Peng, C K; Simons, M; Stanley, H E
1995-05-01
An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to dtermine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10(-10). We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
Long-range correlation properties of coding and noncoding DNA sequences: GenBank analysis
NASA Technical Reports Server (NTRS)
Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Matsa, M. E.; Peng, C. K.; Simons, M.; Stanley, H. E.
1995-01-01
An open question in computational molecular biology is whether long-range correlations are present in both coding and noncoding DNA or only in the latter. To answer this question, we consider all 33301 coding and all 29453 noncoding eukaryotic sequences--each of length larger than 512 base pairs (bp)--in the present release of the GenBank to dtermine whether there is any statistically significant distinction in their long-range correlation properties. Standard fast Fourier transform (FFT) analysis indicates that coding sequences have practically no correlations in the range from 10 bp to 100 bp (spectral exponent beta=0.00 +/- 0.04, where the uncertainty is two standard deviations). In contrast, for noncoding sequences, the average value of the spectral exponent beta is positive (0.16 +/- 0.05) which unambiguously shows the presence of long-range correlations. We also separately analyze the 874 coding and the 1157 noncoding sequences that have more than 4096 bp and find a larger region of power-law behavior. We calculate the probability that these two data sets (coding and noncoding) were drawn from the same distribution and we find that it is less than 10(-10). We obtain independent confirmation of these findings using the method of detrended fluctuation analysis (DFA), which is designed to treat sequences with statistical heterogeneity, such as DNA's known mosaic structure ("patchiness") arising from the nonstationarity of nucleotide concentration. The near-perfect agreement between the two independent analysis methods, FFT and DFA, increases the confidence in the reliability of our conclusion.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Crooks, Gavin E.
WebLogo is a web based application designed to make the generation of sequence logos as easy and painless as possible. Sequesnce logos are a graphical representation of an amino acid or nucleic acid multiple sequence alignment developed by Tom Schneider and Mike Stephens. Each logo consists of stacks of symbols, one stack for each position in the sequence. The overall height of the stack indicates the sequence conservation at that position, while the height of symbols within the stack indicates the relative frequency of each amino or nucleic acid at that position. In general, a sequence logo provides a richermore » and more precise description of, for example, a binding site, than would a consensus sequence.« less
Predicting the transmembrane secondary structure of ligand-gated ion channels.
Bertaccini, E; Trudell, J R
2002-06-01
Recent mutational analyses of ligand-gated ion channels (LGICs) have demonstrated a plausible site of anesthetic action within their transmembrane domains. Although there is a consensus that the transmembrane domain is formed from four membrane-spanning segments, the secondary structure of these segments is not known. We utilized 10 state-of-the-art bioinformatics techniques to predict the transmembrane topology of the tetrameric regions within six members of the LGIC family that are relevant to anesthetic action. They are the human forms of the GABA alpha 1 receptor, the glycine alpha 1 receptor, the 5HT3 serotonin receptor, the nicotinic AChR alpha 4 and alpha 7 receptors and the Torpedo nAChR alpha 1 receptor. The algorithms utilized were HMMTOP, TMHMM, TMPred, PHDhtm, DAS, TMFinder, SOSUI, TMAP, MEMSAT and TOPPred2. The resulting predictions were superimposed on to a multiple sequence alignment of the six amino acid sequences created using the CLUSTAL W algorithm. There was a clear statistical consensus for the presence of four alpha helices in those regions experimentally thought to span the membrane. The consensus of 10 topology prediction techniques supports the hypothesis that the transmembrane subunits of the LGICs are tetrameric bundles of alpha helices.
Reliable Detection of Herpes Simplex Virus Sequence Variation by High-Throughput Resequencing.
Morse, Alison M; Calabro, Kaitlyn R; Fear, Justin M; Bloom, David C; McIntyre, Lauren M
2017-08-16
High-throughput sequencing (HTS) has resulted in data for a number of herpes simplex virus (HSV) laboratory strains and clinical isolates. The knowledge of these sequences has been critical for investigating viral pathogenicity. However, the assembly of complete herpesviral genomes, including HSV, is complicated due to the existence of large repeat regions and arrays of smaller reiterated sequences that are commonly found in these genomes. In addition, the inherent genetic variation in populations of isolates for viruses and other microorganisms presents an additional challenge to many existing HTS sequence assembly pipelines. Here, we evaluate two approaches for the identification of genetic variants in HSV1 strains using Illumina short read sequencing data. The first, a reference-based approach, identifies variants from reads aligned to a reference sequence and the second, a de novo assembly approach, identifies variants from reads aligned to de novo assembled consensus sequences. Of critical importance for both approaches is the reduction in the number of low complexity regions through the construction of a non-redundant reference genome. We compared variants identified in the two methods. Our results indicate that approximately 85% of variants are identified regardless of the approach. The reference-based approach to variant discovery captures an additional 15% representing variants divergent from the HSV1 reference possibly due to viral passage. Reference-based approaches are significantly less labor-intensive and identify variants across the genome where de novo assembly-based approaches are limited to regions where contigs have been successfully assembled. In addition, regions of poor quality assembly can lead to false variant identification in de novo consensus sequences. For viruses with a well-assembled reference genome, a reference-based approach is recommended.
Sequence data and association statistics from 12,940 type 2 diabetes cases and controls.
Flannick, Jason; Fuchsberger, Christian; Mahajan, Anubha; Teslovich, Tanya M; Agarwala, Vineeta; Gaulton, Kyle J; Caulkins, Lizz; Koesterer, Ryan; Ma, Clement; Moutsianas, Loukas; McCarthy, Davis J; Rivas, Manuel A; Perry, John R B; Sim, Xueling; Blackwell, Thomas W; Robertson, Neil R; Rayner, N William; Cingolani, Pablo; Locke, Adam E; Tajes, Juan Fernandez; Highland, Heather M; Dupuis, Josee; Chines, Peter S; Lindgren, Cecilia M; Hartl, Christopher; Jackson, Anne U; Chen, Han; Huyghe, Jeroen R; van de Bunt, Martijn; Pearson, Richard D; Kumar, Ashish; Müller-Nurasyid, Martina; Grarup, Niels; Stringham, Heather M; Gamazon, Eric R; Lee, Jaehoon; Chen, Yuhui; Scott, Robert A; Below, Jennifer E; Chen, Peng; Huang, Jinyan; Go, Min Jin; Stitzel, Michael L; Pasko, Dorota; Parker, Stephen C J; Varga, Tibor V; Green, Todd; Beer, Nicola L; Day-Williams, Aaron G; Ferreira, Teresa; Fingerlin, Tasha; Horikoshi, Momoko; Hu, Cheng; Huh, Iksoo; Ikram, Mohammad Kamran; Kim, Bong-Jo; Kim, Yongkang; Kim, Young Jin; Kwon, Min-Seok; Lee, Juyoung; Lee, Selyeong; Lin, Keng-Han; Maxwell, Taylor J; Nagai, Yoshihiko; Wang, Xu; Welch, Ryan P; Yoon, Joon; Zhang, Weihua; Barzilai, Nir; Voight, Benjamin F; Han, Bok-Ghee; Jenkinson, Christopher P; Kuulasmaa, Teemu; Kuusisto, Johanna; Manning, Alisa; Ng, Maggie C Y; Palmer, Nicholette D; Balkau, Beverley; Stančáková, Alena; Abboud, Hanna E; Boeing, Heiner; Giedraitis, Vilmantas; Prabhakaran, Dorairaj; Gottesman, Omri; Scott, James; Carey, Jason; Kwan, Phoenix; Grant, George; Smith, Joshua D; Neale, Benjamin M; Purcell, Shaun; Butterworth, Adam S; Howson, Joanna M M; Lee, Heung Man; Lu, Yingchang; Kwak, Soo-Heon; Zhao, Wei; Danesh, John; Lam, Vincent K L; Park, Kyong Soo; Saleheen, Danish; So, Wing Yee; Tam, Claudia H T; Afzal, Uzma; Aguilar, David; Arya, Rector; Aung, Tin; Chan, Edmund; Navarro, Carmen; Cheng, Ching-Yu; Palli, Domenico; Correa, Adolfo; Curran, Joanne E; Rybin, Dennis; Farook, Vidya S; Fowler, Sharon P; Freedman, Barry I; Griswold, Michael; Hale, Daniel Esten; Hicks, Pamela J; Khor, Chiea-Chuen; Kumar, Satish; Lehne, Benjamin; Thuillier, Dorothée; Lim, Wei Yen; Liu, Jianjun; Loh, Marie; Musani, Solomon K; Puppala, Sobha; Scott, William R; Yengo, Loïc; Tan, Sian-Tsung; Taylor, Herman A; Thameem, Farook; Wilson, Gregory; Wong, Tien Yin; Njølstad, Pål Rasmus; Levy, Jonathan C; Mangino, Massimo; Bonnycastle, Lori L; Schwarzmayr, Thomas; Fadista, João; Surdulescu, Gabriela L; Herder, Christian; Groves, Christopher J; Wieland, Thomas; Bork-Jensen, Jette; Brandslund, Ivan; Christensen, Cramer; Koistinen, Heikki A; Doney, Alex S F; Kinnunen, Leena; Esko, Tõnu; Farmer, Andrew J; Hakaste, Liisa; Hodgkiss, Dylan; Kravic, Jasmina; Lyssenko, Valeri; Hollensted, Mette; Jørgensen, Marit E; Jørgensen, Torben; Ladenvall, Claes; Justesen, Johanne Marie; Käräjämäki, Annemari; Kriebel, Jennifer; Rathmann, Wolfgang; Lannfelt, Lars; Lauritzen, Torsten; Narisu, Narisu; Linneberg, Allan; Melander, Olle; Milani, Lili; Neville, Matt; Orho-Melander, Marju; Qi, Lu; Qi, Qibin; Roden, Michael; Rolandsson, Olov; Swift, Amy; Rosengren, Anders H; Stirrups, Kathleen; Wood, Andrew R; Mihailov, Evelin; Blancher, Christine; Carneiro, Mauricio O; Maguire, Jared; Poplin, Ryan; Shakir, Khalid; Fennell, Timothy; DePristo, Mark; de Angelis, Martin Hrabé; Deloukas, Panos; Gjesing, Anette P; Jun, Goo; Nilsson, Peter; Murphy, Jacquelyn; Onofrio, Robert; Thorand, Barbara; Hansen, Torben; Meisinger, Christa; Hu, Frank B; Isomaa, Bo; Karpe, Fredrik; Liang, Liming; Peters, Annette; Huth, Cornelia; O'Rahilly, Stephen P; Palmer, Colin N A; Pedersen, Oluf; Rauramaa, Rainer; Tuomilehto, Jaakko; Salomaa, Veikko; Watanabe, Richard M; Syvänen, Ann-Christine; Bergman, Richard N; Bharadwaj, Dwaipayan; Bottinger, Erwin P; Cho, Yoon Shin; Chandak, Giriraj R; Chan, Juliana Cn; Chia, Kee Seng; Daly, Mark J; Ebrahim, Shah B; Langenberg, Claudia; Elliott, Paul; Jablonski, Kathleen A; Lehman, Donna M; Jia, Weiping; Ma, Ronald C W; Pollin, Toni I; Sandhu, Manjinder; Tandon, Nikhil; Froguel, Philippe; Barroso, Inês; Teo, Yik Ying; Zeggini, Eleftheria; Loos, Ruth J F; Small, Kerrin S; Ried, Janina S; DeFronzo, Ralph A; Grallert, Harald; Glaser, Benjamin; Metspalu, Andres; Wareham, Nicholas J; Walker, Mark; Banks, Eric; Gieger, Christian; Ingelsson, Erik; Im, Hae Kyung; Illig, Thomas; Franks, Paul W; Buck, Gemma; Trakalo, Joseph; Buck, David; Prokopenko, Inga; Mägi, Reedik; Lind, Lars; Farjoun, Yossi; Owen, Katharine R; Gloyn, Anna L; Strauch, Konstantin; Tuomi, Tiinamaija; Kooner, Jaspal Singh; Lee, Jong-Young; Park, Taesung; Donnelly, Peter; Morris, Andrew D; Hattersley, Andrew T; Bowden, Donald W; Collins, Francis S; Atzmon, Gil; Chambers, John C; Spector, Timothy D; Laakso, Markku; Strom, Tim M; Bell, Graeme I; Blangero, John; Duggirala, Ravindranath; Tai, E Shyong; McVean, Gilean; Hanis, Craig L; Wilson, James G; Seielstad, Mark; Frayling, Timothy M; Meigs, James B; Cox, Nancy J; Sladek, Rob; Lander, Eric S; Gabriel, Stacey; Mohlke, Karen L; Meitinger, Thomas; Groop, Leif; Abecasis, Goncalo; Scott, Laura J; Morris, Andrew P; Kang, Hyun Min; Altshuler, David; Burtt, Noël P; Florez, Jose C; Boehnke, Michael; McCarthy, Mark I
2017-12-19
To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1-5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D.
Sequence data and association statistics from 12,940 type 2 diabetes cases and controls
Jason, Flannick; Fuchsberger, Christian; Mahajan, Anubha; Teslovich, Tanya M.; Agarwala, Vineeta; Gaulton, Kyle J.; Caulkins, Lizz; Koesterer, Ryan; Ma, Clement; Moutsianas, Loukas; McCarthy, Davis J.; Rivas, Manuel A.; Perry, John R. B.; Sim, Xueling; Blackwell, Thomas W.; Robertson, Neil R.; Rayner, N William; Cingolani, Pablo; Locke, Adam E.; Tajes, Juan Fernandez; Highland, Heather M.; Dupuis, Josee; Chines, Peter S.; Lindgren, Cecilia M.; Hartl, Christopher; Jackson, Anne U.; Chen, Han; Huyghe, Jeroen R.; van de Bunt, Martijn; Pearson, Richard D.; Kumar, Ashish; Müller-Nurasyid, Martina; Grarup, Niels; Stringham, Heather M.; Gamazon, Eric R.; Lee, Jaehoon; Chen, Yuhui; Scott, Robert A.; Below, Jennifer E.; Chen, Peng; Huang, Jinyan; Go, Min Jin; Stitzel, Michael L.; Pasko, Dorota; Parker, Stephen C. J.; Varga, Tibor V.; Green, Todd; Beer, Nicola L.; Day-Williams, Aaron G.; Ferreira, Teresa; Fingerlin, Tasha; Horikoshi, Momoko; Hu, Cheng; Huh, Iksoo; Ikram, Mohammad Kamran; Kim, Bong-Jo; Kim, Yongkang; Kim, Young Jin; Kwon, Min-Seok; Lee, Juyoung; Lee, Selyeong; Lin, Keng-Han; Maxwell, Taylor J.; Nagai, Yoshihiko; Wang, Xu; Welch, Ryan P.; Yoon, Joon; Zhang, Weihua; Barzilai, Nir; Voight, Benjamin F.; Han, Bok-Ghee; Jenkinson, Christopher P.; Kuulasmaa, Teemu; Kuusisto, Johanna; Manning, Alisa; Ng, Maggie C. Y.; Palmer, Nicholette D.; Balkau, Beverley; Stančáková, Alena; Abboud, Hanna E.; Boeing, Heiner; Giedraitis, Vilmantas; Prabhakaran, Dorairaj; Gottesman, Omri; Scott, James; Carey, Jason; Kwan, Phoenix; Grant, George; Smith, Joshua D.; Neale, Benjamin M.; Purcell, Shaun; Butterworth, Adam S.; Howson, Joanna M. M.; Lee, Heung Man; Lu, Yingchang; Kwak, Soo-Heon; Zhao, Wei; Danesh, John; Lam, Vincent K. L.; Park, Kyong Soo; Saleheen, Danish; So, Wing Yee; Tam, Claudia H. T.; Afzal, Uzma; Aguilar, David; Arya, Rector; Aung, Tin; Chan, Edmund; Navarro, Carmen; Cheng, Ching-Yu; Palli, Domenico; Correa, Adolfo; Curran, Joanne E.; Rybin, Dennis; Farook, Vidya S.; Fowler, Sharon P.; Freedman, Barry I.; Griswold, Michael; Hale, Daniel Esten; Hicks, Pamela J.; Khor, Chiea-Chuen; Kumar, Satish; Lehne, Benjamin; Thuillier, Dorothée; Lim, Wei Yen; Liu, Jianjun; Loh, Marie; Musani, Solomon K.; Puppala, Sobha; Scott, William R.; Yengo, Loïc; Tan, Sian-Tsung; Taylor, Herman A.; Thameem, Farook; Wilson, Gregory; Wong, Tien Yin; Njølstad, Pål Rasmus; Levy, Jonathan C.; Mangino, Massimo; Bonnycastle, Lori L.; Schwarzmayr, Thomas; Fadista, João; Surdulescu, Gabriela L.; Herder, Christian; Groves, Christopher J.; Wieland, Thomas; Bork-Jensen, Jette; Brandslund, Ivan; Christensen, Cramer; Koistinen, Heikki A.; Doney, Alex S. F.; Kinnunen, Leena; Esko, Tõnu; Farmer, Andrew J.; Hakaste, Liisa; Hodgkiss, Dylan; Kravic, Jasmina; Lyssenko, Valeri; Hollensted, Mette; Jørgensen, Marit E.; Jørgensen, Torben; Ladenvall, Claes; Justesen, Johanne Marie; Käräjämäki, Annemari; Kriebel, Jennifer; Rathmann, Wolfgang; Lannfelt, Lars; Lauritzen, Torsten; Narisu, Narisu; Linneberg, Allan; Melander, Olle; Milani, Lili; Neville, Matt; Orho-Melander, Marju; Qi, Lu; Qi, Qibin; Roden, Michael; Rolandsson, Olov; Swift, Amy; Rosengren, Anders H.; Stirrups, Kathleen; Wood, Andrew R.; Mihailov, Evelin; Blancher, Christine; Carneiro, Mauricio O.; Maguire, Jared; Poplin, Ryan; Shakir, Khalid; Fennell, Timothy; DePristo, Mark; de Angelis, Martin Hrabé; Deloukas, Panos; Gjesing, Anette P.; Jun, Goo; Nilsson, Peter; Murphy, Jacquelyn; Onofrio, Robert; Thorand, Barbara; Hansen, Torben; Meisinger, Christa; Hu, Frank B.; Isomaa, Bo; Karpe, Fredrik; Liang, Liming; Peters, Annette; Huth, Cornelia; O'Rahilly, Stephen P; Palmer, Colin N. A.; Pedersen, Oluf; Rauramaa, Rainer; Tuomilehto, Jaakko; Salomaa, Veikko; Watanabe, Richard M.; Syvänen, Ann-Christine; Bergman, Richard N.; Bharadwaj, Dwaipayan; Bottinger, Erwin P.; Cho, Yoon Shin; Chandak, Giriraj R.; Chan, Juliana CN; Chia, Kee Seng; Daly, Mark J.; Ebrahim, Shah B.; Langenberg, Claudia; Elliott, Paul; Jablonski, Kathleen A.; Lehman, Donna M.; Jia, Weiping; Ma, Ronald C. W.; Pollin, Toni I.; Sandhu, Manjinder; Tandon, Nikhil; Froguel, Philippe; Barroso, Inês; Teo, Yik Ying; Zeggini, Eleftheria; Loos, Ruth J. F.; Small, Kerrin S.; Ried, Janina S.; DeFronzo, Ralph A.; Grallert, Harald; Glaser, Benjamin; Metspalu, Andres; Wareham, Nicholas J.; Walker, Mark; Banks, Eric; Gieger, Christian; Ingelsson, Erik; Im, Hae Kyung; Illig, Thomas; Franks, Paul W.; Buck, Gemma; Trakalo, Joseph; Buck, David; Prokopenko, Inga; Mägi, Reedik; Lind, Lars; Farjoun, Yossi; Owen, Katharine R.; Gloyn, Anna L.; Strauch, Konstantin; Tuomi, Tiinamaija; Kooner, Jaspal Singh; Lee, Jong-Young; Park, Taesung; Donnelly, Peter; Morris, Andrew D.; Hattersley, Andrew T.; Bowden, Donald W.; Collins, Francis S.; Atzmon, Gil; Chambers, John C.; Spector, Timothy D.; Laakso, Markku; Strom, Tim M.; Bell, Graeme I.; Blangero, John; Duggirala, Ravindranath; Tai, E. Shyong; McVean, Gilean; Hanis, Craig L.; Wilson, James G.; Seielstad, Mark; Frayling, Timothy M.; Meigs, James B.; Cox, Nancy J.; Sladek, Rob; Lander, Eric S.; Gabriel, Stacey; Mohlke, Karen L.; Meitinger, Thomas; Groop, Leif; Abecasis, Goncalo; Scott, Laura J.; Morris, Andrew P.; Kang, Hyun Min; Altshuler, David; Burtt, Noël P.; Florez, Jose C.; Boehnke, Michael; McCarthy, Mark I.
2017-01-01
To investigate the genetic basis of type 2 diabetes (T2D) to high resolution, the GoT2D and T2D-GENES consortia catalogued variation from whole-genome sequencing of 2,657 European individuals and exome sequencing of 12,940 individuals of multiple ancestries. Over 27M SNPs, indels, and structural variants were identified, including 99% of low-frequency (minor allele frequency [MAF] 0.1–5%) non-coding variants in the whole-genome sequenced individuals and 99.7% of low-frequency coding variants in the whole-exome sequenced individuals. Each variant was tested for association with T2D in the sequenced individuals, and, to increase power, most were tested in larger numbers of individuals (>80% of low-frequency coding variants in ~82 K Europeans via the exome chip, and ~90% of low-frequency non-coding variants in ~44 K Europeans via genotype imputation). The variants, genotypes, and association statistics from these analyses provide the largest reference to date of human genetic information relevant to T2D, for use in activities such as T2D-focused genotype imputation, functional characterization of variants or genes, and other novel analyses to detect associations between sequence variation and T2D. PMID:29257133
Not All Order Memory Is Equal: Test Demands Reveal Dissociations in Memory for Sequence Information
ERIC Educational Resources Information Center
Jonker, Tanya R.; MacLeod, Colin M.
2017-01-01
Remembering the order of a sequence of events is a fundamental feature of episodic memory. Indeed, a number of formal models represent temporal context as part of the memory system, and memory for order has been researched extensively. Yet, the nature of the code(s) underlying sequence memory is still relatively unknown. Across 4 experiments that…
Hu, Bo; Liu, Dong-Xing; Zhang, Yu-Qing; Song, Jian-Tao; Ji, Xian-Fei; Hou, Zhi-Qiang; Zhang, Zhen-Hai
2016-05-01
In this study we sequenced the complete mitochondrial genome sequencing of a heart failure model of cardiomyopathic Syrian hamster (Mesocricetus auratus) for the first time. The total length of the mitogenome was 16,267 bp. It harbored 13 protein-coding genes, 2 ribosomal RNA genes, 22 transfer RNA genes and 1 non-coding control region.
ERIC Educational Resources Information Center
Lucero, Edgar
2011-01-01
This article focuses on the learner's use of Code-switching to learn the TL (Target Language) equivalent of an L1 word. The interactional pattern that this situation creates defines the Request-Provision-Acknowledgement (RPA) sequence. The article explains each of the turns of the sequence under the combination of the Ethnomethodological…
Quantitative analysis of the anti-noise performance of an m-sequence in an electromagnetic method
NASA Astrophysics Data System (ADS)
Yuan, Zhe; Zhang, Yiming; Zheng, Qijia
2018-02-01
An electromagnetic method with a transmitted waveform coded by an m-sequence achieved better anti-noise performance compared to the conventional manner with a square-wave. The anti-noise performance of the m-sequence varied with multiple coding parameters; hence, a quantitative analysis of the anti-noise performance for m-sequences with different coding parameters was required to optimize them. This paper proposes the concept of an identification system, with the identified Earth impulse response obtained by measuring the system output with the input of the voltage response. A quantitative analysis of the anti-noise performance of the m-sequence was achieved by analyzing the amplitude-frequency response of the corresponding identification system. The effects of the coding parameters on the anti-noise performance are summarized by numerical simulation, and their optimization is further discussed in our conclusions; the validity of the conclusions is further verified by field experiment. The quantitative analysis method proposed in this paper provides a new insight into the anti-noise mechanism of the m-sequence, and could be used to evaluate the anti-noise performance of artificial sources in other time-domain exploration methods, such as the seismic method.
Wright, Imogen A; Travers, Simon A
2014-07-01
The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Genomics dataset on unclassified published organism (patent US 7547531).
Khan Shawan, Mohammad Mahfuz Ali; Hasan, Md Ashraful; Hossain, Md Mozammel; Hasan, Md Mahmudul; Parvin, Afroza; Akter, Salina; Uddin, Kazi Rasel; Banik, Subrata; Morshed, Mahbubul; Rahman, Md Nazibur; Rahman, S M Badier
2016-12-01
Nucleotide (DNA) sequence analysis provides important clues regarding the characteristics and taxonomic position of an organism. With the intention that, DNA sequence analysis is very crucial to learn about hierarchical classification of that particular organism. This dataset (patent US 7547531) is chosen to simplify all the complex raw data buried in undisclosed DNA sequences which help to open doors for new collaborations. In this data, a total of 48 unidentified DNA sequences from patent US 7547531 were selected and their complete sequences were retrieved from NCBI BioSample database. Quick response (QR) code of those DNA sequences was constructed by DNA BarID tool. QR code is useful for the identification and comparison of isolates with other organisms. AT/GC content of the DNA sequences was determined using ENDMEMO GC Content Calculator, which indicates their stability at different temperature. The highest GC content was observed in GP445188 (62.5%) which was followed by GP445198 (61.8%) and GP445189 (59.44%), while lowest was in GP445178 (24.39%). In addition, New England BioLabs (NEB) database was used to identify cleavage code indicating the 5, 3 and blunt end and enzyme code indicating the methylation site of the DNA sequences was also shown. These data will be helpful for the construction of the organisms' hierarchical classification, determination of their phylogenetic and taxonomic position and revelation of their molecular characteristics.
Papandreou, Nikos C.; Iconomidou, Vassiliki A.; Willis, Judith H.; Hamodrakas, Stavros J.
2010-01-01
The physical properties of cuticle are determined by the structure of its two major components, cuticular proteins (CPs) and chitin, and, also, by their interactions. A common consensus region (extended R&R Consensus) found in the majority of cuticular proteins, the CPRs, binds to chitin. Previous work established that β-pleated sheet predominates in the Consensus region and we proposed that it is responsible for the formation of helicoidal cuticle. Remote sequence similarity between CPRs and a lipocalin, bovine plasma retinol binding protein (RBP), led us to suggest an antiparallel β-sheet half-barrel structure as the basic folding motif of the R&R Consensus. There are several other families of cuticular proteins. One of the best defined is CPF. Its four members in Anopheles gambiae are expressed during the early stages of either pharate pupal or pharate adult development, suggesting that the proteins contribute to the outer regions of the cuticle, the epi- and/or exocuticle. These proteins did not bind to chitin in the same assay used successfully for CPRs. Although CPFs are distinct in sequence from CPRs, the same lipocalin could also be used to derive homology models for one Anopheles gambiae and one Drosophila melanogaster CPF. For the CPFs, the basic folding motif predicted is an eight-stranded, antiparallel β-sheet, full-barrel structure. Possible implications of this structure are discussed and docking experiments were carried out with one possible Drosophila ligand, 7(Z), 11(Z)-heptacosadiene. PMID:20417215
RNAcode: Robust discrimination of coding and noncoding regions in comparative sequence data
Washietl, Stefan; Findeiß, Sven; Müller, Stephan A.; Kalkhof, Stefan; von Bergen, Martin; Hofacker, Ivo L.; Stadler, Peter F.; Goldman, Nick
2011-01-01
With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions arose as a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied “out of the box,” without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as “noncoding.” RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode. PMID:21357752
RNAcode: robust discrimination of coding and noncoding regions in comparative sequence data.
Washietl, Stefan; Findeiss, Sven; Müller, Stephan A; Kalkhof, Stefan; von Bergen, Martin; Hofacker, Ivo L; Stadler, Peter F; Goldman, Nick
2011-04-01
With the availability of genome-wide transcription data and massive comparative sequencing, the discrimination of coding from noncoding RNAs and the assessment of coding potential in evolutionarily conserved regions arose as a core analysis task. Here we present RNAcode, a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene-finding software. Our algorithm combines information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied "out of the box," without any training, to data from all domains of life. We describe the RNAcode method and apply it in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in Escherichia coli and to analyze the coding potential of RNAs previously annotated as "noncoding." RNAcode is open source software and available for all major platforms at http://wash.github.com/rnacode.
High compression image and image sequence coding
NASA Technical Reports Server (NTRS)
Kunt, Murat
1989-01-01
The digital representation of an image requires a very large number of bits. This number is even larger for an image sequence. The goal of image coding is to reduce this number, as much as possible, and reconstruct a faithful duplicate of the original picture or image sequence. Early efforts in image coding, solely guided by information theory, led to a plethora of methods. The compression ratio reached a plateau around 10:1 a couple of years ago. Recent progress in the study of the brain mechanism of vision and scene analysis has opened new vistas in picture coding. Directional sensitivity of the neurones in the visual pathway combined with the separate processing of contours and textures has led to a new class of coding methods capable of achieving compression ratios as high as 100:1 for images and around 300:1 for image sequences. Recent progress on some of the main avenues of object-based methods is presented. These second generation techniques make use of contour-texture modeling, new results in neurophysiology and psychophysics and scene analysis.
Probabilistic consensus scoring improves tandem mass spectrometry peptide identification.
Nahnsen, Sven; Bertsch, Andreas; Rahnenführer, Jörg; Nordheim, Alfred; Kohlbacher, Oliver
2011-08-05
Database search is a standard technique for identifying peptides from their tandem mass spectra. To increase the number of correctly identified peptides, we suggest a probabilistic framework that allows the combination of scores from different search engines into a joint consensus score. Central to the approach is a novel method to estimate scores for peptides not found by an individual search engine. This approach allows the estimation of p-values for each candidate peptide and their combination across all search engines. The consensus approach works better than any single search engine across all different instrument types considered in this study. Improvements vary strongly from platform to platform and from search engine to search engine. Compared to the industry standard MASCOT, our approach can identify up to 60% more peptides. The software for consensus predictions is implemented in C++ as part of OpenMS, a software framework for mass spectrometry. The source code is available in the current development version of OpenMS and can easily be used as a command line application or via a graphical pipeline designer TOPPAS.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hathaway, M.D.; Wood, J.R.
1997-10-01
CFD codes capable of utilizing multi-block grids provide the capability to analyze the complete geometry of centrifugal compressors. Attendant with this increased capability is potentially increased grid setup time and more computational overhead with the resultant increase in wall clock time to obtain a solution. If the increase in difficulty of obtaining a solution significantly improves the solution from that obtained by modeling the features of the tip clearance flow or the typical bluntness of a centrifugal compressor`s trailing edge, then the additional burden is worthwhile. However, if the additional information obtained is of marginal use, then modeling of certainmore » features of the geometry may provide reasonable solutions for designers to make comparative choices when pursuing a new design. In this spirit a sequence of grids were generated to study the relative importance of modeling versus detailed gridding of the tip gap and blunt trailing edge regions of the NASA large low-speed centrifugal compressor for which there is considerable detailed internal laser anemometry data available for comparison. The results indicate: (1) There is no significant difference in predicted tip clearance mass flow rate whether the tip gap is gridded or modeled. (2) Gridding rather than modeling the trailing edge results in better predictions of some flow details downstream of the impeller, but otherwise appears to offer no great benefits. (3) The pitchwise variation of absolute flow angle decreases rapidly up to 8% impeller radius ratio and much more slowly thereafter. Although some improvements in prediction of flow field details are realized as a result of analyzing the actual geometry there is no clear consensus that any of the grids investigated produced superior results in every case when compared to the measurements. However, if a multi-block code is available, it should be used, as it has the propensity for enabling better predictions than a single block code.« less
Barnes, Anna; Alonzi, Roberto; Blackledge, Matthew; Charles-Edwards, Geoff; Collins, David J; Cook, Gary; Coutts, Glynn; Goh, Vicky; Graves, Martin; Kelly, Charles; Koh, Dow-Mu; McCallum, Hazel; Miquel, Marc E; O'Connor, James; Padhani, Anwar; Pearson, Rachel; Priest, Andrew; Rockall, Andrea; Stirling, James; Taylor, Stuart; Tunariu, Nina; van der Meulen, Jan; Walls, Darren; Winfield, Jessica; Punwani, Shonit
2018-01-01
Application of whole body diffusion-weighted MRI (WB-DWI) for oncology are rapidly increasing within both research and routine clinical domains. However, WB-DWI as a quantitative imaging biomarker (QIB) has significantly slower adoption. To date, challenges relating to accuracy and reproducibility, essential criteria for a good QIB, have limited widespread clinical translation. In recognition, a UK workgroup was established in 2016 to provide technical consensus guidelines (to maximise accuracy and reproducibility of WB-MRI QIBs) and accelerate the clinical translation of quantitative WB-DWI applications for oncology. A panel of experts convened from cancer centres around the UK with subspecialty expertise in quantitative imaging and/or the use of WB-MRI with DWI. A formal consensus method was used to obtain consensus agreement regarding best practice. Questions were asked about the appropriateness or otherwise on scanner hardware and software, sequence optimisation, acquisition protocols, reporting, and ongoing quality control programs to monitor precision and accuracy and agreement on quality control. The consensus panel was able to reach consensus on 73% (255/351) items and based on consensus areas made recommendations to maximise accuracy and reproducibly of quantitative WB-DWI studies performed at 1.5T. The panel were unable to reach consensus on the majority of items related to quantitative WB-DWI performed at 3T. This UK Quantitative WB-DWI Technical Workgroup consensus provides guidance on maximising accuracy and reproducibly of quantitative WB-DWI for oncology. The consensus guidance can be used by researchers and clinicians to harmonise WB-DWI protocols which will accelerate clinical translation of WB-DWI-derived QIBs.
USDA-ARS?s Scientific Manuscript database
One-hundred-thirty-six expressed sequence tags (ESTs) encoding alpha gliadins from Triticum aestivum cv Butte 86 were identified in public databases and assembled into 19 contigs. Consensus sequences for 12 of the contigs encoded complete alpha gliadin proteins, but only two were identical to protei...
Simone, Domenico; Bay, Denice C.; Leach, Thorin; Turner, Raymond J.
2013-01-01
Background The twin-arginine translocation (Tat) protein export system enables the transport of fully folded proteins across a membrane. This system is composed of two integral membrane proteins belonging to TatA and TatC protein families and in some systems a third component, TatB, a homolog of TatA. TatC participates in substrate protein recognition through its interaction with a twin arginine leader peptide sequence. Methodology/Principal Findings The aim of this study was to explore TatC diversity, evolution and sequence conservation in bacteria to identify how TatC is evolving and diversifying in various bacterial phyla. Surveying bacterial genomes revealed that 77% of all species possess one or more tatC loci and half of these classes possessed only tatC and tatA genes. Phylogenetic analysis of diverse TatC homologues showed that they were primarily inherited but identified a small subset of taxonomically unrelated bacteria that exhibited evidence supporting lateral gene transfer within an ecological niche. Examination of bacilli tatCd/tatCy isoform operons identified a number of known and potentially new Tat substrate genes based on their frequent association to tatC loci. Evolutionary analysis of these Bacilli isoforms determined that TatCy was the progenitor of TatCd. A bacterial TatC consensus sequence was determined and highlighted conserved and variable regions within a three dimensional model of the Escherichia coli TatC protein. Comparative analysis between the TatC consensus sequence and Bacilli TatCd/y isoform consensus sequences revealed unique sites that may contribute to isoform substrate specificity or make TatA specific contacts. Synonymous to non-synonymous nucleotide substitution analyses of bacterial tatC homologues determined that tatC sequence variation differs dramatically between various classes and suggests TatC specialization in these species. Conclusions/Significance TatC proteins appear to be diversifying within particular bacterial classes and its specialization may be driven by the substrates it transports and the environment of its host. PMID:24236045
Siew, Edward D; Basu, Rajit K; Wunsch, Hannah; Shaw, Andrew D; Goldstein, Stuart L; Ronco, Claudio; Kellum, John A; Bagshaw, Sean M
2016-01-01
The purpose of this review is to report how administrative data have been used to study AKI, identify current limitations, and suggest how these data sources might be enhanced to address knowledge gaps in the field. 1) To review the existing evidence-base on how AKI is coded across administrative datasets, 2) To identify limitations, gaps in knowledge, and major barriers to scientific progress in AKI related to coding in administrative data, 3) To discuss how administrative data for AKI might be enhanced to enable "communication" and "translation" within and across administrative jurisdictions, and 4) To suggest how administrative databases might be configured to inform 'registry-based' pragmatic studies. Literature review of English language articles through PubMed search for relevant AKI literature focusing on the validation of AKI in administrative data or used administrative data to describe the epidemiology of AKI. Acute Dialysis Quality Initiative (ADQI) Consensus Conference September 6-7(th), 2015, Banff, Canada. Hospitalized patients with AKI. The coding structure for AKI in many administrative datasets limits understanding of true disease burden (especially less severe AKI), its temporal trends, and clinical phenotyping. Important opportunities exist to improve the quality and coding of AKI data to better address critical knowledge gaps in AKI and improve care. A modified Delphi consensus building process consisting of review of the literature and summary statements were developed through a series of alternating breakout and plenary sessions. Administrative codes for AKI are limited by poor sensitivity, lack of standardization to classify severity, and poor contextual phenotyping. These limitations are further hampered by reduced awareness of AKI among providers and the subjective nature of reporting. While an idealized definition of AKI may be difficult to implement, improving standardization of reporting by using laboratory-based definitions and providing complementary information on the context in which AKI occurs are possible. Administrative databases may also help enhance the conduct of and inform clinical or registry-based pragmatic studies. Data sources largely restricted to North American and Europe. Administrative data are rapidly growing and evolving, and represent an unprecedented opportunity to address knowledge gaps in AKI. Progress will require continued efforts to improve awareness of the impact of AKI on public health, engage key stakeholders, and develop tangible strategies to reconfigure infrastructure to improve the reporting and phenotyping of AKI. WHY IS THIS REVIEW IMPORTANT?: Rapid growth in the size and availability of administrative data has enhanced the clinical study of acute kidney injury (AKI). However, significant limitations exist in coding that hinder our ability to better understand its epidemiology and address knowledge gaps. The following consensus-based review discusses how administrative data have been used to study AKI, identify current limitations, and suggest how these data sources might be enhanced to improve the future study of this disease. WHAT ARE THE KEY MESSAGES?: The current coding structure of administrative data is hindered by a lack of sensitivity, standardization to properly classify severity, and limited clinical phenotyping. These limitations combined with reduced awareness of AKI and the subjective nature of reporting limit understanding of disease burden across settings and time periods. As administrative data become more sophisticated and complex, important opportunities to employ more objective criteria to diagnose and stage AKI as well as improve contextual phenotyping exist that can help address knowledge gaps and improve care.
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae).
Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren
2016-04-01
Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke has not been known yet. The present study has determined the complete mt genome sequences of E. hortense and assessed the phylogenetic relationships with other digenean species for which the complete mt genome sequences are available in GenBank using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum gathered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequences of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans.
Complete Mitochondrial Genome of Echinostoma hortense (Digenea: Echinostomatidae)
Liu, Ze-Xuan; Zhang, Yan; Liu, Yu-Ting; Chang, Qiao-Cheng; Su, Xin; Fu, Xue; Yue, Dong-Mei; Gao, Yuan; Wang, Chun-Ren
2016-01-01
Echinostoma hortense (Digenea: Echinostomatidae) is one of the intestinal flukes with medical importance in humans. However, the mitochondrial (mt) genome of this fluke has not been known yet. The present study has determined the complete mt genome sequences of E. hortense and assessed the phylogenetic relationships with other digenean species for which the complete mt genome sequences are available in GenBank using concatenated amino acid sequences inferred from 12 protein-coding genes. The mt genome of E. hortense contained 12 protein-coding genes, 22 transfer RNA genes, 2 ribosomal RNA genes, and 1 non-coding region. The length of the mt genome of E. hortense was 14,994 bp, which was somewhat smaller than those of other trematode species. Phylogenetic analyses based on concatenated nucleotide sequence datasets for all 12 protein-coding genes using maximum parsimony (MP) method showed that E. hortense and Hypoderaeum conoideum gathered together, and they were closer to each other than to Fasciolidae and other echinostomatid trematodes. The availability of the complete mt genome sequences of E. hortense provides important genetic markers for diagnostics, population genetics, and evolutionary studies of digeneans. PMID:27180575
Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi
2016-06-15
Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5'-end processing and 3'-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502. yasu@bio.keio.ac.jp Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Seeing Stems Everywhere: Position-Independent Identification of Stem Morphemes
ERIC Educational Resources Information Center
Crepaldi, Davide; Rastle, Kathleen; Davis, Colin J.; Lupker, Stephen J.
2013-01-01
There is broad consensus that printed complex words are identified on the basis of their constituent morphemes. This fact raises the issue of how the word identification system codes for morpheme position, hence allowing it to distinguish between words like "overhang" and "hangover", and to recognize that "preheat" is…
Educational Research Ethics: A Discussion Paper.
ERIC Educational Resources Information Center
Lafleur, Clay
Educational researchers should be aware of the general consensus of the research community as to what is proper and improper in the conduct of educational research, since there is as yet no explicit code of ethics. Accordingly, this paper examines existing ethical standards to initiate discussion of issues specific to the educational research…
Jones, Frank R.; Gabitzsch, Elizabeth S.; Xu, Younong; Balint, Joseph P.; Borisevich, Viktoriya; Smith, Jennifer; Smith, Jeanon; Peng, Bi-Hung; Walker, Aida; Salazar, Magda; Paessler, Slobodan
2013-01-01
Vaccines against emerging pathogens such as the 2009 H1N1 pandemic virus can benefit from current technologies such as rapid genomic sequencing to construct the most biologically relevant vaccine. A novel platform (Ad5 [E1-, E2b-]) has been utilized to induce immune responses to various antigenic targets. We employed this vector platform to express hemagglutinin (HA) and neuraminidase (NA) genes from 2009 H1N1 pandemic viruses. Inserts were consensuses sequences designed from viral isolate sequences and the vaccine was rapidly constructed and produced. Vaccination induced H1N1 immune responses in mice, which afforded protection from lethal virus challenge. In ferrets, vaccination protected from disease development and significantly reduced viral titers in nasal washes. H1N1 cell mediated immunity as well as antibody induction correlated with the prevention of disease symptoms and reduction of virus replication. The Ad5 [E1-, E2b-] should be evaluated for the rapid development of effective vaccines against infectious diseases. PMID:21821082
Efficient analysis of mouse genome sequences reveal many nonsense variants
Steeland, Sophie; Timmermans, Steven; Van Ryckeghem, Sara; Hulpiau, Paco; Saeys, Yvan; Van Montagu, Marc; Vandenbroucke, Roosmarijn E.; Libert, Claude
2016-01-01
Genetic polymorphisms in coding genes play an important role when using mouse inbred strains as research models. They have been shown to influence research results, explain phenotypical differences between inbred strains, and increase the amount of interesting gene variants present in the many available inbred lines. SPRET/Ei is an inbred strain derived from Mus spretus that has ∼1% sequence difference with the C57BL/6J reference genome. We obtained a listing of all SNPs and insertions/deletions (indels) present in SPRET/Ei from the Mouse Genomes Project (Wellcome Trust Sanger Institute) and processed these data to obtain an overview of all transcripts having nonsynonymous coding sequence variants. We identified 8,883 unique variants affecting 10,096 different transcripts from 6,328 protein-coding genes, which is about 28% of all coding genes. Because only a subset of these variants results in drastic changes in proteins, we focused on variations that are nonsense mutations that ultimately resulted in a gain of a stop codon. These genes were identified by in silico changing the C57BL/6J coding sequences to the SPRET/Ei sequences, converting them to amino acid (AA) sequences, and comparing the AA sequences. All variants and transcripts affected were also stored in a database, which can be browsed using a SPRET/Ei M. spretus variants web tool (www.spretus.org), including a manual. We validated the tool by demonstrating the loss of function of three proteins predicted to be severely truncated, namely Fas, IRAK2, and IFNγR1. PMID:27147605
Tarpey, Patrick S; Stevens, Claire; Teague, Jon; Edkins, Sarah; O'Meara, Sarah; Avis, Tim; Barthorpe, Syd; Buck, Gemma; Butler, Adam; Cole, Jennifer; Dicks, Ed; Gray, Kristian; Halliday, Kelly; Harrison, Rachel; Hills, Katy; Hinton, Jonathon; Jones, David; Menzies, Andrew; Mironenko, Tatiana; Perry, Janet; Raine, Keiran; Richardson, David; Shepherd, Rebecca; Small, Alexandra; Tofts, Calli; Varian, Jennifer; West, Sofie; Widaa, Sara; Yates, Andy; Catford, Rachael; Butler, Julia; Mallya, Uma; Moon, Jenny; Luo, Ying; Dorkins, Huw; Thompson, Deborah; Easton, Douglas F; Wooster, Richard; Bobrow, Martin; Carpenter, Nancy; Simensen, Richard J; Schwartz, Charles E; Stevenson, Roger E; Turner, Gillian; Partington, Michael; Gecz, Jozef; Stratton, Michael R; Futreal, P Andrew; Raymond, F Lucy
2006-12-01
In a systematic sequencing screen of the coding exons of the X chromosome in 250 families with X-linked mental retardation (XLMR), we identified two nonsense mutations and one consensus splice-site mutation in the AP1S2 gene on Xp22 in three families. Affected individuals in these families showed mild-to-profound mental retardation. Other features included hypotonia early in life and delay in walking. AP1S2 encodes an adaptin protein that constitutes part of the adaptor protein complex found at the cytoplasmic face of coated vesicles located at the Golgi complex. The complex mediates the recruitment of clathrin to the vesicle membrane. Aberrant endocytic processing through disruption of adaptor protein complexes is likely to result from the AP1S2 mutations identified in the three XLMR-affected families, and such defects may plausibly cause abnormal synaptic development and function. AP1S2 is the first reported XLMR gene that encodes a protein directly involved in the assembly of endocytic vesicles.
Cost-effective sequencing of full-length cDNA clones powered by a de novo-reference hybrid assembly.
Kuroshu, Reginaldo M; Watanabe, Junichi; Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka; Kasahara, Masahiro
2010-05-07
Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence approximately 800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only approximately US$3 per clone, demonstrating a significant advantage over previous approaches.
VaDiR: an integrated approach to Variant Detection in RNA.
Neums, Lisa; Suenaga, Seiji; Beyerlein, Peter; Anders, Sara; Koestler, Devin; Mariani, Andrea; Chien, Jeremy
2018-02-01
Advances in next-generation DNA sequencing technologies are now enabling detailed characterization of sequence variations in cancer genomes. With whole-genome sequencing, variations in coding and non-coding sequences can be discovered. But the cost associated with it is currently limiting its general use in research. Whole-exome sequencing is used to characterize sequence variations in coding regions, but the cost associated with capture reagents and biases in capture rate limit its full use in research. Additional limitations include uncertainty in assigning the functional significance of the mutations when these mutations are observed in the non-coding region or in genes that are not expressed in cancer tissue. We investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing datasets with a method called Variant Detection in RNA(VaDiR) that integrates 3 variant callers, namely: SNPiR, RVBoost, and MuTect2. The combination of all 3 methods, which we called Tier 1 variants, produced the highest precision with true positive mutations from RNA-seq that could be validated at the DNA level. We also found that the integration of Tier 1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, we observed a higher rate of mutation discovery in genes that are expressed at higher levels. Our method, VaDiR, provides a possibility of uncovering mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.
Code of practice for food handler activities.
Smith, T A; Kanas, R P; McCoubrey, I A; Belton, M E
2005-08-01
The food industry regulates various aspects of food handler activities, according to legislation and customer expectations. The purpose of this paper is to provide a code of practice which delineates a set of working standards for food handler hygiene, handwashing, use of protective equipment, wearing of jewellery and body piercing. The code was developed by a working group of occupational physicians with expertise in both food manufacturing and retail, using a risk assessment approach. Views were also obtained from other occupational physicians working within the food industry and the relevant regulatory bodies. The final version of the code (available in full as Supplementary data in Occupational Medicine Online) therefore represents a broad consensus of opinion. The code of practice represents a set of minimum standards for food handler suitability and activities, based on a practical assessment of risk, for application in food businesses. It aims to provide useful working advice to food businesses of all sizes.
Tasaki, E; Hirayama, J; Tazumi, A; Hayashi, K; Hara, Y; Ueno, H; Moore, J E; Millar, B C; Matsuda, M
2012-02-01
Novel clustered regularly-interspaced short palindromic repeats (CRISPRs) locus [7,500 base pairs (bp) in length] occurred in the urease-positive thermophilic Campylobacter (UPTC) Japanese isolate, CF89-12. The 7,500 bp gene loci consisted of the 5'-methylaminomethyl-2-thiouridylate methyltransferase gene, putative (P) CRISPR associated (p-Cas), putative open reading frames, Cas1 and Cas2, leader sequence region (146 bp), 12 CRISPRs consensus sequence repeats (each 36 bp) separated by a non-repetitive unique spacer region of similar length (26-31 bp) and the phosphatidyl glycerophosphatase A gene. When the CRISPRs loci in the UPTC CF89-12 and five C. jejuni isolates were compared with one another, these six isolates contained p-Cas, Cas1 and Cas2 within the loci. Four to 12 CRISPRs consensus sequence repeats separated by a non-repetitive unique spacer region occurred in six isolates and the nucleotide sequences of those repeats gave approximately 92-100% similarity with each other. However, no sequence similarity occurred in the unique spacer regions among these isolates. The putative σ(70) transcriptional promoter and the hypothetical ρ-independent terminator structures for the CRISPRs and Cas were detected. No in vivo transcription of p-Cas, Cas1 and Cas2 was confirmed in the UPTC cells.
NASA Astrophysics Data System (ADS)
El-Assaad, Atlal; Dawy, Zaher; Nemer, Georges; Kobeissy, Firas
2017-01-01
The crucial biological role of proteases has been visible with the development of degradomics discipline involved in the determination of the proteases/substrates resulting in breakdown-products (BDPs) that can be utilized as putative biomarkers associated with different biological-clinical significance. In the field of cancer biology, matrix metalloproteinases (MMPs) have shown to result in MMPs-generated protein BDPs that are indicative of malignant growth in cancer, while in the field of neural injury, calpain-2 and caspase-3 proteases generate BDPs fragments that are indicative of different neural cell death mechanisms in different injury scenarios. Advanced proteomic techniques have shown a remarkable progress in identifying these BDPs experimentally. In this work, we present a bioinformatics-based prediction method that identifies protease-associated BDPs with high precision and efficiency. The method utilizes state-of-the-art sequence matching and alignment algorithms. It starts by locating consensus sequence occurrences and their variants in any set of protein substrates, generating all fragments resulting from cleavage. The complexity exists in space O(mn) as well as in O(Nmn) time, where N, m, and n are the number of protein sequences, length of the consensus sequence, and length per protein sequence, respectively. Finally, the proposed methodology is validated against βII-spectrin protein, a brain injury validated biomarker.
Ghio, Silvina; Martinez Cáceres, Alfredo I.; Talia, Paola; Grasso, Daniel H.
2015-01-01
Paenibacillus sp. A59 was isolated from decaying forest soil in Argentina and characterized as a xylanolytic strain. We report the draft genome sequence of this isolate, with an estimated genome size of 7 Mb which harbor 6,424 coding sequences. Genes coding for hydrolytic enzymes involved in lignocellulose deconstruction were predicted. PMID:26494679
Gene Identification Algorithms Using Exploratory Statistical Analysis of Periodicity
NASA Astrophysics Data System (ADS)
Mukherjee, Shashi Bajaj; Sen, Pradip Kumar
2010-10-01
Studying periodic pattern is expected as a standard line of attack for recognizing DNA sequence in identification of gene and similar problems. But peculiarly very little significant work is done in this direction. This paper studies statistical properties of DNA sequences of complete genome using a new technique. A DNA sequence is converted to a numeric sequence using various types of mappings and standard Fourier technique is applied to study the periodicity. Distinct statistical behaviour of periodicity parameters is found in coding and non-coding sequences, which can be used to distinguish between these parts. Here DNA sequences of Drosophila melanogaster were analyzed with significant accuracy.
Simonen, Marja-Leena; Roivainen, Merja; Iber, Jane; Burns, Cara; Hovi, Tapani
2010-01-01
In 1984, a wild type 3 poliovirus (PV3/FIN84) spread all over Finland causing nine cases of paralytic poliomyelitis and one case of aseptic meningitis. The outbreak was ended in 1985 with an intensive vaccination campaign. By limited sequence comparison with previously isolated PV3 strains, closest relatives of PV3/FIN84 were found among strains circulating in the Mediterranean region. Now we wanted to reanalyse the relationships using approaches currently exploited in poliovirus surveillance. Cell lysates of 22 strains isolated during the outbreak and stored frozen were subjected to RT-PCR amplification in three genomic regions without prior subculture. Sequences of the entire VP1 coding region, 150 nucleotides in the VP1-2A junction, most of the 5' non-coding region, partial sequences of the 3D RNA polymerase coding region and partial 3' non-coding region were compared within the outbreak and with sequences available in data banks. In addition, complete nucleotide sequences were obtained for 2 strains isolated from two different cases of disease during the outbreak. The results confirmed the previously described wide intraepidemic variation of the strains, including amino acid substitutions in antigenic sites, as well as the likely Mediterranean region origin of the strains. Simplot and bootscanning analyses of the complete genomes indicated complicated evolutionary history of the non-capsid coding regions of the genome suggesting several recombinations with different HEV-C viruses in the past.
Chan, Vincy; Thurairajah, Pravheen; Colantonio, Angela
2013-11-13
Although healthcare administrative data are commonly used for traumatic brain injury research, there is currently no consensus or consistency on using the International Classification of Diseases version 10 codes to define traumatic brain injury among children and youth. This protocol is for a systematic review of the literature to explore the range of International Classification of Diseases version 10 codes that are used to define traumatic brain injury in this population. The databases MEDLINE, MEDLINE In-Process, Embase, PsychINFO, CINAHL, SPORTDiscus, and Cochrane Database of Systematic Reviews will be systematically searched. Grey literature will be searched using Grey Matters and Google. Reference lists of included articles will also be searched. Articles will be screened using predefined inclusion and exclusion criteria and all full-text articles that meet the predefined inclusion criteria will be included for analysis. The study selection process and reasons for exclusion at the full-text level will be presented using a PRISMA study flow diagram. Information on the data source of included studies, year and location of study, age of study population, range of incidence, and study purpose will be abstracted into a separate table and synthesized for analysis. All International Classification of Diseases version 10 codes will be listed in tables and the codes that are used to define concussion, acquired traumatic brain injury, head injury, or head trauma will be identified. The identification of the optimal International Classification of Diseases version 10 codes to define this population in administrative data is crucial, as it has implications for policy, resource allocation, planning of healthcare services, and prevention strategies. It also allows for comparisons across countries and studies. This protocol is for a review that identifies the range and most common diagnoses used to conduct surveillance for traumatic brain injury in children and youth. This is an important first step in reaching an appropriate definition using International Classification of Diseases version 10 codes and can inform future work on reaching consensus on the codes to define traumatic brain injury for this vulnerable population.
Genetic code, hamming distance and stochastic matrices.
He, Matthew X; Petoukhov, Sergei V; Ricci, Paolo E
2004-09-01
In this paper we use the Gray code representation of the genetic code C=00, U=10, G=11 and A=01 (C pairs with G, A pairs with U) to generate a sequence of genetic code-based matrices. In connection with these code-based matrices, we use the Hamming distance to generate a sequence of numerical matrices. We then further investigate the properties of the numerical matrices and show that they are doubly stochastic and symmetric. We determine the frequency distributions of the Hamming distances, building blocks of the matrices, decomposition and iterations of matrices. We present an explicit decomposition formula for the genetic code-based matrix in terms of permutation matrices, which provides a hypercube representation of the genetic code. It is also observed that there is a Hamiltonian cycle in a genetic code-based hypercube.
Krzeminska, Urszula; Wilson, Robyn; Rahman, Sadequr; Song, Beng Kah; Seneviratne, Sampath; Gan, Han Ming; Austin, Christopher M
2016-07-01
The complete mitochondrial genomes of two jungle crows (Corvus macrorhynchos) were sequenced. DNA was extracted from tissue samples obtained from shed feathers collected in the field in Sri Lanka and sequenced using the Illumina MiSeq Personal Sequencer. Jungle crow mitogenomes have a structural organization typical of the genus Corvus and are 16,927 bp and 17,066 bp in length, both comprising 13 protein-coding genes, 22 transfer RNA genes, 2 ribosomal subunit genes, and a non-coding control region. In addition, we complement already available house crow (Corvus spelendens) mitogenome resources by sequencing an individual from Singapore. A phylogenetic tree constructed from Corvidae family mitogenome sequences available on GenBank is presented. We confirm the monophyly of the genus Corvus and propose to use complete mitogenome resources for further intra- and interspecies genetic studies.
Evolution of the alternative AQP2 gene: Acquisition of a novel protein-coding sequence in dolphins.
Kishida, Takushi; Suzuki, Miwa; Takayama, Asuka
2018-01-01
Taxon-specific de novo protein-coding sequences are thought to be important for taxon-specific environmental adaptation. A recent study revealed that bottlenose dolphins acquired a novel isoform of aquaporin 2 generated by alternative splicing (alternative AQP2), which helps dolphins to live in hyperosmotic seawater. The AQP2 gene consists of four exons, but the alternative AQP2 gene lacks the fourth exon and instead has a longer third exon that includes the original third exon and a part of the original third intron. Here, we show that the latter half of the third exon of the alternative AQP2 arose from a non-protein-coding sequence. Intact ORF of this de novo sequence is shared not by all cetaceans, but only by delphinoids. However, this sequence is conservative in all modern cetaceans, implying that this de novo sequence potentially plays important roles for marine adaptation in cetaceans. Copyright © 2017 Elsevier Inc. All rights reserved.
2010-01-01
Background Epimedium sagittatum (Sieb. Et Zucc.) Maxim, a traditional Chinese medicinal plant species, has been used extensively as genuine medicinal materials. Certain Epimedium species are endangered due to commercial overexploition, while sustainable application studies, conservation genetics, systematics, and marker-assisted selection (MAS) of Epimedium is less-studied due to the lack of molecular markers. Here, we report a set of expressed sequence tags (ESTs) and simple sequence repeats (SSRs) identified in these ESTs for E. sagittatum. Results cDNAs of E. sagittatum are sequenced using 454 GS-FLX pyrosequencing technology. The raw reads are cleaned and assembled into a total of 76,459 consensus sequences comprising of 17,231 contigs and 59,228 singlets. About 38.5% (29,466) of the consensus sequences significantly match to the non-redundant protein database (E-value < 1e-10), 22,295 of which are further annotated using Gene Ontology (GO) terms. A total of 2,810 EST-SSRs is identified from the Epimedium EST dataset. Trinucleotide SSR is the dominant repeat type (55.2%) followed by dinucleotide (30.4%), tetranuleotide (7.3%), hexanucleotide (4.9%), and pentanucleotide (2.2%) SSR. The dominant repeat motif is AAG/CTT (23.6%) followed by AG/CT (19.3%), ACC/GGT (11.1%), AT/AT (7.5%), and AAC/GTT (5.9%). Thirty-two SSR-ESTs are randomly selected and primer pairs are synthesized for testing the transferability across 52 Epimedium species. Eighteen primer pairs (85.7%) could be successfully transferred to Epimedium species and sixteen of those show high genetic diversity with 0.35 of observed heterozygosity (Ho) and 0.65 of expected heterozygosity (He) and high number of alleles per locus (11.9). Conclusion A large EST dataset with a total of 76,459 consensus sequences is generated, aiming to provide sequence information for deciphering secondary metabolism, especially for flavonoid pathway in Epimedium. A total of 2,810 EST-SSRs is identified from EST dataset and ~1580 EST-SSR markers are transferable. E. sagittatum EST-SSR transferability to the major Epimedium germplasm is up to 85.7%. Therefore, this EST dataset and EST-SSRs will be a powerful resource for further studies such as taxonomy, molecular breeding, genetics, genomics, and secondary metabolism in Epimedium species. PMID:20141623
Single molecule sequencing of the M13 virus genome without amplification
Zhao, Luyang; Deng, Liwei; Li, Gailing; Jin, Huan; Cai, Jinsen; Shang, Huan; Li, Yan; Wu, Haomin; Xu, Weibin; Zeng, Lidong; Zhang, Renli; Zhao, Huan; Wu, Ping; Zhou, Zhiliang; Zheng, Jiao; Ezanno, Pierre; Yang, Andrew X.; Yan, Qin; Deem, Michael W.; He, Jiankui
2017-01-01
Next generation sequencing (NGS) has revolutionized life sciences research. However, GC bias and costly, time-intensive library preparation make NGS an ill fit for increasing sequencing demands in the clinic. A new class of third-generation sequencing platforms has arrived to meet this need, capable of directly measuring DNA and RNA sequences at the single-molecule level without amplification. Here, we use the new GenoCare single-molecule sequencing platform from Direct Genomics to sequence the genome of the M13 virus. Our platform detects single-molecule fluorescence by total internal reflection microscopy, with sequencing-by-synthesis chemistry. We sequenced the genome of M13 to a depth of 316x, with 100% coverage. We determined a consensus sequence accuracy of 100%. In contrast to GC bias inherent to NGS results, we demonstrated that our single-molecule sequencing method yields minimal GC bias. PMID:29253901
Single molecule sequencing of the M13 virus genome without amplification.
Zhao, Luyang; Deng, Liwei; Li, Gailing; Jin, Huan; Cai, Jinsen; Shang, Huan; Li, Yan; Wu, Haomin; Xu, Weibin; Zeng, Lidong; Zhang, Renli; Zhao, Huan; Wu, Ping; Zhou, Zhiliang; Zheng, Jiao; Ezanno, Pierre; Yang, Andrew X; Yan, Qin; Deem, Michael W; He, Jiankui
2017-01-01
Next generation sequencing (NGS) has revolutionized life sciences research. However, GC bias and costly, time-intensive library preparation make NGS an ill fit for increasing sequencing demands in the clinic. A new class of third-generation sequencing platforms has arrived to meet this need, capable of directly measuring DNA and RNA sequences at the single-molecule level without amplification. Here, we use the new GenoCare single-molecule sequencing platform from Direct Genomics to sequence the genome of the M13 virus. Our platform detects single-molecule fluorescence by total internal reflection microscopy, with sequencing-by-synthesis chemistry. We sequenced the genome of M13 to a depth of 316x, with 100% coverage. We determined a consensus sequence accuracy of 100%. In contrast to GC bias inherent to NGS results, we demonstrated that our single-molecule sequencing method yields minimal GC bias.
The role of recombination in the origin and evolution of Alu subfamilies.
Teixeira-Silva, Ana; Silva, Raquel M; Carneiro, João; Amorim, António; Azevedo, Luísa
2013-01-01
Alus are the most abundant and successful short interspersed nuclear elements found in primate genomes. In humans, they represent about 10% of the genome, although few are retrotransposition-competent and are clustered into subfamilies according to the source gene from which they evolved. Recombination between them can lead to genomic rearrangements of clinical and evolutionary significance. In this study, we have addressed the role of recombination in the origin of chimeric Alu source genes by the analysis of all known consensus sequences of human Alus. From the allelic diversity of Alu consensus sequences, validated in extant elements resulting from whole genome searches, distinct events of recombination were detected in the origin of particular subfamilies of AluS and AluY source genes. These results demonstrate that at least two subfamilies are likely to have emerged from ectopic Alu-Alu recombination, which stimulates further research regarding the potential of chimeric active Alus to punctuate the genome.
σ54-Dependent Response to Nitrogen Limitation and Virulence in Burkholderia cenocepacia Strain H111
Lardi, Martina; Aguilar, Claudio; Pedrioli, Alessandro; Omasits, Ulrich; Suppiger, Angela; Cárcamo-Oyarce, Gerardo; Schmid, Nadine; Ahrens, Christian H.
2015-01-01
Members of the genus Burkholderia are versatile bacteria capable of colonizing highly diverse environmental niches. In this study, we investigated the global response of the opportunistic pathogen Burkholderia cenocepacia H111 to nitrogen limitation at the transcript and protein expression levels. In addition to a classical response to nitrogen starvation, including the activation of glutamine synthetase, PII proteins, and the two-component regulatory system NtrBC, B. cenocepacia H111 also upregulated polyhydroxybutyrate (PHB) accumulation and exopolysaccharide (EPS) production in response to nitrogen shortage. A search for consensus sequences in promoter regions of nitrogen-responsive genes identified a σ54 consensus sequence. The mapping of the σ54 regulon as well as the characterization of a σ54 mutant suggests an important role of σ54 not only in control of nitrogen metabolism but also in the virulence of this organism. PMID:25841012
Deans, Zandra C; Costa, Jose Luis; Cree, Ian; Dequeker, Els; Edsjö, Anders; Henderson, Shirley; Hummel, Michael; Ligtenberg, Marjolijn Jl; Loddo, Marco; Machado, Jose Carlos; Marchetti, Antonio; Marquis, Katherine; Mason, Joanne; Normanno, Nicola; Rouleau, Etienne; Schuuring, Ed; Snelson, Keeda-Marie; Thunnissen, Erik; Tops, Bastiaan; Williams, Gareth; van Krieken, Han; Hall, Jacqueline A
2017-01-01
The clinical demand for mutation detection within multiple genes from a single tumour sample requires molecular diagnostic laboratories to develop rapid, high-throughput, highly sensitive, accurate and parallel testing within tight budget constraints. To meet this demand, many laboratories employ next-generation sequencing (NGS) based on small amplicons. Building on existing publications and general guidance for the clinical use of NGS and learnings from germline testing, the following guidelines establish consensus standards for somatic diagnostic testing, specifically for identifying and reporting mutations in solid tumours. These guidelines cover the testing strategy, implementation of testing within clinical service, sample requirements, data analysis and reporting of results. In conjunction with appropriate staff training and international standards for laboratory testing, these consensus standards for the use of NGS in molecular pathology of solid tumours will assist laboratories in implementing NGS in clinical services.
Stephen, Alexa A; Leone, Angelique M; Toplon, David E; Archer, Linda L; Wellehan, James F X
2016-12-01
A juvenile female bald eagle ( Haliaeetus leucocephalus ) was presented with emaciation and proliferative periocular lesions. The eagle did not respond to supportive therapy and was euthanatized. Histopathologic examination of the skin lesions revealed plaques of marked epidermal hyperplasia parakeratosis, marked acanthosis and spongiosis, and eosinophilic intracytoplasmic inclusion bodies. Novel polymerase chain reaction (PCR) assays were done to amplify and sequence DNA polymerase and rpo147 genes. The 4b gene was also analyzed by a previously developed assay. Bayesian and maximum likelihood phylogenetic analyses of the obtained sequences found it to be poxvirus of the genus Avipoxvirus and clustered with other raptor isolates. Better phylogenetic resolution was found in rpo147 rather than the commonly used DNA polymerase. The novel consensus rpo147 PCR assay will create more accurate phylogenic trees and allow better insight into poxvirus history.
NASA Astrophysics Data System (ADS)
Gong, Liang; Wu, Yu; Jian, Qijie; Yin, Chunxiao; Li, Taotao; Gupta, Vijai Kumar; Duan, Xuewu; Jiang, Yueming
2018-01-01
Vibrio qinghaiensis sp.-Q67 (Vqin-Q67) is a freshwater luminescent bacterium that continuously emits blue-green light (485 nm). The bacterium has been widely used for detecting toxic contaminants. Here, we report the complete genome sequence of Vqin-Q67, obtained using third-generation PacBio sequencing technology. Continuous long reads were attained from three PacBio sequencing runs and reads >500 bp with a quality value of >0.75 were merged together into a single dataset. This resultant highly-contiguous de novo assembly has no genome gaps, and comprises two chromosomes with substantial genetic information, including protein-coding genes, non-coding RNA, transposon and gene islands. Our dataset can be useful as a comparative genome for evolution and speciation studies, as well as for the analysis of protein-coding gene families, the pathogenicity of different Vibrio species in fish, the evolution of non-coding RNA and transposon, and the regulation of gene expression in relation to the bioluminescence of Vqin-Q67.
Ardui, Simon; Ameur, Adam; Vermeesch, Joris R; Hestand, Matthew S
2018-01-01
Abstract Short read massive parallel sequencing has emerged as a standard diagnostic tool in the medical setting. However, short read technologies have inherent limitations such as GC bias, difficulties mapping to repetitive elements, trouble discriminating paralogous sequences, and difficulties in phasing alleles. Long read single molecule sequencers resolve these obstacles. Moreover, they offer higher consensus accuracies and can detect epigenetic modifications from native DNA. The first commercially available long read single molecule platform was the RS system based on PacBio's single molecule real-time (SMRT) sequencing technology, which has since evolved into their RSII and Sequel systems. Here we capsulize how SMRT sequencing is revolutionizing constitutional, reproductive, cancer, microbial and viral genetic testing. PMID:29401301
Weight distributions for turbo codes using random and nonrandom permutations
NASA Technical Reports Server (NTRS)
Dolinar, S.; Divsalar, D.
1995-01-01
This article takes a preliminary look at the weight distributions achievable for turbo codes using random, nonrandom, and semirandom permutations. Due to the recursiveness of the encoders, it is important to distinguish between self-terminating and non-self-terminating input sequences. The non-self-terminating sequences have little effect on decoder performance, because they accumulate high encoded weight until they are artificially terminated at the end of the block. From probabilistic arguments based on selecting the permutations randomly, it is concluded that the self-terminating weight-2 data sequences are the most important consideration in the design of constituent codes; higher-weight self-terminating sequences have successively decreasing importance. Also, increasing the number of codes and, correspondingly, the number of permutations makes it more and more likely that the bad input sequences will be broken up by one or more of the permuters. It is possible to design nonrandom permutations that ensure that the minimum distance due to weight-2 input sequences grows roughly as the square root of (2N), where N is the block length. However, these nonrandom permutations amplify the bad effects of higher-weight inputs, and as a result they are inferior in performance to randomly selected permutations. But there are 'semirandom' permutations that perform nearly as well as the designed nonrandom permutations with respect to weight-2 input sequences and are not as susceptible to being foiled by higher-weight inputs.
Rasmussen, C.; Purcell, M.K.; Gregg, J.L.; LaPatra, S.E.; Winton, J.R.; Hershberger, P.K.
2010-01-01
The mesomycetozoean parasite Ichthyophonus hoferi is most commonly associated with marine fish hosts but also occurs in some components of the freshwater rainbow trout Oncorhynchus mykiss aquaculture industry in Idaho, USA. It is not certain how the parasite was introduced into rainbow trout culture, but it might have been associated with the historical practice of feeding raw, ground common carp Cyprinus carpio that were caught by commercial fisherman. Here, we report a major genetic division between west coast freshwater and marine isolates of Ichthyophonus hoferi. Sequence differences were not detected in 2 regions of the highly conserved small subunit (18S) rDNA gene; however, nucleotide variation was seen in internal transcribed spacer loci (ITS1 and ITS2), both within and among the isolates. Intra-isolate variation ranged from 2.4 to 7.6 nucleotides over a region consisting of ~740 bp. Majority consensus sequences from marine/anadromous hosts differed in only 0 to 3 nucleotides (99.6 to 100% nucleotide identity), while those derived from freshwater rainbow trout had no nucleotide substitutions relative to each other. However, the consensus sequences between isolates from freshwater rainbow trout and those from marine/anadromous hosts differed in 13 to 16 nucleotides (97.8 to 98.2% nucleotide identity).
Accurate multiplex polony sequencing of an evolved bacterial genome.
Shendure, Jay; Porreca, Gregory J; Reppas, Nikos B; Lin, Xiaoxia; McCutcheon, John P; Rosenbaum, Abraham M; Wang, Michael D; Zhang, Kun; Mitra, Robi D; Church, George M
2005-09-09
We describe a DNA sequencing technology in which a commonly available, inexpensive epifluorescence microscope is converted to rapid nonelectrophoretic DNA sequencing automation. We apply this technology to resequence an evolved strain of Escherichia coli at less than one error per million consensus bases. A cell-free, mate-paired library provided single DNA molecules that were amplified in parallel to 1-micrometer beads by emulsion polymerase chain reaction. Millions of beads were immobilized in a polyacrylamide gel and subjected to automated cycles of sequencing by ligation and four-color imaging. Cost per base was roughly one-ninth as much as that of conventional sequencing. Our protocols were implemented with off-the-shelf instrumentation and reagents.
Nucleotide Sequence of the blaRTG-2 (CARB-5) Gene and Phylogeny of a New Group of Carbenicillinases
Choury, Daniele; Szajnert, Marie-France; Joly-Guillou, Marie-Laure; Azibi, Kemal; Delpech, Marc; Paul, Gérard
2000-01-01
We determined the nucleotide sequence of the bla gene for the Acinetobacter calcoaceticus β-lactamase previously described as CARB-5. Alignment of the deduced amino acid sequence with those of known β-lactamases revealed that CARB-5 possesses an RTG triad in box VII, as described for the Proteus mirabilis GN79 enzyme, instead of the RSG consensus characteristic of the other carbenicillinases. Phylogenetic studies showed that these RTG enzymes constitute a new, separate group, possibly ancestors of the carbenicillinase family. PMID:10722515
Webb, Kristen M; Rosenthal, Benjamin M
2011-01-01
The mitochondrial genome's non-recombinant mode of inheritance and relatively rapid rate of evolution has promoted its use as a marker for studying the biogeographic history and evolutionary interrelationships among many metazoan species. A modest portion of the mitochondrial genome has been defined for 12 species and genotypes of parasites in the genus Trichinella, but its adequacy in representing the mitochondrial genome as a whole remains unclear, as the complete coding sequence has been characterized only for Trichinella spiralis. Here, we sought to comprehensively describe the extent and nature of divergence between the mitochondrial genomes of T. spiralis (which poses the most appreciable zoonotic risk owing to its capacity to establish persistent infections in domestic pigs) and Trichinella murrelli (which is the most prevalent species in North American wildlife hosts, but which poses relatively little risk to the safety of pork). Next generation sequencing methodologies and scaffold and de novo assembly strategies were employed. The entire protein-coding region was sequenced (13,917 bp), along with a portion of the highly repetitive non-coding region (1524 bp) of the mitochondrial genome of T. murrelli with a combined average read depth of 250 reads. The accuracy of base calling, estimated from coding region sequence was found to exceed 99.3%. Genome content and gene order was not found to be significantly different from that of T. spiralis. An overall inter-species sequence divergence of 9.5% was estimated. Significant variation was identified when the amount of variation between species at each gene is compared to the average amount of variation between species across the coding region. Next generation sequencing is a highly effective means to obtain previously unknown mitochondrial genome sequence. Particular to parasites, the extremely deep coverage achieved through this method allows for the detection of sequence heterogeneity between the multiple individuals that necessarily comprise such templates. Copyright © 2010 Elsevier B.V. All rights reserved.
Saavedra-Lira, E; Pérez-Montfort, R
1994-05-16
We isolated three overlapping clones from a DNA genomic library of Entamoeba histolytica strain HM1:IMSS, whose translated nucleotide (nt) sequence shows similarities of 51, 48 and 47% with the amino acid (aa) sequences reported for the pyruvate phosphate dikinases from Bacteroides symbiosus, maize and Flaveria trinervia, respectively. The reading frame determined codes for a protein of 886 aa.
Ghio, Silvina; Martinez Cáceres, Alfredo I; Talia, Paola; Grasso, Daniel H; Campos, Eleonora
2015-10-22
Paenibacillus sp. A59 was isolated from decaying forest soil in Argentina and characterized as a xylanolytic strain. We report the draft genome sequence of this isolate, with an estimated genome size of 7 Mb which harbor 6,424 coding sequences. Genes coding for hydrolytic enzymes involved in lignocellulose deconstruction were predicted. Copyright © 2015 Ghio et al.
Caldwell, Rachel; Lin, Yan-Xia; Zhang, Ren
2015-01-01
There is a continuing interest in the analysis of gene architecture and gene expression to determine the relationship that may exist. Advances in high-quality sequencing technologies and large-scale resource datasets have increased the understanding of relationships and cross-referencing of expression data to the large genome data. Although a negative correlation between expression level and gene (especially transcript) length has been generally accepted, there have been some conflicting results arising from the literature concerning the impacts of different regions of genes, and the underlying reason is not well understood. The research aims to apply quantile regression techniques for statistical analysis of coding and noncoding sequence length and gene expression data in the plant, Arabidopsis thaliana, and fruit fly, Drosophila melanogaster, to determine if a relationship exists and if there is any variation or similarities between these species. The quantile regression analysis found that the coding sequence length and gene expression correlations varied, and similarities emerged for the noncoding sequence length (5′ and 3′ UTRs) between animal and plant species. In conclusion, the information described in this study provides the basis for further exploration into gene regulation with regard to coding and noncoding sequence length. PMID:26114098
Robertson, Helen E; Lapraz, François; Egger, Bernhard; Telford, Maximilian J; Schiffer, Philipp H
2017-05-12
Acoels are small, ubiquitous - but understudied - marine worms with a very simple body plan. Their internal phylogeny is still not fully resolved, and the position of their proposed phylum Xenacoelomorpha remains debated. Here we describe mitochondrial genome sequences from the acoels Paratomella rubra and Isodiametra pulchra, and the complete mitochondrial genome of the acoel Archaphanostoma ylvae. The P. rubra and A. ylvae sequences are typical for metazoans in size and gene content. The larger I. pulchra mitochondrial genome contains both ribosomal genes, 21 tRNAs, but only 11 protein-coding genes. We find evidence suggesting a duplicated sequence in the I. pulchra mitochondrial genome. The P. rubra, I. pulchra and A. ylvae mitochondria have a unique genome organisation in comparison to other metazoan mitochondrial genomes. We found a large degree of protein-coding gene and tRNA overlap with little non-coding sequence in the compact P. rubra genome. Conversely, the A. ylvae and I. pulchra genomes have many long non-coding sequences between genes, likely driving genome size expansion in the latter. Phylogenetic trees inferred from mitochondrial genes retrieve Xenacoelomorpha as an early branching taxon in the deuterostomes. Sequence divergence analysis between P. rubra sampled in England and Spain indicates cryptic diversity.
Metal resistance sequences and transgenic plants
Meagher, Richard Brian; Summers, Anne O.; Rugh, Clayton L.
1999-10-12
The present invention provides nucleic acid sequences encoding a metal ion resistance protein, which are expressible in plant cells. The metal resistance protein provides for the enzymatic reduction of metal ions including but not limited to divalent Cu, divalent mercury, trivalent gold, divalent cadmium, lead ions and monovalent silver ions. Transgenic plants which express these coding sequences exhibit increased resistance to metal ions in the environment as compared with plants which have not been so genetically modified. Transgenic plants with improved resistance to organometals including alkylmercury compounds, among others, are provided by the further inclusion of plant-expressible organometal lyase coding sequences, as specifically exemplified by the plant-expressible merB coding sequence. Furthermore, these transgenic plants which have been genetically modified to express the metal resistance coding sequences of the present invention can participate in the bioremediation of metal contamination via the enzymatic reduction of metal ions. Transgenic plants resistant to organometals can further mediate remediation of organic metal compounds, for example, alkylmetal compounds including but not limited to methyl mercury, methyl lead compounds, methyl cadmium and methyl arsenic compounds, in the environment by causing the freeing of mercuric or other metal ions and the reduction of the ionic mercury or other metal ions to the less toxic elemental mercury or other metals.
Kim, Min Jee; Im, Hyun Hwak; Lee, Kwang Youll; Han, Yeon Soo; Kim, Iksoo
2014-06-01
Abstract The complete nucleotide sequences of the mitochondrial genome from the whiter-spotted flower chafer, Protaetia brevitarsis (Coleoptera: Scarabaeidae), was determined. The 20,319-bp long circular genome is the longest among completely sequenced Coleoptera. As is typical in animals, the P. brevitarsis genome consisted of two ribosomal RNAs, 22 transfer RNAs, 13 protein-coding genes and one A + T-rich region. Although the size of the coding genes was typical, the non-coding A + T-rich region was 5654 bp, which is the longest in insects. The extraordinary length of this region was composed of 28,117-bp tandem repeats and 782-bp tandem repeats. These repeat sequences were encompassed by three non-repeat sequences constituting 1804 bp.
EDGE 2017 R&D 100 Entry with Appendix
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chain, Patrick Sam Guy; Davenport, Karen Walston; Li, Po-E
Diabetes, infertility, cancer, and Alzheimer’s disease—the key to one day preventing or even curing such afflictions and diseases (both infectious and genetically driven) may be locked in our own genetic code and the code of microorganisms that inhabit our bodies. The study of this code, known as genomics, has recently become much more promising as a result of two things: (1) vast improvements in high-throughput, nextgeneration sequencing (NSG), and (2) an exponential decrease in the cost of such sequencing. For example, it originally cost approximately $3 billion to sequence the human genome; today, this genome could be resequenced for lessmore » than $1,000.« less
“One code to find them all”: a perl tool to conveniently parse RepeatMasker output files
2014-01-01
Background Of the different bioinformatic methods used to recover transposable elements (TEs) in genome sequences, one of the most commonly used procedures is the homology-based method proposed by the RepeatMasker program. RepeatMasker generates several output files, including the .out file, which provides annotations for all detected repeats in a query sequence. However, a remaining challenge consists of identifying the different copies of TEs that correspond to the identified hits. This step is essential for any evolutionary/comparative analysis of the different copies within a family. Different possibilities can lead to multiple hits corresponding to a unique copy of an element, such as the presence of large deletions/insertions or undetermined bases, and distinct consensus corresponding to a single full-length sequence (like for long terminal repeat (LTR)-retrotransposons). These possibilities must be taken into account to determine the exact number of TE copies. Results We have developed a perl tool that parses the RepeatMasker .out file to better determine the number and positions of TE copies in the query sequence, in addition to computing quantitative information for the different families. To determine the accuracy of the program, we tested it on several RepeatMasker .out files corresponding to two organisms (Drosophila melanogaster and Homo sapiens) for which the TE content has already been largely described and which present great differences in genome size, TE content, and TE families. Conclusions Our tool provides access to detailed information concerning the TE content in a genome at the family level from the .out file of RepeatMasker. This information includes the exact position and orientation of each copy, its proportion in the query sequence, and its quality compared to the reference element. In addition, our tool allows a user to directly retrieve the sequence of each copy and obtain the same detailed information at the family level when a local library with incomplete TE class/subclass information was used with RepeatMasker. We hope that this tool will be helpful for people working on the distribution and evolution of TEs within genomes.
Scaling features of noncoding DNA
NASA Technical Reports Server (NTRS)
Stanley, H. E.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.
1999-01-01
We review evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene, and utilize this fact to build a Coding Sequence Finder Algorithm, which uses statistical ideas to locate the coding regions of an unknown DNA sequence. Finally, we describe briefly some recent work adapting to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function, and reporting that noncoding regions in eukaryotes display a larger redundancy than coding regions. Specifically, we consider the possibility that this result is solely a consequence of nucleotide concentration differences as first noted by Bonhoeffer and his collaborators. We find that cytosine-guanine (CG) concentration does have a strong "background" effect on redundancy. However, we find that for the purine-pyrimidine binary mapping rule, which is not affected by the difference in CG concentration, the Shannon redundancy for the set of analyzed sequences is larger for noncoding regions compared to coding regions.
Chen, W L; Luo, D F; Gao, C; Ding, Y; Wang, S Y
2015-07-01
The familial acute myeloid leukemia related factor gene (FAMLF) was previously identified from a familial AML subtractive cDNA library and shown to undergo alternative splicing. This study used real-time quantitative PCR to investigate the expression of the FAMLF alternative-splicing transcript consensus sequence (FAMLF-CS) in peripheral blood mononuclear cells (PBMCs) from 119 patients with de novo acute leukemia (AL) and 104 healthy controls, as well as in CD34+ cells from 12 AL patients and 10 healthy donors. A 429-bp fragment from a novel splicing variant of FAMLF was obtained, and a 363-bp consensus sequence was targeted to quantify total FAMLF expression. Kruskal-Wallis, Nemenyi, Spearman's correlation, and Mann-Whitney U-tests were used to analyze the data. FAMLF-CS expression in PBMCs from AL patients and CD34+ cells from AL patients and controls was significantly higher than in control PBMCs (P < 0.0001). Moreover, FAMLF-CS expression in PBMCs from the AML group was positively correlated with red blood cell count (rs =0.317, P=0.006), hemoglobin levels (rs = 0.210, P = 0.049), and percentage of peripheral blood blasts (rs = 0.256, P = 0.027), but inversely correlated with hemoglobin levels in the control group (rs = -0.391, P < 0.0001). AML patients with high CD34+ expression showed significantly higher FAMLF-CS expression than those with low CD34+ expression (P = 0.041). Our results showed that FAMLF is highly expressed in both normal and malignant immature hematopoietic cells, but that expression is lower in normal mature PBMCs.
Khalil, Farghama; Yueyu, Xu; Naiyan, Xiao; Di, Liu; Tayyab, Muhammad; Hengbo, Wang; Islam, Waqar; Rauf, Saeed; Pinghua, Chen
2018-05-04
Sugarcane is an essential crop for sugar and biofuel. Globally, its production is severely affected by sugarcane yellow leaf disease (SCYLD) caused by Sugarcane Yellow Leaf Virus (SCYLV). Many aphid vectors are involved in the spread of the disease which reduced the effectiveness of cultural and chemical management. Empirical methods of plant breeding such as introgression from wild and cultivated germplasm were not possible or at least challenging due to the absence of resistance in cultivated and wild germplasm of sugarcane. RNA interference (RNAi) transformation is an effective method to create virus-resistant varieties. Nevertheless, limited progress has been made due to lack of comprehensive research program on SCYLV based on RNAi technique. In order to show improvement and to propose future strategies for the feasibility of the RNAi technique to cope SCYLV, genome-wide consensus sequences of SCYLV were analyzed through GenBank. The coverage rates of every consensus sequence in SCYLV isolates were calculated to evaluate their practicability. Our analysis showed that single consensus sequence from SCYLV could not work well for RNAi based sugarcane breeding programs. This may be due to high mutation rate and continuous recombination within and between various viral strains. Alternative multi-target RNAi strategy is suggested to combat several strains of the viruses and to reduce the silencing escape. The multi-target small interfering RNA (siRNA) can be used together to construct RNAi plant expression plasmid, and to transform sugarcane tissues to develop new sugarcane varieties resistant to SCYLV. Copyright © 2018 Elsevier Ltd. All rights reserved.
Cost-Effective Sequencing of Full-Length cDNA Clones Powered by a De Novo-Reference Hybrid Assembly
Sugano, Sumio; Morishita, Shinichi; Suzuki, Yutaka
2010-01-01
Background Sequencing full-length cDNA clones is important to determine gene structures including alternative splice forms, and provides valuable resources for experimental analyses to reveal the biological functions of coded proteins. However, previous approaches for sequencing cDNA clones were expensive or time-consuming, and therefore, a fast and efficient sequencing approach was demanded. Methodology We developed a program, MuSICA 2, that assembles millions of short (36-nucleotide) reads collected from a single flow cell lane of Illumina Genome Analyzer to shotgun-sequence ∼800 human full-length cDNA clones. MuSICA 2 performs a hybrid assembly in which an external de novo assembler is run first and the result is then improved by reference alignment of shotgun reads. We compared the MuSICA 2 assembly with 200 pooled full-length cDNA clones finished independently by the conventional primer-walking using Sanger sequencers. The exon-intron structure of the coding sequence was correct for more than 95% of the clones with coding sequence annotation when we excluded cDNA clones insufficiently represented in the shotgun library due to PCR failure (42 out of 200 clones excluded), and the nucleotide-level accuracy of coding sequences of those correct clones was over 99.99%. We also applied MuSICA 2 to full-length cDNA clones from Toxoplasma gondii, to confirm that its ability was competent even for non-human species. Conclusions The entire sequencing and shotgun assembly takes less than 1 week and the consumables cost only ∼US$3 per clone, demonstrating a significant advantage over previous approaches. PMID:20479877
Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P
1988-02-01
Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators.
Antalis, T M; Clark, M A; Barnes, T; Lehrbach, P R; Devine, P L; Schevzov, G; Goss, N H; Stephens, R W; Tolstoshev, P
1988-01-01
Human monocyte-derived plasminogen activator inhibitor (mPAI-2) was purified to homogeneity from the U937 cell line and partially sequenced. Oligonucleotide probes derived from this sequence were used to screen a cDNA library prepared from U937 cells. One positive clone was sequenced and contained most of the coding sequence as well as a long incomplete 3' untranslated region (1112 base pairs). This cDNA sequence was shown to encode mPAI-2 by hybrid-select translation. A cDNA clone encoding the remainder of the mPAI-2 mRNA was obtained by primer extension of U937 poly(A)+ RNA using a probe complementary to the mPAI-2 coding region. The coding sequence for mPAI-2 was placed under the control of the lambda PL promoter, and the protein expressed in Escherichia coli formed a complex with urokinase that could be detected immunologically. By nucleotide sequence analysis, mPAI-2 cDNA encodes a protein containing 415 amino acids with a predicted unglycosylated Mr of 46,543. The predicted amino acid sequence of mPAI-2 is very similar to placental PAI-2 (3 amino acid differences) and shows extensive homology with members of the serine protease inhibitor (serpin) superfamily. mPAI-2 was found to be more homologous to ovalbumin (37%) than the endothelial plasminogen activator inhibitor, PAI-1 (26%). Like ovalbumin, mPAI-2 appears to have no typical amino-terminal signal sequence. The 3' untranslated region of the mPAI-2 cDNA contains a putative regulatory sequence that has been associated with the inflammatory mediators. Images PMID:3257578
[Learning and Repetive Reproduction of Memorized Sequences by the Right and the Left Hand].
Bobrova, E V; Lyakhovetskii, V A; Bogacheva, I N
2015-01-01
An important stage of learning a new skill is repetitive reproduction of one and the same sequence of movements, which plays a significant role in forming of the movement stereotypes. Two groups of right-handers repeatedly memorized (6-10 repetitions) the sequences of their hand transitions by experimenter in 6 positions, firstly by the right hand (RH), and then--by the left hand (LH) or vice versa. Random sequences previously unknown to the volunteers were reproduced in the 11 series. Modified sequences were tested in the 2nd and 3rd series, where the same elements' positions were presented in different order. The processes of repetitive sequence reproduction were similar for RH and LH. However, the learning of the modified sequences differed: Information about elements' position disregarding the reproduction order was used only when LH initiated task performing. This information was not used when LH followed RH and when RH performed the task. Consequently, the type of information coding activated by LH helped learn the positions of sequence elements, while the type of information coding activated by RH prevented learning. It is supposedly connected with the predominant role of right hemisphere in the processes of positional coding and motor learning.
Sequence Polishing Library (SPL) v10.0
DOE Office of Scientific and Technical Information (OSTI.GOV)
Oberortner, Ernst
The Sequence Polishing Library (SPL) is a suite of software tools in order to automate "Design for Synthesis and Assembly" workflows. Specifically: The SPL "Converter" tool converts files among the following sequence data exchange formats: CSV, FASTA, GenBank, and Synthetic Biology Open Language (SBOL); The SPL "Juggler" tool optimizes the codon usages of DNA coding sequences according to an optimization strategy, a user-specific codon usage table and genetic code. In addition, the SPL "Juggler" can translate amino acid sequences into DNA sequences.:The SPL "Polisher" verifies NA sequences against DNA synthesis constraints, such as GC content, repeating k-mers, and restriction sites.more » In case of violations, the "Polisher" reports the violations in a comprehensive manner. The "Polisher" tool can also modify the violating regions according to an optimization strategy, a user-specific codon usage table and genetic code;The SPL "Partitioner" decomposes large DNA sequences into smaller building blocks with partial overlaps that enable an efficient assembly. The "Partitioner" enables the user to configure the characteristics of the overlaps, which are mostly determined by the utilized assembly protocol, such as length, GC content, or melting temperature.« less
Multiple Access Interference Reduction Using Received Response Code Sequence for DS-CDMA UWB System
NASA Astrophysics Data System (ADS)
Toh, Keat Beng; Tachikawa, Shin'ichi
This paper proposes a combination of novel Received Response (RR) sequence at the transmitter and a Matched Filter-RAKE (MF-RAKE) combining scheme receiver system for the Direct Sequence-Code Division Multiple Access Ultra Wideband (DS-CDMA UWB) multipath channel model. This paper also demonstrates the effectiveness of the RR sequence in Multiple Access Interference (MAI) reduction for the DS-CDMA UWB system. It suggests that by using conventional binary code sequence such as the M sequence or the Gold sequence, there is a possibility of generating extra MAI in the UWB system. Therefore, it is quite difficult to collect the energy efficiently although the RAKE reception method is applied at the receiver. The main purpose of the proposed system is to overcome the performance degradation for UWB transmission due to the occurrence of MAI during multiple accessing in the DS-CDMA UWB system. The proposed system improves the system performance by improving the RAKE reception performance using the RR sequence which can reduce the MAI effect significantly. Simulation results verify that significant improvement can be obtained by the proposed system in the UWB multipath channel models.
Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV)
Martin, Andrew C. R.
2014-01-01
The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and ’dotifying’ repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/. PMID:25653836
Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV).
Martin, Andrew C R
2014-01-01
The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and 'dotifying' repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/.
Chromosome-Encoded Broad-Spectrum Ambler Class A β-Lactamase RUB-1 from Serratia rubidaea
Didi, Jennifer; Ergani, Ayla; Lima, Sandra
2016-01-01
ABSTRACT Whole-genome sequencing of Serratia rubidaea CIP 103234T revealed a chromosomally located Ambler class A β-lactamase gene. The gene was cloned, and the β-lactamase, RUB-1, was characterized. RUB-1 displayed 74% and 73% amino acid sequence identity with the GIL-1 and TEM-1 penicillinases, respectively, and its substrate profile was similar to that of the latter β-lactamases. Analysis by 5′ rapid amplification of cDNA ends revealed promoter sequences highly divergent from the Escherichia coli σ70 consensus sequence. This work further illustrates the heterogeneity of β-lactamases among Serratia spp. PMID:27956418
Chromosome-Encoded Broad-Spectrum Ambler Class A β-Lactamase RUB-1 from Serratia rubidaea.
Bonnin, Rémy A; Didi, Jennifer; Ergani, Ayla; Lima, Sandra; Naas, Thierry
2017-02-01
Whole-genome sequencing of Serratia rubidaea CIP 103234 T revealed a chromosomally located Ambler class A β-lactamase gene. The gene was cloned, and the β-lactamase, RUB-1, was characterized. RUB-1 displayed 74% and 73% amino acid sequence identity with the GIL-1 and TEM-1 penicillinases, respectively, and its substrate profile was similar to that of the latter β-lactamases. Analysis by 5' rapid amplification of cDNA ends revealed promoter sequences highly divergent from the Escherichia coli σ 70 consensus sequence. This work further illustrates the heterogeneity of β-lactamases among Serratia spp. Copyright © 2017 American Society for Microbiology.
LongISLND: in silico sequencing of lengthy and noisy datatypes
Lau, Bayo; Mohiyuddin, Marghoob; Mu, John C.; Fang, Li Tai; Bani Asadi, Narges; Dallett, Carolina; Lam, Hugo Y. K.
2016-01-01
Summary: LongISLND is a software package designed to simulate sequencing data according to the characteristics of third generation, single-molecule sequencing technologies. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the latest PacBio BAM format. We demonstrate its utility by downstream processing with consensus building and variant calling. Availability and Implementation: LongISLND is implemented in Java and available at http://bioinform.github.io/longislnd Contact: hugo.lam@roche.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27667791
Outdoor Test Facility and Related Facilities | Photovoltaic Research | NREL
advanced or emerging photovoltaic (PV) technologies under simulated, accelerated indoor and outdoor, and evaluate prototype, pre-commercial, and commercial PV modules. One of the major roles of researchers at the OTF is to work with industry to develop uniform and consensus standards and codes for testing PV
RNAcentral: an international database of ncRNA sequences
Williams, Kelly Porter
2014-10-28
The field of non-coding RNA biology has been hampered by the lack of availability of a comprehensive, up-to-date collection of accessioned RNA sequences. Here we present the first release of RNAcentral, a database that collates and integrates information from an international consortium of established RNA sequence databases. The initial release contains over 8.1 million sequences, including representatives of all major functional classes. A web portal (http://rnacentral.org) provides free access to data, search functionality, cross-references, source code and an integrated genome browser for selected species.
Bustamante, Carlos; Ovenden, Jennifer R
2016-01-01
The silver gemfish Rexea solandri is an important economic resource but Vulnerable to overfishing in Australian waters. The complete mitochondrial genome sequence is described from 1.6 million reads obtained via next generation sequencing. The total length of the mitogenome is 16,350 bp comprising 2 rRNA, 13 protein-coding genes, 22 tRNA and 2 non-coding regions. The mitogenome sequence was validated against sequences of PCR fragments and BLAST queries of Genbank. Gene order was equivalent to that found in marine fishes.
Escamilla-Cejudo, José Antonio; Báez, Jorge Lara; Peña, Rodolfo; Luna, Patricia Lorena Ruiz; Ordunez, Pedro
2016-11-01
Several Central American countries are seeing continued growth in the number of deaths from chronic kidney disease of nontraditional causes (CKDnT) among farm workers and there is underreporting. This report presents the results of a consensus process coordinated by the Pan American Health Organization/World Health Organization (PAHO/WHO), the United States Centers for Disease Control and Prevention (CDC), and the Latin American Society of Nephrology and Hypertension (SLANH). This consensus seeks to increase the probability of detecting and recording deaths from these causes. There has been recognition of the negative impact of the lack of a standardized instrument and the lack of training in the medical profession for adequate registration of the cause or causes of death. As a result of the consensus, the following has been proposed: temporarily use a code from the Codes for Special Purposes in the International Classification of Diseases (ICD-10); continue to promote use of the WHO international standardized instrument for recording causes and preceding events related to death; increase training of physicians responsible for filling out death certificates; take action to increase the coverage and quality of information on mortality; and create a decision tree to facilitate selection of CKDnT as a specific cause of death, while presenting the role that different regional and subregional mechanisms in the Region of the Americas should play in order to improve CKD and CKDnT mortality records.
Integrated consensus genetic and physical maps of flax (Linum usitatissimum L.).
Cloutier, Sylvie; Ragupathy, Raja; Miranda, Evelyn; Radovanovic, Natasa; Reimer, Elsa; Walichnowski, Andrzej; Ward, Kerry; Rowland, Gordon; Duguid, Scott; Banik, Mitali
2012-12-01
Three linkage maps of flax (Linum usitatissimum L.) were constructed from populations CDC Bethune/Macbeth, E1747/Viking and SP2047/UGG5-5 containing between 385 and 469 mapped markers each. The first consensus map of flax was constructed incorporating 770 markers based on 371 shared markers including 114 that were shared by all three populations and 257 shared between any two populations. The 15 linkage group map corresponds to the haploid number of chromosomes of this species. The marker order of the consensus map was largely collinear in all three individual maps but a few local inversions and marker rearrangements spanning short intervals were observed. Segregation distortion was present in all linkage groups which contained 1-52 markers displaying non-Mendelian segregation. The total length of the consensus genetic map is 1,551 cM with a mean marker density of 2.0 cM. A total of 670 markers were anchored to 204 of the 416 fingerprinted contigs of the physical map corresponding to ~274 Mb or 74 % of the estimated flax genome size of 370 Mb. This high resolution consensus map will be a resource for comparative genomics, genome organization, evolution studies and anchoring of the whole genome shotgun sequence.
High rate concatenated coding systems using bandwidth efficient trellis inner codes
NASA Technical Reports Server (NTRS)
Deng, Robert H.; Costello, Daniel J., Jr.
1989-01-01
High-rate concatenated coding systems with bandwidth-efficient trellis inner codes and Reed-Solomon (RS) outer codes are investigated for application in high-speed satellite communication systems. Two concatenated coding schemes are proposed. In one the inner code is decoded with soft-decision Viterbi decoding, and the outer RS code performs error-correction-only decoding (decoding without side information). In the other, the inner code is decoded with a modified Viterbi algorithm, which produces reliability information along with the decoded output. In this algorithm, path metrics are used to estimate the entire information sequence, whereas branch metrics are used to provide reliability information on the decoded sequence. This information is used to erase unreliable bits in the decoded output. An errors-and-erasures RS decoder is then used for the outer code. The two schemes have been proposed for high-speed data communication on NASA satellite channels. The rates considered are at least double those used in current NASA systems, and the results indicate that high system reliability can still be achieved.
O'Connell, T D; Rokosh, D G; Simpson, P C
2001-05-01
alpha1-Adrenergic receptor (AR) subtypes in the heart are expressed by myocytes but not by fibroblasts, a feature that distinguishes alpha1-ARs from beta-ARs. Here we studied myocyte-specific expression of alpha1-ARs, focusing on the subtype alpha1C (also called alpha1A), a subtype implicated in cardiac hypertrophic signaling in rat models. We first cloned the mouse alpha1C-AR gene, which consisted of two exons with an 18 kb intron, similar to the alpha1B-AR gene. The receptor coding sequence was >90% homologous to that of rat and human. alpha1C-AR transcription in mouse heart was initiated from a single Inr consensus sequence at -588 from the ATG; this and a putative polyadenylation sequence 8.5 kb 3' could account for the predominant 11 kb alpha1C mRNA in mouse heart. A 5'-nontranscribed fragment of 4.4 kb was active as a promoter in cardiac myocytes but not in fibroblasts. Promoter activity in myocytes required a single muscle CAT (MCAT) element, and this MCAT bound in vitro to recombinant and endogenous transcriptional enhancer factor-1. Thus, alpha1C-AR transcription in cardiac myocytes shares MCAT dependence with other cardiac-specific genes, including the alpha- and beta-myosin heavy chains, skeletal alpha-actin, and brain natriuretic peptide. However, the mouse alpha1C gene was not transcribed in the neonatal heart and was not activated by alpha1-AR and other hypertrophic agonists in rat myocytes, and thus differed from other MCAT-dependent genes and the rat alpha1C gene.
Pauciullo, Alfredo; Erhardt, Georg
2015-01-01
In the present paper, we report for the first time the characterization of llama (Lama glama) caseins at transcriptomic and genetic level. A total of 288 casein clones transcripts were analysed from two lactating llamas. The most represented mRNA populations were those correctly assembled (85.07%) and they encoded for mature proteins of 215, 217, 187 and 162 amino acids respectively for the CSN1S1, CSN2, CSN1S2 and CSN3 genes. The exonic subdivision evidenced a structure made of 21, 9, 17 and 6 exons for the αs1-, β-, αs2- and κ-casein genes respectively. Exon skipping and duplication events were evidenced. Two variants A and B were identified in the αs1-casein gene as result of the alternative out-splicing of the exon 18. An additional exon coding for a novel esapeptide was found to be cryptic in the κ-casein gene, whereas one extra exon was found in the αs2-casein gene by the comparison with the Camelus dromedaries sequence. A total of 28 putative phosphorylated motifs highlighted a complex heterogeneity and a potential variable degree of post-translational modifications. Ninety-six polymorphic sites were found through the comparison of the lama casein cDNAs with the homologous camel sequences, whereas the first description and characterization of the 5’- and 3’-regulatory regions allowed to identify the main putative consensus sequences involved in the casein genes expression, thus opening the way to new investigations -so far- never achieved in this species. PMID:25923814
Pauciullo, Alfredo; Erhardt, Georg
2015-01-01
In the present paper, we report for the first time the characterization of llama (Lama glama) caseins at transcriptomic and genetic level. A total of 288 casein clones transcripts were analysed from two lactating llamas. The most represented mRNA populations were those correctly assembled (85.07%) and they encoded for mature proteins of 215, 217, 187 and 162 amino acids respectively for the CSN1S1, CSN2, CSN1S2 and CSN3 genes. The exonic subdivision evidenced a structure made of 21, 9, 17 and 6 exons for the αs1-, β-, αs2- and κ-casein genes respectively. Exon skipping and duplication events were evidenced. Two variants A and B were identified in the αs1-casein gene as result of the alternative out-splicing of the exon 18. An additional exon coding for a novel esapeptide was found to be cryptic in the κ-casein gene, whereas one extra exon was found in the αs2-casein gene by the comparison with the Camelus dromedaries sequence. A total of 28 putative phosphorylated motifs highlighted a complex heterogeneity and a potential variable degree of post-translational modifications. Ninety-six polymorphic sites were found through the comparison of the lama casein cDNAs with the homologous camel sequences, whereas the first description and characterization of the 5'- and 3'-regulatory regions allowed to identify the main putative consensus sequences involved in the casein genes expression, thus opening the way to new investigations -so far- never achieved in this species.
Butte, Nancy F; Voruganti, V Saroja; Cole, Shelley A; Haack, Karin; Comuzzie, Anthony G; Muzny, Donna M; Wheeler, David A; Chang, Kyle; Hawes, Alicia; Gibbs, Richard A
2011-09-22
Our objective was to resequence insulin receptor substrate 2 (IRS2) to identify variants associated with obesity- and diabetes-related traits in Hispanic children. Exonic and intronic segments, 5' and 3' flanking regions of IRS2 (∼14.5 kb), were bidirectionally sequenced for single nucleotide polymorphism (SNP) discovery in 934 Hispanic children using 3730XL DNA Sequencers. Additionally, 15 SNPs derived from Illumina HumanOmni1-Quad BeadChips were analyzed. Measured genotype analysis tested associations between SNPs and obesity and diabetes-related traits. Bayesian quantitative trait nucleotide analysis was used to statistically infer the most likely functional polymorphisms. A total of 140 SNPs were identified with minor allele frequencies (MAF) ranging from 0.001 to 0.47. Forty-two of the 70 coding SNPs result in nonsynonymous amino acid substitutions relative to the consensus sequence; 28 SNPs were detected in the promoter, 12 in introns, 28 in the 3'-UTR, and 2 in the 5'-UTR. Two insertion/deletions (indels) were detected. Ten independent rare SNPs (MAF = 0.001-0.009) were associated with obesity-related traits (P = 0.01-0.00002). SNP 10510452_139 in the promoter region was shown to have a high posterior probability (P = 0.77-0.86) of influencing BMI, fat mass, and waist circumference in Hispanic children. SNP 10510452_139 contributed between 2 and 4% of the population variance in body weight and composition. None of the SNPs or indels were associated with diabetes-related traits or accounted for a previously identified quantitative trait locus on chromosome 13 for fasting serum glucose. Rare but not common IRS2 variants may play a role in the regulation of body weight but not an essential role in fasting glucose homeostasis in Hispanic children.
2012-01-01
Background We present a comprehensive transcriptome analysis of the fungus Ascosphaera apis, an economically important pathogen of the Western honey bee (Apis mellifera) that causes chalkbrood disease. Our goals were to further annotate the A. apis reference genome and to identify genes that are candidates for being differentially expressed during host infection versus axenic culture. Results We compared A. apis transcriptome sequence from mycelia grown on liquid or solid media with that dissected from host-infected tissue. 454 pyrosequencing provided 252 Mb of filtered sequence reads from both culture types that were assembled into 10,087 contigs. Transcript contigs, protein sequences from multiple fungal species, and ab initio gene predictions were included as evidence sources in the Maker gene prediction pipeline, resulting in 6,992 consensus gene models. A phylogeny based on 12 of these protein-coding loci further supported the taxonomic placement of Ascosphaera as sister to the core Onygenales. Several common protein domains were less abundant in A. apis compared with related ascomycete genomes, particularly cytochrome p450 and protein kinase domains. A novel gene family was identified that has expanded in some ascomycete lineages, but not others. We manually annotated genes with homologs in other fungal genomes that have known relevance to fungal virulence and life history. Functional categories of interest included genes involved in mating-type specification, intracellular signal transduction, and stress response. Computational and manual annotations have been made publicly available on the Bee Pests and Pathogens website. Conclusions This comprehensive transcriptome analysis substantially enhances our understanding of the A. apis genome and its expression during infection of honey bee larvae. It also provides resources for future molecular studies of chalkbrood disease and ultimately improved disease management. PMID:22747707
Stephens, E B; Mukherjee, S; Sahni, M; Zhuge, W; Raghavan, R; Singh, D K; Leung, K; Atkinson, B; Li, Z; Joag, S V; Liu, Z Q; Narayan, O
1997-05-12
We have examined both the sequence changes in the LTR, gag, vif, vpr, vpx, tat, rev, vpu, env, and nef genes and the cell tropism of a cell-free stock of chimeric simian-human immunodeficiency virus (SHIV) isolated from the cerebrospinal fluid of a pig-tailed macaque (PNb) that developed AIDS. This virus (SHIVKU-1) is highly pathogenic when inoculated into other macaques. DNA sequence analysis of PCR-amplified products revealed a total of 5 nucleotide changes in the LTR while vif had 2 consensus amino acid changes. The gag, vif, and vpx had no consensus amino acid substitutions, whereas vpr had 1 consensus substitution. The tat and rev genes of the HXB2 region of SHIVKU-1 had 2 and 1 consensus amino acid changes, respectively. The vpu gene of the HXB2 region of SHIV, which originally had an ACG at the beginning of the gene, reverted to an initiation ATG codon and in addition contained a consensus amino acid substitution at position 69 of this protein. As expected, the majority of the nucleotide substitutions were found in the env and nef genes. Thirteen and 5 amino acid changes were predicted for the corresponding Env and Nef proteins, respectively. In addition, one-third of the env gene clones isolated from the SHIVKU-1 stock had a 5-amino-acid deletion in the V4 region. Using three independent assays, we determined that the changes in the SHIVKU-1 were associated with an increase in the efficiency of replication in macrophages. The strikingly few consensus changes in the virus suggest that conversion of this virus to one capable of causing AIDS in pig-tailed macaques was associated with relatively few changes in the viral envelope and/or accessory genes. These results will provide the basis for the development of a pathogenic, molecular clone of SHIV capable of causing AIDS in pig-tailed macaques.
Intact coding region of the serotonin transporter gene in obsessive-compulsive disorder
DOE Office of Scientific and Technical Information (OSTI.GOV)
Altemus, M.; Murphy, D.L.; Greenberg, B.
1996-07-26
Epidemiologic studies indicate that obsessive-compulsive disorder is genetically transmitted in some families, although no genetic abnormalities have been identified in individuals with this disorder. The selective response of obsessive-compulsive disorder to treatment with agents which block serotonin reuptake suggests the gene coding for the serotonin transporter as a candidate gene. The primary structure of the serotonin-transporter coding region was sequenced in 22 patients with obsessive-compulsive disorder, using direct PCR sequencing of cDNA synthesized from platelet serotonin-transporter mRNA. No variations in amino acid sequence were found among the obsessive-compulsive disorder patients or healthy controls. These results do not support a rolemore » for alteration in the primary structure of the coding region of the serotonin-transporter gene in the pathogenesis of obsessive-compulsive disorder. 27 refs.« less
AbouHaidar, Mounir Georges; Venkataraman, Srividhya; Golshani, Ashkan; Liu, Bolin; Ahmad, Tauqeer
2014-10-07
The highly structured (64% GC) covalently closed circular (CCC) RNA (220 nt) of the virusoid associated with rice yellow mottle virus codes for a 16-kDa highly basic protein using novel modalities for coding, translation, and gene expression. This CCC RNA is the smallest among all known viroids and virusoids and the only one that codes proteins. Its sequence possesses an internal ribosome entry site and is directly translated through two (or three) completely overlapping ORFs (shifting to a new reading frame at the end of each round). The initiation and termination codons overlap UGAUGA (underline highlights the initiation codon AUG within the combined initiation-termination sequence). Termination codons can be ignored to obtain larger read-through proteins. This circular RNA with no noncoding sequences is a unique natural supercompact "nanogenome."
Viral Linkage in HIV-1 Seroconverters and Their Partners in an HIV-1 Prevention Clinical Trial
Campbell, Mary S.; Mullins, James I.; Hughes, James P.; Celum, Connie; Wong, Kim G.; Raugi, Dana N.; Sorensen, Stefanie; Stoddard, Julia N.; Zhao, Hong; Deng, Wenjie; Kahle, Erin; Panteleeff, Dana; Baeten, Jared M.; McCutchan, Francine E.; Albert, Jan; Leitner, Thomas; Wald, Anna; Corey, Lawrence; Lingappa, Jairam R.
2011-01-01
Background Characterization of viruses in HIV-1 transmission pairs will help identify biological determinants of infectiousness and evaluate candidate interventions to reduce transmission. Although HIV-1 sequencing is frequently used to substantiate linkage between newly HIV-1 infected individuals and their sexual partners in epidemiologic and forensic studies, viral sequencing is seldom applied in HIV-1 prevention trials. The Partners in Prevention HSV/HIV Transmission Study (ClinicalTrials.gov #NCT00194519) was a prospective randomized placebo-controlled trial that enrolled serodiscordant heterosexual couples to determine the efficacy of genital herpes suppression in reducing HIV-1 transmission; as part of the study analysis, HIV-1 sequences were examined for genetic linkage between seroconverters and their enrolled partners. Methodology/Principal Findings We obtained partial consensus HIV-1 env and gag sequences from blood plasma for 151 transmission pairs and performed deep sequencing of env in some cases. We analyzed sequences with phylogenetic techniques and developed a Bayesian algorithm to evaluate the probability of linkage. For linkage, we required monophyletic clustering between enrolled partners' sequences and a Bayesian posterior probability of ≥50%. Adjudicators classified each seroconversion, finding 108 (71.5%) linked, 40 (26.5%) unlinked, and 3 (2.0%) indeterminate transmissions, with linkage determined by consensus env sequencing in 91 (84%). Male seroconverters had a higher frequency of unlinked transmissions than female seroconverters. The likelihood of transmission from the enrolled partner was related to time on study, with increasing numbers of unlinked transmissions occurring after longer observation periods. Finally, baseline viral load was found to be significantly higher among linked transmitters. Conclusions/Significance In this first use of HIV-1 sequencing to establish endpoints in a large clinical trial, more than one-fourth of transmissions were unlinked to the enrolled partner, illustrating the relevance of these methods in the design of future HIV-1 prevention trials in serodiscordant couples. A hierarchy of sequencing techniques, analysis methods, and expert adjudication contributed to the linkage determination process. PMID:21399681
Hara, Yasushi; Hayashi, Kyohei; Nakajima, Takuya; Kagawa, Shizuko; Tazumi, Akihiro; Moore, John E; Matsuda, Motoo
2013-09-01
Clustered regularly interspaced short palindromic repeats (CRISPRs), of approximately 10,000 base pairs (bp) in length, were shown to occur in the Japanese Taylorella equigenitalis strain, EQ59. The locus was composed of the putative CRISPRs-associated with 5 (cas5), RAMP csd1, csd2, recB, cas1, a leader region, 13 CRISPR consensus sequence repeats (each 32 bp; 5'-TCAGCCACGTTCGCGTGGCTGTGTGTTTAAAG-3'). These were in turn separated by 12 non repetitive unique spacer regions of similar length. In addition, a leader region, a transposase/IS protein, a leader region, and cas3 were also seen. All seven putative open reading frames carry their ribosome binding sites. Promoter consensus sequences at the -35 and -10 regions and putative intrinsic ρ-independent transcription terminator regions also occurred. A possible long overlap of 170 bp in length occurred between the recB and cas1 loci. Positive reverse transcription PCR signals of cas5, RAMP csd1, csd2-recB/cas1, and cas3 were generated. A putative secondary structure of the CRISPR consensus repeats was constructed. Following this, CRISPR results of the T. equigenitalis EQ59 isolate were subsequently compared with those from the Taylorella asinigenitalis MCE3 isolate.
Gene and genon concept: coding versus regulation
2007-01-01
We analyse here the definition of the gene in order to distinguish, on the basis of modern insight in molecular biology, what the gene is coding for, namely a specific polypeptide, and how its expression is realized and controlled. Before the coding role of the DNA was discovered, a gene was identified with a specific phenotypic trait, from Mendel through Morgan up to Benzer. Subsequently, however, molecular biologists ventured to define a gene at the level of the DNA sequence in terms of coding. As is becoming ever more evident, the relations between information stored at DNA level and functional products are very intricate, and the regulatory aspects are as important and essential as the information coding for products. This approach led, thus, to a conceptual hybrid that confused coding, regulation and functional aspects. In this essay, we develop a definition of the gene that once again starts from the functional aspect. A cellular function can be represented by a polypeptide or an RNA. In the case of the polypeptide, its biochemical identity is determined by the mRNA prior to translation, and that is where we locate the gene. The steps from specific, but possibly separated sequence fragments at DNA level to that final mRNA then can be analysed in terms of regulation. For that purpose, we coin the new term “genon”. In that manner, we can clearly separate product and regulative information while keeping the fundamental relation between coding and function without the need to introduce a conceptual hybrid. In mRNA, the program regulating the expression of a gene is superimposed onto and added to the coding sequence in cis - we call it the genon. The complementary external control of a given mRNA by trans-acting factors is incorporated in its transgenon. A consequence of this definition is that, in eukaryotes, the gene is, in most cases, not yet present at DNA level. Rather, it is assembled by RNA processing, including differential splicing, from various pieces, as steered by the genon. It emerges finally as an uninterrupted nucleic acid sequence at mRNA level just prior to translation, in faithful correspondence with the amino acid sequence to be produced as a polypeptide. After translation, the genon has fulfilled its role and expires. The distinction between the protein coding information as materialised in the final polypeptide and the processing information represented by the genon allows us to set up a new information theoretic scheme. The standard sequence information determined by the genetic code expresses the relation between coding sequence and product. Backward analysis asks from which coding region in the DNA a given polypeptide originates. The (more interesting) forward analysis asks in how many polypeptides of how many different types a given DNA segment is expressed. This concerns the control of the expression process for which we have introduced the genon concept. Thus, the information theoretic analysis can capture the complementary aspects of coding and regulation, of gene and genon. PMID:18087760
Oh, Chang Seok; Lee, Soong Deok; Kim, Yi-Suk; Shin, Dong Hoon
2015-01-01
Previous study showed that East Asian mtDNA haplogroups, especially those of Koreans, could be successfully assigned by the coupled use of analyses on coding region SNP markers and control region mutation motifs. In this study, we tried to see if the same triple multiplex analysis for coding regions SNPs could be also applicable to ancient samples from East Asia as the complementation for sequence analysis of mtDNA control region. By the study on Joseon skeleton samples, we know that mtDNA haplogroup determined by coding region SNP markers successfully falls within the same haplogroup that sequence analysis on control region can assign. Considering that ancient samples in previous studies make no small number of errors in control region mtDNA sequencing, coding region SNP analysis can be used as good complimentary to the conventional haplogroup determination, especially of archaeological human bone samples buried underground over long periods. PMID:26345190
Flexible and polarization-controllable diffusion metasurface with optical transparency
NASA Astrophysics Data System (ADS)
Zhuang, Yaqiang; Wang, Guangming; Liang, Jiangang; Cai, Tong; Guo, Wenlong; Zhang, Qingfeng
2017-11-01
In this paper, a novel coding metasurface is proposed to realize polarization-controllable diffusion scattering. The anisotropic Jerusalem-cross unit cell is employed as the basic coding element due to its polarization-dependent phase response. The isotropic random coding sequence is firstly designed to obtain diffusion scattering, and the anisotropic random coding sequence is subsequently realized by adding different periodic coding sequences to the original isotropic one along different directions. For demonstration, we designed and fabricated a flexible polarization-controllable diffusion metasurface (PCDM) with both chessboard diffusion and hedge diffusion under different polarizations. The specular scattering reduction performance of the anisotropic metasurface is better than the isotropic one because the scattered energies are redirected away from the specular reflection direction. For potential applications, the flexible PCDM wrapped around a cylinder structure is investigated and tested for polarization-controllable diffusion scattering. The numerical and experimental results coincide well, indicating anisotropic low scatterings with comparable performances. This paper provides an alternative approach for designing high-performance, flexible, low-scattering platforms.
Structure and stability of the ankyrin domain of the Drosophila Notch receptor.
Zweifel, Mark E; Leahy, Daniel J; Hughson, Frederick M; Barrick, Doug
2003-11-01
The Notch receptor contains a conserved ankyrin repeat domain that is required for Notch-mediated signal transduction. The ankyrin domain of Drosophila Notch contains six ankyrin sequence repeats previously identified as closely matching the ankyrin repeat consensus sequence, and a putative seventh C-terminal sequence repeat that exhibits lower similarity to the consensus sequence. To better understand the role of the Notch ankyrin domain in Notch-mediated signaling and to examine how structure is distributed among the seven ankyrin sequence repeats, we have determined the crystal structure of this domain to 2.0 angstroms resolution. The seventh, C-terminal, ankyrin sequence repeat adopts a regular ankyrin fold, but the first, N-terminal ankyrin repeat, which contains a 15-residue insertion, appears to be largely disordered. The structure reveals a substantial interface between ankyrin polypeptides, showing a high degree of shape and charge complementarity, which may be related to homotypic interactions suggested from indirect studies. However, the Notch ankyrin domain remains largely monomeric in solution, demonstrating that this interface alone is not sufficient to promote tight association. Using the structure, we have classified reported mutations within the Notch ankyrin domain that are known to disrupt signaling into those that affect buried residues and those restricted to surface residues. We show that the buried substitutions greatly decrease protein stability, whereas the surface substitutions have only a marginal affect on stability. The surface substitutions are thus likely to interfere with Notch signaling by disrupting specific Notch-effector interactions and map the sites of these interactions.
Architecture of a Fur Binding Site: a Comparative Analysis
Lavrrar, Jennifer L.; McIntosh, Mark A.
2003-01-01
Fur is an iron-binding transcriptional repressor that recognizes a 19-bp consensus site of the sequence 5′-GATAATGATAATCATTATC-3′. This site can be defined as three adjacent hexamers of the sequence 5′-GATAAT-3′, with the third being slightly imperfect (an F-F-F configuration), or as two hexamers in the forward orientation separated by one base pair from a third hexamer in the reverse orientation (an F-F-x-R configuration). Although Fur can bind synthetic DNA sequences containing the F-F-F arrangement, most natural binding sites are variations of the F-F-x-R arrangement. The studies presented here compared the ability of Fur to recognize synthetic DNA sequences containing two to four adjacent hexamers with binding to sequences containing variations of the F-F-x-R arrangement (including natural operator sequences from the entS and fepB promoter regions of Escherichia coli). Gel retardation assays showed that the F-F-x-R architecture was necessary for high-affinity Fur-DNA interactions and that contiguous hexamers were not recognized as effectively. In addition, the stoichiometry of Fur at each binding site was determined, showing that Fur interacted with its minimal 19-bp binding site as two overlapping dimers. These data confirm the proposed overlapping-dimer binding model, where the unit of interaction with a single Fur dimer is two inverted hexamers separated by a C:G base pair, with two overlapping units comprising the 19-bp consensus binding site required for the high-affinity interaction with two Fur dimers. PMID:12644489
Rombel, I T; McMorran, B J; Lamont, I L
1995-02-20
Many bacteria respond to a lack of iron in the environment by synthesizing siderophores, which act as iron-scavenging compounds. Fluorescent pseudomonads synthesize strain-specific but chemically related siderophores called pyoverdines or pseudobactins. We have investigated the mechanisms by which iron controls expression of genes involved in pyoverdine metabolism in Pseudomonas aeruginosa. Transcription of these genes is repressed by the presence of iron in the growth medium. Three promoters from these genes were cloned and the activities of the promoters were dependent on the amounts of iron in the growth media. Two of the promoters were sequenced and the transcriptional start site were identified by S1 nuclease analysis. Sequences similar to the consensus binding site for the Fur repressor protein, which controls expression of iron-repressible genes in several gram-negative species, were not present in the promoters, suggesting that they are unlikely to have a high affinity for Fur. However, comparison of the promoter sequences with those of iron-regulated genes from other Pseudomonas species and also the iron-regulated exotoxin gene of P. aeruginosa allowed identification of a shared sequence element, with the consensus sequence (G/C)CTAAAT-CCC, which is likely to act as a binding site for a transcriptional activator protein. Mutations in this sequence greatly reduced the activities of the promoters characterized here as well as those of other iron-regulated promoters. The requirement for this motif in the promoters of iron-regulated genes of different Pseudomonas species indicates that similar mechanisms are likely to be involved in controlling expression of a range of iron-regulated genes in pseudomonads.
Patricios, Jon S; Ardern, Clare L; Hislop, Michael David; Aubry, Mark; Bloomfield, Paul; Broderick, Carolyn; Clifton, Patrick; Echemendia, Ruben J; Ellenbogen, Richard G; Falvey, Éanna Cian; Fuller, Gordon Ward; Grand, Julie; Hack, Dallas; Harcourt, Peter Rex; Hughes, David; McGuirk, Nathan; Meeuwisse, Willem; Miller, Jeffrey; Parsons, John T; Richiger, Simona; Sills, Allen; Moran, Kevin B; Shute, Jenny; Raftery, Martin
2018-05-01
The 2017 Berlin Concussion in Sport Group Consensus Statement provides a global summary of best practice in concussion prevention, diagnosis and management, underpinned by systematic reviews and expert consensus. Due to their different settings and rules, individual sports need to adapt concussion guidelines according to their specific regulatory environment. At the same time, consistent application of the Berlin Consensus Statement's themes across sporting codes is likely to facilitate superior and uniform diagnosis and management, improve concussion education and highlight collaborative research opportunities. This document summarises the approaches discussed by medical representatives from the governing bodies of 10 different contact and collision sports in Dublin, Ireland in July 2017. Those sports are: American football, Australian football, basketball, cricket, equestrian sports, football/soccer, ice hockey, rugby league, rugby union and skiing. This document had been endorsed by 11 sport governing bodies/national federations at the time of being published. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Process of labeling specific chromosomes using recombinant repetitive DNA
Moyzis, R.K.; Meyne, J.
1988-02-12
Chromosome preferential nucleotide sequences are first determined from a library of recombinant DNA clones having families of repetitive sequences. Library clones are identified with a low homology with a sequence of repetitive DNA families to which the first clones respectively belong and variant sequences are then identified by selecting clones having a pattern of hybridization with genomic DNA dissimilar to the hybridization pattern shown by the respective families. In another embodiment, variant sequences are selected from a sequence of a known repetitive DNA family. The selected variant sequence is classified as chromosome specific, chromosome preferential, or chromosome nonspecific. Sequences which are classified as chromosome preferential are further sequenced and regions are identified having a low homology with other regions of the chromosome preferential sequence or with known sequences of other family members and consensus sequences of the repetitive DNA families for the chromosome preferential sequences. The selected low homology regions are then hybridized with chromosomes to determine those low homology regions hybridized with a specific chromosome under normal stringency conditions.
Coding visual features extracted from video sequences.
Baroffio, Luca; Cesana, Matteo; Redondi, Alessandro; Tagliasacchi, Marco; Tubaro, Stefano
2014-05-01
Visual features are successfully exploited in several applications (e.g., visual search, object recognition and tracking, etc.) due to their ability to efficiently represent image content. Several visual analysis tasks require features to be transmitted over a bandwidth-limited network, thus calling for coding techniques to reduce the required bit budget, while attaining a target level of efficiency. In this paper, we propose, for the first time, a coding architecture designed for local features (e.g., SIFT, SURF) extracted from video sequences. To achieve high coding efficiency, we exploit both spatial and temporal redundancy by means of intraframe and interframe coding modes. In addition, we propose a coding mode decision based on rate-distortion optimization. The proposed coding scheme can be conveniently adopted to implement the analyze-then-compress (ATC) paradigm in the context of visual sensor networks. That is, sets of visual features are extracted from video frames, encoded at remote nodes, and finally transmitted to a central controller that performs visual analysis. This is in contrast to the traditional compress-then-analyze (CTA) paradigm, in which video sequences acquired at a node are compressed and then sent to a central unit for further processing. In this paper, we compare these coding paradigms using metrics that are routinely adopted to evaluate the suitability of visual features in the context of content-based retrieval, object recognition, and tracking. Experimental results demonstrate that, thanks to the significant coding gains achieved by the proposed coding scheme, ATC outperforms CTA with respect to all evaluation metrics.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ebert, D.
1997-07-01
This is a report on the CSNI Workshop on Transient Thermal-Hydraulic and Neutronic Codes Requirements held at Annapolis, Maryland, USA November 5-8, 1996. This experts` meeting consisted of 140 participants from 21 countries; 65 invited papers were presented. The meeting was divided into five areas: (1) current and prospective plans of thermal hydraulic codes development; (2) current and anticipated uses of thermal-hydraulic codes; (3) advances in modeling of thermal-hydraulic phenomena and associated additional experimental needs; (4) numerical methods in multi-phase flows; and (5) programming language, code architectures and user interfaces. The workshop consensus identified the following important action items tomore » be addressed by the international community in order to maintain and improve the calculational capability: (a) preserve current code expertise and institutional memory, (b) preserve the ability to use the existing investment in plant transient analysis codes, (c) maintain essential experimental capabilities, (d) develop advanced measurement capabilities to support future code validation work, (e) integrate existing analytical capabilities so as to improve performance and reduce operating costs, (f) exploit the proven advances in code architecture, numerics, graphical user interfaces, and modularization in order to improve code performance and scrutibility, and (g) more effectively utilize user experience in modifying and improving the codes.« less
Non-codingRNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer.
Wojcik, Sylwia E; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z; Rai, Kanti R; Kipps, Thomas J; Keating, Michael J; Croce, Carlo M; Calin, George A
2010-02-01
Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNAs for sequence variations in 148 microRNAs (miRNAs) and ultraconserved regions (UCRs) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger technique and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate on a genome-wide scale the frequency of genetic variations in ncRNAs and their functional meaning, as well as for the development of new diagnostic and prognostic markers for leukemias and carcinomas.
Non-codingRNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer
Wojcik, Sylwia E.; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S.; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z.; Rai, Kanti R.; Kipps, Thomas J.; Keating, Michael J.
2010-01-01
Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNAs for sequence variations in 148 microRNAs (miRNAs) and ultraconserved regions (UCRs) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger technique and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate on a genome-wide scale the frequency of genetic variations in ncRNAs and their functional meaning, as well as for the development of new diagnostic and prognostic markers for leukemias and carcinomas. PMID:19926640
Wang, Anqi; Wang, Zhanyu; Li, Zheng; Li, Lei M
2018-06-15
It is highly desirable to assemble genomes of high continuity and consistency at low cost. The current bottleneck of draft genome continuity using the second generation sequencing (SGS) reads is primarily caused by uncertainty among repetitive sequences. Even though the single-molecule real-time sequencing technology is very promising to overcome the uncertainty issue, its relatively high cost and error rate add burden on budget or computation. Many long-read assemblers take the overlap-layout-consensus (OLC) paradigm, which is less sensitive to sequencing errors, heterozygosity and variability of coverage. However, current assemblers of SGS data do not sufficiently take advantage of the OLC approach. Aiming at minimizing uncertainty, the proposed method BAUM, breaks the whole genome into regions by adaptive unique mapping; then the local OLC is used to assemble each region in parallel. BAUM can (i) perform reference-assisted assembly based on the genome of a close species (ii) or improve the results of existing assemblies that are obtained based on short or long sequencing reads. The tests on two eukaryote genomes, a wild rice Oryza longistaminata and a parrot Melopsittacus undulatus, show that BAUM achieved substantial improvement on genome size and continuity. Besides, BAUM reconstructed a considerable amount of repetitive regions that failed to be assembled by existing short read assemblers. We also propose statistical approaches to control the uncertainty in different steps of BAUM. http://www.zhanyuwang.xin/wordpress/index.php/2017/07/21/baum. Supplementary data are available at Bioinformatics online.
Sasazawa, Yukiko; Sato, Natsumi; Suzuki, Takehiro; Dohmae, Naoshi; Simizu, Siro
The thrombopoietin receptor, also known as c-Mpl, is a member of the cytokine superfamily, which regulates the differentiation of megakaryocytes and formation of platelets by binding to its ligand, thrombopoietin (TPO), through Janus kinase (JAK)-signal transducer and activator of transcription (STAT) signaling. The loss-of-function mutations of c-Mpl cause severe thrombocytopenia due to impaired megakaryocytopoiesis, and gain-of-function mutations cause thrombocythemia. c-Mpl contains two Trp-Ser-Xaa-Trp-Ser (Xaa represents any amino acids) sequences, which are characteristic sequences of type I cytokine receptors, corresponding to C-mannosylation consensus sequences: Trp-Xaa-Xaa-Trp/Cys. C-mannosylation is a post-translational modification of tryptophan residue in which one mannose is attached to the first tryptophan residue in the consensus sequence via C-C linkage. Although c-Mpl contains some C-mannosylation sequences, whether c-Mpl is C-mannosylated or not has been uninvestigated. We identified that c-Mpl is C-mannosylated not only at Trp(269) and Trp(474), which are putative C-mannosylation site, but also at Trp(272), Trp(416), and Trp(477). Using C-mannosylation defective mutant of c-Mpl, the C-mannosylated tryptophan residues at four sites (Trp(269), Trp(272), Trp(474), and Trp(477)) are essential for c-Mpl-mediated JAK-STAT signaling. Our findings suggested that C-mannosylation of c-Mpl is a possible therapeutic target for platelet disorders. Copyright © 2015 Elsevier Inc. All rights reserved.
Sequence and structural analyses of nuclear export signals in the NESdb database
Xu, Darui; Farmer, Alicia; Collett, Garen; Grishin, Nick V.; Chook, Yuh Min
2012-01-01
We compiled >200 nuclear export signal (NES)–containing CRM1 cargoes in a database named NESdb. We analyzed the sequences and three-dimensional structures of natural, experimentally identified NESs and of false-positive NESs that were generated from the database in order to identify properties that might distinguish the two groups of sequences. Analyses of amino acid frequencies, sequence logos, and agreement with existing NES consensus sequences revealed strong preferences for the Φ1-X3-Φ2-X2-Φ3-X-Φ4 pattern and for negatively charged amino acids in the nonhydrophobic positions of experimentally identified NESs but not of false positives. Strong preferences against certain hydrophobic amino acids in the hydrophobic positions were also revealed. These findings led to a new and more precise NES consensus. More important, three-dimensional structures are now available for 68 NESs within 56 different cargo proteins. Analyses of these structures showed that experimentally identified NESs are more likely than the false positives to adopt α-helical conformations that transition to loops at their C-termini and more likely to be surface accessible within their protein domains or be present in disordered or unobserved parts of the structures. Such distinguishing features for real NESs might be useful in future NES prediction efforts. Finally, we also tested CRM1-binding of 40 NESs that were found in the 56 structures. We found that 16 of the NES peptides did not bind CRM1, hence illustrating how NESs are easily misidentified. PMID:22833565
Screening for Protein-DNA Interactions by Automatable DNA-Protein Interaction ELISA
Schüssler, Axel; Kolukisaoglu, H. Üner; Koch, Grit; Wallmeroth, Niklas; Hecker, Andreas; Thurow, Kerstin; Zell, Andreas; Harter, Klaus; Wanke, Dierk
2013-01-01
DNA-binding proteins (DBPs), such as transcription factors, constitute about 10% of the protein-coding genes in eukaryotic genomes and play pivotal roles in the regulation of chromatin structure and gene expression by binding to short stretches of DNA. Despite their number and importance, only for a minor portion of DBPs the binding sequence had been disclosed. Methods that allow the de novo identification of DNA-binding motifs of known DBPs, such as protein binding microarray technology or SELEX, are not yet suited for high-throughput and automation. To close this gap, we report an automatable DNA-protein-interaction (DPI)-ELISA screen of an optimized double-stranded DNA (dsDNA) probe library that allows the high-throughput identification of hexanucleotide DNA-binding motifs. In contrast to other methods, this DPI-ELISA screen can be performed manually or with standard laboratory automation. Furthermore, output evaluation does not require extensive computational analysis to derive a binding consensus. We could show that the DPI-ELISA screen disclosed the full spectrum of binding preferences for a given DBP. As an example, AtWRKY11 was used to demonstrate that the automated DPI-ELISA screen revealed the entire range of in vitro binding preferences. In addition, protein extracts of AtbZIP63 and the DNA-binding domain of AtWRKY33 were analyzed, which led to a refinement of their known DNA-binding consensi. Finally, we performed a DPI-ELISA screen to disclose the DNA-binding consensus of a yet uncharacterized putative DBP, AtTIFY1. A palindromic TGATCA-consensus was uncovered and we could show that the GATC-core is compulsory for AtTIFY1 binding. This specific interaction between AtTIFY1 and its DNA-binding motif was confirmed by in vivo plant one-hybrid assays in protoplasts. Thus, the value and applicability of the DPI-ELISA screen for de novo binding site identification of DBPs, also under automatized conditions, is a promising approach for a deeper understanding of gene regulation in any organism of choice. PMID:24146751
Tramontano, A; Macchiato, M F
1986-01-01
An algorithm to determine the probability that a reading frame codifies for a protein is presented. It is based on the results of our previous studies on the thermodynamic characteristics of a translated reading frame. We also develop a prediction procedure to distinguish between coding and non-coding reading frames. The procedure is based on the characteristics of the putative product of the DNA sequence and not on periodicity characteristics of the sequence, so the prediction is not biased by the presence of overlapping translated reading frames or by the presence of translated reading frames on the complementary DNA strand. PMID:3753761
Godet, Angélique N; Guergnon, Julien; Maire, Virginie; Croset, Amélie; Garcia, Alphonse
2010-04-01
Previous studies established that PP1 is a target for Bcl-2 proteins and an important regulator of apoptosis. The two distinct functional PP1 consensus docking motifs, R/Kx((0,1))V/IxF and FxxR/KxR/K, involved in PP1 binding and cell death were previously characterized in the BH1 and BH3 domains of some Bcl-2 proteins. In this study, we demonstrate that DPT-AIF(1), a peptide containing the AIF(562-571) sequence located in a c-terminal domain of AIF, is a new PP1 interacting and cell penetrating molecule. We also showed that DPT-AIF(1) provoked apoptosis in several human cell lines. Furthermore, DPT-APAF(1) a bi-partite cell penetrating peptide containing APAF-1(122-131), a non penetrating sequence from APAF-1 protein, linked to our previously described DPT-sh1 peptide shuttle, is also a PP1-interacting death molecule. Both AIF(562-571) and APAF-1(122-131) sequences contain a common R/Kx((0,1))V/IxFxxR/KxR/K motif, shared by several proteins involved in control of cell survival pathways. This motif combines the two distinct PP1c consensus docking motifs initially identified in some Bcl-2 proteins. Interestingly DPT-AIF(2) and DPT-APAF(2) that carry a F to A mutation within this combinatorial motif, no longer exhibited any PP1c binding or apoptotic effects. Moreover the F to A mutation in DPT-AIF(2) also suppressed cell penetration. These results indicate that the combinatorial PP1c docking motif R/Kx((0,1))V/IxFxxR/KxR/K, deduced from AIF(562-571) and APAF-1(122-131) sequences, is a new PP1c-dependent Apoptotic Signature. This motif is also a new tool for drug design that could be used to characterize potential anti-tumour molecules.
Hopple, J S; Vilgalys, R
1999-10-01
Phylogenetic relationships were investigated in the mushroom genus Coprinus based on sequence data from the nuclear encoded large-subunit rDNA gene. Forty-seven species of Coprinus and 19 additional species from the families Coprinaceae, Strophariaceae, Bolbitiaceae, Agaricaceae, Podaxaceae, and Montagneaceae were studied. A total of 1360 sites was sequenced across seven divergent domains and intervening sequences. A total of 302 phylogenetically informative characters was found. Ninety-eight percent of the average divergence between taxa was located within the divergent domains, with domains D2 and D8 being most divergent and domains D7 and D10 the least divergent. An empirical test of phylogenetic signal among divergent domains also showed that domains D2 and D3 had the lowest levels of homoplasy. Two equally most parsimonious trees were resolved using Wagner parsimony. A character-state weighted analysis produced 12 equally most parsimonious trees similar to those generated by Wagner parsimony. Phylogenetic analyses employing topological constraints suggest that none of the major taxonomic systems proposed for subgeneric classification is able to completely reflect phylogenetic relationships in Coprinus. A strict consensus integration of the two Wagner trees demonstrates the problematic nature of choosing outgroups within dark-spored mushrooms. The genus Coprinus is found to be polyphyletic and is separated into three distinct clades. Most Coprinus taxa belong to the first two clades, which together form a larger monophyletic group with Lacrymaria and Psathyrella in basal positions. A third clade contains members of Coprinus section Comati as well as the genus Leucocoprinus, Podaxis pistillaris, Montagnea arenaria, and Agaricus pocillator. This third clade is separated from the other species of Coprinus by members of the families Strophariaceae and Bolbitiaceae and the genus Panaeolus. Copyright 1999 Academic Press.
Cao, Jingyuan; Zhou, Wenting; Yi, Yao; Jia, Zhiyuan; Bi, Shengli
2013-01-01
Hepatitis A virus (HAV) is the most common cause of infectious hepatitis throughout the world, spread largely by the fecal-oral route. To characterize the genetic diversity of the virus circulating in China where HAV in endemic, we selected the outbreak cases with identical sequences in VP1-2A junction region and compiled a panel of 42 isolates. The VP3-VP1-2A regions of the HAV capsid-coding genes were further sequenced and analyzed. The quasispecies distribution was evaluated by cloning the VP3 and VP1-2A genes in three clinical samples. Phylogenetic analysis demonstrated that the same genotyping results could be obtained whether using the complete VP3, VP1, or partial VP1-2A genes for analysis in this study, although some differences did exist. Most isolates clustered in sub-genotype IA, and fewer in sub-genotype IB. No amino acid mutations were found at the published neutralizing epitope sites, however, several unique amino acid substitutions in the VP3 or VP1 region were identified, with two amino acid variants closely located to the immunodominant site. Quasispecies analysis showed the mutation frequencies were in the range of 7.22x10-4 -2.33x10-3 substitutions per nucleotide for VP3, VP1, or VP1-2A. When compared with the consensus sequences, mutated nucleotide sites represented the minority of all the analyzed sequences sites. HAV replicated as a complex distribution of closely genetically related variants referred to as quasispecies, and were under negative selection. The results indicate that diverse HAV strains and quasispecies inside the viral populations are presented in China, with unique amino acid substitutions detected close to the immunodominant site, and that the possibility of antigenic escaping mutants cannot be ruled out and needs to be further analyzed. PMID:24069343
Cloning and pharmacological characterization of the rabbit bradykinin B2 receptor.
Bachvarov, D R; Saint-Jacques, E; Larrivée, J F; Levesque, L; Rioux, F; Drapeau, G; Marceau, F
1995-12-01
Degenerate primers, corresponding to consensus sequences of third and sixth transmembrane domains of G protein-coupled receptor superfamily, were used for the polymerase chain reaction amplification and consecutive characterization of G protein-coupled receptors present in cultured rabbit aortic smooth muscle cells. One of the isolated resulting fragments was highly homologous to the corresponding region of the bradykinin (BK) B2 receptor cloned in other species. The polymerase chain reaction fragment was used to screen a rabbit genomic library, which allowed the identification of an intronless 1101-nucleotide open reading frame which codes for a 367-amino acid receptor protein. The rabbit B2 receptor sequence is more than 80% identical to the ones determined in three other species and retain putative glycosylation, palmitoylation and phosphorylation sites. In the rabbit genomic sequence, an acceptor splice sequence was found 8 base pairs upstream of the start codon. Northern blot analysis showed a high expression of a major transcript (4.2 kilobases) in the rabbit kidney and duodenum, and a less abundant expression in other tissues. Southern blot experiments suggest that a single copy of this gene exists in the rabbit genome. The cloned rabbit B2 receptor expressed in COS-1 cells binds [3H]BK in a saturable manner (KD 2.1 nM) and this ligand competes with a series of kinin agonists and antagonist with a rank order consistent with the B2 receptor identity. The insurmountable character of the antagonism exerted by Hoe 140 against BK on the rabbit B2 receptor, previously shown in pharmacological experiments, was confirmed in binding experiments with the cloned receptor expressed in a controlled manner. By contrast, Hoe 140 competed with [3H]BK in a surmountable manner for the human B2 receptor expressed in COS-1 cells. The cloning of the rabbit B2 receptor will be useful notably for the study of the structural basis of antagonist binding and for studies on receptor regulation in a relatively large animal.
Links, Matthew G; Chaban, Bonnie; Hemmingsen, Sean M; Muirhead, Kevin; Hill, Janet E
2013-08-15
Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.
Recurrence time statistics: versatile tools for genomic DNA sequence analysis.
Cao, Yinhe; Tung, Wen-Wen; Gao, J B
2004-01-01
With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cervisivae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale by a PC.