Campion, S R; Ameen, A S; Lai, L; King, J M; Munzenmaier, T N
2001-08-15
This report describes the application of a simple computational tool, AAPAIR.TAB, for the systematic analysis of the cysteine-rich EGF, Sushi, and Laminin motif/sequence families at the two-amino acid level. Automated dipeptide frequency/bias analysis detects preferences in the distribution of amino acids in established protein families, by determining which "ordered dipeptides" occur most frequently in comprehensive motif-specific sequence data sets. Graphic display of the dipeptide frequency/bias data revealed family-specific preferences for certain dipeptides, but more importantly detected a shared preference for employment of the ordered dipeptides Gly-Tyr (GY) and Gly-Phe (GF) in all three protein families. The dipeptide Asn-Gly (NG) also exhibited high-frequency and bias in the EGF and Sushi motif families, whereas Asn-Thr (NT) was distinguished in the Laminin family. Evaluation of the distribution of dipeptides identified by frequency/bias analysis subsequently revealed the highly restricted localization of the G(F/Y) and N(G/T) sequence elements at two separate sites of extreme conservation in the consensus sequence of all three sequence families. The similar employment of the high-frequency/bias dipeptides in three distinct protein sequence families was further correlated with the concurrence of these shared molecular determinants at similar positions within the distinctive scaffolds of three structurally divergent, but similarly employed, motif modules.
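The ordered-dipeptide counting described in this abstract can be illustrated with a minimal Python sketch. It is not the AAPAIR.TAB tool: the function name `dipeptide_stats`, the toy sequences, and in particular the definition of "bias" as the observed pair frequency divided by the frequency expected from single-residue composition are assumptions made here for illustration only.

```python
from collections import Counter

def dipeptide_stats(sequences):
    """Count ordered dipeptides and compare each with a composition-based expectation.

    Returns {pair: (count, frequency, bias)}; the bias definition is an assumption.
    """
    pair_counts = Counter()
    residue_counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        residue_counts.update(seq)
        # Ordered pairs: "GY" and "YG" are counted as different dipeptides.
        pair_counts.update(seq[i:i + 2] for i in range(len(seq) - 1))
    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())
    stats = {}
    for pair, observed in pair_counts.items():
        freq = observed / total_pairs
        # Expected frequency if the two residues occurred independently.
        expected = (residue_counts[pair[0]] / total_residues) * (
            residue_counts[pair[1]] / total_residues
        )
        stats[pair] = (observed, freq, freq / expected)
    return stats

# Hypothetical toy sequences, not taken from the EGF, Sushi, or Laminin data sets.
toy = ["CNGYC", "CGFNGC"]
stats = dipeptide_stats(toy)
```

On a real motif family the input would be the aligned or unaligned member sequences, and the highest-ranking pairs could then be located along the consensus sequence.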
USDA-ARS's Scientific Manuscript database
The complete genome sequence of a Southern tomato virus (STV) isolate from tomato plants in a seed production field in Bangladesh was obtained for the first time using next-generation sequencing. The identified isolate, STV_BD-13, shares a high degree of sequence identity (99%) with several known STV isol...
Onuț-Brännström, Ioana; Benjamin, Mitchell; Scofield, Douglas G; Heiðmarsson, Starri; Andersson, Martin G I; Lindström, Eva S; Johannesson, Hanna
2018-03-13
In this study, we explored the diversity of green algal symbionts (photobionts) in sympatric populations of the cosmopolitan lichen-forming fungi Thamnolia and Cetraria. Using both Sanger and Ion Torrent high-throughput sequencing technologies, we sequenced the photobiont ITS region of 30 lichen thalli from two islands, Iceland and Öland. While Sanger sequencing recovered just one photobiont genotype from each thallus, the Ion Torrent data recovered 10-18 OTUs for each pool of 5 lichen thalli, suggesting that individual lichens can contain heterogeneous photobiont populations. Both methods showed evidence for photobiont sharing between Thamnolia and Cetraria on Iceland. In contrast, our data suggest that on Öland the two mycobionts associate with distinct photobiont communities, with few shared OTUs revealed by Ion Torrent sequencing. Furthermore, by comparing our sequences with public data, we identified closely related photobionts from geographically distant localities. Taken together, we suggest that the photobiont composition in Thamnolia and Cetraria results from both photobiont-mycobiont codispersal and local acquisition during mycobiont establishment and/or lichen growth. We hypothesize that this strategy gives lichens the flexibility to use the photobiont best adapted to the environment.
Interactive web-based identification and visualization of transcript shared sequences.
Azhir, Alaleh; Merino, Louis-Henri; Nauen, David W
2018-05-12
We have developed TraC (Transcript Consensus), a web-based tool for detecting and visualizing shared sequences among two or more mRNA transcripts such as splice variants. Results including exon-exon boundaries are returned in a highly intuitive, data-rich, interactive plot that permits users to explore the similarities and differences of multiple transcript sequences. The online tool (http://labs.pathology.jhu.edu/nauen/trac/) is free to use. The source code is freely available for download (https://github.com/nauenlab/TraC). Copyright © 2018 Elsevier Inc. All rights reserved.
Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi
2014-09-18
Proteins that share a high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. It was hard to distinguish which structure a given sequence would take only with the results of ordinary analyses like BLAST and conservation analyses. However, in addition to these analyses, with the analysis based on the inter-residue average distance statistics and our sequence tendency analysis, we could infer which part would play an important role in its structural formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.
Botero, Adriana; Kapeller, Irit; Cooper, Crystal; Clode, Peta L; Shlomai, Joseph; Thompson, R C Andrew
2018-05-17
Kinetoplast DNA (kDNA) is the mitochondrial genome of trypanosomatids. It consists of a few dozen maxicircles and several thousand minicircles, all catenated topologically to form a two-dimensional DNA network. Minicircles are heterogeneous in size and sequence among species. They present one or several conserved regions that contain three highly conserved sequence blocks. CSB-1 (10 bp sequence) and CSB-2 (8 bp sequence) present lower interspecies homology, while CSB-3 (12 bp sequence) or the Universal Minicircle Sequence is conserved within most trypanosomatids. The Universal Minicircle Sequence is located at the replication origin of the minicircles, and is the binding site for the UMS binding protein, a protein involved in trypanosomatid survival and virulence. Here, we describe the structure and organisation of the kDNA of Trypanosoma copemani, a parasite that has been shown to infect mammalian cells and has been associated with the drastic decline of the endangered Australian marsupial, the woylie (Bettongia penicillata). Deep genomic sequencing showed that T. copemani presents two classes of minicircles that share sequence identity and organisation in the conserved sequence blocks with those of Trypanosoma cruzi and Trypanosoma lewisi. A 19,257 bp partial region of the maxicircle of T. copemani that contained the entire coding region was obtained. Comparative analysis of the T. copemani entire maxicircle coding region with the coding regions of T. cruzi and T. lewisi showed they share 71.05% and 71.28% identity, respectively. The shared features in the maxicircle/minicircle organisation and sequence between T. copemani and T. cruzi/T. lewisi suggest similarities in their process of kDNA replication, and are of significance in understanding the evolution of Australian trypanosomes. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
Complete Genome Sequences of Bacillus Phages Janet and OTooleKemple52
2018-01-01
We report here the genome sequences of two novel Bacillus cereus group-infecting bacteriophages, Janet and OTooleKemple52. These bacteriophages are double-stranded DNA-containing Myoviridae isolated from soil samples. While their genomes share a high degree of sequence identity with one another, their host preferences are unique. PMID:29748396
Brody, Thomas; Yavatkar, Amarendra S; Kuzin, Alexander; Kundu, Mukta; Tyson, Leonard J; Ross, Jermaine; Lin, Tzu-Yang; Lee, Chi-Hon; Awasaki, Takeshi; Lee, Tzumin; Odenwald, Ward F
2012-01-01
Background: Phylogenetic footprinting has revealed that cis-regulatory enhancers consist of conserved DNA sequence clusters (CSCs). Currently, there is no systematic approach for enhancer discovery and analysis that takes full advantage of the sequence information within enhancer CSCs. Results: We have generated a Drosophila genome-wide database of conserved DNA consisting of >100,000 CSCs derived from EvoPrints spanning over 90% of the genome. cis-Decoder database search and alignment algorithms enable the discovery of functionally related enhancers. The program first identifies conserved repeat elements within an input enhancer and then searches the database for CSCs that score highly against the input CSC. Scoring is based on shared repeats as well as uniquely shared matches, and includes measures of the balance of shared elements, a diagnostic that has proven to be useful in predicting cis-regulatory function. To demonstrate the utility of these tools, a temporally restricted CNS neuroblast enhancer was used to identify other functionally related enhancers and analyze their structural organization. Conclusions: cis-Decoder reveals that co-regulating enhancers consist of combinations of overlapping shared sequence elements, providing insights into the mode of integration of multiple regulating transcription factors. The database and accompanying algorithms should prove useful in the discovery and analysis of enhancers involved in any developmental process. Developmental Dynamics 241:169–189, 2012. © 2011 Wiley Periodicals, Inc. Key findings: A genome-wide catalog of Drosophila conserved DNA sequence clusters. cis-Decoder discovers functionally related enhancers. Functionally related enhancers share balanced sequence element copy numbers. Many enhancers function during multiple phases of development. PMID:22174086
Complete Genome Sequences of Bacillus Phages Janet and OTooleKemple52.
Kent, Brenna; Raymond, Thomas; Mosier, Philip D; Johnson, Allison A
2018-05-10
We report here the genome sequences of two novel Bacillus cereus group-infecting bacteriophages, Janet and OTooleKemple52. These bacteriophages are double-stranded DNA-containing Myoviridae isolated from soil samples. While their genomes share a high degree of sequence identity with one another, their host preferences are unique. Copyright © 2018 Kent et al.
Ferreira-Paim, Kennio; Ferreira, Thatiana Bragine; Andrade-Silva, Leonardo; Mora, Delio Jose; Springer, Deborah J.; Heitman, Joseph; Fonseca, Fernanda Machado; Matos, Dulcilena; Melhem, Márcia Souza Carvalho; Silva-Vergara, Mario León
2014-01-01
Background: Although Cryptococcus laurentii has been considered saprophytic and its taxonomy is still being described, several cases of human infections have already been reported. This study aimed to evaluate molecular aspects of C. laurentii isolates from Brazil, Botswana, Canada, and the United States. Methods: In this study, 100 phenotypically identified C. laurentii isolates were evaluated by sequencing the 18S nuclear ribosomal small subunit rRNA gene (18S-SSU), D1/D2 region of 28S nuclear ribosomal large subunit rRNA gene (28S-LSU), and the internal transcribed spacer (ITS) of the ribosomal region. Results: BLAST searches using 550-bp, 650-bp, and 550-bp sequenced amplicons obtained from the 18S-SSU, 28S-LSU, and the ITS region led to the identification of 75 C. laurentii strains that shared 99–100% identity with C. laurentii CBS 139. A total of nine isolates shared 99% identity with both Bullera sp. VY-68 and C. laurentii RY1. One isolate shared 99% identity with Cryptococcus rajasthanensis CBS 10406, and eight isolates shared 100% identity with Cryptococcus sp. APSS 862 according to the 28S-LSU and ITS regions and were designated as Cryptococcus aspenensis sp. nov. (CBS 13867). While 16 isolates shared 99% identity with Cryptococcus flavescens CBS 942 according to the 18S-SSU sequence, only six were confirmed using the 28S-LSU and ITS region sequences. The remaining 10 shared 99% identity with Cryptococcus terrestris CBS 10810, which was recently described in Brazil. Through concatenated sequence analyses, seven sequence types in C. laurentii, three in C. flavescens, one in C. terrestris, and one in the C. aspenensis sp. nov. were identified. Conclusions: Sequencing permitted the characterization of 75% of the environmental C. laurentii isolates from different geographical areas and the identification of seven haplotypes of this species. Among sequenced regions, the increased variability of the ITS region in comparison to the 18S-SSU and 28S-LSU regions reinforces its applicability as a DNA barcode. PMID:25251413
Christley, Scott; Scarborough, Walter; Salinas, Eddie; Rounds, William H; Toby, Inimary T; Fonner, John M; Levin, Mikhail K; Kim, Min; Mock, Stephen A; Jordan, Christopher; Ostmeyer, Jared; Buntzman, Adam; Rubelt, Florian; Davila, Marco L; Monson, Nancy L; Scheuermann, Richard H; Cowell, Lindsay G
2018-01-01
Recent technological advances in immune repertoire sequencing have created tremendous potential for advancing our understanding of adaptive immune response dynamics in various states of health and disease. Immune repertoire sequencing produces large, highly complex data sets, however, which require specialized methods and software tools for their effective analysis and interpretation. VDJServer is a cloud-based analysis portal for immune repertoire sequence data that provides access to a suite of tools for a complete analysis workflow, including modules for preprocessing and quality control of sequence reads, V(D)J gene segment assignment, repertoire characterization, and repertoire comparison. VDJServer also provides sophisticated visualizations for exploratory analysis. It is accessible through a standard web browser via a graphical user interface designed for use by immunologists, clinicians, and bioinformatics researchers. VDJServer provides a data commons for public sharing of repertoire sequencing data, as well as private sharing of data between users. We describe the main functionality and architecture of VDJServer and demonstrate its capabilities with use cases from cancer immunology and autoimmunity. VDJServer provides a complete analysis suite for human and mouse T-cell and B-cell receptor repertoire sequencing data. The combination of its user-friendly interface and high-performance computing allows large immune repertoire sequencing projects to be analyzed with no programming or software installation required. VDJServer is a web-accessible cloud platform that provides access through a graphical user interface to a data management infrastructure, a collection of analysis tools covering all steps in an analysis, and an infrastructure for sharing data along with workflows, results, and computational provenance. VDJServer is a free, publicly available, and open-source licensed resource.
CaLRS: A Critical-Aware Shared LLC Request Scheduling Algorithm on GPGPU
Ma, Jianliang; Meng, Jinglei; Chen, Tianzhou; Wu, Minghui
2015-01-01
Ultra-high thread-level parallelism in modern GPUs usually generates numerous simultaneous memory requests, so there are always many memory requests waiting at each bank of the shared LLC (L2 in this paper) and of global memory. For global memory, various schedulers have already been developed to adjust the request sequence, but we find that little work has focused on the service sequence at the shared LLC. We measured that, in a large number of GPU applications, requests always queue at the LLC banks for service, which provides an opportunity to optimize the service order at the LLC. By adjusting the service order of GPU memory requests, we can improve the schedulability of the SM. We therefore propose a critical-aware shared LLC request scheduling algorithm (CaLRS) in this paper. How the priority of a memory request is represented is critical for CaLRS: we use the number of memory requests that originate from the same warp but have not yet been serviced when they arrive at the shared LLC bank to represent the criticality of each warp. Experiments show that the proposed scheme effectively boosts SM schedulability by promoting the scheduling priority of memory requests with high criticality, and thereby indirectly improves GPU performance. PMID:25729772
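The criticality rule in this abstract can be sketched in Python as a simplified software model of one LLC bank queue. This is not the authors' hardware design: the `Request` fields, the tie-break by arrival order, and the choice to recompute outstanding counts from the queue at every step (rather than at arrival time) are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Request:
    warp_id: int   # warp that issued the request
    address: int   # hypothetical target address
    arrival: int   # arrival order at the LLC bank

def service_order(queue):
    """Return requests in the order a criticality-first bank scheduler would serve them."""
    pending = list(queue)
    order = []
    while pending:
        # Criticality of a warp = number of its requests still unserviced in this queue.
        outstanding = Counter(r.warp_id for r in pending)
        # Highest criticality first; earlier arrival breaks ties (an assumption).
        nxt = max(pending, key=lambda r: (outstanding[r.warp_id], -r.arrival))
        pending.remove(nxt)
        order.append(nxt)
    return order

queue = [
    Request(0, 0x100, 0),
    Request(1, 0x200, 1),
    Request(1, 0x240, 2),
    Request(2, 0x300, 3),
]
served = [(r.warp_id, r.arrival) for r in service_order(queue)]
```

In this toy queue warp 1 has two outstanding requests, so its first request is served ahead of the earlier request from warp 0; once the counts are equal the order falls back to arrival.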
Characterization of cDNAs and genomic DNAs for human threonyl- and cysteinyl-tRNA synthetases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cruzen, M.E.
1993-01-01
Techniques of molecular biology were used to clone, sequence and map two human aminoacyl-tRNA synthetase (aaRS) cDNAs: threonyl-tRNA synthetase (ThrRS), a class II enzyme, and cysteinyl-tRNA synthetase (CysRS), a class I enzyme. The predicted protein sequence of human ThrRS is highly homologous to that of lower eukaryotic and prokaryotic ThrRSs, particularly in the regions containing the three structural motifs common to all class II synthetases. Signature regions 1 and 2, which characterize the class IIa subgroup (SerRS, ThrRS and HisRS), are highly conserved from bacteria to human. Structural predictions for human ThrRS based on the known structure of the closely related SerRS from E. coli implicate strongly conserved residues in the signature sequences as important in substrate binding. The amino terminal 100 residues of the deduced amino acid sequence of ThrRS share structural similarity with SerRS, consistent with forming an antiparallel helix implicated in tRNA binding. The 5' untranslated sequence of the human ThrRS gene shares short stretches of common sequence with the gene for hamster HisRS, including a binding site for the promoter specific transcription factor sp-1. The deduced amino acid sequence of human CysRS has a high degree of sequence identity to E. coli CysRS. Human CysRS possesses the classic characteristics of a class I synthetase and is most closely related to the MetRS subgroup. The amino terminal half of human CysRS can be modeled as a nucleotide binding fold and shares significant sequence and structural similarity with the other enzymes in this subgroup. The CysRS structural gene (CARS) was mapped to human chromosome 11p15.5 by fluorescent in situ hybridization. CARS is the first aaRS gene to be mapped to chromosome 11. The steady-state levels of both CysRS and ThrRS mRNA were quantitated in several human tissues. Message levels for these enzymes appear to be subject to differential regulation in different cell types.
Mosaic Graphs and Comparative Genomics in Phage Communities
Belcaid, Mahdi; Bergeron, Anne
2010-01-01
Comparing the genomes of two closely related viruses often produces mosaics where nearly identical sequences alternate with sequences that are unique to each genome. When several closely related genomes are compared, the unique sequences are likely to be shared with third genomes, leading to virus mosaic communities. Here we present comparative analysis of sets of Staphylococcus aureus phages that share large identical sequences with up to three other genomes, and with different partners along their genomes. We introduce mosaic graphs to represent these complex recombination events, and use them to illustrate the breadth and depth of sequence sharing: some genomes are almost completely made up of shared sequences, while genomes that share very large identical sequences can adopt alternate functional modules. Mosaic graphs also allow us to identify breakpoints that could eventually be used for the construction of recombination networks. These findings have several implications for phage metagenomics assembly, for the horizontal gene transfer paradigm, and more generally for the understanding of the composition and evolutionary dynamics of virus communities. PMID:20874413
Identification of Distant Drug Off-Targets by Direct Superposition of Binding Pocket Surfaces
Schumann, Marcel; Armen, Roger S.
2013-01-01
Correctly predicting off-targets for a given molecular structure, which would have the ability to bind a large range of ligands, is both particularly difficult and important if they share no significant sequence or fold similarity with the respective molecular target (“distant off-targets”). A novel approach for identification of off-targets by direct superposition of protein binding pocket surfaces is presented and applied to a set of well-studied and highly relevant drug targets, including representative kinases and nuclear hormone receptors. The entire Protein Data Bank is searched for similar binding pockets and convincing distant off-target candidates were identified that share no significant sequence or fold similarity with the respective target structure. These putative target off-target pairs are further supported by the existence of compounds that bind strongly to both with high topological similarity, and in some cases, literature examples of individual compounds that bind to both. Also, our results clearly show that it is possible for binding pockets to exhibit a striking surface similarity, while the respective off-target shares neither significant sequence nor significant fold similarity with the respective molecular target (“distant off-target”). PMID:24391782
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-05-01
Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity without sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for datasets in which the sequence identity of any pair of sequences belongs to the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on an extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior to over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble-of-classifiers predictors. SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve the performance of modern fold classification methods.
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-01-01
Background: Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity without sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for datasets in which the sequence identity of any pair of sequences belongs to the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for the prediction. Results: SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on an extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior to over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble-of-classifiers predictors. Conclusion: SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve the performance of modern fold classification methods. PMID:18452616
Ravi, Anuradha; Avershina, Ekaterina; Angell, Inga Leena; Ludvigsen, Jane; Manohar, Prasanth; Padmanaban, Sumathi; Nachimuthu, Ramesh; Snipen, Lars; Rudi, Knut
2018-06-01
Use of the 16S rRNA gene in microbiota studies is limited by the lack of taxonomic and functional resolution. High resolution analyses are particularly important for understanding transmission and persistence of bacteria. The aim of our work was therefore to compare a novel reduced metagenome sequencing (RMS) approach with 16S rRNA gene sequencing to determine both the metagenome genetic diversity and the mother-to-child sharing of the microbiota in a cohort of 17 mother-child pairs. We found that although both approaches gave comparable results with respect to sample separation and taxonomy, RMS gave higher resolution and the potential for genomic-/functional assignment. Using RMS we estimated that the metagenome size increased from about 60 Mbp for 4-day-old children to about 225 Mbp for mothers. The 4-day-old children shared 7% of the metagenome sequences with the mothers, while the metagenome sequence sharing was >30% among the mothers. We found 15 genomes shared across >50% of the mothers, of which 10 belonged to Clostridia. Only Bacteroides showed a direct mother-child association, with B. vulgatus being abundant in both 4-day-old children and mothers. For the functional assignments, we identified a significant association between antibiotic usage during labor, and quantity of Fosfomycin resistance genes. In conclusion, our results show a higher functional and taxonomic resolution for RMS compared to 16S rRNA gene sequencing, where RMS enabled a detailed description of mother to child gut microbiota transmission - supporting a late recruitment of most gut bacteria and an effect of antibiotic treatment during labor on infant antibiotic resistance gene patterns. Copyright © 2018. Published by Elsevier B.V.
Inaugural Genomics Automation Congress and the coming deluge of sequencing data.
Creighton, Chad J
2010-10-01
Presentations at Select Biosciences's first 'Genomics Automation Congress' (Boston, MA, USA) in 2010 focused on next-generation sequencing and the platforms and methodology around them. The meeting provided an overview of sequencing technologies, both new and emerging. Speakers shared their recent work on applying sequencing to profile cells for various levels of biomolecular complexity, including DNA sequences, DNA copy, DNA methylation, mRNA and microRNA. With sequencing time and costs continuing to drop dramatically, a virtual explosion of very large sequencing datasets is at hand, which will probably present challenges and opportunities for high-level data analysis and interpretation, as well as for information technology infrastructure.
Budiman, Muhammad A.; Mao, Long; Wood, Todd C.; Wing, Rod A.
2000-01-01
Recently a new strategy using BAC end sequences as sequence-tagged connectors (STCs) was proposed for whole-genome sequencing projects. In this study, we present the construction and detailed characterization of a 15.0 haploid genome equivalent BAC library for the cultivated tomato, Lycopersicon esculentum cv. Heinz 1706. The library contains 129,024 clones with an average insert size of 117.5 kb and a chloroplast content of 1.11%. BAC end sequences from 1490 ends were generated and analyzed as a preliminary evaluation for using this library to develop an STC framework to sequence the tomato genome. A total of 1205 BAC end sequences (80.9%) were obtained, with an average length of 360 high-quality bases, and were searched against the GenBank database. Using a cutoff expectation value of <10^-6, and combining the results from BLASTN, BLASTX, and TBLASTX searches, 24.3% of the BAC end sequences were similar to known sequences, of which almost half (48.7%) share sequence similarities to retrotransposons and 7% to known genes. Some of the transposable element sequences were the first reported in tomato, such as sequences similar to maize transposon Activator (Ac) ORF and tobacco pararetrovirus-like sequences. Interestingly, there were no BAC end sequences similar to the highly repeated TGRI and TGRII elements. However, the majority (70.3%) of STCs did not share significant sequence similarities to any sequences in GenBank at either the DNA or predicted protein levels, indicating that a large portion of the tomato genome is still unknown. Our data demonstrate that this BAC library is suitable for developing an STC database to sequence the tomato genome. The advantages of developing an STC framework for whole-genome sequencing of tomato are discussed. [The BAC end sequences described in this paper have been deposited in the GenBank data library under accession nos. AQ367111–AQ368361.] PMID:10645957
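The library figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that "genome equivalents" means total nuclear insert length divided by haploid genome size, with the chloroplast fraction subtracted first. The abstract does not state the genome size or whether the authors made that subtraction, so the genome size is back-calculated rather than assumed.

```python
def implied_genome_size_mb(n_clones, mean_insert_kb, genome_equivalents,
                           non_nuclear_fraction=0.0):
    """Haploid genome size (Mb) implied by a library's stated coverage.

    Assumes coverage = (clones * mean insert * nuclear fraction) / genome size.
    """
    nuclear_kb = n_clones * mean_insert_kb * (1.0 - non_nuclear_fraction)
    return nuclear_kb / genome_equivalents / 1000.0


def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


# Values taken from the abstract: 129,024 clones, 117.5 kb mean insert,
# 15.0 genome equivalents, 1.11% chloroplast content.
size_mb = implied_genome_size_mb(129_024, 117.5, 15.0, non_nuclear_fraction=0.0111)
success_rate = percent(1205, 1490)  # BAC end sequences obtained / ends attempted

print(f"implied haploid genome size: {size_mb:.1f} Mb")
print(f"end-sequencing success rate: {success_rate:.1f}%")
```

Under these assumptions the quoted numbers imply a haploid genome of roughly 1,000 Mb, and the 80.9% success rate follows directly from 1205 of 1490 ends.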
The promise and challenge of high-throughput sequencing of the antibody repertoire
Georgiou, George; Ippolito, Gregory C; Beausang, John; Busse, Christian E; Wardemann, Hedda; Quake, Stephen R
2014-01-01
Efforts to determine the antibody repertoire encoded by B cells in the blood or lymphoid organs using high-throughput DNA sequencing technologies have been advancing at an extremely rapid pace and are transforming our understanding of humoral immune responses. Information gained from high-throughput DNA sequencing of immunoglobulin genes (Ig-seq) can be applied to detect B-cell malignancies with high sensitivity, to discover antibodies specific for antigens of interest, to guide vaccine development and to understand autoimmunity. Rapid progress in the development of experimental protocols and informatics analysis tools is helping to reduce sequencing artifacts, to achieve more precise quantification of clonal diversity and to extract the most pertinent biological information. That said, broader application of Ig-seq, especially in clinical settings, will require the development of a standardized experimental design framework that will enable the sharing and meta-analysis of sequencing data generated by different laboratories. PMID:24441474
Tuo, D; Shen, W; Yan, P; Li, Ch; Gao, L; Li, X; Li, H; Zhou, P
2013-01-01
Papaya leaf distortion mosaic virus is highly destructive to commercial papaya production. Here, the complete genome sequence was determined for an isolate of papaya leaf distortion mosaic virus, designated PLDMV-DF, infecting the commercialized papaya ringspot virus (PRSV)-resistant transgenic papaya from China. Excluding the 3'-poly (A) tail, the sequence shares high sequence identity to several PLDMV isolates from Taiwan and Japan and is phylogenetically most closely related to the isolate from Japan. Infection of PLDMV-DF in transgenic PRSV-resistant papaya may indicate emergence of this disease in genetically engineered plants. The reported sequence for this isolate may help generate bi-transgenic papaya resistant to PRSV and PLDMV.
Quaranfil, Johnston Atoll, and Lake Chad viruses are novel members of the family Orthomyxoviridae.
Presti, Rachel M; Zhao, Guoyan; Beatty, Wandy L; Mihindukulasuriya, Kathie A; da Rosa, Amelia P A Travassos; Popov, Vsevolod L; Tesh, Robert B; Virgin, Herbert W; Wang, David
2009-11-01
Arboviral infections are an important cause of emerging infections due to the movements of humans, animals, and hematophagous arthropods. Quaranfil virus (QRFV) is an unclassified arbovirus originally isolated from children with mild febrile illness in Quaranfil, Egypt, in 1953. It has subsequently been isolated in multiple geographic areas from ticks and birds. We used high-throughput sequencing to classify QRFV as a novel orthomyxovirus. The genome of this virus is comprised of multiple RNA segments; five were completely sequenced. Proteins with limited amino acid similarity to conserved domains in polymerase (PA, PB1, and PB2) and hemagglutinin (HA) genes from known orthomyxoviruses were predicted to be present in four of the segments. The fifth sequenced segment shared no detectable similarity to any protein and is of uncertain function. The end-terminal sequences of QRFV are conserved between segments and are different from those of the known orthomyxovirus genera. QRFV is known to cross-react serologically with two other unclassified viruses, Johnston Atoll virus (JAV) and Lake Chad virus (LKCV). The complete open reading frames of PB1 and HA were sequenced for JAV, while a fragment of PB1 of LKCV was identified by mass sequencing. QRFV and JAV PB1 and HA shared 80% and 70% amino acid identity to each other, respectively; the LKCV PB1 fragment shared 83% amino acid identity with the corresponding region of QRFV PB1. Based on phylogenetic analyses, virion ultrastructural features, and the unique end-terminal sequences identified, we propose that QRFV, JAV, and LKCV comprise a novel genus of the family Orthomyxoviridae.
Quaranfil, Johnston Atoll, and Lake Chad Viruses Are Novel Members of the Family Orthomyxoviridae
Presti, Rachel M.; Zhao, Guoyan; Beatty, Wandy L.; Mihindukulasuriya, Kathie A.; Travassos da Rosa, Amelia P. A.; Popov, Vsevolod L.; Tesh, Robert B.; Virgin, Herbert W.; Wang, David
2009-01-01
Arboviral infections are an important cause of emerging infections due to the movements of humans, animals, and hematophagous arthropods. Quaranfil virus (QRFV) is an unclassified arbovirus originally isolated from children with mild febrile illness in Quaranfil, Egypt, in 1953. It has subsequently been isolated in multiple geographic areas from ticks and birds. We used high-throughput sequencing to classify QRFV as a novel orthomyxovirus. The genome of this virus is comprised of multiple RNA segments; five were completely sequenced. Proteins with limited amino acid similarity to conserved domains in polymerase (PA, PB1, and PB2) and hemagglutinin (HA) genes from known orthomyxoviruses were predicted to be present in four of the segments. The fifth sequenced segment shared no detectable similarity to any protein and is of uncertain function. The end-terminal sequences of QRFV are conserved between segments and are different from those of the known orthomyxovirus genera. QRFV is known to cross-react serologically with two other unclassified viruses, Johnston Atoll virus (JAV) and Lake Chad virus (LKCV). The complete open reading frames of PB1 and HA were sequenced for JAV, while a fragment of PB1 of LKCV was identified by mass sequencing. QRFV and JAV PB1 and HA shared 80% and 70% amino acid identity to each other, respectively; the LKCV PB1 fragment shared 83% amino acid identity with the corresponding region of QRFV PB1. Based on phylogenetic analyses, virion ultrastructural features, and the unique end-terminal sequences identified, we propose that QRFV, JAV, and LKCV comprise a novel genus of the family Orthomyxoviridae. PMID:19726499
Hu, Jun-Jie; Huang, Si; Wen, Tao; Esch, Gerald W; Liang, Yu; Li, Hong-Liang
2017-01-01
Sheep (Ovis aries) are intermediate hosts for at least six named species of Sarcocystis: S. tenella, S. arieticanis, S. gigantea, S. medusiformis, S. mihoensis, and S. microps. Here, only two species, S. tenella and S. arieticanis, were found in 79 of 86 sheep (91.9%) in Kunming, China, based on their morphological characteristics. Four genetic markers, i.e., 18S rRNA gene, 28S rRNA gene, mitochondrial cox1 gene, and ITS-1 region, were sequenced and characterized for the two species of Sarcocystis. Sequences of the three former markers for S. tenella shared high identities with those of S. capracanis in goats, i.e., 99.0%, 98.3%, and 93.6%, respectively; the same three marker sequences of S. arieticanis shared high identities with those of S. hircicanis in goats, i.e., 98.5%, 96.5%, and 92.5%, respectively. No sequences in GenBank were found to significantly resemble the ITS-1 regions of S. tenella and S. arieticanis. Identities of the four genetic markers for S. tenella and S. arieticanis were 96.3%, 95.4%, 82.5%, and 66.2%, respectively. © J.-J. Hu et al., published by EDP Sciences, 2017.
Busk, Peter Kamp; Lange, Lene
2013-06-01
Functional prediction of carbohydrate-active enzymes is difficult due to low sequence identity. However, similar enzymes often share a few short motifs, e.g., around the active site, even when the overall sequences are very different. To exploit this notion for functional prediction of carbohydrate-active enzymes, we developed a simple algorithm, peptide pattern recognition (PPR), that can divide proteins into groups of sequences that share a set of short conserved sequences. When this method was used on 118 glycoside hydrolase 5 proteins with 9% average pairwise identity and representing four characterized enzymatic functions, 97% of the proteins were sorted into groups correlating with their enzymatic activity. Furthermore, we analyzed 8,138 glycoside hydrolase 13 proteins including 204 experimentally characterized enzymes with 28 different functions. There was a 91% correlation between group and enzyme activity. These results indicate that the function of carbohydrate-active enzymes can be predicted with high precision by finding short, conserved motifs in their sequences. The glycoside hydrolase 61 family is important for fungal biomass conversion, but only a few proteins of this family have been functionally characterized. Interestingly, PPR divided 743 glycoside hydrolase 61 proteins into 16 subfamilies useful for targeted investigation of the function of these proteins and pinpointed three conserved motifs with putative importance for enzyme activity. Furthermore, the conserved sequences were useful for cloning of new, subfamily-specific glycoside hydrolase 61 proteins from 14 fungi. In conclusion, identification of conserved sequence motifs is a new approach to sequence analysis that can predict carbohydrate-active enzyme functions with high precision.
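The idea of grouping proteins by short shared peptides can be illustrated with a toy example. This is not the published PPR algorithm, whose details are not given in the abstract. It is a minimal sketch that assumes two sequences belong together when they share at least `min_shared` identical k-mers, and that groups are the connected components of that relation. The sequences, the parameter values, and all function names are hypothetical.

```python
from itertools import combinations


def kmers(seq, k):
    """Set of all overlapping peptides of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def group_by_shared_kmers(seqs, k=4, min_shared=2):
    """Group sequence names whose k-mer sets overlap by >= min_shared.

    Groups are connected components, found with a small union-find.
    """
    names = list(seqs)
    kmer_sets = {name: kmers(seqs[name], k) for name in names}
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if len(kmer_sets[a] & kmer_sets[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Toy sequences (invented): a1/a2 share the core TAYGWDE, b1/b2 share NEPVLKR.
toy = {
    "a1": "MKTAYGWDEQLL",
    "a2": "PPTAYGWDEHHH",
    "b1": "GGNEPVLKRSSS",
    "b2": "AANEPVLKRTTT",
}
toy_groups = group_by_shared_kmers(toy, k=4, min_shared=2)
print(toy_groups)
```

Exact k-mer matching is the simplest possible choice. It shows why sequences with low overall identity can still be grouped when a short conserved core is present.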
Sherpas share genetic variations with Tibetans for high-altitude adaptation.
Bhandari, Sushil; Zhang, Xiaoming; Cui, Chaoying; Yangla; Liu, Lan; Ouzhuluobu; Baimakangzhuo; Gonggalanzi; Bai, Caijuan; Bianba; Peng, Yi; Zhang, Hui; Xiang, Kun; Shi, Hong; Liu, Shiming; Gengdeng; Wu, Tianyi; Qi, Xuebin; Su, Bing
2017-01-01
Sherpas, a highlander population living in the Khumbu region of Nepal, are well known for their superior climbing ability in the Himalayas. However, the genetic basis of their adaptation to high-altitude environments remains elusive. We collected DNA samples of 582 Sherpas from Nepal and the Tibetan Autonomous Region of China, and we measured their hemoglobin levels and degrees of blood oxygen saturation. We genotyped 29 EPAS1 SNPs, two EGLN1 SNPs and the TED polymorphism (3.4 kb deletion) in Sherpas. We also performed genetic association analysis of these sequence variants with phenotypic data. We found similar allele frequencies for the 32 tested variants of these genes in Sherpas and Tibetans. Sherpa individuals carrying the derived alleles of EPAS1 (rs113305133, rs116611511 and rs12467821), EGLN1 (rs186996510 and rs12097901) and TED have lower hemoglobin levels than wild-type allele carriers. Most of the EPAS1 variants showing significant association with hemoglobin levels in Tibetans were replicated in Sherpas. The shared sequence variants and hemoglobin trait between Sherpas and Tibetans indicate a shared genetic basis for high-altitude adaptation, consistent with the proposal that Sherpas are in fact a recently derived population from Tibetans and they inherited adaptive variants for high-altitude adaptation from their Tibetan ancestors.
Karaboga, D; Aslan, S
2016-04-27
The great majority of biological sequences share significant similarity with other sequences as a result of evolutionary processes, and identifying these sequence similarities is one of the most challenging problems in bioinformatics. In this paper, we present a discrete artificial bee colony (ABC) algorithm, which is inspired by the intelligent foraging behavior of real honey bees, for the detection of highly conserved residue patterns or motifs within sequences. Experimental studies on three different data sets showed that the proposed discrete model, by adhering to the fundamental scheme of the ABC algorithm, produced competitive or better results than other metaheuristic motif discovery techniques.
Tharia, Hazel A; Shrive, Annette K; Mills, John D; Arme, Chris; Williams, Gwyn T; Greenhough, Trevor J
2002-02-22
The serum amyloid P component (SAP)-like pentraxin Limulus polyphemus SAP is a recently discovered, distinct pentraxin species, of known structure, which does not bind phosphocholine and whose N-terminal sequence has been shown to differ markedly from the highly conserved N terminus of all other known horseshoe crab pentraxins. The complete cDNA sequence of Limulus SAP, and the derived amino acid sequence, the first invertebrate SAP-like pentraxin sequence, have been determined. Two sequences were identified that differed only in the length of the 3' untranslated region. Limulus SAP is synthesised as a precursor protein of 234 amino acid residues, the first 17 residues encoding a signal peptide that is absent from the mature protein. Phylogenetic analysis clusters Limulus SAP pentraxin with the horseshoe crab C-reactive proteins (CRPs) rather than the mammalian SAPs, which are clustered with mammalian CRPs. The deduced amino acid sequence shares 22% identity with both human SAP and CRP, which are 51% identical, and 31-35% with horseshoe crab CRPs. These analyses indicate that gene duplication of CRP (or SAP), followed by sequence divergence and the evolution of CRP and/or SAP function, occurred independently along the chordate and arthropod evolutionary lines rather than in a common ancestor. They further indicate that the CRP/SAP gene duplication event in Limulus occurred before both the emergence of the Limulus CRP variants and the mammalian CRP/SAP gene duplication. Limulus SAP, which does not exhibit the CRP characteristic of calcium-dependent binding to phosphocholine, is established as a pentraxin species distinct from all other known horseshoe crab pentraxins that exist in many variant forms sharing a high level of sequence homology. Copyright 2002 Elsevier Science Ltd.
First description of Grapevine leafroll-associated virus 5 in Argentina and partial genome sequence.
Gómez Talquenca, Sebastián; Muñoz, Claudio; Grau, Oscar; Gracia, Olga
2009-02-01
An accession of Vitis vinifera cv. Red Globe from Argentina was found to be infected with Grapevine leafroll-associated virus 5 (GLRaV-5) by ELISA. It was partially sequenced, and three ORFs, corresponding to HSP70h, HSP90h, and CP, were found. This isolate shares a high amino acid identity with the previously reported sequence of the virus, and identities between 80% and 90% with previously reported GLRaV-9 and GLRaV-4 isolates. The analysis of the sequence supports its clustering together with GLRaV-4 and GLRaV-9 inside the Ampelovirus genus.
Khan, Arifa S; Vacante, Dominick A; Cassart, Jean-Pol; Ng, Siemon H S; Lambert, Christophe; Charlebois, Robert L; King, Kathryn E
Several nucleic-acid based technologies have recently emerged with capabilities for broad virus detection. One of these, high throughput sequencing, has the potential for novel virus detection because this method does not depend upon prior viral sequence knowledge. However, the use of high throughput sequencing for testing biologicals poses greater challenges as compared to other newly introduced tests due to its technical complexities and big data bioinformatics. Thus, the Advanced Virus Detection Technologies Users Group was formed as a joint effort by regulatory and industry scientists to facilitate discussions and provide a forum for sharing data and experiences using advanced new virus detection technologies, with a focus on high throughput sequencing technologies. The group was initiated as a task force that was coordinated by the Parenteral Drug Association and subsequently became the Advanced Virus Detection Technologies Interest Group to continue efforts for using new technologies for detection of adventitious viruses with broader participation, including international government agencies, academia, and technology service providers. © PDA, Inc. 2016.
Delgado-Gaytán, María F; Rosas-Rodríguez, Jesús A; Yepiz-Plascencia, Gloria; Figueroa-Soto, Ciria G; Valenzuela-Soto, Elisa M
2017-10-01
The enzyme betaine aldehyde dehydrogenase (BADH) catalyzes the irreversible oxidation of betaine aldehyde to glycine betaine (GB), a very efficient osmolyte accumulated during osmotic stress. In this study, we determined the nucleotide sequence of the cDNA for the BADH from the white shrimp Litopenaeus vannamei (LvBADH). The cDNA was 1882 bp long, with a complete open reading frame of 1524 bp, encoding 507 amino acids with a predicted molecular mass of 54.15 kDa and a pI of 5.4. The predicted LvBADH amino acid sequence shares a high degree of identity with marine invertebrate BADHs. Catalytic residues (C-298, E-264 and N-167) and the decapeptide VTLELGGKSP involved in nucleotide binding and highly conserved in BADHs were identified in the amino acid sequence. Phylogenetic analyses classified LvBADH in a clade that includes ALDH9 sequences from marine invertebrates. Molecular modeling of LvBADH revealed that the protein has amino acid residues and sequence motifs essential for the function of the ALDH9 family of enzymes. LvBADH modeling showed three potential monovalent cation binding sites: one located in an intra-subunit cavity, another in an inter-subunit cavity, and a third in a central cavity of the protein. The results show that LvBADH shares a high degree of identity with BADH sequences from marine invertebrates and enzymes that belong to the ALDH9 family. Our findings suggest that LvBADH has molecular mechanisms of regulation similar to those of other BADHs belonging to the ALDH9 family, and that BADH might play a role in the osmoregulation capacity of L. vannamei. Copyright © 2017 Elsevier B.V. All rights reserved.
On Asymptotically Good Ramp Secret Sharing Schemes
NASA Astrophysics Data System (ADS)
Geil, Olav; Martin, Stefano; Martínez-Peñas, Umberto; Matsumoto, Ryutaroh; Ruano, Diego
Asymptotically good sequences of linear ramp secret sharing schemes have been intensively studied by Cramer et al. in terms of sequences of pairs of nested algebraic geometric codes. In those works the focus is on full privacy and full reconstruction. In this paper we analyze additional parameters describing the asymptotic behavior of partial information leakage and possibly also partial reconstruction giving a more complete picture of the access structure for sequences of linear ramp secret sharing schemes. Our study involves a detailed treatment of the (relative) generalized Hamming weights of the considered codes.
Setliff, Ian; McDonnell, Wyatt J; Raju, Nagarajan; Bombardi, Robin G; Murji, Amyn A; Scheepers, Cathrine; Ziki, Rutendo; Mynhardt, Charissa; Shepherd, Bryan E; Mamchak, Alusha A; Garrett, Nigel; Karim, Salim Abdool; Mallal, Simon A; Crowe, James E; Morris, Lynn; Georgiev, Ivelin S
2018-06-13
Characterization of single antibody lineages within infected individuals has provided insights into the development of Env-specific antibodies. However, a systems-level understanding of the humoral response against HIV-1 is limited. Here, we interrogated the antibody repertoires of multiple HIV-infected donors from an infection-naive state through acute and chronic infection using next-generation sequencing. This analysis revealed the existence of "public" antibody clonotypes that were shared among multiple HIV-infected individuals. The HIV-1 reactivity for representative antibodies from an identified public clonotype shared by three donors was confirmed. Furthermore, a meta-analysis of publicly available antibody repertoire sequencing datasets revealed antibodies with high sequence identity to known HIV-reactive antibodies, even in repertoires that were reported to be HIV naive. The discovery of public antibody clonotypes in HIV-infected individuals represents an avenue of significant potential for better understanding antibody responses to HIV-1 infection, as well as for clonotype-specific vaccine development. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
Donkey Orchid Symptomless Virus: A Viral ‘Platypus’ from Australian Terrestrial Orchids
Wylie, Stephen J.; Li, Hua; Jones, Michael G. K.
2013-01-01
Complete and partial genome sequences of two isolates of an unusual new plant virus, designated Donkey orchid symptomless virus (DOSV), were identified using a high-throughput sequencing approach. The virus was identified from asymptomatic plants of the Australian terrestrial orchid Diuris longifolia (Common donkey orchid) growing in a remnant forest patch near Perth, Western Australia. DOSV was identified from two D. longifolia plants of 264 tested, and from at least one plant of 129 Caladenia latifolia (pink fairy orchid) plants tested. Phylogenetic analysis of the genome revealed open reading frames (ORF) encoding seven putative proteins of apparently disparate origins. A 69-kDa protein (ORF1) that overlapped the replicase shared low identity with MPs of plant tymoviruses (Tymoviridae). A 157-kDa replicase (ORF2) and 22-kDa coat protein (ORF4) shared 32% and 40% amino acid identity, respectively, with homologous proteins encoded by members of the plant virus family Alphaflexiviridae. A 44-kDa protein (ORF3) shared low identity with myosin and an autophagy protein from Squirrelpox virus. A 27-kDa protein (ORF5) shared no identity with described proteins. A 14-kDa protein (ORF6) shared limited sequence identity (26%) over a limited region of the envelope glycoprotein precursor of mammal-infecting Crimea-Congo hemorrhagic fever virus (Bunyaviridae). The putative 25-kDa movement protein (MP) (ORF7) shared limited (27%) identity with 3A-like MPs of members of the plant-infecting Tombusviridae and Virgaviridae. Transmissibility was shown when DOSV systemically infected Nicotiana benthamiana plants. Structure and organization of the domains within the putative replicase of DOSV suggest a common evolutionary origin with ‘potexvirus-like’ replicases of viruses within the Alphaflexiviridae and Tymoviridae, and the CP appears to be ancestral to CPs of allexiviruses (Alphaflexiviridae). The MP shares an evolutionary history with MPs of dianthoviruses, but the other putative proteins are distant from plant viruses. DOSV is not readily classified in current lower order virus taxa. PMID:24223974
Comprehensive view of the population history of Arabia as inferred by mtDNA variation.
Černý, Viktor; Čížková, Martina; Poloni, Estella S; Al-Meeri, Ali; Mulligan, Connie J
2016-04-01
Genetic and archaeological research supports the theory that Arabia was the first region traversed by modern humans as they left Africa and dispersed throughout Eurasia. However, the role of Arabia from the initial migration out of Africa until more recent times is still unclear. We have generated 379 new hypervariable segment 1 (HVS-1) sequences from a range of geographic locations throughout Yemen. We compare these data to published HVS-1 sequences representing Arabia and neighboring regions to build a unique dataset of 186 populations and 14,290 sequences. We identify 4,563 haplotypes unevenly distributed across Arabia and neighboring regions. Arabia contains higher proportions of shared haplotypes than the regions with which it shares these haplotypes, suggesting high levels of migration through the region. Populations in Arabia show higher levels of population expansion than those in East Africa, but lower levels than the Near East, Middle East or India. Arabian populations also show very high levels of genetic variation that overlaps with variation from most other regions. We take a population genetics approach to provide a comprehensive view of the relationships of Arabian and neighboring populations. We show that Arabian populations share closest links to the Near East and North Africa, but have a more ancient origin with slower demographic growth and/or lower migration rates. Our conclusions are supported by phylogenetic studies but also suggest that recent migrations have erased signals of earlier events. © 2015 Wiley Periodicals, Inc.
HMM-ModE: implementation, benchmarking and validation with HMMER3
2014-01-01
Background: HMM-ModE is a computational method that generates family-specific profile HMMs using negative training sequences. The method optimizes the discrimination threshold using 10-fold cross-validation and modifies the emission probabilities of profiles to reduce common fold-based signals shared with other sub-families. The protocol depends on the program HMMER for HMM profile building and sequence database searching. The recent release of HMMER3 has improved database search speed by several orders of magnitude, allowing for the large-scale deployment of the method in sequence annotation projects. We have rewritten our existing scripts, both at the level of parsing the HMM profiles and modifying emission probabilities, to upgrade HMM-ModE to use HMMER3 and take advantage of its probabilistic inference and high computational speed. The method is benchmarked and tested on a GPCR dataset as an accurate and fast method for functional annotation. Results: The implementation of this method, which now works with HMMER3, is benchmarked against the earlier version of HMMER, to show that the effect of local-local alignments is marked only in the case of profiles containing a large number of discontinuous match states. The method is tested on a gold standard set of families and we have reported a significant reduction in the number of false positive hits over the default HMM profiles. When implemented on GPCR sequences, the results showed an improvement in the accuracy of classification compared with other methods used to classify the family at different levels of their classification hierarchy. Conclusions: The present findings show that the new version of HMM-ModE is a highly specific method used to differentiate between fold (superfamily) and function (family) specific signals, which helps in the functional annotation of protein sequences. The use of modified profile HMMs of GPCR sequences provides a simple yet highly specific method for classification of the family, being able to predict the sub-family specific sequences with high accuracy even though sequences share common physicochemical characteristics between sub-families. PMID:25073805
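The threshold-optimization step mentioned in this abstract can be pictured as a scan over candidate score cutoffs. The sketch below is not the HMM-ModE code and does not call HMMER. It assumes that each fold yields a list of profile scores for held-out family members (positives) and for negative training sequences, and it picks the cutoff that maximizes the Matthews correlation coefficient. The choice of MCC as the objective, the scores, and the function names are all assumptions made for illustration.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def best_threshold(pos_scores, neg_scores):
    """Return (threshold, mcc) maximizing MCC; a hit is score >= threshold.

    Ties are resolved in favor of the lowest candidate threshold.
    """
    best_t, best_m = None, float("-inf")
    for t in sorted(set(pos_scores) | set(neg_scores)):
        tp = sum(s >= t for s in pos_scores)
        fn = len(pos_scores) - tp
        fp = sum(s >= t for s in neg_scores)
        tn = len(neg_scores) - fp
        m = mcc(tp, fp, tn, fn)
        if m > best_m:
            best_t, best_m = t, m
    return best_t, best_m


# Invented scores for one held-out fold; one positive scores below a negative.
positives = [120.5, 98.2, 87.0, 80.0, 64.3]
negatives = [70.1, 41.0, 35.5, 12.9]
threshold, score = best_threshold(positives, negatives)
print(threshold, round(score, 3))
```

In a 10-fold setting the same scan would be repeated on each held-out fold and the resulting cutoffs combined, for example by averaging. How the published method combines them is not stated in the abstract.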
Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences
2018-01-01
Prediction of taxonomy for marker gene sequences such as 16S ribosomal RNA (rRNA) is a fundamental task in microbiology. Most experimentally observed sequences are diverged from reference sequences of authoritatively named organisms, creating a challenge for prediction methods. I assessed the accuracy of several algorithms using cross-validation by identity, a new benchmark strategy which explicitly models the variation in distances between query sequences and the closest entry in a reference database. When the accuracy of genus predictions was averaged over a representative range of identities with the reference database (100%, 99%, 97%, 95% and 90%), all tested methods had ≤50% accuracy on the currently-popular V4 region of 16S rRNA. Accuracy was found to fall rapidly with identity; for example, better methods were found to have V4 genus prediction accuracy of ∼100% at 100% identity but ∼50% at 97% identity. The relationship between identity and taxonomy was quantified as the probability that a rank is the lowest shared by a pair of sequences with a given pair-wise identity. With the V4 region, 95% identity was found to be a twilight zone where taxonomy is highly ambiguous because the probabilities that the lowest shared rank between pairs of sequences is genus, family, order or class are approximately equal. PMID:29682424
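Two quantities carry the argument in this abstract: the pairwise identity between two sequences and the lowest taxonomic rank they share. The sketch below illustrates both under simplifying assumptions. Identity is computed over pre-aligned, equal-length strings and ignores columns containing a gap, which is only one of several conventions. The rank list and helper names are hypothetical, and the example lineages use a seven-rank taxonomy for two familiar genera.

```python
RANKS = ("domain", "phylum", "class", "order", "family", "genus", "species")


def pairwise_identity(aligned_a, aligned_b, gap="-"):
    """Fraction of identical residues over gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)


def lowest_shared_rank(lineage_a, lineage_b):
    """Deepest rank at which two lineages (ordered as RANKS) still agree."""
    shared = None
    for rank, a, b in zip(RANKS, lineage_a, lineage_b):
        if a != b:
            break
        shared = rank
    return shared


escherichia = ("Bacteria", "Proteobacteria", "Gammaproteobacteria",
               "Enterobacterales", "Enterobacteriaceae", "Escherichia")
salmonella = ("Bacteria", "Proteobacteria", "Gammaproteobacteria",
              "Enterobacterales", "Enterobacteriaceae", "Salmonella")

identity = pairwise_identity("ACGTACGT-A", "ACGTTCGTGA")  # toy alignment
rank = lowest_shared_rank(escherichia, salmonella)
print(f"{identity:.3f}", rank)
```

Tabulating `lowest_shared_rank` for many sequence pairs, binned by `pairwise_identity`, would give an empirical version of the probability the abstract describes. The paper's actual estimation procedure is not specified here.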
Chen, Tsung-Chi; Li, Ju-Ting; Fan, Ya-Shu; Yeh, Yi-Chun; Yeh, Shyi-Dong; Kormelink, Richard
2013-06-01
Tomato yellow ring virus (TYRV), first isolated from tomato in Iran, was classified as a non-approved species of the genus Tospovirus based on the characterization of its genomic S RNA. In the current study, the complete sequences of the genomic L and M RNAs of TYRV were determined and analyzed. The L RNA has 8,877 nucleotides (nt) and codes in the viral complementary (vc) strand for the putative RNA-dependent RNA polymerase (RdRp) of 2,873 amino acids (aa) (331 kDa). The RdRp of TYRV shares the highest aa sequence identity (88.7%) with that of Iris yellow spot virus (IYSV), and contains conserved motifs shared with those of the animal-infecting bunyaviruses. The M RNA contains 4,786 nt and codes in ambisense arrangement for the NSm protein of 308 aa (34.5 kDa) in viral sense, and the Gn/Gc glycoprotein precursor (GP) of 1,310 aa (128 kDa) in vc-sense. Phylogenetic analyses indicated that TYRV is closely clustered with IYSV and Polygonum ringspot virus (PolRSV). The NSm and GP of TYRV share the highest aa sequence identity with those of IYSV and PolRSV (89.9% and 80.2-86.5%, respectively). Moreover, the GPs of TYRV, IYSV, and PolRSV share highly similar characteristics, including an identical deduced N-terminal protease cleavage site that is distinct from all tospoviral GPs analyzed thus far. Taken together, the elucidation of the complete genome sequence and biological features of TYRV supports a close ancestral relationship with IYSV and PolRSV.
Ying, Jianchao; Wang, Huifeng; Bao, Bokan; Zhang, Ying; Zhang, Jinfang; Zhang, Cheng; Li, Aifang; Lu, Junwan; Li, Peizhen; Ying, Jun; Liu, Qi; Xu, Teng; Yi, Huiguang; Li, Jinsong; Zhou, Li; Zhou, Tieli; Xu, Zuyuan; Ni, Liyan; Bao, Qiyu
2015-01-01
The homocysteine methyltransferase encoded by mmuM is widely distributed among microbial organisms. It is the key enzyme that catalyzes the last step in methionine biosynthesis and plays an important role in metabolism. It also enables the microbial organisms to tolerate high concentrations of selenium in the environment. In this research, 533 mmuM gene sequences covering 70 genera of bacteria were selected from the GenBank database. The distribution frequency of mmuM differs among the investigated genera of bacteria. The mapping results of 160 mmuM reference sequences showed that the mmuM genes were found in 7 species of pathogen genomes sequenced in this work. The polymerase chain reaction products of one mmuM genotype (NC_013951 as the reference) were sequenced and the sequencing results confirmed the mapping results. Furthermore, 144 representative sequences were chosen for phylogenetic analysis, and some mmuM genes from totally different genera (such as the genes between Escherichia and Klebsiella and between Enterobacter and Kosakonia) shared a closer phylogenetic relationship than those from the same genus. Comparative genomic analysis of the mmuM encoding regions on plasmids and bacterial chromosomes showed that the pKF3-140 and pIP1206 plasmids shared a 21 kb homology region, and a 4.9 kb fragment in this region had in fact originated from the Escherichia coli chromosome. These results further suggested that the mmuM gene underwent horizontal gene transfer among different species or genera of bacteria. High-throughput sequencing combined with comparative genomics analysis could help explore the distribution and dissemination of the mmuM gene among bacteria and its evolution at a molecular level.
Kim, Hoon; Zheng, Siyuan; Amini, Seyed S.; Virk, Selene M.; Mikkelsen, Tom; Brat, Daniel J.; Grimsby, Jonna; Sougnez, Carrie; Muller, Florian; Hu, Jian; Sloan, Andrew E.; Cohen, Mark L.; Van Meir, Erwin G.; Scarpace, Lisa; Laird, Peter W.; Weinstein, John N.; Lander, Eric S.; Gabriel, Stacey; Getz, Gad; Meyerson, Matthew; Chin, Lynda; Barnholtz-Sloan, Jill S.
2015-01-01
Glioblastoma (GBM) is a prototypical heterogeneous brain tumor refractory to conventional therapy. A small residual population of cells escapes surgery and chemoradiation, resulting in a typically fatal tumor recurrence ∼7 mo after diagnosis. Understanding the molecular architecture of this residual population is critical for the development of successful therapies. We used whole-genome sequencing and whole-exome sequencing of multiple sectors from primary and paired recurrent GBM tumors to reconstruct the genomic profile of residual, therapy resistant tumor initiating cells. We found that genetic alteration of the p53 pathway is a primary molecular event predictive of a high number of subclonal mutations in glioblastoma. The genomic road leading to recurrence is highly idiosyncratic but can be broadly classified into linear recurrences that share extensive genetic similarity with the primary tumor and can be directly traced to one of its specific sectors, and divergent recurrences that share few genetic alterations with the primary tumor and originate from cells that branched off early during tumorigenesis. Our study provides mechanistic insights into how genetic alterations in primary tumors impact the ensuing evolution of tumor cells and the emergence of subclonal heterogeneity. PMID:25650244
Do humans and nonhuman animals share the grouping principles of the Iambic - Trochaic Law?
de la Mora, Daniela M.; Nespor, Marina; Toro, Juan M.
2014-01-01
The Iambic-Trochaic Law describes humans’ tendency to form trochaic groups over sequences varying in pitch or intensity (i.e., the loudest or highest sound marks group beginnings), and iambic groups over sequences varying in duration (i.e., the longest sound marks group endings). The extent to which these perceptual biases are shared by humans and nonhuman animals is yet unclear. In Experiment 1, we trained rats to discriminate pitch-alternating sequences of tones from sequences randomly varying in pitch. In Experiment 2, rats were trained to discriminate duration-alternating sequences of tones from sequences randomly varying in duration. We found that nonhuman animals group as trochees sequences based on pitch variations, but they do not group as iambs sequences varying in duration. Importantly, humans grouped the same stimuli following the principles of the Iambic-Trochaic Law (Experiment 3). These results suggest an early emergence of the trochaic rhythmic grouping bias based on pitch, possibly relying on perceptual abilities shared by humans and other mammals as well, whereas the iambic rhythmic grouping bias based on duration might depend on language experience. PMID:22956287
Do humans and nonhuman animals share the grouping principles of the iambic-trochaic law?
de la Mora, Daniela M; Nespor, Marina; Toro, Juan M
2013-01-01
The iambic-trochaic law describes humans' tendency to form trochaic groups over sequences varying in pitch or intensity (i.e., the loudest or highest sounds mark group beginnings), and iambic groups over sequences varying in duration (i.e., the longest sounds mark group endings). The extent to which these perceptual biases are shared by humans and nonhuman animals is yet unclear. In Experiment 1, we trained rats to discriminate pitch-alternating sequences of tones from sequences randomly varying in pitch. In Experiment 2, rats were trained to discriminate duration-alternating sequences of tones from sequences randomly varying in duration. We found that nonhuman animals group sequences based on pitch variations as trochees, but they do not group sequences varying in duration as iambs. Importantly, humans grouped the same stimuli following the principles of the iambic-trochaic law (Exp. 3). These results suggest the early emergence of the trochaic rhythmic grouping bias based on pitch, possibly relying on perceptual abilities shared by humans and other mammals, whereas the iambic rhythmic grouping bias based on duration might depend on language experience.
The DNA Bank: High-Security Bank Accounts to Protect and Share Your Genetic Identity.
den Dunnen, Johan T
2015-07-01
With the cost of genome sequencing decreasing every day, DNA information has the potential of affecting the lives of everyone. Surprisingly, an individual has little knowledge about his own DNA information, can rarely access it, and has hardly any control over its use. This may result in preventable, life-threatening situations, and also significantly inhibits scientific progress. What we urgently need is a "DNA bank," a resource providing a secure personal account where, similar to a financial institution, you can store your DNA sequence. Using this private and secure DNA bank account, you govern your sequence-related business. For any genetic study performed, the data generated must be transferred (paid) to your DNA account. Using your account, you regulate access, knowing for what purpose (informed consent) and only for the genetic data you are willing to share. The DNA account ensures you are in the driver's seat, know what is known, and control what is happening with it. © 2015 WILEY PERIODICALS, INC.
Boehm; Gibson; Lubzens
2000-01-01
This study was initiated to search for species-specific and strain-specific satellite DNA sequences for which oligonucleotide primers could be designed to differentiate between various commercially important strains of the marine monogonont rotifers Brachionus rotundiformis and Brachionus plicatilis. Two unrelated, highly reiterated satellite sequences were cloned and characterized. The eight sequenced monomers from B. rotundiformis and six from B. plicatilis had low intrarepeat variability and were similar in their overall lengths, A + T compositions, and high degrees of repeated motif substructure. However, hybridizations to 19 representative strains, sequence characterizations, and GenBank searches indicated that these two satellites are morphotype-specific and population-specific, respectively, and share little homology to each other or to other characterized sequences in the database. Primer pairs designed for the B. rotundiformis satellite confirmed hybridization specificities on polymerase chain reaction and could serve as a useful molecular diagnostic tool to identify strains belonging to the SS morphotype, which are gaining widespread usage as first feeds for marine fish in commercial production.
Vis, D J; Lewin, J; Liao, R G; Mao, M; Andre, F; Ward, R L; Calvo, F; Teh, B T; Camargo, A A; Knoppers, B M; Sawyers, C L; Wessels, L F A; Lawler, M; Siu, L L; Voest, E
2017-05-01
While next generation sequencing has enhanced our understanding of the biological basis of malignancy, current knowledge on global practices for sequencing cancer samples is limited. To address this deficiency, we developed a survey to provide a snapshot of current sequencing activities globally, identify barriers to data sharing and use this information to develop sustainable solutions for the cancer research community. A multi-item survey was conducted assessing demographics, clinical data collection, genomic platforms, privacy/ethics concerns, funding sources and data sharing barriers for sequencing initiatives globally. Additionally, respondents were asked to provide the primary intent of their initiative (clinical diagnostic, research or combination). Of 107 initiatives invited to participate, 59 responded (response rate = 55%). Whole exome sequencing (P = 0.03) and whole genome sequencing (P = 0.01) were utilized less frequently in clinical diagnostic than in research initiatives. Procedures to identify cancer-specific variants were heterogeneous, with bioinformatics pipelines employing different mutation calling/variant annotation algorithms. Measurement of treatment efficacy varied amongst initiatives, with time on treatment (57%) and RECIST (53%) being the most common; however, other parameters were also employed. Whilst 72% of initiatives indicated data sharing, its scope varied, with a number of restrictions in place (e.g. transfer of raw data). The largest perceived barriers to data harmonization were the lack of financial support (P < 0.01) and bioinformatics concerns (e.g. lack of interoperability) (P = 0.02). Capturing clinical data was more likely to be perceived as a barrier to data sharing by larger initiatives than by smaller initiatives (P = 0.01). These results identify the main barriers, as perceived by the cancer sequencing community, to effective sharing of cancer genomic and clinical data.
They highlight the need for greater harmonization of technical, ethical and data capture processes in cancer sample sequencing worldwide, in order to support effective and responsible data sharing for the benefit of patients. © The Author 2017. Published by Oxford University Press on behalf of the European Society for Medical Oncology.
Tattiyapong, Muncharee; Sivakumar, Thillaiampalam; Takemae, Hitoshi; Simking, Pacharathon; Jittapalapong, Sathaporn; Igarashi, Ikuo; Yokoyama, Naoaki
2016-07-01
Babesia bovis, an intraerythrocytic protozoan parasite, causes severe clinical disease in cattle worldwide. The genetic diversity of parasite antigens often results in different immune profiles in infected animals, hindering efforts to develop immune control methodologies against the B. bovis infection. In this study, we analyzed the genetic diversity of the merozoite surface antigen-1 (msa-1) gene using 162 B. bovis-positive blood DNA samples sourced from cattle populations reared in different geographical regions of Thailand. The identity scores shared among 93 msa-1 gene sequences isolated by PCR amplification were 43.5-100%, and the similarity values among the translated amino acid sequences were 42.8-100%. Of 23 total clades detected in our phylogenetic analysis, Thai msa-1 gene sequences occurred in 18 clades; seven among them were composed of sequences exclusively from Thailand. To investigate differential antigenicity of isolated MSA-1 proteins, we expressed and purified eight recombinant MSA-1 (rMSA-1) proteins, including an rMSA-1 from B. bovis Texas (T2Bo) strain and seven rMSA-1 proteins based on the Thai msa-1 sequences. When these antigens were analyzed in a western blot assay, anti-T2Bo cattle serum strongly reacted with the rMSA-1 from T2Bo, as well as with three other rMSA-1 proteins that shared 54.9-68.4% sequence similarity with T2Bo MSA-1. In contrast, no or weak reactivity was observed for the remaining rMSA-1 proteins, which shared low sequence similarity (35.0-39.7%) with T2Bo MSA-1. While demonstrating the high genetic diversity of the B. bovis msa-1 gene in Thailand, the present findings suggest that the genetic diversity results in antigenicity variations among the MSA-1 antigens of B. bovis in Thailand. Copyright © 2016 Elsevier B.V. All rights reserved.
Ogembo, Javier Gordon; Caoili, Barbara L; Shikata, Masamitsu; Chaeychomsri, Sudawan; Kobayashi, Michihiro; Ikeda, Motoko
2009-10-01
A newly cloned Helicoverpa armigera nucleopolyhedrovirus (HearNPV) from Kenya, HearNPV-NNg1, has a higher insecticidal activity than HearNPV-G4, which also exhibits lower insecticidal activity than HearNPV-C1. In the search for genes and/or nucleotide sequences that might be involved in the observed virulence differences among Helicoverpa spp. NPVs, the entire genome of NNg1 was sequenced and compared with previously sequenced genomes of G4, C1 and Helicoverpa zea single-nucleocapsid NPV (Hz). The NNg1 genome was 132,425 bp in length, with a total of 143 putative open reading frames (ORFs), and shared high levels of overall amino acid and nucleotide sequence identities with G4, C1 and Hz. Three NNg1 ORFs, ORF5, ORF100 and ORF124, which were shared with C1, were absent in G4 and Hz, while NNg1 and C1 were missing a homologue of G4/Hz ORF5. Another three ORFs, ORF60 (bro-b), ORF119 and ORF120, and one direct repeat sequence (dr) were unique to NNg1. Relative to the overall nucleotide sequence identity, lower sequence identities were observed between NNg1 hrs and the homologous hrs in the other three Helicoverpa spp. NPVs, despite containing the same number of hrs located at essentially the same positions on the genomes. Differences were also observed between NNg1 and each of the other three Helicoverpa spp. NPVs in the diversity of bro genes encoded on the genomes. These results indicate several putative genes and nucleotide sequences that may be responsible for the virulence differences observed among Helicoverpa spp., yet the specific genes and/or nucleotide sequences responsible have not been identified.
Selinger, David A.; Chandler, Vicki L.
2001-01-01
The maize (Zea mays) b1 gene encodes a transcription factor that regulates the anthocyanin pigment pathway. Of the b1 alleles with distinct tissue-specific expression, B-Peru and B-Bolivia are the only alleles that confer seed pigmentation. B-Bolivia produces variable and weaker seed expression but darker, more regular plant expression relative to B-Peru. Our experiments demonstrated that B-Bolivia is not expressed in the seed when transmitted through the male. When transmitted through the female the proportion of kernels pigmented and the intensity of pigment varied. Molecular characterization of B-Bolivia demonstrated that it shares the first 530 bp of the upstream region with B-Peru, a region sufficient for seed expression. Immediately upstream of 530 bp, B-Bolivia is completely divergent from B-Peru. These sequences share sequence similarity to retrotransposons. Transient expression assays of various promoter constructs identified a 33-bp region in B-Bolivia that can account for the reduced aleurone pigment amounts (40%) observed with B-Bolivia relative to B-Peru. Transgenic plants carrying the B-Bolivia promoter proximal region produced pigmented seeds. Similar to native B-Bolivia, some transgene loci are variably expressed in seeds. In contrast to native B-Bolivia, the transgene loci are expressed in seeds when transmitted through both the male and female. Some transgenic lines produced pigment in vegetative tissues, but the tissue-specificity was different from B-Bolivia, suggesting the introduced sequences do not contain the B-Bolivia plant-specific regulatory sequences. We hypothesize that the chromatin context of the B-Bolivia allele controls its epigenetic seed expression properties, which could be influenced by the adjacent highly repeated retrotransposon sequence. PMID:11244116
Insights from Human/Mouse genome comparisons
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pennacchio, Len A.
2003-03-30
Large-scale public genomic sequencing efforts have provided a wealth of vertebrate sequence data poised to provide insights into mammalian biology. These include deep genomic sequence coverage of human, mouse, rat, zebrafish, and two pufferfish (Fugu rubripes and Tetraodon nigroviridis) (Aparicio et al. 2002; Lander et al. 2001; Venter et al. 2001; Waterston et al. 2002). In addition, a high priority has been placed on determining the genomic sequence of chimpanzee, dog, cow, frog, and chicken (Boguski 2002). While only recently available, whole genome sequence data have provided the unique opportunity to globally compare complete genome contents. Furthermore, the shared evolutionary ancestry of vertebrate species has allowed the development of comparative genomic approaches to identify ancient conserved sequences with functionality. Accordingly, this review focuses on the initial comparison of available mammalian genomes and describes various insights derived from such analysis.
Zhu, Yinzhou; Pirnie, Stephan P; Carmichael, Gordon G
2017-08-01
Ribose methylation (2'-O-methylation, 2'-OMe) occurs at high frequencies in rRNAs and other small RNAs and is carried out using a shared mechanism across eukaryotes and archaea. As RNA modifications are important for ribosome maturation, and alterations in these modifications are associated with cellular defects and diseases, it is important to characterize the landscape of 2'-O-methylation. Here we report the development of a highly sensitive and accurate method for ribose methylation detection using next-generation sequencing. A key feature of this method is the generation of RNA fragments with random 3'-ends, followed by periodate oxidation of all molecules terminating in 2',3'-OH groups. This allows only RNAs harboring 2'-OMe groups at their 3'-ends to be sequenced. Although currently requiring microgram amounts of starting material, this method is robust for the analysis of rRNAs even at low sequencing depth. © 2017 Zhu et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Gascoyne-Binzi, D M; Heritage, J; Hawkey, P M
1993-11-01
High-level tetracycline-resistant Neisseria gonorrhoeae (TRNG) has been associated with the presence of a plasmid approximately 25.2 MDa in size which carries a Tet M tetracycline resistance determinant. Two different plasmid types, American and Dutch, have previously been described, based on the restriction endonuclease digestion pattern. In this study, the tet(M) genes from the two plasmid types have been amplified by the polymerase chain reaction (PCR) and then sequenced. The gene sequences from the two plasmids shared 96.8% identity, and showed similarities with different segments of the tet(M) gene sequences from Tn1545, Tn916 and Ureaplasma urealyticum. The data suggest that it is highly likely that the Tet M determinant found in the American type plasmid has a different origin from that present in the Dutch plasmid.
Sequence determination and analysis of the NSs genes of two tospoviruses.
Hallwass, Mariana; Leastro, Mikhail O; Lima, Mirtes F; Inoue-Nagata, Alice K; Resende, Renato O
2012-03-01
The tospoviruses groundnut ringspot virus (GRSV) and zucchini lethal chlorosis virus (ZLCV) cause severe losses in many crops, especially in solanaceous and cucurbit species. In this study, the non-structural NSs gene and the 5'UTRs of these two biologically distinct tospoviruses were cloned and sequenced. The NSs sequences of GRSV and ZLCV were both 1,404 nucleotides long. Pairwise comparison showed that the NSs amino acid sequence of GRSV shared 69.6% identity with that of ZLCV and 75.9% identity with that of TSWV, while the NSs sequences of ZLCV and TSWV shared 67.9% identity. Phylogenetic analysis based on NSs sequences confirmed that these viruses cluster in the American clade.
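The pairwise identity percentages quoted in this abstract are the kind of figure that can be computed from two already-aligned sequences. The following is a minimal sketch, not the authors' procedure: the function name `percent_identity` is hypothetical, and the choice to ignore gapped columns in the denominator is an assumption, since identity definitions vary between tools.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned, ungapped columns (assumed convention)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        # Skip columns where either sequence has a gap.
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

Other conventions divide by the full alignment length or by the shorter sequence length, so values from different tools are not always directly comparable.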
Networking Biology: The Origins of Sequence-Sharing Practices in Genomics.
Stevens, Hallam
2015-10-01
The wide sharing of biological data, especially nucleotide sequences, is now considered to be a key feature of genomics. Historians and sociologists have attempted to account for the rise of this sharing by pointing to precedents in model organism communities and in natural history. This article supplements these approaches by examining the role that electronic networking technologies played in generating the specific forms of sharing that emerged in genomics. The links between early computer users at the Stanford Artificial Intelligence Laboratory in the 1960s, biologists using local computer networks in the 1970s, and GenBank in the 1980s, show how networking technologies carried particular practices of communication, circulation, and data distribution from computing into biology. In particular, networking practices helped to transform sequences themselves into objects that had value as a community resource.
Nomiyama, H; Kuhara, S; Kukita, T; Otsuka, T; Sakaki, Y
1981-01-01
The 26S ribosomal RNA gene of Physarum polycephalum is interrupted by two introns, and we have previously determined the sequence of one of them (intron 1) (Nomiyama et al. Proc.Natl.Acad.Sci.USA 78, 1376-1380, 1981). In this study we sequenced the second intron (intron 2) of about 0.5 kb length and its flanking regions, and found that one nucleotide at each junction is identical in intron 1 and intron 2, though the junction regions share no other sequence homology. Comparison of the flanking exon sequences to E. coli 23S rRNA sequences shows that conserved sequences are interspersed with tracts having little homology. In particular, the region encompassing the intron 2 interruption site is highly conserved. The E. coli ribosomal protein L1 binding region is also conserved. PMID:6171776
Camerlengo, Terry; Ozer, Hatice Gulcin; Onti-Srinivasan, Raghuram; Yan, Pearlly; Huang, Tim; Parvin, Jeffrey; Huang, Kun
2012-01-01
Next Generation Sequencing (NGS) is highly resource intensive. NGS tasks related to data processing, management and analysis require high-end computing servers or even clusters. Additionally, processing NGS experiments requires suitable storage space and significant manual interaction. At The Ohio State University's Biomedical Informatics Shared Resource, we designed and implemented a scalable architecture to address the challenges associated with the resource intensive nature of NGS secondary analysis built around Illumina Genome Analyzer II sequencers and Illumina's Gerald data processing pipeline. The software infrastructure includes a distributed computing platform consisting of a LIMS called QUEST (http://bisr.osumc.edu), an Automation Server, a computer cluster for processing NGS pipelines, and a network attached storage device expandable up to 40TB. The system has been architected to scale to multiple sequencers without requiring additional computing or labor resources. This platform demonstrates how to manage and automate NGS experiments in an institutional or core facility setting.
Kim, Hoon; Zheng, Siyuan; Amini, Seyed S; Virk, Selene M; Mikkelsen, Tom; Brat, Daniel J; Grimsby, Jonna; Sougnez, Carrie; Muller, Florian; Hu, Jian; Sloan, Andrew E; Cohen, Mark L; Van Meir, Erwin G; Scarpace, Lisa; Laird, Peter W; Weinstein, John N; Lander, Eric S; Gabriel, Stacey; Getz, Gad; Meyerson, Matthew; Chin, Lynda; Barnholtz-Sloan, Jill S; Verhaak, Roel G W
2015-03-01
Glioblastoma (GBM) is a prototypical heterogeneous brain tumor refractory to conventional therapy. A small residual population of cells escapes surgery and chemoradiation, resulting in a typically fatal tumor recurrence ∼ 7 mo after diagnosis. Understanding the molecular architecture of this residual population is critical for the development of successful therapies. We used whole-genome sequencing and whole-exome sequencing of multiple sectors from primary and paired recurrent GBM tumors to reconstruct the genomic profile of residual, therapy resistant tumor initiating cells. We found that genetic alteration of the p53 pathway is a primary molecular event predictive of a high number of subclonal mutations in glioblastoma. The genomic road leading to recurrence is highly idiosyncratic but can be broadly classified into linear recurrences that share extensive genetic similarity with the primary tumor and can be directly traced to one of its specific sectors, and divergent recurrences that share few genetic alterations with the primary tumor and originate from cells that branched off early during tumorigenesis. Our study provides mechanistic insights into how genetic alterations in primary tumors impact the ensuing evolution of tumor cells and the emergence of subclonal heterogeneity. © 2015 Kim et al.; Published by Cold Spring Harbor Laboratory Press.
Peng, Chuanhua; Wang, Xiaoping; Li, Fei; Lin, Yongjun
2012-01-01
The rice stem borer, Chilo suppressalis (Walker) (Lepidoptera: Pyralidae), is one of the most detrimental pests affecting rice crops. The use of Bacillus thuringiensis (Bt) toxins has been explored as a means to control this pest, but the potential for C. suppressalis to develop resistance to Bt toxins makes this approach problematic. Few C. suppressalis gene sequences are known, which makes in-depth study of gene function difficult. Herein, we sequenced the midgut transcriptome of the rice stem borer. In total, 37,040 contigs were obtained, with a mean size of 497 bp. As expected, the transcripts of C. suppressalis shared high similarity with arthropod genes. Gene ontology and KEGG analysis were used to classify the gene functions in C. suppressalis. Using the midgut transcriptome data, we conducted a proteome analysis to identify proteins expressed abundantly in the brush border membrane vesicles (BBMV). Of the 100 most abundant proteins that were excised and subjected to mass spectrometry analysis, 74 share high similarity with known proteins. Among these proteins, Western blot analysis showed that Aminopeptidase N and EH domain-containing protein bind the Bt toxin Cry1Ac. These data provide invaluable information about the gene sequences of C. suppressalis and the proteins that bind Cry1Ac. PMID:22666467
A gyrovirus infecting a sea bird
Li, Linlin; Pesavento, Patricia A.; Gaynor, Anne M.; Duerr, Rebecca S.; Phan, Tung Gia; Zhang, Wen; Deng, Xutao
2015-01-01
We characterized the genome of a highly divergent gyrovirus (GyV8) in the spleen and uropygial gland tissues of a diseased northern fulmar (Fulmarus glacialis), a pelagic bird beached in San Francisco, California. No other exogenous viral sequences could be identified using viral metagenomics. The small circular DNA genome shared no significant nucleotide sequence identity, and only 38–42 % amino acid sequence identity in VP1, with any of the previously identified gyroviruses. GyV8 is the first member of the third major phylogenetic clade of this viral genus and the first gyrovirus detected in an avian species other than chicken. PMID:26036564
Bastien, Olivier; Maréchal, Eric
2008-08-07
Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte-Carlo simulation. Z-values allow the deduction of an upper bound of the P-value (1/Z-value²) following the TULIP theorem. Simulations of the Z-value distribution are known to fit a Gumbel law. This remarkable property was not demonstrated and had no obvious biological support. We built a model of evolution of sequences based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the system's longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to the ability to conserve a sufficient functional level at the folded and maturated protein level (positive selection pressure). Homologous sequences were considered as systems 1) having a high redundancy of information reflected by the magnitude of their alignment scores, 2) whose components are the amino acids that can independently be damaged by random DNA mutations.
From these assumptions, we deduced that information shared at each amino acid position evolved with a constant rate, corresponding to the information hazard rate, and that pairwise sequence alignment scores should follow a Gumbel distribution, whose parameters could find some theoretical rationale. In particular, one parameter corresponds to the information hazard rate. Extreme value distribution of alignment scores, assessed from high scoring segments pairs following the Karlin-Altschul model, can also be deduced from the Reliability Theory applied to molecular sequences. It reflects the redundancy of information between homologous sequences, under functional conservative pressure. This model also provides a link between concepts of biological sequence analysis and of systems biology.
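The Monte-Carlo Z-value and the 1/Z² bound on the P-value mentioned in this abstract can be illustrated with a short sketch. It is only an illustration under stated assumptions: the scoring scheme (match 2, mismatch -1, linear gap -2), the number of shuffles, and the function names `local_alignment_score`, `z_value` and `p_value_upper_bound` are all hypothetical and are not taken from the paper.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def z_value(a, b, n_shuffles=200, seed=0):
    """Z-value of the observed score against scores of shuffled copies of b."""
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keeps composition, destroys residue order
        null_scores.append(local_alignment_score(a, "".join(letters)))
    mean = statistics.mean(null_scores)
    sd = statistics.stdev(null_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (observed - mean) / sd


def p_value_upper_bound(z):
    """Upper bound 1/Z^2 quoted in the abstract; capped at 1 (assumption)."""
    if z <= 1:
        return 1.0
    return 1.0 / (z * z)
```

Shuffling only one of the two sequences is one common convention for building the random score distribution; shuffling both, or sampling from a background composition, are alternatives that would change the resulting Z-value.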
eShadow: A tool for comparing closely related sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ovcharenko, Ivan; Boffelli, Dario; Loots, Gabriela G.
2004-01-15
Primate sequence comparisons are difficult to interpret due to the high degree of sequence similarity shared between such closely related species. Recently, a novel method, phylogenetic shadowing, has been pioneered for predicting functional elements in the human genome through the analysis of multiple primate sequence alignments. We have expanded this theoretical approach to create a computational tool, eShadow, for the identification of elements under selective pressure in multiple sequence alignments of closely related genomes, such as in comparisons of human to primate or mouse to rat DNA. This tool integrates two different statistical methods and allows for the dynamic visualization of the resulting conservation profile. eShadow also includes a versatile optimization module capable of training the underlying Hidden Markov Model to differentially predict functional sequences. This module grants the tool high flexibility in the analysis of multiple sequence alignments and in comparing sequences with different divergence rates. Here, we describe the eShadow comparative tool and its potential uses for analyzing both multiple nucleotide and protein alignments to predict putative functional elements. The eShadow tool is publicly available at http://eshadow.dcode.org/
Gibbs, Mark J; Armstrong, John S; Gibbs, Adrian J
2005-01-01
Background: Most current DNA diagnostic tests for identifying organisms use specific oligonucleotide probes that are complementary in sequence to, and hence only hybridise with, the DNA of one target species. By contrast, in traditional taxonomy, specimens are usually identified by 'dichotomous keys' that use combinations of characters shared by different members of the target set. Using one specific character for each target is the least efficient strategy for identification. Using combinations of shared bisectionally-distributed characters is much more efficient, and this strategy is most efficient when they separate the targets in a progressively binary way. Results: We have developed a practical method for finding minimal sets of sub-sequences that identify individual sequences, and could be targeted by combinations of probes, so that the efficient strategy of traditional taxonomic identification could be used in DNA diagnosis. The sizes of minimal sub-sequence sets depended mostly on sequence diversity and sub-sequence length and interactions between these parameters. We found that 201 distinct cytochrome oxidase subunit-1 (CO1) genes from moths (Lepidoptera) were distinguished using only 15 sub-sequences 20 nucleotides long, whereas only 8–10 sub-sequences 6–10 nucleotides long were required to distinguish the CO1 genes of 92 species from the 9 largest orders of insects. Conclusion: The presence/absence of sub-sequences in a set of gene sequences can be used like the questions in a traditional dichotomous taxonomic key; hybridisation probes complementary to such sub-sequences should provide a very efficient means for identifying individual species, subtypes or genotypes. Sequence diversity and sub-sequence length are the major factors that determine the numbers of distinguishing sub-sequences in any set of sequences. PMID:15817134
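The idea of identifying sequences by the presence or absence of a small set of shared sub-sequences can be sketched as follows. This is a simple greedy heuristic written for illustration only; it is not the method developed in the paper and does not guarantee a minimal set. The function names `kmer_presence`, `greedy_distinguishing_kmers` and `signature` are hypothetical.

```python
from itertools import combinations


def kmer_presence(seqs, k):
    """Map each k-long sub-sequence to the set of sequence indices containing it."""
    presence = {}
    for idx, s in enumerate(seqs):
        for i in range(len(s) - k + 1):
            presence.setdefault(s[i:i + k], set()).add(idx)
    return presence


def greedy_distinguishing_kmers(seqs, k):
    """Greedily pick sub-sequences until every pair of sequences is separated.

    Returns the chosen sub-sequences and any pairs left unseparated.
    """
    presence = kmer_presence(seqs, k)
    unresolved = set(combinations(range(len(seqs)), 2))
    chosen = []
    while unresolved:
        best_kmer, best_split = None, set()
        for kmer in sorted(presence):  # sorted for reproducible tie-breaking
            members = presence[kmer]
            # A pair is separated when exactly one member contains the k-mer.
            split = {(i, j) for (i, j) in unresolved
                     if (i in members) != (j in members)}
            if len(split) > len(best_split):
                best_kmer, best_split = kmer, split
        if best_kmer is None:
            break  # no remaining k-mer separates any unresolved pair
        chosen.append(best_kmer)
        unresolved -= best_split
    return chosen, unresolved


def signature(seq, kmers):
    """Presence/absence pattern of the chosen sub-sequences in one sequence."""
    return tuple(kmer in seq for kmer in kmers)
```

Each chosen sub-sequence acts like one question in a dichotomous key: n sequences need at least log2(n) such questions, and a sub-sequence present in roughly half of the remaining candidates separates the most pairs, which is why the greedy step favours evenly splitting sub-sequences.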
Postel, Alexander; Schmeiser, Stefanie; Zimmermann, Bernd; Becher, Paul
2016-01-01
Molecular epidemiology has become an indispensable tool in the diagnosis of diseases and in tracing the infection routes of pathogens. Due to advances in conventional sequencing and the development of high throughput technologies, the field of sequence determination is in the process of being revolutionized. Platforms for sharing sequence information and providing standardized tools for phylogenetic analyses are becoming increasingly important. The database (DB) of the European Union (EU) and World Organisation for Animal Health (OIE) Reference Laboratory for classical swine fever offers one of the world’s largest semi-public virus-specific sequence collections combined with a module for phylogenetic analysis. The classical swine fever (CSF) DB (CSF-DB) became a valuable tool for supporting diagnosis and epidemiological investigations of this highly contagious disease in pigs with high socio-economic impacts worldwide. The DB has been re-designed and now allows for the storage and analysis of traditionally used, well established genomic regions and of larger genomic regions including complete viral genomes. We present an application example for the analysis of highly similar viral sequences obtained in an endemic disease situation and introduce the new geographic “CSF Maps” tool. The concept of this standardized and easy-to-use DB with an integrated genetic typing module is suited to serve as a blueprint for similar platforms for other human or animal viruses. PMID:27827988
ACLAME: a CLAssification of Mobile genetic Elements, update 2010.
Leplae, Raphaël; Lima-Mendez, Gipsi; Toussaint, Ariane
2010-01-01
The ACLAME database is dedicated to the collection, analysis and classification of sequenced mobile genetic elements (MGEs, in particular phages and plasmids). In addition to providing information on the MGEs content, classifications are available at various levels of organization. At the gene/protein level, families group similar sequences that are expected to share the same function. Families of four or more proteins are manually assigned with a functional annotation using the GeneOntology and the locally developed ontology MeGO dedicated to MGEs. At the genome level, evolutionary cohesive modules group sets of protein families shared among MGEs. At the population level, networks display the reticulate evolutionary relationships among MGEs. To increase the coverage of the phage sequence space, ACLAME version 0.4 incorporates 760 high-quality predicted prophages selected from the Prophinder database. Most of the data can be downloaded from the freely accessible ACLAME web site (http://aclame.ulb.ac.be). The BLAST interface for querying the database has been extended and numerous tools for in-depth analysis of the results have been added.
NASA Astrophysics Data System (ADS)
Giblin, M. F.; Sieckman, G. L.; Owen, N. K.; Hoffman, T. J.; Forte, L. R.; Volkert, W. A.
2005-12-01
The human Escherichia coli heat-stable enterotoxin (STh, amino acid sequence N1SSNYCCELCCNPACTGCY19) binds specifically to the guanylate cyclase C (GC-C) receptor, which is present in high density on the apical surface of normal intestinal epithelial cells as well as on the surface of human colon cancer cells. In the current study, two STh analogs were synthesized and evaluated in vitro and in vivo. Both analogs shared identical 6-19 core sequences, and had N-terminal pendant DOTA moieties. The analogs differed in the identity of a 6 amino acid peptide sequence intervening between DOTA and the 6-19 core. In one analog, the peptide was an RGD-containing sequence found in human fibronectin (GRGDSP), while in the other this peptide sequence was randomly scrambled (GRDSGP). The results indicated that the presence of the human fibronectin sequence in the hybrid peptide did not affect tumor localization in vivo.
High-purity circular RNA isolation method (RPAD) reveals vast collection of intronic circRNAs.
Panda, Amaresh C; De, Supriyo; Grammatikakis, Ioannis; Munk, Rachel; Yang, Xiaoling; Piao, Yulan; Dudekula, Dawood B; Abdelmohsen, Kotb; Gorospe, Myriam
2017-07-07
High-throughput RNA sequencing methods coupled with specialized bioinformatic analyses have recently uncovered tens of thousands of unique circular (circ)RNAs, but their complete sequences, genes of origin and functions are largely unknown. Given that circRNAs lack free ends and are thus relatively stable, their association with microRNAs (miRNAs) and RNA-binding proteins (RBPs) can influence gene expression programs. While exoribonuclease treatment is widely used to degrade linear RNAs and enrich circRNAs in RNA samples, it does not efficiently eliminate all linear RNAs. Here, we describe a novel method for the isolation of highly pure circRNA populations involving RNase R treatment followed by Polyadenylation and poly(A)+ RNA Depletion (RPAD), which removes linear RNA to near completion. High-throughput sequencing of RNA prepared using RPAD from human cervical carcinoma HeLa cells and mouse C2C12 myoblasts led to two surprising discoveries: (i) many exonic circRNA (EcircRNA) isoforms share an identical backsplice sequence but have different body sizes and sequences, and (ii) thousands of novel intronic circular RNAs (IcircRNAs) are expressed in cells. In sum, isolating high-purity circRNAs using the RPAD method can enable quantitative and qualitative analyses of circRNA types and sequence composition, paving the way for the elucidation of circRNA functions. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.
High-purity circular RNA isolation method (RPAD) reveals vast collection of intronic circRNAs
De, Supriyo; Grammatikakis, Ioannis; Munk, Rachel; Yang, Xiaoling; Piao, Yulan; Dudekula, Dawood B.; Gorospe, Myriam
2017-01-01
Abstract High-throughput RNA sequencing methods coupled with specialized bioinformatic analyses have recently uncovered tens of thousands of unique circular (circ)RNAs, but their complete sequences, genes of origin and functions are largely unknown. Given that circRNAs lack free ends and are thus relatively stable, their association with microRNAs (miRNAs) and RNA-binding proteins (RBPs) can influence gene expression programs. While exoribonuclease treatment is widely used to degrade linear RNAs and enrich circRNAs in RNA samples, it does not efficiently eliminate all linear RNAs. Here, we describe a novel method for the isolation of highly pure circRNA populations involving RNase R treatment followed by Polyadenylation and poly(A)+ RNA Depletion (RPAD), which removes linear RNA to near completion. High-throughput sequencing of RNA prepared using RPAD from human cervical carcinoma HeLa cells and mouse C2C12 myoblasts led to two surprising discoveries: (i) many exonic circRNA (EcircRNA) isoforms share an identical backsplice sequence but have different body sizes and sequences, and (ii) thousands of novel intronic circular RNAs (IcircRNAs) are expressed in cells. In sum, isolating high-purity circRNAs using the RPAD method can enable quantitative and qualitative analyses of circRNA types and sequence composition, paving the way for the elucidation of circRNA functions. PMID:28444238
Zhang, Bochao; Meng, Wenzhao; Prak, Eline T Luning; Hershberg, Uri
2015-12-01
Immune repertoires are collections of lymphocytes that express diverse antigen receptor gene rearrangements consisting of Variable (V), (Diversity (D) in the case of heavy chains) and Joining (J) gene segments. Clonally related cells typically share the same germline gene segments and have highly similar junctional sequences within their third complementarity determining regions. Identifying clonal relatedness of sequences is a key step in the analysis of immune repertoires. The V gene is the most important for clone identification because it has the longest sequence and the greatest number of sequence variants. However, accurate identification of a clone's germline V gene source is challenging because there is a high degree of similarity between different germline V genes. This difficulty is compounded in antibodies, which can undergo somatic hypermutation. Furthermore, high-throughput sequencing experiments often generate partial sequences and have significant error rates. To address these issues, we describe a novel method to estimate which germline V genes (or alleles) cannot be discriminated under different conditions (read lengths, sequencing errors or somatic hypermutation frequencies). Starting with any set of germline V genes, this method measures their similarity using different sequencing lengths and calculates their likelihood of unambiguous assignment under different levels of mutation. Hence, one can identify, under different experimental and biological conditions, the germline V genes (or alleles) that cannot be uniquely identified and bundle them together into groups of specific V genes with highly similar sequences. Copyright © 2015 Elsevier B.V. All rights reserved.
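A much-simplified sketch of the bundling idea in the abstract above: germline V genes that look identical over the portion of the gene a read actually covers cannot be told apart, so they are grouped together. It assumes, purely for illustration, that reads cover the last `read_length` nucleotides of each gene and ignores sequencing error and somatic hypermutation, both of which the published method models. The gene names and sequences are invented.

```python
from collections import defaultdict


def bundle_by_read(genes, read_length):
    """Group gene names whose final read_length nucleotides are identical."""
    groups = defaultdict(list)
    for name, seq in sorted(genes.items()):
        groups[seq[-read_length:]].append(name)
    return sorted(groups.values())


# Hypothetical germline segments; V1 and V2 differ only near the 5' end.
toy_genes = {
    "V1": "AAACCCGGG",
    "V2": "AATCCCGGG",
    "V3": "AAACCCGTG",
}

print(bundle_by_read(toy_genes, 6))  # short reads: V1 and V2 are bundled
print(bundle_by_read(toy_genes, 9))  # full-length reads: all genes separate
```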
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hiraiwa, Akikazu; Yamanaka, Katsuo; Kwok, W.W.
Although HLA genes have been shown to be associated with certain diseases, the basis for this association is unknown. Recent studies, however, have documented patterns of nucleotide sequence variation among some HLA genes associated with a particular disease. For rheumatoid arthritis, HLA genes in most patients have a shared nucleotide sequence encoding a key structural element of an HLA class II polypeptide; this sequence element is critical for the interaction of the HLA molecule with antigenic peptides and with responding T cells, suggestive of a direct role for this sequence element in disease susceptibility. The authors describe the serological and cellular immunologic characteristics encoded by this rheumatoid arthritis-associated sequence element. Site-directed mutagenesis of the DRB1 gene was used to define amino acids critical for antibody and T-cell recognition of this structural element, focusing on residues that distinguish the rheumatoid arthritis-associated alleles Dw4 and Dw14 from a closely related allele, Dw10, not associated with disease. Both the gain and loss of rheumatoid arthritis-associated epitopes were highly dependent on three residues within a discrete domain of the HLA-DR molecule. Recognition was most strongly influenced by the following amino acids (in order): 70 > 71 > 67. Some alloreactive T-cell clones were also influenced by amino acid variation in portions of the DR molecule lying outside the shared sequence element.
The limits of protein sequence comparison?
Pearson, William R; Sierk, Michael L
2010-01-01
Modern sequence alignment algorithms are used routinely to identify homologous proteins, proteins that share a common ancestor. Homologous proteins always share similar structures and often have similar functions. Over the past 20 years, sequence comparison has become both more sensitive, largely because of profile-based methods, and more reliable, because of more accurate statistical estimates. As sequence and structure databases become larger, and comparison methods become more powerful, reliable statistical estimates will become even more important for distinguishing similarities that are due to homology from those that are due to analogy (convergence). The newest sequence alignment methods are more sensitive than older methods, but more accurate statistical estimates are needed for their full power to be realized. PMID:15919194
Brown, Eric W.; Detter, Chris; Gerner-Smidt, Peter; Gilmour, Matthew W.; Harmsen, Dag; Hendriksen, Rene S.; Hewson, Roger; Heymann, David L.; Johansson, Karin; Ijaz, Kashef; Keim, Paul S.; Koopmans, Marion; Kroneman, Annelies; Wong, Danilo Lo Fo; Lund, Ole; Palm, Daniel; Sawanpanyalert, Pathom; Sobel, Jeremy; Schlundt, Jørgen
2012-01-01
The rapid advancement of genome technologies holds great promise for improving the quality and speed of clinical and public health laboratory investigations and for decreasing their cost. The latest generation of genome DNA sequencers can provide highly detailed and robust information on disease-causing microbes, and in the near future these technologies will be suitable for routine use in national, regional, and global public health laboratories. With additional improvements in instrumentation, these next- or third-generation sequencers are likely to replace conventional culture-based and molecular typing methods to provide point-of-care clinical diagnosis and other essential information for quicker and better treatment of patients. Provided there is free-sharing of information by all clinical and public health laboratories, these genomic tools could spawn a global system of linked databases of pathogen genomes that would ensure more efficient detection, prevention, and control of endemic, emerging, and other infectious disease outbreaks worldwide. PMID:23092707
Gibbs motif sampling: detection of bacterial outer membrane protein repeats.
Neuwald, A. F.; Liu, J. S.; Lawrence, C. E.
1995-01-01
The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggests that they play important functional roles. PMID:8520488
Yomano, L P; Scopes, R K; Ingram, L O
1993-01-01
Phosphoglycerate mutase is an essential glycolytic enzyme for Zymomonas mobilis, catalyzing the reversible interconversion of 3-phosphoglycerate and 2-phosphoglycerate. The pgm gene encoding this enzyme was cloned on a 5.2-kbp DNA fragment and expressed in Escherichia coli. Recombinants were identified by using antibodies directed against purified Z. mobilis phosphoglycerate mutase. The pgm gene contains a canonical ribosome-binding site, a biased pattern of codon usage, a long upstream untranslated region, and four promoters which share sequence homology. Interestingly, adhA and a D-specific 2-hydroxyacid dehydrogenase were found on the same DNA fragment and appear to form a cluster of genes which function in central metabolism. The translated sequence for Z. mobilis pgm was in full agreement with the 40 N-terminal amino acid residues determined by protein sequencing. The primary structure of the translated sequence is highly conserved (52 to 60% identity with other phosphoglycerate mutases) and also shares extensive homology with bisphosphoglycerate mutases (51 to 59% identity). Since Southern blots indicated the presence of only a single copy of pgm in the Z. mobilis chromosome, it is likely that the cloned pgm gene functions to provide both activities. Z. mobilis phosphoglycerate mutase is unusual in that it lacks the flexible tail and lysines at the carboxy terminus which are present in the enzyme isolated from all other organisms examined. PMID:8320209
Melters, Daniël P; Bradnam, Keith R; Young, Hugh A; Telis, Natalie; May, Michael R; Ruby, J Graham; Sebra, Robert; Peluso, Paul; Eid, John; Rank, David; Garcia, José Fernando; DeRisi, Joseph L; Smith, Timothy; Tobias, Christian; Ross-Ibarra, Jeffrey; Korf, Ian; Chan, Simon W L
2013-01-30
Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species using publicly available genomic sequence and our own data. Our methods are compatible with all current sequencing technologies. Long Pacific Biosciences sequence reads allowed us to find tandem repeat monomers up to 1,419 bp. We assumed that the most abundant tandem repeat is the centromere DNA, which was true for most species whose centromeres have been previously characterized, suggesting this is a general property of genomes. High-copy centromere tandem repeats were found in almost all animal and plant genomes, but repeat monomers were highly variable in sequence composition and length. Furthermore, phylogenetic analysis of sequence homology showed little evidence of sequence conservation beyond approximately 50 million years of divergence. We find that despite an overall lack of sequence conservation, centromere tandem repeats from diverse species showed similar modes of evolution. While centromere position in most eukaryotes is epigenetically determined, our results indicate that tandem repeats are highly prevalent at centromeres of both animal and plant genomes. This suggests a functional role for such repeats, perhaps in promoting concerted evolution of centromere DNA across chromosomes.
2013-01-01
Background Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species using publicly available genomic sequence and our own data. Results Our methods are compatible with all current sequencing technologies. Long Pacific Biosciences sequence reads allowed us to find tandem repeat monomers up to 1,419 bp. We assumed that the most abundant tandem repeat is the centromere DNA, which was true for most species whose centromeres have been previously characterized, suggesting this is a general property of genomes. High-copy centromere tandem repeats were found in almost all animal and plant genomes, but repeat monomers were highly variable in sequence composition and length. Furthermore, phylogenetic analysis of sequence homology showed little evidence of sequence conservation beyond approximately 50 million years of divergence. We find that despite an overall lack of sequence conservation, centromere tandem repeats from diverse species showed similar modes of evolution. Conclusions While centromere position in most eukaryotes is epigenetically determined, our results indicate that tandem repeats are highly prevalent at centromeres of both animal and plant genomes. This suggests a functional role for such repeats, perhaps in promoting concerted evolution of centromere DNA across chromosomes. PMID:23363705
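As a rough illustration of how a tandem repeat monomer length might be recovered from raw sequence, the sketch below scores each candidate period by the fraction of positions that match the base one period downstream. This is an assumed toy approach, not the pipeline used in the study; the function name `best_period` and the example sequence are hypothetical.

```python
def best_period(seq, max_period):
    """Return (period, score) for the shortest period with the best self-match.

    score is the fraction of positions i where seq[i] == seq[i + period].
    A perfect tandem array of a monomer of length p scores 1.0 at p.
    """
    best_p, best_score = None, -1.0
    for p in range(1, min(max_period, len(seq) - 1) + 1):
        compared = len(seq) - p
        matches = sum(seq[i] == seq[i + p] for i in range(compared))
        score = matches / compared
        # Strict comparison keeps the shortest period among ties,
        # so a multiple of the monomer length never displaces it.
        if score > best_score:
            best_p, best_score = p, score
    return best_p, best_score


array = "ACGTT" * 6  # six copies of a 5-bp monomer
print(best_period(array, 12))
```

Real centromeric arrays diverge between copies, so in practice the score would fall below 1.0 and a threshold or alignment-based method would be needed.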
Community standards for genomic resources, genetic conservation, and data integration
Jill Wegrzyn; Meg Staton; Emily Grau; Richard Cronn; C. Dana Nelson
2017-01-01
Genetics and genomics are increasingly important in forestry management and conservation. Next generation sequencing can increase analytical power, but still relies on building on the structure of previously acquired data. Data standards and data sharing allow the community to maximize the analytical power of high throughput genomics data. The landscape of incomplete...
Ek-Huchim, Juan Pablo; Aguirre-Macedo, Ma Leopoldina; Améndola-Pimenta, Monica; Vidal-Martínez, Victor Manuel; Pérez-Vega, Juan Antonio; Simá-Alvarez, Raúl; Jiménez-García, Isabel; Zamora-Bustillos, Roberto; Rodríguez-Canul, Rossanna
2017-08-02
The protozoan Perkinsus marinus (Mackin, Owen & Collier) Levine, 1978 causes perkinsosis in the American oyster Crassostrea virginica Gmelin, 1791. This pathogen is present in cultured C. virginica from the Gulf of Mexico and has been reported recently in Saccostrea palmula (Carpenter, 1857), Crassostrea corteziensis (Hertlein, 1951) and Crassostrea gigas (Thunberg, 1793) from the Mexican Pacific coast. Transportation of fresh oysters for human consumption and repopulation could be implicated in the transmission and dissemination of this parasite across the Mexican Pacific coast. The aim of this study was two-fold. First, we evaluated the P. marinus infection parameters by PCR and RFTM (Ray's fluid thioglycollate medium) in C. virginica from four major lagoons (Términos Lagoon, Campeche; Carmen-Pajonal-Machona Lagoon complex, Tabasco; Mandinga Lagoon, Veracruz; and La Pesca Lagoon, Tamaulipas) from the Gulf of Mexico. Secondly, we used DNA sequence analyses of the ribosomal non-transcribed spacer (rNTS) region of P. marinus to determine the possible translocation of this species from the Gulf of Mexico to the Mexican Pacific coast. Perkinsus marinus prevalence by PCR was 57.7% (338 out of 586 oysters) and 38.2% (224 out of 586 oysters) by RFTM. The highest prevalence was observed in the Carmen-Pajonal-Machona Lagoon complex in the state of Tabasco (73% by PCR and 58% by RFTM) and the estimated weighted prevalence (WP) was less than 1.0 in the four lagoons. Ten unique rDNA-NTS sequences of P. marinus [termed herein the "P. marinus (Pm) haplotype"] were identified in the Gulf of Mexico sample. They shared 96-100% similarity with 18 rDNA-NTS sequences from the GenBank database which were derived from 16 Mexican Pacific coast infections and two sequences from the USA. The phylogenetic tree and the haplotype network showed that the P. marinus rDNA-NTS sequences from Mexico were distant from the rDNA-NTS sequences of P. marinus reported from the USA. 
The ten rDNA-NTS sequences described herein were restricted to specific locations displaying different geographical connections within the Gulf of Mexico; the Carmen-Pajonal-Machona Pm1 haplotype from the state of Tabasco shared a cluster with P. marinus isolates reported from the Mexican Pacific coast. The rDNA-NTS sequences of P. marinus from the state of Tabasco shared high similarity with the reference rDNA-NTS sequences from the Mexican Pacific coast. The high similarity suggests a transfer of oysters infected with P. marinus from the Mexican part of the Gulf of Mexico into the Mexican Pacific coast.
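The overall prevalence figures quoted in the abstract above follow directly from the reported counts. A small check, using only those counts (the helper name `prevalence` is an illustrative assumption):

```python
def prevalence(positive, total):
    """Percentage of positive samples, rounded to one decimal place."""
    return round(100.0 * positive / total, 1)


pcr = prevalence(338, 586)   # oysters positive by PCR
rftm = prevalence(224, 586)  # oysters positive by RFTM
print(pcr, rftm)
```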
Asaf, Sajjad; Khan, Abdul Latif; Khan, Muhammad Aaqil; Waqas, Muhammad; Kang, Sang-Mo; Yun, Byung-Wook; Lee, In-Jung
2017-08-08
We investigated the complete chloroplast (cp) genomes of non-model Arabidopsis halleri ssp. gemmifera and Arabidopsis lyrata ssp. petraea using Illumina paired-end sequencing to understand their genetic organization and structure. Detailed bioinformatics analysis revealed genome sizes of both subspecies ranging between 154.4~154.5 kbp, with a large single-copy region (84,197~84,158 bp), a small single-copy region (17,738~17,813 bp) and a pair of inverted repeats (IRa/IRb; 26,264~26,259 bp). Both cp genomes encode 130 genes, including 85 protein-coding genes, eight ribosomal RNA genes and 37 transfer RNA genes. Whole cp genome comparison of A. halleri ssp. gemmifera and A. lyrata ssp. petraea, along with ten other Arabidopsis species, showed an overall high degree of sequence similarity, with divergence among some intergenic spacers. The location and distribution of repeat sequences were determined, and sequence divergences of shared genes were calculated among related species. Comparative phylogenetic analysis of the entire genomic data set and 70 shared genes between both cp genomes confirmed the previous phylogeny and generated phylogenetic trees with the same topologies. The sister species of A. halleri ssp. gemmifera is A. umezawana, whereas the closest relative of A. lyrata ssp. petraea is A. arenicola.
O'Daniel, Julianne M; McLaughlin, Heather M; Amendola, Laura M; Bale, Sherri J; Berg, Jonathan S; Bick, David; Bowling, Kevin M; Chao, Elizabeth C; Chung, Wendy K; Conlin, Laura K; Cooper, Gregory M; Das, Soma; Deignan, Joshua L; Dorschner, Michael O; Evans, James P; Ghazani, Arezou A; Goddard, Katrina A; Gornick, Michele; Farwell Hagman, Kelly D; Hambuch, Tina; Hegde, Madhuri; Hindorff, Lucia A; Holm, Ingrid A; Jarvik, Gail P; Knight Johnson, Amy; Mighion, Lindsey; Morra, Massimo; Plon, Sharon E; Punj, Sumit; Richards, C Sue; Santani, Avni; Shirts, Brian H; Spinner, Nancy B; Tang, Sha; Weck, Karen E; Wolf, Susan M; Yang, Yaping; Rehm, Heidi L
2017-05-01
While the diagnostic success of genomic sequencing expands, the complexity of this testing should not be overlooked. Numerous laboratory processes are required to support the identification, interpretation, and reporting of clinically significant variants. This study aimed to examine the workflow and reporting procedures among US laboratories to highlight shared practices and identify areas in need of standardization. Surveys and follow-up interviews were conducted with laboratories offering exome and/or genome sequencing to support a research program or for routine clinical services. The 73-item survey elicited multiple choice and free-text responses that were later clarified with phone interviews. Twenty-one laboratories participated. Practices highly concordant across all groups included consent documentation, multiperson case review, and enabling patient opt-out of incidental or secondary findings analysis. Noted divergence included use of phenotypic data to inform case analysis and interpretation and reporting of case-specific quality metrics and methods. Few laboratory policies detailed procedures for data reanalysis, data sharing, or patient access to data. This study provides an overview of practices and policies of experienced exome and genome sequencing laboratories. The results enable broader consideration of which practices are becoming standard approaches, where divergence remains, and areas of development in best practice guidelines that may be helpful. Genet Med advance online publication 03 November 2016.
Phylogeographic reconstruction of a bacterial species with high levels of lateral gene transfer
Pearson, T.; Giffard, P.; Beckstrom-Sternberg, S.; Auerbach, R.; Hornstra, H.; Tuanyok, A.; Price, E.P.; Glass, M.B.; Leadem, B.; Beckstrom-Sternberg, J. S.; Allan, G.J.; Foster, J.T.; Wagner, D.M.; Okinaka, R.T.; Sim, S.H.; Pearson, O.; Wu, Z.; Chang, J.; Kaul, R.; Hoffmaster, A.R.; Brettin, T.S.; Robison, R.A.; Mayo, M.; Gee, J.E.; Tan, P.; Currie, B.J.; Keim, P.
2009-01-01
Background: Phylogeographic reconstruction of some bacterial populations is hindered by low diversity coupled with high levels of lateral gene transfer. A comparison of recombination levels and diversity at seven housekeeping genes for eleven bacterial species, most of which are commonly cited as having high levels of lateral gene transfer shows that the relative contributions of homologous recombination versus mutation for Burkholderia pseudomallei is over two times higher than for Streptococcus pneumoniae and is thus the highest value yet reported in bacteria. Despite the potential for homologous recombination to increase diversity, B. pseudomallei exhibits a relative lack of diversity at these loci. In these situations, whole genome genotyping of orthologous shared single nucleotide polymorphism loci, discovered using next generation sequencing technologies, can provide very large data sets capable of estimating core phylogenetic relationships. We compared and searched 43 whole genome sequences of B. pseudomallei and its closest relatives for single nucleotide polymorphisms in orthologous shared regions to use in phylogenetic reconstruction. Results: Bayesian phylogenetic analyses of >14,000 single nucleotide polymorphisms yielded completely resolved trees for these 43 strains with high levels of statistical support. These results enable a better understanding of a separate analysis of population differentiation among >1,700 B. pseudomallei isolates as defined by sequence data from seven housekeeping genes. We analyzed this larger data set for population structure and allele sharing that can be attributed to lateral gene transfer. Our results suggest that despite an almost panmictic population, we can detect two distinct populations of B. pseudomallei that conform to biogeographic patterns found in many plant and animal species. That is, separation along Wallace's Line, a biogeographic boundary between Southeast Asia and Australia. 
Conclusion: We describe an Australian origin for B. pseudomallei, characterized by a single introduction event into Southeast Asia during a recent glacial period, and variable levels of lateral gene transfer within populations. These patterns provide insights into mechanisms of genetic diversification in B. pseudomallei and its closest relatives, and provide a framework for integrating the traditionally separate fields of population genetics and phylogenetics for other bacterial species with high levels of lateral gene transfer. © 2009 Pearson et al; licensee BioMed Central Ltd.
Detection of the High-Level Aminoglycoside Resistance Gene aph(2")-Ib in Enterococcus faecium
Kao, Susan J.; You, Il; Clewell, Don B.; Donabedian, Susan M.; Zervos, Marcus J.; Petrin, Joanne; Shaw, Karen J.; Chow, Joseph W.
2000-01-01
A new high-level gentamicin resistance gene, designated aph(2")-Ib, was cloned from Enterococcus faecium SF11770. The deduced amino acid sequence of the 897-bp open reading frame of aph(2")-Ib shares homology with the aminoglycoside-modifying enzymes AAC(6′)-APH(2"), APH(2")-Ic, and APH(2")-Id. The observed phosphotransferase activity is designated APH(2")-Ib. PMID:10991878
Gao, J; Naglich, J G; Laidlaw, J; Whaley, J M; Seizinger, B R; Kley, N
1995-02-15
The human von Hippel-Lindau disease (VHL) gene has recently been identified and, based on the nucleotide sequence of a partial cDNA clone, has been predicted to encode a novel protein with as yet unknown functions [F. Latif et al., Science (Washington DC), 260: 1317-1320, 1993]. The length of the encoded protein and the characteristics of the cellular expressed protein are as yet unclear. Here we report the cloning and characterization of a mouse gene (mVHLh1) that is widely expressed in different mouse tissues and shares high homology with the human VHL gene. It predicts a protein 181 residues long (and/or 162 amino acids, considering a potential alternative start codon), which across a core region of approximately 140 residues displays a high degree of sequence identity (98%) to the predicted human VHL protein. High stringency DNA and RNA hybridization experiments and protein expression analyses indicate that this gene is the most highly VHL-related mouse gene, suggesting that it represents the mouse VHL gene homologue rather than a related gene sharing a conserved functional domain. These findings provide new insights into the potential organization of the VHL gene and nature of its encoded protein.
Overvoorde, P J; Chao, W S; Grimes, H D
1997-06-20
Photoaffinity labeling of a soybean cotyledon membrane fraction identified a sucrose-binding protein (SBP). Subsequent studies have shown that the SBP is a unique plasma membrane protein that mediates the linear uptake of sucrose in the presence of up to 30 mM external sucrose when ectopically expressed in yeast. Analysis of the SBP-deduced amino acid sequence indicates it lacks sequence similarity with other known transport proteins. Data presented here, however, indicate that the SBP shares significant sequence and structural homology with the vicilin-like seed storage proteins that organize into homotrimers. These similarities include a repeated sequence that forms the basis of the reiterated domain structure characteristic of the vicilin-like protein family. In addition, analytical ultracentrifugation and nonreducing SDS-polyacrylamide gel electrophoresis demonstrate that the SBP appears to be organized into oligomeric complexes with a Mr indicative of the existence of SBP homotrimers and homodimers. The structural similarity shared by the SBP and vicilin-like proteins provides a novel framework to explore the mechanistic basis of SBP-mediated sucrose uptake. Expression of the maize Glb protein (a vicilin-like protein closely related to the SBP) in yeast demonstrates that a closely related vicilin-like protein is unable to mediate sucrose uptake. Thus, despite sequence and structural similarities shared by the SBP and the vicilin-like protein family, the SBP is functionally divergent from other members of this group.
Understanding natural language for spacecraft sequencing
NASA Technical Reports Server (NTRS)
Katz, Boris; Brooks, Robert N., Jr.
1987-01-01
The paper describes a natural language understanding system, START, that translates English text into a knowledge base. The understanding and the generating modules of START share a Grammar which is built upon reversible transformations. Users can retrieve information by querying the knowledge base in English; the system then produces an English response. START can be easily adapted to many different domains. One such domain is spacecraft sequencing. A high-level overview of sequencing as it is practiced at JPL is presented in the paper, and three areas within this activity are identified for potential application of the START system. Examples are given of an actual dialog with START based on simulated data for the Mars Observer mission.
Reed, K M; Dorschner, M O; Todd, T N; Phillips, R B
1998-09-01
Sequence variation in the control region (D-loop) of the mitochondrial DNA (mtDNA) was examined to assess the genetic distinctiveness of the shortjaw cisco (Coregonus zenithicus). Individuals from within the Great Lakes Basin as well as inland lakes outside the basin were sampled. DNA fragments containing the entire D-loop were amplified by PCR from specimens of C. zenithicus and the related species C. artedi, C. hoyi, C. kiyi, and C. clupeaformis. DNA sequence analysis revealed high similarity within and among species and shared polymorphism for length variants. Based on this analysis, the shortjaw cisco is not genetically distinct from other cisco species.
Complete genome sequences of two novel European clade bovine foamy viruses from Germany and Poland.
Hechler, Torsten; Materniak, Magdalena; Kehl, Timo; Kuzmak, Jacek; Löchelt, Martin
2012-10-01
Bovine foamy virus (BFV), or bovine spumaretrovirus, is an infectious agent of cattle with no obvious disease association but high prevalence in its host. Here, we report two complete BFV sequences, BFV-Riems, isolated in 1978 in East Germany, and BFV100, isolated in 2005 in Poland. Both new BFV isolates share the overall genetic makeup of other foamy viruses (FV). Although isolated almost 25 years apart and propagated in either bovine (BFV-Riems) or nonbovine (BFV100) cells, both viruses are highly related, forming the European BFV clade. Despite clear differences, BFV-Riems and BFV100 are still very similar to BFV isolates from China and the United States, comprising the non-European BFV clade. The genomic sequences presented here confirm the concept of high sequence conservation across most of the FV genome. Analyses of cell culture-derived genomes reveal that proviral DNA may specifically lack introns in the env-bel coding region. The spacing of the splice sites in this region suggests that BFV has developed a novel mode to express a secretory but nonfunctional Env protein.
Complete Genome Sequences of Two Novel European Clade Bovine Foamy Viruses from Germany and Poland
Hechler, Torsten; Materniak, Magdalena; Kehl, Timo; Kuzmak, Jacek
2012-01-01
Bovine foamy virus (BFV), or bovine spumaretrovirus, is an infectious agent of cattle with no obvious disease association but high prevalence in its host. Here, we report two complete BFV sequences, BFV-Riems, isolated in 1978 in East Germany, and BFV100, isolated in 2005 in Poland. Both new BFV isolates share the overall genetic makeup of other foamy viruses (FV). Although isolated almost 25 years apart and propagated in either bovine (BFV-Riems) or nonbovine (BFV100) cells, both viruses are highly related, forming the European BFV clade. Despite clear differences, BFV-Riems and BFV100 are still very similar to BFV isolates from China and the United States, comprising the non-European BFV clade. The genomic sequences presented here confirm the concept of high sequence conservation across most of the FV genome. Analyses of cell culture-derived genomes reveal that proviral DNA may specifically lack introns in the env-bel coding region. The spacing of the splice sites in this region suggests that BFV has developed a novel mode to express a secretory but nonfunctional Env protein. PMID:22966195
Wijeratne, Saranga; Fraga, Martina; Meulia, Tea; Doohan, Doug; Li, Zhaohu; Qu, Feng
2013-01-01
Dodders are among the most important parasitic plants that cause serious yield losses in crop plants. In this report, we sought to unveil the genetic basis of dodder parasitism by profiling the transcriptomes of Cuscuta pentagona and C. suaveolens, two of the most common dodder species, using a next-generation RNA sequencing platform. De novo assembly of the sequence reads resulted in more than 46,000 isotigs and contigs (collectively referred to as expressed sequence tags or ESTs) for each species, with more than half of them predicted to encode proteins that share significant sequence similarities with known proteins of non-parasitic plants. Comparing our datasets with transcriptomes of 12 other fully sequenced plant species confirmed a close evolutionary relationship between dodder and tomato. Using a rigorous set of filtering parameters, we were able to identify seven pairs of ESTs that appear to be shared exclusively by parasitic plants, thus providing targets for tailored management approaches. In addition, we also discovered ESTs with sequence similarities to known plant viruses, including cryptic viruses, in the dodder sequence assemblies. Together, this study represents the first comprehensive transcriptome profiling of parasitic plants in the Cuscuta genus, and is expected to contribute to our understanding of the molecular mechanisms of parasitic plant-host plant interactions. PMID:24312295
Jiang, Linjian; Wijeratne, Asela J; Wijeratne, Saranga; Fraga, Martina; Meulia, Tea; Doohan, Doug; Li, Zhaohu; Qu, Feng
2013-01-01
Dodders are among the most important parasitic plants that cause serious yield losses in crop plants. In this report, we sought to unveil the genetic basis of dodder parasitism by profiling the transcriptomes of Cuscuta pentagona and C. suaveolens, two of the most common dodder species, using a next-generation RNA sequencing platform. De novo assembly of the sequence reads resulted in more than 46,000 isotigs and contigs (collectively referred to as expressed sequence tags or ESTs) for each species, with more than half of them predicted to encode proteins that share significant sequence similarities with known proteins of non-parasitic plants. Comparing our datasets with transcriptomes of 12 other fully sequenced plant species confirmed a close evolutionary relationship between dodder and tomato. Using a rigorous set of filtering parameters, we were able to identify seven pairs of ESTs that appear to be shared exclusively by parasitic plants, thus providing targets for tailored management approaches. In addition, we also discovered ESTs with sequence similarities to known plant viruses, including cryptic viruses, in the dodder sequence assemblies. Together, this study represents the first comprehensive transcriptome profiling of parasitic plants in the Cuscuta genus, and is expected to contribute to our understanding of the molecular mechanisms of parasitic plant-host plant interactions.
The NCI Genomic Data Commons as an engine for precision medicine.
Jensen, Mark A; Ferretti, Vincent; Grossman, Robert L; Staudt, Louis M
2017-07-27
The National Cancer Institute Genomic Data Commons (GDC) is an information system for storing, analyzing, and sharing genomic and clinical data from patients with cancer. The recent high-throughput sequencing of cancer genomes and transcriptomes has produced a big data problem that precludes many cancer biologists and oncologists from gleaning knowledge from these data regarding the nature of malignant processes and the relationship between tumor genomic profiles and treatment response. The GDC aims to democratize access to cancer genomic data and to foster the sharing of these data to promote precision medicine approaches to the diagnosis and treatment of cancer.
Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J.; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius
2016-01-01
The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data. PMID:28785418
Connor, Thomas R; Loman, Nicholas J; Thompson, Simon; Smith, Andy; Southgate, Joel; Poplawski, Radoslaw; Bull, Matthew J; Richardson, Emily; Ismail, Matthew; Thompson, Simon Elwood-; Kitchen, Christine; Guest, Martyn; Bakke, Marius; Sheppard, Samuel K; Pallen, Mark J
2016-09-01
The increasing availability and decreasing cost of high-throughput sequencing has transformed academic medical microbiology, delivering an explosion in available genomes while also driving advances in bioinformatics. However, many microbiologists are unable to exploit the resulting large genomics datasets because they do not have access to relevant computational resources and to an appropriate bioinformatics infrastructure. Here, we present the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) facility, a shared computing infrastructure that has been designed from the ground up to provide an environment where microbiologists can share and reuse methods and data.
Identification of a novel vitivirus from grapevines in New Zealand.
Blouin, Arnaud G; Keenan, Sandi; Napier, Kathryn R; Barrero, Roberto A; MacDiarmid, Robin M
2018-01-01
We report a sequence of a novel vitivirus from Vitis vinifera obtained using two high-throughput sequencing (HTS) strategies on RNA. The initial discovery from small-RNA sequencing was confirmed by HTS of the total RNA and Sanger sequencing. The new virus has a genome structure similar to the one reported for other vitiviruses, with five open reading frames (ORFs) coding for the conserved domains described for members of that genus. Phylogenetic analysis of the complete genome sequence confirmed its affiliation to the genus Vitivirus, with the closest described viruses being grapevine virus E (GVE) and Agave tequilana leaf virus (ATLV). However, the virus we report is distinct and shares only 51% amino acid sequence identity with GVE in the replicase polyprotein and 66.8% amino acid sequence identity with ATLV in the coat protein. This is well below the threshold determined by the ICTV for species demarcation, and we propose that this virus represents a new species. It is provisionally named "grapevine virus G".
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wu, Fangzhen; Wang, Huanhuan; Raghothamachar, Balaji
A new method has been developed to determine the fault vectors associated with stacking faults in 4H-SiC from their stacking sequences observed on high resolution TEM images. This method, analogous to the Burgers circuit technique for determination of dislocation Burgers vector, involves determination of the vectors required in the projection of the perfect lattice to correct the deviated path constructed in the faulted material. Results for several different stacking faults were compared with fault vectors determined from X-ray topographic contrast analysis and were found to be consistent. This technique is expected to be applicable to all structures comprising corner-shared tetrahedra.
LLNL Genomic Assessment: Viral and Bacterial Sequencing Needs for TMTI, Task 1.4.2 Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Slezak, T; Borucki, M; Lam, M
Good progress has been made on both bacterial and viral sequencing by the TMTI centers. While access to appropriate samples is a limiting factor to throughput, excellent progress has been made with respect to getting agreements in place with key sources of relevant materials. Sharing of sequenced genomes funded by TMTI has been extremely limited to date. The April 2010 exercise should force a resolution to this, but additional managerial pressures may be needed to ensure that rapid sharing of TMTI-funded sequencing occurs, regardless of collaborator constraints concerning ultimate publication(s). Policies to permit TMTI-internal rapid sharing of sequenced genomes should be written into all TMTI agreements with collaborators now being negotiated. TMTI needs to establish a Web-based system for tracking samples destined for sequencing. This includes metadata on sample origins and contributor, information on sample shipment/receipt, prioritization by TMTI, assignment to one or more sequencing centers (including possible TMTI-sponsored sequencing at a contributor site), and status history of the sample sequencing effort. While this system could be a component of the AFRL system, it is not part of any current development effort. Policy and standardized procedures are needed to ensure appropriate verification of all TMTI samples prior to the investment in sequencing. PCR, arrays, and classical biochemical tests are examples of potential verification methods. Verification is needed to detect mislabeled, degraded, mixed or contaminated samples. Regular QC exercises are needed to ensure that the TMTI-funded centers are meeting all standards for producing quality genomic sequence data.
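The tracking fields listed in the report above (origin, contributor, shipment/receipt, prioritization, center assignment, status history, verification) could be captured in a record like the following. This is only a minimal sketch: the class name, field names, and the readiness rule are hypothetical illustrations, not part of any TMTI or AFRL system.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class SampleRecord:
    """Hypothetical tracking record; all field names are illustrative assumptions."""
    sample_id: str
    origin: str                      # metadata on where the sample came from
    contributor: str                 # who supplied the sample
    shipped_on: Optional[date] = None
    received_on: Optional[date] = None
    priority: int = 0                # higher value = sequence sooner (assumed convention)
    sequencing_centers: List[str] = field(default_factory=list)
    verified: bool = False           # e.g. passed PCR, array, or biochemical checks
    status_history: List[Tuple[date, str]] = field(default_factory=list)

    def add_status(self, when: date, status: str) -> None:
        """Append a dated entry to the status history."""
        self.status_history.append((when, status))

    def ready_for_sequencing(self) -> bool:
        """Assumed rule: received, verified, and assigned to at least one center."""
        return (
            self.verified
            and self.received_on is not None
            and bool(self.sequencing_centers)
        )


record = SampleRecord("S-001", origin="hypothetical isolate", contributor="hypothetical lab")
record.received_on = date(2010, 4, 1)
record.sequencing_centers.append("center-A")
record.add_status(date(2010, 4, 1), "received")
```

Requiring verification before a sample counts as ready mirrors the report's point that verification should precede the investment in sequencing.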
Qiao, Jianlin; Shen, Yang; Shi, Meimei; Lu, Yanrong; Cheng, Jingqiu; Chen, Younan
2014-05-01
Through binding to von Willebrand factor (VWF), platelet glycoprotein (GP) Ibα, the major ligand-binding subunit of the GPIb-IX-V complex, initiates platelet adhesion and aggregation in response to exposed VWF or elevated fluid-shear stress. There is little data regarding non-human primate platelet GPIbα. This study cloned and characterized rhesus monkey (Macaca mulatta) platelet GPIbα. DNAMAN software was used for sequence analysis and alignment. N/O-glycosylation sites and 3-D structure modelling were predicted by online OGPET v1.0, NetOGlyc 1.0 Server and SWISS-MODEL, respectively. Platelet function was evaluated by ADP- or ristocetin-induced platelet aggregation. Rhesus monkey GPIbα contains 2,268 nucleotides with an open reading frame encoding 755 amino acids. Rhesus monkey GPIbα nucleotide and protein sequences share 93.27% and 89.20% homology, respectively, with human. Sequences encoding the leucine-rich repeats of rhesus monkey GPIbα share strong similarity with human, whereas PEST sequences and N/O-glycosylated residues vary. The GPIbα-binding residues for thrombin, filamin A and 14-3-3ζ are highly conserved between rhesus monkey and human. Platelet function analysis revealed that monkey and human platelets respond similarly to ADP, but rhesus monkey platelets failed to respond to low doses of ristocetin at which human platelets achieved 76% aggregation. However, monkey platelets aggregated in response to higher ristocetin doses. Monkey GPIbα shares strong homology with human GPIbα; however, there are some differences in rhesus monkey platelet activation through GPIbα engagement, which need to be considered when using rhesus monkey platelets to investigate platelet GPIbα function. Copyright © 2014 Elsevier Ltd. All rights reserved.
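The two lengths reported above are consistent with each other if the 2,268-nucleotide figure counts the stop codon, which is an assumption here rather than something the abstract states. A small sketch of that arithmetic, with a hypothetical helper name:

```python
def orf_length_nt(num_amino_acids: int, include_stop: bool = True) -> int:
    """Nucleotide length of an ORF encoding `num_amino_acids` residues.

    Each residue takes one 3-nt codon; the stop codon adds 3 more nt when
    it is counted as part of the reported length (assumed by default).
    """
    codons = num_amino_acids + (1 if include_stop else 0)
    return 3 * codons


# 755 residues plus one stop codon: 756 codons * 3 nt
gpiba_nt = orf_length_nt(755)
```

Under that assumption, `gpiba_nt` equals the 2,268 nucleotides given in the abstract.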
Li, Hanjie; Ye, Congting; Ji, Guoli; Wu, Xiaohui; Xiang, Zhe; Li, Yuanyue; Cao, Yonghao; Liu, Xiaolong; Douek, Daniel C; Price, David A; Han, Jiahuai
2012-09-01
Overlap of TCR repertoires among individuals provides the molecular basis for public T cell responses. By deep-sequencing the TCRβ repertoires of CD4+CD8+ thymocytes from three individual mice, we observed that a substantial degree of TCRβ overlap, comprising ∼10-15% of all unique amino acid sequences and ∼5-10% of all unique nucleotide sequences across any two individuals, is already present at this early stage of T cell development. The majority of TCRβ sharing between individual thymocyte repertoires could be attributed to the process of convergent recombination, with additional contributions likely arising from recombinatorial biases; the role of selection during intrathymic development was negligible. These results indicate that the process of TCR gene recombination is the major determinant of clonotype sharing between individuals.
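The abstract above reports higher overlap at the amino acid level than at the nucleotide level, which is what convergent recombination predicts: different nucleotide sequences can encode the same amino acid sequence. The toy sketch below illustrates the effect. The sequences are invented, the codon table covers only the codons used here, and measuring overlap as shared over union (Jaccard) is an assumed definition that may differ from the one used in the study.

```python
# Partial codon table: only the codons that occur in the toy sequences below.
CODONS = {
    "TGT": "C", "TGC": "C",
    "GCT": "A", "GCC": "A",
    "AGC": "S", "TCT": "S",
    "TTT": "F", "TTC": "F",
}


def translate(nt: str) -> str:
    """Translate an in-frame nucleotide string using the partial table above."""
    return "".join(CODONS[nt[i:i + 3]] for i in range(0, len(nt), 3))


def overlap_fraction(rep_a, rep_b) -> float:
    """Shared unique sequences divided by all unique sequences (Jaccard index)."""
    a, b = set(rep_a), set(rep_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Invented repertoires for two individuals (not data from the study).
mouse1_nt = ["TGTGCTAGC", "TGTGCCTTT", "TGCGCTTCT"]
mouse2_nt = ["TGTGCTAGC", "TGCGCCTTC", "TGTGCTTTT"]

nt_overlap = overlap_fraction(mouse1_nt, mouse2_nt)
aa_overlap = overlap_fraction(
    [translate(s) for s in mouse1_nt],
    [translate(s) for s in mouse2_nt],
)
```

In this toy case only one nucleotide sequence is shared, yet both repertoires encode the same two amino acid sequences, so `aa_overlap` exceeds `nt_overlap`.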
Random Amplification and Pyrosequencing for Identification of Novel Viral Genome Sequences
Hang, Jun; Forshey, Brett M.; Kochel, Tadeusz J.; Li, Tao; Solórzano, Víctor Fiestas; Halsey, Eric S.; Kuschner, Robert A.
2012-01-01
ssRNA viruses have high levels of genomic divergence, which can lead to difficulty in genomic characterization of new viruses using traditional PCR amplification and sequencing methods. In this study, random reverse transcription, anchored random PCR amplification, and high-throughput pyrosequencing were used to identify orthobunyavirus sequences from total RNA extracted from viral cultures of acute febrile illness specimens. Draft genome sequence for the orthobunyavirus L segment was assembled and sequentially extended using de novo assembly contigs from pyrosequencing reads and orthobunyavirus sequences in GenBank as guidance. Accuracy and continuous coverage were achieved by mapping all reads to the L segment draft sequence. Subsequently, RT-PCR and Sanger sequencing were used to complete the genome sequence. The complete L segment was found to be 6936 bases in length, encoding a 2248-aa putative RNA polymerase. The identified L segment was distinct from previously published South American orthobunyaviruses, sharing 63% and 54% identity at the nucleotide and amino acid level, respectively, with the complete Oropouche virus L segment and 73% and 81% identity at the nucleotide and amino acid level, respectively, with a partial Caraparu virus L segment. The result demonstrated the effectiveness of a sequence-independent amplification and next-generation sequencing approach for obtaining complete viral genomes from total nucleic acid extracts and its use in pathogen discovery. PMID:22468136
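Identity figures such as those quoted above are computed from pairwise alignments. A minimal sketch of one common convention follows, assuming pre-aligned sequences of equal length, with columns gapped in both sequences skipped and gap-containing columns counted as mismatches. The abstract does not state which convention or software was used, so the function and its gap handling are illustrative assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of compared alignment columns with identical residues.

    Assumes both inputs come from the same pairwise alignment, so they
    have equal length. Columns gapped in both sequences are ignored;
    columns gapped in only one sequence count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned nucleotide sequences (invented): 6 of 8 columns match.
identity = percent_identity("ACGTACGT", "ACGTTCGA")
```

The same function works for amino acid alignments, since it only compares characters column by column.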
Maximum-likelihood estimation of recent shared ancestry (ERSA).
Huff, Chad D; Witherspoon, David J; Simonson, Tatum S; Xing, Jinchuan; Watkins, W Scott; Zhang, Yuhua; Tuohy, Therese M; Neklason, Deborah W; Burt, Randall W; Guthery, Stephen L; Woodward, Scott R; Jorde, Lynn B
2011-05-01
Accurate estimation of recent shared ancestry is important for genetics, evolution, medicine, conservation biology, and forensics. Established methods estimate kinship accurately for first-degree through third-degree relatives. We demonstrate that chromosomal segments shared by two individuals due to identity by descent (IBD) provide much additional information about shared ancestry. We developed a maximum-likelihood method for the estimation of recent shared ancestry (ERSA) from the number and lengths of IBD segments derived from high-density SNP or whole-genome sequence data. We used ERSA to estimate relationships from SNP genotypes in 169 individuals from three large, well-defined human pedigrees. ERSA is accurate to within one degree of relationship for 97% of first-degree through fifth-degree relatives and 80% of sixth-degree and seventh-degree relatives. We demonstrate that ERSA's statistical power approaches the maximum theoretical limit imposed by the fact that distant relatives frequently share no DNA through a common ancestor. ERSA greatly expands the range of relationships that can be estimated from genetic data and is implemented in a freely available software package.
Phylogenetic relationships among superfamilies of Neritimorpha (Mollusca: Gastropoda).
Uribe, Juan E; Colgan, Don; Castro, Lyda R; Kano, Yasunori; Zardoya, Rafael
2016-11-01
Despite the extraordinary morphological and ecological diversity of Neritimorpha, few studies have focused on the phylogenetic relationships of this lineage of gastropods, which includes four extant superfamilies: Neritopsoidea, Hydrocenoidea, Helicinoidea, and Neritoidea. Here, the nucleotide sequences of the complete mitochondrial genomes of Georissa bangueyensis (Hydrocenoidea), Neritina usnea (Neritoidea), and Pleuropoma jana (Helicinoidea) and the nearly complete mt genomes of Titiscania sp. (Neritopsoidea) and Theodoxus fluviatilis (Neritoidea) were determined. Phylogenetic reconstructions using probabilistic methods were based on mitochondrial (13 protein coding genes and two ribosomal rRNA genes), nuclear (partial 28S rRNA, 18S rRNA, actin, and histone H3 genes) and combined sequence data sets. All phylogenetic analyses except one converged on a single, highly supported tree in which Neritopsoidea was recovered as the sister group of a clade including Helicinoidea as the sister group of Hydrocenoidea and Neritoidea. This topology agrees with the fossil record and supports at least three independent invasions of land by neritimorph snails. The mitochondrial genomes of Titiscania sp., G. bangueyensis, N. usnea, and T. fluviatilis share the same gene organization previously described for Nerita mt genomes whereas that of P. jana has undergone major rearrangements. We sequenced about half of the mitochondrial genome of another species of Helicinoidea, Viana regina, and confirmed that this species shares the highly derived gene order of P. jana. Copyright © 2016 Elsevier Inc. All rights reserved.
Genes involved in convergent evolution of eusociality in bees
Woodard, S. Hollis; Fischman, Brielle J.; Venkat, Aarti; Hudson, Matt E.; Varala, Kranthi; Cameron, Sydney A.; Clark, Andrew G.; Robinson, Gene E.
2011-01-01
Eusociality has arisen independently at least 11 times in insects. Despite this convergence, there are striking differences among eusocial lifestyles, ranging from species living in small colonies with overt conflict over reproduction to species in which colonies contain hundreds of thousands of highly specialized sterile workers produced by one or a few queens. Although the evolution of eusociality has been intensively studied, the genetic changes involved in the evolution of eusociality are relatively unknown. We examined patterns of molecular evolution across three independent origins of eusociality by sequencing transcriptomes of nine socially diverse bee species and combining these data with genome sequence from the honey bee Apis mellifera to generate orthologous sequence alignments for 3,647 genes. We found a shared set of 212 genes with a molecular signature of accelerated evolution across all eusocial lineages studied, as well as unique sets of 173 and 218 genes with a signature of accelerated evolution specific to either highly or primitively eusocial lineages, respectively. These results demonstrate that convergent evolution can involve a mosaic pattern of molecular changes in both shared and lineage-specific sets of genes. Genes involved in signal transduction, gland development, and carbohydrate metabolism are among the most prominent rapidly evolving genes in eusocial lineages. These findings provide a starting point for linking specific genetic changes to the evolution of eusociality. PMID:21482769
Zhu, Ruo-Lin; Zhang, Qi-Ya
2014-04-01
Paralichthys olivaceus rhabdovirus (PORV), which is associated with high mortality rates in flounder, was isolated in China in 2005. Here, we provide an annotated sequence record of PORV, the genome of which comprises 11,182 nucleotides and contains six genes in the order 3'-N-P-M-G-NV-L-5'. Phylogenetic analysis based on glycoprotein sequences of PORV and other rhabdoviruses showed that PORV clusters with viral haemorrhagic septicemia virus (VHSV), genus Novirhabdovirus, family Rhabdoviridae. Further phylogenetic analysis of the combined amino acid sequences of six proteins of PORV and VHSV strains showed that PORV clusters with Korean strains and is closely related to Asian strains, all of which were isolated from flounder. In a comparison in which the sequences of the six proteins were combined, PORV shared the highest identity (98.3 %) with VHSV strain KJ2008 from Korea.
Silencing Effect of Hominoid Highly Conserved Noncoding Sequences on Embryonic Brain Development
Mahmoudi Saber, Morteza
2017-01-01
Superfamily Hominoidea, which consists of Hominidae (humans and great apes) and Hylobatidae (gibbons), is well known for sharing human-like characteristics; however, the genomic origins of these shared unique phenotypes have mainly remained elusive. To decipher the underlying genomic basis of Hominoidea-restricted phenotypes, we identified and characterized Hominoidea-restricted highly conserved noncoding sequences (HCNSs) that are a class of potential regulatory elements which may be involved in evolution of lineage-specific phenotypes. We discovered 679 such HCNSs from human, chimpanzee, gorilla, orangutan and gibbon genomes. These HCNSs were demonstrated to be under purifying selection but with lineage-restricted characteristics different from old CNSs. A significant proportion of their ancestral sequences had accelerated rates of nucleotide substitutions, insertions and deletions during the evolution of the common ancestor of Hominoidea, suggesting the intervention of positive Darwinian selection for creating those HCNSs. In contrast to enhancer elements and similar to silencer sequences, these Hominoidea-restricted HCNSs are located in close proximity to transcription start sites. Their target genes are enriched in the nervous system, development and transcription, and they tend to be remotely located from the nearest coding gene. ChIP-seq signals and gene expression patterns suggest that Hominoidea-restricted HCNSs are likely to be functional regulatory elements by imposing silencing effects on their target genes in a tissue-restricted manner during fetal brain development. These HCNSs, which emerged through adaptive evolution and were conserved through purifying selection, represent a set of promising targets for future functional studies of the evolution of Hominoidea-restricted phenotypes. PMID:28633494
Hochreiter, Sepp
2013-01-01
Identity by descent (IBD) can be reliably detected for long shared DNA segments, which are found in related individuals. However, many studies contain cohorts of unrelated individuals that share only short IBD segments. New sequencing technologies facilitate identification of short IBD segments through rare variants, which convey more information on IBD than common variants. Current IBD detection methods, however, are not designed to use rare variants for the detection of short IBD segments. Short IBD segments reveal genetic structures at high resolution. Therefore, they can help to improve imputation and phasing, to increase genotyping accuracy for low-coverage sequencing and to increase the power of association studies. Since short IBD segments are further assumed to be old, they can shed light on the evolutionary history of humans. We propose HapFABIA, a computational method that applies biclustering to identify very short IBD segments characterized by rare variants. HapFABIA is designed to detect short IBD segments in genotype data that were obtained from next-generation sequencing, but can also be applied to DNA microarray data. Especially in next-generation sequencing data, HapFABIA exploits rare variants for IBD detection. HapFABIA significantly outperformed competing algorithms at detecting short IBD segments on artificial and simulated data with rare variants. HapFABIA identified 160 588 different short IBD segments characterized by rare variants with a median length of 23 kb (mean 24 kb) in data for chromosome 1 of the 1000 Genomes Project. These short IBD segments contain 752 000 single nucleotide variants (SNVs), which account for 39% of the rare variants and 23.5% of all variants. The vast majority—152 000 IBD segments—are shared by Africans, while only 19 000 and 11 000 are shared by Europeans and Asians, respectively. 
IBD segments that match the Denisova or the Neandertal genome are found significantly more often in Asians and Europeans but also, in some cases exclusively, in Africans. The lengths of IBD segments and their sharing between continental populations indicate that many short IBD segments from chromosome 1 existed before humans migrated out of Africa. Thus, rare variants that tag these short IBD segments predate human migration from Africa. The software package HapFABIA is available from Bioconductor. All data sets, result files and programs for data simulation, preprocessing and evaluation are supplied at http://www.bioinf.jku.at/research/short-IBD. PMID:24174545
Shared prefetching to reduce execution skew in multi-threaded systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eichenberger, Alexandre E; Gunnels, John A
Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated based on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.
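The central idea above, that each thread receives a separate sub-portion of one shared set of prefetch instructions, can be modelled as a partition of a list. The sketch below uses round-robin assignment, which is an assumption made for illustration: the abstract does not say how the set is divided, and the function name and the string stand-ins for instructions are hypothetical.

```python
from typing import List, Sequence


def distribute_prefetches(prefetches: Sequence[str], num_threads: int) -> List[List[str]]:
    """Split one shared prefetch stream into disjoint per-thread sub-portions.

    Round-robin assignment is assumed: thread t takes items t, t + n,
    t + 2n, ... so every prefetch is issued exactly once across all threads.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    return [list(prefetches[t::num_threads]) for t in range(num_threads)]


# Strings stand in for prefetch instructions on a shared memory stream.
stream = [f"prefetch(block_{i})" for i in range(8)]
per_thread = distribute_prefetches(stream, 4)
```

Because the sub-portions are disjoint and together cover the whole stream, no thread repeats another's prefetch, while data fetched by any one thread lands in memory that all threads share.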
Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community.
Krampis, Konstantinos; Booth, Tim; Chapman, Brad; Tiwari, Bela; Bicak, Mesude; Field, Dawn; Nelson, Karen E
2012-03-19
A steep drop in the cost of next-generation sequencing during recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis still poses a resource bottleneck for smaller laboratories and institutes that do not have access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, scientific value cannot be obtained from acquiring a sequencer unless it is accompanied by an equal investment in informatics infrastructure. Cloud BioLinux is a publicly accessible Virtual Machine (VM) that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Users have instant access to a range of pre-configured command line and graphical software applications, including a full-featured desktop interface, documentation and over 135 bioinformatics packages for applications including sequence alignment, clustering, assembly, display, editing, and phylogeny. Each tool's functionality is fully described in the documentation directly accessible from the graphical interface of the VM. Besides the Amazon EC2 cloud, we have started instances of Cloud BioLinux on a private Eucalyptus cloud installed at the J. Craig Venter Institute, and demonstrated access to the bioinformatic tools interface through a remote connection to EC2 instances from a local desktop computer. Documentation for using Cloud BioLinux on EC2 is available from our project website, while a Eucalyptus cloud image and VirtualBox Appliance is also publicly available for download and use by researchers with access to private clouds. Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud. 
An automated and configurable process builds Virtual Machines, allowing the development of highly customized versions from a shared code base. This shared community toolkit enables application specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them.
Cloud BioLinux: pre-configured and on-demand bioinformatics computing for the genomics community
2012-01-01
Background: A steep drop in the cost of next-generation sequencing during recent years has made the technology affordable to the majority of researchers, but downstream bioinformatic analysis still poses a resource bottleneck for smaller laboratories and institutes that do not have access to substantial computational resources. Sequencing instruments are typically bundled with only the minimal processing and storage capacity required for data capture during sequencing runs. Given the scale of sequence datasets, scientific value cannot be obtained from acquiring a sequencer unless it is accompanied by an equal investment in informatics infrastructure. Results: Cloud BioLinux is a publicly accessible Virtual Machine (VM) that enables scientists to quickly provision on-demand infrastructures for high-performance bioinformatics computing using cloud platforms. Users have instant access to a range of pre-configured command line and graphical software applications, including a full-featured desktop interface, documentation and over 135 bioinformatics packages for applications including sequence alignment, clustering, assembly, display, editing, and phylogeny. Each tool's functionality is fully described in the documentation directly accessible from the graphical interface of the VM. Besides the Amazon EC2 cloud, we have started instances of Cloud BioLinux on a private Eucalyptus cloud installed at the J. Craig Venter Institute, and demonstrated access to the bioinformatic tools interface through a remote connection to EC2 instances from a local desktop computer. Documentation for using Cloud BioLinux on EC2 is available from our project website, while a Eucalyptus cloud image and VirtualBox Appliance is also publicly available for download and use by researchers with access to private clouds. Conclusions: Cloud BioLinux provides a platform for developing bioinformatics infrastructures on the cloud. 
An automated and configurable process builds Virtual Machines, allowing the development of highly customized versions from a shared code base. This shared community toolkit enables application specific analysis platforms on the cloud by minimizing the effort required to prepare and maintain them. PMID:22429538
Awadi, Asma; Suchentrunk, Franz; Makni, Mohamed; Ben Slimen, Hichem
2016-10-01
North African hares are currently included in cape hares, Lepus capensis sensu lato, a taxon that may be considered a superspecies or a complex of closely related species. The existing molecular data, however, are not unequivocal, with mtDNA control region sequences suggesting a separate species status and nuclear loci (allozymes, microsatellites) revealing conspecificity of L. capensis and L. europaeus. Here, we study sequence variation in the intron 6 (468 bp) of the transferrin nuclear gene, of 105 hares with different coat colour from different regions in Tunisia with respect to genetic diversity and differentiation, as well as their phylogenetic status. Forty-six haplotypes (alleles) were revealed and compared phylogenetically to all available TF haplotypes of various Lepus species retrieved from GenBank. Maximum Likelihood, neighbor joining and median joining network analyses concordantly grouped all currently obtained haplotypes together with haplotypes belonging to six different Chinese hare species and the African scrub hare L. saxatilis. Moreover, two Tunisian haplotypes were shared with L. capensis, L. timidus, L. sinensis, L. yarkandensis, and L. hainanus from China. These results indicated the evolutionary complexity of the genus Lepus with the mixing of nuclear gene haplotypes resulting from introgressive hybridization and/or shared ancestral polymorphism. We report the presence of shared ancestral polymorphism between North African and Chinese hares. This has not been detected earlier in the mtDNA sequences of the same individuals. Genetic diversity of the TF sequences from the Tunisian populations was relatively high compared to other hare populations. However, genetic differentiation and gene flow analyses (AMOVA, FST, Nm) indicated little divergence with the absence of geographically meaningful phylogroups and lack of clustering with coat colour types. 
These results confirm the presence of a single hare species in Tunisia, but a sound inference on its phylogenetic position would require additional nuclear markers and numerous geographically meaningful samples from Africa and Eurasia.
Johnson, Timothy J; Kariyawasam, Subhashinie; Wannemuehler, Yvonne; Mangiamele, Paul; Johnson, Sara J; Doetkott, Curt; Skyberg, Jerod A; Lynne, Aaron M; Johnson, James R; Nolan, Lisa K
2007-04-01
Escherichia coli strains that cause disease outside the intestine are known as extraintestinal pathogenic E. coli (ExPEC) and include human uropathogenic E. coli (UPEC) and avian pathogenic E. coli (APEC). Regardless of host of origin, ExPEC strains share many traits. It has been suggested that these commonalities may enable APEC to cause disease in humans. Here, we begin to test the hypothesis that certain APEC strains possess potential to cause human urinary tract infection through virulence genotyping of 1,000 APEC and UPEC strains, generation of the first complete genomic sequence of an APEC (APEC O1:K1:H7) strain, and comparison of this genome to all available human ExPEC genomic sequences. The genomes of APEC O1 and three human UPEC strains were found to be remarkably similar, with only 4.5% of APEC O1's genome not found in other sequenced ExPEC genomes. Also, use of multilocus sequence typing showed that some of the sequenced human ExPEC strains were more like APEC O1 than other human ExPEC strains. This work provides evidence that at least some human and avian ExPEC strains are highly similar to one another, and it supports the possibility that a food-borne link between some APEC and UPEC strains exists. Future studies are necessary to assess the ability of APEC to overcome the hurdles necessary for such a food-borne transmission, and epidemiological studies are required to confirm that such a phenomenon actually occurs.
Chen, Tsute; Siddiqui, Huma; Olsen, Ingar
2017-01-01
Currently, genome sequences of a total of 19 Porphyromonas gingivalis strains are available, including eight completed genomes (strains W83, ATCC 33277, TDC60, HG66, A7436, AJW4, 381, and A7A1-28) and 11 high-coverage draft sequences (JCVI SC001, F0185, F0566, F0568, F0569, F0570, SJD2, W4087, W50, Ando, and MP4-504) that are assembled into fewer than 300 contigs. The objective was to compare these genomes at both nucleotide and protein sequence levels in order to understand their phylogenetic and functional relatedness. Four copies of 16S rRNA gene sequences were identified in each of the eight complete genomes and one in the other 11 unfinished genomes. These 43 16S rRNA sequences represent only 24 unique sequences and the derived phylogenetic tree suggests a possible evolutionary history for these strains. Phylogenomic comparison based on shared proteins and whole genome nucleotide sequences consistently showed two groups with closely related members: one consisted of ATCC 33277, 381, and HG66, another of W83, W50, and A7436. At least 1,037 core/shared proteins were identified in the 19 P. gingivalis genomes based on the most stringent detecting parameters. Comparative functional genomics based on genome-wide comparisons between NCBI and RAST annotations, as well as additional approaches, revealed functions that are unique or missing in individual P. gingivalis strains, or species-specific in all P. gingivalis strains, when compared to a neighboring species P. asaccharolytica. All the comparative results of this study are available online for download at ftp://www.homd.org/publication_data/20160425/. PMID:28261563
Nullomers and High Order Nullomers in Genomic Sequences
Vergni, Davide; Santoni, Daniele
2016-01-01
A nullomer is an oligomer that does not occur as a subsequence in a given DNA sequence, i.e. it is an absent word of that sequence. The importance of nullomers in several applications, from drug discovery to forensic practice, is now debated in the literature. Here, we investigated the nature of nullomers: whether their absence in genomes has just a statistical explanation or is a peculiar feature of genomic sequences. We introduced an extension of the notion of nullomer, namely high order nullomers, which are nullomers whose mutated sequences are still nullomers. We studied different aspects of them: comparison with nullomers of random sequences, CpG distribution and mean helical rise. In agreement with previous results, we found that the number of nullomers in the human genome is much larger than expected by chance. Nevertheless, antithetical results were found when considering a random DNA sequence preserving dinucleotide frequencies. The analysis of CpG frequencies in nullomers and high order nullomers revealed, as expected, a high CpG content, but it also highlighted a strong dependence of CpG frequencies on the dinucleotide position, suggesting that nullomers have their own peculiar structure and are not simply sequences whose CpG frequency is biased. Furthermore, phylogenetic trees were built on eleven species based on both the similarities between the dinucleotide frequencies and the number of nullomers two species share, showing that nullomers are fairly conserved among close species. Finally, the study of the mean helical rise of nullomer sequences revealed significantly high mean rise values, reinforcing the hypothesis that those sequences have some peculiar structural features. The obtained results show that nullomers are the consequence of the peculiar structure of DNA (also including biased CpG frequency and CpG islands), so that the hypermutability model, even when taking CpG islands into account, seems insufficient to explain the nullomer phenomenon. 
Finally, high order nullomers could emphasize those features that already make simple nullomers useful in several applications. PMID:27906971
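The two definitions in the abstract above (a nullomer as an absent k-mer, and a high order nullomer as a nullomer whose mutated forms are also nullomers) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names are hypothetical, only the given strand is scanned (no reverse complement), and "mutated sequences" is assumed here to mean all single-base substitutions, which the abstract does not specify.

```python
from itertools import product

ALPHABET = "ACGT"

def find_nullomers(sequence, k):
    """Return the sorted k-mers over ACGT that never occur in `sequence`."""
    seq = sequence.upper()
    # Every k-mer that actually appears in the sequence (given strand only).
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    all_kmers = ("".join(p) for p in product(ALPHABET, repeat=k))
    return sorted(kmer for kmer in all_kmers if kmer not in present)

def single_mutants(kmer):
    """Yield every k-mer differing from `kmer` at exactly one position."""
    for i, base in enumerate(kmer):
        for alt in ALPHABET:
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]

def high_order_nullomers(sequence, k):
    """Nullomers whose single-substitution mutants are all nullomers too.

    Assumption: 'mutated' means one substitution; other mutation models
    would need a different `single_mutants`.
    """
    nullomers = set(find_nullomers(sequence, k))
    return sorted(n for n in nullomers
                  if all(m in nullomers for m in single_mutants(n)))
```

Enumerating all 4^k candidate k-mers is only practical for small k; it is used here because it keeps the definition visible rather than for efficiency.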
Spiroplasma species share common DNA sequences among their viruses, plasmids and genomes.
Ranhand, J M; Nur, I; Rose, D L; Tully, J G
1987-01-01
Alkaline-Southern-blot analyses showed that a spiroplasma plasmid, pRA1, obtained from Spiroplasma citri (Maroc-R8A2), contained DNA sequences that were homologous to spiroplasma type 3 viruses (SV3) obtained from S. citri (Maroc-R8A2), S. citri (608) and S. mirum (SMCA). In addition, pRA1 and SV3(608) DNA shared common, but not necessarily related, sequences with extrachromosomal DNA derived from 11 Spiroplasma species or strains. Furthermore, SV3(608) had DNA homology with the chromosome from 6 distinct spiroplasmas but not with chromosomal DNA from eight other Spiroplasma species or strains. The biological function of these common sequences is unknown.
Complete genome analysis of jasmine virus T from Jasminum sambac in China.
Tang, Yajun; Gao, Fangluan; Yang, Zhen; Wu, Zujian; Yang, Liang
2016-07-01
The genome of a potyvirus (isolate JaVT_FZ) recovered from jasmine (Jasminum sambac L.) showing yellow ringspot symptoms in Fuzhou, China, was sequenced. JaVT_FZ is closely related to seven other potyviruses with completely sequenced genomes, with which it shares 66-70 % nucleotide and 52-56 % amino acid sequence identity. However, the coat protein (CP) gene shares 82-92 % nucleotide and 90-97 % amino acid sequence identity with those of two partially sequenced potyviruses, named jasmine potyvirus T (JaVT-jasmine) and jasmine yellow mosaic potyvirus (JaYMV-India), respectively. This suggests that JaVT_FZ, JaVT-jasmine and JaYMV-India should be regarded as members of a single potyvirus species, for which the name "Jasmine virus T" has priority.
An oleate 12-hydroxylase from Ricinus communis L. is a fatty acyl desaturase homolog
DOE Office of Scientific and Technical Information (OSTI.GOV)
Van De Loo, F.J.; Broun, P.; Turner, S.
1995-07-18
Recent spectroscopic evidence implicating a binuclear iron site at the reaction center of fatty acyl desaturases suggested to us that certain fatty acyl hydroxylases may share significant amino acid sequence similarity with desaturases. To test this theory, we prepared a cDNA library from developing endosperm of the castor-oil plant (Ricinus communis L.) and obtained partial nucleotide sequences for 468 anonymous clones that were not expressed at high levels in leaves, a tissue deficient in 12-hydroxyoleic acid. This resulted in the identification of several cDNA clones encoding a polypeptide of 387 amino acids with a predicted molecular weight of 44,407 and with ~67% sequence homology to microsomal oleate desaturase from Arabidopsis. Expression of a full-length clone under control of the cauliflower mosaic virus 35S promoter in transgenic tobacco resulted in the accumulation of low levels of 12-hydroxyoleic acid in seeds, indicating that the clone encodes the castor oleate hydroxylase. These results suggest that fatty acyl desaturases and hydroxylases share similar reaction mechanisms and provide an example of enzyme evolution. 26 refs., 6 figs., 1 tab.
MPD: a pathogen genome and metagenome database
Zhang, Tingting; Miao, Jiaojiao; Han, Na; Qiang, Yujun; Zhang, Wen
2018-01-01
Advances in high-throughput sequencing have led to unprecedented growth in the amount of available genome sequencing data, especially for bacterial genomes, which has been accompanied by a challenge for the storage and management of such huge datasets. To facilitate bacterial research and related studies, we have developed the Mypathogen database (MPD), which provides access to users for searching, downloading, storing and sharing bacterial genomics data. The MPD represents the first pathogenic database for microbial genomes and metagenomes, and currently covers pathogenic microbial genomes (6604 genera, 11 071 species, 41 906 strains) and metagenomic data from host, air, water and other sources (28 816 samples). The MPD also functions as a management system for statistical and storage data that can be used by different organizations, thereby facilitating data sharing among different organizations and research groups. A user-friendly local client tool is provided to maintain the steady transmission of big sequencing data. The MPD is a useful tool for analysis and management in genomic research, especially for clinical Centers for Disease Control and epidemiological studies, and is expected to contribute to advancing knowledge on pathogenic bacteria genomes and metagenomes. Database URL: http://data.mypathogen.org PMID:29917040
The future scalability of pH-based genome sequencers: A theoretical perspective
NASA Astrophysics Data System (ADS)
Go, Jonghyun; Alam, Muhammad A.
2013-10-01
Sequencing of the human genome is an essential prerequisite for personalized medicine and early prognosis of various genetic diseases. The state-of-the-art, high-throughput genome sequencing technologies provide improved sequencing; however, their reliance on relatively expensive optical detection schemes has prevented widespread adoption of the technology in routine care. In contrast, the recently announced pH-based electronic genome sequencers achieve fast sequencing at low cost because of their compatibility with current microelectronics technology. While the progress in technology development has been rapid, the physics of the sequencing chips and the potential for future scaling (and therefore, cost reduction) remain unexplored. In this article, we develop a theoretical framework and a scaling theory to explain the principle of operation of the pH-based sequencing chips and use the framework to explore various perceived scaling limits of the technology related to signal-to-noise ratio, well-to-well crosstalk, and sequencing accuracy. We also address several limitations inherent to the key steps of pH-based genome sequencers, which are widely shared by many other sequencing platforms on the market but have not been properly explained so far.
When data sharing gets close to 100%: what human paleogenetics can teach the open science movement.
Anagnostou, Paolo; Capocasa, Marco; Milia, Nicola; Sanna, Emanuele; Battaggia, Cinzia; Luzi, Daniela; Destro Bisol, Giovanni
2015-01-01
This study analyzes data sharing regarding mitochondrial, Y chromosomal and autosomal polymorphisms in a total of 162 papers on ancient human DNA published between 1988 and 2013. The estimated sharing rate was not far from totality (97.6% ± 2.1%) and substantially higher than observed in other fields of genetic research (evolutionary, medical and forensic genetics). Both a questionnaire-based survey and the examination of journals' editorial policies suggest that this high sharing rate cannot be simply explained by the need to comply with stakeholders' requests. Most data were made available through body text, but the use of primary databases increased in coincidence with the introduction of complete mitochondrial and next-generation sequencing methods. Our study highlights three important aspects. First, our results imply that researchers' awareness of the importance of openness and transparency for scientific progress may complement stakeholders' policies in achieving very high sharing rates. Second, widespread data sharing does not necessarily coincide with a prevalent use of practices which maximize data findability, accessibility, usability and preservation. A detailed look at the different ways in which data are released can be very useful to detect failures to adopt the best sharing modalities and understand how to correct them. Third and finally, the case of human paleogenetics tells us that a widespread awareness of the importance of Open Science may be important to build reliable scientific practices even in the presence of complex experimental challenges.
IRiS: construction of ARG networks at genomic scales.
Javed, Asif; Pybus, Marc; Melé, Marta; Utro, Filippo; Bertranpetit, Jaume; Calafell, Francesc; Parida, Laxmi
2011-09-01
Given a set of extant haplotypes, IRiS first detects high-confidence recombination events in their shared genealogy. Next, using the local sequence topology defined by each detected event, it integrates these recombinations into an ancestral recombination graph. While the current system has been calibrated for human population data, it is easily extendible to other species as well. IRiS (Identification of Recombinations in Sequences) binary files are available for non-commercial use in both Linux and Microsoft Windows, 32 and 64 bit environments from https://researcher.ibm.com/researcher/view_project.php?id=2303. Contact: parida@us.ibm.com.
Li, Yongqiang; Deng, Congliang; Bian, Yong; Zhao, Xiaoli; Zhou, Qi
2017-04-01
Apple stem grooving virus (ASGV), apple chlorotic leaf spot virus (ACLSV), and prunus necrotic ringspot virus (PNRSV) were identified in a crab apple tree by small RNA deep sequencing. The complete genome sequence of ACLSV isolate BJ (ACLSV-BJ) was 7554 nucleotides and shared 67.0%-83.0% nucleotide sequence identity with other ACLSV isolates. A phylogenetic tree based on the complete genome sequence of all available ACLSV isolates showed that ACLSV-BJ clustered with the isolates SY01 from hawthorn, MO5 from apple, and JB, KMS and YH from pear. The complete nucleotide sequence of ASGV-BJ was 6509 nucleotides (nt) long and shared 78.2%-80.7% nucleotide sequence identity with other isolates. ASGV-BJ and the isolate ASGV_kfp clustered together in the phylogenetic tree as an independent clade. Recombination analysis showed that isolate ASGV-BJ was a naturally occurring recombinant.
Reed, Kent M.; Dorschner, Michael O.; Todd, Thomas N.; Phillips, Ruth B.
1998-01-01
Sequence variation in the control region (D-loop) of the mitochondrial DNA (mtDNA) was examined to assess the genetic distinctiveness of the shortjaw cisco (Coregonus zenithicus). Individuals from within the Great Lakes Basin as well as inland lakes outside the basin were sampled. DNA fragments containing the entire D-loop were amplified by PCR from specimens of C. zenithicus and the related species C. artedi, C. hoyi, C. kiyi, and C. clupeaformis. DNA sequence analysis revealed high similarity within and among species and shared polymorphism for length variants. Based on this analysis, the shortjaw cisco is not genetically distinct from other cisco species.
Demographic history and rare allele sharing among human populations.
Gravel, Simon; Henn, Brenna M; Gutenkunst, Ryan N; Indap, Amit R; Marth, Gabor T; Clark, Andrew G; Yu, Fuli; Gibbs, Richard A; Bustamante, Carlos D
2011-07-19
High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2-4× coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ∼1,000 sequenced chromosomes per population, whereas ∼2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence.
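The site frequency spectrum mentioned in the abstract above is a count, for each i, of the variable sites at which the derived allele is seen on exactly i of the sampled chromosomes. The Python sketch below only illustrates that definition. It is not the authors' pipeline; the function name and the 0/1 haplotype encoding are assumptions for illustration, and it ignores missing data, low-coverage genotype uncertainty, and ancestral-state misidentification.

```python
from collections import Counter

def site_frequency_spectrum(haplotypes):
    """Unfolded site frequency spectrum of a sample.

    haplotypes: list of equal-length lists of 0/1, one per sampled
    chromosome, where 1 marks the derived allele (assumed encoding).
    Returns a list `sfs` where sfs[i - 1] is the number of sites at which
    the derived allele appears on exactly i chromosomes, for i = 1..n-1.
    Sites with 0 or n derived copies are not segregating and are skipped.
    """
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    # Derived-allele count at each site, tallied across chromosomes.
    counts = Counter(sum(h[s] for h in haplotypes) for s in range(num_sites))
    return [counts.get(i, 0) for i in range(1, n)]

# Toy sample: 3 chromosomes, 4 sites.
sample = [
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
]
spectrum = site_frequency_spectrum(sample)
```

In this toy sample two sites carry the derived allele on a single chromosome, one site is fixed and one is invariant, so the spectrum has all of its weight in the singleton class, the pattern the abstract describes when it notes that most variable sites are rare.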
Demographic history and rare allele sharing among human populations
Gravel, Simon; Henn, Brenna M.; Gutenkunst, Ryan N.; Indap, Amit R.; Marth, Gabor T.; Clark, Andrew G.; Yu, Fuli; Gibbs, Richard A.; Bustamante, Carlos D.; Altshuler, David L.; Durbin, Richard M.; Abecasis, Gonçalo R.; Bentley, David R.; Chakravarti, Aravinda; Clark, Andrew G.; Collins, Francis S.; De La Vega, Francisco M.; Donnelly, Peter; Egholm, Michael; Flicek, Paul; Gabriel, Stacey B.; Gibbs, Richard A.; Knoppers, Bartha M.; Lander, Eric S.; Lehrach, Hans; Mardis, Elaine R.; McVean, Gil A.; Nickerson, Debbie A.; Peltonen, Leena; Schafer, Alan J.; Sherry, Stephen T.; Wang, Jun; Wilson, Richard K.; Gibbs, Richard A.; Deiros, David; Metzker, Mike; Muzny, Donna; Reid, Jeff; Wheeler, David; Wang, Jun; Li, Jingxiang; Jian, Min; Li, Guoqing; Li, Ruiqiang; Liang, Huiqing; Tian, Geng; Wang, Bo; Wang, Jian; Wang, Wei; Yang, Huanming; Zhang, Xiuqing; Zheng, Huisong; Lander, Eric S.; Altshuler, David L.; Ambrogio, Lauren; Bloom, Toby; Cibulskis, Kristian; Fennell, Tim J.; Gabriel, Stacey B.; Jaffe, David B.; Shefler, Erica; Sougnez, Carrie L.; Bentley, David R.; Gormley, Niall; Humphray, Sean; Kingsbury, Zoya; Koko-Gonzales, Paula; Stone, Jennifer; McKernan, Kevin J.; Costa, Gina L.; Ichikawa, Jeffry K.; Lee, Clarence C.; Sudbrak, Ralf; Lehrach, Hans; Borodina, Tatiana A.; Dahl, Andreas; Davydov, Alexey N.; Marquardt, Peter; Mertes, Florian; Nietfeld, Wilfiried; Rosenstiel, Philip; Schreiber, Stefan; Soldatov, Aleksey V.; Timmermann, Bernd; Tolzmann, Marius; Egholm, Michael; Affourtit, Jason; Ashworth, Dana; Attiya, Said; Bachorski, Melissa; Buglione, Eli; Burke, Adam; Caprio, Amanda; Celone, Christopher; Clark, Shauna; Conners, David; Desany, Brian; Gu, Lisa; Guccione, Lorri; Kao, Kalvin; Kebbel, Andrew; Knowlton, Jennifer; Labrecque, Matthew; McDade, Louise; Mealmaker, Craig; Minderman, Melissa; Nawrocki, Anne; Niazi, Faheem; Pareja, Kristen; Ramenani, Ravi; Riches, David; Song, Wanmin; Turcotte, Cynthia; Wang, Shally; Mardis, Elaine R.; Wilson, Richard K.; Dooling, 
David; Fulton, Lucinda; Fulton, Robert; Weinstock, George; Durbin, Richard M.; Burton, John; Carter, David M.; Churcher, Carol; Coffey, Alison; Cox, Anthony; Palotie, Aarno; Quail, Michael; Skelly, Tom; Stalker, James; Swerdlow, Harold P.; Turner, Daniel; De Witte, Anniek; Giles, Shane; Gibbs, Richard A.; Wheeler, David; Bainbridge, Matthew; Challis, Danny; Sabo, Aniko; Yu, Fuli; Yu, Jin; Wang, Jun; Fang, Xiaodong; Guo, Xiaosen; Li, Ruiqiang; Li, Yingrui; Luo, Ruibang; Tai, Shuaishuai; Wu, Honglong; Zheng, Hancheng; Zheng, Xiaole; Zhou, Yan; Li, Guoqing; Wang, Jian; Yang, Huanming; Marth, Gabor T.; Garrison, Erik P.; Huang, Weichun; Indap, Amit; Kural, Deniz; Lee, Wan-Ping; Leong, Wen Fung; Quinlan, Aaron R.; Stewart, Chip; Stromberg, Michael P.; Ward, Alistair N.; Wu, Jiantao; Lee, Charles; Mills, Ryan E.; Shi, Xinghua; Daly, Mark J.; DePristo, Mark A.; Altshuler, David L.; Ball, Aaron D.; Banks, Eric; Bloom, Toby; Browning, Brian L.; Cibulskis, Kristian; Fennell, Tim J.; Garimella, Kiran V.; Grossman, Sharon R.; Handsaker, Robert E.; Hanna, Matt; Hartl, Chris; Jaffe, David B.; Kernytsky, Andrew M.; Korn, Joshua M.; Li, Heng; Maguire, Jared R.; McCarroll, Steven A.; McKenna, Aaron; Nemesh, James C.; Philippakis, Anthony A.; Poplin, Ryan E.; Price, Alkes; Rivas, Manuel A.; Sabeti, Pardis C.; Schaffner, Stephen F.; Shefler, Erica; Shlyakhter, Ilya A.; Cooper, David N.; Ball, Edward V.; Mort, Matthew; Phillips, Andrew D.; Stenson, Peter D.; Sebat, Jonathan; Makarov, Vladimir; Ye, Kenny; Yoon, Seungtai C.; Bustamante, Carlos D.; Clark, Andrew G.; Boyko, Adam; Degenhardt, Jeremiah; Gravel, Simon; Gutenkunst, Ryan N.; Kaganovich, Mark; Keinan, Alon; Lacroute, Phil; Ma, Xin; Reynolds, Andy; Clarke, Laura; Flicek, Paul; Cunningham, Fiona; Herrero, Javier; Keenen, Stephen; Kulesha, Eugene; Leinonen, Rasko; McLaren, William M.; Radhakrishnan, Rajesh; Smith, Richard E.; Zalunin, Vadim; Zheng-Bradley, Xiangqun; Korbel, Jan O.; Stütz, Adrian M.; Humphray, Sean; Bauer, Markus; 
Cheetham, R. Keira; Cox, Tony; Eberle, Michael; James, Terena; Kahn, Scott; Murray, Lisa; Chakravarti, Aravinda; Ye, Kai; De La Vega, Francisco M.; Fu, Yutao; Hyland, Fiona C. L.; Manning, Jonathan M.; McLaughlin, Stephen F.; Peckham, Heather E.; Sakarya, Onur; Sun, Yongming A.; Tsung, Eric F.; Batzer, Mark A.; Konkel, Miriam K.; Walker, Jerilyn A.; Sudbrak, Ralf; Albrecht, Marcus W.; Amstislavskiy, Vyacheslav S.; Herwig, Ralf; Parkhomchuk, Dimitri V.; Sherry, Stephen T.; Agarwala, Richa; Khouri, Hoda M.; Morgulis, Aleksandr O.; Paschall, Justin E.; Phan, Lon D.; Rotmistrovsky, Kirill E.; Sanders, Robert D.; Shumway, Martin F.; Xiao, Chunlin; McVean, Gil A.; Auton, Adam; Iqbal, Zamin; Lunter, Gerton; Marchini, Jonathan L.; Moutsianas, Loukas; Myers, Simon; Tumian, Afidalina; Desany, Brian; Knight, James; Winer, Roger; Craig, David W.; Beckstrom-Sternberg, Steve M.; Christoforides, Alexis; Kurdoglu, Ahmet A.; Pearson, John V.; Sinari, Shripad A.; Tembe, Waibhav D.; Haussler, David; Hinrichs, Angie S.; Katzman, Sol J.; Kern, Andrew; Kuhn, Robert M.; Przeworski, Molly; Hernandez, Ryan D.; Howie, Bryan; Kelley, Joanna L.; Melton, S. Cord; Abecasis, Gonçalo R.; Li, Yun; Anderson, Paul; Blackwell, Tom; Chen, Wei; Cookson, William O.; Ding, Jun; Kang, Hyun Min; Lathrop, Mark; Liang, Liming; Moffatt, Miriam F.; Scheet, Paul; Sidore, Carlo; Snyder, Matthew; Zhan, Xiaowei; Zöllner, Sebastian; Awadalla, Philip; Casals, Ferran; Idaghdour, Youssef; Keebler, John; Stone, Eric A.; Zilversmit, Martine; Jorde, Lynn; Xing, Jinchuan; Eichler, Evan E.; Aksay, Gozde; Alkan, Can; Hajirasouliha, Iman; Hormozdiari, Fereydoun; Kidd, Jeffrey M.; Sahinalp, S. 
Cenk; Sudmant, Peter H.; Mardis, Elaine R.; Chen, Ken; Chinwalla, Asif; Ding, Li; Koboldt, Daniel C.; McLellan, Mike D.; Dooling, David; Weinstock, George; Wallis, John W.; Wendl, Michael C.; Zhang, Qunyuan; Durbin, Richard M.; Albers, Cornelis A.; Ayub, Qasim; Balasubramaniam, Senduran; Barrett, Jeffrey C.; Carter, David M.; Chen, Yuan; Conrad, Donald F.; Danecek, Petr; Dermitzakis, Emmanouil T.; Hu, Min; Huang, Ni; Hurles, Matt E.; Jin, Hanjun; Jostins, Luke; Keane, Thomas M.; Le, Si Quang; Lindsay, Sarah; Long, Quan; MacArthur, Daniel G.; Montgomery, Stephen B.; Parts, Leopold; Stalker, James; Tyler-Smith, Chris; Walter, Klaudia; Zhang, Yujun; Gerstein, Mark B.; Snyder, Michael; Abyzov, Alexej; Balasubramanian, Suganthi; Bjornson, Robert; Du, Jiang; Grubert, Fabian; Habegger, Lukas; Haraksingh, Rajini; Jee, Justin; Khurana, Ekta; Lam, Hugo Y. K.; Leng, Jing; Mu, Xinmeng Jasmine; Urban, Alexander E.; Zhang, Zhengdong; Li, Yingrui; Luo, Ruibang; Marth, Gabor T.; Garrison, Erik P.; Kural, Deniz; Quinlan, Aaron R.; Stewart, Chip; Stromberg, Michael P.; Ward, Alistair N.; Wu, Jiantao; Lee, Charles; Mills, Ryan E.; Shi, Xinghua; McCarroll, Steven A.; Banks, Eric; DePristo, Mark A.; Handsaker, Robert E.; Hartl, Chris; Korn, Joshua M.; Li, Heng; Nemesh, James C.; Sebat, Jonathan; Makarov, Vladimir; Ye, Kenny; Yoon, Seungtai C.; Degenhardt, Jeremiah; Kaganovich, Mark; Clarke, Laura; Smith, Richard E.; Zheng-Bradley, Xiangqun; Korbel, Jan O.; Humphray, Sean; Cheetham, R. 
Keira; Eberle, Michael; Kahn, Scott; Murray, Lisa; Ye, Kai; De La Vega, Francisco M.; Fu, Yutao; Peckham, Heather E.; Sun, Yongming A.; Batzer, Mark A.; Konkel, Miriam K.; Walker, Jerilyn A.; Xiao, Chunlin; Iqbal, Zamin; Desany, Brian; Blackwell, Tom; Snyder, Matthew; Xing, Jinchuan; Eichler, Evan E.; Aksay, Gozde; Alkan, Can; Hajirasouliha, Iman; Hormozdiari, Fereydoun; Kidd, Jeffrey M.; Chen, Ken; Chinwalla, Asif; Ding, Li; McLellan, Mike D.; Wallis, John W.; Hurles, Matt E.; Conrad, Donald F.; Walter, Klaudia; Zhang, Yujun; Gerstein, Mark B.; Snyder, Michael; Abyzov, Alexej; Du, Jiang; Grubert, Fabian; Haraksingh, Rajini; Jee, Justin; Khurana, Ekta; Lam, Hugo Y. K.; Leng, Jing; Mu, Xinmeng Jasmine; Urban, Alexander E.; Zhang, Zhengdong; Gibbs, Richard A.; Bainbridge, Matthew; Challis, Danny; Coafra, Cristian; Dinh, Huyen; Kovar, Christie; Lee, Sandy; Muzny, Donna; Nazareth, Lynne; Reid, Jeff; Sabo, Aniko; Yu, Fuli; Yu, Jin; Marth, Gabor T.; Garrison, Erik P.; Indap, Amit; Leong, Wen Fung; Quinlan, Aaron R.; Stewart, Chip; Ward, Alistair N.; Wu, Jiantao; Cibulskis, Kristian; Fennell, Tim J.; Gabriel, Stacey B.; Garimella, Kiran V.; Hartl, Chris; Shefler, Erica; Sougnez, Carrie L.; Wilkinson, Jane; Clark, Andrew G.; Gravel, Simon; Grubert, Fabian; Clarke, Laura; Flicek, Paul; Smith, Richard E.; Zheng-Bradley, Xiangqun; Sherry, Stephen T.; Khouri, Hoda M.; Paschall, Justin E.; Shumway, Martin F.; Xiao, Chunlin; McVean, Gil A.; Katzman, Sol J.; Abecasis, Gonçalo R.; Blackwell, Tom; Mardis, Elaine R.; Dooling, David; Fulton, Lucinda; Fulton, Robert; Koboldt, Daniel C.; Durbin, Richard M.; Balasubramaniam, Senduran; Coffey, Allison; Keane, Thomas M.; MacArthur, Daniel G.; Palotie, Aarno; Scott, Carol; Stalker, James; Tyler-Smith, Chris; Gerstein, Mark B.; Balasubramanian, Suganthi; Chakravarti, Aravinda; Knoppers, Bartha M.; Abecasis, Gonçalo R.; Bustamante, Carlos D.; Gharani, Neda; Gibbs, Richard A.; Jorde, Lynn; Kaye, Jane S.; Kent, Alastair; Li, Taosha; McGuire, 
Amy L.; McVean, Gil A.; Ossorio, Pilar N.; Rotimi, Charles N.; Su, Yeyang; Toji, Lorraine H.; Tyler-Smith, Chris; Brooks, Lisa D.; Felsenfeld, Adam L.; McEwen, Jean E.; Abdallah, Assya; Juenger, Christopher R.; Clemm, Nicholas C.; Collins, Francis S.; Duncanson, Audrey; Green, Eric D.; Guyer, Mark S.; Peterson, Jane L.; Schafer, Alan J.; Abecasis, Gonçalo R.; Altshuler, David L.; Auton, Adam; Brooks, Lisa D.; Durbin, Richard M.; Gibbs, Richard A.; Hurles, Matt E.; McVean, Gil A.
2011-01-01
High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2–4× coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ∼1,000 sequenced chromosomes per population, whereas ∼2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence. PMID:21730125
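The site frequency spectrum (SFS) that the abstract above relies on can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name, the 0/1 encoding of ancestral and derived alleles, and the toy data are all assumptions made for illustration, and no demographic inference or jackknife projection is attempted.

```python
from collections import Counter


def site_frequency_spectrum(sites, n_chromosomes):
    """Unfolded SFS for a sample of n chromosomes.

    sites: list of rows, one per variable site; each row holds one 0/1
    entry per sampled chromosome (1 = derived allele). Hypothetical encoding.
    Returns a list where entry k-1 is the number of sites whose derived
    allele is carried by exactly k chromosomes, for k = 1 .. n-1.
    """
    derived_counts = Counter(sum(row) for row in sites)
    return [derived_counts.get(k, 0) for k in range(1, n_chromosomes)]


# Toy data (made up): 5 variable sites observed in 4 chromosomes.
toy_sites = [
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
sfs = site_frequency_spectrum(toy_sites, 4)
# Fraction of variable sites that are singletons (carried by one chromosome).
singleton_fraction = sfs[0] / sum(sfs)
```

In this toy sample most variable sites are singletons, which mirrors, in miniature, the abstract's observation that the majority of variable sites are rare.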
Development of Mycoplasma synoviae (MS) core genome multilocus sequence typing (cgMLST) scheme.
Ghanem, Mostafa; El-Gazzar, Mohamed
2018-05-01
Mycoplasma synoviae (MS) is a poultry pathogen with reported increased prevalence and virulence in recent years. MS strain identification is essential for prevention, control efforts and epidemiological outbreak investigations. Multiple multilocus based sequence typing schemes have been developed for MS, yet the resolution of these schemes could be limited for outbreak investigation. The cost of whole genome sequencing has become close to that of sequencing the seven MLST targets; however, there is no standardized method for typing MS strains based on whole genome sequences. In this paper, we propose a core genome multilocus sequence typing (cgMLST) scheme as a standardized and reproducible method for typing MS based on whole genome sequences. A diverse set of 25 MS whole genome sequences was used to identify 302 core genome genes as cgMLST targets (35.5% of the MS genome), and 44 whole genome sequences of MS isolates from six countries in four continents were typed by applying this scheme. cgMLST based phylogenetic trees displayed a high degree of agreement with core genome SNP based analysis and available epidemiological information. cgMLST allowed evaluation of two conventional MLST schemes of MS. The high discriminatory power of cgMLST allowed differentiation between samples of the same conventional MLST type. cgMLST represents a standardized, accurate, highly discriminatory, and reproducible method for differentiation between MS isolates. Like conventional MLST, it provides stable and expandable nomenclature, allowing for comparing and sharing the typing results between different laboratories worldwide. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
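A common way to compare isolates under an allele-based scheme such as cgMLST is to count the loci at which two allele profiles differ. The sketch below assumes a simple dictionary representation (locus name to allele number, with None for an uncalled locus); the locus names, allele numbers, and the choice to skip uncalled loci are hypothetical and are not taken from the scheme described above.

```python
def allelic_distance(profile_a, profile_b):
    """Number of loci with different allele numbers in two profiles.

    Only loci called (not None) in both profiles are compared; skipping
    missing loci is an assumption of this sketch.
    """
    compared = [
        locus
        for locus in profile_a
        if profile_a[locus] is not None and profile_b.get(locus) is not None
    ]
    return sum(profile_a[locus] != profile_b[locus] for locus in compared)


# Hypothetical allele profiles over four made-up core genome loci.
isolate_1 = {"locus_001": 3, "locus_002": 1, "locus_003": 7, "locus_004": 2}
isolate_2 = {"locus_001": 3, "locus_002": 4, "locus_003": 7, "locus_004": None}
isolate_3 = {"locus_001": 5, "locus_002": 4, "locus_003": 9, "locus_004": 2}

d12 = allelic_distance(isolate_1, isolate_2)
d13 = allelic_distance(isolate_1, isolate_3)
```

A matrix of such pairwise distances is the usual input to clustering or tree building; with hundreds of loci, two isolates that are identical at a handful of conventional MLST targets can still be separated.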
Wang, Gang; Sun, Yanwei; Xu, Ruirui; Qu, Jing; Tee, Chuansia; Jiang, Xiyuan; Ye, Jian
2014-04-01
Jatropha curcas mosaic disease (JcMD) is a newly emerging disease that has been reported in Africa and India. Here, we report the complete nucleotide sequence of a new Indian cassava mosaic virus isolate (ICMV-SG) from Singapore. Infection of ICMV-SG showed more severe JcMD in Jatropha curcas and Nicotiana benthamiana than the other ICMV isolates reported previously, though ICMV-SG shares high sequence identity with the other ICMV isolates. Agroinfectious DNA-A alone sufficiently induced systemic symptoms in N. benthamiana, but not in J. curcas. Results from agroinfection assays showed that systemic infection of ICMV-SG in J. curcas required both DNA-A and DNA-B components.
Chowdhury, S M Z H; Omar, A R; Aini, I; Hair-Bejo, M; Jamaluddin, A A; Md-Zain, B M; Kono, Y
2003-12-01
Specific-pathogen-free (SPF) chickens inoculated with low passage Chicken anaemia virus (CAV), SMSC-1 and 3-1 isolates produced lesions suggestive of CAV infection. Repeated passages of the isolates in cell culture until passage 60 (P60) and passage 123 produced viruses that showed a significantly reduced level of pathogenicity in SPF chickens compared to the low passage isolates. Sequence comparison indicated that nucleotide changes in only the coding region of the P60 passage isolates were thought to contribute to virus attenuation. Phylogenetic analysis indicated that SMSC-1 and 3-1 were highly divergent, but their P60 passage derivatives shared significant homology to a Japanese isolate A2.
2015-12-01
proportion greater than 0.25 (iv) Read depth greater than 8 in at least one sample. The table below shows variant data from Family 1041 categorized by...patients from a severely affected breast cancer Family 1041. Variant counts (All / Shared / Rare / Excluding IBD0): Intergenic 3,345,727 / 1,650,045 / 35,927 / 3,990; ncRNA
Kawasaki, Junna; Kawamura, Maki; Ohsato, Yoshiharu; Ito, Jumpei; Nishigaki, Kazuo
2017-10-15
Recombination events induce significant genetic changes, and this process can result in virus genetic diversity or in the generation of novel pathogenicity. We discovered a new recombinant feline leukemia virus (FeLV) gag gene harboring an unrelated insertion, termed the X region, which was derived from Felis catus endogenous gammaretrovirus 4 (FcERV-gamma4). The identified FcERV-gamma4 proviruses have lost their coding capabilities, but some can express their viral RNA in feline tissues. Although the X-region-carrying recombinant FeLVs appeared to be replication-defective viruses, they were detected in 6.4% of tested FeLV-infected cats. All isolated recombinant FeLV clones commonly incorporated a middle part of the FcERV-gamma4 5'-leader region as an X region. Surprisingly, a sequence corresponding to the portion contained in all X regions is also present in at least 13 endogenous retroviruses (ERVs) observed in the cat, human, primate, and pig genomes. We termed this shared genetic feature the commonly shared (CS) sequence. Despite our phylogenetic analysis indicating that all CS-sequence-carrying ERVs are classified as gammaretroviruses, no obvious closeness was revealed among these ERVs. However, the Shannon entropy in the CS sequence was lower than that in other parts of the provirus genome. Notably, the CS sequence of human endogenous retrovirus T had 73.8% similarity with that of FcERV-gamma4, and specific signals were detected in the human genome by Southern blot analysis using a probe for the FcERV-gamma4 CS sequence. Our results provide an interesting evolutionary history for CS-sequence circulation among several distinct ancestral viruses and a novel recombined virus over a prolonged period. IMPORTANCE Recombination among ERVs or modern viral genomes causes a rapid evolution of retroviruses, and this phenomenon can result in the serious situation of viral disease reemergence. 
We identified a novel recombinant FeLV gag gene that contains an unrelated sequence, termed the X region. This region originated from the 5' leader of FcERV-gamma4, a replication-incompetent feline ERV. Surprisingly, a sequence corresponding to the X region is also present in the 5' portion of other ERVs, including human endogenous retroviruses. Scattered copies of the ERVs carrying the unique genetic feature, here named the commonly shared (CS) sequence, were found in each host genome, suggesting that ancestral viruses may have captured and maintained the CS sequence. More recently, a novel recombinant FeLV hijacked the CS sequence from inactivated FcERV-gamma4 as the X region. Therefore, tracing the CS sequences can provide unique models for not only the modern reservoir of new recombinant viruses but also the genetic features shared among ancient retroviruses. Copyright © 2017 American Society for Microbiology.
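The Shannon entropy comparison mentioned above (lower entropy in the CS sequence than elsewhere in the provirus) can be sketched as a per-column calculation over an alignment. The function names and the toy alignment below are assumptions for illustration only; gap handling and the authors' exact windowing are not addressed.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def alignment_entropy(aligned_sequences):
    """Per-column entropy for equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*aligned_sequences)]


# Made-up alignment of four sequences, four columns.
toy_alignment = ["ACGT", "ACGA", "ACTA", "ACTC"]
entropies = alignment_entropy(toy_alignment)
mean_entropy = sum(entropies) / len(entropies)
```

Fully conserved columns score zero, so a region with a lower mean entropy than its surroundings is more conserved across the aligned sequences.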
Garland, Ellen C; Noad, Michael J; Goldizen, Anne W; Lilley, Matthew S; Rekdahl, Melinda L; Garrigue, Claire; Constantine, Rochelle; Daeschler Hauser, Nan; Poole, M Michael; Robbins, Jooke
2013-01-01
Humpback whales have a continually evolving vocal sexual display, or "song," that appears to undergo both evolutionary and "revolutionary" change. All males within a population adhere to the current content and arrangement of the song. Populations within an ocean basin share similarities in their songs; this sharing is complex as multiple variations of the song (song types) may be present within a region at any one time. To quantitatively investigate the similarity of song types, songs were compared at both the individual singer and population level using the Levenshtein distance technique and cluster analysis. The highly stereotyped sequences of themes from the songs of 211 individuals from populations within the western and central South Pacific region from 1998 through 2008 were grouped together based on the percentage of song similarity, and compared to qualitatively assigned song types. The analysis produced clusters of highly similar songs that agreed with previous qualitative assignments. Each cluster contained songs from multiple populations and years, confirming the eastward spread of song types and their progressive evolution through the study region. Quantifying song similarity and exchange will assist in understanding broader song dynamics and contribute to the use of vocal displays as population identifiers.
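The Levenshtein distance comparison described above can be sketched on sequences of theme labels. The distance itself is the standard edit distance; the normalization to a percentage (dividing by the longer sequence length) and the theme labels are assumptions of this sketch and may differ from the similarity index used in the study.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete item_a
                    current[j - 1] + 1,      # insert item_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def song_similarity(themes_a, themes_b):
    """Percentage similarity, normalized by the longer sequence (assumed form)."""
    longest = max(len(themes_a), len(themes_b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(themes_a, themes_b) / longest)


# Hypothetical theme sequences from two singers.
singer_1 = ["t1", "t2", "t3", "t4"]
singer_2 = ["t1", "t3", "t4", "t5"]
similarity = song_similarity(singer_1, singer_2)
```

A matrix of such pairwise percentages is what a cluster analysis would then group into song types.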
USDA-ARS's Scientific Manuscript database
Complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), on tomatoes in China, was elucidated using small RNAs deep sequencing. The identified STV_CN12 shares 99% sequence identity to other isolates from Mexico, France, Spain, and U.S. This is the first report ...
High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis.
Simonyan, Vahan; Mazumder, Raja
2014-09-30
The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis. PMID:25271953
Chen, Wei-Hua; Wang, Xue-Xia; Lin, Wei; He, Xiao-Wei; Wu, Zhen-Qiang; Lin, Ying; Hu, Song-Nian; Wang, Xiao-Ning
2006-01-01
Background The cynomolgus monkey (Macaca fascicularis) is one of the most widely used surrogate animal models for an increasing number of human diseases and vaccines, especially immune-system-related ones. Towards a better understanding of the gene expression background upon its immunogenetics, we constructed a cDNA library from Epstein-Barr virus (EBV)-transformed B lymphocytes of a cynomolgus monkey and sequenced 10,000 randomly picked clones. Results After processing, 8,312 high-quality expressed sequence tags (ESTs) were generated and assembled into 3,728 unigenes. Annotations of these uniquely expressed transcripts demonstrated that out of the 2,524 open reading frame (ORF) positive unigenes (mitochondrial and ribosomal sequences were not included), 98.8% shared significant similarities (E-value less than 1e-10) with the NCBI nucleotide (nt) database, while only 67.7% (E-value less than 1e-5) did so with the NCBI non-redundant protein (nr) database. Further analysis revealed that 90.0% of the unigenes that shared no similarities to the nr database could be assigned to human chromosomes, in which 75 did not match significantly to any cynomolgus monkey and human ESTs. The mapping regions to known human genes on the human genome were described in detail. The protein family and domain analysis revealed that the first, second and fourth of the most abundantly expressed protein families were all assigned to immunoglobulin and major histocompatibility complex (MHC)-related proteins. The expression profiles of these genes were compared with that of homologous genes in human blood, lymph nodes and a RAMOS cell line, which demonstrated expression changes after transformation with EBV. The degree of sequence similarity of the MHC class I and II genes to the human reference sequences was evaluated. The results indicated that class I molecules showed weak amino acid identities (<90%), while class II showed slightly higher ones. 
Conclusion These results indicated that the genes expressed in the cynomolgus monkey could be used to identify novel protein-coding genes and revise those incomplete or incorrect annotations in the human genome by comparative methods, since the old world monkeys and humans share high similarities at the molecular level, especially within coding regions. The identification of multiple genes involved in the immune response, their sequence variations to the human homologues, and their responses to EBV infection could provide useful information to improve our understanding of the cynomolgus monkey immune system. PMID:16618371
Wu, Shijin; Li, Yuan; Wang, Penghua; Zhong, Li; Qiu, Lequan; Chen, Jianmeng
2016-04-01
The environmental risk of fluoride and chloride pollution is pronounced in soils adjacent to solar photovoltaic sites. The elevated levels of fluoride and chloride in these soils have had significant impacts on the population size and overall biological activity of the soil microbial communities. The microbial community also plays an essential role in remediation of these soils. Questions remain as to how the fluoride and chloride contamination and subsequent remediation at these sites have impacted the population structure of the soil microbial communities. We analyzed the microbial communities in soils collected from close to a solar photovoltaic enterprise by pyrosequencing of the 16S rRNA tag. In addition, we used multivariate statistics to identify the relationships shared between sequence diversity and heterogeneity in the soil environment. The overall microbial communities were surprisingly diverse, harboring a wide variety of taxa and sharing significant correlations with different degrees of fluoride and chloride contamination. The contaminated soils harbored abundant bacteria that were probably resistant to the high acidity, high fluoride and chloride concentration, and high osmotic pressure environment. The dominant genera were Sphingomonas, Subgroup_6_norank, Clostridium sensu stricto, Nitrospira, Rhizomicrobium, and Acidithiobacillus. The results of this study provide new information regarding a previously uncharacterized ecosystem and show the value of high-throughput sequencing in the study of complex ecosystems.
Hybrid threshold adaptable quantum secret sharing scheme with reverse Huffman-Fibonacci-tree coding.
Lai, Hong; Zhang, Jun; Luo, Ming-Xing; Pan, Lei; Pieprzyk, Josef; Xiao, Fuyuan; Orgun, Mehmet A
2016-08-12
With prevalent attacks in communication, sharing a secret between communicating parties is an ongoing challenge. Moreover, it is important to integrate quantum solutions with classical secret sharing schemes with low computational cost for real-world use. This paper proposes a novel hybrid threshold adaptable quantum secret sharing scheme, using an m-bonacci orbital angular momentum (OAM) pump, Lagrange interpolation polynomials, and reverse Huffman-Fibonacci-tree coding. To be exact, we employ entangled states prepared by m-bonacci sequences to detect eavesdropping. Meanwhile, we encode m-bonacci sequences in Lagrange interpolation polynomials to generate the shares of a secret with reverse Huffman-Fibonacci-tree coding. The advantage of the proposed scheme is that it can detect eavesdropping without joint quantum operations and permits secret sharing for an arbitrary number of classical participants, no fewer than the threshold value, with much lower bandwidth. Also, in comparison with existing quantum secret sharing schemes, it still works when there are dynamic changes, such as the unavailability of some quantum channel, the arrival of new participants and the departure of participants. Finally, we provide security analysis of the new hybrid quantum secret sharing scheme and discuss its useful features for modern applications. PMID:27515908
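Two classical ingredients named above, m-bonacci sequences and Lagrange interpolation for threshold secret sharing, can be sketched in a few lines. This is a generic Shamir-style illustration, not the paper's hybrid scheme: the quantum OAM part and the reverse Huffman-Fibonacci-tree coding are not reproduced, and the prime modulus, function names, and example secret are assumptions.

```python
import secrets

PRIME = 2**61 - 1  # Mersenne prime used as the field modulus (an assumption)


def m_bonacci(m, count):
    """First `count` terms of the m-bonacci sequence (each term sums the previous m)."""
    seq = [0] * (m - 1) + [1]
    while len(seq) < count:
        seq.append(sum(seq[-m:]))
    return seq[:count]


def make_shares(secret, threshold, n_shares):
    """Split `secret` into points on a random polynomial of degree threshold-1."""
    coeffs = [secret % PRIME] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x):
        acc = 0
        for c in reversed(coeffs):  # Horner's rule modulo PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, n_shares + 1)]


def reconstruct(shares):
    """Recover the secret as the Lagrange-interpolated value at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


fib = m_bonacci(2, 8)    # ordinary Fibonacci numbers
trib = m_bonacci(3, 8)   # tribonacci numbers
shares = make_shares(123456789, threshold=3, n_shares=5)
recovered = reconstruct(shares[:3])
```

Any subset of at least `threshold` shares recovers the same secret, which corresponds to the "arbitrary but no less than threshold" property the abstract describes for the classical participants.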
Eco-epidemiology of Novel Bartonella Genotypes from Parasitic Flies of Insectivorous Bats.
Sándor, Attila D; Földvári, Mihály; Krawczyk, Aleksandra I; Sprong, Hein; Corduneanu, Alexandra; Barti, Levente; Görföl, Tamás; Estók, Péter; Kováts, Dávid; Szekeres, Sándor; László, Zoltán; Hornok, Sándor; Földvári, Gábor
2018-04-29
Bats are important zoonotic reservoirs for many pathogens worldwide. Although their highly specialized ectoparasites, bat flies (Diptera: Hippoboscoidea), can transmit Bartonella bacteria including human pathogens, their eco-epidemiology is unexplored. Here, we analyzed the prevalence and diversity of Bartonella strains sampled from 10 bat fly species from 14 European bat species. We found high prevalence of Bartonella spp. in most bat fly species with wide geographical distribution. Bat species explained most of the variance in Bartonella distribution with the highest prevalence of infected flies recorded in species living in dense groups exclusively in caves. Bat gender but not bat fly gender was also an important factor with the more mobile male bats giving more opportunity for the ectoparasites to access several host individuals. We detected high diversity of Bartonella strains (18 sequences, 7 genotypes, in 9 bat fly species) comparable with tropical assemblages of bat-bat fly association. Most genotypes are novel (15 out of 18 recorded strains have a similarity of 92-99%, with three sequences having 100% similarity to Bartonella spp. sequences deposited in GenBank) with currently unknown pathogenicity; however, 4 of these sequences are similar (up to 92% sequence similarity) to Bartonella spp. with known zoonotic potential. The high prevalence and diversity of Bartonella spp. suggests a long shared evolution of these bacteria with bat flies and bats providing excellent study targets for the eco-epidemiology of host-vector-pathogen cycles.
Vibrio chromosomes share common history.
Kirkup, Benjamin C; Chang, LeeAnn; Chang, Sarah; Gevers, Dirk; Polz, Martin F
2010-05-10
While most gamma proteobacteria have a single circular chromosome, Vibrionales have two circular chromosomes. Horizontal gene transfer is common among Vibrios, and in light of this genetic mobility, it is an open question to what extent the two chromosomes themselves share a common history since their formation. Single copy genes from each chromosome (142 genes from chromosome I and 42 genes from chromosome II) were identified from 19 sequenced Vibrionales genomes and their phylogenetic comparison suggests consistent phylogenies for each chromosome. Additionally, study of the gene organization and phylogeny of the respective origins of replication confirmed the shared history. Thus, while elements within the chromosomes may have experienced significant genetic mobility, the backbones share a common history. This allows conclusions based on multilocus sequence analysis (MLSA) for one chromosome to be applied equally to both chromosomes.
Diverse novel astroviruses identified in wild Himalayan marmots.
Ao, Yuan-Yun; Yu, Jie-Mei; Li, Li-Li; Cao, Jing-Yuan; Deng, Hong-Yan; Xin, Yun-Yun; Liu, Meng-Meng; Lin, Lin; Lu, Shan; Xu, Jian-Guo; Duan, Zhao-Jun
2017-04-01
With advances in viral surveillance and next-generation sequencing, highly diverse novel astroviruses (AstVs) and different animal hosts had been discovered in recent years. However, the existence of AstVs in marmots had yet to be shown. Here, we identified two highly divergent strains of AstVs (tentatively named Qinghai Himalayanmarmot AstVs, HHMAstV1 and HHMAstV2), by viral metagenomic analysis in liver tissues isolated from wild Marmota himalayana in China. Overall, 12 of 99 (12.1 %) M. himalayana faecal samples were positive for the presence of genetically diverse AstVs, while only HHMAstV1 and HHMAstV2 were identified in 300 liver samples. The complete genomic sequences of HHMAstV1 and HHMAstV2 were 6681 and 6610 nt in length, respectively, with the typical genomic organization of AstVs. Analysis of the complete ORF 2 sequence showed that these novel AstVs are most closely related to the rabbit AstV, mamastrovirus 23 (with 31.0 and 48.0 % shared amino acid identity, respectively). Phylogenetic analysis of the amino acid sequences of ORF1a, ORF1b and ORF2 indicated that HHMAstV1 and HHMAstV2 form two distinct clusters among the mamastroviruses, and may share a common ancestor with the rabbit-specific mamastrovirus 23. These results suggest that HHMAstV1 and HHMAstV2 are two novel species of the genus Mamastrovirus in the Astroviridae. The remarkable diversity of these novel AstVs will contribute to a greater understanding of the evolution and ecology of AstVs, although additional studies will be needed to understand the clinical significance of these novel AstVs in marmots, as well as in humans.
MHC class II genes in European wolves: a comparison with dogs.
Seddon, Jennifer M; Ellegren, Hans
2002-10-01
The genome of the grey wolf, one of the most widely distributed land mammal species, has been subjected to both stochastic factors, including biogeographical subdivision and population fragmentation, and strong selection during the domestication of the dog. To explore the effects of drift and selection on the partitioning of MHC variation in the diversification of species, we present nine DQA, 10 DQB, and 17 DRB1 sequences of the second exon for European wolves and compare them with sequences of North American wolves and dogs. The relatively large number of class II alleles present in both European and North American wolves attests to their large historical population sizes, yet there are few alleles shared between these regions at DQB and DRB1. Similarly, the dog has an extensive array of class II MHC alleles, a consequence of a genetically diverse origin, but allelic overlap with wolves only at DQA. Although we might expect a progression from shared alleles to shared allelic lineages during differentiation, the partitioning of diversity between wolves and dogs at DQB and DRB1 differs from that at DQA. Furthermore, an extensive region of nucleotide sequence shared between DRB1 and DQB alleles and a shared motif suggests intergenic recombination may have contributed to MHC diversity in the Canidae.
Que, Ting-zhi; Zhao, Shu-min; Li, Cheng-tao
2010-08-01
Strategies for determining whether half siblings share the same mother were investigated through the detection of autosomal and X-chromosomal STR (X-STR) loci and polymorphisms in the hypervariable (HV) regions of mitochondrial DNA (mtDNA). Genomic DNA was extracted from blood stain samples of 3 full siblings and one putative half sibling sharing the same mother with them. Fifteen autosomal STR loci were genotyped with the Sinofiler kit, and 19 X-STR loci were genotyped with the Mentype Argus X-8 kit and a 16-plex in-house system. Polymorphisms of mtDNA HV-I and HV-II were also detected by sequencing. A full sibling relationship between the putative half sibling and each of the 3 full siblings was excluded based on the results of autosomal STR genotyping and calculation of the full sibling index (FSI) and half sibling index (HSI). Sequencing of mtDNA HV-I and HV-II showed that all 4 samples came from the same maternal line. X-STR genotyping results determined that the putative half sibling shared the same mother with the 3 full siblings. Combining three genotyping technologies (autosomal STR, X-STR, and sequencing of mtDNA HV-I and HV-II) is a reliable way to determine whether half siblings share the same mother.
Structures of two Arabidopsis thaliana major latex proteins represent novel helix-grip folds
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lytle, Betsy L.; Song, Jikui; de la Cruz, Norberto B.
2009-06-02
Here we report the first structures of two major latex proteins (MLPs) which display unique structural differences from the canonical Bet v 1 fold described earlier. MLP28 (SwissProt/TrEMBL ID Q9SSK9), the product of gene At1g70830.1, and the At1g24000.1 gene product (SwissProt/TrEMBL ID P0C0B0), proteins which share 32% sequence identity, were independently selected as foldspace targets by the Center for Eukaryotic Structural Genomics. The structure of a single domain (residues 17-173) of MLP28 was solved by NMR spectroscopy, while the full-length At1g24000.1 structure was determined by X-ray crystallography. MLP28 displays greater than 30% sequence identity to at least eight MLPs from other species. For example, the MLP28 sequence shares 64% identity to peach Pp-MLP119 and 55% identity to cucumber Csf2.20. In contrast, the At1g24000.1 sequence is highly divergent (see Fig. 1), containing a gap of 33 amino acids when compared with all other known MLPs. Even when the gap is excluded, the sequence identity with MLPs from other species is less than 30%. Unlike some of the MLPs from other species, none of the A. thaliana MLPs have been characterized biochemically. We show by NMR chemical shift mapping that At1g24000.1 binds progesterone, demonstrating that despite its sequence dissimilarity, the hydrophobic binding pocket is conserved and, therefore, may play a role in its biological function and that of the MLP family in general.
Haplotype assembly in polyploid genomes and identical by descent shared tracts.
Aguiar, Derek; Istrail, Sorin
2013-07-01
Genome-wide haplotype reconstruction from sequence data, or haplotype assembly, is at the center of major challenges in molecular biology and life sciences. For complex eukaryotic organisms like humans, the genome is vast and the population samples are growing so rapidly that algorithms processing high-throughput sequencing data must scale favorably in terms of both accuracy and computational efficiency. Furthermore, current models and methodologies for haplotype assembly (i) do not consider individuals sharing haplotypes jointly, which reduces the size and accuracy of assembled haplotypes, and (ii) are unable to model genomes having more than two sets of homologous chromosomes (polyploidy). Polyploid organisms are increasingly becoming the target of many research groups interested in the genomics of disease, phylogenetics, botany and evolution but there is an absence of theory and methods for polyploid haplotype reconstruction. In this work, we present a number of results, extensions and generalizations of compass graphs and our HapCompass framework. We prove the theoretical complexity of two haplotype assembly optimizations, thereby motivating the use of heuristics. Furthermore, we present graph theory-based algorithms for the problem of haplotype assembly using our previously developed HapCompass framework for (i) novel implementations of haplotype assembly optimizations (minimum error correction), (ii) assembly of a pair of individuals sharing a haplotype tract identical by descent and (iii) assembly of polyploid genomes. We evaluate our methods on 1000 Genomes Project, Pacific Biosciences and simulated sequence data. HapCompass is available for download at http://www.brown.edu/Research/Istrail_Lab/. Supplementary data are available at Bioinformatics online.
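The minimum error correction (MEC) objective mentioned above can be illustrated for the diploid case: given a candidate haplotype, each sequencing fragment is assigned to the haplotype or its complement, whichever it disagrees with less, and the disagreements are summed. This sketch only scores a given haplotype; it is not the HapCompass algorithm, and the fragment encoding (position to 0/1 allele) and toy data are assumptions.

```python
def mismatches(fragment, haplotype):
    """Count positions where a fragment's allele differs from the haplotype."""
    return sum(allele != haplotype[pos] for pos, allele in fragment.items())


def mec_score(fragments, haplotype):
    """Diploid MEC score of a candidate haplotype (0/1 alleles per SNP site).

    Each fragment is charged the smaller of its mismatch counts against
    the haplotype and against the complementary haplotype.
    """
    complement = [1 - allele for allele in haplotype]
    return sum(
        min(mismatches(frag, haplotype), mismatches(frag, complement))
        for frag in fragments
    )


# Made-up fragments covering four heterozygous sites (site index -> allele).
toy_fragments = [
    {0: 0, 1: 1},
    {1: 0, 2: 0, 3: 1},
    {0: 0, 1: 1, 2: 0},
]
score = mec_score(toy_fragments, [0, 1, 1, 0])
```

Haplotype assembly under this objective searches for the haplotype minimizing the score, which is where the hardness results and heuristics discussed in the abstract come in.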
The impact of genetics on future drug discovery in schizophrenia.
Matsumoto, Mitsuyuki; Walton, Noah M; Yamada, Hiroshi; Kondo, Yuji; Marek, Gerard J; Tajinda, Katsunori
2017-07-01
Failures of investigational new drugs (INDs) for schizophrenia have left huge unmet medical needs for patients. Given the recent lackluster results, it is imperative that new drug discovery approaches (and resultant drug candidates) target pathophysiological alterations that are shared in specific, stratified patient populations that are selected based on pre-identified biological signatures. One path to implementing this paradigm is achievable by leveraging recent advances in genetic information and technologies. Genome-wide exome sequencing and meta-analysis of single nucleotide polymorphism (SNP)-based association studies have already revealed rare deleterious variants and SNPs in patient populations. Areas covered: Herein, the authors review the impact that genetics have on the future of schizophrenia drug discovery. The high polygenicity of schizophrenia strongly indicates that this disease is biologically heterogeneous so the identification of unique subgroups (by patient stratification) is becoming increasingly necessary for future investigational new drugs. Expert opinion: The authors propose a pathophysiology-based stratification of genetically-defined subgroups that share deficits in particular biological pathways. Existing tools, including lower-cost genomic sequencing and advanced gene-editing technology render this strategy ever more feasible. Genetically complex psychiatric disorders such as schizophrenia may also benefit from synergistic research with simpler monogenic disorders that share perturbations in similar biological pathways.
Ying, Yu; Meng, Dongdong; Chen, Xiaohua; Li, Fuli
2013-08-15
An anaerobic, extremely thermophilic, and cellulose- and xylan-degrading bacterium F32 was isolated from biocompost. Sequence analysis of the 16S rRNA gene of this strain showed that it was closely related to Caldicellulosiruptor saccharolyticus DSM 8903 (99.0% identity). Physiological and biochemical data also supported the identification of strain F32 as a Caldicellulosiruptor species. The proteins secreted by Caldicellulosiruptor sp. F32 grown on xylan showed a xylanase activity of 7.74 U/mg, which was 2.5 times higher than that of C. saccharolyticus DSM 8903. Based on the genomic sequencing data, 2 xylanase genes, JX030400 and JX030401, were identified in Caldicellulosiruptor sp. F32. The xylanase encoded by JX030401 shared 97% identity with Csac_0696 of C. saccharolyticus DSM 8903, while that encoded by JX030400 shared 94% identity with Athe_0089 of C. bescii DSM 6725, which was not found in the genome of strain DSM 8903. The xylanase encoded by JX030400 had 9-fold higher specific activity than that encoded by JX030401. Our results indicated that although the 2 strains shared high identity, the xylanase system in Caldicellulosiruptor sp. F32 was more efficient than that in C. saccharolyticus DSM 8903. Copyright © 2013 Elsevier Inc. All rights reserved.
Complete genome sequence of a tomato infecting tomato mottle mosaic virus in New York
USDA-ARS's Scientific Manuscript database
Complete genome sequence of an emerging isolate of tomato mottle mosaic virus (ToMMV) infecting experimental Nicotiana benthamiana plants in upstate New York was obtained using small RNA deep sequencing. ToMMV_NY-13 shared 99% sequence identity to ToMMV isolates from Mexico and Florida. Broader d...
Complete genome sequences of Geobacillus sp. WCH70, a thermophilic strain isolated from wood compost
Brumm, Phillip; Land, Miriam L.; Mead, David
2016-04-27
Geobacillus sp. WCH70 was one of several thermophilic organisms isolated from hot composts in the Middleton, WI area. Comparison of 16S rRNA sequences showed the strain may be a new species, and is most closely related to G. galactosidasius and G. toebii. The genome was sequenced, assembled, and annotated by the DOE Joint Genome Institute and deposited at the NCBI in December 2009 (CP001638). The genome of Geobacillus species WCH70 consists of one circular chromosome of 3,893,306 bp with an average G + C content of 43 %, and two circular plasmids of 33,899 and 10,287 bp with an average G + C content of 40 %. Among sequenced organisms, Geobacillus sp. WCH70 shares highest Average Nucleotide Identity (86 %) with G. thermoglucosidasius strains, as well as similar genome organization. Geobacillus sp. WCH70 appears to be a highly adaptable organism, with an exceptionally high 125 annotated transposons in the genome. The organism also possesses four predicted restriction-modification systems not found in other Geobacillus species.
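As a worked example of the figures quoted above, the following sketch combines the replicon sizes and G + C percentages into a total genome size and a length-weighted overall G + C content. It assumes the single 40 % value applies to both plasmids taken together, since the abstract gives only one plasmid average; the function name is illustrative.

```python
def weighted_gc_percent(replicons):
    """Length-weighted mean G+C percentage over (length_bp, gc_percent) pairs."""
    total_bp = sum(length for length, _ in replicons)
    if total_bp == 0:
        raise ValueError("no sequence length given")
    return sum(length * gc for length, gc in replicons) / total_bp


# Values quoted in the abstract; the plasmid G+C is assumed to be a combined average.
chromosome = (3_893_306, 43.0)
plasmids = (33_899 + 10_287, 40.0)

genome_bp = chromosome[0] + plasmids[0]
genome_gc = weighted_gc_percent([chromosome, plasmids])
```

Because the plasmids contribute only about 1 % of the total length, the overall value stays very close to the chromosomal 43 %.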
Tan, Yann-Chong; Blum, Lisa K; Kongpachith, Sarah; Ju, Chia-Hsin; Cai, Xiaoyong; Lindstrom, Tamsin M; Sokolove, Jeremy; Robinson, William H
2014-03-01
We developed a DNA barcoding method to enable high-throughput sequencing of the cognate heavy- and light-chain pairs of the antibodies expressed by individual B cells. We used this approach to elucidate the plasmablast antibody response to influenza vaccination. We show that >75% of the rationally selected plasmablast antibodies bind and neutralize influenza, and that antibodies from clonal families, defined by sharing both heavy-chain VJ and light-chain VJ sequence usage, do so most effectively. Vaccine-induced heavy-chain VJ regions contained on average >20 nucleotide mutations as compared to their predicted germline gene sequences, and some vaccine-induced antibodies exhibited higher binding affinities for hemagglutinins derived from prior years' seasonal influenza as compared to their affinities for the immunization strains. Our results show that influenza vaccination induces the recall of memory B cells that express antibodies that previously underwent affinity maturation against prior years' seasonal influenza, suggesting that 'original antigenic sin' shapes the antibody response to influenza vaccination. Published by Elsevier Inc.
Bartlett, Sofia R; Grebely, Jason; Eltahla, Auda A; Reeves, Jacqueline D; Howe, Anita Y M; Miller, Veronica; Ceccherini-Silberstein, Francesca; Bull, Rowena A; Douglas, Mark W; Dore, Gregory J; Harrington, Patrick; Lloyd, Andrew R; Jacka, Brendan; Matthews, Gail V; Wang, Gary P; Pawlotsky, Jean-Michel; Feld, Jordan J; Schinkel, Janke; Garcia, Federico; Lennerstrand, Johan; Applegate, Tanya L
2017-07-01
The significance of the clinical impact of direct-acting antiviral (DAA) resistance-associated substitutions (RASs) in hepatitis C virus (HCV) on treatment failure is unclear. No standardized methods or guidelines for detection of DAA RASs in HCV exist. To facilitate further evaluations of the impact of DAA RASs in HCV, we conducted a systematic review of RAS sequencing protocols, compiled a comprehensive public library of sequencing primers, and provided expert guidance on the most appropriate methods to screen and identify RASs. The development of standardized RAS sequencing protocols is complicated by high genetic variability and the need for genotype- and subtype-specific protocols for multiple regions. We have identified several limitations of the available methods and have highlighted areas requiring further research and development. The development, validation, and sharing of standardized methods for all genotypes and subtypes should be a priority. (Hepatology Communications 2017;1:379-390).
[Analysis of 4 clustered high risk acute flaccid paralysis cases in Shanxi Province in 2006].
Yan, Dong-mei; Zhang, Yong; Wang, Dong-yan
2010-04-01
We analyzed the epidemiology of 4 clustered high-risk acute flaccid paralysis (AFP) cases reported by Shanxi Province in 2006 and the VP1 gene characteristics of the type III polioviruses isolated from the four AFP cases. Virus isolation and identification were conducted according to the 4th edition of the WHO polio laboratory manual. The VP1 region was amplified and sequenced, and phylogenetic trees based on the VP1 region were constructed. Three of the four high-risk AFP cases were suspected to be vaccine-associated paralytic poliomyelitis (VAPP), and their onset dates were close together. VP1 sequencing of the four type III isolates revealed identities of 99.7%, 99.9%, 99.4% and 99.9%, respectively, compared with the vaccine reference strain BJOPV3. According to WHO criteria, the four isolates were identified as type III vaccine-related polioviruses. Phylogenetic analysis based on the VP1 coding sequence showed that the four type III polioviruses were not significantly related. The type III polioviruses isolated from the 3 suspected VAPP cases shared one nucleotide mutation at position 2637 (C-->U), which results in an amino acid change from Val to Ala. Laboratory surveillance for clustered high-risk AFP cases should be strengthened so as to detect and prevent poliovirus circulation in a timely manner.
Investigating the viral ecology of global bee communities with high-throughput metagenomics.
Galbraith, David A; Fuller, Zachary L; Ray, Allyson M; Brockmann, Axel; Frazier, Maryann; Gikungu, Mary W; Martinez, J Francisco Iturralde; Kapheim, Karen M; Kerby, Jeffrey T; Kocher, Sarah D; Losyev, Oleksiy; Muli, Elliud; Patch, Harland M; Rosa, Cristina; Sakamoto, Joyce M; Stanley, Scott; Vaudo, Anthony D; Grozinger, Christina M
2018-06-11
Bee viral ecology is a fascinating emerging area of research: viruses exert a range of effects on their hosts, exacerbate impacts of other environmental stressors, and, importantly, are readily shared across multiple bee species in a community. However, our understanding of bee viral communities is limited, as it is primarily derived from studies of North American and European Apis mellifera populations. Here, we examined viruses in populations of A. mellifera and 11 other bee species from 9 countries, across 4 continents and Oceania. We developed a novel pipeline to rapidly and inexpensively screen for bee viruses. This pipeline includes purification of encapsulated RNA/DNA viruses, sequence-independent amplification, high throughput sequencing, integrated assembly of contigs, and filtering to identify contigs specifically corresponding to viral sequences. We identified sequences for (+)ssRNA, (-)ssRNA, dsRNA, and ssDNA viruses. Overall, we found 127 contigs corresponding to novel viruses (i.e. previously not observed in bees), with 27 represented by >0.1% of the reads in a given sample, and 7 contained an RdRp or replicase sequence which could be used for robust phylogenetic analysis. This study provides a sequence-independent pipeline for viral metagenomics analysis, and greatly expands our understanding of the diversity of viruses found in bee communities.
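The abundance cut-off mentioned above (contigs represented by >0.1% of the reads in a sample) could be applied along the following lines. This is only a sketch of that one filtering step, not the authors' pipeline; the function name, the contig identifiers and the read counts are all hypothetical.

```python
def abundant_contigs(read_counts, total_reads, min_fraction=0.001):
    """Return the IDs of contigs whose share of reads in a sample is strictly
    greater than min_fraction (0.001 corresponds to 0.1%)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return sorted(
        contig for contig, n_reads in read_counts.items()
        if n_reads / total_reads > min_fraction
    )


# Hypothetical per-contig read counts for one sample of 1,000,000 reads.
counts = {"contig_a": 1500, "contig_b": 40, "contig_c": 1000}
kept = abundant_contigs(counts, total_reads=1_000_000)
```

With a strict inequality, a contig sitting exactly at the 0.1% boundary (here `contig_c`) is not retained.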
Gruenstaeudl, Michael; Gerschler, Nico; Borsch, Thomas
2018-06-21
The sequencing and comparison of plastid genomes are becoming a standard method in plant genomics, and many researchers are using this approach to infer plant phylogenetic relationships. Due to the widespread availability of next-generation sequencing, plastid genome sequences are being generated at breakneck pace. This trend towards massive sequencing of plastid genomes highlights the need for standardized bioinformatic workflows. In particular, documentation and dissemination of the details of genome assembly, annotation, alignment and phylogenetic tree inference are needed, as these processes are highly sensitive to the choice of software and the precise settings used. Here, we present the procedure and results of sequencing, assembling, annotating and quality-checking of three complete plastid genomes of the aquatic plant genus Cabomba as well as subsequent gene alignment and phylogenetic tree inference. We accompany our findings by a detailed description of the bioinformatic workflow employed. Importantly, we share a total of eleven software scripts for each of these bioinformatic processes, enabling other researchers to evaluate and replicate our analyses step by step. The results of our analyses illustrate that the plastid genomes of Cabomba are highly conserved in both structure and gene content.
Obodo, Udochukwu C.; Epum, Esther A.; Platts, Margaret H.; Seloff, Jacob; Dahlson, Nicole A.; Velkovsky, Stoycho M.; Paul, Shira R.
2016-01-01
DNA double-strand breaks (DSBs) pose a threat to genome stability and are repaired through multiple mechanisms. Rarely, telomerase, the enzyme that maintains telomeres, acts upon a DSB in a mutagenic process termed telomere healing. The probability of telomere addition is increased at specific genomic sequences termed sites of repair-associated telomere addition (SiRTAs). By monitoring repair of an induced DSB, we show that SiRTAs on chromosomes V and IX share a bipartite structure in which a core sequence (Core) is directly targeted by telomerase, while a proximal sequence (Stim) enhances the probability of de novo telomere formation. The Stim and Core sequences are sufficient to confer a high frequency of telomere addition to an ectopic site. Cdc13, a single-stranded DNA binding protein that recruits telomerase to endogenous telomeres, is known to stimulate de novo telomere addition when artificially recruited to an induced DSB. Here we show that the ability of the Stim sequence to enhance de novo telomere addition correlates with its ability to bind Cdc13, indicating that natural sites at which telomere addition occurs at high frequency require binding by Cdc13 to a sequence 20 to 100 bp internal from the site at which telomerase acts to initiate de novo telomere addition. PMID:27044869
Habits as action sequences: hierarchical action control and changes in outcome value
Dezfouli, Amir; Lingawi, Nura W.; Balleine, Bernard W.
2014-01-01
Goal-directed action involves making high-level choices that are implemented using previously acquired action sequences to attain desired goals. Such a hierarchical schema is necessary for goal-directed actions to be scalable to real-life situations, but results in decision-making that is less flexible than when action sequences are unfolded and the decision-maker deliberates step-by-step over the outcome of each individual action. In particular, from this perspective, the offline revaluation of any outcomes that fall within action sequence boundaries will be invisible to the high-level planner resulting in decisions that are insensitive to such changes. Here, within the context of a two-stage decision-making task, we demonstrate that this property can explain the emergence of habits. Next, we show how this hierarchical account explains the insensitivity of over-trained actions to changes in outcome value. Finally, we provide new data that show that, under extended extinction conditions, habitual behaviour can revert to goal-directed control, presumably as a consequence of decomposing action sequences into single actions. This hierarchical view suggests that the development of action sequences and the insensitivity of actions to changes in outcome value are essentially two sides of the same coin, explaining why these two aspects of automatic behaviour involve a shared neural structure. PMID:25267824
NASA Astrophysics Data System (ADS)
González-Toril, E.; Amils, R.; Delmas, R. J.; Petit, J.-R.; Komárek, J.; Elster, J.
2008-04-01
Four different communities and one culture of pigmented microbial assemblages were obtained by incubation in mineral medium of samples collected from high elevation snow in the Alps (Mt. Blanc area) and the Andes (Nevado Illimani summit, Bolivia), from Antarctic aerosol (French station Dumont d'Urville) and a maritime Antarctic soil (King George Island, South Shetlands, Uruguay Station Artigas). Molecular analysis of more than 200 16S rRNA gene sequences showed that all cultured cells belong to the Bacteria domain. The phylogenetic comparison with the currently available rDNA database allowed the identification of sequences belonging to Proteobacteria (Alpha-, Beta- and Gamma-proteobacteria), Actinobacteria and Bacteroidetes phyla. The Andes snow culture was the richest in bacterial diversity (eight microorganisms identified) and the maritime Antarctic soil the poorest (only one). Snow samples from Col du Midi (Alps) and the Andes shared the highest number of identified microorganisms (Agrobacterium, Limnobacter, Aquiflexus and two uncultured Alphaproteobacteria clones). These two sampling sites also shared four sequences with the Antarctic aerosol sample (Limnobacter, Pseudonocardia and an uncultured Alphaproteobacteria clone). The only microorganism identified in the maritime Antarctic soil (Brevundimonas sp.) was also detected in the Antarctic aerosol. The two snow samples from the Alps only shared one common microorganism. Most of the identified microorganisms have been detected previously in cold environments (Dietzia kunjamensis, Pseudonocardia antarctica, Hydrogenophaga palleronii and Brevundimonas sp.), marine sediments (Aquiflexus balticus, Pseudomonas pseudoalcaligenes, Pseudomonas sp. and one uncultured Alphaproteobacteria), and soils and rocks (Pseudonocardia sp., Agrobacterium sp., Limnobacter sp. and two uncultured Alphaproteobacteria clones).
Air current dispersal is the best model to explain the presence of very specific microorganisms, like those used in this work, in very distant environments. In addition, these microorganisms have to be resistant to extreme conditions and able to grow in oligotrophic environments. Considering the habitats in which they have been identified, the presence of pigments must be related to their ability to resist high doses of radiation.
Simmons, Greg; Clarke, Daniel; McKee, Jeff; Young, Paul; Meers, Joanne
2014-01-01
Gibbon ape leukaemia virus (GALV) and koala retrovirus (KoRV) share a remarkably close sequence identity despite the fact that they occur in distantly related mammals on different continents. It has previously been suggested that infection of their respective hosts may have occurred as a result of a species jump from another, as yet unidentified vertebrate host. To investigate possible sources of these retroviruses in the Australian context, DNA samples were obtained from 42 vertebrate species and screened using PCR in order to detect proviral sequences closely related to KoRV and GALV. Four proviral partial sequences totalling 2880 bases which share a strong similarity with KoRV and GALV were detected in DNA from a native Australian rodent, the grassland melomys, Melomys burtoni. We have designated this novel gammaretrovirus Melomys burtoni retrovirus (MbRV). The concatenated nucleotide sequence of MbRV shares 93% identity with the corresponding sequence from GALV-SEATO and 83% identity with KoRV. The geographic ranges of the grassland melomys and of the koala partially overlap. Thus a species jump by MbRV from melomys to koalas is conceivable. However the genus Melomys does not occur in mainland South East Asia and so it appears most likely that another as yet unidentified host was the source of GALV.
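To make the identity figures above concrete, the sketch below shows one common way of computing percent identity from two aligned sequences, and converts the reported percentages into approximate numbers of differing bases. It assumes the identities were computed over the full 2880 concatenated bases with gap columns ignored, which the abstract does not state; both function names are illustrative.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two aligned sequences of equal length, ignoring
    columns where either sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def approx_mismatches(length_bp, identity_pct):
    """Approximate number of differing positions implied by a percent identity."""
    return round(length_bp * (100.0 - identity_pct) / 100.0)


# Figures from the abstract: 2880 bases, 93% identity to GALV-SEATO, 83% to KoRV.
diffs_vs_galv = approx_mismatches(2880, 93)
diffs_vs_korv = approx_mismatches(2880, 83)
```

Under these assumptions, MbRV differs from GALV-SEATO at roughly 200 positions and from KoRV at roughly 490.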
Two new miniature inverted-repeat transposable elements in the genome of the clam Donax trunculus.
Šatović, Eva; Plohl, Miroslav
2017-10-01
Repetitive sequences are important components of eukaryotic genomes that drive their evolution. Among them are different types of mobile elements that share the ability to spread throughout the genome and form interspersed repeats. To broaden the generally scarce knowledge on bivalves at the genome level, in the clam Donax trunculus we described two new non-autonomous DNA transposons, miniature inverted-repeat transposable elements (MITEs), named DTC M1 and DTC M2. Like other MITEs, they are characterized by their small size, their A + T richness, and the presence of terminal inverted repeats (TIRs). DTC M1 and DTC M2 are 261 and 286 bp long, respectively, and in addition to TIRs, both of them contain a long imperfect palindrome sequence in their central parts. These elements are present in complete and truncated versions within the genome of the clam D. trunculus. The two new MITEs share only structural similarity, and lack any nucleotide sequence similarity to each other. In a search for related elements in databases, a BLAST search revealed within the Crassostrea gigas genome a larger element sharing sequence similarity only to DTC M1 in its TIR sequences. The lack of sequence similarity with any previously published mobile elements indicates that DTC M1 and DTC M2 elements may be unique to D. trunculus.
Democratization of genetic data: connecting government approval of clinical tests with data sharing
Ross, Theodora S.
2015-01-01
When a doctor orders a genetic test, patients assume that the test will yield a useful result to guide how their physicians take care of them. That assumption is frequently correct, but not always. Until recently, a genetic test only interrogated the sequence of one or two genes. Now, DNA-sequencing technologies are so fast and cheap that they have enabled clinicians to sequence panels of genes that may or may not be relevant to the patient's condition. The technology has outpaced our ability to interpret the results. Connecting approval of clinical tests to data sharing could help close this gap. PMID:27148568
Using GBrowse 2.0 to visualize and share next-generation sequence data
2013-01-01
GBrowse is a mature web-based genome browser that is suitable for deployment on both public and private web sites. It supports most genome browser features, including qualitative and quantitative (wiggle) tracks, track uploading, track sharing, interactive track configuration, semantic zooming and limited smooth track panning. As of version 2.0, GBrowse supports next-generation sequencing (NGS) data by providing for the direct display of SAM and BAM sequence alignment files. SAM/BAM tracks provide semantic zooming and support both local and remote data sources. This article provides step-by-step instructions for configuring GBrowse to display NGS data. PMID:23376193
Democratization of genetic data: connecting government approval of clinical tests with data sharing.
Ross, Theodora S
2015-10-01
When a doctor orders a genetic test, patients assume that the test will yield a useful result to guide how their physicians take care of them. That assumption is frequently correct, but not always. Until recently, a genetic test only interrogated the sequence of one or two genes. Now, DNA-sequencing technologies are so fast and cheap that they have enabled clinicians to sequence panels of genes that may or may not be relevant to the patient's condition. The technology has outpaced our ability to interpret the results. Connecting approval of clinical tests to data sharing could help close this gap.
Extraordinary Sequence Divergence at Tsga8, an X-linked Gene Involved in Mouse Spermiogenesis
Good, Jeffrey M.; Vanderpool, Dan; Smith, Kimberly L.; Nachman, Michael W.
2011-01-01
The X chromosome plays an important role in both adaptive evolution and speciation. We used a molecular evolutionary screen of X-linked genes potentially involved in reproductive isolation in mice to identify putative targets of recurrent positive selection. We then sequenced five very rapidly evolving genes within and between several closely related species of mice in the genus Mus. All five genes were involved in male reproduction and four of the genes showed evidence of recurrent positive selection. The most remarkable evolutionary patterns were found at Testis-specific gene a8 (Tsga8), a spermatogenesis-specific gene expressed during postmeiotic chromatin condensation and nuclear transformation. Tsga8 was characterized by extremely high levels of insertion–deletion variation of an alanine-rich repetitive motif in natural populations of Mus domesticus and M. musculus, differing in length from the reference mouse genome by up to 89 amino acids (27% of the total protein length). This population-level variation was coupled with striking divergence in protein sequence and length between closely related mouse species. Although no clear orthologs had previously been described for Tsga8 in other mammalian species, we have identified a highly divergent hypothetical gene on the rat X chromosome that shares clear orthology with the 5′ and 3′ ends of Tsga8. Further inspection of this ortholog verified that it is expressed in rat testis and shares remarkable similarity with mouse Tsga8 across several general features of the protein sequence despite no conservation of nucleotide sequence across over 60% of the rat-coding domain. Overall, Tsga8 appears to be one of the most rapidly evolving genes to have been described in rodents. We discuss the potential evolutionary causes and functional implications of this extraordinary divergence and the possible contribution of Tsga8 and the other four genes we examined to reproductive isolation in mice. PMID:21186189
Elrobh, Mohamed S.; Alanazi, Mohammad S.; Khan, Wajahatullah; Abduljaleel, Zainularifeen; Al-Amri, Abdullah; Bazzi, Mohammad D.
2011-01-01
Heat shock proteins are ubiquitous, induced under a number of environmental and metabolic stresses, with highly conserved DNA sequences among mammalian species. Camelus dromedarius (the Arabian camel), domesticated under semi-desert environments, is well adapted to tolerate and survive severe drought and high temperatures for extended periods. This is the first report of molecular cloning and characterization of a full-length cDNA encoding a putative stress-induced heat shock HSPA6 protein (also called HSP70B′) from the Arabian camel. A full-length cDNA (2417 bp) was obtained by rapid amplification of cDNA ends (RACE) and cloned in pET-b expression vector. The sequence analysis of the HSPA6 gene showed a 1932 bp-long open reading frame encoding 643 amino acids. The complete cDNA sequence of the Arabian camel HSPA6 gene was submitted to NCBI GenBank (accession number HQ214118.1). The BLAST analysis indicated that the C. dromedarius HSPA6 gene nucleotide sequence shared high similarity (77–91%) with heat shock gene nucleotide sequences of other mammals. The deduced 643 amino acid sequence (accession number ADO12067.1) showed that the predicted protein has an estimated molecular weight of 70.5 kDa with a predicted isoelectric point (pI) of 6.0. The comparative analyses of camel HSPA6 protein sequences with other mammalian heat shock proteins (HSPs) showed high identity (80–94%). The camel HSPA6 protein structure predicted using Protein 3D structural analysis showed high similarities with human and mouse HSPs. Taken together, this study indicates that the cDNA sequence of the HSPA6 gene and its amino acid sequence and protein structure from the Arabian camel are highly conserved and have similarities with other mammalian species. PMID:21845074
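The lengths quoted above are mutually consistent, as a short check shows: 643 codons plus one stop codon give 1932 bp. The sketch below performs that arithmetic and derives two further quantities from the abstract's figures (the non-coding portion of the 2417 bp cDNA and the mean residue mass). It assumes the reported ORF length includes the stop codon; the helper name is illustrative.

```python
def orf_length_bp(n_amino_acids, include_stop=True):
    """Length in bp of an open reading frame encoding n_amino_acids residues,
    optionally counting the terminating stop codon."""
    codons = n_amino_acids + (1 if include_stop else 0)
    return 3 * codons


# Figures from the abstract.
cdna_bp = 2417
n_residues = 643
mass_da = 70_500  # 70.5 kDa

orf_bp = orf_length_bp(n_residues)            # assumed to include the stop codon
untranslated_bp = cdna_bp - orf_bp            # 5' plus 3' non-coding sequence
mean_residue_mass_da = mass_da / n_residues   # average mass per amino acid
```

The resulting mean residue mass of about 110 Da is in the range typically seen for proteins, which supports the reported molecular weight.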
Ma, Yuyuan; Lv, Maomin; Xu, Shu; Wu, Jianmin; Tian, Kegong; Zhang, Jingang
2010-07-01
The existence of porcine endogenous retrovirus (PERV) hinders the use of pigs in clinical xenotransplantation to alleviate the shortage of human transplants. Chinese miniature pigs are potential organ donors for xenotransplantation in China. However, so far, an adequate level of information on the molecular characteristics of PERV from Chinese miniature pigs has not been available. We describe here the cloning and characterization of full-length proviral DNA of PERV from inbred Chinese Wuzhishan miniature pigs (WZSP). Full-length nucleotide sequences of PERV-WZSP and other PERVs were aligned and a phylogenetic tree was constructed from deduced amino-acid sequences of env. The results demonstrated that the full-length proviral DNA of PERV-WZSP belongs to the gammaretroviruses and shares high similarity with other PERVs. Sequence analysis also suggested that different patterns of LTR existed in the same porcine germ line and that a partial PERV-C sequence may recombine with the PERV-A sequence in the LTR. (c) 2008 Elsevier Ltd. All rights reserved.
Puli'uvea, Christopher; Khan, Subuhi; Chang, Wee-Leong; Valmonte, Gardette; Pearson, Michael N; Higgins, Colleen M
2017-02-01
We present the first complete genome of vanilla mosaic virus (VanMV). The VanMV genomic structure is consistent with that of a potyvirus, containing a single open reading frame (ORF) encoding a polyprotein of 3139 amino acids. Motif analyses indicate the polyprotein can be cleaved into the expected ten individual proteins; other recognised potyvirus motifs are also present. As expected, the VanMV genome shows high sequence similarity to the published Dasheen mosaic virus (DsMV) genome sequences; comparisons with DsMV continue to support VanMV as a vanilla-infecting strain of DsMV. Phylogenetic analyses indicate that VanMV and DsMV share a common ancestor, with VanMV having the closest relationship with DsMV strains from the South Pacific.
The genome sequence of taurine cattle: a window to ruminant biology and evolution.
Elsik, Christine G; Tellam, Ross L; Worley, Kim C; Gibbs, Richard A; Muzny, Donna M; Weinstock, George M; Adelson, David L; Eichler, Evan E; Elnitski, Laura; Guigó, Roderic; Hamernik, Debora L; Kappes, Steve M; Lewin, Harris A; Lynn, David J; Nicholas, Frank W; Reymond, Alexandre; Rijnkels, Monique; Skow, Loren C; Zdobnov, Evgeny M; Schook, Lawrence; Womack, James; Alioto, Tyler; Antonarakis, Stylianos E; Astashyn, Alex; Chapple, Charles E; Chen, Hsiu-Chuan; Chrast, Jacqueline; Câmara, Francisco; Ermolaeva, Olga; Henrichsen, Charlotte N; Hlavina, Wratko; Kapustin, Yuri; Kiryutin, Boris; Kitts, Paul; Kokocinski, Felix; Landrum, Melissa; Maglott, Donna; Pruitt, Kim; Sapojnikov, Victor; Searle, Stephen M; Solovyev, Victor; Souvorov, Alexandre; Ucla, Catherine; Wyss, Carine; Anzola, Juan M; Gerlach, Daniel; Elhaik, Eran; Graur, Dan; Reese, Justin T; Edgar, Robert C; McEwan, John C; Payne, Gemma M; Raison, Joy M; Junier, Thomas; Kriventseva, Evgenia V; Eyras, Eduardo; Plass, Mireya; Donthu, Ravikiran; Larkin, Denis M; Reecy, James; Yang, Mary Q; Chen, Lin; Cheng, Ze; Chitko-McKown, Carol G; Liu, George E; Matukumalli, Lakshmi K; Song, Jiuzhou; Zhu, Bin; Bradley, Daniel G; Brinkman, Fiona S L; Lau, Lilian P L; Whiteside, Matthew D; Walker, Angela; Wheeler, Thomas T; Casey, Theresa; German, J Bruce; Lemay, Danielle G; Maqbool, Nauman J; Molenaar, Adrian J; Seo, Seongwon; Stothard, Paul; Baldwin, Cynthia L; Baxter, Rebecca; Brinkmeyer-Langford, Candice L; Brown, Wendy C; Childers, Christopher P; Connelley, Timothy; Ellis, Shirley A; Fritz, Krista; Glass, Elizabeth J; Herzig, Carolyn T A; Iivanainen, Antti; Lahmers, Kevin K; Bennett, Anna K; Dickens, C Michael; Gilbert, James G R; Hagen, Darren E; Salih, Hanni; Aerts, Jan; Caetano, Alexandre R; Dalrymple, Brian; Garcia, Jose Fernando; Gill, Clare A; Hiendleder, Stefan G; Memili, Erdogan; Spurlock, Diane; Williams, John L; Alexander, Lee; Brownstein, Michael J; Guan, Leluo; Holt, Robert A; Jones, Steven J M; Marra, Marco A; 
Moore, Richard; Moore, Stephen S; Roberts, Andy; Taniguchi, Masaaki; Waterman, Richard C; Chacko, Joseph; Chandrabose, Mimi M; Cree, Andy; Dao, Marvin Diep; Dinh, Huyen H; Gabisi, Ramatu Ayiesha; Hines, Sandra; Hume, Jennifer; Jhangiani, Shalini N; Joshi, Vandita; Kovar, Christie L; Lewis, Lora R; Liu, Yih-Shin; Lopez, John; Morgan, Margaret B; Nguyen, Ngoc Bich; Okwuonu, Geoffrey O; Ruiz, San Juana; Santibanez, Jireh; Wright, Rita A; Buhay, Christian; Ding, Yan; Dugan-Rocha, Shannon; Herdandez, Judith; Holder, Michael; Sabo, Aniko; Egan, Amy; Goodell, Jason; Wilczek-Boney, Katarzyna; Fowler, Gerald R; Hitchens, Matthew Edward; Lozado, Ryan J; Moen, Charles; Steffen, David; Warren, James T; Zhang, Jingkun; Chiu, Readman; Schein, Jacqueline E; Durbin, K James; Havlak, Paul; Jiang, Huaiyang; Liu, Yue; Qin, Xiang; Ren, Yanru; Shen, Yufeng; Song, Henry; Bell, Stephanie Nicole; Davis, Clay; Johnson, Angela Jolivet; Lee, Sandra; Nazareth, Lynne V; Patel, Bella Mayurkumar; Pu, Ling-Ling; Vattathil, Selina; Williams, Rex Lee; Curry, Stacey; Hamilton, Cerissa; Sodergren, Erica; Wheeler, David A; Barris, Wes; Bennett, Gary L; Eggen, André; Green, Ronnie D; Harhay, Gregory P; Hobbs, Matthew; Jann, Oliver; Keele, John W; Kent, Matthew P; Lien, Sigbjørn; McKay, Stephanie D; McWilliam, Sean; Ratnakumar, Abhirami; Schnabel, Robert D; Smith, Timothy; Snelling, Warren M; Sonstegard, Tad S; Stone, Roger T; Sugimoto, Yoshikazu; Takasuga, Akiko; Taylor, Jeremy F; Van Tassell, Curtis P; Macneil, Michael D; Abatepaulo, Antonio R R; Abbey, Colette A; Ahola, Virpi; Almeida, Iassudara G; Amadio, Ariel F; Anatriello, Elen; Bahadue, Suria M; Biase, Fernando H; Boldt, Clayton R; Carroll, Jeffery A; Carvalho, Wanessa A; Cervelatti, Eliane P; Chacko, Elsa; Chapin, Jennifer E; Cheng, Ye; Choi, Jungwoo; Colley, Adam J; de Campos, Tatiana A; De Donato, Marcos; Santos, Isabel K F de Miranda; de Oliveira, Carlo J F; Deobald, Heather; Devinoy, Eve; Donohue, Kaitlin E; Dovc, Peter; Eberlein, Annett; 
Fitzsimmons, Carolyn J; Franzin, Alessandra M; Garcia, Gustavo R; Genini, Sem; Gladney, Cody J; Grant, Jason R; Greaser, Marion L; Green, Jonathan A; Hadsell, Darryl L; Hakimov, Hatam A; Halgren, Rob; Harrow, Jennifer L; Hart, Elizabeth A; Hastings, Nicola; Hernandez, Marta; Hu, Zhi-Liang; Ingham, Aaron; Iso-Touru, Terhi; Jamis, Catherine; Jensen, Kirsty; Kapetis, Dimos; Kerr, Tovah; Khalil, Sari S; Khatib, Hasan; Kolbehdari, Davood; Kumar, Charu G; Kumar, Dinesh; Leach, Richard; Lee, Justin C-M; Li, Changxi; Logan, Krystin M; Malinverni, Roberto; Marques, Elisa; Martin, William F; Martins, Natalia F; Maruyama, Sandra R; Mazza, Raffaele; McLean, Kim L; Medrano, Juan F; Moreno, Barbara T; Moré, Daniela D; Muntean, Carl T; Nandakumar, Hari P; Nogueira, Marcelo F G; Olsaker, Ingrid; Pant, Sameer D; Panzitta, Francesca; Pastor, Rosemeire C P; Poli, Mario A; Poslusny, Nathan; Rachagani, Satyanarayana; Ranganathan, Shoba; Razpet, Andrej; Riggs, Penny K; Rincon, Gonzalo; Rodriguez-Osorio, Nelida; Rodriguez-Zas, Sandra L; Romero, Natasha E; Rosenwald, Anne; Sando, Lillian; Schmutz, Sheila M; Shen, Libing; Sherman, Laura; Southey, Bruce R; Lutzow, Ylva Strandberg; Sweedler, Jonathan V; Tammen, Imke; Telugu, Bhanu Prakash V L; Urbanski, Jennifer M; Utsunomiya, Yuri T; Verschoor, Chris P; Waardenberg, Ashley J; Wang, Zhiquan; Ward, Robert; Weikard, Rosemarie; Welsh, Thomas H; White, Stephen N; Wilming, Laurens G; Wunderlich, Kris R; Yang, Jianqi; Zhao, Feng-Qi
2009-04-24
To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.
Farcy, Emilie; Serpentini, Antoine; Fiévet, Bruno; Lebel, Jean-Marc
2007-04-01
Heat-shock proteins are a multigene family of proteins whose expression is induced by a variety of stress factors. This work reports the cloning and sequencing of HSP70 and HSP90 cDNAs in the gastropod Haliotis tuberculata. The deduced amino acid sequences of both HSP70 and HSP90 from H. tuberculata shared a high degree of homology with their homologues in other species, including typical eukaryotic HSP70 and HSP90 signature sequences. We examined their transcriptional expression patterns in abalone hemocytes exposed to thermal stress. Real-time PCR analysis indicated that both HSP70 and HSP90 mRNAs were expressed in control animals and increased rapidly after heat shock.
2015-12-01
Read depth greater than 8 in at least one sample. The table below shows variant data from Family 1041 categorized by functional effect.
Table 1...breast cancer Family 1041.
Functional effect    All          Shared       Rare      Excluding IBD0
Intergenic           3,345,727    1,650,045    35,927    3,990
ncRNA                266,300      130,836      3,104     329
Up
Links, Matthew G; Demeke, Tigst; Gräfenhan, Tom; Hill, Janet E; Hemmingsen, Sean M; Dumonceaux, Tim J
2014-04-01
In order to address the hypothesis that seeds from ecologically and geographically diverse plants harbor characteristic epiphytic microbiota, we characterized the bacterial and fungal microbiota associated with Triticum and Brassica seed surfaces. The total microbial complement was determined by amplification and sequencing of a fragment of chaperonin 60 (cpn60). Specific microorganisms were quantified by qPCR. Bacteria and fungi corresponding to operational taxonomic units (OTU) that were identified in the sequencing study were isolated and their interactions examined. A total of 5477 OTU were observed from seed washes. Neither total epiphytic bacterial load nor community richness/evenness was significantly different between the seed types; 578 OTU were shared among all samples at a variety of abundances. Hierarchical clustering revealed that 203 were significantly different in abundance on Triticum seeds compared with Brassica. Microorganisms isolated from seeds showed 99-100% identity between the cpn60 sequences of the isolates and the OTU sequences from this shared microbiome. Bacterial strains identified as Pantoea agglomerans had antagonistic properties toward one of the fungal isolates (Alternaria sp.), providing a possible explanation for their reciprocal abundances on both Triticum and Brassica seeds. cpn60 enabled the simultaneous profiling of bacterial and fungal microbiota and revealed a core seed-associated microbiota shared between diverse plant genera. © 2014 AAFC. New Phytologist © 2014 New Phytologist Trust. PMID:24444052
Novel actin crosslinker superfamily member identified by a two step degenerate PCR procedure.
Byers, T J; Beggs, A H; McNally, E M; Kunkel, L M
1995-07-24
Actin-crosslinking proteins link F-actin into the bundles and networks that constitute the cytoskeleton. Dystrophin, beta-spectrin, alpha-actinin, ABP-120, ABP-280, and fimbrin share homologous actin-binding domains and comprise an actin crosslinker superfamily. We have identified a novel member of this superfamily (ACF7) using a degenerate primer-mediated PCR strategy that was optimized to resolve less-abundant superfamily sequences. The ACF7 gene is on human chromosome 1 and hybridizes to high molecular weight bands on northern blots. Sequence comparisons argue that ACF7 does not fit into one of the existing families, but represents a new class within the superfamily.
Structure and Function of Lipopolysaccharide Binding Protein
NASA Astrophysics Data System (ADS)
Schumann, Ralf R.; Leong, Steven R.; Flaggs, Gail W.; Gray, Patrick W.; Wright, Samuel D.; Mathison, John C.; Tobias, Peter S.; Ulevitch, Richard J.
1990-09-01
The primary structure of lipopolysaccharide binding protein (LBP), a trace plasma protein that binds to the lipid A moiety of bacterial lipopolysaccharides (LPSs), was deduced by sequencing cloned complementary DNA. LBP shares sequence identity with another LPS binding protein found in granulocytes, bactericidal/permeability-increasing protein, and with cholesterol ester transport protein of the plasma. LBP may control the response to LPS under physiologic conditions by forming high-affinity complexes with LPS that bind to monocytes and macrophages, which then secrete tumor necrosis factor. The identification of this pathway for LPS-induced monocyte stimulation may aid in the development of treatments for diseases in which Gram-negative sepsis or endotoxemia are involved.
Sequencing Strategies for Population and Cancer Epidemiology Studies (SeqSPACE) Webinar Series
The Sequencing Strategies for Population and Cancer Epidemiology Studies (SeqSPACE) Webinar Series provides an opportunity for our grantees and other interested individuals to share lessons learned and practical information regarding the application of next generation sequencing to cancer epidemiology studies.
Complete genome sequence of a new maize-associated cytorhabdovirus
USDA-ARS's Scientific Manuscript database
A new 11,877 nt cytorhabdovirus sequence with 6 open reading frames has been identified in a maize sample. It shares 50 and 51% genome-wide nucleotide sequence identity with northern cereal mosaic cytorhabdovirus (NCMV) and barley yellow striate mosaic cytorhabdovirus (BYSMV), respectively....
Mizianty, Marcin J; Kurgan, Lukasz
2009-12-13
Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy, which ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reductions when compared against the best-performing competing method on the two largest datasets. The proposed predictor achieves 58% accuracy for the membrane protein class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. 
The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/. PMID:20003388
Beta-globin locus activation regions: conservation of organization, structure, and function.
Li, Q L; Zhou, B; Powers, P; Enver, T; Stamatoyannopoulos, G
1990-01-01
The human beta-globin locus activation region (LAR) comprises four erythroid-specific DNase I hypersensitive sites (I-IV) thought to be largely responsible for activating the beta-globin domain and facilitating high-level erythroid-specific globin gene expression. We identified the goat beta-globin LAR, determined 10.2 kilobases of its sequence, and demonstrated its function in transgenic mice. The human and goat LARs share 6.5 kilobases of homologous sequences that are as highly conserved as the epsilon-globin gene promoters. Furthermore, the overall spatial organization of the two LARs has been conserved. These results suggest that the functionally relevant regions of the LAR are large and that in addition to their primary structure, the spatial relationship of the conserved elements is important for LAR function. PMID:2236034
USDA-ARS's Scientific Manuscript database
The complete genome sequence (6,423 nt) of an emerging Cucumber green mottle mosaic virus (CGMMV) isolate on cucumber in North America was determined through deep sequencing of sRNA and rapid amplification of cDNA ends. It shares 99% nucleotide sequence identity to the Asian genotype, but only 90% t...
Guo, Wei; Li, Ying; Wang, Lizhi; Wang, Jiwen; Xu, Qin; Yan, Tianhai; Xue, Bai
2015-08-01
The yak (Bos grunniens) is a unique ruminant species that is important to the agriculture of the Tibetan plateau and has a complex intestinal microbial community. The objective of the present study was to characterize the composition and individual variability of the microbiota in the rumen of yaks using high-throughput sequencing of the 16S rRNA gene. Rumen samples used in the present study were obtained from grazing adult male yaks (n = 6) on a commercial farm in Ganzi Autonomous Prefecture of Sichuan Province, China. Universal prokaryote primers were used to target the V4-V5 hypervariable region of the 16S rRNA gene. A total of 7200 operational taxonomic units (OTUs) were obtained after sequence filtering and chimera removal. Within these OTUs, 0.56% belonged to Archaea (40 OTUs), 7.19% to unassigned species (518 OTUs), and the remaining OTUs (6642) in all samples were of bacterial origin. When examining the community structure of bacteria, we identified 23 phyla within 159 families after taxonomic summarization. Bacteroidetes and Firmicutes were the predominant phyla, accounting for 39.68% (SD = 0.05) and 45.90% (SD = 0.06), respectively. Moreover, 3764 OTUs were identified as shared OTUs (i.e. represented in all yaks) and belonged to 35 genera, exhibiting highly variable abundance across individual samples. Phylogenetic placement of these genera across individual samples was examined. In addition, we evaluated the distance among the 6 rumen samples by adding taxon phylogeny using UniFrac, representing 24.1% of average distance. In summary, the current study reveals a shared rumen microbiome and phylogenetic lineage and presents novel information on the composition and individual variability of the bacterial community in the rumen of yaks. Copyright © 2015. Published by Elsevier Ltd.
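The definition of shared OTUs given in this abstract (OTUs represented in all yaks) amounts to a set intersection across samples, and the reported percentages follow directly from the OTU counts. The following is a minimal Python sketch, assuming each sample is available as a set of OTU identifiers; the function names and the toy identifiers are hypothetical and are not taken from the study.

```python
from typing import Dict, Set


def shared_otus(samples: Dict[str, Set[str]]) -> Set[str]:
    """Return the OTU identifiers that occur in every sample."""
    if not samples:
        return set()
    return set.intersection(*samples.values())


def percent_of_total(count: int, total: int) -> float:
    """Express count as a percentage of total, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * count / total, 2)


# Hypothetical toy data: three samples with made-up OTU identifiers.
toy_samples = {
    "yak_1": {"otu_a", "otu_b", "otu_c"},
    "yak_2": {"otu_b", "otu_c", "otu_d"},
    "yak_3": {"otu_c", "otu_b", "otu_e"},
}
print(sorted(shared_otus(toy_samples)))

# Counts quoted in the abstract: 40 archaeal and 518 unassigned OTUs out of 7200.
print(percent_of_total(40, 7200))
print(percent_of_total(518, 7200))
```

With the counts quoted above, the two calls reproduce the 0.56% and 7.19% figures; the remaining 6642 OTUs are simply 7200 minus the 558 archaeal and unassigned OTUs.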
Zhou, Rong; Wang, Qian; Jiang, Fangling; Cao, Xue; Sun, Mintao; Liu, Min; Wu, Zhen
2016-01-01
MicroRNAs (miRNAs) are 19–24 nucleotide (nt) noncoding RNAs that play important roles in abiotic stress responses in plants. High temperatures have been the subject of considerable attention due to their negative effects on plant growth and development. Heat-responsive miRNAs have been identified in some plants. However, there have been no reports on the global identification of miRNAs and their targets in tomato at high temperatures, especially at different elevated temperatures. Here, three small-RNA libraries and three degradome libraries were constructed from the leaves of the heat-tolerant tomato at normal, moderately and acutely elevated temperatures (26/18 °C, 33/33 °C and 40/40 °C, respectively). Following high-throughput sequencing, 662 conserved and 97 novel miRNAs were identified in total with 469 conserved and 91 novel miRNAs shared in the three small-RNA libraries. Of these miRNAs, 96 and 150 miRNAs were responsive to the moderately and acutely elevated temperature, respectively. Following degradome sequencing, 349 sequences were identified as targets of 138 conserved miRNAs, and 13 sequences were identified as targets of eight novel miRNAs. The expression levels of seven miRNAs and six target genes obtained by quantitative real-time PCR (qRT-PCR) were largely consistent with the sequencing results. This study enriches the number of heat-responsive miRNAs and lays a foundation for the elucidation of the miRNA-mediated regulatory mechanism in tomatoes at elevated temperatures. PMID:27653374
2012-01-01
Background MicroRNAs (miRNAs) are small RNAs (21-24 bp) providing an RNA-based system of gene regulation highly conserved in plants and animals. In plants, miRNAs control mRNA degradation or restrain translation, affecting development and responses to stresses. Plant miRNAs show imperfect but extensive complementarity to mRNA targets, making their computational prediction possible, useful when data mining is applied on different species. In this study we used a comparative approach to identify both miRNAs and their targets, in artichoke and safflower. Results Two complete expressed sequence tags (ESTs) datasets from artichoke (3.6·10^4 entries) and safflower (4.2·10^4), were analysed with a bioinformatic pipeline and in vitro experiments, identifying 17 potential miRNAs. For each EST, using RNAhybrid program and 953 non redundant miRNA mature sequences, available in mirBase as reference, we searched matching putative targets. 8730 out of 42011 ESTs from safflower and 7145 of 36323 ESTs from artichoke showed at least one predicted miRNA target. BLAST analysis showed that 75% of all ESTs shared at least a common homologous region (E-value < 10^-4) and about 50% of these displayed 400 bp or longer aligned sequences as conserved homologous/orthologous (COS) regions. 960 and 890 ESTs of safflower and artichoke organized in COS shared 79 different miRNA targets, considered functionally conserved, and statistically significant when compared with random sequences (signal to noise ratio > 2 and specificity ≥ 0.85). Four highly significant miRNAs selected from in silico data were experimentally validated in globe artichoke leaves. Conclusions Mature miRNAs and targets were predicted within EST sequences of safflower and artichoke. Most of the miRNA targets appeared highly/moderately conserved, highlighting an important and conserved function. 
In this study we introduce a stringent parameter for the comparative sequence analysis, represented by the identification of the same target in the COS region. After statistical analysis 79 targets, found on the COS regions and belonging to 60 miRNA families, have a signal to noise ratio > 2, with ≥ 0.85 specificity. The putative miRNAs identified belong to 55 dicotyledon plants and to 24 families only in monocotyledon. PMID:22536958
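The EST counts and cutoffs quoted in this abstract can be turned into a small worked calculation. The Python sketch below computes the fraction of ESTs with at least one predicted target and applies the two stated thresholds (signal to noise ratio > 2, specificity ≥ 0.85). The function names are hypothetical, and how the authors actually computed the signal to noise ratio and specificity is not stated in the abstract, so the filter only illustrates the cutoffs themselves.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


def passes_thresholds(signal_to_noise: float, specificity: float,
                      min_snr: float = 2.0,
                      min_specificity: float = 0.85) -> bool:
    """Apply the cutoffs quoted in the abstract: SNR > 2 and specificity >= 0.85."""
    return signal_to_noise > min_snr and specificity >= min_specificity


# EST counts quoted in the abstract.
print(f"safflower: {percent(8730, 42011):.1f}% of ESTs with a predicted target")
print(f"artichoke: {percent(7145, 36323):.1f}% of ESTs with a predicted target")

# Hypothetical candidate scores, for illustration only.
print(passes_thresholds(2.5, 0.85))
print(passes_thresholds(2.0, 0.90))
```

Note that the signal to noise cutoff is strict while the specificity cutoff is inclusive, matching the ">" and "≥" used in the text.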
Boutte, Julien; Aliaga, Benoît; Lima, Oscar; Ferreira de Carvalho, Julie; Ainouche, Abdelkader; Macas, Jiri; Rousseau-Gueutin, Mathieu; Coriton, Olivier; Ainouche, Malika; Salmon, Armel
2015-01-01
Gene and whole-genome duplications are widespread in plant nuclear genomes, resulting in sequence heterogeneity. Identification of duplicated genes may be particularly challenging in highly redundant genomes, especially when there are no diploid parents as a reference. Here, we developed a pipeline to detect the different copies in the ribosomal RNA gene family in the hexaploid grass Spartina maritima from next-generation sequencing (Roche-454) reads. The heterogeneity of the different domains of the highly repeated 45S unit was explored by identifying single nucleotide polymorphisms (SNPs) and assembling reads based on shared polymorphisms. SNPs were validated using comparisons with Illumina sequence data sets and by cloning and Sanger (re)sequencing. Using this approach, 29 validated polymorphisms and 11 validated haplotypes were reported (out of 34 and 20, respectively, that were initially predicted by our program). The rDNA domains of S. maritima have similar lengths as those found in other Poaceae, apart from the 5′-ETS, which is approximately two-times longer in S. maritima. Sequence homogeneity was encountered in coding regions and both internal transcribed spacers (ITS), whereas high intragenomic variability was detected in the intergenic spacer (IGS) and the external transcribed spacer (ETS). Molecular cytogenetic analysis by fluorescent in situ hybridization (FISH) revealed the presence of one pair of 45S rDNA signals on the chromosomes of S. maritima instead of three expected pairs for a hexaploid genome, indicating loss of duplicated homeologous loci through the diploidization process. The procedure developed here may be used at any ploidy level and using different sequencing technologies. PMID:26530424
Rasheeda, M K; Rangamaran, Vijaya Raghavan; Srinivasan, Senthilkumar; Ramaiah, Sendhil Kumar; Gunasekaran, Rajaprabhu; Jaypal, Santhanakumar; Gopal, Dharani; Ramalingam, Kirubagaran
2017-08-01
The present study was undertaken to evaluate the microbial composition of farmed cobia, pompano and milkfish, reared in sea-cages, by culture-independent methods. This study would serve as a basis for assessing the general health of fish, identifying the dominant bacterial species present in the gut for future probiotic work and in early detection of potential pathogens. High-throughput sequencing of V3-V4 hypervariable regions of 16S rDNA on the Illumina MiSeq platform facilitated unravelling of the composite bacterial population. Analysis of 1.3 million quality-filtered sequences revealed high microbial diversity. Characteristic marine fish gut microbes, Vibrio and Photobacterium spp., showed prevalence in cobia and pompano whereas Pelomonas and Fusobacterium spp. dominated the gut of milkfish. Pompano hindgut with 10,537 operational taxonomic units (OTUs) exhibited the highest alpha-diversity index followed by cobia (10,435) and milkfish (2799). Additionally, unique and shared OTUs in each gut type were identified. Gammaproteobacteria dominated in cobia and pompano while Betaproteobacteria showed prevalence in milkfish. We obtained 96 shared OTUs among the three species, though the numbers of reads were highly variable. These differences in microbiota of farmed fish reared in the same environment were presumably due to differences in the gut morphology, physiological behavior and host specificity. Copyright © 2017 Elsevier B.V. All rights reserved.
A Comprehensive Strategy for Accurate Mutation Detection of the Highly Homologous PMS2.
Li, Jianli; Dai, Hongzheng; Feng, Yanming; Tang, Jia; Chen, Stella; Tian, Xia; Gorman, Elizabeth; Schmitt, Eric S; Hansen, Terah A A; Wang, Jing; Plon, Sharon E; Zhang, Victor Wei; Wong, Lee-Jun C
2015-09-01
Germline mutations in the DNA mismatch repair gene PMS2 underlie the cancer susceptibility syndrome, Lynch syndrome. However, accurate molecular testing of PMS2 is complicated by a large number of highly homologous sequences. To establish a comprehensive approach for mutation detection of PMS2, we have designed a strategy combining targeted capture next-generation sequencing (NGS), multiplex ligation-dependent probe amplification, and long-range PCR followed by NGS to simultaneously detect point mutations and copy number changes of PMS2. Exonic deletions (E2 to E9, E5 to E9, E8, E10, E14, and E1 to E15), duplications (E11 to E12), and a nonsense mutation, p.S22*, were identified. Traditional multiplex ligation-dependent probe amplification and Sanger sequencing approaches cannot differentiate the origin of the exonic deletions in the 3' region when PMS2 and PMS2CL share identical sequences as a result of gene conversion. Our approach allows unambiguous identification of mutations in the active gene with a straightforward long-range-PCR/NGS method. Breakpoint analysis of multiple samples revealed that recurrent exon 14 deletions are mediated by homologous Alu sequences. Our comprehensive approach provides a reliable tool for accurate molecular analysis of genes containing multiple copies of highly homologous sequences and should improve PMS2 molecular analysis for patients with Lynch syndrome. Copyright © 2015 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
Van Borm, S; Vangeluwe, D; Steensels, M; Poncin, O; van den Berg, T; Lambrecht, B
2011-12-01
As part of a long-term wild bird monitoring programme, five different low pathogenic (LP) avian influenza viruses (AIVs) were isolated from wild mallards (subtypes H1N1, H4N6, H5N1, H5N3, and H10N7). A LP H5N1 and two co-circulating (same location, same time period) viruses were selected for full genome sequencing. An H1N1 (A/Anas platyrhynchos/Belgium/09-762/2008) and an H5N1 virus (A/Anas platyrhynchos/Belgium/09-762-P1/2008) were isolated on the same day in November 2008, then an H5N3 virus (A/Anas platyrhynchos/09-884/2008) 5 days later in December 2008. All genes of these co-circulating viruses shared common ancestors with recent (2001 to 2007) European wild waterfowl influenza viruses. The H5N1 virus shares genome segments with both the H1N1 (PB1, NA, M) and the H5N3 (PB2, HA) viruses, and all three viruses share the same NS sequence. A double infection with two different PA segments from H5N1 and from H5N3 could be observed for the H1N1 sample. The observed gene constellations resulted from multiple reassortment events between viruses circulating in wild birds in Eurasia. Several internal gene segments from these 2008 viruses and the N3 sequence from the H5N3 show homology with sequences from 2003 H7 outbreaks in Italy (LP) and the Netherlands (highly pathogenic). These data contribute to the growing sequence evidence of the dynamic nature of the avian influenza natural reservoir in Eurasia, and underline the importance of monitoring AIV in wild birds. Genetic information of potential hazard to commercial poultry continues to circulate in this reservoir, including H5 and H7 subtype viruses and genes related to previous AIV outbreaks.
Abécassis, V; Pompon, D; Truan, G
2000-10-15
The design of a family shuffling strategy (CLERY: Combinatorial Libraries Enhanced by Recombination in Yeast) associating PCR-based and in vivo recombination and expression in yeast is described. This strategy was tested using human cytochrome P450 CYP1A1 and CYP1A2 as templates, which share 74% nucleotide sequence identity. Construction of highly shuffled libraries of mosaic structures and reduction of parental gene contamination were two major goals. Library characterization involved multiprobe hybridization on DNA macro-arrays. The statistical analysis of randomly selected clones revealed a high proportion of chimeric genes (86%) and a homogeneous representation of the parental contribution among the sequences (55.8 +/- 2.5% for parental sequence 1A2). A microtiter plate screening system was designed to achieve colorimetric detection of polycyclic hydrocarbon hydroxylation by transformed yeast cells. Full sequences of five randomly picked and five functionally selected clones were analyzed. Results confirmed the shuffling efficiency and allowed calculation of the average length of sequence exchange and mutation rates. The efficient and statistically representative generation of mosaic structures by this type of family shuffling in a yeast expression system constitutes a novel and promising tool for structure-function studies and tuning enzymatic activities of multicomponent eucaryote complexes involving non-soluble enzymes.
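The 74% nucleotide sequence identity quoted for CYP1A1 and CYP1A2 is a pairwise identity over an alignment. The Python sketch below shows one common way to compute such a figure from two pre-aligned sequences. The abstract does not say which identity definition or alignment method was used, so the choice to count every non-empty alignment column in the denominator is an assumption, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns where both sequences have a gap are skipped, and
    columns with a gap in only one sequence count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # empty column, carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


# Hypothetical toy alignment, not real CYP1A1/CYP1A2 sequence.
print(percent_identity("ATGGCT-TTC", "ATGACTGTTC"))
```

Other conventions (for example, dividing by the length of the shorter sequence or ignoring all gapped columns) give different percentages for the same alignment, which is why the definition should be stated alongside any identity value.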
Southan, Christopher; Cutler, Paul; Birrell, Helen; Connell, John; Fantom, Kenneth G M; Sims, Matthew; Shaikh, Narjis; Schneider, Klaus
2002-02-01
A proteomic study of rat urine was undertaken using two-dimensional gel electrophoresis, microbore high performance liquid chromatography, mass spectrometry and N-terminal sequencing. Five known urinary proteins were identified but two novel peptide fragments matched a large number of rat expressed sequence tags (ESTs) from a liver library. By combining protein chemical and nucleotide data, two 101-residue open reading frames with 90% amino acid identity were determined, rat urinary protein 1 (RUP-1) and RUP-2. The data established signal peptide removal and provided evidence for N-glycosylation. A third related sequence, rat spleen protein (RSP-1) was confirmed from EST searches. These three proteins have been submitted to SWISS-PROT as P81827, P81828 and Q9QXN2, respectively. A fourth novel homologue was found in porcine and bovine ESTs from embryo libraries. Alignment with known homologues showed conserved cysteine positions characteristic of a secreted subfamily of Ly-6 proteins. In two cases, antineoplastic urinary protein and caltrin, these homologues have unverified functional annotations. The RUP sequences showed high scoring matches to three unrelated rat mRNAs subsequently established to be chimeric. Two of these share extended sectional identity to RUP-1 but the third may represent another novel Ly-6 homologue. These chimeras have caused serious annotation errors in secondary databases.
Fowler, Elizabeth V; Peters, Jennifer M; Gatton, Michelle L; Chen, Nanhua; Cheng, Qin
2002-03-01
In Plasmodium falciparum a highly polymorphic multi-copy gene family, var, encodes the variant surface antigen P. falciparum erythrocyte membrane protein 1 (PfEMP1), which has an important role in cytoadherence and immune evasion. Using previously described universal PCR primers for the first Duffy binding-like domain (DBLalpha) of var we analysed the DBLalpha repertoires of Dd2 (originally from Thailand) and eight isolates from the Solomon Islands (n=4), Philippines (n=2), Papua New Guinea (n=1) and Africa (n=1). We found 15-32 unique DBLalpha sequence types among these isolates and estimated detectable DBLalpha repertoire sizes ranging from 33-38 to 52-57 copies per genome. Our data suggest that var gene repertoires generally consist of 40-50 copies per genome. Eighteen DBLalpha sequences appeared in more than one Asia-Pacific isolate with the number of sequences shared between any two isolates ranging from 0 to 6 (mean=2.0 +/-1.6). At the amino acid level DBLalpha sequence similarity within isolates ranged from 45.2 +/- 7.1 to 50.2 +/- 6.9%, and was not significantly different from the DBLalpha amino acid sequence similarity among isolates (P>0.1). Comparisons with published sequences also revealed little overlap among DBLalpha sequences from different regions. High DBLalpha sequence diversity and minimal overlap among these isolates suggest that the global var gene repertoire is immense, and may potentially be selected for by the host's protective immune response to the var gene products, PfEMP1.
Zhao, D; Slaghekke, F; Middeldorp, J M; Duan, T; Oepkes, D; Lopriore, E
2014-12-01
Twin anemia-polycythemia sequence (TAPS) is a newly described form of chronic twin transfusion. Previous observational studies noted a discordance between birth weight and individual placental share in TAPS. The purpose of this study was to investigate if fetal growth in monochorionic (MC) twins with TAPS is determined by placental share or by the net inter-twin blood transfusion. All consecutive MC twin placentas of live-born twin pairs with and without TAPS examined at our center between June 2002 and February 2014 were included in this study. Hemoglobin (Hb) levels and individual placental share were evaluated at birth and correlated with birth weight share. We excluded MC twin pregnancies with twin-twin transfusion syndrome. A total of 270 MC twin pregnancies (TAPS group, n = 20; control group without TAPS, n = 250) were included in this study. Donors with TAPS had a lower birth weight than recipients in 90% (18/20) of cases, but a larger placental share in 65% (13/20) of cases. In the TAPS group, birth weight share was positively correlated with Hb share at birth (P < 0.01) but not with placental share (P = 0.54). In the control group without TAPS, birth weight share was strongly correlated with placental share (P < 0.01) but not with Hb share (P = 0.14). A relatively larger placental share may enable the survival of the anemic twin in TAPS. In contrast with uncomplicated MC twins, fetal growth in MC twins with TAPS is determined primarily by the net inter-twin blood transfusion instead of placental share.
Seepiban, Channarong; Charoenvilaisiri, Saengsoon; Warin, Nuchnard; Bhunchoth, Anjana; Phironrit, Namthip; Phuangrat, Bencharong; Chatchawankanphanich, Orawan; Attathom, Supat; Gajanandana, Oraprapai
2017-05-30
Tomato yellow leaf curl Thailand virus, TYLCTHV, is a begomovirus that causes severe losses of tomato crops in Thailand as well as several countries in Southeast and East Asia. The development of monoclonal antibodies (MAbs) and serological methods for detecting TYLCTHV is essential for epidemiological studies and screening for virus-resistant cultivars. The recombinant coat protein (CP) of TYLCTHV was expressed in Escherichia coli and used to generate MAbs against TYLCTHV through hybridoma technology. The MAbs were characterized and optimized to develop triple antibody sandwich enzyme-linked immunosorbent assays (TAS-ELISAs) for begomovirus detection. The efficiency of TAS-ELISAs for begomovirus detection was evaluated with tomato, pepper, eggplant, okra and cucurbit plants collected from several provinces in Thailand. Molecular identification of begomoviruses in these samples was also performed through PCR and DNA sequence analysis of the CP gene. Two MAbs (M1 and D2) were generated and used to develop TAS-ELISAs for begomovirus detection. The results of begomovirus detection in 147 field samples indicated that MAb M1 reacted with 2 begomovirus species, TYLCTHV and Tobacco leaf curl Yunnan virus (TbLCYnV), whereas MAb D2 reacted with 4 begomovirus species, TYLCTHV, TbLCYnV, Tomato leaf curl New Delhi virus (ToLCNDV) and Squash leaf curl China virus (SLCCNV). Phylogenetic analyses of CP amino acid sequences from these begomoviruses revealed that the CP sequences of begomoviruses recognized by the narrow-spectrum MAb M1 were highly conserved, sharing 93% identity with each other but only 72-81% identity with MAb M1-negative begomoviruses. The CP sequences of begomoviruses recognized by the broad-spectrum MAb D2 demonstrated a wider range of amino acid sequence identity, sharing 78-96% identity with each other and 72-91% identity with those that were not detected by MAb D2. 
TAS-ELISAs using the narrow-specificity MAb M1 proved highly efficient for the detection of TYLCTHV and TbLCYnV, whereas TAS-ELISAs using the broad-specificity MAb D2 were highly efficient for the detection of TYLCTHV, TbLCYnV, ToLCNDV and SLCCNV. Both newly developed assays allow for sensitive, inexpensive, high-throughput detection of begomoviruses in field plant samples, as well as screening for virus-resistant cultivars.
NABIC: A New Access Portal to Search, Visualize, and Share Agricultural Genomics Data.
Seol, Young-Joo; Lee, Tae-Ho; Park, Dong-Suk; Kim, Chang-Kug
2016-01-01
The National Agricultural Biotechnology Information Center developed an access portal to search, visualize, and share agricultural genomics data with a focus on South Korean information and resources. The portal features an agricultural biotechnology database containing a wide range of omics data from public and proprietary sources. We collected 28.4 TB of data from 162 agricultural organisms, with 10 types of omics data comprising next-generation sequencing sequence read archive, genome, gene, nucleotide, DNA chip, expressed sequence tag, interactome, protein structure, molecular marker, and single-nucleotide polymorphism datasets. Our genomic resources contain information on five animals, seven plants, and one fungus, which is accessed through a genome browser. We also developed a data submission and analysis system as a web service, with easy-to-use functions and cutting-edge algorithms, including those for handling next-generation sequencing data.
FoxP2 in song-learning birds and vocal-learning mammals.
Webb, D M; Zhang, J
2005-01-01
FoxP2 is the first identified gene that is specifically involved in speech and language development in humans. Population genetic studies of FoxP2 revealed a selective sweep in recent human history associated with two amino acid substitutions in exon 7. Avian song learning and human language acquisition share many behavioral and neurological similarities. To determine whether FoxP2 plays a similar role in song-learning birds, we sequenced exon 7 of FoxP2 in multiple song-learning and nonlearning birds. We show extreme conservation of FoxP2 sequences in birds, including unusually low rates of synonymous substitutions. However, no amino acid substitutions are shared between the song-learning birds and humans. Furthermore, sequences from vocal-learning whales, dolphins, and bats do not share the human-unique substitutions. While FoxP2 appears to be under strong functional constraints in mammals and birds, we find no evidence for its role during the evolution of vocal learning in nonhuman animals as in humans.
Mutual coordination strengthens the sense of joint agency in cooperative joint action.
Bolt, Nicole K; Poncelet, Evan M; Schultz, Benjamin G; Loehr, Janeen D
2016-11-01
Philosophers have proposed that when people coordinate their actions with others they may experience a sense of joint agency, or shared control over actions and their effects. However, little empirical work has investigated the sense of joint agency. In the current study, pairs coordinated their actions to produce tone sequences and then rated their sense of joint agency on a scale ranging from shared to independent control. People felt more shared than independent control overall, confirming that people experience joint agency during joint action. Furthermore, people felt stronger joint agency when they (a) produced sequences that required mutual coordination compared to sequences in which only one partner had to coordinate with the other, (b) held the role of follower compared to leader, and (c) were better coordinated with their partner. Thus, the strength of joint agency is influenced by the degree to which people mutually coordinate with each other's actions.
Evidence for Ancient Origins of Bowman-Birk Inhibitors from Selaginella moellendorffii
James, Amy M.; Jayasena, Achala S.; Zhang, Jingjing; Secco, David; Knott, Gavin J.; Whelan, James
2017-01-01
Bowman-Birk Inhibitors (BBIs) are a well-known family of plant protease inhibitors first described 70 years ago. BBIs are known only in the legume (Fabaceae) and cereal (Poaceae) families, but peptides that mimic their trypsin-inhibitory loops exist in sunflowers (Helianthus annuus) and frogs. The disparate biosynthetic origins and distant phylogenetic distribution imply these loops evolved independently, but their structural similarity suggests a common ancestor. Targeted bioinformatic searches for the BBI inhibitory loop discovered highly divergent BBI-like sequences in the seedless, vascular spikemoss Selaginella moellendorffii. Using de novo transcriptomics, we confirmed expression of five transcripts in S. moellendorffii whose encoded proteins share homology with BBI inhibitory loops. The most highly expressed, BBI3, encodes a protein that inhibits trypsin. We needed to mutate two lysine residues to abolish trypsin inhibition, suggesting BBI3’s mechanism of double-headed inhibition is shared with BBIs from angiosperms. As Selaginella belongs to the lycopod plant lineage, which diverged ∼200 to 230 million years before the common ancestor of angiosperms, its BBI-like proteins imply there was a common ancestor for legume and cereal BBIs. Indeed, we discovered BBI sequences in six angiosperm families outside the Fabaceae and Poaceae. These findings provide the evolutionary missing links between the well-known legume and cereal BBI gene families. PMID:28298518
A novel, privacy-preserving cryptographic approach for sharing sequencing data
Cassa, Christopher A; Miller, Rachel A; Mandl, Kenneth D
2013-01-01
Objective DNA samples are often processed and sequenced in facilities external to the point of collection. These samples are routinely labeled with patient identifiers or pseudonyms, allowing for potential linkage to identity and private clinical information if intercepted during transmission. We present a cryptographic scheme to securely transmit externally generated sequence data which does not require any patient identifiers, public key infrastructure, or the transmission of passwords. Materials and methods This novel encryption scheme cryptographically protects participant sequence data using a shared secret key that is derived from a unique subset of an individual’s genetic sequence. This scheme requires access to a subset of an individual’s genetic sequence to acquire full access to the transmitted sequence data, which helps to prevent sample mismatch. Results We validate that the proposed encryption scheme is robust to sequencing errors, population uniqueness, and sibling disambiguation, and provides sufficient cryptographic key space. Discussion Access to a set of an individual’s genotypes and a mutually agreed cryptographic seed is needed to unlock the full sequence, which provides additional sample authentication and authorization security. We present modest fixed and marginal costs to implement this transmission architecture. Conclusions It is possible for genomics researchers who sequence participant samples externally to protect the transmission of sequence data using unique features of an individual’s genetic sequence. PMID:23125421
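As a rough illustration of the idea described above, deriving a shared secret from a subset of an individual's genotypes plus a mutually agreed seed, the following Python sketch hashes the genotype calls at agreed markers with PBKDF2. The marker IDs, the seed value, the genotype encoding, and the choice of PBKDF2-HMAC-SHA256 are all assumptions made for illustration; the abstract does not specify the construction, and this sketch makes no attempt at the robustness to sequencing errors that the authors report.

```python
import hashlib

def derive_key(genotypes, marker_ids, seed, iterations=100_000):
    """Derive a 32-byte key from genotype calls at agreed markers.

    Hypothetical construction for illustration only; not the published scheme.
    """
    parts = []
    for marker in sorted(marker_ids):  # canonical marker order on both sides
        call = "".join(sorted(genotypes[marker]))  # "GA" and "AG" serialize alike
        parts.append(f"{marker}:{call}")
    material = "|".join(parts).encode("utf-8")
    # The agreed seed is used as the PBKDF2 salt (an assumption of this sketch).
    return hashlib.pbkdf2_hmac("sha256", material, seed, iterations)

# Hypothetical genotype calls held by the sequencing lab and the collection site.
lab_calls = {"rs0001": "GA", "rs0002": "CC", "rs0003": "TT", "rs0004": "AA"}
site_calls = {"rs0001": "AG", "rs0002": "CC", "rs0003": "TT"}
other_person = {"rs0001": "AA", "rs0002": "CC", "rs0003": "TT"}

markers = ["rs0003", "rs0001", "rs0002"]  # agreed subset of markers
seed = b"agreed-seed"                      # hypothetical shared seed

key_lab = derive_key(lab_calls, markers, seed)
key_site = derive_key(site_calls, markers, seed)
key_other = derive_key(other_person, markers, seed)
```

Both parties arrive at the same key only if they hold matching genotypes at the agreed markers, which is the property that lets the key double as a check against sample mismatch.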
Wonczak, Stephan; Thiele, Holger; Nieroda, Lech; Jabbari, Kamel; Borowski, Stefan; Sinha, Vishal; Gunia, Wilfried; Lang, Ulrich; Achter, Viktor; Nürnberg, Peter
2015-01-01
Next generation sequencing (NGS) has been a great success and is now a standard method of research in the life sciences. With this technology, dozens of whole genomes or hundreds of exomes can be sequenced in a rather short time, producing huge amounts of data. Complex bioinformatics analyses are required to turn these data into scientific findings. In order to run these analyses fast, automated workflows implemented on high performance computers are state of the art. While providing sufficient compute power and storage to meet the NGS data challenge, high performance computing (HPC) systems require special care when utilized for high throughput processing. This is especially true if the HPC system is shared by different users. Here, stability, robustness and maintainability are as important for automated workflows as speed and throughput. To achieve all of these aims, dedicated solutions have to be developed. In this paper, we present the tricks and twists that we utilized in the implementation of our exome data processing workflow. It may serve as a guideline for other high throughput data analysis projects using a similar infrastructure. The code implementing our solutions is provided in the supporting information files. PMID:25942438
[Cloning and characterization of Caveolin-1 gene in pigeon, Columba livia domestica].
Zhang, Ying; Yu, Jian-Feng; Yang, Li; Wang, Xing-Guo; Gu, Zhi-Liang
2010-10-01
Caveolins, the principal structural proteins of caveolae in the plasmalemma, are encoded by the caveolin gene family, of which caveolin-1 is a member. In the present study, a full-length 2605 bp caveolin-1 cDNA sequence from Columba livia domestica, which includes a complete 537 bp ORF encoding a putative 178-amino-acid peptide, was obtained using RT-PCR and RACE. The Columba livia domestica caveolin-1 CDS shared 80.1%-93.4% homology with those of Bos taurus, Canis lupus familiaris, Gallus gallus and Rattus norvegicus, and the putative amino acid sequence shared 85.4%-97.2% homology with those of the same species. Semi-quantitative RT-PCR revealed that caveolin-1 expression was detectable in all Columba livia domestica tissues, with high expression in adipose tissue, medium expression in various muscles, and low expression in liver. These results suggest that caveolin-1 may be involved in some metabolic pathways in adipose tissue and muscle.
Secure distributed genome analysis for GWAS and sequence comparison computation.
Zhang, Yihua; Blanton, Marina; Almashaqbeh, Ghada
2015-01-01
The rapid increase in the availability and volume of genomic data makes significant advances in biomedical research possible, but sharing of genomic data poses challenges due to the highly sensitive nature of such data. To address the challenges, a competition for secure distributed processing of genomic data was organized by the iDASH research center. In this work we propose techniques for securing computation with real-life genomic data for minor allele frequency and chi-squared statistics computation, as well as distance computation between two genomic sequences, as specified by the iDASH competition tasks. We put forward novel optimizations, including a generalization of a version of mergesort, which might be of independent interest. We provide implementation results of our techniques based on secret sharing that demonstrate practicality of the suggested protocols and also report on performance improvements due to our optimization techniques. This work describes our techniques, findings, and experimental results developed and obtained as part of iDASH 2015 research competition to secure real-life genomic computations and shows feasibility of securely computing with genomic data in practice.
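To make the secret-sharing idea concrete, here is a minimal Python sketch of additive secret sharing used to pool allele counts across data owners and compute a minor allele frequency. It is a toy under stated assumptions, not the protocol from the paper: the modulus, the three-party setup, and the function names are hypothetical, and only the aggregate sums are revealed before the final division.

```python
import secrets

PRIME = 2**61 - 1  # modulus for the shares (arbitrary choice for this sketch)

def share(value, n_parties):
    """Split value into n_parties additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine additive shares into the original value."""
    return sum(shares) % PRIME

def secure_sum(private_values, n_parties=3):
    """Sum private inputs so that no computing party sees an individual input."""
    per_party = [0] * n_parties
    for value in private_values:
        for i, s in enumerate(share(value, n_parties)):
            per_party[i] = (per_party[i] + s) % PRIME  # each party adds locally
    return reconstruct(per_party)  # only the total is opened

def minor_allele_frequency(alt_allele_counts, chromosome_counts):
    """MAF at one site from per-owner counts, pooled via secure_sum."""
    freq = secure_sum(alt_allele_counts) / secure_sum(chromosome_counts)
    return min(freq, 1 - freq)

# Hypothetical per-institution counts at a single site.
alt_counts = [30, 45, 25]
chrom_counts = [200, 300, 100]
maf = minor_allele_frequency(alt_counts, chrom_counts)
```

Each individual share is uniformly random on its own, so a single computing party learns nothing about any one institution's count; only the sum of all shares is meaningful.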
Secure distributed genome analysis for GWAS and sequence comparison computation
2015-01-01
Background The rapid increase in the availability and volume of genomic data makes significant advances in biomedical research possible, but sharing of genomic data poses challenges due to the highly sensitive nature of such data. To address the challenges, a competition for secure distributed processing of genomic data was organized by the iDASH research center. Methods In this work we propose techniques for securing computation with real-life genomic data for minor allele frequency and chi-squared statistics computation, as well as distance computation between two genomic sequences, as specified by the iDASH competition tasks. We put forward novel optimizations, including a generalization of a version of mergesort, which might be of independent interest. Results We provide implementation results of our techniques based on secret sharing that demonstrate practicality of the suggested protocols and also report on performance improvements due to our optimization techniques. Conclusions This work describes our techniques, findings, and experimental results developed and obtained as part of iDASH 2015 research competition to secure real-life genomic computations and shows feasibility of securely computing with genomic data in practice. PMID:26733307
Insight into the evolution of the Solanaceae from the parental genomes of Petunia hybrida.
Bombarely, Aureliano; Moser, Michel; Amrad, Avichai; Bao, Manzhu; Bapaume, Laure; Barry, Cornelius S; Bliek, Mattijs; Boersma, Maaike R; Borghi, Lorenzo; Bruggmann, Rémy; Bucher, Marcel; D'Agostino, Nunzio; Davies, Kevin; Druege, Uwe; Dudareva, Natalia; Egea-Cortines, Marcos; Delledonne, Massimo; Fernandez-Pozo, Noe; Franken, Philipp; Grandont, Laurie; Heslop-Harrison, J S; Hintzsche, Jennifer; Johns, Mitrick; Koes, Ronald; Lv, Xiaodan; Lyons, Eric; Malla, Diwa; Martinoia, Enrico; Mattson, Neil S; Morel, Patrice; Mueller, Lukas A; Muhlemann, Joëlle; Nouri, Eva; Passeri, Valentina; Pezzotti, Mario; Qi, Qinzhou; Reinhardt, Didier; Rich, Melanie; Richert-Pöggeler, Katja R; Robbins, Tim P; Schatz, Michael C; Schranz, M Eric; Schuurink, Robert C; Schwarzacher, Trude; Spelt, Kees; Tang, Haibao; Urbanus, Susan L; Vandenbussche, Michiel; Vijverberg, Kitty; Villarino, Gonzalo H; Warner, Ryan M; Weiss, Julia; Yue, Zhen; Zethof, Jan; Quattrocchio, Francesca; Sims, Thomas L; Kuhlemeier, Cris
2016-05-27
Petunia hybrida is a popular bedding plant that has a long history as a genetic model system. We report the whole-genome sequencing and assembly of inbred derivatives of its two wild parents, P. axillaris N and P. inflata S6. The assemblies include 91.3% and 90.2% coverage of their diploid genomes (1.4 Gb; 2n = 14) containing 32,928 and 36,697 protein-coding genes, respectively. The genomes reveal that the Petunia lineage has experienced at least two rounds of hexaploidization: the older gamma event, which is shared with most Eudicots, and a more recent Solanaceae event that is shared with tomato and other solanaceous species. Transcription factors involved in the shift from bee to moth pollination reside in particularly dynamic regions of the genome, which may have been key to the remarkable diversity of floral colour patterns and pollination systems. The high-quality genome sequences will enhance the value of Petunia as a model system for research on unique biological phenomena such as small RNAs, symbiosis, self-incompatibility and circadian rhythms.
Tool-assisted rhythmic drumming in palm cockatoos shares key elements of human instrumental music
Heinsohn, Robert; Zdenek, Christina N.; Cunningham, Ross B.; Endler, John A.; Langmore, Naomi E.
2017-01-01
All human societies have music with a rhythmic “beat,” typically produced with percussive instruments such as drums. The set of capacities that allows humans to produce and perceive music appears to be deeply rooted in human biology, but an understanding of its evolutionary origins requires cross-taxa comparisons. We show that drumming by palm cockatoos (Probosciger aterrimus) shares the key rudiments of human instrumental music, including manufacture of a sound tool, performance in a consistent context, regular beat production, repeated components, and individual styles. Over 131 drumming sequences produced by 18 males, the beats occurred at nonrandom, regular intervals, yet individual males differed significantly in the shape parameters describing the distribution of their beat patterns, indicating individual drumming styles. Autocorrelation analyses of the longest drumming sequences further showed that they were highly regular and predictable like human music. These discoveries provide a rare comparative perspective on the evolution of rhythmicity and instrumental music in our own species, and show that a preference for a regular beat can have other origins before being co-opted into group-based music and dance. PMID:28782005
Centralized Planning for Multiple Exploratory Robots
NASA Technical Reports Server (NTRS)
Estlin, Tara; Rabideau, Gregg; Chien, Steve; Barrett, Anthony
2005-01-01
A computer program automatically generates plans for a group of robotic vehicles (rovers) engaged in geological exploration of terrain. The program rapidly generates multiple command sequences that can be executed simultaneously by the rovers. Starting from a set of high-level goals, the program creates a sequence of commands for each rover while respecting hardware constraints and limitations on resources of each rover and of hardware (e.g., a radio communication terminal) shared by all the rovers. First, a separate model of each rover is loaded into a centralized planning subprogram. The centralized planning software uses the models of the rovers plus an iterative repair algorithm to resolve conflicts posed by demands for resources and by constraints associated with all the rovers and the shared hardware. During repair, heuristics are used to make planning decisions that yield better solutions, found faster, than would otherwise be possible. In particular, techniques from prior solutions of the multiple-traveling-salesmen problem are used as heuristics to generate plans in which the paths taken by the rovers to assigned scientific targets are shorter than they would otherwise be.
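The iterative repair idea (start from a possibly conflicting plan, find a conflict, apply a local fix, repeat) can be sketched in a few lines of Python. This is a toy under stated assumptions, not the NASA planner: the single shared radio, the integer time slots, the rover names, and the "nearest free slot" heuristic are all hypothetical.

```python
def find_conflict(schedule):
    """Return a pair of rovers that claim the same radio slot, or None."""
    seen = {}
    for rover, slot in schedule.items():
        if slot in seen:
            return seen[slot], rover
        seen[slot] = rover
    return None

def iterative_repair(schedule, n_slots, max_iters=100):
    """Repair a plan in which several rovers may want the shared radio at once."""
    schedule = dict(schedule)  # work on a copy of the initial plan
    for _ in range(max_iters):
        conflict = find_conflict(schedule)
        if conflict is None:
            return schedule  # conflict-free plan
        _, rover = conflict
        used = set(schedule.values())
        free = [s for s in range(n_slots) if s not in used]
        if not free:
            raise ValueError("no free slot left for the shared radio")
        # Heuristic (assumed): move the rover to the free slot nearest its request.
        schedule[rover] = min(free, key=lambda s: abs(s - schedule[rover]))
    raise RuntimeError("repair did not converge")

# Hypothetical initial plan: rover1 and rover2 both want the radio in slot 0.
initial_plan = {"rover1": 0, "rover2": 0, "rover3": 1}
repaired = iterative_repair(initial_plan, n_slots=4)
```

The heuristic is where domain knowledge enters: a planner of the kind described would choose repairs that also shorten rover paths, whereas this sketch only minimizes the shift in the time slot.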
Cis-acting elements in the promoter region of the human aldolase C gene.
Buono, P; de Conciliis, L; Olivetta, E; Izzo, P; Salvatore, F
1993-08-16
We investigated the cis-acting sequences involved in the expression of the human aldolase C gene by transient transfections into human neuroblastoma cells (SKNBE). We demonstrate that 420 bp of the 5'-flanking DNA direct at high efficiency the transcription of the CAT reporter gene. A deletion between -420 bp and -164 bp causes a 60% decrease of CAT activity. Gel shift and DNase I footprinting analyses revealed four protected elements: A, B, C and D. Competition analyses indicate that Sp1 or factors sharing a similar sequence specificity bind to elements A and B, but not to elements C and D. Sequence analysis shows a half palindromic ERE motif (GGTCA), in elements B and D. Region D binds a transactivating factor which appears also essential to stabilize the initiation complex.
The Genome Sequence of Taurine Cattle: A window to ruminant biology and evolution
Elsik, Christine G.; Tellam, Ross L.; Worley, Kim C.
2010-01-01
To understand the biology and evolution of ruminants, the cattle genome was sequenced to ∼7× coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1,217 are absent or undetected in non-eutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides an enabling resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production. PMID:19390049
Manrique, Pilar; Bolduc, Benjamin; Walk, Seth T.; van der Oost, John; de Vos, Willem M.; Young, Mark J.
2016-01-01
The role of bacteriophages in influencing the structure and function of the healthy human gut microbiome is unknown. With few exceptions, previous studies have found a high level of heterogeneity in bacteriophages from healthy individuals. To better estimate and identify the shared phageome of humans, we analyzed a deep DNA sequence dataset of active bacteriophages and available metagenomic datasets of the gut bacteriophage community from healthy individuals. We found 23 shared bacteriophages in more than one-half of 64 healthy individuals from around the world. These shared bacteriophages were found in a significantly smaller percentage of individuals with gastrointestinal/irritable bowel disease. A network analysis identified 44 bacteriophage groups of which 9 (20%) were shared in more than one-half of all 64 individuals. These results provide strong evidence of a healthy gut phageome (HGP) in humans. The bacteriophage community in the human gut is a mixture of three classes: a set of core bacteriophages shared among more than one-half of all people, a common set of bacteriophages found in 20–50% of individuals, and a set of bacteriophages that are either rarely shared or unique to a person. We propose that the core and common bacteriophage communities are globally distributed and comprise the HGP, which plays an important role in maintaining gut microbiome structure/function and thereby contributes significantly to human health. PMID:27573828
Manrique, Pilar; Bolduc, Benjamin; Walk, Seth T; van der Oost, John; de Vos, Willem M; Young, Mark J
2016-09-13
The role of bacteriophages in influencing the structure and function of the healthy human gut microbiome is unknown. With few exceptions, previous studies have found a high level of heterogeneity in bacteriophages from healthy individuals. To better estimate and identify the shared phageome of humans, we analyzed a deep DNA sequence dataset of active bacteriophages and available metagenomic datasets of the gut bacteriophage community from healthy individuals. We found 23 shared bacteriophages in more than one-half of 64 healthy individuals from around the world. These shared bacteriophages were found in a significantly smaller percentage of individuals with gastrointestinal/irritable bowel disease. A network analysis identified 44 bacteriophage groups of which 9 (20%) were shared in more than one-half of all 64 individuals. These results provide strong evidence of a healthy gut phageome (HGP) in humans. The bacteriophage community in the human gut is a mixture of three classes: a set of core bacteriophages shared among more than one-half of all people, a common set of bacteriophages found in 20-50% of individuals, and a set of bacteriophages that are either rarely shared or unique to a person. We propose that the core and common bacteriophage communities are globally distributed and comprise the HGP, which plays an important role in maintaining gut microbiome structure/function and thereby contributes significantly to human health.
A strategy for detecting the conservation of folding-nucleus residues in protein superfamilies.
Michnick, S W; Shakhnovich, E
1998-01-01
Nucleation-growth theory predicts that fast-folding peptide sequences fold to their native structure via structures in a transition-state ensemble that share a small number of native contacts (the folding nucleus). Experimental and theoretical studies of proteins suggest that residues participating in folding nuclei are conserved among homologs. We attempted to determine if this is true in proteins with highly diverged sequences but identical folds (superfamilies). We describe a strategy based on comparisons of residue conservation in natural superfamily sequences with simulated sequences (generated with a Monte-Carlo sequence design strategy) for the same proteins. The basic assumptions of the strategy were that natural sequences will conserve residues needed for folding and stability plus function, the simulated sequences contain no functional conservation, and nucleus residues make native contacts with each other. Based on these assumptions, we identified seven potential nucleus residues in ubiquitin superfamily members. Non-nucleus conserved residues were also identified; these are proposed to be involved in stabilizing native interactions. We found that all superfamily members conserved the same potential nucleus residue positions, except those for which the structural topology is significantly different. Our results suggest that the conservation of the nucleus of a specific fold can be predicted by comparing designed simulated sequences with natural highly diverged sequences that fold to the same structure. We suggest that such a strategy could be used to help plan protein folding and design experiments, to identify new superfamily members, and to subdivide superfamilies further into classes having a similar folding mechanism.
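The comparison at the heart of this strategy can be illustrated with a short Python sketch that scores per-column conservation in two alignments and intersects the conserved positions. Everything specific here is an assumption for illustration: Shannon entropy as the conservation measure, the 0.5-bit threshold, and the toy alignments are not taken from the paper, and the authors' additional requirement that nucleus residues contact each other in the native structure is not modeled.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())

def conserved_positions(alignment, max_entropy=0.5):
    """Indices of columns whose entropy is at or below the threshold."""
    return {
        i
        for i, column in enumerate(zip(*alignment))
        if column_entropy(column) <= max_entropy
    }

# Toy, gap-free alignments (hypothetical sequences of equal length).
natural = ["MKVLA", "MRVLG", "MKVIA", "MQVLS"]
simulated = ["AKVLD", "SEVLN", "TRVLK", "GDVLQ"]

natural_conserved = conserved_positions(natural)
simulated_conserved = conserved_positions(simulated)

# Conserved in both sets: candidates for folding/stability (possible nucleus).
nucleus_candidates = natural_conserved & simulated_conserved
# Conserved only in natural sequences: more likely functional conservation.
function_candidates = natural_conserved - simulated_conserved
```

Because the simulated sequences carry no functional constraint, positions conserved in both sets are the ones the stated assumptions attribute to folding and stability rather than function.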
Pirooznia, Mehdi; Gong, Ping; Guan, Xin; Inouye, Laura S; Yang, Kuan; Perkins, Edward J; Deng, Youping
2007-01-01
Background Eisenia fetida, commonly known as red wiggler or compost worm, belongs to the Lumbricidae family of the Annelida phylum. Little is known about its genome sequence although it has been extensively used as a test organism in terrestrial ecotoxicology. In order to understand its gene expression response to environmental contaminants, we cloned 4032 cDNAs or expressed sequence tags (ESTs) from two E. fetida libraries enriched with genes responsive to ten ordnance related compounds using suppressive subtractive hybridization-PCR. Results A total of 3144 good quality ESTs (GenBank dbEST accession number EH669363–EH672369 and EL515444–EL515580) were obtained from the raw clone sequences after cleaning. Clustering analysis yielded 2231 unique sequences including 448 contigs (from 1361 ESTs) and 1783 singletons. Comparative genomic analysis showed that 743 or 33% of the unique sequences shared high similarity with existing genes in the GenBank nr database. Provisional function annotation assigned 830 Gene Ontology terms to 517 unique sequences based on their homology with the annotated genomes of four model organisms Drosophila melanogaster, Mus musculus, Saccharomyces cerevisiae, and Caenorhabditis elegans. Seven percent of the unique sequences were further mapped to 99 Kyoto Encyclopedia of Genes and Genomes pathways based on their matching Enzyme Commission numbers. All the information is stored in and retrievable from a high-performance, web-based and user-friendly relational database called EST model database or ESTMD version 2. Conclusion The ESTMD containing the sequence and annotation information of 4032 E. fetida ESTs is publicly accessible at . PMID:18047730
Complete Genome Sequence and Comparative Analysis of the Fish Pathogen Lactococcus garvieae
Oshima, Kenshiro; Yoshizaki, Mariko; Kawanishi, Michiko; Nakaya, Kohei; Suzuki, Takehito; Miyauchi, Eiji; Ishii, Yasuo; Tanabe, Soichi; Murakami, Masaru; Hattori, Masahira
2011-01-01
Lactococcus garvieae causes fatal haemorrhagic septicaemia in fish such as yellowtail. The comparative analysis of genomes of a virulent strain Lg2 and a non-virulent strain ATCC 49156 of L. garvieae revealed that the two strains shared a high degree of sequence identity, but Lg2 had a 16.5-kb capsule gene cluster that is absent in ATCC 49156. The capsule gene cluster was composed of 15 genes, of which eight genes are highly conserved with those in exopolysaccharide biosynthesis gene cluster often found in Lactococcus lactis strains. Sequence analysis of the capsule gene cluster in the less virulent strain L. garvieae Lg2-S, Lg2-derived strain, showed that two conserved genes were disrupted by a single base pair deletion, respectively. These results strongly suggest that the capsule is crucial for virulence of Lg2. The capsule gene cluster of Lg2 may be a genomic island from several features such as the presence of insertion sequences flanked on both ends, different GC content from the chromosomal average, integration into the locus syntenic to other lactococcal genome sequences, and distribution in human gut microbiomes. The analysis also predicted other potential virulence factors such as haemolysin. The present study provides new insights into understanding of the virulence mechanisms of L. garvieae in fish. PMID:21829716
Gubser, Caroline; Smith, Geoffrey L
2002-04-01
Camelpox virus (CMPV) and variola virus (VAR) are orthopoxviruses (OPVs) that share several biological features and cause high mortality and morbidity in their single host species. The sequence of a virulent CMPV strain was determined; it is 202182 bp long, with inverted terminal repeats (ITRs) of 6045 bp and has 206 predicted open reading frames (ORFs). As for other poxviruses, the genes are tightly packed with little non-coding sequence. Most genes within 25 kb of each terminus are transcribed outwards towards the terminus, whereas genes within the centre of the genome are transcribed from either DNA strand. The central region of the genome contains genes that are highly conserved in other OPVs and 87 of these are conserved in all sequenced chordopoxviruses. In contrast, genes towards either terminus are more variable and encode proteins involved in host range, virulence or immunomodulation. In some cases, these are broken versions of genes found in other OPVs. The relationship of CMPV to other OPVs was analysed by comparisons of DNA and predicted protein sequences, repeats within the ITRs and arrangement of ORFs within the terminal regions. Each comparison gave the same conclusion: CMPV is the closest known virus to variola virus, the cause of smallpox.
Zhu, Dan-Tong; Xia, Wen-Qiang; Rao, Qiong; Liu, Shu-Sheng; Ghanim, Murad; Wang, Xiao-Wei
2016-08-01
The whitefly, Bemisia tabaci, harbors the primary symbiont 'Candidatus Portiera aleyrodidarum' and a variety of secondary symbionts. Among these secondary symbionts, Rickettsia is the only one that can be detected both inside and outside the bacteriomes. Infection with Rickettsia has been reported to influence several aspects of the whitefly biology, such as fitness, sex ratio, virus transmission and resistance to pesticides. However, mechanisms underlying these differences remain unclear, largely due to the lack of genomic information of Rickettsia. In this study, we sequenced the genome of two Rickettsia strains isolated from the Middle East Asia Minor 1 (MEAM1) species of the B. tabaci complex in China and Israel. Both Rickettsia genomes were of high coding density and AT-rich, containing more than 1000 coding sequences, much larger than that of the coexisted primary symbiont, Portiera. Moreover, the two Rickettsia strains isolated from China and Israel shared most of the genes with 100% identity and only nine genes showed sequence differences. The phylogenetic analysis using orthologs shared in the genus, inferred the proximity of Rickettsia in MEAM1 and Rickettsia bellii. Functional analysis revealed that Rickettsia was unable to synthesize amino acids required for complementing the whitefly nutrition. Besides, a type IV secretion system and a number of virulence-related genes were detected in the Rickettsia genome. The presence of virulence-related genes might benefit the symbiotic life of the bacteria, and hint on potential effects of Rickettsia on whiteflies. The genome sequences of Rickettsia provided a basis for further understanding the function of Rickettsia in whiteflies. © 2016 Institute of Zoology, Chinese Academy of Sciences.
Desta, Adey Feleke; Assefa, Fassil; Leta, Seyoum; Stomeo, Francesca; Wamalwa, Mark; Njahira, Moses; Appolinaire, Djikeng
2014-01-01
A culture-independent approach was used to elucidate the microbial diversity and structure in the anaerobic-aerobic reactors integrated with a constructed wetland for the treatment of tannery wastewater in Modjo town, Ethiopia. The system has been running with removal efficiencies ranging from 94%–96% for COD, 91%–100% for SO₄²⁻ and S²⁻, 92%–94% for BOD, 56%–82% for total Nitrogen and 2%–90% for NH₃-N. 16S rRNA gene clone libraries were constructed and microbial community assemblies were determined by analysis of a total of 801 unique clone sequences from all the sites. Operational Taxonomic Unit (OTU) - based analysis of the sequences revealed highly diverse communities in each of the reactors and the constructed wetland. A total of 32 phylotypes were identified with the dominant members affiliated to Clostridia (33%), Betaproteobacteria (10%), Bacteroidia (10%), Deltaproteobacteria (9%) and Gammaproteobacteria (6%). Sequences affiliated to the class Clostridia were the most abundant across all sites. The 801 sequences were assigned to 255 OTUs, of which 3 OTUs were shared among the clone libraries from all sites. The shared OTUs comprised 80 sequences belonging to Clostridiales Family XIII Incertae Sedis, Bacteroidetes and unclassified bacterial group. Significantly different communities were harbored by the anaerobic, aerobic and rhizosphere sites of the constructed wetland. Numerous representative genera of the dominant bacterial classes obtained from the different sample sites of the integrated system have been implicated in the removal of various carbon-containing pollutants of natural and synthetic origins. To our knowledge, this is the first report of microbial community structure in tannery wastewater treatment plant from Ethiopia. PMID:25541981
Characterizing partial AZFc deletions of the Y chromosome with amplicon-specific sequence markers
Navarro-Costa, Paulo; Pereira, Luísa; Alves, Cíntia; Gusmão, Leonor; Proença, Carmen; Marques-Vidal, Pedro; Rocha, Tiago; Correia, Sónia C; Jorge, Sónia; Neves, António; Soares, Ana P; Nunes, Joaquim; Calhaz-Jorge, Carlos; Amorim, António; Plancha, Carlos E; Gonçalves, João
2007-01-01
Background The AZFc region of the human Y chromosome is a highly recombinogenic locus containing multi-copy male fertility genes located in repeated DNA blocks (amplicons). These AZFc gene families exhibit slight sequence variations between copies which are considered to have functional relevance. Yet, partial AZFc deletions yield phenotypes ranging from normospermia to azoospermia, thwarting definite conclusions on their real impact on fertility. Results The amplicon content of partial AZFc deletion products was characterized with novel amplicon-specific sequence markers. Data indicate that partial AZFc deletions are a male infertility risk [odds ratio: 5.6 (95% CI: 1.6–30.1)] and although high diversity of partial deletion products and sequence conversion profiles were recorded, the AZFc marker profiles detected in fertile men were also observed in infertile men. Additionally, the assessment of rearrangement recurrence by Y-lineage analysis indicated that while partial AZFc deletions occurred in highly diverse samples, haplotype diversity was minimal in fertile men sharing identical marker profiles. Conclusion Although partial AZFc deletion products are highly heterogeneous in terms of amplicon content, this plasticity is not sufficient to account for the observed phenotypical variance. The lack of causative association between the deletion of specific gene copies and infertility suggests that AZFc gene content might be part of a multifactorial network, with Y-lineage evolution emerging as a possible phenotype modulator. PMID:17903263
A Glimpse into the Satellite DNA Library in Characidae Fish (Teleostei, Characiformes)
Utsunomia, Ricardo; Ruiz-Ruano, Francisco J.; Silva, Duílio M. Z. A.; Serrano, Érica A.; Rosa, Ivana F.; Scudeler, Patrícia E. S.; Hashimoto, Diogo T.; Oliveira, Claudio; Camacho, Juan Pedro M.; Foresti, Fausto
2017-01-01
Satellite DNA (satDNA) is an abundant fraction of repetitive DNA in eukaryotic genomes and plays an important role in genome organization and evolution. In general, satDNA sequences follow a concerted evolutionary pattern through the intragenomic homogenization of different repeat units. In addition, the satDNA library hypothesis predicts that related species share a series of satDNA variants descended from a common ancestor species, with differential amplification of different satDNA variants. The finding of the same satDNA family in species belonging to different genera within Characidae fish provided the opportunity to test both the concerted evolution and library hypotheses. For this purpose, we analyzed sequence variation and abundance of this satDNA family in ten species by a combination of next generation sequencing (NGS), PCR and Sanger sequencing, and fluorescence in situ hybridization (FISH). We found extensive between-species variation in the number and size of pericentromeric FISH signals. At the genomic level, the analysis of thousands of DNA sequences obtained by Illumina sequencing and PCR amplification allowed us to define 150 haplotypes, which were linked in a common minimum spanning tree where different patterns of concerted evolution were apparent. This also provided a glimpse into the satDNA library of this group of species. Consistent with the library hypothesis, different variants of this satDNA showed large differences in abundance between species, from highly abundant to merely relictual variants. PMID:28855916
Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations.
Neuwald, Andrew F; Altschul, Stephen F
2016-12-01
Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally-relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes' theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu).
2014-01-01
Background The advent of the human genome sequencing project has led to a spurt in the number of protein sequences in the databanks. The success of structure-based drug discovery hinges on the availability of structures. Despite significant progress in experimental protein structure determination, the sequence-structure gap is continually widening. Data-driven, homology-based computational methods have proved successful in predicting tertiary structures for sequences sharing medium to high sequence similarity. As the similarity of query sequences dwindles, advanced homology/ab initio hybrid approaches are being explored to solve the structure prediction problem. Here we describe Bhageerath-H, a homology/ab initio hybrid software/server for predicting protein tertiary structures, with advancing drug design attempts as one of the goals. Results The Bhageerath-H web server was validated on 75 CASP10 targets, which showed TM-scores ≥0.5 in 91% of the cases and Cα RMSDs ≤5Å from the native in 58% of the targets, which is well above the CASP10 water mark. Comparison with some leading servers demonstrated the uniqueness of the hybrid methodology in effectively sampling conformational space, scoring best decoys and refining low resolution models to high and medium resolution. Conclusion The Bhageerath-H methodology is web enabled for the scientific community as a freely accessible web server. The methodology is fielded in the on-going CASP11 experiment. PMID:25521245
Lim, Shu Yong; Yap, Kien-Pong; Thong, Kwai Lin
2016-01-01
Listeria monocytogenes is an important foodborne pathogen that causes considerable morbidity in humans with high mortality rates. In this study, we have sequenced the genomes and performed comparative genomics analyses on two strains, LM115 and LM41, isolated from ready-to-eat food in Malaysia. The genome sizes of LM115 and LM41 were 2,959,041 and 2,963,111 bp, respectively. These two strains shared approximately 90% homologous genes. Comparative genomics and phylogenomic analyses revealed that LM115 and LM41 were more closely related to the reference strains F2365 and EGD-e, respectively. Our virulence profiling indicated a total of 31 virulence genes shared by both analysed strains. These shared genes included those that encode internalins and L. monocytogenes pathogenicity island 1 (LIPI-1). Both Malaysian L. monocytogenes strains also harboured several genes associated with stress tolerance to counter adverse conditions. Seven antibiotic and efflux pump related genes, which may confer resistance against lincomycin, erythromycin, fosfomycin, quinolone, tetracycline, penicillin, and macrolides, were identified in the genomes of both strains. Whole genome sequencing and comparative genomics analyses revealed two virulent L. monocytogenes strains isolated from ready-to-eat foods in Malaysia. The identification of strains with pathogenic, persistent, and antibiotic-resistant potential from minimally processed food warrants close attention from both the healthcare and food industries.
How strong are passwords used to protect personal health information in clinical trials?
El Emam, Khaled; Moreau, Katherine; Jonker, Elizabeth
2011-02-11
Findings and statements about how securely personal health information is managed in clinical research are mixed. The objective of our study was to evaluate the security of practices used to transfer and share sensitive files in clinical trials. Two studies were performed. First, 15 password-protected files that were transmitted by email during regulated Canadian clinical trials were obtained. Commercial password recovery tools were used on these files to try to crack their passwords. Second, interviews with 20 study coordinators were conducted to understand file-sharing practices in clinical trials for files containing personal health information. We were able to crack the passwords for 93% of the files (14/15). Among these, 13 files contained thousands of records with sensitive health information on trial participants. The passwords tended to be relatively weak, using common names of locations, animals, car brands, and obvious numeric sequences. Patient information is commonly shared by email in the context of query resolution. Files containing personal health information are shared by email and, by posting them on shared drives with common passwords, to facilitate collaboration. If files containing sensitive patient information must be transferred by email, mechanisms to encrypt them and to ensure that password strength is high are necessary. More sophisticated collaboration tools are required to allow file sharing without password sharing. We provide recommendations to implement these practices.
How Strong are Passwords Used to Protect Personal Health Information in Clinical Trials?
Moreau, Katherine; Jonker, Elizabeth
2011-01-01
Background Findings and statements about how securely personal health information is managed in clinical research are mixed. Objective The objective of our study was to evaluate the security of practices used to transfer and share sensitive files in clinical trials. Methods Two studies were performed. First, 15 password-protected files that were transmitted by email during regulated Canadian clinical trials were obtained. Commercial password recovery tools were used on these files to try to crack their passwords. Second, interviews with 20 study coordinators were conducted to understand file-sharing practices in clinical trials for files containing personal health information. Results We were able to crack the passwords for 93% of the files (14/15). Among these, 13 files contained thousands of records with sensitive health information on trial participants. The passwords tended to be relatively weak, using common names of locations, animals, car brands, and obvious numeric sequences. Patient information is commonly shared by email in the context of query resolution. Files containing personal health information are shared by email and, by posting them on shared drives with common passwords, to facilitate collaboration. Conclusion If files containing sensitive patient information must be transferred by email, mechanisms to encrypt them and to ensure that password strength is high are necessary. More sophisticated collaboration tools are required to allow file sharing without password sharing. We provide recommendations to implement these practices. PMID:21317106
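One way to see why dictionary-style passwords fall quickly to recovery tools is to compare the size of the space an attacker has to search. The sketch below is only an illustration: the word-list size and the 8-character alphanumeric policy are assumed figures, not values from the study, and `entropy_bits` is a hypothetical helper.

```python
import math

def entropy_bits(search_space_size):
    """Bits of entropy for a secret drawn uniformly from a space of this size."""
    return math.log2(search_space_size)

# Hypothetical figures, chosen only for illustration (not from the study).
dictionary_size = 100_000   # assumed word list of place names, animals, car brands
random_space = 62 ** 8      # 8 characters drawn from a-z, A-Z and 0-9

print(f"dictionary word: {entropy_bits(dictionary_size):.1f} bits")
print(f"random 8 chars:  {entropy_bits(random_space):.1f} bits")
```

Under these assumptions, the gap of roughly thirty bits means the random password is about a billion times more expensive to guess than a single dictionary word.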
Gotzes, F; Balfanz, S; Baumann, A
1994-01-01
Members of the superfamily of G-protein coupled receptors share significant similarities in sequence and transmembrane architecture. We have isolated a Drosophila homologue of the mammalian dopamine receptor family using a low stringency hybridization approach. The deduced amino acid sequence is approximately 70% homologous to the human D1/D5 receptors. When expressed in HEK 293 cells, the Drosophila receptor stimulates cAMP production in response to dopamine application. This effect was mimicked by SKF 38393, a specific D1 receptor agonist, but inhibited by dopaminergic antagonists such as butaclamol and flupentixol. In situ hybridization revealed that the Drosophila dopamine receptor is highly expressed in the somata of the optic lobes. This suggests that the receptor might be involved in the processing of visual information and/or visual learning in invertebrates.
Host Cell Virus Entry Mediated by Australian Bat Lyssavirus Envelope G glycoprotein
2013-10-24
39 Figure 7. Comparison of the amino acid sequences of Saccolaimus and Pteropus ABLV G mature protein... sequence analysis revealed that the PCR products were identical. Sequence comparisons of the ABLV N and other lyssavirus N proteins showed that ABLV...Saccolaimus flaviventris) (129). Nucleoprotein sequence comparisons revealed that the Saccolaimus N protein shared 96% amino acid homology with the Pteropus
Porcine insulin receptor substrate 4 (IRS4) gene: cloning, polymorphism and association study
USDA-ARS?s Scientific Manuscript database
Using PCR and IPCR techniques we obtained a 4498 bp nucleotide sequence FN424076 encompassing the complete coding sequence of the porcine IRS4 gene and its proximal promoter. The 1269-amino acid porcine protein deduced from the nucleotide sequence shares 92% identity with the human IRS4 and possesse...
Garcia-Fernàndez, J; Bayascas-Ramírez, J R; Marfany, G; Muñoz-Mármol, A M; Casali, A; Baguñà, J; Saló, E
1995-05-01
Several DNA sequences similar to the mariner element were isolated and characterized in the platyhelminthe Dugesia (Girardia) tigrina. They were 1,288 bp long, flanked by two 32 bp-inverted repeats, and contained a single 339 amino acid open-reading frame (ORF) encoding the transposase. The number of copies of this element is approximately 8,000 per haploid genome, constituting a member of the middle-repetitive DNA of Dugesia tigrina. Sequence analysis of several elements showed a high percentage of conservation between the different copies. Most of them presented an intact ORF and the standard signals of actively expressed genes, which suggests that some of them are or have recently been functional transposons. The high degree of similarity shared with other mariner elements from some arthropods, together with the fact that this element is undetectable in other planarian species, strongly suggests a case of horizontal transfer between these two distant phyla.
Langevin synchronization in a time-dependent, harmonic basin: An exact solution in 1D
NASA Astrophysics Data System (ADS)
Cadilhe, A.; Voter, Arthur F.
2018-02-01
The trajectories of two particles undergoing Langevin dynamics while sharing a common noise sequence can merge into a single (master) trajectory. Here, we present an exact solution for a particle undergoing Langevin dynamics in a harmonic, time-dependent potential, thus extending the idea of synchronization to nonequilibrium systems. We calculate the synchronization level, i.e., the mismatch between two trajectories sharing a common noise sequence, in the underdamped, critically damped, and overdamped regimes. Finally, we provide asymptotic expansions in various limiting cases and compare to the time independent case.
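The merging of trajectories under a shared noise sequence can be illustrated numerically. The sketch below is not the exact solution described in the abstract: it assumes the overdamped limit only, a simple Euler-Maruyama integrator, and a hypothetical time-dependent stiffness k(t) = 1 + 0.5 sin t; all parameter values and the function name `simulate_pair` are illustrative.

```python
import math
import random

def simulate_pair(x1, x2, steps=2000, dt=1e-3, gamma=1.0, kT=1.0, seed=0):
    """Integrate two overdamped Langevin particles in the harmonic well
    k(t) * x**2 / 2, both driven by the SAME noise sequence, and return
    the mismatch |x1 - x2| recorded at every step."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * kT * dt / gamma)
    mismatch = [abs(x1 - x2)]
    for n in range(steps):
        t = n * dt
        k = 1.0 + 0.5 * math.sin(t)   # hypothetical time-dependent stiffness
        xi = rng.gauss(0.0, 1.0)      # one random number shared by both particles
        x1 += -k * x1 * dt / gamma + noise_scale * xi
        x2 += -k * x2 * dt / gamma + noise_scale * xi
        mismatch.append(abs(x1 - x2))
    return mismatch

mismatch = simulate_pair(x1=2.0, x2=-1.0)
print(f"initial mismatch: {mismatch[0]:.3f}, final mismatch: {mismatch[-1]:.3f}")
```

Because both particles receive identical noise increments, the noise cancels in the difference x1 - x2, which then decays deterministically at the rate k(t)/gamma. The underdamped and critically damped regimes treated in the paper would need a velocity variable and are not covered by this sketch.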
Detection of a divergent variant of grapevine virus F by next-generation sequencing.
Molenaar, Nicholas; Burger, Johan T; Maree, Hans J
2015-08-01
The complete genome sequence of a South African isolate of grapevine virus F (GVF) is presented. It was first detected by metagenomic next-generation sequencing of field samples and validated through direct Sanger sequencing. The genome sequence of GVF isolate V5 consists of 7539 nucleotides and contains a poly(A) tail. It has a typical vitivirus genome arrangement that comprises five open reading frames (ORFs), which share only 88.96 % nucleotide sequence identity with the existing complete GVF genome sequence (JX105428).
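A figure such as 88.96 % nucleotide identity is typically derived from a pairwise alignment. The abstract does not say how the value was computed, so the sketch below shows just one common definition, assumed here for illustration: identical positions divided by aligned positions, with gap columns skipped. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.
    Columns containing a gap in either sequence are ignored (an assumed
    convention; other definitions count gaps as mismatches)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a, aligned_b):
        if base_a == gap or base_b == gap:
            continue
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Toy alignment: 4 matches over 5 ungapped columns.
identity = percent_identity("ACGT-A", "ACCTTA")
print(f"{identity:.1f}% identity")
```

The choice of whether gaps count as mismatches changes the result, which is one reason identity values from different tools are not always directly comparable.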
Guo, Xiaoqin; Izume, Satoko; Okada, Ayaka; Ohya, Kenji; Kimura, Takashi; Fukushi, Hideto
2014-09-01
A strain of equine herpesvirus type 1 (EHV-1) was isolated from a zebra. This strain, called "zebra-borne EHV-1", was also isolated from an onager and a gazelle in zoological gardens in the U.S.A. The full genome sequences of the 3 strains were determined. They shared 99% identity with each other, while they shared 98% and 95% identity with the horse-derived EHV-1 and equine herpesvirus type 9, respectively. Sequence data indicated that the EHV-1 isolated from a polar bear in Germany is a zebra-borne EHV-1 and not a recombinant virus. These results indicated that zebra-borne EHV-1 is a subtype of EHV-1.
Zulfiqar, Awais; Zhang, Jie; Cui, Xiaofeng; Qian, Yajuan; Zhou, Xueping; Xie, Yan
2012-01-01
A begomovirus disease complex associated with Vernonia cinerea showing yellow vein symptoms was studied. The full-length genomic DNA was comprised of 2739 nucleotides (nt) and contained the typical genome structure of begomoviruses. Comparison analysis showed that it shared the highest (78.9%) nucleotide sequence identity with recently characterized Vernonia yellow vein virus (VeYVV) from India. For associated satellites, betasatellite showed the highest nucleotide sequence identity (52.1%) with Vernonia yellow vein virus betasatellite (VeYVVB) and alphasatellite shared the highest sequence identity (70.7%) with Gossypium mustelinium symptomless alphasatellite (GMusSLA). It is a member of a distinct species with cognate alpha- and betasatellites for which the name Vernonia yellow vein Fujian virus (VeYVFjV) is proposed.
NABIC: A New Access Portal to Search, Visualize, and Share Agricultural Genomics Data
Seol, Young-Joo; Lee, Tae-Ho; Park, Dong-Suk; Kim, Chang-Kug
2016-01-01
The National Agricultural Biotechnology Information Center developed an access portal to search, visualize, and share agricultural genomics data with a focus on South Korean information and resources. The portal features an agricultural biotechnology database containing a wide range of omics data from public and proprietary sources. We collected 28.4 TB of data from 162 agricultural organisms, with 10 types of omics data comprising next-generation sequencing sequence read archive, genome, gene, nucleotide, DNA chip, expressed sequence tag, interactome, protein structure, molecular marker, and single-nucleotide polymorphism datasets. Our genomic resources contain information on five animals, seven plants, and one fungus, which is accessed through a genome browser. We also developed a data submission and analysis system as a web service, with easy-to-use functions and cutting-edge algorithms, including those for handling next-generation sequencing data. PMID:26848255
Phylogeny and Haplotype Analysis of Fungi Within the Fusarium incarnatum-equiseti Species Complex.
Ramdial, H; Latchoo, R K; Hosein, F N; Rampersad, S N
2017-01-01
Fusarium spp. are ranked among the top 10 most economically and scientifically important plant-pathogenic fungi in the world and are associated with plant diseases that include fruit decay of a number of crops. Fusarium isolates infecting bell pepper in Trinidad were identified based on sequence comparisons of the translation elongation factor gene (EF-1a) with sequences of Fusarium incarnatum-equiseti species complex (FIESC) verified in the FUSARIUM-ID database. Eighty-two isolates were identified as belonging to one of four phylogenetic species within the subclades FIESC-1, FIESC-15, FIESC-16, and FIESC-26, with the majority of isolates belonging to FIESC-15. A comparison of the level of DNA polymorphism and phylogenetic inference for sequences of the internal transcribed spacer region (ITS1-5.8S-ITS2) and EF-1a sequences for Trinidad and FUSARIUM-ID type species was carried out. The ITS sequences were less informative, had lower haplotype diversity and restricted haplotype distribution, and resulted in poor resolution and taxa placement in the consensus maximum-likelihood tree. EF-1a sequences enabled strongly supported phylogenetic inference with highly resolved branching patterns of the 30 phylogenetic species within the FIESC and placement of representative Trinidad isolates. Therefore, global phylogeny was inferred from EF-1a sequences representing 11 countries, and separation into distinct Incarnatum and Equiseti clades was again evident. In total, 42 haplotypes were identified: 12 were shared and the remaining were unique haplotypes. The most diverse haplotype was represented by sequences from China, Indonesia, Malaysia, and Trinidad and consisted exclusively of F. incarnatum isolates. Spain had the highest haplotype diversity, perhaps because both F. equiseti and F. incarnatum sequences were represented; followed by the United States, which contributed both F. equiseti and F. incarnatum sequences to the data set; then by countries representing Southeast Asia (China, Indonesia, Malaysia, Thailand, and Philippines) and Trinidad; both of these regions were represented by only F. incarnatum sequences. Trinidad shared two haplotypes with China and one haplotype with the United States for only F. incarnatum isolates. The findings of this study are important for devising disease management strategies and for understanding the phylogenetic relationships among members of the FIESC.
The influence of phonological priming on variability in articulation
NASA Astrophysics Data System (ADS)
Babel, Molly E.; Munson, Benjamin
2004-05-01
Previous research [Sevald and Dell, Cognition 53, 91-127 (1994)] has found that reiterant sequences of CVC words are produced more quickly when the prime word and target word share VC sequences (i.e., sequences like sit sick) than when they are identical (sequences like sick sick). Even slower production rates are found when primes and targets share a CV sequence (sequences like kick sick). These data have been used to support a model of speech production in which lexical items and their constituent phonemes are activated sequentially. The current experiment investigated whether phonological priming also influences variability in the acoustic characteristics of words. Specifically, we examined whether greater variability in the acoustic characteristics of target words was noted in the CV-related prime context than in the identical-prime context, and whether less variability was noted in the VC-related context. Thirty adult subjects with typical speech, language, and hearing ability produced reiterant two-word sequences that varied in their phonological similarity. The duration, first, and second formant frequencies of the target-words' vowels were measured. Preliminary analyses indicate that phonological priming does not have a systematic effect on variability in these acoustic parameters.
Duesberg, Peter H.; Vogt, Peter K.
1979-01-01
The genome of the defective avian tumor virus MH2 was identified as an RNA of 5.7 kilobases by its presence in different MH2-helper virus complexes and its absence from pure helper virus, by its unique fingerprint pattern of RNase T1-resistant (T1) oligonucleotides that differed from those of two helper virus RNAs, and by its structural analogy to the RNA of MC29, another avian acute leukemia virus. Two sets of sequences were distinguished in MH2 RNA: 66% hybridized with DNA complementary to helper-independent avian tumor viruses, termed group-specific, and 34% were specific. The percentage of specific sequences is considered a minimal estimate because the MH2 RNA used was about 30% contaminated by helper virus RNA. No sequences related to the transforming src gene of avian sarcoma viruses were found in MH2. MH2 shared three large T1 oligonucleotides with MC29, two of which could also be isolated from an RNase A- and T1-resistant hybrid formed between MH2 RNA and MC29 specific cDNA. These oligonucleotides belong to a group of six that define the specific segment of MC29 RNA described previously. The group-specific sequences of MH2 and MC29 RNA shared only the two smallest out of about 20 T1 oligonucleotides associated with MH2 RNA. It is concluded that the specific sequences of MH2 and MC29 are related, and it is proposed that they are necessary for, or identical with, the onc genes of these viruses. These sequences would define a related class of transforming genes in avian tumor viruses that differs from the src genes of avian sarcoma viruses. PMID:221900
RAD tag sequencing as a source of SNP markers in Cynara cardunculus L
2012-01-01
Background The globe artichoke (Cynara cardunculus L. var. scolymus) genome is relatively poorly explored, especially compared to those of the other major Asteraceae crops sunflower and lettuce. No SNP markers are in the public domain. We have combined the recently developed restriction-site associated DNA (RAD) approach with the Illumina DNA sequencing platform to effect the rapid and mass discovery of SNP markers for C. cardunculus. Results RAD tags were sequenced from the genomic DNA of three C. cardunculus mapping population parents, generating 9.7 million reads, corresponding to ~1 Gbp of sequence. An assembly based on paired ends produced ~6.0 Mbp of genomic sequence, separated into ~19,000 contigs (mean length 312 bp), of which ~21% were fragments of putative coding sequence. The shared sequences allowed for the discovery of ~34,000 SNPs and nearly 800 indels, equivalent to a SNP frequency of 5.6 per 1,000 nt, and an indel frequency of 0.2 per 1,000 nt. A sample of heterozygous SNP loci was mapped by CAPS assays and this exercise provided validation of our mining criteria. The repetitive fraction of the genome had a high representation of retrotransposon sequence, followed by simple repeats, AT-low complexity regions and mobile DNA elements. The genomic k-mer distribution and CpG rate of C. cardunculus, compared with data derived from three whole genome-sequenced dicot species, provided further evidence of the random representation of the C. cardunculus genome generated by RAD sampling. Conclusion The RAD tag sequencing approach is a cost-effective and rapid method to develop SNP markers in a highly heterozygous species. Our approach generated a large and robust SNP dataset through the adoption of optimized filtering criteria. PMID:22214349
Polynucleobacter bacteria in the brackish-water species Euplotes harpa (Ciliata Hypotrichia).
Vannini, Claudia; Petroni, Giulio; Verni, Franco; Rosati, Giovanna
2005-01-01
We have found a Polynucleobacter bacterium in the cytoplasm of Euplotes harpa, a species living in a brackish-water habitat, with a cirral pattern not corresponding to that of the freshwater Euplotes species known to harbor this type of bacteria. The symbiont has been found in three strains of the species, obtained by clonal cultures from ciliates collected in different geographic regions. The 16S rRNA gene sequence of this bacterium identifies it as a member of the beta-proteobacterial genus Polynucleobacter. This sequence shares a high similarity value (98.4-98.5%) with P. necessarius, the type species of the genus, and is associated with 16S rRNA gene sequences of environmental clones and bacterial strains included in the Polynucleobacter cluster (>95%). An oligonucleotide probe was designed to corroborate the assignment of the retrieved sequence to the symbiont and to detect similar bacteria rapidly. Antibiotic experiments showed that the elimination of the bacteria stops the reproductive cycle in E. harpa, as has been shown for the freshwater Euplotes species.
Erickson, Robert P
2016-01-01
The advent of next generation sequencing (NGS, which consists of massively parallel sequencing to perform TGS (total genome sequencing) or WES (whole exome sequencing)) has uncovered many causative mutations in patients with pediatric neurological disease. A surprisingly high number of these are de novo mutations which have not been inherited from either parent. For epilepsy, autism spectrum disorders, and neuromotor disorders, including cerebral palsy, initial estimates put the frequency of causative de novo mutations at about 15% and about 10% of these are somatic. There are some shared mutated genes between these three classes of disease. Studies of copy number variation by comparative genomic hybridization (CGH) preceded the NGS approaches but they also detect de novo variation which is especially important for ASDs. There are interesting differences between the mutated genes detected by CGH and NGS. In summary, de novo mutations cause a very significant proportion of pediatric neurological disease. Copyright © 2015 Elsevier B.V. All rights reserved.
Mapping and Sequencing the Human Genome
DOE R&D Accomplishments Database
1988-01-01
Numerous meetings have been held and a debate has developed in the biological community over the merits of mapping and sequencing the human genome. In response, a committee to examine the desirability and feasibility of mapping and sequencing the human genome was formed to suggest options for implementing the project. The committee asked many questions. Should the analysis of the human genome be left entirely to the traditionally uncoordinated, but highly successful, support systems that fund the vast majority of biomedical research? Or should a more focused and coordinated additional support system be developed that is limited to encouraging and facilitating the mapping and eventual sequencing of the human genome? If so, how can this be done without distorting the broader goals of biological research that are crucial for any understanding of the data generated in such a human genome project? As the committee became better informed on the many relevant issues, the opinions of its members coalesced, producing a shared consensus of what should be done. This report reflects that consensus.
Fanning, T; Singer, M
1987-01-01
Recent work suggests that one or more members of the highly repeated LINE-1 (L1) DNA family found in all mammals may encode one or more proteins. Here we report the sequence of a portion of an L1 cloned from the domestic cat (Felis catus). These data permit comparison of the L1 sequences in four mammalian orders (Carnivore, Lagomorph, Rodent and Primate) and the comparison supports the suggested coding potential. In two separate, noncontiguous regions in the carboxy terminal half of the proteins predicted from the DNA sequences, there are several strongly conserved segments. In one region, these share homology with known or suspected reverse transcriptases, as described by others in rodents and primates. In the second region, closer to the carboxy terminus, the strongly conserved segments are over 90% homologous among the four orders. One of the latter segments is cysteine rich and resembles the putative metal binding domains of nucleic acid binding proteins, including those of TFIIIA and retroviruses. PMID:3562227
Rift Valley Fever, Sudan, 2007 and 2010
Aradaib, Imadeldin E.; Erickson, Bobbie R.; Elageb, Rehab M.; Khristova, Marina L.; Carroll, Serena A.; Elkhidir, Isam M.; Karsany, Mubarak E.; Karrar, AbdelRahim E.; Elbashir, Mustafa I.
2013-01-01
To elucidate whether Rift Valley fever virus (RVFV) diversity in Sudan resulted from multiple introductions or from acquired changes over time from 1 introduction event, we generated complete genome sequences from RVFV strains detected during the 2007 and 2010 outbreaks. Phylogenetic analyses of small, medium, and large RNA segment sequences indicated several genetic RVFV variants were circulating in Sudan, which all grouped into Kenya-1 or Kenya-2 sublineages from the 2006–2008 eastern Africa epizootic. Bayesian analysis of sequence differences estimated that diversity among the 2007 and 2010 Sudan RVFV variants shared a most recent common ancestor circa 1996. The data suggest multiple introductions of RVFV into Sudan as part of sweeping epizootics from eastern Africa. The sequences indicate recent movement of RVFV and support the need for surveillance to recognize when and where RVFV circulates between epidemics, which can make data from prediction tools easier to interpret and preventive measures easier to direct toward high-risk areas. PMID:23347790
The DNA sequence of the human X chromosome
Ross, Mark T.; Grafham, Darren V.; Coffey, Alison J.; Scherer, Steven; McLay, Kirsten; Muzny, Donna; Platzer, Matthias; Howell, Gareth R.; Burrows, Christine; Bird, Christine P.; Frankish, Adam; Lovell, Frances L.; Howe, Kevin L.; Ashurst, Jennifer L.; Fulton, Robert S.; Sudbrak, Ralf; Wen, Gaiping; Jones, Matthew C.; Hurles, Matthew E.; Andrews, T. Daniel; Scott, Carol E.; Searle, Stephen; Ramser, Juliane; Whittaker, Adam; Deadman, Rebecca; Carter, Nigel P.; Hunt, Sarah E.; Chen, Rui; Cree, Andrew; Gunaratne, Preethi; Havlak, Paul; Hodgson, Anne; Metzker, Michael L.; Richards, Stephen; Scott, Graham; Steffen, David; Sodergren, Erica; Wheeler, David A.; Worley, Kim C.; Ainscough, Rachael; Ambrose, Kerrie D.; Ansari-Lari, M. Ali; Aradhya, Swaroop; Ashwell, Robert I. S.; Babbage, Anne K.; Bagguley, Claire L.; Ballabio, Andrea; Banerjee, Ruby; Barker, Gary E.; Barlow, Karen F.; Barrett, Ian P.; Bates, Karen N.; Beare, David M.; Beasley, Helen; Beasley, Oliver; Beck, Alfred; Bethel, Graeme; Blechschmidt, Karin; Brady, Nicola; Bray-Allen, Sarah; Bridgeman, Anne M.; Brown, Andrew J.; Brown, Mary J.; Bonnin, David; Bruford, Elspeth A.; Buhay, Christian; Burch, Paula; Burford, Deborah; Burgess, Joanne; Burrill, Wayne; Burton, John; Bye, Jackie M.; Carder, Carol; Carrel, Laura; Chako, Joseph; Chapman, Joanne C.; Chavez, Dean; Chen, Ellson; Chen, Guan; Chen, Yuan; Chen, Zhijian; Chinault, Craig; Ciccodicola, Alfredo; Clark, Sue Y.; Clarke, Graham; Clee, Chris M.; Clegg, Sheila; Clerc-Blankenburg, Kerstin; Clifford, Karen; Cobley, Vicky; Cole, Charlotte G.; Conquer, Jen S.; Corby, Nicole; Connor, Richard E.; David, Robert; Davies, Joy; Davis, Clay; Davis, John; Delgado, Oliver; DeShazo, Denise; Dhami, Pawandeep; Ding, Yan; Dinh, Huyen; Dodsworth, Steve; Draper, Heather; Dugan-Rocha, Shannon; Dunham, Andrew; Dunn, Matthew; Durbin, K. James; Dutta, Ireena; Eades, Tamsin; Ellwood, Matthew; Emery-Cohen, Alexandra; Errington, Helen; Evans, Kathryn L.; Faulkner, Louisa; Francis, Fiona; Frankland, John; Fraser, Audrey E.; Galgoczy, Petra; Gilbert, James; Gill, Rachel; Glöckner, Gernot; Gregory, Simon G.; Gribble, Susan; Griffiths, Coline; Grocock, Russell; Gu, Yanghong; Gwilliam, Rhian; Hamilton, Cerissa; Hart, Elizabeth A.; Hawes, Alicia; Heath, Paul D.; Heitmann, Katja; Hennig, Steffen; Hernandez, Judith; Hinzmann, Bernd; Ho, Sarah; Hoffs, Michael; Howden, Phillip J.; Huckle, Elizabeth J.; Hume, Jennifer; Hunt, Paul J.; Hunt, Adrienne R.; Isherwood, Judith; Jacob, Leni; Johnson, David; Jones, Sally; de Jong, Pieter J.; Joseph, Shirin S.; Keenan, Stephen; Kelly, Susan; Kershaw, Joanne K.; Khan, Ziad; Kioschis, Petra; Klages, Sven; Knights, Andrew J.; Kosiura, Anna; Kovar-Smith, Christie; Laird, Gavin K.; Langford, Cordelia; Lawlor, Stephanie; Leversha, Margaret; Lewis, Lora; Liu, Wen; Lloyd, Christine; Lloyd, David M.; Loulseged, Hermela; Loveland, Jane E.; Lovell, Jamieson D.; Lozado, Ryan; Lu, Jing; Lyne, Rachael; Ma, Jie; Maheshwari, Manjula; Matthews, Lucy H.; McDowall, Jennifer; McLaren, Stuart; McMurray, Amanda; Meidl, Patrick; Meitinger, Thomas; Milne, Sarah; Miner, George; Mistry, Shailesh L.; Morgan, Margaret; Morris, Sidney; Müller, Ines; Mullikin, James C.; Nguyen, Ngoc; Nordsiek, Gabriele; Nyakatura, Gerald; O’Dell, Christopher N.; Okwuonu, Geoffery; Palmer, Sophie; Pandian, Richard; Parker, David; Parrish, Julia; Pasternak, Shiran; Patel, Dina; Pearce, Alex V.; Pearson, Danita M.; Pelan, Sarah E.; Perez, Lesette; Porter, Keith M.; Ramsey, Yvonne; Reichwald, Kathrin; Rhodes, Susan; Ridler, Kerry A.; Schlessinger, David; Schueler, Mary G.; Sehra, Harminder K.; Shaw-Smith, Charles; Shen, Hua; Sheridan, Elizabeth M.; Shownkeen, Ratna; Skuce, Carl D.; Smith, Michelle L.; Sotheran, Elizabeth C.; Steingruber, Helen E.; Steward, Charles A.; Storey, Roy; Swann, R. Mark; Swarbreck, David; Tabor, Paul E.; Taudien, Stefan; Taylor, Tineace; Teague, Brian; Thomas, Karen; Thorpe, Andrea; Timms, Kirsten; Tracey, Alan; Trevanion, Steve; Tromans, Anthony C.; d’Urso, Michele; Verduzco, Daniel; Villasana, Donna; Waldron, Lenee; Wall, Melanie; Wang, Qiaoyan; Warren, James; Warry, Georgina L.; Wei, Xuehong; West, Anthony; Whitehead, Siobhan L.; Whiteley, Mathew N.; Wilkinson, Jane E.; Willey, David L.; Williams, Gabrielle; Williams, Leanne; Williamson, Angela; Williamson, Helen; Wilming, Laurens; Woodmansey, Rebecca L.; Wray, Paul W.; Yen, Jennifer; Zhang, Jingkun; Zhou, Jianling; Zoghbi, Huda; Zorilla, Sara; Buck, David; Reinhardt, Richard; Poustka, Annemarie; Rosenthal, André; Lehrach, Hans; Meindl, Alfons; Minx, Patrick J.; Hillier, LaDeana W.; Willard, Huntington F.; Wilson, Richard K.; Waterston, Robert H.; Rice, Catherine M.; Vaudin, Mark; Coulson, Alan; Nelson, David L.; Weinstock, George; Sulston, John E.; Durbin, Richard; Hubbard, Tim; Gibbs, Richard A.; Beck, Stephan; Rogers, Jane; Bentley, David R.
2009-01-01
The human X chromosome has a unique biology that was shaped by its evolution as the sex chromosome shared by males and females. We have determined 99.3% of the euchromatic sequence of the X chromosome. Our analysis illustrates the autosomal origin of the mammalian sex chromosomes, the stepwise process that led to the progressive loss of recombination between X and Y, and the extent of subsequent degradation of the Y chromosome. LINE1 repeat elements cover one-third of the X chromosome, with a distribution that is consistent with their proposed role as way stations in the process of X-chromosome inactivation. We found 1,098 genes in the sequence, of which 99 encode proteins expressed in testis and in various tumour types. A disproportionately high number of mendelian diseases are documented for the X chromosome. Of this number, 168 have been explained by mutations in 113 X-linked genes, which in many cases were characterized with the aid of the DNA sequence. PMID:15772651
A Utility Maximizing and Privacy Preserving Approach for Protecting Kinship in Genomic Databases.
Kale, Gulce; Ayday, Erman; Tastan, Oznur
2017-09-12
Rapid and low-cost sequencing of genomes has enabled widespread use of genomic data in research studies and personalized customer applications, where genomic data is shared in public databases. Although the identities of the participants are anonymized in these databases, sensitive information about individuals can still be inferred. One such piece of information is kinship. We define two routes by which kinship privacy can leak and propose a technique to protect kinship privacy against these risks while maximizing the utility of shared data. The method involves systematic identification of minimal portions of genomic data to mask as new participants are added to the database. Choosing the proper positions to hide is cast as an optimization problem in which the number of positions to mask is minimized subject to privacy constraints that ensure the familial relationships are not revealed. We evaluate the proposed technique on real genomic data. Results indicate that concurrent sharing of data pertaining to a parent and an offspring results in high kinship privacy risks, whereas sharing data from more distant relatives together is often safer. We also show that the arrival order of family members has a high impact on the level of privacy risks and on the utility of sharing data. Available at: https://github.com/tastanlab/Kinship-Privacy. © The Author (2017). Published by Oxford University Press.
Enhancing the Breadth and Efficacy of Therapeutic Vaccines for Breast Cancer
2014-10-01
sequence data produced by the Slansky team following their single-cell emulsion RT-PCR technique; however, it can be packaged and shared for use...cell emulsion RT-PCR. Additional modifications were made to our epitope discovery workflow to increase efficacy of transcript and neoantigen candidate...the MiTCR [8] open source software package developed by MiLaboratory. MiTCR is a highly efficient and fast approach to CDR3 extraction, clonotype
CORE-SINEs: eukaryotic short interspersed retroposing elements with common sequence motifs.
Gilbert, N; Labuda, D
1999-03-16
A 65-bp "core" sequence is dispersed in hundreds of thousands of copies in the human genome. This sequence was found to constitute the central segment of a group of short interspersed elements (SINEs), referred to as mammalian-wide interspersed repeats, that proliferated before the radiation of placental mammals. Here, we propose that the core identifies an ancient tRNA-like SINE element, which survived in different lineages such as mammals, reptiles, birds, and fish, as well as mollusks, presumably for >550 million years. This element gave rise to a number of sequence families (CORE-SINEs), including mammalian-wide interspersed repeats, whose distinct 3' ends are shared with different families of long interspersed elements (LINEs). The evolutionary success of the generic CORE-SINE element can be related to the recruitment of the internal promoter from highly transcribed host RNA as well as to its capacity to adapt to changing retropositional opportunities by sequence exchange with actively amplifying LINEs. It reinforces the notion that the very existence of SINEs depends on the cohabitation with both LINEs and the host genome.
CORE-SINEs: Eukaryotic short interspersed retroposing elements with common sequence motifs
Gilbert, Nicolas; Labuda, Damian
1999-01-01
A 65-bp “core” sequence is dispersed in hundreds of thousands of copies in the human genome. This sequence was found to constitute the central segment of a group of short interspersed elements (SINEs), referred to as mammalian-wide interspersed repeats, that proliferated before the radiation of placental mammals. Here, we propose that the core identifies an ancient tRNA-like SINE element, which survived in different lineages such as mammals, reptiles, birds, and fish, as well as mollusks, presumably for >550 million years. This element gave rise to a number of sequence families (CORE-SINEs), including mammalian-wide interspersed repeats, whose distinct 3′ ends are shared with different families of long interspersed elements (LINEs). The evolutionary success of the generic CORE-SINE element can be related to the recruitment of the internal promoter from highly transcribed host RNA as well as to its capacity to adapt to changing retropositional opportunities by sequence exchange with actively amplifying LINEs. It reinforces the notion that the very existence of SINEs depends on the cohabitation with both LINEs and the host genome. PMID:10077603
Googling DNA sequences on the World Wide Web.
Hajibabaei, Mehrdad; Singer, Gregory A C
2009-11-10
New web-based technologies provide an excellent opportunity for sharing and accessing information and for using the web as a platform for interaction and collaboration. Although several specialized tools are available for analyzing DNA sequence information, conventional web-based tools have not been utilized for bioinformatics applications. We have developed a novel algorithm and implemented it for searching species-specific genomic sequences, DNA barcodes, by using popular web-based methods such as Google. We developed an alignment-independent, character-based algorithm that divides a sequence library (DNA barcodes) and a query sequence into words. The actual search is conducted by conventional search tools such as the freely available Google Desktop Search. We implemented our algorithm in two exemplar packages. We developed pre- and post-processing software to provide customized input and output services, respectively. Our analysis of all publicly available DNA barcode sequences shows high accuracy as well as rapid results. Our method makes use of conventional web-based technologies for specialized genetic data. It provides a robust and efficient solution for sequence search on the web. The integration of our search method for large-scale sequence libraries such as DNA barcodes provides an excellent web-based tool for accessing this information and linking it to other available categories of information on the web.
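The word-based search idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how words are cut or scored, so the overlapping fixed-length words, the word length `k`, the in-memory inverted index (standing in for a text search engine), and the function names `to_words`, `build_index`, and `search` are all assumptions made for illustration.

```python
from collections import defaultdict


def to_words(seq, k=10):
    """Split a sequence into overlapping words of length k (an assumed scheme)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(library, k=10):
    """Map each word to the set of library entries that contain it."""
    index = defaultdict(set)
    for name, seq in library.items():
        for word in to_words(seq, k):
            index[word].add(name)
    return index


def search(index, query, k=10):
    """Rank library entries by how many distinct query words they contain."""
    hits = defaultdict(int)
    for word in set(to_words(query, k)):
        for name in index.get(word, ()):
            hits[name] += 1
    # Highest word count first; ties broken alphabetically by entry name.
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy library; real DNA barcodes are several hundred bases long.
library = {"sp_a": "ACGTACGTTGCA", "sp_b": "TTTTGGGGCCCC"}
index = build_index(library, k=4)
print(search(index, "ACGTTGCA", k=4))
```

Because matching is done on exact words rather than on an alignment, any general-purpose text indexer could play the role of `build_index` here, which is the property the abstract relies on.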
Powell, J. Elijah; Ratnayeke, Nalin; Moran, Nancy A.
2017-01-01
High throughput rRNA amplicon surveys of bacterial communities provide a rapid snapshot of taxonomic composition. But strains with nearly identical rRNA sequences often differ in gene repertoires and metabolic capabilities. To assess strain-level variation within Snodgrassella alvi, a gut symbiont of corbiculate bees, we performed deep sequencing on amplicons of a single copy coding gene (minD) as well as the 16S rDNA V4 region. We surveyed honey bees (Apis mellifera) sampled globally and 12 bumble bee species (Bombus) sampled from two regions of the USA. The minD analyses reveal that S. alvi contains far more strain diversity than is evident from 16S rDNA analysis. Many taxa inferred on the basis of 16S rDNA are shared between A. mellifera and Bombus species, but taxa inferred on the basis of minD are never shared and often are restricted to particular Bombus species. Clustering based on minD revealed that gut communities often reflect host species and geographic location. Both minD and 16S rDNA analyses indicate that strain diversity is higher in A. mellifera than in Bombus species. The minD locus flanks a 16S gene, enabling development of strain-specific 16S fluorescent probes to illuminate the spatial relationship of strains within the bee gut. PMID:27482856
Privacy preserving protocol for detecting genetic relatives using rare variants.
Hormozdiari, Farhad; Joo, Jong Wha J; Wadia, Akshay; Guan, Feng; Ostrosky, Rafail; Sahai, Amit; Eskin, Eleazar
2014-06-15
High-throughput sequencing technologies have impacted many areas of genetic research. One such area is the identification of relatives from genetic data. The standard approach for the identification of genetic relatives collects the genomic data of all individuals and stores it in a database. Then, each pair of individuals is compared to detect the set of genetic relatives, and the matched individuals are informed. The main drawback of this approach is the requirement of sharing your genetic data with a trusted third party to perform the relatedness test. In this work, we propose a secure protocol to detect the genetic relatives from sequencing data while not exposing any information about their genomes. We assume that individuals have access to their genome sequences but do not want to share their genomes with anyone else. Unlike previous approaches, our approach uses both common and rare variants, which provide the ability to detect much more distant relationships securely. We use simulated data generated from the 1000 Genomes data and show that we can easily detect relatives as distant as fifth-degree cousins, which was not possible using existing methods. We also show in the 1000 Genomes data with cryptic relationships that our method can detect these individuals. The software is freely available for download at http://genetics.cs.ucla.edu/crypto/. © The Author 2014. Published by Oxford University Press.
Qualitative thematic analysis of consent forms used in cancer genome sequencing.
Allen, Clarissa; Foulkes, William D
2011-07-19
Large-scale whole genome sequencing (WGS) studies promise to revolutionize cancer research by identifying targets for therapy and by discovering molecular biomarkers to aid early diagnosis, to better determine prognosis and to improve treatment response prediction. Such projects raise a number of ethical, legal, and social (ELS) issues that should be considered. In this study, we set out to discover how these issues are being handled across different jurisdictions. We examined informed consent (IC) forms from 30 cancer genome sequencing studies to assess (1) stated purpose of sample collection, (2) scope of consent requested, (3) data sharing protocols, (4) privacy protection measures, (5) described risks of participation, (6) subject re-contacting, and (7) protocol for withdrawal. There is a high degree of similarity in how cancer researchers engaged in WGS are protecting participant privacy. We observed a strong trend towards both using samples for additional, unspecified research and sharing data with other investigators. IC forms were varied in terms of how they discussed re-contacting participants, returning results and facilitating participant withdrawal. Contrary to expectation, there were no consistent trends that emerged over the eight-year period from which forms were collected. Examining IC forms from WGS studies elucidates how investigators are handling ELS challenges posed by this research. This information is important for ensuring that while the public benefits of research are maximized, the rights of participants are also being appropriately respected.
Jensen, Anders
2012-01-01
The taxonomic status and structure of Streptococcus dysgalactiae have been the object of much confusion. Bacteria belonging to this species are usually referred to as Lancefield group C or group G streptococci in clinical settings in spite of the fact that these terms lack precision and prevent recognition of the exact clinical relevance of these bacteria. The purpose of this study was to develop an improved basis for delineation and identification of the individual species of the pyogenic group of streptococci in the clinical microbiology laboratory, with a special focus on S. dysgalactiae. We critically reexamined the genetic relationships of the species S. dysgalactiae, Streptococcus pyogenes, Streptococcus canis, and Streptococcus equi, which may share Lancefield group antigens, by phylogenetic reconstruction based on multilocus sequence analysis (MLSA) and 16S rRNA gene sequences and by emm typing combined with phenotypic characterization. Analysis of concatenated sequences of seven genes previously used for examination of viridans streptococci distinguished robust and coherent clusters. S. dysgalactiae consists of two separate clusters consistent with the two recognized subspecies dysgalactiae and equisimilis. Both taxa share alleles with S. pyogenes in several housekeeping genes, which invalidates identification based on single-locus sequencing. S. dysgalactiae, S. canis, and S. pyogenes constitute a closely related branch within the genus Streptococcus indicative of recent descent from a common ancestor, while S. equi is highly divergent from other species of the pyogenic group streptococci. The results provide an improved basis for identification of clinically important pyogenic group streptococci and explain the overlapping spectrum of infections caused by the species associated with humans. PMID:22075580
Complete Genome Sequence of the Avian Paramyxovirus Serotype 5 Strain APMV-5/budgerigar/Japan/TI/75.
Hiono, Takahiro; Matsuno, Keita; Tuchiya, Kotaro; Lin, Zhifeng; Okamatsu, Masatoshi; Sakoda, Yoshihiro
2016-09-22
Here, we report the complete genome sequence of the avian paramyxovirus serotype 5 strain APMV-5/budgerigar/Japan/TI/75, which was determined using the Illumina MiSeq platform. The determined sequence shares 97% homology and similar genetic features with the previously known genome sequence of avian paramyxovirus serotype 5 strain APMV-5/budgerigar/Japan/Kunitachi/74. Copyright © 2016 Hiono et al.
Nishiyama, Minako; Yamamoto, Shuichi; Kurosawa, Norio
2013-08-01
Ibusuki hot spring is located on the coastline of Kagoshima Bay, Japan. The hot spring water is characterized by high salinity, high temperature, and neutral pH. The hot spring is covered by the sea during high tide, which leads to severe fluctuations in several environmental variables. A combination of molecular- and culture-based techniques was used to determine the bacterial and archaeal diversity of the hot spring. A total of 48 thermophilic bacterial strains were isolated from two sites (Site 1: 55.6°C; Site 2: 83.1°C) and they were categorized into six groups based on their 16S rRNA gene sequence similarity. Two groups (including 32 isolates) demonstrated low sequence similarity with published species, suggesting that they might represent novel taxa. The 148 clones from the Site 1 bacterial library included 76 operational taxonomic units (OTUs; 97% threshold), while 132 clones from the Site 2 bacterial library included 31 OTUs. Proteobacteria, Bacteroidetes, and Firmicutes were frequently detected in both clone libraries. The clones were related to thermophilic, mesophilic and psychrophilic bacteria. Approximately half of the sequences in bacterial clone libraries shared <92% sequence similarity with their closest sequences in a public database, suggesting that the Ibusuki hot spring may harbor a unique and novel bacterial community. By contrast, 77 clones from the Site 2 archaeal library contained only three OTUs, most of which were affiliated with Thaumarchaeota.
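Grouping clones into OTUs at a 97% threshold can be illustrated with a short Python sketch. The abstract does not state which clustering method was used, so the greedy seed-based scheme below is an assumption, as are the helper names `identity` and `greedy_otus` and the simplification that sequences are already aligned to equal length.

```python
def identity(a, b):
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Put each sequence in the first OTU whose seed it matches at >= threshold."""
    seeds = []  # representative (first) sequence of each OTU
    otus = []   # members of each OTU, in the same order as seeds
    for s in seqs:
        for i, seed in enumerate(seeds):
            if identity(s, seed) >= threshold:
                otus[i].append(s)
                break
        else:
            # No existing seed is similar enough, so this sequence founds a new OTU.
            seeds.append(s)
            otus.append([s])
    return otus


# Hypothetical 100-nt toy sequences: b is 98% identical to a, c only 90%.
a = "A" * 100
b = "A" * 98 + "CC"
c = "A" * 90 + "C" * 10
print([len(group) for group in greedy_otus([a, b, c])])
```

Greedy clustering depends on input order, so production tools typically sort sequences (for example by abundance or length) before picking seeds; that step is omitted here for brevity.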
Sadsad, Rosemarie; Martinez, Elena; Jelfs, Peter; Hill-Cawthorne, Grant A.; Gilbert, Gwendolyn L.; Marais, Ben J.; Sintchenko, Vitali
2016-01-01
Background Improved tuberculosis control and the need to contain the spread of drug-resistant strains provide a strong rationale for exploring tuberculosis transmission dynamics at the population level. Whole-genome sequencing provides optimal strain resolution, facilitating detailed mapping of potential transmission pathways. Methods We sequenced 22 isolates from a Mycobacterium tuberculosis cluster in New South Wales, Australia, identified during routine 24-locus mycobacterial interspersed repetitive unit typing. Following high-depth paired-end sequencing using the Illumina HiSeq 2000 platform, two independent pipelines were employed for analysis, both employing read mapping onto reference genomes as well as de novo assembly, to control biases in variant detection. In addition to single-nucleotide polymorphisms, the analyses also sought to identify insertions, deletions and structural variants. Results Isolates were highly similar, with a distance of 13 variants between the most distant members of the cluster. The most sensitive analysis classified the 22 isolates into 18 groups. Four of the isolates did not appear to share a recent common ancestor with the largest clade; another four isolates had an uncertain ancestral relationship with the largest clade. Conclusion Whole genome sequencing, with analysis of single-nucleotide polymorphisms, insertions, deletions, structural variants and subpopulations, enabled the highest possible level of discrimination between cluster members, clarifying likely transmission pathways and exposing the complexity of strain origin. The analysis provides a basis for targeted public health intervention and enhanced classification of future isolates linked to the cluster. PMID:26938641
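The cluster statistics reported in this abstract (the largest pairwise variant distance and the number of distinct genotype groups) can be computed as in the following Python sketch. It is an illustration only: representing each isolate as a set of variant labels, and the names `variant_distance`, `cluster_diameter`, and `genotype_groups`, are assumptions rather than the pipelines the authors used.

```python
from collections import defaultdict
from itertools import combinations


def variant_distance(a, b):
    """Number of variants present in exactly one of the two isolates."""
    return len(a ^ b)


def cluster_diameter(isolates):
    """Largest pairwise variant distance within a cluster of isolates."""
    pairs = combinations(sorted(isolates), 2)
    return max((variant_distance(isolates[x], isolates[y]) for x, y in pairs), default=0)


def genotype_groups(isolates):
    """Group isolates that carry exactly the same set of variants."""
    groups = defaultdict(list)
    for name in sorted(isolates):
        groups[frozenset(isolates[name])].append(name)
    return sorted(groups.values())


# Hypothetical variant calls relative to a reference genome.
isolates = {
    "i1": {"100A>G"},
    "i2": {"100A>G", "200C>T"},
    "i3": {"300G>A", "400T>C"},
    "i4": {"100A>G"},
}
print(cluster_diameter(isolates), genotype_groups(isolates))
```

Treating each genotype as a set makes the distance symmetric and independent of the order in which variants were called, which is convenient when calls from two independent pipelines are merged.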
75 FR 21963 - Regulatory Flexibility Agenda
Federal Register 2010, 2011, 2012, 2013, 2014
2010-04-26
... Materials 3235-AK25 DIVISION OF INVESTMENT MANAGEMENT--Proposed Rule Stage Regulation Sequence Title... 3235-AI17 DIVISION OF INVESTMENT MANAGEMENT--Completed Actions Regulation Sequence Title Identifier... Management Investment Company 3235-AJ11 Shares, Unit Investment Trust Interests, and Municipal Fund...
A reassessment of IgM memory subsets in humans
Bagnara, Davide; Squillario, Margherita; Kipling, David; Mora, Thierry; Walczak, Aleksandra M.; Da Silva, Lucie; Weller, Sandra; Dunn-Walters, Deborah K.; Weill, Jean-Claude; Reynaud, Claude-Agnès
2015-01-01
From paired blood and spleen samples from three adult donors we performed high-throughput V-h sequencing of human B-cell subsets defined by IgD and CD27 expression: IgD+CD27+ (“MZ”), IgD−CD27+(“memory”, including IgM (“IgM-only”), IgG and IgA) and IgD−CD27− cells (“double-negative”, including IgM, IgG and IgA). 91,294 unique sequences clustered in 42,670 clones, revealing major clonal expansions in each of these subsets. Among these clones, we further analyzed those shared sequences from different subsets or tissues for Vh-gene mutation, H-CDR3-length, and Vh/Jh usage, comparing these different characteristics with all sequences from their subset of origin, for which these parameters constitute a distinct signature. The IgM-only repertoire profile differed notably from that of MZ B cells by a higher mutation frequency, and lower Vh4 and higher Jh6 gene usage. Strikingly, IgM sequences from clones shared between the MZ and the memory IgG/IgA compartments showed a mutation and repertoire profile of IgM-only and not of MZ B cells. Similarly, all IgM clonal relationships (between MZ, IgM-only, and double-negative compartments) involved sequences with the characteristics of IgM-only B cells. Finally, clonal relationships between tissues suggested distinct recirculation characteristics between MZ and switched B cells. The “IgM-only” subset (including cells with its repertoire signature but higher IgD or lower CD27 expression levels) thus appear as the only subset showing precursor-product relationships with CD27+ switched memory B cells, indicating that they represent germinal center-derived IgM memory B cells, and that IgM memory and MZ B cells constitute two distinct entities. PMID:26355154
Shamblin, Brian M.; Bolten, Alan B.; Abreu-Grobois, F. Alberto; Bjorndal, Karen A.; Cardona, Luis; Carreras, Carlos; Clusa, Marcel; Monzón-Argüello, Catalina; Nairn, Campbell J.; Nielsen, Janne T.; Nel, Ronel; Soares, Luciano S.; Stewart, Kelly R.; Vilaça, Sibelle T.; Türkozan, Oguz; Yilmaz, Can; Dutton, Peter H.
2014-01-01
Previous genetic studies have demonstrated that natal homing shapes the stock structure of marine turtle nesting populations. However, widespread sharing of common haplotypes based on short segments of the mitochondrial control region often limits resolution of the demographic connectivity of populations. Recent studies employing longer control region sequences to resolve haplotype sharing have focused on regional assessments of genetic structure and phylogeography. Here we synthesize available control region sequences for loggerhead turtles from the Mediterranean Sea, Atlantic, and western Indian Ocean basins. These data represent six of the nine globally significant regional management units (RMUs) for the species and include novel sequence data from Brazil, Cape Verde, South Africa and Oman. Genetic tests of differentiation among 42 rookeries represented by short sequences (380 bp haplotypes from 3,486 samples) and 40 rookeries represented by long sequences (∼800 bp haplotypes from 3,434 samples) supported the distinction of the six RMUs analyzed as well as recognition of at least 18 demographically independent management units (MUs) with respect to female natal homing. A total of 59 haplotypes were resolved. These haplotypes belonged to two highly divergent global lineages, with haplogroup I represented primarily by CC-A1, CC-A4, and CC-A11 variants and haplogroup II represented by CC-A2 and derived variants. Geographic distribution patterns of haplogroup II haplotypes and the nested position of CC-A11.6 from Oman among the Atlantic haplotypes invoke recent colonization of the Indian Ocean from the Atlantic for both global lineages. The haplotypes we confirmed for western Indian Ocean RMUs allow reinterpretation of previous mixed stock analysis and further suggest that contemporary migratory connectivity between the Indian and Atlantic Oceans occurs on a broader scale than previously hypothesized. This study represents a valuable model for conducting comprehensive international cooperative data management and research in marine ecology. PMID:24465810
A Reassessment of IgM Memory Subsets in Humans.
Bagnara, Davide; Squillario, Margherita; Kipling, David; Mora, Thierry; Walczak, Aleksandra M; Da Silva, Lucie; Weller, Sandra; Dunn-Walters, Deborah K; Weill, Jean-Claude; Reynaud, Claude-Agnès
2015-10-15
From paired blood and spleen samples from three adult donors, we performed high-throughput VH sequencing of human B cell subsets defined by IgD and CD27 expression: IgD(+)CD27(+) ("marginal zone [MZ]"), IgD(-)CD27(+) ("memory," including IgM ["IgM-only"], IgG and IgA) and IgD(-)CD27(-) cells ("double-negative," including IgM, IgG, and IgA). A total of 91,294 unique sequences clustered in 42,670 clones, revealing major clonal expansions in each of these subsets. Among these clones, we further analyzed those shared sequences from different subsets or tissues for VH gene mutation, H-CDR3-length, and VH/JH usage, comparing these different characteristics with all sequences from their subset of origin for which these parameters constitute a distinct signature. The IgM-only repertoire profile differed notably from that of MZ B cells by a higher mutation frequency and lower VH4 and higher JH6 gene usage. Strikingly, IgM sequences from clones shared between the MZ and the memory IgG/IgA compartments showed a mutation and repertoire profile of IgM-only and not of MZ B cells. Similarly, all IgM clonal relationships (among MZ, IgM-only, and double-negative compartments) involved sequences with the characteristics of IgM-only B cells. Finally, clonal relationships between tissues suggested distinct recirculation characteristics between MZ and switched B cells. The "IgM-only" subset (including cells with its repertoire signature but higher IgD or lower CD27 expression levels) thus appears to be the only subset showing precursor-product relationships with CD27(+) switched memory B cells, indicating that these cells represent germinal center-derived IgM memory B cells and that IgM memory and MZ B cells constitute two distinct entities. Copyright © 2015 by The American Association of Immunologists, Inc.
Chipster: user-friendly analysis software for microarray and other high-throughput data.
Kallio, M Aleksi; Tuimala, Jarno T; Hupponen, Taavi; Klemelä, Petri; Gentile, Massimiliano; Scheinin, Ilari; Koski, Mikko; Käki, Janne; Korpelainen, Eija I
2011-10-14
The growth of high-throughput technologies such as microarrays and next generation sequencing has been accompanied by active research in data analysis methodology, producing new analysis methods at a rapid pace. While most of the newly developed methods are freely available, their use requires substantial computational skills. In order to enable non-programming biologists to benefit from the method development in a timely manner, we have created the Chipster software. Chipster (http://chipster.csc.fi/) brings a powerful collection of data analysis methods within the reach of bioscientists via its intuitive graphical user interface. Users can analyze and integrate different data types such as gene expression, miRNA and aCGH. The analysis functionality is complemented with rich interactive visualizations, allowing users to select datapoints and create new gene lists based on these selections. Importantly, users can save the performed analysis steps as reusable, automatic workflows, which can also be shared with other users. Being a versatile and easily extendable platform, Chipster can be used for microarray, proteomics and sequencing data. In this article we describe its comprehensive collection of analysis and visualization tools for microarray data using three case studies. Chipster is a user-friendly analysis software for high-throughput data. Its intuitive graphical user interface enables biologists to access a powerful collection of data analysis and integration tools, and to visualize data interactively. Users can collaborate by sharing analysis sessions and workflows. Chipster is open source, and the server installation package is freely available.
Rodriguez, Fernando; Kenefick, Aubrey W; Arkhipova, Irina R
2017-04-11
Rotifers of the class Bdelloidea, microscopic freshwater invertebrates, possess a highly diversified repertoire of transposon families, which, however, occupy less than 4% of genomic DNA in the sequenced representative Adineta vaga. We performed a comprehensive analysis of A. vaga retroelements and found that bdelloid long terminal repeat (LTR) retrotransposons, in addition to conserved open reading frame (ORF) 1 and ORF2 corresponding to gag and pol genes, code for an unusually high variety of ORF3 sequences. Retrovirus-like LTR families in A. vaga belong to four major lineages, three of which are rotifer-specific and encode a dUTPase domain. However, only one lineage contains a canonical env-like fusion glycoprotein acquired from paramyxoviruses (non-segmented negative-strand RNA viruses), although smaller ORFs with transmembrane domains may perform similar roles. A different ORF3 type encodes a GDSL esterase/lipase, which was previously identified as ORF1 in several clades of non-LTR retrotransposons, and implicated in membrane targeting. Yet another ORF3 type appears in unrelated LTR-retrotransposon lineages, and displays strong homology to DEDDy-type exonucleases involved in 3'-end processing of RNA and single-stranded DNA. Unexpectedly, each of the enzymatic ORF3s is also associated with different subsets of Penelope-like Athena retroelement families. The unusual association of the same ORF types with retroelements from different classes reflects their modular structure with a high degree of flexibility, and points to gene sharing between different groups of retroelements.
Chipster: user-friendly analysis software for microarray and other high-throughput data
2011-01-01
Background The growth of high-throughput technologies such as microarrays and next generation sequencing has been accompanied by active research in data analysis methodology, producing new analysis methods at a rapid pace. While most of the newly developed methods are freely available, their use requires substantial computational skills. In order to enable non-programming biologists to benefit from the method development in a timely manner, we have created the Chipster software. Results Chipster (http://chipster.csc.fi/) brings a powerful collection of data analysis methods within the reach of bioscientists via its intuitive graphical user interface. Users can analyze and integrate different data types such as gene expression, miRNA and aCGH. The analysis functionality is complemented with rich interactive visualizations, allowing users to select datapoints and create new gene lists based on these selections. Importantly, users can save the performed analysis steps as reusable, automatic workflows, which can also be shared with other users. Being a versatile and easily extendable platform, Chipster can be used for microarray, proteomics and sequencing data. In this article we describe its comprehensive collection of analysis and visualization tools for microarray data using three case studies. Conclusions Chipster is a user-friendly analysis software for high-throughput data. Its intuitive graphical user interface enables biologists to access a powerful collection of data analysis and integration tools, and to visualize data interactively. Users can collaborate by sharing analysis sessions and workflows. Chipster is open source, and the server installation package is freely available. PMID:21999641
He, Ji; Dai, Xinbin; Zhao, Xuechun
2007-02-09
BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming. Personal BLAST Navigator (PLAN) is a versatile web platform that helps users to carry out various personalized pre- and post-BLAST tasks, including: (1) query and target sequence database management, (2) automated high-throughput BLAST searching, (3) indexing and searching of results, (4) filtering results online, (5) managing results of personal interest in favorite categories, (6) automated sequence annotation (such as NCBI NR and ontology-based annotation). PLAN integrates, by default, the Decypher hardware-based BLAST solution provided by Active Motif Inc. with a greatly improved efficiency over conventional BLAST software. BLAST results are visualized by spreadsheets and graphs and are full-text searchable. BLAST results and sequence annotations can be exported, in part or in full, in various formats including Microsoft Excel and FASTA. Sequences and BLAST results are organized in projects, the data publication levels of which are controlled by the registered project owners. In addition, all analytical functions are provided to public users without registration. PLAN has proved a valuable addition to the community for automated high-throughput BLAST searches, and, more importantly, for knowledge discovery, management and sharing based on sequence alignment results. 
The PLAN web interface is platform-independent, easily configurable, capable of comprehensive expansion, and user-intuitive. PLAN is freely available to academic users at http://bioinfo.noble.org/plan/. The source code for local deployment is provided under free license. Full support on system utilization, installation, configuration and customization is provided to academic users.
He, Ji; Dai, Xinbin; Zhao, Xuechun
2007-01-01
Background BLAST searches are widely used for sequence alignment. The search results are commonly adopted for various functional and comparative genomics tasks such as annotating unknown sequences, investigating gene models and comparing two sequence sets. Advances in sequencing technologies pose challenges for high-throughput analysis of large-scale sequence data. A number of programs and hardware solutions exist for efficient BLAST searching, but there is a lack of generic software solutions for mining and personalized management of the results. Systematically reviewing the results and identifying information of interest remains tedious and time-consuming. Results Personal BLAST Navigator (PLAN) is a versatile web platform that helps users to carry out various personalized pre- and post-BLAST tasks, including: (1) query and target sequence database management, (2) automated high-throughput BLAST searching, (3) indexing and searching of results, (4) filtering results online, (5) managing results of personal interest in favorite categories, (6) automated sequence annotation (such as NCBI NR and ontology-based annotation). PLAN integrates, by default, the Decypher hardware-based BLAST solution provided by Active Motif Inc. with a greatly improved efficiency over conventional BLAST software. BLAST results are visualized by spreadsheets and graphs and are full-text searchable. BLAST results and sequence annotations can be exported, in part or in full, in various formats including Microsoft Excel and FASTA. Sequences and BLAST results are organized in projects, the data publication levels of which are controlled by the registered project owners. In addition, all analytical functions are provided to public users without registration. Conclusion PLAN has proved a valuable addition to the community for automated high-throughput BLAST searches, and, more importantly, for knowledge discovery, management and sharing based on sequence alignment results. 
The PLAN web interface is platform-independent, easily configurable, capable of comprehensive expansion, and user-intuitive. PLAN is freely available to academic users at http://bioinfo.noble.org/plan/. The source code for local deployment is provided under free license. Full support on system utilization, installation, configuration and customization is provided to academic users. PMID:17291345
SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing
Bankevich, Anton; Nurk, Sergey; Antipov, Dmitry; Gurevich, Alexey A.; Dvorkin, Mikhail; Kulikov, Alexander S.; Lesin, Valery M.; Nikolenko, Sergey I.; Pham, Son; Prjibelski, Andrey D.; Pyshkin, Alexey V.; Sirotkin, Alexander V.; Vyahhi, Nikolay; Tesler, Glenn; Pevzner, Pavel A.
2012-01-01
Abstract The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A major goal of single-cell genomics is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of single-cell data is challenging because of highly non-uniform read coverage as well as elevated levels of sequencing errors and chimeric reads. We describe SPAdes, a new assembler for both single-cell and standard (multicell) assembly, and demonstrate that it improves on the recently released E+V−SC assembler (specialized for single-cell data) and on popular assemblers Velvet and SoapDeNovo (for multicell data). SPAdes generates single-cell assemblies, providing information about genomes of uncultivatable bacteria that vastly exceeds what may be obtained via traditional metagenomics studies. SPAdes is available online (http://bioinf.spbau.ru/spades). It is distributed as open source software. PMID:22506599
Hou, Wan-ru; Chen, Yu; Wu, Xia; Hu, Jin-chu; Peng, Zheng-song; Yang, Jung; Tang, Zong-xiang; Zhou, Cai-Quan; Li, Yu-ming; Yang, Shi-kui; Du, Yu-jie; Kong, Ling-lu; Ren, Zheng-long; Zhang, Huai-yu; Shuai, Su-rong
2007-01-01
We obtained the complete mitochondrial genome of U. thibetanus mupinensis by DNA sequencing of PCR fragments generated with 18 primers we designed. The results indicate that the mtDNA is 16,868 bp in size and encodes 13 protein genes, 22 tRNA genes, and 2 rRNA genes, with an overall H-strand base composition of 31.2% A, 25.4% C, 15.5% G and 27.9% T. The control region (CR), located between tRNA-Pro and tRNA-Phe, is 1422 bp in size, constitutes 8.43% of the whole genome, has a GC content of 51.9%, and contains one 6 bp tandem repeat and two 10 bp tandem repeats identified using the Tandem Repeats Finder. The U. thibetanus mupinensis mitochondrial genome shares high similarity with those of three other Ursidae: U. americanus (91.46%), U. arctos (89.25%) and U. maritimus (87.66%). PMID:17205108
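The composition figures quoted in this abstract (per-base percentages, GC content, and the control region's share of the genome) are simple ratios. The following is a minimal sketch of how such values can be computed; the function names and the toy sequence are illustrative assumptions and are not taken from the paper, which does not describe its software.

```python
from collections import Counter


def base_composition(seq: str) -> dict[str, float]:
    """Return the percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def gc_content(seq: str) -> float:
    """Return the combined G + C percentage of a sequence."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


def percent_of_genome(region_bp: int, genome_bp: int) -> float:
    """Return the share of the genome occupied by a region, in percent."""
    return 100.0 * region_bp / genome_bp


# Hypothetical toy sequence, used only to exercise the functions.
toy = "ATGCGCTTAACC"
print(base_composition(toy))
print(gc_content(toy))

# Lengths reported in the abstract: 1422 bp control region, 16,868 bp genome.
print(round(percent_of_genome(1422, 16868), 2))
```

Ambiguity codes such as N are skipped here by assumption; a different convention would change the denominators slightly.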
Methods, Tools and Current Perspectives in Proteogenomics *
Ruggles, Kelly V.; Krug, Karsten; Wang, Xiaojing; Clauser, Karl R.; Wang, Jing; Payne, Samuel H.; Fenyö, David; Zhang, Bing; Mani, D. R.
2017-01-01
With combined technological advancements in high-throughput next-generation sequencing and deep mass spectrometry-based proteomics, proteogenomics, i.e. the integrative analysis of proteomic and genomic data, has emerged as a new research field. Early efforts in the field were focused on improving protein identification using sample-specific genomic and transcriptomic sequencing data. More recently, integrative analysis of quantitative measurements from genomic and proteomic studies has yielded novel insights into gene expression regulation, cell signaling, and disease. Many methods and tools have been developed or adapted to enable an array of integrative proteogenomic approaches, and in this article we systematically classify published methods and tools into four major categories: (1) Sequence-centric proteogenomics; (2) Analysis of proteogenomic relationships; (3) Integrative modeling of proteogenomic data; and (4) Data sharing and visualization. We provide a comprehensive review of methods and available tools in each category and highlight their typical applications. PMID:28456751
Vij, Shubha; Kuhl, Heiner; Kuznetsova, Inna S.; Komissarov, Aleksey; Yurchenko, Andrey A.; Van Heusden, Peter; Singh, Siddharth; Thevasagayam, Natascha M.; Prakki, Sai Rama Sridatta; Purushothaman, Kathiresan; Saju, Jolly M.; Jiang, Junhui; Mbandi, Stanley Kimbung; Jonas, Mario; Hin Yan Tong, Amy; Mwangi, Sarah; Lau, Doreen; Ngoh, Si Yan; Liew, Woei Chang; Shen, Xueyan; Hon, Lawrence S.; Drake, James P.; Boitano, Matthew; Hall, Richard; Chin, Chen-Shan; Lachumanan, Ramkumar; Korlach, Jonas; Trifonov, Vladimir; Kabilov, Marsel; Tupikin, Alexey; Green, Darrell; Moxon, Simon; Garvin, Tyler; Sedlazeck, Fritz J.; Vurture, Gregory W.; Gopalapillai, Gopikrishna; Kumar Katneni, Vinaya; Noble, Tansyn H.; Scaria, Vinod; Sivasubbu, Sridhar; Jerry, Dean R.; O'Brien, Stephen J.; Schatz, Michael C.; Dalmay, Tamás; Turner, Stephen W.; Lok, Si; Christoffels, Alan; Orbán, László
2016-01-01
We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species’ native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics. PMID:27082250
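The contig and scaffold N50 values cited above are a standard assembly contiguity statistic: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length. A minimal sketch follows; the function name and the example lengths are illustrative assumptions, not data from the assembly.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total assembly size is the N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Hypothetical contig lengths (total 54, half 27): 10 + 9 + 8 reaches 27.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because N50 depends only on the length distribution, it says nothing about assembly correctness, which is why the abstract also reports mapping and synteny evidence.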
2014-01-01
Background Syntrichia caninervis is a desiccation-tolerant moss and the dominant bryophyte of the Biological Soil Crusts (BSCs) found in the Mojave and Gurbantunggut deserts. Next generation high throughput sequencing technologies offer an efficient and economic choice for characterizing non-model organism transcriptomes with little or no prior molecular information available. Results In this study, we employed next generation, high-throughput, Illumina RNA-Seq to analyze the poly-(A) + mRNA from hydrated, dehydrating and desiccated S. caninervis gametophores. Approximately 58.0 million paired-end short reads were obtained and 92,240 unigenes were assembled with an average size of 493 bp, N50 value of 662 bp and a total size of 45.48 Mbp. Sequence similarity searches against five public databases (NR, Swiss-Prot, COSMOSS, KEGG and COG) found 54,125 unigenes (58.7%) with significant similarity to an existing sequence (E-value ≤ 1e-5) and could be annotated. Gene Ontology (GO) annotation assigned 24,183 unigenes to the three GO terms: Biological Process, Cellular Component or Molecular Function. GO comparison between P. patens and S. caninervis demonstrated similar sequence enrichment across all three GO categories. 29,370 deduced polypeptide sequences were assigned Pfam domain information and categorized into 4,212 Pfam domains/families. Using the PlantTFDB, 778 unigenes were predicted to be involved in the regulation of transcription and were classified into 49 transcription factor families. Annotated unigenes were mapped to the KEGG pathways and further annotated using MapMan. Comparative genomics revealed that 44% of protein families are shared in common by S. caninervis, P. patens and Arabidopsis thaliana and that 80% are shared by both moss species. Conclusions This study is one of the first comprehensive transcriptome analyses of the moss S. caninervis. 
Our data extends our knowledge of bryophyte transcriptomes, provides an insight to plants adapted to the arid regions of central Asia, and continues the development of S. caninervis as a model for understanding the molecular aspects of desiccation-tolerance. PMID:25086984
A shared representation of order between encoding and recognition in visual short-term memory.
Kalm, Kristjan; Norris, Dennis
2017-07-15
Many complex tasks require people to bind individual events into a sequence that can be held in short-term memory (STM). For this purpose, information about the order of the individual events in the sequence needs to be maintained in an active and accessible form in STM over a period of a few seconds. Here we investigated how temporal order information is shared between the presentation and response phases of an STM task. We trained a classification algorithm on the fMRI activity patterns from the presentation phase of the STM task to predict the order of the items during the subsequent recognition phase. While voxels in a number of brain regions represented positional information during either the presentation or the recognition phase, only voxels in the lateral prefrontal cortex (PFC) and the anterior temporal lobe (ATL) represented position consistently across task phases. A shared positional code in the ATL might reflect verbal recoding of visual sequences to facilitate the maintenance of order information over several seconds. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
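The cross-phase analysis described in this abstract (fit a classifier on presentation-phase patterns, then score it on recognition-phase patterns) can be illustrated with a toy example. The abstract does not name the classifier, so the nearest-centroid rule below is a stand-in assumption, and the two-dimensional "voxel" patterns are invented purely for illustration.

```python
from math import dist


def fit_centroids(patterns, labels):
    """Average the training patterns for each label (here, item position)."""
    sums, counts = {}, {}
    for vec, label in zip(patterns, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


def cross_phase_accuracy(train_x, train_y, test_x, test_y):
    """Train on one task phase and report accuracy on the other phase."""
    centroids = fit_centroids(train_x, train_y)
    hits = sum(predict(centroids, vec) == label for vec, label in zip(test_x, test_y))
    return hits / len(test_y)


# Hypothetical activity patterns labelled by serial position (1 or 2).
presentation_x = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.8]]
presentation_y = [1, 1, 2, 2]
recognition_x = [[0.8, 0.2], [0.2, 0.9]]
recognition_y = [1, 2]

print(cross_phase_accuracy(presentation_x, presentation_y, recognition_x, recognition_y))
```

Above-chance accuracy in such a train-on-one-phase, test-on-the-other design is what would indicate a positional code shared across phases; within-phase decoding alone would not.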
The complete sequence of Cymbidium mosaic virus from Vanilla fragrans in Hainan, China.
He, Zhen; Jiang, Dongmei; Liu, Aiqin; Sang, Liwei; Li, Wenfeng; Li, Shifang
2011-06-01
The complete nucleotide sequence of Cymbidium mosaic virus (CymMV) isolated from vanilla in Hainan province, China, was determined for the first time. It comprised 6,224 nucleotides; sequence analysis suggested that the isolate we obtained was a member of the genus Potexvirus, and its sequence shared 86.67–96.61% identity with previously reported sequences. Phylogenetic analysis suggested that CymMV from Vanilla fragrans clustered into subgroup A and that the isolates in this subgroup displayed little regional difference.
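Identity ranges such as the one reported above come from pairwise comparison of aligned sequences. The sketch below shows one common definition, matches divided by the number of columns where neither sequence has a gap; the abstract does not state which tool or gap convention was used, so both the function and the toy alignments are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gapped columns are excluded under this convention
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments: one mismatch in eight compared columns.
print(percent_identity("ACGTACGT", "ACGTACGA"))
```

Counting gapped columns as mismatches instead would lower the reported value, so identity figures from different studies are only comparable when the convention is the same.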
Richards, Stephen; Liu, Yue; Bettencourt, Brian R.; Hradecky, Pavel; Letovsky, Stan; Nielsen, Rasmus; Thornton, Kevin; Hubisz, Melissa J.; Chen, Rui; Meisel, Richard P.; Couronne, Olivier; Hua, Sujun; Smith, Mark A.; Zhang, Peili; Liu, Jing; Bussemaker, Harmen J.; van Batenburg, Marinus F.; Howells, Sally L.; Scherer, Steven E.; Sodergren, Erica; Matthews, Beverly B.; Crosby, Madeline A.; Schroeder, Andrew J.; Ortiz-Barrientos, Daniel; Rives, Catharine M.; Metzker, Michael L.; Muzny, Donna M.; Scott, Graham; Steffen, David; Wheeler, David A.; Worley, Kim C.; Havlak, Paul; Durbin, K. James; Egan, Amy; Gill, Rachel; Hume, Jennifer; Morgan, Margaret B.; Miner, George; Hamilton, Cerissa; Huang, Yanmei; Waldron, Lenée; Verduzco, Daniel; Clerc-Blankenburg, Kerstin P.; Dubchak, Inna; Noor, Mohamed A.F.; Anderson, Wyatt; White, Kevin P.; Clark, Andrew G.; Schaeffer, Stephen W.; Gelbart, William; Weinstock, George M.; Gibbs, Richard A.
2005-01-01
We have sequenced the genome of a second Drosophila species, Drosophila pseudoobscura, and compared this to the genome sequence of Drosophila melanogaster, a primary model organism. Throughout evolution the vast majority of Drosophila genes have remained on the same chromosome arm, but within each arm gene order has been extensively reshuffled, leading to a minimum of 921 syntenic blocks shared between the species. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 25–55 million years (Myr) since the pseudoobscura/melanogaster divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome-wide average, consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than random and nearby sequences between the species, but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a pattern of repeat-mediated chromosomal rearrangement and high coadaptation of both male genes and cis-regulatory sequences emerge as important themes of genome divergence between these species of Drosophila. PMID:15632085
Villacreses, Javier; Rojas-Herrera, Marcelo; Sánchez, Carolina; Hewstone, Nicole; Undurraga, Soledad F.; Alzate, Juan F.; Manque, Patricio; Maracaja-Coutinho, Vinicius; Polanco, Victor
2015-01-01
Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, with a size of 7122 bp, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 share 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially the Petunia vein clearing virus (PVCV), Petuvirus genus. ORF1 encodes a movement protein (MP); ORF2 a Reverse Transcriptase (RT) and a Ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity with any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of Maqui Berry and showed low viral transcriptional activity, which was detected by deep sequencing technology (DNA and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance with Petuvirus. Overall, our data suggest that AcV1 could be a new member of the Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant. PMID:25855242
Willems, Roel M.; Hagoort, Peter
2016-01-01
Many studies have revealed shared music–language processing resources by finding an influence of music harmony manipulations on concurrent language processing. However, the nature of the shared resources has remained ambiguous. They have been argued to be syntax specific and thus due to shared syntactic integration resources. An alternative view regards them as related to general attention and, thus, not specific to syntax. The present experiments evaluated these accounts by investigating the influence of language on music. Participants were asked to provide closure judgements on harmonic sequences in order to assess the appropriateness of sequence endings. At the same time participants read syntactic garden-path sentences. Closure judgements revealed a change in harmonic processing as the result of reading a syntactically challenging word. We found no influence of an arithmetic control manipulation (experiment 1) or semantic garden-path sentences (experiment 2). Our results provide behavioural evidence for a specific influence of linguistic syntax processing on musical harmony judgements. A closer look reveals that the shared resources appear to be needed to hold a harmonic key online in some form of syntactic working memory or unification workspace related to the integration of chords and words. Overall, our results support the syntax specificity of shared music–language processing resources. PMID:26998339
Gubala, Aneta; Davis, Steven; Weir, Richard; Melville, Lorna; Cowled, Chris; Boyle, David
2011-09-01
Tibrogargan virus (TIBV) and Coastal Plains virus (CPV) were isolated from cattle in Australia and TIBV has also been isolated from the biting midge Culicoides brevitarsis. Complete genomic sequencing revealed that the viruses share a novel genome structure within the family Rhabdoviridae, each virus containing two additional putative genes between the matrix protein (M) and glycoprotein (G) genes and one between the G and viral RNA polymerase (L) genes. The predicted novel protein products are highly diverged at the sequence level but demonstrate clear conservation of secondary structure elements, suggesting conservation of biological functions. Phylogenetic analyses showed that TIBV and CPV form an independent group within the 'dimarhabdovirus supergroup'. Although no disease has been observed in association with these viruses, antibodies were detected at high prevalence in cattle and buffalo in northern Australia, indicating the need for disease monitoring and further study of this distinctive group of viruses.
Astafieva, A A; Rogozhin, E A; Odintsova, T I; Khadeeva, N V; Grishin, E V; Egorov, Ts A
2012-08-01
Three novel antimicrobial peptides designated ToAMP1, ToAMP2 and ToAMP3 were purified from Taraxacum officinale flowers. Their amino acid sequences were determined. The peptides are cationic and cysteine-rich and consist of 38, 44 and 42 amino acid residues for ToAMP1, ToAMP2 and ToAMP3, respectively. Importantly, according to cysteine motifs, the peptides are representatives of two novel previously unknown families of plant antimicrobial peptides. ToAMP1 and ToAMP2 share high sequence identity and belong to 6-Cys-containing antimicrobial peptides, while ToAMP3 is a member of a distinct 8-Cys family. The peptides were shown to display high antimicrobial activity both against fungal and bacterial pathogens, and therefore represent new promising molecules for biotechnological and medicinal applications. Crown Copyright © 2012. Published by Elsevier Inc. All rights reserved.
Genomic dissection of conserved transcriptional regulation in intestinal epithelial cells
Camp, J. Gray; Weiser, Matthew; Cocchiaro, Jordan L.; Kingsley, David M.; Furey, Terrence S.; Sheikh, Shehzad Z.; Rawls, John F.
2017-01-01
The intestinal epithelium serves critical physiologic functions that are shared among all vertebrates. However, it is unknown how the transcriptional regulatory mechanisms underlying these functions have changed over the course of vertebrate evolution. We generated genome-wide mRNA and accessible chromatin data from adult intestinal epithelial cells (IECs) in zebrafish, stickleback, mouse, and human species to determine if conserved IEC functions are achieved through common transcriptional regulation. We found evidence for substantial common regulation and conservation of gene expression regionally along the length of the intestine from fish to mammals and identified a core set of genes comprising a vertebrate IEC signature. We also identified transcriptional start sites and other putative regulatory regions that are differentially accessible in IECs in all 4 species. Although these sites rarely showed sequence conservation from fish to mammals, surprisingly, they drove highly conserved IEC expression in a zebrafish reporter assay. Common putative transcription factor binding sites (TFBS) found at these sites in multiple species indicate that sequence conservation alone is insufficient to identify much of the functionally conserved IEC regulatory information. Among the rare, highly sequence-conserved, IEC-specific regulatory regions, we discovered an ancient enhancer upstream from her6/HES1 that is active in a distinct population of Notch-positive cells in the intestinal epithelium. Together, these results show how combining accessible chromatin and mRNA datasets with TFBS prediction and in vivo reporter assays can reveal tissue-specific regulatory information conserved across 420 million years of vertebrate evolution. We define an IEC transcriptional regulatory network that is shared between fish and mammals and establish an experimental platform for studying how evolutionarily distilled regulatory information commonly controls IEC development and physiology. 
PMID:28850571
Belak, Zachery R; Ovsenek, Nicholas; Eskiw, Christopher H
2018-05-23
Yin-Yang 1 (YY1) is a highly conserved transcription factor possessing RNA-binding activity. A putative YY1 homologue was previously identified in the developmental model organism Strongylocentrotus purpuratus (the purple sea urchin) by genomic sequencing. We identified a high degree of sequence similarity with YY1 homologues of vertebrate origin which shared 100% protein sequence identity over the DNA- and RNA-binding zinc-finger region with high similarity in the N-terminal transcriptional activation domain. SpYY1 demonstrated identical DNA- and RNA-binding characteristics between Xenopus laevis and S. purpuratus indicating that it maintains similar functional and biochemical properties across widely divergent deuterostome species. SpYY1 binds to the consensus YY1 DNA element, and also to U-rich RNA sequences. Although we detected SpYY1 RNA-binding activity in ova lysates and observed cytoplasmic localization, SpYY1 was not associated with maternal mRNA in ova. SpYY1 expressed in Xenopus oocytes was excluded from the nucleus and associated with maternally expressed cytoplasmic mRNA molecules. These data demonstrate the existence of a YY1 homologue in S. purpuratus with similar structural and biochemical features to those of the well-studied vertebrate YY1; however, the data reveal major differences in the biological role of YY1 in the regulation of maternally expressed mRNA in the two species.
A DNA Barcode Library for North American Ephemeroptera: Progress and Prospects
Webb, Jeffrey M.; Jacobus, Luke M.; Funk, David H.; Zhou, Xin; Kondratieff, Boris; Geraci, Christy J.; DeWalt, R. Edward; Baird, Donald J.; Richard, Barton; Phillips, Iain; Hebert, Paul D. N.
2012-01-01
DNA barcoding of aquatic macroinvertebrates holds much promise as a tool for taxonomic research and for providing the reliable identifications needed for water quality assessment programs. A prerequisite for identification using barcodes is a reliable reference library. We gathered 4165 sequences from the barcode region of the mitochondrial cytochrome c oxidase subunit I gene representing 264 nominal and 90 provisional species of mayflies (Insecta: Ephemeroptera) from Canada, Mexico, and the United States. No species shared barcode sequences and all can be identified with barcodes with the possible exception of some Caenis. Minimum interspecific distances ranged from 0.3–24.7% (mean: 12.5%), while the average intraspecific divergence was 1.97%. The latter value was inflated by the presence of very high divergences in some taxa. In fact, nearly 20% of the species included two or three haplotype clusters showing greater than 5.0% sequence divergence and some values are as high as 26.7%. Many of the species with high divergences are polyphyletic and likely represent species complexes. Indeed, many of these polyphyletic species have numerous synonyms and individuals in some barcode clusters show morphological attributes characteristic of the synonymized species. In light of our findings, it is imperative that type or topotype specimens be sequenced to correctly associate barcode clusters with morphological species concepts and to determine the status of currently synonymized species. PMID:22666447
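The divergence percentages quoted in this abstract are pairwise distances between aligned barcode sequences. As a minimal sketch of how such a value can be computed, the function below uses the uncorrected p-distance (the fraction of compared sites that differ). This is an assumption made for illustration: the abstract does not say which distance model was used, and barcode studies often apply a corrected model such as Kimura 2-parameter instead. The function name and the rule for skipping gaps and ambiguous bases are likewise illustrative.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected pairwise distance: fraction of aligned sites that differ.

    Assumes the two sequences are already aligned to equal length.
    Sites with a gap ('-') or an ambiguous base ('N') in either
    sequence are skipped (an illustrative choice, not a standard).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared
```

Multiplying the result by 100 gives a percentage comparable in form to the intraspecific and interspecific values reported above.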
Alignment-free sequence comparison (II): theoretical power of comparison statistics.
Wan, Lin; Reinert, Gesine; Sun, Fengzhu; Waterman, Michael S
2010-11-01
Rapid methods for alignment-free sequence comparison make large-scale comparisons between sequences increasingly feasible. Here we study the power of the statistic D2, which counts the number of matching k-tuples between two sequences, as well as D2*, which uses centralized counts, and D2S, which is a self-standardized version, both from a theoretical viewpoint and numerically, providing an easy-to-use program. The power is assessed under two alternative hidden Markov models; the first one assumes that the two sequences share a common motif, whereas the second model is a pattern transfer model; the null model is that the two sequences are composed of independent and identically distributed letters and they are independent. Under the first alternative model, the means of the tuple counts in the individual sequences change, whereas under the second alternative model, the marginal means are the same as under the null model. Using the limit distributions of the count statistics under the null and the alternative models, we find that generally, asymptotically D2S has the largest power, followed by D2*, whereas the power of D2 can even be zero in some cases. In contrast, even for sequences of length 140,000 bp, in simulations D2* generally has the largest power. Under the first alternative model of a shared motif, the power of D2* approaches 100% when sufficiently many motifs are shared, and we recommend the use of D2* for such practical applications. Under the second alternative model of pattern transfer, the power for all three count statistics does not increase with sequence length when the sequence is sufficiently long, and hence none of the three statistics under consideration can be recommended in such a situation. We illustrate the approach on 323 transcription factor binding motifs with length at most 10 from JASPAR CORE (October 12, 2009 version), verifying that D2* is generally more powerful than D2.
The program to calculate the power of D2, D2* and D2S can be downloaded from http://meta.cmb.usc.edu/d2. Supplementary Material is available at www.liebertonline.com/cmb.
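The plain D2 statistic described above, the number of matching k-tuples between two sequences, can be sketched in a few lines. The code below is only an illustration of that definition, not the authors' downloadable program; the function names are hypothetical, and the centralized (D2*) and self-standardized (D2S) variants are not implemented because they need a background letter model that the abstract does not specify.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-tuple (k-mer) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a, seq_b, k):
    """D2: number of matching k-tuple pairs between two sequences.

    Equivalent to summing count_a(w) * count_b(w) over all words w.
    """
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # A Counter returns 0 for words absent from seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())
```

Because D2 uses raw counts, its value grows with sequence length and with how common a word is under the background composition, which is consistent with the abstract's remark that the centralized and standardized versions generally have more power.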
Record of Decision for the First Active Duty F-35A Operational Base
2013-12-02
trucks or sprinkler systems to keep all areas of vehicle movement damp enough to prevent dust from leaving the construction area. - Temporary wind...synergy between the operational and logistics communities in managing a new, highly complex weapon system . ACC’s existing F-16 squadrons at Hill AFB...Share information with local fire departments on F-35A crash response procedures. Soils and Water • Sequence construction activities to limit the soil
Chau, John H; Rahfeldt, Wolfgang A; Olmstead, Richard G
2018-03-01
Targeted sequence capture can be used to efficiently gather sequence data for large numbers of loci, such as single-copy nuclear loci. Most published studies in plants have used taxon-specific locus sets developed individually for a clade using multiple genomic and transcriptomic resources. General locus sets can also be developed from loci that have been identified as single-copy and have orthologs in large clades of plants. We identify and compare a taxon-specific locus set and three general locus sets (conserved ortholog set [COSII], shared single-copy nuclear [APVO SSC] genes, and pentatricopeptide repeat [PPR] genes) for targeted sequence capture in Buddleja (Scrophulariaceae) and outgroups. We evaluate their performance in terms of assembly success, sequence variability, and resolution and support of inferred phylogenetic trees. The taxon-specific locus set had the most target loci. Assembly success was high for all locus sets in Buddleja samples. For outgroups, general locus sets had greater assembly success. Taxon-specific and PPR loci had the highest average variability. The taxon-specific data set produced the best-supported tree, but all data sets showed improved resolution over previous non-sequence capture data sets. General locus sets can be a useful source of sequence capture targets, especially if multiple genomic resources are not available for a taxon.
Bartonellae are Prevalent and Diverse in Costa Rican Bats and Bat Flies.
Judson, S D; Frank, H K; Hadly, E A
2015-12-01
Species in the bacterial genus, Bartonella, can cause disease in both humans and animals. Previous reports of Bartonella in bats and ectoparasitic bat flies suggest that bats could serve as mammalian hosts and bat flies as arthropod vectors. We compared the prevalence and genetic similarity of bartonellae in individual Costa Rican bats and their bat flies using molecular and sequencing methods targeting the citrate synthase gene (gltA). Bartonellae were more prevalent in bat flies than in bats, and genetic variants were sometimes, but not always, shared between bats and their bat flies. The detected bartonellae genetic variants were diverse, and some were similar to species known to cause disease in humans and other mammals. The high prevalence and sharing of bartonellae in bat flies and bats support a role for bat flies as a potential vector for Bartonella, while the genetic diversity and similarity to known species suggest that bartonellae could spill over into humans and animals sharing the landscape. © 2015 Blackwell Verlag GmbH.
Leonard, Matthew K; Desai, Maansi; Hungate, Dylan; Cai, Ruofan; Singhal, Nilika S; Knowlton, Robert C; Chang, Edward F
2018-05-22
Music and speech are human-specific behaviours that share numerous properties, including the fine motor skills required to produce them. Given these similarities, previous work has suggested that music and speech may at least partially share neural substrates. To date, much of this work has focused on perception, and has not investigated the neural basis of production, particularly in trained musicians. Here, we report two rare cases of musicians undergoing neurosurgical procedures, where it was possible to directly stimulate the left hemisphere cortex during speech and piano/guitar music production tasks. We found that stimulation to left inferior frontal cortex, including pars opercularis and ventral pre-central gyrus, caused slowing and arrest for both speech and music, and note sequence errors for music. Stimulation to posterior superior temporal cortex only caused production errors during speech. These results demonstrate partially dissociable networks underlying speech and music production, with a shared substrate in frontal regions.
Fei-Fei, Diao; Yong-Feng, Zhao; Jian-Li, Wang; Xue-Hua, Wei; Kai, Cui; Chuan-Yi, Liu; Shou-Yu, Guo; Jiang, Shijin; Zhi-Jing, Xie
2017-06-01
Six feline panleukopenia viruses (FPV) were detected in the intestinal samples from the 176 mink collected in China during 2015 to 2016, named MEV-SD1, MEV-SD2, MEV-SD3, MEV-SD4, MEV-SD5 and MEV-SD6. The VP2 genes of the isolates shared 98.9%-100% identity with the reference sequences. The substitution of residue V300A in VP2 protein differentiates the isolates from the reference MEVs, and A300 is a characteristic of FPV. Furthermore, phylogenetic analysis of VP2 genes indicated that the six isolates were clustered into the same branch of all the reference FPVs. The NS1 genes of the isolates shared 98.2%-100% identity with the reference sequences. The NS1 genes of the six isolates and the three reference FPVs formed one unique evolutionary branch. To clarify the pathogenicity of the isolates, animal experiments were performed on healthy mink, using MEV-SD1. As a result, the morbidity of the inoculated animals was 100% and the mortality was as high as 38.9%. It was implied that the FPV infection caused a high morbidity and mortality in mink and the inoculation dose had an effect on pathogenicity of MEV-SD1 in mink. Copyright © 2017 Elsevier B.V. All rights reserved.
van der Vossen, E A; van der Voort, J N; Kanyuka, K; Bendahmane, A; Sandbrink, H; Baulcombe, D C; Bakker, J; Stiekema, W J; Klein-Lankhorst, R M
2000-09-01
The isolation of the nematode-resistance gene Gpa2 in potato is described, and it is demonstrated that highly homologous resistance genes of a single resistance-gene cluster can confer resistance to distinct pathogen species. Molecular analysis of the Gpa2 locus resulted in the identification of an R-gene cluster of four highly homologous genes in a region of approximately 115 kb. At least two of these genes are active: one corresponds to the previously isolated Rx1 gene that confers resistance to potato virus X, while the other corresponds to the Gpa2 gene that confers resistance to the potato cyst nematode Globodera pallida. The proteins encoded by the Gpa2 and the Rx1 genes share an overall homology of over 88% (amino-acid identity) and belong to the leucine-zipper, nucleotide-binding site, leucine-rich repeat (LZ-NBS-LRR)-containing class of plant resistance genes. From the sequence conservation between Gpa2 and Rx1 it is clear that there is a direct evolutionary relationship between the two proteins. Sequence diversity is concentrated in the LRR region and in the C-terminus. The putative effector domains are more conserved suggesting that, at least in this case, nematode and virus resistance cascades could share common components. These findings underline the potential of protein breeding for engineering new resistance specificities against plant pathogens in vitro.
Reconsidering the role of temporal order in spoken word recognition.
Toscano, Joseph C; Anderson, Nathaniel D; McMurray, Bob
2013-10-01
Models of spoken word recognition assume that words are represented as sequences of phonemes. We evaluated this assumption by examining phonemic anadromes, words that share the same phonemes but differ in their order (e.g., sub and bus). Using the visual-world paradigm, we found that listeners show more fixations to anadromes (e.g., sub when bus is the target) than to unrelated words (well) and to words that share the same vowel but not the same set of phonemes (sun). This contrasts with the predictions of existing models and suggests that words are not defined as strict sequences of phonemes.
Song, Yang; Zhang, Yong; Fan, Qin; Cui, Hui; Yan, Dongmei; Zhu, Shuangli; Tang, Haishu; Sun, Qiang; Wang, Dongyan; Xu, Wenbo
2017-02-23
Human enterovirus B106 (EV-B106) is a new member of the enterovirus B species. To date, only three nucleotide sequences of EV-B106 have been published, and only one full-length genome sequence (the Yunnan strain 148/YN/CHN/12) is available in the GenBank database. In this study, we conducted phylogenetic characterisation of four EV-B106 strains isolated in Xinjiang, China. Pairwise comparisons of the nucleotide sequences and the deduced amino acid sequences revealed that the four Xinjiang EV-B106 strains had only 80.5-80.8% nucleotide identity and 95.4-97.3% amino acid identity with the Yunnan EV-B106 strain, indicating high mutagenicity. Similarity plots and bootscanning analyses revealed that frequent intertypic recombination occurred in all four Xinjiang EV-B106 strains in the non-structural region. These four strains may share a donor sequence with the EV-B85 strain, which circulated in Xinjiang in 2011, indicating extensive genetic exchanges between these strains. All Xinjiang EV-B106 strains were temperature-sensitive. An antibody seroprevalence study against EV-B106 in two Xinjiang prefectures also showed low titres of neutralizing antibodies, suggesting limited exposure and transmission in the population. This study contributes the whole genome sequences of EV-B106 to the GenBank database and provides valuable information regarding the molecular epidemiology of EV-B106 in China.
Song, Yang; Zhang, Yong; Fan, Qin; Cui, Hui; Yan, Dongmei; Zhu, Shuangli; Tang, Haishu; Sun, Qiang; Wang, Dongyan; Xu, Wenbo
2017-01-01
Human enterovirus B106 (EV-B106) is a new member of the enterovirus B species. To date, only three nucleotide sequences of EV-B106 have been published, and only one full-length genome sequence (the Yunnan strain 148/YN/CHN/12) is available in the GenBank database. In this study, we conducted phylogenetic characterisation of four EV-B106 strains isolated in Xinjiang, China. Pairwise comparisons of the nucleotide sequences and the deduced amino acid sequences revealed that the four Xinjiang EV-B106 strains had only 80.5–80.8% nucleotide identity and 95.4–97.3% amino acid identity with the Yunnan EV-B106 strain, indicating high mutagenicity. Similarity plots and bootscanning analyses revealed that frequent intertypic recombination occurred in all four Xinjiang EV-B106 strains in the non-structural region. These four strains may share a donor sequence with the EV-B85 strain, which circulated in Xinjiang in 2011, indicating extensive genetic exchanges between these strains. All Xinjiang EV-B106 strains were temperature-sensitive. An antibody seroprevalence study against EV-B106 in two Xinjiang prefectures also showed low titres of neutralizing antibodies, suggesting limited exposure and transmission in the population. This study contributes the whole genome sequences of EV-B106 to the GenBank database and provides valuable information regarding the molecular epidemiology of EV-B106 in China. PMID:28230168
Bürckert, Jean-Philippe; Dubois, Axel R S X; Faison, William J; Farinelle, Sophie; Charpentier, Emilie; Sinner, Regina; Wienecke-Baldacchino, Anke; Muller, Claude P
2017-01-01
The identification and tracking of antigen-specific immunoglobulin (Ig) sequences within total Ig repertoires is central to high-throughput sequencing (HTS) studies of infections or vaccinations. In this context, public Ig sequences shared by different individuals exposed to the same antigen could be valuable markers for tracing back infections, measuring vaccine immunogenicity, and perhaps ultimately allow the reconstruction of the immunological history of an individual. Here, we immunized groups of transgenic rats expressing human Ig against tetanus toxoid (TT), Modified Vaccinia virus Ankara (MVA), measles virus hemagglutinin and fusion proteins expressed on MVA, and the environmental carcinogen benzo[a]pyrene, coupled to TT. We showed that these antigens impose a selective pressure causing the Ig heavy chain (IgH) repertoires of the rats to converge toward the expression of antibodies with highly similar IgH CDR3 amino acid sequences. We present a computational approach, similar to differential gene expression analysis, that selects for clusters of CDR3s with 80% similarity, significantly overrepresented within the different groups of immunized rats. These IgH clusters represent antigen-induced IgH signatures exhibiting stereotypic amino acid patterns including previously described TT- and measles-specific IgH sequences. Our data suggest that with the presented methodology, transgenic Ig rats can be utilized as a model to identify antigen-induced, human IgH signatures to a variety of different antigens.
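The grouping step mentioned above, clusters of CDR3s with 80% similarity, can be sketched as follows. Everything beyond the 80% threshold is an assumption made for illustration: similarity is taken here as the fraction of identical positions between equal-length amino acid sequences, clusters are formed by single linkage, and the function names are hypothetical. The abstract does not describe the authors' actual similarity measure or clustering rule, and the statistical test for overrepresentation between immunized groups is not shown.

```python
def similarity(a, b):
    """Fraction of identical positions; 0.0 if lengths differ or are zero."""
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_cdr3(seqs, threshold=0.8):
    """Single-linkage clustering of CDR3 sequences.

    A sequence joins a cluster when it reaches the threshold against
    at least one member. Returns clusters sorted largest first.
    """
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if similarity(seqs[i], seqs[j]) >= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i, seq in enumerate(seqs):
        clusters.setdefault(find(i), []).append(seq)
    return sorted(clusters.values(), key=len, reverse=True)
```

The all-against-all comparison is quadratic in the number of sequences, so a repertoire-scale data set would need a faster pre-grouping step, for example by CDR3 length.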
DNA Barcode Sequence Identification Incorporating Taxonomic Hierarchy and within Taxon Variability
Little, Damon P.
2011-01-01
For DNA barcoding to succeed as a scientific endeavor an accurate and expeditious query sequence identification method is needed. Although a global multiple–sequence alignment can be generated for some barcoding markers (e.g. COI, rbcL), not all barcoding markers are as structurally conserved (e.g. matK). Thus, algorithms that depend on global multiple–sequence alignments are not universally applicable. Some sequence identification methods that use local pairwise alignments (e.g. BLAST) are unable to accurately differentiate between highly similar sequences and are not designed to cope with hierarchic phylogenetic relationships or within taxon variability. Here, I present a novel alignment–free sequence identification algorithm–BRONX–that accounts for observed within taxon variability and hierarchic relationships among taxa. BRONX identifies short variable segments and corresponding invariant flanking regions in reference sequences. These flanking regions are used to score variable regions in the query sequence without the production of a global multiple–sequence alignment. By incorporating observed within taxon variability into the scoring procedure, misidentifications arising from shared alleles/haplotypes are minimized. An explicit treatment of more inclusive terminals allows for separate identifications to be made for each taxonomic level and/or for user–defined terminals. BRONX performs better than all other methods when there is imperfect overlap between query and reference sequences (e.g. mini–barcode queries against a full–length barcode database). BRONX consistently produced better identifications at the genus–level for all query types. PMID:21857897
Brouard, Jean-Simon; Otis, Christian; Lemieux, Claude; Turmel, Monique
2008-01-01
Background: To gain insight into the branching order of the five main lineages currently recognized in the green algal class Chlorophyceae and to expand our understanding of chloroplast genome evolution, we have undertaken the sequencing of chloroplast DNA (cpDNA) from representative taxa. The complete cpDNA sequences previously reported for Chlamydomonas (Chlamydomonadales), Scenedesmus (Sphaeropleales), and Stigeoclonium (Chaetophorales) revealed tremendous variability in their architecture, the retention of only a few ancestral gene clusters, and derived clusters shared by Chlamydomonas and Scenedesmus. Unexpectedly, our recent phylogenies inferred from these cpDNAs and the partial sequences of three other chlorophycean cpDNAs disclosed two major clades, one uniting the Chlamydomonadales and Sphaeropleales (CS clade) and the other uniting the Oedogoniales, Chaetophorales and Chaetopeltidales (OCC clade). Although molecular signatures provided strong support for this dichotomy and for the branching of the Oedogoniales as the earliest-diverging lineage of the OCC clade, more data are required to validate these phylogenies. We describe here the complete cpDNA sequence of Oedogonium cardiacum (Oedogoniales). Results: Like its three chlorophycean homologues, the 196,547-bp Oedogonium chloroplast genome displays a distinctive architecture. This genome is one of the most compact among photosynthetic chlorophytes. It has an atypical quadripartite structure, is intron-rich (17 group I and 4 group II introns), and displays 99 different conserved genes and four long open reading frames (ORFs), three of which are clustered in the spacious inverted repeat of 35,493 bp. Intriguingly, two of these ORFs (int and dpoB) revealed high similarities to genes not usually found in cpDNA. At the gene content and gene order levels, the Oedogonium genome most closely resembles its Stigeoclonium counterpart.
Characters shared by these chlorophyceans but missing in members of the CS clade include the retention of psaM, rpl32 and trnL(caa), the loss of petA, the disruption of three ancestral clusters and the presence of five derived gene clusters. Conclusion: The Oedogonium chloroplast genome disclosed additional characters that bolster the evidence for a close alliance between the Oedogoniales and Chaetophorales. Our unprecedented finding of int and dpoB in this cpDNA provides a clear example that novel genes were acquired by the chloroplast genome through horizontal transfers, possibly from a mitochondrial genome donor. PMID:18558012
Hughes, M. S.; Hoey, E. M.; Coyle, P. V.
1993-01-01
Ten coxsackievirus B4 (CVB4) strains isolated from clinical and environmental sources in Northern Ireland in 1985-7 were compared at the nucleotide sequence level. Dideoxynucleotide sequencing of a polymerase chain reaction (PCR) amplified fragment, spanning the VP1/P2A genomic region, classified the isolates into two distinct groups or genotypes as defined by Rico-Hesse and colleagues for poliovirus type 1. Isolates within each group shared approximately 99% sequence identity at the nucleotide level whereas ≤86% sequence identity was shared between groups. One isolate derived from a clinical specimen in 1987 was grouped with six CVB4 isolates recovered from the aquatic environment in 1986-7. The second group comprised CVB4 isolates from clinical specimens in 1985-6. Both groups were different at the nucleotide level from the prototype strain isolated in 1950. It was concluded that the method could be used to sub-type CVB4 isolates and would be of value in epidemiological studies of CVB4. Predicted amino acid sequences revealed non-conservation of the tyrosine residue at the VP1/P2A cleavage site but were of little value in distinguishing CVB4 variants. PMID:8386098
Zhang, Wenqiang; Lin, Xiaojuan; Jiang, Ping; Tao, Zexin; Liu, Xiaolin; Ji, Feng; Wang, Tongzhan; Wang, Suting; Lv, Hui; Xu, Aiqiang; Wang, Haiyan
2016-08-01
Coxsackievirus B3 (CV-B3) has frequently been associated with aseptic meningitis outbreaks in China. To identify sequence motifs related to aseptic meningitis and to construct an infectious clone, the genome sequence of 08TC170, a representative strain isolated from cerebrospinal fluid (CSF) samples from an outbreak in Shandong in 2008, was determined, and the coding regions for P1-P3 and VP1 were aligned. The first 21 and last 20 residues were "TTAAAACAGCCTGTGGGTTGT" and "ATTCTCCGCATTCGGTGCGG", respectively. The whole genome consisted of 7401 nucleotides, sharing 80.8% identity with the prototype strain Nancy and low sequence similarity with members of clusters A-C. In contrast, 08TC170 showed high sequence similarity to members of cluster D. An especially high level of sequence identity (≥97.7%) was found within a branch constituted by 08TC170 and four Chinese strains that clustered together in all of the P1-P3 phylogenetic trees. In addition, 08TC170 also possessed a close relationship to the Hong Kong strain 26362/08 in VP1. Similarity plot analysis showed that 08TC170 was most similar to the Chinese CV-B3 strain SSM in P1 and the partial P2 coding region but to the CV-B5 or E-6 strain in 2C and following regions. A T277A mutation was found in 08TC170 and other strains isolated in 2008-2010, but not in strains isolated before 2008, which had high sequence similarity and formed the cluster A277. The results suggested that 08TC170 was the product of both intertypic recombination and point mutation, whose effects on viral neurovirulence will be investigated in a further study. The high homology between 08TC170 and other strains revealed their co-circulation in mainland China and Hong Kong and indicates that further surveillance is needed.
Next Generation Sequencing Plus (NGS+) with Y-chromosomal Markers for Forensic Pedigree Searches.
Qian, Xiaoqin; Hou, Jiayi; Wang, Zheng; Ye, Yi; Lang, Min; Gao, Tianzhen; Liu, Jing; Hou, Yiping
2017-09-12
There is high demand for forensic pedigree searches with Y-chromosome short tandem repeat (Y-STR) profiling in large-scale crime investigations. However, when two Y-STR haplotypes have a few mismatched loci, it is difficult to determine if they are from the same male lineage because of the high mutation rate of Y-STRs. Here we design a new strategy to handle cases in which none of the pedigree samples shares an identical Y-STR haplotype. We combine next generation sequencing (NGS), capillary electrophoresis and pyrosequencing under the term 'NGS+' for typing Y-STRs and Y-chromosomal single nucleotide polymorphisms (Y-SNPs). The high-resolution Y-SNP haplogroup and Y-STR haplotype can be obtained with NGS+. We further developed a new data-driven decision rule, FSindex, for estimating the likelihood for each retrieved pedigree. Our approach enables positive identification of a pedigree from mismatched Y-STR haplotypes. It is envisaged that NGS+ will revolutionize forensic pedigree searches, especially when the person of interest was not recorded in a forensic DNA database.
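The starting point of the problem described above, two Y-STR haplotypes that differ at a few loci, can be illustrated with a short comparison routine. This is only a sketch of the mismatch count: the FSindex decision rule is not reproduced because the abstract does not define it. The function name, the dictionary representation of a haplotype, and the allele values in the usage note are all hypothetical.

```python
def mismatched_loci(hap_a, hap_b):
    """Return the loci typed in both haplotypes whose allele calls differ.

    Each haplotype is a dict mapping a locus name to its repeat count.
    Loci typed in only one of the two haplotypes are ignored.
    """
    shared = hap_a.keys() & hap_b.keys()
    return sorted(locus for locus in shared if hap_a[locus] != hap_b[locus])
```

For example, with hypothetical profiles `{"DYS19": 15, "DYS390": 24}` and `{"DYS19": 15, "DYS390": 25}`, the function reports `["DYS390"]`. A single-step difference like this may reflect a mutation within one lineage rather than two distinct lineages, which is the ambiguity the abstract's strategy is meant to address.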
Subramanian, Sankar; Lingala, Syamala Gowri; Swaminathan, Siva; Huynen, Leon; Lambert, David
2014-08-01
The complete mitochondrial genome of the Chinstrap penguin (Pygoscelis antarcticus) was sequenced and compared with other penguin mitogenomes. The genome is 15,972 bp in length with the number and order of protein coding genes and RNAs being very similar to that of other known penguin mitogenomes. Comparative nucleotide analysis showed the Chinstrap mitogenome shares 94% homology with the mitogenome of its sister species, Pygoscelis adeliae (Adélie penguin). Divergence at nonsynonymous nucleotide positions was found to be up to 23 times less than that observed in synonymous positions of protein coding genes, suggesting high selection constraints. The complete mitogenome data will be useful for genetic and evolutionary studies of penguins.
Ward, T R; Hoang, M L; Prusty, R; Lau, C K; Keil, R L; Fangman, W L; Brewer, B J
2000-07-01
In the ribosomal DNA of Saccharomyces cerevisiae, sequences in the nontranscribed spacer 3' of the 35S ribosomal RNA gene are important to the polar arrest of replication forks at a site called the replication fork barrier (RFB) and also to the cis-acting, mitotic hyperrecombination site called HOT1. We have found that the RFB and HOT1 activity share some but not all of their essential sequences. Many of the mutations that reduce HOT1 recombination also decrease or eliminate fork arrest at one of two closely spaced RFB sites, RFB1 and RFB2. A simple model for the juxtaposition of RFB and HOT1 sequences is that the breakage of strands in replication forks arrested at RFB stimulates recombination. Contrary to this model, we show here that HOT1-stimulated recombination does not require the arrest of forks at the RFB. Therefore, while HOT1 activity is independent of replication fork arrest, HOT1 and RFB require some common sequences, suggesting the existence of a common trans-acting factor(s).
Genome Sequences of Ilzat and Eleri, Two Phages Isolated Using Microbacterium foliorum NRRL B-24224
Ali, Ilzat; Jones, Acacia Eleri; Mohamed, Aleem
2018-01-01
ABSTRACT Bacteriophages Ilzat and Eleri are newly isolated Siphoviridae infecting Microbacterium foliorum NRRL B-24224. The phage genomes are similar in length, G+C content, and architecture and share 62.9% nucleotide sequence identity. PMID:29650566
Exome Sequencing in Suspected Monogenic Dyslipidemias
Stitziel, Nathan O.; Peloso, Gina M.; Abifadel, Marianne; Cefalu, Angelo B.; Fouchier, Sigrid; Motazacker, M. Mahdi; Tada, Hayato; Larach, Daniel B.; Awan, Zuhier; Haller, Jorge F.; Pullinger, Clive R.; Varret, Mathilde; Rabès, Jean-Pierre; Noto, Davide; Tarugi, Patrizia; Kawashiri, Masa-aki; Nohara, Atsushi; Yamagishi, Masakazu; Risman, Marjorie; Deo, Rahul; Ruel, Isabelle; Shendure, Jay; Nickerson, Deborah A.; Wilson, James G.; Rich, Stephen S.; Gupta, Namrata; Farlow, Deborah N.; Neale, Benjamin M.; Daly, Mark J.; Kane, John P.; Freeman, Mason W.; Genest, Jacques; Rader, Daniel J.; Mabuchi, Hiroshi; Kastelein, John J.P.; Hovingh, G. Kees; Averna, Maurizio R.; Gabriel, Stacey; Boileau, Catherine; Kathiresan, Sekar
2015-01-01
Background: Exome sequencing is a promising tool for gene mapping in Mendelian disorders. We utilized this technique in an attempt to identify novel genes underlying monogenic dyslipidemias. Methods and Results: We performed exome sequencing on 213 selected family members from 41 kindreds with suspected Mendelian inheritance of extreme levels of low-density lipoprotein (LDL) cholesterol (after candidate gene sequencing excluded known genetic causes for high LDL cholesterol families) or high-density lipoprotein (HDL) cholesterol. We used standard analytic approaches to identify candidate variants and also assigned a polygenic score to each individual in order to account for their burden of common genetic variants known to influence lipid levels. In nine families, we identified likely pathogenic variants in known lipid genes (ABCA1, APOB, APOE, LDLR, LIPA, and PCSK9); however, we were unable to identify obvious genetic etiologies in the remaining 32 families despite follow-up analyses. We identified three factors that limited novel gene discovery: (1) imperfect sequencing coverage across the exome hid potentially causal variants; (2) large numbers of shared rare alleles within families obfuscated causal variant identification; and (3) individuals from 15% of families carried a significant burden of common lipid-related alleles, suggesting complex inheritance can masquerade as monogenic disease. Conclusions: We identified the genetic basis of disease in nine of 41 families; however, none of these represented novel gene discoveries. Our results highlight the promise and limitations of exome sequencing as a discovery technique in suspected monogenic dyslipidemias. Considering the confounders identified may inform the design of future exome sequencing studies. PMID:25632026
Emshwiller, Eve; Doyle, Jeff J
2002-07-01
In continuing study of the origins of the octoploid tuber crop oca, Oxalis tuberosa Molina, we used phylogenetic analysis of DNA sequences of the chloroplast-active (nuclear encoded) isozyme of glutamine synthetase (ncpGS) from cultivated oca, its allies in the "Oxalis tuberosa alliance," and other Andean Oxalis. Multiple ncpGS sequences found within individuals of both the cultigen and a yet unnamed wild tuber-bearing taxon of Bolivia were separated by molecular cloning, but some cloned sequences appeared to be artifacts of polymerase chain reaction (PCR) recombination and/or Taq error. Nonetheless, three classes of nonrecombinant sequences each joined a different part of the O. tuberosa alliance clade on the ncpGS gene tree. Octoploid oca shares two sequence classes with the Bolivian tuber-bearing taxon (of unknown ploidy level). Fixed heterozygosity of these two sequence classes in all ocas sampled suggests that they represent homeologous loci and that oca is allopolyploid. A third sequence class, found in eight of nine oca plants sampled, might represent a third homeologous locus, suggesting that oca may be autoallopolyploid, and is shared with another wild tuber-bearing species, tetraploid O. picchensis of southern Peru. Thus, ncpGS data identify these two taxa as the best candidates as progenitors of cultivated oca.
A proposal to rename the hyperthermophile Pyrococcus woesei as Pyrococcus furiosus subsp. woesei.
Kanoksilapatham, Wirojne; González, Juan M; Maeder, Dennis L; DiRuggiero, Jocelyne; Robb, Frank T
2004-10-01
Pyrococcus species are hyperthermophilic members of the order Thermococcales, with optimal growth temperatures approaching 100 °C. All species grow heterotrophically and produce H2 or, in the presence of elemental sulfur (S°), H2S. Pyrococcus woesei and P. furiosus were isolated from marine sediments at the same Vulcano Island beach site and share many morphological and physiological characteristics. We report here that the rDNA operons of these strains have identical sequences, including their intergenic spacer regions and part of the 23S rRNA. Both species grow rapidly and produce H2 in the presence of 0.1% maltose and 10-100 µM sodium tungstate in S°-free medium. However, P. woesei shows more extensive autolysis than P. furiosus in the stationary phase. Pyrococcus furiosus and P. woesei share three closely related families of insertion sequences (ISs). A Southern blot performed with IS probes showed extensive colinearity between the genomes of P. woesei and P. furiosus. Cloning and sequencing of ISs that were in different contexts in P. woesei and P. furiosus revealed that the napA gene in P. woesei is disrupted by a type III IS element, whereas in P. furiosus, this gene is intact. A type I IS element, closely linked to the napA gene, was observed in the same context in both P. furiosus and P. woesei genomes. Our results suggest that the IS elements are implicated in genomic rearrangements and reshuffling in these closely related strains. We propose to rename P. woesei a subspecies of P. furiosus based on their identical rDNA operon sequences, many common IS elements that are shared genomic markers, and the observation that all P. woesei nucleotide sequences deposited in GenBank to date are >99% identical to P. furiosus sequences.
Identification and analysis of pig chimeric mRNAs using RNA sequencing data
2012-01-01
Background Gene fusion is ubiquitous over the course of evolution. It is expected to increase the diversity and complexity of transcriptomes and proteomes through chimeric sequence segments or altered regulation. However, chimeric mRNAs in pigs remain unclear. Here we identified some chimeric mRNAs in pigs and analyzed the expression of them across individuals and breeds using RNA-sequencing data. Results The present study identified 669 putative chimeric mRNAs in pigs, of which 251 chimeric candidates were detected in a set of RNA-sequencing data. The 618 candidates had clear trans-splicing sites, 537 of which obeyed the canonical GU-AG splice rule. Only two putative pig chimera variants whose fusion junction was overlapped with that of a known human chimeric mRNA were found. A set of unique chimeric events were considered middle variances in the expression across individuals and breeds, and revealed non-significant variance between sexes. Furthermore, the genomic region of the 5′ partner gene shares a similar DNA sequence with that of the 3′ partner gene for 458 putative chimeric mRNAs. The 81 of those shared DNA sequences significantly matched the known DNA-binding motifs in the JASPAR CORE database. Four DNA motifs shared in parental genomic regions had significant similarity with known human CTCF binding sites. Conclusions The present study provided detailed information on some pig chimeric mRNAs. We proposed a model that trans-acting factors, such as CTCF, induced the spatial organisation of parental genes to the same transcriptional factory so that parental genes were coordinatively transcribed to give birth to chimeric mRNAs. PMID:22925561
Molecular Cloning of Secreted Luciferases from Marine Planktonic Copepods.
Takenaka, Yasuhiro; Ikeo, Kazuho; Shigeri, Yasushi
2016-01-01
Secreted luciferases isolated from copepod crustaceans are frequently used for nondisruptive reporter-gene assays, such as the continuous, automated and/or high-throughput monitoring of gene expression in living cells. All known copepod luciferases share highly conserved amino acid residues in two similar, repeated domains in the sequence. The similarity between these domains makes them ideal targets for designing PCR primers to amplify cDNA fragments of unidentified copepod luciferases from bioluminescent copepod crustaceans. Here, we describe how to isolate a cDNA encoding a novel copepod luciferase from a copepod specimen by PCR with degenerate primers.
Ethical and legal implications of whole genome and whole exome sequencing in African populations.
Wright, Galen E B; Koornhof, Pieter G J; Adeyemo, Adebowale A; Tiffin, Nicki
2013-05-28
Rapid advances in high throughput genomic technologies and next generation sequencing are making medical genomic research more readily accessible and affordable, including the sequencing of patient and control whole genomes and exomes in order to elucidate genetic factors underlying disease. Over the next five years, the Human Heredity and Health in Africa (H3Africa) Initiative, funded by the Wellcome Trust (United Kingdom) and the National Institutes of Health (United States of America), will contribute greatly towards sequencing of numerous African samples for biomedical research. Funding agencies and journals often require submission of genomic data from research participants to databases that allow open or controlled data access for all investigators. Access to such genotype-phenotype and pedigree data, however, needs careful control in order to prevent identification of individuals or families. This is particularly the case in Africa, where many researchers and their patients are inexperienced in the ethical issues accompanying whole genome and exome research; and where an historical unidirectional flow of samples and data out of Africa has created a sense of exploitation and distrust. In the current study, we analysed the implications of the anticipated surge of next generation sequencing data in Africa and the subsequent data sharing concepts on the protection of privacy of research subjects. We performed a retrospective analysis of the informed consent process for the continent and the rest-of-the-world and examined relevant legislation, both current and proposed. We investigated the following issues: (i) informed consent, including guidelines for performing culturally-sensitive next generation sequencing research in Africa and availability of suitable informed consent documents; (ii) data security and subject privacy whilst practicing data sharing; (iii) conveying the implications of such concepts to research participants in resource limited settings. 
We conclude that, in order to meet the unique requirements of performing next generation sequencing-related research in African populations, novel approaches to the informed consent process are required. This will help to avoid infringement of privacy of individual subjects as well as to ensure that informed consent adheres to acceptable data protection levels with regard to use and transfer of such information.
Ethical and legal implications of whole genome and whole exome sequencing in African populations
2013-01-01
Background Rapid advances in high throughput genomic technologies and next generation sequencing are making medical genomic research more readily accessible and affordable, including the sequencing of patient and control whole genomes and exomes in order to elucidate genetic factors underlying disease. Over the next five years, the Human Heredity and Health in Africa (H3Africa) Initiative, funded by the Wellcome Trust (United Kingdom) and the National Institutes of Health (United States of America), will contribute greatly towards sequencing of numerous African samples for biomedical research. Discussion Funding agencies and journals often require submission of genomic data from research participants to databases that allow open or controlled data access for all investigators. Access to such genotype-phenotype and pedigree data, however, needs careful control in order to prevent identification of individuals or families. This is particularly the case in Africa, where many researchers and their patients are inexperienced in the ethical issues accompanying whole genome and exome research; and where an historical unidirectional flow of samples and data out of Africa has created a sense of exploitation and distrust. In the current study, we analysed the implications of the anticipated surge of next generation sequencing data in Africa and the subsequent data sharing concepts on the protection of privacy of research subjects. We performed a retrospective analysis of the informed consent process for the continent and the rest-of-the-world and examined relevant legislation, both current and proposed. 
We investigated the following issues: (i) informed consent, including guidelines for performing culturally-sensitive next generation sequencing research in Africa and availability of suitable informed consent documents; (ii) data security and subject privacy whilst practicing data sharing; (iii) conveying the implications of such concepts to research participants in resource limited settings. Summary We conclude that, in order to meet the unique requirements of performing next generation sequencing-related research in African populations, novel approaches to the informed consent process are required. This will help to avoid infringement of privacy of individual subjects as well as to ensure that informed consent adheres to acceptable data protection levels with regard to use and transfer of such information. PMID:23714101
Previously unknown and highly divergent ssDNA viruses populate the oceans.
Labonté, Jessica M; Suttle, Curtis A
2013-11-01
Single-stranded DNA (ssDNA) viruses are economically important pathogens of plants and animals, and are widespread in oceans; yet, the diversity and evolutionary relationships among marine ssDNA viruses remain largely unknown. Here we present the results from a metagenomic study of composite samples from temperate (Saanich Inlet, 11 samples; Strait of Georgia, 85 samples) and subtropical (46 samples, Gulf of Mexico) seawater. Most sequences (84%) had no evident similarity to sequenced viruses. In total, 608 putative complete genomes of ssDNA viruses were assembled, almost doubling the number of ssDNA viral genomes in databases. These comprised 129 genetically distinct groups, each represented by at least one complete genome that had no recognizable similarity to each other or to other virus sequences. Given that the seven recognized families of ssDNA viruses have considerable sequence homology within them, this suggests that many of these genetic groups may represent new viral families. Moreover, nearly 70% of the sequences were similar to one of these genomes, indicating that most of the sequences could be assigned to a genetically distinct group. Most sequences fell within 11 well-defined gene groups, each sharing a common gene. Some of these encoded putative replication and coat proteins that had similarity to sequences from viruses infecting eukaryotes, suggesting that these were likely from viruses infecting eukaryotic phytoplankton and zooplankton.
Santos, Leonardo N; Silva, Eduardo S; Santos, André S; De Sá, Pablo H; Ramos, Rommel T; Silva, Artur; Cooper, Philip J; Barreto, Maurício L; Loureiro, Sebastião; Pinheiro, Carina S; Alcantara-Neves, Neuza M; Pacheco, Luis G C
2016-07-01
Infection with helminthic parasites, including the soil-transmitted helminth Trichuris trichiura (human whipworm), has been shown to modulate host immune responses and, consequently, to have an impact on the development and manifestation of chronic human inflammatory diseases. De novo derivation of helminth proteomes from sequencing of transcriptomes will provide valuable data to aid identification of parasite proteins that could be evaluated as potential immunotherapeutic molecules in the near future. Herein, we characterized the transcriptome of the adult stage of the human whipworm T. trichiura, using next-generation sequencing technology and a de novo assembly strategy. Nearly 17.6 million high-quality clean reads were assembled into 6414 contiguous sequences, with an N50 of 1606 bp. In total, 5673 protein-encoding sequences were confidently identified in the T. trichiura adult worm transcriptome; of these, 1013 sequences represent potential newly discovered proteins for the species, most of which have orthologs already annotated in the related species T. suis. A number of transcripts representing probable novel non-coding transcripts for the species T. trichiura were also identified. Among the most abundant transcripts, we found sequences that code for proteins involved in lipid transport, such as vitellogenins, and several chitin-binding proteins. Through a cross-species expression analysis of gene orthologs shared by T. trichiura and the closely related parasites T. suis and T. muris it was possible to find twenty-six protein-encoding genes that are consistently highly expressed in the adult stages of the three helminth species. Additionally, twenty transcripts could be identified that code for proteins previously detected by mass spectrometry analysis of protein fractions of the whipworm somatic extract that present immunomodulatory activities. Five of these transcripts were amongst the most highly expressed protein-encoding sequences in the T. trichiura adult worm. Besides, orthologs of proteins demonstrated to have potent immunomodulatory properties in related parasitic helminths were also predicted from the T. trichiura de novo assembled transcriptome. Copyright © 2016. Published by Elsevier B.V.
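For readers unfamiliar with the N50 statistic quoted in the abstract above, the sketch below shows the usual definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. The contig lengths are made-up illustration values, not data from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the
    combined length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in bp.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because it is weighted by length, N50 is less sensitive than the mean contig length to large numbers of very short contigs.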
Exploration of the relationship between topology and designability of conformations
NASA Astrophysics Data System (ADS)
Leelananda, Sumudu P.; Towfic, Fadi; Jernigan, Robert L.; Kloczkowski, Andrzej
2011-06-01
Protein structures are evolutionarily more conserved than sequences, and sequences with very low sequence identity frequently share the same fold. This leads to the concept of protein designability. Some folds are more designable and lots of sequences can assume that fold. Elucidating the relationship between protein sequence and the three-dimensional (3D) structure that the sequence folds into is an important problem in computational structural biology. Lattice models have been utilized in numerous studies to model protein folds and predict the designability of certain folds. In this study, all possible compact conformations within a set of two-dimensional and 3D lattice spaces are explored. Complementary interaction graphs are then generated for each conformation and are described using a set of graph features. The full HP sequence space for each lattice model is generated and contact energies are calculated by threading each sequence onto all the possible conformations. Unique conformation giving minimum energy is identified for each sequence and the number of sequences folding to each conformation (designability) is obtained. Machine learning algorithms are used to predict the designability of each conformation. We find that the highly designable structures can be distinguished from other non-designable conformations based on certain graphical geometric features of the interactions. This finding confirms the fact that the topology of a conformation is an important determinant of the extent of its designability and suggests that the interactions themselves are important for determining the designability.
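The enumeration-and-threading procedure described in the abstract above can be sketched on a very small lattice. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a 2D square grid, scores every non-bonded H-H contact as -1 (the simplest HP energy rule, assumed here), and treats conformations with identical contact maps as one structure. The function names are hypothetical, and the graph-feature and machine-learning steps are not covered.

```python
from itertools import product


def compact_walks(w, h):
    """Yield every directed path that visits each site of a w x h grid once."""
    n = w * h

    def extend(path, seen):
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            if 0 <= nxt[0] < w and 0 <= nxt[1] < h and nxt not in seen:
                path.append(nxt)
                seen.add(nxt)
                yield from extend(path, seen)
                path.pop()
                seen.remove(nxt)

    for start in product(range(w), range(h)):
        yield from extend([start], {start})


def contact_map(path):
    """Pairs of chain positions that are lattice neighbours but not bonded."""
    index = {site: i for i, site in enumerate(path)}
    contacts = set()
    for (x, y), i in index.items():
        for nxt in ((x + 1, y), (x, y + 1)):
            j = index.get(nxt)
            if j is not None and abs(i - j) > 1:
                contacts.add((min(i, j), max(i, j)))
    return frozenset(contacts)


def designability(w, h):
    """Count, per contact map, the HP sequences with that unique ground state."""
    maps = list({contact_map(p) for p in compact_walks(w, h)})
    counts = {m: 0 for m in maps}
    for seq in product("HP", repeat=w * h):
        # Assumed energy rule: -1 for each non-bonded H-H contact.
        energies = [-sum(seq[i] == "H" and seq[j] == "H" for i, j in m)
                    for m in maps]
        best = min(energies)
        if energies.count(best) == 1:  # keep non-degenerate ground states only
            counts[maps[energies.index(best)]] += 1
    return counts


if __name__ == "__main__":
    for cmap, count in sorted(designability(3, 3).items(), key=lambda kv: -kv[1]):
        print(count, sorted(cmap))
```

Grouping walks by contact map removes rotated and mirrored copies of the same structure, which would otherwise make every ground state degenerate. Exhaustive enumeration of this kind is only practical for small lattices, since both the number of walks and the number of sequences grow very quickly with chain length.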
Hunt, C; Morimoto, R I
1985-01-01
We have determined the nucleotide sequence of the human hsp70 gene and 5' flanking region. The hsp70 gene is transcribed as an uninterrupted primary transcript of 2440 nucleotides composed of a 5' noncoding leader sequence of 212 nucleotides, a 3' noncoding region of 242 nucleotides, and a continuous open reading frame of 1986 nucleotides that encodes a protein with predicted molecular mass of 69,800 daltons. Upstream of the 5' terminus are the canonical TATAAA box, the sequence ATTGG that corresponds in the inverted orientation to the CCAAT motif, and the dyad sequence CTGGAAT/ATTCCCG that shares homology in 12 of 14 positions with the consensus transcription regulatory sequence common to Drosophila heat shock genes. Comparison of the predicted amino acid sequences of human hsp70 with the published sequences of Drosophila hsp70 and Escherichia coli dnaK reveals that human hsp70 is 73% identical to Drosophila hsp70 and 47% identical to E. coli dnaK. Surprisingly, the nucleotide sequences of the human and Drosophila genes are 72% identical and human and E. coli genes are 50% identical, which is more highly conserved than necessary given the degeneracy of the genetic code. The lack of accumulated silent nucleotide substitutions leads us to propose that there may be additional information in the nucleotide sequence of the hsp70 gene or the corresponding mRNA that precludes the maximum divergence allowed in the silent codon positions. PMID:3931075
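The lengths quoted in the abstract above can be checked with simple arithmetic, and the identity percentages it reports are of the kind computed below. This is an illustrative sketch only: the helper name and the toy aligned sequences are hypothetical, and the abstract does not state how the authors handled gaps (columns with a gap are simply ignored here).

```python
# Transcript composition as stated in the abstract (nucleotides).
LEADER, ORF, TRAILER = 212, 1986, 242
transcript_length = LEADER + ORF + TRAILER  # equals the stated 2440
codons = ORF // 3                           # 662 codons, including the stop codon


def percent_identity(a, b, gap="-"):
    """Percent of aligned, gap-free columns in which two sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


# Toy aligned fragments, not real hsp70 sequence.
print(transcript_length, codons, percent_identity("ATGGCC-AA", "ATGGCAGAA"))
```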
Michael, Todd P; Bryant, Douglas; Gutierrez, Ryan; Borisjuk, Nikolai; Chu, Philomena; Zhang, Hanzhong; Xia, Jing; Zhou, Junfei; Peng, Hai; El Baidouri, Moaine; Ten Hallers, Boudewijn; Hastie, Alex R; Liang, Tiffany; Acosta, Kenneth; Gilbert, Sarah; McEntee, Connor; Jackson, Scott A; Mockler, Todd C; Zhang, Weixiong; Lam, Eric
2017-02-01
Spirodela polyrhiza is a fast-growing aquatic monocot with highly reduced morphology, genome size and number of protein-coding genes. Considering these biological features of Spirodela and its basal position in the monocot lineage, understanding its genome architecture could shed light on plant adaptation and genome evolution. Like many draft genomes, however, the 158-Mb Spirodela genome sequence has not been resolved to chromosomes, and important genome characteristics have not been defined. Here we deployed rapid genome-wide physical maps combined with high-coverage short-read sequencing to resolve the 20 chromosomes of Spirodela and to empirically delineate its genome features. Our data revealed a dramatic reduction in the number of the rDNA repeat units in Spirodela to fewer than 100, which is even fewer than that reported for yeast. Consistent with its unique phylogenetic position, small RNA sequencing revealed 29 Spirodela-specific microRNA, with only two being shared with Elaeis guineensis (oil palm) and Musa balbisiana (banana). Combining DNA methylation data and small RNA sequencing enabled the accurate prediction of 20.5% long terminal repeats (LTRs) that doubled the previous estimate, and revealed a high Solo:Intact LTR ratio of 8.2. Interestingly, we found that Spirodela has the lowest global DNA methylation levels (9%) of any plant species tested. Taken together our results reveal a genome that has undergone reduction, likely through eliminating non-essential protein coding genes, rDNA and LTRs. In addition to delineating the genome features of this unique plant, the methodologies described and large-scale genome resources from this work will enable future evolutionary and functional studies of this basal monocot family. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.
Rapid and accurate pyrosequencing of angiosperm plastid genomes
Moore, Michael J; Dhingra, Amit; Soltis, Pamela S; Shaw, Regina; Farmerie, William G; Folta, Kevin M; Soltis, Douglas E
2006-01-01
Background Plastid genome sequence information is vital to several disciplines in plant biology, including phylogenetics and molecular biology. The past five years have witnessed a dramatic increase in the number of completely sequenced plastid genomes, fuelled largely by advances in conventional Sanger sequencing technology. Here we report a further significant reduction in time and cost for plastid genome sequencing through the successful use of a newly available pyrosequencing platform, the Genome Sequencer 20 (GS 20) System (454 Life Sciences Corporation), to rapidly and accurately sequence the whole plastid genomes of the basal eudicot angiosperms Nandina domestica (Berberidaceae) and Platanus occidentalis (Platanaceae). Results More than 99.75% of each plastid genome was simultaneously obtained during two GS 20 sequence runs, to an average depth of coverage of 24.6× in Nandina and 17.3× in Platanus. The Nandina and Platanus plastid genomes shared essentially identical gene complements and possessed the typical angiosperm plastid structure and gene arrangement. To assess the accuracy of the GS 20 sequence, over 45 kilobases of sequence were generated for each genome using conventional sequencing. Overall error rates of 0.043% and 0.031% were observed in GS 20 sequence for Nandina and Platanus, respectively. More than 97% of all observed errors were associated with homopolymer runs, with ~60% of all errors associated with homopolymer runs of 5 or more nucleotides and ~50% of all errors associated with regions of extensive homopolymer runs. No substitution errors were present in either genome. Error rates were generally higher in the single-copy and noncoding regions of both plastid genomes relative to the inverted repeat and coding regions. Conclusion Highly accurate and essentially complete sequence information was obtained for the Nandina and Platanus plastid genomes using the GS 20 System. 
More importantly, the high accuracy observed in the GS 20 plastid genome sequence was generated for a significant reduction in time and cost over traditional shotgun-based genome sequencing techniques, although with approximately half the coverage of previously reported GS 20 de novo genome sequence. The GS 20 should be broadly applicable to angiosperm plastid genome sequencing, and therefore promises to expand the scale of plant genetic and phylogenetic research dramatically. PMID:16934154
Avershina, Ekaterina; Angell, Inga Leena; Simpson, Melanie; Storrø, Ola; Øien, Torbjørn; Johnsen, Roar; Rudi, Knut
2018-05-01
The maternal microbiota plays an important role in infant gut colonization. In this work we have investigated which bacterial species are shared across the breast milk, vaginal and stool microbiotas of 109 women shortly before and after giving birth using 16S rRNA gene sequencing and a novel reduced metagenomic sequencing (RMS) approach in a subgroup of 16 women. All the species predicted by the 16S rRNA gene sequencing were also detected by RMS analysis and there was good correspondence between their relative abundances estimated by both approaches. Both approaches also demonstrate a low level of maternal microbiota sharing across the population and RMS analysis identified only two species common to most women and in all sample types (Bifidobacterium longum and Enterococcus faecalis). Breast milk was the only sample type that had significantly higher intra- than inter- individual similarity towards both vaginal and stool samples. We also searched our RMS dataset against an in silico generated reference database derived from bacterial isolates in the Human Microbiome Project. The use of this reference-based search enabled further separation of Bifidobacterium longum into Bifidobacterium longum ssp. longum and Bifidobacterium longum ssp. infantis. We also detected the Lactobacillus rhamnosus GG strain, which was used as a probiotic supplement by some women, demonstrating the potential of RMS approach for deeper taxonomic delineation and estimation.
Angell, Inga Leena; Storrø, Ola; Øien, Torbjørn; Johnsen, Roar; Rudi, Knut
2018-01-01
The maternal microbiota plays an important role in infant gut colonization. In this work we have investigated which bacterial species are shared across the breast milk, vaginal and stool microbiotas of 109 women shortly before and after giving birth using 16S rRNA gene sequencing and a novel reduced metagenomic sequencing (RMS) approach in a subgroup of 16 women. All the species predicted by the 16S rRNA gene sequencing were also detected by RMS analysis and there was good correspondence between their relative abundances estimated by both approaches. Both approaches also demonstrate a low level of maternal microbiota sharing across the population and RMS analysis identified only two species common to most women and in all sample types (Bifidobacterium longum and Enterococcus faecalis). Breast milk was the only sample type that had significantly higher intra- than inter- individual similarity towards both vaginal and stool samples. We also searched our RMS dataset against an in silico generated reference database derived from bacterial isolates in the Human Microbiome Project. The use of this reference-based search enabled further separation of Bifidobacterium longum into Bifidobacterium longum ssp. longum and Bifidobacterium longum ssp. infantis. We also detected the Lactobacillus rhamnosus GG strain, which was used as a probiotic supplement by some women, demonstrating the potential of RMS approach for deeper taxonomic delineation and estimation. PMID:29724017
Kaplan, J B; Merkel, W K; Nichols, B P
1985-06-05
The amide group of glutamine is a source of nitrogen in the biosynthesis of a variety of compounds. These reactions are catalyzed by a group of enzymes known as glutamine amidotransferases; two of these, the glutamine amidotransferase subunits of p-aminobenzoate synthase and anthranilate synthase have been studied in detail and have been shown to be structurally and functionally related. In some micro-organisms, p-aminobenzoate synthase and anthranilate synthase share a common glutamine amidotransferase subunit. We report here the primary DNA and deduced amino acid sequences of the p-aminobenzoate synthase glutamine amidotransferase subunits from Salmonella typhimurium, Klebsiella aerogenes and Serratia marcescens. A comparison of these glutamine amidotransferase sequences to the sequences of ten others, including some that function specifically in either the p-aminobenzoate synthase or anthranilate synthase complexes and some that are shared by both synthase complexes, has revealed several interesting features of the structure and organization of these genes, and has allowed us to speculate as to the evolutionary history of this family of enzymes. We propose a model for the evolution of the p-aminobenzoate synthase and anthranilate synthase glutamine amidotransferase subunits in which the duplication and subsequent divergence of the genetic information encoding a shared glutamine amidotransferase subunit led to the evolution of two new pathway-specific enzymes.
Saghatelyan, Ani; Poghosyan, Lianna
2015-01-01
The 2,379,636-bp draft genome sequence of Thermus scotoductus strain K1, isolated from a geothermal spring outlet located in the Karvachar region in Nagorno Karabakh, is presented. Strain K1 shares about 80% genome sequence similarity with T. scotoductus strain SA-01, recovered from a deep gold mine in South Africa. PMID:26564055
Carro, Lorena; Spröer, Cathrin; Alonso, Pilar; Trujillo, Martha E
2012-03-01
It was recently reported that Micromonospora inhabits the intracellular tissues of nitrogen fixing nodules of the wild legume Lupinus angustifolius. To determine if Micromonospora populations are also present in nitrogen fixing nodules of cultivated legumes such as Pisum sativum, we carried out the isolation of this actinobacterium from P. sativum plants collected in two man-managed fields in the region of Castilla and León (Spain). In this work, we describe the isolation of 93 Micromonospora strains recovered from nitrogen fixing nodules and the rhizosphere of P. sativum. The genomic diversity of the strains was analyzed by amplified ribosomal DNA restriction analysis (ARDRA). Forty-six isolates and 34 reference strains were further analyzed using a multilocus sequence analysis scheme developed to address the phylogeny of the genus Micromonospora and to evaluate the species distribution in the two studied habitats. The MLSA results were evaluated by DNA-DNA hybridization to determine their usefulness for the delineation of Micromonospora at the species level. In most cases, DDH values below 70% were obtained with strains that shared a sequence similarity of 98.5% or less. Thus, MLSA studies clearly supported the established taxonomy of the genus Micromonospora and indicated that genomic species could be delineated as groups of strains that share > 98.5% sequence similarity based on the 5 genes selected. The species diversity of the strains isolated from both the rhizosphere and nodules was very high and in many cases the new strains could not be related to any of the currently described species. Copyright © 2011 Elsevier GmbH. All rights reserved.
Maternal phylogeny of a newly-found yak population in China.
Mipam, Tserang Donko; Wen, Yongli; Fu, Changxiu; Li, Shanrong; Zhao, Hongwen; Ai, Yi; Li, Lu; Zhang, Lei; Zou, Deqiang
2012-01-01
The Jinchuan yak is a new yak population identified in Sichuan, China. This population has a special anatomical characteristic: an additional pair of ribs compared with other yak breeds. The genetic structure of this population is unknown. In the present study, we investigated the maternal phylogeny of this special yak population using mitochondrial DNA variation. A total of 23 Jinchuan yaks were sequenced for an 823-bp fragment of the D-loop control region, and three individuals were sequenced for the whole mtDNA genome, with a length of 16,371 bp. To compare with the data from other yaks, we extracted sequence data from GenBank, including D-loop sequences of 398 yaks (from 12 breeds) and 55 wild yaks, and whole mitochondrial genomes of 53 yaks (from 12 breeds) and 21 wild yaks. A total of 127 haplotypes were defined, based on the D-loop data. Thirteen haplotypes were defined from 23 mtDNA D-loop sequences of Jinchuan yaks, six of which were found only in Jinchuan yaks, and one of which was shared by Jinchuan and wild yaks. The Jinchuan yaks were found to carry clades A and B from lineage I and clade C of lineage II. It was also suggested that the Jinchuan population has no distinctly different phylogenetic relationship in maternal inheritance from other breeds of yak. The high haplotype diversity of the Pali breed, Jinchuan population, Maiwa breed and Jiulong breed suggested that the yak was first domesticated from wild yaks in the middle Himalayan region and the northern Hengduan Mountains. The special anatomical characteristic that we found in the Jinchuan population needs further study based on nuclear data.
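Haplotype counts and haplotype diversity of the kind discussed in the abstract above can be computed from a set of aligned sequences as sketched below. The sketch assumes Nei's unbiased estimator, h = n/(n-1) × (1 − Σ p_i²); the abstract does not say which estimator the authors used, and the short sequences are invented for illustration.

```python
from collections import Counter


def haplotype_diversity(sequences):
    """Return (number of haplotypes, Nei's haplotype diversity).

    Identical sequences are treated as one haplotype. The diversity is
    n / (n - 1) * (1 - sum(p_i ** 2)), where p_i is the frequency of
    haplotype i among the n sampled sequences.
    """
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(sequences)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return len(counts), n / (n - 1) * (1 - sum_sq)


# Invented D-loop fragments, for illustration only.
sample = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(haplotype_diversity(sample))
```

The value ranges from 0, when every individual carries the same haplotype, to 1, when every sampled sequence is distinct.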
Ethical issues in consumer genome sequencing: Use of consumers' samples and data
Niemiec, Emilia; Howard, Heidi Carmen
2016-01-01
High throughput approaches such as whole genome sequencing (WGS) and whole exome sequencing (WES) create an unprecedented amount of data providing powerful resources for clinical care and research. Recently, WGS and WES services have been made available by commercial direct-to-consumer (DTC) companies. The DTC offer of genetic testing (GT) has already brought attention to potentially problematic issues such as the adequacy of consumers' informed consent and transparency of companies' research activities. In this study, we analysed the websites of four DTC GT companies offering WGS and/or WES with regard to their policies governing storage and future use of consumers' data and samples. The results are discussed in relation to recommendations and guiding principles such as the “Statement of the European Society of Human Genetics on DTC GT for health-related purposes” (2010) and the “Framework for responsible sharing of genomic and health-related data” (Global Alliance for Genomics and Health, 2014). The analysis reveals that some companies may store and use consumers' samples or sequencing data for unspecified research and share the data with third parties. Moreover, the companies do not provide sufficient or clear information to consumers about this, which can undermine the validity of the consent process. Furthermore, while all companies state that they provide privacy safeguards for data and mention the limitations of these, information about the possibility of re-identification is lacking. Finally, although the companies that may conduct research do include information regarding proprietary claims and commercialisation of the results, it is not clear whether consumers are aware of the consequences of these policies. These results indicate that DTC GT companies still need to improve the transparency regarding handling of consumers' samples and data, including having an explicit and clear consent process for research activities. PMID:27047756
Sri, Tanu; Mayee, Pratiksha; Singh, Anandita
2015-09-01
Whole genome sequence analyses allow unravelling of the evolutionary consequences of the meso-triplication event in Brassicaceae (∼14-20 million years ago (MYA)), such as differential gene fractionation and diversification in homeologous sub-genomes. This study presents a simple gene-centric approach involving microsynteny and natural genetic variation analysis for understanding SUPPRESSOR of OVEREXPRESSION of CONSTANS 1 (SOC1) homeolog evolution in Brassica. Analysis of microsynteny in Brassica rapa homeologous regions containing SOC1 revealed differential gene fractionation correlating to the reported fractionation status of the sub-genomes of origin, viz. least fractionated (LF), moderately fractionated 1 (MF1) and most fractionated (MF2), respectively. Screening 18 cultivars of 6 Brassica species led to the identification of 8 genomic and 27 transcript variants of SOC1, including splice-forms. Co-occurrence of both interrupted and intronless SOC1 genes was detected in a few Brassica species. In silico analysis characterised Brassica SOC1 as a MADS intervening, K-box, C-terminal (MIKC(C)) transcription factor, with highly conserved MADS and I domains relative to the K-box and C-terminal domains. Phylogenetic analyses and multiple sequence alignments depicting a shared pattern of silent/non-silent mutations assigned Brassica SOC1 homologs into groups based on shared diploid base genome. In addition, a sub-genome structure in uncharacterised Brassica genomes was inferred. Expression analysis of putative MF2 and LF (Brassica diploid base genome A (AA)) sub-genome-specific SOC1 homeologs of Brassica juncea revealed a near identical expression pattern. However, the MF2-specific homeolog exhibited significantly higher expression, implying regulatory diversification. In conclusion, evidence for polyploidy-induced sequence and regulatory evolution in Brassica SOC1 is presented, wherein differential homeolog expression is implied in functional diversification.
Hutsul, J A; Worobec, E
1997-08-01
Serratia marcescens is a nosocomial pathogen with a high incidence of beta-lactam resistance. Reduced amounts of outer-membrane porins have been correlated with increased resistance to beta-lactams but only one porin, OmpC, has been characterized at the molecular level. In this study we present the molecular characterization of a second porin, OmpF, and an analysis of the expression of S. marcescens porins in response to various environmental changes. Two porins were isolated from the outer membrane using urea-SDS-PAGE and the relative amounts were shown to be influenced by the osmolarity of the medium and the presence of salicylate. From a S. marcescens genomic DNA library an 8 kb EcoRI fragment was isolated that hybridized with an oligonucleotide encoding the published N-terminal amino acid sequence of the S. marcescens 41 kDa porin. A 41 kDa protein was detected in the outer membrane of Escherichia coli NM522 carrying the cloned S. marcescens DNA. The cloned gene was sequenced and shown to code for a protein that shared 60-70% identity with other known OmpF and OmpC sequences. The upstream DNA sequence of the S. marcescens gene was similar to the corresponding E. coli ompF sequence; however, a regulatory element important in repression of E. coli ompF at high osmolarity was absent. The cloned S. marcescens OmpF in E. coli increased in expression in conditions of high osmolarity. The potential involvement of micF in the observed osmoregulation of S. marcescens porins is discussed.
The Roles of TGF-Beta and TGF-Beta Signaling Receptors in Breast Carcinogenesis.
1997-07-01
phosphorylation of these molecules in a normal mammary epithelial cell line. Subsequently, we have focused on the functional role of Smad3 and Smad4 as...serine residues in the C-terminal portion of Smad1 and Smad2, though the corresponding highly conserved sites in Smad3 and Smad5 most likely serve the...far, it appears that Smad2 and Smad3, which share 92% sequence identity, are likely mediators for the TGF-B signal, whereas Smad1 and Smad5, which
Inflammation in Prostate Carcinogenesis: Role of the Tumor Suppressor Par-4
2012-09-01
2006; 2: 138–139. 112. Nezis IP, Simonsen A, Sagona AP, Finley K, Gaumer S, Contamine D et al. Ref(2)P, the Drosophila melanogaster homologue of...Tommerup N, Hansen C, Vissing H, Shi Y. Mapping of the human PAWR (par-4) gene to chromosome 12q21. Genomics 1998; 53:241-3. 17. Joshi J... The two aPKC isoforms are highly related, sharing an overall amino acid identity of 72%.1 The conservation in their sequences is most striking in the
Janes, D.E.; Chapus, C.; Gondo, Y.; Clayton, D.F.; Sinha, S.; Blatti, C.A.; Organ, C.L.; Fujita, M.K.; Balakrishnan, C.N.; Edwards, S.V.
2010-01-01
Many noncoding regions of genomes appear to be essential to genome function. Conservation of large numbers of noncoding sequences has been reported repeatedly among mammals but not thus far among birds and reptiles. By searching genomes of chicken (Gallus gallus), zebra finch (Taeniopygia guttata), and green anole (Anolis carolinensis), we quantified the conservation among birds and reptiles and across amniotes of long, conserved noncoding sequences (LCNS), which we define as sequences ≥500 bp in length and exhibiting ≥95% similarity between species. We found 4,294 LCNS shared between chicken and zebra finch and 574 LCNS shared by the two birds and Anolis. The percent of genomes comprised by LCNS in the two birds (0.0024%) is notably higher than the percent in mammals (<0.0003% to <0.001%), differences that we show may be explained in part by differences in genome-wide substitution rates. We reconstruct a large number of LCNS for the amniote ancestor (ca. 8,630) and hypothesize differential loss and substantial turnover of these sites in descendent lineages. By contrast, we estimated a small role for recruitment of LCNS via acquisition of novel functions over time. Across amniotes, LCNS are significantly enriched with transcription factor binding sites for many developmental genes, and 2.9% of LCNS shared between the two birds show evidence of expression in brain expressed sequence tag databases. These results show that the rate of retention of LCNS from the amniote ancestor differs between mammals and Reptilia (including birds) and that this may reflect differing roles and constraints in gene regulation. PMID:21183607
Moreira, K G; Prates, M V; Andrade, F A C; Silva, L P; Beirão, P S L; Kushmerick, C; Naves, L A; Bloch, C
2010-08-01
Neurotoxicity is a major symptom of envenomation caused by the Brazilian coral snake Micrurus frontalis. Due to the small amount of material that can be collected, no neurotoxin has been fully sequenced from this venom. In this work we report six new three-finger-like toxins isolated from the venom of the coral snake M. frontalis, which we named Frontoxin (FTx) I-VI. Toxins were purified using multiple steps of RP-HPLC. Molecular masses were determined by MALDI-TOF and ESI ion-trap mass spectrometry. The complete amino acid sequences of FTx II, III, IV and V were determined by sequencing of overlapping proteolytic fragments by Edman degradation and by de novo sequencing. The amino acid sequences of FTx I, II, III and VI predict 4 conserved disulphide bonds and structural similarity to previously reported short-chain alpha-neurotoxins. FTx IV and V each contained 10 conserved cysteines and share high similarity with long-chain alpha-neurotoxins. At the frog neuromuscular junction, FTx II, III and IV reduced miniature endplate potential amplitudes in a time- and concentration-dependent manner, suggesting Frontoxins block nicotinic acetylcholine receptors. Copyright 2010 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
González-Toril, E.; Amils, R.; Delmas, R. J.; Petit, J.-R.; Komárek, J.; Elster, J.
2009-01-01
Four different communities and one culture of autotrophic microbial assemblages were obtained by incubation of samples collected from high elevation snow in the Alps (Mt. Blanc area) and the Andes (Nevado Illimani summit, Bolivia), from Antarctic aerosol (French station Dumont d'Urville) and a maritime Antarctic soil (King George Island, South Shetlands, Uruguay Station Artigas), in a minimal mineral (oligotrophic) medium. Molecular analysis of more than 200 16S rRNA gene sequences showed that all cultured cells belong to the Bacteria domain. Phylogenetic comparison with the currently available rDNA database allowed sequences belonging to the Proteobacteria (Alpha-, Beta- and Gamma-proteobacteria), Actinobacteria and Bacteroidetes phyla to be identified. The Andes snow culture was the richest in bacterial diversity (eight microorganisms identified) and the marine Antarctic soil the poorest (only one). Snow samples from Col du Midi (Alps) and the Andes shared the highest number of identified microorganisms (Agrobacterium, Limnobacter, Aquiflexus and two uncultured Alphaproteobacteria clones). These two sampling sites also shared four sequences with the Antarctic aerosol sample (Limnobacter, Pseudonocardia and an uncultured Alphaproteobacteria clone). The only microorganism identified in the Antarctic soil (Brevundimonas sp.) was also detected in the Antarctic aerosol. Most of the identified microorganisms had been detected previously in cold environments, marine sediments, soils and rocks. Air current dispersal is the best model to explain the presence of very specific microorganisms, like those identified in this work, in environments very distant and very different from each other.
Qualitative thematic analysis of consent forms used in cancer genome sequencing
2011-01-01
Background: Large-scale whole genome sequencing (WGS) studies promise to revolutionize cancer research by identifying targets for therapy and by discovering molecular biomarkers to aid early diagnosis, to better determine prognosis and to improve treatment response prediction. Such projects raise a number of ethical, legal, and social (ELS) issues that should be considered. In this study, we set out to discover how these issues are being handled across different jurisdictions. Methods: We examined informed consent (IC) forms from 30 cancer genome sequencing studies to assess (1) stated purpose of sample collection, (2) scope of consent requested, (3) data sharing protocols, (4) privacy protection measures, (5) described risks of participation, (6) subject re-contacting, and (7) protocol for withdrawal. Results: There is a high degree of similarity in how cancer researchers engaged in WGS are protecting participant privacy. We observed a strong trend towards both using samples for additional, unspecified research and sharing data with other investigators. IC forms were varied in terms of how they discussed re-contacting participants, returning results and facilitating participant withdrawal. Contrary to expectation, there were no consistent trends that emerged over the eight-year period from which forms were collected. Conclusion: Examining IC forms from WGS studies elucidates how investigators are handling ELS challenges posed by this research. This information is important for ensuring that while the public benefits of research are maximized, the rights of participants are also being appropriately respected. PMID:21771309
Genomic sequence for the aflatoxigenic filamentous fungus Aspergillus nomius
USDA-ARS's Scientific Manuscript database
The genome of the A. nomius type strain was sequenced using a personal genome machine. Annotation of the genes was undertaken, followed by gene ontology and an investigation into the number of secondary metabolite clusters. Comparative studies with other Aspergillus species involved shared/unique ge...
Gene sequences present in Citrullus sp. having been lost during domestication of watermelon
USDA-ARS's Scientific Manuscript database
A wide genetic diversity exists among Citrullus species, while watermelon cultivars (Citrullus lanatus var. lanatus) share a narrow genetic base as a result of many years of domestication and selection for desirable fruit qualities. The recent international watermelon genome sequencing project reve...
Hasiów-Jaroszewska, Beata; Komorowska, Beata
2013-10-01
Diagnostic methods distinguish different Pepino mosaic virus (PepMV) genotypes, but they do not detect sequence variation in particular gene segments. The necrotic and non-necrotic isolates (pathotypes) of PepMV share 99% sequence similarity. These isolates differ from each other at one nucleotide site in the triple gene block 3. In this study, a combination of real-time reverse transcription polymerase chain reaction and high resolution melting curve analysis of triple gene block 3 was developed for simultaneous detection and differentiation of PepMV pathotypes. The triple gene block 3 region carrying a transition A → G was amplified using two primer pairs from twelve virus isolates, and was subjected to high resolution melting curve analysis. The results showed two distinct melting curve profiles related to each pathotype. The results also indicated that the high resolution melting method could readily differentiate between necrotic and non-necrotic PepMV pathotypes. Copyright © 2013 Elsevier B.V. All rights reserved.
A high level interface to SCOP and ASTRAL implemented in python.
Casbon, James A; Crooks, Gavin E; Saqi, Mansoor A S
2006-01-10
Benchmarking algorithms in structural bioinformatics often involves the construction of datasets of proteins with given sequence and structural properties. The SCOP database is a manually curated structural classification which groups together proteins on the basis of structural similarity. The ASTRAL compendium provides non redundant subsets of SCOP domains on the basis of sequence similarity such that no two domains in a given subset share more than a defined degree of sequence similarity. Taken together these two resources provide a 'ground truth' for assessing structural bioinformatics algorithms. We present a small and easy to use API written in python to enable construction of datasets from these resources. We have designed a set of python modules to provide an abstraction of the SCOP and ASTRAL databases. The modules are designed to work as part of the Biopython distribution. Python users can now manipulate and use the SCOP hierarchy from within python programs, and use ASTRAL to return sequences of domains in SCOP, as well as clustered representations of SCOP from ASTRAL. The modules make the analysis and generation of datasets for use in structural genomics easier and more principled.
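The ASTRAL-style idea of a subset in which no two domains share more than a defined degree of sequence similarity can be illustrated with a greedy filter. This is only a minimal sketch: it does not use the Biopython SCOP/ASTRAL modules described above and is not ASTRAL's actual selection procedure. The function names, the toy pre-aligned sequences, and the simple position-wise identity measure are all assumptions made for illustration.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    # Gap-gap columns are not counted as matches.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


def nonredundant_subset(seqs: dict[str, str], max_identity: float) -> list[str]:
    """Greedily keep IDs so that no kept pair exceeds max_identity (hypothetical helper)."""
    kept: list[str] = []
    for name in seqs:  # insertion order decides which representative survives
        if all(percent_identity(seqs[name], seqs[k]) <= max_identity for k in kept):
            kept.append(name)
    return kept


# Toy, made-up domain sequences (not taken from SCOP or ASTRAL).
domains = {
    "d1": "ACDEFGHIKL",
    "d2": "ACDEFGHIKV",  # differs from d1 at one position
    "d3": "MNPQRSTVWY",  # shares no position with d1
}
subset = nonredundant_subset(domains, max_identity=40.0)
```

With the 40% cutoff chosen here, `d2` is dropped because it is too similar to `d1`, while `d3` is retained. A greedy pass depends on input order, so a real benchmark set would likely rank candidates (for example by structure quality) before filtering.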
Desbiez, C; Lecoq, H
2004-08-01
Watermelon mosaic virus (WMV, Potyvirus) is a potyvirus with a worldwide distribution, mostly in temperate and Mediterranean regions. According to the partial sequences that were available, WMV appeared to share high sequence similarity with Soybean mosaic virus (SMV), and it was almost considered as a strain of SMV in spite of its different and much broader host range. Like SMV, it was also related to legume-infecting potyviruses belonging to the "Bean common mosaic virus (BCMV) subgroup". In this paper we obtained the full-length sequence of WMV, and we confirmed that this virus is very closely related to SMV in most of its genome; however, there is evidence for an interspecific recombination in the P1 protein, as the P1 of WMV was 135 amino acids longer than that of SMV, and the N-terminal half of the P1 showed no relation to SMV but was 85% identical to BCMV. This suggests that WMV has emerged through an ancestral recombination event, and supports the distinction of WMV and SMV as separate taxonomic units.
NASA Astrophysics Data System (ADS)
Arteca, Gustavo A.; Tapia, O.
Using computer-simulated molecular dynamics, we study the effect of sequence mutation on the unfolding mechanism of a native fold. The system considered is the native fold of hen egg-white lysozyme, exposed to centrifugal unfolding in vacuo. This unfolding bias elicits configurational transitions that imitate the behaviour of anhydrous proteins diffusing after electrospraying from neutral-pH solutions. By changing the sequences threaded onto the native fold of lysozyme, we probe the role of disulfide bridges and the effect of a global mutation. We find that the initial denaturing steps share common characteristics for the tested sequences. Recurrent features are: (i) the presence of dumbbell conformers with significant residual secondary structure, (ii) the ubiquitous formation of hairpins and two-stranded β-sheets regardless of disulfide bridges, and (iii) an unfolding pattern where the reduction in folding complexity is highly correlated with the decrease in chain compactness. These findings appear to be intrinsic to the shape of the native fold, suggesting that similar unfolding pathways may be accessible to many protein sequences.
Parallel Implementation of MAFFT on CUDA-Enabled Graphics Hardware.
Zhu, Xiangyuan; Li, Kenli; Salah, Ahmad; Shi, Lin; Li, Keqin
2015-01-01
Multiple sequence alignment (MSA) constitutes an extremely powerful tool for many biological applications including phylogenetic tree estimation, secondary structure prediction, and critical residue identification. However, aligning large biological sequences with popular tools such as MAFFT requires long runtimes on sequential architectures. Due to the ever increasing sizes of sequence databases, there is increasing demand to accelerate this task. In this paper, we demonstrate how graphic processing units (GPUs), powered by the compute unified device architecture (CUDA), can be used as an efficient computational platform to accelerate the MAFFT algorithm. To fully exploit the GPU's capabilities for accelerating MAFFT, we have optimized the sequence data organization to eliminate the bandwidth bottleneck of memory access, designed a memory allocation and reuse strategy to make full use of limited memory of GPUs, proposed a new modified-run-length encoding (MRLE) scheme to reduce memory consumption, and used high-performance shared memory to speed up I/O operations. Our implementation tested in three NVIDIA GPUs achieves speedup up to 11.28 on a Tesla K20m GPU compared to the sequential MAFFT 7.015.
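To give a feel for why run-length encoding can reduce the memory taken by gapped alignment rows, here is a minimal sketch of plain run-length encoding in Python. It is not the modified scheme (MRLE) proposed in the abstract above, whose details are not given here, and it is not GPU code; the function names and the toy aligned row are assumptions made for illustration.

```python
from itertools import groupby


def rle_encode(seq: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into a (character, run_length) pair."""
    return [(ch, sum(1 for _ in run)) for ch, run in groupby(seq)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (character, run_length) pairs back into the original string."""
    return "".join(ch * n for ch, n in pairs)


# Toy aligned row with gap runs (made up for illustration).
aligned = "AC----GGT---A"
encoded = rle_encode(aligned)
restored = rle_decode(encoded)
```

Plain run-length encoding only pays off when long runs (typically gaps) dominate; rows with few repeats can grow rather than shrink, which is one plausible reason to modify the basic scheme.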
Xia, Xichao; Liu, Rongzhi; Li, Yi; Xue, Shipeng; Liu, Qingchun; Jiang, Xiao; Zhang, Wenjuan; Ding, Ke
2014-09-01
Hyaluronidase is a common component of scorpion venom and has been considered a "spreading factor" that promotes fast penetration of the venom in the anaphylactic reaction. In the current study, a novel full-length hyaluronidase, BmHYI, and three noncoding isoforms, BmHYII, BmHYIII and BmHYIV, were cloned using a combined strategy based on peptide sequencing and Rapid Amplification of cDNA Ends (RACE). BmHYI has 410 amino acid residues containing the catalytic, positional and five potential N-glycosylation sites. The deduced protein sequence of BmHYI shares significant identity with venom hyaluronidases from bees and snakes. The phylogenetic analysis showed early divergence and independent evolution of BmHYI from other hyaluronidases. An extraordinarily high level of sequence similarity was detected among the four sequences. However, BmHYII, BmHYIII and BmHYIV lacked a stop codon in the open reading frame and a poly(A) signal in the 3' end. Copyright © 2014 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Tibbetts, Clark; Lichanska, Agnieszka M.; Borsuk, Lisa A.; Weslowski, Brian; Morris, Leah M.; Lorence, Matthew C.; Schafer, Klaus O.; Campos, Joseph; Sene, Mohamadou; Myers, Christopher A.; Faix, Dennis; Blair, Patrick J.; Brown, Jason; Metzgar, David
2010-04-01
High-density resequencing microarrays support simultaneous detection and identification of multiple viral and bacterial pathogens. Because detection and identification using RPM is based upon multiple specimen-specific target pathogen gene sequences generated in the individual test, the test results enable both a differential diagnostic analysis and epidemiological tracking of detected pathogen strains and variants from one specimen to the next. The RPM assay enables detection and identification of pathogen sequences that share as little as 80% sequence similarity to prototype target gene sequences represented as detector tiles on the array. This capability enables the RPM to detect and identify previously unknown strains and variants of a detected pathogen, as in sentinel cases associated with an infectious disease outbreak. We illustrate this capability using assay results from testing influenza A virus vaccines configured with strains that were first defined years after the design of the RPM microarray. Results are also presented from RPM-Flu testing of three specimens independently confirmed to be positive for the 2009 Novel H1N1 outbreak strain of influenza virus.
A unique chromatin complex occupies young α-satellite arrays of human centromeres
Henikoff, Jorja G.; Thakur, Jitendra; Kasinathan, Sivakanthan; Henikoff, Steven
2015-01-01
The intractability of homogeneous α-satellite arrays has impeded understanding of human centromeres. Artificial centromeres are produced from higher-order repeats (HORs) present at centromere edges, although the exact sequences and chromatin conformations of centromere cores remain unknown. We use high-resolution chromatin immunoprecipitation (ChIP) of centromere components followed by clustering of sequence data as an unbiased approach to identify functional centromere sequences. We find that specific dimeric α-satellite units shared by multiple individuals dominate functional human centromeres. We identify two recently homogenized α-satellite dimers that are occupied by precisely positioned CENP-A (cenH3) nucleosomes with two ~100–base pair (bp) DNA wraps in tandem separated by a CENP-B/CENP-C–containing linker, whereas pericentromeric HORs show diffuse positioning. Precise positioning is largely maintained, whereas abundance decreases exponentially with divergence, which suggests that young α-satellite dimers with paired ~100-bp particles mediate evolution of functional human centromeres. Our unbiased strategy for identifying functional centromeric sequences should be generally applicable to tandem repeat arrays that dominate the centromeres of most eukaryotes. PMID:25927077
Robinson, Nick A; Hall, Nathan E; Ross, Elizabeth M; Cooke, Ira R; Shiel, Brett P; Robinson, Andrew J; Strugnell, Jan M
2016-01-01
The mitochondrial genome of greenlip abalone, Haliotis laevigata, is reported. MiSeq and HiSeq sequence data from one individual were assembled to yield a single 16,545 bp contig. The sequence shares 92% identity with the H. rubra mitochondrial genome (a closely related species that hybridizes with H. laevigata in the wild). The sequence will be useful for determining the maternal contribution to hybrid populations, for investigating population structure and stock-enhancement effectiveness.
ERIC Educational Resources Information Center
Richards, Janet C.
2010-01-01
Studies indicate thoughtfully planned chants integrated with shared book reading help young children remember concepts and vocabulary they hear in literature, capture children's imagination, develop their rhyming acuity and background knowledge, and increase their sense of story structure, understanding of story sequence, phonological awareness,…
Symbolic dynamics techniques for complex systems: Application to share price dynamics
NASA Astrophysics Data System (ADS)
Xu, Dan; Beck, Christian
2017-05-01
The symbolic dynamics technique is well known for low-dimensional dynamical systems and chaotic maps, and lies at the roots of the thermodynamic formalism of dynamical systems. Here we show that this technique can also be successfully applied to time series generated by complex systems of much higher dimensionality. Our main example is the investigation of share price returns in a coarse-grained way. A nontrivial spectrum of Rényi entropies is found. We study how the spectrum depends on the time scale of returns, the sector of stocks considered, as well as the number of symbols used for the symbolic description. Overall our analysis confirms that in the symbol space transition probabilities of observed share price returns depend on the entire history of previous symbols, thus emphasizing the need for a modelling based on non-Markovian stochastic processes. Our method allows for quantitative comparisons of entirely different complex systems, for example the statistics of symbol sequences generated by share price returns using 4 symbols can be compared with that of genomic sequences.
Genetic diversity of merozoite surface antigens in Babesia bovis detected from Sri Lankan cattle.
Sivakumar, Thillaiampalam; Okubo, Kazuhiro; Igarashi, Ikuo; de Silva, Weligodage Kumarawansa; Kothalawala, Hemal; Silva, Seekkuge Susil Priyantha; Vimalakumar, Singarayar Caniciyas; Meewewa, Asela Sanjeewa; Yokoyama, Naoaki
2013-10-01
Babesia bovis, the causative agent of severe bovine babesiosis, is endemic in Sri Lanka. The live attenuated vaccine (K-strain), which was introduced in the early 1990s, has been used to immunize cattle populations in endemic areas of the country. The present study was undertaken to determine the genetic diversity of merozoite surface antigens (MSAs) in B. bovis isolates from Sri Lankan cattle, and to compare the gene sequences obtained from such isolates against those of the K-strain. Forty-four bovine blood samples isolated from different geographical regions of Sri Lanka and judged to be B. bovis-positive by PCR screening were used to amplify MSAs (MSA-1, MSA-2c, MSA-2a1, MSA-2a2, and MSA-2b), AMA-1, and 12D3 genes from parasite DNA. Although the AMA-1 and 12D3 gene sequences were highly conserved among the Sri Lankan isolates, the MSA gene sequences from the same isolates were highly diverse. Sri Lankan MSA-1, MSA-2c, MSA-2a1, MSA-2a2, and MSA-2b sequences clustered within 5, 2, 4, 1, and 9 different clades in the gene phylograms, respectively, while the minimum similarity values among the deduced amino acid sequences of these genes were 36.8%, 68.7%, 80.3%, 100%, and 68.3%, respectively. In the phylograms, none of the Sri Lankan sequences fell within clades containing the respective K-strain sequences. Additionally, the similarity values for MSA-1 and MSA-2c were 40-61.8% and 90.9-93.2% between the Sri Lankan isolates and the K-strain, respectively, while the K-strain MSA-2a/b sequence shared 64.5-69.8%, 69.3%, and 70.5-80.3% similarities with the Sri Lankan MSA-2a1, MSA-2a2, and MSA-2b sequences, respectively. The present study has shown that genetic diversity among MSAs of Sri Lankan B. bovis isolates is very high, and that the sequences of field isolates diverged genetically from the K-strain. Copyright © 2013 Elsevier B.V. All rights reserved.
Nature and distribution of feline sarcoma virus nucleotide sequences.
Frankel, A E; Gilbert, J H; Porzig, K J; Scolnick, E M; Aaronson, S A
1979-01-01
The genomes of three independent isolates of feline sarcoma virus (FeSV) were compared by molecular hybridization techniques. Using complementary DNAs prepared from two strains, SM- and ST-FeSV, common complementary DNAs were selected by sequential hybridization to FeSV and feline leukemia virus RNAs. These DNAs were shown to be highly related among the three independent sarcoma virus isolates. FeSV-specific complementary DNAs were prepared by selection for hybridization by the homologous FeSV RNA and against hybridization by feline leukemia virus RNA. Sarcoma virus-specific sequences of SM-FeSV were shown to differ from those of either ST- or GA-FeSV strains, whereas ST-FeSV-specific DNA shared extensive sequence homology with GA-FeSV. By molecular hybridization, each set of FeSV-specific sequences was demonstrated to be present in normal cat cellular DNA in approximately one copy per haploid genome and was conserved throughout Felidae. In contrast, FeSV-common sequences were present in multiple DNA copies and were found only in Mediterranean cats. The present results are consistent with the concept that each FeSV strain has arisen by a mechanism involving recombination between feline leukemia virus and cat cellular DNA sequences, the latter represented within the cat genome in a manner analogous to that of a cellular gene. PMID:225544
Yu, Yao; Hu, Hao; Bohlender, Ryan J; Hu, Fulan; Chen, Jiun-Sheng; Holt, Carson; Fowler, Jerry; Guthery, Stephen L; Scheet, Paul; Hildebrandt, Michelle A T; Yandell, Mark; Huff, Chad D
2018-04-06
High-throughput sequencing data are increasingly being made available to the research community for secondary analyses, providing new opportunities for large-scale association studies. However, heterogeneity in target capture and sequencing technologies often introduces strong technological stratification biases that overwhelm subtle signals of association in studies of complex traits. Here, we introduce the Cross-Platform Association Toolkit, XPAT, which provides a suite of tools designed to support and conduct large-scale association studies with heterogeneous sequencing datasets. XPAT includes tools to support cross-platform aware variant calling, quality control filtering, gene-based association testing and rare variant effect size estimation. To evaluate the performance of XPAT, we conducted case-control association studies for three diseases, including 783 breast cancer cases, 272 ovarian cancer cases, 205 Crohn disease cases and 3507 shared controls (including 1722 females) using sequencing data from multiple sources. XPAT greatly reduced Type I error inflation in the case-control analyses, while replicating many previously identified disease-gene associations. We also show that association tests conducted with XPAT using cross-platform data have comparable performance to tests using matched platform data. XPAT enables new association studies that combine existing sequencing datasets to identify genetic loci associated with common diseases and other complex traits.
Piombo, Edoardo; Sela, Noa; Wisniewski, Michael; Hoffmann, Maria; Gullino, Maria L.; Allard, Marc W.; Levin, Elena; Spadaro, Davide; Droby, Samir
2018-01-01
The yeast Metschnikowia fructicola was reported as an efficient biological control agent of postharvest diseases of fruits and vegetables, and it is the basis of the commercial formulated product “Shemer.” Several mechanisms of action by which M. fructicola inhibits postharvest pathogens were suggested including iron-binding compounds, induction of defense signaling genes, production of fungal cell wall degrading enzymes and relatively high amounts of superoxide anions. We assembled the whole genome sequence of two strains of M. fructicola using PacBio and Illumina shotgun sequencing technologies. Using the PacBio, a high-quality draft genome consisting of 93 contigs, with an estimated genome size of approximately 26 Mb, was obtained. Comparative analysis of M. fructicola proteins with the other three available closely related genomes revealed a shared core of homologous proteins coded by 5,776 genes. Comparing the genomes of the two M. fructicola strains using a SNP calling approach resulted in the identification of 564,302 homologous SNPs with 2,004 predicted high impact mutations. The size of the genome is exceptionally high when compared with those of available closely related organisms, and the high rate of homology among M. fructicola genes points toward a recent whole-genome duplication event as the cause of this large genome. Based on the assembled genome, sequences were annotated with a gene description and gene ontology (GO term) and clustered in functional groups. Analysis of CAZymes family genes revealed 1,145 putative genes, and transcriptomic analysis of CAZyme expression levels in M. fructicola during its interaction with either grapefruit peel tissue or Penicillium digitatum revealed a high level of CAZyme gene expression when the yeast was placed in wounded fruit tissue. PMID:29666611
Mirroring and beyond: coupled dynamics as a generalized framework for modelling social interactions
Hasson, Uri; Frith, Chris D.
2016-01-01
When people observe one another, behavioural alignment can be detected at many levels, from the physical to the mental. Likewise, when people process the same highly complex stimulus sequences, such as films and stories, alignment is detected in the elicited brain activity. In early sensory areas, shared neural patterns are coupled to the low-level properties of the stimulus (shape, motion, volume, etc.), while in high-order brain areas, shared neural patterns are coupled to high-level aspects of the stimulus, such as meaning. Successful social interactions require such alignments (both behavioural and neural), as communication cannot occur without shared understanding. However, we need to go beyond simple, symmetric (mirror) alignment once we start interacting. Interactions are dynamic processes, which involve continuous mutual adaptation, development of complementary behaviour and division of labour such as leader–follower roles. Here, we argue that interacting individuals are dynamically coupled rather than simply aligned. This broader framework for understanding interactions can encompass both processes by which behaviour and brain activity mirror each other (neural alignment), and situations in which behaviour and brain activity in one participant are coupled (but not mirrored) to the dynamics in the other participant. To apply these more sophisticated accounts of social interactions to the study of the underlying neural processes we need to develop new experimental paradigms and novel methods of data analysis. PMID:27069044
DOE Office of Scientific and Technical Information (OSTI.GOV)
Osman, Wan Adnawani Meor; van Berkum, Peter; León-Barrios, Milagros; ...
2017-09-25
Ensifer meliloti Mlalz-1 (INSDC = ATZD00000000) is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen-fixing nodule of Medicago laciniata (L.) Miller from a soil sample collected near the town of Guatiza on the island of Lanzarote, the Canary Islands, Spain. This strain nodulates and forms an effective symbiosis with the highly specific host M. laciniata. This rhizobial genome was sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) sequencing project. Here in this paper, the features of E. meliloti Mlalz-1 are described, together with high-quality permanent draft genome sequence information and annotation. The 6,664,116 bp high-quality draft genome is arranged in 99 scaffolds of 100 contigs, containing 6314 protein-coding genes and 74 RNA-only encoding genes. Strain Mlalz-1 is closely related to Ensifer meliloti IAM 12611 T, Ensifer medicae A 321T and Ensifer numidicus ORS 1407 T, based on 16S rRNA gene sequences. gANI values of ≥98.1% support the classification of strain Mlalz-1 as E. meliloti. Nodulation of M. laciniata requires a specific nodC allele, and the nodC gene of strain Mlalz-1 shares ≥98% sequence identity with nodC of M. laciniata-nodulating Ensifer strains, but ≤93% with nodC of Ensifer strains that nodulate other Medicago species. Strain Mlalz-1 is unique among sequenced E. meliloti strains in possessing genes encoding components of a T2SS and in having two versions of the adaptive acid tolerance response lpiA-acvB operon. In E. medicae strain WSM419, lpiA is essential for enhancing survival in lethal acid conditions. The second copy of the lpiA-acvB operon of strain Mlalz-1 has highest sequence identity (>96%) with that of E. medicae strains, which suggests genetic recombination between strain Mlalz-1 and E. medicae and the horizontal gene transfer of lpiA-acvB.
Pearson, Bruce M.; Louwen, Rogier; van Baarlen, Peter; van Vliet, Arnoud H.M.
2015-01-01
CRISPR (clustered regularly interspaced palindromic repeats)-Cas (CRISPR-associated) systems are sequence-specific adaptive defenses against phages and plasmids which are widespread in prokaryotes. Here we have studied whether phylogenetic relatedness or sharing of environmental niches affects the distribution and dissemination of Type II CRISPR-Cas systems, first in 132 bacterial genomes from 15 phylogenetic classes, ranging from Proteobacteria to Actinobacteria. There was clustering of distinct Type II CRISPR-Cas systems in phylogenetically distinct genera with varying G+C%, which share environmental niches. The distribution of CRISPR-Cas within a genus was studied using a large collection of genome sequences of the closely related Campylobacter species Campylobacter jejuni (N = 3,746) and Campylobacter coli (N = 486). The Cas gene cas9 and CRISPR-repeat are almost universally present in C. jejuni genomes (98.0% positive) but relatively rare in C. coli genomes (9.6% positive). Campylobacter jejuni and agricultural C. coli isolates share the C. jejuni CRISPR-Cas system, which is closely related to, but distinct from the C. coli CRISPR-Cas system found in C. coli isolates from nonagricultural sources. Analysis of the genomic position of CRISPR-Cas insertion suggests that the C. jejuni-type CRISPR-Cas has been transferred to agricultural C. coli. Conversely, the absence of the C. coli-type CRISPR-Cas in agricultural C. coli isolates may be due to these isolates not sharing the same environmental niche, and may be affected by farm hygiene and biosecurity practices in the agricultural sector. Finally, many CRISPR spacer alleles were linked with specific multilocus sequence types, suggesting that these can assist molecular epidemiology applications for C. jejuni and C. coli. PMID:26338188
Test Sequence Priming in Recognition Memory
ERIC Educational Resources Information Center
Johns, Elizabeth E.; Mewhort, D. J. K.
2009-01-01
The authors examined priming within the test sequence in 3 recognition memory experiments. A probe primed its successor whenever both probes shared a feature with the same studied item ("interjacent priming"), indicating that the study item like the probe is central to the decision. Interjacent priming occurred even when the 2 probes did…
Stimulus-Dependent Flexibility in Non-Human Auditory Pitch Processing
ERIC Educational Resources Information Center
Bregman, Micah R.; Patel, Aniruddh D.; Gentner, Timothy Q.
2012-01-01
Songbirds and humans share many parallels in vocal learning and auditory sequence processing. However, the two groups differ notably in their abilities to recognize acoustic sequences shifted in absolute pitch (pitch height). Whereas humans maintain accurate recognition of words or melodies over large pitch height changes, songbirds are…
El-Bebany, Ahmed F; Rampitsch, Christof; Daayf, Fouad
2010-01-01
Verticillium dahliae is a soilborne fungus that causes a vascular wilt disease of plants and losses in a broad range of economically important crops worldwide. In this study, we compared the proteomes of highly (Vd1396-9) and weakly (Vs06-14) aggressive isolates of V. dahliae to identify protein factors that may contribute to pathogenicity. Twenty-five protein spots were consistently observed as differential in the proteome profiles of the two isolates. The protein sequences in the spots were identified by LC-ESI-MS/MS and MASCOT database searches. Some of the identified sequences shared homology with fungal proteins that have roles in stress response, colonization, melanin biosynthesis, microsclerotia formation, antibiotic resistance, and fungal penetration. These are important functions for infection of the host and survival of the pathogen in soil. One protein found only in the highly aggressive isolate was identified as isochorismatase hydrolase, a potential plant-defense suppressor. This enzyme may inhibit the production of salicylic acid, which is important for plant defense response signaling. Other sequences corresponding to potential pathogenicity factors were identified in the highly aggressive isolate. This work indicates that, in combination with functional genomics, proteomics-based analyses can provide additional insights into pathogenesis and potential management strategies for this disease.
Gabrieli, Paolo; Gomulski, Ludvik M.; Bonomi, Angelica; Siciliano, Paolo; Scolari, Francesca; Franz, Gerald; Jessup, Andrew; Malacrida, Anna R.; Gasperi, Giuliano
2011-01-01
Background: Diptera have an extraordinary variety of sex determination mechanisms, and Drosophila melanogaster is the paradigm for this group. However, the Drosophila sex determination pathway is only partially conserved and the family Tephritidae affords an interesting example. The tephritid Y chromosome is postulated to be necessary to determine male development. Characterization of Y sequences, apart from elucidating the nature of the male determining factor, is also important to understand the evolutionary history of sex chromosomes within the Tephritidae. We studied the Y sequences from the olive fly, Bactrocera oleae. Its Y chromosome is minute and highly heterochromatic, and displays high heteromorphism with the X chromosome. Methodology/Principal Findings: A combined Representational Difference Analysis (RDA) and fluorescence in-situ hybridization (FISH) approach was used to investigate the Y chromosome to derive information on its sequence content. The Y chromosome is strewn with repetitive DNA sequences, the majority of which are also interspersed in the pericentromeric regions of the autosomes. The Y chromosome appears to have accumulated small and large repetitive interchromosomal duplications. The large interchromosomal duplications harbour an importin-4-like gene fragment. Apart from these importin-4-like sequences, the other Y repetitive sequences are not shared with the X chromosome, suggesting molecular differentiation of these two chromosomes. Moreover, as the identified Y sequences were not detected on the Y chromosomes of closely related tephritids, we can infer divergence in the repetitive nature of their sequence contents. Conclusions/Significance: The identification of Y-linked sequences may tell us much about the repetitive nature, the origin and the evolution of Y chromosomes. We hypothesize how these repetitive sequences accumulated and were maintained on the Y chromosome during its evolutionary history. Our data reinforce the idea that the sex chromosomes of the Tephritidae may have distinct evolutionary origins with respect to those of the Drosophilidae and other Dipteran families. PMID:21408187
Zhalnina, Kateryna V.; Dias, Raquel; Leonard, Michael T.; Dorr de Quadros, Patricia; Camargo, Flavio A. O.; Drew, Jennifer C.; Farmerie, William G.; Daroub, Samira H.; Triplett, Eric W.
2014-01-01
The activity of ammonia-oxidizing archaea (AOA) leads to the loss of nitrogen from soil, pollution of water sources and elevated emissions of greenhouse gas. To date, eight AOA genomes are available in the public databases, seven are from the group I.1a of the Thaumarchaeota and only one is from the group I.1b, isolated from hot springs. Many soils are dominated by AOA from the group I.1b, but the genomes of soil representatives of this group have not been sequenced and functionally characterized. The lack of knowledge of metabolic pathways of soil AOA presents a critical gap in understanding their role in biogeochemical cycles. Here, we describe the first complete genome of soil archaeon Candidatus Nitrososphaera evergladensis, which has been reconstructed from metagenomic sequencing of a highly enriched culture obtained from an agricultural soil. The AOA enrichment was sequenced with the high throughput next generation sequencing platforms from Pacific Biosciences and Ion Torrent. The de novo assembly of sequences resulted in one 2.95 Mb contig. Annotation of the reconstructed genome revealed many similarities of the basic metabolism with the rest of sequenced AOA. Ca. N. evergladensis belongs to the group I.1b and shares only 40% of whole-genome homology with the closest sequenced relative Ca. N. gargensis. Detailed analysis of the genome revealed coding sequences that were completely absent from the group I.1a. These unique sequences code for proteins involved in control of DNA integrity, transporters, two-component systems and versatile CRISPR defense system. Notably, genomes from the group I.1b have more gene duplications compared to the genomes from the group I.1a. We suggest that the presence of these unique genes and gene duplications may be associated with the environmental versatility of this group. PMID:24999826
Saghatelyan, Ani; Poghosyan, Lianna; Panosyan, Hovik; Birkeland, Nils-Kåre
2015-11-12
The 2,379,636-bp draft genome sequence of Thermus scotoductus strain K1, isolated from a geothermal spring outlet located in the Karvachar region of Nagorno Karabakh, is presented. Strain K1 shares about 80% genome sequence similarity with T. scotoductus strain SA-01, recovered from a deep gold mine in South Africa. Copyright © 2015 Saghatelyan et al.
Aoyagi, K; Beyou, A; Moon, K; Fang, L; Ulrich, T
1993-01-01
The enzyme 3-hydroxy-3-methylglutaryl coenzyme A reductase (HMGR, EC 1.1.1.34) is a key enzyme in the isoprenoid biosynthetic pathway. We have isolated partial cDNAs from wheat (Triticum aestivum) using the polymerase chain reaction. Comparison of deduced amino acid sequences of these cDNAs shows that they represent a small family of genes that share a high degree of sequence homology among themselves as well as among genes from other organisms including tomato, Arabidopsis, hamster, human, Drosophila, and yeast. Southern blot analysis reveals the presence of at least four genes. Our results concerning the tissue-specific expression as well as developmental regulation of these HMGR cDNAs highlight the important role of this enzyme in the growth and development of wheat. PMID:8108513
Lindeberg, M; Collmer, A
1992-01-01
Many extracellular proteins produced by Erwinia chrysanthemi require the out gene products for transport across the outer membrane. In a previous report (S. Y. He, M. Lindeberg, A. K. Chatterjee, and A. Collmer, Proc. Natl. Acad. Sci. USA 88:1079-1083, 1991) cosmid pCPP2006, sufficient for secretion of Erwinia chrysanthemi extracellular proteins by Escherichia coli, was partially sequenced, revealing four out genes sharing high homology with pulH through pulK from Klebsiella oxytoca. The nucleotide sequence of eight additional out genes reveals homology with pulC through pulG, pulL, pulM, pulO, and other genes involved in secretion by various gram-negative bacteria. Although signal sequences and hydrophobic regions are generally conserved between Pul and Out proteins, four out genes contain unique inserts, a pulN homolog is not present, and outO appears to be transcribed separately from outC through outM. The sequenced region was subcloned, and an additional 7.6-kb region upstream was identified as being required for secretion in E. coli. out gene homologs were found on Erwinia carotovora cosmid clone pAKC651 but were not detected in E. coli. The outC-through-outM operon is weakly induced by polygalacturonic acid and strongly expressed in the early stationary phase. The out and pul genes are highly similar in sequence, hydropathic properties, and overall arrangement but differ in both transcriptional organization and the nature of their induction. PMID:1429461
Khan, Abdul Latif; Khan, Muhammad Aaqil; Shahzad, Raheem; Lubna; Kang, Sang Mo; Al-Harrasi, Ahmed; Al-Rawahi, Ahmed; Lee, In-Jung
2018-01-01
Pinaceae, the largest family of conifers, has a diversified organization of chloroplast (cp) genomes with two typical highly reduced inverted repeats (IRs). In the current study, we determined the complete sequence of the cp genome of an economically and ecologically important conifer tree, the loblolly pine (Pinus taeda L.), using Illumina paired-end sequencing and compared the sequence with those of other pine species. The results revealed a genome size of 121,531 base pairs (bp) containing a pair of 830-bp IR regions, distinguished by a small single copy (42,258 bp) and large single copy (77,614 bp) region. The chloroplast genome of P. taeda encodes 120 genes, comprising 81 protein-coding genes, four ribosomal RNA genes, and 35 tRNA genes, with 151 randomly distributed microsatellites. Approximately 6 palindromic, 34 forward, and 22 tandem repeats were found in the P. taeda cp genome. Whole cp genome comparison with those of other Pinus species exhibited an overall high degree of sequence similarity, with some divergence in intergenic spacers. Higher and lower numbers of indels and single-nucleotide polymorphism substitutions were observed relative to P. contorta and P. monophylla, respectively. Phylogenomic analyses based on the complete genome sequence revealed that 60 shared genes generated trees with the same topologies, and P. taeda was closely related to P. contorta in the subgenus Pinus. Thus, the complete P. taeda genome provided valuable resources for population and evolutionary studies of gymnosperms and can be used to identify related species. PMID:29596414
Bernstein, R M; Schluter, S F; Shen, S; Marchalonis, J J
1996-04-16
All immunoglobulins and T-cell receptors throughout phylogeny share regions of highly conserved amino acid sequence. To identify possible primitive immunoglobulins and immunoglobulin-like molecules, we utilized 3' RACE (rapid amplification of cDNA ends) and a highly conserved constant region consensus amino acid sequence to isolate a new immunoglobulin class from the sandbar shark Carcharhinus plumbeus. The immunoglobulin, termed IgW, in its secreted form consists of 782 amino acids and is expressed in both the thymus and the spleen. The molecule overall most closely resembles mu chains of the skate and human and a new putative antigen binding molecule isolated from the nurse shark (NAR). The full-length IgW chain has a variable region resembling human and shark heavy-chain (VH) sequences and a novel joining segment containing the WGXGT motif characteristic of H chains. However, unlike any other H-chain-type molecule, it contains six constant (C) domains. The first C domain contains the cysteine residue characteristic of C mu1 that would allow dimerization with a light (L) chain. The fourth and sixth domains also contain comparable cysteines that would enable dimerization with other H chains or homodimerization. Comparison of the sequences of IgW V and C domains shows homology greater than that found in comparisons among VH and C mu or VL, or CL thereby suggesting that IgW may retain features of the primordial immunoglobulin in evolution.
Kiriake, Aya; Shiomi, Kazuo
2011-11-01
Lionfish, members of the genera Pterois, Parapterois and Dendrochirus, are well known to be venomous, having venomous glandular tissues in dorsal, pelvic and anal spines. The lionfish toxins have been shown to cross-react with the stonefish toxins by neutralization tests using the commercial stonefish antivenom, although their chemical properties including structures have been little characterized. In this study, an antiserum against neoverrucotoxin, the stonefish Synanceia verrucosa toxin, was first raised in a guinea pig and used in immunoblotting and inhibition immunoblotting to confirm that two species of Pterois lionfish (P. antennata and P. volitans) contain a 75kDa protein (corresponding to the toxin subunit) cross-reacting with neoverrucotoxin. Then, the amino acid sequences of the P. antennata and P. volitans toxins were successfully determined by cDNA cloning using primers designed from the highly conserved sequences of the stonefish toxins. Notably, either α-subunits (699 amino acid residues) or β-subunits (698 amino acid residues) of the P. antennata and P. volitans toxins share as high as 99% sequence identity with each other. Furthermore, both α- and β-subunits of the lionfish toxins exhibit high sequence identity (70-80% identity) with each other and also with the β-subunits of the stonefish toxins. As reported for the stonefish toxins, the lionfish toxins also contain a B30.2/SPRY domain (comprising nearly 200 amino acid residues) in the C-terminal region of each subunit. Copyright © 2011 Elsevier Ltd. All rights reserved.
Ohshima, Chihiro; Takahashi, Hajime; Iwakawa, Ai; Kuda, Takashi; Kimura, Bon
2017-07-17
Listeria monocytogenes, which is responsible for causing food poisoning known as listeriosis, infects humans and animals. Widely distributed in the environment, this bacterium is known to contaminate food products after being transmitted to factories via raw materials. To minimize the contamination of products by food pathogens, it is critical to identify and eliminate factory entry routes and pathways for the causative bacteria. High resolution melting analysis (HRMA) is a method that takes advantage of differences in DNA sequences and PCR product lengths that are reflected by the dissociation temperature. Through our research, we have developed a multiple locus variable-number tandem repeat analysis (MLVA) using HRMA as a simple and rapid method to differentiate L. monocytogenes isolates. To evaluate the method, we compared the ability of MLVA-HRMA, MLVA using capillary electrophoresis, and multilocus sequence typing (MLST) to discriminate between strains. The MLVA-HRMA method displayed greater discriminatory ability than MLST and MLVA using capillary electrophoresis, suggesting that the variation in the number of repeat units, along with mutations within the DNA sequence, was accurately reflected by the melting curve of HRMA. Rather than relying on DNA sequence analysis or high-resolution electrophoresis, the MLVA-HRMA method employs the same process as PCR until the analysis step, suggesting a combination of speed and simplicity. The results of the MLVA-HRMA method can be shared between laboratories. There are high expectations that this method will be adopted for regular inspections at food processing facilities in the near future. Copyright © 2017. Published by Elsevier B.V.
Genetic diversity in Trypanosoma theileri from Sri Lankan cattle and water buffaloes.
Yokoyama, Naoaki; Sivakumar, Thillaiampalam; Fukushi, Shintaro; Tattiyapong, Muncharee; Tuvshintulga, Bumduuren; Kothalawala, Hemal; Silva, Seekkuge Susil Priyantha; Igarashi, Ikuo; Inoue, Noboru
2015-01-30
Trypanosoma theileri is a hemoprotozoan parasite that infects various ruminant species. We investigated the epidemiology of this parasite among cattle and water buffalo populations bred in Sri Lanka, using a diagnostic PCR assay based on the cathepsin L-like protein (CATL) gene. Blood DNA samples sourced from cattle (n=316) and water buffaloes (n=320) bred in different geographical areas of Sri Lanka were PCR screened for T. theileri. Parasite DNA was detected in cattle and water buffaloes alike in all the sampling locations. The overall T. theileri-positive rate was higher in water buffaloes (15.9%) than in cattle (7.6%). Subsequently, PCR amplicons were sequenced and the partial CATL sequences were phylogenetically analyzed. The identity values for the CATL gene were 89.6-99.7% among the cattle-derived sequences, compared with values of 90.7-100% for the buffalo-derived sequences. However, the cattle-derived sequences shared 88.2-100% identity values with those from buffaloes. In the phylogenetic tree, the Sri Lankan CATL gene sequences fell into two major clades (TthI and TthII), both of which contain CATL sequences from several other countries. Although most of the CATL sequences from Sri Lankan cattle and buffaloes clustered independently, two buffalo-derived sequences were observed to be closely related to those of the Sri Lankan cattle. Furthermore, a Sri Lankan buffalo sequence clustered with CATL gene sequences from Brazilian buffalo and Thai cattle. In addition to reporting the first PCR-based survey of T. theileri among Sri Lankan-bred cattle and water buffaloes, the present study found that some of the CATL gene fragments sourced from water buffaloes shared similarity with those determined from cattle in this country. Copyright © 2014 Elsevier B.V. All rights reserved.
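As an illustration of how identity ranges such as the 89.6-99.7% reported above can be derived, the following minimal Python sketch computes pairwise percent identity over a set of sequences and reports the minimum and maximum. It is not the authors' pipeline: it assumes the sequences have already been aligned to equal length, it skips any column containing a gap, and the function names and the three toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped (an assumption;
    other tools treat gaps differently).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_range(seqs: dict[str, str]) -> tuple[float, float]:
    """Minimum and maximum pairwise identity over all sequence pairs."""
    values = [
        percent_identity(seqs[p], seqs[q]) for p, q in combinations(seqs, 2)
    ]
    return min(values), max(values)


# Hypothetical toy alignment, not data from the study.
toy = {
    "cattle_1": "ATGGCTAAGT",
    "cattle_2": "ATGGCTAAGT",
    "buffalo_1": "ATGACTA-GT",
}
low, high = identity_range(toy)
print(f"identity range: {low:.1f}-{high:.1f}%")
```

Reported values depend on the alignment and on how gaps are counted, so figures from a sketch like this would not necessarily match those of a published analysis.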
Awan, Ali R; Manfredo, Amanda; Pleiss, Jeffrey A
2013-07-30
Alternative splicing is a potent regulator of gene expression that vastly increases proteomic diversity in multicellular eukaryotes and is associated with organismal complexity. Although alternative splicing is widespread in vertebrates, little is known about the evolutionary origins of this process, in part because of the absence of phylogenetically conserved events that cross major eukaryotic clades. Here we describe a lariat-sequencing approach, which offers high sensitivity for detecting splicing events, and its application to the unicellular fungus, Schizosaccharomyces pombe, an organism that shares many of the hallmarks of alternative splicing in mammalian systems but for which no previous examples of exon-skipping had been demonstrated. Over 200 previously unannotated splicing events were identified, including examples of regulated alternative splicing. Remarkably, an evolutionary analysis of four of the exons identified here as subject to skipping in S. pombe reveals high sequence conservation and perfect length conservation with their homologs in scores of plants, animals, and fungi. Moreover, alternative splicing of two of these exons has been documented in multiple vertebrate organisms, making these the first demonstrations of identical alternative-splicing patterns in species that are separated by over 1 billion years of evolution.
Fungal Genes in Context: Genome Architecture Reflects Regulatory Complexity and Function
Noble, Luke M.; Andrianopoulos, Alex
2013-01-01
Gene context determines gene expression, with local chromosomal environment most influential. Comparative genomic analysis is often limited in scope to conserved or divergent gene and protein families, and fungi are well suited to this approach with low functional redundancy and relatively streamlined genomes. We show here that one aspect of gene context, the amount of potential upstream regulatory sequence maintained through evolution, is highly predictive of both molecular function and biological process in diverse fungi. Orthologs with large upstream intergenic regions (UIRs) are strongly enriched in information processing functions, such as signal transduction and sequence-specific DNA binding, and, in the genus Aspergillus, include the majority of experimentally studied, high-level developmental and metabolic transcriptional regulators. Many uncharacterized genes are also present in this class and, by implication, may be of similar importance. Large intergenic regions also share two novel sequence characteristics, currently of unknown significance: they are enriched for plus-strand polypyrimidine tracts and an information-rich, putative regulatory motif that was present in the last common ancestor of the Pezizomycotina. Systematic consideration of gene UIR in comparative genomics, particularly for poorly characterized species, could help reveal organisms’ regulatory priorities. PMID:23699226
Serrano, Amaya; Williams, Trevor; Simón, Oihane; López-Ferber, Miguel; Caballero, Primitivo
2013-01-01
A natural Spodoptera exigua multiple nucleopolyhedrovirus (SeMNPV) isolate from Florida shares a strikingly similar genotypic composition to that of a natural Spodoptera frugiperda MNPV (SfMNPV) isolate from Nicaragua. Both isolates comprise a high proportion of large-deletion genotypes that lack genes that are essential for viral replication or transmission. To determine the likely origins of such genotypically similar population structures, we performed genomic and functional analyses of these genotypes. The homology of nucleotides in the deleted regions was as high as 79%, similar to those of other colinear genomic regions, although some SfMNPV genes were not present in SeMNPV. In addition, no potential consensus sequences were shared between the deletion flanking sequences. These results indicate an evolutionary mechanism that independently generates and sustains deletion mutants within each virus population. Functional analyses using different proportions of complete and deletion genotypes were performed with the two viruses in mixtures of occlusion bodies (OBs) or co-occluded virions. Ratios greater than 3:1 of complete/deletion genotypes resulted in reduced pathogenicity (expressed as median lethal dose), but there were no significant changes in the speed of kill. In contrast, OB yields increased only in the 1:1 mixture. The three phenotypic traits analyzed provide a broader picture of the functional significance of the most extensively deleted SeMNPV genotype and contribute toward the elucidation of the role of such mutants in baculovirus populations. PMID:23204420
Novel pedigree analysis implicates DNA repair and chromatin remodeling in multiple myeloma risk
Curtin, Karen; Rajamanickam, Venkatesh; Jayabalan, David; Atanackovic, Djordje; Rajkumar, S. Vincent; Kumar, Shaji; Slager, Susan; Galia, Perrine; Demangel, Delphine; Salama, Mohamed; Joseph, Vijai; Lipkin, Steven M.; Dumontet, Charles; Vachon, Celine M.
2018-01-01
The high-risk pedigree (HRP) design is an established strategy to discover rare, highly-penetrant, Mendelian-like causal variants. Its success, however, in complex traits has been modest, largely due to challenges of genetic heterogeneity and complex inheritance models. We describe a HRP strategy that addresses intra-familial heterogeneity, and identifies inherited segments important for mapping regulatory risk. We apply this new Shared Genomic Segment (SGS) method in 11 extended, Utah, multiple myeloma (MM) HRPs, and subsequent exome sequencing in SGS regions of interest in 1063 MM / MGUS (monoclonal gammopathy of undetermined significance–a precursor to MM) cases and 964 controls from a jointly-called collaborative resource, including cases from the initial 11 HRPs. One genome-wide significant 1.8 Mb shared segment was found at 6q16. Exome sequencing in this region revealed predicted deleterious variants in USP45 (p.Gln691* and p.Gln621Glu), a gene known to influence DNA repair through endonuclease regulation. Additionally, a 1.2 Mb segment at 1p36.11 is inherited in two Utah HRPs, with coding variants identified in ARID1A (p.Ser90Gly and p.Met890Val), a key gene in the SWI/SNF chromatin remodeling complex. Our results provide compelling statistical and genetic evidence for segregating risk variants for MM. In addition, we demonstrate a novel strategy to use large HRPs for risk-variant discovery more generally in complex traits. PMID:29389935
Novel pedigree analysis implicates DNA repair and chromatin remodeling in multiple myeloma risk.
Waller, Rosalie G; Darlington, Todd M; Wei, Xiaomu; Madsen, Michael J; Thomas, Alun; Curtin, Karen; Coon, Hilary; Rajamanickam, Venkatesh; Musinsky, Justin; Jayabalan, David; Atanackovic, Djordje; Rajkumar, S Vincent; Kumar, Shaji; Slager, Susan; Middha, Mridu; Galia, Perrine; Demangel, Delphine; Salama, Mohamed; Joseph, Vijai; McKay, James; Offit, Kenneth; Klein, Robert J; Lipkin, Steven M; Dumontet, Charles; Vachon, Celine M; Camp, Nicola J
2018-02-01
The high-risk pedigree (HRP) design is an established strategy to discover rare, highly-penetrant, Mendelian-like causal variants. Its success, however, in complex traits has been modest, largely due to challenges of genetic heterogeneity and complex inheritance models. We describe a HRP strategy that addresses intra-familial heterogeneity, and identifies inherited segments important for mapping regulatory risk. We apply this new Shared Genomic Segment (SGS) method in 11 extended, Utah, multiple myeloma (MM) HRPs, and subsequent exome sequencing in SGS regions of interest in 1063 MM / MGUS (monoclonal gammopathy of undetermined significance-a precursor to MM) cases and 964 controls from a jointly-called collaborative resource, including cases from the initial 11 HRPs. One genome-wide significant 1.8 Mb shared segment was found at 6q16. Exome sequencing in this region revealed predicted deleterious variants in USP45 (p.Gln691* and p.Gln621Glu), a gene known to influence DNA repair through endonuclease regulation. Additionally, a 1.2 Mb segment at 1p36.11 is inherited in two Utah HRPs, with coding variants identified in ARID1A (p.Ser90Gly and p.Met890Val), a key gene in the SWI/SNF chromatin remodeling complex. Our results provide compelling statistical and genetic evidence for segregating risk variants for MM. In addition, we demonstrate a novel strategy to use large HRPs for risk-variant discovery more generally in complex traits.
Smith, Jeramiah J; Kuraku, Shigehiro; Holt, Carson; Sauka-Spengler, Tatjana; Jiang, Ning; Campbell, Michael S; Yandell, Mark D; Manousaki, Tereza; Meyer, Axel; Bloom, Ona E; Morgan, Jennifer R; Buxbaum, Joseph D; Sachidanandam, Ravi; Sims, Carrie; Garruss, Alexander S; Cook, Malcolm; Krumlauf, Robb; Wiedemann, Leanne M; Sower, Stacia A; Decatur, Wayne A; Hall, Jeffrey A; Amemiya, Chris T; Saha, Nil R; Buckley, Katherine M; Rast, Jonathan P; Das, Sabyasachi; Hirano, Masayuki; McCurley, Nathanael; Guo, Peng; Rohner, Nicolas; Tabin, Clifford J; Piccinelli, Paul; Elgar, Greg; Ruffier, Magali; Aken, Bronwen L; Searle, Stephen MJ; Muffato, Matthieu; Pignatelli, Miguel; Herrero, Javier; Jones, Matthew; Brown, C Titus; Chung-Davidson, Yu-Wen; Nanlohy, Kaben G; Libants, Scot V; Yeh, Chu-Yin; McCauley, David W; Langeland, James A; Pancer, Zeev; Fritzsch, Bernd; de Jong, Pieter J; Zhu, Baoli; Fulton, Lucinda L; Theising, Brenda; Flicek, Paul; Bronner, Marianne E; Warren, Wesley C; Clifton, Sandra W; Wilson, Richard K; Li, Weiming
2013-01-01
Lampreys are representatives of an ancient vertebrate lineage that diverged from our own ~500 million years ago. By virtue of this deeply shared ancestry, the sea lamprey (P. marinus) genome is uniquely poised to provide insight into the ancestry of vertebrate genomes and the underlying principles of vertebrate biology. Here, we present the first lamprey whole-genome sequence and assembly. We note challenges faced owing to its high content of repetitive elements and GC bases, as well as the absence of broad-scale sequence information from closely related species. Analyses of the assembly indicate that two whole-genome duplications likely occurred before the divergence of ancestral lamprey and gnathostome lineages. Moreover, the results help define key evolutionary events within vertebrate lineages, including the origin of myelin-associated proteins and the development of appendages. The lamprey genome provides an important resource for reconstructing vertebrate origins and the evolutionary events that have shaped the genomes of extant organisms. PMID:23435085
Molecular systematics of higher primates: genealogical relations and classification.
Miyamoto, M M; Koop, B F; Slightom, J L; Goodman, M; Tennant, M R
1988-01-01
We obtained 5' and 3' flanking sequences (5.4 kilobase pairs) from the psi eta-globin gene region of the rhesus macaque (Macaca mulatta) and combined them with available nucleotide data. The completed sequence, representing 10.8 kilobase pairs of contiguous noncoding DNA, was compared to the same orthologous regions available for human (Homo sapiens, as represented by five different alleles), common chimpanzee (Pan troglodytes), gorilla (Gorilla gorilla), and orangutan (Pongo pygmaeus). The nucleotide sequence for Macaca mulatta provided the outgroup perspective needed to evaluate better the relationships of humans and great apes. Pairwise comparisons and parsimony analysis of these orthologues clearly demonstrated (i) that humans and great apes share a high degree of genetic similarity and (ii) that humans, chimpanzees, and gorillas form a natural monophyletic group. These conclusions strongly favor a genealogical classification for higher primates consisting of a single family (Hominidae) with two subfamilies (Homininae for Homo, Pan, and Gorilla and Ponginae for Pongo). PMID:3174657
A proposed model for the flowering signaling pathway of sugarcane under photoperiodic control.
Coelho, C P; Costa Netto, A P; Colasanti, J; Chalfun-Júnior, A
2013-04-25
Molecular analysis of floral induction in Arabidopsis has identified several flowering time genes related to 4 response networks defined by the autonomous, gibberellin, photoperiod, and vernalization pathways. Although grass flowering processes include ancestral functions shared by both mono- and dicots, they have developed their own mechanisms to transmit floral induction signals. Despite its high production capacity and its important role in biofuel production, almost no information is available about the flowering process in sugarcane. We searched the Sugarcane Expressed Sequence Tags database to look for elements of the flowering signaling pathway under photoperiodic control. Sequences showing significant similarity to flowering time genes of other species were clustered, annotated, and analyzed for conserved domains. Multiple alignments comparing the sequences found in the sugarcane database and those from other species were performed and their phylogenetic relationship assessed using the MEGA 4.0 software. Electronic Northerns were run with Cluster and TreeView programs, allowing us to identify putative members of the photoperiod-controlled flowering pathway of sugarcane.
Comparison of the Distal Gut Microbiota from People and Animals in Africa
Ellis, Richard J.; Bruce, Kenneth D.; Jenkins, Claire; Stothard, J. Russell; Ajarova, Lilly; Mugisha, Lawrence; Viney, Mark E.
2013-01-01
The gut microbiota plays a key role in the maintenance of healthy gut function as well as many other aspects of health. High-throughput sequence analyses have revealed the composition of the gut microbiota, showing that there is a core signature to the human gut microbiota, as well as variation in its composition between people. The gut microbiota of animals is also being investigated. We are interested in the relationship between bacterial taxa of the human gut microbiota and those in the gut microbiota of domestic and semi-wild animals. While it is clear that some human gut bacterial pathogens come from animals (showing that human-animal transmission occurs), the extent to which the usually non-pathogenic commensal taxa are shared between humans and animals has not been explored. To investigate this we compared the distal gut microbiota of humans, cattle and semi-captive chimpanzees in communities that are geographically sympatric in Uganda. The gut microbiotas of these three host species could be distinguished by the different proportions of bacterial taxa present. We defined multiple operational taxonomic units (OTUs) by sequence similarity and found evidence that some OTUs were common to humans, cattle and chimpanzees, with the largest number of shared OTUs occurring between chimpanzees and humans, as might be expected with their close physiological similarity. These results show the potential for the sharing of usually commensal bacterial taxa between humans and other animals. This suggests that further investigation of this phenomenon is needed to fully understand how it drives the composition of human and animal gut microbiotas. PMID:23355898
[Clonal association of flat epithelial atypia and tubular breast cancer].
Aulmann, S; Elsawaf, Z; Penzel, R; Schirmacher, P; Sinn, H P
2008-11-01
Flat epithelial atypia (FEA) of the breast has recently gained attention as a possible precursor lesion of highly differentiated breast cancer. FEA and tubular carcinomas, which share cytological features, often occur in close proximity to each other. To examine a possible clonal relationship, we analysed mutations of the highly variable region of the mitochondrial genome in a series of tubular carcinomas, associated FEA and normal glands. Multiple sequence alignment showed identical mtDNA mutations in approximately 50% of paired FEA and tumour samples, indicative of a clonal relationship. Our data indicate a possible precursor role of FEA in the development of tubular breast cancer.
Gusev, A.; Shah, M. J.; Kenny, E. E.; Ramachandran, A.; Lowe, J. K.; Salit, J.; Lee, C. C.; Levandowsky, E. C.; Weaver, T. N.; Doan, Q. C.; Peckham, H. E.; McLaughlin, S. F.; Lyons, M. R.; Sheth, V. N.; Stoffel, M.; De La Vega, F. M.; Friedman, J. M.; Breslow, J. L.
2012-01-01
Whole-genome sequencing in an isolated population with few founders directly ascertains variants from the population bottleneck that may be rare elsewhere. In such populations, shared haplotypes allow imputation of variants in unsequenced samples without resorting to complex statistical methods as in studies of outbred cohorts. We focus on an isolated population cohort from the Pacific Island of Kosrae, Micronesia, where we previously collected SNP array and rich phenotype data for the majority of the population. We report identification of long regions with haplotypes co-inherited between pairs of individuals and methodology to leverage such shared genetic content for imputation. Our estimates show that sequencing as few as 40 personal genomes allows for inference in up to 60% of the 3000-person cohort at the average locus. We ascertained a pilot data set of whole-genome sequences from seven Kosraean individuals, with average 5× coverage. This assay identified 5,735,306 unique sites of which 1,212,831 were previously unknown. Additionally, these variants are unusually enriched for alleles that are rare in other populations when compared to geographic neighbors (published Korean genome SJK). We used the presence of shared haplotypes between the seven Kosraean individuals to estimate expected imputation accuracy of known and novel homozygous variants at 99.6% and 97.3%, respectively. This study presents whole-genome analysis of a homogenous isolate population with emphasis on optimal rare variant inference. PMID:22135348
DeMaere, Matthew Z; Williams, Timothy J; Allen, Michelle A; Brown, Mark V; Gibson, John A E; Rich, John; Lauro, Federico M; Dyall-Smith, Michael; Davenport, Karen W; Woyke, Tanja; Kyrpides, Nikos C; Tringe, Susannah G; Cavicchioli, Ricardo
2013-10-15
Deep Lake in Antarctica is a globally isolated, hypersaline system that remains liquid at temperatures down to -20 °C. By analyzing metagenome data and genomes of four isolates we assessed genome variation and patterns of gene exchange to learn how the lake community evolved. The lake is completely and uniformly dominated by haloarchaea, comprising a hierarchically structured, low-complexity community that differs greatly from temperate and tropical hypersaline environments. The four Deep Lake isolates represent distinct genera (∼85% 16S rRNA gene similarity and ∼73% genome average nucleotide identity) with genomic characteristics indicative of niche adaptation, and collectively account for ∼72% of the cellular community. Network analysis revealed a remarkable level of intergenera gene exchange, including the sharing of long contiguous regions (up to 35 kb) of high identity (∼100%). Although the genomes of closely related Halobacterium, Haloquadratum, and Haloarcula (>90% average nucleotide identity) shared regions of high identity between species or strains, the four Deep Lake isolates were the only distantly related haloarchaea to share long high-identity regions. Moreover, the Deep Lake high-identity regions did not match to any other hypersaline environment metagenome data. The most abundant species, tADL, appears to play a central role in the exchange of insertion sequences, but not the exchange of high-identity regions. The genomic characteristics of the four haloarchaea are consistent with a lake ecosystem that sustains a high level of intergenera gene exchange while selecting for ecotypes that maintain sympatric speciation. The peculiarities of this polar system restrict which species can grow and provide a tempo and mode for accentuating gene exchange.
Ginkgo and Welwitschia Mitogenomes Reveal Extreme Contrasts in Gymnosperm Mitochondrial Evolution.
Guo, Wenhu; Grewe, Felix; Fan, Weishu; Young, Gregory J; Knoop, Volker; Palmer, Jeffrey D; Mower, Jeffrey P
2016-06-01
Mitochondrial genomes (mitogenomes) of flowering plants are well known for their extreme diversity in size, structure, gene content, and rates of sequence evolution and recombination. In contrast, little is known about mitogenomic diversity and evolution within gymnosperms. Only a single complete genome sequence is available, from the cycad Cycas taitungensis, while limited information is available for the one draft sequence, from Norway spruce (Picea abies). To examine mitogenomic evolution in gymnosperms, we generated complete genome sequences for the ginkgo tree (Ginkgo biloba) and a gnetophyte (Welwitschia mirabilis). There is great disparity in size, sequence conservation, levels of shared DNA, and functional content among gymnosperm mitogenomes. The Cycas and Ginkgo mitogenomes are relatively small, have low substitution rates, and possess numerous genes, introns, and edit sites; we infer that these properties were present in the ancestral seed plant. By contrast, the Welwitschia mitogenome has an expanded size coupled with accelerated substitution rates and extensive loss of these functional features. The Picea genome has expanded further, to more than 4 Mb. With regard to structural evolution, the Cycas and Ginkgo mitogenomes share a remarkable amount of intergenic DNA, which may be related to the limited recombinational activity detected at repeats in Ginkgo. Conversely, the Welwitschia mitogenome shares almost no intergenic DNA with any other seed plant. By conducting the first measurements of rates of DNA turnover in seed plant mitogenomes, we discovered that turnover rates vary by orders of magnitude among species. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Harrison, Nigel A; Davis, Robert E; Oropeza, Carlos; Helmick, Ericka E; Narváez, María; Eden-Green, Simon; Dollet, Michel; Dickinson, Matthew
2014-06-01
In this study, the taxonomic position and group classification of the phytoplasma associated with a lethal yellowing-type disease (LYD) of coconut (Cocos nucifera L.) in Mozambique were addressed. Pairwise similarity values based on alignment of nearly full-length 16S rRNA gene sequences (1530 bp) revealed that the Mozambique coconut phytoplasma (LYDM) shared 100% identity with a comparable sequence derived from a phytoplasma strain (LDN) responsible for Awka wilt disease of coconut in Nigeria, and shared 99.0-99.6% identity with 16S rRNA gene sequences from strains associated with Cape St Paul wilt (CSPW) disease of coconut in Ghana and Côte d'Ivoire. Similarity scores further determined that the 16S rRNA gene of the LYDM phytoplasma shared <97.5% sequence identity with all previously described members of 'Candidatus Phytoplasma'. The presence of unique regions in the 16S rRNA gene sequence distinguished the LYDM phytoplasma from all currently described members of 'Candidatus Phytoplasma', justifying its recognition as the reference strain of a novel taxon, 'Candidatus Phytoplasma palmicola'. Virtual RFLP profiles of the F2n/R2 portion (1251 bp) of the 16S rRNA gene and pattern similarity coefficients delineated coconut LYDM phytoplasma strains from Mozambique as novel members of established group 16SrXXII, subgroup A (16SrXXII-A). Similarity coefficients of 0.97 were obtained for comparisons between subgroup 16SrXXII-A strains and CSPW phytoplasmas from Ghana and Côte d'Ivoire. On this basis, the CSPW phytoplasma strains were designated members of a novel subgroup, 16SrXXII-B.
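The pairwise identity values quoted in the abstract above (100%, 99.0-99.6%, <97.5%) are computed from aligned 16S rRNA gene sequences. A minimal sketch of such a calculation follows. It assumes the two sequences are already aligned to equal length and that columns containing a gap in either sequence are skipped; the abstract does not say how gaps were treated, so that convention, the function name, and the toy sequences in the usage note are illustrative assumptions rather than the authors' procedure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gap columns ('-') are excluded from the comparison. Other
    conventions (e.g. counting gaps as mismatches) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

For example, `percent_identity("ACGT-ACGTA", "ACGTTACGTC")` compares nine ungapped columns, eight of which match. A value computed this way could then be checked against a threshold such as the 97.5% figure cited in the abstract.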
Gupta, Radhey S
2012-11-01
The origin of photosynthesis and how this capability has spread to other bacterial phyla remain important unresolved questions. I describe here a number of conserved signature indels (CSIs) in key proteins involved in bacteriochlorophyll (Bchl) biosynthesis that provide important insights in these regards. The proteins BchL and BchX, which are essential for Bchl biosynthesis, are derived by gene duplication in a common ancestor of all phototrophs. More ancient gene duplication gave rise to the BchX-BchL proteins and the NifH protein of the nitrogenase complex. The sequence alignment of NifH-BchX-BchL proteins contains two CSIs that are uniquely shared by all NifH and BchX homologs, but not by any BchL homologs. These CSIs and phylogenetic analysis of NifH-BchX-BchL protein sequences strongly suggest that the BchX homologs are ancestral to BchL and that the Bchl-based anoxygenic photosynthesis originated prior to the chlorophyll (Chl)-based photosynthesis in cyanobacteria. Another CSI in the BchX-BchL sequence alignment that is uniquely shared by all BchX homologs and the BchL sequences from Heliobacteriaceae, but absent in all other BchL homologs, suggests that the BchL homologs from Heliobacteriaceae are primitive in comparison to all other photosynthetic lineages. Several other identified CSIs in the BchN homologs are commonly shared by all proteobacterial homologs and a clade consisting of the marine unicellular Cyanobacteria (Clade C). These CSIs in conjunction with the results of phylogenetic analyses and pair-wise sequence similarity on the BchL, BchN, and BchB proteins, where the homologs from Clade C Cyanobacteria and Proteobacteria exhibited close relationship, provide strong evidence that these two groups have incurred lateral gene transfers.
Additionally, phylogenetic analyses and several CSIs in the BchL-N-B proteins that are uniquely shared by all Chlorobi and Chloroflexi homologs provide evidence that the genes for these proteins have also been laterally transferred between these groups. Other results and observations reported here indicate that the genes for the BchL-N-B proteins in Proteobacteria are derived from the Clade C Cyanobacteria, whereas those in Chlorobi were acquired from Chloroflexus or related bacteria by means of LGTs. Some implications of these observations regarding the origin and spread of photosynthesis are discussed.
Dufresne, Andrew T; Gromeier, Matthias
2004-09-14
Coxsackievirus A21 (CAV21) is classified within the species Human enterovirus C (HEV-C) of the Enterovirus genus of picornaviruses. HEV-C share striking homology with the polioviruses (PV), their closest kin among the enteroviruses. Despite a high level of sequence identity, CAV21 and PV cause distinct clinical disease typically attributed to their differential use of host receptors. PV cause poliomyelitis, whereas CAV21 shares a receptor and a propensity to cause upper respiratory tract infections with the major group rhinoviruses. As a model for CAV21 infection, we have developed transgenic mice that express human intercellular adhesion molecule 1, the cell-surface receptor for CAV21. Surprisingly, CAV21 administered to these mice via the intramuscular route causes a paralytic condition consistent with poliomyelitis. The virus appears to invade the CNS by retrograde axonal transport, as has been demonstrated to occur in analogous PV infections. We detected human intercellular adhesion molecule 1 expression on both transgenic mouse and human spinal cord anterior horn motor neurons, indicating that members of HEV-C may share PV's potential to elicit poliomyelitis in humans.
USDA-ARS's Scientific Manuscript database
The cattle tick, Rhipicephalus (Boophilus) microplus, is a pest which causes multiple health complications in cattle. The G-protein coupled receptor (GPCR) super-family presents an interesting target for developing novel tick control methods. However, GPCRs share limited sequence similarity among or...
USDA-ARS's Scientific Manuscript database
Complete genomic sequences of nine isolates of sweet potato symptomless virus 1 (SPSMV-1), a virus of genus Mastrevirus in the family Geminiviridae, were determined to be 2,559-2,602 nucleotides from sweet potato accessions from different countries. These isolates shared genomic sequence identities o...
Using an online genome resource to identify myostatin variation in U.S. sheep
USDA-ARS's Scientific Manuscript database
We created a public, searchable DNA sequence resource for sheep that contained approximately 14x whole genome sequence of 96 rams. The animals represent 10 popular U.S. breeds and share minimal pedigree relationships, making the resource suitable for viewing gene variants in the user-friendly Integ...
Tettelin, Hervé; Masignani, Vega; Cieslewicz, Michael J.; Donati, Claudio; Medini, Duccio; Ward, Naomi L.; Angiuoli, Samuel V.; Crabtree, Jonathan; Jones, Amanda L.; Durkin, A. Scott; DeBoy, Robert T.; Davidsen, Tanja M.; Mora, Marirosa; Scarselli, Maria; Margarit y Ros, Immaculada; Peterson, Jeremy D.; Hauser, Christopher R.; Sundaram, Jaideep P.; Nelson, William C.; Madupu, Ramana; Brinkac, Lauren M.; Dodson, Robert J.; Rosovitz, Mary J.; Sullivan, Steven A.; Daugherty, Sean C.; Haft, Daniel H.; Selengut, Jeremy; Gwinn, Michelle L.; Zhou, Liwei; Zafar, Nikhat; Khouri, Hoda; Radune, Diana; Dimitrov, George; Watkins, Kisha; O'Connor, Kevin J. B.; Smith, Shannon; Utterback, Teresa R.; White, Owen; Rubens, Craig E.; Grandi, Guido; Madoff, Lawrence C.; Kasper, Dennis L.; Telford, John L.; Wessels, Michael R.; Rappuoli, Rino; Fraser, Claire M.
2005-01-01
The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and also limits genome-wide screens for vaccine candidates or for antimicrobial targets. We have generated the genomic sequence of six strains representing the five major disease-causing serotypes of Streptococcus agalactiae, the main cause of neonatal infection in humans. Analysis of these genomes and those available in databases showed that the S. agalactiae species can be described by a pan-genome consisting of a core genome shared by all isolates, accounting for ≈80% of any single genome, plus a dispensable genome consisting of partially shared and strain-specific genes. Mathematical extrapolation of the data suggests that the gene reservoir available for inclusion in the S. agalactiae pan-genome is vast and that unique genes will continue to be identified even after sequencing hundreds of genomes. PMID:16172379
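The pan-genome described in the abstract above splits a species' genes into a core genome shared by all isolates and a dispensable genome of partially shared and strain-specific genes. The sketch below illustrates that partition with plain set operations. It assumes each genome has already been reduced to a set of gene-family identifiers (in practice this requires ortholog clustering, which is not shown), and the function name, strain labels, and gene labels are hypothetical.

```python
from typing import Dict, Set


def partition_pan_genome(genomes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split gene families into core, partially shared, and strain-specific sets.

    `genomes` maps a strain name to the set of gene-family identifiers found in
    that strain (a simplifying assumption; real data need ortholog clustering).
    """
    if not genomes:
        raise ValueError("at least one genome is required")
    gene_sets = [set(genes) for genes in genomes.values()]
    pan = set().union(*gene_sets)           # every gene seen in any strain
    core = set.intersection(*gene_sets)     # genes present in all strains
    dispensable = pan - core
    # Dispensable genes found in exactly one strain are strain-specific.
    strain_specific = {
        gene for gene in dispensable
        if sum(gene in genes for genes in gene_sets) == 1
    }
    partially_shared = dispensable - strain_specific
    return {
        "pan": pan,
        "core": core,
        "partially_shared": partially_shared,
        "strain_specific": strain_specific,
    }


# Toy input with made-up labels, for illustration only.
toy_genomes = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5"},
    "strain_C": {"g1", "g2", "g4", "g6"},
}
toy_result = partition_pan_genome(toy_genomes)
```

With the toy input, `g1` and `g2` form the core, `g3` and `g4` are partially shared, and `g5` and `g6` are strain-specific. Dividing the core size by the size of a single genome gives the kind of fraction the abstract reports as ≈80%.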
Salinas, Alejandro; Vega, Marcela; Lienqueo, María Elena; Garcia, Alejandro; Carmona, Rene; Salazar, Oriana
2011-12-10
Total cDNA isolated from cellulolytic fungi cultured in cellulose was examined for the presence of sequences encoding endoglucanases. Novel sequences encoding glycoside hydrolases (GHs) were identified in Fusarium oxysporum, Ganoderma applanatum and Trametes versicolor. The cDNAs encoding partial sequences of GH family 61 cellulases from F. oxysporum and G. applanatum share 58% and 68% identity with endoglucanases from Glomerella graminicola and Laccaria bicolor, respectively. A new GH family 5 endoglucanase from T. versicolor was also identified. The cDNA encoding the mature protein was completely sequenced. This enzyme shares 96% identity with Trametes hirsuta endoglucanase and 22% with Trichoderma reesei endoglucanase II (EGII). The enzyme, named TvEG, has an N-terminal family 1 carbohydrate binding module (CBM1). The full length cDNA was cloned into the pPICZαB vector and expressed as an active, extracellular enzyme in the methylotrophic yeast Pichia pastoris. Preliminary studies suggest that T. versicolor could be useful for lignocellulose degradation. Copyright © 2011 Elsevier Inc. All rights reserved.
Deng, Peng; Tan, Xiaoqing; Wu, Ying; Bai, Qunhua; Jia, Yan; Xiao, Hong
2015-03-01
The ChrT gene encodes a chromate reductase enzyme which catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), according to the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high efficiency TAIL-PCR, while the full-length gene of ChrT was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted to GenBank under the accession number KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high sequence similarity to the FMN reductase genes of Klebsiella pneumoniae and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have an 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the ability of the gene for chromate reduction. The results of the present study provide a basis for further studies on ChrT gene expression and protein function.
Deng, Peng; Tan, Xiaoqing; Wu, Ying; Bai, Qunhua; Jia, Yan; Xiao, Hong
2015-01-01
The ChrT gene encodes a chromate reductase enzyme which catalyzes the reduction of Cr(VI). The chromate reductase is also known as flavin mononucleotide (FMN) reductase (FMN_red). The aim of the present study was to clone the full-length ChrT DNA from Serratia sp. CQMUS2 and analyze the deduced amino acid sequence and three-dimensional structure. The putative ChrT gene fragment of Serratia sp. CQMUS2 was isolated by polymerase chain reaction (PCR), according to the known FMN_red gene sequence from Serratia sp. AS13. The flanking sequences of the ChrT gene were obtained by high efficiency TAIL-PCR, while the full-length gene of ChrT was cloned in Escherichia coli for subsequent sequencing. The nucleotide sequence of ChrT was submitted to GenBank under the accession number KF211434. Sequence analysis of the gene and amino acids was conducted using the Basic Local Alignment Search Tool, and open reading frame (ORF) analysis was performed using ORF Finder software. The ChrT gene was found to be an ORF of 567 bp that encodes a 188-amino acid enzyme with a calculated molecular weight of 20.4 kDa. In addition, the ChrT protein was hypothesized to be an NADPH-dependent FMN_red and a member of the flavodoxin-2 superfamily. The amino acid sequence of ChrT showed high sequence similarity to the FMN reductase genes of Klebsiella pneumoniae and Raoultella ornithinolytica, which belong to the flavodoxin-2 superfamily. Furthermore, ChrT was shown to have an 85.6% similarity to the three-dimensional structure of Escherichia coli ChrR, sharing four common enzyme active sites for chromate reduction. Therefore, ChrT gene cloning and protein structure determination demonstrated the ability of the gene for chromate reduction. The results of the present study provide a basis for further studies on ChrT gene expression and protein function. PMID:25667630
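The ORF figures in the abstract above can be cross-checked with simple arithmetic: 567 bp is 189 codons, and if the last codon is the stop codon the encoded protein has 188 residues, matching the reported length. The sketch below makes that check explicit. The assumption that the 567 bp count includes the stop codon is inferred from the numbers, not stated in the abstract, and the function name is hypothetical.

```python
def orf_protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of `orf_bp` base pairs.

    Assumption: when `includes_stop` is True, the final codon is a stop codon
    and does not contribute a residue.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# Figures taken from the abstract: 567 bp ORF, 20.4 kDa calculated mass.
chrt_residues = orf_protein_length(567)            # 189 codons minus the stop codon
mean_residue_mass_da = 20.4 * 1000 / chrt_residues  # average mass per residue in Da
```

The resulting mean residue mass of about 108.5 Da is close to the commonly used rule of thumb of roughly 110 Da per amino acid, so the reported length and molecular weight are mutually consistent.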
Wang, Yongjie; Kleespies, Regina G.; Huger, Alois M.; Jehle, Johannes A.
2007-01-01
The Gryllus bimaculatus nudivirus (GbNV) infects nymphs and adults of the cricket Gryllus bimaculatus (Orthoptera: Gryllidae). GbNV and other nudiviruses such as Heliothis zea nudivirus 1 (HzNV-1) and Oryctes rhinoceros nudivirus (OrNV) were previously called “nonoccluded baculoviruses” as they share some similar structural, genomic, and replication aspects with members of the family Baculoviridae. Their relationships to each other and to baculoviruses are elucidated by the sequence of the complete genome of GbNV, which is 96,944 bp, has an AT content of 72%, and potentially contains 98 predicted protein-coding open reading frames (ORFs). Forty-one ORFs of GbNV share sequence similarities with ORFs found in OrNV, HzNV-1, baculoviruses, and bacteria. Most notably, 15 GbNV ORFs are homologous to the baculovirus core genes, which are associated with transcription (lef-8, lef-9, lef-4, vlf-1, and lef-5), replication (dnapol), structural proteins (p74, pif-1, pif-2, pif-3, vp91, and odv-e56), and proteins of unknown function (38K, ac81, and 19kda). Homologues to these baculovirus core genes have been predicted in HzNV-1 as well. Six GbNV ORFs are homologous to nonconserved baculovirus genes dnaligase, helicase 2, rr1, rr2, iap-3, and desmoplakin. However, the remaining 57 ORFs revealed no homology or poor similarities to the current gene databases. No homologous repeat (hr) sequences but fourteen short direct repeat (dr) regions were detected in the GbNV genome. Gene content and sequence similarity suggest that the nudiviruses GbNV, HzNV-1, and OrNV form a monophyletic group of nonoccluded double-stranded DNA viruses, which separated from the baculovirus lineage before this radiated into dipteran-, hymenopteran-, and lepidopteran-specific clades of occluded nucleopolyhedroviruses and granuloviruses. 
The accumulated information on the GbNV genome suggests that nudiviruses form a highly diverse and phylogenetically ancient sister group of the baculoviruses, which have evolved in a variety of highly divergent host orders. PMID:17360757
Jans, Christoph; de Wouters, Tomas; Bonfoh, Bassirou; Lacroix, Christophe; Kaindi, Dasel Wambua Mulwa; Anderegg, Janine; Böck, Désirée; Vitali, Sabrina; Schmid, Thomas; Isenring, Julia; Kurt, Fabienne; Kogi-Makau, Wambui; Meile, Leo
2016-06-21
The Streptococcus bovis/Streptococcus equinus complex (SBSEC) comprises seven (sub)species classified as human and animal commensals, emerging opportunistic pathogens and food fermentative organisms. Changing taxonomy, shared habitats, natural competence and evidence for horizontal gene transfer pose difficulties for determining their phylogeny, epidemiology and virulence mechanisms. Thus, novel phylogenetic and functional classifications are required. An SBSEC overarching multi locus sequence type (MLST) scheme targeting 10 housekeeping genes was developed, validated and combined with host-related properties of adhesion to extracellular matrix proteins (ECM), activation of the immune responses via NF-KB and survival in simulated gastric juice (SGJ). Commensal and pathogenic SBSEC strains (n = 74) of human, animal and food origin from Europe, Asia, America and Africa were used in the MLST scheme yielding 66 sequence types and 10 clonal complexes differentiated into distinct habitat-associated and mixed lineages. Adhesion to ECMs collagen I and mucin type II was a common characteristic (23 % of strains) followed by adhesion to fibronectin and fibrinogen (19.7 %). High adhesion abilities were found for East African dairy and human blood isolate branches whereas commensal fecal SBSEC displayed low adhesion. NF-KB activation was observed for a limited number of dairy and blood isolates suggesting the potential of some pathogenic strains for reduced immune activation. Strains from dairy MLST clades displayed the highest relative survival to SGJ independently of dairy adaptation markers lacS/lacZ. Combining phylogenetic and functional analyses via SBSEC MLST enabled the clear delineation of strain clades to unravel the complexity of this bacterial group. High adhesion values shared between certain dairy and blood strains as well as the behavior of NF-KB activation are concerning for specific lineages. 
They highlight the health risk among shared lineages and establish the basis to elucidate (zoonotic-) transmission, host specificity, virulence mechanisms and enhanced risk assessment as pathobionts in an overarching One Health approach.
Genetic characterization of a novel astrovirus in Pekin ducks.
Liao, Qinfeng; Liu, Ning; Wang, Xiaoyan; Wang, Fumin; Zhang, Dabing
2015-06-01
Three divergent groups of duck astroviruses (DAstVs), namely DAstV-1, DAstV-2 (formerly duck hepatitis virus type 3) and DAstV-3 (isolate CPH), and other avastroviruses are known to infect domestic ducks. To provide more data regarding the molecular epidemiology of astroviruses in domestic ducks, we examined the prevalence of astroviruses in 136 domestic duck samples collected from four different provinces of China. Nineteen goose samples were also included. Using an astrovirus-specific reverse transcription-PCR assay, two groups of astroviruses were detected from our samples. A group of astroviruses detected from Pekin ducks, Shaoxing ducks and Landes geese were highly similar to the newly discovered DAstV-3. More interestingly, a novel group of avastroviruses, which we named DAstV-4, was detected in Pekin ducks. Following full-length sequencing and sequence analysis, the variation between DAstV-4 and other avastroviruses in terms of the lengths of the genome and its internal components was highlighted. Sequence identity and phylogenetic analyses based on the amino acid sequences of the three open reading frames (ORFs) clearly demonstrated that DAstV-4 was highly divergent from all other avastroviruses. Further analyses showed that DAstV-4 shared low levels of genome identities (50-58%) and high levels of mean amino acid genetic distances in the ORF2 sequences (0.520-0.801) with other avastroviruses, suggesting DAstV-4 may represent an additional avastrovirus species although the taxonomic relationship of DAstV-4 to DAstV-3 remains to be resolved. The present work contributes to the understanding of the epidemiology, ecology and taxonomy of astroviruses in ducks. Copyright © 2015 Elsevier B.V. All rights reserved.
Mohorianu, Irina; Stocks, Matthew Benedict; Wood, John; Dalmay, Tamas; Moulton, Vincent
2013-07-01
Small RNAs (sRNAs) are 20-25 nt non-coding RNAs that act as guides for the highly sequence-specific regulatory mechanism known as RNA silencing. Due to the recent increase in sequencing depth, a highly complex and diverse population of sRNAs in both plants and animals has been revealed. However, the exponential increase in sequencing data has also made the identification of individual sRNA transcripts corresponding to biological units (sRNA loci) more challenging when based exclusively on the genomic location of the constituent sRNAs, hindering existing approaches to identify sRNA loci. To infer the location of significant biological units, we propose an approach for sRNA loci detection called CoLIde (Co-expression based sRNA Loci Identification) that combines genomic location with the analysis of other information such as variation in expression levels (expression pattern) and size class distribution. For CoLIde, we define a locus as a union of regions sharing the same pattern and located in close proximity on the genome. Biological relevance, detected through the analysis of size class distribution, is also calculated for each locus. CoLIde can be applied on ordered (e.g., time-dependent) or un-ordered (e.g., organ, mutant) series of samples both with or without biological/technical replicates. The method reliably identifies known types of loci and shows improved performance on sequencing data from both plants (e.g., A. thaliana, S. lycopersicum) and animals (e.g., D. melanogaster) when compared with existing locus detection techniques. CoLIde is available for use within the UEA Small RNA Workbench which can be downloaded from: http://srna-workbench.cmp.uea.ac.uk.
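The locus definition in the abstract above (a union of regions that share the same expression pattern and lie in close proximity on the genome) can be illustrated with a short Python sketch. This is not the CoLIde implementation: the pattern strings, the `max_gap` threshold, and the function name `merge_loci` are all assumptions made for illustration, and the significance test on size class distribution is omitted.

```python
# Illustrative sketch only; not the CoLIde algorithm from the UEA workbench.
# Assumptions: each region already carries a pattern label (for example a
# string of U/D/S symbols for up/down/straight between consecutive samples),
# and "close proximity" is modelled by a hypothetical max_gap in nucleotides.
from typing import List, Tuple

Region = Tuple[int, int, str]  # (start, end, expression pattern)


def merge_loci(regions: List[Region], max_gap: int = 50) -> List[Region]:
    """Merge neighbouring regions that share a pattern into candidate loci.

    A region is merged into the preceding locus only if both have the same
    pattern and the gap between them is at most `max_gap`.
    """
    loci: List[Region] = []
    for start, end, pattern in sorted(regions):
        if loci and loci[-1][2] == pattern and start - loci[-1][1] <= max_gap:
            prev_start, prev_end, _ = loci[-1]
            loci[-1] = (prev_start, max(prev_end, end), pattern)
        else:
            loci.append((start, end, pattern))
    return loci


example = [
    (100, 150, "UD"),
    (160, 200, "UD"),    # same pattern, 10 nt away: merged with the first
    (205, 240, "DU"),    # different pattern: new locus
    (1000, 1050, "UD"),  # same pattern as the first, but not adjacent to it
]
print(merge_loci(example))
```

The sketch only compares each region with the immediately preceding locus, which keeps it short; a fuller treatment would also need to handle interleaved regions with different patterns.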
Chen, Xiaochi; Ansai, Toshihiro; Awano, Shuji; Iida, Toshiya; Barik, Sailen; Takehara, Tadamichi
1999-01-01
A novel acid phosphatase containing phosphotyrosyl phosphatase (PTPase) activity, designated PiACP, from Prevotella intermedia ATCC 25611, an anaerobe implicated in progressive periodontal disease, has been purified and characterized. PiACP, a monomer with an apparent molecular mass of 30 kDa, did not require divalent metal cations for activity and was sensitive to orthovanadate but highly resistant to okadaic acid. The enzyme exhibited substantial activity against tyrosine phosphate-containing peptides derived from the epidermal growth factor receptor. On the basis of N-terminal and internal amino acid sequences of purified PiACP, the gene coding for PiACP was isolated and sequenced. The PiACP gene consisted of 792 bp and coded for a basic protein with an Mr of 29,164. The deduced amino acid sequence exhibited striking similarity (25 to 64%) to those of members of class A bacterial acid phosphatases, including PhoC of Morganella morganii, and involved a conserved phosphatase sequence motif that is shared among several lipid phosphatases and the mammalian glucose-6-phosphatases. The highly conservative motif HCXAGXXR in the active domain of PTPase was not found in PiACP. Mutagenesis of recombinant PiACP showed that His-170 and His-209 were essential for activity. Thus, the class A bacterial acid phosphatases including PiACP may function as atypical PTPases, the biological functions of which remain to be determined. PMID:10559178
Pelsy, F.; Merdinoglu, D.
2002-09-01
A chromosome-walking strategy was used to sequence and characterize retrotransposons in the grapevine genome. The reconstitution of a family of retroelements, named Tvv1, was achieved by six successive steps. These elements share a single, highly conserved open reading frame 4,153 nucleotides long, putatively encoding the gag, pro, int, rt and rh proteins. Comparison of the Tvv1 open reading frame coding potential with those of Drosophila copia and tobacco Tnt1 revealed that Tvv1 is closely related to Ty1/copia-like retrotransposons. A highly variable untranslated leader region, upstream of the open reading frame, allowed us to differentiate Tvv1 variants, which represent a family of at least 28 copies of varying sizes. This internal region is flanked by two long terminal repeats in direct orientation, sized between 149 and 157 bp. Among elements theoretically sized from 4,970 to 5,550 bp, we describe the full-length sequence of a reference element, Tvv1-1, which is 5,343 nucleotides long. The full-length sequence of Tvv1-1 compared to pea PDR1 shows a 53.3% identity. In addition, both elements contain long terminal repeats of nearly the same size in which the U5 region could be entirely absent. Therefore, we assume that Tvv1 and PDR1 could constitute a particular class of short-LTR retroelements.
Hu, H M; Chuang, C K; Lee, M J; Tseng, T C; Tang, T K
2000-11-01
We previously reported two novel testis-specific serine/threonine kinases, Aie1 (mouse) and AIE2 (human), that share high amino acid identities with the kinase domains of fly aurora and yeast Ipl1. Here, we report the entire intron-exon organization of the Aie1 gene and analyze the expression patterns of Aie1 mRNA during testis development. The mouse Aie1 gene spans approximately 14 kb and contains seven exons. The sequences of the exon-intron boundaries of the Aie1 gene conform to the consensus sequences (GT/AG) of the splicing donor and acceptor sites of most eukaryotic genes. Comparative genomic sequencing revealed that the gene structure is highly conserved between mouse Aie1 and human AIE2. However, much less homology was found in the sequence outside the kinase-coding domains. The Aie1 locus was mapped to mouse chromosome 7A2-A3 by fluorescent in situ hybridization. Northern blot analysis indicates that Aie1 mRNA likely is expressed at a low level on day 14 and reaches its plateau on day 21 in the developing postnatal testis. RNA in situ hybridization indicated that the expression of the Aie1 transcript was restricted to meiotically active germ cells, with the highest levels detected in spermatocytes at the late pachytene stage. These findings suggest that Aie1 plays a role in spermatogenesis.
Astakhova, L N; Zatsepina, O G; Przhiboro, A A; Evgen'ev, M B; Garbuz, D G
2013-06-01
The heat shock proteins belonging to the Hsp90 family (Hsp83 in Diptera) play a crucial role in the protection of cells due to their chaperoning functions. We sequenced hsp90 genes from three species of the family Stratiomyidae (Diptera) living in thermally different habitats and characterized by extraordinarily high thermotolerance. The sequence variation and structure of the hsp90 family genes were compared with previously described features of hsp70 copies isolated from the same species. Two functional hsp83 genes were found in the species studied, that are arranged in tandem orientation at least in one of them. This organization was not previously described. Stratiomyidae hsp83 genes share a high level of identity with hsp83 of Drosophila, and the deduced protein possesses five conserved amino acid sequence motifs characteristic of the Hsp90 family as well as the C-terminus MEEVD sequence characteristic of the cytosolic isoform. A comparison of the hsp83 promoters of two Stratiomyidae species from thermally contrasting habitats demonstrated that while both species contain canonical heat shock elements in the same position, only one of the species contains functional GAF-binding elements. Our data indicate that in the same species, hsp83 family genes show a higher evolution rate than the hsp70 family. © 2013 Royal Entomological Society.
Emaravirus: A Novel Genus of Multipartite, Negative Strand RNA Plant Viruses
Mielke-Ehret, Nicole; Mühlbach, Hans-Peter
2012-01-01
Ringspot symptoms in European mountain ash (Sorbus aucuparia L.), fig mosaic, rose rosette, raspberry leaf blotch, pigeonpea sterility mosaic (Cajanus cajan) and High Plains disease of maize and wheat were found to be associated with viruses that share several characteristics. They all have single-stranded multipartite RNA genomes of negative orientation. In some cases, double membrane-bound virus-like particles of 80 to 200 nm in diameter were found in infected tissue. Furthermore, at least five of these viruses were shown to be vectored by eriophyid mites. Sequences of European mountain ash ringspot-associated virus (EMARaV), Fig mosaic virus (FMV), rose rosette virus (RRV), raspberry leaf blotch virus (RLBV), pigeonpea sterility mosaic virus and High Plains virus strongly support their potential phylogenetic relationship. Therefore, after characterization of EMARaV, the novel genus Emaravirus was established, and FMV was the second virus species assigned to this genus. The recently sequenced RRV and RLBV are supposed to be additional members of this new group of plant RNA viruses. PMID:23170170
Ma, Qiao; Qu, Yuanyuan; Shen, Wenli; Zhang, Zhaojing; Wang, Jingwei; Liu, Ziyan; Li, Duanxing; Li, Huijie; Zhou, Jiti
2015-03-01
In this study, Illumina high-throughput sequencing was used to reveal the community structures of nine coking wastewater treatment plants (CWWTPs) in China for the first time. The sludge systems exhibited a similar community composition at each taxonomic level. Compared to previous studies, some of the core genera in municipal wastewater treatment plants such as Zoogloea, Prosthecobacter and Gp6 were detected as minor species. Thiobacillus (20.83%), Comamonas (6.58%), Thauera (4.02%), Azoarcus (7.78%) and Rhodoplanes (1.42%) were the dominant genera shared by at least six CWWTPs. The percentages of autotrophic ammonia-oxidizing bacteria and nitrite-oxidizing bacteria were unexpectedly low, which were verified by both real-time PCR and fluorescence in situ hybridization analyses. Hierarchical clustering and canonical correspondence analysis indicated that operation mode, flow rate and temperature might be the key factors in community formation. This study provides new insights into our understanding of microbial community compositions and structures of CWWTPs. Copyright © 2014 Elsevier Ltd. All rights reserved.
The rpoE operon regulates heat stress response in Burkholderia pseudomallei.
Vanaporn, Muthita; Vattanaviboon, Paiboon; Thongboonkerd, Visith; Korbsrisate, Sunee
2008-07-01
Burkholderia pseudomallei is a gram-negative bacterium and the causative agent of melioidosis, one of the important lethal diseases in tropical regions. In this article, we demonstrate the crucial role of the B. pseudomallei rpoE locus in the response to heat stress. The rpoE operon knockout mutant exhibited growth retardation and reduced survival when exposed to a high temperature. Expression analysis using rpoH promoter-lacZ fusion revealed that heat stress induction of rpoH, which encodes heat shock sigma factor (sigma(H)), was abolished in the B. pseudomallei rpoE mutant. Analysis of the rpoH promoter region revealed sequences sharing high homology to the consensus sequence of sigma(E)-dependent promoters. Moreover, the putative heat-induced sigma(H)-regulated heat shock proteins (i.e. GroEL and HtpG) were also absent in the rpoE operon mutant. Altogether, our data suggest that the rpoE operon regulates B. pseudomallei heat stress response through the function of rpoH.
Vasudevan, Kumar; Vera Cruz, Casiana M.; Gruissem, Wilhelm; Bhullar, Navreet K.
2016-01-01
Rice blast is caused by Magnaporthe oryzae, which is the most destructive fungal pathogen affecting rice growing regions worldwide. The rice blast resistance gene Pib confers broad-spectrum resistance against Southeast Asian M. oryzae races. We investigated the allelic diversity of Pib in rice germplasm originating from 12 major rice growing countries. Twenty-five new Pib alleles were identified that have unique single nucleotide polymorphisms (SNPs), insertions and/or deletions, in addition to the polymorphic nucleotides that are shared between the different alleles. These partially or completely shared polymorphic nucleotides indicate frequent sequence exchange events between the Pib alleles. In some of the new Pib alleles, nucleotide diversity is high in the LRR domain, whereas, in others it is distributed among the NB-ARC and LRR domains. Most of the polymorphic amino acids in LRR and NB-ARC2 domains are predicted as solvent-exposed. Several of the alleles and the unique SNPs are country specific, suggesting a diversifying selection of alleles in various geographical locations in response to the locally prevalent M. oryzae population. Together, the new Pib alleles are an important genetic resource for rice blast resistance breeding programs and provide new information on rice-M. oryzae interactions at the molecular level. PMID:27446145
Structures of Bacterial Biosynthetic Arginine Decarboxylases
DOE Office of Scientific and Technical Information (OSTI.GOV)
F Forouhar; S Lew; J Seetharaman
2011-12-31
Biosynthetic arginine decarboxylase (ADC; also known as SpeA) plays an important role in the biosynthesis of polyamines from arginine in bacteria and plants. SpeA is a pyridoxal-5'-phosphate (PLP)-dependent enzyme and shares weak sequence homology with several other PLP-dependent decarboxylases. Here, the crystal structure of PLP-bound SpeA from Campylobacter jejuni is reported at 3.0 Å resolution and that of Escherichia coli SpeA in complex with a sulfate ion is reported at 3.1 Å resolution. The structure of the SpeA monomer contains two large domains, an N-terminal TIM-barrel domain followed by a β-sandwich domain, as well as two smaller helical domains. The TIM-barrel and β-sandwich domains share structural homology with several other PLP-dependent decarboxylases, even though the sequence conservation among these enzymes is less than 25%. A similar tetramer is observed for both C. jejuni and E. coli SpeA, composed of two dimers of tightly associated monomers. The active site of SpeA is located at the interface of this dimer and is formed by residues from the TIM-barrel domain of one monomer and a highly conserved loop in the β-sandwich domain of the other monomer. The PLP cofactor is recognized by hydrogen-bonding, π-stacking and van der Waals interactions.
Casals, Ferran; Cáceres, Mario; Manfrin, Maura Helena; González, Josefa; Ruiz, Alfredo
2005-04-01
Galileo is a foldback transposable element that has been implicated in the generation of two polymorphic chromosomal inversions in Drosophila buzzatii. Analysis of the inversion breakpoints led to the discovery of two additional elements, called Kepler and Newton, sharing sequence and structural similarities with Galileo. Here, we describe in detail the molecular structure of these three elements, on the basis of the 13 copies found at the inversion breakpoints plus 10 additional copies isolated during this work. Similarly to the foldback elements described in other organisms, these elements have long inverted terminal repeats, which in the case of Galileo possess a complex structure and display a high degree of internal variability between copies. A phylogenetic tree built with their shared sequences shows that the three elements are closely related and diverged approximately 10 million years ago. We have also analyzed the abundance and chromosomal distribution of these elements in D. buzzatii and other species of the repleta group by Southern analysis and in situ hybridization. Overall, the results suggest that these foldback elements are present in all the buzzatii complex species and may have played an important role in shaping their genomes. In addition, we show that recombination rate is the main factor determining the chromosomal distribution of these elements.
Solofoharivelo, Marie-Chrystine; Souza-Richards, Rose; Stephan, Dirk; Murray, Shane; Burger, Johan T.
2017-01-01
Phytoplasmas are cell wall-less plant pathogenic bacteria responsible for major crop losses throughout the world. In grapevine they cause grapevine yellows, a detrimental disease associated with a variety of symptoms. The high economic impact of this disease has sparked considerable interest among researchers to understand molecular mechanisms related to pathogenesis. Increasing evidence exists that a class of small non-coding endogenous RNAs, known as microRNAs (miRNAs), play an important role in post-transcriptional gene regulation during plant development and responses to biotic and abiotic stresses. Thus, we aimed to dissect complex high-throughput small RNA sequencing data for the genome-wide identification of known and novel differentially expressed miRNAs, using read libraries constructed from healthy and phytoplasma-infected Chardonnay leaf material. Furthermore, we utilised computational resources to predict putative miRNA targets to explore the involvement of possible pathogen response pathways. We identified multiple known miRNA sequence variants (isomiRs), likely generated through post-transcriptional modifications. Sequences of 13 known, canonical miRNAs were shown to be differentially expressed. A total of 175 novel miRNA precursor sequences, each derived from a unique genomic location, were predicted, of which 23 were differentially expressed. A homology search revealed that some of these novel miRNAs shared high sequence similarity with conserved miRNAs from other plant species, as well as known grapevine miRNAs. The relative expression of randomly selected known and novel miRNAs was determined with real-time RT-qPCR analysis, thereby validating the trend of expression seen in the normalised small RNA sequencing read count data. Among the putative miRNA targets, we identified genes involved in plant morphology, hormone signalling, nutrient homeostasis, as well as plant stress. 
Our results may assist in understanding the role that miRNA pathways play during plant pathogenesis, and may be crucial in understanding disease symptom development in aster yellows phytoplasma-infected grapevines. PMID:28813447
A Comparison of the First Two Sequenced Chloroplast Genomes in Asteraceae: Lettuce and Sunflower
DOE Office of Scientific and Technical Information (OSTI.GOV)
Timme, Ruth E.; Kuehl, Jennifer V.; Boore, Jeffrey L.
2006-01-20
Asteraceae is the second largest family of plants, with over 20,000 species. For the past few decades, numerous phylogenetic studies have contributed to our understanding of the evolutionary relationships within this family, including comparisons of the fast evolving chloroplast gene, ndhF, rbcL, as well as non-coding DNA from the trnL intron plus the trnL-trnF intergenic spacer, matK, and, with lesser resolution, psbA-trnH. This culminated in a study by Panero and Funk in 2002 that used over 13,000 bp per taxon for the largest taxonomic revision of Asteraceae in over a hundred years. Still, some uncertainties remain, and it would be very useful to have more information on the relative rates of sequence evolution among various genes and on genome structure as a potential set of phylogenetic characters to help guide future phylogenetic structures. By way of contributing to this, we report the first two complete chloroplast genome sequences from members of the Asteraceae, those of Helianthus annuus and Lactuca sativa. These plants belong to two distantly related subfamilies, Asteroideae and Cichorioideae, respectively. In addition to these, there is only one other published chloroplast genome sequence for any plant within the larger group called Eusterids II, that of Panax ginseng (Araliaceae, 156,318 bps, AY582139). Early chloroplast genome mapping studies demonstrated that H. annuus and L. sativa share a 22 kb inversion relative to members of the subfamily Barnadesioideae. By comparison to outgroups, this inversion was shown to be derived, indicating that the Asteroideae and Cichorioideae are more closely related than either is to the Barnadesioideae. A later sequencing study found that taxa that share this 22 kb inversion also contain within this region a second, smaller, 3.3 kb inversion. These sequences also enable an analysis of patterns of shared repeats in the genomes at fine level and of RNA editing by comparison to available EST sequences. 
In addition, since both of these genomes are crop plants, their complete genome sequence will facilitate development of chloroplast genetic engineering technology, as in recent studies from Daniell's lab. Knowing the exact sequence from spacer regions is crucial for introducing transgenes into the chloroplast genome.
Gruel, Jérémy; LeBorgne, Michel; LeMeur, Nolwenn; Théret, Nathalie
2011-09-12
Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks.
2011-01-01
Background Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Results Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Conclusions Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks. PMID:21910886
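The idea of counting shared motifs between the promoters of a gene pair, as described in the abstract above, can be illustrated with a small Python sketch. It is not the authors' SSM algorithm: motifs are modelled here simply as fixed-length windows (k-mers), similarity as a maximum Hamming distance, and the function name `shared_motif_count` and its parameters are hypothetical. The restriction to conserved promoter regions and the statistical test for an "exceptional" count are not reproduced.

```python
# Illustrative sketch only; not the published SSM algorithm.
# Assumptions: a motif is a window of length k, and two motifs are "similar"
# when they differ at no more than max_mismatches positions.


def kmers(seq, k):
    """Return the set of distinct windows of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def shared_motif_count(seq_a, seq_b, k=6, max_mismatches=0):
    """Count distinct k-mers of seq_a that have a similar k-mer in seq_b."""
    motifs_a = kmers(seq_a.upper(), k)
    motifs_b = kmers(seq_b.upper(), k)
    return sum(
        1
        for m in motifs_a
        if any(hamming(m, n) <= max_mismatches for n in motifs_b)
    )


# Toy promoter fragments (made up for illustration).
print(shared_motif_count("ACGTACGT", "TTACGTAA", k=4))                    # 3
print(shared_motif_count("ACGTACGT", "TTACGTAA", k=4, max_mismatches=1))  # 4
```

Comparing every window pair is quadratic in sequence length, which is acceptable for short promoter fragments but would need indexing for genome-scale use.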
Sharmin, Refat; Islam, Abul B M M K
2016-01-01
MERS-CoV is a newly emerged human coronavirus reported to be closely related to the HKU4 and HKU5 bat coronaviruses. Bat and MERS coronaviruses are structurally related. Therefore, it is of interest to estimate the degree of conserved antigenic sites among them. It is of importance to elucidate the shared antigenic sites and the extent of conservation between them to understand the evolutionary dynamics of MERS-CoV. Multiple sequence alignment of the spike (S), membrane (M), envelope (E) and nucleocapsid (N) proteins was employed to identify the sequence conservation among MERS and bat (HKU4, HKU5) coronaviruses. We used various in silico tools to predict the conserved antigenic sites. We found that MERS-CoV shared 30 % of its S protein antigenic sites with HKU4 and 70 % with HKU5 bat-CoV, whereas 100 % of its E, M and N protein antigenic sites were found to be conserved with those in HKU4 and HKU5. This sharing suggests that, in terms of pathogenicity, MERS-CoV is more closely related to HKU5 bat-CoV than to HKU4 bat-CoV. The conserved epitopes indicate their evolutionary relationship and ancestry of pathogenicity.
Cuartas, Paola E.; Barrera, Gloria P.; Belaich, Mariano N.; Barreto, Emiliano; Ghiringhelli, Pablo D.; Villamizar, Laura F.
2015-01-01
Spodoptera frugiperda (Lepidoptera: Noctuidae) is a major pest in maize crops in Colombia, and affects several regions in America. A granulovirus isolated from S. frugiperda (SfGV VG008) has potential as an enhancer of insecticidal activity of previously described nucleopolyhedrovirus from the same insect species (SfMNPV). The SfGV VG008 genome was sequenced and analyzed showing circular double stranded DNA of 140,913 bp encoding 146 putative ORFs that include 37 Baculoviridae core genes, 88 shared with betabaculoviruses, two shared only with betabaculoviruses from Noctuidae insects, two shared with alphabaculoviruses, three copies of own genes (paralogs) and the other 14 corresponding to unique genes without representation in the other baculovirus species. Particularly, the genome encodes important virulence factors such as 4 chitinases and 2 enhancins. The sequence analysis revealed the existence of eight homologous regions (hrs) and also suggests processes of gene acquisition by horizontal transfer including the SfGV VG008 ORFs 046/047 (paralogs), 059, 089 and 099. The bioinformatics evidence indicates that the genome donors of the mentioned genes could be alpha- and/or betabaculovirus species. The previously reported ability of SfGV VG008 to naturally co-infect the same host with other viruses shows a possible mechanism to capture genes and thus improve its fitness. PMID:25609309
Wang, Yanqun; Li, Yamin; Lu, Roujian; Zhao, Yanjie; Xie, Zhengde; Shen, Jun; Tan, Wenjie
2016-01-01
Human adenoviruses (HAdVs) are prevalent in hospitalized children with severe acute respiratory infection (SARI). Here, we report a unique recombinant HAdV strain (CBJ113) isolated from a HAdV-positive child with SARI. The whole-genome sequence was determined using Sanger sequencing and high-throughput sequencing. A phylogenetic analysis of the complete genome indicated that the CBJ113 strain shares a common origin with HAdV-C2, HAdV-C6, HAdV-C1, HAdV-C5, and HAdV-C57 and formed a novel subclade on the same branch as other HAdV-C subtypes. BootScan and single nucleotide polymorphism analyses showed that the CBJ113 genome has an intra-subtype recombinant structure and comprises gene regions mainly originating from two circulating viral strains: HAdV-1 and HAdV-2. The parental penton base, pVI, and DBP genes of the recombinant strain clustered with the HAdV-1 prototype strain, and the E1B, hexon, fiber, and 100 K genes of the recombinant clustered within the HAdV-2 subtype; meanwhile, the E4orf1 and DNA polymerase genes of the recombinant shared the greatest similarity with those of HAdV-5 and HAdV-6, respectively. All of these findings provide insight into the dynamics and complexity of the HAdV-C epidemic. More extensive studies should address the pathogenicity and clinical characteristics of the novel recombinant. PMID:26960434
Molecular characterization of an ependymin precursor from goldfish brain.
Königstorfer, A; Sterrer, S; Eckerskorn, C; Lottspeich, F; Schmidt, R; Hoffmann, W
1989-01-01
Ependymins are thought to be implicated in fundamental processes involved in plasticity of the goldfish CNS. Gas-phase sequencing of purified ependymins beta and gamma revealed that they share the same N-terminal sequence. Each sequence displays microheterogeneities at several positions. Based on the protein sequences obtained, we constructed synthetic oligonucleotides and used them as hybridization probes for screening cDNA libraries of goldfish brain. In this article we describe the full-length sequence of a mRNA encoding a precursor of ependymins. A cleavable signal sequence characteristic of secretory proteins is located at the N-terminal end, followed directly by the ependymin sequence. Also, two potential N-glycosylation sites were detected. A computer search revealed that ependymins form a novel family of unique proteins.
Bushakra, Jill M; Lewers, Kim S; Staton, Margaret E; Zhebentyayeva, Tetyana; Saski, Christopher A
2015-10-26
Due to a relatively high level of codominant inheritance and transferability within and among taxonomic groups, simple sequence repeat (SSR) markers are important elements in comparative mapping and delineation of genomic regions associated with traits of economic importance. Expressed sequence tags (ESTs) are a source of SSRs that can be used to develop markers to facilitate plant breeding and for more basic research across genera and higher plant orders. Leaf and meristem tissue from 'Heritage' red raspberry (Rubus idaeus) and 'Bristol' black raspberry (R. occidentalis) were utilized for RNA extraction. After conversion to cDNA and library construction, ESTs were sequenced, quality verified, assembled and scanned for SSRs. Primers flanking the SSRs were designed and a subset tested for amplification, polymorphism and transferability across species. ESTs containing SSRs were functionally annotated using the GenBank non-redundant (nr) database and further classified using the gene ontology database. To accelerate development of EST-SSRs in the genus Rubus (Rosaceae), 1149 and 2358 cDNA sequences were generated from red raspberry and black raspberry, respectively. The cDNA sequences were screened using rigorous filtering criteria which resulted in the identification of 121 and 257 SSR loci for red and black raspberry, respectively. Primers were designed from the surrounding sequences resulting in 131 and 288 primer pairs, respectively, as some sequences contained more than one SSR locus. Sequence analysis revealed that the SSR-containing genes span a diversity of functions and share more sequence identity with strawberry genes than with other Rosaceous species. This resource of Rubus-specific, gene-derived markers will facilitate the construction of linkage maps composed of transferable markers for studying and manipulating important traits in this economically important genus.
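The SSR scanning step described in the abstract above can be illustrated with a minimal sketch. It assumes perfect (uninterrupted) repeats and uses made-up minimum repeat counts per motif length; the abstract does not state the authors' actual filtering criteria or software, so the function name `find_ssrs` and its thresholds are hypothetical.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    if min_repeats is None:
        # Hypothetical thresholds (motif length -> minimum number of copies);
        # these are NOT the filtering criteria used in the study.
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures a candidate motif; \1{n-1,} demands n-1 or more extra copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA", or "AGAG" which is really the dinucleotide "AG").
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Toy sequence (not real EST data) with an (AG)7 and a (CTT)5 repeat.
toy = "GGC" + "AG" * 7 + "TTC" + "CTT" * 5 + "GA"
ssr_hits = find_ssrs(toy)
```

Primer design would then target the sequence flanking each reported start position; that step is outside the scope of this sketch.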
Li, Jingtao; Sun, Xinhua; Yu, Gang; Jia, Chengguo; Liu, Jinliang; Pan, Hongyu
2014-01-01
Little information is available on gene expression profiling of the halophyte A. canescens. To elucidate the molecular mechanism for stress tolerance in A. canescens, a full-length complementary DNA library was generated from A. canescens exposed to 400 mM NaCl, and provided 343 high-quality ESTs. In an evaluation of 343 valid EST sequences in the cDNA library, 197 unigenes were assembled, among which 190 unigenes (83.1% ESTs) were identified according to their significant similarities with proteins of known functions. All the 343 EST sequences have been deposited in the dbEST GenBank under accession numbers JZ535802 to JZ536144. According to Arabidopsis MIPS functional category and GO classifications, we identified 193 unigenes of the 311 annotated ESTs, representing 72 non-redundant unigenes sharing similarities with genes related to the defense response. The sets of ESTs obtained provide a rich genetic resource and 17 up-regulated genes related to salt stress resistance were identified by qRT-PCR. Six of these genes may contribute crucially to earlier and later stage salt stress resistance. Additionally, among the 343 unigene sequences, 22 simple sequence repeats (SSRs) were also identified, contributing to the study of A. canescens resources. PMID:24960361
Raymond, Frédéric; Boisvert, Sébastien; Roy, Gaétan; Ritt, Jean-François; Légaré, Danielle; Isnard, Amandine; Stanke, Mario; Olivier, Martin; Tremblay, Michel J.; Papadopoulou, Barbara; Ouellette, Marc; Corbeil, Jacques
2012-01-01
The Leishmania tarentolae Parrot-TarII strain genome sequence was resolved to an average 16-fold mean coverage by next-generation DNA sequencing technologies. This is the first non-pathogenic to humans kinetoplastid protozoan genome to be described thus providing an opportunity for comparison with the completed genomes of pathogenic Leishmania species. A high synteny was observed between all sequenced Leishmania species. A limited number of chromosomal regions diverged between L. tarentolae and L. infantum, while remaining syntenic to L. major. Globally, >90% of the L. tarentolae gene content was shared with the other Leishmania species. We identified 95 predicted coding sequences unique to L. tarentolae and 250 genes that were absent from L. tarentolae. Interestingly, many of the latter genes were expressed in the intracellular amastigote stage of pathogenic species. In addition, genes coding for products involved in antioxidant defence or participating in vesicular-mediated protein transport were underrepresented in L. tarentolae. In contrast to other Leishmania genomes, two gene families were expanded in L. tarentolae, namely the zinc metallo-peptidase surface glycoprotein GP63 and the promastigote surface antigen PSA31C. Overall, L. tarentolae's gene content appears better adapted to the promastigote insect stage rather than the amastigote mammalian stage. PMID:21998295
Evolution and Diversity in Human Herpes Simplex Virus Genomes
Gatherer, Derek; Ochoa, Alejandro; Greenbaum, Benjamin; Dolan, Aidan; Bowden, Rory J.; Enquist, Lynn W.; Legendre, Matthieu; Davison, Andrew J.
2014-01-01
Herpes simplex virus 1 (HSV-1) causes a chronic, lifelong infection in >60% of adults. Multiple recent vaccine trials have failed, with viral diversity likely contributing to these failures. To understand HSV-1 diversity better, we comprehensively compared 20 newly sequenced viral genomes from China, Japan, Kenya, and South Korea with six previously sequenced genomes from the United States, Europe, and Japan. In this diverse collection of passaged strains, we found that one-fifth of the newly sequenced members share a gene deletion and one-third exhibit homopolymeric frameshift mutations (HFMs). Individual strains exhibit genotypic and potential phenotypic variation via HFMs, deletions, short sequence repeats, and single-nucleotide polymorphisms, although the protein sequence identity between strains exceeds 90% on average. In the first genome-scale analysis of positive selection in HSV-1, we found signs of selection in specific proteins and residues, including the fusion protein glycoprotein H. We also confirmed previous results suggesting that recombination has occurred with high frequency throughout the HSV-1 genome. Despite this, the HSV-1 strains analyzed clustered by geographic origin during whole-genome distance analysis. These data shed light on likely routes of HSV-1 adaptation to changing environments and will aid in the selection of vaccine antigens that are invariant worldwide. PMID:24227835
Lalonde, Emilie; Albrecht, Steffen; Ha, Kevin C H; Jacob, Karine; Bolduc, Nathalie; Polychronakos, Constantin; Dechelotte, Pierre; Majewski, Jacek; Jabado, Nada
2010-08-01
Protein coding genes constitute approximately 1% of the human genome but harbor 85% of the mutations with large effects on disease-related traits. Therefore, efficient strategies for selectively sequencing complete coding regions (i.e., "whole exome") have the potential to contribute to our understanding of human diseases. We used a method for whole-exome sequencing coupling Agilent whole-exome capture to the Illumina DNA-sequencing platform, and investigated two unrelated fetuses from nonconsanguineous families with Fowler Syndrome (FS), a stereotyped phenotype lethal disease. We report novel germline mutations in feline leukemia virus subgroup C cellular-receptor-family member 2, FLVCR2, which has recently been shown to cause FS. Using this technology, we identified three types of genetic abnormalities: point-mutations, insertions-deletions, and intronic splice-site changes (first pathogenic report using this technology), in the fetuses who both were compound heterozygotes for the disease. Although revealing a high level of allelic heterogeneity and mutational spectrum in FS, this study further illustrates the successful application of whole-exome sequencing to uncover genetic defects in rare Mendelian disorders. Of importance, we show that we can identify genes underlying rare, monogenic and recessive diseases using a limited number of patients (n=2), in the absence of shared genetic heritage and in the presence of allelic heterogeneity.
Abebe-Akele, Feseha; Tisa, Louis S; Cooper, Vaughn S; Hatcher, Philip J; Abebe, Eyualem; Thomas, W Kelley
2015-07-18
Entomopathogenic associations between nematodes in the genera Steinernema and Heterorhabditis with their cognate bacteria from the bacterial genera Xenorhabdus and Photorhabdus, respectively, are extensively studied for their potential as biological control agents against invasive insect species. These two highly coevolved associations were results of convergent evolution. Given the natural abundance of bacteria, nematodes and insects, it is surprising that only these two associations with no intermediate forms are widely studied in the entomopathogenic context. Discovering analogous systems involving novel bacterial and nematode species would shed light on the evolutionary processes involved in the transition from free living organisms to obligatory partners in entomopathogenicity. We report the complete genome sequence of a new member of the enterobacterial genus Serratia that forms a putative entomopathogenic complex with Caenorhabditis briggsae. Analysis of the 5.04 MB chromosomal genome predicts 4599 protein coding genes, seven sets of ribosomal RNA genes, 84 tRNA genes and a 64.8 KB plasmid encoding 74 genes. Comparative genomic analysis with three of the previously sequenced Serratia species, S. marcescens DB11 and S. proteamaculans 568, and Serratia sp. AS12, revealed that these four representatives of the genus share a core set of ~3100 genes and extensive structural conservation. The newly identified species shares a more recent common ancestor with S. marcescens with 99% sequence identity in rDNA sequence and orthology across 85.6% of predicted genes. Of the 39 genes/operons implicated in the virulence, symbiosis, recolonization, immune evasion and bioconversion, 21 (53.8%) were present in Serratia while 33 (84.6%) and 35 (89%) were present in Xenorhabdus and Photorhabdus EPN bacteria respectively. The majority of unique sequences in Serratia sp. SCBI (South African Caenorhabditis briggsae Isolate) are found in ~29 genomic islands of 5 to 65 genes and are enriched in putative functions that are biologically relevant to an entomopathogenic lifestyle, including non-ribosomal peptide synthetases, bacteriocins, fimbrial biogenesis, ushering proteins, toxins, secondary metabolite secretion and multiple drug resistance/efflux systems. By revealing the early stages of adaptation to this lifestyle, the Serratia sp. SCBI genome underscores the fact that in EPN formation the composite end result (killing, bioconversion, cadaver protection and recolonization) can be achieved by dissimilar mechanisms. This genome sequence will enable further study of the evolution of entomopathogenic nematode-bacteria complexes.
Brinch-Pedersen, Henrik
2013-01-01
The phytase activity in food and feedstuffs is an important nutritional parameter. Members of the Triticeae tribe accumulate purple acid phosphatase phytases (PAPhy) during grain filling. This accumulation elevates mature grain phytase activities (MGPA) up to levels between ~650 FTU/kg for barley and 6000 FTU/kg for rye. This is notably more than other cereals. For instance, rice, maize, and oat have MGPAs below 100 FTU/kg. The cloning and characterization of the PAPhy gene complement from wheat, barley, rye, einkorn, and Aegilops tauschii is reported here. The Triticeae PAPhy genes generally consist of a set of paralogues, PAPhy_a and PAPhy_b, and have been mapped to Triticeae chromosomes 5 and 3, respectively. The promoters share a conserved core but the PAPhy_a promoter has acquired a novel cis-acting regulatory element for expression during grain filling while the PAPhy_b promoter has maintained the archaic function and drives expression during germination. Brachypodium is the only sequenced Poaceae sharing the PAPhy duplication. As for the Triticeae, the duplication is reflected in a high MGPA of ~4200 FTU/kg in Brachypodium. The sequence conservation of the paralogous loci on Brachypodium chromosomes 1 and 2 does not extend beyond the PAPhy gene. The results indicate that a single-gene segmental duplication may have enabled the evolution of high MGPA by creating functional redundancy of the parent PAPhy gene. This implies that similar MGPA levels may be out of reach in breeding programs for some Poaceae, e.g. maize and rice, whereas Triticeae breeders should focus on PAPhy_a. PMID:23918958
Saski, Christopher; Lee, Seung-Bum; Fjellheim, Siri; Guda, Chittibabu; Jansen, Robert K.; Luo, Hong; Tomkins, Jeffrey; Rognli, Odd Arne; Clarke, Jihong Liu
2009-01-01
Comparisons of complete chloroplast genome sequences of Hordeum vulgare, Sorghum bicolor and Agrostis stolonifera to six published grass chloroplast genomes reveal that gene content and order are similar but two microstructural changes have occurred. First, the expansion of the IR at the SSC/IRa boundary that duplicates a portion of the 5′ end of ndhH is restricted to the three genera of the subfamily Pooideae (Agrostis, Hordeum and Triticum). Second, a 6 bp deletion in ndhK is shared by Agrostis, Hordeum, Oryza and Triticum, and this event supports the sister relationship between the subfamilies Ehrhartoideae and Pooideae. Repeat analysis identified 19–37 direct and inverted repeats 30 bp or longer with a sequence identity of at least 90%. Seventeen of the 26 shared repeats are found in all the grass chloroplast genomes examined and are located in the same genes or intergenic spacer (IGS) regions. Examination of simple sequence repeats (SSRs) identified 16–21 potential polymorphic SSRs. Five IGS regions have 100% sequence identity among Zea mays, Saccharum officinarum and Sorghum bicolor, whereas no spacer regions were identical among Oryza sativa, Triticum aestivum, H. vulgare and A. stolonifera despite their close phylogenetic relationship. Alignment of EST sequences and DNA coding sequences identified six C–U conversions in both Sorghum bicolor and H. vulgare but only one in A. stolonifera. Phylogenetic trees based on DNA sequences of 61 protein-coding genes of 38 taxa using both maximum parsimony and likelihood methods provide moderate support for a sister relationship between the subfamilies Ehrhartoideae and Pooideae. PMID:17534593
Discovery of a novel iflavirus sequence in the eastern paralysis tick Ixodes holocyclus.
O'Brien, Caitlin A; Hall-Mendelin, Sonja; Hobson-Peters, Jody; Deliyannis, Georgia; Allen, Andy; Lew-Tabor, Ala; Rodriguez-Valle, Manuel; Barker, Dayana; Barker, Stephen C; Hall, Roy A
2018-05-11
Ixodes holocyclus, the eastern paralysis tick, is a significant parasite in Australia in terms of animal and human health. However, very little is known about its virome. In this study, next-generation sequencing of I. holocyclus salivary glands yielded a full-length genome sequence which phylogenetically groups with viruses classified in the Iflaviridae family and shares 45% amino acid similarity with its closest relative Bole hyalomma asiaticum virus 1. The sequence of this virus, provisionally named Ixodes holocyclus iflavirus (IhIV) has been identified in tick populations from northern New South Wales and Queensland, Australia and represents the first virus sequence reported from I. holocyclus.
Meta4: a web application for sharing and annotating metagenomic gene predictions using web services.
Richardson, Emily J; Escalettes, Franck; Fotheringham, Ian; Wallace, Robert J; Watson, Mick
2013-01-01
Whole-genome shotgun metagenomics experiments produce DNA sequence data from entire ecosystems, and provide a huge amount of novel information. Gene discovery projects require up-to-date information about sequence homology and domain structure for millions of predicted proteins to be presented in a simple, easy-to-use system. There is a lack of simple, open, flexible tools that allow the rapid sharing of metagenomics datasets with collaborators in a format they can easily interrogate. We present Meta4, a flexible and extensible web application that can be used to share and annotate metagenomic gene predictions. Proteins and predicted domains are stored in a simple relational database, with a dynamic front-end which displays the results in an internet browser. Web services are used to provide up-to-date information about the proteins from homology searches against public databases. Information about Meta4 can be found on the project website, code is available on Github, a cloud image is available, and an example implementation can be seen at.
Genetic population structure in the yellow mongoose, Cynictis penicillata.
Van Vuuren, B J; Robinson, T J
1997-12-01
Phylogeographic structure was determined for the yellow mongoose, Cynictis penicillata, using mtDNA RFLPs and control region sequences. The RFLP analysis revealed 13 haplotypes which showed weak geographical patterning consistent with a recent range expansion from a refugial population(s). An analysis of molecular variance (AMOVA) revealed no correspondence between mtDNA phylogeography and subspecies delimitation, nor between matrilines and areas characterized by a high incidence of the viverrid-type rabies, of which the yellow mongoose is the principal vector. The lack of structure was also shown by control region sequences although four of the maternal lineages shared a near-perfect 81 bp repeat. We speculate that regional hot spots of the viverrid rabies biotype reflect population density differences in the yellow mongoose that are not underscored by genetic partitioning, at least at the level of resolution provided by our analyses.
The Microsoft Biology Foundation Applications for High-Throughput Sequencing
Mercer, S.
2010-01-01
The need for reusable libraries of bioinformatics functions has been recognized for many years and a number of language-specific toolkits have been constructed. Such toolkits have served as valuable nucleation points for the community, promoting the sharing of code and establishing standards. The majority of DNA sequencing machines and many other standard pieces of lab equipment are controlled by PCs using Windows, and a Microsoft genomics toolkit would enable initial processing and quality control to happen closer to the instrumentation and provide opportunities for added-value services within core facilities. The Microsoft Biology Foundation (MBF) is an open source software library, freely available for both commercial and academic use, available as an early-stage beta from mbf.codeplex.com. This presentation will describe the structure and goals of MBF and demonstrate some of its uses.
Complete genome sequence of Paris mosaic necrosis virus, a distinct member of the genus Potyvirus
USDA-ARS's Scientific Manuscript database
The complete genomic sequence of a novel potyvirus was determined from Paris polyphylla var. yunnanensis. Its genomic RNA consists of 9,660 nucleotides (nt) excluding the 3’-terminal poly (A) tail, containing a single open reading frame (ORF) encoding a large polyprotein. The virus shares 52.1-69.7%...
Enhancing the Breadth and Efficacy of Therapeutic Vaccines for Breast Cancer
2015-10-01
and get the top shared TCR sequences of CD8 T cells from the tumor, TDLN, and peripheral blood. These sequences will be used to make avatars and these... avatars will be screened against HLA- A2+ BC cell lines, Oregon’s eluted peptides, and Denver’s Baculovirus library. 9 Outline of the project
USDA-ARS's Scientific Manuscript database
Plant class IV chitinases are composed of a carboxy-terminal chitinase domain that is attached, through a linker sequence, to a small amino-terminal domain that can be thought of as a structured peptide. While both the peptide-like domain and the chitinase domain share sequence homology throughout m...
USDA-ARS's Scientific Manuscript database
Our recent study has shown that bovine rhinovirus type 2 (BRV2), a new member of the Aphthovirus genus, shares many motifs and sequence similarities with foot-and-mouth disease virus (FMDV). Despite low sequence conservation (36% amino acid identity) and N- and C-terminus folding differences,...
Genetic history of an archaic hominin group from Denisova Cave in Siberia
Reich, David; Green, Richard E.; Kircher, Martin; Krause, Johannes; Patterson, Nick; Durand, Eric Y.; Viola, Bence; Briggs, Adrian W.; Stenzel, Udo; Johnson, Philip L. F.; Maricic, Tomislav; Good, Jeffrey M.; Marques-Bonet, Tomas; Alkan, Can; Fu, Qiaomei; Mallick, Swapan; Li, Heng; Meyer, Matthias; Eichler, Evan E.; Stoneking, Mark; Richards, Michael; Talamo, Sahra; Shunkov, Michael V.; Derevianko, Anatoli P.; Hublin, Jean-Jacques; Kelso, Janet; Slatkin, Montgomery; Pääbo, Svante
2015-01-01
Using DNA extracted from a finger bone found in Denisova Cave in southern Siberia, we have sequenced the genome of an archaic hominin to about 1.9-fold coverage. This individual is from a group that shares a common origin with Neanderthals. This population was not involved in the putative gene flow from Neanderthals into Eurasians; however, the data suggest that it contributed 4–6% of its genetic material to the genomes of present-day Melanesians. We designate this hominin population ‘Denisovans’ and suggest that it may have been widespread in Asia during the Late Pleistocene epoch. A tooth found in Denisova Cave carries a mitochondrial genome highly similar to that of the finger bone. This tooth shares no derived morphological features with Neanderthals or modern humans, further indicating that Denisovans have an evolutionary history distinct from Neanderthals and modern humans. PMID:21179161
Bills, Gerald F; Yue, Qun; Chen, Li; Li, Yan; An, Zhiqiang; Frisvad, Jens C
2016-03-01
The invalidly published name Aspergillus sydowii var. mulundensis was proposed for a strain of Aspergillus that produced new echinocandin metabolites designated as the mulundocandins. Reinvestigation of this strain (Y-30462=DSMZ 5745) using phylogenetic, morphological, and metabolic data indicated that it is a distinct and novel species of Aspergillus sect. Nidulantes. The taxonomic novelty, Aspergillus mulundensis, is introduced for this historically important echinocandin-producing strain. The closely related A. nidulans FGSC A4 has one of the most extensively characterized secondary metabolomes of any filamentous fungus. Comparison of the full-genome sequences of DSMZ 5745 and FGSC A4 indicated that the two strains share 33 secondary metabolite biosynthetic gene clusters. These shared gene clusters represent ~45% of the total secondary metabolome of each strain, thus indicating a high level of intraspecific divergence in terms of secondary metabolism.
Human, Mouse, and Rat Genome Large-Scale Rearrangements: Stability Versus Speciation
Zhao, Shaying; Shetty, Jyoti; Hou, Lihua; Delcher, Arthur; Zhu, Baoli; Osoegawa, Kazutoyo; de Jong, Pieter; Nierman, William C.; Strausberg, Robert L.; Fraser, Claire M.
2004-01-01
Using paired-end sequences from bacterial artificial chromosomes, we have constructed high-resolution synteny and rearrangement breakpoint maps among human, mouse, and rat genomes. Among the >300 syntenic blocks identified are segments of over 40 Mb without any detected interspecies rearrangements, as well as regions with frequently broken synteny and extensive rearrangements. As closely related species, mouse and rat share the majority of the breakpoints and often have the same types of rearrangements when compared with the human genome. However, the breakpoints not shared between them indicate that mouse rearrangements are more often interchromosomal, whereas intrachromosomal rearrangements are more prominent in rat. Centromeres may have played a significant role in reorganizing a number of chromosomes in all three species. The comparison of the three species indicates that genome rearrangements follow a path that accommodates a delicate balance between maintaining a basic structure underlying all mammalian species and permitting variations that are necessary for speciation. PMID:15364903
Implementation of Parallel Dynamic Simulation on Shared-Memory vs. Distributed-Memory Environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jin, Shuangshuang; Chen, Yousu; Wu, Di
2015-12-09
Power system dynamic simulation computes the system response to a sequence of large disturbances, such as sudden changes in generation or load, or a network short circuit followed by protective branch switching operation. It consists of a large set of differential and algebraic equations, which is computationally intensive and challenging to solve using a single-processor based dynamic simulation solution. High-performance computing (HPC) based parallel computing is a very promising technology to speed up the computation and facilitate the simulation process. This paper presents two different parallel implementations of power grid dynamic simulation using Open Multi-processing (OpenMP) on a shared-memory platform, and Message Passing Interface (MPI) on distributed-memory clusters, respectively. The differences between the parallel simulation algorithms and architectures of the two HPC technologies are illustrated, and their performances for running parallel dynamic simulation are compared and demonstrated.
Inferring Recent Demography from Isolation by Distance of Long Shared Sequence Blocks
Ringbauer, Harald; Coop, Graham
2017-01-01
Recently it has become feasible to detect long blocks of nearly identical sequence shared between pairs of genomes. These identity-by-descent (IBD) blocks are direct traces of recent coalescence events and, as such, contain ample signal to infer recent demography. Here, we examine sharing of such blocks in two-dimensional populations with local migration. Using a diffusion approximation to trace genetic ancestry, we derive analytical formulas for patterns of isolation by distance of IBD blocks, which can also incorporate recent population density changes. We introduce an inference scheme that uses a composite-likelihood approach to fit these formulas. We then extensively evaluate our theory and inference method on a range of scenarios using simulated data. We first validate the diffusion approximation by showing that the theoretical results closely match the simulated block-sharing patterns. We then demonstrate that our inference scheme can accurately and robustly infer dispersal rate and effective density, as well as bounds on recent dynamics of population density. To demonstrate an application, we use our estimation scheme to explore the fit of a diffusion model to Eastern European samples in the Population Reference Sample data set. We show that ancestry diffusing with a rate of σ≈50–100 km/gen during the last centuries, combined with accelerating population growth, can explain the observed exponential decay of block sharing with increasing pairwise sample distance. PMID:28108588
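The exponential decay of block sharing with distance mentioned in the abstract above can be illustrated with a simple log-linear least-squares fit. This is only a sketch under strong assumptions: it fits sharing ≈ a·exp(−b·distance) to synthetic numbers and is not the authors' diffusion-based formulas or their composite-likelihood inference scheme. The function name and the toy values are hypothetical.

```python
import math

def fit_exponential_decay(distances, sharing):
    """Least-squares fit of log(sharing) = log(a) - b * distance; returns (a, b)."""
    xs = list(distances)
    ys = [math.log(s) for s in sharing]  # requires strictly positive sharing values
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic example: mean number of shared blocks per pair at several
# pairwise distances (km), generated from a = 2.0 and b = 0.01 per km.
toy_distances = [0.0, 100.0, 200.0, 400.0]
toy_sharing = [2.0 * math.exp(-0.01 * d) for d in toy_distances]
a_hat, b_hat = fit_exponential_decay(toy_distances, toy_sharing)
```

On noise-free synthetic data the fit recovers the generating parameters; with real block counts a likelihood-based fit, as the abstract describes, would be more appropriate than this regression on logarithms.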
Patient-shared TCRβ-CDR3 clonotypes correlate with favorable prognosis in chronic hepatitis B.
Jiang, Qiong; Zhao, Tingting; Zheng, Wenhong; Zhou, Jijun; Wang, Haoliang; Dong, Hui; Chen, Yongwen; Tang, Xiaoqin; Liu, Cong; Ye, Lilin; Mao, Qing; Wang, Chunlin; Han, Jian; Shang, Xiaoyun; Wu, Yuzhang
2018-06-01
The presence of shared T cell clonotypes has been found in several different diseases, but its relationship with the progression of disease remains unclear. By sequencing the complementarity-determining region 3 of T cell receptor (TCR) β chains from purified antigen-experienced CD8+ T cells, we characterized the T cell repertoire in a prospective cohort study among 75 patients with chronic hepatitis B in China, as well as a healthy control and a validation cohort. We found that most T cell clones from patients harbored 'patient-specific' TCR sequences. However, 'patient-shared' TCR clonotypes were also widely found, which correlated with the favorable turnover of disease. Interestingly, the frequency of the 'patient-shared' clonotypes can serve as a biomarker for favorable prognosis. Based on the clonotypes in those patients with favorable outcomes, we created a database including several clusters of protective anti-HBV CD8+ T cell clonotypes that might be a reasonable target for therapeutic vaccine development or adoptive cell transfer therapy. These findings were validated in an additional independent cohort of patients. These results suggest that the 'patient-shared' TCR clonotypes may serve as a valuable prognostic tool in the treatment of chronic hepatitis B and possibly other chronic viral diseases. This article is protected by copyright. All rights reserved.
Structural Plasticity and Rapid Evolution in a Viral RNA Revealed by In Vivo Genetic Selection
Guo, Rong; Lin, Wai; Zhang, Jiuchun; Simon, Anne E.; Kushner, David B.
2009-01-01
Satellite RNAs usually lack substantial homology with their helper viruses. The 356-nucleotide satC of Turnip crinkle virus (TCV) is unusual in that its 3′-half shares high sequence similarity with the TCV 3′ end. Computer modeling, structure probing, and/or compensatory mutagenesis identified four hairpins and three pseudoknots in this TCV region that participate in replication and/or translation. Two hairpins and two pseudoknots have been confirmed as important for satC replication. One portion of the related 3′ end of satC that remains poorly characterized corresponds to juxtaposed TCV hairpins H4a and H4b and pseudoknot ψ3, which are required for the TCV-specific requirement of translation (V. A. Stupina et al., RNA 14:2379-2393, 2008). Replacement of satC H4a with randomized sequence and scoring for fitness in plants by in vivo genetic selection (SELEX) resulted in winning sequences that contain an H4a-like stem-loop, which can have additional upstream sequence composing a portion of the stem. SELEX of the combined H4a and H4b region in satC generated three distinct groups of winning sequences. One group models into two stem-loops similar to H4a and H4b of TCV. However, the selected sequences in the other two groups model into single hairpins. Evolution of these single-hairpin SELEX winners in plants resulted in satC that can accumulate to wild-type (wt) levels in protoplasts but remain less fit in planta when competed against wt satC. These data indicate that two highly distinct RNA conformations in the H4a and H4b region can mediate satC fitness in protoplasts. PMID:19004956
Sequence divergence of the red and green visual pigments in great apes and humans.
Deeb, S S; Jorgensen, A L; Battisti, L; Iwasaki, L; Motulsky, A G
1994-01-01
We have determined the coding sequences of red and green visual pigment genes of the chimpanzee, gorilla, and orangutan. The deduced amino acid sequences of these pigments are highly homologous to the equivalent human pigments. None of the amino acid differences occurred at sites that were previously shown to influence pigment absorption characteristics. Therefore, we predict the spectra of red and green pigments of the apes to have wavelengths of maximum absorption that differ by < 2 nm from the equivalent human pigments and that color vision in these nonhuman primates will be very similar, if not identical, to that in humans. A total of 14 within-species polymorphisms (6 involving silent substitutions) were observed in the coding sequences of the red and green pigment genes of the great apes. Remarkably, the polymorphisms at 6 of these sites had been observed in human populations, suggesting that they predated the evolution of higher primates. Alleles at polymorphic sites were often shared between the red and green pigment genes. The average synonymous rate of divergence of red from green sequences was approximately 1/10th that estimated for other proteins of higher primates, indicating the involvement of gene conversion in generating these polymorphisms. The high degree of homology and juxtaposition of these two genes on the X chromosome has promoted unequal recombination and/or gene conversion that led to sequence homogenization. However, natural selection operated to maintain the degree of separation in peak absorbance between the red and green pigments that resulted in optimal chromatic discrimination. This represents a unique case of molecular coevolution between two homologous genes that functionally interact at the behavioral level. PMID:8041777
Cloning and sequencing of the cDNA species for mammalian dimeric dihydrodiol dehydrogenases.
Arimitsu, E; Aoki, S; Ishikura, S; Nakanishi, K; Matsuura, K; Hara, A
1999-01-01
Cynomolgus and Japanese monkey kidneys, dog and pig livers and rabbit lens contain dimeric dihydrodiol dehydrogenase (EC 1.3.1.20) associated with high carbonyl reductase activity. Here we have isolated cDNA species for the dimeric enzymes by reverse transcriptase-PCR from human intestine in addition to the above five animal tissues. The amino acid sequences deduced from the monkey, pig and dog cDNA species perfectly matched the partial sequences of peptides digested from the respective enzymes of these animal tissues, and active recombinant proteins were expressed in a bacterial system from the monkey and human cDNA species. Northern blot analysis revealed the existence of a single 1.3 kb mRNA species for the enzyme in these animal tissues. The human enzyme shared 94%, 85%, 84% and 82% amino acid identity with the enzymes of the two monkey strains (their sequences were identical), the dog, the pig and the rabbit respectively. The sequences of the primate enzymes consisted of 335 amino acid residues and lacked one amino acid compared with the other animal enzymes. In contrast with previous reports that other types of dihydrodiol dehydrogenase, carbonyl reductases and enzymes with either activity belong to the aldo-keto reductase family or the short-chain dehydrogenase/reductase family, dimeric dihydrodiol dehydrogenase showed no sequence similarity with the members of the two protein families. The dimeric enzyme aligned with low degrees of identity (14-25%) with several prokaryotic proteins, in which 47 residues are strictly or highly conserved. Thus dimeric dihydrodiol dehydrogenase has a primary structure distinct from the previously known mammalian enzymes and is suggested to constitute a novel protein family with the prokaryotic proteins. PMID:10477285
Gomulski, Ludvik M; Dimopoulos, George; Xi, Zhiyong; Soares, Marcelo B; Bonaldo, Maria F; Malacrida, Anna R; Gasperi, Giuliano
2008-01-01
Background The medfly, Ceratitis capitata, is a highly invasive agricultural pest that has become a model insect for the development of biological control programs. Despite research into the behavior and classical and population genetics of this organism, the quantity of sequence data available is limited. We have utilized an expressed sequence tag (EST) approach to obtain detailed information on transcriptome signatures that relate to a variety of physiological systems in the medfly; this information emphasizes reproduction, sex determination, and chemosensory perception, since the study was based on normalized cDNA libraries from embryos and adult heads. Results A total of 21,253 high-quality ESTs were obtained from the embryo and head libraries. Clustering analyses performed separately for each library resulted in 5201 embryo and 6684 head transcripts. Considering an estimated 19% overlap in the transcriptomes of the two libraries, they represent about 9614 unique transcripts involved in a wide range of biological processes and molecular functions. Of particular interest are the sequences that share homology with Drosophila genes involved in sex determination, olfaction, and reproductive behavior. The medfly transformer2 (tra2) homolog was identified among the embryonic sequences, and its genomic organization and expression were characterized. Conclusion The sequences obtained in this study represent the first major dataset of expressed genes in a tephritid species of agricultural importance. This resource provides essential information to support the investigation of numerous questions regarding the biology of the medfly and other related species and also constitutes an invaluable tool for the annotation of complete genome sequences.
Our study has revealed intriguing findings regarding the transcript regulation of tra2 and other sex determination genes, as well as insights into the comparative genomics of genes implicated in chemosensory reception and reproduction. PMID:18500975
Genome-wide analysis of putative peroxiredoxin in unicellular and filamentous cyanobacteria.
Cui, Hongli; Wang, Yipeng; Wang, Yinchu; Qin, Song
2012-11-16
Cyanobacteria are photoautotrophic prokaryotes with wide variations in genome sizes and ecological habitats. Peroxiredoxin (PRX) is an important protein that plays essential roles in protecting cells against reactive oxygen species (ROS). PRXs have been identified from mammals, fungi and higher plants. However, knowledge of cyanobacterial PRXs remains obscure. With the availability of 37 sequenced cyanobacterial genomes, we performed a comprehensive comparative analysis of PRXs and explored their diversity, distribution, domain structure and evolution. Overall 244 putative prx genes were identified, which were abundant in filamentous diazotrophic cyanobacteria, Acaryochloris marina MBIC 11017, and unicellular cyanobacteria inhabiting freshwater and hot-springs, but scarce in all Prochlorococcus and marine Synechococcus strains. Among these putative genes, 25 open reading frames (ORFs) encoding hypothetical proteins were identified as prx gene family members and the others were already annotated as prx genes. All 244 putative PRXs were classified into five major subfamilies (1-Cys, 2-Cys, BCP, PRX5_like, and PRX-like) according to their domain structures. The catalytic motifs of the cyanobacterial PRXs were similar to those of eukaryotic PRXs and highly conserved in all but the PRX-like subfamily. The classical thioredoxin motif (CXXC) was detected in protein sequences from the PRX-like subfamily. The phylogenetic tree constructed from catalytic domains coincided well with the domain structures of PRXs and the phylogenies based on 16S rRNA. The distribution of genes encoding PRXs in different unicellular and filamentous cyanobacteria, especially of subfamilies such as PRX-like or 1-Cys PRX, correlates with the genome size, eco-physiology, and physiological properties of the organisms. Cyanobacterial and eukaryotic PRXs share similar conserved motifs, indicating that cyanobacteria adopt similar catalytic mechanisms as eukaryotes.
All cyanobacterial PRX proteins share highly similar structures, implying that these genes may originate from a common ancestor. In this study, a general framework of the sequence-structure-function connections of the PRXs was revealed, which may facilitate functional investigations of PRXs in various organisms.
Genome-wide analysis of putative peroxiredoxin in unicellular and filamentous cyanobacteria
2012-01-01
Background Cyanobacteria are photoautotrophic prokaryotes with wide variations in genome sizes and ecological habitats. Peroxiredoxin (PRX) is an important protein that plays essential roles in protecting cells against reactive oxygen species (ROS). PRXs have been identified from mammals, fungi and higher plants. However, knowledge of cyanobacterial PRXs remains obscure. With the availability of 37 sequenced cyanobacterial genomes, we performed a comprehensive comparative analysis of PRXs and explored their diversity, distribution, domain structure and evolution. Results Overall 244 putative prx genes were identified, which were abundant in filamentous diazotrophic cyanobacteria, Acaryochloris marina MBIC 11017, and unicellular cyanobacteria inhabiting freshwater and hot-springs, but scarce in all Prochlorococcus and marine Synechococcus strains. Among these putative genes, 25 open reading frames (ORFs) encoding hypothetical proteins were identified as prx gene family members and the others were already annotated as prx genes. All 244 putative PRXs were classified into five major subfamilies (1-Cys, 2-Cys, BCP, PRX5_like, and PRX-like) according to their domain structures. The catalytic motifs of the cyanobacterial PRXs were similar to those of eukaryotic PRXs and highly conserved in all but the PRX-like subfamily. The classical thioredoxin motif (CXXC) was detected in protein sequences from the PRX-like subfamily. The phylogenetic tree constructed from catalytic domains coincided well with the domain structures of PRXs and the phylogenies based on 16S rRNA. Conclusions The distribution of genes encoding PRXs in different unicellular and filamentous cyanobacteria, especially of subfamilies such as PRX-like or 1-Cys PRX, correlates with the genome size, eco-physiology, and physiological properties of the organisms. Cyanobacterial and eukaryotic PRXs share similar conserved motifs, indicating that cyanobacteria adopt similar catalytic mechanisms as eukaryotes.
All cyanobacterial PRX proteins share highly similar structures, implying that these genes may originate from a common ancestor. In this study, a general framework of the sequence-structure-function connections of the PRXs was revealed, which may facilitate functional investigations of PRXs in various organisms. PMID:23157370
NASA Astrophysics Data System (ADS)
Michnovicz, Michael R.
1997-06-01
A real-time executive has been implemented to control a high altitude pointing and tracking experiment. The track and mode controller (TMC) implements a table driven design, in which the track mode logic for a tracking mission is defined within a state transition diagram (STD). The STD is implemented as a state transition table in the TMC software. Status events trigger the state transitions in the STD. Each state, as it is entered, causes a number of processes to be activated within the system. As these processes propagate through the system, the status of key processes is monitored by the TMC, allowing further transitions within the STD. This architecture is implemented in real-time, using the vxWorks operating system. VxWorks message queues allow communication of status events from the Event Monitor task to the STD task. Process commands are propagated to the rest of the system processors by means of the SCRAMNet shared memory network. The system mode logic contained in the STD will autonomously sequence an acquisition, tracking and pointing system through an entire engagement sequence, starting with target detection and ending with aimpoint maintenance. Simulation results and lab test results will be presented to verify the mode controller. In addition to implementing the system mode logic with the STD, the TMC can process prerecorded time sequences of commands required during startup operations. It can also process single commands from the system operator.
In this paper, the author presents (1) an overview, in which he describes the TMC architecture, the relationship of an end-to-end simulation to the flight software and the laboratory testing environment, (2) implementation details, including information on the vxWorks message queues and the SCRAMNet shared memory network, (3) simulation results and lab test results which verify the mode controller, and (4) plans for the future, specifically as to how this executive will expedite transition to a fully functional system.
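The table driven design described in this abstract can be illustrated with a minimal sketch. It is not the author's flight software: the state names, event names, and entry processes below are hypothetical, and a Python deque stands in for the vxWorks message queue that carries status events to the STD task.

```python
from collections import deque

# Hypothetical state transition table: (current state, status event) -> next state.
# The abstract only says the sequence starts with target detection and ends
# with aimpoint maintenance; every name here is an illustrative assumption.
TRANSITIONS = {
    ("IDLE", "TARGET_DETECTED"): "ACQUISITION",
    ("ACQUISITION", "TARGET_ACQUIRED"): "TRACKING",
    ("TRACKING", "TRACK_LOST"): "ACQUISITION",
    ("TRACKING", "TRACK_STABLE"): "POINTING",
    ("POINTING", "AIMPOINT_LOCKED"): "AIMPOINT_MAINTENANCE",
}

# Hypothetical processes activated when a state is entered.
ENTRY_PROCESSES = {
    "ACQUISITION": ["start_search_scan"],
    "TRACKING": ["close_track_loop"],
    "POINTING": ["slew_to_aimpoint"],
    "AIMPOINT_MAINTENANCE": ["hold_aimpoint"],
}


def run_state_machine(events, start="IDLE"):
    """Consume status events in order and return (final state, activation log)."""
    queue = deque(events)  # stand-in for a message queue of status events
    state = start
    log = []
    while queue:
        event = queue.popleft()
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            continue  # event is not valid in the current state: ignore it
        state = next_state
        # Entering a state activates its processes (recorded here, not executed).
        log.append((state, ENTRY_PROCESSES.get(state, [])))
    return state, log


final_state, activation_log = run_state_machine(
    ["TARGET_DETECTED", "TARGET_ACQUIRED", "TRACK_LOST",
     "TARGET_ACQUIRED", "TRACK_STABLE", "AIMPOINT_LOCKED"]
)
```

Because the mode logic lives in the two tables rather than in branching code, changing the mission logic in this sketch means editing table entries only, which is the usual motivation for a table driven design.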
Shahein, Yasser Ezzat; El Sayed El-Hakim, Amr; Abouelella, Amira Mohamed Kamal; Hamed, Ragaa Reda; Allam, Shaimaa Abdul-Moez; Farid, Nevin Mahmoud
2008-03-25
A full-length cDNA of a glutathione S-transferase (GST) was cloned from a cDNA library of the local Egyptian cattle tick Boophilus annulatus. The 672 bp cloned fragment was sequenced and showed an open reading frame encoding a protein of 223 amino acids. Comparison of the deduced amino acid sequence with GSTs from other species revealed that the sequence is closely related to the mammalian mu-class GST. The cloned gene was expressed in E. coli under the T7 promoter of the pET-30b vector, and purified under native conditions. The purified enzyme appeared as a single band on 12% SDS-PAGE and has a molecular weight of 30.8 kDa including the histidine tag of the vector. The purified enzyme was assayed with the chromogenic substrate 1-chloro-2,4-dinitrobenzene (CDNB), and the recombinant enzyme showed a high level of activity even in the presence of the beta-galactosidase region on its 5' end and showed maximum activity at pH 7.5. The Km values for CDNB and GSH were 0.57 and 0.79 mM, respectively. The overexpressed rBaGST showed high activity toward CDNB (121 units/mg protein) and less toward DCNB (29.3 units/mg protein). rBaGST exhibited peroxidatic activity on cumene hydroperoxide, sharing this property with GSTs belonging to the GST alpha class. I50 values for cibacron blue and bromosulfophthalein were 0.22 and 8.45 microM, respectively, sharing this property with the mammalian GSTmu class. Immunoblotting revealed the presence of the GST molecule in B. annulatus protein extracts: whole tick, larvae, gut, salivary gland and ovary. Homologues to the GSTmu were also detected in other tick species such as Hyalomma dromedarii and Rhipicephalus sp., while in Ornithodoros moubata, a GSTmu homologue could not be detected.
He, J; Liu, L P; Hou, S; Gong, L; Wu, J B; Hu, W F; Wang, J J
2016-05-01
To understand the genomic characteristics of 2 strains of influenza A(H9N2) virus isolated from human infection cases in Anhui province in 2015. Two human infections with H9N2 virus were confirmed by the national influenza surveillance laboratory network in Anhui through viral isolation in April and September, 2015, respectively. The full genomic sequences of the two viral isolates were analyzed in this study by using the molecular bioinformatics software Mega 6.0. Human infection with H9N2 virus was first reported in Anhui province. The analysis of genomic sequences showed that the HA and NA genes of the two H9N2 isolates belonged to the A/Chicken/Shanghai/F/98(H9N2)-like lineage, and shared high identity with H9N2 virus circulating in poultry in 2013. The PB2 and MP genes belonged to the A/quail/Hong Kong/G1/97-like lineage, and shared high homology with H7N9, H10N8 or H6N2 viruses. The amino acid sequence alignment results showed that several mutations for human infection tropism were present in the two virus strains, including Q226L, H183N and E190T in HA; S31N in M2; 63-65 deletion in NA. In addition, the H9N2 influenza virus strains possessed the PSRSSR\\GL motif in HA. Meanwhile several human-like signatures, including PA-100A, PA-356R and PA-409N were also found in the two virus strains. The H9N2 viruses isolated from human infection cases in Anhui province belonged to a reassortant virus originating from different lineages of H9N2 avian influenza virus. The virus possesses several human susceptibility loci.
Cronin, Matthew A; Rincon, Gonzalo; Meredith, Robert W; MacNeil, Michael D; Islas-Trejo, Alma; Cánovas, Angela; Medrano, Juan F
2014-01-01
We assessed the relationships of polar bears (Ursus maritimus), brown bears (U. arctos), and black bears (U. americanus) with high throughput genomic sequencing data with an average coverage of 25× for each species. A total of 1.4 billion 100-bp paired-end reads were assembled using the polar bear and annotated giant panda (Ailuropoda melanoleuca) genome sequences as references. We identified 13.8 million single nucleotide polymorphisms (SNP) in the 3 species aligned to the polar bear genome. These data indicate that polar bears and brown bears share more SNP with each other than either does with black bears. Concatenation and coalescence-based analysis of consensus sequences of approximately 1 million base pairs of ultraconserved elements in the nuclear genome resulted in a phylogeny with black bears as the sister group to brown and polar bears, and all brown bears are in a separate clade from polar bears. Genotypes for 162 SNP loci of 336 bears from Alaska and Montana showed that the species are genetically differentiated and there is geographic population structure of brown and black bears but not polar bears.
Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses
Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T
2014-01-01
Due to the upcoming data deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analyses tools, efficient data sharing and retrieval has presented significant challenges. The variability in data volume results in variable computing and storage requirements, therefore biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analyses workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analyses tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach. PMID:24462600
Lapunzina, Pablo; López, Rocío Ortiz; Rodríguez-Laguna, Lara; García-Miguel, Purificación; Martínez, Augusto Rojas; Martínez-Glez, Víctor
2014-01-01
The increased speed and decreasing cost of sequencing, along with an understanding of the clinical relevance of emerging information for patient management, has led to an explosion of potential applications in healthcare. Currently, SNP arrays and Next-Generation Sequencing (NGS) technologies are relatively new techniques used to scan genomes for gains and losses, losses of heterozygosity (LOH), SNPs, and indel variants as well as to perform complete sequencing of a panel of candidate genes, the entire exome (whole exome sequencing) or even the whole genome. As a result, these new high-throughput technologies have facilitated progress in the understanding and diagnosis of genetic syndromes and cancers, two disorders traditionally considered to be separate diseases but that can share causal genetic alterations in a group of developmental disorders associated with congenital malformations and cancer risk. The purpose of this work is to review these syndromes as an example of a group of disorders that has been included in a panel of genes for NGS analysis. We also highlight the relationship between development and cancer and underline the connections between these syndromes. PMID:24764758
Detection of porcine circovirus type 2 in pigs imported from Indonesia.
Manokaran, Gayathri; Lin, Yueh-Nuo; Soh, Moi-Lien; Lim, Elizabeth Ai-Sim; Lim, Chee-Wee; Tan, Boon-Huan
2008-11-25
We have detected the presence of porcine circovirus (PCV) type 2 in Indonesian pigs imported to Singapore for food consumption. A total of three viral isolates were identified, and to genetically characterise them further, their full genomes were sequenced. Each genome showed a typical organization of PCV type 2, with the three isolates sharing similar genome lengths of 1767 nucleotide (nt) at high nt identities of 99.8-100%, further indicating that the viral isolates were quite homogeneous. Sequence analysis further revealed that the ORF2 genes contain the nt sequence CCCCGC (from nt position 262 to 267) that was previously reported to be associated with PCV type 2, group 1C. The phylogenetic tree was constructed for the ORF2 genes, and the PCV type 2 isolates distributed into two distinctive groups. The Indonesian PCV type 2 clustered tightly with one China isolate, accession number AY035820, as a sub-cluster in group 1C. The sequence and phylogenetic analyses both confirmed that the three Indonesian PCV type 2 isolates belong to group 1C, and that the genetic changes for the three Indonesian isolates were very stable, possibly due to the low-scale evolution.
TCRmodel: high resolution modeling of T cell receptors from sequence.
Gowthaman, Ragul; Pierce, Brian G
2018-05-22
T cell receptors (TCRs), along with antibodies, are responsible for specific antigen recognition in the adaptive immune response, and millions of unique TCRs are estimated to be present in each individual. Understanding the structural basis of TCR targeting has implications in vaccine design, autoimmunity, as well as T cell therapies for cancer. Given advances in deep sequencing leading to immune repertoire-level TCR sequence data, fast and accurate modeling methods are needed to elucidate shared and unique 3D structural features of these molecules which lead to their antigen targeting and cross-reactivity. We developed a new algorithm in the program Rosetta to model TCRs from sequence, and implemented this functionality in a web server, TCRmodel. This web server provides an easy to use interface, and models are generated quickly that users can investigate in the browser and download. Benchmarking of this method using a set of nonredundant recently released TCR crystal structures shows that models are accurate and compare favorably to models from another available modeling method. This server enables the community to obtain insights into TCRs of interest, and can be combined with methods to model and design TCR recognition of antigens. The TCRmodel server is available at: http://tcrmodel.ibbr.umd.edu/.
Polypeptide p41 of a Norwalk-Like Virus Is a Nucleic Acid-Independent Nucleoside Triphosphatase
Pfister, Thomas; Wimmer, Eckard
2001-01-01
Southampton virus (SHV) is a member of the Norwalk-like viruses (NLVs), one of four genera of the family Caliciviridae. The genome of SHV contains three open reading frames (ORFs). ORF 1 encodes a polyprotein that is autocatalytically processed into six proteins, one of which is p41. p41 shares sequence motifs with protein 2C of picornaviruses and superfamily 3 helicases. We have expressed p41 of SHV in bacteria. Purified p41 exhibited nucleoside triphosphate (NTP)-binding and NTP hydrolysis activities. The NTPase activity was not stimulated by single-stranded nucleic acids. SHV p41 had no detectable helicase activity. Protein sequence comparison between the consensus sequences of NLV p41 and enterovirus protein 2C revealed regions of high similarity. According to secondary structure prediction, the conserved regions were located within a putative central domain of alpha helices and beta strands. This study reveals for the first time an NTPase activity associated with a calicivirus-encoded protein. Based on enzymatic properties and sequence information, a functional relationship between NLV p41 and enterovirus 2C is discussed in regard to the role of 2C-like proteins in virus replication. PMID:11160659
A Case-by-Case Evolutionary Analysis of Four Imprinted Retrogenes
McCole, Ruth B; Loughran, Noeleen B; Chahal, Mandeep; Fernandes, Luis P; Roberts, Roland G; Fraternali, Franca; O'Connell, Mary J; Oakey, Rebecca J
2011-01-01
Retroposition is a widespread phenomenon resulting in the generation of new genes that are initially related to a parent gene via very high coding sequence similarity. We examine the evolutionary fate of four retrogenes generated by such an event; mouse Inpp5f_v2, Mcts2, Nap1l5, and U2af1-rs1. These genes are all subject to the epigenetic phenomenon of parental imprinting. We first provide new data on the age of these retrogene insertions. Using codon-based models of sequence evolution, we show these retrogenes have diverse evolutionary trajectories, including divergence from the parent coding sequence under positive selection pressure, purifying selection pressure maintaining parent-retrogene similarity, and neutral evolution. Examination of the expression pattern of retrogenes shows an atypical, broad pattern across multiple tissues. Protein 3D structure modeling reveals that a positively selected residue in U2af1-rs1, not shared by its parent, may influence protein conformation. Our case-by-case analysis of the evolution of four imprinted retrogenes reveals that this interesting class of imprinted genes, while similar in regulation and sequence characteristics, follow very varied evolutionary paths. PMID:21166792
Tóbiás, István; Palkovics, László
2003-04-01
Zucchini yellow mosaic virus (ZYMV) has emerged as an important pathogen of cucurbits within the last few years in Hungary. The Hungarian isolates show a high biological variability, have specific nucleotide and amino acid sequences in the N-terminal region of the coat protein and form a distinct branch in the phylogenetic tree. The virus is spread very efficiently in the field by several aphid species in a non-persistent manner. It can be transmitted by seed in hull-less seeded oil pumpkin (Cucurbita pepo (L) var Styriaca), although at a very low rate. Three isolates from seed transmission assay experiments were chosen and the nucleotide sequences of their coat proteins (CP) have been compared with the available CP sequences of ZYMV. According to the sequence analysis, the Hungarian isolates belong to the Central European branch in the phylogenetic tree and, together with the ZYMV isolates from Austria and Slovenia, share specific amino acids at positions 16, 17, 27 and 37 which are characteristic only of these isolates. The phylogenetic tree suggests the common origin of distantly distributed isolates, which can be attributed to widespread seed transmission.
Adorno, E V; Moura-Neto, J P; Lyra, I; Zanette, A; Santos, L F O; Seixas, M O; Reis, M G; Goncalves, M S
2008-02-01
The fetal hemoglobin (HbF) levels and betaS-globin gene haplotypes of 125 sickle cell anemia patients from Brazil were investigated. We sequenced the Ggamma- and Agamma-globin gene promoters and the DNase I-2 hypersensitive sites in the locus control regions (HS2-LCR) of patients with HbF level disparities as compared to their betaS haplotypes. Sixty-four (51.2%) patients had the CAR/Ben genotype; 36 (28.8%) Ben/Ben; 18 (14.4%) CAR/CAR; 2 (1.6%) CAR/Atypical; 2 (1.6%) Ben/Cam; 1 (0.8%) CAR/Cam; 1 (0.8%) CAR/Arab-Indian, and 1 (0.8%) Sen/Atypical. The HS2-LCR sequence analyses demonstrated a c.-10.677G>A change in patients with the Ben haplotype and high HbF levels. The Ggamma-globin gene promoter sequence analyses showed a c.-157T>C substitution shared by all patients, and a c.-222_-225del related to the Cam haplotype. These results identify new polymorphisms in the HS2-LCR and Ggamma-globin gene promoter. Further studies are required to determine the correlation between HbF synthesis and the clinical profile of sickle cell anemia patients.
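The genotype frequencies reported in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the counts given in the abstract; the variable names are illustrative, and rounding to one decimal place is an assumption that matches the precision of the reported percentages.

```python
# Genotype counts as reported in the abstract (125 patients in total).
genotype_counts = {
    "CAR/Ben": 64,
    "Ben/Ben": 36,
    "CAR/CAR": 18,
    "CAR/Atypical": 2,
    "Ben/Cam": 2,
    "CAR/Cam": 1,
    "CAR/Arab-Indian": 1,
    "Sen/Atypical": 1,
}

total_patients = sum(genotype_counts.values())

# Percentage of patients carrying each genotype, rounded to one decimal place.
genotype_percent = {
    genotype: round(100 * count / total_patients, 1)
    for genotype, count in genotype_counts.items()
}

for genotype, percent in genotype_percent.items():
    print(f"{genotype}: {genotype_counts[genotype]} ({percent}%)")
```

The counts sum to the 125 patients stated in the abstract, and each computed percentage agrees with the reported value.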
Bystrykh, L V; Vonck, J; van Bruggen, E F; van Beeumen, J; Samyn, B; Govorukhina, N I; Arfman, N; Duine, J A; Dijkhuizen, L
1993-01-01
The quaternary protein structure of two methanol:N,N'-dimethyl-4-nitrosoaniline (NDMA) oxidoreductases purified from Amycolatopsis methanolica and Mycobacterium gastri MB19 was analyzed by electron microscopy and image processing. The enzymes are decameric proteins (displaying fivefold symmetry) with estimated molecular masses of 490 to 500 kDa based on their subunit molecular masses of 49 to 50 kDa. Both methanol:NDMA oxidoreductases possess a tightly but noncovalently bound NADP(H) cofactor at an NADPH-to-subunit molar ratio of 0.7. These cofactors are redox active toward alcohol and aldehyde substrates. Both enzymes contain significant amounts of Zn2+ and Mg2+ ions. The primary amino acid sequences of the A. methanolica and M. gastri MB19 methanol:NDMA oxidoreductases share a high degree of identity, as indicated by N-terminal sequence analysis (63% identity among the first 27 N-terminal amino acids), internal peptide sequence analysis, and overall amino acid composition. The amino acid sequence analysis also revealed significant similarity to a decameric methanol dehydrogenase of Bacillus methanolicus C1. PMID:8449887
Kosushkin, S A; Borodulina, O R; Solov'eva, E N; Grechko, V V
2008-01-01
We have isolated and characterised sequences of a SINE family specific for squamate reptiles from a genome of lacertid lizard that we called Squam1. Copies are 360-390 bp in length and share a significant similarity with tRNA gene sequence on its 5'-end. This family was also detected by us in DNA of representatives of varanids, iguanids (anolis), gekkonids, and snakes. No signs of it were found in DNA of mammals, birds, amphibians, and crocodiles. Detailed analysis of primary structure of the retroposons obtained by us from genomic libraries or GenBank sequences was carried out. Most taxa possess 2-3 subfamilies of the SINE in their genomes with specific diagnostic features in their primary structure. Individual variability of copies in different families is about 85% and is just slightly lower on the genera level. Comparison of consensus sequences on family level reveals a high degree of structural similarity with a number of specific apomorphic features which makes it a useful marker of phylogeny for this group of reptiles. Snakes do not show specific affinity to varanids when compared to other lizards, as it was suggested earlier.
Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses.
Liu, Bo; Madduri, Ravi K; Sotomayor, Borja; Chard, Kyle; Lacinski, Lukasz; Dave, Utpal J; Li, Jianqiang; Liu, Chunchen; Foster, Ian T
2014-06-01
Due to the upcoming data deluge of genome data, the need for storing and processing large-scale genome data, easy access to biomedical analyses tools, efficient data sharing and retrieval has presented significant challenges. The variability in data volume results in variable computing and storage requirements, therefore biomedical researchers are pursuing more reliable, dynamic and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analyses workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analyses tools preconfigured for immediate use by researchers (via user-specific tools integration), automatic deployment on Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via HTCondor scheduler), and the support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases as well as performance evaluation are presented to validate the feasibility of the proposed approach. Copyright © 2014 Elsevier Inc. All rights reserved.
A Network Approach to Analyzing Highly Recombinant Malaria Parasite Genes
Larremore, Daniel B.; Clauset, Aaron; Buckee, Caroline O.
2013-01-01
The var genes of the human malaria parasite Plasmodium falciparum present a challenge to population geneticists due to their extreme diversity, which is generated by high rates of recombination. These genes encode a primary antigen protein called PfEMP1, which is expressed on the surface of infected red blood cells and elicits protective immune responses. Var gene sequences are characterized by pronounced mosaicism, precluding the use of traditional phylogenetic tools that require bifurcating tree-like evolutionary relationships. We present a new method that identifies highly variable regions (HVRs), and then maps each HVR to a complex network in which each sequence is a node and two nodes are linked if they share an exact match of significant length. Here, networks of var genes that recombine freely are expected to have a uniformly random structure, but constraints on recombination will produce network communities that we identify using a stochastic block model. We validate this method on synthetic data, showing that it correctly recovers populations of constrained recombination, before applying it to the Duffy Binding Like-α (DBLα) domain of var genes. We find nine HVRs whose network communities map in distinctive ways to known DBLα classifications and clinical phenotypes. We show that the recombinational constraints of some HVRs are correlated, while others are independent. These findings suggest that this micromodular structuring facilitates independent evolutionary trajectories of neighboring mosaic regions, allowing the parasite to retain protein function while generating enormous sequence diversity. Our approach therefore offers a rigorous method for analyzing evolutionary constraints in var genes, and is also flexible enough to be easily applied more generally to any highly recombinant sequences. PMID:24130474
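The network construction described in this abstract (each sequence is a node, and two nodes are linked if they share an exact match of significant length) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy sequences, and the fixed threshold `min_len` are assumptions made for illustration, and the stochastic block model step is not shown.

```python
from __future__ import annotations
from itertools import combinations


def shared_block(a: str, b: str, min_len: int) -> bool:
    """Return True if a and b share an exact substring of at least min_len characters."""
    if len(a) < min_len or len(b) < min_len:
        return False
    # Any shared substring of length >= min_len contains a shared substring
    # of length exactly min_len, so comparing min_len-mers is sufficient.
    kmers = {a[i:i + min_len] for i in range(len(a) - min_len + 1)}
    return any(b[j:j + min_len] in kmers for j in range(len(b) - min_len + 1))


def build_network(seqs: dict[str, str], min_len: int) -> set[tuple[str, str]]:
    """Treat each sequence as a node; link two nodes that share an exact block."""
    return {
        (u, v)
        for u, v in combinations(sorted(seqs), 2)
        if shared_block(seqs[u], seqs[v], min_len)
    }


# Hypothetical toy sequences, not real var gene data.
toy = {
    "s1": "ACDEFGHIK",
    "s2": "LMNEFGHPQ",
    "s3": "RSTVWYACD",
    "s4": "QQQQQQQQQ",
}
edges = build_network(toy, min_len=4)
print(sorted(edges))
```

Lowering `min_len` adds edges (here, `s1` and `s3` become linked at `min_len=3`), which is why the choice of a "significant" match length matters for the resulting community structure.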
Bidirectional Retroviral Integration Site PCR Methodology and Quantitative Data Analysis Workflow.
Suryawanshi, Gajendra W; Xu, Song; Xie, Yiming; Chou, Tom; Kim, Namshin; Chen, Irvin S Y; Kim, Sanggu
2017-06-14
Integration Site (IS) assays are a critical component of the study of retroviral integration sites and their biological significance. In recent retroviral gene therapy studies, IS assays, in combination with next-generation sequencing, have been used as a cell-tracking tool to characterize clonal stem cell populations sharing the same IS. For the accurate comparison of repopulating stem cell clones within and across different samples, the detection sensitivity, data reproducibility, and high-throughput capacity of the assay are among the most important assay qualities. This work provides a detailed protocol and data analysis workflow for bidirectional IS analysis. The bidirectional assay can simultaneously sequence both upstream and downstream vector-host junctions. Compared to conventional unidirectional IS sequencing approaches, the bidirectional approach significantly improves IS detection rates and the characterization of integration events at both ends of the target DNA. The data analysis pipeline described here accurately identifies and enumerates identical IS sequences through multiple steps of comparison that map IS sequences onto the reference genome and determine sequencing errors. Using an optimized assay procedure, we have recently published the detailed repopulation patterns of thousands of Hematopoietic Stem Cell (HSC) clones following transplant in rhesus macaques, demonstrating for the first time the precise time point of HSC repopulation and the functional heterogeneity of HSCs in the primate system. The following protocol describes the step-by-step experimental procedure and data analysis workflow that accurately identifies and quantifies identical IS sequences.
Cavanagh, Jorunn Pauline; Hjerde, Erik; Holden, Matthew T G; Kahlke, Tim; Klingenberg, Claus; Flægstad, Trond; Parkhill, Julian; Bentley, Stephen D; Sollid, Johanna U Ericson
2014-11-01
Staphylococcus haemolyticus is an emerging cause of nosocomial infections, primarily affecting immunocompromised patients. A comparative genomic analysis was performed on clinical S. haemolyticus isolates to investigate their genetic relationship and explore the coding sequences with respect to antimicrobial resistance determinants and putative hospital adaptation. Whole-genome sequencing was performed on 134 isolates of S. haemolyticus from geographically diverse origins (Belgium, 2; Germany, 10; Japan, 13; Norway, 54; Spain, 2; Switzerland, 43; UK, 9; USA, 1). Each genome was individually assembled. Protein coding sequences (CDSs) were predicted and homologous genes were categorized into three types: Type I, core genes, homologues present in all strains; Type II, unique core genes, homologues shared by only a subgroup of strains; and Type III, unique genes, strain-specific CDSs. The phylogenetic relationship between the isolates was built from variable sites in the form of single nucleotide polymorphisms (SNPs) in the core genome and used to construct a maximum likelihood phylogeny. SNPs in the genome core regions divided the isolates into one major group of 126 isolates and one minor group of isolates with highly diverse genomes. The major group was further subdivided into seven clades (A-G), of which four (A-D) encompassed isolates only from Europe. Antimicrobial multiresistance was observed in 77.7% of the collection. High levels of homologous recombination were detected in genes involved in adherence, staphylococcal host adaptation and bacterial cell communication. The presence of several successful and highly resistant clones underlines the adaptive potential of this opportunistic pathogen. © The Author 2014. Published by Oxford University Press on behalf of the British Society for Antimicrobial Chemotherapy.
Cavanagh, Jorunn Pauline; Hjerde, Erik; Holden, Matthew T. G.; Kahlke, Tim; Klingenberg, Claus; Flægstad, Trond; Parkhill, Julian; Bentley, Stephen D.; Sollid, Johanna U. Ericson
2014-01-01
Objectives: Staphylococcus haemolyticus is an emerging cause of nosocomial infections, primarily affecting immunocompromised patients. A comparative genomic analysis was performed on clinical S. haemolyticus isolates to investigate their genetic relationship and explore the coding sequences with respect to antimicrobial resistance determinants and putative hospital adaptation. Methods: Whole-genome sequencing was performed on 134 isolates of S. haemolyticus from geographically diverse origins (Belgium, 2; Germany, 10; Japan, 13; Norway, 54; Spain, 2; Switzerland, 43; UK, 9; USA, 1). Each genome was individually assembled. Protein coding sequences (CDSs) were predicted and homologous genes were categorized into three types: Type I, core genes, homologues present in all strains; Type II, unique core genes, homologues shared by only a subgroup of strains; and Type III, unique genes, strain-specific CDSs. The phylogenetic relationship between the isolates was built from variable sites in the form of single nucleotide polymorphisms (SNPs) in the core genome and used to construct a maximum likelihood phylogeny. Results: SNPs in the genome core regions divided the isolates into one major group of 126 isolates and one minor group of isolates with highly diverse genomes. The major group was further subdivided into seven clades (A–G), of which four (A–D) encompassed isolates only from Europe. Antimicrobial multiresistance was observed in 77.7% of the collection. High levels of homologous recombination were detected in genes involved in adherence, staphylococcal host adaptation and bacterial cell communication. Conclusions: The presence of several successful and highly resistant clones underlines the adaptive potential of this opportunistic pathogen. PMID:25038069
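The three-way gene categorization in this abstract (Type I present in all strains, Type II shared by only a subgroup, Type III strain-specific) can be sketched from a presence/absence table. The sketch below is a hedged illustration only: the function name, the input layout, and the toy strain and gene names are assumptions, and it presumes homologue clustering has already been done by some other tool.

```python
from __future__ import annotations


def categorize(presence: dict[str, set[str]], strains: set[str]) -> dict[str, str]:
    """Assign each homologue group to Type I, II or III by how many strains carry it."""
    types = {}
    for gene, carriers in presence.items():
        n = len(carriers & strains)
        if n == 0:
            continue  # not found in any strain under study
        if n == len(strains):
            types[gene] = "Type I (core)"
        elif n == 1:
            types[gene] = "Type III (strain-specific)"
        else:
            types[gene] = "Type II (shared by a subgroup)"
    return types


# Hypothetical toy data: three strains, three homologue groups.
strains = {"iso1", "iso2", "iso3"}
presence = {
    "geneA": {"iso1", "iso2", "iso3"},
    "geneB": {"iso1", "iso3"},
    "geneC": {"iso2"},
}
result = categorize(presence, strains)
for gene in sorted(result):
    print(gene, result[gene])
```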
Gucciardo, Sébastian; Wisniewski, Jean-Pierre; Brewin, Nicholas J; Bornemann, Stephen
2007-01-01
The cDNAs encoding three germin-like proteins (PsGER1, PsGER2a, and PsGER2b) were isolated from Pisum sativum. The coding sequence of PsGER1 transiently expressed in tobacco leaves gave a protein with superoxide dismutase activity but no detectable oxalate oxidase activity according to in-gel activity stains. The transient expression of wheat germin gf-2.8 oxalate oxidase showed oxalate oxidase but no superoxide dismutase activity under the same conditions. The superoxide dismutase activity of PsGER1 was resistant to high temperature, denaturation by detergent, and high concentrations of hydrogen peroxide. In salt-stressed pea roots, a heat-resistant superoxide dismutase activity was observed with an electrophoretic mobility similar to that of the PsGER1 protein, but this activity was below the detection limit in non-stressed or H2O2-stressed pea roots. Oxalate oxidase activity was not detected in either pea roots or nodules. Following in situ hybridization in developing pea nodules, PsGER1 transcript was detected in expanding cells just proximal to the meristematic zone and also in the epidermis, but to a lesser extent. PsGER1 is the first known germin-like protein with superoxide dismutase activity to be associated with nodules. It shared protein sequence identity with the N-terminal sequence of a putative plant receptor for rhicadhesin, a bacterial attachment protein. However, its primary location in nodules suggests functional roles other than as a rhicadhesin receptor required for the first stage of bacterial attachment to root hairs.
De novo assembly and phasing of a Korean human genome.
Seo, Jeong-Sun; Rhie, Arang; Kim, Junsoo; Lee, Sangjin; Sohn, Min-Hwan; Kim, Chang-Uk; Hastie, Alex; Cao, Han; Yun, Ji-Young; Kim, Jihye; Kuk, Junho; Park, Gun Hwa; Kim, Juhyeok; Ryu, Hanna; Kim, Jongbum; Roh, Mira; Baek, Jeonghun; Hunkapiller, Michael W; Korlach, Jonas; Shin, Jong-Yeon; Kim, Changhoon
2016-10-13
Advances in genome assembly and phasing provide an opportunity to investigate the diploid architecture of the human genome and reveal the full range of structural variation across population groups. Here we report the de novo assembly and haplotype phasing of the Korean individual AK1 (ref. 1) using single-molecule real-time sequencing, next-generation mapping, microfluidics-based linked reads, and bacterial artificial chromosome (BAC) sequencing approaches. Single-molecule sequencing coupled with next-generation mapping generated a highly contiguous assembly, with a contig N50 size of 17.9 Mb and a scaffold N50 size of 44.8 Mb, resolving 8 chromosomal arms into single scaffolds. The de novo assembly, along with local assemblies and spanning long reads, closes 105 and extends into 72 out of 190 euchromatic gaps in the reference genome, adding 1.03 Mb of previously intractable sequence. High concordance between the assembly and paired-end sequences from 62,758 BAC clones provides strong support for the robustness of the assembly. We identify 18,210 structural variants by direct comparison of the assembly with the human reference, identifying thousands of breakpoints that, to our knowledge, have not been reported before. Many of the insertions are reflected in the transcriptome and are shared across the Asian population. We performed haplotype phasing of the assembly with short reads, long reads and linked reads from whole-genome sequencing and with short reads from 31,719 BAC clones, thereby achieving phased blocks with an N50 size of 11.6 Mb. Haplotigs assembled from single-molecule real-time reads assigned to haplotypes on phased blocks covered 89% of genes. The haplotigs accurately characterized the hypervariable major histocompatibility complex region as well as demonstrating allele configuration in clinically relevant genes such as CYP2D6. This work presents the most contiguous diploid human genome assembly so far, with extensive investigation of unreported and Asian-specific structural variants, and high-quality haplotyping of clinically relevant alleles for precision medicine.
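The contig, scaffold and phased-block N50 values quoted in this abstract follow the standard N50 definition: the length L such that pieces of length L or longer together cover at least half of the total assembled length. A minimal Python sketch of that calculation is shown below; the function name and the toy lengths are assumptions for illustration and are not taken from the AK1 assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    if not lengths:
        raise ValueError("n50 requires at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest piece down until half the total length is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units).
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))
```

In the toy example the total is 290, so the two longest pieces (80 + 70 = 150) already cover half of it, and the N50 is the shorter of those two.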
Berruezo, Florencia; de Souza, Flávio S. J.; Picca, Pablo I.; Nemirovsky, Sergio I.; Martínez Tosar, Leandro; Rivero, Mercedes; Mentaberry, Alejandro N.
2017-01-01
MicroRNAs (miRNAs) are short, single stranded RNA molecules that regulate the stability and translation of messenger RNAs in diverse eukaryotic groups. Several miRNA genes are of ancient origin and have been maintained in the genomes of animal and plant taxa for hundreds of millions of years, playing key roles in development and physiology. In the last decade, genome and small RNA (sRNA) sequencing of several plant species have helped unveil the evolutionary history of land plants. Among these, the fern group (monilophytes) occupies a key phylogenetic position, as it represents the closest extant cousin taxon of seed plants, i.e. gymno- and angiosperms. However, in spite of their evolutionary, economic and ecological importance, no fern genome has been sequenced yet and few genomic resources are available for this group. Here, we sequenced the small RNA fraction of an epiphytic South American fern, Pleopeltis minima (Polypodiaceae), and compared it to plant miRNA databases, allowing for the identification of miRNA families that are shared by all land plants, shared by all vascular plants (tracheophytes) or shared by euphyllophytes (ferns and seed plants) only. Using the recently described transcriptome of another fern, Lygodium japonicum, we also estimated the degree of conservation of fern miRNA targets in relation to other plant groups. Our results pinpoint the origin of several miRNA families in the land plant evolutionary tree with more precision and are a resource for future genomic and functional studies of fern miRNAs. PMID:28494025
Lum, J. Koji; McIntyre, James K.; Greger, Douglas L.; Huffman, Kirk W.; Vilar, Miguel G.
2006-01-01
Recent analyses of global pig populations revealed strict correlations between mtDNA phylogenies and geographic locations. An exception was the monophyletic “Pacific clade” (PC) of pigs not previously linked to any specific location. We examined mtDNA sequences of two varieties of Vanuatu sacred pigs, the male pseudohermaphroditic Narave from the island of Malo (n = 9) and the hairless Kapia from the island of Tanna (n = 9), as well as control pigs (n = 21) from the islands of Malo, Tanna, and Epi and compared them with GenBank sequences to determine (i) the distribution of PC and introduced domestic lineages within Vanuatu, (ii) relationship between the Narave and Kapia, and (iii) origin of the PC. All of the Narave share two PC mtDNA sequences, one of which matches the sequence of a Narave collected in 1927, consistent with an unbroken maternal descent of these intersex pigs from the original pigs brought to Vanuatu 3,200 years ago. One-third of the Kapia share a single PC lineage also found in the Narave. The remaining Kapia lineages are associated with recently introduced, globally distributed domestic breeds. The predominant Narave lineage is also shared with two wild boars from Vietnam. These data suggest that PC pigs were recently domesticated within Southeast Asia and dispersed during the human colonization of Remote Oceania associated with the Lapita cultural complex. More extensive sampling of Southeast Asian wild boar diversity may refine the location of Pacific pig domestication and potentially the proximate homeland of the Lapita cultural complex. PMID:17088556
Mendes, Lucas William; Taketani, Rodrigo Gouvêa; Navarrete, Acácio Aparecido; Tsai, Siu Mui
2012-06-01
This study focused on the structure and composition of archaeal communities in sediments of tropical mangroves in order to obtain sufficient insight into two Brazilian sites from different locations (one pristine and another located in an urban area) and at different depth levels from the surface. Terminal restriction fragment length polymorphism (T-RFLP) of PCR-amplified 16S rRNA gene fragments was used to scan the archaeal community structure, and 16S rRNA gene clone libraries were used to determine the community composition. Redundancy analysis of T-RFLP patterns revealed differences in archaeal community structure according to location, depth and soil attributes. Parameters such as pH, organic matter, potassium and magnesium presented significant correlation with general community structure. Furthermore, phylogenetic analysis revealed a community composition distributed differently according to depth where, in shallow samples, 74.3% of sequences were affiliated with Euryarchaeota and 25.7% were shared between Crenarchaeota and Thaumarchaeota, while for the deeper samples, 24.3% of the sequences were affiliated with Euryarchaeota and 75.7% with Crenarchaeota and Thaumarchaeota. Archaeal diversity measurements based on 16S rRNA gene clone libraries decreased with increasing depth and there was a greater difference between depths (<18% of sequences shared) than sites (>25% of sequences shared). Taken together, our findings indicate that mangrove ecosystems support a diverse archaeal community that may be involved in nutrient cycling and is affected by sediment properties, depth and location. Copyright © 2012 Institut Pasteur. Published by Elsevier Masson SAS. All rights reserved.
Berruezo, Florencia; de Souza, Flávio S J; Picca, Pablo I; Nemirovsky, Sergio I; Martínez Tosar, Leandro; Rivero, Mercedes; Mentaberry, Alejandro N; Zelada, Alicia M
2017-01-01
MicroRNAs (miRNAs) are short, single stranded RNA molecules that regulate the stability and translation of messenger RNAs in diverse eukaryotic groups. Several miRNA genes are of ancient origin and have been maintained in the genomes of animal and plant taxa for hundreds of millions of years, playing key roles in development and physiology. In the last decade, genome and small RNA (sRNA) sequencing of several plant species have helped unveil the evolutionary history of land plants. Among these, the fern group (monilophytes) occupies a key phylogenetic position, as it represents the closest extant cousin taxon of seed plants, i.e. gymno- and angiosperms. However, in spite of their evolutionary, economic and ecological importance, no fern genome has been sequenced yet and few genomic resources are available for this group. Here, we sequenced the small RNA fraction of an epiphytic South American fern, Pleopeltis minima (Polypodiaceae), and compared it to plant miRNA databases, allowing for the identification of miRNA families that are shared by all land plants, shared by all vascular plants (tracheophytes) or shared by euphyllophytes (ferns and seed plants) only. Using the recently described transcriptome of another fern, Lygodium japonicum, we also estimated the degree of conservation of fern miRNA targets in relation to other plant groups. Our results pinpoint the origin of several miRNA families in the land plant evolutionary tree with more precision and are a resource for future genomic and functional studies of fern miRNAs.
Mercenaro, Luca; Nieddu, Giovanni; Porceddu, Andrea; Pezzotti, Mario; Camiolo, Salvatore
2017-01-01
The genetic diversity among grapevine (Vitis vinifera L.) cultivars that underlies differences in agronomic performance and wine quality reflects the accumulation of single nucleotide polymorphisms (SNPs) and small indels as well as larger genomic variations. A combination of high throughput sequencing and mapping against the grapevine reference genome allows the creation of comprehensive sequence variation maps. We used next generation sequencing and bioinformatics to generate an inventory of SNPs and small indels in four widely cultivated Sardinian grape cultivars (Bovale sardo, Cannonau, Carignano and Vermentino). More than 3,200,000 SNPs were identified with high statistical confidence. Some of the SNPs caused the appearance of premature stop codons and thus identified putative pseudogenes. The analysis of SNP distribution along chromosomes led to the identification of large genomic regions with uninterrupted series of homozygous SNPs. We used a digital comparative genomic hybridization approach to identify 6526 genomic regions with significant differences in copy number among the four cultivars compared to the reference sequence, including 81 regions shared between all four cultivars and 4953 specific to single cultivars (representing 1.2 and 75.9% of total copy number variation, respectively). Reads mapping at a distance that was not compatible with the insert size were used to identify a dataset of putative large deletions with cultivar Cannonau revealing the highest number. The analysis of genes mapping to these regions provided a list of candidates that may explain some of the phenotypic differences among the Bovale sardo, Cannonau, Carignano and Vermentino cultivars. PMID:28775732
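The percentages quoted for copy-number regions in this abstract can be checked directly from the reported counts (81 shared and 4953 cultivar-specific regions out of 6526). The short Python sketch below only reproduces that arithmetic; the variable names are illustrative.

```python
# Counts as reported in the abstract.
total_regions = 6526
shared_by_all_four = 81
specific_to_one_cultivar = 4953

# Share of total copy number variation, as percentages rounded to one decimal.
shared_pct = round(100 * shared_by_all_four / total_regions, 1)
specific_pct = round(100 * specific_to_one_cultivar / total_regions, 1)

print(shared_pct, specific_pct)
```

The remaining regions (6526 - 81 - 4953 = 1492) would then be those shared by two or three cultivars, assuming every region falls into exactly one of these groups.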
Lindahl, Susanne; Söderlund, Robert; Frosth, Sara; Pringle, John; Båverud, Viveca; Aspán, Anna
2011-11-21
Strangles is a serious respiratory disease in horses caused by Streptococcus equi subspecies equi (S. equi). Transmission of the disease occurs by direct contact with an infected horse or contaminated equipment. Genetically, S. equi strains are highly homogenous and differentiation of strains has proven difficult. However, the S. equi M-protein SeM contains a variable N-terminal region and has been proposed as a target gene to distinguish between different strains of S. equi and determine the source of an outbreak. In this study, strains of S. equi (n=60) from 32 strangles outbreaks in Sweden during 1998-2003 and 2008-2009 were genetically characterized by sequencing the SeM protein gene (seM), and by pulsed-field gel electrophoresis (PFGE). Swedish strains belonged to 10 different seM types, of which five have not previously been described. Most were identical or highly similar to allele types from strangles outbreaks in the UK. Outbreaks in 2008/2009 sharing the same seM type were associated by geographic location and/or type of usage of the horses (racing stables). Sequencing of the seM gene generally agreed with pulsed-field gel electrophoresis profiles. Our data suggest that seM sequencing as an epidemiological tool is supported by the agreement between seM and PFGE and that sequencing of the SeM protein gene is more sensitive than PFGE in discriminating strains of S. equi. Copyright © 2011 Elsevier B.V. All rights reserved.
Kristin Vanderbilt; John H. Porter; Sheng-Shan Lu; Nic Bertrand; David Blankman; Xuebing Guo; Honglin He; Don Henshaw; Karpjoo Jeong; Eun-Shik Kim; Chau-Chin Lin; Margaret O'Brien; Takeshi Osawa; Éamonn Ó Tuama; Wen Su; Haibo Yang
2017-01-01
Shared ecological data have the potential to revolutionize ecological research just as shared genetic sequence data have done for biological research. However, for ecological data to be useful, it must first be discoverable. A broad-scale research topic may require that a researcher be able to locate suitable data from a variety of global, regional and national data...
2015-06-03
demonstrating its immunogenicity in humans. PdSP15 sequence and structure show no homology to mammalian proteins, further demonstrating its potential...sequence or structure homology to known human proteins The protective salivary antigen PdSP15 shares sequence homology only to the small odorant binding...salivary proteins PpSP15 and PsSP15, respectively (Fig. 4B). To exclude any structural similarities to human proteins, the crystal structure of PdPS15
Osman, Wan Adnawani Meor; van Berkum, Peter; León-Barrios, Milagros; Velázquez, Encarna; Elia, Patrick; Tian, Rui; Ardley, Julie; Gollagher, Margaret; Seshadri, Rekha; Reddy, T B K; Ivanova, Natalia; Woyke, Tanja; Pati, Amrita; Markowitz, Victor; Baeshen, Mohamed N; Baeshen, Naseebh Nabeeh; Kyrpides, Nikos; Reeve, Wayne
2017-01-01
10.1601/nm.1335 Mlalz-1 (INSDC = ATZD00000000) is an aerobic, motile, Gram-negative, non-spore-forming rod that was isolated from an effective nitrogen-fixing nodule of Medicago laciniata (L.) Miller from a soil sample collected near the town of Guatiza on the island of Lanzarote, the Canary Islands, Spain. This strain nodulates and forms an effective symbiosis with the highly specific host M. laciniata. This rhizobial genome was sequenced as part of the DOE Joint Genome Institute 2010 Genomic Encyclopedia for Bacteria and Archaea-Root Nodule Bacteria (GEBA-RNB) sequencing project. Here the features of 10.1601/nm.1335 Mlalz-1 are described, together with high-quality permanent draft genome sequence information and annotation. The 6,664,116 bp high-quality draft genome is arranged in 99 scaffolds of 100 contigs, containing 6314 protein-coding genes and 74 RNA-only encoding genes. Strain Mlalz-1 is closely related to 10.1601/nm.1335 10.1601/strainfinder?urlappend=%3Fid%3DIAM+12611 T, 10.1601/nm.1334 A 321 T and 10.1601/nm.17831 10.1601/strainfinder?urlappend=%3Fid%3DORS+1407 T, based on 16S rRNA gene sequences. gANI values of ≥98.1% support the classification of strain Mlalz-1 as 10.1601/nm.1335. Nodulation of M. laciniata requires a specific nodC allele, and the nodC gene of strain Mlalz-1 shares ≥98% sequence identity with nodC of M. laciniata-nodulating 10.1601/nm.1328 strains, but ≤93% with nodC of 10.1601/nm.1328 strains that nodulate other Medicago species. Strain Mlalz-1 is unique among sequenced 10.1601/nm.1335 strains in possessing genes encoding components of a T2SS and in having two versions of the adaptive acid tolerance response lpiA-acvB operon. In 10.1601/nm.1334 strain 10.1601/strainfinder?urlappend=%3Fid%3DWSM+419, lpiA is essential for enhancing survival in lethal acid conditions. The second copy of the lpiA-acvB operon of strain Mlalz-1 has highest sequence identity (>96%) with that of 10.1601/nm.1334 strains, which suggests genetic recombination between strain Mlalz-1 and 10.1601/nm.1334 and the horizontal gene transfer of lpiA-acvB.
Rat prostatic steroid binding protein: DNA sequence and transcript maps of the two C3 genes.
Hurst, H C; Parker, M G
1983-01-01
In the rat there are two non-allelic genes, C3(1) and C3(2), for the C3 polypeptide of prostatic steroid binding protein. We have cloned and sequenced both genes and show that only C3(1) is responsible for the production of authentic C3. Although there is a marked difference in their transcriptional activity, the two genes share extensive DNA sequence homology, there being only one base difference from nucleotide -235 to within the first intron. Transcript mapping has shown that there are two distinct C3 transcripts which share a unique 3' terminus but have 5' termini 38 bases apart, each preceded by a 'TATA' box homology. Interestingly, an identical repetitive element is present just upstream of both genes. Both families of transcripts, which are produced in a ratio of 18:1, are coordinately regulated by testosterone. PMID:6685625
Pearson, Bruce M; Louwen, Rogier; van Baarlen, Peter; van Vliet, Arnoud H M
2015-09-02
CRISPR (clustered regularly interspaced palindromic repeats)-Cas (CRISPR-associated) systems are sequence-specific adaptive defenses against phages and plasmids which are widespread in prokaryotes. Here we have studied whether phylogenetic relatedness or sharing of environmental niches affects the distribution and dissemination of Type II CRISPR-Cas systems, first in 132 bacterial genomes from 15 phylogenetic classes, ranging from Proteobacteria to Actinobacteria. There was clustering of distinct Type II CRISPR-Cas systems in phylogenetically distinct genera with varying G+C%, which share environmental niches. The distribution of CRISPR-Cas within a genus was studied using a large collection of genome sequences of the closely related Campylobacter species Campylobacter jejuni (N = 3,746) and Campylobacter coli (N = 486). The Cas gene cas9 and CRISPR-repeat are almost universally present in C. jejuni genomes (98.0% positive) but relatively rare in C. coli genomes (9.6% positive). Campylobacter jejuni and agricultural C. coli isolates share the C. jejuni CRISPR-Cas system, which is closely related to, but distinct from the C. coli CRISPR-Cas system found in C. coli isolates from nonagricultural sources. Analysis of the genomic position of CRISPR-Cas insertion suggests that the C. jejuni-type CRISPR-Cas has been transferred to agricultural C. coli. Conversely, the absence of the C. coli-type CRISPR-Cas in agricultural C. coli isolates may be due to these isolates not sharing the same environmental niche, and may be affected by farm hygiene and biosecurity practices in the agricultural sector. Finally, many CRISPR spacer alleles were linked with specific multilocus sequence types, suggesting that these can assist molecular epidemiology applications for C. jejuni and C. coli. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Kaphingst, Kimberly A; Ivanovich, Jennifer; Lyons, Sarah; Biesecker, Barbara; Dresser, Rebecca; Elrick, Ashley; Matsen, Cindy; Goodman, Melody
2018-01-29
The growing importance of genome sequencing means that patients will increasingly face decisions regarding what results they would like to learn. The present study examined psychological and clinical factors that might affect these preferences. 1,080 women diagnosed with breast cancer at age 40 or younger completed an online survey. We assessed their interest in learning various types of genome sequencing results: risk of preventable disease or unpreventable disease, cancer treatment response, uncertain meaning, risk to relatives' health, and ancestry/physical traits. Multivariable logistic regression was used to examine whether being "very" interested in each result type was associated with clinical factors: BRCA1/2 mutation status, prior genetic testing, family history of breast cancer, and psychological factors: cancer recurrence worry, genetic risk worry, future orientation, health information orientation, and genome sequencing knowledge. The proportion of respondents who were very interested in learning each type of result ranged from 16% to 77%. In all multivariable models, those who were very interested in learning a result type had significantly higher knowledge about sequencing benefits, greater genetic risks worry, and stronger health information orientation compared to those with less interest (p-values < .05). Our findings indicate that high interest in return of various types of genome sequencing results was more closely related to psychological factors. Shared decision-making approaches that increase knowledge about genome sequencing and incorporate patient preferences for health information and learning about genetic risks may help support patients' informed choices about learning different types of sequencing results. © Society of Behavioral Medicine 2018.
Genomic and genetic analyses of diversity and plant interactions of Pseudomonas fluorescens
Silby, Mark W; Cerdeño-Tárraga, Ana M; Vernikos, Georgios S; Giddens, Stephen R; Jackson, Robert W; Preston, Gail M; Zhang, Xue-Xian; Moon, Christina D; Gehrig, Stefanie M; Godfrey, Scott AC; Knight, Christopher G; Malone, Jacob G; Robinson, Zena; Spiers, Andrew J; Harris, Simon; Challis, Gregory L; Yaxley, Alice M; Harris, David; Seeger, Kathy; Murphy, Lee; Rutter, Simon; Squares, Rob; Quail, Michael A; Saunders, Elizabeth; Mavromatis, Konstantinos; Brettin, Thomas S; Bentley, Stephen D; Hothersall, Joanne; Stephens, Elton; Thomas, Christopher M; Parkhill, Julian; Levy, Stuart B; Rainey, Paul B; Thomson, Nicholas R
2009-01-01
Background: Pseudomonas fluorescens are common soil bacteria that can improve plant health through nutrient cycling, pathogen antagonism and induction of plant defenses. The genome sequences of strains SBW25 and Pf0-1 were determined and compared to each other and with P. fluorescens Pf-5. A functional genomic in vivo expression technology (IVET) screen provided insight into genes used by P. fluorescens in its natural environment and an improved understanding of the ecological significance of diversity within this species. Results: Comparisons of three P. fluorescens genomes (SBW25, Pf0-1, Pf-5) revealed considerable divergence: 61% of genes are shared, the majority located near the replication origin. Phylogenetic and average amino acid identity analyses showed a low overall relationship. A functional screen of SBW25 defined 125 plant-induced genes including a range of functions specific to the plant environment. Orthologues of 83 of these exist in Pf0-1 and Pf-5, with 73 shared by both strains. The P. fluorescens genomes carry numerous complex repetitive DNA sequences, some resembling Miniature Inverted-repeat Transposable Elements (MITEs). In SBW25, repeat density and distribution revealed 'repeat deserts' lacking repeats, covering approximately 40% of the genome. Conclusions: P. fluorescens genomes are highly diverse. Strain-specific regions around the replication terminus suggest genome compartmentalization. The genomic heterogeneity among the three strains is reminiscent of a species complex rather than a single species. That 42% of plant-inducible genes were not shared by all strains reinforces this conclusion and shows that ecological success requires specialized and core functions. The diversity also indicates the significant size of genetic information within the Pseudomonas pan genome. PMID:19432983
Opdahl, Lee James; Gonda, Michael G.
2018-01-01
The ability of ruminants to utilize cellulosic biomass is a result of the metabolic activities of symbiotic microbial communities that reside in the rumen. To gain further insight into this complex microbial ecosystem, a selection-based batch culturing approach was used to identify candidate cellulose-utilizing bacterial consortia. Prior to culturing with cellulose, rumen contents sampled from three beef cows maintained on a forage diet shared 252 Operational Taxonomic Units (OTUs), accounting for 41.6–50.0% of bacterial 16S rRNA gene sequences in their respective samples. Despite this high level of overlap, only one OTU was enriched in cellulose-supplemented cultures from all rumen samples. Otherwise, each set of replicate cellulose supplemented cultures originating from a sampled rumen environment was found to have a distinct bacterial composition. Two of the seven most enriched OTUs were closely matched to well-established rumen cellulose utilizers (Ruminococcus flavefaciens and Fibrobacter succinogenes), while the others did not show high nucleotide sequence identity to currently defined bacterial species. The latter were affiliated to Prevotella (1 OTU), Ruminococcaceae (3 OTUs), and the candidate phylum Saccharibacteria (1 OTU), respectively. While further investigations will be necessary to elucidate the metabolic function(s) of each enriched OTU, these results together further support cellulose utilization as a ruminal metabolic trait shared across vast phylogenetic distances, and that the rumen is an environment conducive to the selection of a broad range of microbial adaptations for the digestion of plant structural polysaccharides. PMID:29495256
Opdahl, Lee James; Gonda, Michael G; St-Pierre, Benoit
2018-02-24
The ability of ruminants to utilize cellulosic biomass is a result of the metabolic activities of symbiotic microbial communities that reside in the rumen. To gain further insight into this complex microbial ecosystem, a selection-based batch culturing approach was used to identify candidate cellulose-utilizing bacterial consortia. Prior to culturing with cellulose, rumen contents sampled from three beef cows maintained on a forage diet shared 252 Operational Taxonomic Units (OTUs), accounting for 41.6-50.0% of bacterial 16S rRNA gene sequences in their respective samples. Despite this high level of overlap, only one OTU was enriched in cellulose-supplemented cultures from all rumen samples. Otherwise, each set of replicate cellulose supplemented cultures originating from a sampled rumen environment was found to have a distinct bacterial composition. Two of the seven most enriched OTUs were closely matched to well-established rumen cellulose utilizers ( Ruminococcus flavefaciens and Fibrobacter succinogenes ), while the others did not show high nucleotide sequence identity to currently defined bacterial species. The latter were affiliated to Prevotella (1 OTU), Ruminococcaceae (3 OTUs), and the candidate phylum Saccharibacteria (1 OTU), respectively. While further investigations will be necessary to elucidate the metabolic function(s) of each enriched OTU, these results together further support cellulose utilization as a ruminal metabolic trait shared across vast phylogenetic distances, and that the rumen is an environment conducive to the selection of a broad range of microbial adaptations for the digestion of plant structural polysaccharides.
USDA-ARS's Scientific Manuscript database
Background: Next-generation sequencing (NGS) of bacterial isolates has emerged as a valuable tool for tracking an outbreak source. Between 2009 and 2011, clinical isolates of Salmonella Typhimurium sharing the JPXX01.0014 (XbaI) PFGE type were isolated across the U.S. The initial isolates were asso...
Kitajima, Masaaki; Iker, Brandon C; Magill-Collins, Anne; Gaither, Marlene; Stoehr, James D; Gerba, Charles P
2017-06-01
Toilet solid waste samples collected from five outbreaks among rafters in the Grand Canyon were subjected to sequencing analysis of the norovirus partial capsid gene. The results revealed that a GI.3 strain was associated with one outbreak, whereas the other outbreaks were caused by GII.5 strains whose sequences shared >98.9% homology.
Tuan, Pham Anh; Kim, Jae Kwang; Lee, Sanghyun; Chae, Soo Cheon; Park, Sang Un
2012-12-05
Riboflavin (vitamin B2) is the universal precursor of the coenzymes flavin mononucleotide and flavin adenine dinucleotide--cofactors that are essential for the activity of a wide variety of metabolic enzymes in animals, plants, and microbes. Using the RACE PCR approach, cDNAs encoding lumazine synthase (McLS) and riboflavin synthase (McRS), which catalyze the last two steps in the riboflavin biosynthetic pathway, were cloned from bitter melon (Momordica charantia), a popular vegetable crop in Asia. Amino acid sequence alignments indicated that McLS and McRS share high sequence identity with other orthologous genes and carry an N-terminal extension, which is reported to be a plastid-targeting sequence. Organ expression analysis using quantitative real-time RT PCR showed that McLS and McRS were constitutively expressed in M. charantia, with the strongest expression levels observed during the last stage of fruit ripening (stage 6). This correlated with the highest level of riboflavin content, which was detected during ripening stage 6 by HPLC analysis. McLS and McRS were highly expressed in the young leaves and flowers, whereas roots exhibited the highest accumulation of riboflavin. The cloning and characterization of McLS and McRS from M. charantia may aid the metabolic engineering of vitamin B2 in crops.
Hsieh, S L; Liu, R W; Wu, C H; Cheng, W T; Kuo, Ching-Ming
2003-12-01
A cDNA sequence of stearoyl-CoA desaturase (SCD) was determined from zebrafish (Danio rerio) and compared to the corresponding genes in several teleosts. Zebrafish SCD cDNA has a size of 1,061 bp, encodes a polypeptide of 325 amino acids, and shares 88, 85, 84, and 83% similarities with tilapia (Oreochromis mossambicus), grass carp (Ctenopharyngodon idella), common carp (Cyprinus carpio), and milkfish (Chanos chanos), respectively. This 1,061 bp sequence specifies a protein that, in common with other fatty acid desaturases, contains three histidine boxes, believed to be involved in catalysis. These observations suggested that SCD genes are highly conserved. In addition, an oligonucleotide probe complementary to zebrafish SCD mRNA was hybridized to mRNA of approximately 396 bases with Northern blot analysis. The Northern blot and RT-PCR analyses showed that the SCD mRNA was expressed predominantly in the liver, intestine, gill, and muscle, while a lower level was found in the brain. Furthermore, we utilized whole-mount in situ hybridization and real-time quantitative RT-PCR to identify expression of the zebrafish SCD gene at five different stages of development. This revealed that very high levels of transcripts were found in zebrafish at all stages during embryogenesis and early development. Copyright 2003 Wiley-Liss, Inc.
Strauss, E G; Levinson, R; Rice, C M; Dalrymple, J; Strauss, J H
1988-05-01
We have sequenced the nsP3 and nsP4 region of two alphaviruses, Ross River virus and O'Nyong-nyong virus, in order to examine these viruses for the presence or absence of an opal termination codon present between nsP3 and nsP4 in many alphaviruses. We found that Ross River virus possesses an in-phase opal termination codon between nsP3 and nsP4, whereas in O'Nyong-nyong virus this termination codon is replaced by an arginine codon. Previous studies have shown that two other alphaviruses, Sindbis virus and Middelburg virus, possess an opal termination codon separating nsP3 and nsP4 [E.G. Strauss, C.M. Rice, and J.H. Strauss (1983), Proc. Natl. Acad. Sci. USA 80, 5271-5275], whereas Semliki Forest virus possesses an arginine codon in lieu of the opal codon [K. Takkinen (1986), Nucleic Acids Res. 14, 5667-5682]. Thus, of the five alphaviruses examined to date, three possess the opal codon and two do not. Production of nsP4 requires readthrough of the opal codon in those alphaviruses that possess this termination codon and the function of the termination codon may be to regulate the amount of nsP4 produced. It is an open question then as to whether alphaviruses with no termination codon use other mechanisms to regulate the activity of this gene. The nsP4s of these five alphaviruses are highly conserved, sharing 71-76% amino acid sequence similarity, and all five contain the Gly-Asp-Asp motif found in many RNA virus replicases. The nsP3s are somewhat less conserved, sharing 52-73% amino acid sequence similarity throughout most of the protein, but each possesses a nonconserved C-terminal domain of 134 to 246 amino acids of unknown function.
Reference-free comparative genomics of 174 chloroplasts.
Kua, Chai-Shian; Ruan, Jue; Harting, John; Ye, Cheng-Xi; Helmus, Matthew R; Yu, Jun; Cannon, Charles H
2012-01-01
Direct analysis of unassembled genomic data could greatly increase the power of short read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxonomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: tip = unique to a single genome and group = shared by a subset of genomes. Prior to assembly, we found that ~18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single copy and duplicated sequence was basal among green plants, independent of photosynthesis and mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including among several species without the IR regions, indicating a crucial functional role of this duplication.
Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied genomes and rapid discovery of informative candidate regions.
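The tip/group distinction described in this abstract can be illustrated with a minimal sketch. It assumes plain strings stand in for genomes, and the function names (`kmers`, `classify_kmers`) and the extra "core" label for kmers present in every genome are hypothetical choices made for this illustration; they are not taken from the paper, which works on unassembled read data rather than finished sequences.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_kmers(genomes, k):
    """Label each kmer by how many genomes contain it.

    'tip'   = found in exactly one genome
    'group' = found in a proper subset of two or more genomes
    'core'  = found in every genome (label assumed for this sketch)
    """
    owners = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            owners[kmer].add(name)

    classes = {}
    for kmer, names in owners.items():
        if len(names) == 1:
            classes[kmer] = "tip"
        elif len(names) < len(genomes):
            classes[kmer] = "group"
        else:
            classes[kmer] = "core"
    return classes


# Toy input: three short made-up sequences, not real chloroplast data.
toy = {"a": "ACGTAC", "b": "ACGTTT", "c": "ACGGGG"}
labels = classify_kmers(toy, 3)
```

In this toy input, `ACG` occurs in all three sequences, `CGT` in two, and `GTA` in only one, so the three labels are each exercised once.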
Albayrak, Levent; Khanipov, Kamil; Pimenova, Maria; Golovko, George; Rojas, Mark; Pavlidis, Ioannis; Chumakov, Sergei; Aguilar, Gerardo; Chávez, Arturo; Widger, William R; Fofanov, Yuriy
2016-12-12
Low-abundance mutations in mitochondrial populations (mutations with minor allele frequency ≤ 1%) are associated with cancer, aging, and neurodegenerative disorders. While recent progress in high-throughput sequencing technology has significantly improved the heteroplasmy identification process, the ability of this technology to detect low-abundance mutations can be affected by the presence of similar sequences originating from nuclear DNA (nDNA). To determine to what extent nDNA can cause false positive low-abundance heteroplasmy calls, we have identified mitochondrial locations of all subsequences that are common or similar (one mismatch allowed) between nDNA and mitochondrial DNA (mtDNA). The analysis revealed up to a 25-fold variation in the lengths of the longest common and longest similar (one mismatch allowed) subsequences across the mitochondrial genome. The size of the longest subsequences shared between nDNA and mtDNA in several regions of the mitochondrial genome was found to be as low as 11 bases, which not only allows using these regions to design new, very specific PCR primers, but also supports the hypothesis of the non-random introduction of mtDNA into the human nuclear DNA. Analysis of the mitochondrial locations of the subsequences shared between nDNA and mtDNA suggested that even very short (36 bases) single-end sequencing reads can be used to identify low-abundance variation in 20.4% of the mitochondrial genome. For longer (76 and 150 bases) reads, the proportion of the mitochondrial genome where nDNA presence will not interfere was found to be 44.5 and 67.9%, respectively, whereas low-abundance mutations at 100% of locations can be identified using single reads 417 bases long. This observation suggests that the analysis of low-abundance variations in mitochondrial populations can be extended to a variety of large data collections such as NCBI Sequence Read Archive, European Nucleotide Archive, The Cancer Genome Atlas, and International Cancer Genome Consortium.
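The dependence on read length reported in this abstract can be illustrated with a simplified sketch. It assumes exact matching only (the study also allows one mismatch), a linear rather than circular mitochondrial sequence, and a single strand; it also counts read start positions rather than genome positions covered. The function name `unambiguous_fraction` and the toy sequences are hypothetical.

```python
def unambiguous_fraction(mt_seq, nuclear_seq, read_len):
    """Fraction of mtDNA read windows of length read_len with no exact copy in nDNA.

    Simplifying assumptions: exact matches only, linear sequences,
    forward strand only.
    """
    nuclear_windows = {
        nuclear_seq[i:i + read_len]
        for i in range(len(nuclear_seq) - read_len + 1)
    }
    mt_windows = [
        mt_seq[i:i + read_len]
        for i in range(len(mt_seq) - read_len + 1)
    ]
    if not mt_windows:
        return 0.0
    clean = sum(1 for w in mt_windows if w not in nuclear_windows)
    return clean / len(mt_windows)


# Toy sequences sharing the stretch "CCCCGG"; not real mtDNA or nDNA.
mt = "AAAACCCCGGGG"
nuc = "TTTTCCCCGGTTTT"
short_reads = unambiguous_fraction(mt, nuc, 4)
long_reads = unambiguous_fraction(mt, nuc, 8)
```

With the toy input, three of the nine 4-base windows fall entirely inside the shared stretch, while no 8-base window does, which mirrors the qualitative trend in the abstract: longer reads leave a larger share of the genome free of nuclear interference.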
Reference-Free Comparative Genomics of 174 Chloroplasts
Kua, Chai-Shian; Ruan, Jue; Harting, John; Ye, Cheng-Xi; Helmus, Matthew R.; Yu, Jun; Cannon, Charles H.
2012-01-01
Direct analysis of unassembled genomic data could greatly increase the power of short read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxonomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: tip = unique to a single genome and group = shared by a subset of genomes. Prior to assembly, we found that ∼18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single copy and duplicated sequence was basal among green plants, independent of photosynthesis and mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including among several species without the IR regions, indicating a crucial functional role of this duplication.
Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied genomes and rapid discovery of informative candidate regions. PMID:23185288
Williams, Angela H; Sharma, Mamta; Thatcher, Louise F; Azam, Sarwar; Hane, James K; Sperschneider, Jana; Kidd, Brendan N; Anderson, Jonathan P; Ghosh, Raju; Garg, Gagan; Lichtenzveig, Judith; Kistler, H Corby; Shea, Terrance; Young, Sarah; Buck, Sally-Anne G; Kamphuis, Lars G; Saxena, Rachit; Pande, Suresh; Ma, Li-Jun; Varshney, Rajeev K; Singh, Karam B
2016-03-05
Soil-borne fungi of the Fusarium oxysporum species complex cause devastating wilt disease on many crops including legumes that supply human dietary protein needs across many parts of the globe. We present and compare draft genome assemblies for three legume-infecting formae speciales (ff. spp.): F. oxysporum f. sp. ciceris (Foc-38-1) and f. sp. pisi (Fop-37622), significant pathogens of chickpea and pea respectively, the world's second and third most important grain legumes, and lastly f. sp. medicaginis (Fom-5190a) for which we developed a model legume pathosystem utilising Medicago truncatula. Focusing on the identification of pathogenicity gene content, we leveraged the reference genomes of Fusarium pathogens F. oxysporum f. sp. lycopersici (tomato-infecting) and F. solani (pea-infecting) and their well-characterised core and dispensable chromosomes to predict genomic organisation in the newly sequenced legume-infecting isolates. Dispensable chromosomes are not essential for growth and in Fusarium species are known to be enriched in host-specificity and pathogenicity-associated genes. Comparative genomics of the publicly available Fusarium species revealed differential patterns of sequence conservation across F. oxysporum formae speciales, with legume-pathogenic formae speciales not exhibiting greater sequence conservation between them relative to non-legume-infecting formae speciales, possibly indicating the lack of a common ancestral source for legume pathogenicity. Combining predicted dispensable gene content with in planta expression in the model legume-infecting isolate, we identified small conserved regions and candidate effectors, four of which shared greatest similarity to proteins from another legume-infecting ff. spp. We demonstrate that distinction of core and potential dispensable genomic regions of novel F. oxysporum genomes is an effective tool to facilitate effector discovery and the identification of gene content possibly linked to host specificity. While the legume-infecting isolates did not share large genomic regions of pathogenicity-related content, smaller regions and candidate effector proteins were highly conserved, suggesting that they may play specific roles in inducing disease on legume hosts.
Aho-Corasick String Matching on Shared and Distributed Memory Parallel Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tumeo, Antonino; Villa, Oreste; Chavarría-Miranda, Daniel
String matching is at the core of many critical applications, including network intrusion detection systems, search engines, virus scanners, spam filters, DNA and protein sequencing, and data mining. For all of these applications string matching requires a combination of (sometimes all) the following characteristics: high and/or predictable performance, support for large data sets and flexibility of integration and customization. Many software based implementations targeting conventional cache-based microprocessors fail to achieve high and predictable performance requirements, while Field-Programmable Gate Array (FPGA) implementations and dedicated hardware solutions fail to support large data sets (dictionary sizes) and are difficult to integrate and customize. The advent of multicore, multithreaded, and GPU-based systems is opening the possibility for software based solutions to reach very high performance at a sustained rate. This paper compares several software-based implementations of the Aho-Corasick string searching algorithm for high performance systems. We discuss the implementation of the algorithm on several types of shared-memory high-performance architectures (Niagara 2, large x86 SMPs and Cray XMT), distributed memory with homogeneous processing elements (InfiniBand cluster of x86 multicores) and heterogeneous processing elements (InfiniBand cluster of x86 multicores with NVIDIA Tesla C10 GPUs). We describe in detail how each solution achieves the objectives of supporting large dictionaries, sustaining high performance, and enabling customization and flexibility using various data sets.
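For readers unfamiliar with the algorithm named in this record, the following is a minimal single-threaded sketch of Aho-Corasick dictionary matching. It is not the parallel shared-memory, cluster, or GPU code the report compares; the function names (`build_automaton`, `search`) and the list-of-dicts state layout are assumptions made for this illustration, and patterns are assumed to be non-empty.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and per-state output lists."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # failure link of each state
    output = [[]]  # patterns that end at each state

    # Phase 1: insert every pattern into the trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].append(pat)

    # Phase 2: breadth-first pass to set failure links.
    # Depth-1 states keep the root (0) as their failure link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            output[nxt] = output[nxt] + output[fail[nxt]]
    return goto, fail, output


def search(text, patterns):
    """Return (start_index, pattern) for every dictionary match in text."""
    goto, fail, output = build_automaton(patterns)
    state = 0
    hits = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists or we reach the root.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in output[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


matches = search("ushers", ["he", "she", "his", "hers"])
```

The text is scanned once regardless of dictionary size, which is why the dictionary (the automaton) rather than the input stream tends to dominate memory use in large-scale deployments such as those the report discusses.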
Kotoula, Vassiliki; Lyberopoulou, Aggeliki; Papadopoulou, Kyriaki; Charalambous, Elpida; Alexopoulou, Zoi; Gakou, Chryssa; Lakis, Sotiris; Tsolaki, Eleftheria; Lilakos, Konstantinos; Fountzilas, George
2015-01-01
Background and aim: Massively parallel sequencing (MPS) holds promise for expanding cancer translational research and diagnostics. As yet, it has been applied on paraffin DNA (FFPE) with commercially available highly multiplexed gene panels (100s of DNA targets), while custom panels of low multiplexing are used for re-sequencing. Here, we evaluated the performance of two highly multiplexed custom panels on FFPE DNA. Methods: Two custom multiplex amplification panels (B, 373 amplicons; T, 286 amplicons) were coupled with semiconductor sequencing on DNA samples from FFPE breast tumors and matched peripheral blood samples (n samples: 316; n libraries: 332). The two panels shared 37% DNA targets (common or shifted amplicons). Panel performance was evaluated in paired sample groups and quartets of libraries, where possible. Results: Amplicon read ratios yielded similar patterns per gene with the same panel in FFPE and blood samples; however, performance of common amplicons differed between panels (p<0.001). FFPE genotypes were compared for 1267 coding and non-coding variant replicates, 999 out of which (78.8%) were concordant in different paired sample combinations. Variant frequency was highly reproducible (Spearman’s rho 0.959). Repeatedly discordant variants were of high coverage / low frequency (p<0.001). Genotype concordance was (a) high, for intra-run duplicates with the same panel (mean±SD: 97.2±4.7, 95%CI: 94.8–99.7, p<0.001); (b) modest, when the same DNA was analyzed with different panels (mean±SD: 81.1±20.3, 95%CI: 66.1–95.1, p = 0.004); and (c) low, when different DNA samples from the same tumor were compared with the same panel (mean±SD: 59.9±24.0; 95%CI: 43.3–76.5; p = 0.282). Low coverage / low frequency variants were validated with Sanger sequencing even in samples with unfavourable DNA quality. Conclusions: Custom MPS may yield novel information on genomic alterations, provided that data evaluation is adjusted to tumor tissue FFPE DNA.
To this end, the eligibility of all amplicons, along with variant coverage and frequency, needs to be assessed. PMID:26039550
Genetic Relatedness among Hepatitis A Virus Strains Associated with Food-Borne Outbreaks
Vaughan, Gilberto; Xia, Guoliang; Forbi, Joseph C.; Purdy, Michael A.; Rossi, Lívia Maria Gonçalves; Spradling, Philip R.; Khudyakov, Yury E.
2013-01-01
The genetic characterization of hepatitis A virus (HAV) strains is commonly accomplished by sequencing subgenomic regions, such as the VP1/P2B junction. The HAV genome is not extensively variable, thus presenting opportunity for sharing sequences of subgenomic regions among genetically unrelated isolates. The degree of misrepresentation of phylogenetic relationships by subgenomic regions is especially important for tracking transmissions. Here, we analyzed whole-genome (WG) sequences of 101 HAV strains identified from 4 major multi-state, food-borne outbreaks of hepatitis A in the United States and from 14 non-outbreak-related HAV strains that shared identical VP1/P2B sequences with the outbreak strains. Although HAV strains with an identical VP1/P2B sequence were specific to each outbreak, WG sequences were different, with genetic diversity reaching 0.31% (mean 0.09%). Evaluation of different subgenomic regions did not identify any other section of the HAV genome that could accurately represent phylogenetic relationships observed using WG sequences. The identification of 2–3 dominant HAV strains in 3 out of 4 outbreaks indicates contamination of the implicated food items with a heterogeneous HAV population. However, analysis of intra-host HAV variants from eight patients involved in one outbreak showed that only a single sequence variant established infection in each patient. Four non-outbreak strains were found closely related to strains from 2 outbreaks, whereas ten were genetically different from the outbreak strains. Thus, accurate tracking of HAV strains can be accomplished using HAV WG sequences, while short subgenomic regions are useful for identification of transmissions only among cases with known epidemiological association. PMID:24223112
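The abstract reports a maximum and a mean genetic diversity but does not state how they were computed. One common way to obtain such figures is the pairwise proportion of differing sites (p-distance) over aligned sequences; the sketch below assumes that measure, and the function names (`p_distance`, `diversity_summary`) and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Percentage of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return 100.0 * diffs / len(a)


def diversity_summary(seqs):
    """Return (maximum, mean) pairwise p-distance in percent."""
    dists = [p_distance(a, b) for a, b in combinations(seqs, 2)]
    return max(dists), sum(dists) / len(dists)


# Toy aligned sequences, not HAV data: the second differs at one of ten sites.
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGTAC"]
max_div, mean_div = diversity_summary(toy)
```

Sequences that are identical over a short window, as in the first and third toy entries, contribute a distance of zero, which is the situation the abstract describes for strains sharing a VP1/P2B sequence while differing elsewhere in the genome.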
Swaminathan, Rajeswari; Huang, Yungui; Miller, Katherine; Pastore, Matthew; Hashimoto, Sayaka; Jacobson, Theodora; Mouhlas, Danielle; Lin, Simon
2018-01-01
The adoption rate of genome sequencing for clinical diagnostics has been steadily increasing, leading to the possibility of improvement in diagnostic yields. Although laboratories generate a summary clinical report, sharing raw genomic data with healthcare providers is equally important, both for secondary research studies and for a deeper analysis of the data itself, as seen in the efforts of organizations such as the American College of Medical Genetics and Genomics and the Global Alliance for Genomics and Health. Here, we aim to describe the existing protocol of genomic data sharing between a certified clinical laboratory and a healthcare provider and highlight some of the lessons learned. This study tracked and subsequently evaluated the data transfer workflow for 19 patients, all of whom consented to be part of this research study and visited the genetics clinic at a tertiary pediatric hospital between April 2016 and December 2016. Two of the most noticeable elements observed through this study are the manual validation steps and the discrepancies in patient identifiers used by a clinical lab vs. healthcare provider. Both of these add complexity to the transfer process and make it more susceptible to errors. The results from this study highlight some of the critical changes that need to be made in order to improve genomic data sharing workflows between healthcare providers and clinical sequencing laboratories. PMID:29515625
Puerma, Eva; Orengo, Dorcas J; Salguero, David; Papaceit, Montserrat; Segarra, Carmen; Aguadé, Montserrat
2014-09-01
Inversions are an integral part of structural variation within species, and they play a leading role in genome reorganization across species. Work at both the cytological and genome sequence levels has revealed heterogeneity in the distribution of inversion breakpoints, with some regions being recurrently used. Breakpoint reuse at the molecular level has mostly been assessed for fixed inversions through genome sequence comparison, and therefore rather broadly. Here, we have identified and sequenced the breakpoints of two polymorphic inversions, E1 and E2, which share a breakpoint, in the extant Est and E1+2 chromosomal arrangements of Drosophila subobscura. The breakpoints are two medium-sized repeated motifs that mediated the inversions by two different mechanisms: E1 via staggered breaks and subsequent repair and E2 via repeat-mediated ectopic recombination. The fine delimitation of the shared breakpoint revealed its strict reuse at the molecular level regardless of which was the intermediate arrangement. The occurrence of other rearrangements in the most proximal and distal extended breakpoint regions reveals the broad reuse of these regions. This differential degree of fragility might be related to the shared presence of snoRNA-encoding genes outside the inverted region. © The Author 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.
Limited sharing of tick-borne hemoparasites between sympatric wild and domestic ungulates.
Ghai, Ria R; Mutinda, Mathew; Ezenwa, Vanessa O
2016-08-15
Tick-borne hemoparasites (TBHs) are a group of pathogens of concern in animal management because they are associated with a diversity of hosts, including both wild and domestic species. However, little is known about how frequently TBHs are shared across the wildlife-livestock interface in natural settings. Here, we compared the TBHs of wild Grant's gazelle (Nanger granti) and domestic sheep (Ovis aries) in a region of Kenya where these species extensively overlap. Blood samples collected from each species were screened for piroplasm and rickettsial TBHs by PCR-based amplification of 18S and 16S ribosomal DNA, respectively. Overall, 99% of gazelle and 66% of sheep were positive for Babesia/Theileria, and 32% of gazelle and 47% of sheep were positive for Anaplasma/Ehrlichia. Sequencing a subset of positive samples revealed infections of Theileria and Anaplasma. Sequences sorted into seven phylogenetically distinct genotypes: two Theileria and five Anaplasma. With the exception of a putatively novel Anaplasma lineage from Grant's gazelle, these genotypes appeared to be divergent forms of previously described species, including T. ovis, A. ovis, A. bovis, and A. platys. Only one genotype, which clustered within the A. platys clade, contained sequences from both gazelle and sheep. This suggests that despite niche, habitat, and phylogenetic overlap, the majority of circulating tick-borne diseases may not be shared between these two focal species. Copyright © 2016 Elsevier B.V. All rights reserved.
Comparative and Evolutionary Analyses of Meloidogyne spp. Based on Mitochondrial Genome Sequences
García, Laura Evangelina; Sánchez-Puerta, M. Virginia
2015-01-01
Molecular taxonomy and evolution of nematodes have recently been the focus of several studies. Mitochondrial sequences were proposed as an alternative for precise identification of Meloidogyne species, to study intraspecific variability and to follow maternal lineages. We characterized the mitochondrial genomes (mtDNAs) of the root-knot nematodes M. floridensis, M. hapla and M. incognita. These were AT rich (81–83%) and highly compact, encoding 12 proteins, 2 rRNAs, and 22 tRNAs. Comparisons with published mtDNAs of M. chitwoodi, M. incognita (another strain) and M. graminicola revealed that they share protein and rRNA gene order but differ in the order of tRNAs. The mtDNAs of M. floridensis and M. incognita were strikingly similar (97–100% identity for all coding regions). In contrast, M. floridensis, M. chitwoodi, M. hapla and M. graminicola showed 65–84% nucleotide identity for coding regions. Variable mitochondrial sequences are potentially useful for evolutionary and taxonomic studies. We developed a molecular taxonomic marker by sequencing a highly variable ~2 kb mitochondrial region, nad5-cox1, from 36 populations of root-knot nematodes to elucidate relationships within the genus Meloidogyne. Isolates of five species formed monophyletic groups and showed little intraspecific variability. We also present a thorough analysis of the mitochondrial region cox2-rrnS. Phylogenies based on either mitochondrial region had good discrimination power but could not discriminate between M. arenaria, M. incognita and M. floridensis. PMID:25799071
DOE Office of Scientific and Technical Information (OSTI.GOV)
Espínola, Fernando; Dionisi, Hebe M.; Borglin, Sharon
In this work, we analyzed the community structure and metabolic potential of sediment microbial communities in high-latitude coastal environments subjected to low to moderate levels of chronic pollution. Subtidal sediments from four low-energy inlets located in polar and subpolar regions of both hemispheres were analyzed using large-scale 16S rRNA gene and metagenomic sequencing. Communities showed high diversity (Shannon's index 6.8 to 10.2), with distinct phylogenetic structures (<40% shared taxa at the phylum level among regions) but similar metabolic potential in terms of sequences assigned to KOs. Environmental factors (mainly salinity, temperature, and to a lesser extent organic pollution) were drivers of both phylogenetic and functional traits. Bacterial taxa correlating with hydrocarbon pollution included families with an anaerobic or facultatively anaerobic lifestyle, such as Desulfuromonadaceae, Geobacteraceae, and Rhodocyclaceae. Accordingly, biomarker genes for anaerobic hydrocarbon degradation (bamA, ebdA, bcrA, and bssA) were prevalent, only outnumbered by alkB, and their sequences were taxonomically binned to the same bacterial groups. BssA-assigned metagenomic sequences showed an extremely wide diversity distributed all along the phylogeny known for this gene, including bssA sensu stricto, nmsA, assA, and other clusters from poorly or not yet described variants. This work increases our understanding of microbial community patterns in cold coastal sediments, and highlights the relevance of anaerobic hydrocarbon degradation processes in subtidal environments.
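For readers unfamiliar with the diversity measure quoted above, a minimal sketch of Shannon's index, H = -sum(p_i * log(p_i)) over taxon proportions p_i, follows. The logarithm base is an assumption (base 2 here); the abstract does not state which base was used, and the function name and counts are hypothetical.

```python
import math


def shannon_index(counts, base=2):
    """Shannon diversity of a list of per-taxon counts (zero counts are ignored)."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one taxon must have a positive count")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p, base) for p in proportions)


# Hypothetical community: four equally abundant taxa
print(shannon_index([5, 5, 5, 5]))
```

With base 2, a perfectly even community of n taxa has index log2(n), so an index near 10 would correspond to roughly a thousand equally abundant taxa under this assumption.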
Two different groups of signal sequence in M-superfamily conotoxins.
Wang, Qi; Jiang, Hui; Han, Yu-Hong; Yuan, Duo-Duo; Chi, Cheng-Wu
2008-04-01
M-superfamily conotoxins can be divided into four branches (M-1, M-2, M-3 and M-4) according to the number of amino acid residues in the third Cys loop. In general, it is widely accepted that the conotoxin signal peptides of each superfamily are strictly conserved. Recently, we cloned six cDNAs of novel M-superfamily conotoxins from Conus leopardus, Conus marmoreus and Conus quercinus, belonging to either the M-1 or the M-3 branch. These conotoxins, judging from the putative peptide sequences deduced from the cDNAs, are rich in acidic residues and share highly conserved signal and pro-peptide regions. However, they are quite different from the reported conotoxins of the M-2 and M-4 branches even in their signal peptides, which in general are considered highly conserved for each superfamily of conotoxins. The signal sequences of M-1 and M-3 conotoxins, composed of 24 residues, start with MLKMGVVL-, while those of M-2 and M-4 conotoxins, composed of 25 residues, start with MMSKLGVL-. This is another example, besides the I-conotoxin superfamily, showing that different types of signal peptides can exist within a superfamily. In addition to the different disulfide connectivity of M-1 conotoxins from that of M-4 or M-2 conotoxins, the sequence alignment, preferential Cys codon usage and phylogenetic tree analysis suggest that M-1 and M-3 conotoxins have a much closer relationship and differ from the conotoxins of the other two branches (M-4 and M-2) of the M-superfamily.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Biaoyang; Nasir, J.; Kalchman, M.A.
1995-02-10
We have previously cloned and characterized the murine homologue of the Huntington disease (HD) gene and shown that it maps to mouse chromosome 5 within a region of conserved synteny with human chromosome 4p16.3. Here we present a detailed comparison of the sequence of the putative promoter and the organization of the 5′ genomic region of the murine (Hdh) and human HD genes encompassing the first five exons. We show that in this region these two genes share identical exon boundaries, but have different-size introns. Two dinucleotide (CT) and one trinucleotide intronic polymorphism in Hdh and an intronic CA polymorphism in the HD gene were identified. Comparison of 940 bp of sequence 5′ to the putative translation start site reveals a highly conserved region (78.8% nucleotide identity) between Hdh and the HD gene from nucleotide -56 to -206 (of Hdh). Neither Hdh nor the HD gene has typical TATA or CCAAT elements, but both show one putative AP2 binding site and numerous potential Sp1 binding sites. The high sequence identity between Hdh and the HD gene for approximately 200 bp 5′ to the putative translation start site indicates that these sequences may play a role in regulating expression of the Huntington disease gene. 30 refs., 4 figs., 2 tabs.
Recapitulating phylogenies using k-mers: from trees to networks.
Bernard, Guillaume; Ragan, Mark A; Chan, Cheong Xin
2016-01-01
Ernst Haeckel based his landmark Tree of Life on the supposed ontogenic recapitulation of phylogeny, i.e. that successive embryonic stages during the development of an organism re-trace the morphological forms of its ancestors over the course of evolution. Much of this idea has since been discredited. Today, phylogenies are often based on families of molecular sequences. The standard approach starts with a multiple sequence alignment, in which the sequences are arranged relative to each other in a way that maximises a measure of similarity position-by-position along their entire length. A tree (or sometimes a network) is then inferred. Rigorous multiple sequence alignment is computationally demanding, and evolutionary processes that shape the genomes of many microbes (bacteria, archaea and some morphologically simple eukaryotes) can add further complications. In particular, recombination, genome rearrangement and lateral genetic transfer undermine the assumptions that underlie multiple sequence alignment, and imply that a tree-like structure may be too simplistic. Here, using genome sequences of 143 bacterial and archaeal genomes, we construct a network of phylogenetic relatedness based on the number of shared k-mers (subsequences at fixed length k). Our findings suggest that the network captures not only key aspects of microbial genome evolution as inferred from a tree, but also features that are not treelike. The method is highly scalable, allowing for investigation of genome evolution across a large number of genomes. Instead of using specific regions or sequences from genome sequences, or indeed Haeckel's idea of ontogeny, we argue that genome phylogenies can be inferred using k-mers from whole-genome sequences. Representing these networks dynamically allows biological questions of interest to be formulated and addressed quickly and in a visually intuitive manner.
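The quantity at the heart of the approach above, the number of k-mers shared by two sequences, can be sketched in a few lines. This is only an illustration of counting shared distinct k-mers without alignment; it is not the authors' pipeline, and the function names and toy sequences are hypothetical.

```python
def kmers(seq, k):
    """Set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_count(seq_a, seq_b, k):
    """Number of distinct k-mers present in both sequences."""
    return len(kmers(seq_a, k) & kmers(seq_b, k))


# Toy example: a sequence and a rotated copy of it still share all 3-mers
print(shared_kmer_count("ACGTAC", "CGTACG", k=3))
```

Because the comparison uses sets of substrings rather than aligned positions, rearranged regions still contribute shared k-mers, which is why such counts tolerate the recombination and rearrangement events that complicate multiple sequence alignment.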
Orengo, D J; Puerma, E; Papaceit, M; Segarra, C; Aguadé, M
2015-06-01
Genome sequence comparison across the Drosophila genus revealed that some fixed inversion breakpoints had been multiply reused at this long timescale. Cytological studies of Drosophila inversion polymorphism had previously shown that, also at this shorter timescale, some breakpoints had been multiply reused. The paucity of molecularly characterized polymorphic inversion breakpoints has so far precluded contrasting whether cytologically shared breakpoints of these relatively young inversions are actually reused at the molecular level. The E chromosome of Drosophila subobscura stands out because it presents several inversion complexes. This is the case of the E1+2+9+3 arrangement that originated from the ancestral Est arrangement through the sequential accumulation of four inversions (E1, E2, E9 and E3) sharing some breakpoints. We recently identified the breakpoints of inversions E1 and E2, which allowed establishing reuse at the molecular level of the cytologically shared breakpoint of these inversions. Here, we identified and sequenced the breakpoints of inversions E9 and E3, because they share breakpoints at sections 58D and 64C with those of inversions E1 and E2. This has allowed establishing that E9 and E3 originated through the staggered-break mechanism. Most importantly, sequence comparison has revealed the multiple reuse at the molecular level of the proximal breakpoint (section 58D), which would have been used at least by inversions E2, E9 and E3. In contrast, the distal breakpoint (section 64C) might have been only reused once by inversions E1 and E2, because the distal E3 breakpoint is displaced >70 kb from the other breakpoint limits. PMID:25712227
Insights into the Melipona scutellaris (Hymenoptera, Apidae, Meliponini) fat body transcriptome.
de Sousa, Cristina Soares; Serrão, José Eduardo; Bonetti, Ana Maria; Amaral, Isabel Marques Rodrigues; Kerr, Warwick Estevam; Maranhão, Andréa Queiroz; Ueira-Vieira, Carlos
2013-07-01
The insect fat body is a multifunctional organ analogous to the vertebrate liver. The fat body is involved in the metabolism of juvenile hormone, regulation of environmental stress, production of immunity regulator-like proteins in cells and protein storage. However, very little is known about the molecular mechanisms involved in fat body physiology in stingless bees. In this study, we analyzed the transcriptome of the fat body from the stingless bee Melipona scutellaris. In silico analysis of a set of cDNA library sequences yielded 1728 expressed sequence tags (ESTs) and 997 high-quality sequences that were assembled into 29 contigs and 117 singlets. The BLAST X tool showed that 86% of the ESTs shared similarity with Apis mellifera (honeybee) genes. The M. scutellaris fat body ESTs encoded proteins with roles in numerous physiological processes, including anti-oxidation, phosphorylation, metabolism, detoxification, transmembrane transport, intracellular transport, cell proliferation, protein hydrolysis and protein synthesis. This is the first report to describe a transcriptomic analysis of specific organs of M. scutellaris. Our findings provide new insights into the physiological role of the fat body in stingless bees. PMID:23885214
NASA Astrophysics Data System (ADS)
Liu, Jiao; Li, Xianchao; Tang, Xuexi; Zhou, Bin
2016-03-01
Members of the DnaJ family are proteins that play a pivotal role in various cellular processes, such as protein folding, protein transport and cellular responses to stress. In the present study, we identified and characterized the full-length DnaJ cDNA sequence from expressed sequence tags of Pyropia yezoensis (PyDnaJ) via rapid amplification of cDNA ends. This cDNA encoded a protein of 429 amino acids, which shared high sequence similarity with other identified DnaJ proteins, such as a heat shock protein 40/DnaJ from Pyropia haitanensis. The relative mRNA expression level of PyDnaJ was investigated using real-time PCR to determine its specific expression during the algal life cycle and during desiccation. The relative mRNA expression level in sporophytes was higher than that in gametophytes and significantly increased during the whole desiccation process. These results indicate that PyDnaJ is an authentic member of the DnaJ family in plants and red algae and might play a pivotal role in mitigating damage to P. yezoensis during desiccation.
CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database
Jia, Baofeng; Raphenya, Amogelang R.; Alcock, Brian; Waglechner, Nicholas; Guo, Peiyao; Tsang, Kara K.; Lago, Briony A.; Dave, Biren M.; Pereira, Sheldon; Sharma, Arjun N.; Doshi, Sachin; Courtot, Mélanie; Lo, Raymond; Williams, Laura E.; Frye, Jonathan G.; Elsayegh, Tariq; Sardar, Daim; Westman, Erin L.; Pawlowski, Andrew C.; Johnson, Timothy A.; Brinkman, Fiona S.L.; Wright, Gerard D.; McArthur, Andrew G.
2017-01-01
The Comprehensive Antibiotic Resistance Database (CARD; http://arpcard.mcmaster.ca) is a manually curated resource containing high quality reference data on the molecular basis of antimicrobial resistance (AMR), with an emphasis on the genes, proteins and mutations involved in AMR. CARD is ontologically structured, model centric, and spans the breadth of AMR drug classes and resistance mechanisms, including intrinsic, mutation-driven and acquired resistance. It is built upon the Antibiotic Resistance Ontology (ARO), a custom built, interconnected and hierarchical controlled vocabulary allowing advanced data sharing and organization. Its design allows the development of novel genome analysis tools, such as the Resistance Gene Identifier (RGI) for resistome prediction from raw genome sequence. Recent improvements include extensive curation of additional reference sequences and mutations, development of a unique Model Ontology and accompanying AMR detection models to power sequence analysis, new visualization tools, and expansion of the RGI for detection of emergent AMR threats. CARD curation is updated monthly based on an interplay of manual literature curation, computational text mining, and genome analysis. PMID:27789705
Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors.
Haghverdi, Laleh; Lun, Aaron T L; Morgan, Michael D; Marioni, John C
2018-06-01
Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.
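The pair-detection step described above can be sketched as follows: a cell in one batch and a cell in the other form an MNN pair when each lies among the other's k nearest neighbors. This is a minimal illustration using plain Euclidean distance on toy coordinates; it does not compute correction vectors, the distance choice is an assumption, and all names and data are hypothetical.

```python
from math import dist


def k_nearest(point, others, k):
    """Indices of the k points in `others` closest to `point`."""
    order = sorted(range(len(others)), key=lambda j: dist(point, others[j]))
    return order[:k]


def mutual_nearest_neighbors(batch_a, batch_b, k=1):
    """Pairs (i, j) where b_j is a k-nearest neighbor of a_i and vice versa."""
    nn_of_a = [set(k_nearest(p, batch_b, k)) for p in batch_a]
    nn_of_b = [set(k_nearest(q, batch_a, k)) for q in batch_b]
    return [
        (i, j)
        for i in range(len(batch_a))
        for j in sorted(nn_of_a[i])
        if i in nn_of_b[j]
    ]


# Hypothetical expression profiles: batch B holds a population absent from batch A
batch_a = [(0.0, 0.0), (10.0, 10.0)]
batch_b = [(0.5, 0.0), (10.0, 11.0), (50.0, 50.0)]
print(mutual_nearest_neighbors(batch_a, batch_b))
```

The third cell in `batch_b` finds a nearest neighbor in `batch_a`, but that relationship is not reciprocated, so it forms no pair. This is how the mutual requirement accommodates populations that are present in only one batch.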
A Study of The Effect of Demand Uncertainty for Low-Carbon Products Using a Newsvendor Model
Qu, Shaojian; Zhou, Yongyi
2017-10-25
This paper studies the effect of uncertain demand on a low-carbon product by using a newsvendor model. With two different kinds of market scales, we examine a game whereby a manufacturer produces and delivers a single new low-carbon product to a single retailer. The retailer observes the demand information and places an order before the selling season. We find that if the retailer shares truthful forecast information with the manufacturer, the manufacturer sets a low wholesale price through the sequence of events; if the retailer instead shares untruthful information, or does not share at all, the manufacturer sets a high wholesale price. In addition, as a policy-maker, the government offers a subsidy for each unit of the low-carbon product sold. The manufacturer creates a new contract with a rebate for the retailer. We also take the consumer aversion coefficient and truth coefficient as qualitative variables into our model to study the order, pricing, and expected profit for the members of the supply chain. The research shows that uncertain demand has a major effect on the new low-carbon product. We therefore suggest that the retailer share more truthful information with the manufacturer. PMID:29068382
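For orientation, the classical single-period newsvendor rule that such models build on can be sketched as below. This is the textbook critical-ratio solution under assumed uniform demand and no salvage value, not the paper's game with subsidies, rebates, or information sharing; all names and numbers are hypothetical.

```python
def critical_ratio(retail_price, wholesale_price):
    """Underage cost divided by (underage + overage cost), with no salvage value."""
    if not 0 < wholesale_price <= retail_price:
        raise ValueError("expected 0 < wholesale_price <= retail_price")
    return (retail_price - wholesale_price) / retail_price


def optimal_order_uniform(retail_price, wholesale_price, demand_low, demand_high):
    """Order quantity q* with F(q*) = critical ratio, for demand ~ Uniform[low, high]."""
    ratio = critical_ratio(retail_price, wholesale_price)
    return demand_low + ratio * (demand_high - demand_low)


# Hypothetical numbers: a lower wholesale price leads to a larger order
for wholesale in (4.0, 3.0):
    print(wholesale, optimal_order_uniform(10.0, wholesale, 100.0, 200.0))
```

In this simplified setting the retailer's order rises as the wholesale price falls, which gives some intuition for why the wholesale price the manufacturer sets matters to both parties.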
Integrated consensus genetic and physical maps of flax (Linum usitatissimum L.).
Cloutier, Sylvie; Ragupathy, Raja; Miranda, Evelyn; Radovanovic, Natasa; Reimer, Elsa; Walichnowski, Andrzej; Ward, Kerry; Rowland, Gordon; Duguid, Scott; Banik, Mitali
2012-12-01
Three linkage maps of flax (Linum usitatissimum L.) were constructed from populations CDC Bethune/Macbeth, E1747/Viking and SP2047/UGG5-5 containing between 385 and 469 mapped markers each. The first consensus map of flax was constructed incorporating 770 markers based on 371 shared markers including 114 that were shared by all three populations and 257 shared between any two populations. The 15 linkage group map corresponds to the haploid number of chromosomes of this species. The marker order of the consensus map was largely collinear in all three individual maps but a few local inversions and marker rearrangements spanning short intervals were observed. Segregation distortion was present in all linkage groups which contained 1-52 markers displaying non-Mendelian segregation. The total length of the consensus genetic map is 1,551 cM with a mean marker density of 2.0 cM. A total of 670 markers were anchored to 204 of the 416 fingerprinted contigs of the physical map corresponding to ~274 Mb or 74% of the estimated flax genome size of 370 Mb. This high-resolution consensus map will be a resource for comparative genomics, genome organization, evolution studies and anchoring of the whole genome shotgun sequence.
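The summary figures in this abstract can be checked with simple arithmetic. The sketch below assumes that mean marker density is total map length divided by the number of markers, which the abstract does not state explicitly; the variable and function names are hypothetical.

```python
def mean_spacing_cm(total_length_cm, n_markers):
    """Average map distance per marker (assumed: total length / marker count)."""
    return total_length_cm / n_markers


def percent_covered(anchored_mb, genome_mb):
    """Share of the estimated genome covered by anchored contigs, in percent."""
    return 100 * anchored_mb / genome_mb


shared_all_three, shared_any_two = 114, 257
print(shared_all_three + shared_any_two)      # total shared markers
print(round(mean_spacing_cm(1551, 770), 1))   # cM per marker
print(round(percent_covered(274, 370)))       # percent of genome
```

Under that assumption the values reproduce the reported 371 shared markers, 2.0 cM mean density and 74% genome coverage.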