pairwise sequence identity: Topics by Science.gov

Sample records for pairwise sequence identity

Remarkable sequence conservation of the last intron in the PKD1 gene.

PubMed

Rodova, Marianna; Islam, M Rafiq; Peterson, Kenneth R; Calvet, James P

2003-10-01

The last intron of the PKD1 gene (intron 45) was found to have exceptionally high sequence conservation across four mammalian species: human, mouse, rat, and dog. This conservation did not extend to the comparable intron in pufferfish. Pairwise comparisons for intron 45 showed 91% identity (human vs. dog) to 100% identity (mouse vs. rat) for an average for all four species of 94% identity. In contrast, introns 43 and 44 of the PKD1 gene had average pairwise identities of 57% and 54%, and exons 43, 44, and 45 and the coding region of exon 46 had average pairwise identities of 80%, 84%, 82%, and 80%. Intron 45 is 90 to 95 bp in length, with the major region of sequence divergence being in a central 4-bp to 9-bp variable region. RNA secondary structure analysis of intron 45 predicts a branching stem-loop structure in which the central variable region lies in one loop and the putative branch point sequence lies in another loop, suggesting that the intron adopts a specific stem-loop structure that may be important for its removal. Although intron 45 appears to conform to the class of small, G-triplet-containing introns that are spliced by a mechanism utilizing intron definition, its high sequence conservation may be a reflection of constraints imposed by a unique mechanism that coordinates splicing of this last PKD1 intron with polyadenylation.
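As an illustration of how an "average pairwise identity" such as the one reported above can be computed, the following is a minimal Python sketch. It assumes the sequences are already aligned to equal length, and it adopts one common gap convention (columns gapped in both sequences are skipped; a gap against a residue counts as a mismatch). The function names and the toy sequences are hypothetical and are not taken from the study.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention: columns that are gaps in both sequences are skipped;
    a gap opposite a residue is counted as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def average_pairwise_identity(aligned: dict[str, str]) -> float:
    """Mean percent identity over every unordered pair of sequences."""
    values = [
        percent_identity(aligned[p], aligned[q])
        for p, q in combinations(sorted(aligned), 2)
    ]
    return sum(values) / len(values)


# Toy alignment (hypothetical data, four sequences give six pairs).
toy = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTACGTAC",
    "seq3": "ACGTTCGTAC",
    "seq4": "ACGT-CGTAC",
}
print(round(average_pairwise_identity(toy), 2))
```

Other gap conventions (for example, ignoring every gapped column) give different values, so a reported identity is only comparable when the convention is stated.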
Sequence determination and analysis of the NSs genes of two tospoviruses.

PubMed

Hallwass, Mariana; Leastro, Mikhail O; Lima, Mirtes F; Inoue-Nagata, Alice K; Resende, Renato O

2012-03-01

The tospoviruses groundnut ringspot virus (GRSV) and zucchini lethal chlorosis virus (ZLCV) cause severe losses in many crops, especially in solanaceous and cucurbit species. In this study, the non-structural NSs gene and the 5'UTRs of these two biologically distinct tospoviruses were cloned and sequenced. The NSs sequences of GRSV and ZLCV were both 1,404 nucleotides long. Pairwise comparison showed that the NSs amino acid sequence of GRSV shared 69.6% identity with that of ZLCV and 75.9% identity with that of tomato spotted wilt virus (TSWV), while the NSs sequences of ZLCV and TSWV shared 67.9% identity. Phylogenetic analysis based on NSs sequences confirmed that these viruses cluster in the American clade.
Alphasatellitidae: a new family with two subfamilies for the classification of geminivirus- and nanovirus-associated alphasatellites.

PubMed

Briddon, Rob W; Martin, Darren P; Roumagnac, Philippe; Navas-Castillo, Jesús; Fiallo-Olivé, Elvira; Moriones, Enrique; Lett, Jean-Michel; Zerbini, F Murilo; Varsani, Arvind

2018-05-09

Nanoviruses and geminiviruses are circular, single stranded DNA viruses that infect many plant species around the world. Nanoviruses and certain geminiviruses that belong to the Begomovirus and Mastrevirus genera are associated with additional circular, single stranded DNA molecules (~1-1.4 kb) that encode a replication-associated protein (Rep). These Rep-encoding satellite molecules are commonly referred to as alphasatellites and here we communicate the establishment of the family Alphasatellitidae to which these have been assigned. Within the Alphasatellitidae family two subfamilies, Geminialphasatellitinae and Nanoalphasatellitinae, have been established to respectively accommodate the geminivirus- and nanovirus-associated alphasatellites. Whereas the pairwise nucleotide sequence identity distribution of all the known geminialphasatellites (n = 628) displayed troughs at ~70% and 88% pairwise identity, that of the known nanoalphasatellites (n = 54) had troughs at ~67% and ~80% pairwise identity. We use these pairwise identity values as thresholds together with phylogenetic analyses to establish four genera and 43 species of geminialphasatellites and seven genera and 19 species of nanoalphasatellites. Furthermore, a divergent alphasatellite associated with coconut foliar decay disease is assigned to a species but not a subfamily as it likely represents a new alphasatellite subfamily that could be established once other closely related molecules are discovered.
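The idea of reading demarcation thresholds from troughs in a pairwise identity distribution can be sketched in Python as follows. This is only an illustrative sketch, not the procedure used by the authors: the binning parameters, the function names, and the strict local-minimum rule are all assumptions, and real distributions would normally need smoothing before troughs are called.

```python
def identity_histogram(identities, bin_width=1.0, lo=50.0, hi=100.0):
    """Count pairwise identity values (in percent) in fixed-width bins on [lo, hi]."""
    n_bins = int(round((hi - lo) / bin_width))
    counts = [0] * n_bins
    for value in identities:
        if lo <= value <= hi:
            # A value equal to hi is placed in the last bin.
            idx = min(int((value - lo) / bin_width), n_bins - 1)
            counts[idx] += 1
    return counts


def find_troughs(counts, lo=50.0, bin_width=1.0):
    """Return the centres of interior bins that are strictly lower than both neighbours."""
    troughs = []
    for i in range(1, len(counts) - 1):
        if counts[i] < counts[i - 1] and counts[i] < counts[i + 1]:
            troughs.append(lo + (i + 0.5) * bin_width)
    return troughs


# Hypothetical bin counts for 5%-wide bins starting at 60% identity.
example_counts = [5, 3, 1, 4, 6, 2, 7]
print(find_troughs(example_counts, lo=60.0, bin_width=5.0))
```

A trough marks an identity range in which few sequence pairs fall, which is why such values are natural candidates for genus- and species-level cut-offs when combined with phylogenetic support.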
PSS-3D1D: an improved 3D1D profile method of protein fold recognition for the annotation of twilight zone sequences.

PubMed

Ganesan, K; Parthasarathy, S

2011-12-01

Annotation of any newly determined protein sequence depends on the pairwise sequence identity with known sequences. However, for the twilight zone sequences which have only 15-25% identity, the pair-wise comparison methods are inadequate and the annotation becomes a challenging task. Such sequences can be annotated by using methods that recognize their fold. Bowie et al. described a 3D1D profile method in which the amino acid sequences that fold into a known 3D structure are identified by their compatibility to that known 3D structure. We have improved the above method by using the predicted secondary structure information and employ it for fold recognition from the twilight zone sequences. In our Protein Secondary Structure 3D1D (PSS-3D1D) method, a score (w) for the predicted secondary structure of the query sequence is included in finding the compatibility of the query sequence to the known fold 3D structures. In the benchmarks, the PSS-3D1D method shows a maximum of 21% improvement in predicting correctly the α + β class of folds from the sequences with twilight zone level of identity, when compared with the 3D1D profile method. Hence, the PSS-3D1D method could offer more clues than the 3D1D method for the annotation of twilight zone sequences. The web based PSS-3D1D method is freely available in the PredictFold server at http://bioinfo.bdu.ac.in/servers/ .
Deep Sequencing Reveals a Divergent Ugandan cassava brown streak virus Isolate from Malawi

PubMed Central

Winter, Stephan; Mukasa, Settumba; Tairo, Fred; Sseruwagi, Peter; Ndunguru, Joseph; Duffy, Siobain

2017-01-01

Illumina sequencing of RNA from a cassava cutting from northern Malawi produced a genome of Ugandan cassava brown streak virus (UCBSV-MW-NB7_2013). Sequence comparisons revealed stronger similarity to an isolate from nearby Tanzania (93.4% pairwise nucleotide identity) than to those previously reported from Malawi (86.9 to 87.0%). PMID:28818908
Manipulation of Karyotype in Caenorhabditis elegans Reveals Multiple Inputs Driving Pairwise Chromosome Synapsis During Meiosis

PubMed Central

Roelens, Baptiste; Schvarzstein, Mara; Villeneuve, Anne M.

2015-01-01

Meiotic chromosome segregation requires pairwise association between homologs, stabilized by the synaptonemal complex (SC). Here, we investigate factors contributing to pairwise synapsis by investigating meiosis in polyploid worms. We devised a strategy, based on transient inhibition of cohesin function, to generate polyploid derivatives of virtually any Caenorhabditis elegans strain. We exploited this strategy to investigate the contribution of recombination to pairwise synapsis in tetraploid and triploid worms. In otherwise wild-type polyploids, chromosomes first sort into homolog groups, then multipartner interactions mature into exclusive pairwise associations. Pairwise synapsis associations still form in recombination-deficient tetraploids, confirming a propensity for synapsis to occur in a strictly pairwise manner. However, the transition from multipartner to pairwise association was perturbed in recombination-deficient triploids, implying a role for recombination in promoting this transition when three partners compete for synapsis. To evaluate the basis of synapsis partner preference, we generated polyploid worms heterozygous for normal sequence and rearranged chromosomes sharing the same pairing center (PC). Tetraploid worms had no detectable preference for identical partners, indicating that PC-adjacent homology drives partner choice in this context. In contrast, triploid worms exhibited a clear preference for identical partners, indicating that homology outside the PC region can influence partner choice. Together, our findings suggest a two-phase model for C. elegans synapsis: an early phase, in which initial synapsis interactions are driven primarily by recombination-independent assessment of homology near PCs and by a propensity for pairwise SC assembly, and a later phase in which mature synaptic interactions are promoted by recombination. PMID:26500263
Nanopore DNA Sequencing and Genome Assembly on the International Space Station.

PubMed

Castro-Wallace, Sarah L; Chiu, Charles Y; John, Kristen K; Stahl, Sarah E; Rubins, Kathleen H; McIntyre, Alexa B R; Dworkin, Jason P; Lupisella, Mark L; Smith, David J; Botkin, Douglas J; Stephenson, Timothy A; Juul, Sissel; Turner, Daniel J; Izquierdo, Fernando; Federman, Scot; Stryke, Doug; Somasekar, Sneha; Alexander, Noah; Yu, Guixia; Mason, Christopher E; Burton, Aaron S

2017-12-21

We evaluated the performance of the MinION DNA sequencer in-flight on the International Space Station (ISS), and benchmarked its performance off-Earth against the MinION, Illumina MiSeq, and PacBio RS II sequencing platforms in terrestrial laboratories. Samples contained equimolar mixtures of genomic DNA from lambda bacteriophage, Escherichia coli (strain K12, MG1655) and Mus musculus (female BALB/c mouse). Nine sequencing runs were performed aboard the ISS over a 6-month period, yielding a total of 276,882 reads with no apparent decrease in performance over time. From sequence data collected aboard the ISS, we constructed directed assemblies of the ~4.6 Mb E. coli genome, ~48.5 kb lambda genome, and a representative M. musculus sequence (the ~16.3 kb mitochondrial genome), at 100%, 100%, and 96.7% consensus pairwise identity, respectively; de novo assembly of the E. coli genome from raw reads yielded a single contig comprising 99.9% of the genome at 98.6% consensus pairwise identity. Simulated real-time analyses of in-flight sequence data using an automated bioinformatic pipeline and laptop-based genomic assembly demonstrated the feasibility of sequencing analysis and microbial identification aboard the ISS. These findings illustrate the potential for sequencing applications including disease diagnosis, environmental monitoring, and elucidating the molecular basis for how organisms respond to spaceflight.
Molecular characterization of Atractolytocestus sagittatus (Cestoda: Caryophyllidea), monozoic parasite of common carp, and its differentiation from the invasive species Atractolytocestus huronensis.

PubMed

Bazsalovicsová, Eva; Králová-Hromadová, Ivica; Stefka, Jan; Scholz, Tomáš

2012-05-01

Sequence structure of complete internal transcribed spacer 1 and 2 (ITS1 and ITS2) of the ribosomal DNA region and partial mitochondrial cytochrome c oxidase subunit I (cox1) gene sequences were studied in the monozoic tapeworm Atractolytocestus sagittatus (Kulakovskaya et Akhmerov, 1965) (Cestoda: Caryophyllidea), a parasite of common carp (Cyprinus carpio carpio L.). Intraindividual sequence diversity was observed in both ribosomal spacers. In ITS1, a total number of 19 recombinant clones yielded eight different sequence types (pairwise sequence identity, 99.7-100%) which, however, did not resemble the structure typical for divergent intragenomic ITS copies (paralogues). Polymorphism was displayed by several single nucleotide mutations present exclusively in single clones, but variation in the number of short repetitive motifs was not observed. In ITS2, a total of 21 recombinant clones yielded ten different sequence types (pairwise sequence identity, 97.5-100%). They were mostly characterized by a varying number of (TCGT)(n) repeats resulting in assortment of ITS2 sequences into two sequence variants, which reflected the structure specific for ITS paralogues. The third DNA region analysed, mitochondrial cox1 gene (669 bp) was detected to be 100% identical in all studied A. sagittatus individuals. Comparison of molecular data on A. sagittatus with those on Atractolytocestus huronensis Anthony, 1958, an invasive parasite of common carp, has shown that interspecific differences significantly exceeded intraspecific variation in both ribosomal spacers (81.4-82.5% in ITS1, 74.4-75.2% in ITS2) as well as in mitochondrial cox1, which confirms validity of both congeneric tapeworms parasitic in the same fish host.
Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences

PubMed Central

2018-01-01

Prediction of taxonomy for marker gene sequences such as 16S ribosomal RNA (rRNA) is a fundamental task in microbiology. Most experimentally observed sequences are diverged from reference sequences of authoritatively named organisms, creating a challenge for prediction methods. I assessed the accuracy of several algorithms using cross-validation by identity, a new benchmark strategy which explicitly models the variation in distances between query sequences and the closest entry in a reference database. When the accuracy of genus predictions was averaged over a representative range of identities with the reference database (100%, 99%, 97%, 95% and 90%), all tested methods had ≤50% accuracy on the currently-popular V4 region of 16S rRNA. Accuracy was found to fall rapidly with identity; for example, better methods were found to have V4 genus prediction accuracy of ∼100% at 100% identity but ∼50% at 97% identity. The relationship between identity and taxonomy was quantified as the probability that a rank is the lowest shared by a pair of sequences with a given pair-wise identity. With the V4 region, 95% identity was found to be a twilight zone where taxonomy is highly ambiguous because the probabilities that the lowest shared rank between pairs of sequences is genus, family, order or class are approximately equal. PMID:29682424
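The quantity described above, the probability that a given rank is the lowest one shared by a pair of sequences at a given identity, can be illustrated with a short Python sketch. The rank list, the function names, and the input layout (identity already rounded to a whole percent, lineages given as ordered lists) are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

# Assumed rank order, from highest to lowest.
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def lowest_shared_rank(tax_a, tax_b):
    """Return the lowest rank at which two lineages still agree, or None.

    Each lineage is a list of names ordered as in RANKS.
    """
    shared = None
    for rank, name_a, name_b in zip(RANKS, tax_a, tax_b):
        if name_a != name_b:
            break
        shared = rank
    return shared


def rank_probabilities(pairs, identity_pct):
    """Estimate P(lowest shared rank) among pairs at one identity value.

    `pairs` is an iterable of (identity_pct, lineage_a, lineage_b) tuples.
    """
    counts = Counter(
        lowest_shared_rank(a, b) for ident, a, b in pairs if ident == identity_pct
    )
    total = sum(counts.values())
    return {rank: n / total for rank, n in counts.items()} if total else {}


# Hypothetical lineages used only to exercise the functions.
mitis = ["Bacteria", "Firmicutes", "Bacilli", "Lactobacillales",
         "Streptococcaceae", "Streptococcus", "S. mitis"]
oralis = mitis[:6] + ["S. oralis"]
casei = mitis[:4] + ["Lactobacillaceae", "Lactobacillus", "L. casei"]

pairs = [(95, mitis, oralis), (95, mitis, casei), (97, mitis, oralis)]
print(rank_probabilities(pairs, 95))
```

When the estimated probabilities for several ranks are roughly equal at one identity value, as the abstract reports for 95% identity in the V4 region, identity alone cannot decide which rank a query shares with its closest reference.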
Identification of a novel bovine enterovirus possessing highly divergent amino acid sequences in capsid protein.

PubMed

Tsuchiaka, Shinobu; Rahpaya, Sayed Samim; Otomaru, Konosuke; Aoki, Hiroshi; Kishimoto, Mai; Naoi, Yuki; Omatsu, Tsutomu; Sano, Kaori; Okazaki-Terashima, Sachiko; Katayama, Yukie; Oba, Mami; Nagai, Makoto; Mizutani, Tetsuya

2017-01-17

Bovine enterovirus (BEV) belongs to the species Enterovirus E or F, genus Enterovirus and family Picornaviridae. Although numerous studies have identified BEVs in the feces of cattle with diarrhea, the pathogenicity of BEVs remains unclear. Previously, we reported the detection of a novel kobu-like virus in calf feces by metagenomics analysis. In the present study, we identified a novel BEV in diarrheal feces collected for that survey. Complete genome sequences were determined by deep sequencing in feces. Secondary RNA structure analysis of the 5' untranslated region (UTR), phylogenetic tree construction and pairwise identity analysis were conducted. The complete genome sequences of BEV were genetically distant from other EVs and the VP1 coding region contained novel and unique amino acid sequences. We named this strain BEV AN12/Bos taurus/JPN/2014 (referred to as BEV-AN12). According to genome analysis, the genome length of this virus is 7414 nucleotides excluding the poly (A) tail and its genome consists of a 5'UTR, an open reading frame encoding a single polyprotein, and a 3'UTR. The results of secondary RNA structure analysis showed that in the 5'UTR, BEV-AN12 had an additional clover leaf structure and small stem loop structure, similarly to other BEVs. In pairwise identity analysis, BEV-AN12 showed high amino acid (aa) identities to Enterovirus F in the polyprotein, P2 and P3 regions (aa identity ≥82.4%). Therefore, BEV-AN12 is closely related to Enterovirus F. However, aa sequences in the capsid protein regions, particularly the VP1 encoding region, showed significantly low aa identity to other viruses in genus Enterovirus (VP1 aa identity ≤58.6%). In addition, BEV-AN12 branched separately from Enterovirus E and F in phylogenetic trees based on the aa sequences of P1 and VP1, although it clustered with Enterovirus F in trees based on sequences in the P2 and P3 genome region. 
We identified a novel BEV possessing highly divergent aa sequences in the VP1 coding region in Japan. According to species definition, we proposed naming this strain "Enterovirus K", which is a novel species within genus Enterovirus. Further genomic studies are needed to understand the pathogenicity of BEVs.
The tapeworm Atractolytocestus tenuicollis (Cestoda: Caryophyllidea)--a sister species or ancestor of an invasive A. huronensis?

PubMed

Králová-Hromadová, Ivica; Štefka, Jan; Bazsalovicsová, Eva; Bokorová, Silvia; Oros, Mikuláš

2013-10-01

Atractolytocestus tenuicollis (Li, 1964) Xi, Wang, Wu, Gao et Nie, 2009 is a monozoic, non-segmented tapeworm of the order Caryophyllidea, parasitizing exclusively common carp (Cyprinus carpio L.). In the current work, the first molecular data, in particular complete ribosomal internal transcribed spacer 2 (ITS2) and partial mitochondrial cytochrome c oxidase subunit I (cox1) on A. tenuicollis from Niushan Lake, Wuhan, China, are provided. In order to evaluate molecular interrelationships within Atractolytocestus, the data on A. tenuicollis were compared with relevant data on two other congeners, Atractolytocestus huronensis and Atractolytocestus sagittatus. Divergent intragenomic copies (ITS2 paralogues) were detected in the ITS2 ribosomal spacer of A. tenuicollis; the same phenomenon has previously been observed also in two other congeners. ITS2 structure of A. tenuicollis was very similar to that of A. huronensis from Slovakia, USA and UK; overall pairwise sequence identity was 91.7-95.2%. On the other hand, values of sequence identity between A. tenuicollis and A. sagittatus were lower, 69.7-70.9%. Cox1 sequences, analysed in five A. tenuicollis individuals, were 100% identical and no intraspecific variation was observed. Comparison of A. tenuicollis cox1 with respective sequences of two other Atractolytocestus species showed that the mitochondrial haplotype found in Chinese A. tenuicollis is structurally specific (haplotype 4; Ha4) and differs from all so far determined Atractolytocestus haplotypes (Ha1 and Ha2 for A. huronensis; Ha3 for A. sagittatus). Pairwise sequence identity between A. tenuicollis cox1 haplotype and remaining three haplotypes followed the same pattern as in ITS2. The nucleotide and amino acid (aa) sequence comparison with A. huronensis Ha1 and Ha2 revealed higher sequence identity, 90.3-90.8% (96.9% in aa), while lower values were achieved between A. tenuicollis haplotype and Ha3 of Japanese A. sagittatus: 75.2% (81.9% in aa). 
The phylogenetic analyses using cox1, ITS2 and combined cox1 + ITS2 sequences revealed close genetic interrelationship between A. tenuicollis and A. huronensis. Independently of a type of analysis and DNA region used, the topology of obtained trees was always identical; A. tenuicollis formed separate clade with A. huronensis forming a closely related sister group.
Distantly related lipocalins share two conserved clusters of hydrophobic residues: use in homology modeling

PubMed Central

Adam, Benoit; Charloteaux, Benoit; Beaufays, Jerome; Vanhamme, Luc; Godfroid, Edmond; Brasseur, Robert; Lins, Laurence

2008-01-01

Background Lipocalins are widely distributed in nature and are found in bacteria, plants, arthropods and vertebrates. In hematophagous arthropods, they are implicated in the successful accomplishment of the blood meal, interfering with platelet aggregation, blood coagulation and inflammation and in the transmission of disease parasites such as Trypanosoma cruzi and Borrelia burgdorferi. The pairwise sequence identity is low among this family, often below 30%, despite a well conserved tertiary structure. Under the 30% identity threshold, alignment methods do not correctly assign and align proteins. The only safe way to assign a sequence to that family is by experimental determination. However, these procedures are long and costly and cannot always be applied. A way to circumvent the experimental approach is sequence and structure analysis. To further help in that task, the residues implicated in the stabilisation of the lipocalin fold were determined. This was done by analyzing the conserved interactions for ten lipocalins having a maximum pairwise identity of 28% and various functions. Results It was determined that two hydrophobic clusters of residues are conserved by analysing the ten lipocalin structures and sequences. One cluster is internal to the barrel, involving all strands and the 3₁₀ helix. The other is external, involving four strands and the helix lying parallel to the barrel surface. These clusters are also present in RaHBP2, an unusual "outlier" lipocalin from the tick Rhipicephalus appendiculatus. This information was used to assess the assignment of LIR2, a protein from Ixodes ricinus, and to build a 3D model that helps to predict function. FTIR data support the lipocalin fold for this protein. Conclusion By sequence and structural analyses, two conserved clusters of interacting hydrophobic residues have been identified in lipocalins. 
Since the residues implicated are not conserved for function, they should provide the minimal subset necessary to confer the lipocalin fold. This information has been used to assign LIR2 to lipocalins and to investigate its structure/function relationship. This study could be applied to other protein families with low pairwise similarity, such as the structurally related fatty acid binding proteins or avidins. PMID:18190694
Phylogeny of the Genus Flavivirus

PubMed Central

Kuno, Goro; Chang, Gwong-Jen J.; Tsuchiya, K. Richard; Karabatsos, Nick; Cropp, C. Bruce

1998-01-01

We undertook a comprehensive phylogenetic study to establish the genetic relationship among the viruses of the genus Flavivirus and to compare the classification based on molecular phylogeny with the existing serologic method. By using a combination of quantitative definitions (bootstrap support level and the pairwise nucleotide sequence identity), the viruses could be classified into clusters, clades, and species. Our phylogenetic study revealed for the first time that from the putative ancestor two branches, non-vector and vector-borne virus clusters, evolved and from the latter cluster emerged tick-borne and mosquito-borne virus clusters. Provided that the theory of arthropod association being an acquired trait was correct, pairwise nucleotide sequence identity among these three clusters provided supporting data for a possibility that the non-vector cluster evolved first, followed by the separation of tick-borne and mosquito-borne virus clusters in that order. Clades established in our study correlated significantly with existing antigenic complexes. We also resolved many of the past taxonomic problems by establishing phylogenetic relationships of the antigenically unclassified viruses with the well-established viruses and by identifying synonymous viruses. PMID:9420202
Phylogeny of the genus Flavivirus.

PubMed

Kuno, G; Chang, G J; Tsuchiya, K R; Karabatsos, N; Cropp, C B

1998-01-01

We undertook a comprehensive phylogenetic study to establish the genetic relationship among the viruses of the genus Flavivirus and to compare the classification based on molecular phylogeny with the existing serologic method. By using a combination of quantitative definitions (bootstrap support level and the pairwise nucleotide sequence identity), the viruses could be classified into clusters, clades, and species. Our phylogenetic study revealed for the first time that from the putative ancestor two branches, non-vector and vector-borne virus clusters, evolved and from the latter cluster emerged tick-borne and mosquito-borne virus clusters. Provided that the theory of arthropod association being an acquired trait was correct, pairwise nucleotide sequence identity among these three clusters provided supporting data for a possibility that the non-vector cluster evolved first, followed by the separation of tick-borne and mosquito-borne virus clusters in that order. Clades established in our study correlated significantly with existing antigenic complexes. We also resolved many of the past taxonomic problems by establishing phylogenetic relationships of the antigenically unclassified viruses with the well-established viruses and by identifying synonymous viruses.
Complete Genome Sequence of a Genomovirus Associated with Common Bean Plant Leaves in Brazil.

PubMed

Lamas, Natalia Silva; Fontenele, Rafaela Salgado; Melo, Fernando Lucas; Costa, Antonio Felix; Varsani, Arvind; Ribeiro, Simone Graça

2016-11-10

A new genomovirus has been identified in three common bean plants in Brazil. This virus has a circular genome of 2,220 nucleotides and 3 major open reading frames. It shares 80.7% genome-wide pairwise identity with a genomovirus recovered from Tongan fruit bat guano. Copyright © 2016 Lamas et al.
Oligonucleotide fingerprinting of rRNA genes for analysis of fungal community composition.

PubMed

Valinsky, Lea; Della Vedova, Gianluca; Jiang, Tao; Borneman, James

2002-12-01

Thorough assessments of fungal diversity are currently hindered by technological limitations. Here we describe a new method for identifying fungi, oligonucleotide fingerprinting of rRNA genes (OFRG). OFRG sorts arrayed rRNA gene (ribosomal DNA [rDNA]) clones into taxonomic clusters through a series of hybridization experiments, each using a single oligonucleotide probe. A simulated annealing algorithm was used to design an OFRG probe set for fungal rDNA. Analysis of 1,536 fungal rDNA clones derived from soil generated 455 clusters. A pairwise sequence analysis showed that clones with average sequence identities of 99.2% were grouped into the same cluster. To examine the accuracy of the taxonomic identities produced by this OFRG experiment, we determined the nucleotide sequences for 117 clones distributed throughout the tree. For all but two of these clones, the taxonomic identities generated by this OFRG experiment were consistent with those generated by a nucleotide sequence analysis. Eighty-eight percent of the clones were affiliated with Ascomycota, while 12% belonged to Basidiomycota. A large fraction of the clones were affiliated with the genera Fusarium (404 clones) and Raciborskiomyces (176 clones). Smaller assemblages of clones had high sequence identities to the Alternaria, Ascobolus, Chaetomium, Cryptococcus, and Rhizoctonia clades.
SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.

PubMed

Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

2008-05-01

Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble-of-classifiers predictors. SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. 
The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.
SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

PubMed Central

Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

2008-01-01

Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble-of-classifiers predictors. Conclusion SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. 
The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616
Query-seeded iterative sequence similarity searching improves selectivity 5–20-fold

PubMed Central

Li, Weizhong; Lopez, Rodrigo

2017-01-01

Iterative similarity search programs, like psiblast, jackhmmer, and psisearch, are much more sensitive than pairwise similarity search methods like blast and ssearch because they build a position specific scoring model (a PSSM or HMM) that captures the pattern of sequence conservation characteristic of a protein family. But models are subject to contamination; once an unrelated sequence has been added to the model, homologs of the unrelated sequence will also produce high scores, and the model can diverge from the original protein family. Examination of alignment errors during psiblast PSSM contamination suggested a simple strategy for dramatically reducing PSSM contamination. psiblast PSSMs are built from the query-based multiple sequence alignment (MSA) implied by the pairwise alignments between the query model (PSSM, HMM) and the subject sequences in the library. When the original query sequence residues are inserted into gapped positions in the aligned subject sequence, the resulting PSSM rarely produces alignment over-extensions or alignments to unrelated sequences. This simple step, which tends to anchor the PSSM to the original query sequence and slightly increase target percent identity, can reduce the frequency of false-positive alignments more than 20-fold compared with psiblast and jackhmmer, with little loss in search sensitivity. PMID:27923999
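The query-seeding step described above, filling gapped positions of an aligned subject sequence with the corresponding query residues, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the psisearch implementation: the function name is hypothetical, the inputs are the two rows of one gapped pairwise alignment, and columns where the query itself is gapped are dropped because they have no position in a query-based MSA.

```python
def query_seed(aligned_query: str, aligned_subject: str, gap: str = "-") -> str:
    """Return the subject row in query coordinates, with subject gaps
    replaced by the query residue at the same column.

    Columns where the query has a gap (insertions in the subject) are
    dropped, since a query-based MSA has no column for them.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("alignment rows must have equal length")
    seeded = []
    for q, s in zip(aligned_query, aligned_subject):
        if q == gap:
            continue  # insertion relative to the query: no MSA column
        seeded.append(q if s == gap else s)
    return "".join(seeded)


# Hypothetical alignment rows used only for illustration.
print(query_seed("MKT-LLV", "MR-ALIV"))
```

Because every seeded row then carries a residue at each query position, positions that were gapped in the subject contribute the query's own residue to the scoring model, which is consistent with the abstract's remark that the step anchors the PSSM to the original query.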
Classification and evolution of human rhinoviruses.

PubMed

Palmenberg, Ann C; Gern, James E

2015-01-01

The historical classification of human rhinoviruses (RV) by serotyping has been replaced by a logical system of comparative sequencing. Given that strains must diverge within their capsid sequences by a reasonable degree (>12-13 % pairwise base identities) before becoming immunologically distinct, the new nomenclature system makes allowances for the addition of new, future types, without compromising historical designations. Currently, three species, the RV-A, RV-B, and RV-C, are recognized. Of these, the RV-C, discovered in 2006, are the most unusual in terms of capsid structure, receptor use, and association with severe disease in children.

Molecular and Insecticidal Characterization of a Novel Cry-Related Protein from Bacillus Thuringiensis Toxic against Myzus persicae

PubMed Central

Palma, Leopoldo; Muñoz, Delia; Berry, Colin; Murillo, Jesús; Ruiz de Escudero, Iñigo; Caballero, Primitivo

2014-01-01

This study describes the insecticidal activity of a novel Bacillus thuringiensis Cry-related protein with a deduced 799 amino acid sequence (~89 kDa) and ~19% pairwise identity to the 95-kDa-aphidicidal protein (sequence number 204) from patent US 8318900 and ~40% pairwise identity to the cancer cell killing Cry proteins (parasporins Cry41Ab1 and Cry41Aa1), respectively. This novel Cry-related protein contained the five conserved amino acid blocks and the three conserved domains commonly found in 3-domain Cry proteins. The protein exhibited toxic activity against the green peach aphid, Myzus persicae (Sulzer) (Homoptera: Aphididae) with the lowest mean lethal concentration (LC50 = 32.7 μg/mL) reported to date for a given Cry protein and this insect species, whereas it had no lethal toxicity against the Lepidoptera of the family Noctuidae Helicoverpa armigera (Hübner), Mamestra brassicae (L.), Spodoptera exigua (Hübner), S. frugiperda (J.E. Smith) and S. littoralis (Boisduval), at concentrations as high as ~3.5 μg/cm2. This novel Cry-related protein may become a promising environmentally friendly tool for the biological control of M. persicae and possibly also for other sap sucking insect pests. PMID:25384108
Estimation of pairwise sequence similarity of mammalian enhancers with word neighbourhood counts.

PubMed

Göke, Jonathan; Schulz, Marcel H; Lasserre, Julia; Vingron, Martin

2012-03-01

The identity of cells and tissues is to a large degree governed by transcriptional regulation. A major part is accomplished by the combinatorial binding of transcription factors at regulatory sequences, such as enhancers. Even though binding of transcription factors is sequence-specific, estimating the sequence similarity of two functionally similar enhancers is very difficult. However, a similarity measure for regulatory sequences is crucial to detect and understand functional similarities between two enhancers and will facilitate large-scale analyses like clustering, prediction and classification of genome-wide datasets. We present the standardized alignment-free sequence similarity measure N2, a flexible framework that is defined for word neighbourhoods. We explore the usefulness of adding reverse complement words as well as words including mismatches into the neighbourhood. On simulated enhancer sequences as well as functional enhancers in mouse development, N2 is shown to outperform previous alignment-free measures. N2 is flexible, faster than competing methods and less susceptible to single sequence noise and the occurrence of repetitive sequences. Experiments on the mouse enhancers reveal that enhancers active in different tissues can be separated by pairwise comparison using N2. N2 represents an improvement over previous alignment-free similarity measures without compromising speed, which makes it a good candidate for large-scale sequence comparison of regulatory sequences. The software is part of the open-source C++ library SeqAn (www.seqan.de) and a compiled version can be downloaded at http://www.seqan.de/projects/alf.html. Supplementary data are available at Bioinformatics online.
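As a rough illustration of alignment-free comparison with word counts, the Python sketch below counts every k-mer together with its reverse complement and then correlates the two count vectors. This is a simplified stand-in and not the N2 measure itself: N2 is described as standardized over word neighbourhoods (optionally including mismatches), whereas this sketch uses a plain Pearson correlation; the function names and the default `k=3` are assumptions made for the example.

```python
from collections import Counter
from itertools import product
from math import sqrt

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(word: str) -> str:
    """Reverse complement of a DNA word over the alphabet ACGT."""
    return word.translate(_COMPLEMENT)[::-1]


def neighbourhood_counts(seq: str, k: int = 3) -> Counter:
    """Count each k-mer and its reverse complement (a very small 'neighbourhood')."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set("ACGT"):  # skip words containing ambiguous bases
            counts[word] += 1
            counts[reverse_complement(word)] += 1
    return counts


def word_count_similarity(seq_a: str, seq_b: str, k: int = 3) -> float:
    """Pearson correlation of the two neighbourhood count vectors (0.0 if undefined)."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    counts_a, counts_b = neighbourhood_counts(seq_a, k), neighbourhood_counts(seq_b, k)
    vec_a = [counts_a[w] for w in words]
    vec_b = [counts_b[w] for w in words]
    mean_a, mean_b = sum(vec_a) / len(words), sum(vec_b) / len(words)
    dev_a = [x - mean_a for x in vec_a]
    dev_b = [y - mean_b for y in vec_b]
    denom = sqrt(sum(x * x for x in dev_a)) * sqrt(sum(y * y for y in dev_b))
    return sum(x * y for x, y in zip(dev_a, dev_b)) / denom if denom else 0.0
```

Counting reverse complements makes the score independent of which strand a sequence was reported on, so a sequence and its reverse complement score as identical.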
Diversity of partial RNA-dependent RNA polymerase gene sequences of soybean blotchy mosaic virus isolates from different host-, geographical- and temporal origins.

PubMed

Strydom, Elrea; Pietersen, Gerhard

2018-05-01

Infection of soybean by the plant cytorhabdovirus soybean blotchy mosaic virus (SbBMV) results in significant yield losses in the temperate, lower-lying soybean production regions of South Africa. A 277 bp portion of the RNA-dependent RNA polymerase gene of 66 SbBMV isolates from different hosts, geographical locations in South Africa, and times of collection (spanning 16 years) was amplified by RT-PCR and sequenced to investigate the genetic diversity of isolates. Phylogenetic reconstruction revealed three main lineages, designated Groups A, B and C, with isolates grouping primarily according to geographic origin. Pairwise nucleotide identities ranged between 85.7% and 100% among all isolates, with isolates in Group A exhibiting the highest degree of sequence identity, and isolates of Groups A and B being more closely related to each other than to those in Group C. This is the first study investigating the genetic diversity of SbBMV.
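Ranges of pairwise identity such as the one reported above are typically derived from a multiple alignment by scoring every pair of rows. The Python sketch below shows one way to do this; it assumes pre-aligned, equal-length sequences and ignores any column in which either sequence has a gap, which is only one of several conventions in use (tools differ on the denominator). The isolate names and sequences are invented for the example.

```python
from itertools import combinations


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical positions over columns where neither sequence is gapped."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_range(aligned: dict) -> tuple:
    """Return (minimum, maximum, mean) pairwise identity over all pairs of sequences."""
    values = [
        percent_identity(aligned[m], aligned[n])
        for m, n in combinations(sorted(aligned), 2)
    ]
    return min(values), max(values), sum(values) / len(values)


if __name__ == "__main__":
    toy_alignment = {  # hypothetical aligned fragments
        "iso1": "ACGTACGTAC",
        "iso2": "ACGTACGTAC",
        "iso3": "ACGTTCGTAA",
    }
    print(identity_range(toy_alignment))
```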
Amino acid sequences of ribosomal proteins S11 from Bacillus stearothermophilus and S19 from Halobacterium marismortui. Comparison of the ribosomal protein S11 family.

PubMed

Kimura, M; Kimura, J; Hatakeyama, T

1988-11-21

The complete amino acid sequences of ribosomal proteins S11 from the Gram-positive eubacterium Bacillus stearothermophilus and of S19 from the archaebacterium Halobacterium marismortui have been determined. A search for homologous sequences of these proteins revealed that they belong to the ribosomal protein S11 family. Homologous proteins have previously been sequenced from Escherichia coli as well as from chloroplast, yeast and mammalian ribosomes. A pairwise comparison of the amino acid sequences showed that Bacillus protein S11 shares 68% identical residues with S11 from Escherichia coli and a slightly lower homology (52%) with the homologous chloroplast protein. The halophilic protein S19 is more related to the eukaryotic (45-49%) than to the eubacterial counterparts (35%).
Profiling cellular protein complexes by proximity ligation with dual tag microarray readout.

PubMed

Hammond, Maria; Nong, Rachel Yuan; Ericsson, Olle; Pardali, Katerina; Landegren, Ulf

2012-01-01

Patterns of protein interactions provide important insights into basic biology, and their analysis plays an increasing role in drug development and diagnostics of disease. We have established a scalable technique to compare two biological samples for the levels of all pairwise interactions among a set of targeted protein molecules. The technique is a combination of the proximity ligation assay with readout via dual tag microarrays. In the proximity ligation assay protein identities are encoded as DNA sequences by attaching DNA oligonucleotides to antibodies directed against the proteins of interest. Upon binding by pairs of antibodies to proteins present in the same molecular complexes, ligation reactions give rise to reporter DNA molecules that contain the combined sequence information from the two DNA strands. The ligation reactions also serve to incorporate a sample barcode in the reporter molecules to allow for direct comparison between pairs of samples. The samples are evaluated using a dual tag microarray where information is decoded, revealing which pairs of tags have become joined. As a proof-of-concept we demonstrate that this approach can be used to detect a set of five proteins and their pairwise interactions both in cellular lysates and in fixed tissue culture cells. This paper provides a general strategy to analyze the extent of any pairwise interactions in large sets of molecules by decoding reporter DNA strands that identify the interacting molecules.
Molecular characterization of a novel orthomyxovirus from rainbow and steelhead trout (Oncorhynchus mykiss)

USGS Publications Warehouse

Batts, William N.; LaPatra, Scott E.; Katona, Ryan; Leis, Eric; Fei Fan Ng, Terry; Bruieuc, Marine S.O.; Breyta, Rachel; Purcell, Maureen; Waltzek, Thomas B.; Delwart, Eric; Winton, James

2017-01-01

A novel virus, rainbow trout orthomyxovirus (RbtOV), was isolated in 1997 and again in 2000 from commercially-reared rainbow trout (Oncorhynchus mykiss) in Idaho, USA. The virus grew optimally in the CHSE-214 cell line at 15°C producing a diffuse cytopathic effect; however, juvenile rainbow trout exposed to cell culture-grown virus showed no mortality or gross pathology. Electron microscopy of preparations from infected cell cultures revealed the presence of typical orthomyxovirus particles. The complete genome of RbtOV is comprised of eight linear segments of single-stranded, negative-sense RNA having highly conserved 5′ and 3′-terminal nucleotide sequences. Another virus isolated in 2014 from steelhead trout (also O. mykiss) in Wisconsin, USA, and designated SttOV was found to have eight genome segments with high amino acid sequence identities (89–99%) to the corresponding genes of RbtOV, suggesting these new viruses are isolates of the same virus species and may be more widespread than currently realized. The new isolates had the same genome segment order and the closest pairwise amino acid sequence identities of 16–42% with Infectious salmon anemia virus (ISAV), the type species and currently only member of the genus Isavirus in the family Orthomyxoviridae. However, pairwise comparisons of the predicted amino acid sequences of the 10 RbtOV and SttOV proteins with orthologs from representatives of the established orthomyxoviral genera and a phylogenetic analysis using the PB1 protein showed that while RbtOV and SttOV clustered most closely with ISAV, they diverged sufficiently to merit consideration as representatives of a novel genus. A set of PCR primers was designed using conserved regions of the PB1 gene to produce amplicons that may be sequenced for identification of similar fish orthomyxoviruses in the future.
A novel papillomavirus in Adélie penguin (Pygoscelis adeliae) faeces sampled at the Cape Crozier colony, Antarctica.

PubMed

Varsani, Arvind; Kraberger, Simona; Jennings, Scott; Porzig, Elizabeth L; Julian, Laurel; Massaro, Melanie; Pollard, Annie; Ballard, Grant; Ainley, David G

2014-06-01

Papillomaviruses are epitheliotropic viruses that have circular dsDNA genomes encapsidated in non-enveloped virions. They have been found to infect a variety of mammals, reptiles and birds, but so far they have not been found in amphibians. Using a next-generation sequencing de novo assembly contig-informed recovery, we cloned and Sanger sequenced the complete genome of a novel papillomavirus from the faecal matter of Adélie penguins (Pygoscelis adeliae) nesting on Ross Island, Antarctica. The genome had all the usual features of a papillomavirus and an E9 ORF encoding a protein of unknown function that is found in all avian papillomaviruses to date. This novel papillomavirus genome shared ~60 % pairwise identity with the genomes of the other three known avian papillomaviruses: Fringilla coelebs papillomavirus 1 (FcPV1), Francolinus leucoscepus papillomavirus 1 (FlPV1) and Psittacus erithacus papillomavirus 1. Pairwise identity analysis and phylogenetic analysis of the major capsid protein gene clearly indicated that it represents a novel species, which we named Pygoscelis adeliae papillomavirus 1 (PaCV1). No evidence of recombination was detected in the genome of PaCV1, but we did detect a recombinant region (119 nt) in the E6 gene of FlPV1 with the recombinant region being derived from ancestral FcPV1-like sequences. Previously only paramyxoviruses, orthomyxoviruses and avian pox viruses have been genetically identified in penguins; however, the majority of penguin viral identifications have been based on serology or histology. This is the first report, to our knowledge, of a papillomavirus associated with a penguin species. © 2014 The Authors.
How to Choose the Suitable Template for Homology Modelling of GPCRs: 5-HT7 Receptor as a Test Case.

PubMed

Shahaf, Nir; Pappalardo, Matteo; Basile, Livia; Guccione, Salvatore; Rayan, Anwar

2016-09-01

G protein-coupled receptors (GPCRs) are a super-family of membrane proteins that attract great pharmaceutical interest due to their involvement in almost every physiological activity, including extracellular stimuli, neurotransmission, and hormone regulation. Currently, structural information on many GPCRs is mainly obtained by the techniques of computer modelling in general and by homology modelling in particular. Based on a quantitative analysis of eighteen antagonist-bound, resolved structures of rhodopsin family "A" receptors - also used as templates to build 153 homology models - it was concluded that a higher sequence identity between two receptors does not guarantee a lower RMSD between their structures, especially when their pair-wise sequence identity (within trans-membrane domain and/or in binding pocket) lies between 25 % and 40 %. This study suggests that we should consider all template receptors having a sequence identity ≤50 % with the query receptor. In fact, most of the GPCRs, compared to the currently available resolved structures of GPCRs, fall within this range and lack a correlation between structure and sequence. When testing suitability for structure-based drug design, it was found that choosing as a template the most similar resolved protein, based on sequence resemblance only, led to unsound results in many cases. Molecular docking analyses were carried out, and enrichment factors as well as attrition rates were utilized as criteria for assessing suitability for structure-based drug design. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Listeria costaricensis sp. nov.

PubMed

Núñez-Montero, Kattia; Leclercq, Alexandre; Moura, Alexandra; Vales, Guillaume; Peraza, Johnny; Pizarro-Cerdá, Javier; Lecuit, Marc

2018-03-01

A bacterial strain isolated from a food processing drainage system in Costa Rica fulfilled the criteria as belonging to the genus Listeria, but could not be assigned to any of the known species. Phylogenetic analysis based on the 16S rRNA gene revealed highest sequence similarity with the type strain of Listeria floridensis (98.7 %). Phylogenetic analysis based on Listeria core genomes placed the novel taxon within the Listeria fleischmannii, L. floridensis and Listeria aquatica clade (Listeria sensu lato). Whole-genome sequence analyses based on the average nucleotide blast identity (ANI<80 %) indicated that this isolate belonged to a novel species. Results of pairwise amino acid identity (AAI>70 %) and percentage of conserved proteins (POCP>68 %) with currently known Listeria species, as well as of biochemical characterization, confirmed that the strain constituted a novel species within the genus Listeria. The name Listeria costaricensis sp. nov. is proposed for the novel species, and is represented by the type strain CLIP 2016/00682T (=CIP 111400T=DSM 105474T).
Discovery of the first maize-infecting mastrevirus in the Americas using a vector-enabled metagenomics approach.

PubMed

Fontenele, Rafaela S; Alves-Freitas, Dione M T; Silva, Pedro I T; Foresti, Josemar; Silva, Paulo R; Godinho, Márcio T; Varsani, Arvind; Ribeiro, Simone G

2018-01-01

The genus Mastrevirus (family Geminiviridae) is composed of single-stranded DNA viruses that infect mono- and dicotyledonous plants and are transmitted by leafhoppers. In South America, there have been only two previous reports of mastreviruses, both identified in sweet potatoes (from Peru and Uruguay). As part of a general viral surveillance program, we used a vector-enabled metagenomics (VEM) approach and sampled leafhoppers (Dalbulus maidis) in Itumbiara (State of Goiás), Brazil. High-throughput sequencing of viral DNA purified from the leafhopper sample revealed mastrevirus-like contigs. Using a set of abutting primers, a 2746-nt circular genome was recovered. The circular genome has a typical mastrevirus genome organization and shares <63% pairwise identity with other mastrevirus isolates from around the world. Therefore, the new mastrevirus was tentatively named "maize striate mosaic virus". Seventeen maize leaf samples were collected in the same field as the leafhoppers, and ten samples were found to be positive for this mastrevirus. Furthermore, the ten genomes recovered from the maize samples share >99% pairwise identity with the one from the leafhopper. This is the first report of a maize-infecting mastrevirus in the Americas, the first identified in a non-vegetatively propagated mastrevirus host in South America, and the first mastrevirus to be identified in Brazil.
CombAlign: a code for generating a one-to-many sequence alignment from a set of pairwise structure-based sequence alignments.

PubMed

Zhou, Carol L Ecale

2015-01-01

In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and a need was evident for creating such a tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure. This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign was demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
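The idea of combining several pairwise alignments into a one-to-many alignment with gaps in the reference can be sketched in a few lines of Python. This is not CombAlign's source code: the function name `merge_on_reference`, the string-pair input format, and the decision to left-justify insertions (rather than align inserted residues to one another) are assumptions made for illustration.

```python
def merge_on_reference(pairs, gap="-"):
    """Merge pairwise alignments that share one reference into a gapped one-to-many alignment.

    `pairs` is a list of (aligned_reference, aligned_other) strings. Returns a list of
    rows: the gapped reference first, then one row per input alignment.
    """
    reference = pairs[0][0].replace(gap, "")
    n = len(reference)
    parsed = []
    for ref_aln, other_aln in pairs:
        if len(ref_aln) != len(other_aln):
            raise ValueError("each pairwise alignment must have rows of equal length")
        if ref_aln.replace(gap, "") != reference:
            raise ValueError("all alignments must use the same reference sequence")
        inserts = [""] * (n + 1)  # residues inserted before reference position i
        matched = []              # residue (or gap) aligned to each reference position
        for r, o in zip(ref_aln, other_aln):
            if r == gap:
                if o != gap:
                    inserts[len(matched)] += o
            else:
                matched.append(o)
        parsed.append((inserts, matched))

    # Each insertion slot must be as wide as the longest insertion seen there.
    widths = [max(len(ins[i]) for ins, _ in parsed) for i in range(n + 1)]
    rows = [[] for _ in range(len(parsed) + 1)]
    for i in range(n + 1):
        rows[0].append(gap * widths[i])  # the reference is gapped in every insertion slot
        for row, (ins, _) in zip(rows[1:], parsed):
            row.append(ins[i].ljust(widths[i], gap))
        if i < n:
            rows[0].append(reference[i])
            for row, (_, matched) in zip(rows[1:], parsed):
                row.append(matched[i])
    return ["".join(row) for row in rows]


if __name__ == "__main__":
    for line in merge_on_reference([("AC-GT", "ACTGT"), ("ACGT", "A-GT")]):
        print(line)
```

Keeping the reference residues in fixed columns means each row can be read directly against the reference to see where another protein matches, diverges, or carries an insertion.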
Cloning and expression in Escherichia coli of isopenicillin N synthetase genes from Streptomyces lipmanii and Aspergillus nidulans.

PubMed Central

Weigel, B J; Burgett, S G; Chen, V J; Skatrud, P L; Frolik, C A; Queener, S W; Ingolia, T D

1988-01-01

beta-Lactam antibiotics such as penicillins and cephalosporins are synthesized by a wide variety of microbes, including procaryotes and eucaryotes. Isopenicillin N synthetase catalyzes a key reaction in the biosynthetic pathway of penicillins and cephalosporins. The genes encoding this protein have previously been cloned from the filamentous fungi Cephalosporium acremonium and Penicillium chrysogenum and characterized. We have extended our analysis to the isopenicillin N synthetase genes from the fungus Aspergillus nidulans and the gram-positive procaryote Streptomyces lipmanii. The isopenicillin N synthetase genes from these organisms have been cloned and sequenced, and the proteins encoded by the open reading frames were expressed in Escherichia coli. Active isopenicillin N synthetase enzyme was recovered from extracts of E. coli cells prepared from cells containing each of the genes in expression vectors. The four isopenicillin N synthetase genes studied are closely related. Pairwise comparison of the DNA sequences showed between 62.5 and 75.7% identity; comparison of the predicted amino acid sequences showed between 53.9 and 80.6% identity. The close homology of the procaryotic and eucaryotic isopenicillin N synthetase genes suggests horizontal transfer of the genes during evolution. PMID:3045077
Genome variability of foot-and-mouth disease virus during the short period of the 2010 epidemic in Japan.

PubMed

Nishi, Tatsuya; Yamada, Manabu; Fukai, Katsuhiko; Shimada, Nobuaki; Morioka, Kazuki; Yoshida, Kazuo; Sakamoto, Kenichi; Kanno, Toru; Yamakawa, Makoto

2017-02-01

Foot-and-mouth disease virus (FMDV) is highly contagious and has a high mutation rate, leading to extensive genetic variation. To investigate how FMDV genetically evolves over a short period of an epidemic after initial introduction into an FMD-free area, whole L-fragment sequences of 104 FMDVs isolated from the 2010 epidemic in Japan, which continued for less than three months, were determined and phylogenetically and comparatively analyzed. Phylogenetic analysis of whole L-fragment sequences showed that these isolates were classified into a single group, indicating that FMDV was introduced into Japan in the epidemic via a single introduction. Nucleotide sequences of 104 virus isolates showed more than 99.56% pairwise identity rates without any genetic deletion or insertion, although no sequences were completely identical with each other. These results indicate that genetic substitutions of FMDV occurred gradually and constantly during the epidemic and generation of an extensive mutant virus could have been prevented by a rapid eradication strategy. From comparative analysis of variability of each FMDV protein coding region, VP4 and 2C regions showed the highest average identity rates and invariant rates, and were confirmed as highly conserved. In contrast, the protein coding regions VP2 and VP1 were confirmed to be highly variable regions with the lowest average identity rates and invariant rates, respectively. Our data demonstrate the importance of a rapid eradication strategy in an FMD epidemic and provide valuable information on the genome variability of FMDV during the short period of an epidemic. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
Whole genome sequences of Japanese porcine species C rotaviruses reveal a high diversity of genotypes of individual genes and will contribute to a comprehensive, generally accepted classification system.

PubMed

Niira, Kazutaka; Ito, Mika; Masuda, Tsuneyuki; Saitou, Toshiya; Abe, Tadatsugu; Komoto, Satoshi; Sato, Mitsuo; Yamasato, Hiroshi; Kishimoto, Mai; Naoi, Yuki; Sano, Kaori; Tuchiaka, Shinobu; Okada, Takashi; Omatsu, Tsutomu; Furuya, Tetsuya; Aoki, Hiroshi; Katayama, Yukie; Oba, Mami; Shirai, Junsuke; Taniguchi, Koki; Mizutani, Tetsuya; Nagai, Makoto

2016-10-01

Porcine rotavirus C (RVC) is distributed throughout the world and is thought to be a pathogenic agent of diarrhea in piglets. Although the VP7, VP4, and VP6 gene sequences of Japanese porcine RVCs are currently available, there is no whole-genome sequence data of Japanese RVC. Furthermore, only one to three sequences are available for porcine RVC VP1-VP3 and NSP1-NSP3 genes. Therefore, we determined nearly full-length whole-genome sequences of nine Japanese porcine RVCs from seven piglets with diarrhea and two healthy pigs and compared them with published RVC sequences from a database. The VP7 genes of two Japanese RVCs from healthy pigs were highly divergent from other known RVC strains and were provisionally classified as G12 and G13 based on the 86% nucleotide identity cut-off value. Pairwise sequence identity calculations and phylogenetic analyses revealed that candidate novel genotypes of porcine Japanese RVC were identified in the NSP1, NSP2 and NSP3 encoding genes, respectively. Furthermore, VP3 of Japanese porcine RVCs was shown to be closely related to human RVCs, suggesting a gene reassortment event between porcine and human RVCs and past interspecies transmission. The present study demonstrated that porcine RVCs show greater genetic diversity among strains than human and bovine RVCs. Copyright © 2016 Elsevier B.V. All rights reserved.
Metabolic network prediction through pairwise rational kernels.

PubMed

Roche-Lima, Abiel; Domaratzki, Michael; Fristensky, Brian

2014-09-26

Metabolic networks are represented by the set of metabolic pathways. Metabolic pathways are a series of biochemical reactions, in which the product (output) from one reaction serves as the substrate (input) to another reaction. Many pathways remain incompletely characterized. One of the major challenges of computational biology is to obtain better models of metabolic pathways. Existing models are dependent on the annotation of the genes. This propagates error accumulation when the pathways are predicted by incorrectly annotated genes. Pairwise classification methods are supervised learning methods used to classify new pairs of entities. Some of these classification methods, e.g., Pairwise Support Vector Machines (SVMs), use pairwise kernels. Pairwise kernels describe similarity measures between two pairs of entities. Using pairwise kernels to handle sequence data requires long processing times and large storage. Rational kernels are kernels based on weighted finite-state transducers that represent similarity measures between sequences or automata. They have been effectively used in problems that handle large amounts of sequence information such as protein essentiality, natural language processing and machine translation. We create a new family of pairwise kernels using weighted finite-state transducers (called Pairwise Rational Kernel (PRK)) to predict metabolic pathways from a variety of biological data. PRKs take advantage of the simpler representations and faster algorithms of transducers. Because raw sequence data can be used, the predictor model avoids the errors introduced by incorrect gene annotations. We then developed several experiments with PRKs and Pairwise SVM to validate our methods using the metabolic network of Saccharomyces cerevisiae. As a result, when PRKs are used, our method executes faster in comparison with other pairwise kernels. Also, when we use PRKs combined with other simple kernels that include evolutionary information, the accuracy values have been improved, while maintaining lower construction and execution times. The power of using kernels is that almost any sort of data can be represented using kernels. Therefore, completely disparate types of data can be combined to add power to kernel-based machine learning methods. When we compared our proposal using PRKs with other similar kernels, the execution times were decreased, with no compromise of accuracy. We also proved that by combining PRKs with other kernels that include evolutionary information, the accuracy can also be improved. As our proposal can use any type of sequence data, genes do not need to be properly annotated, avoiding accumulation errors because of incorrect previous annotations.
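To make the notion of a kernel between two pairs of sequences concrete, the Python sketch below builds a pairwise kernel from a simple n-gram count kernel. The abstract does not state which construction the authors use, so two assumptions are made here: the base kernel is an n-gram kernel computed directly from counts (no weighted finite-state transducers are involved), and pairs are combined with the widely used tensor-product form K((a,b),(c,d)) = k(a,c)k(b,d) + k(a,d)k(b,c). All function names are hypothetical.

```python
from collections import Counter


def ngram_counts(seq: str, n: int = 2) -> Counter:
    """Count all contiguous n-grams in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def ngram_kernel(x: str, y: str, n: int = 2) -> int:
    """Base kernel: inner product of the two n-gram count vectors."""
    counts_x, counts_y = ngram_counts(x, n), ngram_counts(y, n)
    return sum(c * counts_y[w] for w, c in counts_x.items())


def pairwise_kernel(pair_1, pair_2, n: int = 2) -> int:
    """Tensor-product pairwise kernel between two unordered pairs of sequences."""
    (a, b), (c, d) = pair_1, pair_2
    return (
        ngram_kernel(a, c, n) * ngram_kernel(b, d, n)
        + ngram_kernel(a, d, n) * ngram_kernel(b, c, n)
    )


if __name__ == "__main__":
    print(pairwise_kernel(("ABAB", "BABA"), ("ABBA", "BAAB")))
```

Summing both pairings makes the value independent of the order in which the two members of a pair are listed, which suits undirected relations such as "these two enzymes act in the same pathway".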
Comparison of predicted binders in Rhipicephalus (Boophilus) microplus intestine protein variants Bm86 Campo Grande strain, Bm86 and Bm95.

PubMed

Andreotti, Renato; Pedroso, Marisela S; Caetano, Alexandre R; Martins, Natália F

2008-01-01

This paper reports the sequence analysis of Bm86 Campo Grande strain comparing it with Bm86 and Bm95 antigens from the preparations TickGardPLUS and Gavac, respectively. The PCR product was cloned into pMOSBlue and sequenced. The secondary structure prediction tool PSIPRED was used to calculate alpha helices and beta strand contents of the predicted polypeptide. The hydrophobicity profile was calculated using the algorithms from the Hopp and Woods method, in addition to identification of potential MHC class-I binding regions in the antigens. Pair-wise alignment revealed that the similarity between Bm86 Campo Grande strain and Bm86 is 0.2% higher than that between Bm86 Campo Grande strain and Bm95 antigens. The identities were 96.5% and 96.3% respectively. Major suggestive differences in hydrophobicity were predicted among the sequences in two specific regions.
Phylogenetic Characterizations of Highly Mutated EV-B106 Recombinants Showing Extensive Genetic Exchanges with Other EV-B in Xinjiang, China.

PubMed

Song, Yang; Zhang, Yong; Fan, Qin; Cui, Hui; Yan, Dongmei; Zhu, Shuangli; Tang, Haishu; Sun, Qiang; Wang, Dongyan; Xu, Wenbo

2017-02-23

Human enterovirus B106 (EV-B106) is a new member of the enterovirus B species. To date, only three nucleotide sequences of EV-B106 have been published, and only one full-length genome sequence (the Yunnan strain 148/YN/CHN/12) is available in the GenBank database. In this study, we conducted phylogenetic characterisation of four EV-B106 strains isolated in Xinjiang, China. Pairwise comparisons of the nucleotide sequences and the deduced amino acid sequences revealed that the four Xinjiang EV-B106 strains had only 80.5-80.8% nucleotide identity and 95.4-97.3% amino acid identity with the Yunnan EV-B106 strain, indicating high mutagenicity. Similarity plots and bootscanning analyses revealed that frequent intertypic recombination occurred in all four Xinjiang EV-B106 strains in the non-structural region. These four strains may share a donor sequence with the EV-B85 strain, which circulated in Xinjiang in 2011, indicating extensive genetic exchanges between these strains. All Xinjiang EV-B106 strains were temperature-sensitive. An antibody seroprevalence study against EV-B106 in two Xinjiang prefectures also showed low titres of neutralizing antibodies, suggesting limited exposure and transmission in the population. This study contributes the whole genome sequences of EV-B106 to the GenBank database and provides valuable information regarding the molecular epidemiology of EV-B106 in China.
Phylogenetic Characterizations of Highly Mutated EV-B106 Recombinants Showing Extensive Genetic Exchanges with Other EV-B in Xinjiang, China

PubMed Central

Song, Yang; Zhang, Yong; Fan, Qin; Cui, Hui; Yan, Dongmei; Zhu, Shuangli; Tang, Haishu; Sun, Qiang; Wang, Dongyan; Xu, Wenbo

2017-01-01

Human enterovirus B106 (EV-B106) is a new member of the enterovirus B species. To date, only three nucleotide sequences of EV-B106 have been published, and only one full-length genome sequence (the Yunnan strain 148/YN/CHN/12) is available in the GenBank database. In this study, we conducted phylogenetic characterisation of four EV-B106 strains isolated in Xinjiang, China. Pairwise comparisons of the nucleotide sequences and the deduced amino acid sequences revealed that the four Xinjiang EV-B106 strains had only 80.5–80.8% nucleotide identity and 95.4–97.3% amino acid identity with the Yunnan EV-B106 strain, indicating high mutagenicity. Similarity plots and bootscanning analyses revealed that frequent intertypic recombination occurred in all four Xinjiang EV-B106 strains in the non-structural region. These four strains may share a donor sequence with the EV-B85 strain, which circulated in Xinjiang in 2011, indicating extensive genetic exchanges between these strains. All Xinjiang EV-B106 strains were temperature-sensitive. An antibody seroprevalence study against EV-B106 in two Xinjiang prefectures also showed low titres of neutralizing antibodies, suggesting limited exposure and transmission in the population. This study contributes the whole genome sequences of EV-B106 to the GenBank database and provides valuable information regarding the molecular epidemiology of EV-B106 in China. PMID:28230168
Function-based classification of carbohydrate-active enzymes by recognition of short, conserved peptide motifs.

PubMed

Busk, Peter Kamp; Lange, Lene

2013-06-01

Functional prediction of carbohydrate-active enzymes is difficult due to low sequence identity. However, similar enzymes often share a few short motifs, e.g., around the active site, even when the overall sequences are very different. To exploit this notion for functional prediction of carbohydrate-active enzymes, we developed a simple algorithm, peptide pattern recognition (PPR), that can divide proteins into groups of sequences that share a set of short conserved sequences. When this method was used on 118 glycoside hydrolase 5 proteins with 9% average pairwise identity and representing four characterized enzymatic functions, 97% of the proteins were sorted into groups correlating with their enzymatic activity. Furthermore, we analyzed 8,138 glycoside hydrolase 13 proteins including 204 experimentally characterized enzymes with 28 different functions. There was a 91% correlation between group and enzyme activity. These results indicate that the function of carbohydrate-active enzymes can be predicted with high precision by finding short, conserved motifs in their sequences. The glycoside hydrolase 61 family is important for fungal biomass conversion, but only a few proteins of this family have been functionally characterized. Interestingly, PPR divided 743 glycoside hydrolase 61 proteins into 16 subfamilies useful for targeted investigation of the function of these proteins and pinpointed three conserved motifs with putative importance for enzyme activity. Furthermore, the conserved sequences were useful for cloning of new, subfamily-specific glycoside hydrolase 61 proteins from 14 fungi. In conclusion, identification of conserved sequence motifs is a new approach to sequence analysis that can predict carbohydrate-active enzyme functions with high precision.
'Candidatus Phytoplasma palmicola', associated with a lethal yellowing-type disease of coconut (Cocos nucifera L.) in Mozambique.

PubMed

Harrison, Nigel A; Davis, Robert E; Oropeza, Carlos; Helmick, Ericka E; Narváez, María; Eden-Green, Simon; Dollet, Michel; Dickinson, Matthew

2014-06-01

In this study, the taxonomic position and group classification of the phytoplasma associated with a lethal yellowing-type disease (LYD) of coconut (Cocos nucifera L.) in Mozambique were addressed. Pairwise similarity values based on alignment of nearly full-length 16S rRNA gene sequences (1530 bp) revealed that the Mozambique coconut phytoplasma (LYDM) shared 100% identity with a comparable sequence derived from a phytoplasma strain (LDN) responsible for Awka wilt disease of coconut in Nigeria, and shared 99.0-99.6% identity with 16S rRNA gene sequences from strains associated with Cape St Paul wilt (CSPW) disease of coconut in Ghana and Côte d'Ivoire. Similarity scores further determined that the 16S rRNA gene of the LYDM phytoplasma shared <97.5% sequence identity with all previously described members of 'Candidatus Phytoplasma'. The presence of unique regions in the 16S rRNA gene sequence distinguished the LYDM phytoplasma from all currently described members of 'Candidatus Phytoplasma', justifying its recognition as the reference strain of a novel taxon, 'Candidatus Phytoplasma palmicola'. Virtual RFLP profiles of the F2n/R2 portion (1251 bp) of the 16S rRNA gene and pattern similarity coefficients delineated coconut LYDM phytoplasma strains from Mozambique as novel members of established group 16SrXXII, subgroup A (16SrXXII-A). Similarity coefficients of 0.97 were obtained for comparisons between subgroup 16SrXXII-A strains and CSPW phytoplasmas from Ghana and Côte d'Ivoire. On this basis, the CSPW phytoplasma strains were designated members of a novel subgroup, 16SrXXII-B.

Morphological identification and COI barcodes of adult flies help determine species identities of chironomid larvae (Diptera, Chironomidae).

PubMed

Failla, A J; Vasquez, A A; Hudson, P; Fujimoto, M; Ram, J L

2016-02-01

Establishing reliable methods for the identification of benthic chironomid communities is important due to their significant contribution to biomass, ecology and the aquatic food web. Immature larval specimens are more difficult to identify to species level by traditional morphological methods than their fully developed adult counterparts, and few keys are available to identify the larval species. In order to develop molecular criteria to identify species of chironomid larvae, larval and adult chironomids from Western Lake Erie were subjected to both molecular and morphological taxonomic analysis. Mitochondrial cytochrome c oxidase I (COI) barcode sequences of 33 adults that were identified to species level by morphological methods were grouped with COI sequences of 189 larvae in a neighbor-joining taxon-ID tree. Most of these larvae could be identified only to genus level by morphological taxonomy (only 22 of the 189 sequenced larvae could be identified to species level). The taxon-ID tree of larval sequences had 45 operational taxonomic units (OTUs, defined as clusters with >97% identity or individual sequences differing from nearest neighbors by >3%; supported by analysis of all larval pairwise differences), of which seven could be identified to species or 'species group' level by larval morphology. Reference sequences from the GenBank and BOLD databases assigned six larval OTUs with presumptive species level identifications and confirmed one previously assigned species level identification. Sequences from morphologically identified adults in the present study grouped with and further classified the identity of 13 larval OTUs. The use of morphological identification and subsequent DNA barcoding of adult chironomids proved to be beneficial in revealing possible species level identifications of larval specimens. 
Sequence data from this study also contribute to currently inadequate public databases relevant to the Great Lakes region, while the neighbor-joining analysis reported here describes the application and confirmation of a useful tool that can accelerate identification and bioassessment of chironomid communities.
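The OTU definition in this abstract (clusters with >97% identity) can be illustrated with a small sketch. The study itself used a neighbor-joining taxon-ID tree; the code below swaps in plain single-linkage clustering with a union-find structure, which is an assumption made for brevity. It also assumes the sequences are pre-aligned to equal length, and all names and sequences are hypothetical.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_otus(seqs: dict, threshold: float = 0.97) -> list:
    """Single-linkage clustering: link any two sequences with identity > threshold."""
    parent = {name: name for name in seqs}

    def find(name: str) -> str:
        # Follow parent links to the cluster representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if identity(seqs[a], seqs[b]) > threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Toy data, invented for illustration: larva1 and larva2 differ at 2 of 100 sites.
toy = {
    "larva1": "A" * 100,
    "larva2": "A" * 98 + "CC",
    "larva3": "C" * 100,
}
print(sorted(sorted(group) for group in cluster_otus(toy)))
```

Single linkage can chain distant sequences together through intermediates, so a tree-based grouping like the one reported may draw OTU boundaries differently.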
Morphological identification and COI barcodes of adult flies help determine species identities of chironomid larvae (Diptera, Chironomidae)

USGS Publications Warehouse

Failla, Andrew Joseph; Vasquez, Adrian Amelio; Hudson, Patrick L.; Fujimoto, Masanori; Ram, Jeffrey L.

2016-01-01

Establishing reliable methods for the identification of benthic chironomid communities is important due to their significant contribution to biomass, ecology and the aquatic food web. Immature larval specimens are more difficult to identify to species level by traditional morphological methods than their fully developed adult counterparts, and few keys are available to identify the larval species. In order to develop molecular criteria to identify species of chironomid larvae, larval and adult chironomids from Western Lake Erie were subjected to both molecular and morphological taxonomic analysis. Mitochondrial cytochrome c oxidase I (COI) barcode sequences of 33 adults that were identified to species level by morphological methods were grouped with COI sequences of 189 larvae in a neighbor-joining taxon-ID tree. Most of these larvae could be identified only to genus level by morphological taxonomy (only 22 of the 189 sequenced larvae could be identified to species level). The taxon-ID tree of larval sequences had 45 operational taxonomic units (OTUs, defined as clusters with >97% identity or individual sequences differing from nearest neighbors by >3%; supported by analysis of all larval pairwise differences), of which seven could be identified to species or ‘species group’ level by larval morphology. Reference sequences from the GenBank and BOLD databases assigned six larval OTUs with presumptive species level identifications and confirmed one previously assigned species level identification. Sequences from morphologically identified adults in the present study grouped with and further classified the identity of 13 larval OTUs. The use of morphological identification and subsequent DNA barcoding of adult chironomids proved to be beneficial in revealing possible species level identifications of larval specimens. 
Sequence data from this study also contribute to currently inadequate public databases relevant to the Great Lakes region, while the neighbor-joining analysis reported here describes the application and confirmation of a useful tool that can accelerate identification and bioassessment of chironomid communities.
Global versus Local Regulatory Roles for Lrp-Related Proteins: Haemophilus influenzae as a Case Study

PubMed Central

Friedberg, Devorah; Midkiff, Michael; Calvo, Joseph M.

2001-01-01

Lrp (leucine-responsive regulatory protein) plays a global regulatory role in Escherichia coli, affecting expression of dozens of operons. Numerous lrp-related genes have been identified in different bacteria and archaea, including asnC, an E. coli gene that was the first reported member of this family. Pairwise comparisons of amino acid sequences of the corresponding proteins show an average sequence identity of only 29% for the vast majority of comparisons. By contrast, Lrp-related proteins from enteric bacteria show more than 97% amino acid identity. Is the global regulatory role associated with E. coli Lrp limited to enteric bacteria? To probe this question we investigated LrfB, an Lrp-related protein from Haemophilus influenzae that shares 75% sequence identity with E. coli Lrp (highest sequence identity among 42 sequences compared). A strain of H. influenzae having an lrfB null allele grew at the wild-type growth rate but with a filamentous morphology. A comparison of two-dimensional (2D) electrophoretic patterns of proteins from parent and mutant strains showed only two differences (comparable studies with lrp+ and lrp E. coli strains by others showed 20 differences). The abundance of LrfB in H. influenzae, estimated by Western blotting experiments, was about 130 dimers per cell (compared to 3,000 dimers per E. coli cell). LrfB expressed in E. coli replaced Lrp as a repressor of the lrp gene but acted only to a limited extent as an activator of the ilvIH operon. Thus, although LrfB resembles Lrp sufficiently to perform some of its functions, its low abundance is consonant with a more local role in regulating but a few genes, a view consistent with the results of the 2D electrophoretic analysis. We speculate that an Lrp having a global regulatory role evolved to help enteric bacteria adapt to their ecological niches and that it is unlikely that Lrp-related proteins in other organisms have a broad regulatory function. PMID:11395465
The complete genome of klassevirus – a novel picornavirus in pediatric stool

PubMed Central

Greninger, Alexander L; Runckel, Charles; Chiu, Charles Y; Haggerty, Thomas; Parsonnet, Julie; Ganem, Donald; DeRisi, Joseph L

2009-01-01

Background: Diarrhea kills 2 million children worldwide each year, yet an etiological agent is not found in approximately 30–50% of cases. Picornaviral genera such as enterovirus, kobuvirus, cosavirus, parechovirus, hepatovirus, teschovirus, and cardiovirus have all been found in human and animal diarrhea. Modern technologies, especially deep sequencing, allow rapid, high-throughput screening of clinical samples such as stool for new infectious agents associated with human disease. Results: A pool of 141 pediatric gastroenteritis samples that were previously found to be negative for known diarrheal viruses was subjected to pyrosequencing. From a total of 937,935 sequence reads, a collection of 849 reads distantly related to Aichi virus were assembled and found to comprise 75% of a novel picornavirus genome. The complete genome was subsequently cloned and found to share 52.3% nucleotide pairwise identity and 38.9% amino acid identity to Aichi virus. The low level of sequence identity suggests a novel picornavirus genus which we have designated klassevirus. Blinded screening of 751 stool specimens from both symptomatic and asymptomatic individuals revealed a second positive case of klassevirus infection, which was subsequently found to be from the index case's 11-month old twin. Conclusion: We report the discovery of human klassevirus 1, a member of a novel picornavirus genus, in stool from two infants from Northern California. Further characterization and epidemiological studies will be required to establish whether klasseviruses are significant causes of human infection. PMID:19538752
Phylogenomic analysis of the species of the Mycobacterium tuberculosis complex demonstrates that Mycobacterium africanum, Mycobacterium bovis, Mycobacterium caprae, Mycobacterium microti and Mycobacterium pinnipedii are later heterotypic synonyms of Mycobacterium tuberculosis.

PubMed

Riojas, Marco A; McGough, Katya J; Rider-Riojas, Cristin J; Rastogi, Nalin; Hazbón, Manzour Hernando

2018-01-01

The species within the Mycobacterium tuberculosis Complex (MTBC) have undergone numerous taxonomic and nomenclatural changes, leaving the true structure of the MTBC in doubt. We used next-generation sequencing (NGS), digital DNA-DNA hybridization (dDDH), and average nucleotide identity (ANI) to investigate the relationship between these species. The type strains of Mycobacterium africanum, Mycobacterium bovis, Mycobacterium caprae, Mycobacterium microti and Mycobacterium pinnipedii were sequenced via NGS. Pairwise dDDH and ANI comparisons between these, previously sequenced MTBC type strain genomes (including 'Mycobacterium canettii', 'Mycobacterium mungi' and 'Mycobacterium orygis') and M. tuberculosis H37Rv T were performed. Further, all available genome sequences in GenBank for species in or putatively in the MTBC were compared to H37Rv T. Pairwise results indicated that all of the type strains of the species are extremely closely related to each other (dDDH: 91.2-99.2%, ANI: 99.21-99.92%), greatly exceeding the respective species delineation thresholds, thus indicating that they belong to the same species. Results from the GenBank genomes indicate that all the strains examined are within the circumscription of H37Rv T (dDDH: 83.5-100%). We, therefore, formally propose a union of the species of the MTBC as M. tuberculosis. M. africanum, M. bovis, M. caprae, M. microti and M. pinnipedii are reclassified as later heterotypic synonyms of M. tuberculosis. 'M. canettii', 'M. mungi', and 'M. orygis' are classified as strains of the species M. tuberculosis. We further recommend use of the infrasubspecific term 'variant' ('var.') and infrasubspecific designations that generally retain the historical nomenclature associated with the groups or otherwise convey such characteristics, e.g. M. tuberculosis var. bovis.
HLA Diversity in the 1000 Genomes Dataset

PubMed Central

Gourraud, Pierre-Antoine; Khankhanian, Pouya; Cereb, Nezih; Yang, Soo Young; Feolo, Michael; Maiers, Martin; Rioux, John D.; Hauser, Stephen; Oksenberg, Jorge

2014-01-01

The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation by sequencing at a level that should allow the genome-wide detection of most variants with frequencies as low as 1%. However, in the major histocompatibility complex (MHC), only the top 10 most frequent haplotypes are in the 1% frequency range whereas thousands of haplotypes are present at lower frequencies. Given the limitation of both the coverage and the read length of the sequences generated by the 1000 Genomes Project, the highly variable positions that define HLA alleles may be difficult to identify. We used classical Sanger sequencing techniques to type the HLA-A, HLA-B, HLA-C, HLA-DRB1 and HLA-DQB1 genes in the available 1000 Genomes samples and combined the results with the 103,310 variants in the MHC region genotyped by the 1000 Genomes Project. Using pairwise identity-by-descent distances between individuals and principal component analysis, we established the relationship between ancestry and genetic diversity in the MHC region. As expected, both the MHC variants and the HLA phenotype can identify the major ancestry lineage, informed mainly by the most frequent HLA haplotypes. To some extent, regions of the genome with similar genetic or similar recombination rate have similar properties. An MHC-centric analysis underlines departures between the ancestral background of the MHC and the genome-wide picture. Our analysis of linkage disequilibrium (LD) decay in these samples suggests that overestimation of pairwise LD occurs due to a limited sampling of the MHC diversity. This collection of HLA-specific MHC variants, available on the dbMHC portal, is a valuable resource for future analyses of the role of MHC in population and disease studies. PMID:24988075
HLA diversity in the 1000 genomes dataset.

PubMed

Gourraud, Pierre-Antoine; Khankhanian, Pouya; Cereb, Nezih; Yang, Soo Young; Feolo, Michael; Maiers, Martin; Rioux, John D; Hauser, Stephen; Oksenberg, Jorge

2014-01-01

The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation by sequencing at a level that should allow the genome-wide detection of most variants with frequencies as low as 1%. However, in the major histocompatibility complex (MHC), only the top 10 most frequent haplotypes are in the 1% frequency range whereas thousands of haplotypes are present at lower frequencies. Given the limitation of both the coverage and the read length of the sequences generated by the 1000 Genomes Project, the highly variable positions that define HLA alleles may be difficult to identify. We used classical Sanger sequencing techniques to type the HLA-A, HLA-B, HLA-C, HLA-DRB1 and HLA-DQB1 genes in the available 1000 Genomes samples and combined the results with the 103,310 variants in the MHC region genotyped by the 1000 Genomes Project. Using pairwise identity-by-descent distances between individuals and principal component analysis, we established the relationship between ancestry and genetic diversity in the MHC region. As expected, both the MHC variants and the HLA phenotype can identify the major ancestry lineage, informed mainly by the most frequent HLA haplotypes. To some extent, regions of the genome with similar genetic or similar recombination rate have similar properties. An MHC-centric analysis underlines departures between the ancestral background of the MHC and the genome-wide picture. Our analysis of linkage disequilibrium (LD) decay in these samples suggests that overestimation of pairwise LD occurs due to a limited sampling of the MHC diversity. This collection of HLA-specific MHC variants, available on the dbMHC portal, is a valuable resource for future analyses of the role of MHC in population and disease studies.
Analysis of drug binding pockets and repurposing opportunities for twelve essential enzymes of ESKAPE pathogens

PubMed Central

Naz, Sadia; Ngo, Tony; Farooq, Umar

2017-01-01

Background: The rapid increase in antibiotic resistance by various bacterial pathogens underlies the significance of developing new therapies and exploring different drug targets. A fraction of bacterial pathogens abbreviated as ESKAPE by the European Center for Disease Prevention and Control have been considered a major threat due to the rise in nosocomial infections. Here, we compared putative drug binding pockets of twelve essential and mostly conserved metabolic enzymes in numerous bacterial pathogens including those of the ESKAPE group and Mycobacterium tuberculosis. The comparative analysis will provide guidelines for the likelihood of transferability of the inhibitors from one species to another. Methods: Nine bacterial species including six ESKAPE pathogens, Mycobacterium tuberculosis along with Mycobacterium smegmatis and Escherichia coli, two non-pathogenic bacteria, have been selected for drug binding pocket analysis of twelve essential enzymes. The amino acid sequences were obtained from Uniprot, aligned using ICM v3.8-4a and matched against the Pocketome encyclopedia. We used known co-crystal structures of selected target enzyme orthologs to evaluate the location of their active sites and binding pockets and to calculate a matrix of pairwise sequence identities across each target enzyme across the different species. This was used to generate sequence maps. Results: High sequence identity of enzyme binding pockets, derived from experimentally determined co-crystallized structures, was observed among various species. Comparison at both full sequence level and for drug binding pockets of key metabolic enzymes showed that binding pockets are highly conserved (sequence similarity up to 100%) among various ESKAPE pathogens as well as Mycobacterium tuberculosis. Enzyme orthologs having conserved binding sites may have the potential to interact with inhibitors in a similar way and might be helpful for the design of a similar class of inhibitors for a particular species.
The derived pocket alignments and distance-based maps provide guidelines for drug discovery and repurposing. In addition they also provide recommendations for the relevant model bacteria that may be used for initial drug testing. Discussion: Comparing ligand binding sites through sequence identity calculation could be an effective approach to identify conserved orthologs as drug binding pockets have shown higher level of conservation among various species. By using this approach we could avoid the problems associated with full sequence comparison. We identified essential metabolic enzymes among ESKAPE pathogens that share high sequence identity in their putative drug binding pockets (up to 100%), of which known inhibitors can potentially antagonize these identical pockets in the various species in a similar manner. PMID:28948099
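The contrast this abstract draws between full-sequence identity and binding-pocket identity can be sketched as follows. This is an illustrative outline only, not the ICM/Pocketome workflow the authors used: it assumes a multiple alignment is already available as equal-length strings, and the species labels, sequences, and pocket column indices are all hypothetical.

```python
def identity_at(aligned_a: str, aligned_b: str, columns=None) -> float:
    """Percent identity over the chosen alignment columns (all columns if None)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    cols = range(len(aligned_a)) if columns is None else columns
    # Keep only columns where both sequences have a residue (no gap).
    pairs = [
        (aligned_a[i], aligned_b[i])
        for i in cols
        if aligned_a[i] != "-" and aligned_b[i] != "-"
    ]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def identity_matrix(alignment: dict, columns=None) -> dict:
    """Nested dict: matrix[a][b] is the percent identity between sequences a and b."""
    names = list(alignment)
    return {
        a: {b: identity_at(alignment[a], alignment[b], columns) for b in names}
        for a in names
    }


# Toy alignment of one enzyme across three species (invented for illustration).
toy_alignment = {
    "species_A": "MKTLHDGSAV",
    "species_B": "MRTLHDGTAV",
    "species_C": "MKSIHDGSLV",
}
pocket_columns = [4, 5, 6]  # hypothetical pocket-lining positions (0-based)

full = identity_matrix(toy_alignment)
pocket = identity_matrix(toy_alignment, pocket_columns)
print(full["species_A"]["species_B"], pocket["species_A"]["species_B"])
```

In the toy data the full-length identities range from 50% to 80% while the pocket columns are identical in all three sequences, which mirrors the pattern the abstract reports.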
Analysis of drug binding pockets and repurposing opportunities for twelve essential enzymes of ESKAPE pathogens.

PubMed

Naz, Sadia; Ngo, Tony; Farooq, Umar; Abagyan, Ruben

2017-01-01

The rapid increase in antibiotic resistance by various bacterial pathogens underlies the significance of developing new therapies and exploring different drug targets. A fraction of bacterial pathogens abbreviated as ESKAPE by the European Center for Disease Prevention and Control have been considered a major threat due to the rise in nosocomial infections. Here, we compared putative drug binding pockets of twelve essential and mostly conserved metabolic enzymes in numerous bacterial pathogens including those of the ESKAPE group and Mycobacterium tuberculosis. The comparative analysis will provide guidelines for the likelihood of transferability of the inhibitors from one species to another. Nine bacterial species including six ESKAPE pathogens, Mycobacterium tuberculosis along with Mycobacterium smegmatis and Escherichia coli, two non-pathogenic bacteria, have been selected for drug binding pocket analysis of twelve essential enzymes. The amino acid sequences were obtained from Uniprot, aligned using ICM v3.8-4a and matched against the Pocketome encyclopedia. We used known co-crystal structures of selected target enzyme orthologs to evaluate the location of their active sites and binding pockets and to calculate a matrix of pairwise sequence identities across each target enzyme across the different species. This was used to generate sequence maps. High sequence identity of enzyme binding pockets, derived from experimentally determined co-crystallized structures, was observed among various species. Comparison at both full sequence level and for drug binding pockets of key metabolic enzymes showed that binding pockets are highly conserved (sequence similarity up to 100%) among various ESKAPE pathogens as well as Mycobacterium tuberculosis. Enzyme orthologs having conserved binding sites may have the potential to interact with inhibitors in a similar way and might be helpful for the design of a similar class of inhibitors for a particular species.
The derived pocket alignments and distance-based maps provide guidelines for drug discovery and repurposing. In addition they also provide recommendations for the relevant model bacteria that may be used for initial drug testing. Comparing ligand binding sites through sequence identity calculation could be an effective approach to identify conserved orthologs as drug binding pockets have shown higher level of conservation among various species. By using this approach we could avoid the problems associated with full sequence comparison. We identified essential metabolic enzymes among ESKAPE pathogens that share high sequence identity in their putative drug binding pockets (up to 100%), of which known inhibitors can potentially antagonize these identical pockets in the various species in a similar manner.
A configuration space of homologous proteins conserving mutual information and allowing a phylogeny inference based on pair-wise Z-score probabilities.

PubMed

Bastien, Olivier; Ortet, Philippe; Roy, Sylvaine; Maréchal, Eric

2005-03-10

Popular methods to reconstruct molecular phylogenies are based on multiple sequence alignments, in which addition or removal of data may change the resulting tree topology. We have sought a representation of homologous proteins that would conserve the information of pair-wise sequence alignments, respect probabilistic properties of Z-scores (Monte Carlo methods applied to pair-wise comparisons) and be the basis for a novel method of consistent and stable phylogenetic reconstruction. We have built up a spatial representation of protein sequences using concepts from particle physics (configuration space) and respecting a frame of constraints deduced from pair-wise alignment score properties in information theory. The obtained configuration space of homologous proteins (CSHP) allows the representation of real and shuffled sequences, and thereupon an expression of the TULIP theorem for Z-score probabilities. Based on the CSHP, we propose a phylogeny reconstruction using Z-scores. Deduced trees, called TULIP trees, are consistent with multiple-alignment based trees. Furthermore, the TULIP tree reconstruction method provides a solution for some previously reported incongruent results, such as the apicomplexan enolase phylogeny. The CSHP is a unified model that conserves mutual information between proteins in the way physical models conserve energy. Applications include the reconstruction of evolutionary consistent and robust trees, the topology of which is based on a spatial representation that is not reordered after addition or removal of sequences. The CSHP and its assigned phylogenetic topology, provide a powerful and easily updated representation for massive pair-wise genome comparisons based on Z-score computations.
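The Z-scores mentioned in this abstract come from Monte Carlo comparisons of a real alignment score against scores obtained with shuffled sequences. The sketch below shows only that generic idea, not the CSHP or TULIP construction. It assumes a simple Smith-Waterman local score with match/mismatch/linear-gap values chosen for illustration (a substitution matrix would normally be used for proteins); the function names, parameters, and test sequence are hypothetical.

```python
import random
import statistics


def local_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def z_score(a: str, b: str, n_shuffles: int = 100, seed: int = 0) -> float:
    """Z = (observed score - mean shuffled score) / std of shuffled scores.

    Sequence b is shuffled, which preserves its length and composition.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = local_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        shuffled_scores.append(local_score(a, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    if sd == 0:
        return 0.0  # degenerate case: every shuffle scored the same
    return (observed - mean) / sd


# Toy peptide, invented for illustration; a sequence compared with itself
# should score well above its shuffled versions.
toy = "MKTAYIAKQRQISFVKSHFSRQ"
print(z_score(toy, toy) > 0)
```

Because each Z-score depends only on the two sequences being compared, adding or removing other sequences does not change it, which is the stability property the abstract emphasises.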
The complete genome sequence of the Atlantic salmon paramyxovirus (ASPV)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nylund, Stian; Karlsen, Marius; Nylund, Are

2008-03-30

The complete RNA genome of the Atlantic salmon paramyxovirus (ASPV), isolated from Atlantic salmon suffering from proliferative gill inflammation (PGI), has been determined. The genome is 16,965 nucleotides in length and consists of six nonoverlapping genes in the order 3'- N - P/C/V - M - F - HN - L -5', coding for the nucleocapsid, phospho-, matrix, fusion, hemagglutinin-neuraminidase and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and trinucleotide intergenic regions similar to those of other Paramyxoviridae. The ASPV P-gene expression strategy is like that of the respiro- and morbilliviruses, which express the phosphoprotein from the primary transcript, and edit a portion of the mRNA to encode the accessory proteins V and W. It also encodes the C-protein by ribosomal choice of translation initiation. Pairwise comparisons of amino acid identities, and phylogenetic analysis of deduced ASPV protein sequences with homologous sequences from other Paramyxoviridae, show that ASPV has an affinity for the genus Respirovirus, but may represent a new genus within the subfamily Paramyxovirinae.
Pairwise contact energy statistical potentials can help to find probability of point mutations.

PubMed

Saravanan, K M; Suvaithenamudhan, S; Parthasarathy, S; Selvaraj, S

2017-01-01

To adopt a particular fold, a protein requires several interactions between its amino acid residues. The energetic contribution of these residue-residue interactions can be approximated by extracting statistical potentials from known high resolution structures. Several methods based on statistical potentials extracted from unrelated proteins are found to make a better prediction of probability of point mutations. We postulate that the statistical potentials extracted from known structures of similar folds with varying sequence identity can be a powerful tool to examine probability of point mutation. By keeping this in mind, we have derived pairwise residue and atomic contact energy potentials for the different functional families that adopt the (α/β)8 TIM-barrel fold. We carried out computational point mutations at various conserved residue positions in yeast triose phosphate isomerase enzyme for which experimental results are already reported. We have also performed molecular dynamics simulations on a subset of point mutants to make a comparative study. The difference in pairwise residue and atomic contact energy of wildtype and various point mutations reveals probability of mutations at a particular position. Interestingly, we found that our computational prediction agrees with the experimental studies of Silverman et al. (Proc Natl Acad Sci 2001;98:3092-3097) and gives better predictions than I-Mutant and the Cologne University Protein Stability Analysis Tool. The present work thus suggests deriving pairwise contact energy potentials and molecular dynamics simulations of functionally important folds could help us to predict probability of point mutations which may ultimately reduce the time and cost of mutation experiments. Proteins 2016; 85:54-64. © 2016 Wiley Periodicals, Inc.
SVM-dependent pairwise HMM: an application to protein pairwise alignments.

PubMed

Orlando, Gabriele; Raimondi, Daniele; Khan, Taushif; Lenaerts, Tom; Vranken, Wim F

2017-12-15

Methods able to provide reliable protein alignments are crucial for many bioinformatics applications. In recent years many different algorithms have been developed and various kinds of information, from sequence conservation to secondary structure, have been used to improve alignment performance. This is especially relevant for proteins with highly divergent sequences. However, recent works suggest that different features may have different importance in diverse protein classes and it would be an advantage to have more customizable approaches, capable of dealing with different alignment definitions. Here we present Rigapollo, a highly flexible pairwise alignment method based on a pairwise HMM-SVM that can use any type of information to build alignments. Rigapollo lets the user decide the optimal features to align their protein class of interest. It outperforms current state of the art methods on two well-known benchmark datasets when aligning highly divergent sequences. A Python implementation of the algorithm is available at http://ibsquare.be/rigapollo. Contact: wim.vranken@vub.be. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Characterization of perch rhabdovirus (PRV) in farmed grayling Thymallus thymallus.

PubMed

Gadd, Tuija; Viljamaa-Dirks, Satu; Holopainen, Riikka; Koski, Perttu; Jakava-Viljanen, Miia

2013-10-11

Two Finnish fish farms experienced elevated mortality rates in farmed grayling Thymallus thymallus fry during the summer months, most typically in July. The mortalities occurred during several years and were connected with a few neurological disorders and peritonitis. Virological investigation detected an infection with an unknown rhabdovirus. Based on the entire glycoprotein (G) and partial RNA polymerase (L) gene sequences, the virus was classified as a perch rhabdovirus (PRV). Pairwise comparisons of the G and L gene regions of grayling isolates revealed that all isolates were very closely related, with 99 to 100% nucleotide identity, which suggests the same origin of infection. Phylogenetic analysis demonstrated that they were closely related to the strain isolated from perch Perca fluviatilis and sea trout Salmo trutta trutta caught from the Baltic Sea. The entire G gene sequences revealed that all Finnish grayling isolates, and both the perch and sea trout isolates, were most closely related to a PRV isolated in France in 2004. According to the partial L gene sequences, all of the Finnish grayling isolates were most closely related to the Danish isolate DK5533 from pike. The genetic analysis of entire G gene and partial L gene sequences showed that the Finnish brown trout isolate ka907_87 shared only approximately 67 and 78% identity, respectively, with our grayling isolates. The grayling isolates were also analysed by an immunofluorescence antibody test. This is the first report of a PRV causing disease in grayling in Finland.
Genotypic characterization and species identification of Fasciola spp. with implications regarding the isolates infecting goats in Vietnam.

PubMed

Nguyen, Thanh Giang Thi; Van De, Nguyen; Vercruysse, Jozef; Dorny, Pierre; Le, Thanh Hoa

2009-12-01

Ribosomal RNA sequences (361 or 362 bp) of the second internal transcribed spacer (ITS-2) and a portion of mitochondrial cox1 (423 bp) for Fasciola spp. obtained from specimens collected in indigenous and hybrid goats and sheep in Vietnam were characterized for genotypic status and hybridization/introgression. Alignment of 48 ITS-2 sequences (including those from goats and sheep in this study) indicates that F. gigantica and F. hepatica typically differ from each other at seven sites, one of which is a distinguishing deletion (T) at the 327th position in F. gigantica relative to F. hepatica. The isolates from the mountainous goats in the North of Vietnam (Yen Bai province) showed an ITS-2 composition nearly identical to that of F. hepatica. The ITS-2 sequences from populations of Fasciola isolates in goats had probably experienced introgression/hybridization as reported previously in other ruminants and humans. All Vietnamese goat-of-origin specimens had high pairwise identity of mitochondrial cox1 sequences to F. gigantica (97-100%), and very low identity to F. hepatica (91-93%), suggesting their maternal linkage to be traced to F. gigantica. The presence of hybrid and/or introgressed populations of liver flukes bearing genetic material from both F. hepatica and F. gigantica in the goats/sheep in Vietnam, regardless of indigenous or imported hosts, appears to be the first demonstration from a tropical country.
Molecular identification and first description of the male of Neoechinorhynchus schmidti (Acanthocephala: Neoechinorhynchidae), a parasite of Trachemys scripta (Testudines) in México.

PubMed

García-Varela, Martín; García-Prieto, Luís; Rodríguez, Rodolfo Pérez

2011-12-01

The morphology of the males of Neoechinorhynchus schmidti (Acanthocephala: Neoechinorhynchidae) is unknown, because this species was described based exclusively on females. However, recently we collected 2 common slider turtles Trachemys scripta in Centla swamps, Tabasco, Mexico, parasitized by 27 specimens of an acanthocephalan whose females were morphologically identical to N. schmidti. The domains D2 and D3 of the large subunit of the nuclear ribosomal RNA (LSU) of 3 males and 2 females of this material were sequenced. The sequences of both sexes were identical, and based on this result, we described for the first time the morphology of the males of N. schmidti. In addition, 6 sequences of a congeneric species, also parasite of turtles (Neoechinorhynchus emyditoides) were generated in the current research. The 11 sequences of these 2 species were aligned with 13 sequences of another 4 species of the same genus, producing a data set of 24 taxa with 674 nucleotides. The genetic divergence between N. schmidti and N. emyditoides was 4% and intraspecific differences ranged from 0.01 to 0.02%. Pairwise differences between either of these species and 4 other congeners parasitic in fresh and brackish water fishes (Neoechinorhynchus golvani, Neoechinorhynchus roseum, Neoechinorhynchus saginatus, and Neoechinorhynchus sp.) varied from 9.5 to 33%. Maximum likelihood and maximum parsimony analyses show that N. schmidti and N. emyditoides are sister taxa. Bootstrap analysis also indicates that the sister relationship is reliably supported.
Molecular characterization of previously elusive badnaviruses associated with symptomatic cacao in the New World.

PubMed

Chingandu, Nomatter; Zia-Ur-Rehman, Muhammad; Sreenivasan, Thyail N; Surujdeo-Maharaj, Surendra; Umaharan, Pathmanathan; Gutierrez, Osman A; Brown, Judith K

2017-05-01

Suspected virus-like symptoms were observed in cacao plants in Trinidad during 1943, and the viruses associated with these symptoms were designated as strains A and B of cacao Trinidad virus (CTV). However, viral etiology has not been demonstrated for either phenotype. Total DNA was isolated from symptomatic cacao leaves exhibiting the CTV A and B phenotypes and subjected to Illumina HiSeq and Sanger DNA sequencing. Based on de novo assembly, two apparently full-length badnavirus genomes of 7,533 and 7,454 nucleotides (nt) were associated with CTV strain A and B, respectively. The Trinidad badnaviral genomes contained four open reading frames, three of which are characteristic of other known badnaviruses, and a fourth that is present in only some badnaviruses. Both badnaviral genomes harbored hallmark caulimovirus-like features, including a tRNA Met priming site, a TATA box, and a polyadenylation-like signal. Pairwise comparisons of the RT-RNase H region indicated that the Trinidad isolates share 57-71% nt sequence identity with other known badnaviruses. Based on the system for badnavirus species demarcation in which viruses with less than 80% nt sequence identity in the RT-RNase gene are considered members of separate species, these isolates represent two previously unidentified badnaviruses, herein named cacao mild mosaic virus and cacao yellow vein banding virus, making them the first cacao-infecting badnaviruses identified thus far in the Western Hemisphere.
Towards comprehensive structural motif mining for better fold annotation in the "twilight zone" of sequence dissimilarity

PubMed Central

Jia, Yi; Huan, Jun; Buhr, Vincent; Zhang, Jintao; Carayannopoulos, Leonidas N

2009-01-01

Background: Automatic identification of structure fingerprints from a group of diverse protein structures is challenging, especially for proteins whose divergent amino acid sequences may fall into the "twilight-" or "midnight-" zones where pair-wise sequence identities to known sequences fall below 25% and sequence-based functional annotations often fail. Results: Here we report a novel graph database mining method and demonstrate its application to protein structure pattern identification and structure classification. The biologic motivation of our study is to recognize common structure patterns in "immunoevasins", proteins mediating virus evasion of host immune defense. Our experimental study, using both viral and non-viral proteins, demonstrates the efficiency and efficacy of the proposed method. Conclusion: We present a theoretic framework, offer a practical software implementation for incorporating prior domain knowledge, such as substitution matrices as studied here, and devise an efficient algorithm to identify approximate matched frequent subgraphs. By doing so, we significantly expanded the analytical power of sophisticated data mining algorithms in dealing with large volume of complicated and noisy protein structure data. And without loss of generality, choice of appropriate compatibility matrices allows our method to be easily employed in domains where subgraph labels have some uncertainty. PMID:19208148
Estimating Seven Coefficients of Pairwise Relatedness Using Population-Genomic Data

PubMed Central

Ackerman, Matthew S.; Johri, Parul; Spitze, Ken; Xu, Sen; Doak, Thomas G.; Young, Kimberly; Lynch, Michael

2017-01-01

Population structure can be described by genotypic-correlation coefficients between groups of individuals, the most basic of which are the pairwise relatedness coefficients between any two individuals. There are nine pairwise relatedness coefficients in the most general model, and we show that these can be reduced to seven coefficients for biallelic loci. Although all nine coefficients can be estimated from pedigrees, six coefficients have been beyond empirical reach. We provide a numerical optimization procedure that estimates all seven reduced coefficients from population-genomic data. Simulations show that the procedure is nearly unbiased, even at 3× coverage, and errors in five of the seven coefficients are statistically uncorrelated. The remaining two coefficients have a negative correlation of errors, but their sum provides an unbiased assessment of the overall correlation of heterozygosity between two individuals. Application of these new methods to four populations of the freshwater crustacean Daphnia pulex reveals the occurrence of half siblings in our samples, as well as a number of identical individuals that are likely obligately asexual clone mates. Statistically significant negative estimates of these pairwise relatedness coefficients, including inbreeding coefficients that were typically negative, underscore the difficulties that arise when interpreting genotypic correlations as estimations of the probability that alleles are identical by descent. PMID:28341647
Identity-by-Descent-Based Phasing and Imputation in Founder Populations Using Graphical Models

PubMed Central

Palin, Kimmo; Campbell, Harry; Wright, Alan F; Wilson, James F; Durbin, Richard

2011-01-01

Accurate knowledge of haplotypes, the combination of alleles co-residing on a single copy of a chromosome, enables powerful gene mapping and sequence imputation methods. Since humans are diploid, haplotypes must be derived from genotypes by a phasing process. In this study, we present a new computational model for haplotype phasing based on pairwise sharing of haplotypes inferred to be Identical-By-Descent (IBD). We apply the Bayesian network based model in a new phasing algorithm, called systematic long-range phasing (SLRP), that can capitalize on the close genetic relationships in isolated founder populations, and show with simulated and real genome-wide genotype data that SLRP substantially reduces the rate of phasing errors compared to previous phasing algorithms. Furthermore, the method accurately identifies regions of IBD, enabling linkage-like studies without pedigrees, and can be used to impute most genotypes with very low error rate. Genet. Epidemiol. 35:853-860, 2011. © 2011 Wiley Periodicals, Inc. PMID:22006673

SFESA: a web server for pairwise alignment refinement by secondary structure shifts.

PubMed

Tong, Jing; Pei, Jimin; Grishin, Nick V

2015-09-03

Protein sequence alignment is essential for a variety of tasks such as homology modeling and active site prediction. Alignment errors remain the main cause of low-quality structure models. A bioinformatics tool to refine alignments is needed to make protein alignments more accurate. We developed the SFESA web server to refine pairwise protein sequence alignments. Compared to the previous version of SFESA, which required a set of 3D coordinates for a protein, the new server will search a sequence database for the closest homolog with an available 3D structure to be used as a template. For each alignment block defined by secondary structure elements in the template, SFESA evaluates alignment variants generated by local shifts and selects the best-scoring alignment variant. A scoring function that combines the sequence score of profile-profile comparison and the structure score of template-derived contact energy is used for evaluation of alignments. PROMALS pairwise alignments refined by SFESA are more accurate than those produced by current advanced alignment methods such as HHpred and CNFpred. In addition, SFESA also improves alignments generated by other software. SFESA is a web-based tool for alignment refinement, designed for researchers to compute, refine, and evaluate pairwise alignments with a combined sequence and structure scoring of alignment blocks. To our knowledge, the SFESA web server is the only tool that refines alignments by evaluating local shifts of secondary structure elements. The SFESA web server is available at http://prodata.swmed.edu/sfesa.
In-silico Taxonomic Classification of 373 Genomes Reveals Species Misidentification and New Genospecies within the Genus Pseudomonas.

PubMed

Tran, Phuong N; Savka, Michael A; Gan, Han Ming

2017-01-01

The genus Pseudomonas has one of the largest diversities of species within the Bacteria kingdom. To date, its taxonomy is still being revised and updated. Due to the non-standardized procedure and ambiguous thresholds at species level, largely based on 16S rRNA gene or conventional biochemical assay, species identification of publicly available Pseudomonas genomes remains questionable. In this study, we performed a large-scale analysis of all Pseudomonas genomes with species designation (excluding the well-defined P. aeruginosa) and re-evaluated their taxonomic assignment via in silico genome-genome hybridization and/or genetic comparison with valid type species. Three-hundred and seventy-three pseudomonad genomes were analyzed and subsequently clustered into 145 distinct genospecies. We detected 207 erroneous labels and corrected 43 to the proper species based on Average Nucleotide Identity Multilocus Sequence Typing (MLST) sequence similarity to the type strain. Surprisingly, more than half of the genomes initially designated as Pseudomonas syringae and Pseudomonas fluorescens should be classified either to a previously described species or to a new genospecies. Notably, high pairwise average nucleotide identity (>95%) indicating species-level similarity was observed between P. synxantha-P. libanensis, P. psychrotolerans-P. oryzihabitans, and P. kilonensis-P. brassicacearum, which were previously differentiated based on conventional biochemical tests and/or genome-genome hybridization techniques.
Complete nucleotide sequences of a new bipartite begomovirus from Malvastrum sp. plants with bright yellow mosaic symptoms in South Texas.

PubMed

Alabi, Olufemi J; Villegas, Cecilia; Gregg, Lori; Murray, K Daniel

2016-06-01

Two isolates of a novel bipartite begomovirus, tentatively named malvastrum bright yellow mosaic virus (MaBYMV), were molecularly characterized from naturally infected plants of the genus Malvastrum showing bright yellow mosaic disease symptoms in South Texas. Six complete DNA-A and five DNA-B genome sequences of MaBYMV obtained from the isolates ranged in length from 2,608 to 2,609 nucleotides (nt) and 2,578 to 2,605 nt, respectively. Both genome segments shared a 178- to 180-nt common region. In pairwise comparisons, the complete DNA-A and DNA-B sequences of MaBYMV were most similar (87-88 % and 79-81 % identity, respectively) and phylogenetically related to the corresponding sequences of sida mosaic Sinaloa virus-[MX-Gua-06]. Further analysis revealed that MaBYMV is a putative recombinant virus, thus supporting the notion that malvaceous hosts may be influencing the evolution of several begomoviruses. The design of new diagnostic primers enabled the detection of MaBYMV in cohorts of Bemisia tabaci collected from symptomatic Malvastrum sp. plants, thus implicating whiteflies as potential vectors of the virus.
Genetic variability in Melipona quinquefasciata (Hymenoptera, Apidae, Meliponini) from northeastern Brazil determined using the first internal transcribed spacer (ITS1).

PubMed

Pereira, J O P; Freitas, B M; Jorge, D M M; Torres, D C; Soares, C E A; Grangeiro, T B

2009-01-01

Melipona quinquefasciata is a ground-nesting South American stingless bee whose geographic distribution was believed to comprise only the central and southern states of Brazil. We obtained partial sequences (about 500-570 bp) of first internal transcribed spacer (ITS1) nuclear ribosomal DNA from Melipona specimens putatively identified as M. quinquefasciata collected from different localities in northeastern Brazil. To confirm the taxonomic identity of the northeastern samples, specimens from the state of Goiás (Central region of Brazil) were included for comparison. All sequences were deposited in GenBank (accession numbers EU073751-EU073759). The mean nucleotide divergence (excluding sites with insertions/deletions) in the ITS1 sequences was only 1.4%, ranging from 0 to 4.1%. When the sites with insertions/deletions were also taken into account, sequence divergences varied from 0 to 5.3%. In all pairwise comparisons, the ITS1 sequence from the specimens collected in Goiás was most divergent compared to the ITS1 sequences of the bees from the other locations. However, neighbor-joining phylogenetic analysis showed that all ITS1 sequences from northeastern specimens along with the sample of Goiás were resolved in a single clade with a bootstrap support of 100%. The ITS1 sequencing data thus support the occurrence of M. quinquefasciata in northeast Brazil.
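The two divergence figures in this abstract (1.4% mean with indel sites excluded, up to 5.3% with them included) depend on how gapped alignment columns are treated. The following is a minimal sketch of both conventions for a pair of pre-aligned sequences. The function name, the `count_indels` parameter, and the example sequences are hypothetical and are not taken from the study, and the authors' exact counting rules may differ.

```python
def pairwise_divergence(a: str, b: str, count_indels: bool = False) -> float:
    """Percent divergence between two aligned sequences of equal length.

    count_indels=False: columns with a gap in either sequence are skipped.
    count_indels=True:  a gap opposite a base counts as one difference;
                        columns that are gaps in both sequences are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        if (x == "-" or y == "-") and not count_indels:
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * differences / compared


# Hypothetical aligned fragments: one substitution and one indel column.
seq1 = "ACGTACGT-A"
seq2 = "ACGTTCGTAA"
print(pairwise_divergence(seq1, seq2))                     # indel column skipped
print(pairwise_divergence(seq1, seq2, count_indels=True))  # indel column counted
```

Percent identity under the same convention is simply 100 minus the returned value.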
Bioinformatic prediction and in vivo validation of residue-residue interactions in human proteins

NASA Astrophysics Data System (ADS)

Jordan, Daniel; Davis, Erica; Katsanis, Nicholas; Sunyaev, Shamil

2014-03-01

Identifying residue-residue interactions in protein molecules is important for understanding both protein structure and function in the context of evolutionary dynamics and medical genetics. Such interactions can be difficult to predict using existing empirical or physical potentials, especially when residues are far from each other in sequence space. Using a multiple sequence alignment of 46 diverse vertebrate species we explore the space of allowed sequences for orthologous protein families. Amino acid changes that are known to damage protein function allow us to identify specific changes that are likely to have interacting partners. We fit the parameters of the continuous-time Markov process used in the alignment to conclude that these interactions are primarily pairwise, rather than higher order. Candidates for sites under pairwise epistasis are predicted, which can then be tested by experiment. We report the results of an initial round of in vivo experiments in a zebrafish model that verify the presence of multiple pairwise interactions predicted by our model. These experimentally validated interactions are novel, distant in sequence, and are not readily explained by known biochemical or biophysical features.
Breaking the computational barriers of pairwise genome comparison.

PubMed

Torreno, Oscar; Trelles, Oswaldo

2015-08-11

Conventional pairwise sequence comparison software algorithms are being used to process much larger datasets than they were originally designed for. This can result in processing bottlenecks that limit software capabilities or prevent full use of the available hardware resources. Overcoming the barriers that limit the efficient computational analysis of large biological sequence datasets by retrofitting existing algorithms or by creating new applications represents a major challenge for the bioinformatics community. We have developed C libraries for pairwise sequence comparison within diverse architectures, ranging from commodity systems to high performance and cloud computing environments. Exhaustive tests were performed using different datasets of closely- and distantly-related sequences that span from small viral genomes to large mammalian chromosomes. The tests demonstrated that our solution is capable of generating high quality results with a linear-time response and controlled memory consumption, being comparable to or faster than the current state-of-the-art methods. We have addressed the problem of pairwise and all-versus-all comparison of large sequences in general, greatly increasing the limits on input data size. The approach described here is based on a modular out-of-core strategy that uses secondary storage to avoid reaching memory limits during the identification of High-scoring Segment Pairs (HSPs) between the sequences under comparison. Software engineering concepts were applied to avoid intermediate result re-calculation, to minimise the performance impact of input/output (I/O) operations and to modularise the process, thus enhancing application flexibility and extendibility. Our computationally-efficient approach allows tasks such as the massive comparison of complete genomes, evolutionary event detection, the identification of conserved synteny blocks and inter-genome distance calculations to be performed more effectively.
Preliminary Classification of Novel Hemorrhagic Fever-Causing Viruses Using Sequence-Based PAirwise Sequence Comparison (PASC) Analysis.

PubMed

Bào, Yīmíng; Kuhn, Jens H

2018-01-01

During the last decade, genome sequence-based classification of viruses has become increasingly prominent. Viruses can even be classified based on coding-complete genome sequence data alone. Nevertheless, classification remains arduous as experts are required to establish phylogenetic trees to depict the evolutionary relationships of such sequences for preliminary taxonomic placement. Pairwise sequence comparison (PASC) of genomes is one of several novel methods for establishing relationships among viruses. This method, provided by the US National Center for Biotechnology Information as an open-access tool, circumvents phylogenetics, and yet PASC results are often in agreement with those of phylogenetic analyses. Computationally inexpensive, PASC can be easily performed by non-taxonomists. Here we describe how to use the PASC tool for the preliminary classification of novel viral hemorrhagic fever-causing viruses.
Lactococcus petauri sp. nov., isolated from an abscess of a sugar glider

PubMed Central

Goodman, Laura B.; Lawton, Marie R.; Franklin-Guild, Rebecca J.; Anderson, Renee R.; Schaan, Lynn; Thachil, Anil J.; Wiedmann, Martin; Miller, Claire B.; Alcaine, Samuel D.; Kovac, Jasna

2017-01-01

A strain of lactic acid bacteria, designated 159469T, isolated from a facial abscess in a sugar glider, was characterized genetically and phenotypically. Cells of the strain were Gram-stain-positive, coccoid and catalase-negative. Morphological, physiological and phylogenetic data indicated that the isolate belongs to the genus Lactococcus. Strain 159469T was closely related to Lactococcus garvieae ATCC 43921T, showing 95.86 and 98.08 % sequence similarity in 16S rRNA gene and rpoB gene sequences, respectively. Furthermore, a pairwise average nucleotide identity blast (ANIb) value of 93.54 % and in silico DNA–DNA hybridization value of 50.7 % were determined for the genome of strain 159469T, when compared with the genome of the type strain of Lactococcus garvieae. Based on the data presented here, the isolate represents a novel species of the genus Lactococcus, for which the name Lactococcus petauri sp. nov. is proposed. The type strain is 159469T (=LMG 30040T=DSM 104842T). PMID:28945531
Lactococcus petauri sp. nov., isolated from an abscess of a sugar glider.

PubMed

Goodman, Laura B; Lawton, Marie R; Franklin-Guild, Rebecca J; Anderson, Renee R; Schaan, Lynn; Thachil, Anil J; Wiedmann, Martin; Miller, Claire B; Alcaine, Samuel D; Kovac, Jasna

2017-11-01

A strain of lactic acid bacteria, designated 159469T, isolated from a facial abscess in a sugar glider, was characterized genetically and phenotypically. Cells of the strain were Gram-stain-positive, coccoid and catalase-negative. Morphological, physiological and phylogenetic data indicated that the isolate belongs to the genus Lactococcus. Strain 159469T was closely related to Lactococcus garvieae ATCC 43921T, showing 95.86 and 98.08 % sequence similarity in 16S rRNA gene and rpoB gene sequences, respectively. Furthermore, a pairwise average nucleotide identity blast (ANIb) value of 93.54 % and in silico DNA-DNA hybridization value of 50.7 % were determined for the genome of strain 159469T, when compared with the genome of the type strain of Lactococcus garvieae. Based on the data presented here, the isolate represents a novel species of the genus Lactococcus, for which the name Lactococcus petauri sp. nov. is proposed. The type strain is 159469T (=LMG 30040T=DSM 104842T).
Score distributions of gapped multiple sequence alignments down to the low-probability tail

NASA Astrophysics Data System (ADS)

Fieth, Pascal; Hartmann, Alexander K.

2016-08-01

Assessing the significance of alignment scores of optimally aligned DNA or amino acid sequences can be achieved via the knowledge of the score distribution of random sequences. But this requires obtaining the distribution in the biologically relevant high-scoring region, where the probabilities are exponentially small. For gapless local alignments of infinitely long sequences this distribution is known analytically to follow a Gumbel distribution. Distributions for gapped local alignments and global alignments of finite lengths can only be obtained numerically. To obtain results for the small-probability region, specific statistical mechanics-based rare-event algorithms can be applied. In previous studies, this was achieved for pairwise alignments. They showed that, contrary to results from previous simple sampling studies, strong deviations from the Gumbel distribution occur in case of finite sequence lengths. Here we extend the studies to multiple sequence alignments with gaps, which are much more relevant for practical applications in molecular biology. We study the distributions of scores over a large range of the support, reaching probabilities as small as 10^-160, for global and local (sum-of-pair scores) multiple alignments. We find that even after suitable rescaling, eliminating the sequence-length dependence, the distributions for multiple alignment differ from the pairwise alignment case. Furthermore, we also show that the previously discussed Gaussian correction to the Gumbel distribution needs to be refined, also for the case of pairwise alignments.
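For orientation, the Gumbel form mentioned in this abstract is usually written in the Karlin-Altschul parameterization, P(S >= x) = 1 - exp(-K m n exp(-lambda x)), for sequence lengths m and n. The sketch below evaluates that tail probability. The function name and the parameter values in the example are illustrative assumptions, not values from the paper, and the abstract itself reports that finite-length and gapped alignments deviate from this form.

```python
import math


def gumbel_tail(score: float, m: int, n: int, lam: float, K: float) -> float:
    """P(S >= score) under the Gumbel (Karlin-Altschul) form.

    lam and K are scoring-system constants; the values passed below are
    placeholders chosen only to exercise the function.
    """
    expected_hits = K * m * n * math.exp(-lam * score)  # the E-value
    # 1 - exp(-E), written with expm1 so that very small E does not round to 0
    return -math.expm1(-expected_hits)


# Deep in the tail the probability is tiny but still representable.
p = gumbel_tail(score=1000, m=100, n=100, lam=0.3, K=0.1)
print(p)
```

Using `math.expm1` matters here: the naive expression `1 - math.exp(-E)` returns exactly 0.0 once E drops below machine precision, which is the regime the rare-event methods in the abstract are designed to probe.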
Multiple alignment-free sequence comparison

PubMed Central

Ren, Jie; Song, Kai; Sun, Fengzhu; Deng, Minghua; Reinert, Gesine

2013-01-01

Motivation: Recently, a range of new statistics have become available for the alignment-free comparison of two sequences based on k-tuple word content. Here, we extend these statistics to the simultaneous comparison of more than two sequences. Our suite of statistics contains, first, and , extensions of statistics for pairwise comparison of the joint k-tuple content of all the sequences, and second, , and , averages of sums of pairwise comparison statistics. The two tasks we consider are, first, to identify sequences that are similar to a set of target sequences, and, second, to measure the similarity within a set of sequences. Results: Our investigation uses both simulated data as well as cis-regulatory module data where the task is to identify cis-regulatory modules with similar transcription factor binding sites. We find that although for real data, all of our statistics show a similar performance, on simulated data the Shepp-type statistics are in some instances outperformed by star-type statistics. The multiple alignment-free statistics are more sensitive to contamination in the data than the pairwise average statistics. Availability: Our implementation of the five statistics is available as an R package named ‘multiAlignFree’ at http://www-rcf.usc.edu/∼fsun/Programs/multiAlignFree/multiAlignFreemain.html. Contact: reinert@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23990418
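As background for the k-tuple comparison this abstract builds on, the sketch below computes the plain D2 statistic for two sequences: the sum, over all words of length k, of the product of that word's counts in each sequence. This is only the basic pairwise count statistic, not the Shepp-type, star-type, or multi-sequence statistics the paper proposes, and the function names are hypothetical rather than taken from the multiAlignFree package.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(seq_a: str, seq_b: str, k: int) -> int:
    """Plain D2: inner product of the two k-tuple count vectors."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    # Counter returns 0 for words absent from seq_b.
    return sum(n * counts_b[word] for word, n in counts_a.items())


print(d2("ACGT", "ACGT", 2))  # shared words AC, CG, GT
print(d2("ACGT", "TTTT", 2))  # no shared 2-tuples
```

No alignment is computed at any point, which is what makes this family of statistics cheap for long or rearranged sequences.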
SANSparallel: interactive homology search against Uniprot

PubMed Central

Somervuo, Panu; Holm, Liisa

2015-01-01

Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest. PMID:25855811
The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation

PubMed Central

Casadio, Rita

2017-01-01

BAR 3.0 updates our server BAR (Bologna Annotation Resource) for predicting protein structural and functional features from sequence. We increase data volume, query capabilities and information conveyed to the user. The core of BAR 3.0 is a graph-based clustering procedure of UniProtKB sequences, following strict pairwise similarity criteria (sequence identity ≥40% with alignment coverage ≥90%). Each cluster contains the available annotation downloaded from UniProtKB, GO, PFAM and PDB. After statistical validation, GO terms and PFAM domains are cluster-specific and annotate new sequences entering the cluster after satisfying similarity constraints. BAR 3.0 includes 28 869 663 sequences in 1 361 773 clusters, of which 22.2% (22 241 661 sequences) and 47.4% (24 555 055 sequences) have at least one validated GO term and one PFAM domain, respectively. 1.4% of the clusters (36% of all sequences) include PDB structures and the cluster is associated to a hidden Markov model that allows building template-target alignment suitable for structural modeling. Some other 3 399 026 sequences are singletons. BAR 3.0 offers an improved search interface, allowing queries by UniProtKB-accession, Fasta sequence, GO-term, PFAM-domain, organism, PDB and ligand/s. When evaluated on the CAFA2 targets, BAR 3.0 largely outperforms our previous version and scores among state-of-the-art methods. BAR 3.0 is publicly available and accessible at http://bar.biocomp.unibo.it/bar3. PMID:28453653
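One simple way to read the clustering criteria in this abstract is as a graph whose edges are pairwise hits with identity ≥40% and coverage ≥90%, with clusters taken as connected components. The sketch below implements that reading with a small union-find. It is an assumption about the general idea only: the abstract does not specify BAR's actual procedure, and the function name, input layout, and example hits are hypothetical.

```python
def cluster_by_similarity(ids, hits, min_identity=40.0, min_coverage=90.0):
    """Group sequence ids into connected components of qualifying hits.

    hits: iterable of (id_a, id_b, percent_identity, percent_coverage).
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        if identity >= min_identity and coverage >= min_coverage:
            parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return sorted(groups.values(), key=sorted)


# Hypothetical hits: P3-P4 fails the coverage criterion, so P4 is a singleton.
ids = ["P1", "P2", "P3", "P4"]
hits = [("P1", "P2", 55.0, 95.0), ("P2", "P3", 42.0, 91.0), ("P3", "P4", 80.0, 60.0)]
print(cluster_by_similarity(ids, hits))
```

Note that connected components link P1 and P3 transitively through P2 even though no direct P1-P3 hit is listed; a stricter scheme would require every pair in a cluster to satisfy the thresholds.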
Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Daily, Jeffrey A.

Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375 residue query sequence a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar’s ‘striped’ approach. When using only a single thread, parasail was 1.7 times faster than Rognes’s SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments

DOE PAGES

Daily, Jeffrey A.

2016-02-10

Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375 residue query sequence a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar’s ‘striped’ approach. When using only a single thread, parasail was 1.7 times faster than Rognes’s SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
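To make the "cell updates" in the benchmark above concrete, the sketch below is a scalar Smith-Waterman local alignment score in plain Python: each inner-loop iteration fills one cell of the dynamic-programming matrix, and GCUPS counts billions of such updates per second. It does not use the parasail API and is not its striped SIMD implementation. The function name, the linear gap penalty, and the scoring values are assumptions chosen for brevity.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous matrix row
    best = 0
    for x in a:
        curr = [0]  # first column is always 0 for local alignment
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # One cell update: restart (0), substitution, or a gap in a or b.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACGT"))
print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Keeping only two rows limits memory to the length of the second sequence, at the cost of not being able to trace back the alignment itself. Clamping at zero is what makes the alignment local; dropping the clamp and initializing the borders with gap penalties gives the Needleman-Wunsch global variant.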
Koi herpesvirus encodes and expresses a functional interleukin-10.

PubMed

Sunarto, Agus; Liongue, Clifford; McColl, Kenneth A; Adams, Mathew M; Bulach, Dieter; Crane, Mark St J; Schat, Karel A; Slobedman, Barry; Barnes, Andrew C; Ward, Alister C; Walker, Peter J

2012-11-01

Koi herpesvirus (KHV) (species Cyprinid herpesvirus 3) ORF134 was shown to transcribe a spliced transcript encoding a 179-amino-acid (aa) interleukin-10 (IL-10) homolog (khvIL-10) in koi fin (KF-1) cells. Pairwise sequence alignment indicated that the expressed product shares 25% identity with carp IL-10, 22 to 24% identity with mammalian (including primate) IL-10s, and 19.1% identity with European eel herpesvirus IL-10 (ahvIL-10). In phylogenetic analyses, khvIL-10 fell in a divergent position from all host IL-10 sequences, indicating extensive structural divergence following capture from the host. In KHV-infected fish, khvIL-10 transcripts were observed to be highly expressed during the acute and reactivation phases but to be expressed at very low levels during low-temperature-induced persistence. Similarly, KHV early (helicase [Hel] and DNA polymerase [DNAP]) and late (intercapsomeric triplex protein [ITP] and major capsid protein [MCP]) genes were also expressed at high levels during the acute and reactivation phases, but only low-level expression of the ITP gene was detected during the persistent phase. Injection of khvIL-10 mRNA into zebrafish (Danio rerio) embryos increased the number of lysozyme-positive cells to a similar degree as zebrafish IL-10. Downregulation of the IL-10 receptor long chain (IL-10R1) using a specific morpholino abrogated the response to both khvIL-10 and zebrafish IL-10 transcripts, indicating that, despite the structural divergence, khvIL-10 functions via this receptor. This is the first report describing the characteristics of a functional viral IL-10 gene in the Alloherpesviridae.
Multidrug resistant pathogens respond differently to the presence of co-pathogen, commensal, probiotic and host cells.

PubMed

Chan, Agnes P; Choi, Yongwook; Brinkac, Lauren M; Krishnakumar, Radha; DePew, Jessica; Kim, Maria; Hinkle, Mary K; Lesho, Emil P; Fouts, Derrick E

2018-06-05

In light of the ongoing antimicrobial resistance crisis, there is a need to understand the role of co-pathogens, commensals, and the local microbiome in modulating virulence and antibiotic resistance. To identify possible interactions that influence the expression of virulence or survival mechanisms in both the multidrug-resistant organisms (MDROs) and human host cells, unique cohorts of clinical isolates were selected for whole genome sequencing with enhanced assembly and full annotation, pairwise co-culturing, and transcriptome profiling. The MDROs were co-cultured in pairwise combinations either with: (1) another MDRO, (2) skin commensals (Staphylococcus epidermidis and Corynebacterium jeikeium), (3) the common probiotic Lactobacillus reuteri, and (4) human fibroblasts. RNA-Seq analysis showed distinct regulation of virulence and antimicrobial resistance gene responses across different combinations of MDROs, commensals, and human cells. Co-culture assays demonstrated that microbial interactions can modulate gene responses of both the target and pathogen/commensal species, and that the responses are specific to the identity of the pathogen/commensal species. In summary, bacteria have mechanisms to distinguish between friends, foe and host cells. These results provide foundational data and insight into the possibility of manipulating the local microbiome when treating complicated polymicrobial wound, intra-abdominal, or respiratory infections.
Alignment-independent comparison of binding sites based on DrugScore potential fields encoded by 3D Zernike descriptors.

PubMed

Nisius, Britta; Gohlke, Holger

2012-09-24

Analyzing protein binding sites provides detailed insights into the biological processes proteins are involved in, e.g., into drug-target interactions, and so is of crucial importance in drug discovery. Herein, we present novel alignment-independent binding site descriptors based on DrugScore potential fields. The potential fields are transformed to a set of information-rich descriptors using a series expansion in 3D Zernike polynomials. The resulting Zernike descriptors show a promising performance in detecting similarities among proteins with low pairwise sequence identities that bind identical ligands, as well as within subfamilies of one target class. Furthermore, the Zernike descriptors are robust against structural variations among protein binding sites. Finally, the Zernike descriptors show a high data compression power, and computing similarities between binding sites based on these descriptors is highly efficient. Consequently, the Zernike descriptors are a useful tool for computational binding site analysis, e.g., to predict the function of novel proteins, off-targets for drug candidates, or novel targets for known drugs.
A diverse family of serine proteinase genes expressed in cotton boll weevil (Anthonomus grandis): implications for the design of pest-resistant transgenic cotton plants.

PubMed

Oliveira-Neto, Osmundo B; Batista, João A N; Rigden, Daniel J; Fragoso, Rodrigo R; Silva, Rodrigo O; Gomes, Eliane A; Franco, Octávio L; Dias, Simoni C; Cordeiro, Célia M T; Monnerat, Rose G; Grossi-De-Sá, Maria F

2004-09-01

Fourteen different cDNA fragments encoding serine proteinases were isolated by reverse transcription-PCR from cotton boll weevil (Anthonomus grandis) larvae. A large diversity between the sequences was observed, with a mean pairwise identity of 22% in the amino acid sequence. The cDNAs encompassed 11 trypsin-like sequences classifiable into three families and three chymotrypsin-like sequences belonging to a single family. Using a combination of 5' and 3' RACE, the full-length sequence was obtained for five of the cDNAs, named Agser2, Agser5, Agser6, Agser10 and Agser21. The encoded proteins included amino acid sequence motifs of serine proteinase active sites, conserved cysteine residues, and both zymogen activation and signal peptides. Southern blotting analysis suggested that one or two copies of these serine proteinase genes exist in the A. grandis genome. Northern blotting analysis of Agser2 and Agser5 showed that for both genes, expression is induced upon feeding and is concentrated in the gut of larvae and adult insects. Reverse northern analysis of the 14 cDNA fragments showed that only two trypsin-like and two chymotrypsin-like were expressed at detectable levels. Under the effect of the serine proteinase inhibitors soybean Kunitz trypsin inhibitor and black-eyed pea trypsin/chymotrypsin inhibitor, expression of one of the trypsin-like sequences was upregulated while expression of the two chymotrypsin-like sequences was downregulated. Copyright 2004 Elsevier Ltd.
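The "mean pairwise identity" statistic quoted in records such as the one above can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length with `-` as the gap character, and that identity is counted only over columns where neither sequence has a gap; the abstract does not state which identity definition the authors used, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


def mean_pairwise_identity(aligned: list[str]) -> float:
    """Average the percent identity over every unordered pair of sequences."""
    pairs = list(combinations(aligned, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignment, not data from the study.
toy_alignment = ["MKV-LA", "MKVQLA", "MRV-IA"]
mean_identity = mean_pairwise_identity(toy_alignment)
```

Other conventions (for example dividing by the full alignment length or by the shorter sequence length) give different percentages, so reported identities are only comparable when the denominator is the same.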
In-silico Taxonomic Classification of 373 Genomes Reveals Species Misidentification and New Genospecies within the Genus Pseudomonas

PubMed Central

Tran, Phuong N.; Savka, Michael A.; Gan, Han Ming

2017-01-01

The genus Pseudomonas has one of the largest diversity of species within the Bacteria kingdom. To date, its taxonomy is still being revised and updated. Due to the non-standardized procedure and ambiguous thresholds at species level, largely based on 16S rRNA gene or conventional biochemical assay, species identification of publicly available Pseudomonas genomes remains questionable. In this study, we performed a large-scale analysis of all Pseudomonas genomes with species designation (excluding the well-defined P. aeruginosa) and re-evaluated their taxonomic assignment via in silico genome-genome hybridization and/or genetic comparison with valid type species. Three-hundred and seventy-three pseudomonad genomes were analyzed and subsequently clustered into 145 distinct genospecies. We detected 207 erroneous labels and corrected 43 to the proper species based on Average Nucleotide Identity Multilocus Sequence Typing (MLST) sequence similarity to the type strain. Surprisingly, more than half of the genomes initially designated as Pseudomonas syringae and Pseudomonas fluorescens should be classified either to a previously described species or to a new genospecies. Notably, high pairwise average nucleotide identity (>95%) indicating species-level similarity was observed between P. synxantha-P. libanensis, P. psychrotolerans–P. oryzihabitans, and P. kilonensis- P. brassicacearum, that were previously differentiated based on conventional biochemical tests and/or genome-genome hybridization techniques. PMID:28747902

Inferring Recent Demography from Isolation by Distance of Long Shared Sequence Blocks

PubMed Central

Ringbauer, Harald; Coop, Graham

2017-01-01

Recently it has become feasible to detect long blocks of nearly identical sequence shared between pairs of genomes. These identity-by-descent (IBD) blocks are direct traces of recent coalescence events and, as such, contain ample signal to infer recent demography. Here, we examine sharing of such blocks in two-dimensional populations with local migration. Using a diffusion approximation to trace genetic ancestry, we derive analytical formulas for patterns of isolation by distance of IBD blocks, which can also incorporate recent population density changes. We introduce an inference scheme that uses a composite-likelihood approach to fit these formulas. We then extensively evaluate our theory and inference method on a range of scenarios using simulated data. We first validate the diffusion approximation by showing that the theoretical results closely match the simulated block-sharing patterns. We then demonstrate that our inference scheme can accurately and robustly infer dispersal rate and effective density, as well as bounds on recent dynamics of population density. To demonstrate an application, we use our estimation scheme to explore the fit of a diffusion model to Eastern European samples in the Population Reference Sample data set. We show that ancestry diffusing with a rate of σ ≈ 50–100 km/gen during the last centuries, combined with accelerating population growth, can explain the observed exponential decay of block sharing with increasing pairwise sample distance. PMID:28108588
Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments.

PubMed

Daily, Jeff

2016-02-10

Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. A faster intra-sequence local pairwise alignment implementation is described and benchmarked, including new global and semi-global variants. Using a 375 residue query sequence a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 24-core processor system, the highest reported for an implementation based on Farrar's 'striped' approach. Rognes's SWIPE optimal database search application is still generally the fastest available at 1.2 to at best 2.4 times faster than Parasail for sequences shorter than 500 amino acids. However, Parasail was faster for longer sequences. For global alignments, Parasail's prefix scan implementation is generally the fastest, faster even than Farrar's 'striped' approach, however the opal library is faster for single-threaded applications. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE41, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. Applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
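For readers unfamiliar with the terms in the record above, the following is a minimal scalar sketch of the Smith-Waterman local alignment recurrence and of the "cell updates per second" throughput measure. It is not Parasail's implementation: the library is written in C and vectorized with SIMD, whereas this sketch uses plain Python, a linear gap penalty, and match/mismatch scores that are all assumptions chosen for illustration.

```python
def smith_waterman_score(query: str, target: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score, keeping only two rows of the DP table."""
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # Local alignment clamps every cell at zero.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def gcups(query_len: int, target_residues: int, seconds: float) -> float:
    """Billions of DP cell updates per second for one query against a database."""
    return query_len * target_residues / seconds / 1e9


score = smith_waterman_score("TTACGTTT", "GGACGGG")
```

Each DP cell corresponds to one (query residue, target residue) pair, so a 375-residue query searched against a database of N residues performs 375 × N cell updates; dividing by the wall-clock time in seconds and by 10^9 gives GCUPS.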
A molecular phylogenetic study of the subtribe Glycininae (Leguminosae) derived from the chloroplast DNA rps16 intron sequences.

PubMed

Lee, J; Hymowitz, T

2001-11-01

Phylogenetic relationships among 13 genera of the subtribe Glycininae, two genera of the allied subtribe Diocleinae that were included within Glycininae by Polhill, and two genera of the subtribe Erythrininae as outgroups were inferred from chloroplast DNA rps16 intron sequence variation. Pairwise sequence divergence values ranged from identity between Teramnus mollis and T. micans and between T. flexilis and T. labialis to 7.89% between Pueraria wallichii and Pseudeminia comosa across all accessions. Phylogenies estimated using parsimony and neighbor-joining methods revealed that (1) Glycininae is monophyletic if Pachyrhizus and Calopogonium (both Diocleinae) are included within Glycininae; (2) the genus Teramnus is closely related to Glycine, and Amphicarpaea showed a sister relationship to the clade comprising Teramnus and Glycine; (3) the expanded Glycininae including two genera of Diocleinae is divided into three branches, temporarily named I (comprising the rest of the examined taxa), II (Pueraria wallichii), and III (Mastersia), but their relationships are equivocal; and (4) the genus Pueraria, regarded as a closely related genus to Glycine, is not monophyletic and should be divided into at least four genera (a hypothesis supported previously by Lackey).
The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation.

PubMed

Profiti, Giuseppe; Martelli, Pier Luigi; Casadio, Rita

2017-07-03

BAR 3.0 updates our server BAR (Bologna Annotation Resource) for predicting protein structural and functional features from sequence. We increase data volume, query capabilities and information conveyed to the user. The core of BAR 3.0 is a graph-based clustering procedure of UniProtKB sequences, following strict pairwise similarity criteria (sequence identity ≥40% with alignment coverage ≥90%). Each cluster contains the available annotation downloaded from UniProtKB, GO, PFAM and PDB. After statistical validation, GO terms and PFAM domains are cluster-specific and annotate new sequences entering the cluster after satisfying similarity constraints. BAR 3.0 includes 28 869 663 sequences in 1 361 773 clusters, of which 22.2% (22 241 661 sequences) and 47.4% (24 555 055 sequences) have at least one validated GO term and one PFAM domain, respectively. 1.4% of the clusters (36% of all sequences) include PDB structures and the cluster is associated to a hidden Markov model that allows building template-target alignment suitable for structural modeling. Some other 3 399 026 sequences are singletons. BAR 3.0 offers an improved search interface, allowing queries by UniProtKB-accession, Fasta sequence, GO-term, PFAM-domain, organism, PDB and ligand/s. When evaluated on the CAFA2 targets, BAR 3.0 largely outperforms our previous version and scores among state-of-the-art methods. BAR 3.0 is publicly available and accessible at http://bar.biocomp.unibo.it/bar3. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
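The pairwise similarity criteria quoted above (sequence identity ≥40% with alignment coverage ≥90%) can be turned into clusters in several ways. The sketch below assumes the simplest reading, in which each pair passing both thresholds is an edge and clusters are the connected components of the resulting graph; the abstract does not spell out BAR 3.0's actual procedure, and the function name, input format, and accession labels are hypothetical.

```python
def cluster_by_similarity(ids, hits, min_identity=40.0, min_coverage=90.0):
    """Group sequence ids into connected components using union-find.

    hits: iterable of (id_a, id_b, percent_identity, percent_coverage).
    """
    parent = {i: i for i in ids}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, coverage in hits:
        if identity >= min_identity and coverage >= min_coverage:
            parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), set()).add(i)
    return sorted(groups.values(), key=sorted)


# Hypothetical pairwise results, not data from BAR 3.0.
ids = ["P1", "P2", "P3", "P4"]
hits = [
    ("P1", "P2", 55.0, 95.0),  # passes both thresholds
    ("P2", "P3", 42.0, 91.0),  # passes both thresholds
    ("P3", "P4", 80.0, 60.0),  # high identity but coverage too low
]
clusters = cluster_by_similarity(ids, hits)
```

In this toy input P4 remains a singleton because its only hit fails the coverage criterion, which mirrors the singletons mentioned in the abstract.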
Benchmarking Inverse Statistical Approaches for Protein Structure and Design with Exactly Solvable Models.

PubMed

Jacquin, Hugo; Gilson, Amy; Shakhnovich, Eugene; Cocco, Simona; Monasson, Rémi

2016-05-01

Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSA) are emerging as powerful tools in computational biology. However the underlying assumptions of the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use lattice protein model (LP) to benchmark those inverse statistical approaches. We build MSA of highly stable sequences in target LP structures, and infer the effective pairwise Potts Hamiltonians from those MSA. We find that inferred Potts Hamiltonians reproduce many important aspects of 'true' LP structures and energetics. Careful analysis reveals that effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models used as protein Hamiltonian for design of new sequences are able to generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. Those are remarkable results as the effective LP Hamiltonians used to generate MSA are not simple pairwise models due to the competition between the folds. Our findings elucidate the reasons for the success of inverse approaches to the modelling of proteins from sequence data, and their limitations.
HIPPI: highly accurate protein family classification with ensembles of HMMs.

PubMed

Nguyen, Nam-Phuong; Nute, Michael; Mirarab, Siavash; Warnow, Tandy

2016-11-11

Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics. We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification). HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy. HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile Hidden Markov models can better represent multiple sequence alignments than a single profile Hidden Markov model, and thus can improve downstream analyses for various bioinformatic tasks. Further research is needed to determine the best practices for building the ensemble of profile Hidden Markov models. HIPPI is available on GitHub at https://github.com/smirarab/sepp .
Memory-efficient dynamic programming backtrace and pairwise local sequence alignment.

PubMed

Newberg, Lee A

2008-08-15

A backtrace through a dynamic programming algorithm's intermediate results in search of an optimal path, or to sample paths according to an implied probability distribution, or as the second stage of a forward-backward algorithm, is a task of fundamental importance in computational biology. When there is insufficient space to store all intermediate results in high-speed memory (e.g. cache) existing approaches store selected stages of the computation, and recompute missing values from these checkpoints on an as-needed basis. Here we present an optimal checkpointing strategy, and demonstrate its utility with pairwise local sequence alignment of sequences of length 10,000. Sample C++-code for optimal backtrace is available in the Supplementary Materials. Supplementary data is available at Bioinformatics online.
SANSparallel: interactive homology search against Uniprot.

PubMed

Somervuo, Panu; Holm, Liisa

2015-07-01

Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Identification of a Herbal Powder by Deoxyribonucleic Acid Barcoding and Structural Analyses.

PubMed

Sheth, Bhavisha P; Thaker, Vrinda S

2015-10-01

Authentic identification of plants is essential for exploiting their medicinal properties as well as to stop the adulteration and malpractices with the trade of the same. The aim was to identify a herbal powder obtained from a herbalist in the local vicinity of Rajkot, Gujarat, using deoxyribonucleic acid (DNA) barcoding and molecular tools. The DNA was extracted from a herbal powder and selected Cassia species, followed by the polymerase chain reaction (PCR) and sequencing of the rbcL barcode locus. Thereafter the sequences were subjected to National Center for Biotechnology Information (NCBI) basic local alignment search tool (BLAST) analysis, followed by three-dimensional structure determination of the rbcL protein from the herbal powder and Cassia species, namely Cassia fistula, Cassia tora and Cassia javanica (sequences obtained in the present study), Cassia roxburghii, and Cassia abbreviata (sequences retrieved from Genbank). Further, multiple and pairwise structural alignments were carried out in order to identify the herbal powder. The nucleotide sequences obtained from the selected species of Cassia were submitted to Genbank (Accession No. JX141397, JX141405, JX141420). The NCBI BLAST analysis of the rbcL protein from the herbal powder showed an equal sequence similarity (with reference to different parameters like E value, maximum identity, total score, query coverage) to C. javanica and C. roxburghii. In order to solve the ambiguities of the BLAST result, a protein structural approach was implemented. The protein homology models obtained in the present study were submitted to the protein model database (PM0079748-PM0079753). The pairwise structural alignment of the herbal powder (as template) and C. javanica and C. roxburghii (as targets individually) revealed a close similarity of the herbal powder with C. javanica.
A strategy as used here, incorporating the integrated use of DNA barcoding and protein structural analyses could be adopted, as a novel rapid and economic procedure, especially in cases when protein coding loci are considered. Authentic identification of plants is essential for exploiting their medicinal properties as well as to stop the adulteration and malpractices with the trade of the same. A herbal powder was obtained from a herbalist in the local vicinity of Rajkot, Gujarat. An integrated approach using DNA barcoding and structural analyses was carried out to identify the herbal powder. The herbal powder was identified as Cassia javanica L.
Horizontal transfers of Mariner transposons between mammals and insects.

PubMed

Oliveira, Sarah G; Bao, Weidong; Martins, Cesar; Jurka, Jerzy

2012-09-26

Active transposable elements (TEs) can be passed between genomes of different species by horizontal transfer (HT). This may help them to avoid vertical extinction due to elimination by natural selection or silencing. HT is relatively frequent within eukaryotic taxa, but rare between distant species. Closely related Mariner-type DNA transposon families, collectively named Mariner-1_Tbel families, are present in the genomes of two ants and two mammals. Consensus sequences of the four families show pairwise identities greater than 95%. In addition, the mammalian Mariner1_BT family shows a close evolutionary relationship with some insect Mariner families. Mammalian Mariner1_BT type sequences are present only in species from three groups including ruminants, toothed whales (Odontoceti), and New World leaf-nosed bats (Phyllostomidae). Horizontal transfer accounts for the presence of Mariner_Tbel and Mariner1_BT families in mammals. The Mariner_Tbel family was introduced into hedgehog and tree shrew genomes approximately 100 to 69 million years ago (MYA). Most likely, these TE families were transferred from insects to mammals, but details of the transfer remain unknown.
Fast and accurate estimation of the covariance between pairwise maximum likelihood distances.

PubMed

Gil, Manuel

2014-01-01

Pairwise evolutionary distances are a model-based summary statistic for a set of molecular sequences. They represent the leaf-to-leaf path lengths of the underlying phylogenetic tree. Estimates of pairwise distances with overlapping paths covary because of shared mutation events. It is desirable to take this covariance structure into account to increase precision in any process that compares or combines distances. This paper introduces a fast estimator for the covariance of two pairwise maximum likelihood distances, estimated under general Markov models. The estimator is based on a conjecture (going back to Nei & Jin, 1989) which links the covariance to path lengths. It is proven here under a simple symmetric substitution model. A simulation shows that the estimator outperforms previously published ones in terms of the mean squared error.
Fast and accurate estimation of the covariance between pairwise maximum likelihood distances

PubMed Central

2014-01-01

Pairwise evolutionary distances are a model-based summary statistic for a set of molecular sequences. They represent the leaf-to-leaf path lengths of the underlying phylogenetic tree. Estimates of pairwise distances with overlapping paths covary because of shared mutation events. It is desirable to take this covariance structure into account to increase precision in any process that compares or combines distances. This paper introduces a fast estimator for the covariance of two pairwise maximum likelihood distances, estimated under general Markov models. The estimator is based on a conjecture (going back to Nei & Jin, 1989) which links the covariance to path lengths. It is proven here under a simple symmetric substitution model. A simulation shows that the estimator outperforms previously published ones in terms of the mean squared error. PMID:25279263
Increasing the structural coverage of tuberculosis drug targets.

PubMed

Baugh, Loren; Phan, Isabelle; Begley, Darren W; Clifton, Matthew C; Armour, Brianna; Dranow, David M; Taylor, Brandy M; Muruthi, Marvin M; Abendroth, Jan; Fairman, James W; Fox, David; Dieterich, Shellie H; Staker, Bart L; Gardberg, Anna S; Choi, Ryan; Hewitt, Stephen N; Napuli, Alberto J; Myers, Janette; Barrett, Lynn K; Zhang, Yang; Ferrell, Micah; Mundt, Elizabeth; Thompkins, Katie; Tran, Ngoc; Lyons-Abbott, Sally; Abramov, Ariel; Sekar, Aarthi; Serbzhinskiy, Dmitri; Lorimer, Don; Buchko, Garry W; Stacy, Robin; Stewart, Lance J; Edwards, Thomas E; Van Voorhis, Wesley C; Myler, Peter J

2015-03-01

High-resolution three-dimensional structures of essential Mycobacterium tuberculosis (Mtb) proteins provide templates for TB drug design, but are available for only a small fraction of the Mtb proteome. Here we evaluate an intra-genus "homolog-rescue" strategy to increase the structural information available for TB drug discovery by using mycobacterial homologs with conserved active sites. Of 179 potential TB drug targets selected for x-ray structure determination, only 16 yielded a crystal structure. By adding 1675 homologs from nine other mycobacterial species to the pipeline, structures representing an additional 52 otherwise intractable targets were solved. To determine whether these homolog structures would be useful surrogates in TB drug design, we compared the active sites of 106 pairs of Mtb and non-TB mycobacterial (NTM) enzyme homologs with experimentally determined structures, using three metrics of active site similarity, including superposition of continuous pharmacophoric property distributions. Pair-wise structural comparisons revealed that 19/22 pairs with >55% overall sequence identity had active site Cα RMSD <1 Å, >85% side chain identity, and ≥80% PSAPF (similarity based on pharmacophoric properties) indicating highly conserved active site shape and chemistry. Applying these results to the 52 NTM structures described above, 41 shared >55% sequence identity with the Mtb target, thus increasing the effective structural coverage of the 179 Mtb targets over three-fold (from 9% to 32%). The utility of these structures in TB drug design can be tested by designing inhibitors using the homolog structure and assaying the cognate Mtb enzyme; a promising test case, Mtb cytidylate kinase, is described. The homolog-rescue strategy evaluated here for TB is also generalizable to drug targets for other diseases. Copyright © 2014 Elsevier Ltd. All rights reserved.
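The coverage figures in the abstract above (from 9% to 32%, "over three-fold") follow from the counts it reports. A short check, assuming the 41 qualifying homolog structures each correspond to a distinct Mtb target not already among the 16 solved directly (the abstract's phrase "otherwise intractable targets" suggests this, but the variable names are illustrative):

```python
total_targets = 179      # potential TB drug targets selected
direct_structures = 16   # Mtb targets that yielded a crystal structure
rescued_targets = 41     # homolog structures with >55% identity to the Mtb target

coverage_before = 100 * direct_structures / total_targets
coverage_after = 100 * (direct_structures + rescued_targets) / total_targets
fold_increase = (direct_structures + rescued_targets) / direct_structures

print(f"before: {coverage_before:.1f}%  after: {coverage_after:.1f}%  "
      f"fold: {fold_increase:.2f}")
```

The ratios are 16/179 and 57/179, which round to the 9% and 32% quoted, and 57/16 is the "over three-fold" increase.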
Increasing the structural coverage of tuberculosis drug targets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baugh, Loren; Phan, Isabelle; Begley, Darren W.

High-resolution three-dimensional structures of essential Mycobacterium tuberculosis (Mtb) proteins provide templates for TB drug design, but are available for only a small fraction of the Mtb proteome. Here we evaluate an intra-genus “homolog-rescue” strategy to increase the structural information available for TB drug discovery by using mycobacterial homologs with conserved active sites. We found that of 179 potential TB drug targets selected for x-ray structure determination, only 16 yielded a crystal structure. By adding 1675 homologs from nine other mycobacterial species to the pipeline, structures representing an additional 52 otherwise intractable targets were solved. To determine whether these homolog structures would be useful surrogates in TB drug design, we compared the active sites of 106 pairs of Mtb and non-TB mycobacterial (NTM) enzyme homologs with experimentally determined structures, using three metrics of active site similarity, including superposition of continuous pharmacophoric property distributions. Pair-wise structural comparisons revealed that 19/22 pairs with >55% overall sequence identity had active site Cα RMSD <1 Å, >85% side chain identity, and ≥80% PS APF (similarity based on pharmacophoric properties) indicating highly conserved active site shape and chemistry. Applying these results to the 52 NTM structures described above, 41 shared >55% sequence identity with the Mtb target, thus increasing the effective structural coverage of the 179 Mtb targets over three-fold (from 9% to 32%). The utility of these structures in TB drug design can be tested by designing inhibitors using the homolog structure and assaying the cognate Mtb enzyme; a promising test case, Mtb cytidylate kinase, is described. The homolog-rescue strategy evaluated here for TB is also generalizable to drug targets for other diseases.
Increasing the structural coverage of tuberculosis drug targets

DOE PAGES

Baugh, Loren; Phan, Isabelle; Begley, Darren W.; ...

2014-12-19

High-resolution three-dimensional structures of essential Mycobacterium tuberculosis (Mtb) proteins provide templates for TB drug design, but are available for only a small fraction of the Mtb proteome. Here we evaluate an intra-genus “homolog-rescue” strategy to increase the structural information available for TB drug discovery by using mycobacterial homologs with conserved active sites. We found that of 179 potential TB drug targets selected for x-ray structure determination, only 16 yielded a crystal structure. By adding 1675 homologs from nine other mycobacterial species to the pipeline, structures representing an additional 52 otherwise intractable targets were solved. To determine whether these homolog structures would be useful surrogates in TB drug design, we compared the active sites of 106 pairs of Mtb and non-TB mycobacterial (NTM) enzyme homologs with experimentally determined structures, using three metrics of active site similarity, including superposition of continuous pharmacophoric property distributions. Pair-wise structural comparisons revealed that 19/22 pairs with >55% overall sequence identity had active site Cα RMSD <1 Å, >85% side chain identity, and ≥80% PS APF (similarity based on pharmacophoric properties) indicating highly conserved active site shape and chemistry. Applying these results to the 52 NTM structures described above, 41 shared >55% sequence identity with the Mtb target, thus increasing the effective structural coverage of the 179 Mtb targets over three-fold (from 9% to 32%). The utility of these structures in TB drug design can be tested by designing inhibitors using the homolog structure and assaying the cognate Mtb enzyme; a promising test case, Mtb cytidylate kinase, is described. The homolog-rescue strategy evaluated here for TB is also generalizable to drug targets for other diseases.
Increasing the Structural Coverage of Tuberculosis Drug Targets

PubMed Central

Baugh, Loren; Phan, Isabelle; Begley, Darren W.; Clifton, Matthew C.; Armour, Brianna; Dranow, David M.; Taylor, Brandy M.; Muruthi, Marvin M.; Abendroth, Jan; Fairman, James W.; Fox, David; Dieterich, Shellie H.; Staker, Bart L.; Gardberg, Anna S.; Choi, Ryan; Hewitt, Stephen N.; Napuli, Alberto J.; Myers, Janette; Barrett, Lynn K.; Zhang, Yang; Ferrell, Micah; Mundt, Elizabeth; Thompkins, Katie; Tran, Ngoc; Lyons-Abbott, Sally; Abramov, Ariel; Sekar, Aarthi; Serbzhinskiy, Dmitri; Lorimer, Don; Buchko, Garry W.; Stacy, Robin; Stewart, Lance J.; Edwards, Thomas E.; Van Voorhis, Wesley C.; Myler, Peter J.

2015-01-01

High-resolution three-dimensional structures of essential Mycobacterium tuberculosis (Mtb) proteins provide templates for TB drug design, but are available for only a small fraction of the Mtb proteome. Here we evaluate an intra-genus “homolog-rescue” strategy to increase the structural information available for TB drug discovery by using mycobacterial homologs with conserved active sites. Of 179 potential TB drug targets selected for x-ray structure determination, only 16 yielded a crystal structure. By adding 1675 homologs from nine other mycobacterial species to the pipeline, structures representing an additional 52 otherwise intractable targets were solved. To determine whether these homolog structures would be useful surrogates in TB drug design, we compared the active sites of 106 pairs of Mtb and non-TB mycobacterial (NTM) enzyme homologs with experimentally determined structures, using three metrics of active site similarity, including superposition of continuous pharmacophoric property distributions. Pair-wise structural comparisons revealed that 19/22 pairs with >55% overall sequence identity had active site Cα RMSD <1Å, >85% side chain identity, and ≥80% PSAPF (similarity based on pharmacophoric properties) indicating highly conserved active site shape and chemistry. Applying these results to the 52 NTM structures described above, 41 shared >55% sequence identity with the Mtb target, thus increasing the effective structural coverage of the 179 Mtb targets over three-fold (from 9% to 32%). The utility of these structures in TB drug design can be tested by designing inhibitors using the homolog structure and assaying the cognate Mtb enzyme; a promising test case, Mtb cytidylate kinase, is described. The homolog-rescue strategy evaluated here for TB is also generalizable to drug targets for other diseases. PMID:25613812
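The coverage figures quoted in this abstract can be rechecked with a few lines of arithmetic. The sketch below assumes that "effective structural coverage" counts the 16 Mtb structures plus the 41 NTM homolog structures above the 55% identity threshold; that reading is an assumption, and the variable names are illustrative only.

```python
# Counts taken from the abstract above.
total_targets = 179        # Mtb targets selected for structure determination
mtb_structures = 16        # targets that yielded an Mtb crystal structure
ntm_above_threshold = 41   # NTM homolog structures with >55% identity to the Mtb target

# Fraction of targets covered by Mtb structures alone.
direct = mtb_structures / total_targets

# Assumed definition: Mtb structures plus qualifying NTM homolog structures.
effective = (mtb_structures + ntm_above_threshold) / total_targets

# Ratio of the two coverage values.
fold = effective / direct

print(f"direct coverage:    {direct:.0%}")
print(f"effective coverage: {effective:.0%}")
print(f"fold increase:      {fold:.2f}")
```

Under this reading the two fractions round to the 9% and 32% reported in the abstract, and the ratio is above three.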
Draft Genome Sequences of Two Bacillus thuringiensis Strains and Characterization of a Putative 41.9-kDa Insecticidal Toxin

PubMed Central

Palma, Leopoldo; Muñoz, Delia; Berry, Colin; Murillo, Jesús; Caballero, Primitivo

2014-01-01

In this work, we report the genome sequencing of two Bacillus thuringiensis strains using Illumina next-generation sequencing technology (NGS). Strain Hu4-2, toxic to many lepidopteran pest species and to some mosquitoes, encoded genes for two insecticidal crystal (Cry) proteins, cry1Ia and cry9Ea, and a vegetative insecticidal protein (Vip) gene, vip3Ca2. Strain Leapi01 contained genes coding for seven Cry proteins (cry1Aa, cry1Ca, cry1Da, cry2Ab, cry9Ea and two cry1Ia gene variants) and a vip3 gene (vip3Aa10). A putative novel insecticidal protein gene 1143 bp long was found in both strains, whose sequences exhibited 100% nucleotide identity. The predicted protein showed 57 and 100% pairwise identity to protein sequence 72 from a patented Bt strain (US8318900) and to a putative 41.9-kDa insecticidal toxin from Bacillus cereus, respectively. The 41.9-kDa protein, containing a C-terminal 6× HisTag fusion, was expressed in Escherichia coli and tested for the first time against four lepidopteran species (Mamestra brassicae, Ostrinia nubilalis, Spodoptera frugiperda and S. littoralis) and the green-peach aphid Myzus persicae at doses as high as 4.8 µg/cm² and 1.5 mg/mL, respectively. At these protein concentrations, the recombinant 41.9-kDa protein caused no mortality or symptoms of impaired growth against any of the insects tested, suggesting that these species are outside the protein’s target range or that the protein may not, in fact, be toxic. While the use of the polymerase chain reaction has allowed a significant increase in the number of Bt insecticidal genes characterized to date, novel NGS technologies promise a much faster, cheaper and more efficient screening of Bt pesticidal proteins. PMID:24784323
DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.

PubMed

Schmollinger, Martin; Nieselt, Kay; Kaufmann, Michael; Morgenstern, Burkhard

2004-09-09

Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. By distributing sub-routines to multiple processors, the running time of DIALIGN can be substantially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
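Point (a) of this abstract, that pairwise alignments are independent and can therefore be farmed out without changing the result, can be illustrated with a minimal sketch. This is not DIALIGN's code: `toy_pair_score` is a hypothetical placeholder that just counts position-wise matches, and a thread pool is used only to keep the example self-contained. A real CPU-bound aligner would more likely use separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def toy_pair_score(pair):
    """Hypothetical stand-in for an expensive pairwise alignment.

    Counts positions at which the two sequences carry the same character.
    """
    a, b = pair
    return sum(x == y for x, y in zip(a, b))


def all_pair_scores(seqs, workers=1):
    """Score every unordered pair of sequences, using `workers` threads."""
    index_pairs = list(combinations(range(len(seqs)), 2))
    inputs = [(seqs[i], seqs[j]) for i, j in index_pairs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so results line up with index_pairs.
        scores = list(pool.map(toy_pair_score, inputs))
    return dict(zip(index_pairs, scores))


seqs = ["ACGTAC", "ACGTTC", "TCGTAC", "ACCTAC"]
serial = all_pair_scores(seqs, workers=1)
parallel = all_pair_scores(seqs, workers=4)

# Each pair is scored independently, so the worker count cannot change the output.
print(serial == parallel)
```

The design point is that each task reads only its own two sequences, so no coordination between workers is needed until the results are collected.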
Comparative Genome Sequence Analysis of the Bpa/Str Region in Mouse and Man

PubMed Central

Mallon, A.-M.; Platzer, M.; Bate, R.; Gloeckner, G.; Botcherby, M.R.M.; Nordsiek, G.; Strivens, M.A.; Kioschis, P.; Dangel, A.; Cunningham, D.; Straw, R.N.A.; Weston, P.; Gilbert, M.; Fernando, S.; Goodall, K.; Hunter, G.; Greystrong, J.S.; Clarke, D.; Kimberley, C.; Goerdes, M.; Blechschmidt, K.; Rump, A.; Hinzmann, B.; Mundy, C.R.; Miller, W.; Poustka, A.; Herman, G.E.; Rhodes, M.; Denny, P.; Rosenthal, A.; Brown, S.D.M.

2000-01-01

The progress of human and mouse genome sequencing programs presages the possibility of systematic cross-species comparison of the two genomes as a powerful tool for gene and regulatory element identification. As the opportunities to perform comparative sequence analysis emerge, it is important to develop parameters for such analyses and to examine the outcomes of cross-species comparison. Our analysis used gene prediction and a database search of 430 kb of genomic sequence covering the Bpa/Str region of the mouse X chromosome, and 745 kb of genomic sequence from the homologous human X chromosome region. We identified 11 genes in mouse and 13 genes and two pseudogenes in human. In addition, we compared the mouse and human sequences using pairwise alignment and searches for evolutionary conserved regions (ECRs) exceeding a defined threshold of sequence identity. This approach aided the identification of at least four further putative conserved genes in the region. Comparative sequencing revealed that this region is a mosaic in evolutionary terms, with considerably more rearrangement between the two species than realized previously from comparative mapping studies. Surprisingly, this region showed an extremely high LINE and low SINE content, low G+C content, and yet a relatively high gene density, in contrast to the low gene density usually associated with such regions. [The sequence data described in this paper have been submitted to EMBL under the following accession nos.: Mouse Genomic Sequence: Mouse contig A (AL021127), Mouse contig B (AL049866), BAC41M10 (AL136328), PAC303O11(AL136329). Human Genomic Sequence: Human contig 1 (U82671, U82670), Human contig 2 (U82695).] PMID:10854409
AlignMe—a membrane protein sequence alignment web server

PubMed Central

Stamm, Marcus; Staritzbichler, René; Khafizov, Kamil; Forrest, Lucy R.

2014-01-01

We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in fasta format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe PMID:24753425

Buried chloride stereochemistry in the Protein Data Bank

PubMed Central

2014-01-01

Background Although the chloride anion is involved in fundamental biological processes, little is known about its interactions with proteins. In particular, we lack a systematic survey of its coordination spheres. Results The analysis of a non-redundant set (pairwise sequence identity?
Buried chloride stereochemistry in the Protein Data Bank.

PubMed

Carugo, Oliviero

2014-09-23

Although the chloride anion is involved in fundamental biological processes, little is known about its interactions with proteins. In particular, we lack a systematic survey of its coordination spheres. The analysis of a non-redundant set (pairwise sequence identity < 30%) of 1739 high resolution (<2 Å) crystal structures that contain at least one chloride anion shows that the first coordination spheres of the chlorides are essentially constituted by hydrogen bond donors. Among the positively charged side chains, arginine interacts with chlorides much more frequently than lysine. Although the most common coordination number is 4, the coordination stereochemistry is closer to the expected geometry when the coordination number is 5, suggesting that this is the coordination number towards which the chlorides tend when they interact with proteins. The results of these analyses are useful in interpreting, describing, and validating new protein crystal structures that contain chloride anions.
Trajectory Based Behavior Analysis for User Verification

NASA Astrophysics Data System (ADS)

Pao, Hsing-Kuo; Lin, Hong-Yi; Chen, Kuan-Ta; Fadlil, Junaidillah

Many of our activities on computers require a verification step for authorized access. The goal of verification is to tell the true account owner apart from intruders. We propose a general approach for user verification based on user trajectory inputs. The approach is labor-free for users and is likely to resist copying or simulation by non-authorized users or by automatic programs such as bots. Our study focuses on finding the hidden patterns embedded in the trajectories produced by account users. We employ a Markov chain model with Gaussian distribution in its transitions to describe the behavior in the trajectory. To distinguish between two trajectories, we propose a novel dissimilarity measure combined with manifold-learnt tuning to capture the pairwise relationship. Based on the pairwise relationship, any effective classification or clustering method can be plugged in to detect unauthorized access. The method can also be applied to the task of recognition, predicting the trajectory type without a pre-defined identity. Given a trajectory input, the results show that the proposed method can accurately verify the user identity, or suggest who owns the trajectory if the input identity is not provided.
Saving the Best for Last? A Cross-Species Analysis of Choices between Reinforcer Sequences

ERIC Educational Resources Information Center

Andrade, Leonardo F.; Hackenberg, Timothy D.

2012-01-01

Two experiments were conducted to compare choices between sequences of reinforcers in pigeon (Experiment 1) and human (Experiment 2) subjects, using functionally analogous procedures. The subjects made pairwise choices among 3 sequence types, all of which provided the same overall reinforcement rate, but differed in their temporal patterning.…
Amino acid positions subject to multiple coevolutionary constraints can be robustly identified by their eigenvector network centrality scores.

PubMed

Parente, Daniel J; Ray, J Christian J; Swint-Kruse, Liskin

2015-12-01

As proteins evolve, amino acid positions key to protein structure or function are subject to mutational constraints. These positions can be detected by analyzing sequence families for amino acid conservation or for coevolution between pairs of positions. Coevolutionary scores are usually rank-ordered and thresholded to reveal the top pairwise scores, but they also can be treated as weighted networks. Here, we used network analyses to bypass a major complication of coevolution studies: For a given sequence alignment, alternative algorithms usually identify different, top pairwise scores. We reconciled results from five commonly used, mathematically divergent algorithms (ELSC, McBASC, OMES, SCA, and ZNMI), using the LacI/GalR and 1,6-bisphosphate aldolase protein families as models. Calculations used unthresholded coevolution scores from which column-specific properties such as sequence entropy and random noise were subtracted; "central" positions were identified by calculating various network centrality scores. When compared among algorithms, network centrality methods, particularly eigenvector centrality, showed markedly better agreement than comparisons of the top pairwise scores. Positions with large centrality scores occurred at key structural locations and/or were functionally sensitive to mutations. Further, the top central positions often differed from those with top pairwise coevolution scores: instead of a few strong scores, central positions often had multiple, moderate scores. We conclude that eigenvector centrality calculations reveal a robust evolutionary pattern of constraints, detectable by divergent algorithms, that occur at key protein locations. Finally, we discuss the fact that multiple patterns coexist in evolutionary data that, together, give rise to emergent protein functions. © 2015 Wiley Periodicals, Inc.
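The contrast drawn here between "top pairwise scores" and "central positions" can be made concrete with a small sketch. The code below computes eigenvector centrality by power iteration on a symmetric weight matrix; the five-position matrix is invented toy data, not taken from the paper, and the function name and parameters are assumptions made for illustration.

```python
def eigenvector_centrality(weights, iterations=200, tol=1e-10):
    """Power iteration on a symmetric, non-negative weight matrix.

    Returns the leading eigenvector, normalised to unit Euclidean length.
    """
    n = len(weights)
    v = [1.0 / n] * n
    for _ in range(iterations):
        new = [sum(weights[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in new) ** 0.5
        new = [x / norm for x in new]
        if max(abs(a - b) for a, b in zip(new, v)) < tol:
            return new
        v = new
    return v


# Invented coevolution scores between five alignment positions.
# Position 0 has four moderate scores; positions 3 and 4 share the single strongest score.
W = [
    [0.0, 0.5, 0.5, 0.5, 0.5],
    [0.5, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.8],
    [0.5, 0.0, 0.0, 0.8, 0.0],
]

scores = eigenvector_centrality(W)
most_central = max(range(len(scores)), key=scores.__getitem__)

# The strongest single pairwise score in the toy matrix.
top_pair = max(
    ((i, j) for i in range(len(W)) for j in range(i + 1, len(W))),
    key=lambda p: W[p[0]][p[1]],
)

print("most central position:", most_central)
print("top-scoring pair:", top_pair)
```

In this toy network the most central position is not part of the top-scoring pair, which mirrors the abstract's observation that central positions tend to carry several moderate scores rather than one strong one.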
Mhc class II B gene evolution in East African cichlid fishes.

PubMed

Figueroa, F; Mayer, W E; Sültmann, H; O'hUigin, C; Tichy, H; Satta, Y; Takezaki, N; Takahata, N; Klein, J

2000-06-01

A distinctive feature of essential major histocompatibility complex (Mhc) loci is their polymorphism characterized by large genetic distances between alleles and long persistence times of allelic lineages. Since the lineages often span several successive speciations, we investigated the behavior of the Mhc alleles during or close to the speciation phase. We sequenced exon 2 of the class II B locus 4 from 232 East African cichlid fishes representing 32 related species. The divergence times of the (sub)species ranged from 6,000 to 8.4 million years. Two types of evolutionary analysis were used to elucidate the pattern of exon 2 sequence divergence. First, phylogenetic methods were applied to reconstruct the most likely evolutionary pathways leading from the last common ancestor of the set to the extant sequences, and to assess the probable mechanisms involved in allelic diversification. Second, pairwise comparisons of sequences were carried out to detect differences seemingly incompatible with origin by nonparallel point mutations. The analysis revealed point mutations to be the most important mechanism behind allelic divergences, with recombination playing only an auxiliary part. Comparison of sequences from related species revealed evidence of random allelic (lineage) losses apparently associated with speciation. Sharing of identical alleles could be demonstrated between species that diverged 2 million years ago. The phylogeny of the exon was incongruent with that of the flanking introns, indicating either a high degree of convergent evolution at the peptide-binding region-encoding sites, or intron homogenization.
Improving pairwise comparison of protein sequences with domain co-occurrence

PubMed Central

Gascuel, Olivier

2018-01-01

Comparing and aligning protein sequences is an essential task in bioinformatics. More specifically, local alignment tools like BLAST are widely used for identifying conserved protein sub-sequences, which likely correspond to protein domains or functional motifs. However, to limit the number of false positives, these tools are used with stringent sequence-similarity thresholds and hence can miss several hits, especially for species that are phylogenetically distant from reference organisms. A solution to this problem is then to integrate additional contextual information into the procedure. Here, we propose to use domain co-occurrence to increase the sensitivity of pairwise sequence comparisons. Domain co-occurrence is a strong feature of proteins, since most protein domains tend to appear with a limited number of other domains on the same protein. We propose a method to take this information into account in a typical BLAST analysis and to construct new domain families on the basis of these results. We used Plasmodium falciparum as a case study to evaluate our method. The experimental findings showed a 14% increase in the number of significant BLAST hits and a 25% increase in the proteome area that can be covered with a domain. Our method identified 2240 new domains for which, in most cases, no model of the Pfam database could be linked. Moreover, our study of the quality of the new domains in terms of alignment and physicochemical properties shows that they are close to standard Pfam domains. Source code of the proposed approach and supplementary data are available at: https://gite.lirmm.fr/menichelli/pairwise-comparison-with-cooccurrence PMID:29293498
Delineating slowly and rapidly evolving fractions of the Drosophila genome.

PubMed

Keith, Jonathan M; Adams, Peter; Stephen, Stuart; Mattick, John S

2008-05-01

Evolutionary conservation is an important indicator of function and a major component of bioinformatic methods to identify non-protein-coding genes. We present a new Bayesian method for segmenting pairwise alignments of eukaryotic genomes while simultaneously classifying segments into slowly and rapidly evolving fractions. We also describe an information criterion similar to the Akaike Information Criterion (AIC) for determining the number of classes. Working with pairwise alignments enables detection of differences in conservation patterns among closely related species. We analyzed three whole-genome and three partial-genome pairwise alignments among eight Drosophila species. Three distinct classes of conservation level were detected. Sequences comprising the most slowly evolving component were consistent across a range of species pairs, and constituted approximately 62-66% of the D. melanogaster genome. Almost all (>90%) of the aligned protein-coding sequence is in this fraction, suggesting much of it (comprising the majority of the Drosophila genome, including approximately 56% of non-protein-coding sequences) is functional. The size and content of the most rapidly evolving component was species dependent, and varied from 1.6% to 4.8%. This fraction is also enriched for protein-coding sequence (while containing significant amounts of non-protein-coding sequence), suggesting it is under positive selection. We also classified segments according to conservation and GC content simultaneously. This analysis identified numerous sub-classes of those identified on the basis of conservation alone, but was nevertheless consistent with that classification. Software, data, and results available at www.maths.qut.edu.au/-keithj/. Genomic segments comprising the conservation classes available in BED format.
Population and forensic genetic analyses of mitochondrial DNA control region variation from six major provinces in the Korean population.

PubMed

Hong, Seung Beom; Kim, Ki Cheol; Kim, Wook

2015-07-01

We generated complete mitochondrial DNA (mtDNA) control region sequences from 704 unrelated individuals residing in six major provinces in Korea. In addition to our earlier survey of the distribution of mtDNA haplogroup variation, a total of 560 different haplotypes characterized by 271 polymorphic sites were identified, of which 473 haplotypes were unique. The gene diversity and random match probability were 0.9989 and 0.0025, respectively. According to the pairwise comparison of the 704 control region sequences, the mean number of pairwise differences between individuals was 13.47±6.06. Based on the result of mtDNA control region sequences, pairwise FST genetic distances revealed genetic homogeneity of the Korean provinces on a peninsular level, except in samples from Jeju Island. This result indicates there may be a need to formulate a local mtDNA database for Jeju Island, to avoid bias in forensic parameter estimates caused by genetic heterogeneity of the population. Thus, the present data may help not only in personal identification but also in determining maternal lineages to provide an expanded and reliable Korean mtDNA database. These data will be available on the EMPOP database via accession number EMP00661. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
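The two forensic summary statistics quoted in this abstract are related by a simple formula. The sketch below assumes the commonly used definitions (random match probability as the sum of squared haplotype frequencies, and gene diversity as n/(n-1) times one minus that sum); the abstract itself does not state which formulas were used, so these are assumptions, and the haplotype labels are invented.

```python
from collections import Counter


def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies (assumed definition)."""
    n = len(haplotypes)
    return sum((count / n) ** 2 for count in Counter(haplotypes).values())


def gene_diversity(haplotypes):
    """Unbiased gene diversity, n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    return n / (n - 1) * (1 - random_match_probability(haplotypes))


# Invented toy sample: five individuals, four distinct haplotypes.
toy = ["H1", "H1", "H2", "H3", "H4"]
toy_rmp = random_match_probability(toy)
toy_div = gene_diversity(toy)

# Consistency check against the reported values (n = 704, RMP = 0.0025).
implied_diversity = 704 / 703 * (1 - 0.0025)

print(f"toy RMP: {toy_rmp:.2f}, toy diversity: {toy_div:.2f}")
print(f"diversity implied by reported RMP: {implied_diversity:.4f}")
```

Under these assumed definitions, the reported random match probability of 0.0025 for 704 individuals implies a gene diversity that rounds to the reported 0.9989.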
Towards the Rational Design of a Candidate Vaccine against Pregnancy Associated Malaria: Conserved Sequences of the DBL6ε Domain of VAR2CSA

PubMed Central

Badaut, Cyril; Bertin, Gwladys; Rustico, Tatiana; Fievet, Nadine; Massougbodji, Achille; Gaye, Alioune; Deloron, Philippe

2010-01-01

Background Placental malaria is a disease linked to the sequestration of Plasmodium falciparum infected red blood cells (IRBC) in the placenta, leading to reduced materno-fetal exchanges and to local inflammation. One of the virulence factors of P. falciparum involved in cytoadherence to chondroitin sulfate A, its placental receptor, is the adhesive protein VAR2CSA. Its localisation on the surface of IRBC makes it accessible to the immune system. VAR2CSA contains six DBL domains. The DBL6ε domain is the most variable. High variability constitutes a means for the parasite to evade the host immune response. The DBL6ε domain could constitute a very attractive basis for a vaccine candidate but its reported variability necessitates, for antigenic characterisations, identifying and classifying commonalities across isolates. Methodology/Principal Findings Local alignment analysis of the DBL6ε domain revealed that it is not as variable as previously described. Variability is concentrated in seven regions present on the surface of the DBL6ε domain. The main goal of our work is to classify and group variable sequences in a way that will simplify further research to determine dominant epitopes. Firstly, variable sequences were grouped following their average percent pairwise identity (APPI). Groups comprising many variable sequences sharing low variability were found. Secondly, ELISA experiments following the IgG recognition of a recombinant DBL6ε domain, and of peptides mimicking its seven variable blocks, allowed us to determine an APPI cut-off and to isolate groups represented by a single consensus sequence. Conclusions/Significance A new sequence approach is used to compare variable regions in sequences that have extensive segmental gene relationship. Using this approach, the VAR2CSA DBL6 domain is composed of 7 variable blocks with limited polymorphism. Each variable block is composed of a limited number of consensus types. Based on peptide-based ELISA, variable blocks with 85% or greater sequence identity are expected to be recognized equally well by antibody and can be considered the same consensus type. Therefore, the analysis of the antibody response against the classified small number of sequences should be helpful to determine epitopes. PMID:20585655
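The average percent pairwise identity (APPI) used for grouping here can be sketched as follows. The code assumes pre-aligned sequences of equal length and divides matches by the number of columns in which at least one sequence has a residue; the abstract does not specify the denominator, so that convention is an assumption, and the four-residue sequences are invented.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a residue opposite
    a gap counts as a mismatch (assumed convention).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


def average_percent_pairwise_identity(seqs):
    """Mean percent identity over all unordered pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


# Invented variable block with four aligned variants.
block = ["KTEL", "KTEL", "KSEL", "RTEL"]
appi = average_percent_pairwise_identity(block)

# Grouping rule suggested by the abstract: 85% or greater counts as one consensus type.
same_consensus_type = appi >= 85.0

print(f"APPI = {appi:.1f}%, same consensus type: {same_consensus_type}")
```

With this toy block the APPI falls below the 85% cut-off mentioned in the abstract, so the four variants would not be treated as a single consensus type under that rule.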
Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign

PubMed Central

2007-01-01

Background Joint alignment and secondary structure prediction of two RNA sequences can significantly improve the accuracy of the structural predictions. Methods addressing this problem, however, are forced to employ constraints that reduce computation by restricting the alignments and/or structures (i.e. folds) that are permissible. In this paper, a new methodology is presented for the purpose of establishing alignment constraints based on nucleotide alignment and insertion posterior probabilities. Using a hidden Markov model, posterior probabilities of alignment and insertion are computed for all possible pairings of nucleotide positions from the two sequences. These alignment and insertion posterior probabilities are additively combined to obtain probabilities of co-incidence for nucleotide position pairs. A suitable alignment constraint is obtained by thresholding the co-incidence probabilities. The constraint is integrated with Dynalign, a free energy minimization algorithm for joint alignment and secondary structure prediction. The resulting method is benchmarked against the previous version of Dynalign and against other programs for pairwise RNA structure prediction. Results The proposed technique eliminates manual parameter selection in Dynalign and provides significant computational time savings in comparison to prior constraints in Dynalign while simultaneously providing a small improvement in the structural prediction accuracy. Savings are also realized in memory. In experiments over a 5S RNA dataset with average sequence length of approximately 120 nucleotides, the method reduces computation by a factor of 2. The method performs favorably in comparison to other programs for pairwise RNA structure prediction: yielding better accuracy, on average, and requiring significantly fewer computational resources. Conclusion Probabilistic analysis can be utilized in order to automate the determination of alignment constraints for pairwise RNA structure prediction methods in a principled fashion. These constraints can reduce the computational and memory requirements of these methods while maintaining or improving their accuracy of structural prediction. This extends the practical reach of these methods to longer sequences. The revised Dynalign code is freely available for download. PMID:17445273
Phylogenetic analysis of Demodex caprae based on mitochondrial 16S rDNA sequence.

PubMed

Zhao, Ya-E; Hu, Li; Ma, Jun-Xian

2013-11-01

Demodex caprae infests the hair follicles and sebaceous glands of goats worldwide, which not only seriously impairs goat farming but also causes substantial economic loss. However, there are few DNA-level reports on D. caprae. To reveal the taxonomic position of D. caprae within the genus Demodex, the present study conducted phylogenetic analysis of D. caprae based on mt16S rDNA sequence data. D. caprae adults and eggs were obtained from a skin nodule of a goat suffering from demodicidosis. The mt16S rDNA sequences of individual mites were amplified using specific primers, and then cloned, sequenced, and aligned. The sequence divergence, genetic distance, and transition/transversion rate were computed, and the phylogenetic trees in Demodex were reconstructed. Partial sequences of 339 bp were obtained from six D. caprae isolates, and sequence identity among the isolates was 100%. The pairwise divergences between D. caprae and Demodex canis or Demodex folliculorum or Demodex brevis were 22.2-24.0%, 24.0-24.9%, and 22.9-23.2%, respectively. The corresponding average genetic distances were 2.840, 2.926, and 2.665, and the average transition/transversion rates were 0.70, 0.55, and 0.54, respectively. The divergences, genetic distances, and transition/transversion rates of D. caprae versus the other three species all reached interspecies level. All five phylogenetic trees showed that D. caprae clustered with D. brevis first, and then with D. canis, D. folliculorum, and Demodex injai in sequence. In conclusion, D. caprae is an independent species, and it is closer to D. brevis than to D. canis, D. folliculorum, or D. injai.
Characterization of a novel flavivirus isolated from Culex (Melanoconion) ocossa mosquitoes from Iquitos, Peru.

PubMed

Evangelista, Julio; Cruz, Cristhopher; Guevara, Carolina; Astete, Helvio; Carey, Cristiam; Kochel, Tadeusz J; Morrison, Amy C; Williams, Maya; Halsey, Eric S; Forshey, Brett M

2013-06-01

We describe the isolation and characterization of a novel flavivirus, isolated from a pool of Culex (Melanoconion) ocossa Dyar and Knab mosquitoes collected in 2009 in an urban area of the Amazon basin city of Iquitos, Peru. Flavivirus infection was detected by indirect immunofluorescent assay of inoculated C6/36 cells using polyclonal flavivirus antibodies (St. Louis encephalitis virus, yellow fever virus and dengue virus type 1) and confirmed by RT-PCR. Based on partial sequencing of the E and NS5 gene regions, the virus isolate was most closely related to the mosquito-borne flaviviruses but divergent from known species, with less than 45% and 71% pairwise amino acid identity in the E and NS5 gene products, respectively. Phylogenetic analysis of E and NS5 amino acid sequences demonstrated that this flavivirus grouped with mosquito-borne flaviviruses, forming a clade with Nounané virus (NOUV). Like NOUV, no replication was detected in a variety of mammalian cells (Vero-76, Vero-E6, BHK, LLCMK, MDCK, A549 and RD) or in intracerebrally inoculated newborn mice. We tentatively designate this genetically distinct flavivirus as representing a novel species, Nanay virus, after the river near where it was first detected.
Structure of the N-terminal domain of human thioredoxin-interacting protein.

PubMed

Polekhina, Galina; Ascher, David Benjamin; Kok, Shie Foong; Beckham, Simone; Wilce, Matthew; Waltham, Mark

2013-03-01

Thioredoxin-interacting protein (TXNIP) is one of the six known α-arrestins and has recently received considerable attention owing to its involvement in redox signalling and metabolism. Various stress stimuli such as high glucose, heat shock, UV, H2O2 and mechanical stress among others robustly induce the expression of TXNIP, resulting in the sequestration and inactivation of thioredoxin, which in turn leads to cellular oxidative stress. While TXNIP is the only α-arrestin known to bind thioredoxin, TXNIP and two other α-arrestins, Arrdc4 and Arrdc3, have been implicated in metabolism. Furthermore, owing to its roles in the pathologies of diabetes and cardiovascular disease, TXNIP is considered to be a promising drug target. Based on their amino-acid sequences, TXNIP and the other α-arrestins are remotely related to β-arrestins. Here, the crystal structure of the N-terminal domain of TXNIP is reported. It provides the first structural information on any of the α-arrestins and reveals that although TXNIP adopts a β-arrestin fold as predicted, it is structurally more similar to Vps26 proteins than to β-arrestins, while sharing below 15% pairwise sequence identity with either.
Structure, synthesis, and activity of dermaseptin b, a novel vertebrate defensive peptide from frog skin: relationship with adenoregulin.

PubMed

Mor, A; Amiche, M; Nicolas, P

1994-05-31

A novel antimicrobial peptide, designated dermaseptin b, was isolated from the skin of the arboreal frog Phyllomedusa bicolor. This 27-residue peptide amide is basic, containing 3 lysine residues that punctuate an alternating hydrophobic and hydrophilic sequence. In helix-inducing solvent, dermaseptin b adopts an amphipathic alpha-helical conformation that most closely resembles class L amphipathic helices, with all lysine residues on the polar face of the helix. The peptide exhibits growth inhibition activity in vitro against a broad spectrum of pathogenic microorganisms including yeast and bacteria as well as various filamentous fungi that are responsible for severe opportunistic infections accompanying acquired immunodeficiency syndrome and the use of immunosuppressive agents. Maximized pairwise sequence alignment of dermaseptin b and dermaseptin s, a 34-residue antimicrobial peptide previously isolated from Phyllomedusa sauvagii, reveals 81% amino acid identity. No other significant similarity was found between dermaseptin b and any prokaryotic or eukaryotic protein, but similarity was found with adenoregulin (38% amino acid positional identity), a 33-residue peptide that enhances binding of agonists to the A1 adenosine receptor. The synthetic replicates of dermaseptin b and adenoregulin displayed similar but nonidentical spectra of antimicrobial activity, and both peptides were devoid of lytic effect on mammalian cells. Accordingly, the observation that adenoregulin enhances binding of agonists to the adenosine receptor may in fact be a consequence of its ability to alter the structure of biological membranes and to produce signal transduction via interactions with the lipid bilayer, bypassing cell surface receptor interactions.
Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading.

PubMed

Rahn, René; Budach, Stefan; Costanza, Pascal; Ehrhardt, Marcel; Hancox, Jonny; Reinert, Knut

2018-05-03

Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable for a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting SIMD (Single Instruction Multiple Data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we a) distribute many independent alignments on multiple threads and b) inherently parallelize a single alignment computation using a work stealing approach producing a dynamic wavefront progressing along the minor diagonal. We evaluated our alignment vectorization and parallelization on different processors, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors, and use cases. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than executing them with our previous sequential alignment module. The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4 under the BSD license. We support SSE4, AVX2, AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, to extend our module for further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms.
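For readers unfamiliar with the dynamic programming kernel that the abstract above refers to, here is a minimal sketch of a global (Needleman-Wunsch style) alignment score with a linear gap penalty. It is plain Python and does not use or imitate the SeqAn API; the function name and the default scoring values are assumptions chosen for illustration.

```python
def global_alignment_score(s, t, match=1, mismatch=-1, gap=-1):
    """Score of the best global alignment of s and t (illustrative sketch).

    Only two rows of the DP matrix are kept in memory at a time.
    """
    # Row 0: aligning the empty prefix of s against prefixes of t costs gaps only
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        curr = [i * gap]  # column 0: prefix of s against the empty prefix of t
        for j in range(1, len(t) + 1):
            diagonal = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            up = prev[j] + gap        # gap in t
            left = curr[j - 1] + gap  # gap in s
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal can be computed independently of each other. That property is what makes wavefront-style parallelization of a single alignment possible.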
Phylogenetic Relationship of Necoclí Virus to Other South American Hantaviruses (Bunyaviridae: Hantavirus).

PubMed

Montoya-Ruiz, Carolina; Cajimat, Maria N B; Milazzo, Mary Louise; Diaz, Francisco J; Rodas, Juan David; Valbuena, Gustavo; Fulhorst, Charles F

2015-07-01

The results of a previous study suggested that Cherrie's cane rat (Zygodontomys cherriei) is the principal host of Necoclí virus (family Bunyaviridae, genus Hantavirus) in Colombia. Bayesian analyses of complete nucleocapsid protein gene sequences and complete glycoprotein precursor gene sequences in this study confirmed that Necoclí virus is phylogenetically closely related to Maporal virus, which is principally associated with the delicate pygmy rice rat (Oligoryzomys delicatus) in western Venezuela. In pairwise comparisons, nonidentities between the complete amino acid sequence of the nucleocapsid protein of Necoclí virus and the complete amino acid sequences of the nucleocapsid proteins of other hantaviruses were ≥8.7%. Likewise, nonidentities between the complete amino acid sequence of the glycoprotein precursor of Necoclí virus and the complete amino acid sequences of the glycoprotein precursors of other hantaviruses were ≥11.7%. Collectively, the unique association of Necoclí virus with Z. cherriei in Colombia, results of the Bayesian analyses of complete nucleocapsid protein gene sequences and complete glycoprotein precursor gene sequences, and results of the pairwise comparisons of amino acid sequences strongly support the notion that Necoclí virus represents a novel species in the genus Hantavirus. Further work is needed to determine whether Calabazo virus (a hantavirus associated with Z. brevicauda cherriei in Panama) and Necoclí virus are conspecific.
Complete sequence analysis of 18S rDNA based on genomic DNA extraction from individual Demodex mites (Acari: Demodicidae).

PubMed

Zhao, Ya-E; Xu, Ji-Ru; Hu, Li; Wu, Li-Ping; Wang, Zheng-Hang

2012-05-01

The study for the first time attempted to accomplish 18S ribosomal DNA (rDNA) complete sequence amplification and analysis for three Demodex species (Demodex folliculorum, Demodex brevis and Demodex canis) based on gDNA extraction from individual mites. The mites were treated by DNA Release Additive and Hot Start II DNA Polymerase so as to promote mite disruption and increase PCR specificity. Determination of D. folliculorum gDNA showed that the gDNA yield was highest at 1 mite, tending to descend with the increase of mite number. The individual mite gDNA was successfully used for 18S rDNA fragment (about 900 bp) amplification examination. The alignments of 18S rDNA complete sequences of individual mite samples and those of pooled mite samples (≥1000 mites/sample) showed over 97% identities for each species, indicating that the gDNA extracted from a single individual mite was as satisfactory as that from pooled mites for PCR amplification. Further pairwise sequence analyses showed that average divergence, genetic distance, transition/transversion or phylogenetic tree could not effectively identify the three Demodex species, largely due to the differentiation in the D. canis isolates. It can be concluded that the individual Demodex mite gDNA can satisfy the molecular study of Demodex. 18S rDNA complete sequence is suitable for interfamily identification in Cheyletoidea, but whether it is suitable for intrafamily identification cannot be confirmed until the ascertainment of the types of Demodex mites parasitizing in dogs.
Evolution of puma lentivirus in bobcats (Lynx rufus) and mountain lions (Puma concolor) in North America

USGS Publications Warehouse

Lee, Justin S.; Bevins, Sarah N.; Serieys, Laurel E.K.; Vickers, Winston; Logan, Ken A.; Aldredge, Mat; Boydston, Erin E.; Lyren, Lisa M.; McBride, Roy; Roelke-Parker, Melody; Pecon-Slattery, Jill; Troyer, Jennifer L.; Riley, Seth P.; Boyce, Walter M.; Crooks, Kevin R.; VandeWoude, Sue

2014-01-01

Mountain lions (Puma concolor) throughout North and South America are infected with puma lentivirus clade B (PLVB). A second, highly divergent lentiviral clade, PLVA, infects mountain lions in southern California and Florida. Bobcats (Lynx rufus) in these two geographic regions are also infected with PLVA, and to date, this is the only strain of lentivirus identified in bobcats. We sequenced full-length PLV genomes in order to characterize the molecular evolution of PLV in bobcats and mountain lions. Low sequence homology (88% average pairwise identity) and frequent recombination (1 recombination breakpoint per 3 isolates analyzed) were observed in both clades. Viral proteins have markedly different patterns of evolution; sequence homology and negative selection were highest in Gag and Pol and lowest in Vif and Env. A total of 1.7% of sites across the PLV genome evolve under positive selection, indicating that host-imposed selection pressure is an important force shaping PLV evolution. PLVA strains are highly spatially structured, reflecting the population dynamics of their primary host, the bobcat. In contrast, the phylogeography of PLVB reflects the highly mobile mountain lion, with diverse PLVB isolates cocirculating in some areas and genetically related viruses being present in populations separated by thousands of kilometers. We conclude that PLVA and PLVB are two different viral species with distinct feline hosts and evolutionary histories.
QueTAL: a suite of tools to classify and compare TAL effectors functionally and phylogenetically

PubMed Central

Pérez-Quintero, Alvaro L.; Lamy, Léo; Gordon, Jonathan L.; Escalon, Aline; Cunnac, Sébastien; Szurek, Boris; Gagnevin, Lionel

2015-01-01

Transcription Activator-Like (TAL) effectors from Xanthomonas plant pathogenic bacteria can bind to the promoter region of plant genes and induce their expression. DNA-binding specificity is governed by a central domain made of nearly identical repeats, each determining the recognition of one base pair via two amino acid residues (a.k.a. Repeat Variable Di-residue, or RVD). Knowing how TAL effectors differ from each other within and between strains would be useful to infer functional and evolutionary relationships, but their repetitive nature precludes reliable use of traditional alignment methods. The suite QueTAL was therefore developed to offer tailored tools for comparison of TAL effector genes. The program DisTAL considers each repeat as a unit, transforms a TAL effector sequence into a sequence of coded repeats and makes pair-wise alignments between these coded sequences to construct trees. The program FuncTAL is aimed at finding TAL effectors with similar DNA-binding capabilities. It calculates correlations between position weight matrices of potential target DNA sequences predicted from the RVD sequence, and builds trees based on these correlations. The programs accurately represented phylogenetic and functional relationships between TAL effectors using either simulated or literature-curated data. When using the programs on a large set of TAL effector sequences, the DisTAL tree largely reflected the expected species phylogeny. In contrast, FuncTAL showed that TAL effectors with similar binding capabilities can be found between phylogenetically distant taxa. This suite will help users to rapidly analyse any TAL effector genes of interest and compare them to other available TAL genes and should improve our understanding of TAL effector evolution. It is available at http://bioinfo-web.mpl.ird.fr/cgi-bin2/quetal/quetal.cgi. PMID:26284082

Complete genome sequence of Fer-de-Lance Virus reveals a novel gene in reptilian Paramyxoviruses

USGS Publications Warehouse

Kurath, G.; Batts, W.N.; Ahne, W.; Winton, J.R.

2004-01-01

The complete RNA genome sequence of the archetype reptilian paramyxovirus, Fer-de-Lance virus (FDLV), has been determined. The genome is 15,378 nucleotides in length and consists of seven nonoverlapping genes in the order 3′ N-U-P-M-F-HN-L 5′, coding for the nucleocapsid, unknown, phospho-, matrix, fusion, hemagglutinin-neuraminidase, and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and tri-nucleotide intergenic regions similar to those of other Paramyxoviridae. The FDLV P gene expression strategy is like that of rubulaviruses, which express the accessory V protein from the primary transcript and edit a portion of the mRNA to encode P and I proteins. There is also an overlapping open reading frame potentially encoding a small basic protein in the P gene. The gene designated U (unknown), encodes a deduced protein of 19.4 kDa that has no counterpart in other paramyxoviruses and has no similarity with sequences in the National Center for Biotechnology Information database. Active transcription of the U gene in infected cells was demonstrated by Northern blot analysis, and bicistronic N-U mRNA was also evident. The genomes of two other snake paramyxovirus genotypes were also found to have U genes, with 11 to 16% nucleotide divergence from the FDLV U gene. Pairwise comparisons of amino acid identities and phylogenetic analyses of all deduced FDLV protein sequences with homologous sequences from other Paramyxoviridae indicate that FDLV represents a new genus within the subfamily Paramyxovirinae. We suggest the name Ferlavirus for the new genus, with FDLV as the type species.
Polynucleobacter meluiroseus sp. nov., a bacterium isolated from a lake located in the mountains of the Mediterranean island of Corsica.

PubMed

Pitt, Alexandra; Schmidt, Johanna; Lang, Elke; Whitman, William B; Woyke, Tanja; Hahn, Martin W

2018-06-01

Strain AP-Melu-1000-B4 was isolated from a lake located in the mountains of the Mediterranean island of Corsica (France). Phenotypic, chemotaxonomic and genomic traits were investigated. Phylogenetic analyses based on 16S rRNA gene sequencing referred the strain to the cryptic species complex PnecC within the genus Polynucleobacter. The strain encoded genes for biosynthesis of proteorhodopsin and retinal. When pelleted by centrifugation the strain showed an intense rose colouring. Major fatty acids were C16 : 1ω7c, C16 : 0, C18 : 1ω7c and summed feature 2 (C16 : 1 isoI and C14 : 0-3OH). The sequence of the 16S rRNA gene contained an indel which was not present in any previously described Polynucleobacter species. Genome sequencing revealed a genome size of 1.89 Mbp and a G+C content of 46.6 mol%. In order to resolve the phylogenetic position of the new strain within subcluster PnecC, its phylogeny was reconstructed from sequences of 319 shared genes. To represent all currently described Polynucleobacter species by whole genome sequences, three type strains were additionally sequenced. Our phylogenetic analysis revealed that strain AP-Melu-1000-B4 occupied a basal position compared with previously described PnecC strains. Pairwise determined whole genome average nucleotide identity (gANI) values suggested that strain AP-Melu-1000-B4 represents a new species, for which we propose the name Polynucleobacter meluiroseus sp. nov. with the type strain AP-Melu-1000-B4ᵀ (=DSM 103591ᵀ =CIP 111329ᵀ).
DNA Barcodes of Asian Houbara Bustard (Chlamydotis undulata macqueenii)

PubMed Central

Arif, Ibrahim A.; Khan, Haseeb A.; Williams, Joseph B.; Shobrak, Mohammad; Arif, Waad I.

2012-01-01

Populations of Houbara Bustards have dramatically declined in recent years. Captive breeding and reintroduction programs have had limited success in reviving population numbers and thus new technological solutions involving molecular methods are essential for the long term survival of this species. In this study, we sequenced the 694 bp segment of COI gene of the four specimens of Asian Houbara Bustard (Chlamydotis undulata macqueenii). We also compared these sequences with earlier published barcodes of 11 individuals comprising different families of the orders Gruiformes, Ciconiiformes, Podicipediformes and Crocodylia (out group). The pair-wise sequence comparison showed a total of 254 variable sites across all the 15 sequences from different taxa. Three of the four specimens of Houbara Bustard had an identical sequence of COI gene and one individual showed a single nucleotide difference (G > A transition at position 83). Within the bustard family (Otididae), comparison among the three species (Asian Houbara Bustard, Great Bustard (Otis tarda) and the Little Bustard (Tetrax tetrax)), representing three different genera, showed 116 variable sites. For another family (Rallidae), the intra-family variable sites among the individuals of four different genera were found to be 146. The COI genetic distances among the 15 individuals varied from 0.000 to 0.431. Phylogenetic analysis using 619 bp nucleotide segment of COI clearly discriminated all the species representing different genera, families and orders. All the four specimens of Houbara Bustard formed a single clade and are clearly separated from other two individuals of the same family (Otis tarda and Tetrax tetrax). The nucleotide sequence of partial segment of COI gene effectively discriminated the closely related species. This is the first study reporting the barcodes of Houbara Bustard and would be helpful in future molecular studies, particularly for the conservation of this threatened bird in Saudi Arabia. 
PMID:22408462
Diverse and highly recombinant anelloviruses associated with Weddell seals in Antarctica

PubMed Central

Fahsbender, Elizabeth; Kim, Stacy; Kraberger, Simona; Frankfurter, Greg; Eilers, Alice A.; Shero, Michelle R.; Beltran, Roxanne; Kirkham, Amy; McCorkell, Robert; Berngartt, Rachel K.; Male, Maketalena F.; Ballard, Grant; Ainley, David G.; Breitbart, Mya

2017-01-01

The viruses circulating among Antarctic wildlife remain largely unknown. In an effort to identify viruses associated with Weddell seals (Leptonychotes weddellii) inhabiting the Ross Sea, vaginal and nasal swabs, and faecal samples were collected between November 2014 and February 2015. In addition, a Weddell seal kidney and South Polar skua (Stercorarius maccormicki) faeces were opportunistically sampled. Using high throughput sequencing, we identified and recovered 152 anellovirus genomes that share 63–70% genome-wide identities with other pinniped anelloviruses. Genome-wide pairwise comparisons coupled with phylogenetic analysis revealed two novel anellovirus species, tentatively named torque teno Leptonychotes weddellii virus (TTLwV) -1 and -2. TTLwV-1 (n = 133, genomes encompassing 40 genotypes) is highly recombinant, whereas TTLwV-2 (n = 19, genomes encompassing three genotypes) is relatively less recombinant. This study documents ubiquitous TTLwVs among Weddell seals in Antarctica with frequent co-infection by multiple genotypes; however, the role these anelloviruses play in seal health remains unknown. PMID:28744371
Diverse and highly recombinant anelloviruses associated with Weddell seals in Antarctica.

PubMed

Fahsbender, Elizabeth; Burns, Jennifer M; Kim, Stacy; Kraberger, Simona; Frankfurter, Greg; Eilers, Alice A; Shero, Michelle R; Beltran, Roxanne; Kirkham, Amy; McCorkell, Robert; Berngartt, Rachel K; Male, Maketalena F; Ballard, Grant; Ainley, David G; Breitbart, Mya; Varsani, Arvind

2017-01-01

The viruses circulating among Antarctic wildlife remain largely unknown. In an effort to identify viruses associated with Weddell seals (Leptonychotes weddellii) inhabiting the Ross Sea, vaginal and nasal swabs, and faecal samples were collected between November 2014 and February 2015. In addition, a Weddell seal kidney and South Polar skua (Stercorarius maccormicki) faeces were opportunistically sampled. Using high throughput sequencing, we identified and recovered 152 anellovirus genomes that share 63-70% genome-wide identities with other pinniped anelloviruses. Genome-wide pairwise comparisons coupled with phylogenetic analysis revealed two novel anellovirus species, tentatively named torque teno Leptonychotes weddellii virus (TTLwV) -1 and -2. TTLwV-1 (n = 133, genomes encompassing 40 genotypes) is highly recombinant, whereas TTLwV-2 (n = 19, genomes encompassing three genotypes) is relatively less recombinant. This study documents ubiquitous TTLwVs among Weddell seals in Antarctica with frequent co-infection by multiple genotypes; however, the role these anelloviruses play in seal health remains unknown.
Constructing STR multiplexes for individual identification of Hungarian red deer.

PubMed

Szabolcsi, Zoltan; Egyed, Balazs; Zenke, Petra; Padar, Zsolt; Borsy, Adrienn; Steger, Viktor; Pasztor, Erzsebet; Csanyi, Sandor; Buzas, Zsuzsanna; Orosz, Laszlo

2014-07-01

Red deer is the most valuable game species in Hungary, and there is a strong need for genetic identification of individuals. For this purpose, 10 tetranucleotide STR markers were developed and amplified in two 5-plex systems. The study presented here includes the flanking region sequence analysis and the allele nomenclature of the 10 loci as well as the PCR optimization of the DeerPlex I and II. LD pairwise tests and cross-species similarity analyses showed the 10 loci to be independently inherited. Considerable levels of genetic differences between two subpopulations were recorded, and F_ST was 0.034 using AMOVA. The average probability of identity (PI_ave) was 2.6736 × 10⁻¹⁵. This low value for PI_ave nearly eliminates false identification. An illegal hunting case solved by DeerPlex is described herein. The calculated likelihood ratio (LR) illustrates the potential of the 10 red deer microsatellite markers for forensic investigations.
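To show why a panel of independent loci can drive the probability of identity so low, the sketch below uses one common textbook form of the per-locus probability that two unrelated individuals share a genotype under Hardy-Weinberg proportions, then multiplies across loci. This is an assumption about the general approach, not a reconstruction of the calculation in the abstract above; the function names and the allele frequencies in the usage note are hypothetical.

```python
from itertools import combinations
from math import prod


def locus_pi(allele_freqs):
    """Probability that two unrelated individuals share a genotype at one
    locus, assuming Hardy-Weinberg proportions (illustrative helper)."""
    homozygous = sum(p ** 4 for p in allele_freqs)
    heterozygous = sum((2 * p * q) ** 2 for p, q in combinations(allele_freqs, 2))
    return homozygous + heterozygous


def multilocus_pi(loci):
    """Product of per-locus values; valid only if loci are independent."""
    return prod(locus_pi(freqs) for freqs in loci)
```

With two equally frequent alleles, a single locus gives `locus_pi([0.5, 0.5])` = 0.375, and two such loci give 0.375 × 0.375 ≈ 0.14. Highly polymorphic loci each contribute a much smaller factor, so the product over ten independent loci shrinks rapidly.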
Computational design of enzyme-ligand binding using a combined energy function and deterministic sequence optimization algorithm.

PubMed

Tian, Ye; Huang, Xiaoqiang; Zhu, Yushan

2015-08-01

Enzyme amino-acid sequences at ligand-binding interfaces are evolutionarily optimized for reactions, and the natural conformation of an enzyme-ligand complex must have a low free energy relative to alternative conformations in native-like or non-native sequences. Based on this assumption, a combined energy function was developed for enzyme design and then evaluated by recapitulating native enzyme sequences at ligand-binding interfaces for 10 enzyme-ligand complexes. In this energy function, the electrostatic interaction between polar or charged atoms at buried interfaces is described by an explicitly orientation-dependent hydrogen-bonding potential and a pairwise-decomposable generalized Born model based on the general side chain in the protein design framework. The energy function is augmented with a pairwise surface-area based hydrophobic contribution for nonpolar atom burial. Using this function, on average, 78% of the amino acids at ligand-binding sites were predicted correctly in the minimum-energy sequences, whereas 84% were predicted correctly in the most-similar sequences, which were selected from the top 20 sequences for each enzyme-ligand complex. Hydrogen bonds at the enzyme-ligand binding interfaces in the 10 complexes were usually recovered with the correct geometries. The binding energies calculated using the combined energy function helped to discriminate the active sequences from a pool of alternative sequences that were generated by repeatedly solving a series of mixed-integer linear programming problems for sequence selection with increasing integer cuts.
Application of a time-dependent coalescence process for inferring the history of population size changes from DNA sequence data.

PubMed

Polanski, A; Kimmel, M; Chakraborty, R

1998-05-12

Distribution of pairwise differences of nucleotides from data on a sample of DNA sequences from a given segment of the genome has been used in the past to draw inferences about the past history of population size changes. However, all earlier methods assume a given model of population size changes (such as sudden expansion), parameters of which (e.g., time and amplitude of expansion) are fitted to the observed distributions of nucleotide differences among pairwise comparisons of all DNA sequences in the sample. Our theory indicates that for any time-dependent population size, N(tau) (in which time tau is counted backward from present), a time-dependent coalescence process yields the distribution, p(tau), of the time of coalescence between two DNA sequences randomly drawn from the population. Prediction of p(tau) and N(tau) requires the use of a reverse Laplace transform known to be unstable. Nevertheless, simulated data obtained from three models of monotone population change (stepwise, exponential, and logistic) indicate that the pattern of a past population size change leaves its signature on the pattern of DNA polymorphism. Application of the theory to the published mtDNA sequences indicates that the current mtDNA sequence variation is not inconsistent with a logistic growth of the human population.
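The "distribution of pairwise differences" that the abstract above takes as its input can be computed directly from a set of aligned sequences. The sketch below is only that first step, with a hypothetical function name; it does not implement the coalescence theory or the Laplace-transform inversion described in the abstract.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Map each observed number of differing sites to the fraction of
    sequence pairs showing it (illustrative helper; equal-length input)."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    counts = Counter()
    for a, b in combinations(seqs, 2):
        if len(a) != len(b):
            raise ValueError("sequences must be aligned to the same length")
        differences = sum(x != y for x, y in zip(a, b))
        counts[differences] += 1
    total_pairs = sum(counts.values())
    return {k: counts[k] / total_pairs for k in sorted(counts)}
```

For the three toy sequences `["AAAA", "AAAT", "AATT"]`, two of the three pairs differ at one site and one pair differs at two sites, so the function returns a distribution with two thirds of the mass at 1 and one third at 2.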
Genetic divergence between populations of feral and domestic forms of a mosquito disease vector assessed by transcriptomics

PubMed Central

2015-01-01

Culex pipiens, an invasive mosquito and vector of West Nile virus in the US, has two morphologically indistinguishable forms that differ dramatically in behavior and physiology. Cx. pipiens form pipiens is primarily a bird-feeding temperate mosquito, while the sub-tropical Cx. pipiens form molestus thrives in sewers and feeds on mammals. Because the feral form can diapause during the cold winters but the domestic form cannot, the two Cx. pipiens forms are allopatric in northern Europe and, although viable, hybrids are rare. Cx. pipiens form molestus has spread across all inhabited continents and hybrids of the two forms are common in the US. Here we elucidate the genes and gene families with the greatest divergence rates between these phenotypically diverged mosquito populations, and discuss them in light of their potential biological and ecological effects. After generating and assembling novel transcriptome data for each population, we performed pairwise tests for nonsynonymous divergence (Ka) of homologous coding sequences and examined gene ontology terms that were statistically over-represented in those sequences with the greatest divergence rates. We identified genes involved in digestion (serine endopeptidases), innate immunity (fibrinogens and α-macroglobulins), hemostasis (D7 salivary proteins), olfaction (odorant binding proteins) and chitin binding (peritrophic matrix proteins). By examining molecular divergence between closely related yet phenotypically divergent forms of the same species, our results provide insights into the identity of rapidly-evolving genes between incipient species. Additionally, we found that families of signal transducers, ATP synthases and transcription regulators remained identical at the amino acid level, thus constituting conserved components of the Cx. pipiens proteome. 
We provide a reference with which to gauge the divergence reported in this analysis by performing a comparison of transcriptome sequences from conspecific (yet allopatric) populations of another member of the Cx. pipiens complex, Cx. quinquefasciatus. PMID:25755934
ScaffoldSeq: Software for characterization of directed evolution populations.

PubMed

Woldring, Daniel R; Holec, Patrick V; Hackel, Benjamin J

2016-07-01

ScaffoldSeq is software designed for the numerous applications, including directed evolution analysis, in which a user generates a population of DNA sequences encoding partially diverse proteins with related functions and would like to characterize the single site and pairwise amino acid frequencies across the population. A common scenario for enzyme maturation, antibody screening, and alternative scaffold engineering involves naïve and evolved populations that contain diversified regions, varying in both sequence and length, within a conserved framework. Analyzing the diversified regions of such populations is facilitated by high-throughput sequencing platforms; however, length variability within these regions (e.g., antibody CDRs) encumbers the alignment process. To overcome this challenge, the ScaffoldSeq algorithm takes advantage of conserved framework sequences to quickly identify diverse regions. Beyond this, unintended biases in sequence frequency are generated throughout the experimental workflow required to evolve and isolate clones of interest prior to DNA sequencing. ScaffoldSeq software uniquely handles this issue by providing tools to quantify and remove background sequences, cluster similar protein families, and dampen the impact of dominant clones. The software produces graphical and tabular summaries for each region of interest, allowing users to evaluate diversity in a site-specific manner as well as identify epistatic pairwise interactions. The code and detailed information are freely available at http://research.cems.umn.edu/hackel. Proteins 2016; 84:869-874. © 2016 Wiley Periodicals, Inc.
Global Occurrence of Archaeal amoA Genes in Terrestrial Hot Springs

PubMed Central

Zhang, Chuanlun L.; Ye, Qi; Huang, Zhiyong; Li, WenJun; Chen, Jinquan; Song, Zhaoqi; Zhao, Weidong; Bagwell, Christopher; Inskeep, William P.; Ross, Christian; Gao, Lei; Wiegel, Juergen; Romanek, Christopher S.; Shock, Everett L.; Hedlund, Brian P.

2008-01-01

Despite the ubiquity of ammonium in geothermal environments and the thermodynamic favorability of aerobic ammonia oxidation, thermophilic ammonia-oxidizing microorganisms belonging to the crenarchaeota kingdom have only recently been described. In this study, we analyzed microbial mats and surface sediments from 21 hot spring samples (pH 3.4 to 9.0; temperature, 41 to 86°C) from the United States, China, and Russia and obtained 846 putative archaeal ammonia monooxygenase large-subunit (amoA) gene and transcript sequences, representing a total of 41 amoA operational taxonomic units (OTUs) at 2% identity. The amoA gene sequences were highly diverse, yet they clustered within two major clades of archaeal amoA sequences known from water columns, sediments, and soils: clusters A and B. Eighty-four percent (711/846) of the sequences belonged to cluster A, which is typically found in water columns and sediments, whereas 16% (135/846) belonged to cluster B, which is typically found in soils and sediments. Although a few amoA OTUs were present in several geothermal regions, most were specific to a single region. In addition, cluster A amoA genes formed geographic groups, while cluster B sequences did not group geographically. With the exception of only one hot spring, principal-component analysis and UPGMA (unweighted-pair group method using average linkages) based on the UniFrac metric derived from cluster A grouped the springs by location, regardless of temperature or bulk water pH, suggesting that geography may play a role in structuring communities of putative ammonia-oxidizing archaea (AOA). The amoA genes were distinct from those of low-temperature environments; in particular, pair-wise comparisons between hot spring amoA genes and those from sympatric soils showed less than 85% sequence identity, underscoring the distinctness of hot spring archaeal communities from those of the surrounding soil system. 
Reverse transcription-PCR showed that amoA genes were transcribed in situ in one spring and the transcripts were closely related to the amoA genes amplified from the same spring. Our study demonstrates the global occurrence of putative archaeal amoA genes in a wide variety of terrestrial hot springs and suggests that geography may play an important role in selecting different assemblages of AOA. PMID:18676703
Global occurrence of archaeal amoA genes in terrestrial hot springs.

PubMed

Zhang, Chuanlun L; Ye, Qi; Huang, Zhiyong; Li, Wenjun; Chen, Jinquan; Song, Zhaoqi; Zhao, Weidong; Bagwell, Christopher; Inskeep, William P; Ross, Christian; Gao, Lei; Wiegel, Juergen; Romanek, Christopher S; Shock, Everett L; Hedlund, Brian P

2008-10-01

Despite the ubiquity of ammonium in geothermal environments and the thermodynamic favorability of aerobic ammonia oxidation, thermophilic ammonia-oxidizing microorganisms belonging to the crenarchaeota kingdom have only recently been described. In this study, we analyzed microbial mats and surface sediments from 21 hot spring samples (pH 3.4 to 9.0; temperature, 41 to 86 degrees C) from the United States, China, and Russia and obtained 846 putative archaeal ammonia monooxygenase large-subunit (amoA) gene and transcript sequences, representing a total of 41 amoA operational taxonomic units (OTUs) at 2% identity. The amoA gene sequences were highly diverse, yet they clustered within two major clades of archaeal amoA sequences known from water columns, sediments, and soils: clusters A and B. Eighty-four percent (711/846) of the sequences belonged to cluster A, which is typically found in water columns and sediments, whereas 16% (135/846) belonged to cluster B, which is typically found in soils and sediments. Although a few amoA OTUs were present in several geothermal regions, most were specific to a single region. In addition, cluster A amoA genes formed geographic groups, while cluster B sequences did not group geographically. With the exception of only one hot spring, principal-component analysis and UPGMA (unweighted-pair group method using average linkages) based on the UniFrac metric derived from cluster A grouped the springs by location, regardless of temperature or bulk water pH, suggesting that geography may play a role in structuring communities of putative ammonia-oxidizing archaea (AOA). The amoA genes were distinct from those of low-temperature environments; in particular, pair-wise comparisons between hot spring amoA genes and those from sympatric soils showed less than 85% sequence identity, underscoring the distinctness of hot spring archaeal communities from those of the surrounding soil system. 
Reverse transcription-PCR showed that amoA genes were transcribed in situ in one spring and the transcripts were closely related to the amoA genes amplified from the same spring. Our study demonstrates the global occurrence of putative archaeal amoA genes in a wide variety of terrestrial hot springs and suggests that geography may play an important role in selecting different assemblages of AOA.
An Outbreak of Streptococcus pyogenes in a Mental Health Facility: Advantage of Well-Timed Whole-Genome Sequencing Over emm Typing.

PubMed

Bergin, Sarah M; Periaswamy, Balamurugan; Barkham, Timothy; Chua, Hong Choon; Mok, Yee Ming; Fung, Daniel Shuen Sheng; Su, Alex Hsin Chuan; Lee, Yen Ling; Chua, Ming Lai Ivan; Ng, Poh Yong; Soon, Wei Jia Wendy; Chu, Collins Wenhan; Tan, Siyun Lucinda; Meehan, Mary; Ang, Brenda Sze Peng; Leo, Yee Sin; Holden, Matthew T G; De, Partha; Hsu, Li Yang; Chen, Swaine L; de Sessions, Paola Florez; Marimuthu, Kalisvar

2018-05-09

OBJECTIVE: We report the utility of whole-genome sequencing (WGS) conducted in a clinically relevant time frame (ie, sufficient for guiding management decision), in managing a Streptococcus pyogenes outbreak, and present a comparison of its performance with emm typing. SETTING: A 2,000-bed tertiary-care psychiatric hospital. METHODS: Active surveillance was conducted to identify new cases of S. pyogenes. WGS guided targeted epidemiological investigations, and infection control measures were implemented. Single-nucleotide polymorphism (SNP)-based genome phylogeny, emm typing, and multilocus sequence typing (MLST) were performed. We compared the ability of WGS and emm typing to correctly identify person-to-person transmission and to guide the management of the outbreak. RESULTS: The study included 204 patients and 152 staff. We identified 35 patients and 2 staff members with S. pyogenes. WGS revealed polyclonal S. pyogenes infections with 3 genetically distinct phylogenetic clusters (C1-C3). Cluster C1 isolates were all emm type 4, sequence type 915 and had pairwise SNP differences of 0-5, which suggested recent person-to-person transmissions. Epidemiological investigation revealed that cluster C1 was mediated by dermal colonization and transmission of S. pyogenes in a male residential ward. Clusters C2 and C3 were genomically diverse, with pairwise SNP differences of 21-45 and 26-58, and emm 11 and mostly emm 120, respectively. Clusters C2 and C3, which may have been considered person-to-person transmissions by emm typing, were shown by WGS to be unlikely by integrating pairwise SNP differences with epidemiology. CONCLUSIONS: WGS had higher resolution than emm typing in identifying clusters with recent and ongoing person-to-person transmissions, which allowed implementation of targeted intervention to control the outbreak. Infect Control Hosp Epidemiol 2018;1-9.
Epidemiological links between tuberculosis cases identified twice as efficiently by whole genome sequencing than conventional molecular typing: A population-based study.

PubMed

Jajou, Rana; de Neeling, Albert; van Hunen, Rianne; de Vries, Gerard; Schimmel, Henrieke; Mulder, Arnout; Anthony, Richard; van der Hoek, Wim; van Soolingen, Dick

2018-01-01

Patients with Mycobacterium tuberculosis isolates sharing identical DNA fingerprint patterns can be epidemiologically linked. However, municipal health services in the Netherlands are able to confirm an epidemiological link in only around 23% of the patients with isolates clustered by the conventional variable number of tandem repeat (VNTR) genotyping. This research aims to investigate whether whole genome sequencing (WGS) is a more reliable predictor of epidemiological links between tuberculosis patients than VNTR genotyping. VNTR genotyping and WGS were performed in parallel on all Mycobacterium tuberculosis complex isolates received at the Netherlands National Institute for Public Health and the Environment in 2016. Isolates were clustered by VNTR when they shared identical 24-loci VNTR patterns; isolates were assigned to a WGS cluster when the pair-wise genetic distance was ≤ 12 single nucleotide polymorphisms (SNPs). Cluster investigation was performed by municipal health services on all isolates clustered by VNTR in 2016. The proportion of epidemiological links identified among patients clustered by either method was calculated. In total, 535 isolates were genotyped, of which 25% (134/535) were clustered by VNTR and 14% (76/535) by WGS; the concordance between both typing methods was 86%. The proportion of epidemiological links among WGS clustered cases (57%) was roughly twice that among VNTR clustered cases (31%). When WGS was applied, the number of clustered isolates was halved, while all epidemiologically linked cases remained clustered. WGS is therefore a more reliable tool to predict epidemiological links between tuberculosis cases than VNTR genotyping and will allow more efficient transmission tracing, as epidemiological investigations based on false clustering can be avoided.
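The ≤ 12 SNP clustering rule described in this abstract can be illustrated with a minimal Python sketch. It assumes single-linkage grouping (an isolate joins a cluster if it is within the threshold of any member), which the abstract does not specify; the isolate names and distances are hypothetical.

```python
from itertools import combinations


def cluster_by_snp_threshold(distances, isolates, threshold=12):
    """Group isolates whose pairwise SNP distance is <= threshold.

    Assumption: single-linkage grouping via union-find. `distances`
    maps frozenset({a, b}) to the SNP distance between isolates a and b.
    """
    parent = {name: name for name in isolates}

    def find(x):
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(isolates, 2):
        if distances[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in isolates:
        groups.setdefault(find(name), []).append(name)
    # Only groups of two or more isolates count as clusters.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical isolates and SNP distances, for illustration only.
isolates = ["A", "B", "C", "D"]
distances = {
    frozenset(("A", "B")): 3,
    frozenset(("A", "C")): 15,
    frozenset(("B", "C")): 10,
    frozenset(("A", "D")): 120,
    frozenset(("B", "D")): 118,
    frozenset(("C", "D")): 125,
}
clusters = cluster_by_snp_threshold(distances, isolates)
print(clusters)
```

Under single linkage, A and C end up in the same cluster through B even though they differ by 15 SNPs; a stricter rule (for example, complete linkage) would split them.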
Glycomyces sediminimaris sp. nov., a new species of actinobacteria isolated from marine sediment.

PubMed

Mohammadipanah, Fatemeh; Atasayar, Ewelina; Heidarian, Sheida; Wink, Joachim

2018-06-05

A novel Glycomyces strain, designated as MH2460T, was isolated from marine sediment collected from 12 m depth in Rostami seaport, Bushehr Province in Iran. On International Streptomyces Project 2 medium it produced branching substrate hyphae that developed into a large number of irregularly shaped spores in 8 days. It showed optimal growth at 25-35 °C, pH 6.0-8.0 and in salinity between 2.5-5 % (w/v) NaCl. Chemotaxonomic and molecular characteristics of the isolate matched descriptions for members of the genus Glycomyces. Whole-cell hydrolysates of strain MH2460T contained meso-diaminopimelic acid along with glucose, ribose and small traces of xylose and galactose. The phospholipids comprised diphosphatidylglycerol, phosphatidylglycerol, phosphatidylinositol and phosphatidylinositol mannosides as well as two unidentified phosphoglycolipids, one unidentified phospholipid and an unidentified aminolipid. The predominant menaquinones were MK-11(H4) and MK-10(H4). The fatty-acid pattern was mainly composed of anteiso-C15 : 0, anteiso-C17 : 0, iso-C15 : 0 and iso-C16 : 0. The strain belongs to the genus Glycomyces based on 16S rRNA gene sequence with the highest pairwise sequence identity (98.3 %) with Glycomyces phytohabitans KLBMP 1483T. The DNA-DNA hybridization value showed 53.9±2.7 % identity when MH2460T was compared to this reference strain. The G+C content of the DNA was 70.2 mol%. Based on phenotypic, biochemical, chemotaxonomic and genotypic features, strain MH2460T (DSM 103727T = UTMC 2460T = NCCB 100631T) is considered to represent a novel species of the genus Glycomyces, for which the name Glycomyces sediminimaris is proposed.
Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores.

PubMed

Bastien, Olivier; Maréchal, Eric

2008-08-07

Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte-Carlo simulation. Z-values allow the deduction of an upper bound of the P-value (1/Z-value^2) following the TULIP theorem. Simulated Z-value distributions are known to fit a Gumbel law. This remarkable property had not been demonstrated and had no obvious biological support. We built a model of evolution of sequences based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the system's longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to the ability to conserve a sufficient functional level at the folded and maturated protein level (positive selection pressure). Homologous sequences were considered as systems 1) having a high redundancy of information reflected by the magnitude of their alignment scores, and 2) whose components are the amino acids, which can independently be damaged by random DNA mutations. From these assumptions, we deduced that information shared at each amino acid position evolved with a constant rate, corresponding to the information hazard rate, and that pairwise sequence alignment scores should follow a Gumbel distribution, whose parameters could find some theoretical rationale. In particular, one parameter corresponds to the information hazard rate. Extreme value distribution of alignment scores, assessed from high scoring segments pairs following the Karlin-Altschul model, can also be deduced from the Reliability Theory applied to molecular sequences. It reflects the redundancy of information between homologous sequences, under functional conservative pressure. This model also provides a link between concepts of biological sequence analysis and of systems biology.
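The Monte-Carlo Z-value and the 1/Z^2 upper bound on the P-value mentioned in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the scoring scheme (match, mismatch, linear gap), the number of shuffles, and the choice to shuffle only the second sequence are all assumptions made here.

```python
import random
import statistics


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The scoring parameters are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def z_value(a, b, n_shuffles=200, seed=0):
    """Z-value of the observed score against scores of shuffled sequences."""
    rng = random.Random(seed)
    observed = smith_waterman_score(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # Monte-Carlo null: same composition, random order
        null_scores.append(smith_waterman_score(a, "".join(letters)))
    mu = statistics.mean(null_scores)
    sigma = statistics.stdev(null_scores)
    if sigma == 0:
        raise ValueError("null scores have zero variance")
    return (observed - mu) / sigma


def p_value_upper_bound(z):
    """Upper bound 1/Z^2 on the P-value, capped at 1 (trivial when Z <= 1)."""
    return min(1.0, 1.0 / z**2) if z > 0 else 1.0


seq = "MKVLAAGIVGLLLAQWERTYHPSDF"  # hypothetical sequence
z = z_value(seq, seq)
print(z, p_value_upper_bound(z))
```

The bound is distribution-free, so it is loose; fitting a Gumbel law to the simulated scores, as the abstract discusses, would give a tighter estimate.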
In Silico Analysis of Gene Expression Network Components Underlying Pigmentation Phenotypes in the Python Identified Evolutionarily Conserved Clusters of Transcription Factor Binding Sites

PubMed Central

2016-01-01

Color variation provides the opportunity to investigate the genetic basis of evolution and selection. Reptiles are less studied than mammals. Comparative genomics approaches allow for knowledge gained in one species to be leveraged for use in another species. We describe a comparative vertebrate analysis of conserved regulatory modules in pythons aimed at assessing bioinformatics evidence that transcription factors important in mammalian pigmentation phenotypes may also be important in python pigmentation phenotypes. We identified 23 python orthologs of mammalian genes associated with variation in coat color phenotypes for which we assessed the extent of pairwise protein sequence identity between pythons and mouse, dog, horse, cow, chicken, anole lizard, and garter snake. We next identified a set of melanocyte/pigment associated transcription factors (CREB, FOXD3, LEF-1, MITF, POU3F2, and USF-1) that exhibit relatively conserved sequence similarity within their DNA binding regions across species based on orthologous alignments across multiple species. Finally, we identified 27 evolutionarily conserved clusters of transcription factor binding sites within ~200-nucleotide intervals of the 1500-nucleotide upstream regions of AIM1, DCT, MC1R, MITF, MLANA, OA1, PMEL, RAB27A, and TYR from Python bivittatus. Our results provide insight into pigment phenotypes in pythons. PMID:27698666
In Silico Analysis of Gene Expression Network Components Underlying Pigmentation Phenotypes in the Python Identified Evolutionarily Conserved Clusters of Transcription Factor Binding Sites.

PubMed

Irizarry, Kristopher J L; Bryden, Randall L

2016-01-01

Color variation provides the opportunity to investigate the genetic basis of evolution and selection. Reptiles are less studied than mammals. Comparative genomics approaches allow for knowledge gained in one species to be leveraged for use in another species. We describe a comparative vertebrate analysis of conserved regulatory modules in pythons aimed at assessing bioinformatics evidence that transcription factors important in mammalian pigmentation phenotypes may also be important in python pigmentation phenotypes. We identified 23 python orthologs of mammalian genes associated with variation in coat color phenotypes for which we assessed the extent of pairwise protein sequence identity between pythons and mouse, dog, horse, cow, chicken, anole lizard, and garter snake. We next identified a set of melanocyte/pigment associated transcription factors (CREB, FOXD3, LEF-1, MITF, POU3F2, and USF-1) that exhibit relatively conserved sequence similarity within their DNA binding regions across species based on orthologous alignments across multiple species. Finally, we identified 27 evolutionarily conserved clusters of transcription factor binding sites within ~200-nucleotide intervals of the 1500-nucleotide upstream regions of AIM1, DCT, MC1R, MITF, MLANA, OA1, PMEL, RAB27A, and TYR from Python bivittatus. Our results provide insight into pigment phenotypes in pythons.
A Proposed Genus Boundary for the Prokaryotes Based on Genomic Insights

PubMed Central

Qin, Qi-Long; Xie, Bin-Bin; Zhang, Xi-Ying; Chen, Xiu-Lan; Zhou, Bai-Cheng; Zhou, Jizhong; Oren, Aharon

2014-01-01

Genomic information has already been applied to prokaryotic species definition and classification. However, the contribution of the genome sequence to prokaryotic genus delimitation has been less studied. To gain insights into genus definition for the prokaryotes, we attempted to reveal the genus-level genomic differences in the current prokaryotic classification system and to delineate the boundary of a genus on the basis of genomic information. The average nucleotide sequence identity between two genomes can be used for prokaryotic species delineation, but it is not suitable for genus demarcation. We used the percentage of conserved proteins (POCP) between two strains to estimate their evolutionary and phenotypic distance. A comprehensive genomic survey indicated that the POCP can serve as a robust genomic index for establishing the genus boundary for prokaryotic groups. Basically, two species belonging to the same genus would share at least half of their proteins. In a specific lineage, the genus and family/order ranks showed slight or no overlap in terms of POCP values. A prokaryotic genus can be defined as a group of species with all pairwise POCP values higher than 50%. Integration of whole-genome data into the current taxonomy system can provide comprehensive information for prokaryotic genus definition and delimitation. PMID:24706738
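The genus rule stated in this abstract (all pairwise POCP values higher than 50%) can be sketched in Python. The abstract does not give the POCP formula; the sketch assumes the commonly cited definition, POCP = (C1 + C2) / (T1 + T2) × 100, where C1 and C2 are the numbers of conserved proteins found in each genome and T1 and T2 are the total protein counts. The function names and the example numbers are hypothetical.

```python
def pocp(conserved_1, conserved_2, total_1, total_2):
    """Percentage of conserved proteins between two genomes.

    Assumed definition: (C1 + C2) / (T1 + T2) * 100.
    """
    if total_1 + total_2 == 0:
        raise ValueError("both genomes have zero proteins")
    return 100.0 * (conserved_1 + conserved_2) / (total_1 + total_2)


def same_genus(pairwise_pocp_values, threshold=50.0):
    """True if every pairwise POCP value is higher than the threshold."""
    return all(value > threshold for value in pairwise_pocp_values)


# Hypothetical protein counts for three genome pairs.
values = [
    pocp(2000, 2100, 3000, 3200),
    pocp(1900, 1950, 3000, 3100),
    pocp(2200, 2150, 3200, 3100),
]
print(values, same_genus(values))
```

How a protein is judged "conserved" (for example, the alignment cutoffs applied) is left outside this sketch, since the abstract does not state it.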
kWIP: The k-mer weighted inner product, a de novo estimator of genetic similarity.

PubMed

Murray, Kevin D; Webers, Christfried; Ong, Cheng Soon; Borevitz, Justin; Warthmann, Norman

2017-09-01

Modern genomics techniques generate overwhelming quantities of data. Extracting population genetic variation demands computationally efficient methods to determine genetic relatedness between individuals (or "samples") in an unbiased manner, preferably de novo. Rapid estimation of genetic relatedness directly from sequencing data has the potential to overcome reference genome bias, and to verify that individuals belong to the correct genetic lineage before conclusions are drawn using mislabelled, or misidentified samples. We present the k-mer Weighted Inner Product (kWIP), an assembly-, and alignment-free estimator of genetic similarity. kWIP combines a probabilistic data structure with a novel metric, the weighted inner product (WIP), to efficiently calculate pairwise similarity between sequencing runs from their k-mer counts. It produces a distance matrix, which can then be further analysed and visualised. Our method does not require prior knowledge of the underlying genomes and applications include establishing sample identity and detecting mix-up, non-obvious genomic variation, and population structure. We show that kWIP can reconstruct the true relatedness between samples from simulated populations. By re-analysing several published datasets we show that our results are consistent with marker-based analyses. kWIP is written in C++, licensed under the GNU GPL, and is available from https://github.com/kdmurray91/kwip.
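A weighted inner product over k-mer counts, of the kind this abstract describes, might look like the following Python sketch. It is a simplified illustration, not kWIP itself: it uses exact counts rather than a probabilistic data structure, and both the weighting (Shannon entropy of each k-mer's presence frequency across samples) and the normalised kernel distance are assumptions made here, since the abstract does not spell them out.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Exact k-mer counts for one sample (kWIP itself uses a sketch structure)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def entropy_weights(samples):
    """Assumed weighting: Shannon entropy of each k-mer's presence frequency."""
    n = len(samples)
    weights = {}
    for kmer in set().union(*samples):
        f = sum(1 for s in samples if kmer in s) / n
        if 0 < f < 1:
            weights[kmer] = -(f * math.log2(f) + (1 - f) * math.log2(1 - f))
        else:
            weights[kmer] = 0.0  # k-mers present in every sample carry no signal
    return weights


def weighted_inner_product(a, b, weights):
    """Sum of count products over shared k-mers, scaled by each k-mer's weight."""
    return sum(a[k] * b[k] * weights.get(k, 0.0) for k in a.keys() & b.keys())


def wip_distance(a, b, weights):
    """Distance derived from the normalised kernel: sqrt(2 - 2 * K'(a, b))."""
    kaa = weighted_inner_product(a, a, weights)
    kbb = weighted_inner_product(b, b, weights)
    if kaa == 0 or kbb == 0:
        raise ValueError("sample has no informative k-mers")
    kab = weighted_inner_product(a, b, weights) / math.sqrt(kaa * kbb)
    return math.sqrt(max(0.0, 2.0 - 2.0 * kab))


# Hypothetical toy samples; real input would be sequencing runs.
samples = [kmer_counts(s, 3) for s in ("ACGTACGTAC", "ACGTACGTAC", "TTTTGGGGCC")]
weights = entropy_weights(samples)
d_same = wip_distance(samples[0], samples[1], weights)
d_diff = wip_distance(samples[0], samples[2], weights)
print(d_same, d_diff)
```

Computing this distance for every pair of samples yields the kind of distance matrix the abstract mentions for downstream analysis and visualisation.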

The Intestinal Microbiota of Tadpoles Differs from Those of Syntopic Aquatic Invertebrates.

PubMed

Lyra, Mariana L; Bletz, Molly C; Haddad, Célio F B; Vences, Miguel

2017-11-20

Bacterial communities associated to eukaryotes play important roles in the physiology, development, and health of their hosts. Here, we examine the intestinal microbiota in tadpoles and aquatic invertebrates (insects and gastropods) to better understand the degree of specialization in the tadpole microbiotas. Samples were collected at the same time in one pond, and the V4 region of the bacterial 16S rRNA gene was sequenced with Illumina amplicon sequencing. We found that bacterial richness and diversity were highest in two studied snail individuals, intermediate in tadpoles, and lowest in the four groups of aquatic insects. All groups had substantial numbers of exclusive bacterial operational taxonomic units (OTUs) in their guts, but also shared a high proportion of OTUs, probably corresponding to transient environmental bacteria. Significant differences were found for all pairwise comparisons of tadpoles and snails with the major groups of insects, but not among insect groups or between snails and tadpoles. The similarity between tadpoles and snails may be related to similar feeding mode as both snails and tadpoles scratch biofilms and algae from surfaces; however, this requires confirmation due to low sample sizes. Overall, the gut microbiota differences found among syntopic aquatic animals are likely shaped by both food preferences and host identity.
Environmental Sequencing of Biotic Components of Dust in the Chihuahuan Desert

NASA Astrophysics Data System (ADS)

Walsh, E.; Gill, T. E.; Rivas, J. A., Jr.; Leung, M. Y.; Mohl, J.

2015-12-01

A growing number of studies mark the role of wind in dispersing biota. Most of these approaches have used traditional methods to assess taxonomic diversity. Here we used next generation sequencing to characterize microbiota in dust collected from the Chihuahuan Desert. Atmospheric dust was collected during dust events in 2011-2014 using dry deposition collectors placed at two sites in El Paso Co., TX. In parallel experiments, we rehydrated subsamples of dust and conducted PCR amplifications using conserved primers for 16S and 18S ribosomal genes. Sequenced reads were de-multiplexed, quality filtered, and processed using QIIME. Taxonomy was assigned based on pairwise identity using BLAST for microbial eukaryotes. All samples were rarefied to a set number of sequences per sample prior to downstream analyses. Bioinformatic analysis of four of the dust samples yielded a diversity of biota, including zooplankton, bacteria, fungi, algae, and protists, but fungi predominate (>90% of both 10K and 3K reads). In our rehydrations of dust samples from the U.S. Southwest, nematodes, gastrotrichs, tardigrades, monogonont and bdelloid rotifers, branchiopods, and numerous ciliates have been recovered. Variability in genetic diversity among samples is based, in part, on the source and extent of the particular dust event. We anticipate the same patterns will be seen in the complete data set. These preliminary results indicate that wind is a major transporter of not only fungi, bacteria and other unicellular organisms but may also be important in shaping the distribution patterns of multi-cellular organisms such as those that inhabit aquatic environments in the arid southwestern US.
Complete sequence of two tick-borne flaviviruses isolated from Siberia and the UK: analysis and significance of the 5' and 3'-UTRs.

PubMed

Gritsun, T S; Venugopal, K; Zanotto, P M; Mikhailov, M V; Sall, A A; Holmes, E C; Polkinghorne, I; Frolova, T V; Pogodina, V V; Lashkevich, V A; Gould, E A

1997-05-01

The complete nucleotide sequences of two tick-transmitted flaviviruses, Vasilchenko (Vs) from Siberia and louping ill (LI) from the UK, have been determined. The genomes were 10928 and 10871 nucleotides (nt) in length, respectively. The coding strategy and functional protein sequence motifs of tick-borne flaviviruses are presented in both Vs and LI viruses. The phylogenies based on maximum likelihood, maximum parsimony and distance analysis of the polyproteins identified Vs virus as a member of the tick-borne encephalitis virus subgroup within the tick-borne serocomplex, genus Flavivirus, family Flaviviridae. Comparative alignment of the 3'-untranslated regions revealed deletions of different lengths essentially at the same position downstream of the stop codon for all tick-borne viruses. Two direct 27 nucleotide repeats at the 3'-end were found only for Vs and LI virus. Immediately following the deletions, a region of 332-334 nt with relatively conserved primary structure (67-94% identity) was observed at the 3'-non-coding end of the virus genome. Pairwise comparisons of the nucleotide sequence data revealed similar levels of variation between the coding region and the 5' and 3'-termini of the genome, implying an equivalent strong selective control for translated and untranslated regions. Indeed, the predicted folding of the 5' and 3'-untranslated regions revealed patterns of stem and loop structures conserved for all tick-borne flaviviruses, suggesting a purifying selection for preservation of essential RNA secondary structures which could be involved in translational control and replication. The possible implications of these findings are discussed.
ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes.

PubMed

Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim

2010-03-01

Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. The database can be accessed through http://proteinworlddb.org
Comparison of the ITS1 and ITS2 rDNA in Eimeria callospermophili (Apicomplexa: Eimeriidae) from Sciurid Rodents

PubMed Central

Motriuk-Smith, Dagmara; Seville, R Scott; Quealy, Leah; Oliver, Clinton E.

2011-01-01

The taxonomy of the coccidia has historically been morphologically based. The purpose of this study was to establish if conspecificity of isolates of Eimeria callospermophili from 4 ground-dwelling squirrel hosts (Rodentia: Sciuridae) is supported by comparison of rDNA sequence data and to examine how this species relates to eimerian species from other sciurid hosts. Eimeria callospermophili was isolated from 4 wild caught hosts, i.e., Urocitellus elegans, Cynomys leucurus, Marmota flaviventris, and Cynomys ludovicianus. The ITS1 and ITS2 genomic rDNA sequences were PCR generated, sequenced, and analyzed. The highest intraspecific pairwise distance values of 6.0% in ITS1 and 7.1% in ITS2 were observed in C. leucurus. Interspecific pairwise distance values greater than 5% do not support E. callospermophili conspecificity. Generated E. callospermophili sequences were compared to Eimeria lancasterensis from Sciurus niger and Sciurus niger cinereus, and Eimeria ontarioensis from S. niger. A single well-supported clade was formed by E. callospermophili amplicons in Neighbor Joining and Maximum Parsimony analyses. However, within the clade there was little evidence of host or geographic structuring of the species. PMID:21506777
Candida phyllophila sp. nov. and Candida vitiphila sp. nov., two novel yeast species from grape phylloplane in Thailand.

PubMed

Limtong, Savitree; Kaewwichian, Rungluk

2013-01-01

Three strains (K59(T), K60 and K70(T)) representing two novel yeast species were isolated from the external surface of leaves of different wine grape (Vitis vinifera) plants, which were collected from the Kanchanaburi Research Station (N14°07'15.1″ E099°19'05.6″), Wang Dong Sub-district, Mueang District, Kanchanaburi Province, Thailand, by an enrichment technique. The sequences of the D1/D2 domain of the large subunit (LSU) rRNA gene of two strains (K59(T) and K60) were identical and differed from that of strain K70(T). In terms of pairwise sequence similarity of the D1/D2 domain, the closest species to the three strains was Candida asparagi, but with 2.3% nucleotide substitutions for strains K59(T) and K60, and 2.1% nucleotide substitutions for strain K70(T). On the basis of morphological, biochemical, physiological and chemotaxonomic characteristics and the sequence analysis of the D1/D2 domain of the LSU rRNA gene, the three strains were assigned to two novel Candida species. Two strains (K59(T) and K60) were assigned as Candida phyllophila sp. nov. (type strain K59(T)=BCC 42662(T)=NBRC 107776(T)=CBS 12671(T)). Candida vitiphila sp. nov. is proposed for strain K70(T) (=BCC 42663(T)=NBRC 107777(T)=CBS 12672(T)).
Bacillus wiedmannii sp. nov., a psychrotolerant and cytotoxic Bacillus cereus group species isolated from dairy foods and dairy environments

PubMed Central

Miller, Rachel A.; Beno, Sarah M.; Kent, David J.; Carroll, Laura M.; Martin, Nicole H.; Boor, Kathryn J.

2016-01-01

A facultatively anaerobic, spore-forming Bacillus strain, FSL W8-0169T, collected from raw milk stored in a silo at a dairy powder processing plant in the north-eastern USA was initially identified as a Bacillus cereus group species based on a partial sequence of the rpoB gene and 16S rRNA gene sequence. Analysis of core genome single nucleotide polymorphisms clustered this strain separately from known B. cereus group species. Pairwise average nucleotide identity blast values obtained for FSL W8-0169T compared to the type strains of existing B. cereus group species were <95 % and predicted DNA–DNA hybridization values were <70 %, suggesting that this strain represents a novel B. cereus group species. We characterized 10 additional strains with the same or closely related rpoB allelic type, by whole genome sequencing and phenotypic analyses. Phenotypic characterization identified a higher content of iso-C16 : 0 fatty acid and the combined inability to ferment sucrose or to hydrolyse arginine as the key characteristics differentiating FSL W8-0169T from other B. cereus group species. FSL W8-0169T is psychrotolerant, produces haemolysin BL and non-haemolytic enterotoxin, and is cytotoxic in a HeLa cell model. The name Bacillus wiedmannii sp. nov. is proposed for the novel species represented by the type strain FSL W8-0169T (=DSM 102050T=LMG 29269T). PMID:27520992
Solution to urn models of pairwise interaction with application to social, physical, and biological sciences

NASA Astrophysics Data System (ADS)

Pickering, William; Lim, Chjan

2017-07-01

We investigate a family of urn models that correspond to one-dimensional random walks with quadratic transition probabilities that have highly diverse applications. Well-known instances of these two-urn models are the Ehrenfest model of molecular diffusion, the voter model of social influence, and the Moran model of population genetics. We also provide a generating function method for diagonalizing the corresponding transition matrix that is valid if and only if the underlying mean density satisfies a linear differential equation, and we express the eigenvector components in terms of ordinary hypergeometric functions. The nature of the models leads to a natural extension to interaction between agents in a general network topology. We analyze the dynamics on uncorrelated heterogeneous degree sequence networks and relate the convergence times to the moments of the degree sequences for various pairwise interaction mechanisms.
High-speed multiple sequence alignment on a reconfigurable platform.

PubMed

Oliver, Tim; Schmidt, Bertil; Maskell, Douglas; Nathan, Darran; Clemens, Ralf

2006-01-01

Progressive alignment is a widely used approach to compute multiple sequence alignments (MSAs). However, aligning several hundred sequences by popular progressive alignment tools requires hours on sequential computers. Due to the rapid growth of sequence databases biologists have to compute MSAs in a far shorter time. In this paper we present a new approach to MSA on reconfigurable hardware platforms to gain high performance at low cost. We have constructed a linear systolic array to perform pairwise sequence distance computations using dynamic programming. This results in an implementation with significant runtime savings on a standard FPGA.
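The pairwise distance stage described in this abstract can be illustrated in software with a short Python sketch. It is a minimal sketch assuming unit-cost edit distance as the distance measure; the abstract does not say which measure the hardware computes, and the example sequences are hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost edit distance by dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def distance_matrix(seqs):
    """All-against-all pairwise distances, as needed before building a guide tree."""
    n = len(seqs)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = edit_distance(seqs[i], seqs[j])
    return matrix


seqs = ["ACGTTGCA", "ACGTGCA", "TCGTTGCA"]  # hypothetical sequences
matrix = distance_matrix(seqs)
print(matrix)
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal can be computed at the same time; that independence is what makes this recurrence a natural fit for a linear systolic array.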
Molecular and biological characterization of the 5 human-bovine rotavirus (WC3)-based reassortant strains of the pentavalent rotavirus vaccine, RotaTeq (registered)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Matthijnssens, Jelle; Joelsson, Daniel B.; Warakomski, Donald J.

RotaTeq (registered) is a pentavalent rotavirus vaccine that contains five human-bovine reassortant strains (designated G1, G2, G3, G4, and P1) on the backbone of the naturally attenuated tissue culture-adapted parental bovine rotavirus (BRV) strain WC3. The viral genomes of each of the reassortant strains were completely sequenced and compared pairwise and phylogenetically among each other and to human rotavirus (HRV) and BRV reference strains. Reassortants G1, G2, G3, and G4 contained the VP7 gene from their corresponding HRV parent strains, while reassortants G1 and G2 also contained the VP3 gene (genotype M1) from the HRV parent strain. The P1 reassortant contained the VP4 gene from the HRV parent strain and all the other gene segments from the BRV WC3 strain. The human VP7s had a high level of overall amino acid identity (G1: 95-99%, G2: 94-99%, G3: 96-100%, G4: 93-99%) when compared to those of representative rotavirus strains of their corresponding G serotypes. The VP4 of the P1 reassortant had a high identity (92-97%) with those of serotype P1A[8] HRV reference strains, while the BRV VP7 showed identities ranging from 91% to 94% to those of serotype G6 HRV strains. Sequence analyses of the BRV or HRV genes confirmed that the fundamental structure of the proteins in the vaccine was similar to those of the HRV and BRV reference strains. Sequence analyses showed that RotaTeq (registered) exhibited a high degree of genetic stability as no mutations were identified in the material of each reassortant, which undergoes two rounds of replication cycles in cell culture during the manufacturing process, when compared to the final material used to fill the dosing tubes. The infectivity of each of the reassortant strains of RotaTeq (registered), like HRV strains, did not require the presence of sialic acid residues on the cell surface. The molecular and biologic characterization of RotaTeq (registered) adds to the significant body of clinical data supporting the consistent efficacy, immunogenicity, and safety of RotaTeq (registered).
ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes

PubMed Central

Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim

2010-01-01

Motivation: Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith–Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid™, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. Availability: The database can be accessed through http://proteinworlddb.org Contact: otto@fiocruz.br PMID:20089515
Deducing the temporal order of cofactor function in ligand-regulated gene transcription: theory and experimental verification.

PubMed

Dougherty, Edward J; Guo, Chunhua; Simons, S Stoney; Chow, Carson C

2012-01-01

Cofactors are intimately involved in steroid-regulated gene expression. Two critical questions are (1) the steps at which cofactors exert their biological activities and (2) the nature of that activity. Here we show that a new mathematical theory of steroid hormone action can be used to deduce the kinetic properties and reaction sequence position for the functioning of any two cofactors relative to a concentration limiting step (CLS) and to each other. The predictions of the theory, which can be applied using graphical methods similar to those of enzyme kinetics, are validated by obtaining internally consistent data for pair-wise analyses of three cofactors (TIF2, sSMRT, and NCoR) in U2OS cells. The analysis of TIF2 and sSMRT actions on GR-induction of an endogenous gene gave results identical to those with an exogenous reporter. Thus new tools to determine previously unobtainable information about the nature and position of cofactor action in any process displaying first-order Hill plot kinetics are now available.
Deducing the Temporal Order of Cofactor Function in Ligand-Regulated Gene Transcription: Theory and Experimental Verification

PubMed Central

Dougherty, Edward J.; Guo, Chunhua; Simons, S. Stoney; Chow, Carson C.

2012-01-01

Cofactors are intimately involved in steroid-regulated gene expression. Two critical questions are (1) the steps at which cofactors exert their biological activities and (2) the nature of that activity. Here we show that a new mathematical theory of steroid hormone action can be used to deduce the kinetic properties and reaction sequence position for the functioning of any two cofactors relative to a concentration limiting step (CLS) and to each other. The predictions of the theory, which can be applied using graphical methods similar to those of enzyme kinetics, are validated by obtaining internally consistent data for pair-wise analyses of three cofactors (TIF2, sSMRT, and NCoR) in U2OS cells. The analysis of TIF2 and sSMRT actions on GR-induction of an endogenous gene gave results identical to those with an exogenous reporter. Thus new tools to determine previously unobtainable information about the nature and position of cofactor action in any process displaying first-order Hill plot kinetics are now available. PMID:22272313
Tools for Protecting the Privacy of Specific Individuals in Video

NASA Astrophysics Data System (ADS)

Chen, Datong; Chang, Yi; Yan, Rong; Yang, Jie

2007-12-01

This paper presents a system for protecting the privacy of specific individuals in video recordings. We address the following two problems: automatic people identification with limited labeled data, and human body obscuring with preserved structure and motion information. In order to address the first problem, we propose a new discriminative learning algorithm to improve people identification accuracy using limited training data labeled from the original video and imperfect pairwise constraints labeled from face obscured video data. We employ a robust face detection and tracking algorithm to obscure human faces in the video. Our experiments in a nursing home environment show that the system can obtain a high accuracy of people identification using limited labeled data and noisy pairwise constraints. The study result indicates that human subjects can perform reasonably well in labeling pairwise constraints with the face masked data. For the second problem, we propose a novel method of body obscuring, which removes the appearance information of the people while preserving rich structure and motion information. The proposed approach provides a way to minimize the risk of exposing the identities of the protected people while maximizing the use of the captured data for activity/behavior analysis.
Measuring pair-wise molecular interactions in a complex mixture

NASA Astrophysics Data System (ADS)

Chakraborty, Krishnendu; Varma, Manoj M.; Venkatapathi, Murugesan

2016-03-01

Complex biological samples such as serum contain thousands of proteins and other molecules spanning up to 13 orders of magnitude in concentration. Present measurement techniques do not permit the analysis of all pair-wise interactions between the components of such a complex mixture and a given target molecule. In this work we explore the use of nanoparticle tags which encode the identity of the molecule to obtain the statistical distribution of pair-wise interactions using their Localized Surface Plasmon Resonance (LSPR) signals. The nanoparticle tags are chosen such that the binding between two molecules conjugated to the respective nanoparticle tags can be recognized by the coupling of their LSPR signals. We use numerical simulations based on the discrete dipole approximation (DDA) to investigate this approach on a reduced system consisting of three nanoparticles (a gold ellipsoid with aspect ratio 2.5 and short axis 16 nm, and two silver ellipsoids with aspect ratios 3 and 2 and short axes 8 nm and 10 nm respectively) and the set of all possible dimers formed between them. Incident light was circularly polarized and all possible particle and dimer orientations were considered. We observed that the minimum peak separation between two spectra is 5 nm, while the maximum is 184 nm.
Identification of the same polyomavirus species in different African horseshoe bat species is indicative of short-range host-switching events.

PubMed

Carr, Michael; Gonzalez, Gabriel; Sasaki, Michihito; Dool, Serena E; Ito, Kimihito; Ishii, Akihiro; Hang'ombe, Bernard M; Mweene, Aaron S; Teeling, Emma C; Hall, William W; Orba, Yasuko; Sawa, Hirofumi

2017-10-06

Polyomaviruses (PyVs) are considered to be highly host-specific in different mammalian species, with no well-supported evidence for host-switching events. We examined the species diversity and host specificity of PyVs in horseshoe bats (Rhinolophus spp.), a broadly distributed and highly speciose mammalian genus. We annotated six PyV genomes, comprising four new PyV species, based on pairwise identity within the large T antigen (LTAg) coding region. Phylogenetic comparisons revealed two instances of highly related PyV species, one in each of the Alphapolyomavirus and Betapolyomavirus genera, present in different horseshoe bat host species (Rhinolophus blasii and R. simulator), suggestive of short-range host-switching events. The two pairs of Rhinolophus PyVs in different horseshoe bat host species were 99.9 and 88.8 % identical with each other over their respective LTAg coding sequences and thus constitute the same virus species. To corroborate the species identification of the bat hosts, we analysed mitochondrial cytb and a large nuclear intron dataset derived from six independent and neutrally evolving loci for bat taxa of interest. Bayesian estimates of the ages of the most recent common ancestors suggested that the near-identical and more distantly related PyV species diverged approximately 9.1E4 (5E3-2.8E5) and 9.9E6 (4E6-18E6) years before the present, respectively, in contrast to the divergence times of the bat host species: 12.4E6 (10.4E6-15.4E6). Our findings provide evidence that short-range host-switching of PyVs is possible in horseshoe bats, suggesting that PyV transmission between closely related mammalian species can occur.
Comparative Analysis of Genome Sequences Covering the Seven Cronobacter Species

PubMed Central

Cummings, Craig A.; Shih, Rita; Degoricija, Lovorka; Rico, Alain; Brzoska, Pius; Hamby, Stephen E.; Masood, Naqash; Hariri, Sumyya; Sonbol, Hana; Chuzhanova, Nadia; McClelland, Michael; Furtado, Manohar R.; Forsythe, Stephen J.

2012-01-01

Background Species of Cronobacter are widespread in the environment and are occasional food-borne pathogens associated with serious neonatal diseases, including bacteraemia, meningitis, and necrotising enterocolitis. The genus is composed of seven species: C. sakazakii, C. malonaticus, C. turicensis, C. dublinensis, C. muytjensii, C. universalis, and C. condimenti. Clinical cases are associated with three species, C. malonaticus, C. turicensis and, in particular, with C. sakazakii multilocus sequence type 4. Thus, it is plausible that virulence determinants have evolved in certain lineages. Methodology/Principal Findings We generated high quality sequence drafts for eleven Cronobacter genomes representing the seven Cronobacter species, including an ST4 strain of C. sakazakii. Comparative analysis of these genomes together with the two publicly available genomes revealed Cronobacter has over 6,000 genes in one or more strains and over 2,000 genes shared by all Cronobacter. Considerable variation in the presence of traits such as type six secretion systems, metal resistance (tellurite, copper and silver), and adhesins were found. C. sakazakii is unique in the Cronobacter genus in encoding genes enabling the utilization of exogenous sialic acid which may have clinical significance. The C. sakazakii ST4 strain 701 contained additional genes as compared to other C. sakazakii but none of them were known specific virulence-related genes. Conclusions/Significance Genome comparison revealed that pair-wise DNA sequence identity varies between 89 and 97% in the seven Cronobacter species, and also suggested various degrees of divergence. Sets of universal core genes and accessory genes unique to each strain were identified. These gene sequences can be used for designing genus/species specific detection assays. 
Genes encoding adhesins, T6SS, and metal resistance genes as well as prophages are found in only subsets of genomes and have contributed considerably to the variation of genomic content. Differences in gene content likely contribute to differences in the clinical and environmental distribution of species and sequence types. PMID:23166675
Partial sequencing analysis of the NS5B region confirmed the predominance of hepatitis C virus genotype 1 infection in Jeddah, Saudi Arabia.

PubMed

El Hadad, Sahar; Al-Hamdan, Hesa; Linjawi, Sabah

2017-01-01

Chronic hepatitis C virus (HCV) infection and its progression are major health problems that many countries including Saudi Arabia are facing. Determination of HCV genotypes and subgenotypes is critical for epidemiological and clinical analysis and aids in the determination of the ideal treatment strategy that needs to be followed and the expected therapy response. Although HCV infection has been identified as the second most predominant type of hepatitis in Saudi Arabia, little is known about the molecular epidemiology and genetic variability of HCV circulating in the Jeddah province of Saudi Arabia. The aim of this study was to determine the dominance of various HCV genotypes and subgenotypes circulating in Jeddah using partial sequencing of the NS5B region. To the best of our knowledge, this is the first study of its kind in Saudi Arabia. To characterize HCV genotypes and subgenotypes, serum samples from 56 patients with chronic HCV infection were collected and subjected to partial NS5B gene amplification and sequence analysis. Phylogenetic analysis of the NS5B partial sequences revealed that HCV/1 was the predominant genotype (73%), followed by HCV/4 (24.49%) and HCV/3 (2.04%). Moreover, pairwise analysis also confirmed these results based on the average specific nucleotide distance identity: ±0.112, ±0.112, and ±0.179 for HCV/1, HCV/4, and HCV/3, respectively, without any interference between genotypes. Notably, the phylogenetic tree of the HCV/1 subgenotypes revealed that all the isolates (100%) from the present study belonged to the HCV/1a subgenotype. Our findings also revealed similarities in the nucleotide sequences between HCV circulating in Saudi Arabia and those circulating in countries such as Morocco, Egypt, Canada, India, Pakistan, and France. These results indicated that determination of HCV genotypes and subgenotypes based on partial sequence analysis of the NS5B region is accurate and reliable for HCV subtype determination.
Mitochondrial control-region sequence variation in aboriginal Australians.

PubMed Central

van Holst Pellekaan, S; Frommer, M; Sved, J; Boettcher, B

1998-01-01

The mitochondrial D-loop hypervariable segment 1 (mt HVS1) between nucleotides 15997 and 16377 has been examined in aboriginal Australian people from the Darling River region of New South Wales (riverine) and from Yuendumu in central Australia (desert). Forty-seven unique HVS1 types were identified, varying at 49 nucleotide positions. Pairwise analysis by calculation of BEPPI (between population proportion index) reveals statistically significant structure in the populations, although some identical HVS1 types are seen in the two contrasting regions. mt HVS1 types may reflect more-ancient distributions than do linguistic diversity and other culturally distinguishing attributes. Comparison with sequences from five published global studies reveals that these Australians demonstrate greatest divergence from some Africans, least from Papua New Guinea highlanders, and only slightly more from some Pacific groups (Indonesian, Asian, Samoan, and coastal Papua New Guinea), although the HVS1 types vary at different nucleotide sites. Construction of a median network, displaying three main groups, suggests that several hypervariable nucleotide sites within the HVS1 are likely to have undergone mutation independently, making phylogenetic comparison with global samples by conventional methods difficult. Specific nucleotide-site variants are major separators in median networks constructed from Australian HVS1 types alone and for one global selection. The distribution of these, requiring extended study, suggests that they may be signatures of different groups of prehistoric colonizers into Australia, for which the time of colonization remains elusive. PMID:9463317
SARA-Coffee web server, a tool for the computation of RNA sequence and structure multiple alignments

PubMed Central

Di Tommaso, Paolo; Bussotti, Giovanni; Kemena, Carsten; Capriotti, Emidio; Chatzou, Maria; Prieto, Pablo; Notredame, Cedric

2014-01-01

This article introduces the SARA-Coffee web server; a service allowing the online computation of 3D structure based multiple RNA sequence alignments. The server makes it possible to combine sequences with and without known 3D structures. Given a set of sequences SARA-Coffee outputs a multiple sequence alignment along with a reliability index for every sequence, column and aligned residue. SARA-Coffee combines SARA, a pairwise structural RNA aligner with the R-Coffee multiple RNA aligner in a way that has been shown to improve alignment accuracy over most sequence aligners when enough structural data is available. The server can be accessed from http://tcoffee.crg.cat/apps/tcoffee/do:saracoffee. PMID:24972831

Analysis of Neuronal Sequences Using Pairwise Biases

DTIC Science & Technology

2015-08-27

semantic memory (knowledge of facts) and implicit memory (e.g., how to ride a bike ). Evidence for the participation of the hippocampus in the formation of...hippocampal formation in an attempt to be cured of severe epileptic seizures. Although the surgery was successful in regards to reducing the frequency and...very different from each other in many ways including duration and number of spikes. Still, these sequences share a similar trend in the general order
Multiple alignment analysis on phylogenetic tree of the spread of SARS epidemic using distance method

NASA Astrophysics Data System (ADS)

Amiroch, S.; Pradana, M. S.; Irawan, M. I.; Mukhlash, I.

2017-09-01

Multiple alignment (MA) is a particularly important tool for studying viral genomes and determining the evolutionary process of a specific virus. Applying MA to the spread of the severe acute respiratory syndrome (SARS) epidemic is of interest because, a few years ago, this virus spread so quickly that it drew medical attention in many countries. Although much software exists for processing multiple sequences, the use of pairwise alignment to build the MA is very important to consider. Previous research used the Super Pairwise Alignment algorithm to align the sequences for MA, whereas this study uses the Needleman-Wunsch dynamic programming algorithm, simulated in Matlab. The MA analysis yields stable and unstable regions, which indicate the positions where mutations occur; the network topology produced from the distance-method phylogenetic tree of the SARS epidemic; and the network of mutation areas.
Design and implementation of a hybrid MPI-CUDA model for the Smith-Waterman algorithm.

PubMed

Khaled, Heba; Faheem, Hossam El Deen Mostafa; El Gohary, Rania

2015-01-01

This paper provides a novel hybrid model for solving the multiple pair-wise sequence alignment problem combining message passing interface and CUDA, the parallel computing platform and programming model invented by NVIDIA. The proposed model targets homogeneous cluster nodes equipped with similar Graphical Processing Unit (GPU) cards. The model consists of the Master Node Dispatcher (MND) and the Worker GPU Nodes (WGN). The MND distributes the workload among the cluster working nodes and then aggregates the results. The WGN performs the multiple pair-wise sequence alignments using the Smith-Waterman algorithm. We also propose a modified implementation to the Smith-Waterman algorithm based on computing the alignment matrices row-wise. The experimental results demonstrate a considerable reduction in the running time by increasing the number of the working GPU nodes. The proposed model achieved a performance of about 12 Giga cell updates per second when we tested against the SWISS-PROT protein knowledge base running on four nodes.
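The row-wise evaluation of the Smith-Waterman matrix mentioned in this abstract can be sketched on a CPU as follows. This is only a minimal illustration under stated assumptions: it uses a linear gap penalty and made-up scoring values, keeps just two rows in memory, and returns the best local score without a traceback. It does not reproduce the authors' MPI-CUDA implementation.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b, computed one row at a time.

    Scoring values are illustrative assumptions; gaps are penalised linearly.
    Only the previous and current rows are stored, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                 # start a new local alignment
                prev[j - 1] + s,   # extend with a match or mismatch
                prev[j] + gap,     # gap in b
                curr[j - 1] + gap, # gap in a
            )
            if curr[j] > best:
                best = curr[j]
        prev = curr
    return best
```

Each call updates len(a) * len(b) matrix cells, which is the quantity counted by the "cell updates per second" throughput figure quoted in the abstract.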
Complete mitochondrial genome sequences from five Eimeria species (Apicomplexa; Coccidia; Eimeriidae) infecting domestic turkeys

PubMed Central

2014-01-01

Background Clinical and subclinical coccidiosis is cosmopolitan and inflicts significant losses to the poultry industry globally. Seven named Eimeria species are responsible for coccidiosis in turkeys: Eimeria dispersa; Eimeria meleagrimitis; Eimeria gallopavonis; Eimeria meleagridis; Eimeria adenoeides; Eimeria innocua; and Eimeria subrotunda. Although attempts have been made to characterize these parasites molecularly at the nuclear 18S rDNA and ITS loci, the maternally-derived and mitotically replicating mitochondrial genome may be more suited for species level molecular work; however, only limited sequence data are available for Eimeria spp. infecting turkeys. The purpose of this study was to sequence and annotate the complete mitochondrial genomes from 5 Eimeria species that commonly infect the domestic turkey (Meleagris gallopavo). Methods Six single-oocyst derived cultures of five Eimeria species infecting turkeys were PCR-amplified and sequenced completely prior to detailed annotation. Resulting sequences were aligned and used in phylogenetic analyses (BI, ML, and MP) that included complete mitochondrial genomes from 16 Eimeria species or concatenated CDS sequences from each genome. Results Complete mitochondrial genome sequences were obtained for Eimeria adenoeides Guelph, 6211 bp; Eimeria dispersa Briston, 6238 bp; Eimeria meleagridis USAR97-01, 6212 bp; Eimeria meleagrimitis USMN08-01, 6165 bp; Eimeria gallopavonis Weybridge, 6215 bp; and Eimeria gallopavonis USKS06-01, 6215 bp. The order, orientation and CDS lengths of the three protein coding genes (COI, COIII and CytB) as well as rDNA fragments encoding ribosomal large and small subunit rRNA were conserved among all sequences. Pairwise sequence identities between species ranged from 88.1% to 98.2%; sequence variability was concentrated within CDS or between rDNA fragments (where indels were common). 
No phylogenetic reconstruction supported monophyly of Eimeria species infecting turkeys; Eimeria dispersa may have arisen via host switching from another avian host. Phylogenetic analyses suggest E. necatrix and E. tenella are related distantly to other Eimeria of chickens. Conclusions Mitochondrial genomes of Eimeria species sequenced to date are highly conserved with regard to gene content and structure. Nonetheless, complete mitochondrial genome sequences and, particularly the three CDS, possess sufficient sequence variability for differentiating Eimeria species of poultry. The mitochondrial genome sequences are highly suited for molecular diagnostics and phylogenetics of coccidia and, potentially, genetic markers for molecular epidemiology. PMID:25034633
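Pairwise identity ranges such as the one reported in this record are usually derived from an alignment. The sketch below shows one way to do that for sequences that are already aligned to equal length with "-" as the gap character. The function names are hypothetical, and the choice to exclude gapped columns from the denominator is an assumption: other conventions (for example, dividing by alignment length) give different percentages.

```python
from itertools import combinations


def percent_identity(x, y):
    """Percent identity of two aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are skipped (assumed convention).
    """
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for p, q in zip(x.upper(), y.upper()):
        if p == "-" or q == "-":
            continue
        compared += 1
        if p == q:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_table(aligned):
    """Map each unordered pair of sequence names to its percent identity."""
    return {
        (m, n): percent_identity(aligned[m], aligned[n])
        for m, n in combinations(sorted(aligned), 2)
    }
```

Taking the minimum and maximum of the values in the returned table gives the kind of "ranged from ... to ..." summary quoted in abstracts.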
Complete mitochondrial genome sequences from five Eimeria species (Apicomplexa; Coccidia; Eimeriidae) infecting domestic turkeys.

PubMed

Ogedengbe, Mosun E; El-Sherry, Shiem; Whale, Julia; Barta, John R

2014-07-17

Clinical and subclinical coccidiosis is cosmopolitan and inflicts significant losses to the poultry industry globally. Seven named Eimeria species are responsible for coccidiosis in turkeys: Eimeria dispersa; Eimeria meleagrimitis; Eimeria gallopavonis; Eimeria meleagridis; Eimeria adenoeides; Eimeria innocua; and Eimeria subrotunda. Although attempts have been made to characterize these parasites molecularly at the nuclear 18S rDNA and ITS loci, the maternally-derived and mitotically replicating mitochondrial genome may be more suited for species level molecular work; however, only limited sequence data are available for Eimeria spp. infecting turkeys. The purpose of this study was to sequence and annotate the complete mitochondrial genomes from 5 Eimeria species that commonly infect the domestic turkey (Meleagris gallopavo). Six single-oocyst derived cultures of five Eimeria species infecting turkeys were PCR-amplified and sequenced completely prior to detailed annotation. Resulting sequences were aligned and used in phylogenetic analyses (BI, ML, and MP) that included complete mitochondrial genomes from 16 Eimeria species or concatenated CDS sequences from each genome. Complete mitochondrial genome sequences were obtained for Eimeria adenoeides Guelph, 6211 bp; Eimeria dispersa Briston, 6238 bp; Eimeria meleagridis USAR97-01, 6212 bp; Eimeria meleagrimitis USMN08-01, 6165 bp; Eimeria gallopavonis Weybridge, 6215 bp; and Eimeria gallopavonis USKS06-01, 6215 bp. The order, orientation and CDS lengths of the three protein coding genes (COI, COIII and CytB) as well as rDNA fragments encoding ribosomal large and small subunit rRNA were conserved among all sequences. Pairwise sequence identities between species ranged from 88.1% to 98.2%; sequence variability was concentrated within CDS or between rDNA fragments (where indels were common). 
No phylogenetic reconstruction supported monophyly of Eimeria species infecting turkeys; Eimeria dispersa may have arisen via host switching from another avian host. Phylogenetic analyses suggest E. necatrix and E. tenella are related distantly to other Eimeria of chickens. Mitochondrial genomes of Eimeria species sequenced to date are highly conserved with regard to gene content and structure. Nonetheless, complete mitochondrial genome sequences and, particularly the three CDS, possess sufficient sequence variability for differentiating Eimeria species of poultry. The mitochondrial genome sequences are highly suited for molecular diagnostics and phylogenetics of coccidia and, potentially, genetic markers for molecular epidemiology.
Lunatimonas lonarensis gen. nov., sp. nov., a haloalkaline bacterium of the family Cyclobacteriaceae with nitrate reducing activity.

PubMed

Srinivas, T N R; Aditya, S; Bhumika, V; Kumar, P Anil

2014-02-01

Novel pinkish-orange pigmented, Gram-negative staining, half-moon shaped, non-motile, strictly aerobic strains designated AK24(T) and AK26 were isolated from water and sediment samples of Lonar Lake, Buldhana district, Maharashtra, India. Both strains were positive for oxidase, catalase and β-galactosidase activities. The predominant fatty acids were iso-C15:0 (41.5%), anteiso-C15:0 (9.7%), iso-C17:0 3OH (9.6%), iso-C17:1 ω9c (10.2%) and C16:1 ω7c/C16:1 ω6c/iso-C15:0 2OH (summed feature 3) (14.4%). The strains contained MK-7 as the major respiratory quinone, and phosphatidylethanolamine and five unidentified lipids as the polar lipids. Blast analysis of the 16S rRNA gene sequence of strain AK24(T) showed that it was closely related to Aquiflexum balticum, with a pair-wise sequence similarity of 91.6%, as well as to Fontibacter ferrireducens, Belliella baltica and Indibacter alkaliphilus (91.3, 91.2 and 91.2% pair-wise sequence similarity, respectively), but it only had between 88.6 and 91.0% pair-wise sequence similarity to the rest of the family members. The MALDI-TOF assay reported no significant similarities for AK24(T) and AK26, since they potentially represented a new species. A MALDI MSP dendrogram showed close similarity between the two strains, but they maintained a distance from their phylogenetic neighbors. The genome of AK24(T) showed the presence of heavy metal tolerance genes, including the genes providing resistance to arsenic, cadmium, cobalt and zinc. A cluster of heat shock resistance genes was also found in the genome. Two lantibiotic producing genes, LanR and LasB, were also found in the genome of AK24(T). Strains AK24(T) and AK26 were very closely related to each other with 99.5% pair-wise sequence similarity. Phylogenetic analysis indicated that the strains were members of the family Cyclobacteriaceae and they clustered with the genus Mariniradius, as well as with the genera Aquiflexum, Cecembia, Fontibacter, Indibacter, and Shivajiella. 
DNA-DNA hybridization between strains AK24(T) and AK26 showed a relatedness of 82% and their rep-PCR banding patterns were very similar. Based on data from the current polyphasic study, it is proposed that the isolates be placed in a new genus and species with the name Lunatimonas lonarensis gen. nov., sp. nov. The type strain of Lunatimonas lonarensis is AK24(T) (=JCM 18822(T)=MTCC 11627(T)).
Dynamic facial expression recognition based on geometric and texture features

NASA Astrophysics Data System (ADS)

Li, Ming; Wang, Zengfu

2018-04-01

Recently, dynamic facial expression recognition in videos has attracted growing attention. In this paper, we propose a novel dynamic facial expression recognition method by using geometric and texture features. In our system, the facial landmark movements and texture variations upon pairwise images are used to perform the dynamic facial expression recognition tasks. For one facial expression sequence, pairwise images are created between the first frame and each of its subsequent frames. Integration of both geometric and texture features further enhances the representation of the facial expressions. Finally, Support Vector Machine is used for facial expression recognition. Experiments conducted on the extended Cohn-Kanade database show that our proposed method can achieve a competitive performance with other methods.
Distributed Data-aggregation Consensus for Sensor Networks: Relaxation of Consensus Concept and Convergence Property

DTIC Science & Technology

2014-08-01

consensus algorithm called randomized gossip is more suitable [7, 8]. In asynchronous randomized gossip algorithms, pairs of neighboring nodes exchange...messages and perform updates in an asynchronous and unattended manner, and they also 1 The class of broadcast gossip algorithms [9, 10, 11, 12] are...dynamics [2] and asynchronous pairwise randomized gossip [7, 8], broadcast gossip algorithms do not require that nodes know the identities of their
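The excerpt above refers to asynchronous pairwise randomized gossip. As a hedged sketch of the general idea only (not of the report's own algorithm), the code below repeatedly picks a random edge and replaces both endpoint values with their average. The function name, the edge-list representation, and the use of a seeded generator are assumptions made for this example.

```python
import random


def pairwise_gossip(values, edges, rounds, seed=0):
    """Run randomized pairwise gossip averaging and return the final values.

    values: initial node values; edges: list of (i, j) neighbour pairs.
    In each round one edge is chosen uniformly at random and both endpoints
    take the average of their two values, so the global sum is preserved.
    """
    rng = random.Random(seed)
    x = list(values)
    for _ in range(rounds):
        i, j = rng.choice(edges)
        avg = (x[i] + x[j]) / 2.0
        x[i] = x[j] = avg
    return x
```

On a connected graph the values drift toward the average of the initial values; how fast depends on the topology, which this sketch does not analyse.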
Protein alignment algorithms with an efficient backtracking routine on multiple GPUs.

PubMed

Blazewicz, Jacek; Frohmberg, Wojciech; Kierzynka, Michal; Pesch, Erwin; Wojciechowski, Pawel

2011-05-20

Pairwise sequence alignment methods are widely used in biological research. The increasing number of sequences is perceived as one of the upcoming challenges for sequence alignment methods in the near future. To overcome this challenge, several GPU (Graphics Processing Unit) computing approaches have been proposed lately. These solutions show the great potential of the GPU platform, but in most cases they address the problem of sequence database scanning and compute only the alignment score, whereas the alignment itself is omitted. Thus, the need arose to implement the global and semiglobal Needleman-Wunsch and Smith-Waterman algorithms with a backtracking procedure, which is needed to construct the alignment. In this paper we present a solution that performs the alignment of every given sequence pair, which is a required step for progressive multiple sequence alignment methods, as well as for DNA recognition at the DNA assembly stage. Tests show that the implementation, with performance up to 6.3 GCUPS on a single GPU for affine gap penalties, is very efficient in comparison to other CPU and GPU-based solutions. Moreover, multi-GPU support with load balancing makes the application very scalable. The article shows that the backtracking procedure of the sequence alignment algorithms may be designed to fit the GPU architecture. Therefore, our algorithm, apart from scores, is able to compute pairwise alignments. This opens a wide range of new possibilities, allowing other methods from the area of molecular biology to take advantage of the new computational architecture. Tests show that the efficiency of the implementation is excellent. Moreover, the speed of our GPU-based algorithms can be increased almost linearly when using more than one graphics card.
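The distinction the abstract above draws between computing only an alignment score and recovering the alignment by backtracking can be illustrated with a minimal CPU sketch. It is not the authors' GPU implementation: it assumes a linear gap penalty rather than the affine penalties mentioned in the abstract, and the function name and scoring values are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (a simplification).

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # match or mismatch
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Backtracking: walk from the bottom-right cell to the origin,
    # re-deriving which move produced each cell's score.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACGT", "AGT")` returns `(1, "ACGT", "A-GT")`. The score alone needs only two rows of the matrix, whereas this backtracking walk needs the full matrix (or stored move pointers).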
Gene order in rosid phylogeny, inferred from pairwise syntenies among extant genomes

PubMed Central

2012-01-01

Background Ancestral gene order reconstruction for flowering plants has lagged behind developments in yeasts, insects and higher animals, because of the recency of widespread plant genome sequencing, sequencers' embargoes on public data use, paralogies due to whole genome duplication (WGD) and fractionation of undeleted duplicates, extensive paralogy from other sources, and the computational cost of existing methods. Results We address these problems, using the gene order of four core eudicot genomes (cacao, castor bean, papaya and grapevine) that have escaped any recent WGD events, and two others (poplar and cucumber) that descend from independent WGDs, in inferring the ancestral gene order of the rosid clade and those of its main subgroups, the fabids and malvids. We improve and adapt techniques including the OMG method for extracting large, paralogy-free, multiple orthologies from conflated pairwise synteny data among the six genomes and the PATHGROUPS approach for ancestral gene order reconstruction in a given phylogeny, where some genomes may be descendants of WGD events. We use the gene order evidence to evaluate the hypothesis that the order Malpighiales belongs to the malvids rather than as traditionally assigned to the fabids. Conclusions Gene orders of ancestral eudicot species, involving 10,000 or more genes can be reconstructed in an efficient, parsimonious and consistent way, despite paralogies due to WGD and other processes. Pairwise genomic syntenies provide appropriate input to a parameter-free procedure of multiple ortholog identification followed by gene-order reconstruction in solving instances of the "small phylogeny" problem. PMID:22759433
Nucleotide sequencing and identification of some wild mushrooms.

PubMed

Das, Sudip Kumar; Mandal, Aninda; Datta, Animesh K; Gupta, Sudha; Paul, Rita; Saha, Aditi; Sengupta, Sonali; Dubey, Priyanka Kumari

2013-01-01

The rDNA-ITS (Ribosomal DNA Internal Transcribed Spacers) fragment of the genomic DNA of 8 wild edible mushrooms (collected from Eastern Chota Nagpur Plateau of West Bengal, India) was amplified using ITS1 (Internal Transcribed Spacers 1) and ITS2 primers and subjected to nucleotide sequence determination in order to identify the mushrooms. The sequences were aligned using the ClustalW software program. The aligned sequences revealed identity (homology percentage from the GenBank database) of Amanita hemibapha [CN (Chota Nagpur) 1, % identity 99 (JX844716.1)], Amanita sp. [CN 2, % identity 98 (JX844763.1)], Astraeus hygrometricus [CN 3, % identity 87 (FJ536664.1)], Termitomyces sp. [CN 4, % identity 90 (JF746992.1)], Termitomyces sp. [CN 5, % identity 99 (GU001667.1)], T. microcarpus [CN 6, % identity 82 (EF421077.1)], Termitomyces sp. [CN 7, % identity 76 (JF746993.1)], and Volvariella volvacea [CN 8, % identity 100 (JN086680.1)]. Although only 4 of the 8 mushrooms could be identified to species level, the nucleotide sequences of the rest may be relevant to further characterization. A phylogenetic tree was constructed using the Neighbor-Joining method, showing the interrelationships among the mushrooms. The determined nucleotide sequences may provide additional information that enriches the GenBank database, aids molecular taxonomy, and facilitates the domestication and characterization of these mushrooms for human benefit.
Comparative Metagenomics of Cellulose- and Poplar Hydrolysate-Degrading Microcosms from Gut Microflora of the Canadian Beaver (Castor canadensis) and North American Moose (Alces americanus) after Long-Term Enrichment

PubMed Central

Wong, Mabel T.; Wang, Weijun; Couturier, Marie; Razeq, Fakhria M.; Lombard, Vincent; Lapebie, Pascal; Edwards, Elizabeth A.; Terrapon, Nicolas; Henrissat, Bernard; Master, Emma R.

2017-01-01

To identify carbohydrate-active enzymes (CAZymes) that might be particularly relevant for wood fiber processing, we performed a comparative metagenomic analysis of digestive systems from Canadian beaver (Castor canadensis) and North American moose (Alces americanus) following 3 years of enrichment on either microcrystalline cellulose or poplar hydrolysate. In total, 9,386 genes encoding CAZymes and carbohydrate-binding modules (CBMs) were identified, with up to half predicted to originate from Firmicutes, Bacteroidetes, Chloroflexi, and Proteobacteria phyla, and up to 17% from unknown phyla. Both PCA and hierarchical cluster analysis distinguished the annotated glycoside hydrolase (GH) distributions identified herein, from those previously reported for grass-feeding mammals and herbivorous foragers. The CAZyme profile of moose rumen enrichments also differed from a recently reported moose rumen metagenome, most notably by the absence of GH13-appended dockerins. Consistent with substrate-driven convergence, CAZyme profiles from both poplar hydrolysate-fed cultures differed from cellulose-fed cultures, most notably by increased numbers of unique sequences belonging to families GH3, GH5, GH43, GH53, and CE1. Moreover, pairwise comparisons of moose rumen enrichments further revealed higher counts of GH127 and CE15 families in cultures fed with poplar hydrolysate. To expand our scope to lesser known carbohydrate-active proteins, we identified and compared multi-domain proteins comprising both a CBM and domain of unknown function (DUF) as well as proteins with unknown function within the 416 predicted polysaccharide utilization loci (PULs). Interestingly, DUF362, identified in iron–sulfur proteins, was consistently appended to CBM9; on the other hand, proteins with unknown function from PULs shared little identity unless from identical PULs. 
Overall, this study sheds new light on the lignocellulose degrading capabilities of microbes originating from digestive systems of mammals known for fiber-rich diets, and highlights the value of enrichment to select new CAZymes from metagenome sequences for future biochemical characterization. PMID:29326667
The temperate Burkholderia phage AP3 of the Peduovirinae shows efficient antimicrobial activity against B. cenocepacia of the IIIA lineage.

PubMed

Roszniowski, Bartosz; Latka, Agnieszka; Maciejewska, Barbara; Vandenheuvel, Dieter; Olszak, Tomasz; Briers, Yves; Holt, Giles S; Valvano, Miguel A; Lavigne, Rob; Smith, Darren L; Drulis-Kawa, Zuzanna

2017-02-01

Burkholderia phage AP3 (vB_BceM_AP3) is a temperate virus of the Myoviridae and the Peduovirinae subfamily (P2likevirus genus). This phage specifically infects multidrug-resistant clinical Burkholderia cenocepacia lineage IIIA strains commonly isolated from cystic fibrosis patients. AP3 exhibits high pairwise nucleotide identity (61.7 %) to Burkholderia phage KS5, specific to the same B. cenocepacia host, and has 46.7-49.5 % identity to phages infecting other species of Burkholderia. The lysis cassette of these related phages has a similar organization (putative antiholin, putative holin, endolysin, and spanins) and shows 29-98 % homology between specific lysis genes, in contrast to Enterobacteria phage P2, the hallmark phage of this genus. The AP3 and KS5 lysis genes have conserved locations and high amino acid sequence similarity. The AP3 bacteriophage particles remain infective up to 5 h at pH 4-10 and are stable at 60 °C for 30 min, but are sensitive to chloroform, with no remaining infective particles after 24 h of treatment. AP3 lysogeny can occur by stable genomic integration and by pseudo-lysogeny. The lysogenic bacterial mutants did not exhibit any significant changes in virulence compared to the wild-type host strain when tested in the Galleria mellonella wax moth model. Moreover, AP3 treatment of larvae infected with B. cenocepacia revealed a significant increase (P < 0.0001) in larvae survival in comparison to AP3-untreated infected larvae. AP3 showed robust lytic activity, as evidenced by its broad host range, the absence of increased virulence in lysogenic isolates, the lack of bacterial gene disruption conditioned by bacterial tRNA downstream integration site, and the absence of detected toxin sequences. These data suggest that the AP3 phage is a promising potent agent against bacteria belonging to the most common B. cenocepacia IIIA lineage strains.
Chloroplast genome expansion by intron multiplication in the basal psychrophilic euglenoid Eutreptiella pomquetensis

PubMed Central

Bennett, Matthew S.; Triemer, Richard E.; Preisfeld, Angelika

2017-01-01

Background Over the last few years multiple studies have been published showing a great diversity in size of chloroplast genomes (cpGenomes), and in the arrangement of gene clusters, in the Euglenales. However, while these genomes provided important insights into the evolution of cpGenomes across the Euglenales and within their genera, only two genomes were analyzed in regard to genomic variability between and within Euglenales and Eutreptiales. To better understand the dynamics of chloroplast genome evolution in early evolving Eutreptiales, this study focused on the cpGenome of Eutreptiella pomquetensis, and the spread and peculiarities of introns. Methods The Etl. pomquetensis cpGenome was sequenced, annotated and afterwards examined in structure, size, gene order and intron content. These features were compared with other euglenoid cpGenomes as well as those of prasinophyte green algae, including Pyramimonas parkeae. Results and Discussion With about 130,561 bp the chloroplast genome of Etl. pomquetensis, a basal taxon in the phototrophic euglenoids, was considerably larger than the two other Eutreptiales cpGenomes sequenced so far. Although the detected quadripartite structure resembled most green algae and plant chloroplast genomes, the gene content of the single copy regions in Etl. pomquetensis was completely different from those observed in green algae and plants. The gene composition of Etl. pomquetensis was extensively changed and turned out to be almost identical to other Eutreptiales and Euglenales, and not to P. parkeae. Furthermore, the cpGenome of Etl. pomquetensis was unexpectedly permeated by a high number of introns, which led to a substantially larger genome. The 51 identified introns of Etl. pomquetensis showed two major unique features: (i) more than half of the introns displayed a high level of pairwise identities; (ii) no group III introns could be identified in the protein coding genes. 
These findings support the hypothesis that group III introns are degenerated group II introns and evolved later. PMID:28852596
Subgenome-specific assembly of vitamin E biosynthesis genes and expression patterns during seed development provide insight into the evolution of oat genome.

PubMed

Gutierrez-Gonzalez, Juan J; Garvin, David F

2016-11-01

Vitamin E is essential for humans and thus must be a component of a healthy diet. Among the cereal grains, hexaploid oats (Avena sativa L.) have high vitamin E content. To date, no gene sequences in the vitamin E biosynthesis pathway have been reported for oats. Using deep sequencing and orthology-guided assembly, coding sequences of genes for each step in vitamin E synthesis in oats were reconstructed, including resolution of the sequences of homeologs. Three homeologs, presumably representing each of the three oat subgenomes, were identified for the main steps of the pathway. Partial sequences, likely representing pseudogenes, were recovered in some instances as well. Pairwise comparisons among homeologs revealed that two of the three putative subgenome-specific homeologs are almost identical for each gene. Synonymous substitution rates indicate the time of divergence of the two more similar subgenomes from the distinct one at 7.9-8.7 MYA, and a divergence between the similar subgenomes from a common ancestor 1.1 MYA. A new proposed evolutionary model for hexaploid oat formation is discussed. Homeolog-specific gene expression was quantified during oat seed development and compared with vitamin E accumulation. Homeolog expression largely appears to be similar for most genes; however, for some genes, homeolog-specific transcriptional bias was observed. The expression of HPPD, as well as certain homeologs of VTE2 and VTE4, is highly correlated with seed vitamin E accumulation. Our findings expand our understanding of oat genome evolution and will assist efforts to modify vitamin E content and composition in oats. Published 2016. This article is a U.S. Government work and is in the public domain in the USA. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.
How close is close: 16S rRNA sequence identity may not be sufficient to guarantee species identity

NASA Technical Reports Server (NTRS)

Fox, G. E.; Wisotzkey, J. D.; Jurtshuk, P. Jr

1992-01-01

16S rRNA (genes coding for rRNA) sequence comparisons were conducted with the following three psychrophilic strains: Bacillus globisporus W25T (T = type strain) and Bacillus psychrophilus W16AT, and W5. These strains exhibited more than 99.5% sequence identity and within experimental uncertainty could be regarded as identical. Their close taxonomic relationship was further documented by phenotypic similarities. In contrast, previously published DNA-DNA hybridization results have convincingly established that these strains do not belong to the same species if current standards are used. These results emphasize the important point that effective identity of 16S rRNA sequences is not necessarily a sufficient criterion to guarantee species identity. Thus, although 16S rRNA sequences can be used routinely to distinguish and establish relationships between genera and well-resolved species, very recently diverged species may not be recognizable.
Revisiting the diffusion approximation to estimate evolutionary rates of gene family diversification.

PubMed

Gjini, Erida; Haydon, Daniel T; David Barry, J; Cobbold, Christina A

2014-01-21

Genetic diversity in multigene families is shaped by multiple processes, including gene conversion and point mutation. Because multi-gene families are involved in crucial traits of organisms, quantifying the rates of their genetic diversification is important. With increasing availability of genomic data, there is a growing need for quantitative approaches that integrate the molecular evolution of gene families with their higher-scale function. In this study, we integrate a stochastic simulation framework with population genetics theory, namely the diffusion approximation, to investigate the dynamics of genetic diversification in a gene family. Duplicated genes can diverge and encode new functions as a result of point mutation, and become more similar through gene conversion. To model the evolution of pairwise identity in a multigene family, we first consider all conversion and mutation events in a discrete manner, keeping track of their details and times of occurrence; second we consider only the infinitesimal effect of these processes on pairwise identity accounting for random sampling of genes and positions. The purely stochastic approach is closer to biological reality and is based on many explicit parameters, such as conversion tract length and family size, but is more challenging analytically. The population genetics approach is an approximation accounting implicitly for point mutation and gene conversion, only in terms of per-site average probabilities. Comparison of these two approaches across a range of parameter combinations reveals that they are not entirely equivalent, but that for certain relevant regimes they do match. As an application of this modelling framework, we consider the distribution of nucleotide identity among VSG genes of African trypanosomes, representing the most prominent example of a multi-gene family mediating parasite antigenic variation and within-host immune evasion. © 2013 Published by Elsevier Ltd. All rights reserved.
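The event-by-event ("purely stochastic") view described in the abstract above can be sketched as a toy simulation in which point mutations lower pairwise identity and gene conversion raises it. This is not the authors' model: the function names, the per-step event probabilities, and the fixed conversion tract length are all hypothetical choices made for illustration.

```python
import random
from itertools import combinations

BASES = "ACGT"


def mean_pairwise_identity(genes):
    """Average fraction of identical sites over all pairs of equal-length genes."""
    pairs = list(combinations(genes, 2))
    total = sum(
        sum(x == y for x, y in zip(g, h)) / len(g) for g, h in pairs
    )
    return total / len(pairs)


def simulate(n_genes=4, length=50, steps=200,
             p_mut=0.5, p_conv=0.1, tract=10, seed=0):
    """Toy discrete-event simulation of a gene family (illustrative parameters).

    Each step is a point mutation with probability p_mut, a gene conversion
    with probability p_conv, or nothing otherwise.
    Requires n_genes >= 2 and tract <= length.
    """
    rng = random.Random(seed)
    ancestor = [rng.choice(BASES) for _ in range(length)]
    genes = [ancestor[:] for _ in range(n_genes)]  # family starts identical
    for _ in range(steps):
        u = rng.random()
        if u < p_mut:
            # Point mutation: one site in one gene changes to a different base.
            g = rng.randrange(n_genes)
            pos = rng.randrange(length)
            genes[g][pos] = rng.choice([b for b in BASES if b != genes[g][pos]])
        elif u < p_mut + p_conv:
            # Gene conversion: copy a tract from a donor into an acceptor.
            donor, acceptor = rng.sample(range(n_genes), 2)
            start = rng.randrange(length - tract + 1)
            genes[acceptor][start:start + tract] = genes[donor][start:start + tract]
    return mean_pairwise_identity(genes)
```

Running `simulate` over a grid of `p_mut` and `p_conv` values gives a rough feel for how the two processes pull identity in opposite directions; the diffusion approximation discussed in the abstract replaces this explicit bookkeeping with per-site average probabilities.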
GATA: A graphic alignment tool for comparative sequence analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nix, David A.; Eisen, Michael B.

2005-01-01

Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dotplot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.
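To make the dot plot idea mentioned in the abstract above concrete, here is a minimal sketch of a windowed exact-match dot plot. It is not the GATA tool itself; the function names and the window parameter are hypothetical. It marks every position pair where a short window matches, which is why it tolerates rearranged or duplicated elements but produces no alignment.

```python
def dotplot(seq_a, seq_b, window=3):
    """Return the set of (i, j) where seq_a[i:i+window] equals seq_b[j:j+window]."""
    hits = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            if seq_a[i:i + window] == seq_b[j:j + window]:
                hits.add((i, j))
    return hits


def render(seq_a, seq_b, window=3):
    """Draw the dot plot as text: rows index seq_a, columns index seq_b."""
    hits = dotplot(seq_a, seq_b, window)
    rows = []
    for i in range(len(seq_a) - window + 1):
        rows.append("".join(
            "*" if (i, j) in hits else "."
            for j in range(len(seq_b) - window + 1)
        ))
    return "\n".join(rows)
```

With `dotplot("ACGTACGT", "ACGT", window=4)` the result is `{(0, 0), (4, 0)}`: the duplicated element shows up as two hits against the same column, something a collinear dynamic programming alignment would not report.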
Genotypic and Functional Impact of HIV-1 Adaptation to Its Host Population during the North American Epidemic

PubMed Central

Carlson, Jonathan M.; Chan, Benjamin; Chopera, Denis R.; Brumme, Chanson J.; Markle, Tristan J.; Martin, Eric; Shahid, Aniqa; Anmole, Gursev; Mwimanzi, Philip; Nassab, Pauline; Penney, Kali A.; Rahman, Manal A.; Milloy, M.-J.; Schechter, Martin T.; Markowitz, Martin; Carrington, Mary; Walker, Bruce D.; Wagner, Theresa; Buchbinder, Susan; Fuchs, Jonathan; Koblin, Beryl; Mayer, Kenneth H.; Harrigan, P. Richard; Brockman, Mark A.; Poon, Art F. Y.; Brumme, Zabrina L.

2014-01-01

HLA-restricted immune escape mutations that persist following HIV transmission could gradually spread through the viral population, thereby compromising host antiviral immunity as the epidemic progresses. To assess the extent and phenotypic impact of this phenomenon in an immunogenetically diverse population, we genotypically and functionally compared linked HLA and HIV (Gag/Nef) sequences from 358 historic (1979–1989) and 382 modern (2000–2011) specimens from four key cities in the North American epidemic (New York, Boston, San Francisco, Vancouver). Inferred HIV phylogenies were star-like, with approximately two-fold greater mean pairwise distances in modern versus historic sequences. The reconstructed epidemic ancestral (founder) HIV sequence was essentially identical to the North American subtype B consensus. Consistent with gradual diversification of a “consensus-like” founder virus, the median “background” frequencies of individual HLA-associated polymorphisms in HIV (in individuals lacking the restricting HLA[s]) were ∼2-fold higher in modern versus historic HIV sequences, though these remained notably low overall (e.g. in Gag, medians were 3.7% in the 2000s versus 2.0% in the 1980s). HIV polymorphisms exhibiting the greatest relative spread were those restricted by protective HLAs. Despite these increases, when HIV sequences were analyzed as a whole, their total average burden of polymorphisms that were “pre-adapted” to the average host HLA profile was only ∼2% greater in modern versus historic eras. Furthermore, HLA-associated polymorphisms identified in historic HIV sequences were consistent with those detectable today, with none identified that could explain the few HIV codons where the inferred epidemic ancestor differed from the modern consensus. Results are therefore consistent with slow HIV adaptation to HLA, but at a rate unlikely to yield imminent negative implications for cellular immunity, at least in North America. 
Intriguingly, temporal changes in protein activity of patient-derived Nef (though not Gag) sequences were observed, suggesting functional implications of population-level HIV evolution on certain viral proteins. PMID:24762668
Risk of breast cancer with CXCR4-using HIV defined by V3 loop sequencing.

PubMed

Goedert, James J; Swenson, Luke C; Napolitano, Laura A; Haddad, Mojgan; Anastos, Kathryn; Minkoff, Howard; Young, Mary; Levine, Alexandra; Adeyemi, Oluwatoyin; Seaberg, Eric C; Aouizerat, Bradley; Rabkin, Charles S; Harrigan, P Richard; Hessol, Nancy A

2015-01-01

Evaluate the risk of female breast cancer associated with HIV-CXCR4 (X4) tropism as determined by various genotypic measures. A breast cancer case-control study, with pairwise comparisons of tropism determination methods, was conducted. From the Women's Interagency HIV Study repository, one stored plasma specimen was selected from 25 HIV-infected cases near the breast cancer diagnosis date and 75 HIV-infected control women matched for age and calendar date. HIV-gp120 V3 sequences were derived by Sanger population sequencing (PS) and 454-pyro deep sequencing (DS). Sequencing-based HIV-X4 tropism was defined using the geno2pheno algorithm, with both high-stringency DS [false-positive rate (3.5) and 2% X4 cutoff], and lower stringency DS (false-positive rate, 5.75 and 15% X4 cutoff). Concordance of tropism results by PS, DS, and previously performed phenotyping was assessed with kappa (κ) statistics. Case-control comparisons used exact P values and conditional logistic regression. In 74 women (19 cases, 55 controls) with complete results, prevalence of HIV-X4 by PS was 5% in cases vs 29% in controls (P = 0.06; odds ratio, 0.14; confidence interval: 0.003 to 1.03). Smaller case-control prevalence differences were found with high-stringency DS (21% vs 36%, P = 0.32), lower stringency DS (16% vs 35%, P = 0.18), and phenotyping (11% vs 31%, P = 0.10). HIV-X4 tropism concordance was best between PS and lower stringency DS (93%, κ = 0.83). Other pairwise concordances were 82%-92% (κ = 0.56-0.81). Concordance was similar among cases and controls. HIV-X4 defined by population sequencing (PS) had good agreement with lower stringency DS and was significantly associated with lower odds of breast cancer.
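The kappa (κ) statistic used in the abstract above to compare tropism calls corrects raw percent agreement for the agreement expected by chance. The following is a minimal sketch of Cohen's kappa for two sets of categorical calls; the function name and the "X4"/"R5" labels in the usage note are illustrative and are not taken from the study's data.

```python
def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length lists of categorical calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(calls_a)
    labels = set(calls_a) | set(calls_b)
    # Observed agreement: fraction of samples given the same call by both methods.
    observed = sum(x == y for x, y in zip(calls_a, calls_b)) / n
    # Chance agreement: product of each method's marginal rate, summed over labels.
    expected = sum(
        (calls_a.count(label) / n) * (calls_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        # Both methods always give the same single label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

For example, `cohen_kappa(["X4", "X4", "R5", "R5"], ["X4", "R5", "R5", "R5"])` gives 0.5: the raw agreement is 75%, but half of that is expected by chance from the marginal call rates.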

POEM: Identifying Joint Additive Effects on Regulatory Circuits.

PubMed

Botzman, Maya; Nachshon, Aharon; Brodt, Avital; Gat-Viks, Irit

2016-01-01

Expression Quantitative Trait Locus (eQTL) mapping tackles the problem of identifying variations in DNA sequence that have an effect on the transcriptional regulatory network. Major computational efforts are aimed at characterizing the joint effects of several eQTLs acting in concert to govern the expression of the same genes. Yet, progress toward a comprehensive prediction of such joint effects is limited. For example, existing eQTL methods commonly discover interacting loci affecting the expression levels of a module of co-regulated genes. Such "modularization" approaches, however, are focused on epistatic relations and thus have limited utility for the case of additive (non-epistatic) effects. Here we present POEM (Pairwise effect On Expression Modules), a methodology for identifying pairwise eQTL effects on gene modules. POEM is specifically designed to achieve high performance in the case of additive joint effects. We applied POEM to transcription profiles measured in bone marrow-derived dendritic cells across a population of genotyped mice. Our study reveals widespread additive, trans-acting pairwise effects on gene modules, characterizes their organizational principles, and highlights high-order interconnections between modules within the immune signaling network. These analyses elucidate the central role of additive pairwise effects in regulatory circuits, and provide computational tools for future investigations into the interplay between eQTLs. The software described in this article is available at csgi.tau.ac.il/POEM/.
POEM: Identifying Joint Additive Effects on Regulatory Circuits

PubMed Central

Botzman, Maya; Nachshon, Aharon; Brodt, Avital; Gat-Viks, Irit

2016-01-01

Motivation: Expression Quantitative Trait Locus (eQTL) mapping tackles the problem of identifying variations in DNA sequence that have an effect on the transcriptional regulatory network. Major computational efforts are aimed at characterizing the joint effects of several eQTLs acting in concert to govern the expression of the same genes. Yet, progress toward a comprehensive prediction of such joint effects is limited. For example, existing eQTL methods commonly discover interacting loci affecting the expression levels of a module of co-regulated genes. Such “modularization” approaches, however, are focused on epistatic relations and thus have limited utility for the case of additive (non-epistatic) effects. Results: Here we present POEM (Pairwise effect On Expression Modules), a methodology for identifying pairwise eQTL effects on gene modules. POEM is specifically designed to achieve high performance in the case of additive joint effects. We applied POEM to transcription profiles measured in bone marrow-derived dendritic cells across a population of genotyped mice. Our study reveals widespread additive, trans-acting pairwise effects on gene modules, characterizes their organizational principles, and highlights high-order interconnections between modules within the immune signaling network. These analyses elucidate the central role of additive pairwise effects in regulatory circuits, and provide computational tools for future investigations into the interplay between eQTLs. Availability: The software described in this article is available at csgi.tau.ac.il/POEM/. PMID:27148351
Improving homology modeling of G-protein coupled receptors through multiple-template derived conserved inter-residue interactions

NASA Astrophysics Data System (ADS)

Chaudhari, Rajan; Heim, Andrew J.; Li, Zhijun

2015-05-01

As evidenced by the three rounds of G-protein coupled receptors (GPCR) Dock competitions, improving homology modeling methods of helical transmembrane proteins including the GPCRs, based on templates of low sequence identity, remains an eminent challenge. Current approaches addressing this challenge adopt the philosophy of "modeling first, refinement next". In the present work, we developed an alternative modeling approach through the novel application of available multiple templates. First, conserved inter-residue interactions are derived from each additional template through conservation analysis of each template-target pairwise alignment. Then, these interactions are converted into distance restraints and incorporated in the homology modeling process. This approach was applied to modeling of the human β2 adrenergic receptor using the bovine rhodopsin and the human protease-activated receptor 1 as templates, and improved model quality was demonstrated compared to the homology models generated by standard single-template and multiple-template methods. This method of "refined restraints first, modeling next" provides a fast and complementary alternative to the current modeling approaches. It allows rational identification and implementation of additional conserved distance restraints extracted from multiple templates and/or experimental data, and has the potential to be applicable to modeling of all helical transmembrane proteins.
Molecular Characterization of the Complete Genome of Three Basal-BR Isolates of Turnip mosaic virus Infecting Raphanus sativus in China.

PubMed

Zhu, Fuxiang; Sun, Ying; Wang, Yan; Pan, Hongyu; Wang, Fengting; Zhang, Xianghui; Zhang, Yanhua; Liu, Jinliang

2016-06-04

Turnip mosaic virus (TuMV) infects crops of plant species in the family Brassicaceae worldwide. TuMV isolates were clustered to five lineages corresponding to basal-B, basal-BR, Asian-BR, world-B and OMs. Here, we determined the complete genome sequences of three TuMV basal-BR isolates infecting radish from Shandong and Jilin Provinces in China. Their genomes were all composed of 9833 nucleotides, excluding the 3'-terminal poly(A) tail. They contained two open reading frames (ORFs), with the large one encoding a polyprotein of 3164 amino acids and the small overlapping ORF encoding a PIPO protein of 61 amino acids, which contained the typically conserved motifs found in members of the genus Potyvirus. In pairwise comparison with 30 other TuMV genome sequences, these three isolates shared their highest identities with isolates from Eurasian countries (Germany, Italy, Turkey and China). Recombination analysis showed that the three isolates in this study had no "clear" recombination. The analyses of conserved amino acids changed between groups showed that the codons in the TuMV out group (OGp) and OMs group were the same at three codon sites (852, 1006, 1548), and the other TuMV groups (basal-B, basal-BR, Asian-BR, world-B) were different. This pattern suggests that the codon in the OMs progenitor did not change but that in the other TuMV groups the progenitor sequence did change at divergence. Genetic diversity analyses indicate that the PIPO gene was under the highest selection pressure and the selection pressure on P3N-PIPO and P3 was almost the same. It suggests that most of the selection pressure on P3 was probably imposed through P3N-PIPO.
iDNA-Prot: Identification of DNA Binding Proteins Using Random Forest with Grey Model

PubMed Central

Lin, Wei-Zhong; Fang, Jian-An; Xiao, Xuan; Chou, Kuo-Chen

2011-01-01

DNA-binding proteins play crucial roles in various cellular processes. Developing high throughput tools for rapidly and effectively identifying DNA-binding proteins is one of the major challenges in the field of genome annotation. Although many efforts have been made in this regard, further effort is needed to enhance the prediction power. By incorporating features extracted from protein sequences via the “grey model” into the general form of pseudo amino acid composition, and by adopting the random forest operation engine, we proposed a new predictor, called iDNA-Prot, for identifying uncharacterized proteins as DNA-binding proteins or non-DNA binding proteins based on their amino acid sequence information alone. The overall success rate of iDNA-Prot was 83.96%, obtained via jackknife tests on a newly constructed stringent benchmark dataset in which none of the proteins included has pairwise sequence identity to any other in the same subset. In addition to achieving a high success rate, the computational time for iDNA-Prot is remarkably shorter in comparison with the relevant existing predictors. Hence it is anticipated that iDNA-Prot may become a useful high throughput tool for large-scale analysis of DNA-binding proteins. As a user-friendly web-server, iDNA-Prot is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iDNA-Prot or http://www.jci-bioinfo.cn/iDNA-Prot. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results. PMID:21935457
First report of Perkinsus beihaiensis in Crassostrea madrasensis from the Indian subcontinent.

PubMed

Sanil, N K; Suja, G; Lijo, J; Vijayan, K K

2012-04-26

Protozoan parasites of the genus Perkinsus are considered important pathogens responsible for mass mortalities in many wild and farmed bivalve populations. The present study was initiated to screen populations of the Indian edible oyster Crassostrea madrasensis, a promising candidate for aquaculture along the Indian coasts, for the presence of Perkinsus spp. The study reports the presence of P. beihaiensis for the first time in C. madrasensis populations from the Indian subcontinent and south Asia. Samples collected from the east and west coasts of India were subjected to Ray's fluid thioglycollate medium (RFTM) culture and histology which indicated the presence of Perkinsus spp. PCR screening of the tissues using specific primers amplified the product specific to the genus Perkinsus. The taxonomic affinities of the parasites were determined by sequencing both internal transcribed spacer (ITS) and actin genes followed by basic local alignment search tool (BLAST) analysis. Analysis based on the ITS sequences showed 98 to 100% identity to Perkinsus spp. (P. beihaiensis and Brazilian Perkinsus sp.). The pairwise genetic distance values and phylogenetic analysis confirmed that 2 of the present samples belonged to the P. beihaiensis clade while the other 4 showed close affinities with the Brazilian Perkinsus sp. clade. The genetic divergence data, close affinity with the Brazilian Perkinsus sp., and co-existence with P. beihaiensis in the same host species in the same habitat show that the remaining 4 samples exhibit some degree of variation from P. beihaiensis. As expected, the sequencing of actin genes did not show any divergence among the samples studied. They probably could be intraspecific variants of P. beihaiensis having a separate lineage in the process of evolution.
HIV-TRACE (Transmission Cluster Engine): a tool for large scale molecular epidemiology of HIV-1 and other rapidly evolving pathogens.

PubMed

Kosakovsky Pond, Sergei L; Weaver, Steven; Leigh Brown, Andrew J; Wertheim, Joel O

2018-01-31

In modern applications of molecular epidemiology, genetic sequence data are routinely used to identify clusters of transmission in rapidly evolving pathogens, most notably HIV-1. Traditional 'shoeleather' epidemiology infers transmission clusters by tracing chains of partners sharing epidemiological connections (e.g., sexual contact). Here, we present a computational tool for identifying a molecular transmission analog of such clusters: HIV-TRACE (TRAnsmission Cluster Engine). HIV-TRACE implements an approach inspired by traditional epidemiology, by identifying chains of partners whose viral genetic relatedness imply direct or indirect epidemiological connections. Molecular transmission clusters are constructed using codon-aware pairwise alignment to a reference sequence followed by pairwise genetic distance estimation among all sequences. This approach is computationally tractable and is capable of identifying HIV-1 transmission clusters in large surveillance databases comprising tens or hundreds of thousands of sequences in near real time, i.e., on the order of minutes to hours. HIV-TRACE is available at www.hivtrace.org and from github.com/veg/hivtrace, along with the accompanying result visualization module from github.com/veg/hivtrace-viz. Importantly, the approach underlying HIV-TRACE is not limited to the study of HIV-1 and can be applied to study outbreaks and epidemics of other rapidly evolving pathogens. © The Author 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
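
As a rough illustration of the distance-threshold clustering idea described in this abstract, the Python sketch below links sequences whose pairwise distance falls at or below a cutoff and reports the connected groups as clusters. It is not HIV-TRACE's implementation: the function names, the plain mismatch-fraction distance, the example sequences, and the 0.015 cutoff are all assumptions made for illustration, and the input is assumed to be already aligned to equal length.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences, skipping gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # nothing comparable: treat as maximally distant
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_clusters(seqs: dict, threshold: float) -> list:
    """Group sequence names into connected components of the 'distance <= threshold' graph."""
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    # Singletons are not reported as clusters.
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical toy data (not from the paper): two near-identical pairs.
base = "ACGT" * 25
other = "T" * 100
seqs = {
    "s1": base,
    "s2": "C" + base[1:],   # one mismatch against s1
    "s3": other,
    "s4": "A" + other[1:],  # one mismatch against s3
}
clusters = distance_clusters(seqs, threshold=0.015)
```

All-against-all comparison is quadratic in the number of sequences, so a sketch like this only conveys the idea; it says nothing about how the tool reaches the throughput reported above.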
Dynamically heterogenous partitions and phylogenetic inference: an evaluation of analytical strategies with cytochrome b and ND6 gene sequences in cranes.

PubMed

Krajewski, C; Fain, M G; Buckley, L; King, D G

1999-11-01

Debates over whether molecular sequence data should be partitioned for phylogenetic analysis often confound two types of heterogeneity among partitions. We distinguish historical heterogeneity (i.e., different partitions have different evolutionary relationships) from dynamic heterogeneity (i.e., different partitions show different patterns of sequence evolution) and explore the impact of the latter on phylogenetic accuracy and precision with a two-gene, mitochondrial data set for cranes. The well-established phylogeny of cranes allows us to contrast tree-based estimates of relevant parameter values with estimates based on pairwise comparisons and to ascertain the effects of incorporating different amounts of process information into phylogenetic estimates. We show that codon positions in the cytochrome b and NADH dehydrogenase subunit 6 genes are dynamically heterogenous under both Poisson and invariable-sites + gamma-rates versions of the F84 model and that heterogeneity includes variation in base composition and transition bias as well as substitution rate. Estimates of transition-bias and relative-rate parameters from pairwise sequence comparisons were comparable to those obtained as tree-based maximum likelihood estimates. Neither rate-category nor mixed-model partitioning strategies resulted in a loss of phylogenetic precision relative to unpartitioned analyses. We suggest that weighted-average distances provide a computationally feasible alternative to direct maximum likelihood estimates of phylogeny for mixed-model analyses of large, dynamically heterogenous data sets. Copyright 1999 Academic Press.
Predator identity more than predator richness structures aquatic microbial assemblages in Sarracenia purpurea leaves.

PubMed

Canter, Erin J; Cuellar-Gempeler, Catalina; Pastore, Abigail I; Miller, Thomas E; Mason, Olivia U

2018-03-01

The importance of predators in influencing community structure is a well-studied area of ecology. However, few studies test ecological hypotheses of predation in multi-predator microbial communities. The phytotelmic community found within the water-filled leaves of the pitcher plant, Sarracenia purpurea, exhibits a simple trophic structure that includes multiple protozoan predators and microbial prey. Using this system, we sought to determine whether different predators target distinct microorganisms, how interactions among protozoans affect resource (microorganism) use, and how predator diversity affects prey community diversity. In particular, we endeavored to determine if protozoa followed known ecological patterns such as keystone predation or generalist predation. For these experiments, replicate inquiline microbial communities were maintained for seven days with five protozoan species. Microbial community structure was determined by 16S rRNA gene amplicon sequencing (iTag) and analysis. Compared to the control (no protozoa), two ciliates followed patterns of keystone predation by increasing microbial evenness. In pairwise competition treatments with a generalist flagellate, prey communities resembled the microbial communities of the respective keystone predator in monoculture. The relative abundance of the most common bacterial Operational Taxonomic Unit (OTU) in our system decreased compared to the control in the presence of these ciliates. This OTU was 98% similar to a known chitin degrader and nitrate reducer, important functions for the microbial community and the plant host. Collectively, the data demonstrated that predator identity had a greater effect on prey diversity and composition than overall predator diversity. © 2018 by the Ecological Society of America.
Functional Basis of Microorganism Classification.

PubMed

Zhu, Chengsheng; Delmont, Tom O; Vogel, Timothy M; Bromberg, Yana

2015-08-01

Correctly identifying nearest "neighbors" of a given microorganism is important in industrial and clinical applications where close relationships imply similar treatment. Microbial classification based on similarity of physiological and genetic organism traits (polyphasic similarity) is experimentally difficult and, arguably, subjective. Evolutionary relatedness, inferred from phylogenetic markers, facilitates classification but does not guarantee functional identity between members of the same taxon or lack of similarity between different taxa. Using over thirteen hundred sequenced bacterial genomes, we built a novel function-based microorganism classification scheme, functional-repertoire similarity-based organism network (FuSiON; flattened to fusion). Our scheme is phenetic, based on a network of quantitatively defined organism relationships across the known prokaryotic space. It correlates significantly with the current taxonomy, but the observed discrepancies reveal both (1) the inconsistency of functional diversity levels among different taxa and (2) an (unsurprising) bias towards prioritizing, for classification purposes, relatively minor traits of particular interest to humans. Our dynamic network-based organism classification is independent of the arbitrary pairwise organism similarity cut-offs traditionally applied to establish taxonomic identity. Instead, it reveals natural, functionally defined organism groupings and is thus robust in handling organism diversity. Additionally, fusion can use organism meta-data to highlight the specific environmental factors that drive microbial diversification. Our approach provides a complementary view to cladistic assignments and holds important clues for further exploration of microbial lifestyles. 
Fusion is a more practical fit for biomedical, industrial, and ecological applications, as many of these rely on understanding the functional capabilities of the microbes in their environment and are less concerned with phylogenetic descent.
Functional Basis of Microorganism Classification

PubMed Central

Zhu, Chengsheng; Delmont, Tom O.; Vogel, Timothy M.; Bromberg, Yana

2015-01-01

Correctly identifying nearest “neighbors” of a given microorganism is important in industrial and clinical applications where close relationships imply similar treatment. Microbial classification based on similarity of physiological and genetic organism traits (polyphasic similarity) is experimentally difficult and, arguably, subjective. Evolutionary relatedness, inferred from phylogenetic markers, facilitates classification but does not guarantee functional identity between members of the same taxon or lack of similarity between different taxa. Using over thirteen hundred sequenced bacterial genomes, we built a novel function-based microorganism classification scheme, functional-repertoire similarity-based organism network (FuSiON; flattened to fusion). Our scheme is phenetic, based on a network of quantitatively defined organism relationships across the known prokaryotic space. It correlates significantly with the current taxonomy, but the observed discrepancies reveal both (1) the inconsistency of functional diversity levels among different taxa and (2) an (unsurprising) bias towards prioritizing, for classification purposes, relatively minor traits of particular interest to humans. Our dynamic network-based organism classification is independent of the arbitrary pairwise organism similarity cut-offs traditionally applied to establish taxonomic identity. Instead, it reveals natural, functionally defined organism groupings and is thus robust in handling organism diversity. Additionally, fusion can use organism meta-data to highlight the specific environmental factors that drive microbial diversification. Our approach provides a complementary view to cladistic assignments and holds important clues for further exploration of microbial lifestyles. 
Fusion is a more practical fit for biomedical, industrial, and ecological applications, as many of these rely on understanding the functional capabilities of the microbes in their environment and are less concerned with phylogenetic descent. PMID:26317871
Global occurrence and heterogeneity of the Roseobacter-clade species Ruegeria mobilis

PubMed Central

Sonnenschein, Eva C; Nielsen, Kristian F; D'Alvise, Paul; Porsby, Cisse H; Melchiorsen, Jette; Heilmann, Jens; Kalatzis, Panos G; López-Pérez, Mario; Bunk, Boyke; Spröer, Cathrin; Middelboe, Mathias; Gram, Lone

2017-01-01

Tropodithietic acid (TDA)-producing Ruegeria mobilis strains of the Roseobacter clade have primarily been isolated from marine aquaculture and have probiotic potential due to inhibition of fish pathogens. We hypothesized that TDA producers with additional novel features are present in the oceanic environment. We isolated 42 TDA-producing R. mobilis strains during a global marine research cruise. While highly similar on the 16S ribosomal RNA gene level (99–100% identity), the strains separated into four sub-clusters in a multilocus sequence analysis. They were further differentiated to the strain level by average nucleotide identity using pairwise genome comparison. The four sub-clusters could not be associated with a specific environmental niche, however, correlated with the pattern of sub-typing using co-isolated phages, the number of prophages in the genomes and the distribution in ocean provinces. Major genomic differences within the sub-clusters include prophages and toxin-antitoxin systems. In general, the genome of R. mobilis revealed adaptation to a particle-associated life style and querying TARA ocean data confirmed that R. mobilis is more abundant in the particle-associated fraction than in the free-living fraction occurring in 40% and 6% of the samples, respectively. Our data and the TARA data, although lacking sufficient data from the polar regions, demonstrate that R. mobilis is a globally distributed marine bacterial species found primarily in the upper open oceans. It has preserved key phenotypic behaviors such as the production of TDA, but contains diverse sub-clusters, which could provide new capabilities for utilization in aquaculture. PMID:27552638
Phylogeography and genetic diversity of a widespread Old World butterfly, Lampides boeticus (Lepidoptera: Lycaenidae).

PubMed

Lohman, David J; Peggie, Djunijanti; Pierce, Naomi E; Meier, Rudolf

2008-10-30

Evolutionary genetics provides a rich theoretical framework for empirical studies of phylogeography. Investigations of intraspecific genetic variation can uncover new putative species while allowing inference into the evolutionary origin and history of extant populations. With a distribution on four continents ranging throughout most of the Old World, Lampides boeticus (Lepidoptera: Lycaenidae) is one of the most widely distributed species of butterfly. It is placed in a monotypic genus with no commonly accepted subspecies. Here, we investigate the demographic history and taxonomic status of this widespread species, and screen for the presence or absence of the bacterial endosymbiont Wolbachia. We performed phylogenetic, population genetic, and phylogeographic analyses using 1799 bp of mitochondrial sequence data from 57 specimens collected throughout the species' range. Most of the samples (>90%) were nearly genetically identical, with uncorrected pairwise sequence differences of 0-0.5% across geographic distances >9,000 km. However, five samples from central Thailand, Madagascar, northern Australia and the Moluccas formed two divergent clades differing from the majority of samples by uncorrected pairwise distances ranging from 1.79-2.21%. Phylogenetic analyses suggest that L. boeticus is almost certainly monophyletic, with all sampled genes coalescing well after the divergence from three closely related taxa included for outgroup comparisons. Analyses of molecular diversity indicate that most L. boeticus individuals in extant populations are descended from one or two relatively recent population bottlenecks. The combined analyses suggest a scenario in which the most recent common ancestor of L. boeticus and its sister taxon lived in the African region approximately 7 Mya; extant lineages of L. boeticus began spreading throughout the Old World at least 1.5 Mya. 
More recently, expansion after population bottlenecks approximately 1.4 Mya seems to have displaced most of the ancestral polymorphism throughout its range, though at least two early-branching lineages still persist. One of these lineages, in northern Australia and the Moluccas, may have experienced accelerated differentiation due to infection with the bacterial endosymbiont Wolbachia, which affects reproduction. Examination of a haplotype network suggests that Australia has been colonized by the species several times. While there is little evidence for the existence of morphologically cryptic species, these results suggest a complex history affected by repeated dispersal events.
Haplotype Reconstruction in Large Pedigrees with Many Untyped Individuals

NASA Astrophysics Data System (ADS)

Li, Xin; Li, Jing

Haplotypes, as they specify the linkage patterns between dispersed genetic variations, provide important information for understanding the genetics of human traits. However, haplotypes are not directly available from current genotyping platforms, and hence there are extensive investigations of computational methods to recover such information. Two major computational challenges arising in current family-based disease studies are large family sizes and many ungenotyped family members. Traditional haplotyping methods can neither handle large families nor families with missing members. In this paper, we propose a method which addresses these issues by integrating multiple novel techniques. The method consists of three major components: pairwise identical-by-descent (IBD) inference, global IBD reconstruction and haplotype restoring. By reconstructing the global IBD of a family from pairwise IBD and then restoring the haplotypes based on the inferred IBD, this method can scale to large pedigrees, and more importantly it can handle families with missing members. Compared with existing methods, this method demonstrates much higher power to recover haplotype information, especially in families with many untyped individuals.
Scalable Creation of Long-Lived Multipartite Entanglement

NASA Astrophysics Data System (ADS)

Kaufmann, H.; Ruster, T.; Schmiegelow, C. T.; Luda, M. A.; Kaushal, V.; Schulz, J.; von Lindenfels, D.; Schmidt-Kaler, F.; Poschinger, U. G.

2017-10-01

We demonstrate the deterministic generation of multipartite entanglement based on scalable methods. Four qubits are encoded in $^{40}$Ca$^{+}$, stored in a microstructured segmented Paul trap. These qubits are sequentially entangled by laser-driven pairwise gate operations. Between these, the qubit register is dynamically reconfigured via ion shuttling operations, where ion crystals are separated and merged, and ions are moved in and out of a fixed laser interaction zone. A sequence consisting of three pairwise entangling gates yields a four-ion Greenberger-Horne-Zeilinger state $|\psi\rangle = (1/\sqrt{2})(|0000\rangle + |1111\rangle)$, and full quantum state tomography reveals a state fidelity of 94.4(3)%. We analyze the decoherence of this state and employ dynamic decoupling on the spatially distributed constituents to maintain 69(5)% coherence at a storage time of 1.1 sec.
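
To make the "three pairwise gates give a four-qubit GHZ state" statement concrete, here is a small state-vector sketch in plain Python. It substitutes the textbook circuit (one Hadamard followed by three CNOTs on neighbouring pairs) for the laser-driven gates used in the experiment, so the gate choice, function names, and qubit ordering are assumptions and not the authors' protocol. The simulation is ideal and noiseless, so it reaches fidelity 1 instead of the measured 94.4(3)%.

```python
import math


def apply_hadamard(state, qubit, n):
    """Apply a Hadamard to `qubit` of an n-qubit state vector (qubit 0 is the leftmost bit)."""
    new = [0.0] * len(state)
    mask = 1 << (n - 1 - qubit)
    s = 1 / math.sqrt(2)
    for i, amp in enumerate(state):
        if amp == 0:
            continue
        if i & mask:              # |1> -> (|0> - |1>)/sqrt(2)
            new[i ^ mask] += s * amp
            new[i] -= s * amp
        else:                     # |0> -> (|0> + |1>)/sqrt(2)
            new[i] += s * amp
            new[i ^ mask] += s * amp
    return new


def apply_cnot(state, control, target, n):
    """Flip `target` on every basis state in which `control` is 1."""
    new = [0.0] * len(state)
    cmask = 1 << (n - 1 - control)
    tmask = 1 << (n - 1 - target)
    for i, amp in enumerate(state):
        j = i ^ tmask if i & cmask else i
        new[j] += amp
    return new


n = 4
state = [0.0] * (2 ** n)
state[0] = 1.0                                   # start in |0000>
state = apply_hadamard(state, 0, n)              # (|0000> + |1000>)/sqrt(2)
for control, target in [(0, 1), (1, 2), (2, 3)]:  # three pairwise entangling gates
    state = apply_cnot(state, control, target, n)

# Overlap with the ideal GHZ state (|0000> + |1111>)/sqrt(2).
ghz = [0.0] * (2 ** n)
ghz[0] = ghz[-1] = 1 / math.sqrt(2)
fidelity = sum(a * b for a, b in zip(state, ghz)) ** 2
```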
Structured prediction models for RNN based sequence labeling in clinical text.

PubMed

Jagannatha, Abhyuday N; Yu, Hong

2016-11-01

Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work we experimented with various CRF based structured learning models with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities.
Structured prediction models for RNN based sequence labeling in clinical text

PubMed Central

Jagannatha, Abhyuday N; Yu, Hong

2016-01-01

Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling involves extraction of medical entities such as medication, indication, and side-effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work we experimented with various CRF based structured learning models with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities. PMID:28004040
Impact of recombination on polymorphism of genes encoding Kunitz-type protease inhibitors in the genus Solanum.

PubMed

Speranskaya, Anna S; Krinitsina, Anastasia A; Kudryavtseva, Anna V; Poltronieri, Palmiro; Santino, Angelo; Oparina, Nina Y; Dmitriev, Alexey A; Belenikin, Maxim S; Guseva, Marina A; Shevelev, Alexei B

2012-08-01

The group of Kunitz-type protease inhibitors (KPI) from potato is encoded by a polymorphic family of multiple allelic and non-allelic genes. The previous explanations of the KPI variability were based on the hypothesis of random mutagenesis as a key factor of KPI polymorphism. KPI-A genes from the genomes of Solanum tuberosum cv. Istrinskii and the wild species Solanum palustre were amplified by PCR with subsequent cloning in plasmids. True KPI sequences were derived from comparison of the cloned copies. "Hot spots" of recombination in KPI genes were independently identified by DnaSP 4.0 and TOPALi v2.5 software. The KPI-A sequence from potato cv. Istrinskii was found to be 100% identical to the gene from Solanum nigrum. This fact illustrates a high degree of similarity of KPI genes in the genus Solanum. Pairwise comparison of KPI A and B genes unambiguously showed a non-uniform extent of polymorphism at different nt positions. Moreover, the occurrence of substitutions was not random along the strand. Taken together, these facts contradict the traditional hypothesis of random mutagenesis as a principal source of KPI gene polymorphism. The experimentally found mosaic structure of KPI genes in both plants studied is consistent with the hypothesis suggesting recombination of ancestral genes. The same mechanism was proposed earlier for other resistance-conferring genes in the nightshade family (Solanaceae). Based on the data obtained, we searched for potential motifs of site-specific binding with plant DNA recombinases. During this work, we analyzed the sequencing data reported by the Potato Genome Sequencing Consortium (PGSC), 2011 and found considerable inconsistency in their data concerning the number, location, and orientation of KPI genes of groups A and B. The key role of recombination rather than random point mutagenesis in KPI polymorphism was demonstrated for the first time. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
A Genome-Scale Investigation of How Sequence, Function, and Tree-Based Gene Properties Influence Phylogenetic Inference.

PubMed

Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis

2016-09-02

Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships between 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, alongside with number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals. 
These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal and could be useful in guiding the choice of phylogenetic markers. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
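
The "percent average of pairwise identity across taxa" mentioned in this abstract can be sketched as follows. This is one plausible reading, not the authors' pipeline: the function names and the toy alignment are hypothetical, and identity is computed here over columns where neither sequence has a gap, which is only one of several common denominators.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent of identical sites between two aligned sequences, ignoring gapped columns."""
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def mean_pairwise_identity(alignment: dict) -> float:
    """Average percent identity over all unordered pairs of taxa in one gene alignment."""
    pairs = list(combinations(alignment.values(), 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical three-taxon alignment of a single gene.
alignment = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "ACGTACGTAA",
    "taxonC": "ACGAACGTAC",
}
mean_identity = mean_pairwise_identity(alignment)  # pairs score 90, 90 and 80 percent
```

A lower mean identity corresponds to a faster-evolving gene under this proxy, which is how the abstract uses it.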
Metabolic Pathway Assignment of Plant Genes based on Phylogenetic Profiling–A Feasibility Study

PubMed Central

Weißenborn, Sandra; Walther, Dirk

2017-01-01

Despite many developed experimental and computational approaches, functional gene annotation remains challenging. With the rapidly growing number of sequenced genomes, the concept of phylogenetic profiling, which predicts functional links between genes that share a common co-occurrence pattern across different genomes, has gained renewed attention as it promises to annotate gene functions based on presence/absence calls alone. We applied phylogenetic profiling to the problem of metabolic pathway assignments of plant genes with a particular focus on secondary metabolism pathways. We determined phylogenetic profiles for 40,960 metabolic pathway enzyme genes with assigned EC numbers from 24 plant species based on sequence and pathway annotation data from KEGG and Ensembl Plants. For gene sequence family assignments, needed to determine the presence or absence of particular gene functions in the given plant species, we included data of all 39 species available at the Ensembl Plants database and established gene families based on pairwise sequence identities and annotation information. Aside from performing profiling comparisons, we used machine learning approaches to predict pathway associations from phylogenetic profiles alone. Selected metabolic pathways were indeed found to be composed of gene families of greater than expected phylogenetic profile similarity. This was particularly evident for primary metabolism pathways, whereas for secondary pathways, both the available annotation in different species as well as the abstraction of functional association via distinct pathways proved limiting. While phylogenetic profile similarity was generally not found to correlate with gene co-expression, direct physical interactions of proteins were reflected by a significantly increased profile similarity suggesting an application of phylogenetic profiling methods as a filtering step in the identification of protein-protein interactions. 
This feasibility study highlights the potential and challenges associated with phylogenetic profiling methods for the detection of functional relationships between genes as well as the need to enlarge the set of plant genes with proven secondary metabolism involvement as well as the limitations of distinct pathways as abstractions of relationships between genes. PMID:29163570
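
A minimal sketch of the phylogenetic profiling idea in this abstract: each gene family gets a presence/absence vector across species, and families with similar vectors are candidates for a functional link. The abstract does not say which similarity measure was used, so the Jaccard index below is an assumption, and the family names and profiles are invented for illustration.

```python
def jaccard(p, q):
    """Jaccard similarity of two presence/absence profiles (equal-length sequences of 0/1)."""
    both = sum(1 for a, b in zip(p, q) if a and b)
    either = sum(1 for a, b in zip(p, q) if a or b)
    return both / either if either else 0.0


# Hypothetical profiles over six species (1 = family present in that genome).
profiles = {
    "famA": [1, 1, 0, 1, 0, 1],
    "famB": [1, 1, 0, 1, 0, 0],
    "famC": [0, 0, 1, 0, 1, 0],
}

sim_ab = jaccard(profiles["famA"], profiles["famB"])  # shared in 3 of 4 occupied species
sim_ac = jaccard(profiles["famA"], profiles["famC"])  # never co-occur
```

Under this reading, famA and famB would be scored as more likely to belong to the same pathway than famA and famC.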

Whole genome analysis of porcine astroviruses detected in Japanese pigs reveals genetic diversity and possible intra-genotypic recombination.

PubMed

Ito, Mika; Kuroda, Moegi; Masuda, Tsuneyuki; Akagami, Masataka; Haga, Kei; Tsuchiaka, Shinobu; Kishimoto, Mai; Naoi, Yuki; Sano, Kaori; Omatsu, Tsutomu; Katayama, Yukie; Oba, Mami; Aoki, Hiroshi; Ichimaru, Toru; Mukono, Itsuro; Ouchi, Yoshinao; Yamasato, Hiroshi; Shirai, Junsuke; Katayama, Kazuhiko; Mizutani, Tetsuya; Nagai, Makoto

2017-06-01

Porcine astroviruses (PoAstVs) are ubiquitous enteric viruses of pigs that are distributed in several countries throughout the world. Since PoAstVs are detected in apparently healthy pigs, the clinical significance of infection is unknown. However, AstVs have recently been associated with a severe neurological disorder in animals, including humans, and zoonotic potential has been suggested. To date, little is known about the epidemiology of PoAstVs among the pig population in Japan. In this report, we present an analysis of nearly complete genomes of 36 PoAstVs detected by a metagenomics approach in the feces of Japanese pigs. Based on a phylogenetic analysis and pairwise sequence comparison, 10, 5, 15, and 6 sequences were classified as PoAstV2, PoAstV3, PoAstV4, and PoAstV5, respectively. Co-infection with two or three strains was found in individual fecal samples from eight pigs. The phylogenetic trees of ORF1a, ORF1b, and ORF2 of PoAstV2 and PoAstV4 showed differences in their topologies. The PoAstV3 and PoAstV5 strains shared high sequence identities within each genotype in all ORFs; however, one PoAstV3 strain and one PoAstV5 strain showed considerable sequence divergence from the other PoAstV3 and PoAstV5 strains, respectively, in ORF2. Recombination analysis using whole genomes revealed evidence of multiple possible intra-genotype recombination events in PoAstV2 and PoAstV4, suggesting that recombination might have contributed to the genetic diversity and played an important role in the evolution of Japanese PoAstVs. Copyright © 2017 Elsevier B.V. All rights reserved.
Species detection and identification in sexual organisms using population genetic theory and DNA sequences.

PubMed

Birky, C William

2013-01-01

Phylogenetic trees of DNA sequences of a group of specimens may include clades of two kinds: those produced by stochastic processes (random genetic drift) within a species, and clades that represent different species. The ratio of the mean pairwise sequence difference between a pair of clades (K) to the mean pairwise sequence difference within a clade (θ) can be used to determine whether the clades are samples from different species (K/θ ≥ 4) or the same species (K/θ<4) with probability ≥ 0.95. Previously I applied this criterion to delimit species of asexual organisms. Here I use data from the literature to show how it can also be applied to delimit sexual species using four groups of sexual organisms as examples: ravens, spotted leopards, sea butterflies, and liverworts. Mitochondrial or chloroplast genes are used because these segregate earlier during speciation than most nuclear genes and hence detect earlier stages of speciation. In several cases the K/θ ratio was greater than 4, confirming the original authors' intuition that the clades were sufficiently different to be assigned to different species. But the K/θ ratio split each of two liverwort species into two evolutionary species, and showed that support for the distinction between the common and Chihuahuan raven species is weak. I also discuss some possible sources of error in using the K/θ ratio; the most significant one would be cases where males migrate between different populations but females do not, making the use of maternally inherited organelle genes problematic. The K/θ ratio must be used with some caution, like all other methods for species delimitation. Nevertheless, it is a simple theory-based quantitative method for using DNA sequences to make rigorous decisions about species delimitation in sexual as well as asexual eukaryotes.
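
The K/θ criterion in this abstract can be illustrated with a deliberately simplified sketch. It uses raw, uncorrected mean pairwise differences for both K and θ and takes the larger of the two within-clade values as θ; both choices are assumptions, and the published method may apply corrections that the abstract does not describe. The function names and toy sequences are hypothetical.

```python
import math
from itertools import combinations, product


def p_diff(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(clade):
    """Mean pairwise difference among the sequences of one clade."""
    pairs = list(combinations(clade, 2))
    if not pairs:
        raise ValueError("need at least two sequences per clade")
    return sum(p_diff(a, b) for a, b in pairs) / len(pairs)


def mean_between(clade1, clade2):
    """Mean pairwise difference over all cross-clade sequence pairs (K in the abstract)."""
    pairs = list(product(clade1, clade2))
    return sum(p_diff(a, b) for a, b in pairs) / len(pairs)


def k_theta_ratio(clade1, clade2):
    k = mean_between(clade1, clade2)
    theta = max(mean_within(clade1), mean_within(clade2))  # assumption: use the larger value
    return math.inf if theta == 0 else k / theta


# Hypothetical 20-bp sequences for two sister clades.
clade1 = ["AAAAAAAAAAAAAAAAAAAA", "AAAAAAAAAAAAAAAAAAAC"]
clade2 = ["CCCCCCCCCCAAAAAAAAAA", "CCCCCCCCCCAAAAAAAAAC"]

ratio = k_theta_ratio(clade1, clade2)
separate_species = ratio >= 4  # the abstract's threshold for calling different species
```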
Risk of Breast Cancer with CXCR4-using HIV Defined by V3-Loop Sequencing

PubMed Central

Goedert, James J.; Swenson, Luke C.; Napolitano, Laura A.; Haddad, Mojgan; Anastos, Kathryn; Minkoff, Howard; Young, Mary; Levine, Alexandra; Adeyemi, Oluwatoyin; Seaberg, Eric C.; Aouizerat, Bradley; Rabkin, Charles S.; Harrigan, P. Richard; Hessol, Nancy A.

2014-01-01

Objective Evaluate the risk of female breast cancer associated with HIV-CXCR4 (X4) tropism as determined by various genotypic measures. Methods A breast cancer case-control study, with pairwise comparisons of tropism determination methods, was conducted. From the Women's Interagency HIV Study repository, one stored plasma specimen was selected from 25 HIV-infected cases near the breast cancer diagnosis date and 75 HIV-infected control women matched for age and calendar date. HIVgp120-V3 sequences were derived by Sanger population sequencing (PS) and 454-pyro deep sequencing (DS). Sequencing-based HIV-X4 tropism was defined using the geno2pheno algorithm, with both high-stringency DS [False-Positive-Rate (FPR 3.5) and 2% X4 cutoff], and lower stringency DS (FPR 5.75, 15% X4 cut-off). Concordance of tropism results by PS, DS, and previously performed phenotyping was assessed with kappa (κ) statistics. Case-control comparisons used exact P-values and conditional logistic regression. Results In 74 women (19 cases, 55 controls) with complete results, prevalence of HIV-X4 by PS was 5% in cases vs 29% in controls (P=0.06, odds ratio 0.14, confidence interval 0.003-1.03). Smaller case-control prevalence differences were found with high-stringency DS (21% vs 36%, P=0.32), lower-stringency DS (16% vs 35%, P=0.18), and phenotyping (11% vs 31%, P=0.10). HIV-X4-tropism concordance was best between PS and lower-stringency DS (93%, κ=0.83). Other pairwise concordances were 82%-92% (κ=0.56-0.81). Concordance was similar among cases and controls. Conclusions HIV-X4 defined by population sequencing (PS) had good agreement with lower stringency deep sequencing and was significantly associated with lower odds of breast cancer. PMID:25321183
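The concordance figures in this abstract pair a raw agreement percentage with a kappa (κ) statistic. As a rough illustration of how the two relate, the following Python sketch computes percent agreement and Cohen's kappa for two lists of tropism calls. The call lists are invented toy data, not values from the study, and the function name is hypothetical.

```python
def cohen_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length lists of categorical calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("call lists must be non-empty and of equal length")
    n = len(calls_a)
    labels = set(calls_a) | set(calls_b)
    # Observed agreement: fraction of samples given the same call by both methods.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement: product of the marginal call frequencies, summed over labels.
    expected = sum(
        (calls_a.count(label) / n) * (calls_b.count(label) / n) for label in labels
    )
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy calls for ten samples from two hypothetical methods.
population_seq = ["X4"] * 3 + ["R5"] * 7
deep_seq = ["X4"] * 2 + ["R5"] * 8

agreement = sum(a == b for a, b in zip(population_seq, deep_seq)) / len(deep_seq)
kappa = cohen_kappa(population_seq, deep_seq)
print(f"agreement = {agreement:.0%}, kappa = {kappa:.2f}")
```

Here the methods agree on 9 of 10 samples, yet kappa is noticeably lower than 0.9 because part of that agreement is expected by chance given how common the R5 call is.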
N-term pairwise-correlation inequalities, steering, and joint measurability

NASA Astrophysics Data System (ADS)

Karthik, H. S.; Devi, A. R. Usha; Tej, J. Prabhu; Rajagopal, A. K.; Sudha; Narayanan, A.

2017-05-01

Chained inequalities involving pairwise correlations of qubit observables in the equatorial plane are constructed based on the positivity of a sequence of moment matrices. When a jointly measurable set of positive-operator-valued measures (POVMs) is employed in the first measurement of every pair of sequential measurements, the chained pairwise correlations do not violate the classical bound imposed by the moment matrix positivity. We find that incompatibility of the set of POVMs employed in first measurements is only necessary, but not sufficient, in general, for the violation of the inequality. On the other hand, there exists a one-to-one equivalence between the degree of incompatibility (which quantifies the joint measurability) of the equatorial qubit POVMs and the optimal violation of a nonlocal steering inequality, proposed by Jones and Wiseman [S. J. Jones and H. M. Wiseman, Phys. Rev. A 84, 012110 (2011), 10.1103/PhysRevA.84.012110]. To this end, we construct a local analog of this steering inequality in a single-qubit system and show that its violation is a mere reflection of measurement incompatibility of equatorial qubit POVMs, employed in first measurements in the sequential unsharp-sharp scheme.
Phylogenetic relationships in three species of canine Demodex mite based on partial sequences of mitochondrial 16S rDNA.

PubMed

Sastre, Natalia; Ravera, Ivan; Villanueva, Sergio; Altet, Laura; Bardagí, Mar; Sánchez, Armand; Francino, Olga; Ferrer, Lluís

2012-12-01

The historical classification of Demodex mites has been based on their hosts and morphological features. Genome sequencing has proved to be a very effective taxonomic tool in phylogenetic studies and has been applied in the classification of Demodex. Mitochondrial 16S rDNA has been demonstrated to be an especially useful marker to establish phylogenetic relationships. To amplify and sequence a segment of the mitochondrial 16S rDNA from Demodex canis and Demodex injai, as well as from the short-bodied mite called, unofficially, D. cornei and to determine their genetic proximity. Demodex mites were examined microscopically and classified as Demodex folliculorum (one sample), D. canis (four samples), D. injai (two samples) or the short-bodied species D. cornei (three samples). DNA was extracted, and a 338 bp fragment of the 16S rDNA was amplified and sequenced. The sequences of the four D. canis mites were identical and shared 99.6 and 97.3% identity with two D. canis sequences available at GenBank. The sequences of the D. cornei isolates were identical and showed 97.8, 98.2 and 99.6% identity with the D. canis isolates. The sequences of the two D. injai isolates were also identical and showed 76.6% identity with the D. canis sequence. Demodex canis and D. injai are two different species, with a genetic distance of 23.3%. It would seem that the short-bodied Demodex mite D. cornei is a morphological variant of D. canis. © 2012 The Authors. Veterinary Dermatology © 2012 ESVD and ACVD.
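Percent identities such as those reported in this record are usually computed from aligned sequences. The Python sketch below shows one common convention, assuming the two sequences are already aligned to equal length and that columns containing a gap in either sequence are excluded from the denominator. Other conventions (for example, dividing by the full alignment length) give different values, so this is an illustration rather than the calculation used by any particular study or tool.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # assumption: gapped columns are ignored
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_matrix(seqs):
    """Pairwise percent identities for a dict of name -> aligned sequence."""
    names = list(seqs)
    return {
        (p, q): percent_identity(seqs[p], seqs[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


# Toy aligned sequences (invented for illustration only).
toy = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTTCGTAC",
    "isolate_3": "ACG-ACGTAC",
}
for pair, value in identity_matrix(toy).items():
    print(pair, f"{value:.1f}%")
```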
Molecular characterization of echovirus 30-associated outbreak of aseptic meningitis in Korea in 2008.

PubMed

Choi, Young Jin; Park, Kwi Sung; Baek, Kyoung Ah; Jung, Eun Hye; Nam, Hae Seon; Kim, Yong Bae; Park, Joon Soo

2010-03-01

Evaluation of the primary etiologic agents that cause aseptic meningitis outbreaks may provide valuable information regarding the prevention and management of aseptic meningitis. In Korea, an outbreak of aseptic meningitis caused by echovirus type 30 (E30) occurred from May to October in 2008. In order to determine the etiologic agent, CSF and/or stool specimens from 140 children hospitalized for aseptic meningitis at Soonchunhyang University Cheonan Hospital between June and October of 2008 were tested for virus isolation and identification. E30 accounted for 61.7% (37 cases) and echovirus 6 accounted for 21.7% (13 cases) of all the human enteroviruses (HEVs) isolates (60 cases in total). For the molecular characterization of the isolates, the VP1 gene sequence of 18 Korean E30 isolates was compared pairwise using the MegAlign with 34 reference strains from the GenBank database. The pairwise comparison of the nucleotide sequences of the VP1 genes demonstrated that the sequences of the Korean strains differed from those of lineage groups A, B, C, D, E, F and G. Reconstruction of the phylogenetic tree based on the complete VP1 nucleotide sequences resulted in a monophyletic tree, with eight clustered lineage groups. All Korean isolates were segregated from other lineage groups, thus suggesting that the Korean strains were a distinct lineage of E30, and a probable cause of this outbreak. This manuscript is the first report, to the best of our knowledge, of the molecular characteristics of E30 strains associated with an aseptic meningitis outbreak in Korea, and their respective phylogenetic relationships.
Object-oriented sequence analysis: SCL--a C++ class library.

PubMed

Vahrson, W; Hermann, K; Kleffe, J; Wittig, B

1996-04-01

SCL (Sequence Class Library) is a class library written in the C++ programming language. Designed using object-oriented programming principles, SCL consists of classes of objects performing tasks typically needed for analyzing DNA or protein sequences. Among them are very flexible sequence classes, classes accessing databases in various formats, classes managing collections of sequences, as well as classes performing higher-level tasks like calculating a pairwise sequence alignment. SCL also includes classes that provide general programming support, like a dynamically growing array, sets, matrices, strings, classes performing file input/output, and utilities for error handling. By providing these components, SCL fosters an explorative programming style: experimenting with algorithms and alternative implementations is encouraged rather than punished. A description of SCL's overall structure as well as an overview of its classes is given. Important aspects of the work with SCL are discussed in the context of a sample program.
‘Candidatus Phytoplasma palmicola’, a novel taxon associated with a lethal yellowing-type disease (LYD) of coconut (Cocos nucifera L.) in Mozambique

USDA-ARS's Scientific Manuscript database

In this study, the taxonomic position and group classification of the phytoplasma associated with a lethal yellowing-type disease (LYD) of coconut (Cocos nucifera L.) in Mozambique were addressed. Pairwise sequence similarity values based on alignment of near full-length 16SrRNA genes (1530 bp) reve...
Fuzzy measures on the Gene Ontology for gene product similarity.

PubMed

Popescu, Mihail; Keller, James M; Mitchell, Joyce A

2006-01-01

One of the most important objects in bioinformatics is a gene product (protein or RNA). For many gene products, functional information is summarized in a set of Gene Ontology (GO) annotations. For these genes, it is reasonable to include similarity measures based on the terms found in the GO or other taxonomy. In this paper, we introduce several novel measures for computing the similarity of two gene products annotated with GO terms. The fuzzy measure similarity (FMS) has the advantage that it takes into consideration the context of both complete sets of annotation terms when computing the similarity between two gene products. When the two gene products are not annotated by common taxonomy terms, we propose a method that avoids a zero similarity result. To account for the variations in the annotation reliability, we propose a similarity measure based on the Choquet integral. These similarity measures provide extra tools for the biologist in search of functional information for gene products. The initial testing on a group of 194 sequences representing three protein families shows a higher correlation of the FMS and Choquet similarities to the BLAST sequence similarities than the traditional similarity measures such as pairwise average or pairwise maximum.
Three-dimensional analysis of the uniqueness of the anterior dentition in orthodontically treated patients and twins.

PubMed

Franco, A; Willems, G; Souza, P H C; Tanaka, O M; Coucke, W; Thevissen, P

2017-04-01

Dental uniqueness can be proven if no perfect match in pair-wise morphological comparisons of human dentitions is detected. Establishing these comparisons in a worldwide random population is practically unfeasible due to the need for a large and representative sample size. Sample stratification is an option to reduce sample size. The present study investigated the uniqueness of the human dentition in randomly selected subjects (Group 1), orthodontically treated patients (Group 2), twins (Group 3), and orthodontically treated twins (Group 4) in comparison with a threshold control sample of identical dentitions (Group 5). The samples consisted of digital cast files (DCF) obtained through extraoral 3D scanning. A total of 2,013 pair-wise morphological comparisons were performed (Group 1 n=110, Group 2 n=1,711, Group 3 n=172, Group 4 n=10, Group 5 n=10) with the Geomagic Studio® (3D Systems®, Rock Hill, SC, USA) software package. Comparisons within groups were performed quantifying the morphological differences between DCF in Euclidean distances. Comparisons between groups were established applying One-way ANOVA. To ensure fair comparisons a post-hoc Power Analysis was performed. ROC analysis was applied to distinguish unique from non-unique dentures. Identical DCF were not detected within the experimental groups (from 1 to 4). The most similar DCF had a Euclidean distance of 5.19 mm in Group 1, 2.06 mm in Group 2, 2.03 mm in Group 3, and 1.88 mm in Group 4. Groups 2 and 3 were statistically different from Group 5 (p<0.05). A statistically significant difference between Groups 4 and 5 was shown to be possible if more pair-wise comparisons were included in both groups. The ROC analysis revealed a sensitivity rate of 80% and specificity between 66.7% and 81.6%. Evidence to sustain the uniqueness of the human dentition in random and stratified populations was observed in the present study. Further studies testing the influence of the quantity of tooth material on morphological difference between dentitions and its impact on uniqueness remain necessary. Copyright © 2017 Elsevier B.V. All rights reserved.
Non-rigid multi-frame registration of cell nuclei in live cell fluorescence microscopy image data.

PubMed

Tektonidis, Marco; Kim, Il-Han; Chen, Yi-Chun M; Eils, Roland; Spector, David L; Rohr, Karl

2015-01-01

The analysis of the motion of subcellular particles in live cell microscopy images is essential for understanding biological processes within cells. For accurate quantification of the particle motion, compensation of the motion and deformation of the cell nucleus is required. We introduce a non-rigid multi-frame registration approach for live cell fluorescence microscopy image data. Compared to existing approaches using pairwise registration, our approach exploits information from multiple consecutive images simultaneously to improve the registration accuracy. We present three intensity-based variants of the multi-frame registration approach and we investigate two different temporal weighting schemes. The approach has been successfully applied to synthetic and live cell microscopy image sequences, and an experimental comparison with non-rigid pairwise registration has been carried out. Copyright © 2014 Elsevier B.V. All rights reserved.
Scalable Creation of Long-Lived Multipartite Entanglement.

PubMed

Kaufmann, H; Ruster, T; Schmiegelow, C T; Luda, M A; Kaushal, V; Schulz, J; von Lindenfels, D; Schmidt-Kaler, F; Poschinger, U G

2017-10-13

We demonstrate the deterministic generation of multipartite entanglement based on scalable methods. Four qubits are encoded in $^{40}\mathrm{Ca}^{+}$, stored in a microstructured segmented Paul trap. These qubits are sequentially entangled by laser-driven pairwise gate operations. Between these, the qubit register is dynamically reconfigured via ion shuttling operations, where ion crystals are separated and merged, and ions are moved in and out of a fixed laser interaction zone. A sequence consisting of three pairwise entangling gates yields a four-ion Greenberger-Horne-Zeilinger state $|\psi\rangle = \frac{1}{\sqrt{2}}\left(|0000\rangle + |1111\rangle\right)$, and full quantum state tomography reveals a state fidelity of 94.4(3)%. We analyze the decoherence of this state and employ dynamic decoupling on the spatially distributed constituents to maintain 69(5)% coherence at a storage time of 1.1 s.
Genomic differentiation among wild cyanophages despite widespread horizontal gene transfer

DOE PAGES

Gregory, Ann C.; Solonenko, Sergei A.; Ignacio-Espinoza, J. Cesar; ...

2016-11-16

Genetic recombination is a driving force in genome evolution. Among viruses it has a dual role. For genomes with higher fitness, it maintains genome integrity in the face of high mutation rates. Conversely, for genomes with lower fitness, it provides immediate access to sequence space that cannot be reached by mutation alone. How recombination impacts the cohesion and dissolution of individual whole genomes within viral sequence space is poorly understood across double-stranded DNA bacteriophages (a.k.a. phages) due to the challenges of obtaining appropriately scaled genomic datasets. In this study, we explore the role of recombination in both maintaining and differentiating whole genomes of 142 wild double-stranded DNA marine cyanophages. Phylogenomic analysis across the 51 core genes revealed ten lineages, six of which were well represented. These phylogenomic lineages represent discrete genotypic populations based on comparisons of intra- and inter-lineage shared gene content, genome-wide average nucleotide identity, as well as detected gaps in the distribution of pairwise differences between genomes. McDonald-Kreitman selection tests identified putative niche-differentiating genes under positive selection that differed across the six well-represented genotypic populations and that may have driven initial divergence. Concurrent with patterns of recombination of discrete populations, recombination analyses of both genic and intergenic regions largely revealed decreased genetic exchange across individual genomes between populations relative to within populations. Lastly, these findings suggest that discrete double-stranded DNA marine cyanophage populations occur in nature and are maintained by patterns of recombination akin to those observed in bacteria, archaea, and sexual eukaryotes.
The PL6-Family Plasmids of Haloquadratum Are Virus-Related.

PubMed

Dyall-Smith, Mike; Pfeiffer, Friedhelm

2018-01-01

Plasmids PL6A and PL6B are both carried by the C23T strain of the square archaeon Haloquadratum walsbyi, and are closely related (76% nucleotide identity), circular, about 6 kb in size, and display the same gene synteny. They are unrelated to other known plasmids and all of the predicted proteins are cryptic in function. Here we describe two additional PL6-related plasmids, pBAJ9-6 and pLT53-7, each carried by distinct isolates of Haloquadratum walsbyi that were recovered from hypersaline waters in Australia. A third PL6-like plasmid, pLTMV-6, was assembled from metavirome data from Lake Tyrrell, a salt lake in Victoria, Australia. Comparison of all five plasmids revealed a distinct plasmid family with strong conservation of gene content and synteny, an average size of 6.2 kb (range 5.8-7.0 kb) and pairwise similarities of 61-79%. One protein (F3) was closely similar to a protein carried by betapleolipoviruses while another (R6) was similar to a predicted AAA-ATPase of His 1 halovirus (His1V_gp16). Plasmid pLT53-7 carried a gene for a FkbM family methyltransferase that was not present in any of the other plasmids. Comparative analysis of all PL6-like plasmids provided better resolution of conserved sequences and coding regions, confirmed the strong link to haloviruses, and showed that their sequences are highly conserved among examples from Haloquadratum isolates and metagenomic data that collectively cover geographically distant locations, indicating that these genetic elements are widespread.
iNR-PhysChem: A Sequence-Based Predictor for Identifying Nuclear Receptors and Their Subfamilies via Physical-Chemical Property Matrix

PubMed Central

Xiao, Xuan; Wang, Pu; Chou, Kuo-Chen

2012-01-01

Nuclear receptors (NRs) form a family of ligand-activated transcription factors that regulate a wide variety of biological processes, such as homeostasis, reproduction, development, and metabolism. The human genome contains 48 genes encoding NRs. These receptors have become one of the most important targets for therapeutic drug development. According to their different action mechanisms or functions, NRs have been classified into seven subfamilies. With the avalanche of protein sequences generated in the postgenomic age, we are facing the following challenging problems. Given an uncharacterized protein sequence, how can we identify whether it is a nuclear receptor? If it is, to which subfamily does it belong? To address these problems, we developed a predictor called iNR-PhysChem in which the protein samples were expressed by a novel mode of pseudo amino acid composition (PseAAC) whose components were derived from a physical-chemical matrix via a series of auto-covariance and cross-covariance transformations. It was observed that the overall success rate achieved by iNR-PhysChem was over 98% in identifying NRs or non-NRs, and over 92% in identifying NRs among the following seven subfamilies: NR1: thyroid hormone like, NR2: HNF4-like, NR3: estrogen like, NR4: nerve growth factor IB-like, NR5: fushi tarazu-F1 like, NR6: germ cell nuclear factor like, and NR0: knirps like. These rates were derived by the jackknife tests on a stringent benchmark dataset in which none of protein sequences included has pairwise sequence identity to any other in a same subset. As a user-friendly web-server, iNR-PhysChem is freely accessible to the public at either http://www.jci-bioinfo.cn/iNR-PhysChem or http://icpr.jci.edu.cn/bioinfo/iNR-PhysChem. Also a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated mathematics involved in developing the predictor. It is anticipated that iNR-PhysChem may become a useful high throughput tool for both basic research and drug design. PMID:22363503
Volcanic Soils as Sources of Novel CO-Oxidizing Paraburkholderia and Burkholderia: Paraburkholderia hiiakae sp. nov., Paraburkholderia metrosideri sp. nov., Paraburkholderia paradisi sp. nov., Paraburkholderia peleae sp. nov., and Burkholderia alpina sp. nov. a Member of the Burkholderia cepacia Complex

PubMed Central

Weber, Carolyn F.; King, Gary M.

2017-01-01

Previous studies showed that members of the Burkholderiales were important in the succession of aerobic, molybdenum-dependent CO oxidizing-bacteria on volcanic soils. During these studies, four isolates were obtained from Kilauea Volcano (Hawai‘i, USA); one strain was isolated from Pico de Orizaba (Mexico) during a separate study. Based on 16S rRNA gene sequence similarities, the Pico de Orizaba isolate and the isolates from Kilauea Volcano were provisionally assigned to the genera Burkholderia and Paraburkholderia, respectively. Each of the isolates possessed a form I coxL gene that encoded the catalytic subunit of carbon monoxide dehydrogenase (CODH); none of the most closely related type strains possessed coxL or oxidized CO. Genome sequences for Paraburkholderia type strains facilitated an analysis of 16S rRNA gene sequence similarities and average nucleotide identities (ANI). ANI did not exceed 95% (the recommended cutoff for species differentiation) for any of the pairwise comparisons among 27 reference strains related to the new isolates. However, since the highest 16S rRNA gene sequence similarity among this set of reference strains was 98.93%, DNA-DNA hybridizations (DDH) were performed for two isolates whose 16S rRNA gene sequence similarities with their nearest phylogenetic neighbors were 98.96 and 99.11%. In both cases DDH values were <16%. Based on multiple variables, four of the isolates represent novel species within the Paraburkholderia: Paraburkholderia hiiakae sp. nov. (type strain I2T = DSM 28029T = LMG 27952T); Paraburkholderia paradisi sp. nov. (type strain WAT = DSM 28027T = LMG 27949T); Paraburkholderia peleae sp. nov. (type strain PP52-1T = DSM 28028T = LMG 27950T); and Paraburkholderia metrosideri sp. nov. (type strain DNBP6-1T = DSM 28030T = LMG 28140T). The remaining isolate represents the first CO-oxidizing member of the Burkholderia cepacia complex: Burkholderia alpina sp. nov. (type strain PO-04-17-38T = DSM 28031T = LMG 28138T). 
PMID:28270796
Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing.

PubMed

Hong, Jungeui; Gresham, David

2017-11-01

Quantitative analysis of next-generation sequencing (NGS) data requires discriminating duplicate reads generated by PCR from identical molecules that are of unique origin. Typically, PCR duplicates are identified as sequence reads that align to the same genomic coordinates using reference-based alignment. However, identical molecules can be independently generated during library preparation. Misidentification of these molecules as PCR duplicates can introduce unforeseen biases during analyses. Here, we developed a cost-effective sequencing adapter design by modifying Illumina TruSeq adapters to incorporate a unique molecular identifier (UMI) while maintaining the capacity to undertake multiplexed, single-index sequencing. Incorporation of UMIs into TruSeq adapters (TrUMIseq adapters) enables identification of bona fide PCR duplicates as identically mapped reads with identical UMIs. Using TrUMIseq adapters, we show that accurate removal of PCR duplicates results in improved accuracy of both allele frequency (AF) estimation in heterogeneous populations using DNA sequencing and gene expression quantification using RNA-Seq.
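The distinction this abstract draws, between duplicates called from mapping coordinates alone and duplicates called from coordinates plus UMI, can be sketched in a few lines of Python. The `Read` record and its fields are hypothetical stand-ins for whatever an alignment parser would supply; this sketch is not the authors' pipeline and ignores UMI sequencing errors.

```python
from collections import namedtuple

# Hypothetical minimal read record: mapping position plus the UMI from the adapter.
Read = namedtuple("Read", ["chrom", "pos", "strand", "umi"])


def dedup_by_coordinates(reads):
    """Keep one read per mapping position (treats all co-mapped reads as duplicates)."""
    seen, kept = set(), []
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def dedup_by_coordinates_and_umi(reads):
    """Keep one read per (mapping position, UMI) pair.

    Reads at the same position with different UMIs are retained as
    independent molecules rather than discarded as PCR duplicates.
    """
    seen, kept = set(), []
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Toy reads (invented for illustration only).
reads = [
    Read("chrI", 1000, "+", "ACGTAC"),
    Read("chrI", 1000, "+", "ACGTAC"),  # same position, same UMI: PCR duplicate
    Read("chrI", 1000, "+", "TTGCAA"),  # same position, different UMI: distinct molecule
    Read("chrI", 2500, "-", "ACGTAC"),
]
print(len(dedup_by_coordinates(reads)), len(dedup_by_coordinates_and_umi(reads)))
```

On the toy reads, coordinate-only deduplication keeps two reads while the UMI-aware version keeps three, which is the kind of undercounting the abstract attributes to misidentified duplicates.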
Genomic characterization of the first oral avian papillomavirus in a colony of breeding canaries (Serinus canaria).

PubMed

Truchado, Daniel A; Moens, Michaël A J; Callejas, Sergio; Pérez-Tris, Javier; Benítez, Laura

2018-06-01

Papillomaviruses are non-enveloped DNA viruses that infect the skin and mucosa of a wide variety of vertebrates, causing neoplasias or simply persisting asymptomatically. Avian papillomaviruses, with six fully sequenced genomes, are the second most studied group after mammalian papillomaviruses. In this study, we describe the first oral avian papillomavirus, detected in the tongue of a dead Yorkshire canary (Serinus canaria) and in oral swabs of the same bird and two other live canaries from an aviary in Madrid, Spain. Its genome is 8,071 bp and presents the canonical papillomavirus architecture with six early (E6, E7, E1, E9, E2, E4) and two late open reading frames (L1 and L2) and a long control region between L1 and E6. The L1 gene of this new avian papillomavirus shares 64% pairwise identity with FcPV1 L1, so it has been classified as a new species (ScPV1) within the Ethapapillomavirus genus. Although the canary died after showing breathing problems, there is no evidence that the papillomavirus caused those symptoms, so it could be part of the oral microbiota of the birds. Hence, future investigations are needed to evaluate the clinical relevance of the virus.

Lineage divergence detected in the malaria vector Anopheles marajoara (Diptera: Culicidae) in Amazonian Brazil

PubMed Central

2010-01-01

Background Cryptic species complexes are common among anophelines. Previous phylogenetic analysis based on the complete mtDNA COI gene sequences detected paraphyly in the Neotropical malaria vector Anopheles marajoara. The "Folmer region" detects a single taxon using a 3% divergence threshold. Methods To test the paraphyletic hypothesis and examine the utility of the Folmer region, genealogical trees based on a concatenated (white + 3' COI sequences) dataset and pairwise differentiation of COI fragments were examined. The population structure and demographic history were based on partial COI sequences for 294 individuals from 14 localities in Amazonian Brazil. 109 individuals from 12 localities were sequenced for the nDNA white gene, and 57 individuals from 11 localities were sequenced for the ribosomal DNA (rDNA) internal transcribed spacer 2 (ITS2). Results Distinct A. marajoara lineages were detected by combined genealogical analysis and were also supported among COI haplotypes using a median joining network and AMOVA, with time since divergence during the Pleistocene (<100,000 ya). COI sequences at the 3' end were more variable, demonstrating significant pairwise differentiation (3.82%) compared to the more moderate 2.92% detected by the Folmer region. Lineage 1 was present in all localities, whereas lineage 2 was restricted mainly to the west. Mismatch distributions for both lineages were bimodal, likely due to multiple colonization events and spatial expansion (~798 - 81,045 ya). There appears to be gene flow within, not between lineages, and a partial barrier was detected near Rio Jari in Amapá state, separating western and eastern populations. In contrast, both nDNA data sets (white gene sequences with or without the retention of the 4th intron, and ITS2 sequences and length) detected a single A. marajoara lineage. 
Conclusions Strong support for combined data with significant differentiation detected in the COI and absent in the nDNA suggest that the divergence is recent, and detectable only by the faster evolving mtDNA. A within subgenus threshold of >2% may be more appropriate among sister taxa in cryptic anopheline complexes than the standard 3%. Differences in demographic history and climatic changes may have contributed to mtDNA lineage divergence in A. marajoara. PMID:20929572
Facilitated sequence counting and assembly by template mutagenesis

PubMed Central

Levy, Dan; Wigler, Michael

2014-01-01

Presently, inferring the long-range structure of the DNA templates is limited by short read lengths. Accurate template counts suffer from distortions occurring during PCR amplification. We explore the utility of introducing random mutations in identical or nearly identical templates to create distinguishable patterns that are inherited during subsequent copying. We simulate the applications of this process under assumptions of error-free sequencing and perfect mapping, using cytosine deamination as a model for mutation. The simulations demonstrate that within readily achievable conditions of nucleotide conversion and sequence coverage, we can accurately count the number of otherwise identical molecules as well as connect variants separated by long spans of identical sequence. We discuss many potential applications, such as transcript profiling, isoform assembly, haplotype phasing, and de novo genome assembly. PMID:25313059
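The counting idea in this abstract can be illustrated with a toy simulation. This is a minimal sketch under strong simplifying assumptions, not the authors' simulation: each cytosine converts to thymine independently with a fixed probability, sequencing is error-free, and every copy is observed in full. The function names, the example template, and the parameter values are hypothetical.

```python
import random


def mutate_template(template: str, conversion_rate: float, rng: random.Random) -> str:
    """Convert each C to T independently with probability conversion_rate."""
    return "".join(
        "T" if base == "C" and rng.random() < conversion_rate else base
        for base in template
    )


def count_distinct_patterns(template: str, n_copies: int,
                            conversion_rate: float, seed: int = 0) -> int:
    """Mutate n_copies identical templates and count the distinct patterns.

    When the conversion rate and the number of cytosines are high enough,
    the number of distinct patterns approaches the number of copies,
    which is what makes otherwise identical molecules countable.
    """
    rng = random.Random(seed)
    patterns = {mutate_template(template, conversion_rate, rng)
                for _ in range(n_copies)}
    return len(patterns)


if __name__ == "__main__":
    template = "ACCGTCCA" * 10  # hypothetical template with 40 cytosines
    for rate in (0.0, 0.1, 0.5):
        print(rate, count_distinct_patterns(template, 50, rate))
```

Without mutation all copies collapse to one pattern, so the count carries no information about how many molecules were present.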
Improvements on a privacy-protection algorithm for DNA sequences with generalization lattices.

PubMed

Li, Guang; Wang, Yadong; Su, Xiaohong

2012-10-01

When developing personal DNA databases, there must be an appropriate guarantee of anonymity, which means that the data cannot be related back to individuals. DNA lattice anonymization (DNALA) is a successful method for making personal DNA sequences anonymous. However, it uses time-consuming multiple sequence alignment and a low-accuracy greedy clustering algorithm. Furthermore, DNALA is not an online algorithm, and so it cannot quickly return results when the database is updated. This study improves the DNALA method. Specifically, we replaced the multiple sequence alignment in DNALA with global pairwise sequence alignment to save time, and we designed a hybrid clustering algorithm comprised of a maximum weight matching (MWM)-based algorithm and an online algorithm. The MWM-based algorithm is more accurate than the greedy algorithm in DNALA and has the same time complexity. The online algorithm can process data quickly when the database is updated. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
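The global pairwise alignment step mentioned above can be sketched with the standard Needleman-Wunsch recurrence. The scoring values (match +1, mismatch -1, linear gap -1) and the function name are illustrative assumptions; the abstract does not state the scoring scheme the authors used, and the clustering stage is not shown.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the Needleman-Wunsch global alignment score of a and b.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming table, so memory is O(len(b)).
    """
    # Row 0: aligning the empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "ACT"))
```

Scores of this kind, computed for every pair of sequences, could serve as edge weights for a matching-based clustering step such as the one the abstract describes.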
Sequence analysis of a few species of termites (Order: Isoptera) on the basis of partial characterization of COII gene.

PubMed

Sobti, Ranbir Chander; Kumari, Mamtesh; Sharma, Vijay Lakshmi; Sodhi, Monika; Mukesh, Manishi; Shouche, Yogesh

2009-11-01

The present study aimed to obtain the nucleotide sequences of part of the COII mitochondrial gene amplified from individuals of five species of termites (Isoptera: Termitidae: Macrotermitinae). Four of them belonged to the genus Odontotermes (O. obesus, O. horni, O. bhagwatii and Odontotermes sp.) and one to Microtermes (M. obesi). Partial COII gene fragments were amplified using specific primers. The sequences obtained were characterized to calculate the frequency of each nucleotide base, and a high A + T content was observed. The interspecific pairwise sequence divergence among Odontotermes species ranged from 6.5% to 17.1% across the COII fragment. M. obesi sequence diversity ranged from 2.5% with Odontotermes sp. to 19.0% with O. bhagwatii. Phylogenetic trees drawn using the neighbour-joining distance method revealed three main clades clustering all the individuals according to their genera and families.
mtDNA sequence diversity in Africa.

PubMed Central

Watson, E.; Bauer, K.; Aman, R.; Weiss, G.; von Haeseler, A.; Pääbo, S.

1996-01-01

mtDNA sequences were determined from 241 individuals from nine ethnic groups in Africa. When they were compared with published data from other groups, it was found that the !Kung, Mbuti, and Biaka show on the order of 10 times more sequence differences between the three groups, as well as between those and the other groups (the Fulbe, Hausa, Tuareg, Songhai, Kanuri, Yoruba, Mandenka, Somali, Tukana, and Kikuyu), than these other groups do between one another. Furthermore, the pairwise sequence distributions, patterns of coalescence events, and numbers of variable positions relative to the mean sequence difference indicate that the former three groups have been of constant size over time, whereas the latter have expanded in size. We suggest that this reflects subsistence patterns in that the populations that have expanded in size are food producers whereas those that have not are hunters and gatherers. PMID:8755932
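A pairwise sequence distribution of the kind referred to above is a histogram of the number of differences between every pair of sequences in a sample. The following is a minimal sketch that assumes the sequences are already aligned to equal length and contain no gaps; the function name and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Sequence


def mismatch_distribution(seqs: Sequence[str]) -> Dict[int, int]:
    """Map each pairwise difference count to the number of pairs showing it."""
    counts: Counter = Counter()
    for a, b in combinations(seqs, 2):
        if len(a) != len(b):
            raise ValueError("sequences must be aligned to the same length")
        # Count the positions at which the two sequences differ.
        differences = sum(x != y for x, y in zip(a, b))
        counts[differences] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = ["AAAA", "AAAT", "AATT"]  # toy data, not from the study
    print(mismatch_distribution(sample))
```

The shape of the resulting histogram, together with the other statistics named in the abstract, is what such analyses use to distinguish populations of constant size from those that have expanded.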
Automated Identification of Medically Important Bacteria by 16S rRNA Gene Sequencing Using a Novel Comprehensive Database, 16SpathDB

PubMed Central

Woo, Patrick C. Y.; Teng, Jade L. L.; Yeung, Juilian M. Y.; Tse, Herman; Lau, Susanna K. P.; Yuen, Kwok-Yung

2011-01-01

Despite the increasing use of 16S rRNA gene sequencing, interpretation of 16S rRNA gene sequence results is one of the most difficult problems faced by clinical microbiologists and technicians. To overcome the problems we encountered in the existing databases during 16S rRNA gene sequence interpretation, we built a comprehensive database, 16SpathDB (http://147.8.74.24/16SpathDB) based on the 16S rRNA gene sequences of all medically important bacteria listed in the Manual of Clinical Microbiology and evaluated its use for automated identification of these bacteria. Among 91 nonduplicated bacterial isolates collected in our clinical microbiology laboratory, 71 (78%) were reported by 16SpathDB as a single bacterial species having >98.0% nucleotide identity with the query sequence, 19 (20.9%) were reported as more than one bacterial species having >98.0% nucleotide identity with the query sequence, and 1 (1.1%) was reported as no match. For the 71 bacterial isolates reported as a single bacterial species, all results were identical to their true identities as determined by a polyphasic approach. For the 19 bacterial isolates reported as more than one bacterial species, all results contained their true identities as determined by a polyphasic approach and all of them had their true identities as the “best match in 16SpathDB.” For the isolate (Gordonibacter pamelaeae) reported as no match, the bacterium has never been reported to be associated with human disease and was not included in the Manual of Clinical Microbiology. 16SpathDB is an automated, user-friendly, efficient, accurate, and regularly updated database for 16S rRNA gene sequence interpretation in clinical microbiology laboratories. PMID:21389154
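The three reporting outcomes described above (a single species, more than one species, or no match at >98.0% nucleotide identity) amount to a simple threshold rule. The sketch below illustrates only that rule; it is not the 16SpathDB implementation, and the function name, the input format, and the species labels in the example are hypothetical.

```python
from typing import Dict, List, Tuple


def classify_hits(hits: Dict[str, float],
                  threshold: float = 98.0) -> Tuple[str, List[str]]:
    """Classify a query from its percent identity to each database species.

    hits maps a species name to the percent nucleotide identity with the
    query. Species strictly above the threshold are returned best first.
    """
    above = sorted(
        (species for species, identity in hits.items() if identity > threshold),
        key=lambda species: (-hits[species], species),
    )
    if not above:
        return ("no match", [])
    if len(above) == 1:
        return ("single species", above)
    return ("multiple species", above)


if __name__ == "__main__":
    # Hypothetical identities for one query sequence.
    print(classify_hits({"species A": 99.5, "species B": 98.4, "species C": 95.0}))
```

In the multiple-species case the first element of the returned list plays the role of the "best match" mentioned in the abstract.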
Differences in the second internal transcribed spacer of four species of Nematodirus (Nematoda: Molineidae).

PubMed

Newton, L A; Chilton, N B; Beveridge, I; Gasser, R B

1998-02-01

Genetic differences among Nematodirus spathiger, Nematodirus filicollis, Nematodirus helvetianus and Nematodirus battus in the nucleotide sequence of the second internal transcribed spacer (ITS-2) of ribosomal DNA ranged from 3.9 to 24.7%. Pairwise comparisons of their ITS-2 sequences indicated that the most genetically similar species were N. spathiger and N. helvetianus. N. battus was the most genetically distinct species, with differences ranging from 22.8 to 24.7% with respect to the other three species. Some of the nucleotide differences among species provided different endonuclease restriction sites that could be used in restriction fragment length polymorphism studies. The ITS-2 sequence data may prove useful in studies of the systematics of molineid nematodes.
Impact of Sampling Density on the Extent of HIV Clustering

PubMed Central

Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

2014-01-01

Identifying and monitoring HIV clusters could be useful in tracking the leading edge of HIV transmission in epidemics. Currently, greater specificity in the definition of HIV clusters is needed to reduce confusion in the interpretation of HIV clustering results. We address sampling density as one of the key aspects of HIV cluster analysis. The proportion of viral sequences in clusters was estimated at sampling densities from 1.0% to 70%. A set of 1,248 HIV-1C env gp120 V1C5 sequences from a single community in Botswana was utilized in simulation studies. Matching numbers of HIV-1C V1C5 sequences from the LANL HIV Database were used as comparators. HIV clusters were identified by phylogenetic inference under bootstrapped maximum likelihood and pairwise distance cut-offs. Sampling density below 10% was associated with stochastic HIV clustering with broad confidence intervals. HIV clustering increased linearly at sampling density >10%, and was accompanied by narrowing confidence intervals. Patterns of HIV clustering were similar at bootstrap thresholds 0.7 to 1.0, but the extent of HIV clustering decreased with higher bootstrap thresholds. The origin of sampling (local concentrated vs. scattered global) had a substantial impact on HIV clustering at sampling densities ≥10%. Pairwise distances at 10% were estimated as a threshold for cluster analysis of HIV-1 V1C5 sequences. The node bootstrap support distribution provided additional evidence for 10% sampling density as the threshold for HIV cluster analysis. The detectability of HIV clusters is substantially affected by sampling density. A minimal genotyping density of 10% and sampling density of 50–70% are suggested for HIV-1 V1C5 cluster analysis. PMID:25275430
Genome Sequence Analysis of New Isolates of the Winona Strain of Plum pox virus and the First Definitive Evidence of Intrastrain Recombination Events.

PubMed

James, Delano; Sanderson, Dan; Varga, Aniko; Sheveleva, Anna; Chirkov, Sergei

2016-04-01

Plum pox virus (PPV) is genetically diverse with nine different strains identified. Mutations, indel events, and interstrain recombination events are known to contribute to the genetic diversity of PPV. This is the first report of intrastrain recombination events that contribute to PPV's genetic diversity. Fourteen isolates of the PPV strain Winona (W) were analyzed, including nine new strain W isolates sequenced completely in this study. Isolates of other strains of PPV with more than one isolate with the complete genome sequence available in GenBank were also included in this study for comparison and analysis. Five intrastrain recombination events were detected among the PPV W isolates, one among PPV C strain isolates, and one among PPV M strain isolates. Four (29%) of the PPV W isolates analyzed are recombinants, one of which (P2-1) is a mosaic, with three recombination events identified. A new interstrain recombinant event was identified between a strain M isolate and a strain Rec isolate, a known recombinant. In silico recombination studies and pairwise distance analyses of PPV strain D isolates indicate that a threshold of genetic diversity exists for the detectability of recombination events, in the range of approximately 0.78×10⁻² to 1.33×10⁻² mean pairwise distance. RDP4 analyses indicate that in the case of PPV Rec isolates there may be a recombinant breakpoint distinct from the obvious transition point of strain sequences. Evidence was obtained that indicates that the frequency of PPV recombination is underestimated, which may be true for other RNA viruses where low genetic diversity exists.
A pluggable framework for parallel pairwise sequence search.

PubMed

Archuleta, Jeremy; Feng, Wu-chun; Tilevich, Eli

2007-01-01

The current and near future of the computing industry is one of multi-core and multi-processor technology. Most existing sequence-search tools have been designed with a focus on single-core, single-processor systems. This discrepancy between software design and hardware architecture substantially hinders sequence-search performance by not allowing full utilization of the hardware. This paper presents a novel framework that will aid the conversion of serial sequence-search tools into a parallel version that can take full advantage of the available hardware. The framework, which is based on a software architecture called mixin layers with refined roles, enables modules to be plugged into the framework with minimal effort. The inherent modular design improves maintenance and extensibility, thus opening up a plethora of opportunities for advanced algorithmic features to be developed and incorporated while routine maintenance of the codebase persists.
Archaebacterial rhodopsin sequences: Implications for evolution

NASA Technical Reports Server (NTRS)

Lanyi, J. K.

1991-01-01

It was proposed over 10 years ago that the archaebacteria represent a separate kingdom which diverged very early from the eubacteria and eukaryotes. It follows that investigations of archaebacterial characteristics might reveal features of early evolution. So far, two genes, one for bacteriorhodopsin and another for halorhodopsin, both from Halobacterium halobium, have been sequenced. We cloned and sequenced the gene coding for the polypeptide of another one of these rhodopsins, a halorhodopsin in Natronobacterium pharaonis. Peptide sequencing of cyanogen bromide fragments, and immuno-reactions of the protein and synthetic peptides derived from the C-terminal gene sequence, confirmed that the open reading frame was the structural gene for the pharaonis halorhodopsin polypeptide. The flanking DNA sequences of this gene, as well as those of other bacterial rhodopsins, were compared to previously proposed archaebacterial consensus sequences. In pairwise comparisons of the open reading frame with DNA sequences for bacterio-opsin and halo-opsin from Halobacterium halobium, silent divergences were calculated. These indicate very considerable evolutionary distance between each pair of genes, even in the same organism. In spite of this, the three protein sequences show extensive similarities, indicating strong selective pressures.
Pairagon: a highly accurate, HMM-based cDNA-to-genome aligner.

PubMed

Lu, David V; Brown, Randall H; Arumugam, Manimozhiyan; Brent, Michael R

2009-07-01

The most accurate way to determine the intron-exon structures in a genome is to align spliced cDNA sequences to the genome. Thus, cDNA-to-genome alignment programs are a key component of most annotation pipelines. The scoring system used to choose the best alignment is a primary determinant of alignment accuracy, while heuristics that prevent consideration of certain alignments are a primary determinant of runtime and memory usage. Both accuracy and speed are important considerations in choosing an alignment algorithm, but scoring systems have received much less attention than heuristics. We present Pairagon, a pair hidden Markov model based cDNA-to-genome alignment program, as the most accurate aligner for sequences with high- and low-identity levels. We conducted a series of experiments testing alignment accuracy with varying sequence identity. We first created 'perfect' simulated cDNA sequences by splicing the sequences of exons in the reference genome sequences of fly and human. The complete reference genome sequences were then mutated to various degrees using a realistic mutation simulator and the perfect cDNAs were aligned to them using Pairagon and 12 other aligners. To validate these results with natural sequences, we performed cross-species alignment using orthologous transcripts from human, mouse and rat. We found that aligner accuracy is heavily dependent on sequence identity. For sequences with 100% identity, Pairagon achieved accuracy levels of >99.6%, with one quarter of the errors of any other aligner. Furthermore, for human/mouse alignments, which are only 85% identical, Pairagon achieved 87% accuracy, higher than any other aligner. Pairagon source and executables are freely available at http://mblab.wustl.edu/software/pairagon/
Taxonomic evaluation of Streptomyces albus and related species using multilocus sequence analysis

USDA-ARS's Scientific Manuscript database

In phylogenetic analyses of the genus Streptomyces using 16S rRNA gene sequences, Streptomyces albus subsp. albus NRRL B-1811T formed a cluster with 5 other species having identical or nearly identical 16S rRNA gene sequences. Moreover, the morphological and physiological characteristics of these ot...
Detecting authorized and unauthorized genetically modified organisms containing vip3A by real-time PCR and next-generation sequencing.

PubMed

Liang, Chanjuan; van Dijk, Jeroen P; Scholtens, Ingrid M J; Staats, Martijn; Prins, Theo W; Voorhuijzen, Marleen M; da Silva, Andrea M; Arisi, Ana Carolina Maisonnave; den Dunnen, Johan T; Kok, Esther J

2014-04-01

The growing number of biotech crops with novel genetic elements increasingly complicates the detection of genetically modified organisms (GMOs) in food and feed samples using conventional screening methods. Unauthorized GMOs (UGMOs) in food and feed are currently identified through combining GMO element screening with sequencing the DNA flanking these elements. In this study, a specific and sensitive qPCR assay was developed for vip3A element detection based on the vip3Aa20 coding sequences of the recently marketed MIR162 maize and COT102 cotton. Furthermore, SiteFinding-PCR in combination with Sanger, Illumina or Pacific BioSciences (PacBio) sequencing was performed targeting the flanking DNA of the vip3Aa20 element in MIR162. De novo assembly and Basic Local Alignment Search Tool searches were used to mimic UGMO identification. PacBio data resulted in relatively long contigs in the upstream (1,326 nucleotides (nt); 95 % identity) and downstream (1,135 nt; 92 % identity) regions, whereas Illumina data resulted in two smaller contigs of 858 and 1,038 nt with higher sequence identity (>99 % identity). Both approaches outperformed Sanger sequencing, underlining the potential for next-generation sequencing in UGMO identification.
Half-unit weighted bilinear algorithm for image contrast enhancement in capsule endoscopy

NASA Astrophysics Data System (ADS)

Rukundo, Olivier

2018-04-01

This paper proposes a novel enhancement method based exclusively on the bilinear interpolation algorithm for capsule endoscopy images. The proposed method does not convert the original RGB image components to HSV or any other color space or model; instead, it processes the RGB components directly. In each component, a group of four adjacent pixels and a half-unit weight in the bilinear weighting function are used to calculate the average pixel value, identical for each pixel in that particular group. After calculations, groups of identical pixels are overlapped successively in horizontal and vertical directions to achieve a preliminary-enhanced image. The final-enhanced image is achieved by halving the sum of the original and preliminary-enhanced image pixels. Quantitative and qualitative experiments were conducted focusing on pairwise comparisons between original and enhanced images. Final-enhanced images generally have the best diagnostic quality and give more detail about the visibility of vessels and structures in capsule endoscopy images.
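Two of the arithmetic steps in this abstract can be made concrete: bilinear weighting with half-unit weights reduces to the plain mean of the four pixels in a group, and the final pixel is half the sum of the original and preliminary values. The sketch below shows only these two steps for single pixel values. How overlapping groups are combined into the preliminary image is not specified in the abstract and is deliberately left out; the function names are hypothetical.

```python
def bilinear_sample(p00: float, p01: float, p10: float, p11: float,
                    wx: float, wy: float) -> float:
    """Standard bilinear weighting of a 2x2 pixel group.

    p00 and p01 form the top row, p10 and p11 the bottom row;
    wx and wy are the horizontal and vertical weights in [0, 1].
    """
    top = (1 - wx) * p00 + wx * p01
    bottom = (1 - wx) * p10 + wx * p11
    return (1 - wy) * top + wy * bottom


def final_pixel(original: float, preliminary: float) -> float:
    """Halve the sum of the original and preliminary-enhanced pixel values."""
    return (original + preliminary) / 2


if __name__ == "__main__":
    # With half-unit weights the result is the mean of the four pixels.
    group_value = bilinear_sample(10, 20, 30, 40, 0.5, 0.5)
    print(group_value, final_pixel(10, group_value))
```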
MetaSeq: privacy preserving meta-analysis of sequencing-based association studies.

PubMed

Singh, Angad Pal; Zafer, Samreen; Pe'er, Itsik

2013-01-01

Human genetics recently transitioned from GWAS to studies based on NGS data. For GWAS, small effects dictated large sample sizes, typically made possible through meta-analysis by exchanging summary statistics across consortia. NGS studies groupwise-test for association of multiple potentially-causal alleles along each gene. They are subject to similar power constraints and therefore likely to resort to meta-analysis as well. The problem arises when considering privacy of the genetic information during the data-exchange process. Many scoring schemes for NGS association rely on the frequency of each variant, thus requiring the exchange of the identity of the sequenced variant. Such variants are often rare, so exchanging them can reveal the identity of their carriers and jeopardize privacy. We have thus developed MetaSeq, a protocol for meta-analysis of genome-wide sequencing data by multiple collaborating parties, scoring association for rare variants pooled per gene across all parties. We tackle the challenge of tallying frequency counts of rare, sequenced alleles, for meta-analysis of sequencing data without disclosing the allele identity and counts, thereby protecting sample identity. This apparently paradoxical exchange of information is achieved through cryptographic means. The key idea is that parties encrypt the identity of genes and variants. When they transfer information about frequency counts in cases and controls, the exchanged data does not convey the identity of a mutation and therefore does not expose carrier identity. The exchange relies on a 3rd party, trusted to follow the protocol although not trusted to learn about the raw data. We show applicability of this method to publicly available exome-sequencing data from multiple studies, simulating phenotypic information for powerful meta-analysis. The MetaSeq software is publicly available as open source.
Genetic and Antigenic Evidence Supports the Separation of Hepatozoon canis and Hepatozoon americanum at the Species Level

PubMed Central

Baneth, Gad; Barta, John R.; Shkap, Varda; Martin, Donald S.; Macintire, Douglass K.; Vincent-Johnson, Nancy

2000-01-01

Recognition of Hepatozoon canis and Hepatozoon americanum as distinct species was supported by the results of Western immunoblotting of canine anti-H. canis and anti-H. americanum sera against H. canis gamonts. Sequence analysis of 368 bases near the 3′ end of the 18S rRNA gene from each species revealed a pairwise difference of 13.59%. PMID:10699047
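A pairwise difference such as the one reported above is the percentage of compared alignment positions at which two sequences differ. The following is a minimal sketch assuming the two sequences are already aligned and that columns containing a gap in either sequence are skipped; the abstract does not say how gaps were treated, and the function name is hypothetical.

```python
def pairwise_difference(seq_a: str, seq_b: str) -> float:
    """Return the percent of compared aligned positions that differ.

    Columns with a gap character ('-') in either sequence are skipped,
    which is an assumption of this sketch rather than the authors' method.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differing / compared


if __name__ == "__main__":
    print(pairwise_difference("ACGTACGT", "ACGTACGA"))
```

For scale, 50 differing positions out of 368 compared bases would give about 13.59%, although the abstract does not state the underlying count.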
Mean convergence theorems and weak laws of large numbers for weighted sums of random variables under a condition of weighted integrability

NASA Astrophysics Data System (ADS)

Ordóñez Cabrera, Manuel; Volodin, Andrei I.

2005-05-01

From the classical notion of uniform integrability of a sequence of random variables, a new concept of integrability (called h-integrability) is introduced for an array of random variables, concerning an array of constants. We prove that this concept is weaker than other previous related notions of integrability, such as Cesàro uniform integrability [Chandra, Sankhya Ser. A 51 (1989) 309-317], uniform integrability concerning the weights [Ordóñez Cabrera, Collect. Math. 45 (1994) 121-132] and Cesàro α-integrability [Chandra and Goswami, J. Theoret. Probab. 16 (2003) 655-669]. Under this condition of integrability and appropriate conditions on the array of weights, mean convergence theorems and weak laws of large numbers for weighted sums of an array of random variables are obtained when the random variables are subject to some special kinds of dependence: (a) rowwise pairwise negative dependence, (b) rowwise pairwise non-positive correlation, (c) when the sequence of random variables in every row is φ-mixing. Finally, we consider the general weak law of large numbers in the sense of Gut [Statist. Probab. Lett. 14 (1992) 49-52] under this new condition of integrability for a Banach space setting.
Molecular characterisation of Sarcocystis lutrae n. sp. and Toxoplasma gondii from the musculature of two Eurasian otters (Lutra lutra) in Norway.

PubMed

Gjerde, Bjørn; Josefsen, Terje D

2015-03-01

Sarcocysts were detected in routinely processed histological sections of skeletal muscle, but not cardiac muscle, of two adult male otters (Lutra lutra; Mustelidae) from northern Norway following their post-mortem examination in 1999 and 2000. The sarcocysts were slender, spindle-shaped, up to 970 μm long and 35-70 μm in greatest diameter. The sarcocyst wall was thin (∼ 0.5 μm) and smooth with no visible protrusions. Portions of unfixed diaphragm of both animals were collected at the autopsies and kept frozen for about 14 years pending further examination. When the study was resumed in 2013, the thawed muscle samples were examined for sarcocysts under a stereo microscope, but none could be found. Genomic DNA was therefore extracted from a total of 36 small pieces of the diaphragm from both otters, and samples found to contain Sarcocystidae DNA were used selectively for PCR amplification and sequencing of the nuclear 18S and 28S ribosomal (r) RNA genes and internal transcribed spacer 1 (ITS1) region, as well as the mitochondrial cytochrome b (cytb) and cytochrome c oxidase subunit 1 (cox1) genes. Sequence comparisons revealed that both otters were infected by the same Sarcocystis sp. and that there was no genetic variation (100 % identity) among sequenced isolates at the 18S and 28S rRNA genes (six identical isolates at both loci) or at cox1 (13 identical isolates). PCR products comprising the ITS1 region, on the other hand, had to be cloned before sequencing due to intraspecific sequence variation. A total of 33 clones were sequenced, and the identities between them were 97.9-99.9 %. These sequences were most similar (93.7-96.0 % identity) to a sequence of Sarcocystis kalvikus from the wolverine in Canada, but the phylogenetic analyses placed all of them as a monophyletic sister group to S. kalvikus. Hence, they were considered to represent a novel species, which was named Sarcocystis lutrae. 
Sequence comparisons and phylogenetic analyses based on sequences of the 18S and 28S rRNA genes and cox1, for which little or no sequence data were available for S. kalvikus, revealed that S. lutrae otherwise was most closely related to various Sarcocystis spp. using birds or carnivores as intermediate hosts. The cox1 sequences of S. lutrae from the otters were identical to two sequences from an arctic fox, which in a previous study had been assigned to Sarcocystis arctica due to a high identity (99.4 %) with the latter species at this gene and a complete identity with S. arctica at three other loci when using the same DNA samples as templates for PCR reactions. Additional PCR amplifications and sequencing of cox1 (ten sequences) and the ITS1 region (four sequences) using four DNA samples from this fox as templates again generated cox1 sequences exclusively of S. lutrae, but ITS1 sequences of S. arctica, and thus confirmed that this arctic fox had acted as intermediate host for both S. arctica and S. lutrae. Based on the phylogenetic placement of S. lutrae, the geographical location of infected animals (otters, arctic fox) and the distribution of carnivores/raptors which may have interacted with them, the white-tailed eagle (Haliaeetus albicilla) seems to be a possible definitive host of S. lutrae. Some of the muscle samples from both otters were shown to harbour stages of Toxoplasma gondii through PCR amplification and sequencing of the entire ITS1 region (five isolates) and/or the partial cytb (eight isolates) and cox1 (one isolate). These sequences were identical to several previous sequences of T. gondii in GenBank. Thus, both otters had a dual infection with S. lutrae and T. gondii.
Characterisation of Potential Antimicrobial Targets in Bacillus spp. I. Aminotransferases and Methionine Regeneration in Bacillus subtilis

DTIC Science & Technology

2002-07-01

DAAT and 45% identical to the Staphylococcus haemolyticus DAAT. The ybgE and ywaA sequences were found in the Illa subfamily, and were 59% identical to...halodurans BH1060 gene product. The two sequences also had a respective 40% and 37% identity to the Staphylococcus aureus SAV2560 gene product. The 6

Mitogenomes of Giant-Skipper Butterflies reveal an ancient split between deep and shallow root feeders.

PubMed

Zhang, Jing; Cong, Qian; Fan, Xiao-Ling; Wang, Rongjiang; Wang, Min; Grishin, Nick V

2017-01-01

Background: Giant-Skipper butterflies from the genus Megathymus are North American endemics. These large and thick-bodied Skippers resemble moths and are unique in their life cycles. Grub-like at the later stages of development, caterpillars of these species feed and live inside yucca roots. Adults do not feed and are mostly local, not straying far from the patches of yucca plants. Methods: Pieces of muscle were dissected from the thorax of specimens and genomic DNA was extracted (also from the abdomen of a specimen collected nearly 60 years ago). Paired-end libraries were prepared and sequenced for 150 bp from both ends. The mitogenomes were assembled from the reads followed by a manual gap-closing procedure, and a phylogenetic tree was constructed using a maximum likelihood method from an alignment of the mitogenomes. Results: We determined mitogenome sequences of nominal subspecies of all five known species of Megathymus and Agathymus mariae to confidently root the phylogenetic tree. Pairwise sequence identity indicates high similarity, ranging from 88% to 96% among coding regions for 13 proteins, 22 tRNAs and 2 rRNAs, with a gene order typical for mitogenomes of Lepidoptera. Phylogenetic analysis confirms that Giant-Skippers (Megathymini) originate within the subfamily Hesperiinae and do not warrant a subfamily rank. Genus Megathymus is monophyletic and splits into two species groups. M. streckeri and M. cofaqui caterpillars feed deep in the main root system of yucca plants and deposit frass underground. M. ursus, M. beulahae and M. yuccae feed in the yucca caudex and roots near the ground, and deposit frass outside through a "tent" (a silk tube projecting from the center of the yucca plant). M. yuccae and M. beulahae are sister species, consistent with the morphological similarities between them. Conclusions: We constructed the first DNA-based phylogeny of the genus Megathymus from their mitogenomes. The phylogeny agrees with morphological considerations.
Analysis of the Sarcocystis neurona microneme protein SnMIC10: protein characteristics and expression during intracellular development.

PubMed

Hoane, Jessica S; Carruthers, Vernon B; Striepen, Boris; Morrison, David P; Entzeroth, Rolf; Howe, Daniel K

2003-07-01

Sarcocystis neurona, an apicomplexan parasite, is the primary causative agent of equine protozoal myeloencephalitis. Like other members of the Apicomplexa, S. neurona zoites possess secretory organelles that contain proteins necessary for host cell invasion and intracellular survival. From a collection of S. neurona expressed sequence tags, we identified a sequence encoding a putative microneme protein based on similarity to Toxoplasma gondii MIC10 (TgMIC10). Pairwise sequence alignments of SnMIC10 to TgMIC10 and NcMIC10 from Neospora caninum revealed approximately 33% identity to both orthologues. The open reading frame of the S. neurona gene encodes a 255 amino acid protein with a predicted 39-residue signal peptide. Like TgMIC10 and NcMIC10, SnMIC10 is predicted to be hydrophilic, highly alpha-helical in structure, and devoid of identifiable adhesive domains. Antibodies raised against recombinant SnMIC10 recognised a protein band with an apparent molecular weight of 24 kDa in Western blots of S. neurona merozoites, consistent with the size predicted for SnMIC10. In vitro secretion assays demonstrated that this protein is secreted by extracellular merozoites in a temperature-dependent manner. Indirect immunofluorescence analysis of SnMIC10 showed a polar labelling pattern, which is consistent with the apical position of the micronemes, and immunoelectron microscopy provided definitive localisation of the protein to these secretory organelles. Further analysis of SnMIC10 in intracellular parasites revealed that expression of this protein is temporally regulated during endopolygeny, supporting the view that micronemes are only needed during host cell invasion. Collectively, the data indicate that SnMIC10 is a microneme protein that is part of the excreted/secreted antigen fraction of S. neurona. Identification and characterisation of additional S. neurona microneme antigens and comparisons to orthologues in other Apicomplexa could provide further insight into the functions that these proteins serve during invasion of host cells.
Evolution of puma lentivirus in bobcats (Lynx rufus) and mountain lions (Puma concolor) in North America.

PubMed

Lee, Justin S; Bevins, Sarah N; Serieys, Laurel E K; Vickers, Winston; Logan, Ken A; Aldredge, Mat; Boydston, Erin E; Lyren, Lisa M; McBride, Roy; Roelke-Parker, Melody; Pecon-Slattery, Jill; Troyer, Jennifer L; Riley, Seth P; Boyce, Walter M; Crooks, Kevin R; VandeWoude, Sue

2014-07-01

Mountain lions (Puma concolor) throughout North and South America are infected with puma lentivirus clade B (PLVB). A second, highly divergent lentiviral clade, PLVA, infects mountain lions in southern California and Florida. Bobcats (Lynx rufus) in these two geographic regions are also infected with PLVA, and to date, this is the only strain of lentivirus identified in bobcats. We sequenced full-length PLV genomes in order to characterize the molecular evolution of PLV in bobcats and mountain lions. Low sequence homology (88% average pairwise identity) and frequent recombination (1 recombination breakpoint per 3 isolates analyzed) were observed in both clades. Viral proteins have markedly different patterns of evolution; sequence homology and negative selection were highest in Gag and Pol and lowest in Vif and Env. A total of 1.7% of sites across the PLV genome evolve under positive selection, indicating that host-imposed selection pressure is an important force shaping PLV evolution. PLVA strains are highly spatially structured, reflecting the population dynamics of their primary host, the bobcat. In contrast, the phylogeography of PLVB reflects the highly mobile mountain lion, with diverse PLVB isolates cocirculating in some areas and genetically related viruses being present in populations separated by thousands of kilometers. We conclude that PLVA and PLVB are two different viral species with distinct feline hosts and evolutionary histories. Importance: An understanding of viral evolution in natural host populations is a fundamental goal of virology, molecular biology, and disease ecology. Here we provide a detailed analysis of puma lentivirus (PLV) evolution in two natural carnivore hosts, the bobcat and mountain lion. Our results illustrate that PLV evolution is a dynamic process that results from high rates of viral mutation/recombination and host-imposed selection pressure. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
Phylogenomics and Divergence Dating of Fungus-Farming Ants (Hymenoptera: Formicidae) of the Genera Sericomyrmex and Apterostigma.

PubMed

Ješovnik, Ana; González, Vanessa L; Schultz, Ted R

2016-01-01

Fungus-farming ("attine") ants are model systems for studies of symbiosis, coevolution, and advanced eusociality. A New World clade of nearly 300 species in 15 genera, all attine ants cultivate fungal symbionts for food. In order to better understand the evolution of ant agriculture, we sequenced, assembled, and analyzed transcriptomes of four different attine ant species in two genera: three species in the higher-attine genus Sericomyrmex and a single lower-attine ant species, Apterostigma megacephala, representing the first genomic data for either genus. These data were combined with published genomes of nine other ant species and the honey bee Apis mellifera for phylogenomic and divergence-dating analyses. The resulting phylogeny confirms relationships inferred in previous studies of fungus-farming ants. Divergence-dating analyses recovered slightly older dates than most prior analyses, estimating that attine ants originated 53.6-66.7 million years ago, and recovered a very long branch subtending a very recent, rapid radiation of the genus Sericomyrmex. This result is further confirmed by a separate analysis of the three Sericomyrmex species, which reveals that 92.71% of orthologs have 99%-100% pairwise-identical nucleotide sequences. We searched the transcriptomes for genes of interest, most importantly argininosuccinate synthase and argininosuccinate lyase, which are functional in other ants but which are known to have been lost in seven previously studied attine ant species. Loss of the ability to produce the amino acid arginine has been hypothesized to contribute to the obligate dependence of attine ants upon their cultivated fungi, but the point in fungus-farming ant evolution at which these losses occurred has remained unknown. We did not find these genes in any of the sequenced transcriptomes. Although expected for Sericomyrmex species, the absence of arginine anabolic genes in the lower-attine ant Apterostigma megacephala strongly suggests that the loss coincided with the origin of attine ants.
A statistical view of FMRFamide neuropeptide diversity.

PubMed

Espinoza, E; Carrigan, M; Thomas, S G; Shaw, G; Edison, A S

2000-01-01

FMRFamide-like peptide (FLP) amino acid sequences have been collected and statistically analyzed. FLP amino acid composition as a function of position in the peptide is graphically presented for several major phyla. Results of total amino acid composition and frequencies of pairs of FLP amino acids have been computed and compared with corresponding values from the entire GenBank protein sequence database. The data for pairwise distributions of amino acids should help in future structure-function studies of FLPs. To aid in future peptide discovery, a computer program and search protocol were developed to identify FLPs from the GenBank protein database without the use of keywords.
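The position-wise composition and pair-frequency statistics described in this abstract can be illustrated with a minimal sketch. It is not the authors' program: the function names are hypothetical, positions are counted from the C-terminus as an assumption (FLPs share a C-terminal motif), "pairs" are assumed to mean adjacent residues, and the peptides in the usage lines are toy inputs.

```python
from collections import Counter


def positional_composition(peptides):
    """Count residues at each position, indexed from the C-terminus.

    Position 0 is the last residue of each peptide (an assumption made
    here so that peptides of different lengths line up at their C-terminal end).
    """
    max_len = max(len(p) for p in peptides)
    counts = [Counter() for _ in range(max_len)]
    for peptide in peptides:
        for pos, residue in enumerate(reversed(peptide)):
            counts[pos][residue] += 1
    return counts


def adjacent_pair_frequencies(peptides):
    """Relative frequency of each adjacent residue pair across all peptides."""
    pairs = Counter()
    for peptide in peptides:
        pairs.update(a + b for a, b in zip(peptide, peptide[1:]))
    total = sum(pairs.values())
    return {pair: n / total for pair, n in pairs.items()}


# Toy input only; not data from the study.
toy = ["FMRF", "FLRF"]
composition = positional_composition(toy)
pair_freqs = adjacent_pair_frequencies(toy)
```

Comparing such frequencies against a background database, as the abstract describes, would additionally require the same counts computed over that database.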
The recent emergence in hospitals of multidrug-resistant community-associated sequence type 1 and spa type t127 methicillin-resistant Staphylococcus aureus investigated by whole-genome sequencing: Implications for screening

PubMed Central

Earls, Megan R.; Kinnevey, Peter M.; Brennan, Gráinne I.; Lazaris, Alexandros; Skally, Mairead; O’Connell, Brian; Humphreys, Hilary; Shore, Anna C.

2017-01-01

Community-associated spa type t127/t922 methicillin-resistant Staphylococcus aureus (MRSA) prevalence increased from 1% to 7% in Ireland between 2010 and 2015. This study tracked the spread of 89 such isolates from June 2013 to June 2016. These included 78 healthcare-associated and 11 community-associated MRSA isolates from a prolonged hospital outbreak (H1) (n = 46), 16 other hospitals (n = 28), four other healthcare facilities (n = 4) and community-associated sources (n = 11). Isolates underwent antimicrobial susceptibility testing, DNA microarray profiling and whole-genome sequencing. Minimum spanning trees were generated following core-genome multilocus sequence typing and pairwise single nucleotide variation (SNV) analysis was performed. All isolates were sequence type 1 MRSA staphylococcal cassette chromosome mec type IV (ST1-MRSA-IV) and 76/89 were multidrug-resistant. Fifty isolates, including 40/46 from H1, were high-level mupirocin-resistant, carrying a conjugative 39 kb iles2-encoding plasmid. Two closely related ST1-MRSA-IV strains (I and II) and multiple sporadic strains were identified. Strain I isolates (57/89), including 43/46 H1 and all high-level mupirocin-resistant isolates, exhibited ≤80 SNVs. Two strain I isolates from separate H1 healthcare workers differed from other H1/strain I isolates by 7–47 and 12–53 SNVs, respectively, indicating healthcare worker involvement in this outbreak. Strain II isolates (19/89), including the remaining H1 isolates, exhibited ≤127 SNVs. For each strain, the pairwise SNVs exhibited by healthcare-associated and community-associated isolates indicated recent transmission of ST1-MRSA-IV within and between multiple hospitals, healthcare facilities and communities in Ireland. Given the interchange between healthcare-associated and community-associated isolates in hospitals, the risk factors that inform screening for MRSA require revision. PMID:28399151
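As a rough illustration of what a pairwise SNV comparison involves, the sketch below counts differing positions between already-aligned sequences. It is an assumption-laden simplification, not the pipeline used in the study: the function names and isolate labels are hypothetical, the inputs are assumed to be equal-length core-genome alignments, and positions with an ambiguous base or gap in either sequence are skipped.

```python
from itertools import combinations


def snv_distance(seq_a, seq_b, ignore="N-"):
    """Count positions where two aligned sequences differ.

    Positions where either sequence has a character in `ignore`
    (ambiguous base or gap) are not counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in ignore and b not in ignore
    )


def pairwise_snv_matrix(isolates):
    """Map each unordered pair of isolate names to its SNV count."""
    return {
        (x, y): snv_distance(isolates[x], isolates[y])
        for x, y in combinations(sorted(isolates), 2)
    }


# Toy aligned sequences; labels and bases are illustrative only.
toy = {"iso1": "ACGTAC", "iso2": "ACGTAA", "iso3": "ANGTTA"}
distances = pairwise_snv_matrix(toy)
```

A threshold such as the ≤80 SNVs reported for strain I would then be applied to these pairwise counts to group isolates.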
The nucleotide sequences of 5S rRNAs from a fern Dryopteris acuminata and a horsetail Equisetum arvense.

PubMed Central

Hori, H; Osawa, S; Takaiwa, F; Sugiura, M

1984-01-01

The nucleotide sequences of 5S rRNAs from two Pteridophyta species, a fern Dryopteris acuminata and a horsetail Equisetum arvense, have been determined. These two sequences are more closely related to those of the Bryophyta species (88% identity on average) than to those of seed plants (84% identity on average). PMID:6538332
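Average identity figures of this kind come from pairwise comparisons of aligned sequences. The sketch below shows one common way to compute them; it assumes the sequences are already aligned, and it uses one of several possible denominators (all columns except those where both sequences have a gap). The function names are hypothetical, and the abstract does not state which convention the authors used.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of compared alignment columns with identical residues.

    Columns where both sequences carry a gap are skipped; a gap opposite
    a residue counts as a mismatch (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def mean_identity(queries, references):
    """Average percent identity over every query/reference pair."""
    values = [percent_identity(q, r) for q in queries for r in references]
    return sum(values) / len(values)
```

For example, `percent_identity("ACGU", "ACGA")` compares four columns, three of which match, giving 75.0.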
ExprAlign - the identification of ESTs in non-model species by alignment of cDNA microarray expression profiles

PubMed Central

2009-01-01

Background: Sequence identification of ESTs from non-model species offers distinct challenges particularly when these species have duplicated genomes and when they are phylogenetically distant from sequenced model organisms. For the common carp, an environmental model of aquacultural interest, large numbers of ESTs remained unidentified using BLAST sequence alignment. We have used the expression profiles from large-scale microarray experiments to suggest gene identities. Results: Expression profiles from ~700 cDNA microarrays describing responses of 7 major tissues to multiple environmental stressors were used to define a co-expression landscape. This was based on the Pearson correlation coefficient relating each gene with all other genes, from which a network description provided clusters of highly correlated genes as 'mountains'. We show that these contain genes with known identities and genes with unknown identities, and that the correlation constitutes evidence of identity in the latter. This procedure has suggested identities for 522 of 2701 unknown carp EST sequences. We also discriminate several common carp genes and gene isoforms that were not discriminated by BLAST sequence alignment alone. Precision in identification was substantially improved by use of data from multiple tissues and treatments. Conclusion: The detailed analysis of co-expression landscapes is a sensitive technique for suggesting an identity for the large number of BLAST unidentified cDNAs generated in EST projects. It is capable of detecting even subtle changes in expression profiles, and thereby of distinguishing genes with a common BLAST identity into different identities. It benefits from the use of multiple treatments or contrasts, and from the large-scale microarray data. PMID:19939286
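The core idea, relating each gene's expression profile to every other gene's by a correlation coefficient, can be sketched as follows. This is a deliberate simplification and not ExprAlign itself: it skips the network clustering step described in the abstract and simply reports, for each unidentified gene, the identified gene with the highest Pearson correlation. The function name, gene labels, and expression values are hypothetical.

```python
import numpy as np


def best_correlated_known(expr, names, known):
    """For each gene not in `known`, find the known gene whose expression
    profile has the highest Pearson correlation with it.

    expr  : 2-D array, one row per gene, one column per array/condition
    names : gene labels, in row order
    known : set of labels that already have an identity
    """
    corr = np.corrcoef(np.asarray(expr, dtype=float))  # gene-by-gene matrix
    known_idx = [j for j, name in enumerate(names) if name in known]
    result = {}
    for i, name in enumerate(names):
        if name in known:
            continue
        best = max(known_idx, key=lambda j: corr[i, j])
        result[name] = (names[best], float(corr[i, best]))
    return result


# Toy profiles across four conditions; values are illustrative only.
expr = [
    [1.0, 2.0, 3.0, 4.0],   # gene_a (identified)
    [2.0, 4.0, 6.0, 8.0],   # est_1  (unidentified)
    [4.0, 3.0, 2.0, 1.0],   # gene_b (identified)
]
names = ["gene_a", "est_1", "gene_b"]
suggestions = best_correlated_known(expr, names, known={"gene_a", "gene_b"})
```

A high correlation here is only suggestive of shared identity or regulation; the abstract notes that precision improved when many tissues and treatments were included.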
Population Expansion and Genetic Structure in Carcharhinus brevipinna in the Southern Indo-Pacific

PubMed Central

Geraghty, Pascal T.; Williamson, Jane E.; Macbeth, William G.; Wintner, Sabine P.; Harry, Alastair V.; Ovenden, Jennifer R.; Gillings, Michael R.

2013-01-01

Background: Quantifying genetic diversity and metapopulation structure provides insights into the evolutionary history of a species and helps develop appropriate management strategies. We provide the first assessment of genetic structure in spinner sharks (Carcharhinus brevipinna), a large cosmopolitan carcharhinid, sampled from eastern and northern Australia and South Africa. Methods and Findings: Sequencing of the mitochondrial DNA NADH dehydrogenase subunit 4 gene for 430 individuals revealed 37 haplotypes and moderately high haplotype diversity (h = 0.6770 ± 0.025). While two metrics of genetic divergence (ΦST and FST) revealed somewhat different results, subdivision was detected between South Africa and all Australian locations (pairwise ΦST, range 0.02717–0.03508, p values ≤ 0.0013; pairwise FST South Africa vs New South Wales = 0.04056, p = 0.0008). Evidence for fine-scale genetic structuring was also detected along Australia’s east coast (pairwise ΦST = 0.01328, p < 0.015), and between south-eastern and northern locations (pairwise ΦST = 0.00669, p < 0.04). Conclusions: The Indian Ocean represents a robust barrier to contemporary gene flow in C. brevipinna between Australia and South Africa. Gene flow also appears restricted along a continuous continental margin in this species, with data tentatively suggesting the delineation of two management units within Australian waters. Further sampling, however, is required for a more robust evaluation of the latter finding. Evidence indicates that all sampled populations were shaped by a substantial demographic expansion event, with the resultant high genetic diversity being cause for optimism when considering conservation of this commercially-targeted species in the southern Indo-Pacific. PMID:24086462
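Haplotype diversity (h) is commonly computed with Nei's unbiased estimator, h = n/(n − 1) × (1 − Σ p_i²), where n is the number of sampled individuals and p_i is the frequency of haplotype i. The abstract does not say which estimator or software was used, so the sketch below is an illustration under that assumption; the function name and the toy labels are hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity for a list of haplotype labels.

    h = n / (n - 1) * (1 - sum(p_i ** 2)), with p_i the sample frequency
    of haplotype i and n the number of individuals.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two individuals are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Toy sample: two haplotypes at equal frequency among four individuals.
h = haplotype_diversity(["H1", "H1", "H2", "H2"])
```

The value is 0 when every individual carries the same haplotype and approaches 1 when nearly every individual carries a different one.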
Complete sequence analysis reveals two distinct poleroviruses infecting cucurbits in China.

PubMed

Xiang, Hai-ying; Shang, Qiao-xia; Han, Cheng-gui; Li, Da-wei; Yu, Jia-lin

2008-01-01

The complete RNA genomes of a Chinese isolate of cucurbit aphid-borne yellows virus (CABYV-CHN) and a new polerovirus tentatively referred to as melon aphid-borne yellows virus (MABYV) were determined. The entire genome of CABYV-CHN shared 89.0% nucleotide sequence identity with the French CABYV isolate. In contrast, nucleotide sequence identities between MABYV and CABYV and other poleroviruses were in the range of 50.7-74.2%, with amino acid sequence identities ranging from 24.8 to 82.9% for individual gene products. We propose that CABYV-CHN is a strain of CABYV and that MABYV is a member of a tentative distinct species within the genus Polerovirus.
TIA: algorithms for development of identity-linked SNP islands for analysis by massively parallel DNA sequencing.

PubMed

Farris, M Heath; Scott, Andrew R; Texter, Pamela A; Bartlett, Marta; Coleman, Patricia; Masters, David

2018-04-11

Single nucleotide polymorphisms (SNPs) located within the human genome have been shown to have utility as markers of identity in the differentiation of DNA from individual contributors. Massively parallel DNA sequencing (MPS) technologies and human genome SNP databases allow for the design of suites of identity-linked target regions, amenable to sequencing in a multiplexed and massively parallel manner. Therefore, tools are needed for leveraging the genotypic information found within SNP databases for the discovery of genomic targets that can be evaluated on MPS platforms. The SNP island target identification algorithm (TIA) was developed as a user-tunable system to leverage SNP information within databases. Using data within the 1000 Genomes Project SNP database, human genome regions were identified that contain globally ubiquitous identity-linked SNPs and that were responsive to targeted resequencing on MPS platforms. Algorithmic filters were used to exclude target regions that did not conform to user-tunable SNP island target characteristics. To validate the accuracy of TIA for discovering these identity-linked SNP islands within the human genome, SNP island target regions were amplified from 70 contributor genomic DNA samples using the polymerase chain reaction. Multiplexed amplicons were sequenced using the Illumina MiSeq platform, and the resulting sequences were analyzed for SNP variations. 166 putative identity-linked SNPs were targeted in the identified genomic regions. Of the 309 SNPs that provided discerning power across individual SNP profiles, 74 previously undefined SNPs were identified during evaluation of targets from individual genomes. Overall, DNA samples of 70 individuals were uniquely identified using a subset of the suite of identity-linked SNP islands. TIA offers a tunable genome search tool for the discovery of targeted genomic regions that are scalable in the population frequency and numbers of SNPs contained within the SNP island regions. 
It also allows the definition of sequence length and sequence variability of the target region as well as the less variable flanking regions for tailoring to MPS platforms. As shown in this study, TIA can be used to discover identity-linked SNP islands within the human genome, useful for differentiating individuals by targeted resequencing on MPS technologies.
Terminal region sequence variations in variola virus DNA.

PubMed

Massung, R F; Loparev, V N; Knight, J C; Totmenin, A V; Chizhikov, V E; Parsons, J M; Safronov, P F; Gutorov, V V; Shchelkunov, S N; Esposito, J J

1996-07-15

Genome DNA terminal region sequences were determined for a Brazilian alastrim variola minor virus strain Garcia-1966 that was associated with an 0.8% case-fatality rate and African smallpox strains Congo-1970 and Somalia-1977 associated with variola major (9.6%) and minor (0.4%) mortality rates, respectively. A base sequence identity of ≥ 98.8% was determined after aligning 30 kb of the left- or right-end region sequences with cognate sequences previously determined for Asian variola major strains India-1967 (31% death rate) and Bangladesh-1975 (18.5% death rate). The deduced amino acid sequences of putative proteins of ≥ 65 amino acids also showed relatively high identity, although the Asian and African viruses were clearly more related to each other than to alastrim virus. Alastrim virus contained only 10 of 70 proteins that were 100% identical to homologs in Asian strains, and 7 alastrim-specific proteins were noted.
Effects of learning with explicit elaboration on implicit transfer of visuomotor sequence learning.

PubMed

Tanaka, Kanji; Watanabe, Katsumi

2013-08-01

Intervals between stimuli and/or responses have significant influences on sequential learning. In the present study, we investigated whether transfer would occur even when the intervals and the visual configurations in a sequence were drastically changed so that participants did not notice that the required sequences of responses were identical. In the experiment, two (or three) sequential button presses comprised a "set," and nine (or six) consecutive sets comprised a "hyperset." In the first session, participants learned either a 2 × 9 or 3 × 6 hyperset by trial and error until they completed it 20 times without error. In the second block, the 2 × 9 (3 × 6) hyperset was changed into the 3 × 6 (2 × 9) hyperset, resulting in different visual configurations and intervals between stimuli and responses. Participants were assigned into two groups: the Identical and Random groups. In the Identical group, the sequence (i.e., the buttons to be pressed) in the second block was identical to that in the first block. In the Random group, a new hyperset was learned. Even in the Identical group, no participants noticed that the sequences were identical. Nevertheless, a significant transfer of performance occurred. However, in the subsequent experiment that did not require explicit trial-and-error learning in the first session, implicit transfer in the second session did not occur. These results indicate that learning with explicit elaboration strengthens the implicit representation of the sequence order as a whole; this might occur independently of the intervals between elements and enable implicit transfer.
Sequence and structural implications of a bovine corneal keratan sulfate proteoglycan core protein. Protein 37B represents bovine lumican and proteins 37A and 25 are unique

NASA Technical Reports Server (NTRS)

Funderburgh, J. L.; Funderburgh, M. L.; Brown, S. J.; Vergnes, J. P.; Hassell, J. R.; Mann, M. M.; Conrad, G. W.; Spooner, B. S. (Principal Investigator)

1993-01-01

Amino acid sequence from tryptic peptides of three different bovine corneal keratan sulfate proteoglycan (KSPG) core proteins (designated 37A, 37B, and 25) showed similarities to the sequence of a chicken KSPG core protein lumican. Bovine lumican cDNA was isolated from a bovine corneal expression library by screening with chicken lumican cDNA. The bovine cDNA codes for a 342-amino acid protein, M(r) 38,712, containing amino acid sequences identified in the 37B KSPG core protein. The bovine lumican is 68% identical to chicken lumican, with an 83% identity excluding the N-terminal 40 amino acids. Location of 6 cysteine and 4 consensus N-glycosylation sites in the bovine sequence were identical to those in chicken lumican. Bovine lumican had about 50% identity to bovine fibromodulin and 20% identity to bovine decorin and biglycan. About two-thirds of the lumican protein consists of a series of 10 amino acid leucine-rich repeats that occur in regions of calculated high beta-hydrophobic moment, suggesting that the leucine-rich repeats contribute to beta-sheet formation in these proteins. Sequences obtained from 37A and 25 core proteins were absent in bovine lumican, thus predicting a unique primary structure and separate mRNA for each of the three bovine KSPG core proteins.
GMPR: A robust normalization method for zero-inflated count data with application to microbiome sequencing data.

PubMed

Chen, Li; Reeve, James; Zhang, Lujun; Huang, Shengbing; Wang, Xuefeng; Chen, Jun

2018-01-01

Normalization is the first critical step in microbiome sequencing data analysis used to account for variable library sizes. Current RNA-Seq based normalization methods that have been adapted for microbiome data fail to consider the unique characteristics of microbiome data, which contain a vast number of zeros due to the physical absence or under-sampling of the microbes. Normalization methods that specifically address the zero-inflation remain largely undeveloped. Here we propose geometric mean of pairwise ratios (GMPR), a simple but effective normalization method for zero-inflated sequencing data such as microbiome data. Simulation studies and analyses of real datasets demonstrate that the proposed method is more robust than competing methods, leading to more powerful detection of differentially abundant taxa and higher reproducibility of the relative abundances of taxa.
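The name of the method suggests its structure: for each pair of samples, take the median of the count ratios over taxa that are nonzero in both, then combine each sample's pairwise ratios by a geometric mean to obtain its size factor. The sketch below follows that reading and should be treated as an illustration rather than the authors' implementation. The function name and the `min_shared` parameter are hypothetical, and the sketch includes each sample's ratio with itself (which is 1) in the geometric mean.

```python
import numpy as np


def gmpr_size_factors(counts, min_shared=1):
    """Size factors from the geometric mean of pairwise median ratios.

    counts     : 2-D array, one row per sample, one column per taxon
    min_shared : minimum number of taxa nonzero in both samples for a
                 pair to contribute (hypothetical tuning parameter)
    """
    counts = np.asarray(counts, dtype=float)
    n_samples = counts.shape[0]
    factors = np.full(n_samples, np.nan)
    for i in range(n_samples):
        log_ratios = []
        for j in range(n_samples):
            # Use only taxa observed in both samples, which sidesteps zeros.
            shared = (counts[i] > 0) & (counts[j] > 0)
            if shared.sum() < min_shared:
                continue
            median_ratio = np.median(counts[i, shared] / counts[j, shared])
            log_ratios.append(np.log(median_ratio))
        if log_ratios:
            # Geometric mean computed as exp of the mean log ratio.
            factors[i] = np.exp(np.mean(log_ratios))
    return factors


# Toy table: the second sample is sequenced twice as deeply as the first.
toy = [[10, 0, 20, 30],
       [20, 40, 0, 60]]
factors = gmpr_size_factors(toy)
normalized = np.asarray(toy, dtype=float) / factors[:, None]
```

Dividing each sample's counts by its size factor puts the shared taxa on a common scale; in the toy table the two size factors differ by a factor of two, matching the difference in depth.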
Molecular epidemiology of Plum pox virus in Japan.

PubMed

Maejima, Kensaku; Himeno, Misako; Komatsu, Ken; Takinami, Yusuke; Hashimoto, Masayoshi; Takahashi, Shuichiro; Yamaji, Yasuyuki; Oshima, Kenro; Namba, Shigetou

2011-05-01

For a molecular epidemiological study based on complete genome sequences, 37 Plum pox virus (PPV) isolates were collected from the Kanto region in Japan. Pair-wise analyses revealed that all 37 Japanese isolates belong to the PPV-D strain, with low genetic diversity (less than 0.8%). In phylogenetic analysis of the PPV-D strain based on complete nucleotide sequences, the relationships of the PPV-D strain were reconstructed with high resolution: at the global level, the American, Canadian, and Japanese isolates formed their own distinct monophyletic clusters, suggesting that the routes of viral entry into these countries were independent; at the local level, the actual transmission histories of PPV were precisely reconstructed with high bootstrap support. This is the first description of the molecular epidemiology of PPV based on complete genome sequences.
Conserved features of eukaryotic hsp70 genes revealed by comparison with the nucleotide sequence of human hsp70.

PubMed Central

Hunt, C; Morimoto, R I

1985-01-01

We have determined the nucleotide sequence of the human hsp70 gene and 5' flanking region. The hsp70 gene is transcribed as an uninterrupted primary transcript of 2440 nucleotides composed of a 5' noncoding leader sequence of 212 nucleotides, a 3' noncoding region of 242 nucleotides, and a continuous open reading frame of 1986 nucleotides that encodes a protein with predicted molecular mass of 69,800 daltons. Upstream of the 5' terminus are the canonical TATAAA box, the sequence ATTGG that corresponds in the inverted orientation to the CCAAT motif, and the dyad sequence CTGGAAT/ATTCCCG that shares homology in 12 of 14 positions with the consensus transcription regulatory sequence common to Drosophila heat shock genes. Comparison of the predicted amino acid sequences of human hsp70 with the published sequences of Drosophila hsp70 and Escherichia coli dnaK reveals that human hsp70 is 73% identical to Drosophila hsp70 and 47% identical to E. coli dnaK. Surprisingly, the nucleotide sequences of the human and Drosophila genes are 72% identical and human and E. coli genes are 50% identical, which is more highly conserved than necessary given the degeneracy of the genetic code. The lack of accumulated silent nucleotide substitutions leads us to propose that there may be additional information in the nucleotide sequence of the hsp70 gene or the corresponding mRNA that precludes the maximum divergence allowed in the silent codon positions. PMID:3931075
Ecuador Paraiso Escondido Virus, a New Flavivirus Isolated from New World Sand Flies in Ecuador, Is the First Representative of a Novel Clade in the Genus Flavivirus.

PubMed

Alkan, Cigdem; Zapata, Sonia; Bichaud, Laurence; Moureau, Grégory; Lemey, Philippe; Firth, Andrew E; Gritsun, Tamara S; Gould, Ernest A; de Lamballerie, Xavier; Depaquit, Jérôme; Charrel, Rémi N

2015-12-01

A new flavivirus, Ecuador Paraiso Escondido virus (EPEV), named after the village where it was discovered, was isolated from sand flies (Psathyromyia abonnenci, formerly Lutzomyia abonnenci) that are unique to the New World. This represents the first sand fly-borne flavivirus identified in the New World. EPEV exhibited a typical flavivirus genome organization. Nevertheless, the maximum pairwise amino acid sequence identity with currently recognized flaviviruses was 52.8%. Phylogenetic analysis of the complete coding sequence showed that EPEV represents a distinct clade which diverged from a lineage that was ancestral to the nonvectored flaviviruses Entebbe bat virus, Yokose virus, and Sokoluk virus and also the Aedes-associated mosquito-borne flaviviruses, which include yellow fever virus, Sepik virus, Saboya virus, and others. EPEV replicated in C6/36 mosquito cells, yielding high infectious titers, but failed to reproduce either in vertebrate cell lines (Vero, BHK, SW13, and XTC cells) or in suckling mouse brains. This surprising result, which appears to eliminate an association with vertebrate hosts in the life cycle of EPEV, is discussed in the context of the evolutionary origins of EPEV in the New World. The flaviviruses are rarely (if ever) vectored by sand fly species, at least in the Old World. We have identified the first representative of a sand fly-associated flavivirus, Ecuador Paraiso Escondido virus (EPEV), in the New World. EPEV constitutes a novel clade according to current knowledge of the flaviviruses. Phylogenetic analysis of the virus genome showed that EPEV roots the Aedes-associated mosquito-borne flaviviruses, including yellow fever virus. In light of this new discovery, the New World origin of EPEV is discussed together with that of the other flaviviruses. Copyright © 2015 Alkan et al.
Ecuador Paraiso Escondido Virus, a New Flavivirus Isolated from New World Sand Flies in Ecuador, Is the First Representative of a Novel Clade in the Genus Flavivirus

PubMed Central

Zapata, Sonia; Bichaud, Laurence; Moureau, Grégory; Lemey, Philippe; Firth, Andrew E.; Gritsun, Tamara S.; Gould, Ernest A.; de Lamballerie, Xavier; Depaquit, Jérôme

2015-01-01

A new flavivirus, Ecuador Paraiso Escondido virus (EPEV), named after the village where it was discovered, was isolated from sand flies (Psathyromyia abonnenci, formerly Lutzomyia abonnenci) that are unique to the New World. This represents the first sand fly-borne flavivirus identified in the New World. EPEV exhibited a typical flavivirus genome organization. Nevertheless, the maximum pairwise amino acid sequence identity with currently recognized flaviviruses was 52.8%. Phylogenetic analysis of the complete coding sequence showed that EPEV represents a distinct clade which diverged from a lineage that was ancestral to the nonvectored flaviviruses Entebbe bat virus, Yokose virus, and Sokoluk virus and also the Aedes-associated mosquito-borne flaviviruses, which include yellow fever virus, Sepik virus, Saboya virus, and others. EPEV replicated in C6/36 mosquito cells, yielding high infectious titers, but failed to reproduce either in vertebrate cell lines (Vero, BHK, SW13, and XTC cells) or in suckling mouse brains. This surprising result, which appears to eliminate an association with vertebrate hosts in the life cycle of EPEV, is discussed in the context of the evolutionary origins of EPEV in the New World. Importance: The flaviviruses are rarely (if ever) vectored by sand fly species, at least in the Old World. We have identified the first representative of a sand fly-associated flavivirus, Ecuador Paraiso Escondido virus (EPEV), in the New World. EPEV constitutes a novel clade according to current knowledge of the flaviviruses. Phylogenetic analysis of the virus genome showed that EPEV roots the Aedes-associated mosquito-borne flaviviruses, including yellow fever virus. In light of this new discovery, the New World origin of EPEV is discussed together with that of the other flaviviruses. PMID:26355096
Identification and molecular profiling of DC-SIGN-like from big belly seahorse (Hippocampus abdominalis) inferring its potential relevancy in host immunity.

PubMed

Jo, Eunyoung; Elvitigala, Don Anushka Sandaruwan; Wan, Qiang; Oh, Minyoung; Oh, Chulhong; Lee, Jehee

2017-12-01

Dendritic-cell-specific ICAM-3-grabbing non-integrin (DC-SIGN) is a C-type lectin that functions as a pattern recognition receptor by recognizing pathogen-associated molecular patterns (PAMPs). It is also involved in various events of the dendritic cell (DC) life cycle, such as DC migration, antigen capture and presentation, and T cell priming. In this study, a DC-SIGN-like gene from the big belly seahorse Hippocampus abdominalis (designated as ShDCS-like) was identified and molecularly characterized. The putative, complete ORF was found to be 1368 bp in length, encoding a protein of 462 amino acids with a molecular mass of 52.6 kDa and a theoretical isoelectric point of 8.26. The deduced amino acid sequence contains a single carbohydrate recognition domain (CRD), in which six conserved cysteine residues and two Ca2+-binding site motifs (QPN, WND) were identified. Based on pairwise sequence analysis, ShDCS-like exhibits the highest amino acid identity (94.6%) and similarity (97.4%) with its DC-SIGN-like counterpart from the tiger tail seahorse Hippocampus comes. Quantitative real-time PCR revealed that ShDCS-like mRNA is transcribed in all tissues examined, most abundantly in kidney and gill tissues. The basal mRNA expression of ShDCS-like was modulated in blood cell, kidney, gill and liver tissues in response to the stimulation of healthy fish with lipopolysaccharides (LPS), Edwardsiella tarda, or Streptococcus iniae. Moreover, the recombinant ShDCS-like CRD exhibited detectable agglutination activity against different bacteria. Collectively, these results suggest that ShDCS-like may be involved in immune function in big belly seahorses. Copyright © 2017 Elsevier Ltd. All rights reserved.

Burkholderia megalochromosomata sp. nov., isolated from grassland soil.

PubMed

Baek, Inwoo; Seo, Boram; Lee, Imchang; Lee, Kihyun; Park, Sang-Cheol; Yi, Hana; Chun, Jongsik

2015-03-01

A Gram-stain negative, rod-shaped, non-spore-forming, obligate aerobic bacterial strain, JC2949(T), was isolated from grassland soil in Gwanak Mountain, Seoul, Republic of Korea. Phylogenetic analysis, based on 16S rRNA sequences, indicated that strain JC2949(T) belongs to the genus Burkholderia, showing highest sequence similarities with Burkholderia grimmiae R27(T) (98.8 %), Burkholderia cordobensis LMG 27620(T) (98.6 %), Burkholderia jiangsuensis MP-1T(T) (98.6 %), Burkholderia zhejiangensis OP-1(T) (98.5 %), Burkholderia humi LMG 22934(T) (97.5 %), Burkholderia terrestris LMG 22937(T) (97.3 %), Burkholderia telluris LMG 22936(T) (97.2 %) and Burkholderia glathei ATCC 29195(T) (97.0 %). The major fatty acids of strain JC2949(T) were C18 : 1ω7c, summed feature 3 (C16 : 1ω7c and/or C16 : 1ω6c) and C16 : 0. Its predominant polar lipids were phosphatidylethanolamine, diphosphatidylglycerol, phosphatidylglycerol and an unknown amino phospholipid. The dominant isoprenoid quinone was ubiquinone Q-8. The pairwise average nucleotide identity values between strain JC2949(T) and the genomes of 30 other species of the genus Burkholderia ranged from 73.4-90.4 %, indicating that the isolate is a novel genomic species within this genus. Based on phenotypic and chemotaxonomic comparisons, it is clear that strain JC2949(T) represents a novel species of the genus Burkholderia. We propose the name for this novel species to be Burkholderia megalochromosomata sp. nov. The type strain is JC2949(T) ( = KACC 17925(T) = JCM 19905(T)). © 2015 IUMS.
The Use of Weighted Graphs for Large-Scale Genome Analysis

PubMed Central

Zhou, Fang; Toivonen, Hannu; King, Ross D.

2014-01-01

There is an acute need for better tools to extract knowledge from the growing flood of sequence data. For example, thousands of complete genomes have been sequenced, and their metabolic networks inferred. Such data should enable a better understanding of evolution. However, most existing network analysis methods are based on pair-wise comparisons, and these do not scale to thousands of genomes. Here we propose the use of weighted graphs as a data structure to enable large-scale phylogenetic analysis of networks. We have developed three types of weighted graph for enzymes: taxonomic (these summarize phylogenetic importance), isoenzymatic (these summarize enzymatic variety/redundancy), and sequence-similarity (these summarize sequence conservation); and we applied these types of weighted graph to survey prokaryotic metabolism. To demonstrate the utility of this approach we have compared and contrasted the large-scale evolution of metabolism in Archaea and Eubacteria. Our results provide evidence for limits to the contingency of evolution. PMID:24619061
Phylogenetic analysis of phenotypically characterized Cryptococcus laurentii isolates reveals high frequency of cryptic species.

PubMed

Ferreira-Paim, Kennio; Ferreira, Thatiana Bragine; Andrade-Silva, Leonardo; Mora, Delio Jose; Springer, Deborah J; Heitman, Joseph; Fonseca, Fernanda Machado; Matos, Dulcilena; Melhem, Márcia Souza Carvalho; Silva-Vergara, Mario León

2014-01-01

Although Cryptococcus laurentii has been considered saprophytic and its taxonomy is still being described, several cases of human infections have already been reported. This study aimed to evaluate molecular aspects of C. laurentii isolates from Brazil, Botswana, Canada, and the United States. In this study, 100 phenotypically identified C. laurentii isolates were evaluated by sequencing the 18S nuclear ribosomal small subunit rRNA gene (18S-SSU), D1/D2 region of 28S nuclear ribosomal large subunit rRNA gene (28S-LSU), and the internal transcribed spacer (ITS) of the ribosomal region. BLAST searches using 550-bp, 650-bp, and 550-bp sequenced amplicons obtained from the 18S-SSU, 28S-LSU, and the ITS region led to the identification of 75 C. laurentii strains that shared 99-100% identity with C. laurentii CBS 139. A total of nine isolates shared 99% identity with both Bullera sp. VY-68 and C. laurentii RY1. One isolate shared 99% identity with Cryptococcus rajasthanensis CBS 10406, and eight isolates shared 100% identity with Cryptococcus sp. APSS 862 according to the 28S-LSU and ITS regions and were designated as Cryptococcus aspenensis sp. nov. (CBS 13867). While 16 isolates shared 99% identity with Cryptococcus flavescens CBS 942 according to the 18S-SSU sequence, only six were confirmed using the 28S-LSU and ITS region sequences. The remaining 10 shared 99% identity with Cryptococcus terrestris CBS 10810, which was recently described in Brazil. Through concatenated sequence analyses, seven sequence types in C. laurentii, three in C. flavescens, one in C. terrestris, and one in C. aspenensis sp. nov. were identified. Sequencing permitted the characterization of 75% of the environmental C. laurentii isolates from different geographical areas and the identification of seven haplotypes of this species. Among sequenced regions, the increased variability of the ITS region in comparison to the 18S-SSU and 28S-LSU regions reinforces its applicability as a DNA barcode.
NoFold: RNA structure clustering without folding or alignment.

PubMed

Middleton, Sarah A; Kim, Junhyong

2014-11-01

Structures that recur across multiple different transcripts, called structure motifs, often perform a similar function, for example recruiting a specific RNA-binding protein that then regulates translation, splicing, or subcellular localization. Identifying common motifs between coregulated transcripts may therefore yield significant insight into their binding partners and mechanism of regulation. However, as most methods for clustering structures are based on folding individual sequences or doing many pairwise alignments, this results in a tradeoff between speed and accuracy that can be problematic for large-scale data sets. Here we describe a novel method for comparing and characterizing RNA secondary structures that does not require folding or pairwise alignment of the input sequences. Our method uses the idea of constructing a distance function between two objects by their respective distances to a collection of empirical examples or models, which in our case consists of 1973 Rfam family covariance models. Using this as a basis for measuring structural similarity, we developed a clustering pipeline called NoFold to automatically identify and annotate structure motifs within large sequence data sets. We demonstrate that NoFold can simultaneously identify multiple structure motifs with an average sensitivity of 0.80 and precision of 0.98 and generally exceeds the performance of existing methods. We also perform a cross-validation analysis of the entire set of Rfam families, achieving an average sensitivity of 0.57. We apply NoFold to identify motifs enriched in dendritically localized transcripts and report 213 enriched motifs, including both known and novel structures. © 2014 Middleton and Kim; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Mosaic Graphs and Comparative Genomics in Phage Communities

PubMed Central

Belcaid, Mahdi; Bergeron, Anne

2010-01-01

Comparing the genomes of two closely related viruses often produces mosaics where nearly identical sequences alternate with sequences that are unique to each genome. When several closely related genomes are compared, the unique sequences are likely to be shared with third genomes, leading to virus mosaic communities. Here we present comparative analysis of sets of Staphylococcus aureus phages that share large identical sequences with up to three other genomes, and with different partners along their genomes. We introduce mosaic graphs to represent these complex recombination events, and use them to illustrate the breadth and depth of sequence sharing: some genomes are almost completely made up of shared sequences, while genomes that share very large identical sequences can adopt alternate functional modules. Mosaic graphs also allow us to identify breakpoints that could eventually be used for the construction of recombination networks. These findings have several implications for phage metagenomics assembly, for the horizontal gene transfer paradigm, and more generally for the understanding of the composition and evolutionary dynamics of virus communities. PMID:20874413
Complete nucleotide sequence of a monopartite Begomovirus and associated satellites infecting Carica papaya in Nepal.

PubMed

Shahid, M S; Yoshida, S; Khatri-Chhetri, G B; Briddon, R W; Natsuaki, K T

2013-06-01

Carica papaya (papaya) is a fruit crop that is cultivated mostly in kitchen gardens throughout Nepal. Leaf samples of C. papaya plants with leaf curling, vein darkening, vein thickening, and a reduction in leaf size were collected from a garden in Darai village, Rampur, Nepal in 2010. Full-length clones of a monopartite Begomovirus, a betasatellite and an alphasatellite were isolated. The complete nucleotide sequence of the Begomovirus showed the arrangement of genes typical of Old World begomoviruses with the highest nucleotide sequence identity (>99 %) to an isolate of Ageratum yellow vein virus (AYVV), confirming it as an isolate of AYVV. The complete nucleotide sequence of the betasatellite showed greater than 89 % nucleotide sequence identity to an isolate of Tomato leaf curl Java betasatellite originating from Indonesia. The sequence of the alphasatellite displayed 92 % nucleotide sequence identity to Sida yellow vein China alphasatellite. This is the first identification of these components in Nepal and the first time they have been identified in papaya.
Application of ISSR markers to analyze molecular relationships in Iranian jasmine (Jasminum spp.) accessions.

PubMed

Ghasemi Ghehsareh, Masood; Salehi, Hassan; Khosh-Khui, Morteza; Niazi, Ali

2015-01-01

There are many species of jasmines in different regions of Iran in natural or cultivated form, and there is no information about their genetic status. Therefore, inter-simple sequence repeat (ISSR) analysis was used to evaluate genetic variations of the 53 accessions representing eight species of Jasminum collected from different regions of Iran. A total of 21 ISSR primers were used which generated 981 bands of different sizes. Mean percentage of polymorphic bands was 90.64 %. Maximum resolving power, polymorphic information content average, and marker index values were 21.55, 0.35, and 14.42 for primers of 3, 4, and 3 respectively. The unweighted pair group method with arithmetic mean dendrogram based on Jaccard's coefficients indicated that 53 accessions were divided into two major clusters. The first major cluster was divided into two subclusters; the subcluster A included Jasminum grandiflorum L., J. officinale L., and J. azoricum L. and the subcluster B consisted of three forms of J. sambac L. (single, semi-double, and double flowers). The second major cluster was divided into two subclusters; the first subcluster (C) included J. humile L., J. primulinum Hemsl., J. nudiflorum Lindl. and the second subcluster (D) consisted of J. fruticans L. At the species level, the highest percentage of polymorphism (34.05 %), numbers of effective alleles (1.16), Shannon index (0.151), and Nei's genetic diversity (0.098) were observed in J. officinale. The lowest values of percentage polymorphism (0.011), number of effective alleles (1.009), Shannon index (0.007), and Nei's genetic diversity (0.005) were obtained for J. nudiflorum. Based on pairwise population matrix of Nei's unbiased genetic identity, the highest identity (0.85) was found between J. officinale and J. azoricum and the lowest identity (0.69) was between J. grandiflorum and J. primulinum. Based on analysis of molecular variance, the amount of genetic variations among the eight populations was 83 %.
This study demonstrated that ISSR is a useful tool in jasmine genomic diversity studies and for detecting their relationships.
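The dendrogram in this record is built from Jaccard's coefficients computed on band presence/absence data. As a minimal sketch of that coefficient only (the band profiles and the function name `jaccard` are invented for illustration and are not data from the study):

```python
def jaccard(profile_a, profile_b):
    """Jaccard coefficient between two binary band profiles (1 = band present)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must score the same set of bands")
    # Bands present in both accessions.
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    # Bands present in at least one accession; joint absences are ignored.
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return shared / either if either else 0.0


# Hypothetical presence/absence scores for eight ISSR bands.
accession_1 = [1, 1, 0, 1, 0, 1, 1, 0]
accession_2 = [1, 0, 0, 1, 1, 1, 0, 0]
similarity = jaccard(accession_1, accession_2)
print(similarity)
```

Here three bands are shared and six are present in at least one profile, so the coefficient is 0.5. Ignoring joint absences is the property that usually motivates Jaccard for dominant markers, since a missing band in both samples is weak evidence of relatedness.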
A highly conserved N-terminal sequence for teleost vitellogenin with potential value to the biochemistry, molecular biology and pathology of vitellogenesis

USGS Publications Warehouse

Folmar, L.D.; Denslow, N.D.; Wallace, R.A.; LaFleur, G.; Gross, T.S.; Bonomelli, S.; Sullivan, C.V.

1995-01-01

N-terminal amino acid sequences for vitellogenin (Vtg) from six species of teleost fish (striped bass, mummichog, pinfish, brown bullhead, medaka, yellow perch) and the sturgeon are compared with published N-terminal Vtg sequences for the lamprey, clawed frog and domestic chicken. Striped bass and mummichog had 100% identical amino acids between positions 7 and 21, while pinfish, brown bullhead, sturgeon, lamprey, Xenopus and chicken had 87%, 93%, 60%, 47%, 47-60% (for four transcripts), and 40% identical amino acids, respectively, with striped bass for the same positions. Partial sequences obtained for medaka and yellow perch were 100% identical between positions 5 and 10. The potential utility of this conserved sequence for studies on the biochemistry, molecular biology and pathology of vitellogenesis is discussed.
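The percentages above are identities over a fixed window of 15 aligned positions (7 through 21), so 13 matching residues give 87% and 14 give 93%. A minimal sketch of that calculation follows; the two sequences are invented placeholders rather than real Vtg sequences, and skipping gapped columns is an assumption, since tools differ in the denominator they use.

```python
def percent_identity(seq_a, seq_b, start=1, end=None):
    """Percent of identical residues between two aligned sequences.

    Positions are 1-based and inclusive. Both sequences must already be
    aligned to the same length; columns with a gap ('-') in either
    sequence are skipped (an assumption, conventions vary).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    end = len(seq_a) if end is None else end
    matches = compared = 0
    for a, b in zip(seq_a[start - 1:end], seq_b[start - 1:end]):
        if a == "-" or b == "-":
            continue
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no comparable positions in the window")
    return 100.0 * matches / compared


# Hypothetical 21-residue fragments, invented for illustration only.
reference = "ABCDEFGHIKLMNPQRSTVWY"
variant = "ABCDEFGHIKLMNPQRSTAAY"
window_identity = percent_identity(reference, variant, start=7, end=21)
print(round(window_identity))
```

The two placeholder sequences differ at two of the 15 window positions, so the window identity rounds to 87.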
Detection and molecular characterization of infectious bronchitis virus isolated from recent outbreaks in broiler flocks in Thailand.

PubMed

Pohuang, Tawatchai; Chansiripornchai, Niwat; Tawatsin, Achara; Sasipreeyajan, Jiroj

2009-09-01

Thirteen field isolates of infectious bronchitis virus (IBV) were isolated from broiler flocks in Thailand between January and June 2008. An 878-bp fragment of the S1 gene covering a hypervariable region was amplified and sequenced. Phylogenetic analysis based on that region revealed that these viruses were separated into two groups (I and II). IBV isolates in group I were not related to other IBV strains published in the GenBank database. Group I isolates had nucleotide sequence identities of less than 85% and amino acid sequence identities of less than 84% in common with IBVs published in the GenBank database. This group likely represents the strains indigenous to Thailand. The isolates in group II showed a close relationship with Chinese IBVs. They had nucleotide sequence identities of 97-98% and amino acid sequence identities of 96-98% in common with Chinese IBVs (strain A2, SH and QXIBV). This finding indicated that the recent Thai IBVs evolved separately and at least two groups of viruses are circulating in Thailand.
Pairwise Sequence Alignment Library

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jeff Daily, PNNL

2015-05-20

Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features: reference implementations of all known vectorized sequence alignment approaches; implementations of Smith Waterman (SW), semi-global (SG), and Needleman Wunsch (NW) sequence alignment algorithms; implementations across all modern CPU instruction sets including AVX2 and KNC; and language interfaces for C/C++ and Python.
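The data dependencies mentioned in this record are easiest to see in a scalar version of the dynamic program. The sketch below is a plain-Python Needleman Wunsch score with a linear gap penalty; it is not parasail code and does not use the parasail API, and the scoring values (match 1, mismatch -1, gap -2) are arbitrary assumptions.

```python
def needleman_wunsch_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Global alignment score with a linear gap penalty (scalar DP).

    Only two rows of the DP table are kept. Each cell depends on its
    diagonal, upper, and left neighbours; the left dependency is what
    makes a naive row-wise SIMD version difficult.
    """
    previous = [j * gap for j in range(len(seq_b) + 1)]
    for i, a in enumerate(seq_a, start=1):
        current = [i * gap]
        for j, b in enumerate(seq_b, start=1):
            diagonal = previous[j - 1] + (match if a == b else mismatch)
            up = previous[j] + gap
            left = current[j - 1] + gap
            current.append(max(diagonal, up, left))
        previous = current
    return previous[-1]


score_same = needleman_wunsch_score("GATTACA", "GATTACA")
score_gap = needleman_wunsch_score("GATTACA", "GATACA")
print(score_same, score_gap)
```

With these assumed scores, identical sequences of length 7 score 7, and deleting one T costs a single gap, giving six matches minus 2, or 4.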
Unexpected Relationships and Inbreeding in HapMap Phase III Populations

PubMed Central

Stevens, Eric L.; Baugher, Joseph D.; Shirley, Matthew D.; Frelin, Laurence P.; Pevsner, Jonathan

2012-01-01

Correct annotation of the genetic relationships between samples is essential for population genomic studies, which could be biased by errors or omissions. To this end, we used identity-by-state (IBS) and identity-by-descent (IBD) methods to assess genetic relatedness of individuals within HapMap phase III data. We analyzed data from 1,397 individuals across 11 ethnic populations. Our results support previous studies (Pemberton et al., 2010; Kyriazopoulou-Panagiotopoulou et al., 2011) assessing unknown relatedness present within this population. Additionally, we present evidence for 1,657 novel pairwise relationships across 9 populations. Surprisingly, significant Cotterman's coefficients of relatedness K1 (IBD1) values were detected between pairs of known parents. Furthermore, significant K2 (IBD2) values were detected in 32 previously annotated parent-child relationships. Consistent with a hypothesis of inbreeding, regions of homozygosity (ROH) were identified in the offspring of related parents, of which a subset overlapped those reported in previous studies (Gibson et al. 2010; Johnson et al. 2011). In total, we inferred 28 inbred individuals with ROH that overlapped areas of relatedness between the parents and/or IBD2 sharing at a different genomic locus between a child and a parent. Finally, 8 previously annotated parent-child relationships had unexpected K0 (IBD0) values (resulting from a chromosomal abnormality or genotype error), and 10 previously annotated second-degree relationships along with 38 other novel pairwise relationships had unexpected IBD2 (indicating two separate paths of recent ancestry). These newly described types of relatedness may impact the outcome of previous studies and should inform the design of future studies relying on the HapMap Phase III resource. PMID:23185369
SCPS: a fast implementation of a spectral method for detecting protein families on a genome-wide scale.

PubMed

Nepusz, Tamás; Sasidharan, Rajkumar; Paccanaro, Alberto

2010-03-09

An important problem in genomics is the automatic inference of groups of homologous proteins from pairwise sequence similarities. Several approaches have been proposed for this task which are "local" in the sense that they assign a protein to a cluster based only on the distances between that protein and the other proteins in the set. It was shown recently that global methods such as spectral clustering have better performance on a wide variety of datasets. However, currently available implementations of spectral clustering methods mostly consist of a few loosely coupled Matlab scripts that assume a fair amount of familiarity with Matlab programming and hence they are inaccessible for large parts of the research community. SCPS (Spectral Clustering of Protein Sequences) is an efficient and user-friendly implementation of a spectral method for inferring protein families. The method uses only pairwise sequence similarities, and is therefore practical when only sequence information is available. SCPS was tested on difficult sets of proteins whose relationships were extracted from the SCOP database, and its results were extensively compared with those obtained using other popular protein clustering algorithms such as TribeMCL, hierarchical clustering and connected component analysis. We show that SCPS is able to identify many of the family/superfamily relationships correctly and that the quality of the obtained clusters as indicated by their F-scores is consistently better than all the other methods we compared it with. We also demonstrate the scalability of SCPS by clustering the entire SCOP database (14,183 sequences) and the complete genome of the yeast Saccharomyces cerevisiae (6,690 sequences). 
Besides the spectral method, SCPS also implements connected component analysis and hierarchical clustering, it integrates TribeMCL, it provides different cluster quality tools, it can extract human-readable protein descriptions using GI numbers from NCBI, it interfaces with external tools such as BLAST and Cytoscape, and it can produce publication-quality graphical representations of the clusters obtained, thus constituting a comprehensive and effective tool for practical research in computational biology. Source code and precompiled executables for Windows, Linux and Mac OS X are freely available at http://www.paccanarolab.org/software/scps.
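Connected component analysis, one of the baseline methods named in this record, groups sequences whenever a chain of sufficiently similar pairs links them. A minimal sketch with a union-find structure follows; it is not SCPS code and not the spectral method, and the similarity scores and the 0.5 threshold are invented for illustration.

```python
def connected_components(n_items, similarities, threshold):
    """Group items linked by pairwise similarities at or above a threshold.

    `similarities` maps (i, j) index pairs to a score. Returns a sorted
    list of groups, each a sorted list of item indices.
    """
    parent = list(range(n_items))

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), score in similarities.items():
        if score >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    groups = {}
    for item in range(n_items):
        groups.setdefault(find(item), []).append(item)
    return sorted(groups.values())


# Hypothetical similarity scores between five proteins.
pairwise = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.2, (3, 4): 0.7}
families = connected_components(5, pairwise, threshold=0.5)
print(families)
```

Proteins 0 and 2 end up in the same group even though they were never compared directly, which is the single-linkage behaviour that makes this baseline sensitive to a few spurious high-scoring pairs.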
K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.

PubMed

Lin, Jie; Adjeroh, Donald A; Jiang, Bing-Hua; Jiang, Yue

2018-05-15

Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than sequence-alignment based methods. We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Compared with the other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating the phylogenetic tree, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which is able to determine the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length, or increasing dataset sizes. The K2 and K2* approaches are implemented in the R language as a package, which is freely available for open access (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). Contact: yueljiang@163.com. Supplementary data are available at Bioinformatics online.
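The abstract does not define the K2 statistic, so the following is only a generic illustration of the idea of comparing sequences without alignment using a Kendall rank correlation. It counts k-mers in each sequence and computes Kendall's tau-a between the two count vectors; the function names, the choice of tau-a, and the k-mer length `k` are all assumptions and should not be read as the authors' method.

```python
from collections import Counter
from itertools import combinations, product


def kmer_counts(sequence, k, alphabet="ACGT"):
    """Counts of every possible k-mer, in a fixed lexicographic order."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return [counts["".join(kmer)] for kmer in product(alphabet, repeat=k)]


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical DNA sequences, invented for illustration.
profile_a = kmer_counts("ACGTACGTAC", k=2)
profile_b = kmer_counts("ACGTTCGTAC", k=2)
similarity = kendall_tau_a(profile_a, profile_b)
print(similarity)
```

The quadratic pair loop is fine for the 16 dinucleotide counts used here, but a larger `k` would call for an O(n log n) tau implementation.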
The structured ancestral selection graph and the many-demes limit.

PubMed

Slade, Paul F; Wakeley, John

2005-02-01

We show that the unstructured ancestral selection graph applies to part of the history of a sample from a population structured by restricted migration among subpopulations, or demes. The result holds in the limit as the number of demes tends to infinity with proportionately weak selection, and we have also made the assumptions of island-type migration and that demes are equivalent in size. After an instantaneous sample-size adjustment, this structured ancestral selection graph converges to an unstructured ancestral selection graph with a mutation parameter that depends inversely on the migration rate. In contrast, the selection parameter for the population is independent of the migration rate and is identical to the selection parameter in an unstructured population. We show analytically that estimators of the migration rate, based on pairwise sequence differences, derived under the assumption of neutrality should perform equally well in the presence of weak selection. We also modify an algorithm for simulating genealogies conditional on the frequencies of two selected alleles in a sample. This permits efficient simulation of stronger selection than was previously possible. Using this new algorithm, we simulate gene genealogies under the many-demes ancestral selection graph and identify some situations in which migration has a strong effect on the time to the most recent common ancestor of the sample. We find that a similar effect also increases the sensitivity of the genealogy to selection.
Sequence of a second gene encoding bovine submaxillary mucin: implication for mucin heterogeneity and cloning.

PubMed

Jiang, W; Woitach, J T; Gupta, D; Bhavanandan, V P

1998-10-20

Secreted epithelial mucins are extremely large and heterogeneous glycoproteins. We report the 5 kilobase DNA sequence of a second gene, BSM2, which encodes bovine submaxillary mucin. The determined nucleotide and deduced amino acid sequences of BSM2 are 95.2% and 92.2% identical, respectively, to those of the previously described BSM1 gene isolated from the same cow. Further, the five predicted protein domains of the two genes are 100%, 94%, 93%, 77%, and 88% identical. Based on the above results, we propose that expression of multiple homologous core proteins from a single animal is a factor in generating diversity of saccharides in mucins and in providing resistance of the molecules to proteolysis. In addition, this work raises several important issues in mucin cloning such as assembling sequences from seemingly overlapping clones and deducing consensus sequences for nearly identical tandem repeats. Copyright 1998 Academic Press.
Identification of a new Apscaviroid from Japanese persimmon.

PubMed

Nakaune, Ryoji; Nakano, Masaaki

2008-01-01

Three viroid-like sequences were detected from Japanese persimmon (Diospyros kaki Thunb.) by RT-PCR using primers specific for members of the genus Apscaviroid. Based on the sequences, we determined the complete genomic sequences. Two had 92.1-94.3% sequence identity with citrus viroid OS (CVd-OS) and 91.4-96.3% identity with apple fruit crinkle viroid (AFCVd), respectively. Another one, tentatively named persimmon viroid (PVd), had 396 nucleotides and less than 70% sequence identity with known viroids. The secondary structure of PVd is proposed to be rod-like with extensive base pairing and contains the terminal conserved region and the central conserved region characteristic of the genus Apscaviroid. Moreover, we confirmed that the viroids, including PVd, are graft transmissible from persimmon to persimmon and that persimmon is a natural host of these viroids. According to its molecular and biological properties, PVd should be considered a member of a new species in the genus Apscaviroid.
Host switch during evolution of a genetically distinct hantavirus in the American shrew mole (Neurotrichus gibbsii)

PubMed Central

Kang, Hae Ji; Bennett, Shannon N.; Dizney, Laurie; Sumibcay, Laarni; Arai, Satoru; Ruedas, Luis A.; Song, Jin-Won; Yanagihara, Richard

2009-01-01

A genetically distinct hantavirus, designated Oxbow virus (OXBV), was detected in tissues of an American shrew mole (Neurotrichus gibbsii), captured in Gresham, Oregon, in September 2003. Pairwise analysis of full-length S- and M- and partial L-segment nucleotide and amino acid sequences of OXBV indicated low sequence similarity with rodent-borne hantaviruses. Phylogenetic analyses using maximum-likelihood and Bayesian methods, and host-parasite evolutionary comparisons, showed that OXBV and Asama virus, a hantavirus recently identified from the Japanese shrew mole (Urotrichus talpoides), were related to soricine shrew-borne hantaviruses from North America and Eurasia, respectively, suggesting parallel evolution associated with cross-species transmission. PMID:19394994
Trellis Coding of Non-coherent Multiple Symbol Full Response M-ary CPFSK with Modulation Index 1/M

NASA Technical Reports Server (NTRS)

Lee, H.; Divsalar, D.; Weber, C.

1994-01-01

This paper introduces a trellis coded modulation (TCM) scheme for non-coherent multiple full response M-ary CPFSK with modulation index 1/M. A proper branch metric for the trellis decoder is obtained by employing a simple approximation of the modified Bessel function for large signal to noise ratio (SNR). Pairwise error probability of coded sequences is evaluated by applying a linear approximation to the Rician random variable.
Synthetic Progress toward Azadirachtins. 1. Enantio- and Diastereoselective Synthesis of the Left-Wing Fragment of 11-epi-Azadirachtin I.

PubMed

Shi, Hang; Tan, Ceheng; Zhang, Weibin; Zhang, Zichun; Long, Rong; Luo, Tuoping; Yang, Zhen

2015-05-15

A highly enantio- and diastereoselective synthesis of the left-wing fragment of 11-epi-azadirachtin I characterized with the pairwise use of palladium- and gold-catalyzed cascade reactions is presented. By enlisting a sequence of stereocontrolled transformations, our 21-step route established the stereocenters of the left-wing fragment from one chiral starting material, (-)-carvone, which would significantly facilitate the synthetic studies of the azadirachtin-type limonoids.
Taxonomic evaluation of Streptomyces albus and related species using multilocus sequence analysis and proposals to emend the description of Streptomyces albus and describe Streptomyces pathocidini sp. nov

USDA-ARS's Scientific Manuscript database

In phylogenetic analyses of the genus Streptomyces using 16S rRNA gene sequences, Streptomyces albus subsp. albus NRRL B-1811T forms a cluster with 5 other species having identical or nearly identical 16S rRNA gene sequences. Moreover, the morphological and physiological characteristics of these oth...

Manifold learning for automatically predicting articular cartilage morphology in the knee with data from the osteoarthritis initiative (OAI)

NASA Astrophysics Data System (ADS)

Donoghue, C.; Rao, A.; Bull, A. M. J.; Rueckert, D.

2011-03-01

Osteoarthritis (OA) is a degenerative, debilitating disease with a large socio-economic impact. This study looks to manifold learning as an automatic approach to harness the plethora of data provided by the Osteoarthritis Initiative (OAI). We construct several Laplacian Eigenmap embeddings of articular cartilage appearance from MR images of the knee using multiple MR sequences. A region of interest (ROI) defined as the weight-bearing medial femur is automatically located in all images through non-rigid registration. A pairwise intensity-based similarity measure is computed between all images, resulting in a fully connected graph, where each vertex represents an image and the weight of edges is the similarity measure. Spectral analysis is then applied to these pairwise similarities, which acts to reduce the dimensionality non-linearly and embeds these images in a manifold representation. In the manifold space, images that are close to each other are considered to be more "similar" than those far away. In the experiment presented here we use manifold learning to automatically predict the morphological changes in the articular cartilage by using the co-ordinates of the images in the manifold as independent variables for multiple linear regression. In the study presented here five manifolds are generated from five sequences of 390 distinct knees. We find statistically significant correlations (up to R² = 0.75) between our predictors and the results presented in the literature.
Sequence specificity, statistical potentials, and three-dimensional structure prediction with self-correcting distance geometry calculations of beta-sheet formation in proteins.

PubMed Central

Zhu, H.; Braun, W.

1999-01-01

A statistical analysis of a representative data set of 169 known protein structures was used to analyze the specificity of residue interactions between spatial neighboring strands in beta-sheets. Pairwise potentials were derived from the frequency of residue pairs in nearest contact, second nearest and third nearest contacts across neighboring beta-strands compared to the expected frequency of residue pairs in a random model. A pseudo-energy function based on these statistical pairwise potentials recognized native beta-sheets among possible alternative pairings. The native pairing was found within the three lowest energies in 73% of the cases in the training data set and in 63% of beta-sheets in a test data set of 67 proteins, which were not part of the training set. The energy function was also used to detect tripeptides, which occur frequently in beta-sheets of native proteins. The majority of native partners of tripeptides were distributed in a low energy range. Self-correcting distance geometry (SECODG) calculations using distance constraints sets derived from possible low energy pairing of beta-strands uniquely identified the native pairing of the beta-sheet in pancreatic trypsin inhibitor (BPTI). These results will be useful for predicting the structure of proteins from their amino acid sequence as well as for the design of proteins containing beta-sheets. PMID:10048326
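The pairwise potentials in this record compare how often a residue pair is observed across neighboring strands with how often it would be expected under a random model. The sketch below uses the common inverse-Boltzmann form, a negative log of observed over expected frequency, with an independence model for the expectation. Both choices are assumptions, since the abstract does not give the formula, and the residue pairs are invented.

```python
import math
from collections import Counter


def pair_potentials(observed_pairs):
    """Pseudo-energies -ln(observed / expected) for unordered residue pairs.

    The expected frequency assumes residues pair independently, using the
    residue composition of the observed contacts themselves.
    """
    pair_counts = Counter(tuple(sorted(pair)) for pair in observed_pairs)
    residue_counts = Counter(res for pair in observed_pairs for res in pair)
    total_pairs = sum(pair_counts.values())
    total_residues = sum(residue_counts.values())

    potentials = {}
    for (a, b), count in pair_counts.items():
        freq_a = residue_counts[a] / total_residues
        freq_b = residue_counts[b] / total_residues
        # An unordered pair of different residues can arise in two orders.
        expected = freq_a * freq_b * (1 if a == b else 2)
        observed = count / total_pairs
        potentials[(a, b)] = -math.log(observed / expected)
    return potentials


# Hypothetical cross-strand contacts, invented for illustration.
contacts = [("V", "I")] * 6 + [("V", "V")] + [("I", "I")]
potentials = pair_potentials(contacts)
print(potentials)
```

In this toy set the V-I pair is seen more often than expected and gets a negative (favorable) pseudo-energy, while V-V is seen half as often as expected and gets +ln 2. Pairs that never occur receive no entry, so a practical version would need pseudocounts.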
Web-Beagle: a web server for the alignment of RNA secondary structures.

PubMed

Mattei, Eugenio; Pietrosanto, Marco; Ferrè, Fabrizio; Helmer-Citterich, Manuela

2015-07-01

Web-Beagle (http://beagle.bio.uniroma2.it) is a web server for the pairwise global or local alignment of RNA secondary structures. The server exploits a new encoding for RNA secondary structure and a substitution matrix of RNA structural elements to perform RNA structural alignments. The web server allows the user to compute up to 10 000 alignments in a single run, taking as input sets of RNA sequences and structures or primary sequences alone. In the latter case, the server computes the secondary structure prediction for the RNAs on-the-fly using RNAfold (free energy minimization). The user can also compare a set of input RNAs to one of five pre-compiled RNA datasets including lncRNAs and 3' UTRs. All types of comparison produce in output the pairwise alignments along with structural similarity and statistical significance measures for each resulting alignment. A graphical color-coded representation of the alignments allows the user to easily identify structural similarities between RNAs. Web-Beagle can be used for finding structurally related regions in two or more RNAs, for the identification of homologous regions or for functional annotation. Benchmark tests show that Web-Beagle has lower computational complexity, running time and better performances than other available methods. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Characterization of acid-tolerant H2/CO2-utilizing methanogenic enrichment cultures from an acidic peat bog in New York State.

PubMed

Bräuer, Suzanna L; Yashiro, Erika; Ueno, Norikiyo G; Yavitt, Joseph B; Zinder, Stephen H

2006-08-01

Two methanogenic cultures were enriched from acidic peat soil using a growth medium buffered to c. pH 5. One culture, 6A, was obtained from peat after incubation with H2/CO2, whereas culture NTA was derived from a 10⁻⁴ dilution of untreated peat into a modified medium. 16S rRNA gene clone libraries from each culture contained one methanogen and two bacterial sequences. The methanogen 16S rRNA gene sequences were 99% identical with each other and belonged to the novel "R-10/Fen cluster" family of the Methanomicrobiales, whereas their mcrA sequences were 96% identical. One bacterial 16S rRNA gene sequence from culture 6A belonged to the Bacteroidetes and showed 99% identity with sequences from methanogenic enrichments from German and Russian bogs. The other sequence belonged to the Firmicutes and was identical to a thick rod-shaped citrate-utilizing organism isolated from culture 6A, the numbers of which decreased when the Ti(III) chelator was switched from citrate to nitrilotriacetate. Bacterial clones from the NTA culture clustered in the Delta- and Betaproteobacteria. Both cultures contained thin rods, presumably the methanogens, as the predominant morphotype, and represent a significant advance in characterization of the novel acidiphilic R-10 family methanogens.
Utility of 16S rDNA Sequencing for Identification of Rare Pathogenic Bacteria.

PubMed

Loong, Shih Keng; Khor, Chee Sieng; Jafar, Faizatul Lela; AbuBakar, Sazaly

2016-11-01

Phenotypic identification systems are established methods for laboratory identification of bacteria causing human infections. Here, the utility of phenotypic identification systems was compared against 16S rDNA identification method on clinical isolates obtained during a 5-year study period, with special emphasis on isolates that gave unsatisfactory identification. One hundred and eighty-seven clinical bacteria isolates were tested with commercial phenotypic identification systems and 16S rDNA sequencing. Isolate identities determined using phenotypic identification systems and 16S rDNA sequencing were compared for similarity at genus and species level, with 16S rDNA sequencing as the reference method. Phenotypic identification systems identified ~46% (86/187) of the isolates with identity similar to that identified using 16S rDNA sequencing. Approximately 39% (73/187) and ~15% (28/187) of the isolates showed different genus identity and could not be identified using the phenotypic identification systems, respectively. Both methods succeeded in determining the species identities of 55 isolates; however, only ~69% (38/55) of the isolates matched at species level. 16S rDNA sequencing could not determine the species of ~20% (37/187) of the isolates. The 16S rDNA sequencing is a useful method over the phenotypic identification systems for the identification of rare and difficult to identify bacteria species. The 16S rDNA sequencing method, however, does have limitation for species-level identification of some bacteria highlighting the need for better bacterial pathogen identification tools. © 2016 Wiley Periodicals, Inc.
tRNADB-CE: tRNA gene database well-timed in the era of big sequence data.

PubMed

Abe, Takashi; Inokuchi, Hachiro; Yamada, Yuko; Muto, Akira; Iwasaki, Yuki; Ikemura, Toshimichi

2014-01-01

The tRNA gene data base curated by experts "tRNADB-CE" (http://trna.ie.niigata-u.ac.jp) was constructed by analyzing 1,966 complete and 5,272 draft genomes of prokaryotes, 171 viruses', 121 chloroplasts', and 12 eukaryotes' genomes plus fragment sequences obtained by metagenome studies of environmental samples. 595,115 tRNA genes in total, and thus two times of genes compiled previously, have been registered, for which sequence, clover-leaf structure, and results of sequence-similarity and oligonucleotide-pattern searches can be browsed. To provide collective knowledge with help from experts in tRNA researches, we added a column for enregistering comments to each tRNA. By grouping bacterial tRNAs with an identical sequence, we have found high phylogenetic preservation of tRNA sequences, especially at the phylum level. Since many species-unknown tRNAs from metagenomic sequences have sequences identical to those found in species-known prokaryotes, the identical sequence group (ISG) can provide phylogenetic markers to investigate the microbial community in an environmental ecosystem. This strategy can be applied to a huge amount of short sequences obtained from next-generation sequencers, as showing that tRNADB-CE is a well-timed database in the era of big sequence data. It is also discussed that batch-learning self-organizing-map with oligonucleotide composition is useful for efficient knowledge discovery from big sequence data.
Sequence-Selective Formation of Synthetic H-Bonded Duplexes

PubMed Central

2017-01-01

Oligomers equipped with a sequence of phenol and pyridine N-oxide groups form duplexes via H-bonding interactions between these recognition units. Reductive amination chemistry was used to synthesize all possible 3-mer sequences: AAA, AAD, ADA, DAA, ADD, DAD, DDA, and DDD. Pairwise interactions between the oligomers were investigated using NMR titration and dilution experiments in toluene. The measured association constants vary by 3 orders of magnitude (10² to 10⁵ M⁻¹). Antiparallel sequence-complementary oligomers generally form more stable complexes than mismatched duplexes. Mismatched duplexes that have an excess of H-bond donors are stabilized by the interaction of two phenol donors with one pyridine N-oxide acceptor. Oligomers that have a H-bond donor and acceptor on the ends of the chain can fold to form intramolecular H-bonds in the free state. The 1,3-folding equilibrium competes with duplex formation and lowers the stability of duplexes involving these sequences. As a result, some of the mismatch duplexes are more stable than some of the sequence-complementary duplexes. However, the most stable mismatch duplexes contain DDD and compete with the most stable sequence-complementary duplex, AAA·DDD, so in mixtures that contain all eight sequences, sequence-complementary duplexes dominate. Even higher fidelity sequence selectivity can be achieved if alternating donor–acceptor sequences are avoided. PMID:28857551
Molecular characterization of a distinct monopartite begomovirus associated with betasatellites and alphasatellites infecting Pisum sativum in Nepal.

PubMed

Shahid, M S; Pudashini, B J; Khatri-Chhetri, G B; Briddon, R W; Natsuaki, K T

2017-04-01

Pea (Pisum sativum) plants exhibiting leaf distortion, yellowing, stunted growth and reduction in leaf size from Rampur, Nepal were shown to be infected by a begomovirus in association with betasatellites and alphasatellites. The begomovirus associated with the disease showed only low levels of nucleotide sequence identity (<91%) to previously characterized begomoviruses. This finding indicates that the pea samples were infected with an as yet undescribed begomovirus for which the name Pea leaf distortion virus (PLDV) is proposed. Two species of betasatellite were identified in association with PLDV. One group of sequences had high (>78%) nucleotide sequence identity to isolates of Ludwigia leaf distortion betasatellite (LuLDB), and the second group had less than 78% to all other betasatellite sequences. This showed PLDV to be associated with either LuLDB or a previously undescribed betasatellite for which the name Pea leaf distortion betasatellite is proposed. Two types of alphasatellites were identified in the PLDV-infected pea plants. The first type showed high levels of sequence identity to Ageratum yellow vein alphasatellite, and the second type showed high levels of identity to isolates of Sida yellow vein China alphasatellite. These are the first begomovirus, betasatellites and alphasatellites isolated from pea.
Phylogenetic Analysis of Phenotypically Characterized Cryptococcus laurentii Isolates Reveals High Frequency of Cryptic Species

PubMed Central

Ferreira-Paim, Kennio; Ferreira, Thatiana Bragine; Andrade-Silva, Leonardo; Mora, Delio Jose; Springer, Deborah J.; Heitman, Joseph; Fonseca, Fernanda Machado; Matos, Dulcilena; Melhem, Márcia Souza Carvalho; Silva-Vergara, Mario León

2014-01-01

Background Although Cryptococcus laurentii has been considered saprophytic and its taxonomy is still being described, several cases of human infections have already been reported. This study aimed to evaluate molecular aspects of C. laurentii isolates from Brazil, Botswana, Canada, and the United States. Methods In this study, 100 phenotypically identified C. laurentii isolates were evaluated by sequencing the 18S nuclear ribosomal small subunit rRNA gene (18S-SSU), D1/D2 region of 28S nuclear ribosomal large subunit rRNA gene (28S-LSU), and the internal transcribed spacer (ITS) of the ribosomal region. Results BLAST searches using 550-bp, 650-bp, and 550-bp sequenced amplicons obtained from the 18S-SSU, 28S-LSU, and the ITS region led to the identification of 75 C. laurentii strains that shared 99–100% identity with C. laurentii CBS 139. A total of nine isolates shared 99% identity with both Bullera sp. VY-68 and C. laurentii RY1. One isolate shared 99% identity with Cryptococcus rajasthanensis CBS 10406, and eight isolates shared 100% identity with Cryptococcus sp. APSS 862 according to the 28S-LSU and ITS regions and were designated as Cryptococcus aspenensis sp. nov. (CBS 13867). While 16 isolates shared 99% identity with Cryptococcus flavescens CBS 942 according to the 18S-SSU sequence, only six were confirmed using the 28S-LSU and ITS region sequences. The remaining 10 shared 99% identity with Cryptococcus terrestris CBS 10810, which was recently described in Brazil. Through concatenated sequence analyses, seven sequence types in C. laurentii, three in C. flavescens, one in C. terrestris, and one in the C. aspenensis sp. nov. were identified. Conclusions Sequencing permitted the characterization of 75% of the environmental C. laurentii isolates from different geographical areas and the identification of seven haplotypes of this species. Among sequenced regions, the increased variability of the ITS region in comparison to the 18S-SSU and 28S-LSU regions reinforces its applicability as a DNA barcode. PMID:25251413
Identification of Habitat-Specific Biomes of Aquatic Fungal Communities Using a Comprehensive Nearly Full-Length 18S rRNA Dataset Enriched with Contextual Data

PubMed Central

Panzer, Katrin; Yilmaz, Pelin; Weiß, Michael; Reich, Lothar; Richter, Michael; Wiese, Jutta; Schmaljohann, Rolf; Labes, Antje; Imhoff, Johannes F.; Glöckner, Frank Oliver; Reich, Marlis

2015-01-01

Molecular diversity surveys have demonstrated that aquatic fungi are highly diverse, and that they play fundamental ecological roles in aquatic systems. Unfortunately, comparative studies of aquatic fungal communities are few and far between, due to the scarcity of adequate datasets. We combined all publicly available fungal 18S ribosomal RNA (rRNA) gene sequences with new sequence data from a marine fungi culture collection. We further enriched this dataset by adding validated contextual data. Specifically, we included data on the habitat type of the samples assigning fungal taxa to ten different habitat categories. This dataset has been created with the intention to serve as a valuable reference dataset for aquatic fungi including a phylogenetic reference tree. The combined data enabled us to infer fungal community patterns in aquatic systems. Pairwise habitat comparisons showed significant phylogenetic differences, indicating that habitat strongly affects fungal community structure. Fungal taxonomic composition differed considerably even on phylum and class level. Freshwater fungal assemblage was most different from all other habitat types and was dominated by basal fungal lineages. For most communities, phylogenetic signals indicated clustering of sequences suggesting that environmental factors were the main drivers of fungal community structure, rather than species competition. Thus, the diversification process of aquatic fungi must be highly clade specific in some cases. PMID:26226014
LZW-Kernel: fast kernel utilizing variable length code blocks from LZW compressors for protein sequence classification.

PubMed

Filatov, Gleb; Bauwens, Bruno; Kertész-Farkas, Attila

2018-05-07

Bioinformatics studies often rely on similarity measures between sequence pairs, which often pose a bottleneck in large-scale sequence analysis. Here, we present a new convolutional kernel function for protein sequences called the LZW-Kernel. It is based on code words identified with the Lempel-Ziv-Welch (LZW) universal text compressor. The LZW-Kernel is an alignment-free method, it is always symmetric, is positive, always provides 1.0 for self-similarity and it can directly be used with Support Vector Machines (SVMs) in classification problems, contrary to normalized compression distance (NCD), which often violates the distance metric properties in practice and requires further techniques to be used with SVMs. The LZW-Kernel is a one-pass algorithm, which makes it particularly plausible for big data applications. Our experimental studies on remote protein homology detection and protein classification tasks reveal that the LZW-Kernel closely approaches the performance of the Local Alignment Kernel (LAK) and the SVM-pairwise method combined with Smith-Waterman (SW) scoring at a fraction of the time. Moreover, the LZW-Kernel outperforms the SVM-pairwise method when combined with BLAST scores, which indicates that the LZW code words might be a better basis for similarity measures than local alignment approximations found with BLAST. In addition, the LZW-Kernel outperforms n-gram based mismatch kernels, hidden Markov model based SAM and Fisher kernel, and protein family based PSI-BLAST, among others. Further advantages include the LZW-Kernel's reliance on a simple idea, its ease of implementation, and its high speed, three times faster than BLAST and several magnitudes faster than SW or LAK in our tests. LZW-Kernel is implemented as a standalone C code and is a free open-source program distributed under GPLv3 license and can be downloaded from https://github.com/kfattila/LZW-Kernel. akerteszfarkas@hse.ru. Supplementary data are available at Bioinformatics Online.
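The idea of comparing sequences through LZW code words, as described in the abstract above, can be illustrated with a short Python sketch. It is not the published LZW-Kernel formula, which the abstract does not give. The function names and the similarity used here (shared code words divided by the geometric mean of the two set sizes) are assumptions chosen only because they are symmetric and give 1.0 for self-similarity.

```python
from math import sqrt


def lzw_code_words(seq):
    """Set of phrases added to the dictionary while LZW-parsing seq."""
    dictionary = set(seq)  # start from the single symbols present in seq
    words = set()
    current = ""
    for symbol in seq:
        candidate = current + symbol
        if candidate in dictionary:
            current = candidate        # keep extending the known phrase
        else:
            dictionary.add(candidate)  # new code word found
            words.add(candidate)
            current = symbol
    return words


def shared_word_similarity(seq_a, seq_b):
    """Hypothetical similarity: shared code words, normalized to [0, 1]."""
    words_a = lzw_code_words(seq_a)
    words_b = lzw_code_words(seq_b)
    if not words_a or not words_b:
        return 0.0  # too short to produce any multi-symbol code word
    return len(words_a & words_b) / sqrt(len(words_a) * len(words_b))
```

Each sequence is parsed once, which mirrors the one-pass property mentioned in the abstract; a value such as `shared_word_similarity(p, q)` could then fill one entry of a precomputed kernel matrix.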
Molecular basis for specificity in the druggable kinome: sequence-based analysis.

PubMed

Chen, Jianping; Zhang, Xi; Fernández, Ariel

2007-03-01

Rational design of kinase inhibitors remains a challenge partly because there is no clear delineation of the molecular features that direct the pharmacological impact towards clinically relevant targets. Standard factors governing ligand affinity, such as potential for intermolecular hydrophobic interactions or for intermolecular hydrogen bonding do not provide good markers to assess cross reactivity. Thus, a core question in the informatics of drug design is what type of molecular similarity among targets promotes promiscuity and what type of molecular difference governs specificity. This work answers the question for a sizable screened sample of the human pharmacokinome including targets with unreported structure. We show that drug design aimed at promoting pairwise interactions between ligand and kinase target actually fosters promiscuity because of the high conservation of the partner groups on or around the ATP-binding site of the kinase. Alternatively, we focus on a structural marker that may be reliably determined from sequence and measures dehydration propensities mostly localized on the loopy regions of kinases. Based on this marker, we construct a sequence-based kinase classifier that enables the accurate prediction of pharmacological differences. Our indicator is a microenvironmental descriptor that quantifies the propensity for water exclusion around preformed polar pairs. The results suggest that targeting polar dehydration patterns heralds a new generation of drugs that enable a tighter control of specificity than designs aimed at promoting ligand-kinase pairwise interactions. The predictor of polar hot spots for dehydration propensity, or solvent-accessible hydrogen bonds in soluble proteins, named YAPView, may be freely downloaded from the University of Chicago website http://protlib.uchicago.edu/dloads.html. Supplementary data are available at Bioinformatics online.
Humans and Great Apes Cohabiting the Forest Ecosystem in Central African Republic Harbour the Same Hookworms

PubMed Central

Hasegawa, Hideo; Modrý, David; Kitagawa, Masahiro; Shutt, Kathryn A.; Todd, Angelique; Kalousová, Barbora; Profousová, Ilona; Petrželková, Klára J.

2014-01-01

Background Hookworms are important pathogens of humans. To date, Necator americanus is the sole, known species of the genus Necator infecting humans. In contrast, several Necator species have been described in African great apes and other primates. It has not yet been determined whether primate-originating Necator species are also parasitic in humans. Methodology/Principal Findings The infective larvae of Necator spp. were developed using modified Harada-Mori filter-paper cultures from faeces of humans and great apes inhabiting Dzanga-Sangha Protected Areas, Central African Republic. The first and second internal transcribed spacers (ITS-1 and ITS-2) of nuclear ribosomal DNA and partial cytochrome c oxidase subunit 1 (cox1) gene of mtDNA obtained from the hookworm larvae were sequenced and compared. Three sequence types (I–III) were recognized in the ITS region, and 34 cox1 haplotypes represented three phylogenetic groups (A–C). The combinations determined were I-A, II-B, II-C, III-B and III-C. Combination I-A, corresponding to N. americanus, was demonstrated in humans and western lowland gorillas; II-B and II-C were observed in humans, western lowland gorillas and chimpanzees; III-B and III-C were found only in humans. Pairwise nucleotide difference in the cox1 haplotypes between the groups was more than 8%, while the difference within each group was less than 2.1%. Conclusions/Significance The distinctness of ITS sequence variants and high number of pairwise nucleotide differences among cox1 variants indicate the possible presence of several species of Necator in both humans and great apes. We conclude that Necator hookworms are shared by humans and great apes co-habiting the same tropical forest ecosystems. PMID:24651493
Implication of the cause of differences in 3D structures of proteins with high sequence identity based on analyses of amino acid sequences and 3D structures.

PubMed

Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi

2014-09-18

Proteins that share a high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. It was hard to distinguish which structure a given sequence would take only with the results of ordinary analyses like BLAST and conservation analyses. However, in addition to these analyses, with the analysis based on the inter-residue average distance statistics and our sequence tendency analysis, we could infer which part would play an important role in its structural formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.
fRMSDPred: Predicting Local RMSD Between Structural Fragments Using Sequence Information

DTIC Science & Technology

2007-04-04

machine learning approaches for estimating the RMSD value of a pair of protein fragments. These estimated fragment-level RMSD values can be used to construct the alignment, assess the quality of an alignment, and identify high-quality alignment segments. We present algorithms to solve this fragment-level RMSD prediction problem using a supervised learning framework based on support vector regression and classification that incorporates protein profiles, predicted secondary structure, effective information encoding schemes, and novel second-order pairwise exponential kernel
Using Sobol sequences for planning computer experiments

NASA Astrophysics Data System (ADS)

Statnikov, I. N.; Firsov, G. I.

2017-12-01

The paper discusses the use of the Planning LP-search (PLP-search) method for studying problems of multicriteria synthesis of dynamic systems. On the basis of simulation-model experiments, the method makes it possible to survey the parameter space within specified ranges of variation. Through the special randomized planning of these experiments, it also allows a quantitative statistical evaluation of how changes in the varied parameters and their pairwise combinations influence the analyzed properties of the dynamic system.
Molecular identification of a new begomovirus infecting yellow passion fruit (Passiflora edulis) in Colombia.

PubMed

Vaca-Vaca, Juan Carlos; Carrasco-Lozano, Emerson Clovis; López-López, Karina

2017-02-01

The complete genome sequence of a bipartite begomovirus (genus Begomovirus, family Geminiviridae) infecting yellow passion fruit (Passiflora edulis) in the state of Valle del Cauca (Colombia) has been determined. The complete DNA-A and DNA-B components were determined to be 2600 and 2572 nt in length, respectively. The DNA-A showed the highest nucleotide sequence identity (87.2 %) to bean dwarf mosaic virus (M88179), a begomovirus found in common bean crops in Colombia, and only 77.4 % identity to passion fruit severe leaf distortion virus (FJ972767), a begomovirus identified infecting passion fruit in Brazil. Based on its sequence identity to all other begomoviruses known to date and in accordance with the ICTV species demarcation criterion for the genus Begomovirus (≥91 % sequence identity for the complete DNA-A), the name passion fruit leaf distortion virus is proposed for this new begomovirus. To our knowledge, this is the first report of a bipartite begomovirus affecting passion fruit in Colombia and the second report of a geminivirus affecting this crop worldwide.
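As an illustration of how a demarcation threshold such as the one cited above is applied, the following Python sketch computes pairwise percent identity from two already-aligned sequences and compares it with the 91 % cut-off. The function names and the decision to skip columns where both sequences carry a gap are assumptions made for this sketch. Published identity values depend on the alignment program and on how gaps are treated.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns that are not gap-only."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def is_new_species(identity_to_closest, threshold=91.0):
    """True when identity to the closest known virus is below the threshold."""
    return identity_to_closest < threshold
```

With the value reported in the abstract, `is_new_species(87.2)` returns True, which matches the proposal of a new species name.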
Sequence identity and antigenic cross-reactivity of white face hornet venom allergen, also a hyaluronidase, with other proteins.

PubMed

Lu, G; Kochoumian, L; King, T P

1995-03-03

White face hornet (Dolichovespula maculata) venom has three known protein allergens which induce IgE response in susceptible people. They are antigen 5, phospholipase A1, and hyaluronidase, also known as Dol m 5, 1, and 2, respectively. We have cloned Dol m 2, a protein of 331 residues. When expressed in bacteria, a mixture of recombinant Dol m 2 and its fragments was obtained. The fragments were apparently generated by proteolysis of a Met-Met bond at residue 122, as they were not observed for a Dol m 2 mutant with a Leu-Met bond. Dol m 2 has 56% sequence identity with the honey bee venom allergen hyaluronidase and 27% identity with PH-20, a human sperm protein with hyaluronidase activity. A common feature of hornet venom allergens is their sequence identity with other proteins in our environment. We showed previously the sequence identity of Dol m 5 with a plant protein and a mammalian testis protein and of Dol m 1 with mammalian lipases. In BALB/c mice, Dol m 2 and bee hyaluronidase showed cross-reactivity at both antibody and T cell levels. These findings are relevant to some patients' multiple sensitivity to hornet and bee stings.
Optimization of identity operation in NMR spectroscopy via genetic algorithm: Application to the TEDOR experiment

NASA Astrophysics Data System (ADS)

Manu, V. S.; Veglia, Gianluigi

2016-12-01

Identity operation in the form of π pulses is widely used in NMR spectroscopy. For an isolated single spin system, a sequence of even number of π pulses performs an identity operation, leaving the spin state essentially unaltered. For multi-spin systems, trains of π pulses with appropriate phases and time delays modulate the spin Hamiltonian to perform operations such as decoupling and recoupling. However, experimental imperfections often jeopardize the outcome, leading to severe losses in sensitivity. Here, we demonstrate that a newly designed Genetic Algorithm (GA) is able to optimize a train of π pulses, resulting in a robust identity operation. As proof-of-concept, we optimized the recoupling sequence in the transferred-echo double-resonance (TEDOR) pulse sequence, a key experiment in biological magic angle spinning (MAS) solid-state NMR for measuring multiple carbon-nitrogen distances. The GA modified TEDOR (GMO-TEDOR) experiment with improved recoupling efficiency results in a net gain of sensitivity up to 28% as tested on a uniformly 13C, 15N labeled microcrystalline ubiquitin sample. The robust identity operation achieved via GA paves the way for the optimization of several other pulse sequences used for both solid- and liquid-state NMR used for decoupling, recoupling, and relaxation experiments.
Sequence comparison alignment-free approach based on suffix tree and L-words frequency.

PubMed

Soares, Inês; Goios, Ana; Amorim, António

2012-01-01

The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts by a computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words with a preset length L (L-words) in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures to the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python language and is freely available on the web.
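The L-word profile and Euclidean distance steps described in this abstract can be illustrated with a minimal Python sketch. It counts overlapping words with a plain sliding window rather than the generalized suffix tree the authors use, so it shows the distance computation but not their speed-up. The function names and the use of relative frequencies are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def lword_profile(seq, L):
    """Relative frequency of every overlapping word of length L in seq."""
    n_words = len(seq) - L + 1
    if n_words <= 0:
        return {}
    counts = Counter(seq[i:i + L] for i in range(n_words))
    return {word: c / n_words for word, c in counts.items()}


def euclidean_distance(p, q):
    """Euclidean distance between two sparse frequency profiles."""
    words = set(p) | set(q)
    return sqrt(sum((p.get(w, 0.0) - q.get(w, 0.0)) ** 2 for w in words))


def distance_matrix(seqs, L):
    """Symmetric matrix of pairwise L-word distances."""
    profiles = [lword_profile(s, L) for s in seqs]
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = euclidean_distance(profiles[i], profiles[j])
    return d
```

The resulting matrix is the kind of input a neighbor-joining or multidimensional scaling routine would take; the choice of L is left to the caller here, whereas the abstract refers to determining a single optimal word length.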

The twilight zone of cis element alignments.

PubMed

Sebastian, Alvaro; Contreras-Moreira, Bruno

2013-02-01

Sequence alignment of proteins and nucleic acids is a routine task in bioinformatics. Although the comparison of complete peptides, genes or genomes can be undertaken with a great variety of tools, the alignment of short DNA sequences and motifs entails pitfalls that have not been fully addressed yet. Here we confront the structural superposition of transcription factors with the sequence alignment of their recognized cis elements. Our goals are (i) to test TFcompare (http://floresta.eead.csic.es/tfcompare), a structural alignment method for protein-DNA complexes; (ii) to benchmark the pairwise alignment of regulatory elements; (iii) to define the confidence limits and the twilight zone of such alignments and (iv) to evaluate the relevance of these thresholds with elements obtained experimentally. We find that the structure of cis elements and protein-DNA interfaces is significantly more conserved than their sequence and measure how this correlates with alignment errors when only sequence information is considered. Our results confirm that DNA motifs in the form of matrices produce better alignments than individual sequences. Finally, we report that empirical and theoretically derived twilight thresholds are useful for estimating the natural plasticity of regulatory sequences, and hence for filtering out unreliable alignments.
The twilight zone of cis element alignments

PubMed Central

Sebastian, Alvaro; Contreras-Moreira, Bruno

2013-01-01

Sequence alignment of proteins and nucleic acids is a routine task in bioinformatics. Although the comparison of complete peptides, genes or genomes can be undertaken with a great variety of tools, the alignment of short DNA sequences and motifs entails pitfalls that have not been fully addressed yet. Here we confront the structural superposition of transcription factors with the sequence alignment of their recognized cis elements. Our goals are (i) to test TFcompare (http://floresta.eead.csic.es/tfcompare), a structural alignment method for protein–DNA complexes; (ii) to benchmark the pairwise alignment of regulatory elements; (iii) to define the confidence limits and the twilight zone of such alignments and (iv) to evaluate the relevance of these thresholds with elements obtained experimentally. We find that the structure of cis elements and protein–DNA interfaces is significantly more conserved than their sequence and measure how this correlates with alignment errors when only sequence information is considered. Our results confirm that DNA motifs in the form of matrices produce better alignments than individual sequences. Finally, we report that empirical and theoretically derived twilight thresholds are useful for estimating the natural plasticity of regulatory sequences, and hence for filtering out unreliable alignments. PMID:23268451
Analysis of Ribosome Inactivating Protein (RIP): A Bioinformatics Approach

NASA Astrophysics Data System (ADS)

Jothi, G. Edward Gnana; Majilla, G. Sahaya Jose; Subhashini, D.; Deivasigamani, B.

2012-10-01

In spite of the medical advances in recent years, the world is in need of different sources to counter certain health issues. Ribosome Inactivating Proteins (RIPs) were found to be one among them. In order to get easy access to information about RIPs, there is a need to analyse RIPs towards constructing a database on RIPs. Also, multiple sequence alignment was done towards screening for homologues of significant RIPs from rare sources against RIPs from easily available sources in terms of similarity. Protein sequences were retrieved from SWISS-PROT and were further analysed using pairwise and multiple sequence alignment. Analysis shows that 151 RIPs have been characterized to date. Amongst them, there are 87 type I, 37 type II, 1 type III and 25 unknown RIPs. The sequence length information of various RIPs about the availability of full or partial sequence was also found. The multiple sequence alignment of 37 type I RIPs using the online server Multalin indicates the presence of 20 conserved residues. Pairwise alignment and multiple sequence alignment of certain selected RIPs in two groups, namely Group I and Group II, were carried out and the consensus level was found to be 98%, 98% and 90% respectively.
Implementation of Objective PASC-Derived Taxon Demarcation Criteria for Official Classification of Filoviruses.

PubMed

Bào, Yīmíng; Amarasinghe, Gaya K; Basler, Christopher F; Bavari, Sina; Bukreyev, Alexander; Chandran, Kartik; Dolnik, Olga; Dye, John M; Ebihara, Hideki; Formenty, Pierre; Hewson, Roger; Kobinger, Gary P; Leroy, Eric M; Mühlberger, Elke; Netesov, Sergey V; Patterson, Jean L; Paweska, Janusz T; Smither, Sophie J; Takada, Ayato; Towner, Jonathan S; Volchkov, Viktor E; Wahl-Jensen, Victoria; Kuhn, Jens H

2017-05-11

The mononegaviral family Filoviridae has eight members assigned to three genera and seven species. Until now, genus and species demarcation were based on arbitrarily chosen filovirus genome sequence divergence values (≈50% for genera, ≈30% for species) and arbitrarily chosen phenotypic virus or virion characteristics. Here we report filovirus genome sequence-based taxon demarcation criteria using the publicly accessible PAirwise Sequence Comparison (PASC) tool of the US National Center for Biotechnology Information (Bethesda, MD, USA). Comparison of all available filovirus genomes in GenBank using PASC revealed optimal demarcation at the 55-58% sequence diversity threshold range for genera and at the 23-36% sequence diversity threshold range for species. Because these thresholds do not change the current official filovirus classification, these values are now implemented as filovirus taxon demarcation criteria that may solely be used for filovirus classification in case additional data are absent. A near-complete, coding-complete, or complete filovirus genome sequence will now be required to allow official classification of any novel "filovirus." Classification of filoviruses into existing taxa or determining the need for novel taxa is now straightforward and could even become automated using a presented algorithm/flowchart rooted in RefSeq (type) sequences.
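The threshold ranges quoted in this abstract suggest a simple decision rule. The following is a minimal sketch of one possible reading, not the algorithm/flowchart presented by the authors: it assumes a single pairwise divergence percentage between a novel genome and a reference is already available (for example from a pairwise comparison tool), and it treats values that fall inside a quoted range as borderline. The function name, the labels, and the handling of borderline values are all hypothetical.

```python
# Threshold ranges (percent sequence diversity) as quoted in the abstract.
SPECIES_RANGE = (23.0, 36.0)
GENUS_RANGE = (55.0, 58.0)


def classify_by_divergence(divergence_pct):
    """Assign a hypothetical taxonomic relationship from one pairwise divergence value.

    Assumption: values below a range mean "same taxon", values above it mean
    "different taxon", and values inside the range are flagged for manual review.
    """
    if not 0.0 <= divergence_pct <= 100.0:
        raise ValueError("divergence must be a percentage between 0 and 100")
    if divergence_pct < SPECIES_RANGE[0]:
        return "same species"
    if divergence_pct <= SPECIES_RANGE[1]:
        return "borderline: species-level decision needs review"
    if divergence_pct < GENUS_RANGE[0]:
        return "same genus, different species"
    if divergence_pct <= GENUS_RANGE[1]:
        return "borderline: genus-level decision needs review"
    return "different genus"


if __name__ == "__main__":
    for value in (10.0, 30.0, 45.0, 56.5, 70.0):
        print(value, "->", classify_by_divergence(value))
```

In practice a novel genome would be compared against every reference sequence, and the closest reference would drive the decision; that loop is omitted here to keep the sketch short.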
Sockeye: A 3D Environment for Comparative Genomics

PubMed Central

Montgomery, Stephen B.; Astakhova, Tamara; Bilenky, Mikhail; Birney, Ewan; Fu, Tony; Hassel, Maik; Melsopp, Craig; Rak, Marcin; Robertson, A. Gordon; Sleumer, Monica; Siddiqui, Asim S.; Jones, Steven J.M.

2004-01-01

Comparative genomics techniques are used in bioinformatics analyses to identify the structural and functional properties of DNA sequences. As the amount of available sequence data steadily increases, the ability to perform large-scale comparative analyses has become increasingly relevant. In addition, the growing complexity of genomic feature annotation means that new approaches to genomic visualization need to be explored. We have developed a Java-based application called Sockeye that uses three-dimensional (3D) graphics technology to facilitate the visualization of annotation and conservation across multiple sequences. This software uses the Ensembl database project to import sequence and annotation information from several eukaryotic species. A user can additionally import their own custom sequence and annotation data. Individual annotation objects are displayed in Sockeye by using custom 3D models. Ensembl-derived and imported sequences can be analyzed by using a suite of multiple and pair-wise alignment algorithms. The results of these comparative analyses are also displayed in the 3D environment of Sockeye. By using the Java3D API to visualize genomic data in a 3D environment, we are able to compactly display cross-sequence comparisons. This provides the user with a novel platform for visualizing and comparing genomic feature organization. PMID:15123592
CAFE: aCcelerated Alignment-FrEe sequence analysis.

PubMed

Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A; Waterman, Michael S; Sun, Fengzhu

2017-07-03

Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmap, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
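To illustrate what a measure "based solely on the k-mer frequencies" can look like, here is a minimal sketch that computes a Euclidean distance between normalized k-mer frequency vectors. It is not CAFE's implementation and it is not one of the background-adjusted measures named above; the function names and the default k are assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(seq, k):
    """Return the relative frequency of every overlapping k-mer in seq."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: count / total for kmer, count in counts.items()}


def kmer_euclidean(seq_a, seq_b, k=2):
    """Euclidean distance between the k-mer frequency vectors of two sequences."""
    freq_a = kmer_frequencies(seq_a, k)
    freq_b = kmer_frequencies(seq_b, k)
    all_kmers = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(m, 0.0) - freq_b.get(m, 0.0)) ** 2 for m in all_kmers))


if __name__ == "__main__":
    genomes = {"g1": "ACGTACGTACGT", "g2": "ACGTACGAACGT", "g3": "TTTTGGGGCCCC"}
    names = sorted(genomes)
    # Print a small pairwise dissimilarity matrix, one row per sequence.
    for a in names:
        print(a, " ".join(f"{kmer_euclidean(genomes[a], genomes[b]):.3f}" for b in names))
```

No alignment is computed at any point, which is why measures of this kind scale to unassembled reads; the cost is that they ignore positional information entirely.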
Isolation and characterization of a novel chlorpyrifos degrading flavobacterium species EMBS0145 by 16S rRNA gene sequencing.

PubMed

Amareshwari, P; Bhatia, Mayuri; Venkatesh, K; Roja Rani, A; Ravi, G V; Bhakt, Priyanka; Bandaru, Srinivas; Yadav, Mukesh; Nayarisseri, Anuraj; Nair, Achuthsankar S

2015-03-01

Indiscriminate application of pesticides such as chlorpyrifos, diazinon, or malathion contaminates the soil and, besides being unsafe, has often raised severe health concerns. Conversely, microorganisms such as Trichoderma and Aspergillus, and bacteria such as Rhizobium, Bacillus, Azotobacter, and Flavobacterium, have evolved the ability to degrade these pesticides to non-toxic products. The current study identifies a novel species of Flavobacterium capable of degrading organophosphorus pesticides. The bacterium was isolated from agricultural soil collected from Guntur District, Andhra Pradesh, India. The samples were serially diluted and the aliquots were incubated for a suitable time, following which the suspected colony was subjected to 16S rDNA sequencing. The sequence thus obtained was aligned pairwise against Flavobacterium species, which resulted in identification of a novel species of Flavobacterium, later named EMBS0145, the sequence of which was deposited in GenBank with accession number JN794045.
Cytochrome c oxidase subunit I barcoding of the green bee-eater (Merops orientalis).

PubMed

Arif, I A; Khan, H A; Shobrak, M; Williams, J

2011-10-21

DNA barcoding using mitochondrial cytochrome c oxidase subunit I (COI) is regarded as a standard method for species identification. Recent reports have also shown extended applications of COI gene analysis in phylogeny and molecular diversity studies. The bee-eaters are a group of near passerine birds in the family Meropidae. There are 26 species worldwide; five of them are found in Saudi Arabia. Until now, GenBank included a COI barcode for only one species of bee-eater, the European bee-eater (Merops apiaster). We sequenced the 694-bp segment of the COI gene of the green bee-eater M. orientalis and compared the sequences with those of M. apiaster. Pairwise sequence comparison showed 66 variable sites across all the eight sequences from both species, with an interspecific genetic distance of 0.0362. Two and one within-species variable sites were found, with genetic distances of 0.0005 and 0.0003 for M. apiaster and M. orientalis, respectively. This is the first study reporting barcodes for M. orientalis.
Polypeptide having or assisting in carbohydrate material degrading activity and uses thereof

DOEpatents

Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter

2016-02-16

The invention relates to a polypeptide which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 76% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having beta-glucosidase activity and uses thereof

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having swollenin activity and uses thereof

DOEpatents

Schoonneveld-Bergmans, Margot Elizabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica D; Damveld, Robbertus Antonius

2015-11-04

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having beta-glucosidase activity and uses thereof

DOEpatents

Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; De Jong, Rene Marcel; Damveld, Robbertus Antonius

2015-09-01

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 70% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having cellobiohydrolase activity and uses thereof

DOEpatents

Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter

2015-09-15

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 93% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having acetyl xylan esterase activity and uses thereof

DOEpatents

Schoonneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Los, Alrik Pieter

2015-10-20

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 82% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Polypeptide having carbohydrate degrading activity and uses thereof

DOEpatents

Schooneveld-Bergmans, Margot Elisabeth Francoise; Heijne, Wilbert Herman Marie; Vlasie, Monica Diana; Damveld, Robbertus Antonius

2015-08-18

The invention relates to a polypeptide comprising the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 73% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional polypeptide and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
On the Role of Aggregation Prone Regions in Protein Evolution, Stability, and Enzymatic Catalysis: Insights from Diverse Analyses

PubMed Central

Buck, Patrick M.; Kumar, Sandeep; Singh, Satish K.

2013-01-01

The various roles that aggregation prone regions (APRs) are capable of playing in proteins are investigated here via comprehensive analyses of multiple non-redundant datasets containing randomly generated amino acid sequences, monomeric proteins, intrinsically disordered proteins (IDPs) and catalytic residues. Results from this study indicate that the aggregation propensities of monomeric protein sequences have been minimized compared to random sequences with uniform and natural amino acid compositions, as observed by a lower average aggregation propensity and fewer APRs that are shorter in length and more often punctuated by gate-keeper residues. However, evidence for evolutionary selective pressure to disrupt these sequence regions among homologous proteins is inconsistent. APRs are less conserved than average sequence identity among closely related homologues (≥80% sequence identity with a parent) but APRs are more conserved than average sequence identity among homologues that have at least 50% sequence identity with a parent. Structural analyses of APRs indicate that APRs are three times more likely to contain ordered versus disordered residues and that APRs frequently contribute more towards stabilizing proteins than equal length segments from the same protein. Catalytic residues and APRs were also found to be in structural contact significantly more often than expected by random chance. Our findings suggest that proteins have evolved by optimizing their risk of aggregation for cellular environments by both minimizing aggregation prone regions and by conserving those that are important for folding and function. In many cases, these sequence optimizations are insufficient to develop recombinant proteins into commercial products. Rational design strategies aimed at improving protein solubility for biotechnological purposes should carefully evaluate the contributions made by candidate APRs, targeted for disruption, towards protein structure and activity. 
PMID:24146608
TaxI: a software tool for DNA barcoding using distance methods

PubMed Central

Steinke, Dirk; Vences, Miguel; Salzburger, Walter; Meyer, Axel

2005-01-01

DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses and, hence, those algorithms do not offer appropriate solutions for the rapid, but precise analyses needed for DNA barcoding, and are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments this software is also able to work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets of fish larvae and juveniles from Lake Constance and juvenile land snails under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as good as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding. PMID:16214755
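The idea of scoring a query against each reference through separate pairwise alignments can be sketched as follows. This is an illustrative sketch only, not TaxI's code: it assumes a basic Needleman-Wunsch global alignment with arbitrary scoring values, and it assumes that alignment columns containing a gap are ignored when computing the uncorrected divergence (p-distance). All function names and parameters are hypothetical.

```python
def global_align(s, t, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(s), len(t)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


def p_distance(aligned_a, aligned_b):
    """Fraction of mismatching columns, skipping columns that contain a gap."""
    cols = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not cols:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in cols) / len(cols)


def rank_references(query, references):
    """Align the query to each reference separately; return (distance, name) sorted ascending."""
    return sorted((p_distance(*global_align(query, seq)), name)
                  for name, seq in references.items())


if __name__ == "__main__":
    refs = {"sp_A": "ACGTACGT", "sp_B": "ACGTTCGT", "sp_C": "ACGACGT"}
    for dist, name in rank_references("ACGTACGT", refs):
        print(f"{name}\t{dist:.3f}")
```

Because each reference is aligned to the query independently, a reference with an indel (such as the hypothetical `sp_C` above) does not disturb the alignment of any other reference, which is the practical advantage the abstract describes over building one large multiple alignment.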
Sequence Similarity Presenter: a tool for the graphic display of similarities of long sequences for use in presentations.

PubMed

Fröhlich, K U

1994-04-01

A new method for the presentation of alignments of long sequences is described. The degree of identity for the aligned sequences is averaged for sections of a fixed number of residues. The resulting values are converted to shades of gray, with white corresponding to lack of identity and black corresponding to perfect identity. A sequence alignment is represented as a bar filled with varying shades of gray. The display is compact and allows for a fast and intuitive recognition of the distribution of regions with a high similarity. It is well suited for the presentation of alignments of long sequences, e.g. of protein superfamilies, in plenary lectures. The method is implemented as a HyperCard stack for Apple Macintosh computers. Several options for the modification of the output are available (e.g. background reduction, size of the summation window, consideration of amino acid similarity, inclusion of graphic markers to indicate specific domains). The output is a PostScript file which can be printed, imported as EPS or processed further with Adobe Illustrator.
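The windowed averaging and gray-scale mapping described in this abstract can be sketched in a few lines. This is a minimal illustration, not the original HyperCard implementation: it assumes two already aligned sequences of equal length, non-overlapping windows, a gap never counting as an identity, and an 8-bit gray scale in which 255 is white and 0 is black. The function names are hypothetical.

```python
def window_identity(aligned_a, aligned_b, window):
    """Average identity of two aligned sequences over consecutive fixed-size windows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if window < 1:
        raise ValueError("window must be at least 1")
    values = []
    for start in range(0, len(aligned_a), window):
        pairs = list(zip(aligned_a[start:start + window], aligned_b[start:start + window]))
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        values.append(matches / len(pairs))
    return values


def to_gray(identity):
    """Map identity in [0, 1] to a gray level: 0.0 -> 255 (white), 1.0 -> 0 (black)."""
    return round(255 * (1.0 - identity))


if __name__ == "__main__":
    a = "ACGTACGTAC"
    b = "ACGTTTTTAC"
    identities = window_identity(a, b, 5)
    print(identities)                        # [0.8, 0.6]
    print([to_gray(v) for v in identities])  # [51, 102]
```

Each gray level would then fill one segment of the bar; a larger window gives a more compact but coarser display, which corresponds to the "size of the summation window" option mentioned in the abstract.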
Isolation of prolactin and growth hormone from the pituitary of the holostean fish Amia calva.

PubMed

Dores, R M; Noso, T; Rand-Weaver, M; Kawauchi, H

1993-06-01

Pituitaries from adult male and female Amia calva (Order Holostei) were acid extracted and fractionated by gel filtration column chromatography and reversed-phase high performance liquid chromatography. This two-step isolation procedure yielded homogeneous pools of Amia prolactin (PRL) and growth hormone (GH). The amino acid composition of both purified polypeptides was determined. Primary sequence analysis of the first 22 positions at the N-terminal of Amia PRL revealed that this region has 63% sequence identity with eel PRL-1. The N-terminal region of Amia PRL lacks the disulfide bridge which is characteristic of tetrapod PRLs. Primary sequence analysis of the first 24 positions at the N-terminal of Amia GH revealed that this region has 62% sequence identity with eel GH and 54% sequence identity with both blue shark GH and sea turtle GH. Based on N-terminal analysis, it appears that Amia PRL and GH are more closely related to teleost PRLs and GHs than they are to tetrapod PRLs and GHs.
Complete mitochondrial genome of the larch hawk moth, Sphinx morio (Lepidoptera: Sphingidae).

PubMed

Kim, Min Jee; Choi, Sei-Woong; Kim, Iksoo

2013-12-01

The larch hawk moth, Sphinx morio, belongs to the lepidopteran family Sphingidae that has long been studied as a family of model insects in diverse fields. In this study, we describe the complete mitochondrial genome (mitogenome) sequences of the species in terms of general genomic features and characteristic short repetitive sequences found in the A + T-rich region. The 15,299-bp-long genome consisted of a typical set of genes (13 protein-coding genes, 2 rRNA genes, and 22 tRNA genes) and one major non-coding A + T-rich region, with the typical arrangement found in Lepidoptera. The 316-bp-long A + T-rich region located between srRNA and tRNA(Met) harbored the conserved sequence blocks that are typically found in lepidopteran insects. Additionally, the A + T-rich region of S. morio contained three characteristic repeat sequences that are rarely found in Lepidoptera: two identical 12-bp repeats, three identical 5-bp-long tandem repeats, and six nearly identical 5-6 bp long repeat sequences.

Lotka-Volterra pairwise modeling fails to capture diverse pairwise microbial interactions

PubMed Central

Momeni, Babak; Xie, Li; Shou, Wenying

2017-01-01

Pairwise models are commonly used to describe many-species communities. In these models, an individual receives additive fitness effects from pairwise interactions with each species in the community ('additivity assumption'). All pairwise interactions are typically represented by a single equation where parameters reflect signs and strengths of fitness effects ('universality assumption'). Here, we show that a single equation fails to qualitatively capture diverse pairwise microbial interactions. We build mechanistic reference models for two microbial species engaging in commonly-found chemical-mediated interactions, and attempt to derive pairwise models. Different equations are appropriate depending on whether a mediator is consumable or reusable, whether an interaction is mediated by one or more mediators, and sometimes even on quantitative details of the community (e.g. relative fitness of the two species, initial conditions). Our results, combined with potential violation of the additivity assumption in many-species communities, suggest that pairwise modeling will often fail to predict microbial dynamics. DOI: http://dx.doi.org/10.7554/eLife.25051.001 PMID:28350295
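For readers unfamiliar with pairwise models, the following minimal sketch simulates the textbook generalized Lotka-Volterra form, dN_i/dt = N_i (r_i + Σ_j a_ij N_j), in which each species receives additive fitness effects from every other species through a single equation. It is offered only as an illustration of the additivity and universality assumptions named in the abstract; it is not the authors' mechanistic reference model, and the forward-Euler integrator, step size, and parameter values are assumptions.

```python
def glv_step(n, r, a, dt):
    """One forward-Euler step of dN_i/dt = N_i * (r_i + sum_j a[i][j] * N_j)."""
    size = len(n)
    rates = [n[i] * (r[i] + sum(a[i][j] * n[j] for j in range(size)))
             for i in range(size)]
    return [n[i] + dt * rates[i] for i in range(size)]


def simulate(n0, r, a, dt=0.01, steps=2000):
    """Integrate the pairwise model from initial densities n0 and return the final densities."""
    n = list(n0)
    for _ in range(steps):
        n = glv_step(n, r, a, dt)
    return n


if __name__ == "__main__":
    growth = [1.0, 1.0]
    # a[i][j] is the per-capita fitness effect of species j on species i.
    no_interaction = [[-1.0, 0.0], [0.0, -1.0]]
    competition = [[-1.0, -0.5], [-0.5, -1.0]]
    print(simulate([0.1, 0.1], growth, no_interaction))
    print(simulate([0.1, 0.1], growth, competition))
```

The interaction matrix `a` is where the single-equation assumption lives: every interaction, whatever its chemical mechanism, is reduced to one signed coefficient, which is the simplification the abstract argues can fail.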
Detecting Earthquakes over a Seismic Network using Single-Station Similarity Measures

NASA Astrophysics Data System (ADS)

Bergen, Karianne J.; Beroza, Gregory C.

2018-03-01

New blind waveform-similarity-based detection methods, such as Fingerprint and Similarity Thresholding (FAST), have shown promise for detecting weak signals in long-duration, continuous waveform data. While blind detectors are capable of identifying similar or repeating waveforms without templates, they can also be susceptible to false detections due to local correlated noise. In this work, we present a set of three new methods that allow us to extend single-station similarity-based detection over a seismic network; event-pair extraction, pairwise pseudo-association, and event resolution complete a post-processing pipeline that combines single-station similarity measures (e.g. FAST sparse similarity matrix) from each station in a network into a list of candidate events. The core technique, pairwise pseudo-association, leverages the pairwise structure of event detections in its network detection model, which allows it to identify events observed at multiple stations in the network without modeling the expected move-out. Though our approach is general, we apply it to extend FAST over a sparse seismic network. We demonstrate that our network-based extension of FAST is both sensitive and maintains a low false detection rate. As a test case, we apply our approach to two weeks of continuous waveform data from five stations during the foreshock sequence prior to the 2014 Mw 8.2 Iquique earthquake. Our method identifies nearly five times as many events as the local seismicity catalog (including 95% of the catalog events), and less than 1% of these candidate events are false detections.
Diversity among Tacaribe serocomplex viruses (family Arenaviridae) naturally associated with the Mexican woodrat (Neotoma mexicana)

PubMed Central

Cajimat, Maria N. B.; Milazzo, Mary Louise; Borchert, Jeff N.; Abbott, Ken D.; Bradley, Robert D.; Fulhorst, Charles F.

2008-01-01

The results of analyses of glycoprotein precursor and nucleocapsid protein gene sequences indicated that an arenavirus isolated from a Mexican woodrat (Neotoma mexicana) captured in Arizona is a strain of a novel species (proposed name Skinner Tank virus) and that arenaviruses isolated from Mexican woodrats captured in Colorado, New Mexico, and Utah are strains of Whitewater Arroyo virus or species phylogenetically closely related to Whitewater Arroyo virus. Pairwise comparisons of glycoprotein precursor sequences and nucleocapsid protein sequences revealed a high level of divergence among the viruses isolated from the Mexican woodrats captured in Colorado, New Mexico, and Utah and the Whitewater Arroyo virus prototype strain AV 9310135, which originally was isolated from a white-throated woodrat (Neotoma albigula) captured in New Mexico. Conceptually, the viruses from Colorado, New Mexico, and Utah and strain AV 9310135 could be grouped together in a species complex in the family Arenaviridae, genus Arenavirus. PMID:18304671
Orthology detection combining clustering and synteny for very large datasets.

PubMed

Lechner, Marcus; Hernandez-Rosales, Maribel; Doerr, Daniel; Wieseke, Nicolas; Thévenin, Annelyse; Stoye, Jens; Hartmann, Roland K; Prohaska, Sonja J; Stadler, Peter F

2014-01-01

The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance) was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets.
Orthology Detection Combining Clustering and Synteny for Very Large Datasets

PubMed Central

Lechner, Marcus; Hernandez-Rosales, Maribel; Doerr, Daniel; Wieseke, Nicolas; Thévenin, Annelyse; Stoye, Jens; Hartmann, Roland K.; Prohaska, Sonja J.; Stadler, Peter F.

2014-01-01

The elucidation of orthology relationships is an important step both in gene function prediction as well as towards understanding patterns of sequence evolution. Orthology assignments are usually derived directly from sequence similarities for large data because more exact approaches exhibit too high computational costs. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance) was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of its basis Proteinortho, making the software applicable to very large datasets. PMID:25137074
BrucellaBase: Genome information resource.

PubMed

Sankarasubramanian, Jagadesan; Vishnu, Udayakumar S; Khader, L K M Abdul; Sridhar, Jayavel; Gunasekaran, Paramasamy; Rajendhran, Jeyaprakash

2016-09-01

Brucella sp. causes a major zoonotic disease, brucellosis. Brucella belongs to the family Brucellaceae under the order Rhizobiales of Alphaproteobacteria. We present BrucellaBase, a web-based platform, providing features of a genome database together with unique analysis tools. We have developed a web version of the multilocus sequence typing (MLST) (Whatmore et al., 2007) and phylogenetic analysis of Brucella spp. BrucellaBase currently contains genome data of 510 Brucella strains along with the user interfaces for BLAST, VFDB, CARD, pairwise genome alignment and MLST typing. Availability of these tools will enable the researchers interested in Brucella to get meaningful information from Brucella genome sequences. BrucellaBase will regularly be updated with new genome sequences, new features along with improvements in genome annotations. BrucellaBase is available online at http://www.dbtbrucellosis.in/brucellabase.html or http://59.99.226.203/brucellabase/homepage.html. Copyright © 2016 Elsevier B.V. All rights reserved.
ITS rDNA sequences of Pomphorhynchus laevis (Zoega in Müller, 1776) and P. lucyi Williams and Rogers, 1984 (Acanthocephala: Palaeacanthocephala).

PubMed

Král'ová-Hromadová, Iva; Tietz, David F; Shinn, Andrew P; Spakulová, Marta

2003-10-01

The internal transcribed spacers (ITS-1 and ITS-2) of the ribosomal RNA gene of Pomphorhynchus laevis (Zoega in Müller, 1776) (Acanthocephala) isolated from various fish species across Central and Southern Europe were compared with those of P. lucyi Williams and Rogers, 1984 collected from the largemouth bass Micropterus salmonoides Boulenger from the USA. The nucleotide sequences of ITS regions of P. laevis from minnows Phoxinus phoxinus (L.) and chub Leuciscus cephalus (L.) from two distant localities in the Slovak Republic were found to be 100% identical. The ITS-1 and ITS-2 of P. laevis from chub from the Czech Republic and Italy were also mutually identical, but significantly different from Slovak worms (88.7% identity for ITS-1, 91.3% identity for ITS-2). A fifth sample collected from Barbus tyberinus Bonaparte from Italy was very similar to the sympatric Italian isolate from chub, possessing four nucleotide substitutions in ITS-1 (98.4% identity). The ITS rDNA sequences of P. lucyi differed significantly from those of P. laevis; the values of identity were 51.8-56.1% for ITS-1 and 63.1-65.3% for ITS-2, and were significantly higher than the range of P. laevis within-species variability. The results based on the ITS sequences confirmed the occurrence of strains in P. laevis from Continental Europe which are well defined by molecules but reveal only slight differences in their morphology.
Analysis of 16S-23S intergenic spacer regions of the rRNA operons in Edwardsiella ictaluri and Edwardsiella tarda isolates from fish.

PubMed

Panangala, V S; van Santen, V L; Shoemaker, C A; Klesius, P H

2005-01-01

To analyse interspecies and intraspecies differences based on the 16S-23S rRNA intergenic spacer region (ISR) sequences of the fish pathogens Edwardsiella ictaluri and Edwardsiella tarda. The 16S-23S rRNA spacer regions of 19 Edw. ictaluri and four Edw. tarda isolates from four geographical regions were amplified by PCR with primers complementary to conserved sequences within the flanking 16S-23S rRNA coding sequences. Two products were generated from all isolates, without interspecies or intraspecific size polymorphisms. Sequence analysis of the amplified fragments revealed a smaller ISR of 350 bp, which contained a gene for tRNA(Glu), and a larger ISR of 441 bp, which contained genes for tRNA(Ile) and tRNA(Ala). The sequences of the smaller ISR of different Edw. ictaluri isolates were essentially identical to each other. Partial sequences of larger ISR from several Edw. ictaluri isolates also revealed no differences from the one complete Edw. ictaluri large ISR sequence obtained. The sequences of the smaller ISR of Edw. tarda were 97% identical to the Edw. ictaluri smaller ISR and the larger ISR were 96-98% identical to the Edw. ictaluri larger ISR sequence. The Edw. tarda isolates displayed limited ISR sequence heterogeneity, with ≥97% sequence identity among isolates for both small and large ISR. There is a high degree of size and sequence similarity of 16S-23S ISR both among isolates within Edw. ictaluri and Edw. tarda species and between the two species. Our results confirm a close genetic relationship between Edw. ictaluri and Edw. tarda and the relative homogeneity of Edw. ictaluri isolates compared with Edw. tarda isolates. Because no differences were found in ISR sequences among Edw. ictaluri isolates, sequence analysis of the ISR will not be useful to distinguish isolates of Edw. ictaluri. However, we identified restriction sites that differ between ISR sequences of Edw. ictaluri and Edw. tarda, which will be useful in distinguishing the two species.
Genetic analysis of Fasciola isolates from cattle in Korea based on second internal transcribed spacer (ITS-2) sequence of nuclear ribosomal DNA.

PubMed

Choe, Se-Eun; Nguyen, Thuy Thi-Dieu; Kang, Tae-Gyu; Kweon, Chang-Hee; Kang, Seung-Won

2011-09-01

Nuclear ribosomal DNA sequence of the second internal transcribed spacer (ITS-2) has been used efficiently to identify the liver fluke species collected from different hosts and various geographic regions. ITS-2 sequences of 19 Fasciola samples collected from Korean native cattle were determined and compared. Sequence comparison including ITS-2 sequences of isolates from this study and reference sequences from Fasciola hepatica and Fasciola gigantica and intermediate Fasciola in Genbank revealed seven identical variable sites of investigated isolates. Among 19 samples, 12 individuals had ITS-2 sequences completely identical to that of pure F. hepatica, five possessed the sequences identical to F. gigantica type, whereas two shared the sequence of both F. hepatica and F. gigantica. No variations in length and nucleotide composition of ITS-2 sequence were observed within isolates that belonged to F. hepatica or F. gigantica. At the position of 218, five Fasciola containing a single-base substitution (C>T) formed a distinct branch inside the F. gigantica-type group which was similar to those of Asian-origin isolates. The phylogenetic tree of the Fasciola spp. based on complete ITS-2 sequences from this study and other representative isolates in different locations clearly showed that pure F. hepatica, F. gigantica type and intermediate Fasciola were observed. The result also provided additional genetic evidence for the existence of three forms of Fasciola isolated from native cattle in Korea by genetic approach using ITS-2 sequence.
First description of Grapevine leafroll-associated virus 5 in Argentina and partial genome sequence.

PubMed

Gómez Talquenca, Sebastián; Muñoz, Claudio; Grau, Oscar; Gracia, Olga

2009-02-01

An accession of Vitis vinifera cv. Red Globe from Argentina was found by ELISA to be infected with Grapevine leafroll-associated virus 5. The virus was partially sequenced, and three ORFs, corresponding to HSP70h, HSP90h, and CP, were found. This isolate shares high amino acid identity with the previously reported sequence of the virus, and identities between 80% and 90% with previously reported GLRaV-9 and GLRaV-4 isolates. Analysis of the sequence supports its clustering with GLRaV-4 and GLRaV-9 inside the Ampelovirus genus.
Structures of two Arabidopsis thaliana major latex proteins represent novel helix-grip folds

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lytle, Betsy L.; Song, Jikui; de la Cruz, Norberto B.

2009-06-02

Here we report the first structures of two major latex proteins (MLPs) which display unique structural differences from the canonical Bet v 1 fold described earlier. MLP28 (SwissProt/TrEMBL ID Q9SSK9), the product of gene At1g70830.1, and the At1g24000.1 gene product (SwissProt/TrEMBL ID P0C0B0), proteins which share 32% sequence identity, were independently selected as foldspace targets by the Center for Eukaryotic Structural Genomics. The structure of a single domain (residues 17-173) of MLP28 was solved by NMR spectroscopy, while the full-length At1g24000.1 structure was determined by X-ray crystallography. MLP28 displays greater than 30% sequence identity to at least eight MLPs from other species. For example, the MLP28 sequence shares 64% identity to peach Pp-MLP119 and 55% identity to cucumber Csf2. In contrast, the At1g24000.1 sequence is highly divergent (see Fig. 1), containing a gap of 33 amino acids when compared with all other known MLPs. Even when the gap is excluded, the sequence identity with MLPs from other species is less than 30%. Unlike some of the MLPs from other species, none of the A. thaliana MLPs have been characterized biochemically. We show by NMR chemical shift mapping that At1g24000.1 binds progesterone, demonstrating that despite its sequence dissimilarity, the hydrophobic binding pocket is conserved and, therefore, may play a role in its biological function and that of the MLP family in general.
First isolation of Rickettsia monacensis from a patient in South Korea.

PubMed

Kim, Yeon-Sook; Choi, Yeon-Joo; Lee, Kyung-Min; Ahn, Kyu-Joong; Kim, Heung-Chul; Klein, Terry; Jiang, Ju; Richards, Allen; Park, Kyung-Hee; Jang, Won-Jong

2017-07-01

A Rickettsia sp. was isolated from the blood of a patient with an acute febrile illness using the shell vial technique; the isolate was named CN45Kr and was identified by molecular assay as Rickettsia monacensis, which was first recognized as a pathogen in Spain. Sequencing analysis showed that the gltA sequence of the isolate was identical to that of Rickettsia sp. IRS3. The ompA-5mp fragment sequence showed 100% identity to those of R. monacensis and Rickettsia sp. In56 and ompA-3pA In56 and 100% identity to that of Rickettsia sp. IRS3. The ompB sequence was found to have 99.9% similarity to that of R. monacensis IrR/Munich. This study confirms the pathogenicity of this agent and provides additional information about its geographic distribution. © 2017 The Societies and John Wiley & Sons Australia, Ltd.
mESAdb: microRNA Expression and Sequence Analysis Database

PubMed Central

Kaya, Koray D.; Karakülah, Gökhan; Yakıcıer, Cengiz M.; Acar, Aybar C.; Konu, Özlen

2011-01-01

microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data. PMID:21177657
Transcription Factor Map Alignment of Promoter Regions

PubMed Central

Blanco, Enrique; Messeguer, Xavier; Smith, Temple F; Guigó, Roderic

2006-01-01

We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis, because often the promoter regions of co-expressed genes do not show discernible sequence conservation. In our approach, thus, we have not directly compared the nucleotide sequence of promoters. Instead, we have obtained predictions of transcription factor binding sites, annotated the predicted sites with the labels of the corresponding binding factors, and aligned the resulting sequences of labels—to which we refer here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm in a small, but well-curated, collection of human–mouse orthologous gene pairs. Results in this dataset, as well as in an independent much larger dataset from the CISRED database, indicate that TF-map alignments are able to uncover conserved regulatory elements, which cannot be detected by the typical sequence alignments. PMID:16733547
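The idea of aligning sequences of binding-factor labels rather than nucleotides can be illustrated with an ordinary global alignment over label lists. The sketch below uses plain Needleman-Wunsch with arbitrary scores; it is not the authors' algorithm, which the abstract describes as adapted from restriction-map alignment, and the factor names in the usage note are hypothetical.

```python
def align_labels(map_a, map_b, match=2, mismatch=-1, gap=-1):
    """Global alignment of two label sequences (Needleman-Wunsch).

    Returns (score, pairs) where pairs lists (index_a, index_b, label)
    for every aligned position whose labels are identical.
    The scoring values are illustrative defaults, not published parameters.
    """
    n, m = len(map_a), len(map_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if map_a[i - 1] == map_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover matched labels.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        same = map_a[i - 1] == map_b[j - 1]
        sub = match if same else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            if same:
                pairs.append((i - 1, j - 1, map_a[i - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return score[n][m], pairs
```

For example, `align_labels(["SP1", "TBP", "NFKB"], ["SP1", "NFKB"])` pairs the two shared labels and leaves the middle one unaligned, which is the kind of conserved label order that a nucleotide-level comparison of the same promoters might not reveal.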
A new begomovirus associated with alpha- and betasatellite molecules isolated from Vernonia cinerea in China.

PubMed

Zulfiqar, Awais; Zhang, Jie; Cui, Xiaofeng; Qian, Yajuan; Zhou, Xueping; Xie, Yan

2012-01-01

A begomovirus disease complex associated with Vernonia cinerea showing yellow vein symptoms was studied. The full-length genomic DNA comprised 2739 nucleotides (nt) and had the typical genome structure of begomoviruses. Sequence comparison showed that it shared the highest nucleotide sequence identity (78.9%) with the recently characterized Vernonia yellow vein virus (VeYVV) from India. Of the associated satellites, the betasatellite showed the highest nucleotide sequence identity (52.1%) with Vernonia yellow vein virus betasatellite (VeYVVB), and the alphasatellite shared the highest sequence identity (70.7%) with Gossypium mustelinum symptomless alphasatellite (GMusSLA). The virus is a member of a distinct species with cognate alpha- and betasatellites, for which the name Vernonia yellow vein Fujian virus (VeYVFjV) is proposed.
NGSCheckMate: software for validating sample identity in next-generation sequencing studies within and across data types.

PubMed

Lee, Sejoon; Lee, Soohyun; Ouellette, Scott; Park, Woong-Yang; Lee, Eunjung A; Park, Peter J

2017-06-20

In many next-generation sequencing (NGS) studies, multiple samples or data types are profiled for each individual. An important quality control (QC) step in these studies is to ensure that datasets from the same subject are properly paired. Given the heterogeneity of data types, file types and sequencing depths in a multi-dimensional study, a robust program that provides a standardized metric for genotype comparisons would be useful. Here, we describe NGSCheckMate, a user-friendly software package for verifying sample identities from FASTQ, BAM or VCF files. This tool uses a model-based method to compare allele read fractions at known single-nucleotide polymorphisms, considering depth-dependent behavior of similarity metrics for identical and unrelated samples. Our evaluation shows that NGSCheckMate is effective for a variety of data types, including exome sequencing, whole-genome sequencing, RNA-seq, ChIP-seq, targeted sequencing and single-cell whole-genome sequencing, with a minimal requirement for sequencing depth (>0.5X). An alignment-free module can be run directly on FASTQ files for a quick initial check. We recommend using this software as a QC step in NGS studies. The software is available at https://github.com/parklab/NGSCheckMate. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
GRIL: genome rearrangement and inversion locator.

PubMed

Darling, Aaron E; Mau, Bob; Blattner, Frederick R; Perna, Nicole T

2004-01-01

GRIL is a tool to automatically identify collinear regions in a set of bacterial-size genome sequences. GRIL uses three basic steps. First, regions of high sequence identity are located. Second, some of these regions are filtered based on user-specified criteria. Finally, the remaining regions of sequence identity are used to define significant collinear regions among the sequences. By locating collinear regions of sequence, GRIL provides a basis for multiple genome alignment using current alignment systems. GRIL also provides a basis for using current inversion distance tools to infer phylogeny. GRIL is implemented in C++ and runs on any x86-based Linux or Windows platform. It is available from http://asap.ahabs.wisc.edu/gril
Amino Acid Properties Conserved in Molecular Evolution

PubMed Central

Rudnicki, Witold R.; Mroczek, Teresa; Cudek, Paweł

2014-01-01

That amino acid properties are responsible for the way protein molecules evolve is natural and is also reasonably well supported both by the structure of the genetic code and, to a large extent, by the experimental measures of the amino acid similarity. Nevertheless, there remains a significant gap between observed similarity matrices and their reconstructions from amino acid properties. Therefore, we introduce a simple theoretical model of amino acid similarity matrices, which allows splitting the matrix into two parts – one that depends only on mutabilities of amino acids and another that depends on pairwise similarities between them. Then the new synthetic amino acid properties are derived from the pairwise similarities and used to reconstruct similarity matrices covering a wide range of information entropies. Our model allows us to explain up to 94% of the variability in the BLOSUM family of the amino acids similarity matrices in terms of amino acid properties. The new properties derived from amino acid similarity matrices correlate highly with properties known to be important for molecular evolution such as hydrophobicity, size, shape and charge of amino acids. This result closes the gap in our understanding of the influence of amino acids on evolution at the molecular level. The methods were applied to the single family of similarity matrices used often in general sequence homology searches, but it is general and can be used also for more specific matrices. The new synthetic properties can be used in analyzes of protein sequences in various biological applications. PMID:24967708
Biological, serological and molecular typing of potato virus Y (PVY) isolates from Tunisia.

PubMed

Tayahi, M; Gharsallah, C; Khamassy, N; Fakhfakh, H; Djilani-Khouadja, F

2016-10-17

In Tunisia, potato virus Y (PVY) currently presents a significant threat to potato production, reducing tuber yield and quality. Three hundred and eighty-five potato samples (six different cultivars) collected in autumn 2007 from nine regions in Tunisia were tested for PVY infection by DAS-ELISA. The virus was detected in all regions surveyed, with an average incidence of 80.26%. Subsequently, a panel of 82 Tunisian PVY isolates (PVY-TN) was subjected to systematic biological, serological and molecular typing using immunocapture reverse-transcription polymerase chain reaction and a series of PVY(O/C)- and PVY(N)-specific monoclonal antibodies. Combined analyses revealed ~67% of PVY(NTN) variants of which 17 were sequenced in the 5'NTR-P1 region to assess the genetic diversity and phylogenetic relationship of PVY-TN against other worldwide PVY isolates. To investigate whether selective constraints could act on viral genomic RNA, synonymous and non-synonymous substitution rates and their ratio were analyzed. Averages of all pairwise comparisons obtained in the 5'NTR-P1 region allowed more synonymous changes, suggesting selective constraint acting in this region. Selective neutrality test was significantly negative, suggesting a rapid expansion of PVY isolates. Pairwise mismatch distribution gave a bimodal pattern and pointed to an eventually early evolution characterizing these sequences. Genetic haplotype network topology provided evidence of the existence of a distinct geographical structure. This is the first report of such genetic analyses conducted on PVY isolates from Tunisia.

Prevalence and genetic characterization of eimeriid coccidia from feces of black-necked cranes, Grus nigricollis.

PubMed

Liang, Yu; Zhao, ZiJiao; Hu, JunJie; Esch, Gerald W; Peng, MingChun; Liu, Qiong; Chen, JinQing

2018-03-01

Disseminated visceral coccidiosis (DVC) is a widely distributed intestinal and extraintestinal disease of cranes caused by eimeriid coccidia and has lethal pathogenicity to several crane species. Here, feces of 164 black-necked cranes collected in Dashanbao Black-necked Crane National Nature Reserve, China, were examined to determine the prevalence of coccidial oocysts. Of the 164 fecal samples, 76 (46.3%) were positive for oocysts of Eimeria, including E. gruis in 59 (35.9%), E. reichenowi in 52 (31.7%), and E. bosquei in 47 (28.7%) by microscopic observation. Sixty-eight (89.5%) of these positive samples included two or more morphologically identifiable species of Eimeria. The nearly full length 18S rRNA gene (18S rRNA; about 1.8 kb) and partial mitochondrial cytochrome c oxidase I gene (COX1; about 1.3 kb) from oocysts of each morphologically distinct species of Eimeria were amplified, sequenced, and analyzed. BLAST searches using these new 18S rRNA sequences for E. gruis, E. reichenowi, or E. bosquei showed the most similar sequences were those of E. gruis (98.7-99.7% identity), E. reichenowi (97.9-100% identity), or E. gruis (98.6-99.6% identity) isolated from different species of Grus. BLAST searches using the new COX1 sequences for the three species of Eimeria showed that no nucleotide sequences of Eimeria and Isospora coccidia in GenBank have more than 83.0% identity with these species. Identities among the new COX1 sequences were 91.8% for E. gruis and E. reichenowi, 94.5% for E. gruis and E. bosquei, and 91.3% for E. reichenowi and E. bosquei. Phylogenetic analysis based on 18S rRNA or COX1 sequences indicated that Eimeria spp. in black-necked cranes were clustered together with other previously identified Eimeria species from different cranes.
Flexbar 3.0 - SIMD and multicore parallelization.

PubMed

Roehr, Johannes T; Dieterich, Christoph; Reinert, Knut

2017-09-15

High-throughput sequencing machines can process many samples in a single run. For Illumina systems, sequencing reads are barcoded with an additional DNA tag that is contained in the respective sequencing adapters. The recognition of barcode and adapter sequences is hence commonly needed for the analysis of next-generation sequencing data. Flexbar performs demultiplexing based on barcodes and adapter trimming for such data. The massive amounts of data generated on modern sequencing machines demand that this preprocessing is done as efficiently as possible. We present Flexbar 3.0, the successor of the popular program Flexbar. It now employs twofold parallelism: multi-threading and, additionally, SIMD vectorization. Both types of parallelism are used to speed up the computation of pairwise sequence alignments, which are used for the detection of barcodes and adapters. Furthermore, new features were included to cover a wide range of applications. We evaluated the performance of Flexbar based on a simulated sequencing dataset. Our program outcompetes other tools in terms of speed and is among the best tools in the presented quality benchmark. The software is available at https://github.com/seqan/flexbar. © The Author (2017). Published by Oxford University Press. All rights reserved.
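Barcode-based demultiplexing, as described in this record, amounts to deciding which sample tag a read carries. The sketch below is a deliberately simplified stand-in: it compares the read prefix to each barcode by Hamming distance instead of computing pairwise alignments as Flexbar does, and the function name, sample names and mismatch limit are assumptions made for illustration.

```python
def assign_barcode(read, barcodes, max_mismatches=1):
    """Return the sample whose barcode best matches the start of the read.

    Simplification: ungapped comparison (Hamming distance) at the 5' end
    only. Returns None when no barcode is within max_mismatches or when
    the best distance is shared by more than one barcode.
    """
    best_name = None
    best_dist = None
    tie = False
    for name, barcode in barcodes.items():
        prefix = read[:len(barcode)]
        if len(prefix) < len(barcode):
            continue  # read is too short to carry this barcode
        dist = sum(a != b for a, b in zip(prefix, barcode))
        if best_dist is None or dist < best_dist:
            best_name, best_dist, tie = name, dist, False
        elif dist == best_dist:
            tie = True
    if best_dist is None or best_dist > max_mismatches or tie:
        return None
    return best_name
```

Rejecting ties keeps ambiguous reads out of every sample, which is a conservative design choice rather than a documented Flexbar behavior.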
Fast alignment-free sequence comparison using spaced-word frequencies.

PubMed

Leimeister, Chris-Andre; Boden, Marcus; Horwege, Sebastian; Lindner, Sebastian; Morgenstern, Burkhard

2014-07-15

Alignment-free methods for sequence comparison are increasingly used for genome analysis and phylogeny reconstruction; they circumvent various difficulties of traditional alignment-based approaches. In particular, alignment-free methods are much faster than pairwise or multiple alignments. They are, however, less accurate than methods based on sequence alignment. Most alignment-free approaches work by comparing the word composition of sequences. A well-known problem with these methods is that neighbouring word matches are far from independent. To reduce the statistical dependency between adjacent word matches, we propose to use 'spaced words', defined by patterns of 'match' and 'don't care' positions, for alignment-free sequence comparison. We describe a fast implementation of this approach using recursive hashing and bit operations, and we show that further improvements can be achieved by using multiple patterns instead of single patterns. To evaluate our approach, we use spaced-word frequencies as a basis for fast phylogeny reconstruction. Using real-world and simulated sequence data, we demonstrate that our multiple-pattern approach produces better phylogenies than approaches relying on contiguous words. Our program is freely available at http://spaced.gobics.de/. © The Author 2014. Published by Oxford University Press.
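A spaced word, as defined in this record, is read through a pattern of 'match' and 'don't care' positions, so only the characters at match positions contribute to the word. The following is a naive counting sketch, assuming the pattern is written as a string of '1' (match) and '0' (don't care); it does not reproduce the recursive hashing or bit operations of the published implementation.

```python
from collections import Counter


def spaced_word_counts(seq: str, pattern: str) -> Counter:
    """Count spaced words of `seq` under a binary match/don't-care pattern.

    The '1'/'0' encoding of the pattern is an assumption of this sketch.
    """
    if not pattern or set(pattern) - {"0", "1"}:
        raise ValueError("pattern must consist of '0' and '1' only")
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    if not match_positions:
        raise ValueError("pattern needs at least one match position")
    counts = Counter()
    for start in range(len(seq) - len(pattern) + 1):
        window = seq[start:start + len(pattern)]
        # Keep only the characters under match positions.
        word = "".join(window[i] for i in match_positions)
        counts[word] += 1
    return counts
```

With the pattern "111" this reduces to ordinary contiguous 3-mers; with "101" the middle character of each window is ignored, so neighbouring windows share fewer of the positions that define their words.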
iHSP-PseRAAAC: Identifying the heat shock protein families using pseudo reduced amino acid alphabet composition.

PubMed

Feng, Peng-Mian; Chen, Wei; Lin, Hao; Chou, Kuo-Chen

2013-11-01

Heat shock proteins (HSPs) are a type of functionally related proteins present in all living organisms, both prokaryotes and eukaryotes. They play essential roles in protein-protein interactions such as folding and assisting in the establishment of proper protein conformation and prevention of unwanted protein aggregation. Their dysfunction may cause various life-threatening disorders, such as Parkinson's, Alzheimer's, and cardiovascular diseases. Based on their functions, HSPs are usually classified into six families: (i) HSP20 or sHSP, (ii) HSP40 or J-class proteins, (iii) HSP60 or GroEL/ES, (iv) HSP70, (v) HSP90, and (vi) HSP100. Although considerable progress has been achieved in discriminating HSPs from other proteins, it is still a big challenge to identify HSPs among their six different functional types according to their sequence information alone. With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop a high-throughput computational tool in this regard. To take up such a challenge, a predictor called iHSP-PseRAAAC has been developed by incorporating the reduced amino acid alphabet information into the general form of pseudo amino acid composition. One of the remarkable advantages of introducing the reduced amino acid alphabet is being able to avoid the notorious dimension disaster or overfitting problem in statistical prediction. It was observed that the overall success rate achieved by iHSP-PseRAAAC in identifying the functional types of HSPs among the aforementioned six types was more than 87%, which was derived by the jackknife test on a stringent benchmark dataset in which none of HSPs included has ≥40% pairwise sequence identity to any other in the same subset. It has not escaped our notice that the reduced amino acid alphabet approach can also be used to investigate other protein classification problems. 
As a user-friendly web server, iHSP-PseRAAAC is accessible to the public at http://lin.uestc.edu.cn/server/iHSP-PseRAAAC. Copyright © 2013 Elsevier Inc. All rights reserved.
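The reduced amino acid alphabet mentioned above shrinks the feature space by mapping the 20 residues onto a few classes before computing composition. The sketch below shows only that composition step; the six-class grouping is an illustrative physicochemical partition chosen for this example, not the clustering used by iHSP-PseRAAAC, and the sequence-order (pseudo) components are omitted.

```python
# Hypothetical six-class partition of the 20 standard residues.
GROUPS = {
    "hydrophobic": "AVLIMC",
    "aromatic": "FWYH",
    "polar": "STNQ",
    "positive": "KR",
    "negative": "DE",
    "special": "GP",
}
RESIDUE_TO_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}


def reduced_composition(protein: str) -> dict:
    """Fraction of residues falling in each reduced-alphabet class.

    Characters outside the 20 standard residues (e.g. 'X') are skipped.
    """
    counts = {name: 0 for name in GROUPS}
    total = 0
    for aa in protein.upper():
        group = RESIDUE_TO_GROUP.get(aa)
        if group is None:
            continue
        counts[group] += 1
        total += 1
    if total == 0:
        raise ValueError("no standard residues found")
    return {name: counts[name] / total for name in GROUPS}
```

A feature vector of six fractions instead of twenty (or of hundreds of dipeptide frequencies) is what makes overfitting less likely on a small benchmark set.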
A novel psittacine adenovirus identified during an outbreak of avian chlamydiosis and human psittacosis: zoonosis associated with virus-bacterium coinfection in birds.

PubMed

To, Kelvin K W; Tse, Herman; Chan, Wan-Mui; Choi, Garnet K Y; Zhang, Anna J X; Sridhar, Siddharth; Wong, Sally C Y; Chan, Jasper F W; Chan, Andy S F; Woo, Patrick C Y; Lau, Susanna K P; Lo, Janice Y C; Chan, Kwok-Hung; Cheng, Vincent C C; Yuen, Kwok-Yung

2014-12-01

Chlamydophila psittaci is found worldwide, but is particularly common among psittacine birds in tropical and subtropical regions. While investigating a human psittacosis outbreak that was associated with avian chlamydiosis in Hong Kong, we identified a novel adenovirus in epidemiologically linked Mealy Parrots, which was not present in healthy birds unrelated to the outbreak or in other animals. The novel adenovirus (tentatively named Psittacine adenovirus HKU1) was most closely related to Duck adenovirus A in the Atadenovirus genus. Sequencing showed that the Psittacine adenovirus HKU1 genome consists of 31,735 nucleotides. Comparative genome analysis showed that the Psittacine adenovirus HKU1 genome contains 23 open reading frames (ORFs) with sequence similarity to known adenoviral genes, and six additional ORFs at the 3' end of the genome. Similar to Duck adenovirus A, the novel adenovirus lacks LH1, LH2 and LH3, which distinguishes it from other viruses in the Atadenovirus genus. Notably, fiber-2 protein, which is present in Aviadenovirus but not Atadenovirus, is also present in Psittacine adenovirus HKU1. Psittacine adenovirus HKU1 had pairwise amino acid sequence identities of 50.3-54.0% for the DNA polymerase, 64.6-70.7% for the penton protein, and 66.1-74.0% for the hexon protein with other Atadenovirus. The C. psittaci bacterial load was positively correlated with adenovirus viral load in the lung. Immunostaining for fiber protein expression was positive in lung and liver tissue cells of affected parrots, confirming active viral replication. No other viruses were found. This is the first documentation of an adenovirus-C. psittaci co-infection in an avian species that was associated with a human outbreak of psittacosis. Viral-bacterial co-infection often increases disease severity in both humans and animals. 
The role of viral-bacterial co-infection in animal-to-human transmission of infectious agents has not received sufficient attention and should be emphasized in the investigation of disease outbreaks in human and animals.
A Novel Psittacine Adenovirus Identified During an Outbreak of Avian Chlamydiosis and Human Psittacosis: Zoonosis Associated with Virus-Bacterium Coinfection in Birds

PubMed Central

Chan, Wan-Mui; Choi, Garnet K. Y.; Zhang, Anna J. X.; Sridhar, Siddharth; Wong, Sally C. Y.; Chan, Jasper F. W.; Chan, Andy S. F.; Woo, Patrick C. Y.; Lau, Susanna K. P.; Lo, Janice Y. C.; Chan, Kwok-Hung; Cheng, Vincent C. C.; Yuen, Kwok-Yung

2014-01-01

Chlamydophila psittaci is found worldwide, but is particularly common among psittacine birds in tropical and subtropical regions. While investigating a human psittacosis outbreak that was associated with avian chlamydiosis in Hong Kong, we identified a novel adenovirus in epidemiologically linked Mealy Parrots, which was not present in healthy birds unrelated to the outbreak or in other animals. The novel adenovirus (tentatively named Psittacine adenovirus HKU1) was most closely related to Duck adenovirus A in the Atadenovirus genus. Sequencing showed that the Psittacine adenovirus HKU1 genome consists of 31,735 nucleotides. Comparative genome analysis showed that the Psittacine adenovirus HKU1 genome contains 23 open reading frames (ORFs) with sequence similarity to known adenoviral genes, and six additional ORFs at the 3′ end of the genome. Similar to Duck adenovirus A, the novel adenovirus lacks LH1, LH2 and LH3, which distinguishes it from other viruses in the Atadenovirus genus. Notably, fiber-2 protein, which is present in Aviadenovirus but not Atadenovirus, is also present in Psittacine adenovirus HKU1. Psittacine adenovirus HKU1 had pairwise amino acid sequence identities of 50.3–54.0% for the DNA polymerase, 64.6–70.7% for the penton protein, and 66.1–74.0% for the hexon protein with other Atadenovirus. The C. psittaci bacterial load was positively correlated with adenovirus viral load in the lung. Immunostaining for fiber protein expression was positive in lung and liver tissue cells of affected parrots, confirming active viral replication. No other viruses were found. This is the first documentation of an adenovirus-C. psittaci co-infection in an avian species that was associated with a human outbreak of psittacosis. Viral-bacterial co-infection often increases disease severity in both humans and animals. 
The role of viral-bacterial co-infection in animal-to-human transmission of infectious agents has not received sufficient attention and should be emphasized in the investigation of disease outbreaks in human and animals. PMID:25474263
The Pinus taeda genome is characterized by diverse and highly diverged repetitive sequences

PubMed Central

2010-01-01

Background: In today's age of genomic discovery, no attempt has been made to comprehensively sequence a gymnosperm genome. The largest genus in the coniferous family Pinaceae is Pinus, whose 110-120 species have extremely large genomes (c. 20-40 Gb, 2N = 24). The size and complexity of these genomes have prompted much speculation as to the feasibility of completing a conifer genome sequence. Conifer genomes are reputed to be highly repetitive, but there is little information available on the nature and identity of repetitive units in gymnosperms. The pines have extensive genetic resources, with approximately 329000 ESTs from eleven species and genetic maps in eight species, including a dense genetic map of the twelve linkage groups in Pinus taeda. Results: We present here the Sanger sequence and annotation of ten P. taeda BAC clones and Genome Analyzer II whole genome shotgun (WGS) sequences representing 7.5% of the genome. Computational annotation of ten BACs predicts three putative protein-coding genes and at least fifteen likely pseudogenes in nearly one megabase of sequence. We found three conifer-specific LTR retroelements in the BACs, and tentatively identified at least 15 others based on evidence from the distantly related angiosperms. Alignment of WGS sequences to the BACs indicates that 80% of BAC sequences have similar copies (≥ 75% nucleotide identity) elsewhere in the genome, but only 23% have identical copies (99% identity). The three most common repetitive elements in the genome were identified and, when combined, represent less than 5% of the genome. Conclusions: This study indicates that the majority of repeats in the P. taeda genome are 'novel' and will therefore require additional BAC or genomic sequencing for accurate characterization. The pine genome contains a very large number of diverged and probably defunct repetitive elements. This study also provides new evidence that sequencing a pine genome using a WGS approach is a feasible goal.
PMID:20609256
OrthoANI: An improved algorithm and software for calculating average nucleotide identity.

PubMed

Lee, Imchang; Ouk Kim, Yeong; Park, Sang-Cheol; Chun, Jongsik

2016-02-01

Species demarcation in Bacteria and Archaea is mainly based on overall genome relatedness, which serves as a framework for modern microbiology. Current practice for obtaining these measures between two strains is shifting from experimentally determined similarity obtained by DNA-DNA hybridization (DDH) to genome-sequence-based similarity. Average nucleotide identity (ANI) is a simple algorithm that mimics DDH. Like DDH, ANI values between two genome sequences may be different from each other when reciprocal calculations are compared. We compared 63 690 pairs of genome sequences and found that the differences in reciprocal ANI values are significantly high, exceeding 1 % in some cases. To resolve this lack of symmetry, a new algorithm, named OrthoANI, was developed to accommodate the concept of orthology: both genome sequences are fragmented and only orthologous fragment pairs are taken into consideration when calculating nucleotide identities. OrthoANI is highly correlated with ANI (using BLASTn), and the former showed approximately 0.1 % higher values than the latter. In conclusion, OrthoANI provides a more robust and faster means of calculating average nucleotide identity for taxonomic purposes. The standalone software tools are freely available at http://www.ezbiocloud.net/sw/oat.
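The symmetry idea in this abstract can be illustrated with a small sketch. It is not the OrthoANI implementation: it assumes that fragment-to-fragment identities have already been computed by some search step (for example BLASTn) and are supplied as plain dictionaries, it treats reciprocal best hits as a stand-in for "orthologous fragment pairs", and it averages the two directional identities of each pair. The function and variable names are hypothetical.

```python
from statistics import mean

def reciprocal_best_pairs(a_to_b, b_to_a):
    """Return (a_frag, b_frag, identity) for fragments that are each other's best hit.

    a_to_b maps each fragment of genome A to {fragment of B: percent identity};
    b_to_a is the same in the opposite direction (both are assumed inputs).
    """
    best_ab = {a: max(hits, key=hits.get) for a, hits in a_to_b.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in b_to_a.items() if hits}
    pairs = []
    for a, b in best_ab.items():
        if best_ba.get(b) == a:
            # Assumption: use the mean of the two directional identities.
            pairs.append((a, b, (a_to_b[a][b] + b_to_a[b][a]) / 2))
    return pairs

def ortho_style_identity(a_to_b, b_to_a):
    """Average identity over reciprocal best-hit fragment pairs only."""
    pairs = reciprocal_best_pairs(a_to_b, b_to_a)
    if not pairs:
        raise ValueError("no reciprocal best-hit fragment pairs")
    return mean(identity for _, _, identity in pairs)

# Toy identities (percent) between fragments of two hypothetical genomes.
a_to_b = {"a1": {"b1": 98.0, "b2": 80.0}, "a2": {"b2": 96.0}, "a3": {"b1": 90.0}}
b_to_a = {"b1": {"a1": 97.0, "a3": 91.0}, "b2": {"a2": 96.0, "a1": 79.0}}
value = ortho_style_identity(a_to_b, b_to_a)
```

Because only mutually best pairs contribute, swapping the two genomes selects the same pairs and yields the same value, which is the property the abstract describes as missing from reciprocal ANI calculations.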
Sequence quality analysis tool for HIV type 1 protease and reverse transcriptase.

PubMed

Delong, Allison K; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W; Kantor, Rami

2012-08-01

Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802 PR and 44,432 RT sequences) from the published literature (http://hivdb.Stanford.edu). Nucleic acid sequences are read into SQUAT, identified, aligned, and translated. Nucleic acid sequences are flagged if they have >five 1-2-base insertions; >one 3-base insertion; >one deletion; >six PR or >18 RT ambiguous bases; >three consecutive PR or >four RT nucleic acid mutations; >zero stop codons; >three PR or >six RT ambiguous amino acids; >three consecutive PR or >four RT amino acid mutations; >zero unique amino acids; or <0.5% or >15% genetic distance from another submitted sequence. Thresholds are user modifiable. SQUAT output includes a summary report with detailed comments for troubleshooting of flagged sequences, histograms of pairwise genetic distances, neighbor joining phylogenetic trees, and aligned nucleic and amino acid sequences. SQUAT is a stand-alone, free, web-independent tool to ensure use of high-quality HIV PR/RT sequences in interpretation and reporting of drug resistance, while increasing awareness and expertise and facilitating troubleshooting of potentially problematic sequences.
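A few of the flagging rules listed above can be written as a short threshold check. This is an illustrative sketch in Python, not SQUAT itself (which runs in R); the data structure, the field names, and the reading of the distance rule as "any distance to another submitted sequence falls outside 0.5% to 15%" are assumptions. Only three of the published rules are shown.

```python
from dataclasses import dataclass
from typing import List

# Thresholds quoted in the abstract; described there as user modifiable.
AMBIGUOUS_BASE_LIMIT = {"PR": 6, "RT": 18}
MIN_DISTANCE_PCT = 0.5
MAX_DISTANCE_PCT = 15.0

@dataclass
class SequenceSummary:
    """Hypothetical per-sequence summary produced by an upstream alignment step."""
    region: str                  # "PR" or "RT"
    ambiguous_bases: int
    stop_codons: int
    distances_pct: List[float]   # genetic distance (%) to each other submitted sequence

def flag_reasons(s: SequenceSummary) -> List[str]:
    """Return the reasons a sequence would be flagged under the rules sketched here."""
    reasons = []
    if s.ambiguous_bases > AMBIGUOUS_BASE_LIMIT[s.region]:
        reasons.append("ambiguous bases")
    if s.stop_codons > 0:
        reasons.append("stop codon")
    if any(d < MIN_DISTANCE_PCT for d in s.distances_pct):
        reasons.append("near-identical to another sequence")
    if any(d > MAX_DISTANCE_PCT for d in s.distances_pct):
        reasons.append("too distant from another sequence")
    return reasons

flagged = flag_reasons(SequenceSummary("PR", 7, 0, [0.2, 3.0]))
clean = flag_reasons(SequenceSummary("RT", 18, 0, [2.0]))
```

Keeping the limits in module-level constants mirrors the abstract's statement that thresholds can be changed by the user.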
Three-gene identity coefficients demonstrate that clonal reproduction promotes inbreeding and spatial relatedness in yellow-cedar, Callitropsis nootkatensis.

PubMed

Thompson, Stacey Lee; Bérubé, Yanik; Bruneau, Anne; Ritland, Kermit

2008-10-01

Asexual reproduction has the potential to promote population structuring through matings between clones as well as through limited dispersal of related progeny. Here we present an application of three-gene identity coefficients that tests whether clonal reproduction promotes inbreeding and spatial relatedness within populations. With this method, the first two genes are sampled to estimate pairwise relatedness or inbreeding, whereas the third gene is sampled from either a clone or a sexually derived individual. If three-gene coefficients are significantly greater for clones than nonclones, then clonality contributes excessively to genetic structure. First, we describe an estimator of three-gene identity and briefly evaluate its properties. We then use this estimator to test the effect of clonality on the genetic structure within populations of yellow-cedar (Callitropsis nootkatensis) using a molecular marker survey. Five microsatellite loci were genotyped for 485 trees sampled from nine populations. Our three-gene analyses show that clonal ramets promote inbreeding and spatial structure in most populations. Among-population correlations between clonal extent and genetic structure generally support these trends, yet with less statistical significance. Clones appear to contribute to genetic structure through the limited dispersal of offspring from replicated ramets of the same clonal genet, whereas this structure is likely maintained by mating among these relatives.
Nucleotide sequence of a chickpea chlorotic stunt virus relative that infects pea and faba bean in China.

PubMed

Zhou, Cui-Ji; Xiang, Hai-Ying; Zhuo, Tao; Li, Da-Wei; Yu, Jia-Lin; Han, Cheng-Gui

2012-07-01

We determined the genome sequence of a new polerovirus that infects field pea and faba bean in China. Its entire nucleotide sequence (6021 nt) was most closely related (83.3% identity) to that of an Ethiopian isolate of chickpea chlorotic stunt virus (CpCSV-Eth). With the exception of the coat protein (encoded by ORF3), amino acid sequence identities of all gene products of this virus to those of CpCSV-Eth and other poleroviruses were <90%. This suggests that it is a new member of the genus Polerovirus, and the name pea mild chlorosis virus is proposed.
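Identity percentages such as the 83.3% and <90% values above are computed from a pairwise alignment. The sketch below shows one common way to do that for two already-aligned sequences; the convention used (every alignment column counts in the denominator except columns that are gaps in both sequences) is an assumption, and other tools divide by the shorter sequence length or by ungapped columns only, which gives different numbers. The function name is hypothetical and the alignment step itself is not shown.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite a
    residue counts as a mismatch (one convention among several).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared

substitution_case = percent_identity("ACGT", "ACGA")   # one mismatch in four columns
gap_case = percent_identity("AC-T", "ACGT")            # one gap in four columns
```

When comparing identity values across studies, it helps to check which denominator each study used, since the choice can shift the result by several percentage points for gapped alignments.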
Cytochrome b gene reveals panmixia among Japanese Threadfin Bream, Nemipterus japonicus (Bloch, 1791) populations along the coasts of Peninsular Malaysia and provides evidence of a cryptic species.

PubMed

Lim, Hong-Chiun; Ahmad, Abu Talib; Nuruddin, Ahmad Adnan; Mohd Nor, Siti Azizah

2016-01-01

We evaluated genetic differentiation among ten presumed Japanese threadfin bream, Nemipterus japonicus populations along the coast of Peninsular Malaysia based on the partial sequence of the mitochondrial cytochrome b gene (982 bp). Genetic divergences (Kimura-2 parameter) ranged from 0.5% to 0.8% among nine of the ten populations while these nine populations were 4.4% to 4.6% diverged from the Kuala Besar population located at the Northeast coast. The constructed Neighbour Joining (NJ) phylogenetic trees based on haplotypes showed the Kuala Besar population forming an isolated cluster. The Analysis of Molecular Variance (AMOVA) of the ten populations a priori assigned into four regions, revealed that most of the variation occurred within population with a fairly low but significant level of regional differentiation (FST = 0.07, p < 0.05, FSC = 0.00, p > 0.05 and FCT = 0.07, p < 0.05) attributed to the Kuala Besar population. p Value after Bonferroni correction revealed that only pairwise FST values involving the Kuala Besar population with the other nine populations were significant. Thus, this study revealed that the N. japonicus populations off Peninsular Malaysia were panmictic. However, the Kuala Besar population, although morphologically identical was composed of a genetically discrete taxon from the rest. These findings are important contributions in formulating sustainable fishery management policies for this important fishery in Peninsular Malaysia.
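The Kimura 2-parameter divergences reported above come from a standard formula, d = -(1/2) ln[(1 - 2P - Q) * sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions. The following sketch applies it to two aligned sequences; skipping any column that contains a gap or ambiguous base is an assumption, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    transitions = transversions = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # assumption: ignore gaps and ambiguous bases
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent for the model.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

d_transition = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")  # one transition in ten sites
```

For small divergences such as the 0.5% to 4.6% values in this study, the corrected distance is only slightly larger than the raw proportion of differing sites.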
Carbohydrate degrading polypeptide and uses thereof

DOEpatents

Sagt, Cornelis Maria Jacobus; Schooneveld-Bergmans, Margot Elisabeth Francoise; Roubos, Johannes Andries; Los, Alrik Pieter

2015-10-20

The invention relates to a polypeptide having carbohydrate material degrading activity which comprises the amino acid sequence set out in SEQ ID NO: 2 or an amino acid sequence encoded by the nucleotide sequence of SEQ ID NO: 1 or SEQ ID NO: 4, or a variant polypeptide or variant polynucleotide thereof, wherein the variant polypeptide has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2 or the variant polynucleotide encodes a polypeptide that has at least 96% sequence identity with the sequence set out in SEQ ID NO: 2. The invention features the full length coding sequence of the novel gene as well as the amino acid sequence of the full-length functional protein and functional equivalents of the gene or the amino acid sequence. The invention also relates to methods for using the polypeptide in industrial processes. Also included in the invention are cells transformed with a polynucleotide according to the invention suitable for producing these proteins.
Identification of four novel HLA-B alleles, B*1590, B*1591, B*2726, and B*4705, from an East African population by high-resolution sequence-based typing.

PubMed

Luo, M; Mao, X; Plummer, F A

2005-02-01

We report here four novel HLA-B alleles, B*1590, B*1591, B*2726, and B*4705, identified from an East African population during sequence-based HLA-B typing. The novel alleles were confirmed by sequencing two separate polymerase chain reaction products, and by molecular cloning and sequencing multiple clones. B*1590 is identical to B*1510 at exon 2 and exon 3, except for a difference (GCC→GTC) at codon 158. Sequence differences at codon 152 (GAG→GTG) and codon 167 (TGG→TCG) differentiate B*1591 from B*1503 at exon 3. B*2726 is identical to B*2708 at exon 2 and exon 3, except for a difference (AAG→CAG) at codon 70. B*4705 was identified in three Kenyan women. The allele is identical to B*47010101/02 at exon 2 and exon 3, except for differences at codon 97 (AGG→AAT) and codon 99 (TTT→TAT). These new alleles have been named by the WHO Nomenclature Committee. Identification of these novel HLA-B alleles reflects the genetic diversity of this East African population.
Phylogenetic characterization of Canine Parvovirus VP2 partial sequences from symptomatic dogs samples.

PubMed

Zienius, D; Lelešius, R; Kavaliauskis, H; Stankevičius, A; Šalomskas, A

2016-01-01

The aim of the present study was to detect canine parvovirus (CPV) from faecal samples of clinically ill domestic dogs by polymerase chain reaction (PCR) followed by VP2 gene partial sequencing and molecular characterization of circulating strains in Lithuania. Eleven clinically and antigen-tested positive dog faecal samples, collected during the period of 2014-2015, were investigated by using PCR. The phylogenetic investigations indicated that the Lithuanian CPV VP2 partial sequences (3025-3706 cds) were closely related and showed 99.0-99.9% identity. All Lithuanian sequences were associated with one phylogroup, but grouped in different clusters. Ten of the investigated Lithuanian CPV VP2 sequences were closely associated with the CPV 2a antigenic variant (99.4% nt identity). Five CPV VP2 sequences from Lithuania were related to CPV-2a, but were rather divergent (6.8 nt differences). Only one CPV VP2 sequence from Lithuania was associated (99.3% nt identity) with CPV-2b VP2 sequences from France, Italy, USA and Korea. Four of the eleven investigated Lithuanian dogs with CPV infection symptoms had been vaccinated with CPV-2 vaccine, but their VP2 sequences were only distantly related phylogenetically to the VP2 sequences of CPV vaccine strains (11.5-15.8 nt differences). Ten Lithuanian CPV VP2 sequences had monophyletic relations among the close geographically associated samples, but five of them were rather divergent (1.0% less sequence similarity). One Lithuanian CPV VP2 sequence was closely related to the CPV-2b antigenic variant. All the Lithuanian CPV VP2 partial sequences were conserved and showed low phylogenetic association with the most commonly used CPV vaccine strains.
Complementary DNA sequencing and identification of mRNAs from the venomous gland of Agkistrodon piscivorus leucostoma.

PubMed

Jia, Ying; Cantu, Bruno A; Sánchez, Elda E; Pérez, John C

2008-06-15

To advance our knowledge of snake venom composition and of the transcripts expressed in the venom gland at the molecular level, we constructed a cDNA library from the venom gland of Agkistrodon piscivorus leucostoma for the generation of an expressed sequence tag (EST) database. From the randomly sequenced 2112 independent clones, we have obtained ESTs for 1309 (62%) cDNAs, which showed significant deduced amino acid sequence similarity (scores >80) to previously characterized proteins in the National Center for Biotechnology Information (NCBI) database. Ribosomal proteins make up 47 clones (2%) and the remaining 756 (36%) cDNAs either are of unknown identity or show BLASTX sequence identity scores of <80 with known GenBank accessions. The most highly expressed gene, encoding phospholipase A(2) (PLA(2)) and accounting for 35% of A. p. leucostoma venom gland cDNAs, was identified and further confirmed by sodium dodecyl sulfate/polyacrylamide gel electrophoresis (SDS-PAGE) of crude venom and protein sequencing. A total of 180 representative genes were obtained from the sequence assemblies and deposited in the EST database. Clones showing sequence identity to disintegrins, thrombin-like enzymes, hemorrhagic toxins, fibrinogen clotting inhibitors and plasminogen activators were also identified in our EST database. These data can be used to develop a research program that will help us identify genes encoding proteins that are of medical importance or proteins involved in the mechanisms of the toxin venom.
Identification of an anaerobic bacterium which reduces perchlorate and chlorate as Wolinella succinogenes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wallace, W.; Attaway, H.

1995-12-31

Perchlorate and chlorate salts are widely used by the chemical, aerospace and defense industries as oxidizers in propellant, explosives and pyrotechnics. The authors have isolated an anaerobic bacterium which is capable of the dissimilatory reduction of both perchlorate and chlorate for energy and growth. Strain HAP-1 is a gram negative, thin rod, non-sporeforming, highly motile strict anaerobe. Antibiotic resistance profiles, utilization of carbon substrates and electron acceptors demonstrated similar physiological characteristics to Wolinella succinogenes. Pairwise comparisons of 16S rRNA sequences showed only a 0.75% divergence between strain HAP-1 and W. succinogenes. Physiological, morphological and 16S rRNA sequence data indicate strain HAP-1 is a subspecies of W. succinogenes that can utilize perchlorate and chlorate as terminal electron acceptors.
Infection of Taenia asiatica in a Bai Person in Dali, China.

PubMed

Wang, Li; Luo, Xuenong; Hou, Junling; Guo, Aijiang; Zhang, Shaohua; Li, Hailong; Cai, Xuepeng

2016-02-01

We report here a human case of Taenia asiatica infection which was confirmed by genetic analyses in Dali, China. A patient was found to have symptoms of taeniasis with discharge of tapeworm proglottids. By sequencing of the mitochondrial cytochrome c oxidase subunit 1 (cox1) gene, we observed nucleotide sequence identity of 99% with T. asiatica and 96% with T. saginata. Using the cytochrome b (cytb) gene, 99% identity with T. asiatica and 96% identity with T. saginata were found. Our findings suggest that taeniasis of people in Dali, China may be mainly caused by T. asiatica.
Infection of Taenia asiatica in a Bai Person in Dali, China

PubMed Central

Wang, Li; Luo, Xuenong; Hou, Junling; Guo, Aijiang; Zhang, Shaohua; Li, Hailong; Cai, Xuepeng

2016-01-01

We report here a human case of Taenia asiatica infection which was confirmed by genetic analyses in Dali, China. A patient was found to have symptoms of taeniasis with discharge of tapeworm proglottids. By sequencing of the mitochondrial cytochrome c oxidase subunit 1 (cox1) gene, we observed nucleotide sequence identity of 99% with T. asiatica and 96% with T. saginata. Using the cytochrome b (cytb) gene, 99% identity with T. asiatica and 96% identity with T. saginata were found. Our findings suggest that taeniasis of people in Dali, China may be mainly caused by T. asiatica. PMID:26951981
Tertiary Structural studies of Myotoxin a from Crotalus viridis viridis Venom by Nuclear Magnetic Resonance

DTIC Science & Technology

1993-05-01

in real time. RMSDs were calculated only to a single structure on which the others were then superimposed. To get a pairwise listing of RMSDs, a group...to fix the chirality, minimize and anneal in 4-D (if necessary) an increasing number of residues until the entire structure is treated as one get /sym...nstr "Number of structures to create: get /sym refseq "Sequence to use: . get /sym refbmx "Bounds matrix to use: get /sym fname "Filename for written

The prediction of biogenic magnetic nanoparticles biomineralization in human tissues and organs

NASA Astrophysics Data System (ADS)

Medviediev, O.; Gorobets, O. Yu; Gorobets, S. V.; Yadrykhins'ky, V. S.

2017-10-01

In this study, human homologs of magnetosome island proteins were identified on the basis of pairwise and multiple alignments of amino acid sequences. The expression levels of genes encoding magnetosome island proteins of M. gryphiswaldense MSR-1, cultured under oxygen-deficient and under microaerobic conditions, were compared with the expression levels of the genes that encode the corresponding homologs in the human organism. The possibility of BMN biomineralization in human tissues and organs in which BMN had not previously been found experimentally was predicted.
Precursors of vertebrate peptide antibiotics dermaseptin b and adenoregulin have extensive sequence identities with precursors of opioid peptides dermorphin, dermenkephalin, and deltorphins.

PubMed

Amiche, M; Ducancel, F; Mor, A; Boulain, J C; Menez, A; Nicolas, P

1994-07-08

The dermaseptins are a family of broad spectrum antimicrobial peptides, 27-34 amino acids long, involved in the defense of the naked skin of frogs against microbial invasion. They are the first vertebrate peptides to show lethal effects against the filamentous fungi responsible for severe opportunistic infections accompanying immunodeficiency syndrome and the use of immunosuppressive agents. A cDNA library was constructed from skin poly(A+) RNA of the arboreal frog Phyllomedusa bicolor and screened with an oligonucleotide probe complementary to the COOH terminus of dermaseptin b. Several clones contained a full-length DNA copy of a 443-nucleotide mRNA that encoded a 78-residue dermaseptin b precursor protein. The deduced precursor contained a putative signal sequence at the NH2 terminus, a 20-residue spacer sequence extremely rich (60%) in glutamic and aspartic acids, and a single copy of a dermaseptin b progenitor sequence at the COOH terminus. One clone contained a complete copy of adenoregulin, a 33-residue peptide reported to enhance the binding of agonists to the A1 adenosine receptor. The mRNAs encoding adenoregulin and dermaseptin b were very similar: 70 and 75% nucleotide identities between the 5'- and 3'-untranslated regions, respectively; 91% amino acid identity between the signal peptides; 82% identity between the acidic spacer sequences; and 38% identity between adenoregulin and dermaseptin b. Because adenoregulin and dermaseptin b have similar precursor designs and antimicrobial spectra, adenoregulin should be considered as a new member of the dermaseptin family and alternatively named dermaseptin b II. Preprodermaseptin b and preproadenoregulin have considerable sequence identities to the precursors encoding the opioid heptapeptides dermorphin, dermenkephalin, and deltorphins. This similarity extended into the 5'-untranslated regions of the mRNAs. 
These findings suggest that the genes encoding the four preproproteins are all members of the same family despite the fact that they encode end products having very different biological activities. These genes might contain a homologous export exon comprising the 5'-untranslated region, the 22-residue signal peptide, the 20-24-residue acidic spacer, and the basic pair Lys-Arg.
Application of Genotyping during an Extensive Outbreak of Waterborne Giardiasis in Bergen, Norway, during Autumn and Winter 2004†

PubMed Central

Robertson, L. J.; Hermansen, L.; Gjerde, B. K.; Strand, E.; Alvsvåg, J. O.; Langeland, N.

2006-01-01

During the autumn and winter of 2004 and 2005, an extensive outbreak of waterborne giardiasis occurred in Bergen, Norway. Over 1,500 patients were diagnosed with giardiasis. Analysis of water from the implicated source revealed low numbers of Giardia cysts, but the initial contamination event probably occurred up to 10 weeks previously. While sewage leakage from a residential area is now considered to be the probable source of contamination, during the episode waste from one particular septic tank was thought to be a possible source. Genotyping of cysts from the septic tank demonstrated that they were assemblage A cysts, although the sequences were not identical to any previously published sequences. For the β-giardin gene, the closest published subgenotype was subgenotype A3; for the gdh gene, the closest published subgenotype was subgenotype A2. Genotyping of cysts from 21 patient samples revealed that they were assemblage B cysts; thus, the septic tank was unlikely to be the contamination source. Sequencing of the β-giardin and gdh genes from patient samples and a comparison of the sequences gave complex results. For the β-giardin gene, three isolates had sequences identical to subgenotype B3 sequences. However, other isolates had between one and four single-nucleotide polymorphisms (SNPs). For the gdh gene, none of the sequences were identical to the sequence published for subgenotype B3, and the sequences had between one and three SNPs. One isolate, which was identical to subgenotype B3 at the β-giardin gene, was more similar to subgenotype B2 at the gdh gene. Grouping the isolates on the basis of SNPs resulted in different groups for the two genes. The results are discussed in relation to giardiasis in Norway and to other Giardia genotyping studies. PMID:16517674
Joint Effect of Habitat Identity and Spatial Distance on Spiders' Community Similarity in a Fragmented Transition Zone.

PubMed

Gavish, Yoni; Ziv, Yaron

2016-01-01

Understanding the main processes that affect community similarity has been the focus of much ecological research. However, the relative effects of environmental and spatial aspects in structuring ecological communities are still unresolved and are probably scale-dependent. Here, we examine the effect of habitat identity and spatial distance on fine-grained community similarity within a biogeographic transition zone. We compared four hypotheses: i) habitat identity alone, ii) spatial proximity alone, iii) non-interactive effects of both habitat identity and spatial proximity, and iv) interactive effect of habitat identity and spatial proximity. We explored these hypotheses for spiders in three fragmented landscapes located along the sharp climatic gradient of Southern Judea Lowlands (SJL), Israel. We sampled 14,854 spiders (from 199 species or morphospecies) in 644 samples, taken in 35 patches and stratified to nine different habitats. We calculated the Bray-Curtis similarity between all sample pairs. We divided the pairwise values into four functional distance categories (same patch, different patches from the same landscape, adjacent landscapes and distant landscapes) and two habitat categories (same or different habitats) and compared them using non-parametric MANOVA. A significant interaction between habitat identity and spatial distance was found, such that the difference in mean similarity between same-habitat pairs and different-habitat pairs decreases with spatial distance. Additionally, community similarity decayed with spatial distance. Furthermore, at all distances, same-habitat pairs had higher similarity than different-habitat pairs. Our results support the fourth hypothesis of interactive effect of habitat identity and spatial proximity. We suggest that the environmental complexity of habitats or increased habitat specificity of species near the edge of their distribution range may explain this pattern. 
Thus, in transition zones care should be taken when using habitats as surrogates of community composition in conservation planning, since similar habitats in different locations are more likely to support different communities.
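The Bray-Curtis similarity used in this study has a simple closed form for abundance data: twice the summed minimum abundance of each species across the two samples, divided by the total abundance of both samples. The sketch below computes it for two samples given as species-to-count dictionaries; the input format, the function name, and the toy counts are assumptions made for illustration.

```python
from typing import Dict

def bray_curtis_similarity(counts_a: Dict[str, int], counts_b: Dict[str, int]) -> float:
    """Bray-Curtis similarity (1 minus dissimilarity) between two abundance samples."""
    species = set(counts_a) | set(counts_b)
    shared = sum(min(counts_a.get(s, 0), counts_b.get(s, 0)) for s in species)
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    return 2 * shared / total

# Toy spider counts for two hypothetical samples.
sample_1 = {"sp1": 3, "sp2": 1}
sample_2 = {"sp1": 1, "sp3": 3}
similarity = bray_curtis_similarity(sample_1, sample_2)
```

The value ranges from 0 for samples with no species in common to 1 for samples with identical abundances, so the pairwise values can be grouped by distance and habitat category as the abstract describes.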
Identifying novel sequence variants of RNA 3D motifs

PubMed Central

Zirbel, Craig L.; Roll, James; Sweeney, Blake A.; Petrov, Anton I.; Pirrung, Meg; Leontis, Neocles B.

2015-01-01

Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723
Evolutionary distances in the twilight zone--a rational kernel approach.

PubMed

Schwarz, Roland F; Fletcher, William; Förster, Frank; Merget, Benjamin; Wolf, Matthias; Schultz, Jörg; Markowetz, Florian

2010-12-31

Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.
cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on CPU+GPU.

PubMed

Zhang, Jing; Wang, Hao; Feng, Wu-Chun

2017-01-01

BLAST, short for Basic Local Alignment Search Tool, is a ubiquitous tool used in the life sciences for pairwise sequence search. However, with the advent of next-generation sequencing (NGS), whether at the outset or downstream from NGS, the exponential growth of sequence databases is outstripping our ability to analyze the data. While recent studies have utilized the graphics processing unit (GPU) to speed up the BLAST algorithm for searching protein sequences (i.e., BLASTP), these studies use coarse-grained parallelism, where one sequence alignment is mapped to only one thread. Such an approach does not efficiently utilize the capabilities of a GPU, particularly due to the irregularity of BLASTP in both execution paths and memory-access patterns. To address the above shortcomings, we present a fine-grained approach to parallelize BLASTP, where each individual phase of sequence search is mapped to many threads on a GPU. This approach, which we refer to as cuBLASTP, reorders data-access patterns and reduces divergent branches of the most time-consuming phases (i.e., hit detection and ungapped extension). In addition, cuBLASTP optimizes the remaining phases (i.e., gapped extension and alignment with trace back) on a multicore CPU and overlaps their execution with the phases running on the GPU.
Characterization of gonadotrophin-releasing hormone precursor cDNA in the Old World mole-rat Cryptomys hottentotus pretoriae: high degree of identity with the New World guinea pig sequence.

PubMed

Kalamatianos, T; du Toit, L; Hrabovszky, E; Kalló, I; Marsh, P J; Bennett, N C; Coen, C W

2005-05-01

Regulation of pituitary gonadotrophins by the decapeptide gonadotrophin-releasing hormone 1 (GnRH1) is crucial for the development and maintenance of reproductive functions. A common amino acid sequence for this decapeptide, designated as 'mammalian' GnRH, has been identified in all mammals thus far investigated with the exception of the guinea pig, in which there are two amino acid substitutions. Among hystricognath rodents, the members of the family Bathyergidae regulate reproduction in response to diverse cues. Thus, highveld mole-rats (Cryptomys hottentotus pretoriae) are social bathyergids in which breeding is restricted to a particular season in the dominant female, but continuously suppressed in subordinate colony members. Elucidation of reproductive control in these animals will be facilitated by characterization of their GnRH1 gene. A partial sequence of GnRH1 precursor cDNA was isolated and characterized. Comparative analysis revealed the highest degree of identity (86%) to guinea pig GnRH1 precursor mRNA. Nevertheless, the deduced amino acid sequence of the mole-rat decapeptide is identical to the 'mammalian' sequence rather than that of guinea pigs. Successful detection of GnRH1-synthesizing neurones using either a guinea pig GnRH1 riboprobe or an antibody against the 'mammalian' decapeptide is consistent with the guinea pig-like sequence for the precursor and the classic 'mammalian' form for the decapeptide. The high degree of identity in the GnRH1 precursor sequence between this Old World mole-rat and the New World guinea pig is consistent with the theory that caviomorphs and phiomorphs originated from a common ancestral line in the Palaeocene to mid Eocene, some 63-45 million years ago.
Microbial genomic taxonomy

PubMed Central

2013-01-01

A need for a genomic species definition is emerging from several independent studies worldwide. In this commentary paper, we discuss recent studies on the genomic taxonomy of diverse microbial groups and a unified species definition based on genomics. Accordingly, strains from the same microbial species share >95% Average Amino Acid Identity (AAI) and Average Nucleotide Identity (ANI), >95% identity based on multiple alignment genes, <10 in Karlin genomic signature, and > 70% in silico Genome-to-Genome Hybridization similarity (GGDH). Species of the same genus will form monophyletic groups on the basis of 16S rRNA gene sequences, Multilocus Sequence Analysis (MLSA) and supertree analysis. In addition to the established requirements for species descriptions, we propose that new taxa descriptions should also include at least a draft genome sequence of the type strain in order to obtain a clear outlook on the genomic landscape of the novel microbe. The application of the new genomic species definition put forward here will allow researchers to use genome sequences to define simultaneously coherent phenotypic and genomic groups. PMID:24365132
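The numerical criteria in this abstract can be expressed as a simple checklist. The sketch below only encodes the thresholds quoted above (>95% AAI and ANI, >95% identity on multiple alignment genes, <10 Karlin genomic signature, >70% GGDH); it does not compute any of these measures, and the function and parameter names are hypothetical.

```python
from typing import Dict

def genomic_species_checks(aai_pct: float, ani_pct: float, mla_identity_pct: float,
                           karlin_signature: float, ggdh_pct: float) -> Dict[str, bool]:
    """Evaluate each quoted same-species criterion separately."""
    return {
        "AAI > 95%": aai_pct > 95.0,
        "ANI > 95%": ani_pct > 95.0,
        "multiple alignment gene identity > 95%": mla_identity_pct > 95.0,
        "Karlin signature < 10": karlin_signature < 10.0,
        "GGDH > 70%": ggdh_pct > 70.0,
    }

def same_species(**measures: float) -> bool:
    """True only if every criterion is met for the pair of strains."""
    return all(genomic_species_checks(**measures).values())

# Made-up measurements for two hypothetical strain pairs.
pair_ok = same_species(aai_pct=97.0, ani_pct=96.5, mla_identity_pct=98.0,
                       karlin_signature=4.0, ggdh_pct=82.0)
pair_low_ani = same_species(aai_pct=97.0, ani_pct=94.0, mla_identity_pct=98.0,
                            karlin_signature=4.0, ggdh_pct=82.0)
```

Returning the individual checks as well as the combined verdict makes it easy to see which criterion a borderline pair fails.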
Complete Nucleotide Sequence of Watermelon Chlorotic Stunt Virus Originating from Oman

PubMed Central

Khan, Akhtar J.; Akhtar, Sohail; Briddon, Rob W.; Ammara, Um; Al-Matrooshi, Abdulrahman M.; Mansoor, Shahid

2012-01-01

Watermelon chlorotic stunt virus (WmCSV) is a bipartite begomovirus (genus Begomovirus, family Geminiviridae) that causes economic losses to cucurbits, particularly watermelon, across the Middle East and North Africa. Recently squash (Cucurbita moschata) grown in an experimental field in Oman was found to display symptoms such as leaf curling, yellowing and stunting, typical of a begomovirus infection. Sequence analysis of the virus isolated from squash showed 97.6–99.9% nucleotide sequence identity to previously described WmCSV isolates for the DNA A component and 93–98% identity for the DNA B component. Agrobacterium-mediated inoculation to Nicotiana benthamiana resulted in the development of symptoms fifteen days post inoculation. This is the first bipartite begomovirus identified in Oman. Overall the Oman isolate showed the highest levels of sequence identity to a WmCSV isolate originating from Iran, which was confirmed by phylogenetic analysis. This suggests that WmCSV present in Oman has been introduced from Iran. The significance of this finding is discussed. PMID:22852046
Complete nucleotide sequence of watermelon chlorotic stunt virus originating from Oman.

PubMed

Khan, Akhtar J; Akhtar, Sohail; Briddon, Rob W; Ammara, Um; Al-Matrooshi, Abdulrahman M; Mansoor, Shahid

2012-07-01

Watermelon chlorotic stunt virus (WmCSV) is a bipartite begomovirus (genus Begomovirus, family Geminiviridae) that causes economic losses to cucurbits, particularly watermelon, across the Middle East and North Africa. Recently squash (Cucurbita moschata) grown in an experimental field in Oman was found to display symptoms such as leaf curling, yellowing and stunting, typical of a begomovirus infection. Sequence analysis of the virus isolated from squash showed 97.6-99.9% nucleotide sequence identity to previously described WmCSV isolates for the DNA A component and 93-98% identity for the DNA B component. Agrobacterium-mediated inoculation to Nicotiana benthamiana resulted in the development of symptoms fifteen days post inoculation. This is the first bipartite begomovirus identified in Oman. Overall the Oman isolate showed the highest levels of sequence identity to a WmCSV isolate originating from Iran, which was confirmed by phylogenetic analysis. This suggests that WmCSV present in Oman has been introduced from Iran. The significance of this finding is discussed.
DNA sequences of three beta-1,4-endoglucanase genes from Thermomonospora fusca.

PubMed Central

Lao, G; Ghangas, G S; Jung, E D; Wilson, D B

1991-01-01

The DNA sequences of the Thermomonospora fusca genes encoding cellulases E2 and E5 and the N-terminal end of E4 were determined. Each sequence contains an identical 14-bp inverted repeat upstream of the initiation codon. There were no significant homologies between the coding regions of the three genes. The E2 gene is 73% identical to the celA gene from Microbispora bispora, but this was the only homology found with other cellulase genes. E2 belongs to a family of cellulases that includes celA from M. bispora, cenA from Cellulomonas fimi, casA from an alkalophilic Streptomyces strain, and cellobiohydrolase II from Trichoderma reesei. E4 shows 44% identity to an avocado cellulase, while E5 belongs to the Bacillus cellulase family. There were strong similarities between the amino acid sequences of the E2 and E5 cellulose binding domains, and these regions also showed homology with C. fimi and Pseudomonas fluorescens cellulose binding domains. PMID:1904434
Detecting earthquakes over a seismic network using single-station similarity measures

NASA Astrophysics Data System (ADS)

Bergen, Karianne J.; Beroza, Gregory C.

2018-06-01

New blind waveform-similarity-based detection methods, such as Fingerprint and Similarity Thresholding (FAST), have shown promise for detecting weak signals in long-duration, continuous waveform data. While blind detectors are capable of identifying similar or repeating waveforms without templates, they can also be susceptible to false detections due to local correlated noise. In this work, we present a set of three new methods that allow us to extend single-station similarity-based detection over a seismic network; event-pair extraction, pairwise pseudo-association, and event resolution complete a post-processing pipeline that combines single-station similarity measures (e.g. FAST sparse similarity matrix) from each station in a network into a list of candidate events. The core technique, pairwise pseudo-association, leverages the pairwise structure of event detections in its network detection model, which allows it to identify events observed at multiple stations in the network without modeling the expected moveout. Though our approach is general, we apply it to extend FAST over a sparse seismic network. We demonstrate that our network-based extension of FAST is both sensitive and maintains a low false detection rate. As a test case, we apply our approach to 2 weeks of continuous waveform data from five stations during the foreshock sequence prior to the 2014 Mw 8.2 Iquique earthquake. Our method identifies nearly five times as many events as the local seismicity catalogue (including 95 per cent of the catalogue events), and less than 1 per cent of these candidate events are false detections.
Divergent nuclear 18S rDNA paralogs in a turkey coccidium, Eimeria meleagrimitis, complicate molecular systematics and identification.

PubMed

El-Sherry, Shiem; Ogedengbe, Mosun E; Hafeez, Mian A; Barta, John R

2013-07-01

Multiple 18S rDNA sequences were obtained from two single-oocyst-derived lines of each of Eimeria meleagrimitis and Eimeria adenoeides. After analysing the 15 new 18S rDNA sequences from two lines of E. meleagrimitis and 17 new sequences from two lines of E. adenoeides, there were clear indications that divergent, paralogous 18S rDNA copies existed within the nuclear genome of E. meleagrimitis. In contrast, mitochondrial cytochrome c oxidase subunit I (COI) partial sequences from all lines of a particular Eimeria sp. were identical and, in phylogenetic analyses, COI sequences clustered unambiguously in monophyletic and highly-supported clades specific to individual Eimeria sp. Phylogenetic analysis of the new 18S rDNA sequences from E. meleagrimitis showed that they formed two distinct clades: Type A with four new sequences; and Type B with nine new sequences; both Types A and B sequences were obtained from each of the single-oocyst-derived lines of E. meleagrimitis. Together these rDNA types formed a well-supported E. meleagrimitis clade. Types A and B 18S rDNA sequences from E. meleagrimitis had a mean sequence identity of only 97.4% whereas mean sequence identity within types was 99.1-99.3%. The observed intraspecific sequence divergence among E. meleagrimitis 18S rDNA sequence types was even higher (approximately 2.6%) than the interspecific sequence divergence present between some well-recognized species such as Eimeria tenella and Eimeria necatrix (1.1%). Our observations suggest that, unlike COI sequences, 18S rDNA sequences are not reliable molecular markers to be used alone for species identification with coccidia, although 18S rDNA sequences have clear utility for phylogenetic reconstruction of apicomplexan parasites at the genus and higher taxonomic ranks. Copyright © 2013. Published by Elsevier Ltd.
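Mean sequence identity figures such as those in this abstract are typically averages over all sequence pairs in a group. The abstract does not say how identity was computed, so the following is a minimal sketch under stated assumptions: the sequences are already aligned to equal length, columns where either sequence has a gap are ignored, and the function names are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences, skipping gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # assumption: gapped columns do not count either way
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def mean_pairwise_identity(seqs: list[str]) -> float:
    """Average percent identity over all unordered pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)
```

Under this definition, percent divergence is simply 100 minus percent identity, which matches the abstract's pairing of 97.4% identity with approximately 2.6% divergence.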
Experimental characterization of pairwise correlations from triple quantum correlated beams generated by cascaded four-wave mixing processes

NASA Astrophysics Data System (ADS)

Wang, Wei; Cao, Leiming; Lou, Yanbo; Du, Jinjian; Jing, Jietai

2018-01-01

We theoretically and experimentally characterize the performance of the pairwise correlations from triple quantum correlated beams based on the cascaded four-wave mixing (FWM) processes. The pairwise correlations between any two of the beams are theoretically calculated and experimentally measured. The experimental and theoretical results are in good agreement. We find that two of the three pairwise correlations can be in the quantum regime. The other pairwise correlation is always in the classical regime. In addition, we also measure the triple-beam correlation which is always in the quantum regime. Such unbalanced and controllable pairwise correlation structures may be taken as advantages in practical quantum communications, for example, hierarchical quantum secret sharing. Our results also open the way for the classification and application of quantum states generated from the cascaded FWM processes.
Molecular analysis of the split cox1 gene from the Basidiomycota Agrocybe aegerita: relationship of its introns with homologous Ascomycota introns and divergence levels from common ancestral copies.

PubMed

Gonzalez, P; Barroso, G; Labarère, J

1998-10-05

The Basidiomycota Agrocybe aegerita (Aa) mitochondrial cox1 gene (6790 nucleotides), encoding a protein of 527 aa (58377 Da), is split by four large subgroup IB introns possessing site-specific endonucleases assumed to be involved in intron mobility. When compared to other fungal COX1 proteins, the Aa protein is closely related to the COX1 of the Basidiomycota Schizophyllum commune (Sc). This clade reveals a relationship with the studied Ascomycota ones, with the exception of Schizosaccharomyces pombe (Sp) which ranges in an out-group position compared with both higher fungi divisions. When comparison is extended to other kingdoms, fungal COX1 sequences are found to be more related to algae and plant ones (more than 57.5% aa similarity) than to animal sequences (53.6% aa similarity), contrasting with the previously established close relationship between fungi and animals, based on comparisons of nuclear genes. The four Aa cox1 introns are homologous to Ascomycota or algae cox1 introns sharing the same location within the exonic sequences. The percentages of identity of the intronic nucleotide sequences suggest a possible acquisition by lateral transfers of ancestral copies or of their derived sequences. These identities extend over the whole intronic sequences, arguing in favor of a transfer of the complete intron rather than a transfer limited to the encoded ORF. The intron i4 shares 74% identity at the nucleotide level with the Podospora anserina (Pa) intron i14, and up to 90.5% aa similarity between the encoded proteins, i.e. the highest values reported to date between introns of two phylogenetically distant species. This low divergence argues for a recent lateral transfer between the two species. On the contrary, the low sequence identities (below 36%) observed between Aa i1 and the homologous Sp i1 or Prototheca wickerhamii (Pw) i1 suggest a long evolution time after the separation of these sequences. The introns i2 and i3 possessed intermediate percentages of identity with their homologous Ascomycota introns. This is the first report of the complete nucleotide sequence and molecular organization of a mitochondrial cox1 gene of any member of the Basidiomycota division.
The genome sequence of pepper vein yellows virus (family Luteoviridae, genus Polerovirus).

PubMed

Murakami, Ritsuko; Nakashima, Nobuhiko; Hinomoto, Norihide; Kawano, Shinji; Toyosato, Tetsuya

2011-05-01

The complete genome of pepper vein yellows virus (PeVYV) was sequenced using random amplification of RNA samples isolated from vector insects (Aphis gossypii) that had been given access to PeVYV-infected plants. The PeVYV genome consisted of 6244 nucleotides and had a genomic organization characteristic of members of the genus Polerovirus. PeVYV had highest amino acid sequence identities in ORF0 to ORF3 (75.9 - 91.9%) with tobacco vein distorting polerovirus, with which it was only 25.1% identical in ORF5. These sequence comparisons and previously studied biological properties indicate that PeVYV is a distinctly different virus and belongs to a new species of the genus Polerovirus.
Estimating the degree of identity by descent in consanguineous couples.

PubMed

Carr, Ian M; Markham, Sir Alexander F; Pena, Sérgio D J

2011-12-01

In some clinical and research settings, it is often necessary to identify the true level of "identity by descent" (IBD) between two individuals. However, as the individuals become more distantly related, it is increasingly difficult to accurately calculate this value. Consequently, we have developed a computer program that uses genome-wide SNP genotype data from related individuals to estimate the size and extent of IBD in their genomes. In addition, the software can compare a couple's IBD regions with either the autozygous regions of a relative affected by an autosomal recessive disease of unknown cause, or the IBD regions in the parents of the affected relative. It is then possible to calculate the probability of one of the couple's children suffering from the same disease. The software works by finding SNPs that exclude any possible IBD and then identifies regions that lack these SNPs, while exceeding a minimum size and number of SNPs. The accuracy of the algorithm was established by estimating the pairwise IBD between different members of a large pedigree with varying known coefficients of genetic relationship (CGR). © 2011 Wiley Periodicals, Inc.
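The region-finding step described in this abstract (locate SNPs that exclude IBD, then keep the stretches between them that are long enough and contain enough SNPs) can be sketched as follows. This is not the authors' program: the exclusion rule used here (the two genotypes share no allele, e.g. AA versus BB) and all names and thresholds are assumptions made for illustration.

```python
from typing import List, Tuple

# (position in bp, genotype of individual 1, genotype of individual 2)
Snp = Tuple[int, str, str]


def excludes_ibd(g1: str, g2: str) -> bool:
    """Assumed rule: genotypes sharing no allele cannot be identical by descent."""
    return not (set(g1) & set(g2))


def _close_run(run: List[int], regions: List[Tuple[int, int]],
               min_bp: int, min_snps: int) -> None:
    """Record the run as a region if it meets both minimum criteria."""
    if len(run) >= min_snps and run[-1] - run[0] >= min_bp:
        regions.append((run[0], run[-1]))


def candidate_ibd_regions(snps: List[Snp], min_bp: int,
                          min_snps: int) -> List[Tuple[int, int]]:
    """Return (start, end) spans free of IBD-excluding SNPs on one chromosome."""
    regions: List[Tuple[int, int]] = []
    run: List[int] = []  # positions of consecutive IBD-compatible SNPs
    for pos, g1, g2 in sorted(snps):
        if excludes_ibd(g1, g2):
            _close_run(run, regions, min_bp, min_snps)
            run = []
        else:
            run.append(pos)
    _close_run(run, regions, min_bp, min_snps)  # close the final run
    return regions
```

Requiring both a minimum physical size and a minimum SNP count guards against short runs that are compatible with IBD only by chance.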
Exact calculation of distributions on integers, with application to sequence alignment.

PubMed

Newberg, Lee A; Lawrence, Charles E

2009-01-01

Computational biology is replete with high-dimensional discrete prediction and inference problems. Dynamic programming recursions can be applied to several of the most important of these, including sequence alignment, RNA secondary-structure prediction, phylogenetic inference, and motif finding. In these problems, attention is frequently focused on some scalar quantity of interest, a score, such as an alignment score or the free energy of an RNA secondary structure. In many cases, score is naturally defined on integers, such as a count of the number of pairing differences between two sequence alignments, or else an integer score has been adopted for computational reasons, such as in the test of significance of motif scores. The probability distribution of the score under an appropriate probabilistic model is of interest, such as in tests of significance of motif scores, or in calculation of Bayesian confidence limits around an alignment. Here we present three algorithms for calculating the exact distribution of a score of this type; then, in the context of pairwise local sequence alignments, we apply the approach so as to find the alignment score distribution and Bayesian confidence limits.
Genotype to Phenotype Mapping of the E. coli lac Promoter

NASA Astrophysics Data System (ADS)

Otwinowski, Jakub; Nemenman, Ilya

2014-03-01

Genotype-to-phenotype maps and the related fitness landscapes that include epistatic interactions are difficult to measure because of their high dimensional structure. Here we construct such a map using the recently collected corpora of high-throughput sequence data from the 75 base pairs long mutagenized E. coli lac promoter region, where each sequence is associated with induced transcriptional activity measured by a fluorescent reporter. We find that the additive (non-epistatic) contributions of individual mutations account for about two-thirds of the explainable phenotype variance, while pairwise epistasis explains about 7% of the variance for the full mutagenized sequence and about 15% for the subsequence associated with protein binding sites. Surprisingly, there is no evidence for third order epistatic contributions, and our inferred fitness landscape is essentially single peaked, with a small amount of antagonistic epistasis. We identify transcription factor (CRP) and RNA polymerase binding sites in the promotor region and their interactions. We conclude with a cautionary note that inferred properties of fitness landscapes may be severely influenced by biases in the sequence data. Funded in part by HFSP and James S. McDonnell Foundation.

Identification and Characterization of a Pesticide Degrading Flavobacterium Species EMBS0145 by 16S rRNA Gene Sequencing.

PubMed

Nayarisseri, Anuraj; Suppahia, Anjana; Nadh, Anuroopa G; Nair, Achuthsankar S

2015-06-01

Organophosphates like chlorpyrifos, diazinon, or malathion have become most common and indisputably most toxic pest control agents that adversely affects the human nervous system even at low levels of exposure. Because of their relatively low cost and ability to be applied on a wide range of target insects and crop, organophosphorus pesticides account for a large share of all insecticides used in India, and this in turn raises severe health concerns. In this view, the present investigation was aimed to identify novel species of Flavobacterium bacteria which is bestowed with the capacity to degrade pesticides like chlorpyrifos, diazinon, or malathion. The bacterium was isolated from agricultural soil collected from Guntur District, Andhra Pradesh, India. The samples were serially diluted, and the aliquots were incubated for a suitable time following which the suspected colony was subjected to 16S rRNA gene sequencing. The sequence thus obtained was aligned pairwise against Flavobacterium species, which resulted in identification of novel species of Flavobacterium later which was named as EMBS0145 and sequence was deposited in GenBank with Accession Number: JN794045.
Identification and characterization of a pesticide degrading flavobacterium species EMBS0145 by 16S rRNA gene sequencing.

PubMed

Nayarisseri, Anuraj; Suppahia, Anjana; Nadh, Anuroopa G; Nair, Achuthsankar S

2014-08-09

Organophosphates (OPs) like chlorpyrifos, diazinon, or malathion have become most common and indisputably most toxic pest-control agents that adversely affects the human nervous system even at low levels of exposure. Because of their relatively low cost and ability to be applied on a wide range of target insects and crop, organophosphorus pesticides account for a large share of all insecticides used in India, this in turn raises severe health concerns. In this view, the present investigation was aimed to identify novel species of Flavobacterium bacteria which is bestowed with the capacity to degrade pesticides like chlorpyrifos, diazinon or malathion. The bacterium was isolated from agricultural soil collected from Guntur District, Andhra Pradesh, India. The samples were serially diluted and the aliquots were incubated for a suitable time following which the suspected colony was subjected to 16S rRNA gene sequencing. The sequence thus obtained was aligned pairwise against Flavobacterium species, which resulted in identification of novel species of Flavobacterium later which was named as EMBS0145 and sequence was deposited in GenBank with accession number JN794045.
Complete genome analysis of jasmine virus T from Jasminum sambac in China.

PubMed

Tang, Yajun; Gao, Fangluan; Yang, Zhen; Wu, Zujian; Yang, Liang

2016-07-01

The genome of a potyvirus (isolate JaVT_FZ) recovered from jasmine (Jasminum sambac L.) showing yellow ringspot symptoms in Fuzhou, China, was sequenced. JaVT_FZ is closely related to seven other potyviruses with completely sequenced genomes, with which it shares 66-70 % nucleotide and 52-56 % amino acid sequence identity. However, the coat protein (CP) gene shares 82-92 % nucleotide and 90-97 % amino acid sequence identity with those of two partially sequenced potyviruses, named jasmine potyvirus T (JaVT-jasmine) and jasmine yellow mosaic potyvirus (JaYMV-India), respectively. This suggests that JaVT_FZ, JaVT-jasmine and JaYMV-India should be regarded as members of a single potyvirus species, for which the name "Jasmine virus T" has priority.
Selection of optimal oligonucleotide probes for microarrays using multiple criteria, global alignment and parameter estimation.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Xingyuan; He, Zhili; Zhou, Jizhong

2005-10-30

The oligonucleotide specificity for microarray hybridization can be predicted by its sequence identity to non-targets, continuous stretch to non-targets, and/or binding free energy to non-targets. Most currently available programs only use one or two of these criteria, which may choose 'false' specific oligonucleotides or miss 'true' optimal probes in a considerable proportion. We have developed a software tool, called CommOligo using new algorithms and all three criteria for selection of optimal oligonucleotide probes. A series of filters, including sequence identity, free energy, continuous stretch, GC content, self-annealing, distance to the 3'-untranslated region (3'-UTR) and melting temperature (Tm), are used to check each possible oligonucleotide. A sequence identity is calculated based on gapped global alignments. A traversal algorithm is used to generate alignments for free energy calculation. The optimal Tm interval is determined based on probe candidates that have passed all other filters. Final probes are picked using a combination of user-configurable piece-wise linear functions and an iterative process. The thresholds for identity, stretch and free energy filters are automatically determined from experimental data by an accessory software tool, CommOligo_PE (CommOligo Parameter Estimator). The program was used to design probes for both whole-genome and highly homologous sequence data. CommOligo and CommOligo_PE are freely available to academic users upon request.
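Two of the filters named in this abstract, GC content and continuous stretch to non-targets, are easy to illustrate. The sketch below is not CommOligo: the function names, the default GC range, and the stretch threshold are hypothetical, and the identity, free energy, self-annealing and Tm filters are left out.

```python
from typing import Iterable, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def longest_common_stretch(probe: str, nontarget: str) -> int:
    """Length of the longest contiguous run shared by probe and non-target."""
    best = 0
    prev = [0] * (len(nontarget) + 1)
    for a in probe:
        cur = [0] * (len(nontarget) + 1)
        for j, b in enumerate(nontarget, start=1):
            if a == b:
                cur[j] = prev[j - 1] + 1  # extend the run ending at the previous bases
                best = max(best, cur[j])
        prev = cur
    return best


def passes_filters(probe: str, nontargets: Iterable[str],
                   gc_range: Tuple[float, float] = (0.4, 0.6),
                   max_stretch: int = 15) -> bool:
    """Keep a probe only if GC content is in range and no non-target shares a long run."""
    low, high = gc_range
    if not low <= gc_content(probe) <= high:
        return False
    return all(longest_common_stretch(probe, nt) <= max_stretch for nt in nontargets)
```

Cheap filters such as GC content are applied first here so that the quadratic-time stretch comparison only runs on probes that survive them.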
New Ehrlichia Species Closely Related to Ehrlichia chaffeensis Isolated from Ixodes ovatus Ticks in Japan

PubMed Central

Shibata, Shin-ichiro; Kawahara, Makoto; Rikihisa, Yasuko; Fujita, Hiromi; Watanabe, Yuriko; Suto, Chiharu; Ito, Tadahiko

2000-01-01

Seven Ehrlichia strains (six HF strains and one Anan strain) that were obtained from laboratory mice by intraperitoneally inoculating homogenates of adult Ixodes ovatus collected in Japan were characterized. 16S rRNA sequences of all six HF strains were identical, and the sequences were 99.7, 98.2, and 97.7% identical to those of Anan strain, Ehrlichia chaffeensis (human monocytic ehrlichiosis agent), and E. muris, respectively. Partial GroEL amino acid sequencing also revealed that the six HF strains had identical sequences, which were 99.0, 98.5, and 97.3% identical to those of E. chaffeensis, the Anan strain, and E. canis, respectively. All HF strains were lethal to mice at higher dosages and intraperitoneal inoculation, whereas the Anan or E. muris strain induced only mild clinical signs. Light and electron microscopy of moribund mice inoculated with one of the HF strains revealed severe liver necrosis and the presence of numerous ehrlichial inclusions (morulae) in various organs. The study revealed that members of E. canis genogroup are naturally present in Ixodes ticks. HF strains that can cause severe illness in immunocompetent laboratory mice would be valuable in studying the pathogenesis and the roles of both cellular and humoral immune responses in ehrlichiosis caused by E. canis genogroup. PMID:10747103
DNA Barcode Sequence Identification Incorporating Taxonomic Hierarchy and within Taxon Variability

PubMed Central

Little, Damon P.

2011-01-01

For DNA barcoding to succeed as a scientific endeavor an accurate and expeditious query sequence identification method is needed. Although a global multiple–sequence alignment can be generated for some barcoding markers (e.g. COI, rbcL), not all barcoding markers are as structurally conserved (e.g. matK). Thus, algorithms that depend on global multiple–sequence alignments are not universally applicable. Some sequence identification methods that use local pairwise alignments (e.g. BLAST) are unable to accurately differentiate between highly similar sequences and are not designed to cope with hierarchic phylogenetic relationships or within taxon variability. Here, I present a novel alignment–free sequence identification algorithm–BRONX–that accounts for observed within taxon variability and hierarchic relationships among taxa. BRONX identifies short variable segments and corresponding invariant flanking regions in reference sequences. These flanking regions are used to score variable regions in the query sequence without the production of a global multiple–sequence alignment. By incorporating observed within taxon variability into the scoring procedure, misidentifications arising from shared alleles/haplotypes are minimized. An explicit treatment of more inclusive terminals allows for separate identifications to be made for each taxonomic level and/or for user–defined terminals. BRONX performs better than all other methods when there is imperfect overlap between query and reference sequences (e.g. mini–barcode queries against a full–length barcode database). BRONX consistently produced better identifications at the genus–level for all query types. PMID:21857897
SSMap: a new UniProt-PDB mapping resource for the curation of structural-related information in the UniProt/Swiss-Prot Knowledgebase.

PubMed

David, Fabrice P A; Yip, Yum L

2008-09-23

Sequences and structures provide valuable complementary information on protein features and functions. However, it is not always straightforward for users to gather information concurrently from the sequence and structure levels. The UniProt knowledgebase (UniProtKB) strives to help users in this undertaking by providing complete cross-references to Protein Data Bank (PDB) as well as coherent feature annotation using available structural information. In this study, SSMap - a new UniProt-PDB residue-residue level mapping - was generated. The primary objective of this mapping is not only to facilitate the two tasks mentioned above, but also to palliate a number of shortcomings of existent mappings. SSMap is the first isoform sequence-specific mapping resource and is up-to-date for UniProtKB annotation tasks. The method employed by SSMap differs from the other mapping resources in that it stresses the correct reconstruction of the PDB sequence from structures, and the correct attribution of a UniProtKB entry to each PDB chain by using a series of post-processing steps. SSMap was compared to other existing mapping resources in terms of the correctness of the attribution of PDB chains to UniProtKB entries, and of the quality of the pairwise alignments supporting the residue-residue mapping. It was found that SSMap shared about 80% of the mappings with other mapping sources. New and alternative mappings proposed by SSMap were mostly good as assessed by manual verification of data subsets. As for local pairwise alignments, it was shown that major discrepancies (both in terms of alignment lengths and boundaries), when present, were often due to differences in methodologies used for the mappings. SSMap provides an independent, good quality UniProt-PDB mapping. The systematic comparison conducted in this study allows the further identification of general problems in UniProt-PDB mappings so that both the coverage and the quality of the mappings can be systematically improved for the benefit of the scientific community. SSMap mapping is currently used to provide PDB cross-references in UniProtKB.
Indigenous and introduced potyviruses of legumes and Passiflora spp. from Australia: biological properties and comparison of coat protein sequences

USDA-ARS's Scientific Manuscript database

Coat protein sequences of 33 Potyvirus isolates from legume and Passiflora spp. were sequenced to determine the identity of infecting viruses. Phylogenetic analysis of the sequences revealed the presence of seven distinct virus species....
Using structural equation modeling for network meta-analysis.

PubMed

Tu, Yu-Kang; Wu, Yun-Chun

2017-07-14

Network meta-analysis overcomes the limitations of traditional pair-wise meta-analysis by incorporating all available evidence into a general statistical framework for simultaneous comparisons of several treatments. Currently, network meta-analyses are undertaken either within the Bayesian hierarchical linear models or frequentist generalized linear mixed models. Structural equation modeling (SEM) is a statistical method originally developed for modeling causal relations among observed and latent variables. As random effect is explicitly modeled as a latent variable in SEM, it is very flexible for analysts to specify complex random effect structure and to make linear and nonlinear constraints on parameters. The aim of this article is to show how to undertake a network meta-analysis within the statistical framework of SEM. We used an example dataset to demonstrate that the standard fixed and random effect network meta-analysis models can be easily implemented in SEM. It contains results of 26 studies that directly compared three treatment groups A, B and C for prevention of first bleeding in patients with liver cirrhosis. We also showed that a new approach to network meta-analysis based on the technique of unrestricted weighted least squares (UWLS) method can also be undertaken using SEM. For both the fixed and random effect network meta-analysis, SEM yielded similar coefficients and confidence intervals to those reported in the previous literature. The point estimates of two UWLS models were identical to those in the fixed effect model but the confidence intervals were wider. This is consistent with results from the traditional pairwise meta-analyses. Compared with the UWLS model with common variance adjusted factor, the UWLS model with unique variance adjusted factor has wider confidence intervals when the heterogeneity was larger in the pairwise comparison. The UWLS model with unique variance adjusted factor reflects the difference in heterogeneity within each comparison. SEM provides a very flexible framework for univariate and multivariate meta-analysis, and its potential as a powerful tool for advanced meta-analysis is still to be explored.
A protein block based fold recognition method for the annotation of twilight zone sequences.

PubMed

Suresh, V; Ganesan, K; Parthasarathy, S

2013-03-01

The description of protein backbone was recently improved with a group of structural fragments called Structural Alphabets instead of the regular three states (Helix, Sheet and Coil) secondary structure description. Protein Blocks is one of the Structural Alphabets used to describe each and every region of protein backbone including the coil. According to de Brevern (2000) the Protein Blocks has 16 structural fragments and each one has 5 residues in length. Protein Blocks fragments are highly informative among the available Structural Alphabets and it has been used for many applications. Here, we present a protein fold recognition method based on Protein Blocks for the annotation of twilight zone sequences. In our method, we align the predicted Protein Blocks of a query amino acid sequence with a library of assigned Protein Blocks of 953 known folds using the local pair-wise alignment. The alignment results with z-value ≥ 2.5 and P-value ≤ 0.08 are predicted as possible folds. Our method is able to recognize the possible folds for nearly 35.5% of the twilight zone sequences with their predicted Protein Block sequence obtained by pb_prediction, which is available at Protein Block Export server.
Reconstructing genealogies of serial samples under the assumption of a molecular clock using serial-sample UPGMA.

PubMed

Drummond, A; Rodrigo, A G

2000-12-01

Reconstruction of evolutionary relationships from noncontemporaneous molecular samples provides a new challenge for phylogenetic reconstruction methods. With recent biotechnological advances there has been an increase in molecular sequencing throughput, and the potential to obtain serial samples of sequences from populations, including rapidly evolving pathogens, is fast being realized. A new method called the serial-sample unweighted pair grouping method with arithmetic means (sUPGMA) is presented that reconstructs a genealogy or phylogeny of sequences sampled serially in time using a matrix of pairwise distances. The resulting tree depicts the terminal lineages of each sample ending at a different level consistent with the sample's temporal order. Since sUPGMA is a variant of UPGMA, it will perform best when sequences have evolved at a constant rate (i.e., according to a molecular clock). On simulated data, this new method performs better than standard cluster analysis under a variety of longitudinal sampling strategies. Serial-sample UPGMA is particularly useful for analysis of longitudinal samples of viruses and bacteria, as well as ancient DNA samples, with the minimal requirement that samples of sequences be ordered in time.
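Because sUPGMA is described as a variant of UPGMA driven by a matrix of pairwise distances, a sketch of the standard algorithm may help as background. The code below implements plain UPGMA only; it does not include the serial-sample correction that distinguishes sUPGMA, and the tuple-based tree representation is an assumption chosen for brevity.

```python
from itertools import combinations


def upgma(names, dist):
    """Standard UPGMA on a symmetric distance matrix (list of lists).

    Returns a nested tuple (left, right, height); leaves are the given names.
    """
    # cluster id -> (subtree, number of leaves)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of clusters
        height = d[(i, j)] / 2    # ultrametric node height
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        new_d = {}
        for k in clusters:
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            # size-weighted average keeps every leaf equally weighted
            new_d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(new_d)
        clusters[next_id] = ((ti, tj, height), ni + nj)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree
```

With `names = ["A", "B", "C"]` and distances A-B = 2, A-C = 6, B-C = 6, the result is `("C", ("A", "B", 1.0), 3.0)`: all tips end at the same level, which is exactly the property the serial-sample variant relaxes for samples taken at different times.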
Molecular cloning of two human liver 3 alpha-hydroxysteroid/dihydrodiol dehydrogenase isoenzymes that are identical with chlordecone reductase and bile-acid binder.

PubMed Central

Deyashiki, Y; Ogasawara, A; Nakayama, T; Nakanishi, M; Miyabe, Y; Sato, K; Hara, A

1994-01-01

Human liver contains two dihydrodiol dehydrogenases, DD2 and DD4, associated with 3 alpha-hydroxysteroid dehydrogenase activity. We have raised polyclonal antibodies that cross-reacted with the two enzymes and isolated two 1.2 kb cDNA clones (C9 and C11) for the two enzymes from a human liver cDNA library using the antibodies. Clones C9 and C11 contained coding sequences corresponding to 306 and 321 amino acid residues respectively, but lacked 5'-coding regions around the initiation codon. Sequence analyses of several peptides obtained by enzymic and chemical cleavages of the two purified enzymes verified that the C9 and C11 clones encoded DD2 and DD4 respectively, and further indicated that the sequence of DD2 had at least 16 additional residues upstream of the N-terminal sequence deduced from the cDNA. There was 82% amino acid sequence identity between the two enzymes, indicating that the enzymes are genetic isoenzymes. A computer-based comparison of the cDNAs of the isoenzymes with the DNA sequence database revealed that the nucleotide and amino acid sequences of DD2 and DD4 are virtually identical with those of human bile-acid binder and human chlordecone reductase cDNAs respectively. PMID:8172617
Visualizing bacterial tRNA identity determinants and antideterminants using function logos and inverse function logos

PubMed Central

Freyhult, Eva; Moulton, Vincent; Ardell, David H.

2006-01-01

Sequence logos are stacked bar graphs that generalize the notion of consensus sequence. They employ entropy statistics very effectively to display variation in a structural alignment of sequences of a common function, while emphasizing its over-represented features. Yet sequence logos cannot display features that distinguish functional subclasses within a structurally related superfamily nor do they display under-represented features. We introduce two extensions to address these needs: function logos and inverse logos. Function logos display subfunctions that are over-represented among sequences carrying a specific feature. Inverse logos generalize both sequence logos and function logos by displaying under-represented, rather than over-represented, features or functions in structural alignments. To make inverse logos, a compositional inverse is applied to the feature or function frequency distributions before logo construction, where a compositional inverse is a mathematical transform that makes common features or functions rare and vice versa. We applied these methods to a database of structurally aligned bacterial tDNAs to create highly condensed, birds-eye views of potentially all so-called identity determinants and antideterminants that confer specific amino acid charging or initiator function on tRNAs in bacteria. We recovered both known and a few potentially novel identity elements. Function logos and inverse logos are useful tools for exploratory bioinformatic analysis of structure–function relationships in sequence families and superfamilies. PMID:16473848
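The two quantities mentioned in this abstract can be sketched numerically. The example below computes the information content of a single logo column and applies one common definition of a compositional inverse (take reciprocals, then renormalise so the values sum to 1). Whether this matches the authors' exact transform, and how they treat zero frequencies, is not stated here; the pseudocount and both function names are assumptions for illustration.

```python
import math


def information_content(freqs):
    """Information (bits) of one column: log2(N) minus the Shannon entropy."""
    n = len(freqs)
    entropy = -sum(p * math.log2(p) for p in freqs if p > 0)
    return math.log2(n) - entropy


def compositional_inverse(freqs, pseudocount=1e-6):
    """Reciprocal of each frequency, renormalised to sum to 1.

    The pseudocount (an assumption) avoids division by zero for
    features that were never observed.
    """
    recip = [1.0 / (p + pseudocount) for p in freqs]
    total = sum(recip)
    return [r / total for r in recip]


# Hypothetical column frequencies for A, C, G, U
column = [0.7, 0.1, 0.1, 0.1]
inverse = compositional_inverse(column)
print(information_content(column), column)
print(information_content(inverse), inverse)
```

After the transform the most common feature becomes the rarest, so a logo built from the inverted distribution emphasises what is under-represented in the original column.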
Concerted evolution at the population level: pupfish HindIII satellite DNA sequences.

PubMed Central

Elder, J F; Turner, B J

1994-01-01

The canonical monomers (approximately 170 bp) of an abundant (1.9 × 10^6 copies per diploid genome) satellite DNA sequence family in the genome of Cyprinodon variegatus, a "pupfish" that ranges along the Atlantic coast from Cape Cod to central Mexico, are divergent in base sequence in 10 of 12 samples collected from natural populations. The divergence involves substitutions, deletions, and insertions, is marked in scope (mean pairwise sequence similarity = 61.6%; range = 35-95.9%), is largely confined to the 3' half of the monomer, and is not correlated with the distance among collecting sites. Repetitive cloning and direct genomic sequencing experiments failed to detect intrapopulation and intraindividual variation, suggesting high levels of sequence homogeneity within populations. The satellite sequence has therefore undergone "concerted evolution," at the level of the local population. Concerted evolution has previously almost always been discussed in terms of the divergence of species or higher taxa; its intraspecific occurrence apparently has not been reported previously. The generality of the observation is difficult to evaluate, for although satellite DNAs from a large number of organisms have been studied in detail, there appear to be little or no other data on their sequence variation in natural populations. The relationship (if any) between concerted, population level, satellite DNA divergence and the extent of gene flow/genetic isolation among conspecific natural populations remains to be established. PMID:8302879
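A summary such as "mean pairwise sequence similarity = 61.6%; range = 35-95.9%" can be reproduced from an alignment along the following lines. This is a minimal sketch, assuming the sequences are already aligned to equal length and that a gap opposite a residue counts as a mismatch; other denominators (for example, the shorter sequence length) are also in use, and the abstract does not say which convention the authors applied. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent identity of two aligned sequences of equal length.

    Columns that are gaps in both sequences are skipped; a gap opposite
    a residue is counted as a mismatch (an assumed convention).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def pairwise_summary(aligned):
    """Mean, minimum and maximum identity over all pairs in a dict of sequences."""
    values = [percent_identity(aligned[p], aligned[q])
              for p, q in combinations(sorted(aligned), 2)]
    return sum(values) / len(values), min(values), max(values)


# Toy alignment (hypothetical sequences)
aligned = {
    "pop1": "ACGTACGT",
    "pop2": "ACGTACGA",
    "pop3": "ACG-ACGA",
}
mean, low, high = pairwise_summary(aligned)
print(f"mean = {mean:.1f}%, range = {low:.1f}-{high:.1f}%")
```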
Algorithm for selection of optimized EPR distance restraints for de novo protein structure determination

PubMed Central

Kazmier, Kelli; Alexander, Nathan S.; Meiler, Jens; Mchaourab, Hassane S.

2010-01-01

A hybrid protein structure determination approach combining sparse Electron Paramagnetic Resonance (EPR) distance restraints and Rosetta de novo protein folding has been previously demonstrated to yield high quality models (Alexander et al., 2008). However, widespread application of this methodology to proteins of unknown structures is hindered by the lack of a general strategy to place spin label pairs in the primary sequence. In this work, we report the development of an algorithm that optimally selects spin labeling positions for the purpose of distance measurements by EPR. For the α-helical subdomain of T4 lysozyme (T4L), simulated restraints that maximize sequence separation between the two spin labels while simultaneously ensuring pairwise connectivity of secondary structure elements yielded vastly improved models by Rosetta folding. 50% of all these models have the correct fold compared to only 21% and 8% correctly folded models when randomly placed restraints or no restraints are used, respectively. Moreover, the improvements in model quality require a limited number of optimized restraints, the number of which is determined by the pairwise connectivities of T4L α-helices. The predicted improvement in Rosetta model quality was verified by experimental determination of distances between spin labels pairs selected by the algorithm. Overall, our results reinforce the rationale for the combined use of sparse EPR distance restraints and de novo folding. By alleviating the experimental bottleneck associated with restraint selection, this algorithm sets the stage for extending computational structure determination to larger, traditionally elusive protein topologies of critical structural and biochemical importance. PMID:21074624
Network Analysis of Protein Adaptation: Modeling the Functional Impact of Multiple Mutations

PubMed Central

Beleva Guthrie, Violeta; Masica, David L; Fraser, Andrew; Federico, Joseph; Fan, Yunfan; Camps, Manel; Karchin, Rachel

2018-01-01

The evolution of new biochemical activities frequently involves complex dependencies between mutations and rapid evolutionary radiation. Mutation co-occurrence and covariation have previously been used to identify compensating mutations that are the result of physical contacts and preserve protein function and fold. Here, we model pairwise functional dependencies and higher order interactions that enable evolution of new protein functions. We use a network model to find complex dependencies between mutations resulting from evolutionary trade-offs and pleiotropic effects. We present a method to construct these networks and to identify functionally interacting mutations in both extant and reconstructed ancestral sequences (Network Analysis of Protein Adaptation). The time ordering of mutations can be incorporated into the networks through phylogenetic reconstruction. We apply NAPA to three distantly homologous β-lactamase protein clusters (TEM, CTX-M-3, and OXA-51), each of which has experienced recent evolutionary radiation under substantially different selective pressures. By analyzing the network properties of each protein cluster, we identify key adaptive mutations, positive pairwise interactions, different adaptive solutions to the same selective pressure, and complex evolutionary trajectories likely to increase protein fitness. We also present evidence that incorporating information from phylogenetic reconstruction and ancestral sequence inference can reduce the number of spurious links in the network, while preserving overall network community structure. The analysis does not require structural or biochemical data. In contrast to function-preserving mutation dependencies, which are frequently from structural contacts, gain-of-function mutation dependencies are most commonly between residues distal in protein structure. PMID:29522102
The OGCleaner: filtering false-positive homology clusters.

PubMed

Fujimoto, M Stanley; Suvorov, Anton; Jensen, Nicholas O; Clement, Mark J; Snell, Quinn; Bybee, Seth M

2017-01-01

Detecting homologous sequences in organisms is an essential step in protein structure and function prediction, gene annotation and phylogenetic tree construction. Heuristic methods are often employed for quality control of putative homology clusters. These heuristics, however, usually only apply to pairwise sequence comparison and do not examine clusters as a whole. We present the Orthology Group Cleaner (the OGCleaner), a tool designed for filtering putative orthology groups as homology or non-homology clusters by considering all sequences in a cluster. The OGCleaner relies on high-quality orthologous groups identified in OrthoDB to train machine learning algorithms that are able to distinguish between true-positive and false-positive homology groups. This package aims to improve the quality of phylogenetic tree construction especially in instances of lower-quality transcriptome assemblies. Availability: https://github.com/byucsl/ogcleaner. Contact: sfujimoto@gmail.com. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Molecular systematics of higher primates: genealogical relations and classification.

PubMed Central

Miyamoto, M M; Koop, B F; Slightom, J L; Goodman, M; Tennant, M R

1988-01-01

We obtained 5' and 3' flanking sequences (5.4 kilobase pairs) from the psi eta-globin gene region of the rhesus macaque (Macaca mulatta) and combined them with available nucleotide data. The completed sequence, representing 10.8 kilobase pairs of contiguous noncoding DNA, was compared to the same orthologous regions available for human (Homo sapiens, as represented by five different alleles), common chimpanzee (Pan troglodytes), gorilla (Gorilla gorilla), and orangutan (Pongo pygmaeus). The nucleotide sequence for Macaca mulatta provided the outgroup perspective needed to evaluate better the relationships of humans and great apes. Pairwise comparisons and parsimony analysis of these orthologues clearly demonstrated (i) that humans and great apes share a high degree of genetic similarity and (ii) that humans, chimpanzees, and gorillas form a natural monophyletic group. These conclusions strongly favor a genealogical classification for higher primates consisting of a single family (Hominidae) with two subfamilies (Homininae for Homo, Pan, and Gorilla and Ponginae for Pongo). PMID:3174657
Gene Deletion in Barley Mediated by LTR-retrotransposon BARE

PubMed Central

Shang, Yi; Yang, Fei; Schulman, Alan H.; Zhu, Jinghuan; Jia, Yong; Wang, Junmei; Zhang, Xiao-Qi; Jia, Qiaojun; Hua, Wei; Yang, Jianming; Li, Chengdao

2017-01-01

A poly-row branched spike (prbs) barley mutant was obtained from soaking a two-rowed barley inflorescence in a solution of maize genomic DNA. Positional cloning and sequencing demonstrated that the prbs mutant resulted from a 28 kb deletion including the inflorescence architecture gene HvRA2. Sequence annotation revealed that the HvRA2 gene is flanked by two LTR (long terminal repeat) retrotransposons (BARE) sharing 89% sequence identity. A recombination between the integrase (IN) gene regions of the two BARE copies resulted in the formation of an intact BARE and loss of HvRA2. No maize DNA was detected in the recombination region although the flanking sequences of HvRA2 gene showed over 73% of sequence identity with repetitive sequences on 10 maize chromosomes. It is still unknown whether the interaction of retrotransposons between barley and maize has resulted in the recombination observed in the present study. PMID:28252053
Radiolabeled Escherichia coli heat-stable enterotoxin analogs for in vivo imaging of colorectal cancer

NASA Astrophysics Data System (ADS)

Giblin, M. F.; Sieckman, G. L.; Owen, N. K.; Hoffman, T. J.; Forte, L. R.; Volkert, W. A.

2005-12-01

The human Escherichia coli heat-stable enterotoxin (STh, amino acid sequence N1SSNYCCELCCNPACTGCY19) binds specifically to the guanylate cyclase C (GC-C) receptor, which is present in high density on the apical surface of normal intestinal epithelial cells as well as on the surface of human colon cancer cells. In the current study, two STh analogs were synthesized and evaluated in vitro and in vivo. Both analogs shared identical 6-19 core sequences, and had N-terminal pendant DOTA moieties. The analogs differed in the identity of a 6 amino acid peptide sequence intervening between DOTA and the 6-19 core. In one analog, the peptide was an RGD-containing sequence found in human fibronectin (GRGDSP), while in the other this peptide sequence was randomly scrambled (GRDSGP). The results indicated that the presence of the human fibronectin sequence in the hybrid peptide did not affect tumor localization in vivo.

Opsin cDNA sequences of a UV and green rhodopsin of the satyrine butterfly Bicyclus anynana.

PubMed

Vanhoutte, K J A; Eggen, B J L; Janssen, J J M; Stavenga, D G

2002-11-01

The cDNAs of an ultraviolet (UV) and long-wavelength (LW) (green) absorbing rhodopsin of the bush brown Bicyclus anynana were partially identified. The UV sequence, encoding 377 amino acids, is 76-79% identical to the UV sequences of the papilionids Papilio glaucus and Papilio xuthus and the moth Manduca sexta. A dendrogram derived from aligning the amino acid sequences reveals an equidistant position of Bicyclus between Papilio and Manduca. The sequence of the green opsin cDNA fragment, which encodes 242 amino acids, represents six of the seven transmembrane regions. At the amino acid level, this fragment is more than 80% identical to the corresponding LW opsin sequences of Dryas, Heliconius, Papilio (rhodopsin 2) and Manduca. Whereas three LW absorbing rhodopsins were identified in the papilionid butterflies, only one green opsin was found in B. anynana.
Characterization of apple stem grooving virus and apple chlorotic leaf spot virus identified in a crab apple tree.

PubMed

Li, Yongqiang; Deng, Congliang; Bian, Yong; Zhao, Xiaoli; Zhou, Qi

2017-04-01

Apple stem grooving virus (ASGV), apple chlorotic leaf spot virus (ACLSV), and prunus necrotic ringspot virus (PNRSV) were identified in a crab apple tree by small RNA deep sequencing. The complete genome sequence of ACLSV isolate BJ (ACLSV-BJ) was 7554 nucleotides and shared 67.0%-83.0% nucleotide sequence identity with other ACLSV isolates. A phylogenetic tree based on the complete genome sequence of all available ACLSV isolates showed that ACLSV-BJ clustered with the isolates SY01 from hawthorn, MO5 from apple, and JB, KMS and YH from pear. The complete nucleotide sequence of ASGV-BJ was 6509 nucleotides (nt) long and shared 78.2%-80.7% nucleotide sequence identity with other isolates. ASGV-BJ and the isolate ASGV_kfp clustered together in the phylogenetic tree as an independent clade. Recombination analysis showed that isolate ASGV-BJ was a naturally occurring recombinant.
Common Amino Acid Subsequences in a Universal Proteome—Relevance for Food Science

PubMed Central

Minkiewicz, Piotr; Darewicz, Małgorzata; Iwaniak, Anna; Sokołowska, Jolanta; Starowicz, Piotr; Bucholska, Justyna; Hrynkiewicz, Monika

2015-01-01

A common subsequence is a fragment of the amino acid chain that occurs in more than one protein. Common subsequences may be an object of interest for food scientists as biologically active peptides, epitopes, and/or protein markers that are used in comparative proteomics. An individual bioactive fragment, in particular the shortest fragment containing two or three amino acid residues, may occur in many protein sequences. An individual linear epitope may also be present in multiple sequences of precursor proteins. Although recent recommendations for prediction of allergenicity and cross-reactivity include not only sequence identity, but also similarities in secondary and tertiary structures surrounding the common fragment, local sequence identity may be used to screen protein sequence databases for potential allergens in silico. The main weakness of the screening process is that it overlooks allergens and cross-reactivity cases without identical fragments corresponding to linear epitopes. A single peptide may also serve as a marker of a group of allergens that belong to the same family and, possibly, reveal cross-reactivity. This review article discusses the benefits for food scientists that follow from the common subsequences concept. PMID:26340620
Novel Detection of Coxiella spp., Theileria luwenshuni, and T. ovis Endosymbionts in Deer Keds (Lipoptena fortisetosa).

PubMed

Lee, Seung-Hun; Kim, Kyoo-Tae; Kwon, Oh-Deog; Ock, Younsung; Kim, Taeil; Choi, Donghag; Kwak, Dongmi

2016-01-01

We describe for the first time the detection of Coxiella-like bacteria (CLB), Theileria luwenshuni, and T. ovis endosymbionts in blood-sucking deer keds. Eight deer keds attached to a Korean water deer were identified as Lipoptena fortisetosa (Diptera: Hippoboscidae) by morphological and genetic analyses. Among the endosymbionts assessed, CLB, Theileria luwenshuni, and T. ovis were identified in L. fortisetosa by PCR and nucleotide sequencing. Based on phylogeny, CLB 16S rRNA sequences were classified into clade B, sharing 99.4% identity with CLB from Haemaphysalis longicornis in South Korea. Although the virulence of CLB to vertebrates is still controversial, several studies have reported clinical symptoms in birds due to CLB infections. The 18S rRNA sequences of T. luwenshuni and T. ovis in this study were 98.8-100% identical to those in GenBank, and all of the obtained sequences of T. ovis and T. luwenshuni in this study were 100% identical to each other, respectively. Although further studies are required to positively confirm L. fortisetosa as a biological vector of these pathogens, strong genetic relationships among sequences from this and previous studies suggest potential transmission among mammalian hosts by ticks and keds.
Complete genome sequence of a new begomovirus associated with yellow mosaic disease of Hemidesmus indicus in India.

PubMed

Reddy, M Sreekanth; Kanakala, S; Srinivas, K P; Hema, M; Malathi, V G; Sreenivasulu, P

2014-05-01

The complete DNA A genome of a virus isolate associated with yellow mosaic disease of a medicinal plant, Hemidesmus indicus, from India was cloned and sequenced. The length of DNA A was 2825 nucleotides, 35 nucleotides longer than the unit genome of monopartite begomoviruses. Comparison of the nucleotide sequence of DNA A of the virus isolate with those of other begomoviruses showed maximum sequence identity of 69 % to DNA A of ageratum yellow vein China virus (AYVCNV; AJ558120) and 68 % with tomato yellow leaf curl virus- LBa4 (TYLCV; EF185318), and it formed a distinct clade in phylogenetic analysis. The genome organization of the present virus isolate was found to be similar to that of Old World monopartite begomoviruses. The genome was considered to be monopartite, because association of DNA B and β satellite DNA components was not detected. Based on its sequence identity (<70 %) to all other begomoviruses known to date and ICTV (International Committee on Taxonomy of Viruses) species demarcating criteria (<89 % identity), it is considered a member of a novel begomovirus species, and the tentative name "Hemidesmus yellow mosaic virus" (HeYMV) is proposed.
Phylogenetic Analysis of Theileria annulata Infected Cell Line S15 Iran Vaccine Strain.

PubMed

Habibi, Gh

2012-01-01

Bovine theileriosis results from infection with obligate intracellular protozoa of the genus Theileria. The phylogenetic relationships between two isolates of Theileria annulata, 36 Theileria spp., and 6 outgroups including Babesia spp. and coccidian protozoa were analyzed using the 18S rRNA gene sequence. The target DNA segment was amplified by PCR, and the PCR product was used for direct sequencing. The length of the 18S rRNA gene of all Theileria spp. involved in this study was around 1,400 bp. A phylogenetic tree was inferred based on the 18S rRNA gene sequence of the Iran and Iraq isolates and other species of Theileria available in GenBank. In the constructed tree, Theileria annulata (Iran vaccine strain) was closely related to other T. annulata from Europe and Asia, as well as to T. lestoquardi, T. parva and T. taurotragi, all in one clade. Phylogenetic analysis based on the small subunit ribosomal RNA gene showed that the sequence of the Iran vaccine strain was 100% identical to the Iraq sequence, whereas it shared 97.9 to 99.9% identity with other T. annulata sequences reported from China, Spain and Italy.
Cloning of an avilamycin biosynthetic gene cluster from Streptomyces viridochromogenes Tü57.

PubMed Central

Gaisser, S; Trefzer, A; Stockert, S; Kirschning, A; Bechthold, A

1997-01-01

A 65-kb region of DNA from Streptomyces viridochromogenes Tü57, containing genes encoding proteins involved in the biosynthesis of avilamycins, was isolated. The DNA sequence of a 6.4-kb fragment from this region revealed four open reading frames (ORF1 to ORF4), three of which are fully contained within the sequenced fragment. The deduced amino acid sequence of AviM, encoded by ORF2, shows 37% identity to a 6-methylsalicylic acid synthase from Penicillium patulum. Cultures of S. lividans TK24 and S. coelicolor CH999 containing plasmids with ORF2 on a 5.5-kb PstI fragment were able to produce orsellinic acid, an unreduced version of 6-methylsalicylic acid. The amino acid sequence encoded by ORF3 (AviD) is 62% identical to that of StrD, a dTDP-glucose synthase from S. griseus. The deduced amino acid sequence of AviE, encoded by ORF4, shows 55% identity to a dTDP-glucose dehydratase (StrE) from S. griseus. Gene insertional inactivation experiments of aviE abolished avilamycin production, indicating the involvement of aviE in the biosynthesis of avilamycins. PMID:9335272
Comparative genomic sequence analysis of novel Helicoverpa armigera nucleopolyhedrovirus (NPV) isolated from Kenya and three other previously sequenced Helicoverpa spp. NPVs.

PubMed

Ogembo, Javier Gordon; Caoili, Barbara L; Shikata, Masamitsu; Chaeychomsri, Sudawan; Kobayashi, Michihiro; Ikeda, Motoko

2009-10-01

A newly cloned Helicoverpa armigera nucleopolyhedrovirus (HearNPV) from Kenya, HearNPV-NNg1, has a higher insecticidal activity than HearNPV-G4, which also exhibits lower insecticidal activity than HearNPV-C1. In the search for genes and/or nucleotide sequences that might be involved in the observed virulence differences among Helicoverpa spp. NPVs, the entire genome of NNg1 was sequenced and compared with previously sequenced genomes of G4, C1 and Helicoverpa zea single-nucleocapsid NPV (Hz). The NNg1 genome was 132,425 bp in length, with a total of 143 putative open reading frames (ORFs), and shared high levels of overall amino acid and nucleotide sequence identities with G4, C1 and Hz. Three NNg1 ORFs, ORF5, ORF100 and ORF124, which were shared with C1, were absent in G4 and Hz, while NNg1 and C1 were missing a homologue of G4/Hz ORF5. Another three ORFs, ORF60 (bro-b), ORF119 and ORF120, and one direct repeat sequence (dr) were unique to NNg1. Relative to the overall nucleotide sequence identity, lower sequence identities were observed between NNg1 hrs and the homologous hrs in the other three Helicoverpa spp. NPVs, despite containing the same number of hrs located at essentially the same positions on the genomes. Differences were also observed between NNg1 and each of the other three Helicoverpa spp. NPVs in the diversity of bro genes encoded on the genomes. These results indicate several putative genes and nucleotide sequences that may be responsible for the virulence differences observed among Helicoverpa spp., yet the specific genes and/or nucleotide sequences responsible have not been identified.
Detection and molecular characterization of Babesia, Theileria, and Hepatozoon species in hard ticks collected from Kagoshima, the southern region in Japan.

PubMed

Masatani, Tatsunori; Hayashi, Kei; Andoh, Masako; Tateno, Morihiro; Endo, Yasuyuki; Asada, Masahito; Kusakisako, Kodai; Tanaka, Tetsuya; Gokuden, Mutsuyo; Hozumi, Nodoka; Nakadohzono, Fumiko; Matsuo, Tomohide

2017-06-01

To reveal the distribution of tick-borne parasites, we established a novel nested polymerase chain reaction (PCR) system to detect the most common agents of tick-borne parasitic diseases, namely Babesia, Theileria, and Hepatozoon parasites. We collected host-seeking or animal-feeding ticks in Kagoshima Prefecture, the southernmost region of Kyusyu Island in southwestern Japan. Twenty of the total of 776 tick samples displayed a specific band of the appropriate size (approximately 1.4-1.6 kbp) for the 18S rRNA genes in the novel nested PCR (20/776: 2.58%). These PCR products have individual sequences of Babesia spp. (from 8 ticks), Theileria spp. (from 9 ticks: one tick sample including at least two Theileria spp. sequences), and Hepatozoon spp. (from 3 ticks). Phylogenetic analyses revealed that these sequences were close to those of undescribed Babesia spp. detected in feral raccoons in Japan (5 sequences; 3 sequences being identical), Babesia gibsoni-like parasites detected in pigs in China (3 sequences; all sequences being identical), Theileria spp. detected in sika deer in Japan and China (10 sequences; 2 sequences being identical), Hepatozoon canis (one sequence), and Hepatozoon spp. detected in Japanese martens in Japan (two sequences). In summary, we showed that various tick-borne parasites exist in Kagoshima, the southern region in Japan, by using the novel nested PCR system. These include undescribed species such as Babesia gibsoni-like parasites previously detected in pigs in China. Importantly, our results revealed new combinations of ticks and protozoan parasites in southern Japan. The results of this study will aid in the recognition of potential parasitic animal diseases caused by tick-borne parasites. Copyright © 2017 Elsevier GmbH. All rights reserved.
Taxonomic evaluation of Streptomyces albus and related species using multilocus sequence analysis and proposals to emend the description of Streptomyces albus and describe Streptomyces pathocidini sp. nov.

PubMed Central

Doroghazi, J. R.; Ju, K.-S.; Metcalf, W. W.

2014-01-01

In phylogenetic analyses of the genus Streptomyces using 16S rRNA gene sequences, Streptomyces albus subsp. albus NRRL B-1811T forms a cluster with five other species having identical or nearly identical 16S rRNA gene sequences. Moreover, the morphological and physiological characteristics of these other species, including Streptomyces almquistii NRRL B-1685T, Streptomyces flocculus NRRL B-2465T, Streptomyces gibsonii NRRL B-1335T and Streptomyces rangoonensis NRRL B-12378T are quite similar. This cluster is of particular taxonomic interest because Streptomyces albus is the type species of the genus Streptomyces. The related strains were subjected to multilocus sequence analysis (MLSA) utilizing partial sequences of the housekeeping genes atpD, gyrB, recA, rpoB and trpB and confirmation of previously reported phenotypic characteristics. The five strains formed a coherent cluster supported by a 100 % bootstrap value in phylogenetic trees generated from sequence alignments prepared by concatenating the sequences of the housekeeping genes, and identical tree topology was observed using various different tree-making algorithms. Moreover, all but one strain, S. flocculus NRRL B-2465T, exhibited identical sequences for all of the five housekeeping gene loci sequenced, but NRRL B-2465T still exhibited an MLSA evolutionary distance of 0.005 from the other strains, a value that is lower than the 0.007 MLSA evolutionary distance threshold proposed for species-level relatedness. These data support a proposal to reclassify S. almquistii, S. flocculus, S. gibsonii and S. rangoonensis as later heterotypic synonyms of S. albus with NRRL B-1811T as the type strain. The MLSA sequence database also demonstrated utility for quickly and conclusively confirming that numerous strains within the ARS Culture Collection had been previously misidentified as subspecies of S. albus and that Streptomyces albus subsp. pathocidicus should be redescribed as a novel species, Streptomyces pathocidini sp. nov., with the type strain NRRL B-24287T. PMID:24277863
SubVis: an interactive R package for exploring the effects of multiple substitution matrices on pairwise sequence alignment

PubMed Central

Coan, Heather B.; Youker, Robert T.

2017-01-01

Understanding how proteins mutate is critical to solving a host of biological problems. Mutations occur when an amino acid is substituted for another in a protein sequence. The set of likelihoods for amino acid substitutions is stored in a matrix and input to alignment algorithms. The quality of the resulting alignment is used to assess the similarity of two or more sequences and can vary according to assumptions modeled by the substitution matrix. Substitution strategies with minor parameter variations are often grouped together in families. For example, the BLOSUM and PAM matrix families are commonly used because they provide a standard, predefined way of modeling substitutions. However, researchers often do not know if a given matrix family or any individual matrix within a family is the most suitable. Furthermore, predefined matrix families may inaccurately reflect a particular hypothesis that a researcher wishes to model or otherwise result in unsatisfactory alignments. In these cases, the ability to compare the effects of one or more custom matrices may be needed. This laborious process is often performed manually because the ability to simultaneously load multiple matrices and then compare their effects on alignments is not readily available in current software tools. This paper presents SubVis, an interactive R package for loading and applying multiple substitution matrices to pairwise alignments. Users can simultaneously explore alignments resulting from multiple predefined and custom substitution matrices. SubVis utilizes several of the alignment functions found in R, a common language among protein scientists. Functions are tied together with the Shiny platform which allows the modification of input parameters. Information regarding alignment quality and individual amino acid substitutions is displayed with the JavaScript language which provides interactive visualizations for revealing both high-level and low-level alignment information. PMID:28674656
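The dependence of an alignment on the substitution matrix can be seen in a small sketch. SubVis itself is an R package; the example below is written in Python and is not its code. It runs a basic Needleman-Wunsch global alignment with a linear gap penalty under two toy scoring schemes that stand in for real matrices such as BLOSUM or PAM. The scoring functions, residue groups, gap penalty and sequences are all assumptions chosen for illustration.

```python
def global_align(a, b, score, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    `score(x, y)` returns the substitution score for residues x and y.
    Returns (best_score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def identity_score(x, y):
    """Toy matrix 1: only exact matches are rewarded."""
    return 2 if x == y else -1


# Toy matrix 2: chemically similar residues get partial credit (hypothetical groups)
SIMILAR_GROUPS = [set("ILV"), set("KR"), set("DE")]


def grouped_score(x, y):
    if x == y:
        return 2
    if any(x in g and y in g for g in SIMILAR_GROUPS):
        return 1
    return -1


for name, fn in [("identity", identity_score), ("grouped", grouped_score)]:
    print(name, global_align("KLD", "RIE", fn))
```

With these toy inputs the same pair of sequences scores as entirely dissimilar under the first scheme and as conservatively substituted under the second, which is the kind of contrast a side-by-side matrix comparison is meant to expose.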
Structure based alignment and clustering of proteins (STRALCP)

DOEpatents

Zemla, Adam T.; Zhou, Carol E.; Smith, Jason R.; Lam, Marisa W.

2013-06-18

Disclosed are computational methods of clustering a set of protein structures based on local and pair-wise global similarity values. Pair-wise local and global similarity values are generated based on pair-wise structural alignments for each protein in the set of protein structures. Initially, the protein structures are clustered based on pair-wise local similarity values. The protein structures are then clustered based on pair-wise global similarity values. For each given cluster both a representative structure and spans of conserved residues are identified. The representative protein structure is used to assign newly-solved protein structures to a group. The spans are used to characterize conservation and assign a "structural footprint" to the cluster.
Bean common mosaic virus isolates causing different symptoms in asparagus bean in China differ greatly in the 5'-parts of their genomes.

PubMed

Zheng, Hongying; Chen, Jiong; Chen, Jianping; Adams, Michael J; Hou, Mingsheng

2002-06-01

Potyvirus isolates from asparagus bean (Vigna sesquipedalis) plants in Zhejiang province, China, caused either rugose and vein banding mosaic symptoms (isolate R) or severe yellowing (isolate Y) in this host, but were otherwise similar in host range. Both isolates were completely sequenced and shown to be isolates of Bean common mosaic virus (BCMV). The complete sequences were 9992 (R) or 10062 (Y) nucleotides long and shared 91.7% identical nucleotides (93.2% identical amino acids) in their genomes and were more distantly related to the BCMV-Peanut stripe virus sequence (PStV). The isolates were much less similar to one another in the 5'-UTR and the N-terminal region of the P1 protein. In the P1, isolate Y was closer to PStV (76.1% identical amino acids) than to isolate R (64.8%). Phylogenetic analyses of the coat protein region showed that the new isolates grouped with other isolates from Vigna spp., forming the blackeye cowpea mosaic strain subgroup of BCMV with 94-98% nucleotides (96-99% amino acids) identical to one another and about 90% identity to other BCMV isolates. Other significant subgroupings amongst published BCMV isolates were detected.
Molecular characterisation of Atlantic salmon paramyxovirus (ASPV): A novel paramyxovirus associated with proliferative gill inflammation

USGS Publications Warehouse

Falk, K.; Batts, W.N.; Kvellestad, A.; Kurath, G.; Wiik-Nielsen, J.; Winton, J.R.

2008-01-01

Atlantic salmon paramyxovirus (ASPV) was isolated in 1995 from gills of farmed Atlantic salmon suffering from proliferative gill inflammation. The complete genome sequence of ASPV was determined, revealing a genome 16,968 nucleotides in length consisting of six non-overlapping genes coding for the nucleo- (N), phospho- (P), matrix- (M), fusion- (F), haemagglutinin-neuraminidase- (HN) and large polymerase (L) proteins in the order 3′-N-P-M-F-HN-L-5′. The various conserved features related to virus replication found in most paramyxoviruses were also found in ASPV. These include: conserved and complementary leader and trailer sequences, tri-nucleotide intergenic regions and highly conserved transcription start and stop signal sequences. The P gene expression strategy of ASPV was like that of the respiro-, morbilli- and henipaviruses, which express the P and C proteins from the primary transcript and edit a portion of the mRNA to encode V and W proteins. Sequence similarities among various features related to virus replication, pairwise comparisons of all deduced ASPV protein sequences with homologous regions from other members of the family Paramyxoviridae, and phylogenetic analyses of these amino acid sequences suggested that ASPV was a novel member of the sub-family Paramyxovirinae, most closely related to the respiroviruses. © 2008 Elsevier B.V. All rights reserved.
Dali server update.

PubMed

Holm, Liisa; Laakso, Laura M

2016-07-08

The Dali server (http://ekhidna2.biocenter.helsinki.fi/dali) is a network service for comparing protein structures in 3D. In favourable cases, comparing 3D structures may reveal biologically interesting similarities that are not detectable by comparing sequences. The Dali server has been running in various places for over 20 years and is used routinely by crystallographers on newly solved structures. The latest update of the server provides enhanced analytics for the study of sequence and structure conservation. The server performs three types of structure comparisons: (i) Protein Data Bank (PDB) search compares one query structure against those in the PDB and returns a list of similar structures; (ii) pairwise comparison compares one query structure against a list of structures specified by the user; and (iii) all against all structure comparison returns a structural similarity matrix, a dendrogram and a multidimensional scaling projection of a set of structures specified by the user. Structural superimpositions are visualized using the Java-free WebGL viewer PV. The structural alignment view is enhanced by sequence similarity searches against Uniprot. The combined structure-sequence alignment information is compressed to a stack of aligned sequence logos. In the stack, each structure is structurally aligned to the query protein and represented by a sequence logo. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Length Variation, Heteroplasmy and Sequence Divergence in the Mitochondrial DNA of Four Species of Sturgeon (Acipenser)

PubMed Central

Brown, J. R.; Beckenbach, K.; Beckenbach, A. T.; Smith, M. J.

1996-01-01

The extent of mtDNA length variation and heteroplasmy as well as DNA sequences of the control region and two tRNA genes were determined for four North American sturgeon species: Acipenser transmontanus, A. medirostris, A. fulvescens and A. oxyrhynchus. Across the Continental Divide, a division in the occurrence of length variation and heteroplasmy was observed that was concordant with species biogeography as well as with phylogenies inferred from restriction fragment length polymorphisms (RFLP) of whole mtDNA and pairwise comparisons of unique sequences of the control region. In all species, mtDNA length variation was due to repeated arrays of 78-82-bp sequences each containing a D-loop strand synthesis termination associated sequence (TAS). Individual repeats showed greater sequence conservation within individuals and species than between species, which is suggestive of concerted evolution. Differences in the frequencies of multiple copy genomes and heteroplasmy among the four species may be ascribed to differences in the rates of recurrent mutation. A mechanism that may offset the high rate of mutation for increased copy number is suggested on the basis that an increase in the number of functional TAS motifs might reduce the frequency of successfully initiated H-strand replications. PMID:8852850
Development of Genetic Markers in Eucalyptus Species by Target Enrichment and Exome Sequencing

PubMed Central

Dasgupta, Modhumita Ghosh; Dharanishanthi, Veeramuthu; Agarwal, Ishangi; Krutovsky, Konstantin V.

2015-01-01

The advent of next-generation sequencing has facilitated large-scale discovery, validation and assessment of genetic markers for high density genotyping. The present study was undertaken to identify markers in genes supposedly related to wood property traits in three Eucalyptus species. Ninety-four genes involved in xylogenesis were selected for hybridization-probe-based nuclear genomic DNA target enrichment and exome sequencing. Genomic DNA was isolated from the leaf tissues and used for on-array probe hybridization followed by Illumina sequencing. The raw sequence reads were trimmed, high-quality reads were mapped to the E. grandis reference sequence, and single nucleotide variants (SNVs) and insertions/deletions (InDels) were identified across the three species. The average read coverage was 216X and a total of 2294 SNVs and 479 InDels were discovered in E. camaldulensis, 2383 SNVs and 518 InDels in E. tereticornis, and 1228 SNVs and 409 InDels in E. grandis. Additionally, SNV calling and InDel detection were conducted in pair-wise comparisons of E. tereticornis vs. E. grandis, E. camaldulensis vs. E. tereticornis and E. camaldulensis vs. E. grandis. This study presents an efficient, high-throughput method for developing genetic markers for family-based QTL and association analysis in Eucalyptus. PMID:25602379
Deep Sequencing Analysis of Apple Infecting Viruses in Korea

PubMed Central

Cho, In-Sook; Igori, Davaajargal; Lim, Seungmo; Choi, Gug-Seoun; Hammond, John; Lim, Hyoun-Sub; Moon, Jae Sun

2016-01-01

Deep sequencing generated 52 contigs derived from five viruses: Apple chlorotic leaf spot virus (ACLSV), Apple stem grooving virus (ASGV), Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV), which were identified in eight apple samples showing small leaves and/or growth retardation. Nucleotide (nt) sequence identity of the assembled contigs ranged from 68% to 99% compared to the reference sequences of the five respective viral genomes. Sequences of ASPV and ASGV were the most abundantly represented among the 52 assembled contigs. The presence of the five viruses in the samples was confirmed by RT-PCR using specific primers based on the sequences of each assembled contig. All five viruses were detected in three of the samples, whereas all samples had mixed infections with at least two viruses. The most frequently detected virus was ASPV, followed by ASGV, ApLV, ACLSV, and AGCaV, all of which were found in mixed infections in the tested samples. AGCaV was identified in assembled contigs ID 1012480 and 93549, which showed 82% and 78% nt sequence identity with ORF1 of AGCaV isolate Aurora-1. ApLV was identified in three assembled contigs, ID 65587, 1802365, and 116777, which showed 77%, 78%, and 76% nt sequence identity respectively with ORF1 of ApLV isolate LA2. The deep sequencing assay was shown to be a valuable and powerful tool for detection and identification of known and unknown viruses in infected apple trees, here identifying ApLV and AGCaV in commercial orchards in Korea for the first time. PMID:27721694
Subgrouping Automata: automatic sequence subgrouping using phylogenetic tree-based optimum subgrouping algorithm.

PubMed

Seo, Joo-Hyun; Park, Jihyang; Kim, Eun-Mi; Kim, Juhan; Joo, Keehyoung; Lee, Jooyoung; Kim, Byung-Gee

2014-02-01

Sequence subgrouping for a given sequence set can enable various informative tasks such as the functional discrimination of sequence subsets and the functional inference of unknown sequences. Because an identity threshold for sequence subgrouping may vary according to the given sequence set, it is highly desirable to construct a robust subgrouping algorithm which automatically identifies an optimal identity threshold and generates subgroups for a given sequence set. To this end, an automatic sequence subgrouping method named 'Subgrouping Automata' (SA) was constructed. First, a tree analysis module analyzes the structure of the tree and calculates all possible subgroups at each node. A sequence similarity analysis module calculates the average sequence similarity for all subgroups at each node. A representative sequence generation module finds a representative sequence for each subgroup using profile analysis and self-scoring. For all nodes, average sequence similarities are calculated and 'Subgrouping Automata' searches for the node showing the statistically maximum increase in sequence similarity using Student's t-value. The node showing the maximum t-value, which gives the most significant difference in average sequence similarity between two adjacent nodes, is taken as the optimum subgrouping node in the phylogenetic tree. Further analysis showed that the optimum subgrouping node from SA prevents under-subgrouping and over-subgrouping. Copyright © 2013. Published by Elsevier Ltd.
Large-Scale Concatenation cDNA Sequencing

PubMed Central

Yu, Wei; Andersson, Björn; Worley, Kim C.; Muzny, Donna M.; Ding, Yan; Liu, Wen; Ricafrente, Jennifer Y.; Wentland, Meredith A.; Lennon, Greg; Gibbs, Richard A.

1997-01-01

A total of 100 kb of DNA derived from 69 individual human brain cDNA clones of 0.7–2.0 kb were sequenced by concatenated cDNA sequencing (CCS), whereby multiple individual DNA fragments are sequenced simultaneously in a single shotgun library. The method yielded accurate sequences and a similar efficiency compared with other shotgun libraries constructed from single DNA fragments (>20 kb). Computer analyses were carried out on 65 cDNA clone sequences and their corresponding end sequences to examine both nucleic acid and amino acid sequence similarities in the databases. Thirty-seven clones revealed no DNA database matches, 12 clones generated exact matches (≥98% identity), and 16 clones generated nonexact matches (57%–97% identity) to either known human or other species genes. Of those 28 matched clones, 8 had corresponding end sequences that failed to identify similarities. In a protein similarity search, 27 clone sequences displayed significant matches, whereas only 20 of the end sequences had matches to known protein sequences. Our data indicate that full-length cDNA insert sequences provide significantly more nucleic acid and protein sequence similarity matches than expressed sequence tags (ESTs) for database searching. [All 65 cDNA clone sequences described in this paper have been submitted to the GenBank data library under accession nos. U79240–U79304.] PMID:9110174

High-throughput sequence alignment using Graphics Processing Units

PubMed Central

Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh

2007-01-01

Background: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from nVidia to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion: MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356
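The idea of matching many queries against one indexed reference can be sketched on the CPU. This is not MUMmerGPU's method: it swaps the suffix tree for a naively built suffix array, runs serially, and reports only whole-query exact matches rather than maximal exact matches. All function names are hypothetical.

```python
def build_suffix_array(ref):
    """Start positions of all suffixes of ref, in lexicographic order.

    Naive construction for illustration; real tools use much faster builders.
    """
    return sorted(range(len(ref)), key=lambda i: ref[i:])


def find_exact_matches(ref, suffix_array, query):
    """Return sorted start positions where query occurs exactly in ref."""
    k = len(query)

    def prefix(rank):
        start = suffix_array[rank]
        return ref[start:start + k]

    # Lower bound: first suffix whose k-prefix is >= query.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < query:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose k-prefix is > query.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= query:
            lo = mid + 1
        else:
            hi = mid
    return sorted(suffix_array[first:lo])


def align_queries(ref, queries):
    """Index the reference once, then look up every query against it."""
    suffix_array = build_suffix_array(ref)
    return {q: find_exact_matches(ref, suffix_array, q) for q in queries}
```

Because each lookup only reads the shared index, the queries are independent of one another, which is the property that makes this kind of workload suitable for parallel hardware.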
Sequence and RT-PCR expression analysis of two peroxidases from Arabidopsis thaliana belonging to a novel evolutionary branch of plant peroxidases.

PubMed

Kjaersgård, I V; Jespersen, H M; Rasmussen, S K; Welinder, K G

1997-03-01

cDNA clones encoding two new Arabidopsis thaliana peroxidases, ATP 1a and ATP 2a, have been identified by searching the Arabidopsis database of expressed sequence tags (dbEST). They represent a novel branch of hitherto uncharacterized plant peroxidases which is only 35% identical in amino acid sequence to the well characterized group of basic plant peroxidases represented by the horseradish (Armoracia rusticana) isoperoxidases HRP C, HRP E5 and the similar Arabidopsis isoperoxidases ATP Ca, ATP Cb, and ATP Ea. However, ATP 1a is 87% identical in amino acid sequence to a peroxidase encoded by an mRNA isolated from cotton (Gossypium hirsutum). As cotton and Arabidopsis belong to rather diverse families (Malvaceae and Cruciferae, respectively), in contrast with Arabidopsis and horseradish (both Cruciferae), the high degree of sequence identity indicates that this novel type of peroxidase, albeit of unknown function, is likely to be widespread in plant species. The atp 1 and atp 2 types of cDNA sequences were the most redundant among the 28 different isoperoxidases identified among about 200 peroxidase encoding ESTs. Interestingly, 8 of the 38 EST sequences coding for ATP 1 showed three identical nucleotide substitutions. This variant form is designated ATP 1b. Similarly, 6 of the 16 EST sequences coding for ATP 2 showed a number of deletions and nucleotide changes. This variant form is designated ATP 2b. The selected EST clones are full-length and contain coding regions of 993 nucleotides for atp 1a, and 984 nucleotides for atp 2a. These regions show 61% DNA sequence identity. The predicted mature proteins ATP 1a and ATP 2a are 57% identical in sequence and contain the structurally and functionally important residues characteristic of the plant peroxidase superfamily.
However, they do show two differences of importance to peroxidase catalysis: (1) the asparagine residue linked with the active site distal histidine via hydrogen bonding is absent; (2) an N-glycosylation site is located right at the entrance to the heme channel. The reverse transcriptase polymerase chain reaction (RT-PCR) was used to identify mRNAs coding for ATP 1a/b and ATP 2a/b in germinating seeds, seedlings, roots, leaves, stems, flowers and cell suspension culture using elongation factor 1alpha (EF-1alpha) for the first time as a positive control. Both mRNAs were transcribed at levels comparable to EF-1alpha in all plant tissues investigated which were more than two days old, and in cell suspension culture. In addition, the mRNA coding for ATP 1a/b was found in two day old germinating seeds. The abundant transcription of ATP 1a/b and ATP 2a/b is in line with their many entries in dbEST, and indicates essential roles for these novel peroxidases.
Complete genome sequence of the first human parechovirus type 3 isolated in Taiwan.

PubMed

Chang, Jenn-Tzong; Yang, Chih-Shiang; Chen, Bao-Chen; Chen, Yao-Shen; Chang, Tsung-Hsien

2017-11-01

The first human parechovirus 3 (HPeV3 VGHKS-2007) in Taiwan was identified from a clinical specimen from a male infant. The entire genome of the HPeV3 isolate was sequenced and compared to known HPeV3 sequences. Genome alignment data showed that HPeV3 VGHKS-2007 shares the highest nucleotide identity, 99%, with the Japanese strain of HPeV3 1361K-162589-Yamagata-2008. All HPeV3 isolates possess at least 97% amino acid identity. The analysis of the genome sequence of HPeV3 VGHKS-2007 will facilitate future investigations of the epidemiology and pathogenicity of HPeV3 infection. Copyright © 2017. Published by Elsevier Taiwan LLC.
Genetic variation in potential Giardia vaccine candidates cyst wall protein 2 and α1-giardin.

PubMed

Radunovic, Matej; Klotz, Christian; Saghaug, Christina Skår; Brattbakk, Hans-Richard; Aebischer, Toni; Langeland, Nina; Hanevik, Kurt

2017-08-01

Giardia is a prevalent intestinal parasitic infection. The trophozoite structural protein α1-giardin (α1-g) and the cyst protein cyst wall protein 2 (CWP2) have shown promise as Giardia vaccine antigen candidates in murine models. The present study assesses the genetic diversity of α1-g and CWP2 between and within assemblages A and B in human clinical isolates. α1-g and CWP2 sequences were acquired from 15 Norwegian isolates by PCR amplification and 20 sequences from German cultured isolates by whole genome sequencing. Sequences were aligned to reference genomes from assemblage A2 and B to identify genetic variance. Genetic diversity was found between assemblage A and B reference sequences for both α1-g (90.8% nucleotide identity) and CWP2 (82.5% nucleotide identity). However, for α1-g, this translated into only 3 amino acid (aa) substitutions, while for CWP2 there were 41 aa substitutions, and also one aa deletion. Genetic diversity within assemblage B (nucleotide identity 92.0% for α1-g and 94.3% for CWP2) was larger than within assemblage A (nucleotide identity 99.0% for α1-g and 99.7% for CWP2). For CWP2, the diversity on both nucleotide and protein level was higher in the C-terminal end. Predicted antigenic epitopes were not affected for α1-g, but partially for CWP2. Despite genetic diversity in α1-g, we found aa sequence, characteristics, and antigenicity to be well preserved. CWP2 showed more aa variance and potential antigenic differences. Several CWP2 antigens might be necessary in a future Giardia vaccine to provide cross protection against both Giardia assemblages infecting humans.
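The contrast in this abstract, lower nucleotide identity translating into few amino acid substitutions, arises because many codon changes are synonymous. The following minimal sketch shows both calculations. It assumes pre-aligned, in-frame coding sequences without gaps inside codons, the standard genetic code, and an identity definition that skips gapped columns (definitions vary between tools). The function names are hypothetical and the sequences in the usage note are invented, not Giardia data.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def percent_identity(a, b):
    """Percent identical positions between two aligned sequences.

    Columns containing a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def translate(cds):
    """Translate an in-frame coding sequence, ignoring a trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def count_aa_substitutions(cds_a, cds_b):
    """Count positions where the translated proteins differ."""
    return sum(x != y for x, y in zip(translate(cds_a), translate(cds_b)))
```

For the invented pair `ATGGCTCTG` and `ATGGCCTTG`, two of nine nucleotides differ, yet both translate to `MAL`, so `count_aa_substitutions` reports no substitutions.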
Complete nucleotide sequence and genome structure of a Japanese isolate of hibiscus latent Fort Pierce virus, a unique tobamovirus that contains an internal poly(A) region in its 3' end.

PubMed

Yoshida, Tetsuya; Kitazawa, Yugo; Komatsu, Ken; Neriya, Yutaro; Ishikawa, Kazuya; Fujita, Naoko; Hashimoto, Masayoshi; Maejima, Kensaku; Yamaji, Yasuyuki; Namba, Shigetou

2014-11-01

In this study, we detected a Japanese isolate of hibiscus latent Fort Pierce virus (HLFPV-J), a member of the genus Tobamovirus, in a hibiscus plant in Japan and determined the complete sequence and organization of its genome. HLFPV-J has four open reading frames (ORFs), each of which shares more than 98 % nucleotide sequence identity with those of other HLFPV isolates. Moreover, HLFPV-J contains a unique internal poly(A) region of variable length, ranging from 44 to 78 nucleotides, in its 3'-untranslated region (UTR), as is the case with hibiscus latent Singapore virus (HLSV), another hibiscus-infecting tobamovirus. The length of the HLFPV-J genome was 6431 nucleotides, including the shortest internal poly(A) region. The sequence identities of ORFs 1, 2, 3 and 4 of HLFPV-J to other tobamoviruses were 46.6-68.7, 49.9-70.8, 31.0-70.8 and 39.4-70.1 %, respectively, at the nucleotide level and 39.8-75.0, 43.6-77.8, 19.2-70.4 and 31.2-74.2 %, respectively, at the amino acid level. The 5'- and 3'-UTRs of HLFPV-J showed 24.3-58.6 and 13.0-79.8 % identity, respectively, to other tobamoviruses. In particular, when compared to other tobamoviruses, each ORF and UTR of HLFPV-J showed the highest sequence identity to those of HLSV. Phylogenetic analysis showed that HLFPV-J, other HLFPV isolates and HLSV constitute a malvaceous-plant-infecting tobamovirus cluster. These results indicate that the genomic structure of HLFPV-J has unique features similar to those of HLSV. To our knowledge, this is the first report of the complete genome sequence of HLFPV.
Registration of 4D time-series of cardiac images with multichannel Diffeomorphic Demons.

PubMed

Peyrat, Jean-Marc; Delingette, Hervé; Sermesant, Maxime; Pennec, Xavier; Xu, Chenyang; Ayache, Nicholas

2008-01-01

In this paper, we propose a generic framework for intersubject non-linear registration of 4D time-series images. In this framework, spatio-temporal registration is defined by mapping trajectories of physical points as opposed to spatial registration that solely aims at mapping homologous points. First, we determine the trajectories we want to register in each sequence using a motion tracking algorithm based on the Diffeomorphic Demons algorithm. Then, we perform simultaneously pairwise registrations of corresponding time-points with the constraint to map the same physical points over time. We show this trajectory registration can be formulated as a multichannel registration of 3D images. We solve it using the Diffeomorphic Demons algorithm extended to vector-valued 3D images. This framework is applied to the inter-subject non-linear registration of 4D cardiac CT sequences.
Molecular characterization of two prunus necrotic ringspot virus isolates from Canada.

PubMed

Cui, Hongguang; Hong, Ni; Wang, Guoping; Wang, Aiming

2012-05-01

We determined the entire RNA1, 2 and 3 sequences of two prunus necrotic ringspot virus (PNRSV) isolates, Chr3 from cherry and Pch12 from peach, obtained from an orchard in the Niagara Fruit Belt, Canada. The RNA1, 2 and 3 of the two isolates share nucleotide sequence identities of 98.6%, 98.4% and 94.5%, respectively. Their RNA1- and 2-encoded amino acid sequences are about 98% identical to the corresponding sequences of a cherry isolate, CH57, the only other PNRSV isolate with complete RNA1 and 2 sequences available. Phylogenetic analysis of the coat protein and movement protein encoded by RNA3 of Pch12 and Chr3 and published PNRSV isolates indicated that Chr3 belongs to the PV96 group and Pch12 belongs to the PV32 group.
FragIdent--automatic identification and characterisation of cDNA-fragments.

PubMed

Seelow, Dominik; Goehler, Heike; Hoffmann, Katrin

2009-03-02

Many genetic studies and functional assays are based on cDNA fragments. After the generation of cDNA fragments from an mRNA sample, their content is at first unknown and must be assigned by sequencing reactions or hybridisation experiments. Even in characterised libraries, a considerable number of clones are wrongly annotated. Furthermore, mix-ups can happen in the laboratory. It is therefore essential to the relevance of experimental results to confirm or determine the identity of the employed cDNA fragments. However, the manual approach for the characterisation of these fragments using BLAST web interfaces is not suited for larger number of sequences and so far, no user-friendly software is publicly available. Here we present the development of FragIdent, an application for the automatic identification of open reading frames (ORFs) within cDNA-fragments. The software performs BLAST analyses to identify the genes represented by the sequences and suggests primers to complete the sequencing of the whole insert. Gene-specific information as well as the protein domains encoded by the cDNA fragment are retrieved from Internet-based databases and included in the output. The application features an intuitive graphical interface and is designed for researchers without any bioinformatics skills. It is suited for projects comprising up to several hundred different clones. We used FragIdent to identify 84 cDNA clones from a yeast two-hybrid experiment. Furthermore, we identified 131 protein domains within our analysed clones. The source code is freely available from our homepage at http://compbio.charite.de/genetik/FragIdent/.
Complete sequence and diversity of a maize-associated Polerovirus in East Africa.

PubMed

Massawe, Deogracious P; Stewart, Lucy R; Kamatenesi, Jovia; Asiimwe, Theodore; Redinbaugh, Margaret G

2018-06-01

Since 2011-2012, Maize lethal necrosis (MLN) has emerged in East Africa, causing massive yield loss and propelling research to identify viruses and virus populations present in maize. As expected, next generation sequencing (NGS) has revealed diverse and abundant viruses from the family Potyviridae, primarily sugarcane mosaic virus (SCMV), and maize chlorotic mottle virus (MCMV) (Tombusviridae), which are known to cause MLN by synergistic co-infection. In addition to these expected viruses, we identified a virus in the genus Polerovirus (family Luteoviridae) in 104/172 samples selected for MLN or other potential virus symptoms from Kenya, Uganda, Rwanda, and Tanzania. This polerovirus (MF974579) nucleotide sequence is 97% identical to maize-associated viruses recently reported in China, termed 'maize yellow mosaic virus' (MaYMV) and maize yellow dwarf virus (MaYMV; KU291101, KU291107, MYDV-RMV2; KT992824); and 99% identical to MaYMV (KY684356) infecting sugarcane and itch grass in Nigeria; 83% identical to a barley-associated polerovirus recently identified in Korea (BVG; KT962089); and 79% identical to the U.S. maize-infecting polerovirus maize yellow dwarf virus (MYDV-RMV; KT992824). Nucleotide sequences from ORF0 of 20 individual East African isolates collected from Kenya, Uganda, Rwanda, and Tanzania shared 98% or higher identity, and were detected in 104/172 (60.5%) of samples collected for virus-like symptoms, indicating extensive prevalence but limited diversity of this virus in East Africa. We refer to this virus as "MYDV-like polerovirus" until symptoms of the virus in maize are known.
Identification of a novel vitivirus from grapevines in New Zealand.

PubMed

Blouin, Arnaud G; Keenan, Sandi; Napier, Kathryn R; Barrero, Roberto A; MacDiarmid, Robin M

2018-01-01

We report a sequence of a novel vitivirus from Vitis vinifera obtained using two high-throughput sequencing (HTS) strategies on RNA. The initial discovery from small-RNA sequencing was confirmed by HTS of the total RNA and Sanger sequencing. The new virus has a genome structure similar to the one reported for other vitiviruses, with five open reading frames (ORFs) coding for the conserved domains described for members of that genus. Phylogenetic analysis of the complete genome sequence confirmed its affiliation to the genus Vitivirus, with the closest described viruses being grapevine virus E (GVE) and Agave tequilana leaf virus (ATLV). However, the virus we report is distinct and shares only 51% amino acid sequence identity with GVE in the replicase polyprotein and 66.8% amino acid sequence identity with ATLV in the coat protein. This is well below the threshold determined by the ICTV for species demarcation, and we propose that this virus represents a new species. It is provisionally named "grapevine virus G".
Sequences Associated with Centromere Competency in the Human Genome

PubMed Central

Hayden, Karen E.; Strome, Erin D.; Merrett, Stephanie L.; Lee, Hye-Ran; Rudd, M. Katharine

2013-01-01

Centromeres, the sites of spindle attachment during mitosis and meiosis, are located in specific positions in the human genome, normally coincident with diverse subsets of alpha satellite DNA. While there is strong evidence supporting the association of some subfamilies of alpha satellite with centromere function, the basis for establishing whether a given alpha satellite sequence is or is not designated a functional centromere is unknown, and attempts to understand the role of particular sequence features in establishing centromere identity have been limited by the near identity and repetitive nature of satellite sequences. Utilizing a broadly applicable experimental approach to test sequence competency for centromere specification, we have carried out a genomic and epigenetic functional analysis of endogenous human centromere sequences available in the current human genome assembly. The data support a model in which functionally competent sequences confer an opportunity for centromere specification, integrating genomic and epigenetic signals and promoting the concept of context-dependent centromere inheritance. PMID:23230266
Complete genome sequence of Southern tomato virus naturally infecting tomatoes in Bangladesh using small RNA deep sequencing

USDA-ARS's Scientific Manuscript database

The complete genome sequence of a Southern tomato virus (STV) isolate on tomato plants in a seed production field in Bangladesh was obtained for the first time using next generation sequencing. The identified isolate STV_BD-13 shares a high degree of sequence identity (99%) with several known STV isol...
Complete genome sequence of southern tomato virus identified from China using next generation sequencing

USDA-ARS's Scientific Manuscript database

Complete genome sequence of a double-stranded RNA (dsRNA) virus, southern tomato virus (STV), on tomatoes in China, was elucidated using small RNAs deep sequencing. The identified STV_CN12 shares 99% sequence identity to other isolates from Mexico, France, Spain, and U.S. This is the first report ...
Candida mesorugosa sp. nov., a novel yeast species similar to Candida rugosa, isolated from a tertiary hospital in Brazil.

PubMed

Chaves, Guilherme M; Terçarioli, Gisela R; Padovan, Ana Carolina B; Rosas, Robert C; Ferreira, Renata C; Melo, Analy S A; Colombo, Arnaldo L

2013-04-01

Candida rugosa is a yeast species that is emerging as a causative agent of invasive infection, particularly in Latin America. Recently, C. pseudorugosa was proposed as a new species closely related to C. rugosa. We evaluated in this investigation the genetic heterogeneity within the C. rugosa species complex. All clinical isolates used in this study were identified phenotypically as C. rugosa but were genotypically different from the C. rugosa type, ATCC 10571. RAPD marker analysis revealed less than 83% similarity between our clinical isolates and the C. rugosa type strain. The D1/D2 region sequences of our clinical isolates showed 98% identity with C. rugosa but only 94-95% identity with C. pseudorugosa. The ITS rDNA sequences of the Brazilian isolates showed 91% identity with the C. rugosa ATCC 10571 ITS sequence. Network and Bayesian analyses of ITS and housekeeping gene sequences separated our clinical isolates into different branches from C. rugosa type strain. These differences are sufficient to reassign our isolates to a distinct species, named C. mesorugosa.
How Much Do rRNA Gene Surveys Underestimate Extant Bacterial Diversity?

PubMed

Rodriguez-R, Luis M; Castro, Juan C; Kyrpides, Nikos C; Cole, James R; Tiedje, James M; Konstantinidis, Konstantinos T

2018-03-15

The most common practice in studying and cataloguing prokaryotic diversity involves the grouping of sequences into operational taxonomic units (OTUs) at the 97% 16S rRNA gene sequence identity level, often using partial gene sequences, such as PCR-generated amplicons. Due to the high sequence conservation of rRNA genes, organisms belonging to closely related yet distinct species may be grouped under the same OTU. However, it remains unclear how much diversity has been underestimated by this practice. To address this question, we compared the OTUs of genomes defined at the 97% or 98.5% 16S rRNA gene identity level against OTUs of the same genomes defined at the 95% whole-genome average nucleotide identity (ANI), which is a much more accurate proxy for species. Our results show that OTUs resulting from a 98.5% 16S rRNA gene identity cutoff are more accurate than 97% compared to 95% ANI (90.5% versus 89.9% accuracy) but indistinguishable from any other threshold in the 98.29 to 98.78% range. Even with the more stringent thresholds, however, the 16S rRNA gene-based approach commonly underestimates the number of OTUs by ∼12%, on average, compared to the ANI-based approach (∼14% underestimation when using the 97% identity threshold). More importantly, the degree of underestimation can become 50% or more for certain taxa, such as the genera Pseudomonas, Burkholderia, Escherichia, Campylobacter, and Citrobacter. These results provide a quantitative view of the degree of underestimation of extant prokaryotic diversity by 16S rRNA gene-defined OTUs and suggest that genomic resolution is often necessary. IMPORTANCE Species diversity is one of the most fundamental pieces of information for community ecology and conservational biology. Therefore, employing accurate proxies for what a species or the unit of diversity is are cornerstones for a large set of microbial ecology and diversity studies. The most common proxies currently used rely on the clustering of 16S rRNA gene sequences at some threshold of nucleotide identity, typically 97% or 98.5%. Here, we explore how well this strategy reflects the more accurate whole-genome-based proxies and determine the frequency with which the high conservation of 16S rRNA sequences masks substantial species-level diversity. Copyright © 2018 American Society for Microbiology.
Thermophilic cellobiohydrolase

DOEpatents

Sapra, Rajat; Park, Joshua I.; Datta, Supratim; Simmons, Blake A.

2017-04-18

The present invention provides for a composition comprising a polypeptide comprising a first amino acid sequence having at least 70% identity with the amino acid sequence of Csac GH5 wherein said first amino acid sequence has a thermostable or thermophilic cellobiohydrolase (CBH) or exoglucanase activity.
A new graph-based method for pairwise global network alignment

PubMed Central

Klau, Gunnar W

2009-01-01

Background In addition to component-based comparative approaches, network alignments provide the means to study conserved network topology such as common pathways and more complex network motifs. Yet, unlike in classical sequence alignment, the comparison of networks becomes computationally more challenging, as most meaningful assumptions instantly lead to NP-hard problems. Most previous algorithmic work on network alignments is heuristic in nature. Results We introduce the graph-based maximum structural matching formulation for pairwise global network alignment. We relate the formulation to previous work and prove NP-hardness of the problem. Based on the new formulation we build upon recent results in computational structural biology and present a novel Lagrangian relaxation approach that, in combination with a branch-and-bound method, computes provably optimal network alignments. The Lagrangian algorithm alone is a powerful heuristic method, which produces solutions that are often near-optimal and – unlike those computed by pure heuristics – come with a quality guarantee. Conclusion Computational experiments on the alignment of protein-protein interaction networks and on the classification of metabolic subnetworks demonstrate that the new method is reasonably fast and has advantages over pure heuristics. Our software tool is freely available as part of the LISA library. PMID:19208162
An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species

PubMed Central

Galpert, Deborah; del Río, Sara; Herrera, Francisco; Ancede-Gallardo, Evys; Antunes, Agostinho; Agüero-Chapin, Guillermin

2015-01-01

Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) are combined in a supervised pairwise ortholog detection approach to improve effectiveness considering low ortholog ratios in relation to the possible pairwise comparison between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with RBH, RSD, and OMA algorithms by using the following yeast genome pairs: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe as benchmark datasets. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification. PMID:26605337
New Measurement for Correlation of Co-evolution Relationship of Subsequences in Protein.

PubMed

Gao, Hongyun; Yu, Xiaoqing; Dou, Yongchao; Wang, Jun

2015-12-01

Many computational tools have been developed to measure the co-evolution of protein residues. Most of them focus only on co-evolution between pairs of residues in a protein sequence. However, multiple residues may participate in co-evolution, and some co-evolved residues are clustered in several distinct regions of the primary structure. Therefore, the co-evolution among adjacent residues and the correlation between the distinct regions offer insights into the function and evolution of the protein and its residues. A subsequence is used to represent the multiple adjacent residues in one distinct region. In the paper, the co-evolution relationship in each subsequence is represented by a mutual information matrix (MIM). Then, an R value based on Pearson's correlation coefficient is developed to measure the similarity of two MIMs. MSAs from the Catalytic Data Base (Catalytic Site Atlas, CSA) are used for testing. The R value characterizes a specific class of residues. In contrast to individual pairwise co-evolved residues, adjacent residues without high individual MI values are found, since the co-evolved relationship among them is similar to that among another set of adjacent residues. These subsequences possess some flexibility in the composition of side chains, such as the catalyzed environment.
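The abstract above does not give the exact construction of the MIM or of the R value, so the following Python sketch is only one plausible reading. It assumes that mutual information is computed between alignment columns from observed residue frequencies (natural logarithm, no gap handling or corrections), that a subsequence's MIM is summarized by its upper-triangle entries, and that R is the plain Pearson correlation between two such vectors of equal length. The function names `mutual_information`, `mi_vector`, and `pearson_r` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_i, col_j):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_i)
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    return sum(
        (c / n) * math.log((c / n) / ((count_i[a] / n) * (count_j[b] / n)))
        for (a, b), c in count_ij.items()
    )


def mi_vector(msa, start, length):
    """Upper-triangle MI values for columns start .. start+length-1.

    Assumption: this flattened vector stands in for the subsequence's MIM.
    """
    cols = [[row[k] for row in msa] for k in range(start, start + length)]
    return [
        mutual_information(cols[i], cols[j])
        for i, j in combinations(range(length), 2)
    ]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sd_x * sd_y)
```

Under these assumptions, two subsequences of the same length would be compared with `pearson_r(mi_vector(msa, s1, k), mi_vector(msa, s2, k))`; a value near 1 would indicate that the two regions show a similar internal co-evolution pattern.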
An Effective Big Data Supervised Imbalanced Classification Approach for Ortholog Detection in Related Yeast Species.

PubMed

Galpert, Deborah; Del Río, Sara; Herrera, Francisco; Ancede-Gallardo, Evys; Antunes, Agostinho; Agüero-Chapin, Guillermin

2015-01-01

Orthology detection requires more effective scaling algorithms. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership to conserved regions, and physicochemical profiles) are combined in a supervised pairwise ortholog detection approach to improve effectiveness considering low ortholog ratios in relation to the possible pairwise comparison between two genomes. In this scenario, big data supervised classifiers managing imbalance between ortholog and nonortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with RBH, RSD, and OMA algorithms by using the following yeast genome pairs: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe as benchmark datasets. Because of the large amount of imbalanced data, the building and testing of the supervised model were only possible by using big data supervised classifiers managing imbalance. Evaluation metrics taking low ortholog ratios into account were applied. From the effectiveness perspective, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because of the consideration of gene pair features beyond alignment similarities combined with the advances in big data supervised classification.

Building-up of a DNA barcode library for true bugs (insecta: hemiptera: heteroptera) of Germany reveals taxonomic uncertainties and surprises.

PubMed

Raupach, Michael J; Hendrich, Lars; Küchler, Stefan M; Deister, Fabian; Morinière, Jérome; Gossner, Martin M

2014-01-01

During the last few years, DNA barcoding has become an efficient method for the identification of species. In the case of insects, most published DNA barcoding studies focus on species of the Ephemeroptera, Trichoptera, Hymenoptera and especially Lepidoptera. In this study we test the efficiency of DNA barcoding for true bugs (Hemiptera: Heteroptera), an ecologically and economically highly important as well as morphologically diverse insect taxon. As part of our study we analyzed DNA barcodes for 1742 specimens of 457 species, comprising 39 families of the Heteroptera. We found low nucleotide distances with a minimum pairwise K2P distance <2.2% within 21 species pairs (39 species). For ten of these species pairs (18 species), minimum pairwise distances were zero. In contrast to this, deep intraspecific sequence divergences with maximum pairwise distances >2.2% were detected for 16 traditionally recognized and valid species. With a successful identification rate of 91.5% (418 species) our study emphasizes the use of DNA barcodes for the identification of true bugs and represents an important step in building-up a comprehensive barcode library for true bugs in Germany and Central Europe as well. Our study also highlights the urgent necessity of taxonomic revisions for various taxa of the Heteroptera, with a special focus on various species of the Miridae. In this context we found evidence for on-going hybridization events within various taxonomically challenging genera (e.g. Nabis Latreille, 1802 (Nabidae), Lygus Hahn, 1833 (Miridae), Phytocoris Fallén, 1814 (Miridae)) as well as the putative existence of cryptic species (e.g. Aneurus avenius (Duffour, 1833) (Aradidae) or Orius niger (Wolff, 1811) (Anthocoridae)).
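For readers unfamiliar with the K2P distances quoted above, the following Python sketch shows the standard Kimura 2-parameter formula, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions among compared sites. It is not the authors' pipeline: the function name `k2p_distance` is hypothetical, the input is assumed to be two already-aligned sequences, and sites with gaps or ambiguous bases are simply skipped.

```python
import math

# Purine<->purine and pyrimidine<->pyrimidine substitutions are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError for saturated pairs where the term is <= 0.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

A pair of barcodes could then be checked against the 2.2% value mentioned in the abstract with `k2p_distance(a, b) < 0.022`.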
Leveraging CyVerse Resources for De Novo Comparative Transcriptomics of Underserved (Non-model) Organisms

PubMed Central

Joyce, Blake L.; Haug-Baltzell, Asher K.; Hulvey, Jonathan P.; McCarthy, Fiona; Devisetty, Upendra Kumar; Lyons, Eric

2017-01-01

This workflow allows novice researchers to leverage advanced computational resources such as cloud computing to carry out pairwise comparative transcriptomics. It also serves as a primer for biologists to develop data scientist computational skills, e.g. executing bash commands, visualization and management of large data sets. All command line code and further explanations of each command or step can be found on the wiki (https://wiki.cyverse.org/wiki/x/dgGtAQ). The Discovery Environment and Atmosphere platforms are connected together through the CyVerse Data Store. As such, once the initial raw sequencing data has been uploaded there is no more need to transfer large data files over an Internet connection, minimizing the amount of time needed to conduct analyses. This protocol is designed to analyze only two experimental treatments or conditions. Differential gene expression analysis is conducted through pairwise comparisons, and will not be suitable to test multiple factors. This workflow is also designed to be manual rather than automated. Each step must be executed and investigated by the user, yielding a better understanding of data and analytical outputs, and therefore better results for the user. Once complete, this protocol will yield de novo assembled transcriptome(s) for underserved (non-model) organisms without the need to map to previously assembled reference genomes (which are usually not available in underserved organism). These de novo transcriptomes are further used in pairwise differential gene expression analysis to investigate genes differing between two experimental conditions. Differentially expressed genes are then functionally annotated to understand the genetic response organisms have to experimental conditions. In total, the data derived from this protocol is used to test hypotheses about biological responses of underserved organisms. PMID:28518075
Building-Up of a DNA Barcode Library for True Bugs (Insecta: Hemiptera: Heteroptera) of Germany Reveals Taxonomic Uncertainties and Surprises

PubMed Central

Raupach, Michael J.; Hendrich, Lars; Küchler, Stefan M.; Deister, Fabian; Morinière, Jérome; Gossner, Martin M.

2014-01-01

During the last few years, DNA barcoding has become an efficient method for the identification of species. In the case of insects, most published DNA barcoding studies focus on species of the Ephemeroptera, Trichoptera, Hymenoptera and especially Lepidoptera. In this study we test the efficiency of DNA barcoding for true bugs (Hemiptera: Heteroptera), an ecologically and economically highly important as well as morphologically diverse insect taxon. As part of our study we analyzed DNA barcodes for 1742 specimens of 457 species, comprising 39 families of the Heteroptera. We found low nucleotide distances with a minimum pairwise K2P distance <2.2% within 21 species pairs (39 species). For ten of these species pairs (18 species), minimum pairwise distances were zero. In contrast to this, deep intraspecific sequence divergences with maximum pairwise distances >2.2% were detected for 16 traditionally recognized and valid species. With a successful identification rate of 91.5% (418 species) our study emphasizes the use of DNA barcodes for the identification of true bugs and represents an important step in building-up a comprehensive barcode library for true bugs in Germany and Central Europe as well. Our study also highlights the urgent necessity of taxonomic revisions for various taxa of the Heteroptera, with a special focus on various species of the Miridae. In this context we found evidence for on-going hybridization events within various taxonomically challenging genera (e.g. Nabis Latreille, 1802 (Nabidae), Lygus Hahn, 1833 (Miridae), Phytocoris Fallén, 1814 (Miridae)) as well as the putative existence of cryptic species (e.g. Aneurus avenius (Duffour, 1833) (Aradidae) or Orius niger (Wolff, 1811) (Anthocoridae)). PMID:25203616
Sinorhizobium meliloti strains TII7 and A5 by Multilocus Sequence Typing (MLST) have chromosomes identical with Rm1021 and form an effective and ineffective symbiosis with Medicago truncatula line Jemalong A17, respectively

USDA-ARS's Scientific Manuscript database

The strains TII7 and A5 formed an effective and ineffective symbiosis with Medicago truncatula Jemalong A17, respectively. Both were shown to have identical chromosomes with strains Rm1021 and RCR2011 using a Multilocus Sequence Typing method. The 2260 bp segments of DNA stretching from the 3’ end ...
Distant sequences determine 5′ end formation of cox3 transcripts in Arabidopsis thaliana ecotype C24

PubMed Central

Forner, Joachim; Weber, Bärbel; Wiethölter, Caterina; Meyer, Rhonda C.; Binder, Stefan

2005-01-01

The genomic environments and the transcripts of the mitochondrial cox3 gene are investigated in three Arabidopsis thaliana ecotypes. While the proximate 5′ sequences up to nucleotide position −584, the coding regions and the 3′ flanking regions are identical in Columbia (Col), C24 and Landsberg erecta (Ler), genomic variation is detected in regions further upstream. In the mitochondrial DNA of Col, a 1790 bp fragment flanked by a nonanucleotide direct repeat is present beyond position −584 with respect to the ATG. While in Ler only part of this insertion is conserved, this sequence is completely absent in C24, except for a single copy of the nonanucleotide direct repeat. Northern hybridization reveals identical major transcripts in the three ecotypes, but identifies an additional abundant 60 nt larger mRNA species in C24. The extremities of the most abundant mRNA species are identical in the three ecotypes. In C24, an extra major 5′ end is abundant. This terminus and the other major 5′ ends are located in identical sequence regions. Inspection of Atcox3 transcripts in C24/Col hybrids revealed a female inheritance of the mRNA species with the extra 5′ terminus. Thus, a mitochondrially encoded factor determines the generation of an extra 5′ mRNA end. PMID:16107557
Genome sequence of a distinct watermelon mosaic virus identified from ginseng (Panax ginseng) transcriptome.

PubMed

Park, D; Kim, H; Hahn, Y

Watermelon mosaic virus (WMV) is a member of the genus Potyvirus, which is the largest genus of plant viruses. WMV is a significant pathogen of crop plants, including Cucurbitaceae species. A WMV strain, designated as WMV-Pg, was identified in transcriptome data collected from ginseng (Panax ginseng) root. WMV-Pg showed 84% nucleotide sequence identity and 91% amino acid sequence identity with its closest related virus, WMV-Fr. A phylogenetic analysis of WMV-Pg with other WMVs and soybean mosaic viruses (SMVs) indicated that WMV-Pg is a distinct subtype of the WMV/SMV group of the genus Potyvirus in the family Potyviridae.
A gyrovirus infecting a sea bird

PubMed Central

Li, Linlin; Pesavento, Patricia A.; Gaynor, Anne M.; Duerr, Rebecca S.; Phan, Tung Gia; Zhang, Wen; Deng, Xutao

2015-01-01

We characterized the genome of a highly divergent gyrovirus (GyV8) in the spleen and uropygial gland tissues of a diseased northern fulmar (Fulmarus glacialis), a pelagic bird beached in San Francisco, California. No other exogenous viral sequences could be identified using viral metagenomics. The small circular DNA genome shared no significant nucleotide sequence identity, and only 38–42 % amino acid sequence identity in VP1, with any of the previously identified gyroviruses. GyV8 is the first member of the third major phylogenetic clade of this viral genus and the first gyrovirus detected in an avian species other than chicken. PMID:26036564
Discovery of a novel retrovirus sequence in an Australian native rodent (Melomys burtoni): a putative link between gibbon ape leukemia virus and koala retrovirus.

PubMed

Simmons, Greg; Clarke, Daniel; McKee, Jeff; Young, Paul; Meers, Joanne

2014-01-01

Gibbon ape leukaemia virus (GALV) and koala retrovirus (KoRV) share a remarkably close sequence identity despite the fact that they occur in distantly related mammals on different continents. It has previously been suggested that infection of their respective hosts may have occurred as a result of a species jump from another, as yet unidentified vertebrate host. To investigate possible sources of these retroviruses in the Australian context, DNA samples were obtained from 42 vertebrate species and screened using PCR in order to detect proviral sequences closely related to KoRV and GALV. Four proviral partial sequences totalling 2880 bases which share a strong similarity with KoRV and GALV were detected in DNA from a native Australian rodent, the grassland melomys, Melomys burtoni. We have designated this novel gammaretrovirus Melomys burtoni retrovirus (MbRV). The concatenated nucleotide sequence of MbRV shares 93% identity with the corresponding sequence from GALV-SEATO and 83% identity with KoRV. The geographic ranges of the grassland melomys and of the koala partially overlap. Thus a species jump by MbRV from melomys to koalas is conceivable. However the genus Melomys does not occur in mainland South East Asia and so it appears most likely that another as yet unidentified host was the source of GALV.
An approach for identification of unknown viruses using sequencing-by-hybridization.

PubMed

Katoski, Sarah E; Meyer, Hermann; Ibrahim, Sofi

2015-09-01

Accurate identification of biological threat agents, especially RNA viruses, in clinical or environmental samples can be challenging because the concentration of viral genomic material in a given sample is usually low, viral genomic RNA is liable to degradation, and RNA viruses are extremely diverse. A two-tiered approach was used for initial identification, then full genomic characterization of 199 RNA viruses belonging to virus families Arenaviridae, Bunyaviridae, Filoviridae, Flaviviridae, and Togaviridae. A sequencing-by-hybridization (SBH) microarray was used to tentatively identify a viral pathogen; the identity was then confirmed by guided next-generation sequencing (NGS). After optimization and evaluation of the SBH and NGS methodologies with various virus species and strains, the approach was used to test the ability to identify viruses in blinded samples. The SBH correctly identified two Ebola viruses in the blinded samples within 24 hr, and by using guided amplicon sequencing with 454 GS FLX, the identities of the viruses in both samples were confirmed. SBH provides relatively low-cost screening of biological samples against a panel of viral pathogens that can be custom-designed on a microarray. Once the identity of the virus is deduced from the highest hybridization signal on the SBH microarray, guided (amplicon) NGS sequencing can be used not only to confirm the identity of the virus but also to provide further information about the strain or isolate, including a potential genetic manipulation. This approach can be useful in situations where natural or deliberate biological threat incidents might occur and a rapid response is required. © 2015 Wiley Periodicals, Inc.
Pairwise comparisons of ten porcine tissues identify differential transcriptional regulation at the gene, isoform, promoter and transcription start site level

DOE Office of Scientific and Technical Information (OSTI.GOV)

Farajzadeh, Leila; Hornshøj, Henrik; Momeni, Jamal

Highlights: • Transcriptome sequencing yielded 223 million porcine RNA-seq reads, and 59,000 transcribed locations. • Establishment of unique transcription profiles for ten porcine tissues including four brain tissues. • Comparison of transcription profiles at gene, isoform, promoter and transcription start site level. • Highlights a high level of regulation of neuro-related genes at both gene, isoform, and TSS level. • Our results emphasize the pig as a valuable animal model with respect to human biological issues. -- Abstract: The transcriptome is the absolute set of transcripts in a tissue or cell at the time of sampling. In this study RNA-Seq is employed to enable the differential analysis of the transcriptome profile for ten porcine tissues in order to evaluate differences between the tissues at the gene and isoform expression level, together with an analysis of variation in transcription start sites, promoter usage, and splicing. In total, 223 million RNA fragments were sequenced, leading to the identification of 59,930 transcribed gene locations and 290,936 transcript variants using Cufflinks with similarity to approximately 13,899 annotated human genes. Pairwise analysis of tissues for differential expression at the gene level showed that the smallest differences were between tissues originating from the porcine brain. Interestingly, the relative level of differential expression at the isoform level did generally not vary between tissue contrasts. Furthermore, analysis of differential promoter usage between tissues revealed a proportionally higher variation between cerebellum (CBE) versus frontal cortex and cerebellum versus hypothalamus (HYP) than in the remaining comparisons. In addition, the comparison of differential transcription start sites showed that the number of these sites is generally increased in comparisons including hypothalamus in contrast to other pairwise assessments. A comprehensive analysis of one of the tissue contrasts, i.e. cerebellum versus heart, for differential variation at the gene, isoform, transcription start site (TSS), and promoter level showed that several of the genes differed at all four levels. Interestingly, these genes were mainly annotated to the “electron transport chain” and neuronal differentiation, emphasizing that “tissue important” genes are regulated at several levels. Furthermore, our analysis shows that the “across tissue approach” has a promising potential when screening for possible explanations for variations, such as those observed at the gene expression levels.
Complete nucleotide sequence of Alfalfa mosaic virus isolated from alfalfa (Medicago sativa L.) in Argentina.

PubMed

Trucco, Verónica; de Breuil, Soledad; Bejerman, Nicolás; Lenardon, Sergio; Giolitti, Fabián

2014-06-01

The complete nucleotide sequence of an Alfalfa mosaic virus (AMV) isolate infecting alfalfa (Medicago sativa L.) in Argentina, AMV-Arg, was determined. The virus genome has the typical organization described for AMV, and comprises 3,643, 2,593, and 2,038 nucleotides for RNA1, 2 and 3, respectively. The whole genome sequence and each encoding region were compared with those of four other isolates that have been completely sequenced from China, Italy, Spain and USA. The nucleotide identity percentages ranged from 95.9 to 99.1 % for the three RNAs and from 93.7 to 99 % for the protein 1 (P1), protein 2 (P2), movement protein and coat protein (CP) encoding regions, whereas the amino acid identity percentages of these proteins ranged from 93.4 to 99.5 %, the lowest value corresponding to P2. CP sequences of AMV-Arg were compared with those of 25 other available isolates, and the phylogenetic analysis based on the CP gene was carried out. The highest percentage of nucleotide sequence identity of the CP gene was 98.3 % with a Chinese isolate and 98.6 % at the amino acid level with four isolates, two from Italy, one from Brazil and the remaining one from China. The phylogenetic analysis showed that AMV-Arg is closely related to subgroup I of AMV isolates. To our knowledge, this is the first report of a complete nucleotide sequence of AMV from South America and the first worldwide report of complete nucleotide sequence of AMV isolated from alfalfa as natural host.
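Identity percentages like those reported above depend on how the aligned positions are counted, and the abstract does not state which convention was used. As a hedged illustration only, the Python sketch below computes percent identity over two pre-aligned sequences, assuming that any column containing a gap in either sequence is excluded from the denominator; other tools divide by the full alignment length or by the shorter sequence instead. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identity over gap-free columns of a pairwise alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared
```

The same function works for nucleotide and amino acid alignments, since it only compares characters column by column.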
Complete chloroplast genome sequences of Hordeum vulgare, Sorghum bicolor and Agrostis stolonifera, and comparative analyses with other grass genomes

PubMed Central

Saski, Christopher; Lee, Seung-Bum; Fjellheim, Siri; Guda, Chittibabu; Jansen, Robert K.; Luo, Hong; Tomkins, Jeffrey; Rognli, Odd Arne; Clarke, Jihong Liu

2009-01-01

Comparisons of complete chloroplast genome sequences of Hordeum vulgare, Sorghum bicolor and Agrostis stolonifera to six published grass chloroplast genomes reveal that gene content and order are similar but two microstructural changes have occurred. First, the expansion of the IR at the SSC/IRa boundary that duplicates a portion of the 5′ end of ndhH is restricted to the three genera of the subfamily Pooideae (Agrostis, Hordeum and Triticum). Second, a 6 bp deletion in ndhK is shared by Agrostis, Hordeum, Oryza and Triticum, and this event supports the sister relationship between the subfamilies Erhartoideae and Pooideae. Repeat analysis identified 19–37 direct and inverted repeats 30 bp or longer with a sequence identity of at least 90%. Seventeen of the 26 shared repeats are found in all the grass chloroplast genomes examined and are located in the same genes or intergenic spacer (IGS) regions. Examination of simple sequence repeats (SSRs) identified 16–21 potential polymorphic SSRs. Five IGS regions have 100% sequence identity among Zea mays, Saccharum officinarum and Sorghum bicolor, whereas no spacer regions were identical among Oryza sativa, Triticum aestivum, H. vulgare and A. stolonifera despite their close phylogenetic relationship. Alignment of EST sequences and DNA coding sequences identified six C–U conversions in both Sorghum bicolor and H. vulgare but only one in A. stolonifera. Phylogenetic trees based on DNA sequences of 61 protein-coding genes of 38 taxa using both maximum parsimony and likelihood methods provide moderate support for a sister relationship between the subfamilies Erhartoideae and Pooideae. PMID:17534593
Complete genome sequence of a tomato infecting tomato mottle mosaic virus in New York

USDA-ARS's Scientific Manuscript database

Complete genome sequence of an emerging isolate of tomato mottle mosaic virus (ToMMV) infecting experimental Nicotiana benthamiana plants in upstate New York was obtained using small RNA deep sequencing. ToMMV_NY-13 shared 99% sequence identity to ToMMV isolates from Mexico and Florida. Broader d...
SEAN: SNP prediction and display program utilizing EST sequence clusters.

PubMed

Huntley, Derek; Baldo, Angela; Johri, Saurabh; Sergot, Marek

2006-02-15

SEAN is an application that predicts single nucleotide polymorphisms (SNPs) using multiple sequence alignments produced from expressed sequence tag (EST) clusters. The algorithm uses rules of sequence identity and SNP abundance to determine the quality of the prediction. A Java viewer is provided to display the EST alignments and predicted SNPs.
Indigenous Arabs are descendants of the earliest split from ancient Eurasian populations

PubMed Central

Rodriguez-Flores, Juan L.; Fakhro, Khalid; Agosto-Perez, Francisco; Ramstetter, Monica D.; Arbiza, Leonardo; Vincent, Thomas L.; Robay, Amal; Malek, Joel A.; Suhre, Karsten; Chouchane, Lotfi; Badii, Ramin; Al-Nabet Al-Marri, Ajayeb; Abi Khalil, Charbel; Zirie, Mahmoud; Jayyousi, Amin; Salit, Jacqueline; Keinan, Alon; Clark, Andrew G.; Crystal, Ronald G.; Mezey, Jason G.

2016-01-01

An open question in the history of human migration is the identity of the earliest Eurasian populations that have left contemporary descendants. The Arabian Peninsula was the initial site of the out-of-Africa migrations that occurred between 125,000 and 60,000 yr ago, leading to the hypothesis that the first Eurasian populations were established on the Peninsula and that contemporary indigenous Arabs are direct descendants of these ancient peoples. To assess this hypothesis, we sequenced the entire genomes of 104 unrelated natives of the Arabian Peninsula at high coverage, including 56 of indigenous Arab ancestry. The indigenous Arab genomes defined a cluster distinct from other ancestral groups, and these genomes showed clear hallmarks of an ancient out-of-Africa bottleneck. Similar to other Middle Eastern populations, the indigenous Arabs had higher levels of Neanderthal admixture compared to Africans but had lower levels than Europeans and Asians. These levels of Neanderthal admixture are consistent with an early divergence of Arab ancestors after the out-of-Africa bottleneck but before the major Neanderthal admixture events in Europe and other regions of Eurasia. When compared to worldwide populations sampled in the 1000 Genomes Project, although the indigenous Arabs had a signal of admixture with Europeans, they clustered in a basal, outgroup position to all 1000 Genomes non-Africans when considering pairwise similarity across the entire genome. These results place indigenous Arabs as the most distant relatives of all other contemporary non-Africans and identify these people as direct descendants of the first Eurasian populations established by the out-of-Africa migrations. PMID:26728717
Analysis of artifacts suggests DGGE should not be used for quantitative diversity analysis.

PubMed

Neilson, Julia W; Jordan, Fiona L; Maier, Raina M

2013-03-01

PCR-denaturing gradient gel electrophoresis (PCR-DGGE) is widely used in microbial ecology for the analysis of comparative community structure. However, artifacts generated during PCR-DGGE of mixed template communities impede the application of this technique to quantitative analysis of community diversity. The objective of the current study was to employ an artificial bacterial community to document and analyze artifacts associated with multiband signatures and preferential template amplification and to highlight their impacts on the use of this technique for quantitative diversity analysis. Six bacterial species (three Betaproteobacteria, two Alphaproteobacteria, and one Firmicutes) were amplified individually and in combinations with primers targeting the V7/V8 region of the 16S rRNA gene. Two of the six isolates produced multiband profiles, demonstrating that band number does not correlate directly with α-diversity. Analysis of the multiple bands from one of these isolates confirmed that both bands had identical sequences, which led to the hypothesis that the multiband pattern resulted from two distinct structural conformations of the same amplicon. In addition, consistent preferential amplification was demonstrated following pairwise amplifications of the six isolates. DGGE and real time PCR analysis identified primer mismatch and PCR inhibition due to 16S rDNA secondary structure as the most probable causes of preferential amplification patterns. Reproducible DGGE community profiles generated in this study confirm that PCR-DGGE provides an excellent high-throughput tool for comparative community structure analysis, but that method-specific artifacts preclude its use for accurate comparative diversity analysis. Copyright © 2013 Elsevier B.V. All rights reserved.
New advances in molecular epizootiology of canine hematic protozoa from Venezuela, Thailand and Spain.

PubMed

Criado-Fornelio, A; Rey-Valeiron, C; Buling, A; Barba-Carretero, J C; Jefferies, R; Irwin, P

2007-03-31

The prevalence of hematozoan infections (Hepatozoon canis and Babesia sp., particularly Babesia canis vogeli) in canids from Venezuela, Thailand and Spain was studied by amplification and sequencing of the 18S rRNA gene. H. canis infections caused simultaneously by two different isolates were confirmed by RFLP analysis in samples from all the geographic regions studied. In Venezuela, blood samples from 134 dogs were surveyed. Babesia infections were found in 2.24% of the dogs. Comparison of sequences of the 18S rRNA gene indicated that protozoan isolates were genetically identical to B. canis vogeli from Japan and Brazil. H. canis infected 44.77 per cent of the dogs. A representative sample of Venezuelan H. canis isolates (21.6% of PCR-positives) was sequenced. Many of them showed 18S rRNA gene sequences identical to H. canis Spain 2, although two less frequent genotypes were also found in the sample studied. In Thailand, 20 dogs were analyzed. No infections caused by Babesia were diagnosed, whereas 30 per cent of the dogs were positive for hematozoan infection. Two protozoan isolates showing 99.7-100% identity to H. canis Spain 2 were found. In Spain, 250 dogs were studied. B. canis vogeli infected 0.01% of the animals. The sequence of the 18S rRNA gene in Spanish isolates of this protozoan was closely related to those previously deposited in GenBank (> 99% identity). Finally, 20 red foxes were screened for hematozoans employing semi-nested PCR and primers designed to detect Babesia/Theileria. Fifty percent of the foxes were positive for Theileria annae. In addition, it was found that the PCR assay was also able to detect Hepatozoon infections. Thirty five percent of the foxes were infected with two different H. canis isolates showing 99.8-100% identity to Curupira 1 from Brazil.
Identity of Fasciola spp. in sheep in Egypt.

PubMed

Amer, Said; ElKhatam, Ahmed; Zidan, Shereif; Feng, Yaoyu; Xiao, Lihua

2016-12-01

In Egypt, liver flukes, Fasciola spp. (Digenea: Fasciolidae), have a serious impact on the farming industry and public health. Both Fasciola hepatica and Fasciola gigantica are known to occur in cattle, providing the opportunity for genetic recombination. Little is known on the identity and genetic variability of Fasciola populations in sheep. This study was performed to determine the prevalence of liver flukes in sheep in Menofia Province as a representative area of the delta region in Egypt, as measured by postmortem examination of slaughtered animals at three abattoirs. The identity and genetic variability of Fasciola spp. in slaughtered animals were determined by PCR-sequence analysis of the nuclear ribosomal internal transcribed spacer 1 (ITS1) and the mitochondrial NADH dehydrogenase subunit 1 (nad1) genes. Physical inspection of the liver indicated that 302 of 2058 (14.7%) slaughtered sheep were infected with Fasciola spp. Sequence analysis of the ITS1 and nad1 genes of liver flukes from 17 animals revealed that 11 animals were infected with F. hepatica, four with F. gigantica, and two with both species. Seventy eight of 103 flukes genetically characterized from these animals were F. hepatica, 23 were F. gigantica, and two had ITS1 sequences identical to F. hepatica but nad1 sequences identical to F. gigantica. nad1 sequences of Egyptian isolates of F. gigantica showed pronounced differences from those in the GenBank database. Egyptian F. gigantica haplotypes formed haplogroup D, which clustered in a sister clade with haplogroups A, B and C circulating in Asia, indicating the existence of geographic isolation in the species. Both F. hepatica and F. gigantica are prevalent in sheep in Egypt and an introgressed form of the two occurs as the result of genetic recombination. In addition, a geographically isolated F. gigantica population is present in the country. 
The importance of these observations in epidemiology of fascioliasis needs to be examined in future studies.
Characterization, genetic diversity, and evolutionary link of Cucumber mosaic virus strain New Delhi from India.

PubMed

Koundal, Vikas; Haq, Qazi Mohd Rizwanul; Praveen, Shelly

2011-02-01

The genome of Cucumber mosaic virus New Delhi strain (CMV-ND) from India, obtained from tomato, was completely sequenced and compared with full genome sequences of 14 known CMV strains from subgroups I and II, for their genetic diversity. Sequence analysis suggests CMV-ND shares maximum sequence identity at the nucleotide level with a CMV strain from Taiwan. Among all 15 strains of CMV, the encoded protein 2b is least conserved, whereas the coat protein (CP) is most conserved. Sequence identity values and phylogram results indicate that CMV-ND belongs to subgroup I. Based on the recombination detection program result, it appears that CMV is prone to recombination, and different RNA components of CMV-ND have evolved differently. Recombinational analysis of all 15 CMV strains detected maximum recombination breakpoints in RNA2; CP showed the least recombination sites.
In silico analysis of L-asparaginase from different source organisms.

PubMed

Dwivedi, Vivek Dhar; Mishra, Sarad Kumar

2014-06-01

L-asparaginases are enzymes widely distributed among plants, fungi and bacteria. This enzyme catalyzes the conversion of l-asparagine to l-aspartate and ammonia and, to a lesser extent, the formation of l-glutamate from l-glutamine. In the present study, forty-five full-length amino acid sequences of L-asparaginases from bacteria, fungi and plants were collected and subjected to multiple sequence alignment (MSA), domain identification, amino acid composition analysis, and phylogenetic tree construction. MSA revealed that two glycine residues were identical in all analyzed species, two glycine residues were also identical in all the fungal and bacterial sources, and three glycine residues were identical in all plant and bacterial sources, while no residue was identical across plant and fungal L-asparaginases. Two major sequence clusters were constructed by phylogenetic analysis. One cluster contains eleven species of fungi, twelve species of bacteria, and one species of plant, whereas the other contains fourteen species of plant, four species of fungi and three species of bacteria. The amino acid composition result revealed that the average frequency of alanine is 10.77 percent, which is very high in comparison to other amino acids in all analyzed species.

Grapevine virus I, a putative new vitivirus detected in co-infection with grapevine virus G in New Zealand.

PubMed

Blouin, Arnaud G; Chooi, Kar Mun; Warren, Ben; Napier, Kathryn R; Barrero, Roberto A; MacDiarmid, Robin M

2018-05-01

A novel virus, with characteristics of viruses classified within the genus Vitivirus, was identified from a sample of Vitis vinifera cv. Chardonnay in New Zealand. The virus was detected with high throughput sequencing (small RNA and total RNA) and its sequence was confirmed by Sanger sequencing. Its genome is 7507 nt long (excluding the polyA tail) with an organisation similar to that described for other classifiable members of the genus Vitivirus. The closest relative of the virus is grapevine virus E (GVE) with 65% aa identity in ORF1 (65% nt identity) and 63% aa identity in the coat protein (66% nt identity). The relationship with GVE was confirmed with phylogenetic analysis, showing the new virus branching with GVE, Agave tequilina leaf virus and grapevine virus G (GVG). A limited survey revealed the presence of this virus in multiple plants from the same location where the newly described GVG was discovered, and in most cases both viruses were detected as co-infections. The genetic characteristics of this virus suggest it represents an isolate of a new species within the genus Vitivirus and following the current nomenclature, we propose the name "Grapevine virus I".
Vacuolar H+-ATPase 69-kilodalton catalytic subunit cDNA from developing cotton (Gossypium hirsutum) ovules

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wilkins, T.A.

1993-06-01

This study investigates the molecular events of vacuole ontogeny in rapidly elongated cotton plant cells. Within the DNA coding region, the cotton and carrot cDNA clones exhibit 82.2% nucleotide sequence homology; at the amino acid level cotton and carrot catalytic subunits exhibited 95.7% identity and 2.1% amino acid similarity. When aligned with the analogous sequences from yeast, the cotton protein shared only 60.5% amino acid identity and 12.7% similarity. 10 refs., 1 tab.
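The identity percentages quoted in records like this one are computed from an alignment. The sketch below is only an illustration of one common convention, not the method used in the study: it assumes the two sequences are already aligned, uses `-` as the gap character, and divides matches by the number of columns in which neither sequence has a gap. Other tools use different denominators (for example, full alignment length), so values are not directly comparable across conventions. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumption: inputs are two rows of an existing pairwise alignment,
    with '-' marking gaps. This is one of several denominators in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns entirely
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Four matches out of five ungapped columns.
    print(percent_identity("ACGT-A", "ACGAAA"))  # 80.0
```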
Molecular detection of kobuviruses in European roe deer (Capreolus capreolus) in Italy.

PubMed

Di Martino, Barbara; Di Profio, Federica; Melegari, Irene; Di Felice, Elisabetta; Robetto, Serena; Guidetti, Cristina; Orusa, Riccardo; Martella, Vito; Marsilio, Fulvio

2015-08-01

Kobuvirus RNA was found in 6.6 % (13/198) of stool specimens from roe deer (Capreolus capreolus) captured during the regular hunting season. Upon sequence analysis of a fragment of the 3D gene, nine strains displayed the highest nucleotide sequence identity (91.2-97.4 %) to bovine kobuviruses previously detected in either diarrhoeic or asymptomatic calves. Interestingly, four strains were genetically related to the newly discovered caprine kobuviruses (84.2-87.6 % nucleotide identity) identified in black goats in Korea.
Clustering and propulsion of isotropic catalytic swimmers

NASA Astrophysics Data System (ADS)

Varma, Akhil; Montenegro-Johnson, Thomas D.; Michelin, Sebastien

2017-11-01

Catalytic micro-swimmers such as phoretic particles use local gradients in solute concentration for propulsion. An isolated isotropic phoretic particle generates a uniform concentration field on its surface and hence cannot propel on its own. Symmetry of this field is broken by the presence of at least another similar particle in the system, which leads to phoretic attraction or repulsion. Phoretic attraction drives the clustering of identical homogeneous particles into stable clusters of various configurations which may self-propel or rotate due to their geometric asymmetry. Using full numerical simulations and analytic approximations based on pairwise interactions of the particles, we study the cluster formation and its impact on the statistics of the propulsion properties. We finally analyze the effect of background noise on the results. European Research Council (Grant Agreement 714027).
Some identities of generalized Fibonacci sequence

NASA Astrophysics Data System (ADS)

Chong, Chin-Yoon; Cheah, C. L.; Ho, C. K.

2014-07-01

We introduced the generalized Fibonacci sequence {U_n} defined by U_0 = 0, U_1 = 1, and U_{n+2} = pU_{n+1} + qU_n for all p, q ∈ Z^+ and for all non-negative integers n. In this paper, we obtained some recursive formulas of the sequence.
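The recurrence in this abstract can be evaluated directly. The following is a minimal sketch of the definition only (initial terms 0 and 1, each later term equal to p times the previous term plus q times the one before it, with p and q positive integers); it does not reproduce any of the identities derived in the paper, and the function name `generalized_fibonacci` is a hypothetical choice.

```python
from __future__ import annotations


def generalized_fibonacci(p: int, q: int, count: int) -> list[int]:
    """Return the first `count` terms U_0, U_1, ... of the sequence
    with U_0 = 0, U_1 = 1 and each later term = p * previous + q * one before."""
    if p <= 0 or q <= 0:
        raise ValueError("p and q must be positive integers")
    if count < 0:
        raise ValueError("count must be non-negative")
    terms = [0, 1]
    while len(terms) < count:
        terms.append(p * terms[-1] + q * terms[-2])
    return terms[:count]


if __name__ == "__main__":
    print(generalized_fibonacci(1, 1, 8))  # ordinary Fibonacci numbers
    print(generalized_fibonacci(2, 1, 6))  # Pell numbers
```

With p = q = 1 the function returns the ordinary Fibonacci numbers, and with p = 2, q = 1 it returns the Pell numbers.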
Complete genome sequence of a new maize-associated cytorhabdovirus

USDA-ARS's Scientific Manuscript database

A new 11,877 nt cytorhabdovirus sequence with 6 open reading frames has been identified in a maize sample. It shares 50 and 51% genome-wide nucleotide sequence identity with northern cereal mosaic cytorhabdovirus (NCMV) and barley yellow striate mosaic cytorhabdovirus (BYSMV), respectively....
Sequence Variation in the Small-Subunit rRNA Gene of Plasmodium malariae and Prevalence of Isolates with the Variant Sequence in Sichuan, China

PubMed Central

Liu, Qing; Zhu, Shenghua; Mizuno, Sahoko; Kimura, Masatsugu; Liu, Peina; Isomura, Shin; Wang, Xingzhen; Kawamoto, Fumihiko

1998-01-01

By two PCR-based diagnostic methods, Plasmodium malariae infections have been rediscovered at two foci in the Sichuan province of China, a region where no cases of P. malariae have been officially reported for the last 2 decades. In addition, a variant form of P. malariae which has a deletion of 19 bp and seven substitutions of base pairs in the target sequence of the small-subunit (SSU) rRNA gene was detected with high frequency. Alignment analysis of Plasmodium sp. SSU rRNA gene sequences revealed that the 5′ region of the variant sequence is identical to that of P. vivax or P. knowlesi and its 3′ region is identical to that of P. malariae. The same sequence variations were also found in P. malariae isolates collected along the Thai-Myanmar border, suggesting a wide distribution of this variant form from southern China to Southeast Asia. PMID:9774600
Sequence analysis of MHC class I α2 from sockeye salmon (Oncorhynchus nerka).

PubMed

McClelland, Erin K; Ming, Tobi J; Tabata, Amy; Miller, Kristina M

2011-09-01

Most studies assessing adaptive MHC diversity in salmon populations have focused on the classical class II DAB or DAA loci, as these have been most amenable to single PCR amplifications due to their relatively low level of sequence divergence. Herein, we report the characterization of the classical class I UBA α2 locus based on collections taken throughout the species range of sockeye salmon (Oncorhynchus nerka). Through use of multiple lineage-specific primer sets, denaturing gradient gel electrophoresis and sequencing, we identified thirty-four alleles from three highly divergent lineages. Sequence identity between lineages ranged from 30.0% to 56.8% but was relatively high within lineages. Allelic identity within the antigen recognition site (ARS) was greater than for the longer sequence. Global positive selection on UBA was seen at the sequence level (dN:dS = 1.012) with four codons under positive selection and 12 codons under negative selection. Crown Copyright © 2011. Published by Elsevier Ltd. All rights reserved.
Increasing Sequence Diversity with Flexible Backbone Protein Design: The Complete Redesign of a Protein Hydrophobic Core

DOE Office of Scientific and Technical Information (OSTI.GOV)

Murphy, Grant S.; Mills, Jeffrey L.; Miley, Michael J.

2015-10-15

Protein design tests our understanding of protein stability and structure. Successful design methods should allow the exploration of sequence space not found in nature. However, when redesigning naturally occurring protein structures, most fixed backbone design algorithms return amino acid sequences that share strong sequence identity with wild-type sequences, especially in the protein core. This behavior places a restriction on functional space that can be explored and is not consistent with observations from nature, where sequences of low identity have similar structures. Here, we allow backbone flexibility during design to mutate every position in the core (38 residues) of a four-helix bundle protein. Only small perturbations to the backbone, 1-2 Å, were needed to entirely mutate the core. The redesigned protein, DRNN, is exceptionally stable (melting point >140 °C). An NMR and X-ray crystal structure show that the side chains and backbone were accurately modeled (all-atom RMSD = 1.3 Å).
Comparison of the nucleotide and amino acid sequences of the RsrI and EcoRI restriction endonucleases.

PubMed

Stephenson, F H; Ballard, B T; Boyer, H W; Rosenberg, J M; Greene, P J

1989-12-21

The RsrI endonuclease, a type-II restriction endonuclease (ENase) found in Rhodobacter sphaeroides, is an isoschizomer of the EcoRI ENase. A clone containing an 11-kb BamHI fragment was isolated from an R. sphaeroides genomic DNA library by hybridization with synthetic oligodeoxyribonucleotide probes based on the N-terminal amino acid (aa) sequence of RsrI. Extracts of E. coli containing a subclone of the 11-kb fragment display RsrI activity. Nucleotide sequence analysis reveals an 831-bp open reading frame encoding a polypeptide of 277 aa. A 50% identity exists within a 266-aa overlap between the deduced aa sequences of RsrI and EcoRI. Regions of 75-100% aa sequence identity correspond to key structural and functional regions of EcoRI. The type-II ENases have many common properties, and a common origin might have been expected. Nevertheless, this is the first demonstration of aa sequence similarity between ENases produced by different organisms.
Pairwise-Comparison Software

NASA Technical Reports Server (NTRS)

Ricks, Wendell R.

1995-01-01

Pairwise comparison (PWC) is a computer program that collects data for psychometric scaling techniques now used in cognitive research. It applies the technique of pairwise comparisons, one of many techniques commonly used to acquire the data necessary for analyses. PWC administers the task, collects data from the test subject, and formats the data for analysis. It is written in Turbo Pascal v6.0.
Filling Gaps in Biodiversity Knowledge for Macrofungi: Contributions and Assessment of an Herbarium Collection DNA Barcode Sequencing Project

PubMed Central

Osmundson, Todd W.; Robert, Vincent A.; Schoch, Conrad L.; Baker, Lydia J.; Smith, Amy; Robich, Giovanni; Mizzan, Luca; Garbelotto, Matteo M.

2013-01-01

Despite recent advances spearheaded by molecular approaches and novel technologies, species description and DNA sequence information are significantly lagging for fungi compared to many other groups of organisms. Large scale sequencing of vouchered herbarium material can aid in closing this gap. Here, we describe an effort to obtain broad ITS sequence coverage of the approximately 6000 macrofungal-species-rich herbarium of the Museum of Natural History in Venice, Italy. Our goals were to investigate issues related to large sequencing projects, develop heuristic methods for assessing the overall performance of such a project, and evaluate the prospects of such efforts to reduce the current gap in fungal biodiversity knowledge. The effort generated 1107 sequences submitted to GenBank, including 416 previously unrepresented taxa and 398 sequences exhibiting a best BLAST match to an unidentified environmental sequence. Specimen age and taxon affected sequencing success, and subsequent work on failed specimens showed that an ITS1 mini-barcode greatly increased sequencing success without greatly reducing the discriminating power of the barcode. Similarity comparisons and nonmetric multidimensional scaling ordinations based on pairwise distance matrices proved to be useful heuristic tools for validating the overall accuracy of specimen identifications, flagging potential misidentifications, and identifying taxa in need of additional species-level revision. Comparison of within- and among-species nucleotide variation showed a strong increase in species discriminating power at 1–2% dissimilarity, and identified potential barcoding issues (same sequence for different species and vice-versa). All sequences are linked to a vouchered specimen, and results from this study have already prompted revisions of species-sequence assignments in several taxa. PMID:23638077
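The comparison of within- and among-species nucleotide variation described in this abstract rests on a pairwise distance matrix. The sketch below shows the general idea under stated assumptions and is not the authors' pipeline: sequences are assumed to be pre-aligned to equal length, distance is the simple proportion of differing ungapped sites (p-distance), and the names `p_distance` and `split_distances` and the toy records are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites among columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in sites) / len(sites)


def split_distances(records):
    """Split all pairwise distances into within-species and among-species lists.

    `records` is a list of (species_label, aligned_sequence) tuples.
    """
    within, among = [], []
    for (sp1, seq1), (sp2, seq2) in combinations(records, 2):
        d = p_distance(seq1, seq2)
        (within if sp1 == sp2 else among).append(d)
    return within, among


if __name__ == "__main__":
    toy = [
        ("species_A", "ACGTACGTAC"),
        ("species_A", "ACGTACGTAC"),
        ("species_B", "ACGTACGTAA"),
    ]
    within, among = split_distances(toy)
    print(within, among)
```

A pair of records with the same label but a large distance, or different labels and zero distance, corresponds to the kind of potential misidentification or barcoding issue the abstract mentions.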
SALAD database: a motif-based database of protein annotations for plant comparative genomics

PubMed Central

Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi

2010-01-01

Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209 529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named ‘SALAD on ARRAYs’ to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis. PMID:19854933
Genome-wide screening of Oryza sativa ssp. japonica and indica reveals a complex family of proteins with ribosome-inactivating protein domains.

PubMed

Wytynck, Pieter; Rougé, Pierre; Van Damme, Els J M

2017-11-01

Ribosome-inactivating proteins (RIPs) are cytotoxic enzymes capable of halting protein synthesis by irreversible modification of ribosomes. Although RIPs are widespread they are not ubiquitous in the plant kingdom. The physiological importance of RIPs is not fully elucidated, but evidence suggests a role in the protection of the plant against biotic and abiotic stresses. Searches in the rice genome revealed a large and highly complex family of proteins with a RIP domain. A comparative analysis retrieved 38 RIP sequences from the genome sequence of Oryza sativa subspecies japonica and 34 sequences from the subspecies indica. The RIP sequences are scattered over different chromosomes but are mostly found on the third chromosome. The phylogenetic tree revealed the pairwise clustering of RIPs from japonica and indica. Molecular modeling and sequence analysis yielded information on the catalytic site of the enzyme, and suggested that a large part of RIP domains probably possess N-glycosidase activity. Several RIPs are differentially expressed in plant tissues and in response to specific abiotic stresses. This study provides an overview of RIP motifs in rice and will help to understand their biological role(s) and evolutionary relationships. Copyright © 2017 Elsevier Ltd. All rights reserved.
CoCoNUT: an efficient system for the comparison and analysis of genomes

PubMed Central

2008-01-01

Background: Comparative genomics is the analysis and comparison of genomes from different species. This area of research is driven by the large number of sequenced genomes and heavily relies on efficient algorithms and software to perform pairwise and multiple genome comparisons. Results: Most of the software tools available are tailored for one specific task. In contrast, we have developed a novel system CoCoNUT (Computational Comparative geNomics Utility Toolkit) that allows solving several different tasks in a unified framework: (1) finding regions of high similarity among multiple genomic sequences and aligning them, (2) comparing two draft or multi-chromosomal genomes, (3) locating large segmental duplications in large genomic sequences, and (4) mapping cDNA/EST to genomic sequences. Conclusion: CoCoNUT is competitive with other software tools w.r.t. the quality of the results. The use of state of the art algorithms and data structures allows CoCoNUT to solve comparative genomics tasks more efficiently than previous tools. With the improved user interface (including an interactive visualization component), CoCoNUT provides a unified, versatile, and easy-to-use software tool for large scale studies in comparative genomics. PMID:19014477
Protein contact prediction using patterns of correlation.

PubMed

Hamilton, Nicholas; Burrage, Kevin; Ragan, Mark A; Huber, Thomas

2004-09-01

We describe a new method for using neural networks to predict residue contact pairs in a protein. The main inputs to the neural network are a set of 25 measures of correlated mutation between all pairs of residues in two "windows" of size 5 centered on the residues of interest. While the individual pair-wise correlations are a relatively weak predictor of contact, by training the network on windows of correlation the accuracy of prediction is significantly improved. The neural network is trained on a set of 100 proteins and then tested on a disjoint set of 1033 proteins of known structure. An average predictive accuracy of 21.7% is obtained taking the best L/2 predictions for each protein, where L is the sequence length. Taking the best L/10 predictions gives an average accuracy of 30.7%. The predictor is also tested on a set of 59 proteins from the CASP5 experiment. The accuracy is found to be relatively consistent across different sequence lengths, but to vary widely according to the secondary structure. Predictive accuracy is also found to improve by using multiple sequence alignments containing many sequences to calculate the correlations. Copyright 2004 Wiley-Liss, Inc.
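Two steps in this abstract lend themselves to a small illustration: building the 25 window features for a residue pair, and scoring the best L/2 predictions. The sketch below is an assumption-laden reading of the abstract, not the authors' code: the correlation matrix is taken as given, positions falling off the ends of the sequence are padded with 0.0 (the abstract does not say how edges are handled), and the names `window_features` and `top_fraction_accuracy` are hypothetical.

```python
def window_features(corr, i, j, half_width=2, pad=0.0):
    """Collect correlation values for all pairs (a, b) with a in a window
    centered on residue i and b in a window centered on residue j.

    With half_width=2 each window has size 5, giving 5 * 5 = 25 features.
    Out-of-range positions are filled with `pad` (an assumption).
    """
    n = len(corr)
    features = []
    for a in range(i - half_width, i + half_width + 1):
        for b in range(j - half_width, j + half_width + 1):
            if 0 <= a < n and 0 <= b < n:
                features.append(corr[a][b])
            else:
                features.append(pad)
    return features


def top_fraction_accuracy(scored_pairs, true_contacts, seq_len, divisor=2):
    """Fraction of the best seq_len // divisor scored pairs that are true contacts.

    `scored_pairs` is a list of ((i, j), score); `true_contacts` is a set of (i, j).
    """
    if not scored_pairs:
        raise ValueError("no predictions to evaluate")
    k = max(1, seq_len // divisor)
    best = sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:k]
    hits = sum(1 for pair, _score in best if pair in true_contacts)
    return hits / len(best)


if __name__ == "__main__":
    corr = [[r * 10 + c for c in range(10)] for r in range(10)]
    print(len(window_features(corr, 4, 7)))  # 25
    preds = [((0, 3), 0.9), ((1, 3), 0.8), ((0, 2), 0.1)]
    print(top_fraction_accuracy(preds, {(0, 3)}, seq_len=4))  # 0.5
```

Setting `divisor=10` gives the L/10 variant of the accuracy measure that the abstract also reports.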
Understanding co-polymerization in amyloid formation by direct observation of mixed oligomers (electronic supplementary information available; see DOI: 10.1039/c7sc00620a)

PubMed Central

Young, Lydia M.; Tu, Ling-Hsien; Raleigh, Daniel P.; Ashcroft, Alison E.

2017-01-01

Although amyloid assembly in vitro is commonly investigated using single protein sequences, fibril formation in vivo can be more heterogeneous, involving co-assembly of proteins of different length, sequence and/or post-translational modifications. Emerging evidence suggests that co-polymerization can alter the rate and/or mechanism of aggregation and can contribute to pathogenicity. Electrospray ionization-ion mobility spectrometry-mass spectrometry (ESI-IMS-MS) is uniquely suited to the study of these heterogeneous ensembles. Here, ESI-IMS-MS combined with analysis of fibrillation rates using thioflavin T (ThT) fluorescence, is used to track the course of aggregation of variants of islet-amyloid polypeptide (IAPP) in isolation and in pairwise mixtures. We identify a sub-population of extended monomers as the key precursors of amyloid assembly, and reveal that the fastest aggregating sequence in peptide mixtures determines the lag time of fibrillation, despite being unable to cross-seed polymerization. The results demonstrate that co-polymerization of IAPP sequences radically alters the rate of amyloid assembly by altering the conformational properties of the mixed oligomers that form. PMID:28970890
CAFE: aCcelerated Alignment-FrEe sequence analysis

PubMed Central

Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A.; Waterman, Michael S.

2017-01-01

Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmap, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. PMID:28472388
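As a rough illustration of a measure "based solely on the k-mer frequencies", the sketch below computes the Euclidean distance between the k-mer frequency vectors of two sequences. This is an assumption chosen for simplicity: it is not CAFE's code, and it is not one of the background-corrected measures such as CVTree, $d_2^*$ or $d_2^S$. The function names are hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict


def kmer_frequencies(seq: str, k: int) -> Dict[str, float]:
    """Relative frequencies of all k-mers made up only of A, C, G and T."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Discard k-mers containing ambiguous bases such as N.
    valid = {kmer: c for kmer, c in counts.items() if set(kmer) <= set("ACGT")}
    total = sum(valid.values())
    if total == 0:
        return {}
    return {kmer: c / total for kmer, c in valid.items()}


def euclidean_dissimilarity(seq_a: str, seq_b: str, k: int = 2) -> float:
    """Euclidean distance between the k-mer frequency vectors of two sequences."""
    freq_a = kmer_frequencies(seq_a, k)
    freq_b = kmer_frequencies(seq_b, k)
    words = set(freq_a) | set(freq_b)
    return sqrt(sum((freq_a.get(w, 0.0) - freq_b.get(w, 0.0)) ** 2 for w in words))


if __name__ == "__main__":
    print(euclidean_dissimilarity("ACGTACGTAC", "ACGTTCGTAC", k=2))
```

No alignment is needed: each sequence is reduced to a frequency vector, so the cost grows with sequence length rather than with the product of the two lengths.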

Zygosaccharomyces favi sp. nov., an obligate osmophilic yeast species from bee bread and honey.

PubMed

Čadež, Neža; Fülöp, László; Dlauchy, Dénes; Péter, Gábor

2015-03-01

Five yeast strains representing a hitherto undescribed yeast species were isolated from bee bread and honey in Hungary. They are obligate osmophilic, i.e. they are unable to grow in/on high water activity culture media. Following isogamous conjugation, they form 1-4 spheroid or subspheroid ascospores in persistent asci. The analysis of the sequences of their large subunit rRNA gene D1/D2 domain placed the new species in the Zygosaccharomyces clade. In terms of pairwise sequence similarity, Zygosaccharomyces gambellarensis is the most closely related species. Comparisons of D1/D2, internal transcribed spacer and translation elongation factor-1α (EF-1α) gene sequences of the five strains with that of the type strain of Z. gambellarensis revealed that they represent a new yeast species. The name Zygosaccharomyces favi sp. nov. (type strain: NCAIM Y.01994(T) = CBS 13653(T) = NRRL Y-63719(T) = ZIM 2551(T)) is proposed for this new yeast species, which based on phenotype can be distinguished from related Zygosaccharomyces species by its obligate osmophilic nature. Some intragenomic sequence variability, mainly indels, was detected among the ITS copies of the strains of the new species.
Initial Detection and Molecular Characterization of Namaycush Herpesvirus (Salmonid Herpesvirus 5) in Lake Trout.

PubMed

Glenney, Gavin W; Barbash, Patricia A; Coll, John A

2016-03-01

A novel herpesvirus was found by molecular methods in samples of Lake Trout Salvelinus namaycush from Lake Erie, Pennsylvania, and Lake Ontario, Keuka Lake, and Lake Otsego, New York. Based on PCR amplification and partial sequencing of polymerase, terminase, and glycoprotein genes, a number of isolates were identified as a novel virus, which we have named Namaycush herpesvirus (NamHV) salmonid herpesvirus 5 (SalHV5). Phylogenetic analyses of three NamHV genes indicated strong clustering with other members of the genus Salmonivirus, placing these isolates into family Alloherpesviridae. The NamHV isolates were identical in the three partially sequenced genes; however, they varied from other salmonid herpesviruses in nucleotide sequence identity. In all three of the genes sequenced, NamHV shared the highest sequence identity with Atlantic Salmon papillomatosis virus (ASPV; SalHV4) isolated from Atlantic Salmon Salmo salar in northern Europe, including northwestern Russia. These results lead one to believe that NamHV and ASPV have a common ancestor that may have made a relatively recent host jump from Atlantic Salmon to Lake Trout or vice versa. Partial nucleotide sequence comparisons between NamHV and ASPV for the polymerase and glycoprotein genes differ by >5% and >10%, respectively. Additional nucleotide sequence comparisons between NamHV and epizootic epitheliotropic disease virus (EEDV/SalHV3) in the terminase, glycoprotein, and polymerase genes differ by >5%, >20%, and >10%, respectively. Thus, NamHV and EEDV may be occupying discrete ecological niches in Lake Trout. Even though NamHV shared the highest genetic identity with ASPV, each of these viruses has a separate host species, which also implies speciation. Additionally, NamHV has been detected over the last 4 years in four separate water bodies across two states, which suggests that NamHV is a distinct, naturally replicating lineage. This, in combination with a divergence in nucleotide sequence from EEDV, indicates that NamHV is a new species in the genus Salmonivirus. Received April 20, 2015; accepted October 11, 2015.
Complete genome sequence of a novel genotype of squash mosaic virus

USDA-ARS's Scientific Manuscript database

Complete genome sequence of a novel genotype of Squash mosaic virus (SqMV) infecting squash plants in Spain was obtained using deep sequencing of small ribonucleic acids and assembly. The low nucleotide sequence identities, with 87-88% on RNA1 and 84-86% on RNA2 to known SqMV isolates, suggest a new...
First complete genome sequence of an emerging cucumber green mottle mosaic virus isolate in North America

USDA-ARS's Scientific Manuscript database

The complete genome sequence (6,423 nt) of an emerging Cucumber green mottle mosaic virus (CGMMV) isolate on cucumber in North America was determined through deep sequencing of sRNA and rapid amplification of cDNA ends. It shares 99% nucleotide sequence identity to the Asian genotype, but only 90% t...
Is ITS-2 rDNA suitable marker for genetic characterization of Sarcoptes mites from different wild animals in different geographic areas?

PubMed

Alasaad, S; Soglia, D; Spalenza, V; Maione, S; Soriguer, R C; Pérez, J M; Rasero, R; Degiorgis, M P Ryser; Nimmervoll, H; Zhu, X Q; Rossi, L

2009-02-05

The present study examined the relationship among individual Sarcoptes scabiei mites from 13 wild mammalian populations belonging to nine species in four European countries using the second internal transcribed spacer (ITS-2) of nuclear ribosomal DNA (rDNA) as a genetic marker. The ITS-2 plus primer flanking 5.8S and 28S rDNA (ITS-2+) was amplified from individual mites by polymerase chain reaction (PCR) and the amplicons were sequenced directly. A total of 148 ITS-2+ sequences of 404 bp in length were obtained and 67 variable sites were identified (16.59%). UPGMA analyses did not show any geographical or host-specific clustering, and a similar outcome was obtained using population pairwise Fst statistics. These results demonstrated that ITS-2 rDNA does not appear to be suitable for examining genetic diversity among mite populations.
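The proportion of variable sites reported above (67 of 404 positions, about 16.6%) is a column-wise count over an alignment. A minimal sketch follows, assuming equal-length aligned sequences and treating a gap character as just another state; the function name and toy alignment are hypothetical, and the study's own handling of gaps is not stated in the abstract.

```python
from typing import List, Tuple


def variable_sites(alignment: List[str]) -> Tuple[int, float]:
    """Return (number of variable columns, fraction of variable columns).

    A column is variable if it contains more than one distinct character.
    Assumption: gaps ('-') are counted as an ordinary state.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if length == 0 or any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be non-empty and of equal length")
    # zip(*alignment) iterates over alignment columns.
    n_variable = sum(1 for column in zip(*alignment) if len(set(column)) > 1)
    return n_variable, n_variable / length


if __name__ == "__main__":
    toy_alignment = ["ACGT", "ACGA", "ACTT"]
    print(variable_sites(toy_alignment))
```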
Cytochrome b sequences in black-crowned night-herons (Nycticorax nycticorax) from heronries exposed to genotoxic contaminants

USGS Publications Warehouse

Dahl, Christopher R.; Bickham, John W.; Wickliffe, Jeffery K.; Custer, Thomas W.

2001-01-01

DNA sequence analysis of a 215 base-pair region of the mitochondrial cytochrome b gene was used to examine genetic variation and search for evidence of an increased mutation rate in black-crowned night-herons. We examined five populations exposed to environmental contamination (primarily PAHs and PCBs) and one reference population from the eastern U.S. There was no evidence of a high mutation rate even within populations previously shown to exhibit increased variation in DNA content among somatic cells as a result of petroleum exposure. Three haplotypes were observed among 99 individuals. The low level of variability could be evidence for a genetic bottleneck, or that cytochrome b is too conservative for use in population genetic studies of this species. With the exception of one population from Louisiana, pairwise ΦST estimates were very low, indicative of little population structure and potentially high rates of effective migration among populations.
First report of Beet western yellows virus infecting Epiphyllum spp

USDA-ARS's Scientific Manuscript database

Beet western yellows virus (BWYV) was identified from an orchid cactus (Epiphyllum spp.) hybrid without obvious symptoms by high-throughput sequencing. The nearly complete genomic sequence of 5,458 nucleotides of the virus was determined. The isolate has the highest nucleotide sequence identity (93%)...
A new betasatellite associated with cotton leaf curl Burewala virus infecting tomato in India: influence on symptoms and viral accumulation.

PubMed

Kumar, Jitendra; Gunapati, Samatha; Singh, Sudhir P; Kumar, Abhinav; Lalit, Adarsh; Sharma, Naresh C; Puranik, Rekha; Tuli, Rakesh

2013-06-01

A begomovirus and its associated alpha- and betasatellite were detected in tomato plants affected with leaf curl disease. Based on a nucleotide sequence identity of 99 %, this begomovirus was designated an isolate of cotton leaf curl Burewala virus (CLCuBuV). The alphasatellite exhibited 93 % sequence identity to cotton leaf curl Burewala alphasatellite (CLCuBuA) and is hence referred to here as a variant of CLCuBuA. The detected betasatellite was recombinant in nature and showed 70 % sequence identity to the known betasatellites. Inoculation of healthy tomato with CLCuBuV plus betasatellite, either in the presence or the absence of alphasatellite, led to typical leaf curling, while inoculation with CLCuBuV in the absence of betasatellite resulted in mild symptoms. This confirmed the role of the betasatellite in expression of disease symptoms. We propose to name the newly detected betasatellite tomato leaf curl Hajipur betasatellite (ToLCHJB).
An Accurate Scalable Template-based Alignment Algorithm

PubMed Central

Gardner, David P.; Xu, Weijia; Miranker, Daniel P.; Ozer, Stuart; Cannone, Jamie J.; Gutell, Robin R.

2013-01-01

The rapid determination of nucleic acid sequences is increasing the number of sequences that are available. Inherent in a template or seed alignment is the culmination of structural and functional constraints that are selecting those mutations that are viable during the evolution of the RNA. While we might not understand these structural and functional constraints, template-based alignment programs utilize the patterns of sequence conservation to encapsulate the characteristics of viable RNA sequences that are aligned properly. We have developed a program that utilizes the different dimensions of information in rCAD, a large RNA informatics resource, to establish a profile for each position in an alignment. The most significant include sequence identity and column composition in different phylogenetic taxa. We have compared our methods with a maximum of eight alternative alignment methods on different sets of 16S and 23S rRNA sequences with sequence percent identities ranging from 50% to 100%. The results showed that CRWAlign outperformed the other alignment methods in both speed and accuracy. A web-based alignment server is available at http://www.rna.ccbb.utexas.edu/SAE/2F/CRWAlign. PMID:24772376
rRNA Gene Internal Transcribed Spacer 1 and 2 Sequences of Asexual, Anthropophilic Dermatophytes Related to Trichophyton rubrum

PubMed Central

Summerbell, R. C.; Haugland, R. A.; Li, A.; Gupta, A. K.

1999-01-01

The ribosomal region spanning the two internal transcribed spacer (ITS) regions and the 5.8S ribosomal DNA region was sequenced for asexual, anthropophilic dermatophyte species with morphological similarity to Trichophyton rubrum, as well as for members of the three previously delineated, related major clades in the T. mentagrophytes complex. Representative isolates of T. raubitschekii, T. fischeri, and T. kanei were found to have ITS sequences identical to that of T. rubrum. The ITS sequences of T. soudanense and T. megninii differed from that of T. rubrum by only a small number of base pairs. Their continued status as species, however, appears to meet criteria outlined in the population genetics-based cohesion species concept of A. R. Templeton. The ITS sequence of T. tonsurans differed from that of the biologically distinct T. equinum by only 1 bp, while the ITS sequence of the recently described species T. krajdenii had a sequence identical to that of T. mentagrophytes isolates related to the teleomorph Arthroderma vanbreuseghemii. PMID:10565922
Sarcocystis spp. in domestic sheep in Kunming City, China: prevalence, morphology, and molecular characteristics.

PubMed

Hu, Jun-Jie; Huang, Si; Wen, Tao; Esch, Gerald W; Liang, Yu; Li, Hong-Liang

2017-01-01

Sheep (Ovis aries) are intermediate hosts for at least six named species of Sarcocystis: S. tenella, S. arieticanis, S. gigantea, S. medusiformis, S. mihoensis, and S. microps. Here, only two species, S. tenella and S. arieticanis, were found in 79 of 86 sheep (91.9%) in Kunming, China, based on their morphological characteristics. Four genetic markers, i.e., 18S rRNA gene, 28S rRNA gene, mitochondrial cox1 gene, and ITS-1 region, were sequenced and characterized for the two species of Sarcocystis. Sequences of the three former markers for S. tenella shared high identities with those of S. capracanis in goats, i.e., 99.0%, 98.3%, and 93.6%, respectively; the same three marker sequences of S. arieticanis shared high identities with those of S. hircicanis in goats, i.e., 98.5%, 96.5%, and 92.5%, respectively. No sequences in GenBank were found to significantly resemble the ITS-1 regions of S. tenella and S. arieticanis. Identities of the four genetic markers for S. tenella and S. arieticanis were 96.3%, 95.4%, 82.5%, and 66.2%, respectively. © J.-J. Hu et al., published by EDP Sciences, 2017.
Evidence of three new members of malignant catarrhal fever virus group in Muskox (Ovibos moschatus), Nubian ibex (Capra nubiana), and gemsbok (Oryx gazella)

USGS Publications Warehouse

Li, H.; Gailbreath, K.; Bender, L.C.; West, K.; Keller, J.; Crawford, T.B.

2003-01-01

Six members of the malignant catarrhal fever (MCF) virus group of ruminant rhadinoviruses have been identified to date. Four of these viruses are clearly associated with clinical disease: alcelaphine herpesvirus 1 (AlHV-1) carried by wildebeest (Connochaetes spp.); ovine herpesvirus 2 (OvHV-2), ubiquitous in domestic sheep; caprine herpesvirus 2 (CpHV-2), endemic in domestic goats; and the virus of unknown origin found causing classic MCF in white-tailed deer (Odocoileus virginianus; MCFV-WTD). Using serology and polymerase chain reaction with degenerate primers targeting a portion of the herpesviral DNA polymerase gene, evidence of three previously unrecognized rhadinoviruses in the MCF virus group was found in muskox (Ovibos moschatus), Nubian ibex (Capra nubiana), and gemsbok (South African oryx, Oryx gazella), respectively. Based on sequence alignment, the viral sequence in the muskox is most closely related to MCFV-WTD (81.5% sequence identity) and that in the Nubian ibex is closest to CpHV-2 (89.3% identity). The viral sequence in the gemsbok is most closely related to AlHV-1 (85.1% identity). No evidence of disease association with these viruses has been found. © Wildlife Disease Association 2003.
Mitochondrial DNA Evidence Supports the Hypothesis that Triodontophorus Species Belong to Cyathostominae

PubMed Central

Gao, Yuan; Zhang, Yan; Yang, Xin; Qiu, Jian-Hua; Duan, Hong; Xu, Wen-Wen; Chang, Qiao-Cheng; Wang, Chun-Ren

2017-01-01

Equine strongyles, the significant nematode pathogens of horses, are characterized by high quantities and species abundance, but classification of this group of parasitic nematodes is debated. Mitochondrial (mt) genome DNA data are often used to address classification controversies. Thus, the objectives of this study were to determine the complete mt genomes of three Cyathostominae nematode species (Cyathostomum catinatum, Cylicostephanus minutus, and Poteriostomum imparidentatum) of horses and reconstruct the phylogenetic relationship of Strongylidae with other nematodes in Strongyloidea to test the hypothesis that Triodontophorus spp. belong to Cyathostominae using the mt genomes. The mt genomes of Cy. catinatum, Cs. minutus, and P. imparidentatum were 13,838, 13,826, and 13,817 bp in length, respectively. Complete mt nucleotide sequence comparison of all Strongylidae nematodes revealed that sequence identity ranged from 77.8 to 91.6%. The mt genome sequences of Triodontophorus species had relatively high identity with Cyathostominae nematodes, rather than Strongylus species of the same subfamily (Strongylinae). Comparative analyses of mt genome organization for Strongyloidea nematodes sequenced to date revealed that members of this superfamily possess identical gene arrangements. Phylogenetic analyses using mtDNA data indicated that the Triodontophorus species clustered with Cyathostominae species instead of Strongylus species. The present study first determined the complete mt genome sequences of Cy. catinatum, Cs. minutus, and P. imparidentatum, which will provide novel genetic markers for further studies of Strongylidae taxonomy, population genetics, and systematics. Importantly, sequence comparison and phylogenetic analyses based on mtDNA sequences supported the hypothesis that Triodontophorus belongs to Cyathostominae. PMID:28824575
Genetic Characteristics of Coronaviruses from Korean Bats in 2016.

PubMed

Lee, Saemi; Jo, Seong-Deok; Son, Kidong; An, Injung; Jeong, Jipseol; Wang, Seung-Jun; Kim, Yongkwan; Jheong, Weonhwa; Oem, Jae-Ku

2018-01-01

Bats have increasingly been recognized as the natural reservoir of severe acute respiratory syndrome (SARS) coronavirus and other coronaviruses found in mammals. However, little research has been conducted on bat coronaviruses in South Korea. In this study, bat samples (332 oral swabs, 245 fecal samples, 38 urine samples, and 57 bat carcasses) were collected at 33 natural bat habitat sites in South Korea. RT-PCR and sequencing were performed for specific coronavirus genes to identify the bat coronaviruses in different bat samples. Coronaviruses were detected in 2.7% (18/672) of the samples: 13 oral swabs from one species of the family Rhinolophidae, and four fecal samples and one carcass (intestine) from three species of the family Vespertilionidae. To determine the genetic relationships of the 18 sequences obtained in this study and previously known coronaviruses, the nucleotide sequences of a 392-nt region of the RNA-dependent RNA polymerase (RdRp) gene were analyzed phylogenetically. Thirteen sequences belonging to SARS-like betacoronaviruses showed the highest nucleotide identity (97.1-99.7%) with Bat-CoV-JTMC15 reported in China. The other five sequences were most similar to MERS-like betacoronaviruses. Four nucleotide sequences displayed the highest identity (94.1-95.1%) with Bat-CoV-HKU5 from Hong Kong. The one sequence from a carcass showed the highest nucleotide identity (99%) with Bat-CoV-SC2013 from China. These results suggest that careful surveillance of coronaviruses from bats should be continued, because animal and human infections may result from the genetic variants present in bat coronavirus reservoirs.
DNA Barcodes for Species Identification in the Hyperdiverse Ant Genus Pheidole (Formicidae: Myrmicinae)

PubMed Central

Ng'endo, R.N.; Osiemo, Z.B.; Brandl, R.

2013-01-01

DNA sequencing is increasingly being used to assist in species identification in order to overcome the taxonomic impediment. However, few studies attempt to compare the results of these molecular studies with a more traditional species delineation approach based on morphological characters. A 636 base-pair fragment of the mitochondrial cytochrome oxidase subunit 1 (CO1) gene was sequenced from 47 ants of the genus Pheidole (Formicidae: Myrmicinae) collected in the Brazilian Atlantic Forest to test whether the morphology-based assignment of individuals into species is supported by DNA-based species delimitation. Twenty morphospecies were identified, whereas the barcoding analysis identified 19 Molecular Operational Taxonomic Units (MOTUs). Fifteen out of the 19 DNA-based clusters allocated, using sequence divergence thresholds of 2% and 3%, matched with morphospecies. Both thresholds yielded the same number of MOTUs. Only one MOTU was successfully identified to species level using the CO1 sequences of Pheidole species already in GenBank. The average pairwise sequence divergence for all 47 sequences was 19%, ranging from 0% to 25%. In some cases, however, morphology and molecular based methods differed in their assignment of individuals to morphospecies or MOTUs. The occurrence of distinct mitochondrial lineages within morphological species highlights groups for further detailed genetic and morphological studies, and therefore a pluralistic approach using several methods to understand the taxonomy of difficult lineages is advocated. PMID:23902257
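To make the threshold-based grouping concrete, here is a minimal sketch that computes uncorrected pairwise divergence (p-distance) between aligned sequences and groups them by single linkage at a chosen threshold. Both choices are assumptions made for illustration: the abstract does not state which distance model or clustering procedure the authors used, and the function names and toy sequences are hypothetical.

```python
from typing import List


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)


def cluster_motus(seqs: List[str], threshold: float) -> List[List[int]]:
    """Single-linkage grouping: link any two sequences within the threshold."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    base = "ACGT" * 25                   # 100 nt toy reference
    near = "T" + base[1:]                # 1% divergent from base
    far = "CATG" * 5 + base[20:]         # 20% divergent from base
    for cutoff in (0.02, 0.03):
        print(cutoff, cluster_motus([base, near, far], cutoff))
```

In this toy data both cutoffs give the same two groups, which mirrors the situation described above where the 2% and 3% thresholds yielded the same number of MOTUs.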
Molecular characterization of phytoplasma associated with four important ornamental plant species in India and identification of natural potential spread sources.

PubMed

Gopala; Rao, G P

2018-02-01

Phytoplasma-suspected symptoms of phyllody, witches' broom, leaf yellowing, stunting and little leaf were observed in Chrysanthemum morifolium, Bougainvillea glabra, Jasminum sambac and Callistephus chinensis during a survey of flower nurseries and experimental ornamental fields at Delhi, Maharashtra, Tamil Nadu and Karnataka from 2014 to 2016. Pleomorphic bodies typical of phytoplasma structures were observed in the phloem sieve elements of ultrathin sections of all four symptomatic ornamental plants (stem tissue) by transmission electron microscopy. Amplification of 1.8 and 1.2 kb phytoplasma DNA products was observed in all four test plants in PCR assays using universal primer pair P1/P7 followed by nested primer pair R16F2n/R16R2, respectively. Pairwise sequence comparison, phylogeny and virtual RFLP analysis of 16S rDNA sequences confirmed the association of two phytoplasma subgroups (16SrI-B and 16SrII-D) with the four ornamental plant species. 'Ca. P. aurantifolia' subgroup D (16SrII-D) was found associated with chrysanthemum phyllody and leaf yellowing at Delhi and Tamil Nadu, bougainvillea little leaf and yellowing at Delhi and Chinese aster phyllody at Bengaluru, Karnataka. However, jasmine little leaf and yellowing at Bengaluru, Karnataka and chrysanthemum stunting at Pune were found to be associated with 'Ca. P. asteris' subgroup B-related strains (16SrI-B). The identification of 16SrII-D subgroup phytoplasma infecting bougainvillea and 16SrI-B subgroup infecting jasmine are new reports worldwide. In addition, the weed species Cannabis sativa, showing witches' broom in jasmine fields at Bengaluru, and Parthenium hysterophorus, showing witches' broom symptoms in chrysanthemum fields at Delhi, were found to be infected by phytoplasma strains classified under subgroups 16SrI-B and 16SrII-D, respectively, by PCR assays and 16S rDNA sequence comparison analysis. Among the three major leafhopper species identified, only Hishimonus phycitis tested positive for 16SrI-B and 16SrII-D subgroups of phytoplasmas from chrysanthemum fields at Delhi and jasmine fields at Bengaluru, respectively. The detection of similar phytoplasma strains in the ornamental species, the leafhopper and the weed species in the present study suggests that H. phycitis and weeds may act as potential natural sources for secondary spread of the identified phytoplasma strains.
Homology-based Modeling of Rhodopsin-like Family Members in the Inactive State: Structural Analysis and Deduction of Tips for Modeling and Optimization.

PubMed

Pappalardo, Matteo; Rayan, Mahmoud; Abu-Lafi, Saleh; Leonardi, Martha E; Milardi, Danilo; Guccione, Salvatore; Rayan, Anwar

2017-08-01

Modeling G-Protein Coupled Receptors (GPCRs) is an emergent field of research, since the use of high-quality models in receptor structure-based strategies might facilitate the discovery of interesting drug candidates. The findings from a quantitative analysis of eighteen resolved structures of rhodopsin family "A" receptors crystallized with antagonists and 153 pairs of structures are described. A strategy termed endeca-amino acids fragmentation was used to analyze the structural models, aiming to detect the relationship between sequence identity and Root Mean Square Deviation (RMSD) at each trans-membrane domain. Moreover, we have applied the leave-one-out strategy to study the shiftiness likelihood of the helices. The type of correlation between sequence identity and RMSD was studied using the aforementioned set of receptors as representatives of membrane proteins and 98 serine proteases with 4753 pairs of structures as representatives of globular proteins. Data analysis using the fragmentation strategy revealed that there is some extent of correlation between sequence identity and global RMSD of 11AA width windows. However, spatial conservation is not always close to the endoplasmic side as was reported before. A comparative study with globular proteins shows that GPCRs have higher standard deviation and higher slope in the graph with correlation between sequence identity and RMSD. The extracted information disclosed in this paper could be incorporated in the modeling protocols when using techniques for model optimization and refinement. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Bradyrhizobium paxllaeri sp. nov. and Bradyrhizobium icense sp. nov., nitrogen-fixing rhizobial symbionts of Lima bean (Phaseolus lunatus L.) in Peru.

PubMed

Durán, David; Rey, Luis; Mayo, Juan; Zúñiga-Dávila, Doris; Imperial, Juan; Ruiz-Argüeso, Tomás; Martínez-Romero, Esperanza; Ormeño-Orrillo, Ernesto

2014-06-01

A group of strains isolated from root nodules of Phaseolus lunatus (Lima bean) in Peru were characterized by genotypic, genomic and phenotypic methods. All strains possessed identical 16S rRNA gene sequences that were 99.9% identical to that of Bradyrhizobium lablabi CCBAU 23086(T). Despite having identical 16S rRNA gene sequences, the Phaseolus lunatus strains could be divided into two clades by sequence analysis of recA, atpD, glnII, dnaK and gyrB genes. The genome sequence of a representative of each clade was obtained and compared to the genomes of closely related species of the genus Bradyrhizobium. Average nucleotide identity values below the species circumscription threshold were obtained when comparing the two clades to each other (88.6%) and with all type strains of the genus Bradyrhizobium (≤92.9%). Phenotypes distinguishing both clades from all described and closely related species of the genus Bradyrhizobium were found. On the basis of the results obtained, two novel species, Bradyrhizobium paxllaeri sp. nov. (type strain LMTR 21(T) = DSM 18454(T) = HAMBI 2911(T)) and Bradyrhizobium icense sp. nov. (type strain LMTR 13(T) = HAMBI 3584(T) = CECT 8509(T) = CNPSo 2583(T)), are proposed to accommodate the uncovered clades of Phaseolus lunatus bradyrhizobia. These species share highly related but distinct nifH and nodC symbiosis genes. © 2014 IUMS.
Cloning and Sequence Analysis of Vibrio halioticoli Genes Encoding Three Types of Polyguluronate Lyase.

PubMed

Sugimura; Sawabe; Ezura

2000-01-01

The alginate lyase-coding genes of Vibrio halioticoli IAM 14596(T), which was isolated from the gut of the abalone Haliotis discus hannai, were cloned using plasmid vector pUC 18, and expressed in Escherichia coli. Three alginate lyase-positive clones, pVHB, pVHC, and pVHE, were obtained, and all clones expressed the enzyme activity specific for polyguluronate. Three genes, alyVG1, alyVG2, and alyVG3, encoding polyguluronate lyase were sequenced: alyVG1 from pVHB was composed of a 1056-bp open reading frame (ORF) encoding 352 amino acid residues; alyVG2 gene from pVHC was composed of a 993-bp ORF encoding 331 amino acid residues; and alyVG3 gene from pVHE was composed of a 705-bp ORF encoding 235 amino acid residues. Comparison of nucleotide and deduced amino acid sequences among AlyVG1, AlyVG2, and AlyVG3 revealed low homologies. The identity value between AlyVG1 and AlyVG2 was 18.7%, and that between AlyVG2 and AlyVG3 was 17.0%. A higher identity value (26.0%) was observed between AlyVG1 and AlyVG3. Sequence comparison among known polyguluronate lyases including AlyVG1, AlyVG2, and AlyVG3 also did not reveal an identical region in these sequences. However, AlyVG1 showed the highest identity value (36.2%) and the highest similarity (73.3%) to AlyA from Klebsiella pneumoniae. A consensus region comprising nine amino acids (YFKAGXYXQ) in the carboxy-terminal region previously reported by Mallisard and colleagues was observed only in AlyVG1 and AlyVG2.
Identification of a third feline Demodex species through partial sequencing of the 16S rDNA and frequency of Demodex species in 74 cats using a PCR assay.

PubMed

Ferreira, Diana; Sastre, Natalia; Ravera, Iván; Altet, Laura; Francino, Olga; Bardagí, Mar; Ferrer, Lluís

2015-08-01

Demodex cati and Demodex gatoi are considered the two Demodex species of cats. However, several reports have identified Demodex mites morphologically different from these two species. The differentiation of Demodex mites is usually based on morphology, but within the same species different morphologies can occur. DNA amplification/sequencing has been used effectively to identify and differentiate Demodex mites in humans, dogs and cats. The aim was to develop a PCR technique to identify feline Demodex mites and use this technique to investigate the frequency of Demodex in cats. Demodex cati, D. gatoi and Demodex mites classified morphologically as the third unnamed feline species were obtained. Hair samples were taken from 74 cats. DNA was extracted; a 330 bp fragment of the 16S rDNA was amplified and sequenced. The sequences of D. cati and D. gatoi shared >98% identity with those published on GenBank. The sequence of the third unnamed species showed 98% identity with a recently published feline Demodex sequence and only 75.2 and 70.9% identity with D. gatoi and D. cati sequences, respectively. Demodex DNA was detected in 19 of 74 cats tested; 11 DNA sequences corresponded to Demodex canis, five to Demodex folliculorum, three to D. cati and two to Demodex brevis. Three Demodex species can be found in cats, because the third unnamed Demodex species is likely to be a distinct species. Apart from D. cati and D. gatoi, DNA from D. canis, D. folliculorum and D. brevis was found on feline skin. © 2015 ESVD and ACVD.
