SVM-dependent pairwise HMM: an application to protein pairwise alignments.
Orlando, Gabriele; Raimondi, Daniele; Khan, Taushif; Lenaerts, Tom; Vranken, Wim F
2017-12-15
Methods able to provide reliable protein alignments are crucial for many bioinformatics applications. In recent years, many different algorithms have been developed, and various kinds of information, from sequence conservation to secondary structure, have been used to improve alignment performance. This is especially relevant for proteins with highly divergent sequences. However, recent work suggests that different features may have different importance in diverse protein classes, and it would be an advantage to have more customizable approaches capable of dealing with different alignment definitions. Here we present Rigapollo, a highly flexible pairwise alignment method based on a pairwise HMM-SVM that can use any type of information to build alignments. Rigapollo lets users decide the optimal features for aligning their protein class of interest. It outperforms current state-of-the-art methods on two well-known benchmark datasets when aligning highly divergent sequences. A Python implementation of the algorithm is available at http://ibsquare.be/rigapollo. Contact: wim.vranken@vub.be. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
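Rigapollo's central idea is that the score for aligning two residues can come from a trained classifier over arbitrary per-residue features, instead of a fixed substitution matrix. The sketch below illustrates only that idea and rests on several assumptions. Each residue is a numeric feature vector. The pair representation is the element-wise product of the two vectors. The match score is the decision value of an already-trained linear SVM whose weights `w` and bias `b` are hypothetical. The alignment itself is plain global dynamic programming with a linear gap penalty, not the pairwise HMM the authors use.

```python
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]


def linear_svm_score(w: Sequence[float], b: float) -> Callable[[Features, Features], float]:
    """Hypothetical match scorer: w . phi(x, y) + b, with phi = element-wise product."""
    def score(x: Features, y: Features) -> float:
        return sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)) + b
    return score


def align(xs: List[Features], ys: List[Features],
          match: Callable[[Features, Features], float],
          gap: float = -1.0) -> Tuple[float, List[Tuple[int, int]]]:
    """Global alignment; returns the optimal score and the aligned index pairs."""
    n, m = len(xs), len(ys)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + match(xs[i - 1], ys[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if F[i][j] == F[i - 1][j - 1] + match(xs[i - 1], ys[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return F[n][m], pairs
```

You can replace `linear_svm_score` with any other callable, such as a kernel SVM or a substitution-matrix lookup, and `align` stays unchanged. That separation is the kind of flexibility the abstract describes.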
Zhou, Carol L Ecale
2015-01-01
In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and there was a clear need for a tool that combines pairwise structure alignments while allowing gaps to be inserted in the reference structure. This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign were demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign helps users distinguish structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
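Below is a sketch of the bookkeeping behind a one-to-many merge. It assumes each input is a pair of equal-length gapped strings (reference row, partner row), and that every reference row ungaps to the same sequence. Each slot before, between, or after reference residues gets as many columns as the widest insertion any partner makes there, which is how gaps end up in the reference row. This illustrates the idea and is not CombAlign's code. One simplification: insertions from different partners that share a slot are left-justified, not aligned to each other.

```python
from typing import List, Tuple


def parse_pair(ref_aln: str, other_aln: str) -> Tuple[List[str], List[str]]:
    """Split one pairwise alignment into per-slot insertions and per-residue partners."""
    length = sum(c != '-' for c in ref_aln)
    inserts = [''] * (length + 1)  # inserts[r]: partner residues placed before reference residue r
    partner = [''] * length        # partner[r]: residue (or '-') aligned to reference residue r
    r = 0
    for a, b in zip(ref_aln, other_aln):
        if a == '-':
            if b != '-':
                inserts[r] += b
        else:
            partner[r] = b
            r += 1
    return inserts, partner


def combine(pairs: List[Tuple[str, str]]) -> List[str]:
    """Merge reference-vs-partner alignments into one gapped multiple alignment."""
    ref = pairs[0][0].replace('-', '')
    parsed = []
    for ref_aln, other_aln in pairs:
        if ref_aln.replace('-', '') != ref:
            raise ValueError("all alignments must share the same reference sequence")
        parsed.append(parse_pair(ref_aln, other_aln))
    length = len(ref)
    # Width of each slot = longest insertion any partner places there.
    width = [max(len(ins[r]) for ins, _ in parsed) for r in range(length + 1)]

    def build(inserts: List[str], residues: List[str]) -> str:
        out = []
        for r in range(length + 1):
            out.append(inserts[r].ljust(width[r], '-'))
            if r < length:
                out.append(residues[r])
        return ''.join(out)

    rows = [build([''] * (length + 1), list(ref))]
    rows += [build(ins, partner) for ins, partner in parsed]
    return rows
```

For example, `combine([("AC-GT", "ACTGT"), ("ACGT", "A-GT")])` opens one column in the reference row for the `T` inserted by the first partner.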
DIALIGN P: fast pair-wise and multiple sequence alignment using parallel processors.
Schmollinger, Martin; Nieselt, Kay; Kaufmann, Michael; Morgenstern, Burkhard
2004-09-09
Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) Pair-wise sequence alignments, which are used as a first step towards multiple alignment, account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the running time of DIALIGN by up to 97%. By distributing sub-routines to multiple processors, the running time of DIALIGN can be substantially improved. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope.
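Strategy (a) works because pairwise alignments are independent of each other. The sketch below shows that property, with two stand-ins. First, a plain Needleman-Wunsch score replaces DIALIGN's segment-based pairwise step. Second, any `concurrent.futures` executor can do the distribution. The result is a dict keyed by sequence-index pair, so it is the same whichever worker finishes first.

```python
from concurrent.futures import Executor
from itertools import combinations
from typing import Dict, List, Optional, Tuple


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Global alignment score with linear gaps (stand-in for DIALIGN's pairwise step)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def _score_job(job: Tuple[int, int, str, str]) -> Tuple[Tuple[int, int], int]:
    """Top-level function so that process pools can pickle it."""
    i, j, a, b = job
    return (i, j), nw_score(a, b)


def all_pairs(seqs: List[str], executor: Optional[Executor] = None) -> Dict[Tuple[int, int], int]:
    """Score every sequence pair, serially or on the given executor."""
    jobs = [(i, j, seqs[i], seqs[j]) for i, j in combinations(range(len(seqs)), 2)]
    if executor is None:
        return dict(map(_score_job, jobs))
    return dict(executor.map(_score_job, jobs))
```

In a script, pass a `ProcessPoolExecutor()` inside an `if __name__ == "__main__":` guard so worker processes can import the module safely.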
AlignMe—a membrane protein sequence alignment web server
Stamm, Marcus; Staritzbichler, René; Khafizov, Kamil; Forrest, Lucy R.
2014-01-01
We present a web server for pair-wise alignment of membrane protein sequences, using the program AlignMe. The server makes available two operational modes of AlignMe: (i) sequence to sequence alignment, taking two sequences in fasta format as input, combining information about each sequence from multiple sources and producing a pair-wise alignment (PW mode); and (ii) alignment of two multiple sequence alignments to create family-averaged hydropathy profile alignments (HP mode). For the PW sequence alignment mode, four different optimized parameter sets are provided, each suited to pairs of sequences with a specific similarity level. These settings utilize different types of inputs: (position-specific) substitution matrices, secondary structure predictions and transmembrane propensities from transmembrane predictions or hydrophobicity scales. In the second (HP) mode, each input multiple sequence alignment is converted into a hydrophobicity profile averaged over the provided set of sequence homologs; the two profiles are then aligned. The HP mode enables qualitative comparison of transmembrane topologies (and therefore potentially of 3D folds) of two membrane proteins, which can be useful if the proteins have low sequence similarity. In summary, the AlignMe web server provides user-friendly access to a set of tools for analysis and comparison of membrane protein sequences. Access is available at http://www.bioinfo.mpg.de/AlignMe PMID:24753425
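The first step of the HP mode, turning a multiple sequence alignment into a family-averaged hydropathy profile, can be sketched as below. Two assumptions apply. The scale is Kyte-Doolittle, which AlignMe may not use. Averaging is per alignment column over non-gap residues, followed by a centred sliding-window mean. Window size and averaging details in AlignMe itself may differ. Aligning the two resulting profiles is not shown.

```python
from typing import List, Optional

# Kyte-Doolittle hydropathy scale (assumed here; other scales are possible).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def averaged_profile(msa: List[str]) -> List[Optional[float]]:
    """Mean hydropathy of each column over its non-gap residues; None if a column has none."""
    profile = []
    for c in range(len(msa[0])):
        vals = [KD[s[c]] for s in msa if s[c] in KD]
        profile.append(sum(vals) / len(vals) if vals else None)
    return profile


def smooth(profile: List[Optional[float]], window: int = 3) -> List[Optional[float]]:
    """Centred sliding-window mean that skips None entries."""
    half = window // 2
    out = []
    for i in range(len(profile)):
        vals = [v for v in profile[max(0, i - half): i + half + 1] if v is not None]
        out.append(sum(vals) / len(vals) if vals else None)
    return out
```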
Saving the Best for Last? A Cross-Species Analysis of Choices between Reinforcer Sequences
ERIC Educational Resources Information Center
Andrade, Leonardo F.; Hackenberg, Timothy D.
2012-01-01
Two experiments were conducted to compare choices between sequences of reinforcers in pigeon (Experiment 1) and human (Experiment 2) subjects, using functionally analogous procedures. The subjects made pairwise choices among 3 sequence types, all of which provided the same overall reinforcement rate, but differed in their temporal patterning.…
Score distributions of gapped multiple sequence alignments down to the low-probability tail
NASA Astrophysics Data System (ADS)
Fieth, Pascal; Hartmann, Alexander K.
2016-08-01
Assessing the significance of alignment scores of optimally aligned DNA or amino acid sequences can be achieved via knowledge of the score distribution of random sequences. But this requires obtaining the distribution in the biologically relevant high-scoring region, where the probabilities are exponentially small. For gapless local alignments of infinitely long sequences, this distribution is known analytically to follow a Gumbel distribution. Distributions for gapped local alignments and global alignments of finite lengths can only be obtained numerically. To obtain results for the small-probability region, specific statistical mechanics-based rare-event algorithms can be applied. In previous studies, this was achieved for pairwise alignments. They showed that, contrary to results from previous simple sampling studies, strong deviations from the Gumbel distribution occur in the case of finite sequence lengths. Here we extend these studies to multiple sequence alignments with gaps, which are much more relevant for practical applications in molecular biology. We study the distributions of scores over a large range of the support, reaching probabilities as small as 10^-160, for global and local (sum-of-pair scores) multiple alignments. We find that even after suitable rescaling, eliminating the sequence-length dependence, the distributions for multiple alignment differ from the pairwise alignment case. Furthermore, we also show that the previously discussed Gaussian correction to the Gumbel distribution needs to be refined, also for the case of pairwise alignments.
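For reference, the analytical Gumbel result mentioned above is usually written in the Karlin-Altschul form P(S >= x) ≈ 1 - exp(-K m n e^(-λx)). Here m and n are the two sequence lengths, and λ and K depend on the scoring scheme. The function below only evaluates that formula, and any parameter values passed to it are placeholders. As the abstract stresses, this form is exactly what breaks down in the far tail for gapped and finite-length alignments.

```python
import math


def gumbel_pvalue(score: float, m: int, n: int, lam: float, K: float) -> float:
    """P(S >= score) under the Karlin-Altschul (Gumbel) approximation for gapless local alignment."""
    expected = K * m * n * math.exp(-lam * score)  # E-value: expected number of hits scoring >= score
    return -math.expm1(-expected)                  # 1 - exp(-E), computed accurately when E is tiny
```

`expm1` keeps the result accurate for very small E-values, where a naive `1 - math.exp(-E)` would round to zero.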
Breaking the computational barriers of pairwise genome comparison.
Torreno, Oscar; Trelles, Oswaldo
2015-08-11
Conventional pairwise sequence comparison software algorithms are being used to process much larger datasets than they were originally designed for. This can result in processing bottlenecks that limit software capabilities or prevent full use of the available hardware resources. Overcoming the barriers that limit the efficient computational analysis of large biological sequence datasets by retrofitting existing algorithms or by creating new applications represents a major challenge for the bioinformatics community. We have developed C libraries for pairwise sequence comparison within diverse architectures, ranging from commodity systems to high performance and cloud computing environments. Exhaustive tests were performed using different datasets of closely- and distantly-related sequences, ranging from small viral genomes to large mammalian chromosomes. The tests demonstrated that our solution is capable of generating high quality results with a linear-time response and controlled memory consumption, performing comparably to or faster than the current state-of-the-art methods. We have addressed the problem of pairwise and all-versus-all comparison of large sequences in general, greatly increasing the limits on input data size. The approach described here is based on a modular out-of-core strategy that uses secondary storage to avoid reaching memory limits during the identification of High-scoring Segment Pairs (HSPs) between the sequences under comparison. Software engineering concepts were applied to avoid intermediate result re-calculation, to minimise the performance impact of input/output (I/O) operations and to modularise the process, thus enhancing application flexibility and extendibility. Our computationally-efficient approach allows tasks such as the massive comparison of complete genomes, evolutionary event detection, the identification of conserved synteny blocks and inter-genome distance calculations to be performed more effectively.
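The memory-bounding idea can be sketched for the seed-finding step that usually precedes HSP extension. Only one chunk of the first sequence's k-mer index is held in memory at a time. In a real out-of-core tool each chunk would be read from, and its hits written to, secondary storage. This sketch just streams hits from a generator. It is an illustration under those assumptions, not the authors' C libraries.

```python
from typing import Dict, Iterator, List, Tuple


def seed_hits(a: str, b: str, k: int, chunk: int) -> Iterator[Tuple[int, int]]:
    """Yield (i, j) for every exact k-mer match a[i:i+k] == b[j:j+k].

    k-mer start positions in `a` are processed in blocks of `chunk`, so the index
    never holds more than `chunk` entries and no k-mer is missed or repeated.
    """
    n_kmers = max(len(a) - k + 1, 0)
    for start in range(0, n_kmers, chunk):
        stop = min(start + chunk, n_kmers)
        index: Dict[str, List[int]] = {}
        for i in range(start, stop):
            index.setdefault(a[i:i + k], []).append(i)
        # Stream the second sequence against the in-memory chunk index.
        for j in range(len(b) - k + 1):
            for i in index.get(b[j:j + k], ()):
                yield i, j
```

Smaller chunks trade extra passes over `b` for a smaller peak index size, which is the balance an out-of-core design tunes.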
Parente, Daniel J; Ray, J Christian J; Swint-Kruse, Liskin
2015-12-01
As proteins evolve, amino acid positions key to protein structure or function are subject to mutational constraints. These positions can be detected by analyzing sequence families for amino acid conservation or for coevolution between pairs of positions. Coevolutionary scores are usually rank-ordered and thresholded to reveal the top pairwise scores, but they can also be treated as weighted networks. Here, we used network analyses to bypass a major complication of coevolution studies: for a given sequence alignment, alternative algorithms usually identify different top pairwise scores. We reconciled results from five commonly used, mathematically divergent algorithms (ELSC, McBASC, OMES, SCA, and ZNMI), using the LacI/GalR and 1,6-bisphosphate aldolase protein families as models. Calculations used unthresholded coevolution scores from which column-specific properties such as sequence entropy and random noise were subtracted; "central" positions were identified by calculating various network centrality scores. When compared among algorithms, network centrality methods, particularly eigenvector centrality, showed markedly better agreement than comparisons of the top pairwise scores. Positions with large centrality scores occurred at key structural locations and/or were functionally sensitive to mutations. Further, the top central positions often differed from those with top pairwise coevolution scores: instead of a few strong scores, central positions often had multiple, moderate scores. We conclude that eigenvector centrality calculations reveal a robust evolutionary pattern of constraints, detectable by divergent algorithms, that occur at key protein locations. Finally, we discuss the fact that multiple patterns coexist in evolutionary data that, together, give rise to emergent protein functions. © 2015 Wiley Periodicals, Inc.
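Eigenvector centrality on a weighted coevolution network can be sketched with power iteration. The sketch assumes the input is a symmetric, non-negative matrix of already-corrected coevolution scores. Non-negativity matters because the Perron-Frobenius argument behind the method needs it. The iteration uses W + I instead of W. That shift leaves the eigenvectors unchanged but prevents the oscillation plain power iteration shows on bipartite networks.

```python
import math
from typing import List


def eigenvector_centrality(W: List[List[float]], iters: int = 1000, tol: float = 1e-12) -> List[float]:
    """Leading eigenvector of a symmetric non-negative matrix W, via power iteration on W + I."""
    n = len(W)
    x = [1.0] * n
    for _ in range(iters):
        # y = (W + I) x, then normalise to unit Euclidean length.
        y = [x[i] + sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        if max(abs(p - q) for p, q in zip(x, y)) < tol:
            return y
        x = y
    return x
```

On a star-shaped network, the hub scores highest even though each of its edges is no stronger than any other. This mirrors the abstract's observation that central positions often have many moderate scores, not a few strong ones.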
Hong, Seung Beom; Kim, Ki Cheol; Kim, Wook
2015-07-01
We generated complete mitochondrial DNA (mtDNA) control region sequences from 704 unrelated individuals residing in six major provinces in Korea. In addition to our earlier survey of the distribution of mtDNA haplogroup variation, a total of 560 different haplotypes characterized by 271 polymorphic sites were identified, of which 473 haplotypes were unique. The gene diversity and random match probability were 0.9989 and 0.0025, respectively. According to the pairwise comparison of the 704 control region sequences, the mean number of pairwise differences between individuals was 13.47±6.06. Based on the result of mtDNA control region sequences, pairwise FST genetic distances revealed genetic homogeneity of the Korean provinces on a peninsular level, except in samples from Jeju Island. This result indicates there may be a need to formulate a local mtDNA database for Jeju Island, to avoid bias in forensic parameter estimates caused by genetic heterogeneity of the population. Thus, the present data may help not only in personal identification but also in determining maternal lineages to provide an expanded and reliable Korean mtDNA database. These data will be available on the EMPOP database via accession number EMP00661. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
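The summary statistics reported above can be computed from haplotype frequencies p_i. The random match probability is Σp_i². Nei's unbiased gene diversity is n/(n−1)·(1 − Σp_i²). The mean number of pairwise differences averages the per-pair mismatch counts. As an arithmetic check, 704/703 × (1 − 0.0025) ≈ 0.9989, so the two reported values are consistent. The sketch assumes equal-length aligned haplotypes and counts every mismatching column as a difference, gaps included. Forensic mtDNA practice often applies more specific rules.

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def haplotype_stats(haps: List[str]) -> Tuple[float, float, float]:
    """Return (gene diversity, random match probability, mean pairwise differences)."""
    n = len(haps)
    freqs = [c / n for c in Counter(haps).values()]
    rmp = sum(p * p for p in freqs)            # chance two random samples share a haplotype
    gd = n / (n - 1) * (1 - rmp)               # Nei's unbiased gene (haplotype) diversity
    diffs = [sum(x != y for x, y in zip(a, b)) for a, b in combinations(haps, 2)]
    mean_pd = sum(diffs) / len(diffs)          # mean number of pairwise differences
    return gd, rmp, mean_pd
```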
Metabolic network prediction through pairwise rational kernels.
Roche-Lima, Abiel; Domaratzki, Michael; Fristensky, Brian
2014-09-26
Metabolic networks are represented by the set of metabolic pathways. Metabolic pathways are a series of biochemical reactions, in which the product (output) from one reaction serves as the substrate (input) to another reaction. Many pathways remain incompletely characterized. One of the major challenges of computational biology is to obtain better models of metabolic pathways. Existing models are dependent on the annotation of the genes. This leads to error accumulation when pathways are predicted from incorrectly annotated genes. Pairwise classification methods are supervised learning methods used to classify new pairs of entities. Some of these classification methods, e.g., Pairwise Support Vector Machines (SVMs), use pairwise kernels. Pairwise kernels describe similarity measures between two pairs of entities. Using pairwise kernels to handle sequence data requires long processing times and large storage. Rational kernels are kernels based on weighted finite-state transducers that represent similarity measures between sequences or automata. They have been effectively used in problems that handle large amounts of sequence information, such as protein essentiality, natural language processing and machine translation. We create a new family of pairwise kernels using weighted finite-state transducers (called Pairwise Rational Kernel (PRK)) to predict metabolic pathways from a variety of biological data. PRKs take advantage of the simpler representations and faster algorithms of transducers. Because raw sequence data can be used, the predictor model avoids the errors introduced by incorrect gene annotations. We then designed several experiments with PRKs and Pairwise SVM to validate our methods using the metabolic network of Saccharomyces cerevisiae. As a result, when PRKs are used, our method executes faster than with other pairwise kernels. 
Also, when we use PRKs combined with other simple kernels that include evolutionary information, accuracy improves while construction and execution times remain lower. The power of using kernels is that almost any sort of data can be represented using kernels. Therefore, completely disparate types of data can be combined to add power to kernel-based machine learning methods. When we compared our proposal using PRKs with other similar kernels, execution times decreased with no compromise of accuracy. We also showed that combining PRKs with other kernels that include evolutionary information can further improve accuracy. As our proposal can use any type of sequence data, genes do not need to be properly annotated, which avoids the accumulation of errors caused by incorrect previous annotations.
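Below is a sketch of how a pairwise kernel is built from a kernel on single sequences. It uses the common symmetric tensor-product construction, K((a,b),(c,d)) = k(a,c)·k(b,d) + k(a,d)·k(b,c). The base kernel k is a simple k-mer count (spectrum) kernel computed directly. It is a stand-in and does not use the weighted-transducer machinery that defines the PRKs. Both the construction and the base kernel are assumptions for illustration.

```python
from collections import Counter
from typing import Tuple


def spectrum_kernel(s: str, t: str, k: int = 2) -> int:
    """Dot product of the k-mer count vectors of s and t."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(count * ct[kmer] for kmer, count in cs.items())


def pairwise_kernel(p: Tuple[str, str], q: Tuple[str, str], k: int = 2) -> int:
    """Symmetric tensor-product pairwise kernel on unordered pairs of sequences."""
    (a, b), (c, d) = p, q
    return (spectrum_kernel(a, c, k) * spectrum_kernel(b, d, k)
            + spectrum_kernel(a, d, k) * spectrum_kernel(b, c, k))
```

The second product term makes the kernel ignore the order within each pair. That fits tasks like predicting whether two entities interact, where (a, b) and (b, a) describe the same pair.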
Bergin, Sarah M; Periaswamy, Balamurugan; Barkham, Timothy; Chua, Hong Choon; Mok, Yee Ming; Fung, Daniel Shuen Sheng; Su, Alex Hsin Chuan; Lee, Yen Ling; Chua, Ming Lai Ivan; Ng, Poh Yong; Soon, Wei Jia Wendy; Chu, Collins Wenhan; Tan, Siyun Lucinda; Meehan, Mary; Ang, Brenda Sze Peng; Leo, Yee Sin; Holden, Matthew T G; De, Partha; Hsu, Li Yang; Chen, Swaine L; de Sessions, Paola Florez; Marimuthu, Kalisvar
2018-05-09
OBJECTIVE: We report the utility of whole-genome sequencing (WGS), conducted in a clinically relevant time frame (ie, sufficient for guiding management decisions), in managing a Streptococcus pyogenes outbreak, and present a comparison of its performance with emm typing. SETTING: A 2,000-bed tertiary-care psychiatric hospital. METHODS: Active surveillance was conducted to identify new cases of S. pyogenes. WGS guided targeted epidemiological investigations, and infection control measures were implemented. Single-nucleotide polymorphism (SNP)-based genome phylogeny, emm typing, and multilocus sequence typing (MLST) were performed. We compared the ability of WGS and emm typing to correctly identify person-to-person transmission and to guide the management of the outbreak. RESULTS: The study included 204 patients and 152 staff. We identified 35 patients and 2 staff members with S. pyogenes. WGS revealed polyclonal S. pyogenes infections with 3 genetically distinct phylogenetic clusters (C1-C3). Cluster C1 isolates were all emm type 4 and sequence type 915, with pairwise SNP differences of 0-5, which suggested recent person-to-person transmission. Epidemiological investigation revealed that cluster C1 was mediated by dermal colonization and transmission of S. pyogenes in a male residential ward. Clusters C2 and C3 were genomically diverse, with pairwise SNP differences of 21-45 and 26-58, and were emm11 and mostly emm120, respectively. Although emm typing alone might have suggested person-to-person transmission within clusters C2 and C3, integrating the pairwise SNP differences from WGS with epidemiology showed such transmission to be unlikely. CONCLUSIONS: WGS had higher resolution than emm typing in identifying clusters with recent and ongoing person-to-person transmission, which allowed implementation of targeted interventions to control the outbreak. Infect Control Hosp Epidemiol 2018;1-9.
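The contrast between cluster C1 (0-5 SNPs apart) and clusters C2/C3 (21-58 SNPs apart) comes from pairwise SNP distances. A minimal way to turn those distances into candidate transmission groups is single-linkage clustering at a SNP threshold. The threshold is a hypothetical parameter, because the abstract gives no cutoff. The input is also assumed to be aligned core-genome sequences in which `N` and `-` mark uncalled sites. Like the authors' approach, real use would need epidemiological context.

```python
from itertools import combinations
from typing import Dict, List


def snp_distance(a: str, b: str) -> int:
    """Differing positions between two aligned sequences, skipping N and gap sites."""
    return sum(x != y for x, y in zip(a, b) if x not in "N-" and y not in "N-")


def transmission_clusters(isolates: Dict[str, str], threshold: int) -> List[List[str]]:
    """Single-linkage groups: isolates are linked when their SNP distance <= threshold."""
    names = list(isolates)
    parent = {name: name for name in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the union-find trees shallow
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if snp_distance(isolates[a], isolates[b]) <= threshold:
            parent[find(a)] = find(b)
    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())
```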
Polanski, A; Kimmel, M; Chakraborty, R
1998-05-12
Distribution of pairwise differences of nucleotides from data on a sample of DNA sequences from a given segment of the genome has been used in the past to draw inferences about the past history of population size changes. However, all earlier methods assume a given model of population size changes (such as sudden expansion), parameters of which (e.g., time and amplitude of expansion) are fitted to the observed distributions of nucleotide differences among pairwise comparisons of all DNA sequences in the sample. Our theory indicates that for any time-dependent population size, N(tau) (in which time tau is counted backward from present), a time-dependent coalescence process yields the distribution, p(tau), of the time of coalescence between two DNA sequences randomly drawn from the population. Prediction of p(tau) and N(tau) requires the use of an inverse Laplace transform, which is known to be unstable. Nevertheless, simulated data obtained from three models of monotone population change (stepwise, exponential, and logistic) indicate that the pattern of a past population size change leaves its signature on the pattern of DNA polymorphism. Application of the theory to the published mtDNA sequences indicates that the current mtDNA sequence variation is not inconsistent with a logistic growth of the human population.
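The forward direction of this theory, going from N(tau) to p(tau) to the distribution of pairwise differences, can be sketched numerically. Time is in coalescent units scaled by the present-day size, so the pairwise coalescence rate at tau is N(0)/N(tau). Under the infinite-sites model, two lineages that coalesced at tau differ at a Poisson(θ·tau) number of sites. These unit conventions are assumptions of the sketch. For constant size, the result should reduce to the geometric distribution θ^k/(1+θ)^(k+1). The inverse problem discussed in the abstract is not attempted.

```python
import math
from typing import Callable, List, Tuple


def coalescence_density(rel_size: Callable[[float], float], tmax: float,
                        steps: int) -> Tuple[List[float], List[float]]:
    """Grid and density p(tau) of pairwise coalescence time for relative size N(tau)/N(0)."""
    h = tmax / steps
    taus = [i * h for i in range(steps + 1)]
    rates = [1.0 / rel_size(t) for t in taus]  # coalescence hazard at each grid point
    p, cum = [], 0.0
    for i in range(len(taus)):
        if i > 0:
            cum += 0.5 * h * (rates[i - 1] + rates[i])  # trapezoid integral of the hazard
        p.append(rates[i] * math.exp(-cum))
    return taus, p


def mismatch_distribution(taus: List[float], p: List[float], theta: float, kmax: int) -> List[float]:
    """P(k differences) = integral of p(tau) * Poisson(k; theta * tau) d tau, k = 0..kmax."""
    h = taus[1] - taus[0]
    out = []
    for k in range(kmax + 1):
        f = [pi * math.exp(-theta * t) * (theta * t) ** k / math.factorial(k) for t, pi in zip(taus, p)]
        out.append(h * (sum(f) - 0.5 * (f[0] + f[-1])))  # trapezoid rule
    return out
```

Passing, for example, `lambda t: math.exp(-r * t)` as `rel_size` models past exponential growth at rate `r`. This shifts the mismatch distribution away from the geometric shape, which is the kind of signature the abstract refers to.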
Bastien, Olivier; Ortet, Philippe; Roy, Sylvaine; Maréchal, Eric
2005-03-10
Popular methods to reconstruct molecular phylogenies are based on multiple sequence alignments, in which addition or removal of data may change the resulting tree topology. We have sought a representation of homologous proteins that would conserve the information of pair-wise sequence alignments, respect probabilistic properties of Z-scores (Monte Carlo methods applied to pair-wise comparisons) and be the basis for a novel method of consistent and stable phylogenetic reconstruction. We have built a spatial representation of protein sequences using concepts from particle physics (configuration space) and respecting a frame of constraints deduced from pair-wise alignment score properties in information theory. The obtained configuration space of homologous proteins (CSHP) allows the representation of real and shuffled sequences, and thereupon an expression of the TULIP theorem for Z-score probabilities. Based on the CSHP, we propose a phylogeny reconstruction using Z-scores. Deduced trees, called TULIP trees, are consistent with multiple-alignment based trees. Furthermore, the TULIP tree reconstruction method provides a solution for some previously reported incongruent results, such as the apicomplexan enolase phylogeny. The CSHP is a unified model that conserves mutual information between proteins in the way physical models conserve energy. Applications include the reconstruction of evolutionarily consistent and robust trees, the topology of which is based on a spatial representation that is not reordered after addition or removal of sequences. The CSHP and its assigned phylogenetic topology provide a powerful and easily updated representation for massive pair-wise genome comparisons based on Z-score computations.
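The Monte Carlo Z-score mentioned above compares an observed pairwise alignment score with the scores obtained after shuffling one sequence. Shuffling keeps the composition and destroys the order. Below is a minimal sketch, assuming a simple Needleman-Wunsch score with match 1, mismatch −1 and linear gap −2, and assuming that only the second sequence is shuffled. The Z-score computations used for the CSHP may differ in scoring scheme and shuffling protocol.

```python
import random
import statistics
from typing import Callable


def global_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch score with a linear gap penalty (illustrative scoring only)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def shuffle_z_score(a: str, b: str, score: Callable[[str, str], float] = global_score,
                    n_shuffles: int = 200, seed: int = 0) -> float:
    """(observed - mean of shuffled scores) / standard deviation of shuffled scores."""
    rng = random.Random(seed)
    observed = score(a, b)
    letters = list(b)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keep b's residue composition, randomise its order
        null.append(score(a, "".join(letters)))
    sd = statistics.stdev(null)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance; Z-score undefined")
    return (observed - statistics.mean(null)) / sd
```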
Delineating slowly and rapidly evolving fractions of the Drosophila genome.
Keith, Jonathan M; Adams, Peter; Stephen, Stuart; Mattick, John S
2008-05-01
Evolutionary conservation is an important indicator of function and a major component of bioinformatic methods to identify non-protein-coding genes. We present a new Bayesian method for segmenting pairwise alignments of eukaryotic genomes while simultaneously classifying segments into slowly and rapidly evolving fractions. We also describe an information criterion similar to the Akaike Information Criterion (AIC) for determining the number of classes. Working with pairwise alignments enables detection of differences in conservation patterns among closely related species. We analyzed three whole-genome and three partial-genome pairwise alignments among eight Drosophila species. Three distinct classes of conservation level were detected. Sequences comprising the most slowly evolving component were consistent across a range of species pairs, and constituted approximately 62-66% of the D. melanogaster genome. Almost all (>90%) of the aligned protein-coding sequence is in this fraction, suggesting much of it (comprising the majority of the Drosophila genome, including approximately 56% of non-protein-coding sequences) is functional. The size and content of the most rapidly evolving component were species dependent and varied from 1.6% to 4.8%. This fraction is also enriched for protein-coding sequence (while containing significant amounts of non-protein-coding sequence), suggesting it is under positive selection. We also classified segments according to conservation and GC content simultaneously. This analysis identified numerous sub-classes of those identified on the basis of conservation alone, but was nevertheless consistent with that classification. Software, data, and results are available at www.maths.qut.edu.au/-keithj/. Genomic segments comprising the conservation classes are available in BED format.
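The idea of grouping alignment regions into conservation classes and choosing the number of classes with an AIC-like criterion can be illustrated with a much simpler model. This is not the authors' Bayesian segmentation. Fixed windows replace inferred segments. Each window is summarised as (identical sites, aligned sites). A mixture of binomials is fitted by EM, and plain AIC = 2·(2k − 1) − 2·log L compares class counts.

```python
import math
from typing import List, Sequence, Tuple

Window = Tuple[int, int]  # (identical sites, aligned sites) in one alignment window


def _clamp(x: float) -> float:
    return min(max(x, 1e-6), 1 - 1e-6)


def _e_step(windows: Sequence[Window], w: List[float], p: List[float]):
    """Class responsibilities and mixture log-likelihood (binomial coefficients omitted)."""
    resp, ll = [], 0.0
    for m, n in windows:
        logs = [math.log(wc) + m * math.log(pc) + (n - m) * math.log(1 - pc) for wc, pc in zip(w, p)]
        top = max(logs)
        total = sum(math.exp(v - top) for v in logs)
        ll += top + math.log(total)
        resp.append([math.exp(v - top) / total for v in logs])
    return resp, ll


def fit_mixture(windows: Sequence[Window], k: int, iters: int = 200):
    """EM for a k-class binomial mixture; returns (weights, identity rates, log-likelihood)."""
    obs = sorted(m / n for m, n in windows)
    p = [_clamp(obs[int((c + 0.5) * len(obs) / k)]) for c in range(k)]  # start at spread quantiles
    w = [1.0 / k] * k
    for _ in range(iters):
        resp, _ = _e_step(windows, w, p)
        for c in range(k):
            w[c] = max(sum(r[c] for r in resp) / len(windows), 1e-12)
            num = sum(r[c] * m for r, (m, _n) in zip(resp, windows))
            den = sum(r[c] * n for r, (_m, n) in zip(resp, windows))
            if den > 0:
                p[c] = _clamp(num / den)
    _, ll = _e_step(windows, w, p)
    return w, p, ll


def aic(windows: Sequence[Window], k: int) -> float:
    """AIC with 2k - 1 free parameters (k rates, k - 1 independent weights)."""
    _, _, ll = fit_mixture(windows, k)
    return 2 * (2 * k - 1) - 2 * ll
```

The omitted binomial coefficients are the same for every k, so they cancel when AIC values are compared.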
Bazsalovicsová, Eva; Králová-Hromadová, Ivica; Stefka, Jan; Scholz, Tomáš
2012-05-01
Sequence structure of complete internal transcribed spacer 1 and 2 (ITS1 and ITS2) of the ribosomal DNA region and partial mitochondrial cytochrome c oxidase subunit I (cox1) gene sequences were studied in the monozoic tapeworm Atractolytocestus sagittatus (Kulakovskaya et Akhmerov, 1965) (Cestoda: Caryophyllidea), a parasite of common carp (Cyprinus carpio carpio L.). Intraindividual sequence diversity was observed in both ribosomal spacers. In ITS1, a total of 19 recombinant clones yielded eight different sequence types (pairwise sequence identity, 99.7-100%), which, however, did not resemble the structure typical for divergent intragenomic ITS copies (paralogues). Polymorphism was displayed by several single nucleotide mutations present exclusively in single clones, but variation in the number of short repetitive motifs was not observed. In ITS2, a total of 21 recombinant clones yielded ten different sequence types (pairwise sequence identity, 97.5-100%). They were mostly characterized by a varying number of (TCGT)n repeats, which sorted the ITS2 sequences into two sequence variants reflecting the structure specific for ITS paralogues. The third DNA region analysed, the mitochondrial cox1 gene (669 bp), was 100% identical in all studied A. sagittatus individuals. Comparison of molecular data on A. sagittatus with those on Atractolytocestus huronensis Anthony, 1958, an invasive parasite of common carp, has shown that interspecific differences significantly exceeded intraspecific variation in both ribosomal spacers (81.4-82.5% in ITS1, 74.4-75.2% in ITS2) as well as in mitochondrial cox1, which confirms the validity of both congeneric tapeworms parasitic in the same fish host.
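Reported ranges such as "pairwise sequence identity, 97.5-100%" depend on how identity is defined for gapped alignments. Below is a minimal sketch under one common convention, which is an assumption. Columns where both sequences have a gap are skipped. A residue opposite a gap counts as a mismatch. Distinct sequence types are counted after removing gaps.

```python
from itertools import combinations
from typing import List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Identity over aligned columns, skipping gap-gap columns; residue-vs-gap is a mismatch."""
    cols = [(x, y) for x, y in zip(a, b) if not (x == '-' and y == '-')]
    same = sum(x == y for x, y in cols)
    return 100.0 * same / len(cols)


def identity_range(aligned: List[str]) -> Tuple[float, float]:
    """Minimum and maximum pairwise identity among aligned clone sequences."""
    vals = [percent_identity(a, b) for a, b in combinations(aligned, 2)]
    return min(vals), max(vals)


def sequence_types(aligned: List[str]) -> int:
    """Number of distinct ungapped sequences (sequence types)."""
    return len({s.replace('-', '') for s in aligned})
```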
Krajewski, C; Fain, M G; Buckley, L; King, D G
1999-11-01
Debates over whether molecular sequence data should be partitioned for phylogenetic analysis often confound two types of heterogeneity among partitions. We distinguish historical heterogeneity (i.e., different partitions have different evolutionary relationships) from dynamic heterogeneity (i.e., different partitions show different patterns of sequence evolution) and explore the impact of the latter on phylogenetic accuracy and precision with a two-gene, mitochondrial data set for cranes. The well-established phylogeny of cranes allows us to contrast tree-based estimates of relevant parameter values with estimates based on pairwise comparisons and to ascertain the effects of incorporating different amounts of process information into phylogenetic estimates. We show that codon positions in the cytochrome b and NADH dehydrogenase subunit 6 genes are dynamically heterogeneous under both Poisson and invariable-sites + gamma-rates versions of the F84 model and that heterogeneity includes variation in base composition and transition bias as well as substitution rate. Estimates of transition-bias and relative-rate parameters from pairwise sequence comparisons were comparable to those obtained as tree-based maximum likelihood estimates. Neither rate-category nor mixed-model partitioning strategies resulted in a loss of phylogenetic precision relative to unpartitioned analyses. We suggest that weighted-average distances provide a computationally feasible alternative to direct maximum likelihood estimates of phylogeny for mixed-model analyses of large, dynamically heterogeneous data sets. Copyright 1999 Academic Press.
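Pairwise estimates of transition bias start from counting transitions (A↔G, C↔T) and transversions between two aligned sequences. The sketch below does that and turns the proportions into a Kimura two-parameter distance, d = −½·ln(1 − 2P − Q) − ¼·ln(1 − 2Q). K2P is swapped in for simplicity. It is the special case of the F84 model with equal base frequencies, not the F84 estimates used in the study. Sites with gaps or ambiguity codes are skipped. `math.log` raises a `ValueError` once the sequences are too divergent for the correction (1 − 2P − Q ≤ 0).

```python
import math
from typing import Tuple

PURINES = set("AG")


def ts_tv_proportions(a: str, b: str) -> Tuple[float, float]:
    """Proportions of transition (P) and transversion (Q) differences over unambiguous sites."""
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in "ACGT" and y in "ACGT"]
    n = len(sites)
    ts = sum(1 for x, y in sites if x != y and (x in PURINES) == (y in PURINES))
    tv = sum(1 for x, y in sites if (x in PURINES) != (y in PURINES))
    return ts / n, tv / n


def k2p_distance(a: str, b: str) -> float:
    """Kimura two-parameter distance (substitutions per site)."""
    P, Q = ts_tv_proportions(a, b)
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)
```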
Birky, C William
2013-01-01
Phylogenetic trees of DNA sequences of a group of specimens may include clades of two kinds: those produced by stochastic processes (random genetic drift) within a species, and clades that represent different species. The ratio of the mean pairwise sequence difference between a pair of clades (K) to the mean pairwise sequence difference within a clade (θ) can be used to determine whether the clades are samples from different species (K/θ ≥ 4) or the same species (K/θ < 4) with probability ≥ 0.95. Previously I applied this criterion to delimit species of asexual organisms. Here I use data from the literature to show how it can also be applied to delimit sexual species using four groups of sexual organisms as examples: ravens, spotted leopards, sea butterflies, and liverworts. Mitochondrial or chloroplast genes are used because these segregate earlier during speciation than most nuclear genes and hence detect earlier stages of speciation. In several cases the K/θ ratio was greater than 4, confirming the original authors' intuition that the clades were sufficiently different to be assigned to different species. But the K/θ ratio split each of two liverwort species into two evolutionary species, and showed that support for the distinction between the common and Chihuahuan raven species is weak. I also discuss some possible sources of error in using the K/θ ratio; the most significant one would be cases where males migrate between different populations but females do not, making the use of maternally inherited organelle genes problematic. The K/θ ratio must be used with some caution, like all other methods for species delimitation. Nevertheless, it is a simple theory-based quantitative method for using DNA sequences to make rigorous decisions about species delimitation in sexual as well as asexual eukaryotes.
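Here is a bare-bones sketch of the K/θ comparison. It uses raw p-distances between equal-length aligned sequences and applies none of the corrections the published method may use. θ is estimated within each clade, and the larger value is kept as a conservative choice, which is an assumption of this sketch. Each clade needs at least two sequences and some within-clade variation, or the ratio is undefined.

```python
from itertools import combinations, product
from statistics import mean
from typing import List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two equal-length aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def k_theta_ratio(clade1: List[str], clade2: List[str]) -> float:
    """K (mean between-clade difference) divided by the larger within-clade theta."""
    K = mean(p_distance(a, b) for a, b in product(clade1, clade2))
    theta = max(mean(p_distance(a, b) for a, b in combinations(c, 2)) for c in (clade1, clade2))
    return K / theta


def different_species(clade1: List[str], clade2: List[str]) -> bool:
    """Apply the K/theta >= 4 rule stated in the abstract."""
    return k_theta_ratio(clade1, clade2) >= 4
```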
SFESA: a web server for pairwise alignment refinement by secondary structure shifts.
Tong, Jing; Pei, Jimin; Grishin, Nick V
2015-09-03
Protein sequence alignment is essential for a variety of tasks such as homology modeling and active site prediction. Alignment errors remain the main cause of low-quality structure models. A bioinformatics tool to refine alignments is needed to make protein alignments more accurate. We developed the SFESA web server to refine pairwise protein sequence alignments. Unlike the previous version of SFESA, which required a set of 3D coordinates for a protein, the new server searches a sequence database for the closest homolog with an available 3D structure to be used as a template. For each alignment block defined by secondary structure elements in the template, SFESA evaluates alignment variants generated by local shifts and selects the best-scoring alignment variant. A scoring function that combines the sequence score of profile-profile comparison and the structure score of template-derived contact energy is used for evaluation of alignments. PROMALS pairwise alignments refined by SFESA are more accurate than those produced by current advanced alignment methods such as HHpred and CNFpred. SFESA also improves alignments generated by other software. SFESA is a web-based tool for alignment refinement, designed for researchers to compute, refine, and evaluate pairwise alignments with a combined sequence and structure scoring of alignment blocks. To our knowledge, the SFESA web server is the only tool that refines alignments by evaluating local shifts of secondary structure elements. The SFESA web server is available at http://prodata.swmed.edu/sfesa.
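Below is a simplified sketch of the block-shift search. A template block (for example, one helix) is aligned ungapped to the query at some offset. Nearby offsets are tried, and the best total score is kept. The sketch assumes ungapped blocks and a caller-supplied per-residue `pair_score`. It also takes an optional `structure_score(offset)` callable with a `weight`, both hypothetical stand-ins for SFESA's profile-profile and contact-energy terms. When two offsets tie, the first one tried (the most negative shift) is kept.

```python
from typing import Callable, Optional, Tuple


def best_block_shift(template: str, query: str, start: int, length: int,
                     offset: int, max_shift: int,
                     pair_score: Callable[[str, str], float],
                     structure_score: Optional[Callable[[int], float]] = None,
                     weight: float = 1.0) -> Tuple[int, float]:
    """Return (best offset, best total score) for template[start:start+length].

    Template position start+i is paired with query position start+offset+i.
    """
    best: Optional[Tuple[int, float]] = None
    for shift in range(-max_shift, max_shift + 1):
        off = offset + shift
        if start + off < 0 or start + off + length > len(query):
            continue  # the shifted block would run off the query
        seq = sum(pair_score(template[start + i], query[start + off + i]) for i in range(length))
        total = seq + (weight * structure_score(off) if structure_score else 0.0)
        if best is None or total > best[1]:
            best = (off, total)
    if best is None:
        raise ValueError("no shift keeps the block inside the query")
    return best
```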
Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading.
Rahn, René; Budach, Stefan; Costanza, Pascal; Ehrhardt, Marcel; Hancox, Jonny; Reinert, Knut
2018-05-03
Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable to a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting SIMD (Single Instruction Multiple Data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we a) distribute many independent alignments on multiple threads and b) inherently parallelize a single alignment computation using a work stealing approach producing a dynamic wavefront progressing along the minor diagonal. We evaluated our alignment vectorization and parallelization across different use cases and processors, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than executing them with our previous sequential alignment module. The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4 under the BSD license. We support SSE4, AVX2 and AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, to extend our module for further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms. Contact: rene.rahn@fu-berlin.de.
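The wavefront in b) rests on a dependency fact. Cell (i, j) of the dynamic programming matrix needs only cells (i−1, j−1), (i−1, j) and (i, j−1). Those cells lie on the two previous anti-diagonals. So all cells on one anti-diagonal can be computed in any order or in parallel. The sequential Python sketch below makes that order explicit and checks it against the usual row-by-row fill. It shows only the dependency structure, not SeqAn's vectorized or work-stealing implementation, and the scoring parameters are arbitrary.

```python
def _cell(F, a: str, b: str, i: int, j: int, match: int, mismatch: int, gap: int) -> int:
    """Needleman-Wunsch recurrence for one cell, with linear gap cost."""
    if i == 0 or j == 0:
        return (i + j) * gap
    s = match if a[i - 1] == b[j - 1] else mismatch
    return max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)


def nw_rowmajor(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Reference fill, row by row."""
    F = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        for j in range(len(b) + 1):
            F[i][j] = _cell(F, a, b, i, j, match, mismatch, gap)
    return F[-1][-1]


def nw_wavefront(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Same recurrence, filled one anti-diagonal (i + j = d) at a time."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for d in range(n + m + 1):
        # Every cell on this anti-diagonal depends only on diagonals d-1 and d-2,
        # so this inner loop could be split across threads or SIMD lanes.
        for i in range(max(0, d - m), min(n, d) + 1):
            F[i][d - i] = _cell(F, a, b, i, d - i, match, mismatch, gap)
    return F[n][m]
```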
Analysis of Neuronal Sequences Using Pairwise Biases
2015-08-27
semantic memory (knowledge of facts) and implicit memory (e.g., how to ride a bike). Evidence for the participation of the hippocampus in the formation of...hippocampal formation in an attempt to be cured of severe epileptic seizures. Although the surgery was successful with regard to reducing the frequency and...very different from each other in many ways, including duration and number of spikes. Still, these sequences share a similar trend in the general order
Newton, L A; Chilton, N B; Beveridge, I; Gasser, R B
1998-02-01
Genetic differences among Nematodirus spathiger, Nematodirus filicollis, Nematodirus helvetianus and Nematodirus battus in the nucleotide sequence of the second internal transcribed spacer (ITS-2) of ribosomal DNA ranged from 3.9 to 24.7%. Pairwise comparisons of their ITS-2 sequences indicated that the most genetically similar species were N. spathiger and N. helvetianus. N. battus was the most genetically distinct species, with differences ranging from 22.8 to 24.7% with respect to the other three species. Some of the nucleotide differences among species provided different endonuclease restriction sites that could be used in restriction fragment length polymorphism studies. The ITS-2 sequence data may prove useful in studies of the systematics of molineid nematodes.
Bioinformatic prediction and in vivo validation of residue-residue interactions in human proteins
NASA Astrophysics Data System (ADS)
Jordan, Daniel; Davis, Erica; Katsanis, Nicholas; Sunyaev, Shamil
2014-03-01
Identifying residue-residue interactions in protein molecules is important for understanding both protein structure and function in the context of evolutionary dynamics and medical genetics. Such interactions can be difficult to predict using existing empirical or physical potentials, especially when residues are far from each other in sequence space. Using a multiple sequence alignment of 46 diverse vertebrate species, we explore the space of allowed sequences for orthologous protein families. Amino acid changes that are known to damage protein function allow us to identify specific changes that are likely to have interacting partners. By fitting the parameters of the continuous-time Markov process used in the alignment, we conclude that these interactions are primarily pairwise rather than higher order. Candidates for sites under pairwise epistasis are predicted, which can then be tested by experiment. We report the results of an initial round of in vivo experiments in a zebrafish model that verify the presence of multiple pairwise interactions predicted by our model. These experimentally validated interactions are novel, distant in sequence, and are not readily explained by known biochemical or biophysical features.
Bào, Yīmíng; Kuhn, Jens H
2018-01-01
During the last decade, genome sequence-based classification of viruses has become increasingly prominent. Viruses can even be classified based on coding-complete genome sequence data alone. Nevertheless, classification remains arduous, as experts are required to establish phylogenetic trees to depict the evolutionary relationships of such sequences for preliminary taxonomic placement. Pairwise sequence comparison (PASC) of genomes is one of several novel methods for establishing relationships among viruses. This method, provided by the US National Center for Biotechnology Information as an open-access tool, circumvents phylogenetics, and yet PASC results are often in agreement with those of phylogenetic analyses. Because it is computationally inexpensive, PASC can be easily performed by non-taxonomists. Here we describe how to use the PASC tool for the preliminary classification of novel viral hemorrhagic fever-causing viruses.
Multiple alignment-free sequence comparison
Ren, Jie; Song, Kai; Sun, Fengzhu; Deng, Minghua; Reinert, Gesine
2013-01-01
Motivation: Recently, a range of new statistics has become available for the alignment-free comparison of two sequences based on k-tuple word content. Here, we extend these statistics to the simultaneous comparison of more than two sequences. Our suite of statistics contains, first, two extensions of statistics for pairwise comparison of the joint k-tuple content of all the sequences, and second, three averages of sums of pairwise comparison statistics. The two tasks we consider are, first, to identify sequences that are similar to a set of target sequences, and, second, to measure the similarity within a set of sequences. Results: Our investigation uses both simulated data and cis-regulatory module data, where the task is to identify cis-regulatory modules with similar transcription factor binding sites. We find that although all of our statistics show similar performance on real data, on simulated data the Shepp-type statistics are in some instances outperformed by star-type statistics. The multiple alignment-free statistics are more sensitive to contamination in the data than the pairwise average statistics. Availability: Our implementation of the five statistics is available as an R package named ‘multiAlignFree’ at http://www-rcf.usc.edu/∼fsun/Programs/multiAlignFree/multiAlignFreemain.html. Contact: reinert@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23990418
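The simplest k-tuple statistic of this family is the raw D2 count, the sum over all words w of X_w · Y_w, where X_w and Y_w count occurrences of w in the two sequences. The sketch below computes D2 and a plain average of D2 over all pairs in a set, to show what "averages of sums of pairwise comparison statistics" means in the simplest case. The paper's star-type and Shepp-type statistics standardize these counts and are not reproduced here; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def kmer_counts(seq: str, k: int) -> Counter:
    """Occurrences of every overlapping k-tuple (word) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x: str, y: str, k: int) -> int:
    """Raw D2: sum over shared words of count_x(w) * count_y(w)."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(n * cy[w] for w, n in cx.items())  # missing words count as 0


def mean_pairwise_d2(seqs: Sequence[str], k: int) -> float:
    """Average of D2 over all unordered pairs of sequences in the set."""
    pairs = list(combinations(seqs, 2))
    return sum(d2(a, b, k) for a, b in pairs) / len(pairs)
```

For instance, `d2("ACGT", "ACGT", 2)` is 3 because the words AC, CG and GT each occur once in both sequences.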
Estimation of pairwise sequence similarity of mammalian enhancers with word neighbourhood counts.
Göke, Jonathan; Schulz, Marcel H; Lasserre, Julia; Vingron, Martin
2012-03-01
The identity of cells and tissues is to a large degree governed by transcriptional regulation. A major part is accomplished by the combinatorial binding of transcription factors at regulatory sequences, such as enhancers. Even though binding of transcription factors is sequence-specific, estimating the sequence similarity of two functionally similar enhancers is very difficult. However, a similarity measure for regulatory sequences is crucial to detect and understand functional similarities between two enhancers and will facilitate large-scale analyses like clustering, prediction and classification of genome-wide datasets. We present the standardized alignment-free sequence similarity measure N2, a flexible framework that is defined for word neighbourhoods. We explore the usefulness of adding reverse complement words as well as words including mismatches into the neighbourhood. On simulated enhancer sequences as well as functional enhancers in mouse development, N2 is shown to outperform previous alignment-free measures. N2 is flexible, faster than competing methods and less susceptible to single sequence noise and the occurrence of repetitive sequences. Experiments on the mouse enhancers reveal that enhancers active in different tissues can be separated by pairwise comparison using N2. N2 represents an improvement over previous alignment-free similarity measures without compromising speed, which makes it a good candidate for large-scale sequence comparison of regulatory sequences. The software is part of the open-source C++ library SeqAn (www.seqan.de) and a compiled version can be downloaded at http://www.seqan.de/projects/alf.html. Supplementary data are available at Bioinformatics online.
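One ingredient the abstract mentions, adding reverse-complement words to a word's neighbourhood, can be sketched by folding each k-mer together with its reverse complement before counting. The comparison below uses plain cosine similarity of the folded count vectors; it is a simplified illustration under that assumption, not the standardized N2 measure, and it omits the mismatch neighbourhoods. The function names are hypothetical.

```python
import math
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(word: str) -> str:
    """Reverse complement of a DNA word (A/C/G/T only)."""
    return word.translate(_COMPLEMENT)[::-1]


def canonical_counts(seq: str, k: int) -> Counter:
    """Count k-mers, merging each word with its reverse complement."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        w = seq[i:i + k]
        counts[min(w, revcomp(w))] += 1  # one key per word/revcomp pair
    return counts


def cosine_similarity(x: str, y: str, k: int) -> float:
    """Cosine of the strand-folded k-mer count vectors of x and y."""
    cx, cy = canonical_counts(x, k), canonical_counts(y, k)
    dot = sum(n * cy[w] for w, n in cx.items())
    norm = math.sqrt(sum(n * n for n in cx.values())) * math.sqrt(sum(n * n for n in cy.values()))
    return dot / norm if norm else 0.0
```

Because words are folded across strands, an enhancer and its reverse complement get a similarity of 1, which is the point of including reverse-complement words.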
Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Daily, Jeffrey A.
Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375 residue query sequence, a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar's 'striped' approach. When using only a single thread, parasail was 1.7 times faster than Rognes's SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64-bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments.
Daily, Jeff
2016-02-10
Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. A faster intra-sequence local pairwise alignment implementation is described and benchmarked, including new global and semi-global variants. Using a 375 residue query sequence, a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 24-core processor system, the highest reported for an implementation based on Farrar's 'striped' approach. Rognes's SWIPE optimal database search application is still generally the fastest available, running 1.2 to at most 2.4 times faster than Parasail for sequences shorter than 500 amino acids; however, Parasail was faster for longer sequences. For global alignments, Parasail's prefix scan implementation is generally the fastest, faster even than Farrar's 'striped' approach, although the opal library is faster for single-threaded applications. The software library is designed for 64-bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. Applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
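Libraries like Parasail vectorize the standard Smith-Waterman recurrences with affine gaps (Gotoh's three-state form). The scalar sketch below shows those recurrences with two rolling rows, together with the GCUPS arithmetic behind the reported speeds (DP cells filled divided by elapsed seconds, in units of 10^9). The scores, the gap convention (a gap of length L costs `gap_open + (L - 1) * gap_extend`) and the function names are assumptions for illustration, not Parasail's API.

```python
def smith_waterman_affine(a: str, b: str, match: int = 2, mismatch: int = -1,
                          gap_open: int = 3, gap_extend: int = 1) -> int:
    """Best local alignment score using Gotoh recurrences and rolling rows."""
    neg = float("-inf")
    m = len(b)
    h_prev = [0] * (m + 1)    # H[i-1][j]: best local alignment ending at (i-1, j)
    f_prev = [neg] * (m + 1)  # F[i-1][j]: ending with a vertical gap (consumes a)
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (m + 1)
        f_cur = [neg] * (m + 1)
        e = neg               # E[i][j-1]: ending with a horizontal gap (consumes b)
        for j in range(1, m + 1):
            e = max(e - gap_extend, h_cur[j - 1] - gap_open)
            f_cur[j] = max(f_prev[j] - gap_extend, h_prev[j] - gap_open)
            s = match if a[i - 1] == b[j - 1] else mismatch
            h_cur[j] = max(0, h_prev[j - 1] + s, e, f_cur[j])  # 0 = restart (local)
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


def gcups(query_len: int, db_residues: int, seconds: float) -> float:
    """Giga cell updates per second: DP cells computed / elapsed time / 1e9."""
    return query_len * db_residues / seconds / 1e9
```

With these parameters, `smith_waterman_affine("ACGTTTACGT", "ACGTACGT")` returns 12: eight matches (16) minus one gap of length 2 (3 + 1).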
Remarkable sequence conservation of the last intron in the PKD1 gene.
Rodova, Marianna; Islam, M Rafiq; Peterson, Kenneth R; Calvet, James P
2003-10-01
The last intron of the PKD1 gene (intron 45) was found to have exceptionally high sequence conservation across four mammalian species: human, mouse, rat, and dog. This conservation did not extend to the comparable intron in pufferfish. Pairwise comparisons for intron 45 showed 91% identity (human vs. dog) to 100% identity (mouse vs. rat) for an average for all four species of 94% identity. In contrast, introns 43 and 44 of the PKD1 gene had average pairwise identities of 57% and 54%, and exons 43, 44, and 45 and the coding region of exon 46 had average pairwise identities of 80%, 84%, 82%, and 80%. Intron 45 is 90 to 95 bp in length, with the major region of sequence divergence being in a central 4-bp to 9-bp variable region. RNA secondary structure analysis of intron 45 predicts a branching stem-loop structure in which the central variable region lies in one loop and the putative branch point sequence lies in another loop, suggesting that the intron adopts a specific stem-loop structure that may be important for its removal. Although intron 45 appears to conform to the class of small, G-triplet-containing introns that are spliced by a mechanism utilizing intron definition, its high sequence conservation may be a reflection of constraints imposed by a unique mechanism that coordinates splicing of this last PKD1 intron with polyadenylation.
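Pairwise and average pairwise identities like those reported here can be computed from an alignment as sketched below. The abstract does not state how gapped columns were handled, so this sketch assumes one common convention: identity is measured only over columns where neither sequence has a gap. The function names are illustrative.

```python
from itertools import combinations
from typing import Sequence


def percent_identity(s1: str, s2: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither sequence is gapped."""
    if len(s1) != len(s2):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(s1, s2) if x != gap and y != gap]
    if not compared:
        return 0.0
    same = sum(1 for x, y in compared if x == y)
    return 100.0 * same / len(compared)


def mean_pairwise_identity(aligned: Sequence[str]) -> float:
    """Average percent identity over all unordered pairs of aligned sequences."""
    pairs = list(combinations(aligned, 2))
    return sum(percent_identity(x, y) for x, y in pairs) / len(pairs)
```

With four species there are six pairs, so an average such as the 94% above is the mean of six pairwise values.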
Jacquin, Hugo; Gilson, Amy; Shakhnovich, Eugene; Cocco, Simona; Monasson, Rémi
2016-05-01
Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSA) are emerging as powerful tools in computational biology. However, the underlying assumptions about the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use lattice protein (LP) models to benchmark those inverse statistical approaches. We build MSA of highly stable sequences in target LP structures, and infer the effective pairwise Potts Hamiltonians from those MSA. We find that inferred Potts Hamiltonians reproduce many important aspects of 'true' LP structures and energetics. Careful analysis reveals that effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of the native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models, when used as protein Hamiltonians for the design of new sequences, are able to generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. These results are remarkable because the effective LP Hamiltonians used to generate the MSA are not simple pairwise models, due to the competition between the folds. Our findings elucidate the reasons for the success of inverse approaches to the modelling of proteins from sequence data, and their limitations.
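A pairwise Potts Hamiltonian scores a sequence with single-site fields h and pairwise couplings J. The sketch below evaluates it under one common sign convention, E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), where lower energy means a more favourable sequence; the convention, the list-of-lists layout and the toy values are assumptions. Setting every coupling to zero recovers an independent-site model.

```python
from typing import List, Sequence


def potts_energy(seq: Sequence[int], h: List[List[float]], J) -> float:
    """E(s) = -sum_i h[i][s_i] - sum_{i<j} J[i][j][s_i][s_j].

    seq holds state indices (0..q-1); h[i][a] are single-site fields and
    J[i][j][a][b] pairwise couplings, of which only entries with i < j are read.
    """
    e = 0.0
    L = len(seq)
    for i in range(L):
        e -= h[i][seq[i]]  # field contribution of site i
        for j in range(i + 1, L):
            e -= J[i][j][seq[i]][seq[j]]  # coupling between sites i and j
    return e
```

For a two-site, two-state toy with `h = [[0.0, 1.0], [0.0, 0.0]]` and a single coupling `J[0][1][1][1] = 2.0`, the sequence `[1, 1]` has energy -3.0, while `[0, 0]` has energy 0.0.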
Memory-efficient dynamic programming backtrace and pairwise local sequence alignment.
Newberg, Lee A
2008-08-15
A backtrace through a dynamic programming algorithm's intermediate results in search of an optimal path, or to sample paths according to an implied probability distribution, or as the second stage of a forward-backward algorithm, is a task of fundamental importance in computational biology. When there is insufficient space to store all intermediate results in high-speed memory (e.g. cache) existing approaches store selected stages of the computation, and recompute missing values from these checkpoints on an as-needed basis. Here we present an optimal checkpointing strategy, and demonstrate its utility with pairwise local sequence alignment of sequences of length 10,000. Sample C++-code for optimal backtrace is available in the Supplementary Materials. Supplementary data is available at Bioinformatics online.
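The checkpointing idea can be sketched for a simple global alignment: the forward pass keeps only every `every`-th DP row, and during the backtrace the missing rows are recomputed block by block from the nearest earlier checkpoint. This sketch uses uniform spacing and linear gaps for brevity; it is not the optimal checkpoint placement the paper derives, and it targets global rather than local alignment. The names and parameters are illustrative.

```python
def _next_row(prev, a_char, b, i, match, mismatch, gap):
    """Row i of the Needleman-Wunsch matrix, computed from row i - 1 only."""
    row = [i * gap] + [0] * len(b)
    for j in range(1, len(b) + 1):
        s = match if a_char == b[j - 1] else mismatch
        row[j] = max(prev[j - 1] + s, prev[j] + gap, row[j - 1] + gap)
    return row


def checkpointed_alignment(a, b, every=4, match=1, mismatch=-1, gap=-2):
    """Return (score, ops) where ops uses M = (mis)match, D = gap in b, I = gap in a."""
    n, m = len(a), len(b)
    row = [j * gap for j in range(m + 1)]
    checkpoints = {0: row}  # forward pass stores only rows 0, every, 2*every, ...
    for i in range(1, n + 1):
        row = _next_row(row, a[i - 1], b, i, match, mismatch, gap)
        if i % every == 0:
            checkpoints[i] = row
    score = row[m]

    cache = {}  # rows recomputed from the nearest earlier checkpoint

    def get_row(i):
        if i not in cache:
            start = (i // every) * every
            r = checkpoints[start]
            cache[start] = r
            for k in range(start + 1, i + 1):
                r = _next_row(r, a[k - 1], b, k, match, mismatch, gap)
                cache[k] = r
        return cache[i]

    i, j, ops = n, m, []
    while i > 0 or j > 0:
        for k in [k for k in cache if k > i]:
            del cache[k]  # i never increases, so these rows are no longer needed
        cur = get_row(i)
        if i > 0:
            up = get_row(i - 1)
            if j > 0:
                s = match if a[i - 1] == b[j - 1] else mismatch
                if cur[j] == up[j - 1] + s:
                    ops.append("M")
                    i, j = i - 1, j - 1
                    continue
            if cur[j] == up[j] + gap:
                ops.append("D")
                i -= 1
                continue
        ops.append("I")
        j -= 1
    return score, "".join(reversed(ops))
```

With `every` near the square root of `len(a)`, memory drops from one full matrix to roughly 2·sqrt(n) rows, at the cost of recomputing each row about once more.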
mtDNA sequence diversity in Africa.
Watson, E.; Bauer, K.; Aman, R.; Weiss, G.; von Haeseler, A.; Pääbo, S.
1996-01-01
mtDNA sequences were determined from 241 individuals from nine ethnic groups in Africa. When they were compared with published data from other groups, it was found that the !Kung, Mbuti, and Biaka show on the order of 10 times more sequence differences between the three groups, as well as between those and the other groups (the Fulbe, Hausa, Tuareg, Songhai, Kanuri, Yoruba, Mandenka, Somali, Turkana, and Kikuyu), than these other groups do among one another. Furthermore, the pairwise sequence distributions, patterns of coalescence events, and numbers of variable positions relative to the mean sequence difference indicate that the former three groups have been of constant size over time, whereas the latter have expanded in size. We suggest that this reflects subsistence patterns, in that the populations that have expanded in size are food producers whereas those that have not are hunters and gatherers. PMID:8755932
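A pairwise difference (mismatch) distribution is the histogram of the number of differing sites over all pairs of sequences in a sample. The sketch below computes it, plus the mean pairwise difference, for sequences assumed to be already aligned to equal length; it does not reproduce the paper's demographic inference.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Sequence


def mismatch_distribution(seqs: Sequence[str]) -> Dict[int, int]:
    """Map 'number of differing sites' -> 'number of sequence pairs'."""
    counts = Counter()
    for x, y in combinations(seqs, 2):
        if len(x) != len(y):
            raise ValueError("sequences must be aligned to equal length")
        counts[sum(1 for p, q in zip(x, y) if p != q)] += 1
    return dict(sorted(counts.items()))


def mean_pairwise_difference(seqs: Sequence[str]) -> float:
    """Average number of differing sites per pair."""
    dist = mismatch_distribution(seqs)
    n_pairs = sum(dist.values())
    return sum(k * c for k, c in dist.items()) / n_pairs
```

The shape of this histogram is what is typically read for demographic signal: expanded populations tend to give smooth, unimodal distributions, while populations of constant size tend to give ragged ones.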
Fast and accurate estimation of the covariance between pairwise maximum likelihood distances.
Gil, Manuel
2014-01-01
Pairwise evolutionary distances are a model-based summary statistic for a set of molecular sequences. They represent the leaf-to-leaf path lengths of the underlying phylogenetic tree. Estimates of pairwise distances with overlapping paths covary because of shared mutation events. It is desirable to take this covariance structure into account to increase precision in any process that compares or combines distances. This paper introduces a fast estimator for the covariance of two pairwise maximum likelihood distances, estimated under general Markov models. The estimator is based on a conjecture (going back to Nei & Jin, 1989) which links the covariance to path lengths. It is proven here under a simple symmetric substitution model. A simulation shows that the estimator outperforms previously published ones in terms of the mean squared error. PMID:25279263
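As a concrete example of a pairwise maximum likelihood distance, under the Jukes-Cantor (JC69) model the ML distance for one aligned pair has the closed form d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. The sketch below computes it. JC69 is chosen here only for illustration; this is the per-pair distance, not the paper's covariance estimator.

```python
import math


def jc69_distance(x: str, y: str) -> float:
    """Jukes-Cantor ML distance between two aligned DNA sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("need two aligned, non-empty sequences of equal length")
    p = sum(1 for a, b in zip(x, y) if a != b) / len(x)
    if p >= 0.75:
        return math.inf  # saturated: the logarithm's argument is <= 0
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

Two such distances whose tree paths share branches, for example d(A, C) and d(B, C) when A and B share an ancestor with C, are estimated from partly the same mutations, which is the source of the covariance discussed above.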
Panzer, Katrin; Yilmaz, Pelin; Weiß, Michael; Reich, Lothar; Richter, Michael; Wiese, Jutta; Schmaljohann, Rolf; Labes, Antje; Imhoff, Johannes F.; Glöckner, Frank Oliver; Reich, Marlis
2015-01-01
Molecular diversity surveys have demonstrated that aquatic fungi are highly diverse and that they play fundamental ecological roles in aquatic systems. Unfortunately, comparative studies of aquatic fungal communities are few and far between, due to the scarcity of adequate datasets. We combined all publicly available fungal 18S ribosomal RNA (rRNA) gene sequences with new sequence data from a marine fungi culture collection. We further enriched this dataset by adding validated contextual data. Specifically, we included data on the habitat type of the samples, assigning fungal taxa to ten different habitat categories. This dataset has been created with the intention of serving as a valuable reference dataset for aquatic fungi, including a phylogenetic reference tree. The combined data enabled us to infer fungal community patterns in aquatic systems. Pairwise habitat comparisons showed significant phylogenetic differences, indicating that habitat strongly affects fungal community structure. Fungal taxonomic composition differed considerably even at the phylum and class levels. The freshwater fungal assemblage differed most from all other habitat types and was dominated by basal fungal lineages. For most communities, phylogenetic signals indicated clustering of sequences, suggesting that environmental factors, rather than species competition, were the main drivers of fungal community structure. Thus, the diversification process of aquatic fungi must be highly clade specific in some cases. PMID:26226014
Query-seeded iterative sequence similarity searching improves selectivity 5–20-fold
Li, Weizhong; Lopez, Rodrigo
2017-01-01
Iterative similarity search programs, like psiblast, jackhmmer, and psisearch, are much more sensitive than pairwise similarity search methods like blast and ssearch because they build a position specific scoring model (a PSSM or HMM) that captures the pattern of sequence conservation characteristic of a protein family. But models are subject to contamination; once an unrelated sequence has been added to the model, homologs of the unrelated sequence will also produce high scores, and the model can diverge from the original protein family. Examination of alignment errors during psiblast PSSM contamination suggested a simple strategy for dramatically reducing PSSM contamination. psiblast PSSMs are built from the query-based multiple sequence alignment (MSA) implied by the pairwise alignments between the query model (PSSM, HMM) and the subject sequences in the library. When the original query sequence residues are inserted into gapped positions in the aligned subject sequence, the resulting PSSM rarely produces alignment over-extensions or alignments to unrelated sequences. This simple step, which tends to anchor the PSSM to the original query sequence and slightly increase target percent identity, can reduce the frequency of false-positive alignments more than 20-fold compared with psiblast and jackhmmer, with little loss in search sensitivity. PMID:27923999
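The query-seeding step can be sketched for one pairwise alignment: walk the aligned columns, drop columns where the query has a gap (subject insertions have no query-coordinate column in a query-anchored MSA), and fill subject gaps with the query residue at that position. This sketch covers only the aligned region and assumes `-` as the gap character; it is an illustration, not the psisearch implementation.

```python
def query_seeded_row(query_aln: str, subject_aln: str, gap: str = "-") -> str:
    """Build one query-anchored MSA row, seeding subject gaps with query residues."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned rows must have equal length")
    row = []
    for q, s in zip(query_aln, subject_aln):
        if q == gap:
            continue  # subject insertion: no corresponding query column
        row.append(q if s == gap else s)  # gap in subject -> use query residue
    return "".join(row)
```

For example, aligning query `ACD-EF` with subject `A-DKE-` yields the row `ACDEF`: the inserted K is dropped and the two subject gaps are filled with the query's C and F, which pulls the resulting profile toward the original query.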
Roelens, Baptiste; Schvarzstein, Mara; Villeneuve, Anne M.
2015-01-01
Meiotic chromosome segregation requires pairwise association between homologs, stabilized by the synaptonemal complex (SC). Here, we investigate factors contributing to pairwise synapsis by investigating meiosis in polyploid worms. We devised a strategy, based on transient inhibition of cohesin function, to generate polyploid derivatives of virtually any Caenorhabditis elegans strain. We exploited this strategy to investigate the contribution of recombination to pairwise synapsis in tetraploid and triploid worms. In otherwise wild-type polyploids, chromosomes first sort into homolog groups, then multipartner interactions mature into exclusive pairwise associations. Pairwise synapsis associations still form in recombination-deficient tetraploids, confirming a propensity for synapsis to occur in a strictly pairwise manner. However, the transition from multipartner to pairwise association was perturbed in recombination-deficient triploids, implying a role for recombination in promoting this transition when three partners compete for synapsis. To evaluate the basis of synapsis partner preference, we generated polyploid worms heterozygous for normal sequence and rearranged chromosomes sharing the same pairing center (PC). Tetraploid worms had no detectable preference for identical partners, indicating that PC-adjacent homology drives partner choice in this context. In contrast, triploid worms exhibited a clear preference for identical partners, indicating that homology outside the PC region can influence partner choice. Together, our findings suggest a two-phase model for C. elegans synapsis: an early phase, in which initial synapsis interactions are driven primarily by recombination-independent assessment of homology near PCs and by a propensity for pairwise SC assembly, and a later phase in which mature synaptic interactions are promoted by recombination. PMID:26500263
Baneth, Gad; Barta, John R.; Shkap, Varda; Martin, Donald S.; Macintire, Douglass K.; Vincent-Johnson, Nancy
2000-01-01
Recognition of Hepatozoon canis and Hepatozoon americanum as distinct species was supported by the results of Western immunoblotting of canine anti-H. canis and anti-H. americanum sera against H. canis gamonts. Sequence analysis of 368 bases near the 3′ end of the 18S rRNA gene from each species revealed a pairwise difference of 13.59%. PMID:10699047
Improving pairwise comparison of protein sequences with domain co-occurrence
Gascuel, Olivier
2018-01-01
Comparing and aligning protein sequences is an essential task in bioinformatics. More specifically, local alignment tools like BLAST are widely used for identifying conserved protein sub-sequences, which likely correspond to protein domains or functional motifs. However, to limit the number of false positives, these tools are used with stringent sequence-similarity thresholds and hence can miss several hits, especially for species that are phylogenetically distant from reference organisms. One solution to this problem is to integrate additional contextual information into the procedure. Here, we propose to use domain co-occurrence to increase the sensitivity of pairwise sequence comparisons. Domain co-occurrence is a strong feature of proteins, since most protein domains tend to appear with a limited number of other domains on the same protein. We propose a method to take this information into account in a typical BLAST analysis and to construct new domain families on the basis of these results. We used Plasmodium falciparum as a case study to evaluate our method. The experimental findings showed a 14% increase in the number of significant BLAST hits and a 25% increase in the proteome area that can be covered with a domain. Our method identified 2240 new domains for which, in most cases, no model of the Pfam database could be linked. Moreover, our study of the quality of the new domains in terms of alignment and physicochemical properties shows that they are close to standard Pfam domains. Source code of the proposed approach and supplementary data are available at: https://gite.lirmm.fr/menichelli/pairwise-comparison-with-cooccurrence PMID:29293498
Non-rigid multi-frame registration of cell nuclei in live cell fluorescence microscopy image data.
Tektonidis, Marco; Kim, Il-Han; Chen, Yi-Chun M; Eils, Roland; Spector, David L; Rohr, Karl
2015-01-01
The analysis of the motion of subcellular particles in live cell microscopy images is essential for understanding biological processes within cells. For accurate quantification of the particle motion, compensation of the motion and deformation of the cell nucleus is required. We introduce a non-rigid multi-frame registration approach for live cell fluorescence microscopy image data. Compared to existing approaches using pairwise registration, our approach exploits information from multiple consecutive images simultaneously to improve the registration accuracy. We present three intensity-based variants of the multi-frame registration approach and we investigate two different temporal weighting schemes. The approach has been successfully applied to synthetic and live cell microscopy image sequences, and an experimental comparison with non-rigid pairwise registration has been carried out. Copyright © 2014 Elsevier B.V. All rights reserved.
Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign
2007-01-01
Background: Joint alignment and secondary structure prediction of two RNA sequences can significantly improve the accuracy of the structural predictions. Methods addressing this problem, however, are forced to employ constraints that reduce computation by restricting the alignments and/or structures (i.e. folds) that are permissible. In this paper, a new methodology is presented for establishing alignment constraints based on nucleotide alignment and insertion posterior probabilities. Using a hidden Markov model, posterior probabilities of alignment and insertion are computed for all possible pairings of nucleotide positions from the two sequences. These alignment and insertion posterior probabilities are additively combined to obtain probabilities of co-incidence for nucleotide position pairs. A suitable alignment constraint is obtained by thresholding the co-incidence probabilities. The constraint is integrated with Dynalign, a free energy minimization algorithm for joint alignment and secondary structure prediction. The resulting method is benchmarked against the previous version of Dynalign and against other programs for pairwise RNA structure prediction. Results: The proposed technique eliminates manual parameter selection in Dynalign and provides significant computational time savings in comparison to prior constraints in Dynalign, while simultaneously providing a small improvement in structural prediction accuracy. Savings are also realized in memory. In experiments over a 5S RNA dataset with an average sequence length of approximately 120 nucleotides, the method reduces computation by a factor of 2. The method performs favorably in comparison to other programs for pairwise RNA structure prediction, yielding better accuracy on average and requiring significantly fewer computational resources.
Conclusion: Probabilistic analysis can be used to automate the determination of alignment constraints for pairwise RNA structure prediction methods in a principled fashion. These constraints can reduce the computational and memory requirements of these methods while maintaining or improving their accuracy of structural prediction. This extends the practical reach of these methods to longer sequences. The revised Dynalign code is freely available for download. PMID:17445273
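The constraint step described above (add the alignment and insertion posteriors for each position pair, then threshold the sum) can be sketched as follows. The abstract does not give the exact form of the insertion terms, so this sketch assumes they arrive as two matrices with the same shape as the alignment posterior matrix; computing the posteriors with the HMM is out of scope, and the function name is hypothetical.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]


def coincidence_constraint(align: Matrix, ins_x: Matrix, ins_y: Matrix,
                           threshold: float) -> Set[Tuple[int, int]]:
    """Position pairs (i, j) whose summed co-incidence probability passes the threshold.

    align[i][j]: posterior that nucleotide i of X aligns to nucleotide j of Y.
    ins_x, ins_y: insertion posteriors for the same pair (assumed layout).
    """
    allowed = set()
    for i, row in enumerate(align):
        for j, p in enumerate(row):
            coincidence = p + ins_x[i][j] + ins_y[i][j]  # additive combination
            if coincidence >= threshold:
                allowed.add((i, j))  # Dynalign would only consider these pairs
    return allowed
```

Raising the threshold shrinks the allowed set, trading search space (and therefore time and memory) against the risk of excluding the correct alignment.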
Risk of breast cancer with CXCR4-using HIV defined by V3 loop sequencing.
Goedert, James J; Swenson, Luke C; Napolitano, Laura A; Haddad, Mojgan; Anastos, Kathryn; Minkoff, Howard; Young, Mary; Levine, Alexandra; Adeyemi, Oluwatoyin; Seaberg, Eric C; Aouizerat, Bradley; Rabkin, Charles S; Harrigan, P Richard; Hessol, Nancy A
2015-01-01
To evaluate the risk of female breast cancer associated with HIV-CXCR4 (X4) tropism as determined by various genotypic measures, a breast cancer case-control study with pairwise comparisons of tropism determination methods was conducted. From the Women's Interagency HIV Study repository, one stored plasma specimen was selected from each of 25 HIV-infected cases near the breast cancer diagnosis date and from each of 75 HIV-infected control women matched for age and calendar date. HIV-gp120 V3 sequences were derived by Sanger population sequencing (PS) and 454-pyro deep sequencing (DS). Sequencing-based HIV-X4 tropism was defined using the geno2pheno algorithm, with both high-stringency DS (false-positive rate, 3.5; 2% X4 cutoff) and lower stringency DS (false-positive rate, 5.75; 15% X4 cutoff). Concordance of tropism results by PS, DS, and previously performed phenotyping was assessed with kappa (κ) statistics. Case-control comparisons used exact P values and conditional logistic regression. In 74 women (19 cases, 55 controls) with complete results, the prevalence of HIV-X4 by PS was 5% in cases vs 29% in controls (P = 0.06; odds ratio, 0.14; confidence interval: 0.003 to 1.03). Smaller case-control prevalence differences were found with high-stringency DS (21% vs 36%, P = 0.32), lower stringency DS (16% vs 35%, P = 0.18), and phenotyping (11% vs 31%, P = 0.10). HIV-X4 tropism concordance was best between PS and lower stringency DS (93%, κ = 0.83). Other pairwise concordances were 82%-92% (κ = 0.56-0.81). Concordance was similar among cases and controls. HIV-X4 defined by population sequencing (PS) had good agreement with lower stringency DS and was significantly associated with lower odds of breast cancer.
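The κ values above measure agreement between two tropism calls beyond what chance alone would produce: κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected from each method's marginal call frequencies. A minimal sketch of Cohen's kappa for two lists of categorical calls follows; the labels in the example are illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(r1: Sequence[Hashable], r2: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters/methods labelling the same samples."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(r1)
    p_observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: product of the two methods' marginal frequencies per label.
    p_expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)
```

For example, two methods that agree on 3 of 4 samples (`["X4", "R5", "R5", "R5"]` vs `["X4", "R5", "R5", "X4"]`) have 75% raw concordance but κ = 0.5, because half of the agreement would be expected by chance.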
Deep Sequencing Reveals a Divergent Ugandan cassava brown streak virus Isolate from Malawi
Winter, Stephan; Mukasa, Settumba; Tairo, Fred; Sseruwagi, Peter; Ndunguru, Joseph; Duffy, Siobain
2017-01-01
Illumina sequencing of RNA from a cassava cutting from northern Malawi produced a genome of Ugandan cassava brown streak virus (UCBSV-MW-NB7_2013). Sequence comparisons revealed stronger similarity to an isolate from nearby Tanzania (93.4% pairwise nucleotide identity) than to those previously reported from Malawi (86.9 to 87.0%). PMID:28818908
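Pairwise nucleotide identity, as quoted above, is normally computed over an existing alignment. A minimal sketch, assuming the two genomes are already aligned to the same length with "-" as the gap character and that gapped columns are excluded from the denominator; tools differ on this convention, and the abstract does not say which one was used.

```python
def pairwise_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences over ungapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip alignment columns where either sequence has a gap
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("alignment has no ungapped columns")
    return 100.0 * identical / compared


print(pairwise_identity("ACGT-A", "acgttc"))  # 80.0 (4 of 5 ungapped columns match)
```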
Hasegawa, Hideo; Modrý, David; Kitagawa, Masahiro; Shutt, Kathryn A.; Todd, Angelique; Kalousová, Barbora; Profousová, Ilona; Petrželková, Klára J.
2014-01-01
Background: Hookworms are important pathogens of humans. To date, Necator americanus is the only known species of the genus Necator infecting humans. In contrast, several Necator species have been described in African great apes and other primates. It has not yet been determined whether primate-originating Necator species are also parasitic in humans. Methodology/Principal Findings: The infective larvae of Necator spp. were developed using modified Harada-Mori filter-paper cultures from faeces of humans and great apes inhabiting the Dzanga-Sangha Protected Areas, Central African Republic. The first and second internal transcribed spacers (ITS-1 and ITS-2) of nuclear ribosomal DNA and the partial cytochrome c oxidase subunit 1 (cox1) gene of mtDNA obtained from the hookworm larvae were sequenced and compared. Three sequence types (I–III) were recognized in the ITS region, and 34 cox1 haplotypes represented three phylogenetic groups (A–C). The combinations determined were I-A, II-B, II-C, III-B and III-C. Combination I-A, corresponding to N. americanus, was demonstrated in humans and western lowland gorillas; II-B and II-C were observed in humans, western lowland gorillas and chimpanzees; III-B and III-C were found only in humans. Pairwise nucleotide differences in the cox1 haplotypes between the groups were more than 8%, while the differences within each group were less than 2.1%. Conclusions/Significance: The distinctness of ITS sequence variants and the high number of pairwise nucleotide differences among cox1 variants indicate the possible presence of several species of Necator in both humans and great apes. We conclude that Necator hookworms are shared by humans and great apes co-habiting the same tropical forest ecosystems. PMID:24651493
Pairwise contact energy statistical potentials can help to find probability of point mutations.
Saravanan, K M; Suvaithenamudhan, S; Parthasarathy, S; Selvaraj, S
2017-01-01
To adopt a particular fold, a protein requires several interactions between its amino acid residues. The energetic contribution of these residue-residue interactions can be approximated by extracting statistical potentials from known high-resolution structures. Several methods based on statistical potentials extracted from unrelated proteins have been found to improve the prediction of point-mutation probabilities. We postulate that statistical potentials extracted from known structures of similar folds with varying sequence identity can be a powerful tool for examining the probability of point mutations. With this in mind, we derived pairwise residue and atomic contact energy potentials for the different functional families that adopt the (α/β)8 TIM-barrel fold. We carried out computational point mutations at various conserved residue positions in yeast triose phosphate isomerase, for which experimental results have already been reported. We also performed molecular dynamics simulations on a subset of point mutants for comparison. The difference in pairwise residue and atomic contact energy between the wild type and the various point mutants reveals the probability of mutation at a particular position. Interestingly, our computational predictions agree with the experimental studies of Silverman et al. (Proc Natl Acad Sci 2001;98:3092-3097) and perform better than I-Mutant and the Cologne University Protein Stability Analysis Tool. The present work thus suggests that deriving pairwise contact energy potentials and running molecular dynamics simulations for functionally important folds could help predict the probability of point mutations, which may ultimately reduce the time and cost of mutation experiments. Proteins 2016; 85:54-64. © 2016 Wiley Periodicals, Inc.
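The mutation scoring described above rests on the difference in contact energy between the wild-type and the mutant residue at a position. The sketch below illustrates that difference for a single residue. The energy table, the neighbour list, and the sign convention (lower energy is more favourable) are hypothetical placeholders; the authors' potentials were derived from TIM-barrel structures and also include atomic contacts, which this sketch omits.

```python
# Hypothetical residue-pair contact energies (arbitrary units, lower = more
# favourable). A real statistical potential would be derived from structures.
CONTACT_ENERGY = {
    ("L", "L"): -1.4,
    ("L", "V"): -1.2,
    ("A", "L"): -0.8,
    ("A", "V"): -0.6,
    ("E", "L"): -0.2,
    ("E", "V"): -0.1,
}


def pair_energy(res1, res2, table=CONTACT_ENERGY):
    """Symmetric lookup; residue pairs missing from the table contribute 0."""
    return table.get(tuple(sorted((res1, res2))), 0.0)


def contact_energy(residue, partners, table=CONTACT_ENERGY):
    """Summed contact energy of one residue with its structural neighbours."""
    return sum(pair_energy(residue, p, table) for p in partners)


def mutation_delta(wild_type, mutant, partners, table=CONTACT_ENERGY):
    """Mutant minus wild-type contact energy; positive means fewer favourable contacts."""
    return contact_energy(mutant, partners, table) - contact_energy(wild_type, partners, table)


# Hypothetical buried leucine in contact with a leucine and a valine.
neighbours = ["L", "V"]
for candidate in ("A", "E"):
    print(candidate, round(mutation_delta("L", candidate, neighbours), 2))
```

Under this toy table, replacing the leucine with glutamate loses more favourable contact energy than replacing it with alanine, which is the kind of ranking the position-wise comparison produces.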
Montoya-Ruiz, Carolina; Cajimat, Maria N B; Milazzo, Mary Louise; Diaz, Francisco J; Rodas, Juan David; Valbuena, Gustavo; Fulhorst, Charles F
2015-07-01
The results of a previous study suggested that Cherrie's cane rat (Zygodontomys cherriei) is the principal host of Necoclí virus (family Bunyaviridae, genus Hantavirus) in Colombia. Bayesian analyses of complete nucleocapsid protein gene sequences and complete glycoprotein precursor gene sequences in this study confirmed that Necoclí virus is phylogenetically closely related to Maporal virus, which is principally associated with the delicate pygmy rice rat (Oligoryzomys delicatus) in western Venezuela. In pairwise comparisons, nonidentities between the complete amino acid sequence of the nucleocapsid protein of Necoclí virus and the complete amino acid sequences of the nucleocapsid proteins of other hantaviruses were ≥8.7%. Likewise, nonidentities between the complete amino acid sequence of the glycoprotein precursor of Necoclí virus and the complete amino acid sequences of the glycoprotein precursors of other hantaviruses were ≥11.7%. Collectively, the unique association of Necoclí virus with Z. cherriei in Colombia, results of the Bayesian analyses of complete nucleocapsid protein gene sequences and complete glycoprotein precursor gene sequences, and results of the pairwise comparisons of amino acid sequences strongly support the notion that Necoclí virus represents a novel species in the genus Hantavirus. Further work is needed to determine whether Calabazo virus (a hantavirus associated with Z. brevicauda cherriei in Panama) and Necoclí virus are conspecific.
Nanopore DNA Sequencing and Genome Assembly on the International Space Station.
Castro-Wallace, Sarah L; Chiu, Charles Y; John, Kristen K; Stahl, Sarah E; Rubins, Kathleen H; McIntyre, Alexa B R; Dworkin, Jason P; Lupisella, Mark L; Smith, David J; Botkin, Douglas J; Stephenson, Timothy A; Juul, Sissel; Turner, Daniel J; Izquierdo, Fernando; Federman, Scot; Stryke, Doug; Somasekar, Sneha; Alexander, Noah; Yu, Guixia; Mason, Christopher E; Burton, Aaron S
2017-12-21
We evaluated the performance of the MinION DNA sequencer in-flight on the International Space Station (ISS), and benchmarked its performance off-Earth against the MinION, Illumina MiSeq, and PacBio RS II sequencing platforms in terrestrial laboratories. Samples contained equimolar mixtures of genomic DNA from lambda bacteriophage, Escherichia coli (strain K12, MG1655) and Mus musculus (female BALB/c mouse). Nine sequencing runs were performed aboard the ISS over a 6-month period, yielding a total of 276,882 reads with no apparent decrease in performance over time. From sequence data collected aboard the ISS, we constructed directed assemblies of the ~4.6 Mb E. coli genome, ~48.5 kb lambda genome, and a representative M. musculus sequence (the ~16.3 kb mitochondrial genome), at 100%, 100%, and 96.7% consensus pairwise identity, respectively; de novo assembly of the E. coli genome from raw reads yielded a single contig comprising 99.9% of the genome at 98.6% consensus pairwise identity. Simulated real-time analyses of in-flight sequence data using an automated bioinformatic pipeline and laptop-based genomic assembly demonstrated the feasibility of sequencing analysis and microbial identification aboard the ISS. These findings illustrate the potential for sequencing applications including disease diagnosis, environmental monitoring, and elucidating the molecular basis for how organisms respond to spaceflight.
Choi, Young Jin; Park, Kwi Sung; Baek, Kyoung Ah; Jung, Eun Hye; Nam, Hae Seon; Kim, Yong Bae; Park, Joon Soo
2010-03-01
Evaluation of the primary etiologic agents that cause aseptic meningitis outbreaks may provide valuable information regarding the prevention and management of aseptic meningitis. In Korea, an outbreak of aseptic meningitis caused by echovirus type 30 (E30) occurred from May to October in 2008. To determine the etiologic agent, CSF and/or stool specimens from 140 children hospitalized for aseptic meningitis at Soonchunhyang University Cheonan Hospital between June and October of 2008 were tested for virus isolation and identification. E30 accounted for 61.7% (37 cases) and echovirus 6 for 21.7% (13 cases) of all human enterovirus (HEV) isolates (60 cases in total). For molecular characterization, the VP1 gene sequences of 18 Korean E30 isolates were compared pairwise with 34 reference strains from the GenBank database using MegAlign. The pairwise comparison of VP1 nucleotide sequences demonstrated that the Korean strains differed from those of lineage groups A, B, C, D, E, F and G. Reconstruction of the phylogenetic tree based on the complete VP1 nucleotide sequences resulted in a monophyletic tree with eight clustered lineage groups. All Korean isolates segregated from the other lineage groups, suggesting that the Korean strains were a distinct lineage of E30 and a probable cause of this outbreak. To the best of our knowledge, this manuscript is the first report of the molecular characteristics of E30 strains associated with an aseptic meningitis outbreak in Korea, and of their phylogenetic relationships.
Risk of Breast Cancer with CXCR4-using HIV Defined by V3-Loop Sequencing
Goedert, James J.; Swenson, Luke C.; Napolitano, Laura A.; Haddad, Mojgan; Anastos, Kathryn; Minkoff, Howard; Young, Mary; Levine, Alexandra; Adeyemi, Oluwatoyin; Seaberg, Eric C.; Aouizerat, Bradley; Rabkin, Charles S.; Harrigan, P. Richard; Hessol, Nancy A.
2014-01-01
Objective: Evaluate the risk of female breast cancer associated with HIV-CXCR4 (X4) tropism as determined by various genotypic measures. Methods: A breast cancer case-control study, with pairwise comparisons of tropism determination methods, was conducted. From the Women's Interagency HIV Study repository, one stored plasma specimen was selected from 25 HIV-infected cases near the breast cancer diagnosis date and 75 HIV-infected control women matched for age and calendar date. HIV gp120-V3 sequences were derived by Sanger population sequencing (PS) and 454-pyro deep sequencing (DS). Sequencing-based HIV-X4 tropism was defined using the geno2pheno algorithm, with both high-stringency DS (false-positive rate [FPR] 3.5, 2% X4 cutoff) and lower-stringency DS (FPR 5.75, 15% X4 cutoff). Concordance of tropism results by PS, DS, and previously performed phenotyping was assessed with kappa (κ) statistics. Case-control comparisons used exact P-values and conditional logistic regression. Results: In 74 women (19 cases, 55 controls) with complete results, prevalence of HIV-X4 by PS was 5% in cases vs 29% in controls (P=0.06, odds ratio 0.14, confidence interval 0.003-1.03). Smaller case-control prevalence differences were found with high-stringency DS (21% vs 36%, P=0.32), lower-stringency DS (16% vs 35%, P=0.18), and phenotyping (11% vs 31%, P=0.10). HIV-X4-tropism concordance was best between PS and lower-stringency DS (93%, κ=0.83). Other pairwise concordances were 82%-92% (κ=0.56-0.81). Concordance was similar among cases and controls. Conclusions: HIV-X4 defined by population sequencing (PS) had good agreement with lower-stringency deep sequencing and was significantly associated with lower odds of breast cancer. PMID:25321183
Tian, Ye; Huang, Xiaoqiang; Zhu, Yushan
2015-08-01
Enzyme amino-acid sequences at ligand-binding interfaces are evolutionarily optimized for reactions, and the natural conformation of an enzyme-ligand complex must have a low free energy relative to alternative conformations in native-like or non-native sequences. Based on this assumption, a combined energy function was developed for enzyme design and then evaluated by recapitulating native enzyme sequences at ligand-binding interfaces for 10 enzyme-ligand complexes. In this energy function, the electrostatic interaction between polar or charged atoms at buried interfaces is described by an explicitly orientation-dependent hydrogen-bonding potential and a pairwise-decomposable generalized Born model based on the general side chain in the protein design framework. The energy function is augmented with a pairwise surface-area based hydrophobic contribution for nonpolar atom burial. Using this function, on average, 78% of the amino acids at ligand-binding sites were predicted correctly in the minimum-energy sequences, whereas 84% were predicted correctly in the most-similar sequences, which were selected from the top 20 sequences for each enzyme-ligand complex. Hydrogen bonds at the enzyme-ligand binding interfaces in the 10 complexes were usually recovered with the correct geometries. The binding energies calculated using the combined energy function helped to discriminate the active sequences from a pool of alternative sequences that were generated by repeatedly solving a series of mixed-integer linear programming problems for sequence selection with increasing integer cuts.
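A pairwise-decomposable energy function, as described above, splits a sequence's energy into per-position (self) terms plus a sum over position pairs, E(s) = Σ_i E_i(s_i) + Σ_{i<j} E_ij(s_i, s_j). The sketch below evaluates such a function and finds the minimum-energy sequence by brute-force enumeration. Note the swapped-in search: the study selected sequences by solving mixed-integer linear programs with integer cuts, whereas enumeration is only feasible for toy problems. All energies and residue choices are hypothetical.

```python
from itertools import product


def sequence_energy(seq, self_energy, pair_energy):
    """Self terms for each position plus pair terms for every position pair i < j."""
    total = sum(self_energy[i][aa] for i, aa in enumerate(seq))
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            # Position pairs or residue pairs absent from the table contribute 0.
            total += pair_energy.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return total


def min_energy_sequence(self_energy, pair_energy, alphabet):
    """Enumerate every sequence; cost grows as len(alphabet) ** number_of_positions."""
    candidates = product(alphabet, repeat=len(self_energy))
    return min(candidates, key=lambda s: sequence_energy(s, self_energy, pair_energy))


# Hypothetical two-position design problem with three allowed residues.
SELF = [
    {"S": 0.0, "D": -0.5, "K": -0.2},
    {"S": 0.0, "D": -0.1, "K": -0.3},
]
PAIR = {(0, 1): {("D", "K"): -2.0, ("D", "D"): 1.0}}  # e.g. attraction vs. repulsion

best = min_energy_sequence(SELF, PAIR, alphabet=("S", "D", "K"))
print(best, sequence_energy(best, SELF, PAIR))
```

Because every term involves at most two positions, the same decomposition is what allows larger problems to be posed as integer programs over residue and residue-pair choices instead of enumerated.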
Analysis of DNA methylation in Arabidopsis thaliana based on methylation-sensitive AFLP markers.
Cervera, M T; Ruiz-García, L; Martínez-Zapater, J M
2002-12-01
AFLP analysis using restriction enzyme isoschizomers that differ in their sensitivity to methylation of their recognition sites has been used to analyse the methylation state of anonymous CCGG sequences in Arabidopsis thaliana. The technique was modified to improve the quality of fingerprints and to visualise larger numbers of scorable fragments. Sequencing of amplified fragments indicated that detection was generally associated with non-methylation of the cytosine to which the isoschizomer is sensitive. Comparison of EcoRI/HpaII and EcoRI/MspI patterns in different ecotypes revealed that 35-43% of CCGG sites were differentially digested by the isoschizomers. Interestingly, the pattern of digestion among different plants belonging to the same ecotype is highly conserved, with the rate of intra-ecotype methylation-sensitive polymorphisms being less than 1%. However, pairwise comparisons of methylation patterns between samples belonging to different ecotypes revealed differences in up to 34% of the methylation-sensitive polymorphisms. The lack of correlation between inter-ecotype similarity matrices based on methylation-insensitive or methylation-sensitive polymorphisms suggests that whatever the mechanisms regulating methylation may be, they are not related to nucleotide sequence variation.
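The final comparison above correlates two inter-ecotype similarity matrices, one built from methylation-insensitive and one from methylation-sensitive polymorphisms. A minimal sketch, assuming band presence/absence is scored as 0/1 and similarity is measured with the Jaccard coefficient (the abstract names neither the coefficient nor the correlation test): it builds one matrix per marker class and reports the Pearson correlation of their off-diagonal entries, the statistic used in a Mantel test. A full Mantel test would add a permutation step to assess significance, which is omitted here.

```python
from math import sqrt


def jaccard_matrix(profiles):
    """Pairwise Jaccard similarity between 0/1 band-presence profiles."""
    n = len(profiles)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            pairs = list(zip(profiles[i], profiles[j]))
            shared = sum(1 for a, b in pairs if a and b)
            present = sum(1 for a, b in pairs if a or b)
            sim[i][j] = sim[j][i] = shared / present if present else 1.0
    return sim


def matrix_correlation(m1, m2):
    """Pearson r between the upper off-diagonal entries of two square matrices."""
    n = len(m1)
    if len(m2) != n or n < 3:
        raise ValueError("need two square matrices of the same size, at least 3x3")
    x = [m1[i][j] for i in range(n) for j in range(i + 1, n)]
    y = [m2[i][j] for i in range(n) for j in range(i + 1, n)]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm = sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    if norm == 0:
        raise ValueError("correlation undefined: one matrix has constant entries")
    return cov / norm


# Hypothetical band profiles for four plants under each marker class.
insensitive = [[1, 1, 0, 1], [1, 1, 0, 0], [0, 1, 1, 1], [1, 0, 1, 1]]
sensitive = [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1], [1, 1, 0, 1]]
print(round(matrix_correlation(jaccard_matrix(insensitive), jaccard_matrix(sensitive)), 3))
```

A value near zero, as reported for the ecotypes, would mean that plants that are similar in sequence-based markers are not systematically similar in methylation-based markers.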
Ganesan, K; Parthasarathy, S
2011-12-01
Annotation of any newly determined protein sequence depends on its pairwise sequence identity with known sequences. However, for twilight-zone sequences, which share only 15-25% identity, pairwise comparison methods are inadequate and annotation becomes a challenging task. Such sequences can be annotated by methods that recognize their fold. Bowie et al. described a 3D1D profile method in which amino acid sequences that fold into a known 3D structure are identified by their compatibility with that structure. We have improved this method by incorporating predicted secondary structure information and applied it to fold recognition for twilight-zone sequences. In our Protein Secondary Structure 3D1D (PSS-3D1D) method, a score (w) for the predicted secondary structure of the query sequence is included when assessing the compatibility of the query sequence with known 3D fold structures. In the benchmarks, the PSS-3D1D method shows up to a 21% improvement over the 3D1D profile method in correctly predicting α + β class folds from sequences with twilight-zone levels of identity. Hence, the PSS-3D1D method could offer more clues than the 3D1D method for the annotation of twilight-zone sequences. The web-based PSS-3D1D method is freely available in the PredictFold server at http://bioinfo.bdu.ac.in/servers/ .
ScaffoldSeq: Software for characterization of directed evolution populations.
Woldring, Daniel R; Holec, Patrick V; Hackel, Benjamin J
2016-07-01
ScaffoldSeq is software designed for the numerous applications, including directed evolution analysis, in which a user generates a population of DNA sequences encoding partially diverse proteins with related functions and would like to characterize the single-site and pairwise amino acid frequencies across the population. A common scenario for enzyme maturation, antibody screening, and alternative scaffold engineering involves naïve and evolved populations that contain diversified regions, varying in both sequence and length, within a conserved framework. Analyzing the diversified regions of such populations is facilitated by high-throughput sequencing platforms; however, length variability within these regions (e.g., antibody CDRs) encumbers the alignment process. To overcome this challenge, the ScaffoldSeq algorithm takes advantage of conserved framework sequences to quickly identify diverse regions. Beyond this, unintended biases in sequence frequency are generated throughout the experimental workflow required to evolve and isolate clones of interest prior to DNA sequencing. ScaffoldSeq software uniquely handles this issue by providing tools to quantify and remove background sequences, cluster similar protein families, and dampen the impact of dominant clones. The software produces graphical and tabular summaries for each region of interest, allowing users to evaluate diversity in a site-specific manner as well as identify epistatic pairwise interactions. The code and detailed information are freely available at http://research.cems.umn.edu/hackel. Proteins 2016; 84:869-874. © 2016 Wiley Periodicals, Inc.
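The single-site and pairwise amino acid frequencies described above can be computed directly once the diversified region has been extracted. The sketch below assumes sequences already trimmed to an equal-length region and reports a simple co-occurrence excess, f_ij(a, b) minus f_i(a)·f_j(b), as one possible signal of pairwise coupling. It skips the length handling, background subtraction, clustering, and clone-dampening steps the abstract describes, and the excess measure is an illustrative choice, not necessarily the one ScaffoldSeq uses.

```python
from collections import Counter


def site_frequencies(seqs):
    """Amino acid frequencies at each position of equal-length sequences."""
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("expected a non-empty list of equal-length sequences")
    n = len(seqs)
    return [
        {aa: count / n for aa, count in Counter(s[i] for s in seqs).items()}
        for i in range(len(seqs[0]))
    ]


def pair_frequencies(seqs, i, j):
    """Joint frequencies of the residue pairs observed at positions i and j."""
    n = len(seqs)
    return {pair: count / n for pair, count in Counter((s[i], s[j]) for s in seqs).items()}


def pair_excess(seqs, i, j):
    """Observed pair frequency minus the value expected if sites i and j were independent."""
    sites = site_frequencies(seqs)
    return {
        (a, b): f_ab - sites[i][a] * sites[j][b]
        for (a, b), f_ab in pair_frequencies(seqs, i, j).items()
    }


# Hypothetical four-clone population over a two-residue diversified region.
population = ["AC", "AC", "GT", "GC"]
print(site_frequencies(population))
print(pair_excess(population, 0, 1))
```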
Molecular basis for specificity in the druggable kinome: sequence-based analysis.
Chen, Jianping; Zhang, Xi; Fernández, Ariel
2007-03-01
Rational design of kinase inhibitors remains a challenge partly because there is no clear delineation of the molecular features that direct the pharmacological impact towards clinically relevant targets. Standard factors governing ligand affinity, such as the potential for intermolecular hydrophobic interactions or for intermolecular hydrogen bonding, do not provide good markers to assess cross-reactivity. Thus, a core question in the informatics of drug design is what type of molecular similarity among targets promotes promiscuity and what type of molecular difference governs specificity. This work answers the question for a sizable screened sample of the human pharmacokinome, including targets with unreported structure. We show that drug design aimed at promoting pairwise interactions between ligand and kinase target actually fosters promiscuity because of the high conservation of the partner groups on or around the ATP-binding site of the kinase. Alternatively, we focus on a structural marker that may be reliably determined from sequence and measures dehydration propensities mostly localized on the loopy regions of kinases. Based on this marker, we construct a sequence-based kinase classifier that enables the accurate prediction of pharmacological differences. Our indicator is a microenvironmental descriptor that quantifies the propensity for water exclusion around preformed polar pairs. The results suggest that targeting polar dehydration patterns heralds a new generation of drugs that enable tighter control of specificity than designs aimed at promoting ligand-kinase pairwise interactions. The predictor of polar hot spots for dehydration propensity, or solvent-accessible hydrogen bonds in soluble proteins, named YAPView, may be freely downloaded from the University of Chicago website http://protlib.uchicago.edu/dloads.html. Supplementary data are available at Bioinformatics online.
Bastien, Olivier; Maréchal, Eric
2008-08-07
Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high-scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte-Carlo simulation. Z-values allow the deduction of an upper bound on the P-value (1/Z²) following the TULIP theorem. Simulated Z-value distributions are known to fit a Gumbel law. This remarkable property had not been demonstrated and had no obvious biological support. We built a model of sequence evolution based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the system's longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to its ability to conserve a sufficient functional level at the folded and mature protein level (positive selection pressure). Homologous sequences were considered as systems (1) having a high redundancy of information, reflected by the magnitude of their alignment scores, and (2) whose components are the amino acids, which can be independently damaged by random DNA mutations.
From these assumptions, we deduced that information shared at each amino acid position evolves at a constant rate, corresponding to the information hazard rate, and that pairwise sequence alignment scores should follow a Gumbel distribution whose parameters can be given a theoretical rationale. In particular, one parameter corresponds to the information hazard rate. The extreme value distribution of alignment scores, assessed from high-scoring segment pairs following the Karlin-Altschul model, can also be deduced from Reliability Theory applied to molecular sequences. It reflects the redundancy of information between homologous sequences under functional conservative pressure. This model also provides a link between concepts of biological sequence analysis and of systems biology.
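The two statistical models contrasted above can be written down directly. Under Karlin-Altschul, the expected number of high-scoring segment pairs with score at least S between sequences of lengths m and n is E = K·m·n·e^(−λS); with Poisson-distributed counts, P = 1 − e^(−E), which is the Gumbel (extreme value) tail. Under Lipman-Pearson, Z = (S − mean)/SD over scores of shuffled sequences, and the TULIP bound gives P ≤ 1/Z². In this minimal sketch the λ and K values are placeholders (real values depend on the scoring system and residue composition), and the shuffled scores are assumed to come from some separate alignment routine.

```python
from math import exp
from statistics import mean, stdev


def karlin_altschul_pvalue(score, m, n, lam, k):
    """P(at least one HSP scores >= score) under the Karlin-Altschul model."""
    expected_hsps = k * m * n * exp(-lam * score)  # the E-value
    return 1.0 - exp(-expected_hsps)  # Poisson: 1 - P(zero HSPs)


def z_value(real_score, shuffled_scores):
    """Lipman-Pearson Z-value from Monte Carlo scores of shuffled sequences."""
    if len(shuffled_scores) < 2:
        raise ValueError("need at least two shuffled scores")
    spread = stdev(shuffled_scores)
    if spread == 0:
        raise ValueError("shuffled scores have zero variance")
    return (real_score - mean(shuffled_scores)) / spread


def tulip_pvalue_bound(z):
    """Upper bound P <= 1/Z**2 on the P-value; uninformative for Z <= 1."""
    return 1.0 if z <= 1.0 else 1.0 / z**2


# Placeholder statistics, not fitted to any real scoring matrix.
LAMBDA, K = 0.3, 0.1
for s in (30, 50):
    print(s, karlin_altschul_pvalue(s, m=100, n=100, lam=LAMBDA, k=K))

z = z_value(62, shuffled_scores=[40, 44, 38, 41, 42, 39])
print(round(z, 2), tulip_pvalue_bound(z))
```

The TULIP bound needs no distributional assumption, which is why it is loose; the Gumbel form gives much smaller P-values for the same score when its assumptions hold.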
ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes.
Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim
2010-03-01
Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith-Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. The database can be accessed through http://proteinworlddb.org
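ProteinWorldDB's scores come from an implementation of the Smith-Waterman algorithm, whose core recurrence is short. A minimal score-only sketch with a linear gap penalty follows. The match, mismatch, and gap values are illustrative assumptions: protein comparisons normally use a substitution matrix such as BLOSUM62 with affine gap penalties, and the abstract does not state the parameters used for the database.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b with a linear gap penalty.

    Runs in O(len(a) * len(b)) time and keeps only two DP rows in memory.
    """
    prev = [0] * (len(b) + 1)  # DP row for the previous prefix of a
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap  # a[i-1] aligned to a gap in b
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap in a
            # Clamping at 0 lets a local alignment start afresh anywhere.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("XXACGTXX", "YYACGTYY"))  # 8: the shared ACGT block
```

Recovering the alignment itself, as stored in ProteinWorldDB, would additionally require keeping the full matrix (or traceback pointers) and walking back from the best-scoring cell.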
Motriuk-Smith, Dagmara; Seville, R Scott; Quealy, Leah; Oliver, Clinton E.
2011-01-01
The taxonomy of the coccidia has historically been morphologically based. The purpose of this study was to establish whether conspecificity of isolates of Eimeria callospermophili from 4 ground-dwelling squirrel hosts (Rodentia: Sciuridae) is supported by comparison of rDNA sequence data and to examine how this species relates to eimerian species from other sciurid hosts. Eimeria callospermophili was isolated from 4 wild-caught hosts, i.e., Urocitellus elegans, Cynomys leucurus, Marmota flaviventris, and Cynomys ludovicianus. The ITS1 and ITS2 genomic rDNA sequences were generated by PCR, sequenced, and analyzed. The highest intraspecific pairwise distance values of 6.0% in ITS1 and 7.1% in ITS2 were observed in C. leucurus. Interspecific pairwise distance values greater than 5% do not support E. callospermophili conspecificity. Generated E. callospermophili sequences were compared to Eimeria lancasterensis from Sciurus niger and Sciurus niger cinereus, and Eimeria ontarioensis from S. niger. A single well-supported clade was formed by E. callospermophili amplicons in Neighbor Joining and Maximum Parsimony analyses. However, within the clade there was little evidence of host or geographic structuring of the species. PMID:21506777
James, Delano; Sanderson, Dan; Varga, Aniko; Sheveleva, Anna; Chirkov, Sergei
2016-04-01
Plum pox virus (PPV) is genetically diverse, with nine different strains identified. Mutations, indel events, and interstrain recombination events are known to contribute to the genetic diversity of PPV. This is the first report of intrastrain recombination events that contribute to PPV's genetic diversity. Fourteen isolates of the PPV strain Winona (W) were analyzed, including nine new strain W isolates sequenced completely in this study. Isolates of other PPV strains for which more than one complete genome sequence is available in GenBank were also included for comparison and analysis. Five intrastrain recombination events were detected among the PPV W isolates, one among PPV C strain isolates, and one among PPV M strain isolates. Four (29%) of the PPV W isolates analyzed are recombinants, one of which (P2-1) is a mosaic with three recombination events identified. A new interstrain recombination event was identified between a strain M isolate and a strain Rec isolate, a known recombinant. In silico recombination studies and pairwise distance analyses of PPV strain D isolates indicate that a threshold of genetic diversity exists for the detectability of recombination events, in the range of approximately 0.78 × 10⁻² to 1.33 × 10⁻² mean pairwise distance. RDP4 analyses indicate that in PPV Rec isolates there may be a recombination breakpoint distinct from the obvious transition point between strain sequences. The evidence obtained indicates that the frequency of PPV recombination is underestimated, which may also be true for other RNA viruses with low genetic diversity.
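The detectability threshold above is expressed as a mean pairwise distance across isolates. As a minimal sketch, assuming uncorrected p-distances (the proportion of differing sites, skipping gapped columns) computed over an existing alignment; the abstract does not say whether a substitution-model correction was applied, and the fragments below are made up.

```python
from itertools import combinations


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in sites) / len(sites)


def mean_pairwise_distance(seqs):
    """Average p-distance over all unordered pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical aligned fragments from three isolates.
isolates = ["AAAA", "AAAT", "AATT"]
print(mean_pairwise_distance(isolates))
```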
Strydom, Elrea; Pietersen, Gerhard
2018-05-01
Infection of soybean by the plant cytorhabdovirus soybean blotchy mosaic virus (SbBMV) results in significant yield losses in the temperate, lower-lying soybean production regions of South Africa. To investigate the genetic diversity of SbBMV, a 277 bp portion of the RNA-dependent RNA polymerase gene was amplified by RT-PCR and sequenced from 66 isolates collected from different hosts and geographical locations in South Africa at different times over a 16-year span. Phylogenetic reconstruction revealed three main lineages, designated Groups A, B and C, with isolates grouping primarily according to geographic origin. Pairwise nucleotide identities ranged between 85.7% and 100% among all isolates, with isolates in Group A exhibiting the highest degree of sequence identity, and isolates of Groups A and B being more closely related to each other than to those in Group C. This is the first study investigating the genetic diversity of SbBMV.
2010-01-01
Background: Cryptic species complexes are common among anophelines. Previous phylogenetic analysis based on the complete mtDNA COI gene sequences detected paraphyly in the Neotropical malaria vector Anopheles marajoara. The "Folmer region" detects a single taxon using a 3% divergence threshold. Methods: To test the paraphyletic hypothesis and examine the utility of the Folmer region, genealogical trees based on a concatenated (white + 3' COI sequences) dataset and pairwise differentiation of COI fragments were examined. The population structure and demographic history were based on partial COI sequences for 294 individuals from 14 localities in Amazonian Brazil. In addition, 109 individuals from 12 localities were sequenced for the nDNA white gene, and 57 individuals from 11 localities were sequenced for the ribosomal DNA (rDNA) internal transcribed spacer 2 (ITS2). Results: Distinct A. marajoara lineages were detected by combined genealogical analysis and were also supported among COI haplotypes using a median-joining network and AMOVA, with time since divergence during the Pleistocene (<100,000 ya). COI sequences at the 3' end were more variable, demonstrating significant pairwise differentiation (3.82%) compared to the more moderate 2.92% detected by the Folmer region. Lineage 1 was present in all localities, whereas lineage 2 was restricted mainly to the west. Mismatch distributions for both lineages were bimodal, likely due to multiple colonization events and spatial expansion (~798 to 81,045 ya). There appears to be gene flow within, but not between, lineages, and a partial barrier was detected near Rio Jari in Amapá state, separating western and eastern populations. In contrast, both nDNA data sets (white gene sequences with or without the retention of the 4th intron, and ITS2 sequences and length) detected a single A. marajoara lineage.
Conclusions: Strong support for combined data, with significant differentiation detected in the COI and absent in the nDNA, suggests that the divergence is recent and detectable only by the faster-evolving mtDNA. A within-subgenus threshold of >2% may be more appropriate than the standard 3% among sister taxa in cryptic anopheline complexes. Differences in demographic history and climatic changes may have contributed to mtDNA lineage divergence in A. marajoara. PMID:20929572
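The threshold argument above compares divergence between lineages with fixed cutoffs (the standard 3% and the proposed >2%). A minimal sketch, assuming divergence is measured as the mean uncorrected p-distance over all between-lineage pairs of aligned, ungapped COI fragments; the abstract does not state the distance model, and the sequences below are made up.

```python
from itertools import product


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, ungapped sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def between_lineage_distance(lineage_1, lineage_2):
    """Mean p-distance over every pair with one sequence from each lineage."""
    pairs = list(product(lineage_1, lineage_2))
    if not pairs:
        raise ValueError("each lineage needs at least one sequence")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def exceeds_thresholds(distance, thresholds=(0.02, 0.03)):
    """Report which divergence cutoffs the distance exceeds."""
    return {t: distance > t for t in thresholds}


# Hypothetical 40-site fragments differing at one site (2.5% divergence).
lineage_1 = ["A" * 40]
lineage_2 = ["A" * 39 + "T"]
d = between_lineage_distance(lineage_1, lineage_2)
print(d, exceeds_thresholds(d))  # 0.025 {0.02: True, 0.03: False}
```

A divergence between 2% and 3%, like the 2.92% reported for the Folmer region, is exactly the case where the two cutoffs disagree about whether the lineages are distinct.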
NASA Astrophysics Data System (ADS)
Pickering, William; Lim, Chjan
2017-07-01
We investigate a family of urn models that correspond to one-dimensional random walks with quadratic transition probabilities and that have highly diverse applications. Well-known instances of these two-urn models are the Ehrenfest model of molecular diffusion, the voter model of social influence, and the Moran model of population genetics. We also provide a generating function method for diagonalizing the corresponding transition matrix that is valid if and only if the underlying mean density satisfies a linear differential equation, and we express the eigenvector components in terms of ordinary hypergeometric functions. The nature of the models leads to a natural extension to interaction between agents in a general network topology. We analyze the dynamics on uncorrelated heterogeneous degree sequence networks and relate the convergence times to the moments of the degree sequences for various pairwise interaction mechanisms.
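The Moran model named above is one such walk: with n individuals and k copies of one type, each step moves k up by one with probability k(n − k)/n², down by one with the same probability, and otherwise leaves it unchanged, so the transition probabilities are quadratic in k. The authors diagonalize these matrices with a generating-function method; the sketch below instead (assuming NumPy) builds the matrix numerically and uses the standard fundamental-matrix method for absorbing Markov chains to obtain fixation probabilities and mean times to consensus in a well-mixed population, not on the heterogeneous networks analysed in the paper.

```python
import numpy as np


def moran_transition_matrix(n: int) -> np.ndarray:
    """Neutral Moran chain on states k = 0..n; states 0 and n are absorbing."""
    if n < 2:
        raise ValueError("population size must be at least 2")
    p = np.zeros((n + 1, n + 1))
    p[0, 0] = p[n, n] = 1.0
    for k in range(1, n):
        step = k * (n - k) / n**2  # quadratic transition probability
        p[k, k - 1] = step
        p[k, k + 1] = step
        p[k, k] = 1.0 - 2.0 * step
    return p


def absorption_summary(p: np.ndarray):
    """Fixation probabilities at state n and mean steps to absorption from states 1..n-1."""
    n = p.shape[0] - 1
    q = p[1:n, 1:n]  # transitions among the transient states 1..n-1
    to_fixation = p[1:n, n]  # one-step probabilities of jumping to state n
    identity_minus_q = np.eye(n - 1) - q
    fixation = np.linalg.solve(identity_minus_q, to_fixation)
    mean_steps = np.linalg.solve(identity_minus_q, np.ones(n - 1))
    return fixation, mean_steps


P = moran_transition_matrix(10)
fixation, mean_steps = absorption_summary(P)
print(np.round(fixation, 3))  # neutral drift: fixation from state k equals k/n
print(np.round(mean_steps, 1))
```

The mean absorption times play the role of the convergence times discussed in the abstract; on a network, the transition probabilities would instead depend on the degree sequence.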
Tian, Caihong; Tek Tay, Wee; Feng, Hongqiang; Wang, Ying; Hu, Yongmin; Li, Guoping
2015-06-01
Adelphocoris suturalis is one of the most serious insect pests of Bt cotton in China; however, its molecular genetics, biochemistry and physiology are poorly understood. We used a high-throughput sequencing platform to perform de novo transcriptome assembly and gene expression analyses across different developmental stages (eggs, 2nd and 5th instar nymphs, female and male adults). We obtained 20 GB of clean data and identified 88,614 unigenes, including 23,830 clusters and 64,784 singletons. These unigene sequences were annotated and classified using the Gene Ontology, Clusters of Orthologous Groups, and Kyoto Encyclopedia of Genes and Genomes databases. A large number of differentially expressed genes were discovered through pairwise comparisons between these developmental stages. Gene expression profiles differed dramatically between life-stage transitions, with some of the most differentially expressed genes being associated with sex differences, metabolism and development. Quantitative real-time PCR confirmed the deep-sequencing findings based on the relative expression levels of nine randomly selected genes. Furthermore, over 791,390 single nucleotide polymorphisms and 2,682 potential simple sequence repeats were identified. Our study provides comprehensive transcriptional gene expression information for A. suturalis that will form the basis for a better understanding of developmental pathways, hormone biosynthesis, sex differences and wing formation in mirid bugs.
PMID:26047353
High-speed multiple sequence alignment on a reconfigurable platform.
Oliver, Tim; Schmidt, Bertil; Maskell, Douglas; Nathan, Darran; Clemens, Ralf
2006-01-01
Progressive alignment is a widely used approach to computing multiple sequence alignments (MSAs). However, aligning several hundred sequences with popular progressive alignment tools requires hours on sequential computers. Due to the rapid growth of sequence databases, biologists need to compute MSAs in far less time. In this paper we present a new approach to MSA on reconfigurable hardware platforms that achieves high performance at low cost. We have constructed a linear systolic array to perform pairwise sequence distance computations using dynamic programming. This results in an implementation with significant runtime savings on a standard FPGA.
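The systolic-array idea works because, in a dynamic-programming matrix, every cell on an anti-diagonal i + j = d depends only on the two preceding anti-diagonals, so all cells of one anti-diagonal can be computed simultaneously by separate processing elements. The Python sketch below emulates that wavefront order sequentially; it assumes unit-cost edit distance as the pairwise distance, which may differ from the scoring used on the FPGA.

```python
def wavefront_edit_distance(a: str, b: str) -> int:
    """Edit distance computed anti-diagonal by anti-diagonal (wavefront order)."""
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for d in range(m + n + 1):
        # Cells with i + j == d are mutually independent: on a systolic array
        # each would be handled by a different processing element in parallel.
        for i in range(max(0, d - n), min(m, d) + 1):
            j = d - i
            if i == 0:
                D[i][j] = j
            elif j == 0:
                D[i][j] = i
            else:
                cost = 0 if a[i - 1] == b[j - 1] else 1
                D[i][j] = min(D[i - 1][j] + 1,         # deletion (diagonal d-1)
                              D[i][j - 1] + 1,         # insertion (diagonal d-1)
                              D[i - 1][j - 1] + cost)  # match/mismatch (diagonal d-2)
    return D[m][n]


print(wavefront_edit_distance("kitten", "sitting"))  # 3
```

A full progressive aligner would compute this distance for every pair of input sequences to build the guide tree, which is the step being accelerated.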
K2 and K2*: efficient alignment-free sequence similarity measurement based on Kendall statistics.
Lin, Jie; Adjeroh, Donald A; Jiang, Bing-Hua; Jiang, Yue
2018-05-15
Alignment-free sequence comparison methods can compute the pairwise similarity between a huge number of sequences much faster than alignment-based methods. We propose a new non-parametric alignment-free sequence comparison method, called K2, based on the Kendall statistics. Compared with other state-of-the-art alignment-free comparison methods, K2 demonstrates competitive performance in generating phylogenetic trees, in evaluating functionally related regulatory sequences, and in computing the edit distance (similarity/dissimilarity) between sequences. Furthermore, the K2 approach is much faster than the other methods. An improved method, K2*, is also proposed, which determines the appropriate algorithmic parameter (length) automatically, without first considering different values. Comparative analysis with the state-of-the-art alignment-free sequence similarity methods demonstrates the superiority of the proposed approaches, especially with increasing sequence length or increasing dataset size. The K2 and K2* approaches are implemented as an R package and are freely available (http://community.wvu.edu/daadjeroh/projects/K2/K2_1.0.tar.gz). yueljiang@163.com. Supplementary data are available at Bioinformatics online.
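To illustrate what a Kendall-statistic comparison of two sequences can look like, the Python sketch below turns each DNA sequence into a vector of k-mer counts and computes Kendall's tau-b between the two vectors. This is an illustration of the general rank-correlation idea only: it is not the K2 or K2* statistic, whose exact definitions and automatic length selection are given in the paper and its R package.

```python
from itertools import product
from math import sqrt


def kmer_counts(seq: str, k: int) -> list:
    """Counts of every possible DNA k-mer, in a fixed lexicographic order."""
    words = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {w: i for i, w in enumerate(words)}
    counts = [0] * len(words)
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in index:  # k-mers containing N or other symbols are skipped
            counts[index[word]] += 1
    return counts


def kendall_tau_b(x: list, y: list) -> float:
    """Kendall's tau-b rank correlation (O(n^2) pair enumeration)."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both vectors: excluded from the statistic
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + ties_x_only)
                 * (concordant + discordant + ties_y_only))
    return (concordant - discordant) / denom if denom else 0.0


s1, s2 = "ACGTTGCAAGT", "ACGTTGCTAGT"
print(kendall_tau_b(kmer_counts(s1, 2), kmer_counts(s2, 2)))
```

Because only the ranks of the counts matter, the measure is non-parametric and insensitive to overall sequence length scaling, which is one motivation for rank statistics in alignment-free comparison.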
Population Expansion and Genetic Structure in Carcharhinus brevipinna in the Southern Indo-Pacific
Geraghty, Pascal T.; Williamson, Jane E.; Macbeth, William G.; Wintner, Sabine P.; Harry, Alastair V.; Ovenden, Jennifer R.; Gillings, Michael R.
2013-01-01
Background Quantifying genetic diversity and metapopulation structure provides insights into the evolutionary history of a species and helps develop appropriate management strategies. We provide the first assessment of genetic structure in spinner sharks (Carcharhinus brevipinna), a large cosmopolitan carcharhinid, sampled from eastern and northern Australia and South Africa. Methods and Findings Sequencing of the mitochondrial DNA NADH dehydrogenase subunit 4 gene for 430 individuals revealed 37 haplotypes and moderately high haplotype diversity (h = 0.6770 ± 0.025). While two metrics of genetic divergence (ΦST and FST) revealed somewhat different results, subdivision was detected between South Africa and all Australian locations (pairwise ΦST, range 0.02717–0.03508, p values ≤ 0.0013; pairwise FST South Africa vs New South Wales = 0.04056, p = 0.0008). Evidence for fine-scale genetic structuring was also detected along Australia’s east coast (pairwise ΦST = 0.01328, p < 0.015), and between south-eastern and northern locations (pairwise ΦST = 0.00669, p < 0.04). Conclusions The Indian Ocean represents a robust barrier to contemporary gene flow in C. brevipinna between Australia and South Africa. Gene flow also appears restricted along a continuous continental margin in this species, with data tentatively suggesting the delineation of two management units within Australian waters. Further sampling, however, is required for a more robust evaluation of the latter finding. Evidence indicates that all sampled populations were shaped by a substantial demographic expansion event, with the resultant high genetic diversity being cause for optimism when considering conservation of this commercially-targeted species in the southern Indo-Pacific. PMID:24086462
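Haplotype diversity values like the h reported above are usually computed with Nei's unbiased estimator, h = n/(n - 1) · (1 - Σ pᵢ²), where pᵢ is the frequency of haplotype i among n sequences. A minimal Python sketch, assuming haplotype labels have already been assigned to each sequenced individual; the study's software and its standard-error calculation are not reproduced.

```python
from collections import Counter


def haplotype_diversity(haplotypes: list) -> float:
    """Nei's unbiased haplotype (gene) diversity from a list of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


# Toy sample: two haplotypes at equal frequency among four individuals.
print(haplotype_diversity(["H1", "H1", "H2", "H2"]))  # ≈ 0.667
```

The n/(n - 1) factor corrects the small-sample downward bias of 1 - Σ pᵢ²; with hundreds of individuals, as here, the correction is negligible.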
NoFold: RNA structure clustering without folding or alignment.
Middleton, Sarah A; Kim, Junhyong
2014-11-01
Structures that recur across multiple different transcripts, called structure motifs, often perform a similar function, for example recruiting a specific RNA-binding protein that then regulates translation, splicing, or subcellular localization. Identifying common motifs between coregulated transcripts may therefore yield significant insight into their binding partners and mechanism of regulation. However, because most methods for clustering structures rely on folding individual sequences or performing many pairwise alignments, they face a tradeoff between speed and accuracy that can be problematic for large-scale data sets. Here we describe a novel method for comparing and characterizing RNA secondary structures that does not require folding or pairwise alignment of the input sequences. Our method constructs a distance function between two objects from their respective distances to a collection of empirical examples or models, which in our case consists of 1973 Rfam family covariance models. Using this as a basis for measuring structural similarity, we developed a clustering pipeline called NoFold to automatically identify and annotate structure motifs within large sequence data sets. We demonstrate that NoFold can simultaneously identify multiple structure motifs with an average sensitivity of 0.80 and precision of 0.98 and generally exceeds the performance of existing methods. We also perform a cross-validation analysis of the entire set of Rfam families, achieving an average sensitivity of 0.57. We apply NoFold to identify motifs enriched in dendritically localized transcripts and report 213 enriched motifs, including both known and novel structures. © 2014 Middleton and Kim; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
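The core of the NoFold idea is that each sequence is represented by its scores against a fixed panel of reference models, and two sequences are compared through those score vectors rather than by folding or aligning them against each other. The Python sketch below assumes the scores have already been computed (in NoFold they come from Rfam covariance models; the numbers here are toy values), and the per-model z-score normalization and Euclidean distance are illustrative choices, not necessarily those NoFold uses.

```python
from math import sqrt
from statistics import mean, pstdev


def normalize_columns(vectors):
    """Z-score each model's column so no single model dominates the distance."""
    columns = list(zip(*vectors))
    # A constant column gets scale 1.0 to avoid division by zero.
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in vectors]


def pairwise_distances(vectors):
    """Symmetric Euclidean distance matrix between score vectors."""
    n = len(vectors)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist = sqrt(sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j])))
            D[i][j] = D[j][i] = dist
    return D


# Rows: sequences; columns: scores against four hypothetical reference models.
scores = [[10, 2, 1, 0],
          [9, 3, 1, 1],
          [0, 1, 12, 8]]
D = pairwise_distances(normalize_columns(scores))
print(D[0][1] < D[0][2])  # True: sequences 0 and 1 resemble the same models
```

The resulting matrix can be passed to any standard clustering routine; each new sequence costs one scoring pass against the panel instead of one alignment per other sequence.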
Andreotti, Renato; Pedroso, Marisela S; Caetano, Alexandre R; Martins, Natália F
2008-01-01
This paper reports the sequence analysis of the Bm86 Campo Grande strain, comparing it with the Bm86 and Bm95 antigens from the preparations TickGardPLUS and Gavac, respectively. The PCR product was cloned into pMOSBlue and sequenced. The secondary structure prediction tool PSIPRED was used to calculate the alpha helix and beta strand contents of the predicted polypeptide. The hydrophobicity profile was calculated using the Hopp and Woods method, and potential MHC class-I binding regions in the antigens were also identified. Pairwise alignment revealed that the similarity between the Bm86 Campo Grande strain and Bm86 is 0.2% higher than that between the Bm86 Campo Grande strain and Bm95 antigens. The identities were 96.5% and 96.3%, respectively. Major differences in hydrophobicity were predicted among the sequences in two specific regions.
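The Hopp and Woods method assigns each amino acid a hydrophilicity value and averages those values over a sliding window, so higher values indicate more hydrophilic (less hydrophobic) stretches. The Python sketch below uses the published Hopp-Woods scale and, as an assumption, a six-residue window; the window used in the study is not stated in the abstract.

```python
# Hopp-Woods hydrophilicity scale (positive = hydrophilic).
HOPP_WOODS = {
    "R": 3.0, "D": 3.0, "E": 3.0, "K": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "P": 0.0, "T": -0.4, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "I": -1.8, "L": -1.8, "Y": -2.3, "F": -2.5, "W": -3.4,
}


def hopp_woods_profile(protein: str, window: int = 6) -> list:
    """Mean hydrophilicity of each window, reported at the window start.

    Raises KeyError for non-standard residue codes (e.g. X or B).
    """
    values = [HOPP_WOODS[aa] for aa in protein.upper()]
    if len(values) < window:
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


profile = hopp_woods_profile("KKKKKKWWWWWW")
print(len(profile))  # 7 windows; the first (all Lys) averages 3.0, the last (all Trp) -3.4
```

Comparing two antigens then amounts to computing the profile for each aligned sequence and looking for windows where the curves diverge, such as the two regions reported above.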
Earls, Megan R.; Kinnevey, Peter M.; Brennan, Gráinne I.; Lazaris, Alexandros; Skally, Mairead; O’Connell, Brian; Humphreys, Hilary; Shore, Anna C.
2017-01-01
Community-associated spa type t127/t922 methicillin-resistant Staphylococcus aureus (MRSA) prevalence increased from 1% to 7% in Ireland between 2010 and 2015. This study tracked the spread of 89 such isolates from June 2013 to June 2016. These included 78 healthcare-associated and 11 community-associated MRSA isolates from a prolonged hospital outbreak (H1) (n = 46), 16 other hospitals (n = 28), four other healthcare facilities (n = 4) and community-associated sources (n = 11). Isolates underwent antimicrobial susceptibility testing, DNA microarray profiling and whole-genome sequencing. Minimum spanning trees were generated following core-genome multilocus sequence typing, and pairwise single nucleotide variation (SNV) analysis was performed. All isolates were sequence type 1 MRSA staphylococcal cassette chromosome mec type IV (ST1-MRSA-IV) and 76/89 were multidrug-resistant. Fifty isolates, including 40/46 from H1, were high-level mupirocin-resistant, carrying a conjugative 39 kb iles2-encoding plasmid. Two closely related ST1-MRSA-IV strains (I and II) and multiple sporadic strains were identified. Strain I isolates (57/89), including 43/46 H1 and all high-level mupirocin-resistant isolates, exhibited ≤80 SNVs. Two strain I isolates from separate H1 healthcare workers differed from other H1/strain I isolates by 7–47 and 12–53 SNVs, respectively, indicating healthcare worker involvement in this outbreak. Strain II isolates (19/89), including the remaining H1 isolates, exhibited ≤127 SNVs. For each strain, the pairwise SNVs exhibited by healthcare-associated and community-associated isolates indicated recent transmission of ST1-MRSA-IV within and between multiple hospitals, healthcare facilities and communities in Ireland. Given the interchange between healthcare-associated and community-associated isolates in hospitals, the risk factors that inform screening for MRSA require revision. PMID:28399151
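Pairwise SNV analysis reduces to counting differing positions between core-genome alignments and grouping isolates whose distances fall under a cut-off. The Python sketch below assumes equal-length core-genome alignments and uses single-linkage grouping at a user-chosen threshold; that linkage rule is an assumption, and the study's strain definitions also drew on cgMLST and other typing data. Isolate names and sequences are toy values.

```python
def snv_count(a: str, b: str) -> int:
    """Positions where both genomes have a called base and the bases differ."""
    if len(a) != len(b):
        raise ValueError("core-genome alignments must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y and x in "ACGT" and y in "ACGT")


def single_linkage_clusters(genomes: dict, max_snvs: int) -> list:
    """Group isolates linked by chains of pairs within max_snvs SNVs."""
    names = list(genomes)
    parent = {name: name for name in names}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if snv_count(genomes[a], genomes[b]) <= max_snvs:
                parent[find(a)] = find(b)
    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


toy = {"iso1": "AAAAAAAAAA", "iso2": "AAAAAAAAAT", "iso3": "TTTTTAAAAA"}
print(single_linkage_clusters(toy, max_snvs=1))  # [['iso1', 'iso2'], ['iso3']]
```

Single linkage can chain isolates whose direct distance exceeds the threshold, so reporting the maximum within-cluster pairwise SNV count (as the abstract does with ≤80 and ≤127) is a useful check.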
Brown, J. R.; Beckenbach, K.; Beckenbach, A. T.; Smith, M. J.
1996-01-01
The extent of mtDNA length variation and heteroplasmy, as well as DNA sequences of the control region and two tRNA genes, were determined for four North American sturgeon species: Acipenser transmontanus, A. medirostris, A. fulvescens and A. oxyrhynchus. Across the Continental Divide, a division in the occurrence of length variation and heteroplasmy was observed that was concordant with species biogeography as well as with phylogenies inferred from restriction fragment length polymorphisms (RFLP) of whole mtDNA and pairwise comparisons of unique sequences of the control region. In all species, mtDNA length variation was due to repeated arrays of 78-82-bp sequences, each containing a D-loop strand synthesis termination associated sequence (TAS). Individual repeats showed greater sequence conservation within individuals and species than between species, which is suggestive of concerted evolution. Differences in the frequencies of multiple-copy genomes and heteroplasmy among the four species may be ascribed to differences in the rates of recurrent mutation. A mechanism that may offset the high rate of mutation toward increased copy number is suggested, on the basis that an increase in the number of functional TAS motifs might reduce the frequency of successfully initiated H-strand replications. PMID:8852850
Lotka-Volterra pairwise modeling fails to capture diverse pairwise microbial interactions
Momeni, Babak; Xie, Li; Shou, Wenying
2017-01-01
Pairwise models are commonly used to describe many-species communities. In these models, an individual receives additive fitness effects from pairwise interactions with each species in the community ('additivity assumption'). All pairwise interactions are typically represented by a single equation where parameters reflect signs and strengths of fitness effects ('universality assumption'). Here, we show that a single equation fails to qualitatively capture diverse pairwise microbial interactions. We build mechanistic reference models for two microbial species engaging in commonly found chemically mediated interactions, and attempt to derive pairwise models. Different equations are appropriate depending on whether a mediator is consumable or reusable, whether an interaction is mediated by one or more mediators, and sometimes even on quantitative details of the community (e.g. relative fitness of the two species, initial conditions). Our results, combined with potential violation of the additivity assumption in many-species communities, suggest that pairwise modeling will often fail to predict microbial dynamics. DOI: http://dx.doi.org/10.7554/eLife.25051.001 PMID:28350295
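A standard choice for the single pairwise equation is the generalized Lotka-Volterra form dxᵢ/dt = xᵢ (rᵢ + Σⱼ aᵢⱼ xⱼ), where each aᵢⱼ encodes the sign and strength of species j's effect on species i; the sum embodies the additivity assumption and the shared functional form the universality assumption. The Python sketch below integrates that form with a forward-Euler step. The parameter values are hypothetical, and this is the kind of model the study critiques, not its mechanistic reference models.

```python
def simulate_lv(x0, r, A, dt=0.01, steps=5000):
    """Forward-Euler integration of dx_i/dt = x_i * (r_i + sum_j A[i][j] * x_j)."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        # Per-capita growth rate: intrinsic rate plus additive pairwise effects.
        growth = [r[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Clamp at zero so numerical overshoot cannot produce negative abundances.
        x = [max(0.0, x[i] + dt * x[i] * growth[i]) for i in range(n)]
    return x


# Two mutually inhibiting species (hypothetical parameters).
r = [1.0, 1.0]
A = [[-1.0, -0.5],
     [-0.5, -1.0]]
print(simulate_lv([0.1, 0.2], r, A))  # approaches the coexistence point (2/3, 2/3)
```

The equilibrium solves r + A x = 0, giving x₁ = x₂ = 2/3 here. The study's point is that for consumable versus reusable mediators, no single choice of (r, A) in this form reproduces the mechanistic dynamics.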
2013-01-01
Background Hypodontus macropi is a common intestinal nematode of a range of kangaroos and wallabies (macropodid marsupials). Based on previous multilocus enzyme electrophoresis (MEE) and nuclear ribosomal DNA sequence data sets, H. macropi has been proposed to be a complex of species. To test this proposal using independent molecular data, we sequenced the whole mitochondrial (mt) genomes of individuals of H. macropi from three different host species (Macropus robustus robustus, Thylogale billardierii and Macropus [Wallabia] bicolor) as well as that of Macropicola ocydromi (a related nematode), and undertook a comparative analysis of the amino acid sequence datasets derived from these genomes. Results The mt genomes sequenced by next-generation (454) technology from H. macropi from the three host species varied from 13,634 bp to 13,699 bp in size. Pairwise comparisons of the amino acid sequences predicted from these three mt genomes revealed differences of 5.8% to 18%. Phylogenetic analysis of the amino acid sequence data sets using Bayesian Inference (BI) showed that H. macropi from the three different host species formed distinct, well-supported clades. In addition, sliding window analysis of the mt genomes defined variable regions for future population genetic studies of H. macropi in different macropodid hosts and geographical regions around Australia. Conclusions The present analyses of inferred mt protein sequence datasets clearly supported the hypothesis that H. macropi from M. robustus robustus, M. bicolor and T. billardierii represent distinct species. PMID:24261823
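A sliding-window analysis of this kind moves a fixed-width window along aligned sequences and reports divergence per window, so that variable regions stand out as peaks. The Python sketch below compares two aligned sequences; the window width, step size and gap handling (gapped columns are skipped) are assumptions for illustration, not the settings used for the H. macropi mt genomes.

```python
def sliding_window_divergence(a: str, b: str, window: int = 300, step: int = 100) -> list:
    """(window start, proportion of differing gap-free sites) for each window."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    results = []
    for start in range(0, len(a) - window + 1, step):
        wa, wb = a[start:start + window], b[start:start + window]
        sites = [(x, y) for x, y in zip(wa, wb) if x != "-" and y != "-"]
        differing = sum(x != y for x, y in sites)
        # NaN marks windows with no comparable (gap-free) sites.
        results.append((start, differing / len(sites) if sites else float("nan")))
    return results


print(sliding_window_divergence("A" * 10, "A" * 5 + "T" * 5, window=5, step=5))
# [(0, 0.0), (5, 1.0)]
```

Overlapping windows (step smaller than width) give a smoother curve, at the cost of neighbouring values no longer being independent.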
Alasaad, S; Soglia, D; Spalenza, V; Maione, S; Soriguer, R C; Pérez, J M; Rasero, R; Degiorgis, M P Ryser; Nimmervoll, H; Zhu, X Q; Rossi, L
2009-02-05
The present study examined the relationship among individual Sarcoptes scabiei mites from 13 wild mammalian populations belonging to nine species in four European countries using the second internal transcribed spacer (ITS-2) of nuclear ribosomal DNA (rDNA) as a genetic marker. The ITS-2 plus primer-flanking 5.8S and 28S rDNA (ITS-2+) was amplified from individual mites by polymerase chain reaction (PCR) and the amplicons were sequenced directly. A total of 148 ITS-2+ sequences, each 404 bp long, were obtained, and 67 variable sites were identified (16.59%). UPGMA analyses did not show any geographical or host-specific clustering, and a similar outcome was obtained using population pairwise Fst statistics. These results suggest that ITS-2 rDNA is not suitable for examining genetic diversity among mite populations.
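UPGMA builds a tree by repeatedly merging the two closest clusters and setting the distance from the merged cluster to every other cluster to the size-weighted average of the two old distances. A compact Python sketch, assuming a symmetric pairwise distance matrix is already available (for example, from ITS-2+ p-distances); it returns the topology only, without branch lengths, and breaks ties by input order.

```python
def upgma(names: list, dist: list) -> str:
    """Return a Newick topology (no branch lengths) built by UPGMA."""
    def pair(a, b):
        return (a, b) if a < b else (b, a)

    # cluster id -> (Newick string, number of leaves)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(d, key=d.get)  # closest pair of current clusters
        (tree_i, size_i), (tree_j, size_j) = clusters.pop(i), clusters.pop(j)
        del d[(i, j)]
        for k in clusters:
            # Size-weighted average distance from k to the merged cluster.
            d_ik, d_jk = d.pop(pair(i, k)), d.pop(pair(j, k))
            d[pair(k, next_id)] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        clusters[next_id] = (f"({tree_i},{tree_j})", size_i + size_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree + ";"


names = ["A", "B", "C", "D"]
dist = [[0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 4],
        [10, 10, 4, 0]]
print(upgma(names, dist))  # ((A,B),(C,D));
```

When, as reported above, distances carry no geographic or host signal, UPGMA still returns a fully resolved tree, so the absence of clustering has to be judged from the distances themselves or from support values.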
Godoy, Oscar; Stouffer, Daniel B; Kraft, Nathan J B; Levine, Jonathan M
2017-05-01
Intransitive competition is often projected to be a widespread mechanism of species coexistence in ecological communities. However, it is unknown how much of the coexistence we observe in nature results from this mechanism when species interactions are also stabilized by pairwise niche differences. We combined field-parameterized models of competition among 18 annual plant species with tools from network theory to quantify the prevalence of intransitive competitive relationships. We then analyzed the predicted outcome of competitive interactions with and without pairwise niche differences. Intransitive competition was found for just 15-19% of the 816 possible triplets, and this mechanism was never sufficient to stabilize the coexistence of the triplet when the pairwise niche differences between competitors were removed. Of the transitive and intransitive triplets, only four were predicted to coexist, and these were more similar in multidimensional trait space defined by 11 functional traits than non-coexisting triplets. Our results argue that intransitive competition may be less frequent than recently proposed, and that even when it does operate, pairwise niche differences may be key to possible coexistence. © 2017 by the Ecological Society of America.
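With 18 species there are C(18, 3) = 18 × 17 × 16 / 6 = 816 possible triplets, which matches the count above. Given pairwise competitive outcomes, a triplet is intransitive when those outcomes form a cycle (a beats b, b beats c, c beats a). The Python sketch below counts such cycles; it assumes a binary matrix beats[i][j] (True when species i excludes j) has already been derived from the fitted pairwise models, which simplifies the study's approach.

```python
from itertools import combinations, permutations
from math import comb


def is_intransitive(beats, a, b, c) -> bool:
    """True if the three species form a competitive cycle in either direction."""
    return any(beats[x][y] and beats[y][z] and beats[z][x]
               for x, y, z in permutations((a, b, c)))


def count_intransitive(beats):
    """(number of cyclic triplets, total number of triplets)."""
    triplets = list(combinations(range(len(beats)), 3))
    cyclic = sum(is_intransitive(beats, *t) for t in triplets)
    return cyclic, len(triplets)


# Rock-paper-scissors: 0 beats 1, 1 beats 2, 2 beats 0.
rps = [[False, True, False],
       [False, False, True],
       [True, False, False]]
print(count_intransitive(rps))  # (1, 1)
print(comb(18, 3))              # 816
```

A fully transitive (strict hierarchy) matrix yields zero cycles, so the fraction cyclic / total is the prevalence measure reported as 15-19% above.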
ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes
Otto, Thomas Dan; Catanho, Marcos; Tristão, Cristian; Bezerra, Márcia; Fernandes, Renan Mathias; Elias, Guilherme Steinberger; Scaglia, Alexandre Capeletto; Bovermann, Bill; Berstis, Viktors; Lifschitz, Sergio; de Miranda, Antonio Basílio; Degrave, Wim
2010-01-01
Motivation: Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used, at some cost in accuracy. To improve and validate the results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith–Waterman algorithm. This extremely intensive computational approach was made possible with the help of World Community Grid™, through the Genome Comparison Project. The resulting database, ProteinWorldDB, which contains coordinates of pairwise protein alignments and their respective scores, is now made available. Users can download, compare and analyze the results, filtered by genomes, protein functions or clusters. ProteinWorldDB is integrated with annotations derived from Swiss-Prot, Pfam, KEGG, NCBI Taxonomy database and gene ontology. The database is a unique and valuable asset, representing a major effort to create a reliable and consistent dataset of cross-comparisons of the whole protein content encoded in hundreds of completely sequenced genomes using a rigorous dynamic programming approach. Availability: The database can be accessed through http://proteinworlddb.org Contact: otto@fiocruz.br PMID:20089515
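Smith-Waterman fills a dynamic-programming matrix in which every cell is clamped at zero, so the best-scoring local alignment can start and end anywhere. The Python sketch below returns only the optimal local score, using a linear gap penalty and simple match/mismatch values; these scores are assumptions, since protein comparisons typically use a substitution matrix such as BLOSUM62 with affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score, keeping only two DP rows in memory."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a new local alignment start at any cell.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(smith_waterman("XXACGTYY", "ACGT"))  # 8: four matches, flanks ignored
```

The cost is proportional to the product of the two sequence lengths for every pair, which is why an exhaustive all-against-all run over millions of proteins required grid-scale computing.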
Network Analysis of Protein Adaptation: Modeling the Functional Impact of Multiple Mutations
Beleva Guthrie, Violeta; Masica, David L; Fraser, Andrew; Federico, Joseph; Fan, Yunfan; Camps, Manel; Karchin, Rachel
2018-01-01
The evolution of new biochemical activities frequently involves complex dependencies between mutations and rapid evolutionary radiation. Mutation co-occurrence and covariation have previously been used to identify compensating mutations that result from physical contacts and preserve protein function and fold. Here, we model pairwise functional dependencies and higher-order interactions that enable the evolution of new protein functions. We use a network model to find complex dependencies between mutations resulting from evolutionary trade-offs and pleiotropic effects. We present a method, Network Analysis of Protein Adaptation (NAPA), to construct these networks and to identify functionally interacting mutations in both extant and reconstructed ancestral sequences. The time ordering of mutations can be incorporated into the networks through phylogenetic reconstruction. We apply NAPA to three distantly homologous β-lactamase protein clusters (TEM, CTX-M-3, and OXA-51), each of which has experienced recent evolutionary radiation under substantially different selective pressures. By analyzing the network properties of each protein cluster, we identify key adaptive mutations, positive pairwise interactions, different adaptive solutions to the same selective pressure, and complex evolutionary trajectories likely to increase protein fitness. We also present evidence that incorporating information from phylogenetic reconstruction and ancestral sequence inference can reduce the number of spurious links in the network while preserving overall network community structure. The analysis does not require structural or biochemical data. In contrast to function-preserving mutation dependencies, which frequently arise from structural contacts, gain-of-function mutation dependencies most commonly occur between residues that are distal in protein structure. PMID:29522102
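As a starting point for a mutation network of this kind, one can count how often each pair of mutations co-occurs across sequences and keep pairs above a cut-off as edges. The Python sketch below does only that; NAPA's actual link criteria, its use of phylogenetic ordering and its network statistics are not reproduced, and the mutation labels and threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(variants, min_count=2):
    """Edges between mutations that co-occur in at least min_count sequences."""
    counts = Counter()
    for mutations in variants:
        # Sorting gives each unordered pair a single canonical key.
        for pair in combinations(sorted(set(mutations)), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}


def degrees(edges):
    """Number of network neighbours per mutation."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


# Toy input: each set holds the mutations carried by one sequence.
variants = [{"m1", "m2"}, {"m1", "m2", "m3"}, {"m3"}, {"m1", "m3"}]
edges = cooccurrence_network(variants)
print(edges)  # {('m1', 'm2'): 2, ('m1', 'm3'): 2}
```

Raw co-occurrence counts are confounded by shared ancestry, since mutations inherited together co-occur without interacting; this is the kind of spurious link the abstract reports that phylogenetic information helps remove.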
Analysis of Ribosome Inactivating Protein (RIP): A Bioinformatics Approach
Jothi, G. Edward Gnana; Majilla, G. Sahaya Jose; Subhashini, D.; Deivasigamani, B.
2012-10-01
In spite of the medical advances of recent years, the world is in need of different sources to address certain health issues, and Ribosome Inactivating Proteins (RIPs) were found to be one such source. To provide easy access to information on RIPs, they need to be analysed with a view to constructing a RIP database. Multiple sequence alignment was also used to screen for homologues of significant RIPs from rare sources against RIPs from easily available sources in terms of similarity. Protein sequences were retrieved from SWISS-PROT and further analysed using pairwise and multiple sequence alignment. The analysis shows that 151 RIPs have been characterized to date. Amongst them, there are 87 type I, 37 type II, 1 type III and 25 unknown RIPs. Information on sequence length and on the availability of full or partial sequences was also obtained for the various RIPs. The multiple sequence alignment of 37 type I RIPs using the online server Multalin indicated the presence of 20 conserved residues. Pairwise alignment and multiple sequence alignment of certain selected RIPs in two groups, namely Group I and Group II, were carried out, and the consensus level was found to be 98%, 98% and 90% respectively.
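Counting conserved residues in a multiple alignment amounts to scanning columns and keeping those in which every sequence carries the same residue. The Python sketch below assumes an alignment given as equal-length gapped strings and treats any gap as breaking conservation; Multalin's own consensus settings are not reproduced, and the toy alignment is hypothetical.

```python
from collections import Counter


def conserved_columns(alignment: list) -> list:
    """Indices of columns where all sequences share the same non-gap residue."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("aligned sequences must have equal length")
    conserved = []
    for col in range(length):
        residues = {s[col] for s in alignment}
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col)
    return conserved


def column_consensus(alignment: list, col: int):
    """Most common residue in a column and the fraction of sequences carrying it."""
    counts = Counter(s[col] for s in alignment)
    residue, n = counts.most_common(1)[0]
    return residue, n / len(alignment)


aln = ["MKV-L", "MKVAL", "MRVAL"]
print(conserved_columns(aln))    # [0, 2, 4]
print(column_consensus(aln, 1))  # ('K', 0.666...)
```

A consensus "level" such as 90% can then be read as a per-column cut-off on the second value returned by `column_consensus`.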
Profiling cellular protein complexes by proximity ligation with dual tag microarray readout.
Hammond, Maria; Nong, Rachel Yuan; Ericsson, Olle; Pardali, Katerina; Landegren, Ulf
2012-01-01
Patterns of protein interactions provide important insights into basic biology, and their analysis plays an increasing role in drug development and diagnostics of disease. We have established a scalable technique to compare two biological samples for the levels of all pairwise interactions among a set of targeted protein molecules. The technique is a combination of the proximity ligation assay with readout via dual tag microarrays. In the proximity ligation assay, protein identities are encoded as DNA sequences by attaching DNA oligonucleotides to antibodies directed against the proteins of interest. Upon binding by pairs of antibodies to proteins present in the same molecular complexes, ligation reactions give rise to reporter DNA molecules that contain the combined sequence information from the two DNA strands. The ligation reactions also serve to incorporate a sample barcode in the reporter molecules to allow for direct comparison between pairs of samples. The samples are evaluated using a dual tag microarray where information is decoded, revealing which pairs of tags have become joined. As a proof of concept, we demonstrate that this approach can be used to detect a set of five proteins and their pairwise interactions both in cellular lysates and in fixed tissue culture cells. This paper provides a general strategy to analyze the extent of any pairwise interactions in large sets of molecules by decoding reporter DNA strands that identify the interacting molecules.
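After decoding, each reporter molecule identifies two antibody tags and a sample barcode, so comparing two samples comes down to tallying reporters per unordered tag pair per sample. The Python sketch below assumes the array readout has already been converted into (tag, tag, sample) records; the record format, names and pseudocount ratio are hypothetical, and real dual-tag microarray data would also need signal normalization, which is omitted.

```python
from collections import Counter


def tally_pairs(records):
    """Count reporters per (sample, unordered tag pair)."""
    counts = Counter()
    for tag_a, tag_b, sample in records:
        pair = tuple(sorted((tag_a, tag_b)))  # A-B and B-A report the same interaction
        counts[(sample, pair)] += 1
    return counts


def pair_ratio(counts, pair, sample_1, sample_2, pseudocount=1):
    """Ratio of a pair's counts between two samples (pseudocount avoids division by zero)."""
    pair = tuple(sorted(pair))
    return (counts[(sample_1, pair)] + pseudocount) / (counts[(sample_2, pair)] + pseudocount)


records = [("p1", "p2", "s1"), ("p2", "p1", "s1"), ("p1", "p2", "s2"), ("p3", "p4", "s2")]
counts = tally_pairs(records)
print(counts[("s1", ("p1", "p2"))])                  # 2
print(pair_ratio(counts, ("p2", "p1"), "s1", "s2"))  # 1.5
```

With five target proteins there are C(5, 2) = 10 distinct heteromeric pairs to track per sample, which is what makes the sample barcode essential for a direct two-sample comparison.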
Sequence determination and analysis of the NSs genes of two tospoviruses.
Hallwass, Mariana; Leastro, Mikhail O; Lima, Mirtes F; Inoue-Nagata, Alice K; Resende, Renato O
2012-03-01
The tospoviruses groundnut ringspot virus (GRSV) and zucchini lethal chlorosis virus (ZLCV) cause severe losses in many crops, especially in solanaceous and cucurbit species. In this study, the non-structural NSs gene and the 5'UTRs of these two biologically distinct tospoviruses were cloned and sequenced. The NSs sequences of GRSV and ZLCV were both 1,404 nucleotides long. Pairwise comparison showed that the NSs amino acid sequence of GRSV shared 69.6% identity with that of ZLCV and 75.9% identity with that of TSWV, while the NSs sequences of ZLCV and TSWV shared 67.9% identity. Phylogenetic analysis based on NSs sequences confirmed that these viruses cluster in the American clade.
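Percent identity from a pairwise comparison depends on what is used as the denominator. The Python sketch below, assuming two already-aligned protein sequences, counts identical residues over columns where neither sequence has a gap; other conventions (dividing by alignment length or by the shorter sequence) give different numbers, and the abstract does not say which was used.

```python
def percent_identity(a: str, b: str) -> float:
    """Identical residues / gap-free aligned columns, as a percentage."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no gap-free columns")
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


print(percent_identity("MKV-LS", "MKVALT"))  # 80.0: 4 identical of 5 gap-free columns
```

For two 1,404-nucleotide NSs genes translated to equal-length proteins with few indels, the conventions differ little, but the choice matters when comparing identities across studies.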
DOE Office of Scientific and Technical Information (OSTI.GOV)
Farajzadeh, Leila; Hornshøj, Henrik; Momeni, Jamal
Highlights:
• Transcriptome sequencing yielded 223 million porcine RNA-seq reads and 59,000 transcribed locations.
• Establishment of unique transcription profiles for ten porcine tissues, including four brain tissues.
• Comparison of transcription profiles at the gene, isoform, promoter and transcription start site levels.
• Highlights a high level of regulation of neuro-related genes at the gene, isoform, and TSS levels.
• Our results emphasize the pig as a valuable animal model with respect to human biological issues.
Abstract: The transcriptome is the complete set of transcripts in a tissue or cell at the time of sampling. In this study, RNA-Seq is employed to enable the differential analysis of the transcriptome profiles of ten porcine tissues in order to evaluate differences between the tissues at the gene and isoform expression levels, together with an analysis of variation in transcription start sites, promoter usage, and splicing. In total, 223 million RNA fragments were sequenced, leading to the identification of 59,930 transcribed gene locations and 290,936 transcript variants using Cufflinks, with similarity to approximately 13,899 annotated human genes. Pairwise analysis of tissues for differential expression at the gene level showed that the smallest differences were between tissues originating from the porcine brain. Interestingly, the relative level of differential expression at the isoform level generally did not vary between tissue contrasts. Furthermore, analysis of differential promoter usage between tissues revealed proportionally higher variation between cerebellum (CBE) versus frontal cortex and cerebellum versus hypothalamus (HYP) than in the remaining comparisons. In addition, the comparison of differential transcription start sites showed that the number of these sites is generally increased in comparisons including hypothalamus, in contrast to other pairwise assessments. A comprehensive analysis of one of the tissue contrasts, i.e. cerebellum versus heart, for differential variation at the gene, isoform, transcription start site (TSS), and promoter levels showed that several of the genes differed at all four levels. Interestingly, these genes were mainly annotated to the “electron transport chain” and neuronal differentiation, emphasizing that “tissue important” genes are regulated at several levels. Furthermore, our analysis shows that the “across tissue approach” has promising potential when screening for possible explanations for variations, such as those observed at the gene expression level.
CoCoNUT: an efficient system for the comparison and analysis of genomes
2008-01-01
Background Comparative genomics is the analysis and comparison of genomes from different species. This area of research is driven by the large number of sequenced genomes and relies heavily on efficient algorithms and software to perform pairwise and multiple genome comparisons. Results Most of the available software tools are tailored for one specific task. In contrast, we have developed a novel system, CoCoNUT (Computational Comparative geNomics Utility Toolkit), that can solve several different tasks in a unified framework: (1) finding regions of high similarity among multiple genomic sequences and aligning them, (2) comparing two draft or multi-chromosomal genomes, (3) locating large segmental duplications in large genomic sequences, and (4) mapping cDNA/EST to genomic sequences. Conclusion CoCoNUT is competitive with other software tools with respect to the quality of the results. The use of state-of-the-art algorithms and data structures allows CoCoNUT to solve comparative genomics tasks more efficiently than previous tools. With the improved user interface (including an interactive visualization component), CoCoNUT provides a unified, versatile, and easy-to-use software tool for large-scale studies in comparative genomics. PMID:19014477
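Finding regions of high similarity between genomes usually starts from exact matches (anchors) that are later chained and extended. The Python sketch below finds shared fixed-length k-mers between two sequences with a hash table. It is only a simplified illustration of the seeding idea: CoCoNUT is described as relying on more advanced algorithms and data structures, which are not reproduced here, and the k value is an arbitrary assumption.

```python
from collections import defaultdict


def kmer_anchors(a: str, b: str, k: int) -> list:
    """All (pos_in_a, pos_in_b) pairs where a and b share an identical k-mer."""
    index = defaultdict(list)
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)
    anchors = []
    for i in range(len(a) - k + 1):
        # .get avoids inserting empty entries into the defaultdict.
        for j in index.get(a[i:i + k], []):
            anchors.append((i, j))
    return anchors


print(kmer_anchors("ACGTACGT", "TTACGTT", 4))  # [(0, 2), (3, 1), (4, 2)]
```

Anchors lying on a common diagonal (constant i - j) are candidates for chaining into a longer similar region; repeated k-mers, as with ACGT here, produce multiple anchors that a chaining step must resolve.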
Adam, Benoit; Charloteaux, Benoit; Beaufays, Jerome; Vanhamme, Luc; Godfroid, Edmond; Brasseur, Robert; Lins, Laurence
2008-01-01
Background Lipocalins are widely distributed in nature and are found in bacteria, plants, arthropods and vertebrates. In hematophagous arthropods, they are implicated in the successful accomplishment of the blood meal, interfering with platelet aggregation, blood coagulation and inflammation, and in the transmission of disease parasites such as Trypanosoma cruzi and Borrelia burgdorferi. Pairwise sequence identity within this family is low, often below 30%, despite a well-conserved tertiary structure. Below the 30% identity threshold, alignment methods do not correctly assign and align proteins. The only safe way to assign a sequence to this family is by experimental determination. However, these procedures are long and costly and cannot always be applied. One way to circumvent the experimental approach is sequence and structure analysis. To further help with that task, the residues implicated in the stabilisation of the lipocalin fold were determined. This was done by analyzing the conserved interactions of ten lipocalins with a maximum pairwise identity of 28% and various functions. Results By analysing the ten lipocalin structures and sequences, it was determined that two hydrophobic clusters of residues are conserved. One cluster is internal to the barrel, involving all strands and the 3₁₀ helix. The other is external, involving four strands and the helix lying parallel to the barrel surface. These clusters are also present in RaHBP2, an unusual "outlier" lipocalin from the tick Rhipicephalus appendiculatus. This information was used to assess the assignment of LIR2, a protein from Ixodes ricinus, and to build a 3D model that helps to predict its function. FTIR data support the lipocalin fold for this protein. Conclusion Through sequence and structural analyses, two conserved clusters of interacting hydrophobic residues have been identified in lipocalins.
Since the residues implicated are not conserved for function, they should provide the minimal subset necessary to confer the lipocalin fold. This information has been used to assign LIR2 to lipocalins and to investigate its structure/function relationship. This study could be applied to other protein families with low pairwise similarity, such as the structurally related fatty acid binding proteins or avidins. PMID:18190694
SARA-Coffee web server, a tool for the computation of RNA sequence and structure multiple alignments
Di Tommaso, Paolo; Bussotti, Giovanni; Kemena, Carsten; Capriotti, Emidio; Chatzou, Maria; Prieto, Pablo; Notredame, Cedric
2014-01-01
This article introduces the SARA-Coffee web server, a service allowing the online computation of 3D-structure-based multiple RNA sequence alignments. The server makes it possible to combine sequences with and without known 3D structures. Given a set of sequences, SARA-Coffee outputs a multiple sequence alignment along with a reliability index for every sequence, column and aligned residue. SARA-Coffee combines SARA, a pairwise structural RNA aligner, with the R-Coffee multiple RNA aligner in a way that has been shown to improve alignment accuracy over most sequence aligners when enough structural data are available. The server can be accessed from http://tcoffee.crg.cat/apps/tcoffee/do:saracoffee. PMID:24972831
Trading genes along the silk road: mtDNA sequences and the origin of central Asian populations.
Comas, D; Calafell, F; Mateu, E; Pérez-Lezaun, A; Bosch, E; Martínez-Arias, R; Clarimon, J; Facchini, F; Fiori, G; Luiselli, D; Pettener, D; Bertranpetit, J
1998-01-01
Central Asia is a vast region at the crossroads of different habitats, cultures, and trade routes. Little is known about the genetics and the history of the population of this region. We present the analysis of mtDNA control-region sequences in samples of the Kazakh, the Uighurs, the lowland Kirghiz, and the highland Kirghiz, which we have used to address both the population history of the region and the possible selective pressures that high altitude has on mtDNA genes. Central Asian mtDNA sequences present features intermediate between European and eastern Asian sequences in several parameters, such as the frequencies of certain nucleotides, the levels of nucleotide diversity, mean pairwise differences, and genetic distances. Several hypotheses could explain the intermediate position of central Asia between Europe and eastern Asia, but the most plausible would involve extensive levels of admixture between Europeans and eastern Asians in central Asia, possibly enhanced during the Silk Road trade and clearly after the eastern and western Eurasian human groups had diverged. Lowland and highland Kirghiz mtDNA sequences are very similar, and the analysis of molecular variance has revealed that the fraction of mitochondrial genetic variance due to altitude is not significantly different from zero. Thus, it seems unlikely that altitude has exerted a major selective pressure on mitochondrial genes in central Asian populations. PMID:9837835
Nepusz, Tamás; Sasidharan, Rajkumar; Paccanaro, Alberto
2010-03-09
An important problem in genomics is the automatic inference of groups of homologous proteins from pairwise sequence similarities. Several approaches have been proposed for this task which are "local" in the sense that they assign a protein to a cluster based only on the distances between that protein and the other proteins in the set. It was shown recently that global methods such as spectral clustering have better performance on a wide variety of datasets. However, currently available implementations of spectral clustering methods mostly consist of a few loosely coupled Matlab scripts that assume a fair amount of familiarity with Matlab programming and hence they are inaccessible for large parts of the research community. SCPS (Spectral Clustering of Protein Sequences) is an efficient and user-friendly implementation of a spectral method for inferring protein families. The method uses only pairwise sequence similarities, and is therefore practical when only sequence information is available. SCPS was tested on difficult sets of proteins whose relationships were extracted from the SCOP database, and its results were extensively compared with those obtained using other popular protein clustering algorithms such as TribeMCL, hierarchical clustering and connected component analysis. We show that SCPS is able to identify many of the family/superfamily relationships correctly and that the quality of the obtained clusters as indicated by their F-scores is consistently better than all the other methods we compared it with. We also demonstrate the scalability of SCPS by clustering the entire SCOP database (14,183 sequences) and the complete genome of the yeast Saccharomyces cerevisiae (6,690 sequences). 
Besides the spectral method, SCPS also implements connected component analysis and hierarchical clustering, it integrates TribeMCL, it provides different cluster quality tools, it can extract human-readable protein descriptions using GI numbers from NCBI, it interfaces with external tools such as BLAST and Cytoscape, and it can produce publication-quality graphical representations of the clusters obtained, thus constituting a comprehensive and effective tool for practical research in computational biology. Source code and precompiled executables for Windows, Linux and Mac OS X are freely available at http://www.paccanarolab.org/software/scps.
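SCPS's main method is spectral clustering, but the abstract also lists connected component analysis among the methods it implements. As a minimal sketch of that simpler baseline (not SCPS code), assuming pairwise similarities are supplied as a dict keyed by protein-index pairs and using a hypothetical `threshold` parameter, the function below links any two proteins whose similarity reaches the threshold and reports the resulting connected components. Union-find is just one convenient way to compute the components.

```python
from collections import defaultdict


def connected_components(n, similarities, threshold):
    """Cluster proteins 0..n-1: two proteins end up in the same cluster if a
    chain of pairs with similarity >= threshold connects them.

    `similarities` (hypothetical input format) maps (i, j) index pairs to scores.
    """
    parent = list(range(n))  # union-find forest, one root per cluster

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), score in similarities.items():
        if score >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_i] = root_j  # merge the two clusters

    clusters = defaultdict(list)
    for i in range(n):
        clusters[find(i)].append(i)
    return sorted(clusters.values())


sims = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 4): 0.95}
print(connected_components(5, sims, threshold=0.5))  # [[0, 1, 2], [3, 4]]
```

Lowering the threshold to 0.05 lets the weak (2, 3) link through, and all five proteins fall into a single component.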
TaxI: a software tool for DNA barcoding using distance methods
Steinke, Dirk; Vences, Miguel; Salzburger, Walter; Meyer, Axel
2005-01-01
DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses and, hence, those algorithms do not offer appropriate solutions for the rapid but precise analyses needed for DNA barcoding, and are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments, this software is also able to work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets, of fish larvae and juveniles from Lake Constance and of juvenile land snails, under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as well as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding. PMID:16214755
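The query-versus-reference divergence step can be sketched as follows. This is not TaxI's code: it assumes each query/reference pair has already been aligned to equal length, skips gapped or ambiguous positions, and uses two textbook distance models (uncorrected p-distance and the Kimura two-parameter distance) as examples of "different models of sequence evolution". The function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def compared_sites(seq1, seq2):
    """Aligned base pairs usable for comparison (gaps and ambiguity codes skipped)."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return pairs


def p_distance(seq1, seq2):
    """Uncorrected proportion of differing sites."""
    pairs = compared_sites(seq1, seq2)
    return sum(a != b for a, b in pairs) / len(pairs)


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance (separate transition/transversion classes)."""
    pairs = compared_sites(seq1, seq2)
    n = len(pairs)
    transitions = sum(a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)
                      for a, b in pairs)
    transversions = sum(a != b for a, b in pairs) - transitions
    P, Q = transitions / n, transversions / n
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


def rank_references(query, references, distance=p_distance):
    """Sort reference sequences (name -> aligned sequence) by distance to the query."""
    return sorted((distance(query, seq), name) for name, seq in references.items())


refs = {"sp2": "AGGAACGTAC", "sp1": "ACGTACGTAT"}  # toy aligned references
print(rank_references("ACGTACGTAC", refs))  # [(0.1, 'sp1'), (0.2, 'sp2')]
```

Passing `distance=k2p_distance` ranks the same references under the Kimura model instead.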
NASA Astrophysics Data System (ADS)
Amiroch, S.; Pradana, M. S.; Irawan, M. I.; Mukhlash, I.
2017-09-01
Multiple alignment (MA) is a particularly important tool for studying viral genomes and determining the evolutionary history of a specific virus. Applying MA to the spread of the severe acute respiratory syndrome (SARS) epidemic is of particular interest because the epidemic spread so quickly a few years ago that it drew medical attention in many countries. Although much software exists for processing multiple sequences, the use of pairwise alignment to construct an MA deserves consideration. Previous research aligned the sequences for MA with the Super Pairwise Alignment algorithm; this study instead uses the Needleman-Wunsch dynamic programming algorithm, simulated in Matlab. Analysis of the resulting MA identified stable and unstable regions, which indicate the positions where mutations occur, the network topology of the phylogenetic tree of the SARS epidemic obtained with a distance method, and a network of mutation regions.
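The study ran Needleman-Wunsch in Matlab; the Python sketch below only illustrates the same global dynamic programming recurrence with a traceback. The scoring values (match +1, mismatch -1, linear gap -2) are illustrative assumptions, not the parameters used in the study; protein alignments would normally use a substitution matrix instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b with a linear gap penalty.
    Returns (score, aligned_a, aligned_b)."""

    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # a[i-1] against a gap
                              score[i][j - 1] + gap)   # b[j-1] against a gap

    # Traceback from the bottom-right cell recovers one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```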
Design and implementation of a hybrid MPI-CUDA model for the Smith-Waterman algorithm.
Khaled, Heba; Faheem, Hossam El Deen Mostafa; El Gohary, Rania
2015-01-01
This paper provides a novel hybrid model for solving the multiple pair-wise sequence alignment problem that combines the Message Passing Interface (MPI) and CUDA, the parallel computing platform and programming model invented by NVIDIA. The proposed model targets homogeneous cluster nodes equipped with similar Graphics Processing Unit (GPU) cards. The model consists of the Master Node Dispatcher (MND) and the Worker GPU Nodes (WGN). The MND distributes the workload among the cluster working nodes and then aggregates the results. The WGN perform the multiple pair-wise sequence alignments using the Smith-Waterman algorithm. We also propose a modified implementation of the Smith-Waterman algorithm based on computing the alignment matrices row-wise. The experimental results demonstrate a considerable reduction in running time as the number of working GPU nodes increases. The proposed model achieved a performance of about 12 giga cell updates per second (GCUPS) when tested against the SWISS-PROT protein knowledge base running on four nodes.
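The paper's row-wise Smith-Waterman is a CUDA kernel distributed over MPI nodes; the sketch below is only a sequential, single-process illustration of the row-by-row idea (two rows in memory, linear gap penalty, illustrative scores). The `dispatch` function is a hypothetical round-robin stand-in for the Master Node Dispatcher, and `gcups` shows how the throughput figure is defined: DP cells computed divided by seconds times 10^9.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (linear gap penalty), filling the DP matrix
    one row at a time and keeping only the previous and current rows."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: cell scores are floored at zero.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def dispatch(pairs, n_workers):
    """Hypothetical round-robin split of sequence pairs across worker nodes."""
    return [pairs[k::n_workers] for k in range(n_workers)]


def gcups(cells, seconds):
    """Giga cell updates per second: DP cells computed / (seconds * 1e9)."""
    return cells / (seconds * 1e9)


pairs = [("TTACGTTT", "GGACGTGG"), ("ACGT", "ACGT"), ("AAAA", "TTTT")]
for worker, chunk in enumerate(dispatch(pairs, 2)):
    print(worker, [smith_waterman_score(a, b) for a, b in chunk])
# 0 [8, 0]
# 1 [8]
```

Aligning a 2,000-residue sequence against a 500-residue one fills 10^6 cells, so finishing in 0.5 s corresponds to `gcups(2000 * 500, 0.5)`, i.e. 0.002 GCUPS.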
Qu, Cheng; Fu, Ningning; Xu, Yihua
2016-01-01
The sycamore lace bug, Corythucha ciliata (Hemiptera: Tingidae), is an invasive forestry pest rapidly expanding in many countries. This pest poses a considerable threat to urban forestry ecosystems, especially to Platanus spp. However, its molecular biology and biochemistry are poorly understood. This study reports the first C. ciliata transcriptome, encompassing three different life stages (nymphs, adult females (AF) and adult males (AM)). In total, 26.53 GB of clean data and 60,879 unigenes were obtained from three RNA-seq libraries. These unigenes were annotated and classified by Nr (NCBI non-redundant protein sequences), Nt (NCBI non-redundant nucleotide sequences), Pfam (Protein family), KOG/COG (Clusters of Orthologous Groups of proteins), Swiss-Prot (a manually annotated and reviewed protein sequence database), and KO (KEGG Ortholog database). Pairwise comparisons among the three samples revealed a large number of differentially expressed genes. Dramatic differences in global gene expression profiles were found between distinct life stages (nymphs and AF, nymphs and AM) and between sexes (AF and AM), with some of the significantly differentially expressed genes (DEGs) being related to metamorphosis, digestion, immunity and sex differences. The differential expression of unigenes was validated by quantitative real-time PCR (qRT-PCR) for 16 randomly selected unigenes. In addition, 17,462 potential simple sequence repeat molecular markers were identified in these transcriptome resources. This comprehensive C. ciliata transcriptomic information can be used to promote the development of environmentally friendly methodologies to disrupt the processes of metamorphosis, digestion, immunity and sex differences. PMID:27494615
David, Fabrice P A; Yip, Yum L
2008-09-23
Sequences and structures provide valuable complementary information on protein features and functions. However, it is not always straightforward for users to gather information concurrently from the sequence and structure levels. The UniProt knowledgebase (UniProtKB) strives to help users in this undertaking by providing complete cross-references to the Protein Data Bank (PDB) as well as coherent feature annotation using available structural information. In this study, SSMap, a new UniProt-PDB residue-residue level mapping, was generated. The primary objective of this mapping is not only to facilitate the two tasks mentioned above, but also to remedy a number of shortcomings of existing mappings. SSMap is the first isoform sequence-specific mapping resource and is up-to-date for UniProtKB annotation tasks. The method employed by SSMap differs from the other mapping resources in that it stresses the correct reconstruction of the PDB sequence from structures and the correct attribution of a UniProtKB entry to each PDB chain, using a series of post-processing steps. SSMap was compared to other existing mapping resources in terms of the correctness of the attribution of PDB chains to UniProtKB entries and of the quality of the pairwise alignments supporting the residue-residue mapping. SSMap was found to share about 80% of its mappings with other mapping sources. New and alternative mappings proposed by SSMap were mostly good, as assessed by manual verification of data subsets. As for local pairwise alignments, major discrepancies (in both alignment lengths and boundaries), when present, were often due to differences in the methodologies used for the mappings. SSMap provides an independent, good-quality UniProt-PDB mapping. 
The systematic comparison conducted in this study allows the further identification of general problems in UniProt-PDB mappings so that both the coverage and the quality of the mappings can be systematically improved for the benefit of the scientific community. SSMap mapping is currently used to provide PDB cross-references in UniProtKB.
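How a pairwise alignment supports a residue-residue mapping can be sketched as below. This is not SSMap's pipeline: it assumes a single gapped alignment between a UniProt sequence and a PDB chain sequence, and consecutive residue numbering on both sides (real PDB numbering can have insertion codes and numbering gaps, which this hypothetical helper ignores).

```python
def residue_mapping(aligned_uniprot, aligned_pdb, uniprot_start=1, pdb_start=1):
    """Map UniProt residue numbers to PDB residue numbers from one gapped
    pairwise alignment; residues aligned to a gap ('-') stay unmapped.
    Assumes consecutive numbering on both sides (no insertion codes)."""
    if len(aligned_uniprot) != len(aligned_pdb):
        raise ValueError("aligned sequences must have the same length")
    mapping = {}
    u, p = uniprot_start, pdb_start
    for a, b in zip(aligned_uniprot, aligned_pdb):
        if a != "-" and b != "-":
            mapping[u] = p
        if a != "-":
            u += 1  # advance UniProt numbering only on real residues
        if b != "-":
            p += 1  # same for the PDB chain
    return mapping


print(residue_mapping("MKTAYIA", "--TAY-A", pdb_start=5))
# {3: 5, 4: 6, 5: 7, 7: 8}
```

Here the first two UniProt residues are unobserved in the structure and residue 6 faces a gap, so they receive no PDB counterpart.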
Srinivas, T N R; Aditya, S; Bhumika, V; Kumar, P Anil
2014-02-01
Novel pinkish-orange pigmented, Gram-negative staining, half-moon shaped, non-motile, strictly aerobic strains designated AK24(T) and AK26 were isolated from water and sediment samples of Lonar Lake, Buldhana district, Maharashtra, India. Both strains were positive for oxidase, catalase and β-galactosidase activities. The predominant fatty acids were iso-C15:0 (41.5%), anteiso-C15:0 (9.7%), iso-C17:0 3OH (9.6%), iso-C17:1 ω9c (10.2%) and C16:1 ω7c/C16:1 ω6c/iso-C15:0 2OH (summed feature 3) (14.4%). The strains contained MK-7 as the major respiratory quinone, and phosphatidylethanolamine and five unidentified lipids as the polar lipids. BLAST analysis of the 16S rRNA gene sequence of strain AK24(T) showed that it was closely related to Aquiflexum balticum, with a pair-wise sequence similarity of 91.6%, as well as to Fontibacter ferrireducens, Belliella baltica and Indibacter alkaliphilus (91.3, 91.2 and 91.2% pair-wise sequence similarity, respectively), but it had only 88.6 to 91.0% pair-wise sequence similarity to the rest of the family members. The MALDI-TOF assay reported no significant similarities for AK24(T) and AK26, consistent with their potentially representing a new species. A MALDI MSP dendrogram showed close similarity between the two strains, which nevertheless remained distant from their phylogenetic neighbors. The genome of AK24(T) showed the presence of heavy metal tolerance genes, including genes providing resistance to arsenic, cadmium, cobalt and zinc. A cluster of heat shock resistance genes was also found in the genome. Two lantibiotic-producing genes, LanR and LasB, were also found in the genome of AK24(T). Strains AK24(T) and AK26 were very closely related to each other, with 99.5% pair-wise sequence similarity. Phylogenetic analysis indicated that the strains were members of the family Cyclobacteriaceae and clustered with the genus Mariniradius, as well as with the genera Aquiflexum, Cecembia, Fontibacter, Indibacter and Shivajiella. 
DNA-DNA hybridization between strains AK24(T) and AK26 showed a relatedness of 82% and their rep-PCR banding patterns were very similar. Based on data from the current polyphasic study, it is proposed that the isolates be placed in a new genus and species with the name Lunatimonas lonarensis gen. nov., sp. nov. The type strain of Lunatimonas lonarensis is AK24(T) (=JCM 18822(T)=MTCC 11627(T)). Copyright © 2013 Elsevier GmbH. All rights reserved.
Králová-Hromadová, Ivica; Štefka, Jan; Bazsalovicsová, Eva; Bokorová, Silvia; Oros, Mikuláš
2013-10-01
Atractolytocestus tenuicollis (Li, 1964) Xi, Wang, Wu, Gao et Nie, 2009 is a monozoic, non-segmented tapeworm of the order Caryophyllidea, parasitizing exclusively common carp (Cyprinus carpio L.). In the current work, the first molecular data on A. tenuicollis from Niushan Lake, Wuhan, China, in particular the complete ribosomal internal transcribed spacer 2 (ITS2) and partial mitochondrial cytochrome c oxidase subunit I (cox1), are provided. In order to evaluate molecular interrelationships within Atractolytocestus, the data on A. tenuicollis were compared with relevant data on two other congeners, Atractolytocestus huronensis and Atractolytocestus sagittatus. Divergent intragenomic copies (ITS2 paralogues) were detected in the ITS2 ribosomal spacer of A. tenuicollis; the same phenomenon has previously been observed in the two other congeners. The ITS2 structure of A. tenuicollis was very similar to that of A. huronensis from Slovakia, USA and UK; overall pairwise sequence identity was 91.7-95.2%. On the other hand, sequence identity values between A. tenuicollis and A. sagittatus were lower, 69.7-70.9%. Cox1 sequences, analysed in five A. tenuicollis individuals, were 100% identical, and no intraspecific variation was observed. Comparison of A. tenuicollis cox1 with the respective sequences of the two other Atractolytocestus species showed that the mitochondrial haplotype found in Chinese A. tenuicollis is structurally specific (haplotype 4; Ha4) and differs from all Atractolytocestus haplotypes determined so far (Ha1 and Ha2 for A. huronensis; Ha3 for A. sagittatus). Pairwise sequence identity between the A. tenuicollis cox1 haplotype and the remaining three haplotypes followed the same pattern as in ITS2. Nucleotide and amino acid (aa) sequence comparison with A. huronensis Ha1 and Ha2 revealed higher sequence identity, 90.3-90.8% (96.9% in aa), while lower values were obtained between the A. tenuicollis haplotype and Ha3 of Japanese A. sagittatus: 75.2% (81.9% in aa). 
The phylogenetic analyses using cox1, ITS2 and combined cox1 + ITS2 sequences revealed a close genetic interrelationship between A. tenuicollis and A. huronensis. Regardless of the type of analysis and DNA region used, the topology of the obtained trees was always identical; A. tenuicollis formed a separate clade, with A. huronensis forming a closely related sister group.
Briddon, Rob W; Martin, Darren P; Roumagnac, Philippe; Navas-Castillo, Jesús; Fiallo-Olivé, Elvira; Moriones, Enrique; Lett, Jean-Michel; Zerbini, F Murilo; Varsani, Arvind
2018-05-09
Nanoviruses and geminiviruses are circular, single-stranded DNA viruses that infect many plant species around the world. Nanoviruses and certain geminiviruses that belong to the genera Begomovirus and Mastrevirus are associated with additional circular, single-stranded DNA molecules (~1-1.4 kb) that encode a replication-associated protein (Rep). These Rep-encoding satellite molecules are commonly referred to as alphasatellites, and here we communicate the establishment of the family Alphasatellitidae, to which they have been assigned. Within the family Alphasatellitidae, two subfamilies, Geminialphasatellitinae and Nanoalphasatellitinae, have been established to accommodate the geminivirus- and nanovirus-associated alphasatellites, respectively. Whereas the pairwise nucleotide sequence identity distribution of all known geminialphasatellites (n = 628) displayed troughs at ~70% and 88% pairwise identity, that of the known nanoalphasatellites (n = 54) had troughs at ~67% and ~80% pairwise identity. We use these pairwise identity values as thresholds, together with phylogenetic analyses, to establish four genera and 43 species of geminialphasatellites and seven genera and 19 species of nanoalphasatellites. Furthermore, a divergent alphasatellite associated with coconut foliar decay disease is assigned to a species but not to a subfamily, as it likely represents a new alphasatellite subfamily that could be established once other closely related molecules are discovered.
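Locating troughs in a pairwise identity distribution can be sketched as a histogram search for local minima. This is only an illustration under stated assumptions: the identity values are taken as already computed for all sequence pairs, the bin width is an arbitrary choice, and only strict local minima are reported. The abstract states that such thresholds were used together with phylogenetic analyses, which this hypothetical helper does not attempt.

```python
def identity_troughs(identities, bin_width=1.0):
    """Centres (in % identity) of strict local minima in a histogram of
    pairwise identity values."""
    counts_by_bin = {}
    for value in identities:
        k = int(value // bin_width)
        counts_by_bin[k] = counts_by_bin.get(k, 0) + 1
    lo, hi = min(counts_by_bin), max(counts_by_bin)
    counts = [counts_by_bin.get(k, 0) for k in range(lo, hi + 1)]  # keep empty bins
    troughs = []
    for idx in range(1, len(counts) - 1):
        if counts[idx] < counts[idx - 1] and counts[idx] < counts[idx + 1]:
            troughs.append((lo + idx + 0.5) * bin_width)
    return troughs


toy = [62, 62, 63, 67, 72, 72, 73, 74]  # toy identities, not real data
print(identity_troughs(toy, bin_width=5))  # [67.5]
```

On real data the histogram is usually noisier, so smoothing or a wider bin width may be needed before minima become meaningful.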
Dynamic facial expression recognition based on geometric and texture features
NASA Astrophysics Data System (ADS)
Li, Ming; Wang, Zengfu
2018-04-01
Recently, dynamic facial expression recognition in videos has attracted growing attention. In this paper, we propose a novel dynamic facial expression recognition method using geometric and texture features. In our system, facial landmark movements and texture variations between pairs of images are used to perform dynamic facial expression recognition. For each facial expression sequence, image pairs are formed between the first frame and each of its subsequent frames. Integrating geometric and texture features further enhances the representation of the facial expressions. Finally, a Support Vector Machine is used for facial expression recognition. Experiments conducted on the extended Cohn-Kanade database show that our proposed method achieves performance competitive with other methods.
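The pairing scheme (first frame against each later frame) can be sketched for the geometric features alone. The sketch assumes each frame is already given as a list of (x, y) landmark coordinates in a fixed order, and that the displacements are simply flattened into one vector per pair; the texture features and the SVM classifier described in the abstract are not reproduced, and the function name is hypothetical.

```python
def pairwise_displacements(frames):
    """Pair the first frame of a sequence with each later frame and return,
    for every pair, the flattened (dx, dy) displacement of each landmark."""
    first = frames[0]
    features = []
    for frame in frames[1:]:
        vector = []
        for (x0, y0), (x1, y1) in zip(first, frame):
            vector.extend((x1 - x0, y1 - y0))
        features.append(vector)
    return features


frames = [[(0, 0), (1, 1)], [(0, 1), (1, 1)], [(2, 0), (1, 3)]]  # 3 frames, 2 landmarks
print(pairwise_displacements(frames))  # [[0, 1, 0, 0], [2, 0, 0, 2]]
```

Each resulting vector could then be combined with texture descriptors and passed to a classifier such as an SVM.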
DNA Barcodes of Asian Houbara Bustard (Chlamydotis undulata macqueenii)
Arif, Ibrahim A.; Khan, Haseeb A.; Williams, Joseph B.; Shobrak, Mohammad; Arif, Waad I.
2012-01-01
Populations of Houbara Bustards have declined dramatically in recent years. Captive breeding and reintroduction programs have had limited success in reviving population numbers, and thus new technological solutions involving molecular methods are essential for the long-term survival of this species. In this study, we sequenced a 694 bp segment of the COI gene from four specimens of Asian Houbara Bustard (Chlamydotis undulata macqueenii). We also compared these sequences with previously published barcodes of 11 individuals from different families of the orders Gruiformes, Ciconiiformes, Podicipediformes and Crocodylia (outgroup). The pair-wise sequence comparison showed a total of 254 variable sites across all 15 sequences from different taxa. Three of the four Houbara Bustard specimens had an identical COI sequence, and one individual showed a single nucleotide difference (G > A transition at position 83). Within the bustard family (Otididae), comparison among three species (Asian Houbara Bustard, Great Bustard (Otis tarda) and Little Bustard (Tetrax tetrax)), representing three different genera, showed 116 variable sites. For another family (Rallidae), the number of intra-family variable sites among individuals of four different genera was 146. The COI genetic distances among the 15 individuals varied from 0.000 to 0.431. Phylogenetic analysis using a 619 bp nucleotide segment of COI clearly discriminated all the species representing different genera, families and orders. All four Houbara Bustard specimens formed a single clade, clearly separated from the two other individuals of the same family (Otis tarda and Tetrax tetrax). The nucleotide sequence of the partial COI gene segment effectively discriminated closely related species. This is the first study reporting barcodes of the Houbara Bustard, and it should be helpful in future molecular studies, particularly for the conservation of this threatened bird in Saudi Arabia. 
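Counting variable sites across a set of aligned barcodes reduces to checking each alignment column for more than one character. The sketch below assumes the sequences are already aligned to equal length and treats a gap as just another character; the toy sequences are invented and are not the study's COI data.

```python
def variable_sites(aligned):
    """1-based alignment positions at which not all sequences carry the same
    character (gaps count as a character here)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [pos + 1 for pos in range(length)
            if len({seq[pos] for seq in aligned}) > 1]


toy_barcodes = ["ACGTA", "ACGTA", "ACATA", "ACGTA"]  # toy data, not real COI
print(variable_sites(toy_barcodes))  # [3]
```

In this toy set, a single G > A difference at position 3 in one sequence is the only variable site, mirroring the kind of single-nucleotide difference reported among the Houbara specimens.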
PMID:22408462
Protein alignment algorithms with an efficient backtracking routine on multiple GPUs.
Blazewicz, Jacek; Frohmberg, Wojciech; Kierzynka, Michal; Pesch, Erwin; Wojciechowski, Pawel
2011-05-20
Pairwise sequence alignment methods are widely used in biological research. The increasing number of sequences is perceived as one of the upcoming challenges for sequence alignment methods in the near future. To overcome this challenge, several GPU (Graphics Processing Unit) computing approaches have been proposed lately. These solutions show the great potential of the GPU platform, but in most cases they address the problem of sequence database scanning and compute only the alignment score, whereas the alignment itself is omitted. Thus, the need arose to implement the global and semiglobal Needleman-Wunsch and the Smith-Waterman algorithms with a backtracking procedure, which is needed to construct the alignment. In this paper we present a solution that performs the alignment of every given sequence pair, which is a required step for progressive multiple sequence alignment methods, as well as for DNA recognition at the DNA assembly stage. Tests show that the implementation, with performance up to 6.3 GCUPS on a single GPU for affine gap penalties, is very efficient in comparison to other CPU- and GPU-based solutions. Moreover, support for multiple GPUs with load balancing makes the application very scalable. The article shows that the backtracking procedure of sequence alignment algorithms may be designed to fit the GPU architecture. Therefore, our algorithm is able to compute pairwise alignments, not only scores. This opens a wide range of new possibilities, allowing other methods from the area of molecular biology to take advantage of the new computational architecture. Tests show that the efficiency of the implementation is excellent. Moreover, the speed of our GPU-based algorithms can be increased almost linearly when using more than one graphics card.
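The reported throughput refers to affine gap penalties, under which a gap of length k costs an opening penalty plus k - 1 extension penalties. A minimal CPU sketch of the standard three-matrix (Gotoh) recurrence for the global score is shown below; it is not the authors' GPU code, omits the backtracking step, and uses illustrative parameter values.

```python
NEG_INF = float("-inf")


def gotoh_global_score(a, b, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Global alignment score with affine gaps (Gotoh): a gap of length k
    scores gap_open + (k - 1) * gap_extend."""
    n, m = len(a), len(b)
    # M: column ends with a[i-1] paired to b[j-1]
    # X: column ends with a[i-1] against a gap; Y: b[j-1] against a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap costs gap_open; continuing one costs gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


print(gotoh_global_score("AAAATTTT", "AAAA"))  # -2: four matches, one 4-residue gap
```

Recovering the alignment itself would require walking back through all three matrices, recording which matrix each cell's maximum came from.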
Biological, serological and molecular typing of potato virus Y (PVY) isolates from Tunisia.
Tayahi, M; Gharsallah, C; Khamassy, N; Fakhfakh, H; Djilani-Khouadja, F
2016-10-17
In Tunisia, potato virus Y (PVY) currently presents a significant threat to potato production, reducing tuber yield and quality. Three hundred and eighty-five potato samples (six different cultivars) collected in autumn 2007 from nine regions in Tunisia were tested for PVY infection by DAS-ELISA. The virus was detected in all regions surveyed, with an average incidence of 80.26%. Subsequently, a panel of 82 Tunisian PVY isolates (PVY-TN) was subjected to systematic biological, serological and molecular typing using immunocapture reverse-transcription polymerase chain reaction and a series of PVYOC- and PVYN-specific monoclonal antibodies. Combined analyses revealed that ~67% of the isolates were PVYNTN variants, 17 of which were sequenced in the 5'NTR-P1 region to assess the genetic diversity and phylogenetic relationships of PVY-TN relative to other PVY isolates worldwide. To investigate whether selective constraints act on the viral genomic RNA, synonymous and non-synonymous substitution rates and their ratio were analyzed. Averages over all pairwise comparisons in the 5'NTR-P1 region showed more synonymous changes, suggesting that selective constraint acts on this region. The neutrality test was significantly negative, suggesting a rapid expansion of PVY isolates. The pairwise mismatch distribution showed a bimodal pattern and pointed to possibly early evolution of these sequences. The haplotype network topology provided evidence of a distinct geographical structure. This is the first report of such genetic analyses conducted on PVY isolates from Tunisia.
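A pairwise mismatch distribution tabulates, over all pairs of aligned sequences, how many pairs differ at k sites; its shape (unimodal or bimodal) is what the abstract interprets. A minimal sketch, assuming equal-length aligned sequences and counting every differing position (gaps included), is shown below with toy data; the mean of the distribution is the mean number of pairwise differences.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(aligned):
    """Number of sequence pairs showing k differences, for every observed k."""
    counts = Counter(sum(a != b for a, b in zip(s1, s2))
                     for s1, s2 in combinations(aligned, 2))
    return dict(sorted(counts.items()))


def mean_pairwise_differences(distribution):
    """Average number of differences per pair of sequences."""
    pairs = sum(distribution.values())
    return sum(k * c for k, c in distribution.items()) / pairs


dist = mismatch_distribution(["AAAA", "AAAT", "TTTT"])  # toy sequences
print(dist)                              # {1: 1, 3: 1, 4: 1}
print(mean_pairwise_differences(dist))   # 2.6666666666666665
```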
Gene order in rosid phylogeny, inferred from pairwise syntenies among extant genomes
2012-01-01
Background Ancestral gene order reconstruction for flowering plants has lagged behind developments in yeasts, insects and higher animals because of the recency of widespread plant genome sequencing, sequencers' embargoes on public data use, paralogies due to whole genome duplication (WGD) and fractionation of undeleted duplicates, extensive paralogy from other sources, and the computational cost of existing methods. Results We address these problems using the gene order of four core eudicot genomes (cacao, castor bean, papaya and grapevine) that have escaped any recent WGD events, and two others (poplar and cucumber) that descend from independent WGDs, to infer the ancestral gene order of the rosid clade and those of its main subgroups, the fabids and malvids. We improve and adapt techniques including the OMG method, for extracting large, paralogy-free, multiple orthologies from conflated pairwise synteny data among the six genomes, and the PATHGROUPS approach, for ancestral gene order reconstruction in a given phylogeny where some genomes may be descendants of WGD events. We use the gene order evidence to evaluate the hypothesis that the order Malpighiales belongs to the malvids rather than to the fabids, to which it has traditionally been assigned. Conclusions Gene orders of ancestral eudicot species involving 10,000 or more genes can be reconstructed in an efficient, parsimonious and consistent way, despite paralogies due to WGD and other processes. Pairwise genomic syntenies provide appropriate input to a parameter-free procedure of multiple ortholog identification followed by gene-order reconstruction in solving instances of the "small phylogeny" problem. PMID:22759433
Ultrafast Comparison of Personal Genomes via Precomputed Genome Fingerprints.
Glusman, Gustavo; Mauldin, Denise E; Hood, Leroy E; Robinson, Max
2017-01-01
We present an ultrafast method for comparing personal genomes. We transform the standard genome representation (lists of variants relative to a reference) into "genome fingerprints" via locality sensitive hashing. The resulting genome fingerprints can be meaningfully compared even when the input data were obtained using different sequencing technologies, processed using different pipelines, represented in different data formats and relative to different reference versions. Furthermore, genome fingerprints are robust to up to 30% missing data. Because of their reduced size, computation on the genome fingerprints is fast and requires little memory. For example, we could compute all-against-all pairwise comparisons among the 2504 genomes in the 1000 Genomes data set in 67 s at high quality (21 μs per comparison, on a single processor), and achieved a lower quality approximation in just 11 s. Efficient computation enables scaling up a variety of important genome analyses, including quantifying relatedness, recognizing duplicative sequenced genomes in a set, population reconstruction, and many others. The original genome representation cannot be reconstructed from its fingerprint, effectively decoupling genome comparison from genome interpretation; the method thus has significant implications for privacy-preserving genome analytics.
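The fingerprint construction of Glusman et al. is not described in the abstract, so the sketch below swaps in MinHash, a different and well-known locality-sensitive hashing scheme, purely to illustrate the general idea: each genome becomes a fixed-size signature, and comparing two signatures estimates the Jaccard similarity of the underlying variant sets at a cost that does not depend on how many variants each genome has. The variant string format and function names are hypothetical.

```python
import hashlib


def minhash_signature(variants, num_hashes=64):
    """Fixed-length MinHash signature of a non-empty set of variant strings:
    for each salted hash function keep the smallest hash value over the set."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.blake2b(f"{seed}:{v}".encode(), digest_size=8).digest(), "big")
            for v in variants))
    return signature


def estimated_jaccard(sig1, sig2):
    """Fraction of matching signature slots estimates |A & B| / |A | B|."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


genome_a = {f"chr1:{pos}:A>G" for pos in range(1, 11)}   # toy variant calls
genome_b = {f"chr1:{pos}:A>G" for pos in range(6, 16)}   # true Jaccard = 5/15
sig_a, sig_b = minhash_signature(genome_a), minhash_signature(genome_b)
print(estimated_jaccard(sig_a, sig_b))  # an estimate of 1/3
```

More hash functions reduce the estimate's variance at the cost of a larger signature.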
Exact calculation of distributions on integers, with application to sequence alignment.
Newberg, Lee A; Lawrence, Charles E
2009-01-01
Computational biology is replete with high-dimensional discrete prediction and inference problems. Dynamic programming recursions can be applied to several of the most important of these, including sequence alignment, RNA secondary-structure prediction, phylogenetic inference, and motif finding. In these problems, attention is frequently focused on some scalar quantity of interest, a score, such as an alignment score or the free energy of an RNA secondary structure. In many cases, the score is naturally defined on integers, such as a count of the number of pairing differences between two sequence alignments, or else an integer score has been adopted for computational reasons, such as in the test of significance of motif scores. The probability distribution of the score under an appropriate probabilistic model is of interest, for example in tests of significance of motif scores or in the calculation of Bayesian confidence limits around an alignment. Here we present three algorithms for calculating the exact distribution of a score of this type; then, in the context of pairwise local sequence alignments, we apply the approach to find the alignment score distribution and Bayesian confidence limits.
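The simplest case mentioned above, the significance of an integer motif score, can be sketched with a dynamic programming recursion that carries the full score distribution position by position. This is not one of the paper's three algorithms: it assumes an integer-valued position weight matrix and an i.i.d. background base distribution, and the toy motif is invented.

```python
from collections import defaultdict


def score_distribution(int_pwm, background):
    """Exact distribution of an integer motif score under an i.i.d. background.

    int_pwm: one dict {base: integer score} per motif position.
    background: dict {base: probability}, summing to 1.
    Returns {score: probability}.
    """
    dist = {0: 1.0}  # before any position, the score is 0 with certainty
    for column in int_pwm:
        nxt = defaultdict(float)
        # DP step: convolve the current distribution with the score
        # distribution contributed by the next motif position.
        for score, p in dist.items():
            for base, q in background.items():
                nxt[score + column[base]] += p * q
        dist = dict(nxt)
    return dict(sorted(dist.items()))


def p_value(dist, observed):
    """P(score >= observed) under the background model."""
    return sum(p for s, p in dist.items() if s >= observed)


pwm = [{"A": 2, "C": 0, "G": 0, "T": 0}] * 2  # toy 2-position motif
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
dist = score_distribution(pwm, uniform)
print(dist)               # {0: 0.5625, 2: 0.375, 4: 0.0625}
print(p_value(dist, 2))   # 0.4375
```

Because scores are integers, the number of distinct values stays bounded by the score range, which keeps the exact computation tractable.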
Molecular systematics of higher primates: genealogical relations and classification.
Miyamoto, M M; Koop, B F; Slightom, J L; Goodman, M; Tennant, M R
1988-01-01
We obtained 5' and 3' flanking sequences (5.4 kilobase pairs) from the psi eta-globin gene region of the rhesus macaque (Macaca mulatta) and combined them with available nucleotide data. The completed sequence, representing 10.8 kilobase pairs of contiguous noncoding DNA, was compared to the same orthologous regions available for human (Homo sapiens, as represented by five different alleles), common chimpanzee (Pan troglodytes), gorilla (Gorilla gorilla), and orangutan (Pongo pygmaeus). The nucleotide sequence for Macaca mulatta provided the outgroup perspective needed to better evaluate the relationships of humans and great apes. Pairwise comparisons and parsimony analysis of these orthologues clearly demonstrated (i) that humans and great apes share a high degree of genetic similarity and (ii) that humans, chimpanzees, and gorillas form a natural monophyletic group. These conclusions strongly favor a genealogical classification for higher primates consisting of a single family (Hominidae) with two subfamilies (Homininae for Homo, Pan, and Gorilla and Ponginae for Pongo). PMID:3174657
Chan, Agnes P; Choi, Yongwook; Brinkac, Lauren M; Krishnakumar, Radha; DePew, Jessica; Kim, Maria; Hinkle, Mary K; Lesho, Emil P; Fouts, Derrick E
2018-06-05
In light of the ongoing antimicrobial resistance crisis, there is a need to understand the role of co-pathogens, commensals, and the local microbiome in modulating virulence and antibiotic resistance. To identify possible interactions that influence the expression of virulence or survival mechanisms in both the multidrug-resistant organisms (MDROs) and human host cells, unique cohorts of clinical isolates were selected for whole genome sequencing with enhanced assembly and full annotation, pairwise co-culturing, and transcriptome profiling. The MDROs were co-cultured in pairwise combinations either with: (1) another MDRO, (2) skin commensals (Staphylococcus epidermidis and Corynebacterium jeikeium), (3) the common probiotic Lactobacillus reuteri, and (4) human fibroblasts. RNA-Seq analysis showed distinct regulation of virulence and antimicrobial resistance gene responses across different combinations of MDROs, commensals, and human cells. Co-culture assays demonstrated that microbial interactions can modulate gene responses of both the target and pathogen/commensal species, and that the responses are specific to the identity of the pathogen/commensal species. In summary, bacteria have mechanisms to distinguish between friends, foes, and host cells. These results provide foundational data and insight into the possibility of manipulating the local microbiome when treating complicated polymicrobial wound, intra-abdominal, or respiratory infections.
Mikaeili, F; Mirhendi, H; Mohebali, M; Hosseini, M; Sharbatkhori, M; Zarei, Z; Kia, E B
2015-07-01
The study was conducted to determine the sequence variation in two mitochondrial genes, namely cytochrome c oxidase 1 (pcox1) and NADH dehydrogenase 1 (pnad1) within and among isolates of Toxocara cati, Toxocara canis and Toxascaris leonina. Genomic DNA was extracted from 32 isolates of T. cati, 9 isolates of T. canis and 19 isolates of T. leonina collected from cats and dogs in different geographical areas of Iran. Mitochondrial genes were amplified by polymerase chain reaction (PCR) and sequenced. Sequence data were aligned using the BioEdit software and compared with published sequences in GenBank. Phylogenetic analysis was performed using Bayesian inference and maximum likelihood methods. Based on pairwise comparison, intra-species genetic diversity within Iranian isolates of T. cati, T. canis and T. leonina amounted to 0-2.3%, 0-1.3% and 0-1.0% for pcox1 and 0-2.0%, 0-1.7% and 0-2.6% for pnad1, respectively. Inter-species sequence variation among the three ascaridoid nematodes was significantly higher, being 9.5-16.6% for pcox1 and 11.9-26.7% for pnad1. Sequence and phylogenetic analysis of the pcox1 and pnad1 genes indicated that there is significant genetic diversity within and among isolates of T. cati, T. canis and T. leonina from different areas of Iran, and these genes can be used for studying genetic variation of ascaridoid nematodes.
Drummond, A; Rodrigo, A G
2000-12-01
Reconstruction of evolutionary relationships from noncontemporaneous molecular samples provides a new challenge for phylogenetic reconstruction methods. With recent biotechnological advances there has been an increase in molecular sequencing throughput, and the potential to obtain serial samples of sequences from populations, including rapidly evolving pathogens, is fast being realized. A new method called the serial-sample unweighted pair grouping method with arithmetic means (sUPGMA) is presented that reconstructs a genealogy or phylogeny of sequences sampled serially in time using a matrix of pairwise distances. The resulting tree depicts the terminal lineages of each sample ending at a different level consistent with the sample's temporal order. Since sUPGMA is a variant of UPGMA, it will perform best when sequences have evolved at a constant rate (i.e., according to a molecular clock). On simulated data, this new method performs better than standard cluster analysis under a variety of longitudinal sampling strategies. Serial-sample UPGMA is particularly useful for analysis of longitudinal samples of viruses and bacteria, as well as ancient DNA samples, with the minimal requirement that samples of sequences be ordered in time.
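For reference, the standard UPGMA procedure that sUPGMA builds on can be sketched as follows: repeatedly merge the closest pair of clusters, place the new node at half their distance, and set distances to the merged cluster as size-weighted averages. This is plain UPGMA, not sUPGMA; it ignores sampling times, so every tip ends at height 0 rather than at a level reflecting its temporal order. The function name `upgma` and the nested-tuple output format are illustrative choices.

```python
import itertools

def upgma(names, d):
    """Plain UPGMA on a symmetric distance matrix d (list of lists).

    Returns a nested tuple (left, right, height); leaves are the input names.
    """
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (subtree, size)
    dist = {frozenset((i, j)): d[i][j]
            for i, j in itertools.combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)        # closest pair of clusters
        i, j = sorted(pair)
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        height = dist.pop(pair) / 2.0         # ultrametric (molecular-clock) node height
        for k in clusters:                    # size-weighted average linkage
            dik = dist.pop(frozenset((i, k)))
            djk = dist.pop(frozenset((j, k)))
            dist[frozenset((next_id, k))] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = ((ti, tj, height), ni + nj)
        next_id += 1
    return next(iter(clusters.values()))[0]
```

For example, `upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]])` joins A and B at height 1.0 and then attaches C at height 3.0. The constant-rate assumption noted in the abstract is built into the height rule: a node always sits at half the distance between the clusters it joins.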
GATA: A graphic alignment tool for comparative sequence analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nix, David A.; Eisen, Michael B.
2005-01-01
Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein-coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution, yet the enhancer retains its activity. Dot plot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.
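To make the dot plot limitation concrete, here is a minimal k-mer dot plot. It is a generic illustration, not part of GATA; `dot_plot`, `render`, and the word size `k` are hypothetical. Each `*` marks a pair of positions whose length-k words match exactly. Diagonal runs reveal shared segments, but no gap model is applied and no alignment is produced.

```python
def dot_plot(seq1, seq2, k=3):
    """Return (i, j) pairs where the k-mer at seq1[i] equals the k-mer at seq2[j]."""
    index = {}
    for j in range(len(seq2) - k + 1):        # index every k-mer of seq2 by position
        index.setdefault(seq2[j:j + k], []).append(j)
    hits = []
    for i in range(len(seq1) - k + 1):
        for j in index.get(seq1[i:i + k], []):
            hits.append((i, j))
    return hits

def render(seq1, seq2, hits):
    """Text grid: rows follow seq1, columns follow seq2, '*' marks a word match."""
    grid = [["." for _ in seq2] for _ in seq1]
    for i, j in hits:
        grid[i][j] = "*"
    return "\n".join("".join(row) for row in grid)

print(render("ACGT", "ACGT", dot_plot("ACGT", "ACGT", k=2)))
```

An inserted base in one sequence simply shifts the diagonal; the plot shows two offset runs but says nothing about where the gap belongs.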
Assessment of the cPAS-based BGISEQ-500 platform for metagenomic sequencing.
Fang, Chao; Zhong, Huanzi; Lin, Yuxiang; Chen, Bing; Han, Mo; Ren, Huahui; Lu, Haorong; Luber, Jacob M; Xia, Min; Li, Wangsheng; Stein, Shayna; Xu, Xun; Zhang, Wenwei; Drmanac, Radoje; Wang, Jian; Yang, Huanming; Hammarström, Lennart; Kostic, Aleksandar D; Kristiansen, Karsten; Li, Junhua
2018-03-01
More extensive use of metagenomic shotgun sequencing in microbiome research relies on the development of high-throughput, cost-effective sequencing. Here we present a comprehensive evaluation of the performance of the new high-throughput sequencing platform BGISEQ-500 for metagenomic shotgun sequencing and compare its performance with that of 2 Illumina platforms. Using fecal samples from 20 healthy individuals, we evaluated the intra-platform reproducibility for metagenomic sequencing on the BGISEQ-500 platform in a setup comprising 8 library replicates and 8 sequencing replicates. Cross-platform consistency was evaluated by comparing 20 pairwise replicates on the BGISEQ-500 platform vs the Illumina HiSeq 2000 platform and the Illumina HiSeq 4000 platform. In addition, we compared the performance of the 2 Illumina platforms against each other. By a newly developed overall accuracy quality control method, an average of 82.45 million high-quality reads (96.06% of raw reads) per sample, with 90.56% of bases scoring Q30 and above, was obtained using the BGISEQ-500 platform. Quantitative analyses revealed extremely high reproducibility between BGISEQ-500 intra-platform replicates. Cross-platform replicates differed slightly more than intra-platform replicates, yet a high consistency was observed. Only a low percentage (2.02%-3.25%) of genes exhibited significant differences in relative abundance comparing the BGISEQ-500 and HiSeq platforms, with a bias toward genes with higher GC content being enriched on the HiSeq platforms. Our study provides the first set of performance metrics for human gut metagenomic sequencing data using BGISEQ-500. The high accuracy and technical reproducibility confirm the applicability of the new platform for metagenomic studies, though caution is still warranted when combining metagenomic data from different platforms.
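The Q30 figure above refers to Phred quality scores, Q = -10 log10(P_error), so Q30 corresponds to an estimated base-call error probability of 0.001. The sketch below computes the fraction of bases at or above Q30 from FASTQ quality strings. It assumes Phred+33 (Sanger/Illumina 1.8+) ASCII encoding; whether a given BGISEQ-500 pipeline uses the same offset should be checked. The function name `fraction_q30` is hypothetical.

```python
def fraction_q30(quality_strings, offset=33, threshold=30):
    """Fraction of bases with Phred quality >= threshold.

    Assumes ASCII-encoded qualities with the given offset (33 = Phred+33).
    """
    total = passing = 0
    for qs in quality_strings:
        for ch in qs:
            q = ord(ch) - offset          # decode Phred score from the ASCII character
            total += 1
            if q >= threshold:
                passing += 1
    return passing / total if total else 0.0

# '?' encodes Q30 and '5' encodes Q20 under Phred+33.
print(fraction_q30(["??55"]))
```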
Functional brain activation differences in stuttering identified with a rapid fMRI sequence
Kraft, Shelly Jo; Choo, Ai Leen; Sharma, Harish; Ambrose, Nicoline G.
2011-01-01
The purpose of this study was to investigate whether brain activity related to the presence of stuttering can be identified with rapid functional MRI (fMRI) sequences that involved overt and covert speech processing tasks. The long-term goal is to develop sensitive fMRI approaches with developmentally appropriate tasks to identify deviant speech motor and auditory brain activity in children who stutter closer to the age at which recovery from stuttering is documented. Rapid sequences may be preferred for individuals or populations who do not tolerate long scanning sessions. In this report, we document the application of a picture naming and phoneme monitoring task in three-minute fMRI sequences with adults who stutter (AWS). If relevant brain differences are found in AWS with these approaches that conform to previous reports, then these approaches can be extended to younger populations. Pairwise contrasts of brain BOLD activity between AWS and normally fluent adults indicated that the AWS showed higher BOLD activity in the right inferior frontal gyrus (IFG), right temporal lobe and sensorimotor cortices during picture naming and higher activity in the right IFG during phoneme monitoring. The right-lateralized pattern of BOLD activity together with higher activity in sensorimotor cortices is consistent with previous reports, which indicates that rapid fMRI sequences can be considered for investigating stuttering in younger participants. PMID:22133409
POEM: Identifying Joint Additive Effects on Regulatory Circuits.
Botzman, Maya; Nachshon, Aharon; Brodt, Avital; Gat-Viks, Irit
2016-01-01
Expression Quantitative Trait Locus (eQTL) mapping tackles the problem of identifying DNA sequence variation that affects the transcriptional regulatory network. Major computational efforts are aimed at characterizing the joint effects of several eQTLs acting in concert to govern the expression of the same genes. Yet, progress toward a comprehensive prediction of such joint effects is limited. For example, existing eQTL methods commonly discover interacting loci affecting the expression levels of a module of co-regulated genes. Such "modularization" approaches, however, are focused on epistatic relations and thus have limited utility for the case of additive (non-epistatic) effects. Here we present POEM (Pairwise effect On Expression Modules), a methodology for identifying pairwise eQTL effects on gene modules. POEM is specifically designed to achieve high performance in the case of additive joint effects. We applied POEM to transcription profiles measured in bone marrow-derived dendritic cells across a population of genotyped mice. Our study reveals widespread additive, trans-acting pairwise effects on gene modules, characterizes their organizational principles, and highlights high-order interconnections between modules within the immune signaling network. These analyses elucidate the central role of additive pairwise effects in regulatory circuits, and provide computational tools for future investigations into the interplay between eQTLs. The software described in this article is available at csgi.tau.ac.il/POEM/.
POEM: Identifying Joint Additive Effects on Regulatory Circuits
Botzman, Maya; Nachshon, Aharon; Brodt, Avital; Gat-Viks, Irit
2016-01-01
Motivation: Expression Quantitative Trait Locus (eQTL) mapping tackles the problem of identifying DNA sequence variation that affects the transcriptional regulatory network. Major computational efforts are aimed at characterizing the joint effects of several eQTLs acting in concert to govern the expression of the same genes. Yet, progress toward a comprehensive prediction of such joint effects is limited. For example, existing eQTL methods commonly discover interacting loci affecting the expression levels of a module of co-regulated genes. Such “modularization” approaches, however, are focused on epistatic relations and thus have limited utility for the case of additive (non-epistatic) effects. Results: Here we present POEM (Pairwise effect On Expression Modules), a methodology for identifying pairwise eQTL effects on gene modules. POEM is specifically designed to achieve high performance in the case of additive joint effects. We applied POEM to transcription profiles measured in bone marrow-derived dendritic cells across a population of genotyped mice. Our study reveals widespread additive, trans-acting pairwise effects on gene modules, characterizes their organizational principles, and highlights high-order interconnections between modules within the immune signaling network. These analyses elucidate the central role of additive pairwise effects in regulatory circuits, and provide computational tools for future investigations into the interplay between eQTLs. Availability: The software described in this article is available at csgi.tau.ac.il/POEM/. PMID:27148351
Stability of Tandem Repeats in the Drosophila Melanogaster HSR-Omega Nuclear RNA
Hogan, N. C.; Slot, F.; Traverse, K. L.; Garbe, J. C.; Bendena, W. G.; Pardue, M. L.
1995-01-01
The Drosophila melanogaster Hsr-omega locus produces a nuclear RNA containing >5 kb of tandem repeat sequences. These repeats are unique to Hsr-omega and show concerted evolution similar to that seen with classical satellite DNAs. In D. melanogaster the monomer is ~280 bp. Sequences of 19½ monomers differ by 8 ± 5% (mean ± SD) when all pairwise comparisons are considered. Differences are single nucleotide substitutions and 1-3 nucleotide deletions/insertions. Changes appear to be randomly distributed over the repeat unit. Outer repeats do not show the decrease in monomer homogeneity that might be expected if homogeneity is maintained by recombination. However, just outside the last complete repeat at each end, there are a few fragments of sequence similar to the monomer. The sequences in these flanking regions are not those predicted for sequences decaying in the absence of recombination. Instead, the fragmentation of the sequence homology suggests that flanking regions have undergone more severe disruptions, possibly during an insertion or amplification event. Hsr-omega alleles differing in the number of repeats are detected and appear to be stable over a few thousand generations; however, both increases and decreases in repeat numbers have been observed. The new alleles appear to be as stable as their predecessors. No alleles with fewer than ~5 kb or more than ~16 kb of repeats were seen in any stocks examined. The evidence that there is a limit on the minimum number of repeats is consistent with the suggestion that these repeats are important in the function of the unusual Hsr-omega nuclear RNA. PMID:7540581
Phylogeny of the Genus Flavivirus
Kuno, Goro; Chang, Gwong-Jen J.; Tsuchiya, K. Richard; Karabatsos, Nick; Cropp, C. Bruce
1998-01-01
We undertook a comprehensive phylogenetic study to establish the genetic relationship among the viruses of the genus Flavivirus and to compare the classification based on molecular phylogeny with the existing serologic method. By using a combination of quantitative definitions (bootstrap support level and the pairwise nucleotide sequence identity), the viruses could be classified into clusters, clades, and species. Our phylogenetic study revealed for the first time that from the putative ancestor two branches, non-vector and vector-borne virus clusters, evolved and from the latter cluster emerged tick-borne and mosquito-borne virus clusters. Provided that the theory of arthropod association being an acquired trait was correct, pairwise nucleotide sequence identity among these three clusters provided supporting data for a possibility that the non-vector cluster evolved first, followed by the separation of tick-borne and mosquito-borne virus clusters in that order. Clades established in our study correlated significantly with existing antigenic complexes. We also resolved many of the past taxonomic problems by establishing phylogenetic relationships of the antigenically unclassified viruses with the well-established viruses and by identifying synonymous viruses. PMID:9420202
Busk, Peter Kamp; Lange, Lene
2013-06-01
Functional prediction of carbohydrate-active enzymes is difficult due to low sequence identity. However, similar enzymes often share a few short motifs, e.g., around the active site, even when the overall sequences are very different. To exploit this notion for functional prediction of carbohydrate-active enzymes, we developed a simple algorithm, peptide pattern recognition (PPR), that can divide proteins into groups of sequences that share a set of short conserved sequences. When this method was used on 118 glycoside hydrolase 5 proteins with 9% average pairwise identity and representing four characterized enzymatic functions, 97% of the proteins were sorted into groups correlating with their enzymatic activity. Furthermore, we analyzed 8,138 glycoside hydrolase 13 proteins including 204 experimentally characterized enzymes with 28 different functions. There was a 91% correlation between group and enzyme activity. These results indicate that the function of carbohydrate-active enzymes can be predicted with high precision by finding short, conserved motifs in their sequences. The glycoside hydrolase 61 family is important for fungal biomass conversion, but only a few proteins of this family have been functionally characterized. Interestingly, PPR divided 743 glycoside hydrolase 61 proteins into 16 subfamilies useful for targeted investigation of the function of these proteins and pinpointed three conserved motifs with putative importance for enzyme activity. Furthermore, the conserved sequences were useful for cloning of new, subfamily-specific glycoside hydrolase 61 proteins from 14 fungi. In conclusion, identification of conserved sequence motifs is a new approach to sequence analysis that can predict carbohydrate-active enzyme functions with high precision.
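The core idea, that functionally related enzymes share short conserved peptides even when overall identity is low, can be illustrated with a simplified sketch. This is not the published PPR algorithm: the peptide length `k`, the `min_fraction` cutoff, the grouping rule, and the function names are all illustrative assumptions.

```python
from collections import Counter

def shared_peptides(seqs, k=6, min_fraction=0.5):
    """Peptides of length k found in at least min_fraction of the sequences."""
    counts = Counter()
    for s in seqs:
        # count each peptide once per sequence, however often it recurs there
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    needed = min_fraction * len(seqs)
    return {pep for pep, c in counts.items() if c >= needed}

def group_by_peptides(seqs, peptides):
    """Group sequences by the exact set of conserved peptides they contain."""
    groups = {}
    for s in seqs:
        key = frozenset(p for p in peptides if p in s)
        groups.setdefault(key, []).append(s)
    return groups

seqs = ["MKTAYIAK", "GGTAYIAG", "PPPPPPPP"]
peps = shared_peptides(seqs, k=5, min_fraction=0.6)
groups = group_by_peptides(seqs, peps)
```

Here the first two toy sequences share only the 5-mer `TAYIA`, yet that single motif is enough to place them in the same group, while the third sequence falls into the group with no conserved peptides.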
Automatic Camera Calibration Using Multiple Sets of Pairwise Correspondences.
Vasconcelos, Francisco; Barreto, Joao P; Boyer, Edmond
2018-04-01
We propose a new method to add an uncalibrated node into a network of calibrated cameras using only pairwise point correspondences. While previous methods perform this task using triple correspondences, these are often difficult to establish when there is limited overlap between different views. In such challenging cases we must rely on pairwise correspondences and our solution becomes more advantageous. Our method includes an 11-point minimal solution for the intrinsic and extrinsic calibration of a camera from pairwise correspondences with two other calibrated cameras, and a new inlier selection framework that extends the traditional RANSAC family of algorithms to sampling across multiple datasets. Our method is validated on different application scenarios where a lack of triple correspondences might occur: addition of a new node to a camera network; calibration and motion estimation of a moving camera inside a camera network; and addition of views with limited overlap to a Structure-from-Motion model.
Pereira, J O P; Freitas, B M; Jorge, D M M; Torres, D C; Soares, C E A; Grangeiro, T B
2009-01-01
Melipona quinquefasciata is a ground-nesting South American stingless bee whose geographic distribution was believed to comprise only the central and southern states of Brazil. We obtained partial sequences (about 500-570 bp) of first internal transcribed spacer (ITS1) nuclear ribosomal DNA from Melipona specimens putatively identified as M. quinquefasciata collected from different localities in northeastern Brazil. To confirm the taxonomic identity of the northeastern samples, specimens from the state of Goiás (Central region of Brazil) were included for comparison. All sequences were deposited in GenBank (accession numbers EU073751-EU073759). The mean nucleotide divergence (excluding sites with insertions/deletions) in the ITS1 sequences was only 1.4%, ranging from 0 to 4.1%. When the sites with insertions/deletions were also taken into account, sequence divergences varied from 0 to 5.3%. In all pairwise comparisons, the ITS1 sequence from the specimens collected in Goiás was most divergent compared to the ITS1 sequences of the bees from the other locations. However, neighbor-joining phylogenetic analysis showed that all ITS1 sequences from northeastern specimens along with the sample of Goiás were resolved in a single clade with a bootstrap support of 100%. The ITS1 sequencing data thus support the occurrence of M. quinquefasciata in northeast Brazil.
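The two divergence ranges above differ only in how alignment columns containing insertions/deletions are handled. A minimal sketch for one pair of aligned sequences follows. Counting each gapped column as a single difference when indels are included is an assumption, since the abstract does not state how indel sites were scored; `percent_divergence` is a hypothetical helper.

```python
def percent_divergence(a, b, include_indels=False, gap="-"):
    """Percent divergence between two sequences aligned to the same length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sites = 0
    for x, y in zip(a, b):
        if x == gap and y == gap:
            continue                      # column empty in both: never informative
        if x == gap or y == gap:
            if include_indels:
                sites += 1
                diffs += 1                # assumed: one indel column = one difference
            continue                      # otherwise the indel site is excluded
        sites += 1
        if x != y:
            diffs += 1
    return 100.0 * diffs / sites if sites else 0.0

a, b = "ACGT-A", "ACTTAA"
print(percent_divergence(a, b))                       # indel sites excluded
print(percent_divergence(a, b, include_indels=True))  # indel sites counted
```

Including indel columns can only add differences, which is consistent with the wider range (up to 5.3%) reported when those sites are taken into account.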
Kosakovsky Pond, Sergei L; Weaver, Steven; Leigh Brown, Andrew J; Wertheim, Joel O
2018-01-31
In modern applications of molecular epidemiology, genetic sequence data are routinely used to identify clusters of transmission in rapidly evolving pathogens, most notably HIV-1. Traditional 'shoe-leather' epidemiology infers transmission clusters by tracing chains of partners sharing epidemiological connections (e.g., sexual contact). Here, we present a computational tool for identifying a molecular transmission analog of such clusters: HIV-TRACE (TRAnsmission Cluster Engine). HIV-TRACE implements an approach inspired by traditional epidemiology, by identifying chains of partners whose viral genetic relatedness implies direct or indirect epidemiological connections. Molecular transmission clusters are constructed using codon-aware pairwise alignment to a reference sequence followed by pairwise genetic distance estimation among all sequences. This approach is computationally tractable and is capable of identifying HIV-1 transmission clusters in large surveillance databases comprising tens or hundreds of thousands of sequences in near real time, i.e., on the order of minutes to hours. HIV-TRACE is available at www.hivtrace.org and from github.com/veg/hivtrace, along with the accompanying result visualization module from github.com/veg/hivtrace-viz. Importantly, the approach underlying HIV-TRACE is not limited to the study of HIV-1 and can be applied to study outbreaks and epidemics of other rapidly evolving pathogens. © The Author 2018. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
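The clustering step, linking any two sequences whose pairwise genetic distance is at or below a threshold and reporting the connected components, can be sketched with a union-find structure. This is a simplified illustration, not HIV-TRACE's implementation: distances are assumed to be precomputed (HIV-TRACE derives them from codon-aware alignments), and the threshold value and the name `transmission_clusters` are hypothetical.

```python
def transmission_clusters(ids, distances, threshold):
    """Connected components of the graph linking pairs with distance <= threshold.

    distances: dict mapping (id_a, id_b) -> pairwise genetic distance.
    Returns clusters of size >= 2; unlinked sequences are omitted.
    """
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= threshold:                  # an edge implies a putative link
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[ra] = rb
    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return [sorted(g) for g in groups.values() if len(g) > 1]

dists = {("s1", "s2"): 0.010, ("s2", "s3"): 0.012,
         ("s3", "s4"): 0.050, ("s1", "s4"): 0.060}
print(transmission_clusters(["s1", "s2", "s3", "s4"], dists, threshold=0.015))
```

Note that s1 and s3 end up in one cluster through s2 even though their own distance was never compared, mirroring the "direct or indirect" connections described above.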
Identification of a Herbal Powder by Deoxyribonucleic Acid Barcoding and Structural Analyses.
Sheth, Bhavisha P; Thaker, Vrinda S
2015-10-01
Authentic identification of plants is essential for exploiting their medicinal properties as well as for stopping adulteration and malpractice in their trade. The objective was to identify a herbal powder obtained from a herbalist in the local vicinity of Rajkot, Gujarat, using deoxyribonucleic acid (DNA) barcoding and molecular tools. The DNA was extracted from the herbal powder and selected Cassia species, followed by polymerase chain reaction (PCR) amplification and sequencing of the rbcL barcode locus. Thereafter the sequences were subjected to National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST) analysis, followed by three-dimensional structure determination of the rbcL protein from the herbal powder and the Cassia species Cassia fistula, Cassia tora and Cassia javanica (sequences obtained in the present study), Cassia roxburghii, and Cassia abbreviata (sequences retrieved from GenBank). Multiple and pairwise structural alignments were then carried out in order to identify the herbal powder. The nucleotide sequences obtained from the selected species of Cassia were submitted to GenBank (Accession No. JX141397, JX141405, JX141420). The NCBI BLAST analysis of the rbcL protein from the herbal powder showed equal sequence similarity (with reference to different parameters like E value, maximum identity, total score, query coverage) to C. javanica and C. roxburghii. In order to resolve the ambiguity of the BLAST result, a protein structural approach was implemented. The protein homology models obtained in the present study were submitted to the Protein Model Database (PM0079748-PM0079753). The pairwise structural alignment of the herbal powder (as template) and C. javanica and C. roxburghii (as targets individually) revealed a close similarity of the herbal powder with C. javanica.
A strategy like the one used here, integrating DNA barcoding and protein structural analyses, could be adopted as a novel, rapid, and economical procedure, especially when protein-coding loci are considered. Authentic identification of plants is essential for exploiting their medicinal properties as well as for stopping adulteration and malpractice in their trade. A herbal powder was obtained from a herbalist in the local vicinity of Rajkot, Gujarat. An integrated approach using DNA barcoding and structural analyses was carried out to identify the herbal powder. The herbal powder was identified as Cassia javanica L.
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-05-01
Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity without sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for datasets in which the sequence identity of any pair of sequences falls in the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on an extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior to over a dozen recent competing methods based on support vector machine, logistic regression, and ensemble-of-classifiers predictors. SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. 
The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-01-01
Background: Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity without sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for datasets in which the sequence identity of any pair of sequences falls in the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for the prediction. Results: SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on an extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior to over a dozen recent competing methods based on support vector machine, logistic regression, and ensemble-of-classifiers predictors. Conclusion: SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for the prediction. 
The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616
Nanba, K.; King, G. M.; Dunfield, K.
2004-01-01
A 492- to 495-bp fragment of the gene coding for the large subunit of the form I ribulose 1,5-bisphosphate carboxylase/oxygenase (RubisCO) (rbcL) was amplified by PCR from facultatively lithotrophic aerobic CO-oxidizing bacteria, colorless and purple sulfide-oxidizing microbial mats, and genomic DNA extracts from tephra and ash deposits from Kilauea volcano, for which atmospheric CO and hydrogen have been previously documented as important substrates. PCR products from the mats and volcanic sites were used to construct rbcL clone libraries. Phylogenetic analyses showed that the rbcL sequences from all isolates clustered with form IC rbcL sequences derived from facultative lithotrophs. In contrast, the microbial mat clone sequences clustered with sequences from obligate lithotrophs representative of form IA rbcL. Clone sequences from volcanic sites fell within the form IC clade, suggesting that these sites were dominated by facultative lithotrophs, an observation consistent with biogeochemical patterns at the sites. Based on phylogenetic and statistical analyses, clone libraries differed significantly among volcanic sites, indicating that they support distinct lithotrophic assemblages. Although some of the clone sequences were similar to known rbcL sequences, most were novel. Based on nucleotide diversity and average pairwise difference, a forested site and an 1894 lava flow were found to support the most diverse and least diverse lithotrophic populations, respectively. These indices of diversity were not correlated with rates of atmospheric CO and hydrogen uptake but were correlated with estimates of respiration and microbial biomass. PMID:15066819
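The two diversity indices mentioned above are closely related: the average pairwise difference k is the mean number of differing sites over all pairs of sequences, and nucleotide diversity π is k divided by the number of sites. A minimal sketch for aligned, ungapped sequences follows; it uses the simple unweighted estimator, omits handling of gaps and ambiguity codes, and the function name is hypothetical.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Return (k, pi) for equal-length aligned sequences.

    k  = average number of pairwise differences
    pi = k per site (nucleotide diversity)
    """
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    n_sites = len(seqs[0])
    if any(len(s) != n_sites for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    k = total_diffs / len(pairs)      # average pairwise difference
    return k, k / n_sites             # nucleotide diversity per site

k, pi = nucleotide_diversity(["AAAA", "AAAT", "AATT"])
```

With the three toy sequences, the pairwise difference counts are 1, 2, and 1, so k = 4/3 and π = 1/3.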
Scalable Creation of Long-Lived Multipartite Entanglement
NASA Astrophysics Data System (ADS)
Kaufmann, H.; Ruster, T.; Schmiegelow, C. T.; Luda, M. A.; Kaushal, V.; Schulz, J.; von Lindenfels, D.; Schmidt-Kaler, F.; Poschinger, U. G.
2017-10-01
We demonstrate the deterministic generation of multipartite entanglement based on scalable methods. Four qubits are encoded in ⁴⁰Ca⁺ ions stored in a microstructured segmented Paul trap. These qubits are sequentially entangled by laser-driven pairwise gate operations. Between these, the qubit register is dynamically reconfigured via ion shuttling operations, where ion crystals are separated and merged, and ions are moved in and out of a fixed laser interaction zone. A sequence consisting of three pairwise entangling gates yields a four-ion Greenberger-Horne-Zeilinger state |ψ⟩ = (1/√2)(|0000⟩ + |1111⟩), and full quantum state tomography reveals a state fidelity of 94.4(3)%. We analyze the decoherence of this state and employ dynamic decoupling on the spatially distributed constituents to maintain 69(5)% coherence at a storage time of 1.1 s.
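The state fidelity quoted for the four-ion GHZ state is the overlap F = ⟨ψ|ρ|ψ⟩ between the target state and the reconstructed density matrix ρ. The Python sketch below makes that definition concrete. It assumes a simple white-noise model, ρ = p|ψ⟩⟨ψ| + (1 - p)I/16, which is our own illustration and not the error model of the experiment. Under that model F = p + (1 - p)/16.

```python
import math

def ghz_state(n_qubits):
    """Amplitudes of (|0...0> + |1...1>) / sqrt(2) in the computational basis."""
    dim = 2 ** n_qubits
    psi = [0.0] * dim
    psi[0] = 1 / math.sqrt(2)        # |00...0>
    psi[dim - 1] = 1 / math.sqrt(2)  # |11...1>
    return psi

def white_noise_state(psi, p):
    """Hypothetical noise model: rho = p |psi><psi| + (1 - p) I / d."""
    d = len(psi)
    return [[p * psi[i] * psi[j] + ((1 - p) / d if i == j else 0.0)
             for j in range(d)] for i in range(d)]

def fidelity(psi, rho):
    """F = <psi| rho |psi> for a real-valued pure target state psi."""
    d = len(psi)
    return sum(psi[i] * rho[i][j] * psi[j] for i in range(d) for j in range(d))

psi = ghz_state(4)
print(round(fidelity(psi, white_noise_state(psi, 0.9)), 5))  # 0.90625
```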
Complete Genome Sequence of a Genomovirus Associated with Common Bean Plant Leaves in Brazil.
Lamas, Natalia Silva; Fontenele, Rafaela Salgado; Melo, Fernando Lucas; Costa, Antonio Felix; Varsani, Arvind; Ribeiro, Simone Graça
2016-11-10
A new genomovirus has been identified in three common bean plants in Brazil. This virus has a circular genome of 2,220 nucleotides and 3 major open reading frames. It shares 80.7% genome-wide pairwise identity with a genomovirus recovered from Tongan fruit bat guano. Copyright © 2016 Lamas et al.
Structured prediction models for RNN based sequence labeling in clinical text.
Jagannatha, Abhyuday N; Yu, Hong
2016-11-01
Sequence labeling is a widely used method for named entity recognition and information extraction from unstructured natural language data. In the clinical domain, one major application of sequence labeling is the extraction of medical entities such as medication, indication, and side effects from Electronic Health Record narratives. Sequence labeling in this domain presents its own set of challenges and objectives. In this work we experimented with various CRF-based structured learning models with Recurrent Neural Networks. We extend the previously studied LSTM-CRF models with explicit modeling of pairwise potentials. We also propose an approximate version of skip-chain CRF inference with RNN potentials. We use these methodologies for structured prediction in order to improve the exact phrase detection of various medical entities. PMID:28004040
Classification and evolution of human rhinoviruses.
Palmenberg, Ann C; Gern, James E
2015-01-01
The historical classification of human rhinoviruses (RV) by serotyping has been replaced by a logical system of comparative sequencing. Given that strains must diverge in their capsid sequences by a reasonable degree (>12-13% in pairwise base identity) before becoming immunologically distinct, the new nomenclature system makes allowances for the addition of new, future types without compromising historical designations. Currently, three species, RV-A, RV-B, and RV-C, are recognized. Of these, the RV-C, discovered in 2006, are the most unusual in terms of capsid structure, receptor use, and association with severe disease in children.
NASA Astrophysics Data System (ADS)
Liao, Yibo; Shou, Lu; Tang, Yanbin; Zeng, Jiangning; Gao, Aigen; Chen, Quanzhen; Yan, Xiaojun
2017-05-01
To assess the effects of hypoxia, macrobenthic communities along an estuarine gradient of the Changjiang estuary and adjacent continental shelf were analyzed. This revealed spatial variations in the communities and relationships with environmental variables during periods of reduced dissolved oxygen (DO) concentration in summer. Statistical analyses revealed significant differences in macrobenthic community composition among the three zones: estuarine zone (EZ), mildly hypoxic zone (MHZ) in the continental shelf, and normoxic zone (NZ) in the continental shelf (Global R = 0.206, P = 0.002). Pairwise tests showed that the macrobenthic community composition of the EZ was significantly different from that of the MHZ (pairwise test R = 0.305, P = 0.001) and the NZ (pairwise test R = 0.259, P = 0.001). There was no significant difference in macrobenthic communities between the MHZ and the NZ (pairwise test R = 0.062, P = 0.114). Small, typically opportunistic polychaetes made the greatest contribution to the dissimilarity between the zones. The effects of mild hypoxia on the macrobenthic communities are a result not only of reduced DO concentration but also of differences in environmental variables such as temperature, salinity, and nutrient concentrations caused by stratification.
Protein contact prediction using patterns of correlation.
Hamilton, Nicholas; Burrage, Kevin; Ragan, Mark A; Huber, Thomas
2004-09-01
We describe a new method for using neural networks to predict residue contact pairs in a protein. The main inputs to the neural network are a set of 25 measures of correlated mutation between all pairs of residues in two "windows" of size 5 centered on the residues of interest. While the individual pairwise correlations are a relatively weak predictor of contact, training the network on windows of correlation significantly improves prediction accuracy. The neural network is trained on a set of 100 proteins and then tested on a disjoint set of 1033 proteins of known structure. An average predictive accuracy of 21.7% is obtained by taking the best L/2 predictions for each protein, where L is the sequence length. Taking the best L/10 predictions gives an average accuracy of 30.7%. The predictor is also tested on a set of 59 proteins from the CASP5 experiment. The accuracy is found to be relatively consistent across different sequence lengths, but to vary widely according to the secondary structure. Predictive accuracy is also found to improve when multiple sequence alignments containing many sequences are used to calculate the correlations. Copyright 2004 Wiley-Liss, Inc.
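The "best L/2" and "best L/10" accuracies are top-ranked precision scores. Rank all scored residue pairs, keep the top L/k, and report the fraction that are true contacts. The minimal Python sketch below assumes a minimum sequence separation filter (min_separation=6), which is a common convention and not something stated in this abstract. The scores and contacts in the example are made up.

```python
def top_fraction_accuracy(scores, true_contacts, seq_len, divisor=2, min_separation=6):
    """Fraction of the top (seq_len // divisor) predicted pairs that are true contacts.

    scores: dict mapping residue pairs (i, j), i < j, to predicted contact scores.
    true_contacts: set of residue pairs (i, j), i < j, observed in contact.
    min_separation: assumed cutoff; pairs closer than this in sequence are ignored.
    """
    candidates = sorted(
        ((s, pair) for pair, s in scores.items() if pair[1] - pair[0] >= min_separation),
        key=lambda item: item[0],
        reverse=True,
    )
    top = [pair for _, pair in candidates[: max(1, seq_len // divisor)]]
    if not top:
        return 0.0
    # If fewer candidates than L/k survive the filter, score only those available.
    return sum(pair in true_contacts for pair in top) / len(top)

# Toy example for a 10-residue protein (hypothetical scores and contacts).
scores = {(0, 6): 0.9, (1, 8): 0.8, (2, 9): 0.7, (0, 9): 0.6, (3, 9): 0.5, (1, 2): 0.99}
contacts = {(0, 6), (2, 9), (3, 9)}
print(top_fraction_accuracy(scores, contacts, seq_len=10))  # 0.6
```

The high-scoring pair (1, 2) is excluded by the separation filter. This illustrates why such filters matter: short-range pairs are easy to predict and would otherwise inflate the score.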
Young, Lydia M.; Tu, Ling-Hsien; Raleigh, Daniel P.; Ashcroft, Alison E.
2017-01-01
Although amyloid assembly in vitro is commonly investigated using single protein sequences, fibril formation in vivo can be more heterogeneous, involving co-assembly of proteins of different length, sequence and/or post-translational modifications. Emerging evidence suggests that co-polymerization can alter the rate and/or mechanism of aggregation and can contribute to pathogenicity. Electrospray ionization-ion mobility spectrometry-mass spectrometry (ESI-IMS-MS) is uniquely suited to the study of these heterogeneous ensembles. Here, ESI-IMS-MS, combined with analysis of fibrillation rates using thioflavin T (ThT) fluorescence, is used to track the course of aggregation of variants of islet-amyloid polypeptide (IAPP) in isolation and in pairwise mixtures. We identify a sub-population of extended monomers as the key precursors of amyloid assembly, and reveal that the fastest-aggregating sequence in peptide mixtures determines the lag time of fibrillation, despite being unable to cross-seed polymerization. The results demonstrate that co-polymerization of IAPP sequences radically alters the rate of amyloid assembly by altering the conformational properties of the mixed oligomers that form. PMID:28970890
Ultrafast Comparison of Personal Genomes via Precomputed Genome Fingerprints
Glusman, Gustavo; Mauldin, Denise E.; Hood, Leroy E.; Robinson, Max
2017-01-01
We present an ultrafast method for comparing personal genomes. We transform the standard genome representation (lists of variants relative to a reference) into “genome fingerprints” via locality sensitive hashing. The resulting genome fingerprints can be meaningfully compared even when the input data were obtained using different sequencing technologies, processed using different pipelines, represented in different data formats and relative to different reference versions. Furthermore, genome fingerprints are robust to up to 30% missing data. Because of their reduced size, computation on the genome fingerprints is fast and requires little memory. For example, we could compute all-against-all pairwise comparisons among the 2504 genomes in the 1000 Genomes data set in 67 s at high quality (21 μs per comparison, on a single processor), and achieved a lower quality approximation in just 11 s. Efficient computation enables scaling up a variety of important genome analyses, including quantifying relatedness, recognizing duplicate sequenced genomes in a set, population reconstruction, and many others. The original genome representation cannot be reconstructed from its fingerprint, effectively decoupling genome comparison from genome interpretation; the method thus has significant implications for privacy-preserving genome analytics. PMID:29018478
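To show what "locality sensitive hashing" buys here, the Python sketch below uses MinHash, a standard LSH family for set similarity. It compresses each genome's variant set into a fixed-length signature, so comparing two genomes costs O(signature length) rather than O(number of variants). This is a generic illustration and not the fingerprinting scheme of the paper. The "chrom:pos:ref>alt" variant encoding and the signature length of 128 are our assumptions.

```python
import hashlib

def minhash_signature(items, num_hashes=128):
    """MinHash: for each seeded hash function, keep the smallest hash over the (non-empty) set."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(hashlib.sha256(f"{seed}:{item}".encode()).digest()[:8], "big")
            for item in items
        ))
    return signature

def estimated_jaccard(sig_a, sig_b):
    """The fraction of agreeing positions estimates the Jaccard index |A ∩ B| / |A ∪ B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Hypothetical variant calls written as "chrom:pos:ref>alt" strings.
genome_a = {f"chr1:{pos}:A>G" for pos in range(0, 1000, 10)}  # 100 variants
genome_b = {f"chr1:{pos}:A>G" for pos in range(0, 1000, 20)}  # 50 of them; true Jaccard 0.5
print(estimated_jaccard(minhash_signature(genome_a), minhash_signature(genome_b)))
```

Signatures are computed once per genome and then reused, which is what makes all-against-all comparison cheap. The signature also cannot be inverted back to the variant list.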
Li, Weiwen; Dai, Xiaojie; Zhu, Jiangfeng; Tian, Siquan; He, Shan; Wu, Feng
2017-07-01
A 697-bp fragment of the mitochondrial cytochrome b gene was sequenced and analyzed for 78 blue shark (Prionace glauca) individuals from three sampled locations in the central Pacific Ocean (CPO). In total, three polymorphic sites were detected, which defined four haplotypes. The haplotype diversity (h) ranged from 0.517 to 0.768, and nucleotide diversity (π) was between 0.0007 and 0.0011. Analysis of molecular variance indicated non-significant differentiation among subpopulations. Furthermore, pairwise FST analysis revealed non-significant differentiation among the three sampled regions. Overall, genetic differences between geographic locations in the CPO were low. This study suggests a single panmictic population of P. glauca in the CPO.
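The two diversity indices reported above are simple to compute from an alignment. The Python sketch below uses Nei's haplotype diversity, h = n/(n - 1) · (1 - Σ xᵢ²) over haplotype frequencies xᵢ. It computes nucleotide diversity π as the mean number of differences per site over all sequence pairs. It assumes aligned, equal-length, gap-free sequences. The four-sequence toy alignment is made up and is not the shark data.

```python
from collections import Counter
from itertools import combinations

def haplotype_diversity(seqs):
    """Nei's haplotype diversity h = n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))

def nucleotide_diversity(seqs):
    """pi: mean number of differences per site over all pairs of aligned, equal-length sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total_diffs / (len(pairs) * length)

# Toy alignment of four sequences (hypothetical, not the shark data).
seqs = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(round(haplotype_diversity(seqs), 4), round(nucleotide_diversity(seqs), 4))  # 0.8333 0.2917
```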
Wytynck, Pieter; Rougé, Pierre; Van Damme, Els J M
2017-11-01
Ribosome-inactivating proteins (RIPs) are cytotoxic enzymes capable of halting protein synthesis by irreversible modification of ribosomes. Although RIPs are widespread, they are not ubiquitous in the plant kingdom. The physiological importance of RIPs is not fully elucidated, but evidence suggests a role in the protection of the plant against biotic and abiotic stresses. Searches in the rice genome revealed a large and highly complex family of proteins with a RIP domain. A comparative analysis retrieved 38 RIP sequences from the genome sequence of Oryza sativa subspecies japonica and 34 sequences from the subspecies indica. The RIP sequences are scattered over different chromosomes but are mostly found on chromosome 3. The phylogenetic tree revealed the pairwise clustering of RIPs from japonica and indica. Molecular modeling and sequence analysis yielded information on the catalytic site of the enzyme, and suggested that a large proportion of the RIP domains probably possess N-glycosidase activity. Several RIPs are differentially expressed in plant tissues and in response to specific abiotic stresses. This study provides an overview of RIP motifs in rice and will help to understand their biological role(s) and evolutionary relationships. Copyright © 2017 Elsevier Ltd. All rights reserved.
N-term pairwise-correlation inequalities, steering, and joint measurability
NASA Astrophysics Data System (ADS)
Karthik, H. S.; Devi, A. R. Usha; Tej, J. Prabhu; Rajagopal, A. K.; Sudha; Narayanan, A.
2017-05-01
Chained inequalities involving pairwise correlations of qubit observables in the equatorial plane are constructed based on the positivity of a sequence of moment matrices. When a jointly measurable set of positive-operator-valued measures (POVMs) is employed in the first measurement of every pair of sequential measurements, the chained pairwise correlations do not violate the classical bound imposed by the moment matrix positivity. We find that incompatibility of the set of POVMs employed in first measurements is only necessary, but not sufficient, in general, for the violation of the inequality. On the other hand, there exists a one-to-one equivalence between the degree of incompatibility (which quantifies the joint measurability) of the equatorial qubit POVMs and the optimal violation of a nonlocal steering inequality, proposed by Jones and Wiseman [S. J. Jones and H. M. Wiseman, Phys. Rev. A 84, 012110 (2011), 10.1103/PhysRevA.84.012110]. To this end, we construct a local analog of this steering inequality in a single-qubit system and show that its violation is a mere reflection of measurement incompatibility of equatorial qubit POVMs, employed in first measurements in the sequential unsharp-sharp scheme.
Joyce, Blake L.; Haug-Baltzell, Asher K.; Hulvey, Jonathan P.; McCarthy, Fiona; Devisetty, Upendra Kumar; Lyons, Eric
2017-01-01
This workflow allows novice researchers to leverage advanced computational resources such as cloud computing to carry out pairwise comparative transcriptomics. It also serves as a primer for biologists to develop data scientist computational skills, e.g. executing bash commands, visualization and management of large data sets. All command line code and further explanations of each command or step can be found on the wiki (https://wiki.cyverse.org/wiki/x/dgGtAQ). The Discovery Environment and Atmosphere platforms are connected through the CyVerse Data Store. As such, once the initial raw sequencing data have been uploaded, there is no further need to transfer large data files over an Internet connection, which minimizes the time needed to conduct analyses. This protocol is designed to analyze only two experimental treatments or conditions. Differential gene expression analysis is conducted through pairwise comparisons and is therefore not suitable for testing multiple factors. This workflow is also designed to be manual rather than automated. Each step must be executed and investigated by the user, yielding a better understanding of data and analytical outputs, and therefore better results for the user. Once complete, this protocol will yield de novo assembled transcriptome(s) for underserved (non-model) organisms without the need to map to previously assembled reference genomes (which are usually not available for underserved organisms). These de novo transcriptomes are further used in pairwise differential gene expression analysis to investigate genes differing between two experimental conditions. Differentially expressed genes are then functionally annotated to understand the genetic response organisms have to experimental conditions. In total, the data derived from this protocol are used to test hypotheses about biological responses of underserved organisms. PMID:28518075
Object-oriented sequence analysis: SCL--a C++ class library.
Vahrson, W; Hermann, K; Kleffe, J; Wittig, B
1996-04-01
SCL (Sequence Class Library) is a class library written in the C++ programming language. Designed using object-oriented programming principles, SCL consists of classes of objects performing tasks typically needed for analyzing DNA or protein sequences. Among them are very flexible sequence classes, classes accessing databases in various formats, classes managing collections of sequences, as well as classes performing higher-level tasks like calculating a pairwise sequence alignment. SCL also includes classes that provide general programming support, like a dynamically growing array, sets, matrices, strings, classes performing file input/output, and utilities for error handling. By providing these components, SCL fosters an explorative programming style: experimenting with algorithms and alternative implementations is encouraged rather than punished. A description of SCL's overall structure as well as an overview of its classes is given. Important aspects of the work with SCL are discussed in the context of a sample program.
Kimura, M; Kimura, J; Hatakeyama, T
1988-11-21
The complete amino acid sequences of ribosomal proteins S11 from the Gram-positive eubacterium Bacillus stearothermophilus and S19 from the archaebacterium Halobacterium marismortui have been determined. A search for homologous sequences revealed that both proteins belong to the ribosomal protein S11 family. Homologous proteins have previously been sequenced from Escherichia coli as well as from chloroplast, yeast and mammalian ribosomes. A pairwise comparison of the amino acid sequences showed that Bacillus protein S11 shares 68% identical residues with S11 from Escherichia coli and a lower identity (52%) with the homologous chloroplast protein. The halophilic protein S19 is more closely related to its eukaryotic (45-49% identity) than to its eubacterial counterparts (35%).
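Percent-identity figures like the 68% and 52% above depend on the denominator chosen. Common choices are the number of aligned columns, the alignment length including gaps, or the length of the shorter sequence. The abstract does not say which was used. The Python sketch below assumes identity is counted over aligned columns in which neither sequence has a gap. The two aligned fragments are made up.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical residues over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)

# Hypothetical aligned fragments: 4 identities over 5 gap-free columns.
print(percent_identity("MKV-LA", "MKIQLA"))  # 80.0
```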
USDA-ARS's Scientific Manuscript database
In this study, the taxonomic position and group classification of the phytoplasma associated with a lethal yellowing-type disease (LYD) of coconut (Cocos nucifera L.) in Mozambique were addressed. Pairwise sequence similarity values based on alignment of near full-length 16SrRNA genes (1530 bp) reve...
Oliveira-Neto, Osmundo B; Batista, João A N; Rigden, Daniel J; Fragoso, Rodrigo R; Silva, Rodrigo O; Gomes, Eliane A; Franco, Octávio L; Dias, Simoni C; Cordeiro, Célia M T; Monnerat, Rose G; Grossi-De-Sá, Maria F
2004-09-01
Fourteen different cDNA fragments encoding serine proteinases were isolated by reverse transcription-PCR from cotton boll weevil (Anthonomus grandis) larvae. The sequences were highly diverse, with a mean pairwise identity of 22% at the amino acid level. The cDNAs encompassed 11 trypsin-like sequences classifiable into three families and three chymotrypsin-like sequences belonging to a single family. Using a combination of 5' and 3' RACE, the full-length sequence was obtained for five of the cDNAs, named Agser2, Agser5, Agser6, Agser10 and Agser21. The encoded proteins included amino acid sequence motifs of serine proteinase active sites, conserved cysteine residues, and both zymogen activation and signal peptides. Southern blotting analysis suggested that one or two copies of these serine proteinase genes exist in the A. grandis genome. Northern blotting analysis of Agser2 and Agser5 showed that for both genes, expression is induced upon feeding and is concentrated in the gut of larvae and adult insects. Reverse northern analysis of the 14 cDNA fragments showed that only two trypsin-like and two chymotrypsin-like sequences were expressed at detectable levels. Under the effect of the serine proteinase inhibitors soybean Kunitz trypsin inhibitor and black-eyed pea trypsin/chymotrypsin inhibitor, expression of one of the trypsin-like sequences was upregulated while expression of the two chymotrypsin-like sequences was downregulated. Copyright 2004 Elsevier Ltd.
Fuzzy measures on the Gene Ontology for gene product similarity.
Popescu, Mihail; Keller, James M; Mitchell, Joyce A
2006-01-01
One of the most important objects in bioinformatics is a gene product (protein or RNA). For many gene products, functional information is summarized in a set of Gene Ontology (GO) annotations. For these genes, it is reasonable to include similarity measures based on the terms found in the GO or other taxonomy. In this paper, we introduce several novel measures for computing the similarity of two gene products annotated with GO terms. The fuzzy measure similarity (FMS) has the advantage that it takes into consideration the context of both complete sets of annotation terms when computing the similarity between two gene products. When the two gene products are not annotated by common taxonomy terms, we propose a method that avoids a zero similarity result. To account for the variations in the annotation reliability, we propose a similarity measure based on the Choquet integral. These similarity measures provide extra tools for the biologist in search of functional information for gene products. The initial testing on a group of 194 sequences representing three protein families shows a higher correlation of the FMS and Choquet similarities to the BLAST sequence similarities than the traditional similarity measures such as pairwise average or pairwise maximum.
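The two traditional baselines named at the end, pairwise average and pairwise maximum, combine a term-to-term similarity over all cross pairs of annotation terms. The Python sketch below shows both, and it makes clear why the new measures were needed: each term pair is scored in isolation, without the context of the full annotation sets. The term-similarity function is a hypothetical placeholder. "T1" to "T3" are made-up labels, not GO identifiers, and the values are not real GO similarities.

```python
from itertools import product

def pairwise_average(terms_a, terms_b, term_sim):
    """Mean term-to-term similarity over all cross pairs of annotation terms."""
    scores = [term_sim(a, b) for a, b in product(terms_a, terms_b)]
    return sum(scores) / len(scores)

def pairwise_maximum(terms_a, terms_b, term_sim):
    """Highest term-to-term similarity over all cross pairs of annotation terms."""
    return max(term_sim(a, b) for a, b in product(terms_a, terms_b))

# Hypothetical term similarities; placeholders, not real GO identifiers or values.
TOY_SIMILARITY = {frozenset({"T1", "T3"}): 0.5, frozenset({"T2", "T3"}): 0.25}

def toy_term_sim(a, b):
    return 1.0 if a == b else TOY_SIMILARITY.get(frozenset({a, b}), 0.0)

gene_x, gene_y = ["T1", "T2"], ["T1", "T3"]
print(pairwise_average(gene_x, gene_y, toy_term_sim))  # 0.4375
print(pairwise_maximum(gene_x, gene_y, toy_term_sim))  # 1.0
```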
Ng'endo, R.N.; Osiemo, Z.B.; Brandl, R.
2013-01-01
DNA sequencing is increasingly being used to assist in species identification in order to overcome the taxonomic impediment. However, few studies attempt to compare the results of these molecular studies with a more traditional species delineation approach based on morphological characters. A 636-bp fragment of the mitochondrial cytochrome oxidase subunit 1 (CO1) gene was sequenced from 47 ants of the genus Pheidole (Formicidae: Myrmicinae) collected in the Brazilian Atlantic Forest to test whether the morphology-based assignment of individuals into species is supported by DNA-based species delimitation. Twenty morphospecies were identified, whereas the barcoding analysis identified 19 Molecular Operational Taxonomic Units (MOTUs). Fifteen of the 19 DNA-based clusters, delimited using sequence divergence thresholds of 2% and 3%, matched morphospecies. Both thresholds yielded the same number of MOTUs. Only one MOTU was successfully identified to species level using the CO1 sequences of Pheidole species already in GenBank. The average pairwise sequence divergence for all 47 sequences was 19%, ranging from 0 to 25%. In some cases, however, morphology- and molecular-based methods differed in their assignment of individuals to morphospecies or MOTUs. The occurrence of distinct mitochondrial lineages within morphological species highlights groups for further detailed genetic and morphological studies; a pluralistic approach using several methods is therefore advocated for understanding the taxonomy of difficult lineages. PMID:23902257
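Threshold-based MOTU delimitation groups sequences whose pairwise divergence falls below a cutoff such as 2% or 3%. The abstract does not specify the clustering rule. The Python sketch below assumes single-linkage clustering on uncorrected p-distance: a sequence joins a cluster if it is within the threshold of any member. It uses union-find, and the 100-bp "barcodes" are made up.

```python
def p_distance(a, b):
    """Uncorrected proportion of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def threshold_clusters(seqs, threshold):
    """Single-linkage clusters: sequences within `threshold` p-distance of any member are merged."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

# Toy 100-bp "barcodes" (hypothetical): seq 1 differs from seq 0 at one site (1%).
base = "ACGT" * 25
seqs = [base, "T" + base[1:], "G" * 100]
print(threshold_clusters(seqs, 0.02))  # [[0, 1], [2]]
print(threshold_clusters(seqs, 0.03))  # [[0, 1], [2]]
```

As in the study, two thresholds can yield identical partitions when no pairwise divergence falls between them.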
Osmundson, Todd W.; Robert, Vincent A.; Schoch, Conrad L.; Baker, Lydia J.; Smith, Amy; Robich, Giovanni; Mizzan, Luca; Garbelotto, Matteo M.
2013-01-01
Despite recent advances spearheaded by molecular approaches and novel technologies, species description and DNA sequence information are significantly lagging for fungi compared to many other groups of organisms. Large-scale sequencing of vouchered herbarium material can aid in closing this gap. Here, we describe an effort to obtain broad ITS sequence coverage of the macrofungal herbarium of the Museum of Natural History in Venice, Italy, which holds approximately 6000 species. Our goals were to investigate issues related to large sequencing projects, develop heuristic methods for assessing the overall performance of such a project, and evaluate the prospects of such efforts to reduce the current gap in fungal biodiversity knowledge. The effort generated 1107 sequences submitted to GenBank, including 416 previously unrepresented taxa and 398 sequences exhibiting a best BLAST match to an unidentified environmental sequence. Specimen age and taxon affected sequencing success, and subsequent work on failed specimens showed that an ITS1 mini-barcode greatly increased sequencing success without greatly reducing the discriminating power of the barcode. Similarity comparisons and nonmetric multidimensional scaling ordinations based on pairwise distance matrices proved to be useful heuristic tools for validating the overall accuracy of specimen identifications, flagging potential misidentifications, and identifying taxa in need of additional species-level revision. Comparison of within- and among-species nucleotide variation showed a strong increase in species discriminating power at 1–2% dissimilarity, and identified potential barcoding issues (same sequence for different species and vice versa). All sequences are linked to a vouchered specimen, and results from this study have already prompted revisions of species-sequence assignments in several taxa. PMID:23638077
Power and sample-size estimation for microbiome studies using pairwise distances and PERMANOVA.
Kelly, Brendan J; Gross, Robert; Bittinger, Kyle; Sherrill-Mix, Scott; Lewis, James D; Collman, Ronald G; Bushman, Frederic D; Li, Hongzhe
2015-08-01
The variation in community composition between microbiome samples, termed beta diversity, can be measured by pairwise distance based on either presence-absence or quantitative species abundance data. PERMANOVA, a permutation-based extension of multivariate analysis of variance to a matrix of pairwise distances, partitions within-group and between-group distances to permit assessment of the effect of an exposure or intervention (grouping factor) upon the sampled microbiome. Within-group distance and exposure/intervention effect size must be accurately modeled to estimate statistical power for a microbiome study that will be analyzed with pairwise distances and PERMANOVA. We present a framework for PERMANOVA power estimation tailored to marker-gene microbiome studies that will be analyzed by pairwise distances, which includes: (i) a novel method for distance matrix simulation that permits modeling of within-group pairwise distances according to pre-specified population parameters; (ii) a method to incorporate effects of different sizes within the simulated distance matrix; (iii) a simulation-based method for estimating PERMANOVA power from simulated distance matrices; and (iv) an R statistical software package that implements the above. Matrices of pairwise distances can be efficiently simulated to satisfy the triangle inequality and incorporate group-level effects, which are quantified by the adjusted coefficient of determination, omega-squared (ω²). From simulated distance matrices, available PERMANOVA power or necessary sample size can be estimated for a planned microbiome study. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
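The quantities that a PERMANOVA power calculation rests on can be computed directly from a distance matrix. These are the pseudo-F statistic, its permutation p-value and the effect size ω². The Python sketch below uses the standard distance-based sums of squares: SS_T = (1/N) Σ d²ᵢⱼ over all pairs, and SS_W is the analogous sum within each group divided by group size. It applies the usual ANOVA form ω² = (SS_A - (a - 1)·MS_W)/(SS_T + MS_W). We assume this form matches the paper's definition. The sketch is not the authors' R package and does not simulate distance matrices. The six-sample example is made up.

```python
import random
from itertools import combinations

def sums_of_squares(dist, groups):
    """Total and within-group sums of squares computed from squared pairwise distances."""
    n = len(groups)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for label in set(groups):
        members = [i for i in range(n) if groups[i] == label]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    return ss_total, ss_within

def pseudo_f(dist, groups):
    n, a = len(groups), len(set(groups))
    ss_total, ss_within = sums_of_squares(dist, groups)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))

def permanova_p_value(dist, groups, permutations=999, seed=0):
    """Permutation p-value: how often shuffled labels give a pseudo-F at least as large."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(labels)
        if pseudo_f(dist, labels) >= observed:
            hits += 1
    return (hits + 1) / (permutations + 1)

def omega_squared(dist, groups):
    """omega^2 = (SS_A - (a - 1) * MS_W) / (SS_T + MS_W), with SS_A = SS_T - SS_W."""
    n, a = len(groups), len(set(groups))
    ss_total, ss_within = sums_of_squares(dist, groups)
    ms_within = ss_within / (n - a)
    return (ss_total - ss_within - (a - 1) * ms_within) / (ss_total + ms_within)

# Toy example: six samples at 1-D positions, split into two well-separated groups.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(x - y) for y in positions] for x in positions]
groups = ["A", "A", "A", "B", "B", "B"]
print(pseudo_f(dist, groups))  # 150.0
```

With only three samples per group, just 2 of the 20 possible labelings reach the observed pseudo-F. The p-value therefore cannot fall much below 0.1 however large the effect. This sample-size limit on power is the kind of issue the framework above is designed to quantify before a study is run.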
Candida ficus sp. nov., a novel yeast species from the gut of Apriona germari larvae.
Hui, Feng-Li; Niu, Qiu-Hong; Ke, Tao; Liu, Zheng
2012-11-01
A novel yeast species is described based on three strains from the gut of wood-boring larvae collected in a tree trunk of Ficus carica cultivated in parks near Nanyang, central China. Phylogenetic analysis based on sequences of the D1/D2 domains of the large subunit rRNA gene showed that these strains occurred in a separate clade that was genetically distinct from all known ascomycetous yeasts. In terms of pairwise sequence divergence in the D1/D2 domains, the novel strains differed by 15.3% from the type strain of Pichia terricola and by 15.8% from the type strains of Pichia exigua and Candida rugopelliculosa. All three species are ascomycetous yeasts in the Pichia clade. Unlike P. terricola, P. exigua and C. rugopelliculosa, the novel isolates did not ferment glucose. The name Candida ficus sp. nov. is proposed to accommodate these highly divergent organisms, with STN-8(T) (=CICC 1980(T)=CBS 12638(T)) as the type strain.
E, G X; Na, R S; Zhao, Y J; Chen, L P; Qiu, X Y; Huang, Y F
2015-04-10
Cathelicidins are a major family of antimicrobial peptides (AMPs), an important component of the innate immune system, playing a critical role in host defense and disease resistance in virtually all living species. Polymorphism and functional studies on the cathelicidin of Tianzhu white yak contribute to understanding the specific innate immune mechanism in animals living at high altitudes in comparison to cattle and domesticated white yak. Thirty-six Tianzhu white yak individuals, originating from areas representing three ecotypes (Gansu, China), were investigated. The total length of the aligned yak cathelicidin 6 (CATHL-6) sequences was 1923 bp, including six single nucleotide polymorphisms and one indel. Ten haplotypes were identified, and phylogenetic analyses resolved these 10 haplotypes into two clusters. The results indicate that the white yak originated from two domestication sites. In addition, the lack of significant pairwise difference between sequences (Tajima's D = 0.92865, P > 0.10) in the CATHL-6 region indicates the absence of population size expansion in the current white yak population.
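The Tajima's D value reported above compares the mean number of pairwise differences (π) with the number of segregating sites (S) scaled by Watterson's constant. The following is a minimal sketch of the standard Tajima (1989) formula for equal-length, gap-free aligned sequences; it is not the software used in the study, and `tajimas_d` is a hypothetical name.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D from equal-length aligned sequences (gap-free input assumed)."""
    n = len(seqs)
    length = len(seqs[0])
    # S: number of segregating (polymorphic) sites
    s = sum(1 for k in range(length) if len({seq[k] for seq in seqs}) > 1)
    if s == 0:
        raise ValueError("Tajima's D is undefined when there are no segregating sites")
    # pi: mean number of pairwise differences over all sequence pairs
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Standard constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # D contrasts pi with Watterson's estimate S / a1
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants (e.g. a single singleton) pushes D below zero, while alleles at intermediate frequency push it above zero; judging significance requires comparing D with its null distribution.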
Improvements on a privacy-protection algorithm for DNA sequences with generalization lattices.
Li, Guang; Wang, Yadong; Su, Xiaohong
2012-10-01
When developing personal DNA databases, there must be an appropriate guarantee of anonymity, which means that the data cannot be related back to individuals. DNA lattice anonymization (DNALA) is a successful method for making personal DNA sequences anonymous. However, it uses time-consuming multiple sequence alignment and a low-accuracy greedy clustering algorithm. Furthermore, DNALA is not an online algorithm, and so it cannot quickly return results when the database is updated. This study improves the DNALA method. Specifically, we replaced the multiple sequence alignment in DNALA with global pairwise sequence alignment to save time, and we designed a hybrid clustering algorithm consisting of a maximum weight matching (MWM)-based algorithm and an online algorithm. The MWM-based algorithm is more accurate than the greedy algorithm in DNALA and has the same time complexity. The online algorithm can process data quickly when the database is updated. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
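The global pairwise alignment that replaces multiple sequence alignment here is classically computed with Needleman-Wunsch dynamic programming. Below is a minimal sketch that returns only the optimal score, keeping two rows of the DP table so memory grows with the length of one sequence. The scoring values (match +1, mismatch -1, linear gap -2) are illustrative assumptions; the abstract does not give the scheme used in the improved DNALA, and `global_alignment_score` is a hypothetical name.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: (mis)match, gap in b, gap in a
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))  # A/A, C/-, G/G, T/T -> 1 - 2 + 1 + 1
```

Each pair costs O(len(a) x len(b)) time, so scoring all pairs of k sequences avoids the heuristics and cost of a full multiple alignment but still scales quadratically in k.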
Sobti, Ranbir Chander; Kumari, Mamtesh; Sharma, Vijay Lakshmi; Sodhi, Monika; Mukesh, Manishi; Shouche, Yogesh
2009-11-01
The present study aimed to obtain the nucleotide sequences of part of the mitochondrial COII gene amplified from individuals of five termite species (Isoptera: Termitidae: Macrotermitinae). Four of them belonged to the genus Odontotermes (O. obesus, O. horni, O. bhagwatii and Odontotermes sp.) and one to Microtermes (M. obesi). Partial COII gene fragments were amplified using specific primers. The sequences so obtained were characterized by calculating the frequency of each nucleotide base, and a high A + T content was observed. The interspecific pairwise sequence divergence among Odontotermes species ranged from 6.5% to 17.1% across the COII fragment. Sequence divergence between M. obesi and the Odontotermes species ranged from 2.5% (Odontotermes sp.) to 19.0% (O. bhagwatii). Phylogenetic trees constructed with the distance-based neighbour-joining method revealed three main clades clustering all the individuals according to their genera and families.
A Comparative Study of Pairwise Learning Methods Based on Kernel Ridge Regression.
Stock, Michiel; Pahikkala, Tapio; Airola, Antti; De Baets, Bernard; Waegeman, Willem
2018-06-12
Many machine learning problems can be formulated as predicting labels for a pair of objects. Problems of that kind are often referred to as pairwise learning, dyadic prediction, or network inference problems. During the past decade, kernel methods have played a dominant role in pairwise learning. They still obtain state-of-the-art predictive performance, but their theoretical behavior has been underexplored in the machine learning literature. In this work we review and unify kernel-based algorithms that are commonly used in different pairwise learning settings, ranging from matrix filtering to zero-shot learning. To this end, we focus on closed-form efficient instantiations of Kronecker kernel ridge regression. We show that independent task kernel ridge regression, two-step kernel ridge regression, and a linear matrix filter arise naturally as special cases of Kronecker kernel ridge regression, implying that all these methods implicitly minimize a squared loss. In addition, we analyze universality, consistency, and spectral filtering properties. Our theoretical results provide valuable insights into assessing the advantages and limitations of existing pairwise learning methods.
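One well-known closed-form route to Kronecker kernel ridge regression avoids ever building the (nm x nm) Kronecker kernel: with eigendecompositions K = U_K diag(s_K) U_K^T and G = U_G diag(s_G) U_G^T, the eigenvalues of G ⊗ K are all products s_K[i]·s_G[j], so the ridge solution becomes an element-wise spectral filter. The NumPy sketch below illustrates that idea; it is not the authors' implementation, and the function names are hypothetical.

```python
import numpy as np


def kronecker_krr(K, G, Y, lam):
    """Dual coefficients A of Kronecker kernel ridge regression.

    Solves (G kron K + lam * I) vec(A) = vec(Y), with vec stacking columns,
    without forming the Kronecker product. K (n x n) and G (m x m) are
    symmetric kernel matrices for the two object types; Y is n x m.
    """
    s_k, u_k = np.linalg.eigh(K)
    s_g, u_g = np.linalg.eigh(G)
    # Rotate labels into the joint eigenbasis, then shrink each component
    # by 1 / (s_k[i] * s_g[j] + lam): a spectral filter on the Kronecker kernel.
    filtered = (u_k.T @ Y @ u_g) / (np.outer(s_k, s_g) + lam)
    return u_k @ filtered @ u_g.T


def predict(K_test, G_test, A):
    """Pairwise predictions for new objects: K_test is p x n, G_test is q x m."""
    return K_test @ A @ G_test.T
```

The cost is dominated by the two eigendecompositions, O(n³ + m³), instead of O((nm)³) for solving the Kronecker system directly, and once they are computed, refitting for a new `lam` only requires repeating the element-wise division and the final matrix products.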
Smith, R F; Wiese, B A; Wojzynski, M K; Davison, D B; Worley, K C
1996-05-01
The BCM Search Launcher is an integrated set of World Wide Web (WWW) pages that organize molecular biology-related search and analysis services available on the WWW by function, and provide a single point of entry for related searches. The Protein Sequence Search Page, for example, provides a single sequence entry form for submitting sequences to WWW servers that offer remote access to a variety of different protein sequence search tools, including BLAST, FASTA, Smith-Waterman, BEAUTY, PROSITE, and BLOCKS searches. Other Launch pages provide access to (1) nucleic acid sequence searches, (2) multiple and pair-wise sequence alignments, (3) gene feature searches, (4) protein secondary structure prediction, and (5) miscellaneous sequence utilities (e.g., six-frame translation). The BCM Search Launcher also provides a mechanism to extend the utility of other WWW services by adding supplementary hypertext links to results returned by remote servers. For example, links to the NCBI's Entrez data base and to the Sequence Retrieval System (SRS) are added to search results returned by the NCBI's WWW BLAST server. These links provide easy access to auxiliary information, such as Medline abstracts, that can be extremely helpful when analyzing BLAST data base hits. For new or infrequent users of sequence data base search tools, we have preset the default search parameters to provide the most informative first-pass sequence analysis possible. We have also developed a batch client interface for Unix and Macintosh computers that allows multiple input sequences to be searched automatically as a background task, with the results returned as individual HTML documents directly to the user's system. The BCM Search Launcher and batch client are available on the WWW at URL http://gc.bcm.tmc.edu:8088/search-launcher.html.
Impact of Sampling Density on the Extent of HIV Clustering
Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor
2014-01-01
Identifying and monitoring HIV clusters could be useful in tracking the leading edge of HIV transmission in epidemics. Currently, greater specificity in the definition of HIV clusters is needed to reduce confusion in the interpretation of HIV clustering results. We address sampling density as one of the key aspects of HIV cluster analysis. The proportion of viral sequences in clusters was estimated at sampling densities from 1.0% to 70%. A set of 1,248 HIV-1C env gp120 V1C5 sequences from a single community in Botswana was utilized in simulation studies. Matching numbers of HIV-1C V1C5 sequences from the LANL HIV Database were used as comparators. HIV clusters were identified by phylogenetic inference under bootstrapped maximum likelihood and pairwise distance cut-offs. Sampling density below 10% was associated with stochastic HIV clustering with broad confidence intervals. HIV clustering increased linearly at sampling density >10%, and was accompanied by narrowing confidence intervals. Patterns of HIV clustering were similar at bootstrap thresholds 0.7 to 1.0, but the extent of HIV clustering decreased with higher bootstrap thresholds. The origin of sampling (local concentrated vs. scattered global) had a substantial impact on HIV clustering at sampling densities ≥10%. Pairwise distances at 10% were estimated as a threshold for cluster analysis of HIV-1 V1C5 sequences. The node bootstrap support distribution provided additional evidence for 10% sampling density as the threshold for HIV cluster analysis. The detectability of HIV clusters is substantially affected by sampling density. A minimal genotyping density of 10% and sampling density of 50–70% are suggested for HIV-1 V1C5 cluster analysis. PMID:25275430
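The distance-based half of this cluster definition can be sketched as single-linkage grouping: any two sequences whose pairwise distance falls at or below a cutoff are linked, and linked sequences form one cluster. The abstract does not state the linkage rule, so single linkage (implemented with union-find) is an assumption here, the bootstrap criterion is omitted, and the function names are hypothetical.

```python
def distance_clusters(dist, cutoff):
    """Single-linkage clusters of sequences with pairwise distance <= cutoff.

    dist is a symmetric matrix of pairwise genetic distances; returns the
    clusters with two or more members as sorted lists of sequence indices.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= cutoff:
                parent[find(i)] = find(j)  # merge the two sets

    members = {}
    for i in range(n):
        members.setdefault(find(i), []).append(i)
    return sorted(sorted(c) for c in members.values() if len(c) > 1)


def clustered_fraction(dist, cutoff):
    """Proportion of sequences that fall into any cluster."""
    return sum(len(c) for c in distance_clusters(dist, cutoff)) / len(dist)
```

Because `clustered_fraction` depends on which sequences happen to be sampled, subsampling the input and recomputing it is one way to examine the sampling-density effect described above.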
Riojas, Marco A; McGough, Katya J; Rider-Riojas, Cristin J; Rastogi, Nalin; Hazbón, Manzour Hernando
2018-01-01
The species within the Mycobacterium tuberculosis Complex (MTBC) have undergone numerous taxonomic and nomenclatural changes, leaving the true structure of the MTBC in doubt. We used next-generation sequencing (NGS), digital DNA-DNA hybridization (dDDH), and average nucleotide identity (ANI) to investigate the relationship between these species. The type strains of Mycobacterium africanum, Mycobacterium bovis, Mycobacterium caprae, Mycobacterium microti and Mycobacterium pinnipedii were sequenced via NGS. Pairwise dDDH and ANI comparisons were performed among these genomes, previously sequenced MTBC type strain genomes (including 'Mycobacterium canettii', 'Mycobacterium mungi' and 'Mycobacterium orygis') and M. tuberculosis H37Rv(T). Further, all available genome sequences in GenBank for species in or putatively in the MTBC were compared to H37Rv(T). Pairwise results indicated that all of the type strains of the species are extremely closely related to each other (dDDH: 91.2-99.2 %, ANI: 99.21-99.92 %), greatly exceeding the respective species delineation thresholds, thus indicating that they belong to the same species. Results from the GenBank genomes indicate that all the strains examined are within the circumscription of H37Rv(T) (dDDH: 83.5-100 %). We, therefore, formally propose a union of the species of the MTBC as M. tuberculosis. M. africanum, M. bovis, M. caprae, M. microti and M. pinnipedii are reclassified as later heterotypic synonyms of M. tuberculosis. 'M. canettii', 'M. mungi', and 'M. orygis' are classified as strains of the species M. tuberculosis. We further recommend use of the infrasubspecific term 'variant' ('var.') and infrasubspecific designations that generally retain the historical nomenclature associated with the groups or otherwise convey such characteristics, e.g. M. tuberculosis var. bovis.
HLA Diversity in the 1000 Genomes Dataset
Gourraud, Pierre-Antoine; Khankhanian, Pouya; Cereb, Nezih; Yang, Soo Young; Feolo, Michael; Maiers, Martin; Rioux, John D; Hauser, Stephen; Oksenberg, Jorge
2014-01-01
The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation by sequencing at a level that should allow the genome-wide detection of most variants with frequencies as low as 1%. However, in the major histocompatibility complex (MHC), only the top 10 most frequent haplotypes are in the 1% frequency range whereas thousands of haplotypes are present at lower frequencies. Given the limitation of both the coverage and the read length of the sequences generated by the 1000 Genomes Project, the highly variable positions that define HLA alleles may be difficult to identify. We used classical Sanger sequencing techniques to type the HLA-A, HLA-B, HLA-C, HLA-DRB1 and HLA-DQB1 genes in the available 1000 Genomes samples and combined the results with the 103,310 variants in the MHC region genotyped by the 1000 Genomes Project. Using pairwise identity-by-descent distances between individuals and principal component analysis, we established the relationship between ancestry and genetic diversity in the MHC region. As expected, both the MHC variants and the HLA phenotype can identify the major ancestry lineage, informed mainly by the most frequent HLA haplotypes. To some extent, regions of the genome with similar genetic or similar recombination rate have similar properties. An MHC-centric analysis underlines departures between the ancestral background of the MHC and the genome-wide picture. Our analysis of linkage disequilibrium (LD) decay in these samples suggests that overestimation of pairwise LD occurs due to a limited sampling of the MHC diversity. This collection of HLA-specific MHC variants, available on the dbMHC portal, is a valuable resource for future analyses of the role of MHC in population and disease studies. PMID:24988075
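Pairwise linkage disequilibrium between two biallelic sites is commonly summarized by r², the squared correlation of alleles across haplotypes. The sketch below computes it from phased haplotype strings. It illustrates the standard measure only and is not the authors' pipeline; the function name `ld_r2` and the convention of taking the reference allele from the first haplotype are assumptions. With few haplotypes, r² tends to be inflated even between unlinked sites, which is consistent with the sampling effect described above.

```python
def ld_r2(haplotypes, i, j):
    """r^2 linkage disequilibrium between biallelic sites i and j.

    haplotypes: equal-length phased haplotype strings, one character per site.
    The allele seen on the first haplotype is used as the reference at each
    site (an arbitrary choice: r^2 does not depend on which allele is chosen).
    """
    n = len(haplotypes)
    allele_i = haplotypes[0][i]
    allele_j = haplotypes[0][j]
    p_a = sum(h[i] == allele_i for h in haplotypes) / n
    p_b = sum(h[j] == allele_j for h in haplotypes) / n
    p_ab = sum(h[i] == allele_i and h[j] == allele_j for h in haplotypes) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom
```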
García-Varela, Martín; García-Prieto, Luís; Rodríguez, Rodolfo Pérez
2011-12-01
The morphology of the males of Neoechinorhynchus schmidti (Acanthocephala: Neoechinorhynchidae) is unknown, because this species was described based exclusively on females. However, we recently collected 2 common slider turtles Trachemys scripta in Centla swamps, Tabasco, Mexico, parasitized by 27 specimens of an acanthocephalan whose females were morphologically identical to N. schmidti. The domains D2 and D3 of the large subunit of the nuclear ribosomal RNA (LSU) of 3 males and 2 females of this material were sequenced. The sequences of both sexes were identical, and based on this result, we described for the first time the morphology of the males of N. schmidti. In addition, 6 sequences of a congeneric species that also parasitizes turtles (Neoechinorhynchus emyditoides) were generated in the current research. The 11 sequences of these 2 species were aligned with 13 sequences of another 4 species of the same genus, producing a data set of 24 taxa with 674 nucleotides. The genetic divergence between N. schmidti and N. emyditoides was 4% and intraspecific differences ranged from 0.01 to 0.02%. Pairwise differences between either of these species and 4 other congeners parasitic in fresh and brackish water fishes (Neoechinorhynchus golvani, Neoechinorhynchus roseum, Neoechinorhynchus saginatus, and Neoechinorhynchus sp.) varied from 9.5 to 33%. Maximum likelihood and maximum parsimony analyses show that N. schmidti and N. emyditoides are sister taxa. Bootstrap analysis also indicates that the sister relationship is reliably supported. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Genomic Organization Under Different Environmental Conditions: Hoplosternum Littorale as a Model
Schneider, Carlos Henrique; Feldberg, Eliana; Baccaro, Fabricio Beggiato; Carvalho, Natália Dayane Moura; Gross, Maria Claudia
2016-01-01
The Amazon has abundant rivers, streams, and floodplains in both polluted and nonpolluted environments, which show great adaptability. Thus, the goal of this study was to map repetitive DNA sequences in both mitotic chromosomes and erythrocyte micronuclei of tamoatás from polluted and nonpolluted environments and to assess the possible genotoxic effects of these environments. Individuals were collected in Manaus, Amazonas (AM), and submitted to classical and molecular cytogenetic techniques, as well as to a blood micronucleus test. A diploid number of 60 chromosomes was found in all individuals, with 18S ribosomal DNA sites present on one chromosome pair and no interstitial telomeric sites on the chromosomes. The micronucleus test showed no significant differences in pairwise comparisons between environments or collection sites, but the Rex3 retroelement was dispersed on the chromosomes of individuals from unpolluted environments and compartmentalized in individuals from polluted environments. Divergent numbers of 5S rDNA sites are present in individuals from unpolluted and polluted environments. The mapping of repetitive sequences revealed that micronuclei have different compositions both intra- and interindividually, which suggests that different regions are lost in the formation of micronuclei and that no single fragile region undergoes breaks, although repetitive DNA elements are involved in this process. PMID:26981695
A pluggable framework for parallel pairwise sequence search.
Archuleta, Jeremy; Feng, Wu-chun; Tilevich, Eli
2007-01-01
The current and near future of the computing industry is one of multi-core and multi-processor technology. Most existing sequence-search tools have been designed with a focus on single-core, single-processor systems. This discrepancy between software design and hardware architecture substantially hinders sequence-search performance by not allowing full utilization of the hardware. This paper presents a novel framework that will aid the conversion of serial sequence-search tools into a parallel version that can take full advantage of the available hardware. The framework, which is based on a software architecture called mixin layers with refined roles, enables modules to be plugged into the framework with minimal effort. The inherent modular design improves maintenance and extensibility, thus opening up a plethora of opportunities for advanced algorithmic features to be developed and incorporated while routine maintenance of the codebase persists.
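The framework described above is built on mixin layers with refined roles, which this sketch does not reproduce. It only illustrates the general idea of keeping the pairwise scoring module pluggable and separate from the parallel execution backend, using Python's `concurrent.futures`. The toy `identity_scorer` and the names `parallel_search` and `Scorer` are hypothetical. In CPython, threads do not speed up CPU-bound pure-Python scoring because of the global interpreter lock; for real speed-ups, pass a `ProcessPoolExecutor` together with a picklable, module-level scorer.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

# Any function (query, target) -> score can be plugged in as the search module.
Scorer = Callable[[str, str], float]


def identity_scorer(query: str, target: str) -> float:
    """Toy scorer: fraction of identical positions over the shorter sequence."""
    n = min(len(query), len(target))
    return sum(q == t for q, t in zip(query, target)) / n if n else 0.0


def parallel_search(query: str, database: Sequence[str], scorer: Scorer,
                    executor: Executor, top: int = 3) -> List[Tuple[int, float]]:
    """Score the query against every database entry concurrently.

    Returns the `top` best hits as (database_index, score), best first.
    """
    scores = executor.map(scorer, [query] * len(database), database)
    ranked = sorted(zip(scores, range(len(database))), reverse=True)
    return [(idx, score) for score, idx in ranked[:top]]


with ThreadPoolExecutor(max_workers=4) as pool:
    print(parallel_search("ACGT", ["ACGT", "ACGA", "TTTT"], identity_scorer, pool, top=2))
```

Swapping in a different scorer or executor requires no change to `parallel_search`, which is the property a pluggable design is meant to provide.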
O'Rourke, Jamie A; Fu, Fengli; Bucciarelli, Bruna; Yang, S Sam; Samac, Deborah A; Lamb, JoAnn F S; Monteros, Maria J; Graham, Michelle A; Gronwald, John W; Krom, Nick; Li, Jun; Dai, Xinbin; Zhao, Patrick X; Vance, Carroll P
2015-07-07
Alfalfa (Medicago sativa L.) is the primary forage legume crop species in the United States and plays essential economic and ecological roles in agricultural systems across the country. Modern alfalfa is the result of hybridization between tetraploid M. sativa ssp. sativa and M. sativa ssp. falcata. Due to its large and complex genome, there are few genomic resources available for alfalfa improvement. A de novo transcriptome assembly from two alfalfa subspecies, M. sativa ssp. sativa (B47) and M. sativa ssp. falcata (F56) was developed using Illumina RNA-seq technology. Transcripts from roots, nitrogen-fixing root nodules, leaves, flowers, elongating stem internodes, and post-elongation stem internodes were assembled into the Medicago sativa Gene Index 1.2 (MSGI 1.2) representing 112,626 unique transcript sequences. Nodule-specific transcripts and transcripts involved in cell wall biosynthesis were identified. Statistical analyses identified 20,447 transcripts differentially expressed between the two subspecies. Pair-wise comparisons of each tissue combination identified 58,932 sequences differentially expressed in B47 and 69,143 sequences differentially expressed in F56. Comparing transcript abundance in floral tissues of B47 and F56 identified expression differences in sequences involved in anthocyanin and carotenoid synthesis, which determine flower pigmentation. Single nucleotide polymorphisms (SNPs) unique to each M. sativa subspecies (110,241) were identified. The Medicago sativa Gene Index 1.2 increases the expressed sequence data available for alfalfa by ninefold and can be expanded as additional experiments are performed. The MSGI 1.2 transcriptome sequences, annotations, expression profiles, and SNPs were assembled into the Alfalfa Gene Index and Expression Database (AGED) at http://plantgrn.noble.org/AGED/ , a publicly available genomic resource for alfalfa improvement and legume research.
Archaebacterial rhodopsin sequences: Implications for evolution
NASA Technical Reports Server (NTRS)
Lanyi, J. K.
1991-01-01
It was proposed over 10 years ago that the archaebacteria represent a separate kingdom which diverged very early from the eubacteria and eukaryotes. It follows that investigations of archaebacterial characteristics might reveal features of early evolution. So far, two genes, one for bacteriorhodopsin and another for halorhodopsin, both from Halobacterium halobium, have been sequenced. We cloned and sequenced the gene coding for the polypeptide of another one of these rhodopsins, a halorhodopsin in Natronobacterium pharaonis. Peptide sequencing of cyanogen bromide fragments, and immuno-reactions of the protein and synthetic peptides derived from the C-terminal gene sequence, confirmed that the open reading frame was the structural gene for the pharaonis halorhodopsin polypeptide. The flanking DNA sequences of this gene, as well as those of other bacterial rhodopsins, were compared to previously proposed archaebacterial consensus sequences. In pairwise comparisons of the open reading frame with DNA sequences for bacterio-opsin and halo-opsin from Halobacterium halobium, silent divergences were calculated. These indicate very considerable evolutionary distance between each pair of genes, even in the same organism. In spite of this, the three protein sequences show extensive similarities, indicating strong selective pressures.
Shaped Ceria Nanocrystals Catalyze Efficient and Selective Para-Hydrogen-Enhanced Polarization.
Zhao, Evan W; Zheng, Haibin; Zhou, Ronghui; Hagelin-Weaver, Helena E; Bowers, Clifford R
2015-11-23
Intense para-hydrogen-enhanced NMR signals are observed in the hydrogenation of propene and propyne over ceria nanocubes, nano-octahedra, and nanorods. The well-defined ceria shapes, synthesized by a hydrothermal method, expose different crystalline facets with various oxygen vacancy densities, which are known to play a role in hydrogenation and oxidation catalysis. While the catalytic activity of the hydrogenation of propene over ceria is strongly facet-dependent, the pairwise selectivity is low (2.4% at 375 °C), which is consistent with stepwise H atom transfer, and it is the same for all three nanocrystal shapes. Selective semi-hydrogenation of propyne over ceria nanocubes yields hyperpolarized propene with a similar pairwise selectivity of 2.7% (at 300 °C), indicating product formation predominantly by a non-pairwise addition. Ceria is also shown to be an efficient pairwise replacement catalyst for propene. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Gao, Xiang; Lin, Huaiying; Revanna, Kashi; Dong, Qunfeng
2017-05-10
Species-level classification for 16S rRNA gene sequences remains a serious challenge for microbiome researchers, because existing taxonomic classification tools for 16S rRNA gene sequences either do not provide species-level classification, or their classification results are unreliable. The unreliable results are due to the limitations in the existing methods which either lack solid probabilistic-based criteria to evaluate the confidence of their taxonomic assignments, or use nucleotide k-mer frequency as the proxy for sequence similarity measurement. We have developed a method that shows significantly improved species-level classification results over existing methods. Our method calculates true sequence similarity between query sequences and database hits using pairwise sequence alignment. Taxonomic classifications are assigned from the species to the phylum levels based on the lowest common ancestors of multiple database hits for each query sequence, and further classification reliabilities are evaluated by bootstrap confidence scores. The novelty of our method is that the contribution of each database hit to the taxonomic assignment of the query sequence is weighted by a Bayesian posterior probability based upon the degree of sequence similarity of the database hit to the query sequence. Our method does not need any training datasets specific for different taxonomic groups. Instead only a reference database is required for aligning to the query sequences, making our method easily applicable for different regions of the 16S rRNA gene or other phylogenetic marker genes. Reliable species-level classification for 16S rRNA or other phylogenetic marker genes is critical for microbiome research. Our software shows significantly higher classification accuracy than the existing tools and we provide probabilistic-based confidence scores to evaluate the reliability of our taxonomic classification assignments based on multiple database matches to query sequences. 
Despite its higher computational costs, our method is still suitable for analyzing large-scale microbiome datasets for practical purposes. Furthermore, our method can be applied for taxonomic classification of any phylogenetic marker gene sequences. Our software, called BLCA, is freely available at https://github.com/qunfengdong/BLCA .
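The key idea described above is that each database hit contributes to the taxonomic call in proportion to a weight derived from its similarity to the query. The sketch below illustrates only that weighting idea, with two plainly labelled simplifications. First, the weights are a softmax of similarity scores, a hypothetical stand-in for BLCA's Bayesian posterior. Second, each rank is decided by a weighted vote rather than the LCA and bootstrap procedure BLCA actually uses. All names and parameters are illustrative.

```python
import math
from collections import defaultdict

RANKS = ("phylum", "class", "order", "family", "genus", "species")


def weighted_assignment(hits, temperature=1.0):
    """Assign a taxon and a support value at each rank from weighted hits.

    hits: list of (lineage, similarity) pairs, where lineage is a tuple of
    taxon names ordered as RANKS and similarity is e.g. percent identity.
    Hypothetical weighting: softmax of similarity / temperature, so hits
    much less similar than the best one contribute almost nothing.
    """
    best = max(s for _, s in hits)
    raw = [math.exp((s - best) / temperature) for _, s in hits]  # shift for numerical stability
    total = sum(raw)
    weights = [w / total for w in raw]

    result = {}
    for level, rank in enumerate(RANKS):
        votes = defaultdict(float)
        for (lineage, _), w in zip(hits, weights):
            votes[lineage[level]] += w
        taxon, support = max(votes.items(), key=lambda kv: kv[1])
        result[rank] = (taxon, support)  # support is in [0, 1]
    return result
```

As in the approach described above, support naturally decreases toward the species level when close hits disagree, while higher ranks shared by all strong hits keep support near 1.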
Power and sample-size estimation for microbiome studies using pairwise distances and PERMANOVA
Kelly, Brendan J.; Gross, Robert; Bittinger, Kyle; Sherrill-Mix, Scott; Lewis, James D.; Collman, Ronald G.; Bushman, Frederic D.; Li, Hongzhe
2015-01-01
Motivation: The variation in community composition between microbiome samples, termed beta diversity, can be measured by pairwise distance based on either presence–absence or quantitative species abundance data. PERMANOVA, a permutation-based extension of multivariate analysis of variance to a matrix of pairwise distances, partitions within-group and between-group distances to permit assessment of the effect of an exposure or intervention (grouping factor) upon the sampled microbiome. Within-group distance and exposure/intervention effect size must be accurately modeled to estimate statistical power for a microbiome study that will be analyzed with pairwise distances and PERMANOVA. Results: We present a framework for PERMANOVA power estimation tailored to marker-gene microbiome studies that will be analyzed by pairwise distances, which includes: (i) a novel method for distance matrix simulation that permits modeling of within-group pairwise distances according to pre-specified population parameters; (ii) a method to incorporate effects of different sizes within the simulated distance matrix; (iii) a simulation-based method for estimating PERMANOVA power from simulated distance matrices; and (iv) an R statistical software package that implements the above. Matrices of pairwise distances can be efficiently simulated to satisfy the triangle inequality and incorporate group-level effects, which are quantified by the adjusted coefficient of determination, omega-squared (ω²). From simulated distance matrices, available PERMANOVA power or necessary sample size can be estimated for a planned microbiome study. Availability and implementation: http://github.com/brendankelly/micropower. Contact: brendank@mail.med.upenn.edu or hongzhe@upenn.edu PMID:25819674
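Simulation-based power estimation follows a simple loop: simulate many studies under an assumed effect, run PERMANOVA on each, and report the fraction that reject the null hypothesis. The sketch below shows that loop in pure Python. It swaps in a deliberately simpler simulation than the paper's distance-matrix method: samples are Gaussian points, one group is shifted along the first axis, and Euclidean distances stand in for beta-diversity distances. The names and defaults are hypothetical.

```python
import math
import random
from itertools import combinations


def pseudo_f(dist, groups):
    """PERMANOVA pseudo-F for a one-way design."""
    n = len(groups)
    labels = set(groups)
    a = len(labels)
    ss_t = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_w = 0.0
    for lab in labels:
        idx = [i for i in range(n) if groups[i] == lab]
        ss_w += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    return ((ss_t - ss_w) / (a - 1)) / (ss_w / (n - a))


def simulate_power(n_per_group, shift, n_sim=50, n_perm=99, alpha=0.05, dim=5, seed=1):
    """Fraction of simulated two-group studies in which PERMANOVA rejects H0.

    Hypothetical simulation: Gaussian points in `dim` dimensions, group 1
    shifted by `shift` along the first axis, Euclidean distances.
    """
    rng = random.Random(seed)
    groups = [0] * n_per_group + [1] * n_per_group
    rejections = 0
    for _ in range(n_sim):
        pts = [[rng.gauss(0, 1) for _ in range(dim)] for _ in groups]
        for p, g in zip(pts, groups):
            if g == 1:
                p[0] += shift  # the simulated group-level effect
        dist = [[math.dist(p, q) for q in pts] for p in pts]
        f_obs = pseudo_f(dist, groups)
        perm = list(groups)
        hits = 0
        for _ in range(n_perm):
            rng.shuffle(perm)
            if pseudo_f(dist, perm) >= f_obs:
                hits += 1
        if (hits + 1) / (n_perm + 1) <= alpha:
            rejections += 1
    return rejections / n_sim
```

To estimate a necessary sample size, `simulate_power` can be evaluated over increasing `n_per_group` until the returned power reaches the target, for example 0.8. With `n_perm=99`, the smallest attainable p-value is 0.01, so `alpha` should not be set below that.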
Palma, Leopoldo; Muñoz, Delia; Berry, Colin; Murillo, Jesús; Ruiz de Escudero, Iñigo; Caballero, Primitivo
2014-01-01
This study describes the insecticidal activity of a novel Bacillus thuringiensis Cry-related protein with a deduced 799 amino acid sequence (~89 kDa), sharing ~19% pairwise identity with the 95-kDa aphidicidal protein (sequence number 204) from patent US 8318900 and ~40% pairwise identity with the cancer-cell-killing Cry proteins (parasporins Cry41Ab1 and Cry41Aa1). This novel Cry-related protein contained the five conserved amino acid blocks and the three conserved domains commonly found in 3-domain Cry proteins. The protein exhibited toxic activity against the green peach aphid, Myzus persicae (Sulzer) (Homoptera: Aphididae) with the lowest mean lethal concentration (LC50 = 32.7 μg/mL) reported to date for a given Cry protein and this insect species, whereas it had no lethal toxicity against the Lepidoptera of the family Noctuidae Helicoverpa armigera (Hübner), Mamestra brassicae (L.), Spodoptera exigua (Hübner), S. frugiperda (J.E. Smith) and S. littoralis (Boisduval), at concentrations as high as ~3.5 μg/cm2. This novel Cry-related protein may become a promising environmentally friendly tool for the biological control of M. persicae and possibly also for other sap sucking insect pests. PMID:25384108
NASA Astrophysics Data System (ADS)
Ordóñez Cabrera, Manuel; Volodin, Andrei I.
2005-05-01
From the classical notion of uniform integrability of a sequence of random variables, a new concept of integrability (called h-integrability) is introduced for an array of random variables, concerning an array of constants. We prove that this concept is weaker than other previous related notions of integrability, such as Cesàro uniform integrability [Chandra, Sankhya Ser. A 51 (1989) 309-317], uniform integrability concerning the weights [Ordóñez Cabrera, Collect. Math. 45 (1994) 121-132] and Cesàro α-integrability [Chandra and Goswami, J. Theoret. Probab. 16 (2003) 655-669]. Under this condition of integrability and appropriate conditions on the array of weights, mean convergence theorems and weak laws of large numbers for weighted sums of an array of random variables are obtained when the random variables are subject to some special kinds of dependence: (a) rowwise pairwise negative dependence, (b) rowwise pairwise non-positive correlation, (c) when the sequence of random variables in every row is φ-mixing. Finally, we consider the general weak law of large numbers in the sense of Gut [Statist. Probab. Lett. 14 (1992) 49-52] under this new condition of integrability for a Banach space setting.
A statistical view of FMRFamide neuropeptide diversity.
Espinoza, E; Carrigan, M; Thomas, S G; Shaw, G; Edison, A S
2000-01-01
FMRFamide-like peptide (FLP) amino acid sequences have been collected and statistically analyzed. FLP amino acid composition as a function of position in the peptide is graphically presented for several major phyla. Results of total amino acid composition and frequencies of pairs of FLP amino acids have been computed and compared with corresponding values from the entire GenBank protein sequence database. The data for pairwise distributions of amino acids should help in future structure-function studies of FLPs. To aid in future peptide discovery, a computer program and search protocol were developed to identify FLPs from the GenBank protein database without the use of keywords.
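Pair frequencies of the kind described above can be tabulated by counting ordered pairs of residues in each peptide. The abstract does not define "pairs", so this sketch assumes adjacent residues (dipeptides) within each peptide. The name `pair_frequencies` is hypothetical. Comparing such frequencies with a background set, as the study did with GenBank, could be done by dividing each FLP frequency by the corresponding background frequency.

```python
from collections import Counter


def pair_frequencies(peptides):
    """Relative frequency of each ordered pair of adjacent amino acids."""
    counts = Counter()
    for pep in peptides:
        # every overlapping two-residue window, e.g. "FMRF" -> FM, MR, RF
        counts.update(pep[k:k + 2] for k in range(len(pep) - 1))
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


print(pair_frequencies(["FMRF", "FLRF"]))
```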
Batts, William N.; LaPatra, Scott E.; Katona, Ryan; Leis, Eric; Fei Fan Ng, Terry; Bruieuc, Marine S.O.; Breyta, Rachel; Purcell, Maureen; Waltzek, Thomas B.; Delwart, Eric; Winton, James
2017-01-01
A novel virus, rainbow trout orthomyxovirus (RbtOV), was isolated in 1997 and again in 2000 from commercially-reared rainbow trout (Oncorhynchus mykiss) in Idaho, USA. The virus grew optimally in the CHSE-214 cell line at 15°C producing a diffuse cytopathic effect; however, juvenile rainbow trout exposed to cell culture-grown virus showed no mortality or gross pathology. Electron microscopy of preparations from infected cell cultures revealed the presence of typical orthomyxovirus particles. The complete genome of RbtOV comprises eight linear segments of single-stranded, negative-sense RNA having highly conserved 5′ and 3′-terminal nucleotide sequences. Another virus isolated in 2014 from steelhead trout (also O. mykiss) in Wisconsin, USA, and designated SttOV was found to have eight genome segments with high amino acid sequence identities (89–99%) to the corresponding genes of RbtOV, suggesting these new viruses are isolates of the same virus species and may be more widespread than currently realized. The new isolates had the same genome segment order and the closest pairwise amino acid sequence identities of 16–42% with Infectious salmon anemia virus (ISAV), the type species and currently only member of the genus Isavirus in the family Orthomyxoviridae. However, pairwise comparisons of the predicted amino acid sequences of the 10 RbtOV and SttOV proteins with orthologs from representatives of the established orthomyxoviral genera and a phylogenetic analysis using the PB1 protein showed that while RbtOV and SttOV clustered most closely with ISAV, they diverged sufficiently to merit consideration as representatives of a novel genus. A set of PCR primers was designed using conserved regions of the PB1 gene to produce amplicons that may be sequenced for identification of similar fish orthomyxoviruses in the future.
Integrative Approaches to Enhance Understanding of Plant Metabolic Pathway Structure and Regulation
Tohge, Takayuki; Scossa, Federico; Fernie, Alisdair R.
2015-01-01
Huge insight into molecular mechanisms and biological network coordination has been achieved following the application of various profiling technologies. Nevertheless, our knowledge of how the different molecular entities of the cell interact with one another suggests that integrating data from different techniques could drive a more comprehensive understanding of the data each technique produces. Here, we provide an overview of how such data integration is being used to aid the understanding of metabolic pathway structure and regulation. We choose to focus on the pairwise integration of large-scale metabolite data with transcriptomic, proteomic, whole-genome sequence, growth- and yield-associated phenotype, and archival functional genomic data sets. In doing so, we attempt to provide an update on approaches that integrate data obtained at different levels to reach a better understanding of either single gene function or metabolic pathway structure and regulation within the context of a broader biological process. PMID:26371234
Tsuchiaka, Shinobu; Rahpaya, Sayed Samim; Otomaru, Konosuke; Aoki, Hiroshi; Kishimoto, Mai; Naoi, Yuki; Omatsu, Tsutomu; Sano, Kaori; Okazaki-Terashima, Sachiko; Katayama, Yukie; Oba, Mami; Nagai, Makoto; Mizutani, Tetsuya
2017-01-17
Bovine enterovirus (BEV) belongs to the species Enterovirus E or F, genus Enterovirus, family Picornaviridae. Although numerous studies have identified BEVs in the feces of cattle with diarrhea, the pathogenicity of BEVs remains unclear. Previously, we reported the detection of a novel kobu-like virus in calf feces by metagenomic analysis. In the present study, we identified a novel BEV in diarrheal feces collected for that survey. Complete genome sequences were determined by deep sequencing of the fecal samples. Secondary RNA structure analysis of the 5' untranslated region (UTR), phylogenetic tree construction and pairwise identity analysis were conducted. The complete genome sequences of the BEV were genetically distant from those of other EVs, and the VP1 coding region contained novel and unique amino acid sequences. We named this strain BEV AN12/Bos taurus/JPN/2014 (referred to as BEV-AN12). According to genome analysis, the genome of this virus is 7414 nucleotides long excluding the poly(A) tail and consists of a 5'UTR, an open reading frame encoding a single polyprotein, and a 3'UTR. Secondary RNA structure analysis showed that, similarly to other BEVs, BEV-AN12 had an additional cloverleaf structure and a small stem-loop structure in the 5'UTR. In pairwise identity analysis, BEV-AN12 showed high amino acid (aa) identities to Enterovirus F in the polyprotein, P2 and P3 regions (aa identity ≥82.4%). Therefore, BEV-AN12 is closely related to Enterovirus F. However, aa sequences in the capsid protein regions, particularly the VP1 encoding region, showed markedly low aa identity to other viruses in the genus Enterovirus (VP1 aa identity ≤58.6%). In addition, BEV-AN12 branched separately from Enterovirus E and F in phylogenetic trees based on the aa sequences of P1 and VP1, although it clustered with Enterovirus F in trees based on sequences in the P2 and P3 genome regions. 
We identified a novel BEV possessing highly divergent aa sequences in the VP1 coding region in Japan. Based on the species definition, we proposed naming this strain "Enterovirus K", a novel species within the genus Enterovirus. Further genomic studies are needed to understand the pathogenicity of BEVs.
Introducing difference recurrence relations for faster semi-global alignment of long sequences.
Suzuki, Hajime; Kasahara, Masahiro
2018-02-19
The read length of single-molecule DNA sequencers is reaching 1 Mb. Popular alignment software tools widely used for analyzing such long reads often take advantage of single-instruction multiple-data (SIMD) operations to accelerate calculation of dynamic programming (DP) matrices in the Smith-Waterman-Gotoh (SWG) algorithm with a fixed alignment start position at the origin. Nonetheless, 16-bit or 32-bit integers are necessary for storing the values in a DP matrix when the sequences to be aligned are long; this hampers the use of the full SIMD width of modern processors. We propose a faster semi-global alignment algorithm, "difference recurrence relations," that runs 2.1 times faster than the state-of-the-art algorithm. Instead of calculating and storing all the values in a DP matrix directly, our algorithm computes and stores mainly the differences between the values of adjacent cells in the matrix. Although the SWG algorithm and our algorithm output exactly the same result, our algorithm mainly involves 8-bit integer operations, enabling us to exploit the full width of SIMD operations (e.g., 32 lanes) on modern processors. We also developed a library, libgaba, so that developers can easily integrate our algorithm into alignment programs. Our novel algorithm and optimized library implementation will help accelerate nucleotide long-read analysis algorithms that use pairwise alignment stages. The library is implemented in the C programming language and available at https://github.com/ocxtal/libgaba .
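The core idea of storing differences instead of cell scores can be illustrated with a minimal Python sketch. It assumes a linear gap score and global (end-to-end) alignment, which is simpler than the affine-gap, semi-global, SIMD setting that libgaba targets; `full_dp` and `diff_dp` are hypothetical names. `diff_dp` keeps only the horizontal and vertical differences between adjacent cells and reports the smallest and largest difference seen, so they can be checked against a length-independent range.

```python
from typing import Tuple

def full_dp(x: str, y: str, match: int = 2, mismatch: int = -3, gap: int = -4) -> int:
    """Reference global alignment score H(m, n) with a linear gap score."""
    m, n = len(x), len(y)
    prev = [j * gap for j in range(n + 1)]          # row 0: H(0, j) = j * gap
    for i in range(1, m + 1):
        cur = [i * gap] + [0] * n                   # column 0: H(i, 0) = i * gap
        for j in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[n]

def diff_dp(x: str, y: str, match: int = 2, mismatch: int = -3, gap: int = -4) -> Tuple[int, int, int]:
    """Same score computed only from differences between adjacent cells.

    dh[j] holds H(i, j) - H(i, j-1) for the row being processed;
    dv holds H(i, j) - H(i-1, j) for the current column.
    Returns (score, smallest difference, largest difference).
    """
    m, n = len(x), len(y)
    dh = [gap] * (n + 1)                            # row 0: every horizontal step costs one gap
    lo = hi = gap
    for i in range(1, m + 1):
        dv = gap                                    # column 0: H(i, 0) - H(i-1, 0) = gap
        for j in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            # z = H(i, j) - H(i-1, j-1), expressed purely through differences
            z = max(s, dh[j] + gap, dv + gap)
            new_dv = z - dh[j]                      # H(i, j) - H(i-1, j)
            dh[j] = z - dv                          # H(i, j) - H(i, j-1)
            dv = new_dv
            lo, hi = min(lo, dv, dh[j]), max(hi, dv, dh[j])
    # Recover the absolute score by summing the differences along the last row.
    return m * gap + sum(dh[1:]), lo, hi
```

With these parameters every difference stays within [gap, match - gap] = [-4, 6] regardless of sequence length, which is why narrow integer lanes can suffice; the absolute score is reconstructed only at the end.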
Lee, Justin S.; Bevins, Sarah N.; Serieys, Laurel E.K.; Vickers, Winston; Logan, Ken A.; Aldredge, Mat; Boydston, Erin E.; Lyren, Lisa M.; McBride, Roy; Roelke-Parker, Melody; Pecon-Slattery, Jill; Troyer, Jennifer L.; Riley, Seth P.; Boyce, Walter M.; Crooks, Kevin R.; VandeWoude, Sue
2014-01-01
Mountain lions (Puma concolor) throughout North and South America are infected with puma lentivirus clade B (PLVB). A second, highly divergent lentiviral clade, PLVA, infects mountain lions in southern California and Florida. Bobcats (Lynx rufus) in these two geographic regions are also infected with PLVA, and to date, this is the only strain of lentivirus identified in bobcats. We sequenced full-length PLV genomes in order to characterize the molecular evolution of PLV in bobcats and mountain lions. Low sequence homology (88% average pairwise identity) and frequent recombination (1 recombination breakpoint per 3 isolates analyzed) were observed in both clades. Viral proteins have markedly different patterns of evolution; sequence homology and negative selection were highest in Gag and Pol and lowest in Vif and Env. A total of 1.7% of sites across the PLV genome evolve under positive selection, indicating that host-imposed selection pressure is an important force shaping PLV evolution. PLVA strains are highly spatially structured, reflecting the population dynamics of their primary host, the bobcat. In contrast, the phylogeography of PLVB reflects the highly mobile mountain lion, with diverse PLVB isolates cocirculating in some areas and genetically related viruses being present in populations separated by thousands of kilometers. We conclude that PLVA and PLVB are two different viral species with distinct feline hosts and evolutionary histories.
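The "average pairwise identity" statistic above can be sketched as the mean, over all sequence pairs, of the fraction of identical positions in an alignment. This assumes the genomes are already aligned to equal length and that columns where either sequence has a gap ('-') are excluded; the study's exact gap treatment is not stated, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence

def pairwise_identity(a: str, b: str, gap: str = "-") -> float:
    """Fraction of identical residues over alignment columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)

def average_pairwise_identity(aligned: Sequence[str]) -> float:
    """Mean identity over all unordered pairs (requires at least two sequences)."""
    return mean(pairwise_identity(a, b) for a, b in combinations(aligned, 2))
```

Excluding gapped columns is one common convention; counting them as mismatches would give lower identities for the same alignment.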
Darré, Leonardo; Machado, Matías Rodrigo; Brandner, Astrid Febe; González, Humberto Carlos; Ferreira, Sebastián; Pantano, Sergio
2015-02-10
Modeling of macromolecular structures and interactions represents an important challenge for computational biology, involving different time and length scales. However, this task can be facilitated through the use of coarse-grained (CG) models, which reduce the number of degrees of freedom and allow efficient exploration of complex conformational spaces. This article presents a new CG protein model named SIRAH, developed to work with explicit solvent and to capture sequence, temperature, and ionic strength effects in a topologically unbiased manner. SIRAH is implemented in GROMACS, and interactions are calculated using a standard pairwise Hamiltonian for classical molecular dynamics simulations. We present a set of simulations that test the capability of SIRAH to produce qualitatively correct solvation of different amino acids, hydrophilic/hydrophobic interactions, and long-range electrostatic recognition leading to spontaneous association of unstructured peptides and stable structures of single polypeptides and protein-protein complexes.
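A pairwise Hamiltonian means the non-bonded energy is a sum of terms over particle pairs. The naive sketch below sums Lennard-Jones and Coulomb terms in GROMACS-style units (nm, kJ/mol, elementary charges). The Lorentz-Berthelot combination rules, the absence of cutoffs and periodic boundaries, and all parameter values are assumptions for illustration, not SIRAH's actual force field; the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Tuple

# 1/(4*pi*eps0) in kJ mol^-1 nm e^-2 (approximate value used in GROMACS units)
COULOMB = 138.935458

# A bead: ((x, y, z) in nm, charge in e, sigma in nm, epsilon in kJ/mol)
Bead = Tuple[Tuple[float, float, float], float, float, float]

def pair_energy(b1: Bead, b2: Bead, eps_r: float = 1.0) -> float:
    """Lennard-Jones + Coulomb energy of one bead pair (no cutoff, no periodic boundaries)."""
    (p1, q1, s1, e1), (p2, q2, s2, e2) = b1, b2
    r = math.dist(p1, p2)
    sigma = 0.5 * (s1 + s2)          # Lorentz combination rule (assumed)
    eps = math.sqrt(e1 * e2)         # Berthelot combination rule (assumed)
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * eps * (sr6 * sr6 - sr6)
    coulomb = COULOMB * q1 * q2 / (eps_r * r)
    return lennard_jones + coulomb

def total_energy(beads: List[Bead], eps_r: float = 1.0) -> float:
    """Pairwise-additive energy: the sum of pair terms over all unordered pairs."""
    return sum(pair_energy(a, b, eps_r) for a, b in combinations(beads, 2))
```

The all-pairs loop is O(N²); production MD engines use neighbour lists and cutoffs, but the energy is still a sum of pair terms.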
Yeast species diversity in apple juice for cider production evidenced by culture-based method.
Lorenzini, Marilinda; Simonato, Barbara; Zapparoli, Giacomo
2018-05-07
Identification of yeasts isolated from apple juices of two cider houses (one located in a plain area and one in an alpine area) was carried out by a culture-based method. Wallerstein Laboratory Nutrient Agar was used as the medium for isolation and preliminary yeast identification. A total of 20 yeast species belonging to ten different genera were identified using both the BLAST algorithm for pairwise sequence comparison and phylogenetic approaches. A wide variety of non-Saccharomyces species was found. Interestingly, Candida railenensis, Candida cylindracea, Hanseniaspora meyeri, Hanseniaspora pseudoguilliermondii, and Metschnikowia sinensis were recovered for the first time in the yeast community of an apple environment. Phylogenetic analysis resolved Metschnikowia and Moesziomyces isolates better than comparative analysis using the GenBank or YeastIP gene databases. This study provides important data on the yeast microbiota of apple juice and reveals differences between two geographical cider production areas in terms of species composition.
Dynamics of pairwise motions in the Cosmic Web
NASA Astrophysics Data System (ADS)
Hellwing, Wojciech A.
2016-10-01
We present an analysis of dark matter (DM) pairwise velocity statistics in different Cosmic Web environments. We use the DM velocity and density fields from the Millennium 2 simulation together with the NEXUS+ algorithm to segment the simulation volume into voxels, each uniquely identified as one of four possible environments: nodes, filaments, walls or cosmic voids. We show that the PDFs of the mean infall velocities v_12, as well as their spatial dependence, together with the perpendicular and parallel velocity dispersions, bear a significant signal of the large-scale structure environment in which DM particle pairs are embedded. The pairwise flows are notably colder and have a smaller mean magnitude in walls and voids than in the much denser environments of filaments and nodes. We discuss our results, which are consistent with simple theoretical predictions for pairwise motions induced by the gravitational instability mechanism. Our results indicate that Cosmic Web elements are coherent dynamical entities rather than just temporary geometrical associations. In addition, it should be possible to test various Cosmic Web finding algorithms observationally by segmenting available peculiar velocity data and studying the resulting pairwise velocity statistics.
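The mean pairwise velocity v_12 discussed above is commonly defined as the average, over pairs at separation r, of the relative velocity projected onto the unit vector joining the pair, so negative values indicate infall. The brute-force sketch below bins pairs by separation and returns v_12 and the dispersion of the parallel component. It assumes peculiar velocities, ignores periodic boundaries, omits the perpendicular dispersion, and the function name is hypothetical.

```python
import math
from bisect import bisect_right
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float, float]

def pairwise_velocity_stats(
    pos: Sequence[Vec], vel: Sequence[Vec], edges: Sequence[float]
) -> Dict[int, Tuple[int, float, float]]:
    """Per separation bin: (number of pairs, v_12, parallel velocity dispersion).

    Bin k covers edges[k] <= r < edges[k+1]; bins without pairs are omitted.
    """
    acc: Dict[int, List[float]] = {}
    for i, j in combinations(range(len(pos)), 2):
        sep = [b - a for a, b in zip(pos[i], pos[j])]
        r = math.hypot(*sep)
        k = bisect_right(edges, r) - 1
        if r == 0.0 or not 0 <= k < len(edges) - 1:
            continue
        # Component of (v_j - v_i) along the unit vector pointing from i to j.
        u = sum((vb - va) * d / r for va, vb, d in zip(vel[i], vel[j], sep))
        n, s1, s2 = acc.get(k, [0, 0.0, 0.0])
        acc[k] = [n + 1, s1 + u, s2 + u * u]
    result = {}
    for k, (n, s1, s2) in acc.items():
        mean = s1 / n
        result[k] = (int(n), mean, math.sqrt(max(s2 / n - mean * mean, 0.0)))
    return result
```

For example, two particles one unit apart moving towards each other at unit relative speed fall in the first bin with v_12 = -1.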
Chen, Li; Reeve, James; Zhang, Lujun; Huang, Shengbing; Wang, Xuefeng; Chen, Jun
2018-01-01
Normalization, which accounts for variable library sizes, is the first critical step in microbiome sequencing data analysis. Current RNA-Seq based normalization methods that have been adapted for microbiome data fail to consider the unique characteristics of microbiome data, which contain a vast number of zeros due to the physical absence or under-sampling of the microbes. Normalization methods that specifically address zero-inflation remain largely undeveloped. Here we propose the geometric mean of pairwise ratios, a simple but effective normalization method for zero-inflated sequencing data such as microbiome data. Simulation studies and analyses of real datasets demonstrate that the proposed method is more robust than competing methods, leading to more powerful detection of differentially abundant taxa and higher reproducibility of the relative abundances of taxa.
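The abstract names the method without spelling out its steps. As we understand the published definition, each sample's size factor is the geometric mean, over all other samples, of the median count ratio computed only on taxa observed in both samples, which is how the zeros are sidestepped. A minimal sketch under that assumption; `gmpr_size_factors` is a hypothetical name, and returning NaN for a sample that shares no taxa with any other sample is our own choice.

```python
from math import exp, log
from statistics import median
from typing import List, Sequence

def gmpr_size_factors(counts: Sequence[Sequence[float]]) -> List[float]:
    """Size factor per sample; counts[i][k] is the count of taxon k in sample i."""
    factors = []
    for i, sample in enumerate(counts):
        log_medians = []
        for j, other in enumerate(counts):
            if j == i:
                continue
            # Ratios only over taxa observed in both samples, so zeros never enter.
            ratios = [a / b for a, b in zip(sample, other) if a > 0 and b > 0]
            if ratios:  # pairs sharing no taxa are skipped (our assumption)
                log_medians.append(log(median(ratios)))
        # Geometric mean of the pairwise median ratios.
        factors.append(exp(sum(log_medians) / len(log_medians)) if log_medians else float("nan"))
    return factors
```

Normalized counts would then be `counts[i][k] / factors[i]`.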
Molecular epidemiology of Plum pox virus in Japan.
Maejima, Kensaku; Himeno, Misako; Komatsu, Ken; Takinami, Yusuke; Hashimoto, Masayoshi; Takahashi, Shuichiro; Yamaji, Yasuyuki; Oshima, Kenro; Namba, Shigetou
2011-05-01
For a molecular epidemiological study based on complete genome sequences, 37 Plum pox virus (PPV) isolates were collected from the Kanto region in Japan. Pairwise analyses revealed that all 37 Japanese isolates belong to the PPV-D strain, with low genetic diversity (less than 0.8%). In phylogenetic analysis of the PPV-D strain based on complete nucleotide sequences, the relationships of the PPV-D strain were reconstructed with high resolution: at the global level, the American, Canadian, and Japanese isolates formed their own distinct monophyletic clusters, suggesting that the routes of viral entry into these countries were independent; at the local level, the actual transmission histories of PPV were precisely reconstructed with high bootstrap support. This is the first description of the molecular epidemiology of PPV based on complete genome sequences.
Statistical method to compare massive parallel sequencing pipelines.
Elsensohn, M H; Leblay, N; Dimassi, S; Campan-Fournier, A; Labalme, A; Roucher-Boulez, F; Sanlaville, D; Lesca, G; Bardel, C; Roy, P
2017-03-01
Today, sequencing is frequently carried out by Massive Parallel Sequencing (MPS), which drastically cuts sequencing time and expense. Nevertheless, Sanger sequencing remains the main validation method to confirm the presence of variants. The analysis of MPS data involves the development of several bioinformatic tools, academic or commercial. We present here a statistical method to compare MPS pipelines and test it in a comparison between an academic (BWA-GATK) and a commercial pipeline (TMAP-NextGENe®), with and without reference to a gold standard (here, Sanger sequencing), on a panel of 41 genes in 43 epileptic patients. This method uses the number of variants to fit log-linear models for pairwise agreements between pipelines. To assess the heterogeneity of the margins and the odds ratios of agreement, four log-linear models were used: a full model, a homogeneous-margin model, a model with a single odds ratio for all patients, and a model with a single intercept. Then a log-linear mixed model was fitted considering the biological variability as a random effect. Among the 390,339 base pairs sequenced, TMAP-NextGENe® and BWA-GATK found, on average, 2253.49 and 1857.14 variants (single nucleotide variants and indels), respectively. Against the gold standard, the pipelines had similar sensitivities (63.47% vs. 63.42%) and close but significantly different specificities (99.57% vs. 99.65%; p < 0.001). Similar results were obtained when only single nucleotide variants were considered (99.98% specificity and 76.81% sensitivity for both pipelines). The method thus allows pipeline comparison and selection. It is generalizable to all types of MPS data and all pipelines.
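For two pipelines, variant calls can be cross-tabulated into a 2×2 table (called by both, only by the first, only by the second, by neither). In a saturated log-linear model of that table with reference (dummy) coding, the interaction parameter equals the log odds ratio of agreement; against a gold standard, the same counting gives sensitivity and specificity. This is a simplified sketch, not the authors' multi-model or mixed-model fit: variants are reduced to hashable keys (for example positions), and the 0.5 correction for zero cells and the function names are our assumptions.

```python
from math import log
from typing import Hashable, Set, Tuple

def agreement_table(calls_a: Set[Hashable], calls_b: Set[Hashable], n_sites: int) -> Tuple[int, int, int, int]:
    """(both, only A, only B, neither) over n_sites candidate positions."""
    both = len(calls_a & calls_b)
    only_a = len(calls_a - calls_b)
    only_b = len(calls_b - calls_a)
    return both, only_a, only_b, n_sites - both - only_a - only_b

def log_odds_ratio_of_agreement(both: float, only_a: float, only_b: float, neither: float,
                                correction: float = 0.5) -> float:
    """log(both * neither / (only_a * only_b)); adds `correction` to every cell if any cell is zero."""
    cells = (both, only_a, only_b, neither)
    if 0 in cells:
        both, only_a, only_b, neither = (c + correction for c in cells)
    return log(both * neither / (only_a * only_b))

def sensitivity_specificity(calls: Set[Hashable], truth: Set[Hashable], n_sites: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = len(calls & truth)
    fp = len(calls - truth)
    fn = len(truth - calls)
    tn = n_sites - tp - fp - fn
    return tp / (tp + fn), tn / (tn + fp)
```

Because true negatives dominate when hundreds of thousands of positions are sequenced, specificities cluster near 100%, which is consistent with the small specificity differences reported above.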
Failla, A J; Vasquez, A A; Hudson, P; Fujimoto, M; Ram, J L
2016-02-01
Establishing reliable methods for the identification of benthic chironomid communities is important due to their significant contribution to biomass, ecology and the aquatic food web. Immature larval specimens are more difficult to identify to species level by traditional morphological methods than their fully developed adult counterparts, and few keys are available to identify the larval species. In order to develop molecular criteria to identify species of chironomid larvae, larval and adult chironomids from Western Lake Erie were subjected to both molecular and morphological taxonomic analysis. Mitochondrial cytochrome c oxidase I (COI) barcode sequences of 33 adults that were identified to species level by morphological methods were grouped with COI sequences of 189 larvae in a neighbor-joining taxon-ID tree. Most of these larvae could be identified only to genus level by morphological taxonomy (only 22 of the 189 sequenced larvae could be identified to species level). The taxon-ID tree of larval sequences had 45 operational taxonomic units (OTUs, defined as clusters with >97% identity or individual sequences differing from nearest neighbors by >3%; supported by analysis of all larval pairwise differences), of which seven could be identified to species or 'species group' level by larval morphology. Reference sequences from the GenBank and BOLD databases assigned six larval OTUs with presumptive species level identifications and confirmed one previously assigned species level identification. Sequences from morphologically identified adults in the present study grouped with and further classified the identity of 13 larval OTUs. The use of morphological identification and subsequent DNA barcoding of adult chironomids proved to be beneficial in revealing possible species level identifications of larval specimens. 
Sequence data from this study also contribute to currently inadequate public databases relevant to the Great Lakes region, while the neighbor-joining analysis reported here describes the application and confirmation of a useful tool that can accelerate identification and bioassessment of chironomid communities.
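The OTU definition quoted above (clusters with >97% identity, singletons differing from their nearest neighbours by >3%) can be approximated as single-linkage clustering at a 3% distance cut-off. A minimal sketch, assuming equal-length aligned COI barcodes and uncorrected p-distance; the study itself grouped sequences on a neighbor-joining taxon-ID tree, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List

def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def single_linkage_otus(seqs: Dict[str, str], max_dist: float = 0.03) -> List[List[str]]:
    """Group sequence IDs: any two closer than max_dist end up in the same OTU."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) < max_dist:   # i.e. identity > 97%
            parent[find(a)] = find(b)
    groups: Dict[str, List[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())
```

Single linkage can chain sequences that differ from each other by more than 3% through intermediates, which is one reason tree-based inspection remains useful.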
Failla, Andrew Joseph; Vasquez, Adrian Amelio; Hudson, Patrick L.; Fujimoto, Masanori; Ram, Jeffrey L.
2016-01-01
Establishing reliable methods for the identification of benthic chironomid communities is important due to their significant contribution to biomass, ecology and the aquatic food web. Immature larval specimens are more difficult to identify to species level by traditional morphological methods than their fully developed adult counterparts, and few keys are available to identify the larval species. In order to develop molecular criteria to identify species of chironomid larvae, larval and adult chironomids from Western Lake Erie were subjected to both molecular and morphological taxonomic analysis. Mitochondrial cytochrome c oxidase I (COI) barcode sequences of 33 adults that were identified to species level by morphological methods were grouped with COI sequences of 189 larvae in a neighbor-joining taxon-ID tree. Most of these larvae could be identified only to genus level by morphological taxonomy (only 22 of the 189 sequenced larvae could be identified to species level). The taxon-ID tree of larval sequences had 45 operational taxonomic units (OTUs, defined as clusters with >97% identity or individual sequences differing from nearest neighbors by >3%; supported by analysis of all larval pairwise differences), of which seven could be identified to species or ‘species group’ level by larval morphology. Reference sequences from the GenBank and BOLD databases assigned six larval OTUs with presumptive species level identifications and confirmed one previously assigned species level identification. Sequences from morphologically identified adults in the present study grouped with and further classified the identity of 13 larval OTUs. The use of morphological identification and subsequent DNA barcoding of adult chironomids proved to be beneficial in revealing possible species level identifications of larval specimens. 
Sequence data from this study also contribute to currently inadequate public databases relevant to the Great Lakes region, while the neighbor-joining analysis reported here describes the application and confirmation of a useful tool that can accelerate identification and bioassessment of chironomid communities.
The Use of Weighted Graphs for Large-Scale Genome Analysis
Zhou, Fang; Toivonen, Hannu; King, Ross D.
2014-01-01
There is an acute need for better tools to extract knowledge from the growing flood of sequence data. For example, thousands of complete genomes have been sequenced, and their metabolic networks inferred. Such data should enable a better understanding of evolution. However, most existing network analysis methods are based on pairwise comparisons, and these do not scale to thousands of genomes. Here we propose the use of weighted graphs as a data structure to enable large-scale phylogenetic analysis of networks. We have developed three types of weighted graph for enzymes: taxonomic (these summarize phylogenetic importance), isoenzymatic (these summarize enzymatic variety/redundancy), and sequence-similarity (these summarize sequence conservation); and we applied these types of weighted graph to survey prokaryotic metabolism. To demonstrate the utility of this approach we have compared and contrasted the large-scale evolution of metabolism in Archaea and Eubacteria. Our results provide evidence for limits to the contingency of evolution. PMID:24619061
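One way to picture a taxonomic weighted graph is to merge every genome's enzyme network into a single graph and let each link's weight record the fraction of genomes that contain it, so thousands of genomes are summarized in one structure instead of being compared pair by pair. This weighting is an illustrative assumption, not the authors' exact definition; the function name and the EC numbers in the usage note are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Tuple

Edge = FrozenSet[str]

def taxonomic_weighted_graph(networks: Dict[str, Iterable[Tuple[str, str]]]) -> Dict[Edge, float]:
    """Merge per-genome enzyme networks; weight = fraction of genomes containing each link."""
    counts: Counter = Counter()
    for links in networks.values():
        # frozenset makes links undirected; the set counts each link once per genome
        counts.update({frozenset(link) for link in links})
    n_genomes = len(networks)
    return {edge: c / n_genomes for edge, c in counts.items()}
```

For instance, a link between hypothetical enzymes "1.1.1.1" and "2.7.1.1" present in two of three genomes gets weight 2/3. The graph grows with the number of distinct links, not with the number of genome pairs.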
The Intestinal Microbiota of Tadpoles Differs from Those of Syntopic Aquatic Invertebrates.
Lyra, Mariana L; Bletz, Molly C; Haddad, Célio F B; Vences, Miguel
2017-11-20
Bacterial communities associated with eukaryotes play important roles in the physiology, development, and health of their hosts. Here, we examine the intestinal microbiota in tadpoles and aquatic invertebrates (insects and gastropods) to better understand the degree of specialization in the tadpole microbiotas. Samples were collected at the same time in one pond, and the V4 region of the bacterial 16S rRNA gene was sequenced with Illumina amplicon sequencing. We found that bacterial richness and diversity were highest in the two snail individuals studied, intermediate in tadpoles, and lowest in the four groups of aquatic insects. All groups had substantial numbers of exclusive bacterial operational taxonomic units (OTUs) in their guts, but also shared a high proportion of OTUs, probably corresponding to transient environmental bacteria. Significant differences were found in all pairwise comparisons of tadpoles and snails with the major groups of insects, but not among insect groups or between snails and tadpoles. The similarity between tadpoles and snails may be related to their similar feeding mode, as both scratch biofilms and algae from surfaces; however, this requires confirmation due to low sample sizes. Overall, the gut microbiota differences found among syntopic aquatic animals are likely shaped by both food preferences and host identity.
Interspecific analysis of covariance structure in the masticatory apparatus of galagos.
Vinyard, Christopher J
2007-01-01
The primate masticatory apparatus (MA) is a functionally integrated set of features, each of which performs important functions in biting, ingestive, and chewing behaviors. A comparison of morphological covariance structure among species for these MA features will help us to further understand the evolutionary history of this region. In this exploratory analysis, the covariance structure of the MA is compared across seven galago species to investigate 1) whether there are differences in covariance structure in this region, and 2) if so, how this covariation has changed with respect to size, MA form, diet, and/or phylogeny. Ten measurements of the MA functionally related to bite force production and load resistance were obtained from 218 adults of seven galago species. Correlation matrices were generated for these 10 dimensions and compared among species via matrix correlations and Mantel tests. Subsequently, pairwise covariance disparity in the MA was estimated as a measure of difference in covariance structure between species. Covariance disparity estimates were correlated with pairwise distances related to differences in body size, MA size and shape, genetic distance (based on cytochrome-b sequences) and percentage of dietary foods to determine whether one or more of these factors is linked to differences in covariance structure. Galagos differ in MA covariance structure. Body size appears to be a major factor correlated with differences in covariance structure among galagos. The largest galago species, Otolemur crassicaudatus, exhibits large differences in body mass and covariance structure relative to other galagos, and thus plays a primary role in creating this association. MA size and shape do not correlate with covariance structure when body mass is held constant. Diet also shows no association. 
Genetic distance is significantly negatively correlated with covariance disparity when body mass is held constant, but this correlation appears to be a function of the small body size and large genetic distance for Galagoides demidoff. These exploratory results indicate that changing body size may have been a key factor in the evolution of the galago MA.
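The Mantel test used above to compare matrices can be sketched as follows: correlate the upper-triangle entries of two symmetric matrices, then build a null distribution by jointly permuting the rows and columns of one matrix. This is a minimal one-tailed sketch in plain Python; the function names, the 999 permutations and the seed are arbitrary choices, not the study's settings.

```python
import random
from math import sqrt
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]

def _pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def _upper(m: Matrix, order: Sequence[int]) -> List[float]:
    """Upper-triangle entries of m after reordering its rows and columns by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def mantel(a: Matrix, b: Matrix, permutations: int = 999, seed: int = 0) -> Tuple[float, float]:
    """One-tailed Mantel test: (observed r, permutation p-value for r >= observed)."""
    idx = list(range(len(a)))
    x = _upper(a, idx)
    r_obs = _pearson(x, _upper(b, idx))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = idx[:]
        rng.shuffle(perm)               # rows and columns of b are permuted together
        if _pearson(x, _upper(b, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

Permuting whole rows and columns, rather than individual entries, preserves the dependence among entries that share a species, which is why the Mantel test is used instead of an ordinary correlation test.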
Deiana, Antonio; Giansanti, Andrea
2010-04-21
Natively unfolded proteins lack a well-defined three-dimensional structure but have important biological functions, suggesting a re-assignment of the structure-function paradigm. Assessing whether a given protein is natively unfolded requires laborious experimental investigation, so reliable sequence-only methods for predicting whether a sequence corresponds to a folded or an unfolded protein are of interest in both fundamental and applied studies. Many proteins have amino acid compositions compatible with both the folded and the unfolded state, and belong to a twilight zone between order and disorder. This makes a dichotomous classification of protein sequences into folded and natively unfolded ones difficult. In this work we propose an operational method to identify proteins belonging to the twilight zone by combining well-performing single predictors of folding into a consensus score. In this methodological paper, the following dichotomous folding indexes are considered: hydrophobicity-charge, mean packing, mean pairwise energy, Poodle-W and a new global index, called here gVSL2, based on the local disorder predictor VSL2. The performance of these indexes is evaluated on different datasets, in particular on a new dataset composed of 2369 folded and 81 natively unfolded proteins. Poodle-W, gVSL2 and mean pairwise energy show good performance and stability in all the datasets considered and are combined into a strictly unanimous combination score, SSU, that leaves proteins unclassified when the consensus of all combined indexes is not reached. The unclassified proteins: i) belong to an overlap region in the vector space of amino acid compositions occupied by both folded and unfolded proteins; ii) are composed of approximately the same number of order-promoting and disorder-promoting amino acids; iii) have a mean flexibility intermediate between that of folded and that of unfolded proteins. Our results show that proteins unclassified by SSU belong to a twilight zone. 
Proteins left unclassified by the consensus score SSU have physical properties intermediate between those of folded and those of natively unfolded proteins, and their structural properties and evolutionary history are worth investigating.
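The strictly unanimous rule behind SSU is simple to state in code: a protein receives a label only when every combined predictor agrees, and is otherwise left unclassified. In this sketch the per-predictor thresholds and score directions are hypothetical parameters, since the abstract does not give them; only the combination rule follows the text, and the names are illustrative.

```python
from enum import Enum
from typing import Sequence

class Call(Enum):
    FOLDED = "folded"
    UNFOLDED = "unfolded"
    UNCLASSIFIED = "unclassified"

def call_from_score(score: float, threshold: float, higher_means_unfolded: bool = True) -> Call:
    """Dichotomous call from one predictor's index (threshold and direction are predictor-specific)."""
    unfolded = score > threshold if higher_means_unfolded else score < threshold
    return Call.UNFOLDED if unfolded else Call.FOLDED

def strictly_unanimous(calls: Sequence[Call]) -> Call:
    """Return the shared call if every predictor agrees; otherwise UNCLASSIFIED."""
    if not calls:
        raise ValueError("at least one predictor call is required")
    first = calls[0]
    return first if all(c is first for c in calls) else Call.UNCLASSIFIED
```

With three predictors, a protein called folded by two and unfolded by one is returned as `Call.UNCLASSIFIED`, which is exactly the twilight-zone population the abstract characterizes.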
2010-01-01
Background Natively unfolded proteins lack a well-defined three-dimensional structure but have important biological functions, suggesting a re-assignment of the structure-function paradigm. Assessing whether a given protein is natively unfolded requires laborious experimental investigation, so reliable sequence-only methods for predicting whether a sequence corresponds to a folded or an unfolded protein are of interest in both fundamental and applied studies. Many proteins have amino acid compositions compatible with both the folded and the unfolded state, and belong to a twilight zone between order and disorder. This makes a dichotomous classification of protein sequences into folded and natively unfolded ones difficult. In this work we propose an operational method to identify proteins belonging to the twilight zone by combining well-performing single predictors of folding into a consensus score. Results In this methodological paper, the following dichotomous folding indexes are considered: hydrophobicity-charge, mean packing, mean pairwise energy, Poodle-W and a new global index, called here gVSL2, based on the local disorder predictor VSL2. The performance of these indexes is evaluated on different datasets, in particular on a new dataset composed of 2369 folded and 81 natively unfolded proteins. Poodle-W, gVSL2 and mean pairwise energy show good performance and stability in all the datasets considered and are combined into a strictly unanimous combination score, SSU, that leaves proteins unclassified when the consensus of all combined indexes is not reached. The unclassified proteins: i) belong to an overlap region in the vector space of amino acid compositions occupied by both folded and unfolded proteins; ii) are composed of approximately the same number of order-promoting and disorder-promoting amino acids; iii) have a mean flexibility intermediate between that of folded and that of unfolded proteins. 
Conclusions Our results show that proteins unclassified by SSU belong to a twilight zone. Proteins left unclassified by the consensus score SSU have physical properties intermediate between those of folded and those of natively unfolded proteins, and their structural properties and evolutionary history are worth investigating. PMID:20409339
Pairwise Sequence Alignment Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jeff Daily, PNNL
2015-05-20
Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific computing. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, many others remain difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features: reference implementations of all known vectorized sequence alignment approaches; implementations of Smith-Waterman (SW), semi-global (SG), and Needleman-Wunsch (NW) sequence alignment algorithms; implementations across all modern CPU instruction sets, including AVX2 and KNC; and language interfaces for C/C++ and Python.
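The scan-based idea can be illustrated in scalar Python. Each row is computed in two passes: one with only diagonal and vertical moves (no dependency along the row, so every cell could be computed in parallel), then a prefix-maximum scan that resolves horizontal gaps. This sketch assumes a linear gap score and is not parasail's code, which is vectorized C and supports affine gaps; the function names are hypothetical, and a plain cell-by-cell Smith-Waterman is included for comparison.

```python
from itertools import accumulate

def sw_reference(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Plain cell-by-cell Smith-Waterman score with a linear gap score."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

def sw_scan(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Same score, but each row is computed in two data-parallel passes."""
    n = len(b)
    prev = [0] * (n + 1)
    best = 0
    for i in range(1, len(a) + 1):
        # Pass 1: zero floor, diagonal and vertical moves only; no dependency along the row.
        tmp = [0] + [
            max(0, prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch), prev[j] + gap)
            for j in range(1, n + 1)
        ]
        # Pass 2: horizontal gaps. H[j] = max over k <= j of tmp[k] + (j - k) * gap,
        # i.e. a running maximum of tmp[k] - k * gap, shifted back by j * gap.
        running = accumulate((tmp[k] - k * gap for k in range(n + 1)), max)
        cur = [m + j * gap for j, m in enumerate(running)]
        best = max(best, max(cur))
        prev = cur
    return best
```

A prefix maximum can be computed in logarithmic depth across SIMD lanes, so this formulation's cost depends less on the data than the striped approach's correction loop does.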
Mallon, Dermot H; Bradley, J Andrew; Winn, Peter J; Taylor, Craig J; Kosmoliaptsis, Vasilis
2015-02-01
We have previously shown that qualitative assessment of surface electrostatic potential of HLA class I molecules helps explain serological patterns of alloantibody binding. We have now used a novel computational approach to quantitate differences in surface electrostatic potential of HLA B-cell epitopes and applied this to explain HLA Bw4 and Bw6 antigenicity. Protein structure models of HLA class I alleles expressing either the Bw4 or Bw6 epitope (defined by sequence motifs at positions 77 to 83) were generated using comparative structure prediction. The electrostatic potential in 3-dimensional space encompassing the Bw4/Bw6 epitope was computed by solving the Poisson-Boltzmann equation and quantitatively compared in a pairwise, all-versus-all fashion to produce distance matrices that cluster epitopes with similar electrostatic properties. Quantitative comparison of surface electrostatic potential at the carboxyl terminal of the α1-helix of HLA class I alleles, corresponding to amino acid sequence motif 77 to 83, produced clustering of HLA molecules in 3 principal groups according to Bw4 or Bw6 epitope expression. Remarkably, quantitative differences in electrostatic potential reflected known patterns of serological reactivity better than Bw4/Bw6 amino acid sequence motifs. Quantitative assessment of epitope electrostatic potential allowed the impact of known amino acid substitutions (HLA-B*07:02 R79G, R82L, G83R) that are critical for antibody binding to be predicted. We describe a novel approach for quantitating differences in HLA B-cell epitope electrostatic potential. Proof of principle is provided that this approach enables better assessment of HLA epitope antigenicity than amino acid sequence data alone, and it may allow prediction of HLA immunogenicity.
Non-pairwise additivity of the leading-order dispersion energy
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hollett, Joshua W., E-mail: j.hollett@uwinnipeg.ca
2015-02-28
The leading-order (i.e., dipole-dipole) dispersion energy is calculated for one-dimensional (1D) and two-dimensional (2D) infinite lattices, and an infinite 1D array of infinitely long lines, of doubly occupied locally harmonic wells. The dispersion energy is decomposed into pairwise and non-pairwise additive components. By varying the force constant and separation of the wells, the non-pairwise additive contribution to the dispersion energy is shown to depend on the overlap of density between neighboring wells. As well separation is increased, the non-pairwise additivity of the dispersion energy decays. The different rates of decay for 1D and 2D lattices of wells are explained in terms of a Jacobian effect that influences the number of nearest neighbors. For an array of infinitely long lines of wells spaced 5 bohrs apart, and an inter-well spacing of 3 bohrs within a line, the non-pairwise additive component of the leading-order dispersion energy is −0.11 kJ mol⁻¹ well⁻¹, which is 7% of the total. The polarizability of the wells and the density overlap between them are small in comparison to those of the atomic densities that arise from the molecular density partitioning used in post-density-functional theory (DFT) damped dispersion corrections, or DFT-D methods. Therefore, the nonadditivity of the leading-order dispersion observed here is a conservative estimate of that in molecular clusters.
Connecting Palau's marine protected areas: a population genetic approach to conservation
NASA Astrophysics Data System (ADS)
Cros, Annick; Toonen, Robert J.; Donahue, Megan J.; Karl, Stephen A.
2017-09-01
Bleaching events are becoming more frequent and are projected to become annual in Micronesia by 2040. To prepare for this threat, the Government of Palau is reviewing its marine protected area network to increase the resilience of the reefs by integrating connectivity into the network design. To support their effort, we used high-throughput sequencing of microsatellites to genotype colonies of the coral Acropora hyacinthus and to characterize the population genetic structure and dispersal patterns that led to the recovery of Palau's reefs from a 1998 bleaching event. We found no evidence of a founder effect or refugium where colonies may have survived to recolonize the reef. Instead, we found significant pairwise F′ST values, indicating population structure and low connectivity among most of the 25 sites around Palau. We used kinship to measure genetic differences at the individual level among sites and found that differences were best explained by the degree of exposure to the ocean [F(1,20) = 3.015, Pr(>F) = 0.01], but with little of the total variation explained. A permutation test of the pairwise kinship coefficients revealed that there was self-seeding within sites. Overall, the data point to the population of A. hyacinthus in Palau recovering from a handful of surviving colonies, with population growth primarily from self-seeding and little exchange among sites. This finding has significant implications for the management strategies for the reefs of Palau, and we recommend increasing the number and distribution of management areas around Palau to capture the genetic architecture and increase the chances of protecting potential refuges in the future.
Mhc class II B gene evolution in East African cichlid fishes.
Figueroa, F; Mayer, W E; Sültmann, H; O'hUigin, C; Tichy, H; Satta, Y; Takezaki, N; Takahata, N; Klein, J
2000-06-01
A distinctive feature of essential major histocompatibility complex (Mhc) loci is their polymorphism, characterized by large genetic distances between alleles and long persistence times of allelic lineages. Since the lineages often span several successive speciations, we investigated the behavior of the Mhc alleles during or close to the speciation phase. We sequenced exon 2 of the class II B locus 4 from 232 East African cichlid fishes representing 32 related species. The divergence times of the (sub)species ranged from 6,000 years to 8.4 million years. Two types of evolutionary analysis were used to elucidate the pattern of exon 2 sequence divergence. First, phylogenetic methods were applied to reconstruct the most likely evolutionary pathways leading from the last common ancestor of the set to the extant sequences, and to assess the probable mechanisms involved in allelic diversification. Second, pairwise comparisons of sequences were carried out to detect differences seemingly incompatible with origin by nonparallel point mutations. The analysis revealed point mutations to be the most important mechanism behind allelic divergences, with recombination playing only an auxiliary part. Comparison of sequences from related species revealed evidence of random allelic (lineage) losses apparently associated with speciation. Sharing of identical alleles could be demonstrated between species that diverged 2 million years ago. The phylogeny of the exon was incongruent with that of the flanking introns, indicating either a high degree of convergent evolution at the peptide-binding region-encoding sites, or intron homogenization.
Lin, C S; Sun, Y L; Liu, C Y; Yang, P C; Chang, L C; Cheng, I C; Mao, S J; Huang, M C
1999-08-05
The complete nucleotide sequence of the pig (Sus scrofa) mitochondrial genome, 16,613 bp long, is presented in this report. The genome length is not fixed because of variable numbers of a tandem repeat, 5'-CGTGCGTACA, in the displacement loop (D-loop). Genes for the 12S and 16S rRNAs, 22 tRNAs, and 13 protein-coding regions were found. Apart from the D-loop region, the genome carries very few intergenic nucleotides, with several instances of overlap between protein-coding or tRNA genes. To evaluate possible evolutionary relationships between Artiodactyla and Cetacea, the nucleotide and amino acid sequences of the 13 protein-coding genes of pig, cow, and fin whale were aligned and compared pairwise. These comparisons suggest a closer relationship between pig and cow than between either of these species and fin whale. In addition, the accumulation of transversions and gaps in pig 12S and 16S rRNA genes was compared with that in other eutherian species, including cow, fin whale, human, horse, and harbor seal. The results also reveal a close phylogenetic relationship between pig and cow, compared with fin whale and the others. Thus, according to the sequence differences of mitochondrial rRNA genes in eutherian species, the evolutionary separation of pig and cow occurred about 53-60 million years ago.
Alva, Vikram; Remmert, Michael; Biegert, Andreas; Lupas, Andrei N; Söding, Johannes
2010-01-01
Many protein classification systems capture homologous relationships by grouping domains into families and superfamilies on the basis of sequence similarity. Superfamilies with similar 3D structures are further grouped into folds. In the absence of discernable sequence similarity, these structural similarities were long thought to have originated independently, by convergent evolution. However, the growth of databases and advances in sequence comparison methods have led to the discovery of many distant evolutionary relationships that transcend the boundaries of superfamilies and folds. To investigate the contributions of convergent versus divergent evolution in the origin of protein folds, we clustered representative domains of known structure by their sequence similarity, treating them as point masses in a virtual 2D space which attract or repel each other depending on their pairwise sequence similarities. As expected, families in the same superfamily form tight clusters. But often, superfamilies of the same fold are linked with each other, suggesting that the entire fold evolved from an ancient prototype. Strikingly, some links connect superfamilies with different folds. They arise from modular peptide fragments of between 20 and 40 residues that co-occur in the connected folds in disparate structural contexts. These may be descendants of an ancestral pool of peptide modules that evolved as cofactors in the RNA world and from which the first folded proteins arose by amplification and recombination. Our galaxy of folds summarizes, in a single image, most known and many yet undescribed homologous relationships between protein superfamilies, providing new insights into the evolution of protein domains.
Taylor, Angela J; Lappi, Victoria; Wolfgang, William J; Lapierre, Pascal; Palumbo, Michael J; Medus, Carlota; Boxrud, David
2015-10-01
Salmonella enterica serovar Enteritidis is a significant cause of gastrointestinal illness in the United States; however, current molecular subtyping methods lack resolution for this highly clonal serovar. Advances in next-generation sequencing technologies have made it possible to examine whole-genome sequencing (WGS) as a potential molecular subtyping tool for outbreak detection and source trace back. Here, we conducted a retrospective analysis of S. Enteritidis isolates from seven epidemiologically confirmed foodborne outbreaks and sporadic isolates (not epidemiologically linked) to determine the utility of WGS to identify outbreaks. A collection of 55 epidemiologically characterized clinical and environmental S. Enteritidis isolates was sequenced. Single nucleotide polymorphism (SNP)-based cluster analysis of the S. Enteritidis genomes revealed well-supported clades, with pairwise diversity of fewer than four SNPs, that were concordant with epidemiologically defined outbreaks. Sporadic isolates were an average of 42.5 SNPs distant from the outbreak clusters. Isolates collected from the same patient over several weeks differed by only two SNPs. Our findings show that WGS provided greater resolution between outbreak, sporadic, and suspect isolates than the current gold standard subtyping method, pulsed-field gel electrophoresis (PFGE). Furthermore, results could be obtained in a time frame suitable for surveillance activities, supporting the use of WGS as an outbreak detection and characterization method for S. Enteritidis. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
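A pairwise SNP distance and a distance threshold are the core quantities behind statements such as "fewer than four SNPs". The sketch below is an assumption-laden illustration, not the study's pipeline: it takes pre-aligned, equal-length sequences (toy strings, with `N` treated as missing data) and groups isolates by single-linkage clustering at a hypothetical cutoff of at most 3 SNPs, whereas the study itself reports phylogenetic clades.

```python
from itertools import combinations


def snp_distance(x: str, y: str) -> int:
    """Count differing positions between two aligned sequences, ignoring 'N' calls."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for p, q in zip(x, y) if p != q and p != "N" and q != "N")


def snp_clusters(seqs: dict[str, str], max_snps: int = 3) -> list[set[str]]:
    """Single-linkage clusters: two isolates are linked if within max_snps SNPs."""
    parent = {name: name for name in seqs}  # union-find forest

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= max_snps:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


toy = {"o1": "AAAAAAAAAA", "o2": "AAAAAAAAAT", "o3": "AAAAAAAATT", "s1": "TTTTTAAAAA"}
print(sorted(map(sorted, snp_clusters(toy))))  # [['o1', 'o2', 'o3'], ['s1']]
```

Single linkage is used here only because it is the simplest threshold rule; it can chain distant isolates together through intermediates, which is one reason real outbreak analyses usually inspect a phylogeny as well.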
Dunfield, Kari E; King, Gary M
2004-07-01
Genomic DNA extracts from four sites at Kilauea Volcano were used as templates for PCR amplification of the large subunit (coxL) of aerobic carbon monoxide dehydrogenase. The sites included a 42-year-old tephra deposit, a 108-year-old lava flow, a 212-year-old partially vegetated ash-and-tephra deposit, and an approximately 300-year-old forest. PCR primers amplified coxL sequences from the OMP clade of CO oxidizers, which includes isolates such as Oligotropha carboxidovorans, Mycobacterium tuberculosis, and Pseudomonas thermocarboxydovorans. PCR products were used to create clone libraries that provide the first insights into the diversity and phylogenetic affiliations of CO oxidizers in situ. On the basis of phylogenetic and statistical analyses, clone libraries for each site were distinct. Although some clone sequences were similar to coxL sequences from known organisms, many sequences appeared to represent phylogenetic lineages not previously known to harbor CO oxidizers. On the basis of average nucleotide diversity and average pairwise difference, a forested site supported the most diverse CO-oxidizing populations, while an 1894 lava flow supported the least diverse populations. Neither parameter correlated with previous estimates of atmospheric CO uptake rates, but both parameters correlated positively with estimates of microbial biomass and respiration. Collectively, the results indicate that the CO oxidizer functional group associated with recent volcanic deposits of the remote Hawaiian Islands contains substantial and previously unsuspected diversity. PMID:15240307
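Average pairwise difference and nucleotide diversity, the two statistics used above to rank the sites, can be sketched as follows. This is a minimal illustration that assumes pre-aligned, equal-length sequences with no gaps or ambiguity codes; the function names are hypothetical, and any gap handling or sample-size correction the authors applied is not reproduced here.

```python
from itertools import combinations


def average_pairwise_difference(seqs: list[str]) -> float:
    """Mean number of differing sites over all unordered pairs of sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]
    return sum(diffs) / len(diffs)


def nucleotide_diversity(seqs: list[str]) -> float:
    """Average pairwise difference per site (often written as pi)."""
    return average_pairwise_difference(seqs) / len(seqs[0])


clones = ["AAAA", "AAAT", "AATT"]  # toy aligned clone sequences
print(average_pairwise_difference(clones))  # pairs differ by 1, 2, 1 -> 4/3
print(nucleotide_diversity(clones))         # (4/3) / 4 sites -> 1/3
```

The two numbers carry the same information scaled differently: the first is a count per pair of clones, the second normalizes it by sequence length so that libraries of different amplicon lengths can be compared.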
Kang, Hae Ji; Bennett, Shannon N.; Dizney, Laurie; Sumibcay, Laarni; Arai, Satoru; Ruedas, Luis A.; Song, Jin-Won; Yanagihara, Richard
2009-01-01
A genetically distinct hantavirus, designated Oxbow virus (OXBV), was detected in tissues of an American shrew mole (Neurotrichus gibbsii), captured in Gresham, Oregon, in September 2003. Pairwise analysis of full-length S- and M- and partial L-segment nucleotide and amino acid sequences of OXBV indicated low sequence similarity with rodent-borne hantaviruses. Phylogenetic analyses using maximum-likelihood and Bayesian methods, and host-parasite evolutionary comparisons, showed that OXBV and Asama virus, a hantavirus recently identified from the Japanese shrew mole (Urotrichus talpoides), were related to soricine shrew-borne hantaviruses from North America and Eurasia, respectively, suggesting parallel evolution associated with cross-species transmission. PMID:19394994
Horizontal transfers of Mariner transposons between mammals and insects.
Oliveira, Sarah G; Bao, Weidong; Martins, Cesar; Jurka, Jerzy
2012-09-26
Active transposable elements (TEs) can be passed between genomes of different species by horizontal transfer (HT). This may help them avoid vertical extinction due to elimination by natural selection or silencing. HT is relatively frequent within eukaryotic taxa, but rare between distant species. Closely related Mariner-type DNA transposon families, collectively named Mariner-1_Tbel families, are present in the genomes of two ants and two mammals. Consensus sequences of the four families show pairwise identities greater than 95%. In addition, the mammalian Mariner1_BT family shows a close evolutionary relationship with some insect Mariner families. Mammalian Mariner1_BT-type sequences are present only in species from three groups: ruminants, toothed whales (Odontoceti), and New World leaf-nosed bats (Phyllostomidae). Horizontal transfer accounts for the presence of the Mariner_Tbel and Mariner1_BT families in mammals. The Mariner_Tbel family was introduced into hedgehog and tree shrew genomes approximately 100 to 69 million years ago (MYA). Most likely, these TE families were transferred from insects to mammals, but the details of the transfer remain unknown.
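Pairwise identity figures such as "greater than 95%" depend on how gapped columns are counted, and conventions differ between tools. The sketch below assumes one common convention, which may not be the one used for the Mariner consensus sequences: columns where both sequences have a gap are dropped, and a column counts as identical only when both residues are present and equal. The function name is hypothetical.

```python
def percent_identity(x: str, y: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences (gap-gap columns ignored)."""
    if len(x) != len(y):
        raise ValueError("sequences must come from the same alignment")
    columns = [(p, q) for p, q in zip(x, y) if not (p == gap and q == gap)]
    if not columns:
        raise ValueError("no aligned columns")
    identical = sum(1 for p, q in columns if p == q)  # gap-gap already excluded
    return 100.0 * identical / len(columns)


print(percent_identity("ACGT-A", "ACGTTA"))  # 5 identical of 6 columns -> 83.33...
```

Dividing by the shorter ungapped length instead, or ignoring every column containing a gap, would give higher values for the same pair, so reported identities are only comparable when the convention is known.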
Expansion of inverted repeat does not decrease substitution rates in Pelargonium plastid genomes.
Weng, Mao-Lun; Ruhlman, Tracey A; Jansen, Robert K
2017-04-01
For species with minor inverted repeat (IR) boundary changes in the plastid genome (plastome), nucleotide substitution rates were previously shown to be lower in the IR than the single copy regions (SC). However, the impact of large-scale IR expansion/contraction on plastid nucleotide substitution rates among closely related species remains unclear. We included plastomes from 22 Pelargonium species, including eight newly sequenced genomes, and used both pairwise and model-based comparisons to investigate the impact of the IR on sequence evolution in plastids. Ten types of plastome organization with different inversions or IR boundary changes were identified in Pelargonium. Inclusion in the IR was not sufficient to explain the variation of nucleotide substitution rates. Instead, the rate heterogeneity in Pelargonium plastomes was a mixture of locus-specific, lineage-specific and IR-dependent effects. Our study of Pelargonium plastomes that vary in IR length and gene content demonstrates that the evolutionary consequences of retaining these repeats are more complicated than previously suggested. © 2016 The Authors. New Phytologist © 2016 New Phytologist Trust.
Reduced mtDNA Diversity in the Ngobe Amerinds of Panama
Kolman, C. J.; Bermingham, E.; Cooke, R.; Ward, R. H.; Arias, T. D.; Guionneau-Sinclair, F.
1995-01-01
Mitochondrial DNA (mtDNA) haplotype diversity was determined for 46 Ngobe Amerinds sampled widely across their geographic range in western Panama. The Ngobe data were compared with mtDNA control region I sequences from two additional Amerind groups located at the northern and southern extremes of the Amerind distribution, the Nuu-Chah-Nulth of the Pacific Northwest and the Chilean Mapuche, and from one Na-Dene group, the Haida of the Pacific Northwest. The Ngobe exhibit the lowest mtDNA control region sequence diversity yet reported for an Amerind group. Moreover, they carry only two of the four Amerind founding lineages first described by Wallace and coworkers. We posit that the Ngobe passed through a population bottleneck caused by ethnogenesis from a small founding population and/or European conquest and colonization. Dating of the Ngobe population expansion using the approach of Harpending et al. to the analysis of pairwise genetic differences indicates an expansion roughly 6800 years before present (range: 1850-14,000 years before present), a date more consistent with a bottleneck at Chibcha ethnogenesis than with a conquest-based event. PMID:7635293
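Analyses of pairwise genetic differences in this tradition start from the mismatch distribution, the frequency of each number of pairwise differences among sampled sequences. The sketch below computes that distribution for toy aligned sequences and converts a fitted expansion parameter to time using the sudden-expansion relation τ = 2ut, where u is the mutation rate per sequence per generation and t is in generations. Fitting τ to the observed distribution is not shown, and the τ and u values in the usage line are hypothetical, not the Ngobe estimates.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs: list[str]) -> list[float]:
    """Fraction of sequence pairs differing at k sites, for k = 0..max."""
    counts = Counter(sum(a != b for a, b in zip(x, y))
                     for x, y in combinations(seqs, 2))
    total = sum(counts.values())
    return [counts[k] / total for k in range(max(counts) + 1)]


def expansion_time(tau: float, u: float) -> float:
    """Generations since expansion from tau = 2 * u * t (u per sequence per generation)."""
    return tau / (2.0 * u)


print(mismatch_distribution(["AAAA", "AAAT", "AATT"]))  # [0.0, 0.666..., 0.333...]
print(expansion_time(tau=3.0, u=1e-4))                   # about 15000 generations
```

Converting generations to years additionally requires an assumed generation length, which is one source of the wide date range quoted above.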
A comparison of different functions for predicted protein model quality assessment.
Li, Juan; Fang, Huisheng
2016-07-01
In protein structure prediction, a considerable number of models are usually produced by either the Template-Based Method (TBM) or ab initio prediction. The purpose of this study is to find the critical parameter in assessing the quality of the predicted models. A non-redundant template library was developed and 138 target sequences were modeled. The target sequences were all distant from the proteins in the template library and were aligned with template library proteins on the basis of the transformation matrix. The quality of each model was first assessed with QMEAN and its six parameters: Cβ interaction energy (C_beta), all-atom pairwise energy (PE), solvation energy (SE), torsion angle energy (TAE), secondary structure agreement (SSA), and solvent accessibility agreement (SAE). In addition, the alignment score (score) was used to assess the quality of the model. Hence, a total of eight parameters (i.e., QMEAN, C_beta, PE, SE, TAE, SSA, SAE, score) were independently used to assess the quality of each model. The results indicate that SSA is the best parameter for estimating the quality of the model.
Trellis Coding of Non-coherent Multiple Symbol Full Response M-ary CPFSK with Modulation Index 1/M
NASA Technical Reports Server (NTRS)
Lee, H.; Divsalar, D.; Weber, C.
1994-01-01
This paper introduces a trellis coded modulation (TCM) scheme for non-coherent, multiple-symbol, full-response M-ary CPFSK with modulation index 1/M. A proper branch metric for the trellis decoder is obtained by employing a simple approximation of the modified Bessel function for large signal-to-noise ratio (SNR). The pairwise error probability of coded sequences is evaluated by applying a linear approximation to the Rician random variable.
Shi, Hang; Tan, Ceheng; Zhang, Weibin; Zhang, Zichun; Long, Rong; Luo, Tuoping; Yang, Zhen
2015-05-15
A highly enantio- and diastereoselective synthesis of the left-wing fragment of 11-epi-azadirachtin I, characterized by the pairwise use of palladium- and gold-catalyzed cascade reactions, is presented. Through a sequence of stereocontrolled transformations, our 21-step route establishes the stereocenters of the left-wing fragment from a single chiral starting material, (-)-carvone, which would significantly facilitate synthetic studies of the azadirachtin-type limonoids.
NASA Astrophysics Data System (ADS)
Donoghue, C.; Rao, A.; Bull, A. M. J.; Rueckert, D.
2011-03-01
Osteoarthritis (OA) is a degenerative, debilitating disease with a large socio-economic impact. This study turns to manifold learning as an automatic approach to harnessing the wealth of data provided by the Osteoarthritis Initiative (OAI). We construct several Laplacian Eigenmap embeddings of articular cartilage appearance from MR images of the knee using multiple MR sequences. A region of interest (ROI), defined as the weight-bearing medial femur, is automatically located in all images through non-rigid registration. A pairwise intensity-based similarity measure is computed between all images, resulting in a fully connected graph in which each vertex represents an image and each edge weight is the similarity measure. Spectral analysis is then applied to these pairwise similarities, which reduces the dimensionality non-linearly and embeds the images in a manifold representation. In the manifold space, images that are close to each other are considered more "similar" than those far away. In the experiment presented here, we use manifold learning to automatically predict morphological changes in the articular cartilage, using the coordinates of the images in the manifold as independent variables for multiple linear regression. Five manifolds are generated from five MR sequences of 390 distinct knees. We find statistically significant correlations (up to R² = 0.75) between our predictors and the results presented in the literature.
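The spectral step described above (a fully connected similarity graph, then a low-dimensional embedding) can be sketched with NumPy. This is a generic Laplacian Eigenmap under stated assumptions, not the authors' code: `W` is taken to be a symmetric, non-negative similarity matrix with positive row sums, and the embedding solves the generalized problem L y = λ D y (L = D − W) through the equivalent symmetric normalized Laplacian, discarding the trivial first eigenvector.

```python
import numpy as np


def laplacian_eigenmap(W: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Embed graph vertices using eigenvectors of L y = lambda D y, with L = D - W."""
    W = np.asarray(W, dtype=float)
    if not np.allclose(W, W.T):
        raise ValueError("similarity matrix must be symmetric")
    d = W.sum(axis=1)                       # vertex degrees (must be > 0)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^-1/2 W D^-1/2
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)      # eigenvalues in ascending order
    Y = d_inv_sqrt[:, None] * eigvecs       # map back: y = D^-1/2 u
    return Y[:, 1:n_components + 1]         # skip the trivial constant eigenvector


# Two tight pairs of "images" weakly linked to each other (toy similarities).
W = np.array([[0, 1, .01, .01], [1, 0, .01, .01], [.01, .01, 0, 1], [.01, .01, 1, 0]])
print(np.sign(laplacian_eigenmap(W, 1)[:, 0]))  # first coordinate splits {0,1} from {2,3}
```

Each row of the returned array is one image's coordinate vector; in a regression setup like the one described, those coordinates would serve as the independent variables.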
Zhu, H.; Braun, W.
1999-01-01
A statistical analysis of a representative data set of 169 known protein structures was used to analyze the specificity of residue interactions between spatially neighboring strands in beta-sheets. Pairwise potentials were derived by comparing the frequency of residue pairs in nearest, second-nearest, and third-nearest contacts across neighboring beta-strands with the expected frequency of residue pairs in a random model. A pseudo-energy function based on these statistical pairwise potentials recognized native beta-sheets among possible alternative pairings. The native pairing was found within the three lowest energies in 73% of the cases in the training data set and in 63% of beta-sheets in a test data set of 67 proteins that were not part of the training set. The energy function was also used to detect tripeptides, which occur frequently in beta-sheets of native proteins. The majority of native partners of tripeptides were distributed in a low-energy range. Self-correcting distance geometry (SECODG) calculations using distance constraint sets derived from possible low-energy pairings of beta-strands uniquely identified the native pairing of the beta-sheet in bovine pancreatic trypsin inhibitor (BPTI). These results will be useful for predicting the structure of proteins from their amino acid sequence as well as for the design of proteins containing beta-sheets. PMID:10048326
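Knowledge-based pair potentials of this kind are commonly written as E(a, b) = −ln(f_obs(a, b) / f_exp(a, b)), so pairs seen more often than chance get negative (favorable) energies. The sketch below assumes the random model is independent pairing of residues drawn with their observed marginal frequencies and adds a pseudocount of 1; both choices are assumptions, and the paper's exact random model and contact definitions (nearest, second, third) are not reproduced.

```python
import math
from collections import Counter


def pair_potentials(pairs: list[tuple[str, str]], pseudocount: float = 1.0) -> dict:
    """Symmetric pseudo-energies -ln((obs + c) / (exp + c)) for residue pairs."""
    pair_counts: Counter = Counter()
    single_counts: Counter = Counter()
    for a, b in pairs:
        pair_counts[tuple(sorted((a, b)))] += 1  # (a, b) and (b, a) are the same contact
        single_counts[a] += 1
        single_counts[b] += 1
    n_pairs = len(pairs)
    n_res = 2 * n_pairs
    residues = sorted(single_counts)
    energies = {}
    for i, a in enumerate(residues):
        for b in residues[i:]:
            p_a, p_b = single_counts[a] / n_res, single_counts[b] / n_res
            # Expected count under random pairing; unlike pairs can occur in two orders.
            expected = p_a * p_b * (1 if a == b else 2) * n_pairs
            observed = pair_counts[(a, b)]
            energies[(a, b)] = -math.log((observed + pseudocount) / (expected + pseudocount))
    return energies


toy = [("V", "I")] * 5 + [("K", "E")] * 5  # hypothetical cross-strand contacts
e = pair_potentials(toy)
print(e[("I", "V")] < 0, e[("E", "V")] > 0)  # True True
```

A pseudo-energy for a whole candidate strand pairing would then be the sum of these terms over its cross-strand contacts, which is the kind of score used above to rank alternative pairings.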
Web-Beagle: a web server for the alignment of RNA secondary structures.
Mattei, Eugenio; Pietrosanto, Marco; Ferrè, Fabrizio; Helmer-Citterich, Manuela
2015-07-01
Web-Beagle (http://beagle.bio.uniroma2.it) is a web server for the pairwise global or local alignment of RNA secondary structures. The server exploits a new encoding for RNA secondary structure and a substitution matrix of RNA structural elements to perform RNA structural alignments. The web server allows the user to compute up to 10 000 alignments in a single run, taking as input sets of RNA sequences and structures or primary sequences alone. In the latter case, the server computes the secondary structure prediction for the RNAs on-the-fly using RNAfold (free energy minimization). The user can also compare a set of input RNAs to one of five pre-compiled RNA datasets including lncRNAs and 3' UTRs. All types of comparison produce as output the pairwise alignments, along with structural similarity and statistical significance measures for each resulting alignment. A graphical color-coded representation of the alignments allows the user to easily identify structural similarities between RNAs. Web-Beagle can be used for finding structurally related regions in two or more RNAs, for the identification of homologous regions or for functional annotation. Benchmark tests show that Web-Beagle has lower computational complexity and running time, and better performance, than other available methods. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
The Cervical Microbiome over 7 Years and a Comparison of Methodologies for Its Characterization
Smith, Benjamin C.; McAndrew, Thomas; Chen, Zigui; Harari, Ariana; Barris, David M.; Viswanathan, Shankar; Rodriguez, Ana Cecilia; Castle, Phillip; Herrero, Rolando; Schiffman, Mark; Burk, Robert D.
2012-01-01
Background: The rapidly expanding field of microbiome studies offers investigators a large choice of methods for each step in the process of determining the microorganisms in a sample. The human cervicovaginal microbiome affects female reproductive health and the susceptibility to, and natural history of, many sexually transmitted infections, including human papillomavirus (HPV). At present, the long-term behavior of the cervical microbiome in early sexual life is poorly understood.
Methods: The V6 and V6–V9 regions of the 16S ribosomal RNA gene were amplified from DNA isolated from exfoliated cervical cells. Specimens from 10 women participating in the Natural History Study of HPV in Guanacaste, Costa Rica were sampled successively over a period of 5–7 years. We sequenced amplicons using 3 different platforms (Sanger, Roche 454, and Illumina HiSeq 2000) and analyzed sequences using pipelines based on 3 different classification algorithms (usearch, RDP Classifier, and pplacer).
Results: Usearch and pplacer provided consistent microbiome classifications for all sequencing methods, whereas RDP Classifier deviated significantly when characterizing Illumina reads. Comparisons across sequencing platforms showed that 7%–41% of the reads were reclassified, while comparisons across software pipelines reclassified up to 32% of the reads. Variability in classification was shown not to be due to a difference in read lengths. Six cervical microbiome community types were observed, characterized by a predominance of either G. vaginalis or Lactobacillus spp. Over the 5–7 year period, subjects fluctuated between community types. A PERMANOVA analysis on pairwise Kantorovich-Rubinstein distances between the microbiota of all samples yielded an F-test ratio of 2.86 (p<0.01), indicating a significant difference between within-subject and between-subject comparisons of microbiota.
Conclusions: Amplification and sequencing methods affected the characterization of the microbiome more than classification algorithms. Pplacer and usearch performed consistently with all sequencing methods. The analyses identified 6 community types consistent with those previously reported. The long-term behavior of the cervical microbiome indicated that fluctuations were subject dependent. PMID:22792313
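The PERMANOVA pseudo-F statistic mentioned above can be computed from any precomputed pairwise distance matrix. The following pure-Python sketch follows the usual sums-of-squares decomposition on squared distances with a label-permutation p-value; it assumes the Kantorovich-Rubinstein distances (or any other distances) have already been computed, and the function names and toy data are hypothetical rather than taken from the study.

```python
import random
from itertools import combinations


def pseudo_f(dist: list[list[float]], groups: list) -> float:
    """PERMANOVA pseudo-F from a symmetric distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in labels:
        members = [i for i in range(n) if groups[i] == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, groups, n_perm: int = 999, seed: int = 0) -> tuple[float, float]:
    """Return (pseudo-F, permutation p-value) by shuffling group labels."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if pseudo_f(dist, labels) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (n_perm + 1)  # count the observed labelling too


points = [0.0, 1.0, 10.0, 11.0]  # toy 1-D "samples" from two subjects
D = [[abs(p - q) for q in points] for p in points]
print(pseudo_f(D, ["s1", "s1", "s2", "s2"]))  # 200.0
```

With only four samples there are just three distinct two-by-two splits, so no permutation p-value here could fall below about 1/3; the toy call therefore illustrates the statistic rather than significance.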
NASA Astrophysics Data System (ADS)
Perrin, Douglas P.; Bueno, Alejandra; Rodriguez, Andrea; Marx, Gerald R.; del Nido, Pedro J.
2017-03-01
In this paper we describe a pilot study in which machine learning methods are used to differentiate between congenital heart diseases. Our approach was to apply convolutional neural networks (CNNs) to echocardiographic images from five different pediatric populations: normal, coarctation of the aorta (CoA), hypoplastic left heart syndrome (HLHS), transposition of the great arteries (TGA), and single ventricle (SV). We used a single network topology that was trained in a pairwise fashion in order to evaluate the potential to differentiate between patient populations. In total we used 59,151 echo frames drawn from 1,666 clinical sequences. Approximately 80% of the data was used for training, and the remainder for validation. Data was split at sequence boundaries to avoid having related images in the training and validation sets. While training was done with individual echo frames, evaluation was performed for both single-frame discrimination and sequence discrimination (by majority voting). In total 10 networks were generated and evaluated. Unlike other domains where this network topology has been used, in ultrasound there is low visual variation between classes. This work shows the potential for CNNs to be applied to this low-variation domain of medical imaging for disease discrimination.
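Sequence-level discrimination by majority voting, as described above, reduces per-frame predictions to one label per clinical sequence. A minimal sketch, assuming the frame predictions are already available as class labels; the tie-breaking rule (first label encountered wins) is an assumption, since the abstract does not state one.

```python
from collections import Counter


def sequence_label(frame_predictions: list[str]) -> str:
    """Majority vote over per-frame labels; ties go to the label seen first."""
    if not frame_predictions:
        raise ValueError("a sequence needs at least one frame prediction")
    # most_common keeps first-encountered order among equal counts
    return Counter(frame_predictions).most_common(1)[0][0]


print(sequence_label(["HLHS", "normal", "HLHS", "HLHS", "normal"]))  # HLHS
```

Voting lets a sequence be classified correctly even when a sizeable minority of its frames are misclassified, which is why sequence-level and frame-level accuracies are reported separately.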
Masters, N; Christie, M; Katouli, M; Stratton, H
2015-06-01
We investigated the usefulness of β-D-glucuronidase gene variance in Escherichia coli as a microbial source tracking tool, using a novel algorithm to compare sequences from a set of host-specific isolates prescreened with a high-resolution PhP typing method. A total of 65 common biochemical phenotypes belonging to 318 E. coli strains isolated from humans and domestic and wild animals were analysed for nucleotide variations at 10 loci along a 518 bp fragment of the 1812 bp β-D-glucuronidase gene. Neighbour-joining analysis of loci variations correctly identified 86 (76.8%) human isolates and 91.2% of animal isolates. Pairwise hierarchical clustering improved assignment: 92 (82.1%) human and 204 (99%) animal strains were assigned to their respective clusters. Our data show that initial typing of isolates and selection of common types from different hosts prior to analysis of the β-D-glucuronidase gene sequence improves source identification. We also concluded that numerical profiling of the nucleotide variations can be used as a valuable approach to differentiate human from animal E. coli. This study demonstrates the usefulness of the β-D-glucuronidase gene as a marker for differentiating human faecal pollution from animal sources.
Genetic characterization and phylogenetic analysis of Eimeria arloingi in Iranian native kids.
Khodakaram-Tafti, A; Hashemnia, M; Razavi, S M; Sharifiyazdi, H; Nazifi, S
2013-09-01
Among the 16 species of Eimeria from goats, Eimeria arloingi and Eimeria ninakohlyakimovae are regarded as the most pathogenic species worldwide and cause clinical caprine coccidiosis. E. arloingi is known to be an important cause of coccidiosis in Iranian kids. Molecular analyses of two portions of nuclear ribosomal DNA (internal transcribed spacer 1 (ITS1) and 18S rDNA) were used for the genetic characterization of E. arloingi. Comparison of the E. arloingi sequences obtained in the present study (ITS1: KC507793 and 18S rDNA: KC507792) with other Eimeria species in the GenBank database revealed a particularly close relationship between E. arloingi and Eimeria spp. from cattle and sheep. The phylogram based on the ITS1 sequences shows that E. arloingi, Eimeria bovis, and Eimeria zuernii formed a distinct group separate from the remaining Eimeria spp. of cattle and poultry. In pairwise alignment, the 18S rDNA sequence derived from E. arloingi showed 99% similarity to Eimeria ahsata, with differences at only three nucleotides. This study showed that the ITS1 and 18S rDNA regions are useful genetic markers for the specific identification and differentiation of Eimeria spp. in ruminants.
Li, Yu Long; Dong, Jing; Wang, Bin; Li, Yi Ping; Yu, Xu Guang; Fu, Jie; Wang, Wen Bo
2016-07-01
To characterize the genetic diversity and population genetic structure of Rhopilema esculentum, we sequenced the mtDNA COI gene (624 bp) in 56 individuals collected from Liaodong Bay and from Ganghwado Island in the estuarine waters of the Han River. In addition, homologous sequences from another 15 individuals sampled in the Bohai Sea, the Yellow Sea and the Sea of Japan were analyzed. A total of 28 polymorphic nucleotide sites were detected among the 71 individuals, defining 32 haplotypes. Haplotype diversity was high (0.91±0.06 to 0.94±0.01) in R. esculentum populations, whereas nucleotide diversity was moderate to low [(0.60±0.34)% to (0.68±0.40)%]. Compared with several other giant jellyfish species, the level of genetic variation in R. esculentum was high. Phylogeographic analysis of the COI region revealed two lineages. Pairwise FST comparisons and hierarchical analysis of molecular variance (AMOVA) showed significant population structure throughout the range of R. esculentum. The results suggest that life-cycle characteristics, together with possible anthropogenic introduction (such as stock enhancement) and the prevailing ocean currents in this region, are the main factors that determined the genetic patterns of R. esculentum.
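The two summary statistics reported above have standard definitions: Nei's unbiased haplotype diversity, h = n/(n-1) * (1 - sum of squared haplotype frequencies), and nucleotide diversity, the mean proportion of differing sites over all pairs of aligned sequences. The sketch below computes both for a made-up alignment of short fragments; it is an illustration of the formulas, not the authors' pipeline, and the abstract reports nucleotide diversity as a percentage.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Nei's unbiased haplotype diversity: h = n/(n-1) * (1 - sum(p_i^2))."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences (pi)."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


# Toy alignment of four hypothetical 8-bp fragments (not data from the study).
sample = ["ACGTACGT", "ACGTACGT", "ACGAACGT", "TCGAACGT"]
print(round(haplotype_diversity(sample), 3))   # 0.833
print(round(nucleotide_diversity(sample), 4))  # 0.1458
```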
Sequence-Selective Formation of Synthetic H-Bonded Duplexes
2017-01-01
Oligomers equipped with a sequence of phenol and pyridine N-oxide groups form duplexes via H-bonding interactions between these recognition units. Reductive amination chemistry was used to synthesize all possible 3-mer sequences: AAA, AAD, ADA, DAA, ADD, DAD, DDA, and DDD. Pairwise interactions between the oligomers were investigated using NMR titration and dilution experiments in toluene. The measured association constants vary by 3 orders of magnitude (10² to 10⁵ M⁻¹). Antiparallel sequence-complementary oligomers generally form more stable complexes than mismatched duplexes. Mismatched duplexes that have an excess of H-bond donors are stabilized by the interaction of two phenol donors with one pyridine N-oxide acceptor. Oligomers that have an H-bond donor and an acceptor on the ends of the chain can fold to form intramolecular H-bonds in the free state. This 1,3-folding equilibrium competes with duplex formation and lowers the stability of duplexes involving these sequences. As a result, some of the mismatched duplexes are more stable than some of the sequence-complementary duplexes. However, the most stable mismatched duplexes contain DDD and compete with the most stable sequence-complementary duplex, AAA·DDD, so in mixtures that contain all eight sequences, sequence-complementary duplexes dominate. Even higher-fidelity sequence selectivity can be achieved if alternating donor–acceptor sequences are avoided. PMID:28857551
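The sequence space in this study is small enough to enumerate directly. The Python sketch below assumes D denotes a phenol donor and A a pyridine N-oxide acceptor; it lists the eight 3-mers and derives each one's antiparallel sequence-complementary partner (reverse the chain, then swap donor and acceptor), which reproduces the AAA·DDD pair named above. The `ends_can_fold` helper is a hypothetical reading of the 1,3-folding condition for 3-mers (a donor at one end and an acceptor at the other) and is illustrative only.

```python
from itertools import product

PARTNER = {"A": "D", "D": "A"}  # acceptor (pyridine N-oxide) pairs with donor (phenol)


def all_sequences(length=3):
    """Every donor/acceptor sequence of the given length (8 for 3-mers)."""
    return ["".join(units) for units in product("AD", repeat=length)]


def antiparallel_complement(seq):
    """Reverse the chain, then swap each recognition unit for its H-bonding partner."""
    return "".join(PARTNER[unit] for unit in reversed(seq))


def ends_can_fold(seq):
    """Hypothetical 1,3-folding flag for 3-mers: different units at the two ends."""
    return seq[0] != seq[-1]


for s in all_sequences():
    print(s, antiparallel_complement(s), ends_can_fold(s))
```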
Weak Higher-Order Interactions in Macroscopic Functional Networks of the Resting Brain.
Huang, Xuhui; Xu, Kaibin; Chu, Congying; Jiang, Tianzi; Yu, Shan
2017-10-25
Interactions among different brain regions are usually examined through functional connectivity (FC) analysis, which is based exclusively on measuring pairwise correlations in activity. However, interactions beyond the pairwise level, that is, higher-order interactions (HOIs), are vital for understanding the behavior of many complex systems. So far, whether HOIs exist among brain regions and how they affect the brain's activity remain largely elusive. To address these issues, we analyzed blood oxygenation level-dependent (BOLD) signals recorded from six typical macroscopic functional networks of the brain in 100 human subjects (46 males and 54 females) during the resting state. By examining the binarized BOLD signals, we found that HOIs within and across individual networks were both very weak regardless of network size, topology, degree of spatial proximity, spatial scale, and whether the global signal was regressed. To investigate the potential mechanisms underlying the weak HOIs, we analyzed the dynamics of a network model and found that HOIs were generally weak within a wide range of key parameters, provided that the overall dynamic features of the model were similar to the empirical data and that the model operated close to a linear fluctuation regime. Our results suggest that weak HOIs may be a general property of the brain's macroscopic functional networks, which implies the dominance of pairwise interactions in shaping brain activity at such a scale and supports the validity of widely used pairwise-based FC approaches.
SIGNIFICANCE STATEMENT: Explaining how the activities of different brain areas are coordinated through interactions is essential to revealing the mechanisms underlying various brain functions. Traditionally, such interaction structure is studied using pairwise-based functional network analyses. It is unclear whether interactions beyond the pairwise level (higher-order interactions or HOIs) play any role in this process. Here, we show that HOIs are generally weak in macroscopic brain networks. We also suggest a possible dynamical mechanism that may underlie this phenomenon. These results provide a plausible explanation for the effectiveness of widely used pairwise-based approaches in analyzing brain networks. More importantly, they reveal a previously unknown, simple organization of the brain's macroscopic functional systems. Copyright © 2017 the authors.
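The abstract does not give the HOI estimator. One standard way to quantify a third-order interaction among three binarized signals is the log-linear (information-geometric) coefficient theta_123 = log[p111 p100 p010 p001 / (p110 p101 p011 p000)], which is zero when the joint distribution needs no term beyond first- and second-order ones. The Python sketch below estimates it from 0/1 time series with a pseudocount; the helper names, the threshold and the toy signals are assumptions, and this is not necessarily the authors' exact procedure.

```python
import math
from itertools import product


def binarize(signal, threshold):
    """Turn a continuous time series into 0/1 states (1 = above threshold)."""
    return [1 if value > threshold else 0 for value in signal]


def third_order_interaction(samples, pseudocount=0.5):
    """Log-linear triple-wise coefficient theta_123 for three binary signals.

    samples: iterable of (x1, x2, x3) tuples of 0/1 states, one per time point.
    """
    counts = {pattern: pseudocount for pattern in product((0, 1), repeat=3)}
    for state in samples:
        counts[tuple(state)] += 1
    odd = [(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 0, 1)]   # odd number of active units
    even = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (0, 0, 0)]  # even number of active units
    # The normalizing constant cancels in the ratio, so counts replace probabilities.
    return sum(math.log(counts[p]) for p in odd) - sum(math.log(counts[p]) for p in even)


# Hypothetical example: region c is the parity of regions a and b, a purely
# triple-wise dependence, so theta_123 comes out strongly negative.
region_a = binarize([0.2, 1.5, 0.1, 1.9] * 10, threshold=1.0)
region_b = binarize([0.3, 0.4, 1.2, 1.1] * 10, threshold=1.0)
region_c = [a ^ b for a, b in zip(region_a, region_b)]
print(third_order_interaction(zip(region_a, region_b, region_c)))
```

With empirical data, values near zero across many triplets would correspond to the "weak HOI" pattern described in the abstract.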
Friedberg, Devorah; Midkiff, Michael; Calvo, Joseph M.
2001-01-01
Lrp (leucine-responsive regulatory protein) plays a global regulatory role in Escherichia coli, affecting expression of dozens of operons. Numerous lrp-related genes have been identified in different bacteria and archaea, including asnC, an E. coli gene that was the first reported member of this family. Pairwise comparisons of the amino acid sequences of the corresponding proteins show an average sequence identity of only 29% for the vast majority of comparisons. By contrast, Lrp-related proteins from enteric bacteria show more than 97% amino acid identity. Is the global regulatory role associated with E. coli Lrp limited to enteric bacteria? To probe this question, we investigated LrfB, an Lrp-related protein from Haemophilus influenzae that shares 75% sequence identity with E. coli Lrp (the highest sequence identity among the 42 sequences compared). A strain of H. influenzae with an lrfB null allele grew at the wild-type rate but with a filamentous morphology. A comparison of two-dimensional (2D) electrophoretic patterns of proteins from parent and mutant strains showed only two differences (comparable studies with lrp+ and lrp− E. coli strains by others showed 20 differences). The abundance of LrfB in H. influenzae, estimated by Western blotting, was about 130 dimers per cell (compared with 3,000 dimers per E. coli cell). LrfB expressed in E. coli replaced Lrp as a repressor of the lrp gene but acted only to a limited extent as an activator of the ilvIH operon. Thus, although LrfB resembles Lrp sufficiently to perform some of its functions, its low abundance is consistent with a more local role in regulating only a few genes, a view supported by the results of the 2D electrophoretic analysis. We speculate that an Lrp with a global regulatory role evolved to help enteric bacteria adapt to their ecological niches and that Lrp-related proteins in other organisms are unlikely to have a broad regulatory function. PMID:11395465
[Analysis of variance of repeated data measured by water maze with SPSS].
Qiu, Hong; Jin, Guo-qin; Jin, Ru-feng; Zhao, Wei-kang
2007-01-01
To introduce a method for analyzing repeated-measures water maze data with SPSS 11.0 and to offer a reference statistical method to clinical and basic medical researchers who use repeated-measures designs. The repeated measures and multivariate analysis of variance (ANOVA) procedures of the general linear model in SPSS were used, with pairwise comparisons among groups and among measurement times. First, Mauchly's test of sphericity should be used to judge whether there were relations among the repeatedly measured data. If any (P
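As a language-neutral illustration of what the GLM repeated-measures procedure computes for a single within-subject factor, the Python sketch below performs the uncorrected one-way repeated-measures F test (no between-group factor, sphericity assumed). It is not SPSS output, and the latency values are made up. When Mauchly's test rejects sphericity, SPSS also reports corrected tests such as Greenhouse-Geisser, which this sketch omits.

```python
def repeated_measures_anova(data):
    """One-way repeated-measures ANOVA (sphericity assumed, no correction).

    data: list of rows, one per subject, each holding that subject's
    measurements at the k time points (e.g. daily escape latencies).
    Returns (F, df_time, df_error).
    """
    n = len(data)      # subjects
    k = len(data[0])   # repeated measurements per subject
    grand = sum(map(sum, data)) / (n * k)
    time_means = [sum(row[j] for row in data) / n for j in range(k)]
    subject_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_time = n * sum((m - grand) ** 2 for m in time_means)
    ss_subjects = k * sum((m - grand) ** 2 for m in subject_means)
    ss_error = ss_total - ss_time - ss_subjects  # time x subject residual

    df_time, df_error = k - 1, (n - 1) * (k - 1)
    f_stat = (ss_time / df_time) / (ss_error / df_error)
    return f_stat, df_time, df_error


latencies = [[1, 3, 5], [2, 3, 7], [3, 6, 6]]  # toy numbers, not from the study
print(repeated_measures_anova(latencies))       # (12.0, 2, 4)
```

Removing the between-subject sum of squares from the error term is what distinguishes this design from an ordinary one-way ANOVA on the same numbers.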
Tohya, Mari; Watanabe, Takayasu; Maruyama, Fumito; Arai, Sakura; Ota, Atsushi; Athey, Taryn B. T.; Fittipaldi, Nahuel; Nakagawa, Ichiro; Sekizaki, Tsutomu
2016-01-01
Many bacterial species coexist in the same niche as heterogeneous clones with different phenotypes; however, understanding of infectious diseases caused by polyphenotypic bacteria is still limited. In the present study, encapsulation in isolates of the porcine pathogen Streptococcus suis from persistent endocarditis lesions was examined. Coexistence of encapsulated and unencapsulated S. suis isolates was found in 26 out of 59 endocarditis samples. The isolates were serotype 2 and belonged to two different sequence types (STs), ST1 and ST28. The genomes of each of the 26 pairs of encapsulated and unencapsulated isolates from the 26 samples were sequenced. The data showed that each pair of isolates had one or more unique nonsynonymous mutations in the cps gene, and that the encapsulated and unencapsulated isolates from the same sample were most closely related to each other. Pairwise comparisons of the cps gene sequences in 7 pairs of encapsulated and unencapsulated isolates identified insertions/deletions (indels) ranging from 1 to 104 bp in different cps genes of the unencapsulated isolates. Capsule expression was restored in a subset of unencapsulated isolates by complementation in trans with cps expression vectors. Examination of the gene content common to the isolates indicated that mutation frequency was higher in ST28 pairs than in ST1 pairs. Genes within mobile genetic elements were mutation hot spots among ST28 isolates. Taken together, our results demonstrate the coexistence of dual-phenotype (encapsulated and unencapsulated) bacterial clones and suggest that the dual phenotypes arose independently on each farm through spontaneous mutations in cps genes. PMID:27433935
Filatov, Gleb; Bauwens, Bruno; Kertész-Farkas, Attila
2018-05-07
Bioinformatics studies often rely on similarity measures between sequence pairs, which can pose a bottleneck in large-scale sequence analysis. Here, we present a new convolutional kernel function for protein sequences called the LZW-Kernel. It is based on code words identified with the Lempel-Ziv-Welch (LZW) universal text compressor. The LZW-Kernel is an alignment-free method; it is always symmetric and positive, always yields 1.0 for self-similarity, and can be used directly with Support Vector Machines (SVMs) in classification problems. By contrast, the normalized compression distance (NCD) often violates the distance metric properties in practice and requires further techniques to be used with SVMs. The LZW-Kernel is a one-pass algorithm, which makes it particularly suitable for big data applications. Our experimental studies on remote protein homology detection and protein classification tasks reveal that the LZW-Kernel closely approaches the performance of the Local Alignment Kernel (LAK) and of the SVM-pairwise method combined with Smith-Waterman (SW) scoring, at a fraction of the time. Moreover, the LZW-Kernel outperforms the SVM-pairwise method when combined with BLAST scores, which indicates that LZW code words might be a better basis for similarity measures than the local alignment approximations found with BLAST. In addition, the LZW-Kernel outperforms n-gram based mismatch kernels, the hidden Markov model based SAM and Fisher kernel, and the protein family based PSI-BLAST, among others. Further advantages include the LZW-Kernel's reliance on a simple idea, its ease of implementation, and its high speed: in our tests it was three times faster than BLAST and several orders of magnitude faster than SW or LAK. The LZW-Kernel is implemented as standalone C code and is a free open-source program distributed under the GPLv3 license; it can be downloaded from https://github.com/kfattila/LZW-Kernel. akerteszfarkas@hse.ru. Supplementary data are available at Bioinformatics Online.
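The abstract lists the kernel's properties but not its formula. The Python sketch below shows one way to obtain LZW code words (the phrases an LZW compressor adds to its dictionary while scanning a sequence) and to compare two sequences through them. The normalization used here, a cosine over code-word indicator vectors, is an assumption and not necessarily the one implemented in LZW-Kernel; being a normalized inner product of indicator vectors, it is symmetric, positive semidefinite and equals 1.0 for self-similarity.

```python
import math


def lzw_code_words(seq):
    """Return the set of phrases an LZW compressor builds while scanning seq."""
    dictionary = set(seq)  # start from the single symbols present in seq
    phrase = ""
    for symbol in seq:
        extended = phrase + symbol
        if extended in dictionary:
            phrase = extended          # keep growing the current match
        else:
            dictionary.add(extended)   # emit a new code word
            phrase = symbol
    return dictionary


def lzw_similarity(a, b):
    """Hypothetical normalization: cosine of code-word indicator vectors."""
    wa, wb = lzw_code_words(a), lzw_code_words(b)
    return len(wa & wb) / math.sqrt(len(wa) * len(wb))


print(sorted(lzw_code_words("ABABABA")))  # ['A', 'AB', 'ABA', 'B', 'BA']
print(lzw_similarity("MKVLAAGIVG", "MKVLSAGIVA"))
```

Each sequence is scanned once to build its code-word set, which matches the one-pass character described above; empty sequences are not handled.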
Varsani, Arvind; Kraberger, Simona; Jennings, Scott; Porzig, Elizabeth L; Julian, Laurel; Massaro, Melanie; Pollard, Annie; Ballard, Grant; Ainley, David G
2014-06-01
Papillomaviruses are epitheliotropic viruses that have circular dsDNA genomes encapsidated in non-enveloped virions. They have been found to infect a variety of mammals, reptiles and birds, but so far they have not been found in amphibians. Using a recovery approach informed by contigs from next-generation sequencing de novo assembly, we cloned and Sanger sequenced the complete genome of a novel papillomavirus from the faecal matter of Adélie penguins (Pygoscelis adeliae) nesting on Ross Island, Antarctica. The genome had all the usual features of a papillomavirus as well as an E9 ORF encoding a protein of unknown function that is found in all avian papillomaviruses to date. This novel papillomavirus genome shared ~60% pairwise identity with the genomes of the other three known avian papillomaviruses: Fringilla coelebs papillomavirus 1 (FcPV1), Francolinus leucoscepus papillomavirus 1 (FlPV1) and Psittacus erithacus papillomavirus 1. Pairwise identity analysis and phylogenetic analysis of the major capsid protein gene clearly indicated that it represents a novel species, which we named Pygoscelis adeliae papillomavirus 1 (PaCV1). No evidence of recombination was detected in the genome of PaCV1, but we did detect a recombinant region (119 nt) in the E6 gene of FlPV1, derived from ancestral FcPV1-like sequences. Previously, only paramyxoviruses, orthomyxoviruses and avian pox viruses had been genetically identified in penguins; however, the majority of penguin viral identifications have been based on serology or histology. This is the first report, to our knowledge, of a papillomavirus associated with a penguin species. © 2014 The Authors.
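Pairwise identity figures such as the ~60% above are computed from an alignment, and the result depends on how gapped columns are treated; tools differ on this. The Python sketch below uses one common convention, excluding columns where either sequence has a gap. It is an illustration under that assumption, not the authors' exact calculation.

```python
def pairwise_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two aligned sequences, skipping columns with a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


print(pairwise_identity("ACGT-A", "ACCTTA"))  # 80.0
```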
Hybrid pairwise likelihood analysis of animal behavior experiments.
Cattelan, Manuela; Varin, Cristiano
2013-12-01
The study of the determinants of fights between animals is an important issue in understanding animal behavior. For this purpose, zoologists often use tournament experiments among a set of animals. The results of these tournament experiments are naturally analyzed by paired comparison models. Proper statistical analysis of these models is complicated by the dependence between the outcomes of fights, because the same animal is involved in different contests. This paper discusses two different model specifications to account for between-fight dependence. Models are fitted with the hybrid pairwise likelihood method, which iterates between optimal estimating equations for the regression parameters and pairwise likelihood inference for the association parameters. This approach requires the specification of means and covariances only. For this reason, the method can also be applied when computing the joint distribution is difficult or inconvenient. The proposed methodology is investigated in simulation studies and applied to real data on adult male Cape Dwarf Chameleons. © 2013, The International Biometric Society.
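For readers unfamiliar with paired comparison models, the Python sketch below fits the basic Bradley-Terry model, where P(i beats j) = s_i / (s_i + s_j), using Hunter's MM update. It deliberately treats all fights as independent, which is exactly the assumption the hybrid pairwise likelihood approach above is designed to relax, so it is a baseline illustration, not the paper's method. The function names and the toy contest list are hypothetical.

```python
def bradley_terry(n_animals, contests, iterations=500):
    """Estimate Bradley-Terry fighting abilities with Hunter's MM algorithm.

    contests: list of (winner, loser) index pairs.
    Treats fights as independent and assumes every animal has at least one
    win and one loss in a connected set of contests (otherwise no finite MLE).
    """
    wins = [0] * n_animals
    meetings = {}  # (i, j) with i < j -> number of fights between i and j
    for winner, loser in contests:
        wins[winner] += 1
        key = (min(winner, loser), max(winner, loser))
        meetings[key] = meetings.get(key, 0) + 1

    strength = [1.0] * n_animals
    for _ in range(iterations):
        updated = []
        for i in range(n_animals):
            denom = sum(
                count / (strength[a] + strength[b])
                for (a, b), count in meetings.items()
                if i in (a, b)
            )
            updated.append(wins[i] / denom)
        total = sum(updated)
        strength = [s / total for s in updated]  # fix the arbitrary scale
    return strength


def win_probability(strength, i, j):
    """P(animal i beats animal j) under the Bradley-Terry model."""
    return strength[i] / (strength[i] + strength[j])


# Hypothetical tournament: animal 0 beats animal 1 three times out of four.
abilities = bradley_terry(2, [(0, 1), (0, 1), (0, 1), (1, 0)])
print(win_probability(abilities, 0, 1))
```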
Limtong, Savitree; Kaewwichian, Rungluk
2013-01-01
Three strains (K59(T), K60 and K70(T)) representing two novel yeast species were isolated by an enrichment technique from the external surface of leaves of different wine grape (Vitis vinifera) plants collected at the Kanchanaburi Research Station (N14°07'15.1″ E099°19'05.6″), Wang Dong Sub-district, Mueang District, Kanchanaburi Province, Thailand. The sequences of the D1/D2 domain of the large subunit (LSU) rRNA gene of two strains (K59(T) and K60) were identical and differed from that of strain K70(T). In terms of pairwise sequence similarity of the D1/D2 domain, the closest species to the three strains was Candida asparagi, but with 2.3% nucleotide substitutions for strains K59(T) and K60 and 2.1% for strain K70(T). On the basis of morphological, biochemical, physiological and chemotaxonomic characteristics and sequence analysis of the D1/D2 domain of the LSU rRNA gene, the three strains were assigned to two novel Candida species. Strains K59(T) and K60 were assigned to Candida phyllophila sp. nov. (type strain K59(T)=BCC 42662(T)=NBRC 107776(T)=CBS 12671(T)). Candida vitiphila sp. nov. is proposed for strain K70(T) (=BCC 42663(T)=NBRC 107777(T)=CBS 12672(T)).
Haukisalmi, Voitto; Hardman, Lotta M; Hardman, Michael; Laakkonen, Juha; Niemimaa, Jukka; Henttonen, Heikki
2007-01-01
A new species, Paranoplocephala buryatiensis n. sp. (Cestoda:Anoplocephalidae), is described from the grey-sided vole Clethrionomys rufocanus (Sundevall) in the Republic of Buryatia (Russian Federation) and compared with P. longivaginata Chechulin & Gulyaev, 1998, a parasite of the red vole C. rutilus (Pallas) in the same region. P. buryatiensis n. sp. and P. longivaginata both have an exceptionally long vagina and cirrus, unique features among known species of Paranoplocephala Lühe, 1910. The new species differs from P. longivaginata primarily by its wider and more robust body, lower length/width ratio of mature proglottides, tendency of testes to occur in two separate groups, seminal receptacle of a different shape and the position of the cirrus-sac with respect to the ventral longitudinal osmoregulatory canal. The cytochrome oxidase subunit I (COI) sequence data support the independent status of these species, and show that they form a monophyletic assemblage within Paranoplocephala (sensu lato). Assuming cospeciation, an indirect calibration using host speciation dates estimated a rate of mtDNA substitution of 1.0-1.7% pairwise (0.5-0.85% per lineage) sequence divergence per million years. A faunistic review of Paranoplocephala species in C. rufocanus and C. rutilus in the Holarctic region is presented.
Muju virus, a novel hantavirus harboured by the arvicolid rodent Myodes regulus in Korea
Song, Ki-Joon; Baek, Luck Ju; Moon, Sungsil; Ha, Si Jung; Kim, Sang Hyun; Park, Kwang Sook; Klein, Terry A.; Sames, William; Kim, Heung-Chul; Lee, John S.; Yanagihara, Richard; Song, Jin-Won
2008-01-01
Acute-phase sera from >5% of the cases of haemorrhagic fever with renal syndrome occurring annually in Korea have been found to exhibit a fourfold or higher antibody titre to Puumala virus (PUUV) than to Hantaan virus (HTNV) by double-sandwich IgM ELISA, suggesting the existence of a PUUV-related hantavirus. Based on the phylogenetic relationships among arvicolid rodents, the royal vole (Myodes regulus) was targeted as a likely reservoir host of this hantavirus. Using RT-PCR, a genetically distinct hantavirus, designated Muju virus (MUJV), was detected in lung tissue of royal voles captured in widely separated geographical regions in Korea during 1996–2007. Pairwise analysis of the full-length S (1857 nt) and M (3634 nt) segments of MUJV indicated approximately 77% sequence similarity with PUUV. At the amino acid level, MUJV differed from PUUV by 5.5–6.9% (nucleocapsid) and 10.0–11.6% (Gn and Gc envelope glycoproteins). Interstrain variation of MUJV sequences from royal voles captured in different regions suggested geographic-specific clustering. Neutralizing antibody titres against PUUV were two- to sixfold higher than those against HTNV in sera of MUJV-infected Myodes regulus. Although virus isolation attempts were unsuccessful, the collective data indicate that MUJV is a distinct hantavirus species. PMID:17947538
Eo, Soo Hyung; DeWoody, J. Andrew
2010-01-01
Rates of biological diversification should ultimately correspond to rates of genome evolution. Recent studies have compared diversification rates with phylogenetic branch lengths, but incomplete phylogenies hamper such analyses for many taxa. Herein, we use pairwise comparisons of confamilial sauropsid (bird and reptile) mitochondrial DNA (mtDNA) genome sequences to estimate substitution rates. These molecular evolutionary rates are considered in light of the age and species richness of each taxonomic family, using a random-walk speciation–extinction process to estimate rates of diversification. We find the molecular clock ticks at disparate rates in different families and at different genes. For example, evolutionary rates are relatively fast in snakes and lizards, intermediate in crocodilians and slow in turtles and birds. There was also rate variation across genes, where non-synonymous substitution rates were fastest at ATP8 and slowest at CO3. Family-by-gene interactions were significant, indicating that local clocks vary substantially among sauropsids. Most importantly, we find evidence that mitochondrial genome evolutionary rates are positively correlated with speciation rates and with contemporary species richness. Nuclear sequences are poorly represented among reptiles, but the correlation between rates of molecular evolution and species diversification also extends to 18 avian nuclear genes we tested. Thus, the nuclear data buttress our mtDNA findings. PMID:20610427
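Two calculations underlie this kind of analysis. A pairwise distance between two taxa accumulates along both branches since their split, so the per-lineage substitution rate is d / (2t). For diversification, one widely used estimator under a constant-rate birth-death process is the Magallón and Sanderson stem-age estimator, r = ln[n(1 - eps) + eps] / t, where eps is the relative extinction fraction. The Python sketch below assumes this estimator; the abstract does not state whether the authors used the stem or crown version, or which eps, and the inputs are made up.

```python
import math


def substitution_rate(pairwise_distance, divergence_time_my):
    """Per-lineage rate: the pairwise distance accumulates along both branches."""
    return pairwise_distance / (2.0 * divergence_time_my)


def stem_diversification_rate(n_species, stem_age_my, extinction_fraction=0.0):
    """Stem-age estimator r = ln[n(1 - eps) + eps] / t (assumed, see lead-in)."""
    eps = extinction_fraction
    return math.log(n_species * (1.0 - eps) + eps) / stem_age_my


# Hypothetical family: 100 species, 10 My old, 20% mtDNA divergence between members.
print(substitution_rate(0.2, 10.0))
print(stem_diversification_rate(100, 10.0))
print(stem_diversification_rate(100, 10.0, extinction_fraction=0.9))
```

Higher extinction fractions lower the estimated net diversification rate for the same species count and age, which is why results are usually reported for several values of eps.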
fRMSDPred: Predicting Local RMSD Between Structural Fragments Using Sequence Information
2007-04-04
machine learning approaches for estimating the RMSD value of a pair of protein fragments. These estimated fragment-level RMSD values can be used to construct the alignment, assess the quality of an alignment, and identify high-quality alignment segments. We present algorithms to solve this fragment-level RMSD prediction problem using a supervised learning framework based on support vector regression and classification that incorporates protein profiles, predicted secondary structure, effective information encoding schemes, and novel second-order pairwise exponential kernel
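The quantity being predicted here, the RMSD between two structural fragments, is normally computed after optimal rigid-body superposition. The Python sketch below uses the Kabsch algorithm with NumPy to compute it for two equal-length coordinate sets; it illustrates the target value only, not the fRMSDPred predictor, and the toy fragment is random.

```python
import numpy as np


def superposed_rmsd(p, q):
    """RMSD between two equal-length 3D coordinate sets after Kabsch superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p_c = p - p.mean(axis=0)  # remove translation
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c           # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # avoid an improper rotation (reflection)
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rot.T - q_c
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Toy 7-residue fragment, rotated 30 degrees about z and shifted: RMSD is ~0.
rng = np.random.default_rng(0)
frag = rng.normal(size=(7, 3))
t = np.pi / 6
rz = np.array([[np.cos(t), -np.sin(t), 0.0], [np.sin(t), np.cos(t), 0.0], [0.0, 0.0, 1.0]])
moved = frag @ rz.T + np.array([1.0, 2.0, 3.0])
print(superposed_rmsd(frag, moved))
```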
Using sobol sequences for planning computer experiments
NASA Astrophysics Data System (ADS)
Statnikov, I. N.; Firsov, G. I.
2017-12-01
The paper discusses the use of the Planning LP-search (PLP-search) method for studying problems of multicriteria synthesis of dynamic systems. Based on simulation-model experiments, the method not only makes it possible to scan the parameter space within specified ranges of variation but also, owing to the special randomized planning of these experiments, allows a quantitative statistical evaluation of how changes in the varied parameters and their pairwise combinations influence the properties of the dynamic system.
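LP-search planning is built on Sobol's LPτ (Sobol) sequences, which spread trial points evenly over the parameter space. The Python sketch below generates the first two dimensions of the unscrambled Sobol sequence with the Gray-code construction (dimension 1 uses direction numbers m_i = 1; dimension 2 uses the primitive polynomial x + 1, giving m_i = 2 m_(i-1) XOR m_(i-1)) and maps the points onto parameter ranges. It is an illustration of the point set only; the randomized planning and statistical evaluation of PLP-search are not reproduced, and the parameter ranges are hypothetical.

```python
def sobol_2d(n_points, bits=30):
    """First n_points of the unscrambled two-dimensional Sobol (LP-tau) sequence."""
    v1 = [1 << (bits - i) for i in range(1, bits + 1)]  # m_i = 1 -> 1/2, 1/4, ...
    m = [1]
    for _ in range(bits - 1):
        m.append((m[-1] << 1) ^ m[-1])                  # recurrence for x + 1
    v2 = [m[i] << (bits - (i + 1)) for i in range(bits)]

    x1 = x2 = 0
    points = [(0.0, 0.0)]
    for n in range(1, n_points):
        # Gray-code step: flip the direction number indexed by the
        # position of the lowest zero bit of n - 1.
        c, k = 0, n - 1
        while k & 1:
            k >>= 1
            c += 1
        x1 ^= v1[c]
        x2 ^= v2[c]
        points.append((x1 / 2 ** bits, x2 / 2 ** bits))
    return points[:n_points]


def scale(points, bounds):
    """Map unit-square points onto the given (low, high) parameter ranges."""
    return [tuple(lo + x * (hi - lo) for x, (lo, hi) in zip(pt, bounds)) for pt in points]


# Hypothetical design: 8 simulation runs over a damping ratio and a stiffness.
for run in scale(sobol_2d(8), [(0.1, 2.0), (10.0, 50.0)]):
    print(run)
```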
Comparative Analysis of Genome Sequences Covering the Seven Cronobacter Species
Cummings, Craig A.; Shih, Rita; Degoricija, Lovorka; Rico, Alain; Brzoska, Pius; Hamby, Stephen E.; Masood, Naqash; Hariri, Sumyya; Sonbol, Hana; Chuzhanova, Nadia; McClelland, Michael; Furtado, Manohar R.; Forsythe, Stephen J.
2012-01-01
Background Species of Cronobacter are widespread in the environment and are occasional food-borne pathogens associated with serious neonatal diseases, including bacteraemia, meningitis, and necrotising enterocolitis. The genus is composed of seven species: C. sakazakii, C. malonaticus, C. turicensis, C. dublinensis, C. muytjensii, C. universalis, and C. condimenti. Clinical cases are associated with three species, C. malonaticus, C. turicensis and, in particular, C. sakazakii multilocus sequence type 4 (ST4). Thus, it is plausible that virulence determinants have evolved in certain lineages. Methodology/Principal Findings We generated high-quality draft sequences for eleven Cronobacter genomes representing the seven Cronobacter species, including an ST4 strain of C. sakazakii. Comparative analysis of these genomes together with the two publicly available genomes revealed that Cronobacter has over 6,000 genes present in one or more strains and over 2,000 genes shared by all strains. Considerable variation was found in the presence of traits such as type six secretion systems, metal resistance (tellurite, copper and silver), and adhesins. C. sakazakii is unique in the Cronobacter genus in encoding genes that enable the utilization of exogenous sialic acid, which may have clinical significance. The C. sakazakii ST4 strain 701 contained additional genes compared with other C. sakazakii strains, but none of these were known to be specifically virulence-related. Conclusions/Significance Genome comparison revealed that pairwise DNA sequence identity varies between 89 and 97% among the seven Cronobacter species and also suggested various degrees of divergence. Sets of universal core genes and accessory genes unique to each strain were identified. These gene sequences can be used for designing genus- or species-specific detection assays.
Genes encoding adhesins, T6SS, and metal resistance, as well as prophages, are found in only subsets of genomes and have contributed considerably to the variation in genomic content. Differences in gene content likely contribute to differences in the clinical and environmental distribution of species and sequence types. PMID:23166675
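Once genes have been grouped into families across strains (an orthology clustering step that is assumed here and not shown), the pan-genome, core-genome and strain-specific accessory sets described above reduce to set algebra. The Python sketch below uses hypothetical strain and gene-family names.

```python
def pan_genome_summary(gene_sets):
    """Split gene families into pan, core, accessory and strain-specific sets.

    gene_sets: dict mapping strain name -> set of gene-family identifiers.
    """
    all_sets = list(gene_sets.values())
    pan = set().union(*all_sets)          # present in one or more strains
    core = set.intersection(*all_sets)    # shared by every strain
    accessory = pan - core
    unique = {
        strain: genes - set().union(*(g for s, g in gene_sets.items() if s != strain))
        for strain, genes in gene_sets.items()
    }
    return pan, core, accessory, unique


strains = {  # hypothetical gene families per strain
    "strain_1": {"gyrB", "ompA", "t6ss_1"},
    "strain_2": {"gyrB", "ompA", "cusA"},
    "strain_3": {"gyrB", "nanA"},
}
pan, core, accessory, unique = pan_genome_summary(strains)
print(len(pan), sorted(core), unique)
```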
Sequence comparison alignment-free approach based on suffix tree and L-words frequency.
Soares, Inês; Goios, Ana; Amorim, António
2012-01-01
The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions about evolutionary history and is sometimes very difficult or impossible to perform due to an abundance of gaps (insertions/deletions). In such cases, an alignment-free alternative would prove valuable. Our method starts with the computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words of a preset length L (L-words) in each sequence is rapidly calculated. Based on the L-word frequency profile of each sequence, a pairwise standard Euclidean distance is then computed, producing a symmetric genetic distance matrix that can be used to generate a neighbor-joining dendrogram or a multidimensional scaling graph. We present an improvement to word-counting alignment-free approaches for sequence comparison by determining a single optimal word length and combining suffix tree structures with the word-counting tasks. Our approach is thus a fast and simple application that proved efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python and is freely available on the web.
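The distance step described above can be sketched in a few lines of Python. For brevity this sketch counts L-words with a `collections.Counter` over sliding windows instead of a generalized suffix tree; the counts are the same, but the linear-time construction of the original method is not reproduced. The sequences are toy examples, and L must not exceed the sequence length.

```python
import math
from collections import Counter


def l_word_profile(seq, L):
    """Relative frequencies of all overlapping words of length L in seq."""
    words = [seq[i:i + L] for i in range(len(seq) - L + 1)]
    total = len(words)
    return {w: c / total for w, c in Counter(words).items()}


def euclidean_distance_matrix(seqs, L):
    """Symmetric matrix of Euclidean distances between L-word frequency profiles."""
    profiles = [l_word_profile(s, L) for s in seqs]
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            keys = profiles[i].keys() | profiles[j].keys()
            d = math.sqrt(sum((profiles[i].get(w, 0.0) - profiles[j].get(w, 0.0)) ** 2
                              for w in keys))
            dist[i][j] = dist[j][i] = d
    return dist


matrix = euclidean_distance_matrix(["ACGT", "ACGT", "AAAA"], L=2)
for row in matrix:
    print([round(d, 4) for d in row])
```

The resulting matrix is the input a neighbor-joining or multidimensional scaling step would consume.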
The twilight zone of cis element alignments.
Sebastian, Alvaro; Contreras-Moreira, Bruno
2013-02-01
Sequence alignment of proteins and nucleic acids is a routine task in bioinformatics. Although the comparison of complete peptides, genes or genomes can be undertaken with a great variety of tools, the alignment of short DNA sequences and motifs entails pitfalls that have not yet been fully addressed. Here we compare the structural superposition of transcription factors with the sequence alignment of their recognized cis elements. Our goals are (i) to test TFcompare (http://floresta.eead.csic.es/tfcompare), a structural alignment method for protein–DNA complexes; (ii) to benchmark the pairwise alignment of regulatory elements; (iii) to define the confidence limits and the twilight zone of such alignments; and (iv) to evaluate the relevance of these thresholds with experimentally obtained elements. We find that the structure of cis elements and protein–DNA interfaces is significantly more conserved than their sequence, and we measure how this correlates with alignment errors when only sequence information is considered. Our results confirm that DNA motifs in the form of matrices produce better alignments than individual sequences. Finally, we report that empirically and theoretically derived twilight thresholds are useful for estimating the natural plasticity of regulatory sequences, and hence for filtering out unreliable alignments. PMID:23268451
Bào, Yīmíng; Amarasinghe, Gaya K; Basler, Christopher F; Bavari, Sina; Bukreyev, Alexander; Chandran, Kartik; Dolnik, Olga; Dye, John M; Ebihara, Hideki; Formenty, Pierre; Hewson, Roger; Kobinger, Gary P; Leroy, Eric M; Mühlberger, Elke; Netesov, Sergey V; Patterson, Jean L; Paweska, Janusz T; Smither, Sophie J; Takada, Ayato; Towner, Jonathan S; Volchkov, Viktor E; Wahl-Jensen, Victoria; Kuhn, Jens H
2017-05-11
The mononegaviral family Filoviridae has eight members assigned to three genera and seven species. Until now, genus and species demarcation were based on arbitrarily chosen filovirus genome sequence divergence values (≈50% for genera, ≈30% for species) and arbitrarily chosen phenotypic virus or virion characteristics. Here we report filovirus genome sequence-based taxon demarcation criteria using the publicly accessible PAirwise Sequence Comparison (PASC) tool of the US National Center for Biotechnology Information (Bethesda, MD, USA). Comparison of all available filovirus genomes in GenBank using PASC revealed optimal demarcation at a sequence diversity threshold range of 55–58% for genera and 23–36% for species. Because these thresholds do not change the current official filovirus classification, these values are now implemented as filovirus taxon demarcation criteria that may be used as the sole basis for filovirus classification when additional data are absent. A near-complete, coding-complete, or complete filovirus genome sequence will now be required for official classification of any novel "filovirus." Classifying filoviruses into existing taxa or determining the need for novel taxa is now straightforward and could even be automated using the presented algorithm/flowchart rooted in RefSeq (type) sequences.
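A greatly simplified version of threshold-based assignment can be written as a small decision function. The Python sketch below is not the published flowchart: it assumes a single divergence value (as a fraction) between the new genome and its closest RefSeq (type) genome, and its two cutoffs are illustrative picks from within the reported ranges (23–36% for species, 55–58% for genera). The function name and return labels are hypothetical.

```python
def classify_filovirus(divergence, coding_complete, species_cutoff=0.30, genus_cutoff=0.56):
    """Toy taxon assignment from PASC-style pairwise sequence divergence.

    divergence: fraction of sequence difference (0-1) between the new genome
    and the closest RefSeq (type) genome.
    """
    if not 0.0 <= species_cutoff < genus_cutoff <= 1.0:
        raise ValueError("expected 0 <= species_cutoff < genus_cutoff <= 1")
    if not coding_complete:
        # A near-complete, coding-complete or complete genome is required.
        return "insufficient sequence"
    if divergence < species_cutoff:
        return "existing species"
    if divergence < genus_cutoff:
        return "novel species in an existing genus"
    return "novel genus"


print(classify_filovirus(0.45, coding_complete=True))  # novel species in an existing genus
```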
Sockeye: A 3D Environment for Comparative Genomics
Montgomery, Stephen B.; Astakhova, Tamara; Bilenky, Mikhail; Birney, Ewan; Fu, Tony; Hassel, Maik; Melsopp, Craig; Rak, Marcin; Robertson, A. Gordon; Sleumer, Monica; Siddiqui, Asim S.; Jones, Steven J.M.
2004-01-01
Comparative genomics techniques are used in bioinformatics analyses to identify the structural and functional properties of DNA sequences. As the amount of available sequence data steadily increases, the ability to perform large-scale comparative analyses has become increasingly relevant. In addition, the growing complexity of genomic feature annotation means that new approaches to genomic visualization need to be explored. We have developed a Java-based application called Sockeye that uses three-dimensional (3D) graphics technology to facilitate the visualization of annotation and conservation across multiple sequences. This software uses the Ensembl database project to import sequence and annotation information from several eukaryotic species. A user can additionally import their own custom sequence and annotation data. Individual annotation objects are displayed in Sockeye by using custom 3D models. Ensembl-derived and imported sequences can be analyzed by using a suite of multiple and pair-wise alignment algorithms. The results of these comparative analyses are also displayed in the 3D environment of Sockeye. By using the Java3D API to visualize genomic data in a 3D environment, we are able to compactly display cross-sequence comparisons. This provides the user with a novel platform for visualizing and comparing genomic feature organization. PMID:15123592
CAFE: aCcelerated Alignment-FrEe sequence analysis.
Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A; Waterman, Michael S; Sun, Fengzhu
2017-07-03
Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures including CVTree, $d_2^*$ and $d_2^S$ are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmaps, principal coordinate analysis and network displays. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
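CAFE computes 28 alignment-free measures, including the background-adjusted $d_2^*$ and $d_2^S$. The sketch below shows only the simplest member of that family, the $d_2$ dissimilarity built on raw shared k-mer counts, and writes a square matrix in a PHYLIP-style layout. It is not CAFE's code; the function names are hypothetical, and it assumes uppercase DNA strings at least k bases long.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers in an uppercase DNA string."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_dissimilarity(a, b, k):
    """d2 = 0.5 * (1 - D2 / (||X|| * ||Y||)), where D2 = sum over w of X_w * Y_w.

    Assumes both sequences are at least k bases long (otherwise the norm is 0).
    """
    x, y = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum(x[w] * y[w] for w in x.keys() & y.keys())
    norm = math.sqrt(sum(v * v for v in x.values())) * math.sqrt(sum(v * v for v in y.values()))
    # Clamp so floating-point rounding cannot produce a tiny negative distance
    return max(0.0, 0.5 * (1.0 - shared / norm))


def to_phylip(names, seqs, k):
    """Square d2 matrix in PHYLIP layout: taxon count, then name (10 chars) + row."""
    lines = [str(len(names))]
    for name, s in zip(names, seqs):
        row = " ".join(f"{d2_dissimilarity(s, t, k):.4f}" for t in seqs)
        lines.append(f"{name[:10]:<10} {row}")
    return "\n".join(lines)


print(to_phylip(["seq1", "seq2"], ["ACGTACGT", "AAAACCCC"], 2))
```

The measure ranges from 0 (identical k-mer profiles) to 0.5 (no shared k-mers); the starred variants additionally centre and scale counts against a background Markov model, which is where most of their extra cost comes from.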
Amareshwari, P; Bhatia, Mayuri; Venkatesh, K; Roja Rani, A; Ravi, G V; Bhakt, Priyanka; Bandaru, Srinivas; Yadav, Mukesh; Nayarisseri, Anuraj; Nair, Achuthsankar S
2015-03-01
Indiscriminate application of pesticides such as chlorpyrifos, diazinon and malathion contaminates the soil, is unsafe, and has often raised severe health concerns. Conversely, microorganisms such as the fungi Trichoderma and Aspergillus and bacteria such as Rhizobium, Bacillus, Azotobacter and Flavobacterium have evolved the ability to degrade the aforementioned pesticides into non-toxic products. The current study reports the identification of a novel species of Flavobacterium capable of degrading organophosphorus pesticides. The bacterium was isolated from agricultural soil collected from Guntur District, Andhra Pradesh, India. The samples were serially diluted and the aliquots were incubated for a suitable time, after which the suspected colony was subjected to 16S rDNA sequencing. The resulting sequence was aligned pairwise against Flavobacterium species, which led to the identification of a novel species of Flavobacterium, later named EMBS0145; its sequence was deposited in GenBank under accession number JN794045.
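The abstract does not say which program performed the pairwise 16S rDNA alignment. As a generic illustration of global pairwise alignment followed by a percent-identity calculation, here is a minimal Needleman-Wunsch sketch with a linear gap penalty; the scoring values (match 1, mismatch -1, gap -2) are arbitrary assumptions, and identity is computed over all alignment columns, which is only one of several conventions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Traceback from the bottom-right cell, preferring diagonal moves
    i, j, top, bottom = n, m, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


def percent_identity(aln_a, aln_b):
    """Identical, non-gap columns as a percentage of alignment length."""
    same = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * same / len(aln_a)


aln_a, aln_b, s = needleman_wunsch("ACGT", "AGT")
print(aln_a, aln_b, s, percent_identity(aln_a, aln_b))  # ACGT A-GT 1 75.0
```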
Cytochrome c oxidase subunit I barcoding of the green bee-eater (Merops orientalis).
Arif, I A; Khan, H A; Shobrak, M; Williams, J
2011-10-21
DNA barcoding using mitochondrial cytochrome c oxidase subunit I (COI) is regarded as a standard method for species identification. Recent reports have also shown extended applications of COI gene analysis in phylogeny and molecular diversity studies. The bee-eaters are a group of near passerine birds in the family Meropidae. There are 26 species worldwide; five of them are found in Saudi Arabia. Until now, GenBank included a COI barcode for only one species of bee-eater, the European bee-eater (Merops apiaster). We sequenced the 694-bp segment of the COI gene of the green bee-eater M. orientalis and compared the sequences with those of M. apiaster. Pairwise sequence comparison showed 66 variable sites across all the eight sequences from both species, with an interspecific genetic distance of 0.0362. Two and one within-species variable sites were found, with genetic distances of 0.0005 and 0.0003 for M. apiaster and M. orientalis, respectively. This is the first study reporting barcodes for M. orientalis.
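The abstract reports variable sites and genetic distances but does not name the substitution model. COI barcoding studies commonly use the Kimura 2-parameter (K2P) distance, so the sketch below counts variable sites in an alignment and computes K2P between two aligned sequences; treating K2P as the model used here is an assumption, and gaps or ambiguous bases are simply skipped.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def variable_sites(alignment):
    """Count alignment columns showing more than one nucleotide state."""
    count = 0
    for column in zip(*alignment):
        states = {c for c in column if c in "ACGT"}  # ignore gaps and N
        if len(states) > 1:
            count += 1
    return count


def k2p_distance(a, b):
    """Kimura 2-parameter distance: -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)."""
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    n = len(pairs)
    diffs = [(x, y) for x, y in pairs if x != y]
    # Transitions stay within purines (A<->G) or within pyrimidines (C<->T)
    transitions = sum(1 for x, y in diffs if {x, y} <= PURINES or {x, y} <= PYRIMIDINES)
    transversions = len(diffs) - transitions
    p, q = transitions / n, transversions / n
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

With this model a single A-to-G change in ten sites gives P = 0.1, Q = 0 and a distance of about 0.112, slightly larger than the uncorrected proportion 0.1.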
Detecting Earthquakes over a Seismic Network using Single-Station Similarity Measures
NASA Astrophysics Data System (ADS)
Bergen, Karianne J.; Beroza, Gregory C.
2018-03-01
New blind waveform-similarity-based detection methods, such as Fingerprint and Similarity Thresholding (FAST), have shown promise for detecting weak signals in long-duration, continuous waveform data. While blind detectors are capable of identifying similar or repeating waveforms without templates, they can also be susceptible to false detections due to local correlated noise. In this work, we present a set of three new methods that allow us to extend single-station similarity-based detection over a seismic network: event-pair extraction, pairwise pseudo-association, and event resolution complete a post-processing pipeline that combines single-station similarity measures (e.g., the FAST sparse similarity matrix) from each station in a network into a list of candidate events. The core technique, pairwise pseudo-association, leverages the pairwise structure of event detections in its network detection model, which allows it to identify events observed at multiple stations in the network without modeling the expected move-out. Though our approach is general, we apply it to extend FAST over a sparse seismic network. We demonstrate that our network-based extension of FAST is sensitive while maintaining a low false detection rate. As a test case, we apply our approach to two weeks of continuous waveform data from five stations during the foreshock sequence prior to the 2014 Mw 8.2 Iquique earthquake. Our method identifies nearly five times as many events as the local seismicity catalog (including 95% of the catalog events), and less than 1% of these candidate events are false detections.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huang, Yingying; Triscari, Joseph M.; Tseng, George C.
Data mining was performed on 28 330 unique peptide tandem mass spectra for which sequences were assigned with high confidence. By dividing the spectra into different sets based on structural features and charge states of the corresponding peptides, chemical interactions involved in promoting specific cleavage patterns in gas-phase peptides were characterized. Pairwise fragmentation maps describing cleavages at all Xxx-Zzz residue combinations for b and y ions reveal that the difference in basicity between Arg and Lys results in different dissociation patterns for singly charged Arg- and Lys-ending tryptic peptides. While one dominant protonation form (proton localized) exists for Arg-ending peptides, a heterogeneous population of different protonated forms or more facile interconversion of protonated forms (proton partially mobile) exists for Lys-ending peptides. Cleavage C-terminal to acidic residues dominates spectra from peptides that have a localized proton and cleavage N-terminal to Pro dominates those that have a mobile or partially mobile proton. When Pro is absent from peptides that have a mobile or partially mobile proton, cleavage at each peptide bond becomes much more prominent. Whether the above patterns can be found in b ions, y ions, or both depends on the location of the proton holder(s). Enhanced cleavages C-terminal to branched aliphatic residues (Ile, Val, Leu) are observed in both b and y ions from peptides that have a mobile proton, as well as in y ions from peptides that have a partially mobile proton; enhanced cleavages N-terminal to these residues are observed in b ions from peptides that have a partially mobile proton. Statistical tools have been designed to visualize the fragmentation maps and measure the similarity between them.
The pairwise cleavage patterns observed expand our knowledge of peptide gas-phase fragmentation behaviors and should be useful in algorithm development that employs improved models to predict fragment ion intensities.
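The abstract does not describe how the fragmentation maps or their similarity are computed. A minimal sketch, assuming each observed cleavage is recorded as an (N-terminal residue, C-terminal residue) pair and that similarity is the Pearson correlation of the normalized 20-by-20 maps (both choices are assumptions, not the authors' statistics), could be:

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cleavage_map(cleavages):
    """20x20 map: entry [x][z] = fraction of observed cleavages at Xxx-Zzz bonds."""
    counts = Counter(cleavages)
    total = sum(counts.values())
    return [[counts[(x, z)] / total for z in AMINO_ACIDS] for x in AMINO_ACIDS]


def map_similarity(m1, m2):
    """Pearson correlation between two maps, compared cell by cell."""
    a = [v for row in m1 for v in row]
    b = [v for row in m2 for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical cleavage observations: one D-A bond and two A-P bonds
m = cleavage_map([("D", "A"), ("A", "P"), ("A", "P")])
print(map_similarity(m, m))
```

Separate maps would be built for b and y ions and for each proton-mobility class, then compared pairwise with the same function.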
Cajimat, Maria N. B.; Milazzo, Mary Louise; Borchert, Jeff N.; Abbott, Ken D.; Bradley, Robert D.; Fulhorst, Charles F.
2008-01-01
The results of analyses of glycoprotein precursor and nucleocapsid protein gene sequences indicated that an arenavirus isolated from a Mexican woodrat (Neotoma mexicana) captured in Arizona is a strain of a novel species (proposed name Skinner Tank virus) and that arenaviruses isolated from Mexican woodrats captured in Colorado, New Mexico, and Utah are strains of Whitewater Arroyo virus or species phylogenetically closely related to Whitewater Arroyo virus. Pairwise comparisons of glycoprotein precursor sequences and nucleocapsid protein sequences revealed a high level of divergence among the viruses isolated from the Mexican woodrats captured in Colorado, New Mexico, and Utah and the Whitewater Arroyo virus prototype strain AV 9310135, which originally was isolated from a white-throated woodrat (Neotoma albigula) captured in New Mexico. Conceptually, the viruses from Colorado, New Mexico, and Utah and strain AV 9310135 could be grouped together in a species complex in the family Arenaviridae, genus Arenavirus. PMID:18304671
Orthology Detection Combining Clustering and Synteny for Very Large Datasets
Lechner, Marcus; Hernandez-Rosales, Maribel; Doerr, Daniel; Wieseke, Nicolas; Thévenin, Annelyse; Stoye, Jens; Hartmann, Roland K.; Prohaska, Sonja J.; Stadler, Peter F.
2014-01-01
The elucidation of orthology relationships is an important step both in gene function prediction and in understanding patterns of sequence evolution. For large datasets, orthology assignments are usually derived directly from sequence similarities, because more exact approaches are computationally too expensive. Here we present PoFF, an extension for the standalone tool Proteinortho, which enhances orthology detection by combining clustering, sequence similarity, and synteny. In the course of this work, FFAdj-MCS, a heuristic that assesses pairwise gene order using adjacencies (a similarity measure related to the breakpoint distance), was adapted to support multiple linear chromosomes and extended to detect duplicated regions. PoFF largely reduces the number of false positives and enables more fine-grained predictions than purely similarity-based approaches. The extension maintains the low memory requirements and the efficient concurrency options of the underlying Proteinortho, making the software applicable to very large datasets. PMID:25137074
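FFAdj-MCS itself handles multiple linear chromosomes, gene families and duplicated regions, which is well beyond a short example. To illustrate only the underlying notion of adjacency-based gene-order similarity, here is a sketch for a single linear chromosome with unique gene IDs and gene orientation ignored; these simplifications and the function names are assumptions.

```python
def adjacencies(order):
    """Unordered adjacencies of a linear chromosome given as a list of gene IDs."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def shared_adjacencies(order_a, order_b):
    """Number of neighbouring gene pairs conserved between the two orders."""
    return len(adjacencies(order_a) & adjacencies(order_b))


def breakpoint_distance(order_a, order_b):
    """Adjacencies of A that are broken in B (same gene set assumed)."""
    return len(adjacencies(order_a) - adjacencies(order_b))


print(shared_adjacencies([1, 2, 3, 4], [1, 2, 4, 3]))   # 2
print(breakpoint_distance([1, 2, 3, 4], [1, 2, 4, 3]))  # 1
```

Because adjacencies are unordered pairs, an inverted block keeps its internal adjacencies, so a full reversal of a chromosome has breakpoint distance 0 in this simplified model.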
BrucellaBase: Genome information resource.
Sankarasubramanian, Jagadesan; Vishnu, Udayakumar S; Khader, L K M Abdul; Sridhar, Jayavel; Gunasekaran, Paramasamy; Rajendhran, Jeyaprakash
2016-09-01
Brucella sp. causes a major zoonotic disease, brucellosis. Brucella belongs to the family Brucellaceae under the order Rhizobiales of Alphaproteobacteria. We present BrucellaBase, a web-based platform providing the features of a genome database together with unique analysis tools. We have developed web versions of multilocus sequence typing (MLST) (Whatmore et al., 2007) and phylogenetic analysis of Brucella spp. BrucellaBase currently contains genome data of 510 Brucella strains, along with user interfaces for BLAST, VFDB, CARD, pairwise genome alignment and MLST typing. Availability of these tools will enable researchers interested in Brucella to extract meaningful information from Brucella genome sequences. BrucellaBase will be regularly updated with new genome sequences and new features, along with improvements in genome annotations. BrucellaBase is available online at http://www.dbtbrucellosis.in/brucellabase.html or http://59.99.226.203/brucellabase/homepage.html. Copyright © 2016 Elsevier B.V. All rights reserved.
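BrucellaBase's MLST module is described only at the level of its interface. The core idea of MLST typing is to map each locus sequence to an allele number and then look up the resulting allele profile in a table of sequence types (STs). The sketch below uses a hypothetical two-locus toy scheme with made-up alleles and ST names; the real Brucella scheme (Whatmore et al., 2007) has its own loci and allele definitions.

```python
# Hypothetical toy scheme: locus -> {allele sequence: allele number}
ALLELES = {
    "locusA": {"ACGTAC": 1, "ACGTAT": 2},
    "locusB": {"TTGACA": 1, "TTGACG": 2},
}
# Allele profile (ordered by locus name) -> sequence type; names are made up
PROFILES = {(1, 1): "ST-x1", (1, 2): "ST-x2", (2, 2): "ST-x3"}


def call_alleles(locus_seqs):
    """Exact-match each locus sequence to an allele number (None if unseen)."""
    return tuple(ALLELES[locus].get(seq) for locus, seq in sorted(locus_seqs.items()))


def sequence_type(locus_seqs):
    """Return the ST for a strain, or flag novel alleles / novel profiles."""
    profile = call_alleles(locus_seqs)
    if None in profile:
        return "novel allele"
    return PROFILES.get(profile, "novel profile")


print(sequence_type({"locusA": "ACGTAC", "locusB": "TTGACG"}))  # ST-x2
```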
Genomic Diversity of Erwinia carotovora subsp. carotovora and Its Correlation with Virulence
Yap, Mee-Ngan; Barak, Jeri D.; Charkowski, Amy O.
2004-01-01
We used genetic and biochemical methods to examine the genomic diversity of the enterobacterial plant pathogen Erwinia carotovora subsp. carotovora. The results obtained with each method showed that E. carotovora subsp. carotovora strains isolated from one ecological niche, potato plants, are surprisingly diverse compared to related pathogens. A comparison of 23 partial mdh sequences revealed a maximum pairwise difference of 10.49% and an average pairwise difference of 2.13%, values which are much greater than the maximum variation (1.81%) and average variation (0.75%) previously reported for Escherichia coli. Pulsed-field gel electrophoresis analysis of I-CeuI-digested genomic DNA revealed seven rrn operons in all E. carotovora subsp. carotovora strains examined except strain WPP17, which had only six copies. We identified 26 I-CeuI restriction fragment length polymorphism patterns and observed significant polymorphism in fragment sizes ranging from 100 to 450 kb for all strains. We detected large plasmids in two strains, including the model strain E. carotovora subsp. carotovora 71. The two least virulent strains had an unusual chromosomal structure, suggesting that a particular pulsotype is correlated with virulence. To compare chromosomal organization of multiple enterobacterial genomes, several genes were mapped onto I-CeuI fragments. We identified portions of the genome that appear to be conserved across enterobacteria and portions that have undergone genome rearrangements. We found that the least virulent strain, WPP17, failed to oxidize cellobiose and was missing several hrp and hrc genes. The unexpected variability among isolates obtained from clonal hosts in one region and in one season suggests that factors other than the host plant, potato, drive the evolution of this common environmental bacterium and key plant pathogen. PMID:15128563
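The maximum and average pairwise differences quoted for the 23 partial mdh sequences can be computed from an alignment in a few lines. This sketch assumes uncorrected p-distances over columns without gaps, expressed as percentages; the exact distance method used in the study is not stated in the abstract.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites over columns with no gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def pairwise_summary(alignment):
    """Return (maximum, average) pairwise difference in percent over all pairs."""
    dists = [100 * p_distance(a, b) for a, b in combinations(alignment, 2)]
    return max(dists), sum(dists) / len(dists)


print(pairwise_summary(["AAAA", "AAAT", "AATT"]))
```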
Molecular simulations of the pairwise interaction of monoclonal antibodies.
Lapelosa, Mauro; Patapoff, Thomas W; Zarraga, Isidro E
2014-11-20
Molecular simulations are employed to compute the free energy of pairwise association of monoclonal antibodies (mAbs), using a conformational sampling algorithm with a scoring function. The work reported here investigates mAb-mAb association driven by weak interactions, using a computational method capable of predicting experimental observations of low binding affinity. The simulations are able to explore the free energy landscape. The energy scoring function comprises a steric interaction component, electrostatic interactions, and a nonpolar component of the free energy. Electrostatic interactions are calculated by solving the Poisson-Boltzmann equation. The nonpolar component is derived from the van der Waals interactions upon close contact of the protein surfaces. Two mAbs with similar IgG1 frameworks but small sequence differences, mAb1 and mAb2, are considered for their different viscosity and propensity to form a weakly interacting dimer. mAb1 presents a favorable free energy of association at pH 6 and 15 mM ion concentration, reproducing the experimental trends of high viscosity and dimer formation at high concentration. The free energy landscape and minimum free energy configurations of the dimer, as well as second virial coefficient (B22) values, are calculated. The energy distributions for mAb1 are obtained, and the most probable configurations are seen to be consistent with experimental measurements. In contrast, mAb2 shows an unfavorable average free energy under the same buffer conditions due to poor electrostatic complementarity, and reversible dimer configurations with favorable free energy are found to be unlikely. Finally, the simulations of mAb association dynamics provide insights into the self-association responsible for bulk solution behavior and aggregation, which are important to the processing and quality of biopharmaceuticals.
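The abstract mentions computing second virial coefficients (B22) but not the formula. For an orientation-averaged potential of mean force w(r) between two molecules, a standard expression is B22 = -2π ∫ (exp(-w(r)/kT) - 1) r² dr. The sketch below evaluates it with the trapezoid rule; collapsing the paper's orientation-dependent scoring into a single radial w(r) is a simplifying assumption, and the result is a volume per molecule (conversion to mL·mol/g² needs Avogadro's number and the molar mass).

```python
import math


def b22(potential, r_min, r_max, n=100_000, kT=1.0):
    """Second virial coefficient for a radial potential of mean force w(r).

    B22 = -2*pi * integral (exp(-w(r)/kT) - 1) * r**2 dr, evaluated on
    [r_min, r_max] with the trapezoid rule; r_max must be large enough that
    w(r) is negligible beyond it. w may return math.inf for overlapping cores.
    """
    h = (r_max - r_min) / n
    total = 0.0
    for i in range(n + 1):
        r = r_min + i * h
        f = (math.exp(-potential(r) / kT) - 1.0) * r * r
        # Trapezoid weights: half at the two end points, full elsewhere
        total += 0.5 * f if i in (0, n) else f
    return -2.0 * math.pi * total * h


def hard_sphere(r, sigma=1.0):
    """Infinite repulsion below contact distance sigma, zero beyond it."""
    return math.inf if r < sigma else 0.0


print(b22(hard_sphere, 0.0, 2.0))
```

Hard spheres make a convenient check because their exact value is 2πσ³/3 (about 2.094 for σ = 1); attractive wells in w(r) lower B22, and negative values indicate net self-association.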
Carr, Steven M.; Duggan, Ana T.; Stenson, Garry B.; Marshall, H. Dawn
2015-01-01
Phylogenomic analysis of highly-resolved intraspecific phylogenies obtained from complete mitochondrial DNA genomes has had great success in clarifying relationships within and among human populations, but has found limited application in other wild species. Analytical challenges include assessment of random versus non-random phylogeographic distributions, and quantification of differences in tree topologies among populations. Harp Seals (Pagophilus groenlandicus Erxleben, 1777) have a biogeographic distribution based on four discrete trans-Atlantic breeding and whelping populations located on “fast ice” attached to land in the White Sea, Greenland Sea, the Labrador ice Front, and Southern Gulf of St Lawrence. This East to West distribution provides a set of a priori phylogeographic hypotheses. Outstanding biogeographic questions include the degree of genetic distinctiveness among these populations, in particular between the Greenland Sea and White Sea grounds. We obtained complete coding-region DNA sequences (15,825 bp) for 53 seals. Each seal has a unique mtDNA genome sequence, and these sequences differ by 6 to 107 substitutions. Six major clades / groups are detectable by parsimony, neighbor-joining, and Bayesian methods, all of which are found in breeding populations on either side of the Atlantic. The species coalescent is at 180 KYA; the most recent clade, which accounts for 66% of the diversity, reflects an expansion during the mid-Wisconsinan glaciation 40 to 60 KYA. FST is significant only between the White Sea and Greenland Sea or Ice Front populations. Hierarchical AMOVA of 2-, 3-, or 4-island models identifies small but significant ΦSC among populations within groups, but not among groups. 
A novel Monte-Carlo simulation indicates that the observed distribution of individuals within breeding populations over the phylogenetic tree requires significantly fewer dispersal events than random expectation, consistent with island or a priori East to West 2- or 3-stepping-stone biogeographic models, but not a simple 1-step trans-Atlantic model. Plots of the cumulative pairwise sequence difference curves among seals in each of the four populations provide continuous proxies for phylogenetic diversification within each. Non-parametric Kolmogorov-Smirnov (K-S) tests of maximum pairwise differences between these curves indicate that the Greenland Sea population has a markedly younger phylogenetic structure than either the White Sea population or the two Northwest Atlantic populations, which are of intermediate age and homogeneous structure. The Monte Carlo and K-S assessments provide sensitive quantitative tests of within-species mitogenomic phylogeography. This is the first study to indicate that the White Sea and Greenland Sea populations have different population genetic histories. The analysis supports the hypothesis that Harp Seals comprise three genetically distinguishable breeding populations, in the White Sea, Greenland Sea, and Northwest Atlantic. Implications for an ice-dependent species during ongoing climate change are discussed. PMID:26301872
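The K-S comparison described above rests on the maximum vertical gap between two cumulative pairwise-difference curves. A minimal sketch of the two-sample statistic D follows, assuming each population is summarized as a list of pairwise substitution counts (the input format is an assumption); for p-values one would normally rely on a library routine such as `scipy.stats.ks_2samp`.

```python
from bisect import bisect_right


def ecdf(sorted_sample, x):
    """Fraction of values in a sorted sample that are <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_statistic(diffs_a, diffs_b):
    """Two-sample K-S D: largest vertical gap between the two cumulative curves."""
    a, b = sorted(diffs_a), sorted(diffs_b)
    # Both step functions only change at observed values, so checking those suffices
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```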
mESAdb: microRNA Expression and Sequence Analysis Database
Kaya, Koray D.; Karakülah, Gökhan; Yakıcıer, Cengiz M.; Acar, Aybar C.; Konu, Özlen
2011-01-01
microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data. PMID:21177657
Transcription Factor Map Alignment of Promoter Regions
Blanco, Enrique; Messeguer, Xavier; Smith, Temple F; Guigó, Roderic
2006-01-01
We address the problem of comparing and characterizing the promoter regions of genes with similar expression patterns. This remains a challenging problem in sequence analysis, because often the promoter regions of co-expressed genes do not show discernible sequence conservation. In our approach, thus, we have not directly compared the nucleotide sequence of promoters. Instead, we have obtained predictions of transcription factor binding sites, annotated the predicted sites with the labels of the corresponding binding factors, and aligned the resulting sequences of labels—to which we refer here as transcription factor maps (TF-maps). To obtain the global pairwise alignment of two TF-maps, we have adapted an algorithm initially developed to align restriction enzyme maps. We have optimized the parameters of the algorithm in a small, but well-curated, collection of human–mouse orthologous gene pairs. Results in this dataset, as well as in an independent much larger dataset from the CISRED database, indicate that TF-map alignments are able to uncover conserved regulatory elements, which cannot be detected by the typical sequence alignments. PMID:16733547
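The TF-map aligner adapts a restriction-map alignment algorithm that also accounts for site positions and scores; that algorithm is not reproduced here. As a deliberately simplified illustration of aligning factor labels instead of nucleotides, the sketch below orders predicted sites by position into a TF-map and scores two maps by their longest common subsequence. The site lists, factor names and function names are hypothetical.

```python
def tf_map(sites):
    """Turn predicted sites [(position, factor), ...] into an ordered label list."""
    return [factor for _, factor in sorted(sites)]


def lcs_alignment(map_a, map_b):
    """Longest common subsequence of two TF-maps: (score, matched labels in order)."""
    n, m = len(map_a), len(map_b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if map_a[i] == map_b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])
    # Trace back through the table to recover the matched labels
    i, j, matched = n, m, []
    while i > 0 and j > 0:
        if map_a[i - 1] == map_b[j - 1]:
            matched.append(map_a[i - 1]); i -= 1; j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return table[n][m], matched[::-1]


human = tf_map([(120, "SP1"), (40, "TBP"), (300, "NFY")])
mouse = ["TBP", "CREB", "NFY"]
print(lcs_alignment(human, mouse))  # (2, ['TBP', 'NFY'])
```

Unlike this sketch, a position-aware method would also penalize matched sites whose distances to their neighbours differ strongly between the two promoters.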
Network-based function prediction and interactomics: the case for metabolic enzymes.
Janga, S C; Díaz-Mejía, J Javier; Moreno-Hagelsieb, G
2011-01-01
As sequencing technologies increase in power, determining the functions of unknown proteins encoded by the DNA sequences so produced becomes a major challenge. Functional annotation is commonly done on the basis of amino-acid sequence similarity alone. Long after sequence similarity becomes undetectable by pair-wise comparison, profile-based identification of homologs can often succeed due to the conservation of position-specific patterns that are important for a protein's three-dimensional folding and function. Nevertheless, prediction of protein function from homology-driven approaches is not without problems. Homologous proteins might evolve different functions, and the power of homology detection is already approaching its limits. Computational methods for inferring protein function, which exploit the context of a protein in cellular networks, have come to be built on top of homology-based approaches. These network-based functional inference techniques provide both a first-hand hint of a protein's functional role and insights complementary to traditional methods for understanding the function of uncharacterized proteins. Most recent network-based approaches aim to integrate diverse kinds of functional interactions to boost both coverage and confidence level. These techniques not only promise to address the moonlighting aspect of proteins by annotating proteins with multiple functions, but also increase our understanding of the interplay between different functional classes in a cell. In this article we review the state of the art in network-based function prediction and describe some of the underlying difficulties and successes. Given the volume of high-throughput data being reported, the time is ripe to employ these network-based approaches, which can be used to unravel the functions of the uncharacterized proteins accumulating in genomic databases. © 2010 Elsevier Inc. All rights reserved.
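Network-based inference is often introduced through a simple guilt-by-association rule: a protein is assigned the functions most frequent among its interaction partners. The sketch below implements only that neighbour-counting rule (the review covers far richer, integrative methods); the graph, annotations and function labels are hypothetical. Returning more than one label is one crude way to reflect moonlighting proteins.

```python
from collections import Counter


def predict_functions(graph, annotations, protein, top=1):
    """Rank function labels by how many annotated neighbours carry them."""
    votes = Counter()
    for neighbour in graph.get(protein, ()):
        votes.update(annotations.get(neighbour, ()))
    return [label for label, _ in votes.most_common(top)]


graph = {"p": ["a", "b", "c"]}
annotations = {"a": ["glycolysis"], "b": ["glycolysis"], "c": ["TCA cycle"]}
print(predict_functions(graph, annotations, "p", top=2))  # ['glycolysis', 'TCA cycle']
```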
St Jean, A; Charlebois, R L
1996-01-01
Anonymous probes from the genome of Halobacterium salinarium GRB and 12 gene probes were hybridized to the cosmid clones representing the chromosome and plasmids of Halobacterium salinarium GRB and Haloferax volcanii DS2. The order of and pairwise distances between 35 loci uniquely cross-hybridizing to both chromosomes were analyzed in a search for conservation. No conservation between the genomes could be detected at the 15-kbp resolution used in this study. We found distinct sets of low-copy-number repeated sequences in the chromosome and plasmids of Halobacterium salinarium GRB, indicating some degree of partitioning between these replicons. We propose alternative courses for the evolution of the haloarchaeal genome: (i) that the majority of genomic differences that exist between genera came about at the inception of this group or (ii) that the differences have accumulated over the lifetime of the lineage. The strengths and limitations of investigating these models through comparative genomic studies are discussed. PMID:8682791
Network of likes and dislikes: Conflict and membership
NASA Astrophysics Data System (ADS)
Park, Hye Jin; Yi, Su Do; Kim, Dae Joong; Kim, Beom Jun
2016-11-01
We all have friends and foes. In the study of complex networks, such a pairwise interaction is described by a directed link, since the relation is not necessarily symmetric. We study a real network constructed from a survey in which each individual chooses five members (s)he wants to work with and five others (s)he does not want to work with. Although everyone's outdegree is fixed at five for both like and dislike links, the indegree sequences for the two link types are found to behave very differently. We also address the question of how to divide the organization properly, based on the concept of happiness defined for each directed relation. For example, two individuals connected by like (dislike) links in both directions are happy if they belong to the same (different) group(s). We then adopt the framework of the q-state Potts model with long-ranged ferromagnetic and antiferromagnetic interactions and discuss the group structure in the organization that minimizes a suitably defined unhappiness.
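The Potts-model formulation amounts to minimizing a count of unhappy relations over group assignments. The sketch below scores each directed like link as unhappy when its endpoints are in different groups, and each dislike link as unhappy when they share a group; this per-link scoring is an assumption about how one-way links are treated, and the brute-force search is only feasible for a handful of people.

```python
from itertools import product


def unhappiness(groups, likes, dislikes):
    """Count unhappy directed relations for a given group assignment."""
    bad = sum(1 for i, j in likes if groups[i] != groups[j])  # friends split apart
    bad += sum(1 for i, j in dislikes if groups[i] == groups[j])  # foes together
    return bad


def best_partition(people, likes, dislikes, q=2):
    """Brute-force minimum over all q-state assignments (tiny networks only)."""
    best = None
    for labels in product(range(q), repeat=len(people)):
        groups = dict(zip(people, labels))
        cost = unhappiness(groups, likes, dislikes)
        if best is None or cost < best[0]:
            best = (cost, groups)
    return best


likes = [("a", "b"), ("b", "a")]
dislikes = [("a", "c"), ("c", "b")]
print(best_partition(["a", "b", "c"], likes, dislikes))
```

For realistic organization sizes the q^N enumeration becomes infeasible, so a stochastic method such as simulated annealing on the same cost would be the natural replacement.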
Barriers to critical thinking: workflow interruptions and task switching among nurses.
Cornell, Paul; Riordan, Monica; Townsend-Gervis, Mary; Mobley, Robin
2011-10-01
Nurses are increasingly called upon to engage in critical thinking. However, current workflow inhibits this goal with frequent task switching and unpredictable demands. To assess workflow's cognitive impact, nurses were observed at 2 hospitals with different patient loads and acuity levels. Workflow on a medical/surgical and pediatric oncology unit was observed, recording tasks, tools, collaborators, and locations. Nineteen nurses were observed for a total of 85.2 hours. Tasks were short with a mean duration of 62.4 and 81.6 seconds on the 2 units. More than 50% of the recorded tasks were less than 30 seconds in length. An analysis of task sequence revealed few patterns and little pairwise repetition. Performance on specific tasks differed between the 2 units, but the character of the workflow was highly similar. The nonrepetitive flow and high amount of switching indicate nurses experience a heavy cognitive load with little uninterrupted time. This implies that nurses rarely have the conditions necessary for critical thinking.
Nguyen, Thanh Giang Thi; Van De, Nguyen; Vercruysse, Jozef; Dorny, Pierre; Le, Thanh Hoa
2009-12-01
Ribosomal RNA sequences (361 or 362 bp) of the second internal transcribed spacer (ITS-2) and a portion of the mitochondrial cox1 gene (423 bp) of Fasciola spp. were characterized for genotypic status and hybridization/introgression. The specimens were collected from indigenous and hybrid goats and sheep in Vietnam. An alignment of 48 ITS-2 sequences (including those from goats and sheep in this study) indicates that F. gigantica and F. hepatica typically differ at seven sites. One of these is a distinguishing deletion (T) at position 327 in F. gigantica relative to F. hepatica. The isolates from mountainous goats in northern Vietnam (Yen Bai province) had an ITS-2 composition nearly identical to that of F. hepatica. The ITS-2 sequences from populations of Fasciola isolates in goats had probably undergone introgression/hybridization, as reported previously in other ruminants and in humans. All Vietnamese goat-derived specimens showed high pairwise identity of mitochondrial cox1 sequences to F. gigantica (97-100%) and lower identity to F. hepatica (91-93%). This suggests that their maternal lineage traces to F. gigantica. Liver fluke populations bearing genetic material from both F. hepatica and F. gigantica, and therefore hybrid and/or introgressed, were present in goats and sheep in Vietnam whether the hosts were indigenous or imported. This appears to be the first such demonstration from a tropical country.
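Pairwise percentage identity, as quoted for the cox1 comparisons, is computed over aligned sequences. The abstract does not state the authors' gap convention. The sketch below assumes one common choice: columns where both sequences have a gap are skipped, and a gap opposite a base counts as a mismatch. The sequences are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity between two equal-length aligned sequences.

    Columns where both sequences have a gap are skipped; a gap opposite a
    base counts as a mismatch (one of several conventions in use).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

print(round(percent_identity("ACGT-A", "ACGTTA"), 2))  # 83.33
```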
QueTAL: a suite of tools to classify and compare TAL effectors functionally and phylogenetically
Pérez-Quintero, Alvaro L.; Lamy, Léo; Gordon, Jonathan L.; Escalon, Aline; Cunnac, Sébastien; Szurek, Boris; Gagnevin, Lionel
2015-01-01
Transcription Activator-Like (TAL) effectors from Xanthomonas plant pathogenic bacteria can bind to the promoter region of plant genes and induce their expression. DNA-binding specificity is governed by a central domain made of nearly identical repeats, each determining the recognition of one base pair via two amino acid residues (a.k.a. Repeat Variable Di-residue, or RVD). Knowing how TAL effectors differ from each other within and between strains would be useful for inferring functional and evolutionary relationships, but their repetitive nature precludes reliable use of traditional alignment methods. The QueTAL suite was therefore developed to offer tailored tools for comparing TAL effector genes. The program DisTAL treats each repeat as a unit, transforms a TAL effector sequence into a sequence of coded repeats, and performs pairwise alignments between these coded sequences to construct trees. The program FuncTAL is aimed at finding TAL effectors with similar DNA-binding capabilities. It calculates correlations between position weight matrices of the potential target DNA sequences predicted from the RVD sequences, and builds trees based on these correlations. The programs accurately represented phylogenetic and functional relationships between TAL effectors using either simulated or literature-curated data. When the programs were applied to a large set of TAL effector sequences, the DisTAL tree largely reflected the expected species phylogeny. In contrast, FuncTAL showed that TAL effectors with similar binding capabilities can be found in phylogenetically distant taxa. This suite will help users to rapidly analyse any TAL effector genes of interest and compare them to other available TAL genes, and should improve our understanding of TAL effector evolution. It is available at http://bioinfo-web.mpl.ird.fr/cgi-bin2/quetal/quetal.cgi. PMID:26284082
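The DisTAL idea of aligning coded repeats rather than residues can be sketched briefly. In this sketch, each distinct repeat string receives an integer code, and the code sequences are scored with a plain Needleman-Wunsch global alignment using linear gaps. The scoring values, the function names and the short repeat labels (placeholders for real repeats of roughly 33-35 residues) are all hypothetical. DisTAL's own repeat-similarity scoring is not described in the abstract.

```python
def encode_repeats(repeats, codebook):
    """Replace each repeat sequence by an integer code, growing the shared codebook."""
    return [codebook.setdefault(rep, len(codebook)) for rep in repeats]

def nw_score(s, t, match=1, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment score of two code sequences, linear gaps."""
    prev = [j * gap for j in range(len(t) + 1)]  # first row: leading gaps in s
    for i in range(1, len(s) + 1):
        cur = [i * gap] + [0] * len(t)
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]

codebook = {}
te1 = encode_repeats(["rA", "rB", "rC", "rD"], codebook)  # [0, 1, 2, 3]
te2 = encode_repeats(["rA", "rB", "rD"], codebook)        # [0, 1, 3]
print(nw_score(te1, te2))  # 1: three matches, one gap opposite repeat "rC"
```

Pairwise scores like this could then be turned into distances for tree building, which is the role the abstract assigns to DisTAL's alignments.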
Simulating protein folding initiation sites using an alpha-carbon-only knowledge-based force field
Buck, Patrick M.; Bystroff, Christopher
2015-01-01
Protein folding is a hierarchical process where structure forms locally first, then globally. Some short sequence segments initiate folding through strong structural preferences that are independent of their three-dimensional context in proteins. We have constructed a knowledge-based force field in which the energy functions are conditional on local sequence patterns, as expressed in the hidden Markov model for local structure (HMMSTR). Carbon-alpha force field (CALF) builds sequence specific statistical potentials based on database frequencies for α-carbon virtual bond opening and dihedral angles, pairwise contacts and hydrogen bond donor-acceptor pairs, and simulates folding via Brownian dynamics. We introduce hydrogen bond donor and acceptor potentials as α-carbon probability fields that are conditional on the predicted local sequence. Constant temperature simulations were carried out using 27 peptides selected as putative folding initiation sites, each 12 residues in length, representing several different local structure motifs. Each 0.6 μs trajectory was clustered based on structure. Simulation convergence or representativeness was assessed by subdividing trajectories and comparing clusters. For 21 of the 27 sequences, the largest cluster made up more than half of the total trajectory. Of these 21 sequences, 14 had cluster centers that were at most 2.6 Å root mean square deviation (RMSD) from their native structure in the corresponding full-length protein. To assess the adequacy of the energy function on nonlocal interactions, 11 full length native structures were relaxed using Brownian dynamics simulations. Equilibrated structures deviated from their native states but retained their overall topology and compactness. A simple potential that folds proteins locally and stabilizes proteins globally may enable a more realistic understanding of hierarchical folding pathways. PMID:19137613
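The 2.6 Å criterion above is a root mean square deviation between corresponding atoms (here, α-carbons). A minimal sketch of the RMSD formula follows. It assumes the two coordinate sets are already optimally superposed. A full comparison would first apply a Kabsch superposition, which is omitted here. The coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z) points.

    Assumes the structures are already optimally superposed (no Kabsch step).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
             for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))

d = rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)])
print(d)          # 1.4142135623730951
print(d <= 2.6)   # True: within the 2.6 Å threshold used in the text
```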
Zhang, Honghai; Chen, Lei
2011-03-01
The dhole (Cuon alpinus) is the only extant species in the genus Cuon (Carnivora: Canidae). In the present study, the complete mitochondrial genome of the dhole was sequenced. Its total length is 16,672 base pairs, the shortest in Canidae. Sequence analysis revealed that most mitochondrial genomic functional regions were highly consistent among canids, except the CSB domain of the control region. The difference in length among Canidae mitochondrial genome sequences is mainly due to the number of short tandem repeats in the CSB domain. Phylogenetic analysis was performed on a concatenated data set of 14 mitochondrial genes from 8 canids using maximum parsimony (MP), maximum likelihood (ML) and Bayesian inference (BI) methods. The genera Vulpes and Nyctereutes formed a sister group and split first within Canidae, followed by Cuon. The divergence within the genus Canis was the most recent. All three topologies fully support the divergence of domestic dogs after that of Canis lupus laniger. Pairwise sequence divergences of different mitochondrial genes among canids were also determined. Apart from synonymous substitutions in protein-coding genes, the control region exhibits the highest sequence divergence. Synonymous rates are approximately two to six times higher than non-synonymous rates, except for a slightly higher non-synonymous substitution rate between Cuon alpinus and Vulpes vulpes. 16S rRNA genes show slightly faster sequence divergence than 12S rRNA and tRNA genes. Divergence times were estimated from nucleotide substitutions in tRNA and rRNA genes, both between the dhole and other canids and between domestic dogs and three subspecies of wolves. The results indicate that Vulpes and Nyctereutes have a close phylogenetic relationship and that Nyctereutes diverged slightly earlier. The Tibetan wolf may be an archaic lineage among wolf subspecies. The genetic distance between wolves and domestic dogs is smaller than that among different subspecies of wolves. The domestication of dogs occurred about 1.56-1.92 million years ago or even earlier.
APOLLO: a quality assessment service for single and multiple protein models.
Wang, Zheng; Eickholt, Jesse; Cheng, Jianlin
2011-06-15
We built a web server named APOLLO, which can evaluate the absolute global and local qualities of a single protein model using machine learning methods, or the global and local qualities of a pool of models using a pair-wise comparison approach. We evaluated it on 107 CASP9 (Critical Assessment of Techniques for Protein Structure Prediction) targets. On these, the quality scores predicted by our machine learning and pair-wise methods have average per-target correlations of 0.671 and 0.917, respectively, with the true model quality scores. In a test on 92 CASP9 targets, our predicted absolute local qualities differ on average by 2.60 Å from the actual distances to the native structure. APOLLO is available at http://sysbio.rnet.missouri.edu/apollo/. Single and pair-wise global quality assessment software is also available at the site.
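The pair-wise approach for a pool of models rests on a consensus idea: a model that resembles many other models in the pool is more likely to be accurate. In the minimal sketch below, each model is scored by its mean similarity to every other model. It assumes a precomputed symmetric similarity matrix with values in [0, 1]. The abstract does not name the structural similarity measure APOLLO uses, so the matrix values and the function name are hypothetical.

```python
def pairwise_consensus_scores(similarity):
    """Score each model by its mean similarity to every other model in the pool.

    similarity[i][j] is an assumed structural similarity in [0, 1] between
    models i and j; the diagonal is ignored.
    """
    n = len(similarity)
    if n < 2:
        raise ValueError("need at least two models")
    return [sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
            for i in range(n)]

sim = [[1.0, 0.8, 0.6],
       [0.8, 1.0, 0.4],
       [0.6, 0.4, 1.0]]
scores = pairwise_consensus_scores(sim)  # approximately [0.7, 0.6, 0.5]
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

The scoring loop is quadratic in the pool size. It also explains why this approach needs a pool of models, while a single model requires the machine learning route.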
Amino Acid Properties Conserved in Molecular Evolution
Rudnicki, Witold R.; Mroczek, Teresa; Cudek, Paweł
2014-01-01
That amino acid properties are responsible for the way protein molecules evolve is natural. It is also reasonably well supported both by the structure of the genetic code and, to a large extent, by experimental measures of amino acid similarity. Nevertheless, a significant gap remains between observed similarity matrices and their reconstructions from amino acid properties. We therefore introduce a simple theoretical model of amino acid similarity matrices, which splits the matrix into two parts: one that depends only on the mutabilities of amino acids and another that depends on pairwise similarities between them. New synthetic amino acid properties are then derived from the pairwise similarities and used to reconstruct similarity matrices covering a wide range of information entropies. Our model explains up to 94% of the variability in the BLOSUM family of amino acid similarity matrices in terms of amino acid properties. The new properties derived from amino acid similarity matrices correlate highly with properties known to be important for molecular evolution, such as hydrophobicity, size, shape and charge of amino acids. This result closes the gap in our understanding of the influence of amino acids on evolution at the molecular level. The method was applied to a single family of similarity matrices often used in general sequence homology searches, but it is general and can also be used for more specific matrices. The new synthetic properties can be used in analyses of protein sequences in various biological applications. PMID:24967708
Flexbar 3.0 - SIMD and multicore parallelization.
Roehr, Johannes T; Dieterich, Christoph; Reinert, Knut
2017-09-15
High-throughput sequencing machines can process many samples in a single run. For Illumina systems, sequencing reads are barcoded with an additional DNA tag that is contained in the respective sequencing adapters. The recognition of barcode and adapter sequences is hence commonly needed for the analysis of next-generation sequencing data. Flexbar performs barcode-based demultiplexing and adapter trimming for such data. The massive amounts of data generated on modern sequencing machines demand that this preprocessing be done as efficiently as possible. We present Flexbar 3.0, the successor of the popular program Flexbar. It now employs two kinds of parallelism: multi-threading and SIMD vectorization. Both are used to speed up the computation of pair-wise sequence alignments, which are used to detect barcodes and adapters. Furthermore, new features were included to cover a wide range of applications. We evaluated the performance of Flexbar on a simulated sequencing dataset. Our program outperforms other tools in terms of speed and is among the best tools in the presented quality benchmark. Flexbar is available at https://github.com/seqan/flexbar. Contact: johannes.roehr@fu-berlin.de or knut.reinert@fu-berlin.de. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
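The following example illustrates what adapter detection has to decide: where, if anywhere, the adapter begins at the 3' end of a read. It is a simplified sketch and is not Flexbar's algorithm. Flexbar computes pair-wise alignments (vectorized with SIMD), whereas this scan allows mismatches but no indels. The error-rate rule, the default values and the function name are assumptions. The adapter string is a placeholder chosen for illustration.

```python
def trim_3prime_adapter(read, adapter, max_error_rate=0.1, min_overlap=3):
    """Remove a 3' adapter (or a prefix of it) from a read.

    Scans candidate start positions left to right and cuts at the first one where
    the read suffix and the adapter prefix differ in at most
    int(max_error_rate * overlap) positions. Mismatches only; no indels.
    """
    for start in range(len(read) - min_overlap + 1):
        overlap = min(len(read) - start, len(adapter))
        segment = read[start:start + overlap]
        mismatches = sum(1 for a, b in zip(segment, adapter) if a != b)
        if mismatches <= int(max_error_rate * overlap):
            return read[:start]
    return read

adapter = "AGATCGGAAGAGC"  # adapter sequence chosen for this example
print(trim_3prime_adapter("ACGTACGTAGATCGGAAG", adapter))  # ACGTACGT
```

A small `min_overlap` also trims short chance matches at the read end. Real tools balance this against sensitivity, which is one reason they use full alignments and quality information.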
libgapmis: extending short-read alignments
2013-01-01
Background A wide variety of short-read alignment programmes have been published recently to tackle the problem of mapping millions of short reads to a reference genome, focusing on different aspects of the procedure such as time and memory efficiency, sensitivity, and accuracy. These tools allow for a small number of mismatches in the alignment; however, their ability to allow for gaps varies greatly, with many performing poorly or not allowing them at all. The seed-and-extend strategy is applied in most short-read alignment programmes. First, a substring of the reference sequence is aligned against the high-quality prefix of a short read (the seed). An important problem is then to find the best possible alignment between the substring of the reference sequence that follows and the remaining low-quality suffix of the read (the extension). Reads are rather short, and various studies observe a rather low gap occurrence frequency. Together, these suggest that aligning (parts of) those reads with a single gap is in fact desirable. Results In this article, we present libgapmis, a library for extending pairwise short-read alignments. Apart from the standard CPU version, it includes ultrafast SSE- and GPU-based implementations. libgapmis is based on an algorithm that computes a modified version of the traditional dynamic-programming matrix for sequence alignment. Extensive experimental results demonstrate that the functions of the CPU version provided in this library accelerate the computations by a factor of 20 compared to other programmes. The analogous SSE- and GPU-based implementations accelerate the computations by factors of 6 and 11, respectively, compared to the CPU version. The library also gives the user the flexibility to split the read into fragments, based on the observed gap occurrence frequency and the length of the read, thereby allowing for a variable, but bounded, number of gaps in the alignment.
Conclusions We present libgapmis, a library for extending pairwise short-read alignments. We show that libgapmis is better-suited and more efficient than existing algorithms for this task. The importance of our contribution is underlined by the fact that the provided functions may be seamlessly integrated into any short-read alignment pipeline. The open-source code of libgapmis is available at http://www.exelixis-lab.org/gapmis. PMID:24564250
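The single-gap extension problem can be stated compactly in code. The sketch below is a brute-force, cost-minimizing formulation: the whole read is aligned end to end against a prefix of the reference, with at most one gap of bounded length in either sequence. It runs in O(max_gap · n²) time and is not libgapmis's dynamic-programming algorithm. The cost values, the parameter names and the function name are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)

def one_gap_extend(read, ref, max_gap, mismatch=1.0, gap_open=2.0, gap_ext=0.5):
    """Cheapest alignment of the whole read to a prefix of ref with at most one gap.

    Returns (cost, gap_len, gap_pos, gap_in), where gap_in is "read" (reference
    characters skipped), "ref" (read characters left unaligned) or None.
    """
    n = len(read)
    best = None
    if len(ref) >= n:
        best = (mismatch * hamming(read, ref[:n]), 0, None, None)
    for length in range(1, max_gap + 1):
        gap_cost = gap_open + (length - 1) * gap_ext
        # Gap in the read: skip `length` reference characters after position p.
        if len(ref) >= n + length:
            for p in range(n + 1):
                cost = gap_cost + mismatch * hamming(read, ref[:p] + ref[p + length:n + length])
                if best is None or cost < best[0]:
                    best = (cost, length, p, "read")
        # Gap in the reference: leave `length` read characters after position p unaligned.
        if n > length and len(ref) >= n - length:
            for p in range(n - length + 1):
                cost = gap_cost + mismatch * hamming(read[:p] + read[p + length:], ref[:n - length])
                if best is None or cost < best[0]:
                    best = (cost, length, p, "ref")
    return best

# The read lacks the "TT" found in the reference; one 2-base gap beats 3 mismatches.
print(one_gap_extend("ACGTGCA", "ACGTTTGCA", max_gap=2))  # (2.5, 2, 3, 'read')
```

Bounding the gap count to one keeps the search space small. This is the same observation the abstract uses to justify single-gap extension.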
Fast alignment-free sequence comparison using spaced-word frequencies.
Leimeister, Chris-Andre; Boden, Marcus; Horwege, Sebastian; Lindner, Sebastian; Morgenstern, Burkhard
2014-07-15
Alignment-free methods for sequence comparison are increasingly used for genome analysis and phylogeny reconstruction; they circumvent various difficulties of traditional alignment-based approaches. In particular, alignment-free methods are much faster than pairwise or multiple alignments. They are, however, less accurate than methods based on sequence alignment. Most alignment-free approaches work by comparing the word composition of sequences. A well-known problem with these methods is that neighbouring word matches are far from independent. To reduce the statistical dependency between adjacent word matches, we propose to use 'spaced words', defined by patterns of 'match' and 'don't care' positions, for alignment-free sequence comparison. We describe a fast implementation of this approach using recursive hashing and bit operations, and we show that further improvements can be achieved by using multiple patterns instead of single patterns. To evaluate our approach, we use spaced-word frequencies as a basis for fast phylogeny reconstruction. Using real-world and simulated sequence data, we demonstrate that our multiple-pattern approach produces better phylogenies than approaches relying on contiguous words. Our program is freely available at http://spaced.gobics.de/. © The Author 2014. Published by Oxford University Press.
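A spaced word is read through a pattern of match ('1') and don't-care ('0') positions. The sketch below extracts spaced words, converts them to relative frequencies, and averages a distance over several patterns, as in the multiple-pattern variant. Two parts are assumptions for illustration. First, it uses plain strings and `collections.Counter` rather than the recursive hashing and bit operations the abstract describes. Second, it uses Euclidean distance between frequency vectors. The patterns and sequences are hypothetical.

```python
import math
from collections import Counter

def spaced_words(seq, pattern):
    """Yield the spaced word at every position: keep the '1' (match) positions of the pattern."""
    keep = [i for i, c in enumerate(pattern) if c == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        yield "".join(seq[start + i] for i in keep)

def spaced_word_freqs(seq, pattern):
    """Relative frequency of each spaced word (empty if seq is shorter than pattern)."""
    counts = Counter(spaced_words(seq, pattern))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def spaced_distance(a, b, patterns):
    """Average, over patterns, of the Euclidean distance between spaced-word frequencies."""
    total = 0.0
    for p in patterns:
        fa, fb = spaced_word_freqs(a, p), spaced_word_freqs(b, p)
        words = set(fa) | set(fb)
        total += math.sqrt(sum((fa.get(w, 0.0) - fb.get(w, 0.0)) ** 2 for w in words))
    return total / len(patterns)

print(list(spaced_words("ACGTA", "101")))  # ['AG', 'CT', 'GA']
print(spaced_distance("ACGTACGT", "ACGTACGT", ["1101", "1011"]))  # 0.0
```

Neighbouring spaced words share fewer sampled positions than neighbouring contiguous words do. This is the reduced statistical dependency that motivates the approach.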
NASA Astrophysics Data System (ADS)
Zhang, Tianzhen; Wang, Xiumei; Gao, Xinbo
2018-04-01
Many datasets today are described by multiple views, which usually contain shared and complementary information. Multi-view clustering methods integrate the information from multiple views to obtain better clustering results. Nonnegative matrix factorization has become an essential and popular tool in clustering because of its interpretability. However, existing nonnegative matrix factorization based multi-view clustering algorithms do not consider the disagreement between views, and they neglect the fact that different views contribute differently to the data distribution. In this paper, we propose a new multi-view clustering method, named adaptive multi-view clustering based on nonnegative matrix factorization and pairwise co-regularization. The proposed algorithm obtains a parts-based representation of multi-view data by nonnegative matrix factorization. Then, pairwise co-regularization is used to measure the disagreement between views. A single parameter governs the automatic learning of the view weights according to each view's contribution to the data distribution. Experimental results show that the proposed algorithm outperforms several state-of-the-art algorithms for multi-view clustering.
Sequence quality analysis tool for HIV type 1 protease and reverse transcriptase.
Delong, Allison K; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W; Kantor, Rami
2012-08-01
Access to antiretroviral therapy is increasing globally, and drug resistance evolution is anticipated. Protease (PR) and reverse transcriptase (RT) sequence generation is currently increasing, including through in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802 PR and 44,432 RT sequences) from the published literature ( http://hivdb.Stanford.edu ). Nucleic acid sequences are read into SQUAT, identified, aligned, and translated. A sequence is flagged if it meets any of the following criteria:

- more than five 1-2-base insertions, more than one 3-base insertion, or more than one deletion;
- more than six (PR) or 18 (RT) ambiguous bases;
- more than three (PR) or four (RT) consecutive nucleic acid mutations;
- any stop codon;
- more than three (PR) or six (RT) ambiguous amino acids;
- more than three (PR) or four (RT) consecutive amino acid mutations;
- any unique amino acid;
- a genetic distance of <0.5% or >15% from another submitted sequence.

Thresholds are user modifiable. SQUAT output includes a summary report with detailed comments for troubleshooting flagged sequences, histograms of pairwise genetic distances, neighbor-joining phylogenetic trees, and aligned nucleic and amino acid sequences. SQUAT is a stand-alone, free, web-independent tool. It helps ensure that only high-quality HIV PR/RT sequences are used to interpret and report drug resistance, while increasing awareness and expertise and facilitating troubleshooting of potentially problematic sequences.
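The threshold rules listed above translate directly into checks. SQUAT itself runs in R; Python is used here only for consistency with the other examples. The sketch covers just three of the listed criteria: ambiguous bases, stop codons and the genetic-distance bounds. It also makes three assumptions. The input is already aligned and in frame. "From another submitted sequence" means any one of the pairwise distances. Ambiguity uses the standard IUPAC codes. The function name and flag labels are hypothetical.

```python
AMBIGUOUS_BASES = set("RYKMSWBDHVN")  # IUPAC nucleotide ambiguity codes
STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_AMBIGUOUS = {"PR": 6, "RT": 18}   # thresholds quoted in the text

def quality_flags(nt_seq, region, distances):
    """Return a subset of SQUAT-style flags for one in-frame nucleotide sequence.

    region:    "PR" or "RT"
    distances: pairwise genetic distances (proportions) to the other submitted sequences
    """
    seq = nt_seq.upper()
    flags = []
    if sum(1 for b in seq if b in AMBIGUOUS_BASES) > MAX_AMBIGUOUS[region]:
        flags.append("ambiguous_bases")
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    if any(c in STOP_CODONS for c in codons):
        flags.append("stop_codons")  # the threshold is ">zero stop codons"
    if any(d < 0.005 for d in distances):
        flags.append("distance_too_low")   # <0.5%
    if any(d > 0.15 for d in distances):
        flags.append("distance_too_high")  # >15%
    return flags

print(quality_flags("ATGTAA", "PR", [0.05]))  # ['stop_codons']
```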
Consistency-based rectification of nonrigid registrations
Gass, Tobias; Székely, Gábor; Goksel, Orcun
2015-01-01
We present a technique to rectify nonrigid registrations by improving their group-wise consistency, which is a widely used unsupervised measure for assessing pair-wise registration quality. While pair-wise registration methods cannot guarantee any group-wise consistency, group-wise approaches typically enforce perfect consistency by registering all images to a common reference. However, errors in individual registrations to the reference then propagate, distorting the mean and accumulating in the pair-wise registrations inferred via the reference. Furthermore, the assumption that perfect correspondences exist is not always true, e.g., for interpatient registration. The proposed consistency-based registration rectification (CBRR) method addresses these issues by minimizing the group-wise inconsistency of all pair-wise registrations using a regularized least-squares algorithm. The regularization controls the adherence to the original registration, which is additionally weighted by the local postregistration similarity. This allows CBRR to adaptively improve consistency while locally preserving accurate pair-wise registrations. We show that the resulting registrations are not only more consistent but also have lower average transformation error when compared to known transformations in simulated data. On clinical data, we show improvements of up to 50% in target registration error for breathing motion estimation from four-dimensional MRI. We also show improvements of up to 65% in atlas-based segmentation quality, in terms of mean surface distance, in three-dimensional (3-D) CT. Such improvement was observed consistently across different registration algorithms, dimensionalities (two-dimensional/3-D), and modalities (MRI/CT). PMID:26158083
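Group-wise consistency can be illustrated in the simplest possible setting: 1-D translations, where `t[i][j]` is the displacement registering image i to image j. In this setting, consistency means that every cycle closes (`t[i][j] + t[j][k] == t[i][k]`). The sketch below measures the mean cycle error. It then shows the unregularized least-squares projection onto perfectly consistent translations. That projection enforces perfect consistency, as group-wise approaches do. It is not CBRR, which additionally regularizes toward the original registrations and weights them by local similarity. The translation values are hypothetical.

```python
from itertools import permutations

def inconsistency(t):
    """Mean absolute cycle error |t[i][j] + t[j][k] - t[i][k]| over ordered triplets.

    t[i][j] is a scalar (1-D translation) registration from image i to image j.
    A perfectly consistent set of registrations gives 0.
    """
    n = len(t)
    errors = [abs(t[i][j] + t[j][k] - t[i][k]) for i, j, k in permutations(range(n), 3)]
    return sum(errors) / len(errors)

def project_to_consistent(t):
    """Unregularized least-squares projection onto perfectly consistent translations.

    For antisymmetric t (zero diagonal) on a complete graph, the minimizer of
    sum_{i<j} (x[j] - x[i] - t[i][j])**2 with mean-zero x is x[i] = mean_k t[k][i].
    """
    n = len(t)
    x = [sum(t[k][i] for k in range(n)) / n for i in range(n)]
    return [[x[j] - x[i] for j in range(n)] for i in range(n)]

t = [[0.0, 1.0, 2.5],
     [-1.0, 0.0, 1.0],
     [-2.5, -1.0, 0.0]]
print(inconsistency(t))  # 0.5
rectified = project_to_consistent(t)
print(inconsistency(rectified) < 1e-12)  # True
```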
From pairwise to group interactions in games of cyclic dominance.
Szolnoki, Attila; Vukov, Jeromos; Perc, Matjaž
2014-06-01
We study the rock-paper-scissors game in structured populations, where the invasion rates determine individual payoffs that govern the process of strategy change. The traditional version of the game is recovered if the payoffs for each potential invasion stem from a single pairwise interaction. However, the transformation of invasion rates to payoffs also allows the usage of larger interaction ranges. In addition to the traditional pairwise interaction, we therefore consider simultaneous interactions with all nearest neighbors, as well as with all nearest and next-nearest neighbors, thus effectively going from single pair to group interactions in games of cyclic dominance. We show that differences in the interaction range affect not only the stationary fractions of strategies but also their relations of dominance. The transition from pairwise to group interactions can thus decelerate and even reverse the direction of the invasion between the competing strategies. As in evolutionary social dilemmas, the indirect multipoint interactions that arise from group interactions thus play a pivotal role in games of cyclic dominance too. Our results indicate that, in addition to the invasion rates, the interaction range is at least as important for the maintenance of biodiversity among cyclically competing strategies.
A critical analysis of computational protein design with sparse residue interaction graphs
Georgiev, Ivelin S.
2017-01-01
Protein design algorithms enumerate a combinatorial number of candidate structures to compute the Global Minimum Energy Conformation (GMEC). To efficiently find the GMEC, protein design algorithms must methodically reduce the conformational search space. By applying distance and energy cutoffs, the protein system to be designed can thus be represented using a sparse residue interaction graph. In such a graph, the number of interacting residue pairs is smaller than the total number of pairs of mutable residues, and the corresponding GMEC is called the sparse GMEC. However, ignoring some pairwise residue interactions can change the energy, conformation, or sequence of the sparse GMEC relative to the original, full GMEC. Despite the widespread use of sparse residue interaction graphs in protein design, the above-mentioned effects of their use have not been previously analyzed. To analyze the costs and benefits of designing with sparse residue interaction graphs, we computed the GMECs for 136 different protein design problems both with and without distance and energy cutoffs, and compared their energies, conformations, and sequences. Our analysis shows that the differences between the GMECs depend critically on whether or not the design includes core, boundary, or surface residues. Moreover, neglecting long-range interactions can alter local interactions and introduce large sequence differences, both of which can result in significant structural and functional changes. Designs on proteins with experimentally measured thermostability show that it is beneficial to compute both the full and the sparse GMEC accurately and efficiently. To this end, we show that a provable, ensemble-based algorithm can efficiently compute both GMECs by enumerating a small number of conformations, usually fewer than 1000. This provides a novel way to combine sparse residue interaction graphs with provable, ensemble-based algorithms to reap the benefits of sparse residue interaction graphs while avoiding their potential inaccuracies. PMID:28358804
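A toy example shows how dropping a pair interaction can change the GMEC. The sketch below uses a pairwise energy model: self energies per rotamer plus pair energies for the residue pairs kept in the interaction graph. It finds the GMEC by exhaustive enumeration, first with all pairs and then with one long-range pair removed. The energies and the choice of dropped pair are hypothetical. Brute force stands in for the provable, ensemble-based algorithm of the study, which this sketch does not implement.

```python
from itertools import product

def total_energy(conf, self_e, pair_e, pairs):
    """Energy of a rotamer assignment using only the listed residue pairs."""
    energy = sum(self_e[i][r] for i, r in enumerate(conf))
    for i, j in pairs:
        energy += pair_e[(i, j)][conf[i]][conf[j]]
    return energy

def gmec(self_e, pair_e, pairs):
    """Exhaustively find the minimum-energy rotamer assignment (tiny systems only)."""
    best = None
    for conf in product(*(range(len(rotamers)) for rotamers in self_e)):
        energy = total_energy(conf, self_e, pair_e, pairs)
        if best is None or energy < best[0]:
            best = (energy, conf)
    return best

# Hypothetical toy system: 3 residues with 2 rotamers each.
self_e = [[0.0, 0.5], [0.0, 0.2], [0.0, 0.0]]
pair_e = {
    (0, 1): [[-1.0, 0.0], [0.0, -1.0]],
    (0, 2): [[2.0, 0.0], [0.0, 0.0]],  # long-range clash: rotamer 0 of residues 0 and 2
    (1, 2): [[0.0, 0.0], [0.0, 0.0]],
}
full_pairs = [(0, 1), (0, 2), (1, 2)]
sparse_pairs = [(0, 1), (1, 2)]  # (0, 2) dropped by a hypothetical distance cutoff

print(gmec(self_e, pair_e, full_pairs))    # (-1.0, (0, 0, 1))
print(gmec(self_e, pair_e, sparse_pairs))  # (-1.0, (0, 0, 0))
print(total_energy((0, 0, 0), self_e, pair_e, full_pairs))  # 1.0
```

The sparse GMEC places residue 2 in a different rotamer. Under the full energy function, that conformation scores 1.0 instead of -1.0, a small-scale version of the conformational changes the abstract reports.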
Sela, Itamar; Ashkenazy, Haim; Katoh, Kazutaka; Pupko, Tal
2015-07-01
Inference of multiple sequence alignments (MSAs) is a critical part of phylogenetic and comparative genomics studies. However, from the same set of sequences different MSAs are often inferred, depending on the methodologies used and the assumed parameters. Much effort has recently been devoted to improving the ability to identify unreliable alignment regions. Detecting such unreliable regions was previously shown to be important for downstream analyses relying on MSAs, such as the detection of positive selection. Here we developed GUIDANCE2, a new integrative methodology that accounts for: (i) uncertainty in the process of indel formation, (ii) uncertainty in the assumed guide tree and (iii) co-optimal solutions in the pairwise alignments, used as building blocks in progressive alignment algorithms. We compared GUIDANCE2 with seven methodologies to detect unreliable MSA regions using extensive simulations and empirical benchmarks. We show that GUIDANCE2 outperforms all previously developed methodologies. Furthermore, GUIDANCE2 also provides a set of alternative MSAs which can be useful for downstream analyses. The novel algorithm is implemented as a web-server, available at: http://guidance.tau.ac.il. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
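Reliability scores of this kind rest on one idea: a residue pair aligned in a base MSA is trustworthy if alternative MSAs reproduce it. The simplified sketch below scores each base-alignment column by the mean fraction of alternative MSAs that also align its residue pairs. It takes the alternative MSAs as given input. Generating them (by perturbing guide trees, indel-formation assumptions and co-optimal pairwise alignments) is GUIDANCE2's contribution and is not reproduced here. The alignments and function names are hypothetical.

```python
from itertools import combinations

def aligned_pairs(msa):
    """For each column, the set of residue pairs ((seq, index), (seq, index)) it aligns."""
    counters = [0] * len(msa)  # residue index reached so far in each sequence
    columns = []
    for c in range(len(msa[0])):
        residues = []
        for s, row in enumerate(msa):
            if row[c] != "-":
                residues.append((s, counters[s]))
                counters[s] += 1
        columns.append(set(combinations(residues, 2)))
    return columns

def column_scores(base, alternatives):
    """Mean support of each base column's residue pairs across alternative MSAs."""
    alt_sets = []
    for alt in alternatives:
        pairs = set()
        for col in aligned_pairs(alt):
            pairs |= col
        alt_sets.append(pairs)
    scores = []
    for col in aligned_pairs(base):
        if not col:
            scores.append(None)  # fewer than two residues: nothing to score
            continue
        support = [sum(p in a for a in alt_sets) / len(alt_sets) for p in col]
        scores.append(sum(support) / len(support))
    return scores

base = ["ACG", "ACG"]
alternatives = [["ACG", "ACG"], ["AC-G", "A-CG"]]
print(column_scores(base, alternatives))  # [1.0, 0.5, 1.0]
```

Columns with low scores are the candidates for masking before downstream analyses such as tests for positive selection.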
Nanoscale swimmers: hydrodynamic interactions and propulsion of molecular machines
NASA Astrophysics Data System (ADS)
Sakaue, T.; Kapral, R.; Mikhailov, A. S.
2010-06-01
Molecular machines execute nearly regular cyclic conformational changes as a result of ligand binding and product release. This cyclic conformational dynamics is generally non-reciprocal, so that under time reversal a different sequence of machine conformations is visited. Since such changes occur in a solvent, coupling to solvent hydrodynamic modes will generally result in self-propulsion of the molecular machine. These effects are investigated for a class of coarse-grained models of protein machines consisting of a set of beads interacting through pair-wise additive potentials. Hydrodynamic effects are incorporated through a configuration-dependent mobility tensor, and expressions for the linear and angular propulsion velocities, as well as the stall force, are obtained. In the limit where conformational changes are small enough for linear response theory to apply, propulsion is shown to be exponentially small; thus, propulsion is a nonlinear phenomenon. The results are illustrated by computations on a simple model molecular machine.
Three Divergent Subpopulations of the Malaria Parasite Plasmodium knowlesi
Lin, Lee C.; Rovie-Ryan, Jeffrine J.; Kadir, Khamisah A.; Anderios, Fread; Hisam, Shamilah; Sharma, Reuben S.K.; Singh, Balbir; Conway, David J.
2017-01-01
Multilocus microsatellite genotyping of Plasmodium knowlesi isolates previously indicated 2 divergent parasite subpopulations in humans on the island of Borneo, each associated with a different macaque reservoir host species. Geographic divergence was also apparent, and independent sequence data have indicated particularly deep divergence between parasites from mainland Southeast Asia and Borneo. To resolve the overall population structure, multilocus microsatellite genotyping was conducted on a new sample of 182 P. knowlesi infections (obtained from 134 humans and 48 wild macaques) from diverse areas of Malaysia, first analyzed separately and then in combination with previous data. All analyses confirmed 2 divergent clusters of human cases in Malaysian Borneo, associated with long-tailed macaques and pig-tailed macaques, and a third cluster in humans and most macaques in peninsular Malaysia. High levels of pairwise divergence between each of these sympatric and allopatric subpopulations have implications for the epidemiology and control of this zoonotic species. PMID:28322705
Morin, Ryan D.; Chang, Elbert; Petrescu, Anca; Liao, Nancy; Griffith, Malachi; Kirkpatrick, Robert; Butterfield, Yaron S.; Young, Alice C.; Stott, Jeffrey; Barber, Sarah; Babakaiff, Ryan; Dickson, Mark C.; Matsuo, Corey; Wong, David; Yang, George S.; Smailus, Duane E.; Wetherby, Keith D.; Kwong, Peggy N.; Grimwood, Jane; Brinkley, Charles P.; Brown-John, Mabel; Reddix-Dugue, Natalie D.; Mayo, Michael; Schmutz, Jeremy; Beland, Jaclyn; Park, Morgan; Gibson, Susan; Olson, Teika; Bouffard, Gerard G.; Tsai, Miranda; Featherstone, Ruth; Chand, Steve; Siddiqui, Asim S.; Jang, Wonhee; Lee, Ed; Klein, Steven L.; Blakesley, Robert W.; Zeeberg, Barry R.; Narasimhan, Sudarshan; Weinstein, John N.; Pennacchio, Christa Prange; Myers, Richard M.; Green, Eric D.; Wagner, Lukas; Gerhard, Daniela S.; Marra, Marco A.; Jones, Steven J.M.; Holt, Robert A.
2006-01-01
Sequencing of full-insert clones from full-length cDNA libraries from both Xenopus laevis and Xenopus tropicalis has been ongoing as part of the Xenopus Gene Collection Initiative. Here we present 10,967 full-ORF-verified cDNA clones (8049 from X. laevis and 2918 from X. tropicalis) as a community resource. The genome of X. laevis, but not that of X. tropicalis, has undergone allotetraploidization. Comparing coding sequences from these two clawed (pipid) frogs therefore provides a unique angle for exploring the molecular evolution of duplicate genes. Within our clone set, we have identified 445 gene trios, each comprising an allotetraploidization-derived X. laevis gene pair and their shared X. tropicalis ortholog. Pairwise dN/dS comparisons within trios show strong evidence for purifying selection acting on all three members. However, dN/dS ratios between X. laevis gene pairs are elevated relative to their X. tropicalis ortholog. This difference is highly significant and indicates an overall relaxation of selective pressures on duplicated gene pairs. We have found that the paralogs lost since the tetraploidization event are enriched for several molecular functions, but have found no such enrichment in the extant paralogs. Approximately 14% of the paralogous pairs analyzed here also show differential expression indicative of subfunctionalization. PMID:16672307
Xu, Shengyong; Song, Na; Lu, Zhichuang; Wang, Jun; Cai, Shanshan; Gao, Tianxiang
2014-06-01
Scaly hair-fin anchovy (Setipinna tenuifilis) is a small, pelagic, economically important species widely distributed in Chinese coastal waters. However, S. tenuifilis resources have declined due to overfishing. For better fishery management, it is necessary to understand the biogeographic pattern of S. tenuifilis. Genetic analyses were conducted to detect population genetic variation. A total of 153 individuals from 7 locations (Dongying, Yantai, Qingdao, Nantong, Wenzhou, Xiamen and Beibu Bay) were sequenced at the 5' end of the mtDNA control region. A 39-bp tandem repeat sequence was found at the 5' end of the segment, and tandem-repeat polymorphism was detected among the 7 populations. Both mismatch distribution analysis and neutrality tests showed that S. tenuifilis had experienced a recent population expansion. The topologies of the neighbor-joining tree and the Bayesian evolutionary tree showed no significant genealogical branches or clusters of samples corresponding to sampling locality. Hierarchical analysis of molecular variance and conventional pairwise population Fst values at the group level implied possible genetic divergence between a southern group (populations WZ, XM and BB) and a northern group (populations DY, YT, QD and NT). We concluded that there might be three different fishery management groups of S. tenuifilis, and that late Pleistocene glacial events might have had a crucial effect on the present-day demography of S. tenuifilis in this region.
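Pairwise differentiation between two populations can be illustrated with a frequency-based index. The abstract does not say which Fst estimator was used; software for such analyses often uses sequence-aware estimators. The sketch below instead computes Nei's G_ST, G_ST = (H_T - H_S) / H_T, from haplotype frequencies alone, with both populations weighted equally. It is a simpler analogue, not the study's method. The haplotype labels and samples are hypothetical.

```python
from collections import Counter

def haplotype_freqs(haplotypes):
    """Relative frequency of each haplotype label in a sample."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return {h: c / n for h, c in counts.items()}

def gst(freqs_a, freqs_b):
    """Nei's G_ST for two equally weighted populations: (H_T - H_S) / H_T."""
    haps = set(freqs_a) | set(freqs_b)
    # H_S: mean within-population haplotype diversity.
    h_s = sum(1.0 - sum(f.get(h, 0.0) ** 2 for h in haps) for f in (freqs_a, freqs_b)) / 2
    # H_T: diversity of the pooled (averaged) frequencies.
    mean = [(freqs_a.get(h, 0.0) + freqs_b.get(h, 0.0)) / 2 for h in haps]
    h_t = 1.0 - sum(p ** 2 for p in mean)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

north = haplotype_freqs(["h1", "h1", "h2", "h2"])
south = haplotype_freqs(["h1", "h1", "h1", "h1"])
print(round(gst(north, south), 3))  # 0.333
```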
Ter-Voskanyan, Hasmik; Allgaier, Martin; Borsch, Thomas
2014-01-01
Plastid genomes exhibit different levels of variability in their sequences, depending on the respective kinds of genomic regions. Genes are usually more conserved, while noncoding introns and spacers evolve at a faster pace. While a set of about thirty maximally variable noncoding genomic regions has been suggested to provide universally promising phylogenetic markers throughout angiosperms, applications often require several regions to be sequenced for many individuals. Our project aims to illuminate evolutionary relationships and species limits in the genus Pyrus (Rosaceae), a typical case with very low genetic distances between taxa. In this study, we have sequenced the plastid genome of Pyrus spinosa and aligned it to the already available P. pyrifolia sequence. The overall p-distance of the two Pyrus genomes was 0.00145. The intergenic spacers between ndhC–trnV, trnR–atpA, ndhF–rpl32, psbM–trnD, and trnQ–rps16 were the most variable regions, also comprising the highest total numbers of substitutions, indels and inversions (potentially informative characters). Our comparative analysis of further plastid genome pairs with similarly low p-distances from Oenothera (representing another rosid), Olea (asterids) and Cymbidium (monocots) showed in each case a different ranking of genomic regions in terms of variability and potentially informative characters. Only two intergenic spacers (ndhF–rpl32 and trnK–rps16) were consistently found among the 30 top-ranked regions. We have mapped the occurrence of substitutions and microstructural mutations in the four genome pairs. High AT content in specific sequence elements seems to foster frequent mutations. We conclude that the variability among the fastest evolving plastid genomic regions is lineage-specific and thus cannot be precisely predicted across angiosperms. The often lineage-specific occurrence of stem-loop elements in the sequences of introns and spacers also governs lineage-specific mutations. 
Sequencing whole plastid genomes to find markers for evolutionary analyses is therefore particularly useful when overall genetic distances are low. PMID:25405773
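The p-distance reported above is, in its usual definition, the proportion of aligned sites at which two sequences differ. The following is a minimal Python sketch, assuming two pre-aligned sequences of equal length and assuming that columns with a gap in either sequence are skipped; how the study itself treated indels is not stated here, and `p_distance` is a hypothetical helper name.

```python
def p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Assumption: columns where either sequence has a gap are skipped
    (pairwise deletion), so only ungapped columns are compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return differences / compared


# Toy example: 1 difference over 8 ungapped sites.
print(p_distance("ACGT-ACGT", "ACGTTACGA"))  # 0.125
```

Under this definition, a whole-genome value such as 0.00145 corresponds to roughly 1.45 differences per 1000 compared sites.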
AbuOdeh, Raed; Al-Mawlawi, Naema; Al-Qahtani, Ahmed A; Bohol, Marie Fe F; Al-Ahdal, Mohammed N; Hasan, Haydar A; AbuOdeh, Lamees; Nasrallah, Gheyath K
2015-07-01
Torque Teno virus (TTV) has been associated with non-A-G hepatitis. The goal of this study was to estimate the infection rates and genotypic characteristics of TTV in the State of Qatar. A total of 644 blood samples representing different nationalities, (i) Qatari (118) and (ii) non-Qatari (526) nationals (mostly from Arab and South East Asian countries), were tested for the presence of TTV DNA by nested PCR. The majority (573) of the blood samples belonged to healthy blood donors, whereas 54 and 53 of the blood samples belonged to patients infected with hepatitis B virus (HBV) and hepatitis C virus (HCV), respectively. The results showed that the TTV infection rates in healthy blood donors and in patients infected with HBV or HCV were 81.4, 90.75 and 84.9%, respectively. A significant association between TTV viremia and age or nationality was observed. Sequence analysis of PCR fragments amplified from the 5'-untranslated region (5'-UTR) of all (531) TTV-positive samples showed that 65.5% (348/531) of the PCR fragment sequences were classified into main genogroup 3, followed by main genogroups 5 (24%), 2 (5.8%) and 1 (4.7%). Genogroup 4 was not detected among our study subjects. Phylogenetic and pairwise analyses using sequences from TTV-viremic samples also showed an overall close similarity to main genogroup 3. In conclusion, there was no significant difference in the rates of TTV detection between Qataris and non-Qataris, and several genotypes, mainly genotype 3, were isolated. © 2015 Wiley Periodicals, Inc.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wallace, W.; Attaway, H.
1995-12-31
Perchlorate and chlorate salts are widely used by the chemical, aerospace and defense industries as oxidizers in propellants, explosives and pyrotechnics. The authors have isolated an anaerobic bacterium that is capable of the dissimilatory reduction of both perchlorate and chlorate for energy and growth. Strain HAP-1 is a gram-negative, thin, non-spore-forming, highly motile, strictly anaerobic rod. Antibiotic resistance profiles and utilization of carbon substrates and electron acceptors demonstrated physiological characteristics similar to those of Wolinella succinogenes. Pairwise comparisons of 16S rRNA sequences showed only a 0.75% divergence between strain HAP-1 and W. succinogenes. Physiological, morphological and 16S rRNA sequence data indicate that strain HAP-1 is a subspecies of W. succinogenes that can utilize perchlorate and chlorate as terminal electron acceptors.
1993-05-01
in real time. RMSDs were calculated only to a single structure on which the others were then superimposed. To get a pairwise listing of RMSDs, a group...to fix the chirality, minimize and anneal in 4-D (if necessary) an increasing number of residues until the entire structure is treated as one get /sym...nstr "Number of structures to create: get /sym refseq "Sequence to use: . get /sym refbmx "Bounds matrix to use: get /sym fname "Filename for written
The prediction of biogenic magnetic nanoparticles biomineralization in human tissues and organs
NASA Astrophysics Data System (ADS)
Medviediev, O.; Gorobets, O. Yu; Gorobets, S. V.; Yadrykhins'ky, V. S.
2017-10-01
In this study, human homologs of magnetosome island proteins were identified on the basis of pairwise and multiple alignments of amino acid sequences. The expression levels of genes encoding magnetosome island proteins of M. gryphiswaldense MSR-1, cultured under oxygen-deficient and under microaerobic conditions, were compared with the expression levels of the genes encoding the corresponding homologs in the human organism. On this basis, the possibility of biogenic magnetic nanoparticle (BMN) biomineralization was predicted for human tissues and organs in which BMN had not previously been found experimentally.
Identifying novel sequence variants of RNA 3D motifs
Zirbel, Craig L.; Roll, James; Sweeney, Blake A.; Petrov, Anton I.; Pirrung, Meg; Leontis, Neocles B.
2015-01-01
Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723
Evolutionary distances in the twilight zone--a rational kernel approach.
Schwarz, Roland F; Fletcher, William; Förster, Frank; Merget, Benjamin; Wolf, Matthias; Schultz, Jörg; Markowetz, Florian
2010-12-31
Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score which models substitutions and indels, and does not depend on a multiple sequence alignment. The sequence similarity score is defined in analogy to pairwise alignments and additionally has the positive semi-definite property. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets.
cuBLASTP: Fine-Grained Parallelization of Protein Sequence Search on CPU+GPU.
Zhang, Jing; Wang, Hao; Feng, Wu-Chun
2017-01-01
BLAST, short for Basic Local Alignment Search Tool, is a ubiquitous tool used in the life sciences for pairwise sequence search. However, with the advent of next-generation sequencing (NGS), whether at the outset or downstream from NGS, the exponential growth of sequence databases is outstripping our ability to analyze the data. While recent studies have utilized the graphics processing unit (GPU) to speed up the BLAST algorithm for searching protein sequences (i.e., BLASTP), these studies use coarse-grained parallelism, where one sequence alignment is mapped to only one thread. Such an approach does not efficiently utilize the capabilities of a GPU, particularly due to the irregularity of BLASTP in both execution paths and memory-access patterns. To address the above shortcomings, we present a fine-grained approach to parallelize BLASTP, where each individual phase of sequence search is mapped to many threads on a GPU. This approach, which we refer to as cuBLASTP, reorders data-access patterns and reduces divergent branches of the most time-consuming phases (i.e., hit detection and ungapped extension). In addition, cuBLASTP optimizes the remaining phases (i.e., gapped extension and alignment with traceback) on a multicore CPU and overlaps their execution with the phases running on the GPU.
Detecting earthquakes over a seismic network using single-station similarity measures
NASA Astrophysics Data System (ADS)
Bergen, Karianne J.; Beroza, Gregory C.
2018-06-01
New blind waveform-similarity-based detection methods, such as Fingerprint and Similarity Thresholding (FAST), have shown promise for detecting weak signals in long-duration, continuous waveform data. While blind detectors are capable of identifying similar or repeating waveforms without templates, they can also be susceptible to false detections due to local correlated noise. In this work, we present a set of three new methods that allow us to extend single-station similarity-based detection over a seismic network: event-pair extraction, pairwise pseudo-association, and event resolution complete a post-processing pipeline that combines single-station similarity measures (e.g. the FAST sparse similarity matrix) from each station in a network into a list of candidate events. The core technique, pairwise pseudo-association, leverages the pairwise structure of event detections in its network detection model, which allows it to identify events observed at multiple stations in the network without modeling the expected moveout. Though our approach is general, we apply it to extend FAST over a sparse seismic network. We demonstrate that our network-based extension of FAST is both sensitive and maintains a low false detection rate. As a test case, we apply our approach to 2 weeks of continuous waveform data from five stations during the foreshock sequence prior to the 2014 Mw 8.2 Iquique earthquake. Our method identifies nearly five times as many events as the local seismicity catalogue (including 95 per cent of the catalogue events), and less than 1 per cent of these candidate events are false detections.
Causal analysis of ordinal treatments and binary outcomes under truncation by death.
Wang, Linbo; Richardson, Thomas S; Zhou, Xiao-Hua
2017-06-01
In multi-arm randomized trials, it is common for the outcome of interest to be "truncated by death," meaning that it is only observed or well defined conditional on an intermediate outcome. In this case, in addition to pairwise contrasts, joint inference for all treatment arms is also of interest. Under a monotonicity assumption, we present methods for both pairwise and joint causal analyses of ordinal treatments and binary outcomes in the presence of truncation by death. We illustrate via examples the appropriateness of our assumptions in different scientific contexts.
NASA Astrophysics Data System (ADS)
Wang, Wei; Cao, Leiming; Lou, Yanbo; Du, Jinjian; Jing, Jietai
2018-01-01
We theoretically and experimentally characterize the pairwise correlations among triple quantum-correlated beams generated by cascaded four-wave mixing (FWM) processes. The pairwise correlations between any two of the beams are theoretically calculated and experimentally measured, and the experimental and theoretical results are in good agreement. We find that two of the three pairwise correlations can be in the quantum regime, while the remaining pairwise correlation is always in the classical regime. In addition, we measure the triple-beam correlation, which is always in the quantum regime. Such unbalanced and controllable pairwise correlation structures may be advantageous in practical quantum communications, for example, in hierarchical quantum secret sharing. Our results also open the way for the classification and application of quantum states generated by cascaded FWM processes.
Lohman, David J; Peggie, Djunijanti; Pierce, Naomi E; Meier, Rudolf
2008-10-30
Evolutionary genetics provides a rich theoretical framework for empirical studies of phylogeography. Investigations of intraspecific genetic variation can uncover new putative species while allowing inference about the evolutionary origin and history of extant populations. With a distribution on four continents ranging throughout most of the Old World, Lampides boeticus (Lepidoptera: Lycaenidae) is one of the most widely distributed species of butterfly. It is placed in a monotypic genus with no commonly accepted subspecies. Here, we investigate the demographic history and taxonomic status of this widespread species, and screen for the presence or absence of the bacterial endosymbiont Wolbachia. We performed phylogenetic, population genetic, and phylogeographic analyses using 1799 bp of mitochondrial sequence data from 57 specimens collected throughout the species' range. Most of the samples (>90%) were nearly genetically identical, with uncorrected pairwise sequence differences of 0-0.5% across geographic distances >9,000 km. However, five samples from central Thailand, Madagascar, northern Australia and the Moluccas formed two divergent clades differing from the majority of samples by uncorrected pairwise distances of 1.79-2.21%. Phylogenetic analyses suggest that L. boeticus is almost certainly monophyletic, with all sampled genes coalescing well after the divergence from three closely related taxa included for outgroup comparisons. Analyses of molecular diversity indicate that most L. boeticus individuals in extant populations are descended from one or two relatively recent population bottlenecks. The combined analyses suggest a scenario in which the most recent common ancestor of L. boeticus and its sister taxon lived in the African region approximately 7 Mya; extant lineages of L. boeticus began spreading throughout the Old World at least 1.5 Mya. 
More recently, expansion after population bottlenecks approximately 1.4 Mya seems to have displaced most of the ancestral polymorphism throughout its range, though at least two early-branching lineages still persist. One of these lineages, in northern Australia and the Moluccas, may have experienced accelerated differentiation due to infection with the bacterial endosymbiont Wolbachia, which affects reproduction. Examination of a haplotype network suggests that Australia has been colonized by the species several times. While there is little evidence for the existence of morphologically cryptic species, these results suggest a complex history affected by repeated dispersal events.
Efficient conformational space exploration in ab initio protein folding simulation.
Ullah, Ahammed; Ahmed, Nasif; Pappu, Subrata Dey; Shatabda, Swakkhar; Ullah, A Z M Dayem; Rahman, M Sohel
2015-08-01
Ab initio protein folding simulation largely depends on knowledge-based energy functions that are derived from known protein structures using statistical methods. These knowledge-based energy functions provide a good approximation of real protein energetics. However, these energy functions are not very informative for search algorithms and fail to distinguish the types of amino acid interactions that contribute largely to the energy function from those that do not. As a result, search algorithms frequently get trapped in local minima. On the other hand, the hydrophobic-polar (HP) model considers hydrophobic interactions only, and the simplified nature of the HP energy function limits it to a low-resolution model. In this paper, we present a strategy to derive a non-uniformly scaled version of the real 20×20 pairwise energy function. The non-uniform scaling helps tackle the difficulty faced by a real energy function, whereas the integration of 20×20 pairwise information overcomes the limitations faced by the HP energy function. Here, we have applied the derived energy function with a genetic algorithm on discrete lattices. On a standard set of benchmark protein sequences, our approach significantly outperforms the state-of-the-art methods for similar models. Our approach has been able to explore regions of the conformational space that all previous methods failed to explore. The effectiveness of the derived energy function is demonstrated by showing qualitative differences and similarities between the sampled structures and the native structures. The number of objective function evaluations in a single run of the algorithm is used as a comparison metric to demonstrate efficiency.
Yuan, Zihao; Huang, Wei; Liu, Shikai; Xu, Peng; Dunham, Rex; Liu, Zhanjiang
2018-04-01
The inference of the historical demography of a species is helpful for understanding its differentiation and population dynamics. However, such inference has previously been difficult due to the lack of proper analytical methods and of available genetic data. A recently developed method called Pairwise Sequentially Markovian Coalescent (PSMC) offers the capability to estimate the trajectories of historical populations over considerable time periods using genomic sequences. In this study, we applied this approach to infer the historical demography of the common carp using samples collected from Europe, Asia and the Americas. Comparison between Asian and European common carp populations showed that the last glacial period, starting 100 ka BP, likely caused a significant decline in population size of the wild common carp in Europe, while it did not have much of an impact on its counterparts in Asia. This was probably caused by differences in glacial activity in East Asia and Europe, and suggests a separation of the European and Asian clades before the last glacial maximum. The North American clade, an invasive population, shared a similar demographic history with those from Europe, consistent with the idea that the North American common carp probably had European ancestral origins. Our analysis represents the first reconstruction of the historical population demography of the common carp, which is important to elucidate the separation of European and Asian common carp clades during the Quaternary glaciation, as well as the dispersal of common carp across the world.
Mitochondrial control-region sequence variation in aboriginal Australians.
van Holst Pellekaan, S; Frommer, M; Sved, J; Boettcher, B
1998-01-01
The mitochondrial D-loop hypervariable segment 1 (mt HVS1) between nucleotides 15997 and 16377 has been examined in aboriginal Australian people from the Darling River region of New South Wales (riverine) and from Yuendumu in central Australia (desert). Forty-seven unique HVS1 types were identified, varying at 49 nucleotide positions. Pairwise analysis by calculation of BEPPI (between population proportion index) reveals statistically significant structure in the populations, although some identical HVS1 types are seen in the two contrasting regions. mt HVS1 types may reflect more-ancient distributions than do linguistic diversity and other culturally distinguishing attributes. Comparison with sequences from five published global studies reveals that these Australians demonstrate greatest divergence from some Africans, least from Papua New Guinea highlanders, and only slightly more from some Pacific groups (Indonesian, Asian, Samoan, and coastal Papua New Guinea), although the HVS1 types vary at different nucleotide sites. Construction of a median network, displaying three main groups, suggests that several hypervariable nucleotide sites within the HVS1 are likely to have undergone mutation independently, making phylogenetic comparison with global samples by conventional methods difficult. Specific nucleotide-site variants are major separators in median networks constructed from Australian HVS1 types alone and for one global selection. The distribution of these, requiring extended study, suggests that they may be signatures of different groups of prehistoric colonizers into Australia, for which the time of colonization remains elusive. PMID:9463317
Gomes, Iva; Pereira, Plácido J P; Harms, Sonja; Oliveira, Andréa M; Schneider, Peter M; Brehm, António
2017-11-01
A male West African sample from Guinea-Bissau (West African coast) was genetically analyzed using 12 X-chromosomal short tandem repeats (X-STRs) grouped into four haplotype groups. Linkage disequilibrium was tested (p≤0.0008), and association was detected for the majority of markers in three of the four studied haplotype clusters. The sample of 332 unrelated individuals analyzed in this study belonged to several recognized ethnic groups (n=18), which were used to evaluate the genetic variation of Guinea-Bissau's population. Pairwise genetic distances (FST) did not reveal significant differences among the majority of groups. An additional 110 samples from other West African countries were also compared with the Guinea-Bissau sample. No significant differences were found between these two groups of West African individuals, supporting the genetic homogeneity of this region at the X-chromosome level. The generation of over 100 West African DNA sequences provided new insights into the repeat sequence structure of some of the present X-STRs. Parameters for forensic evaluation were also calculated for each X-STR, supporting the potential application of these markers in typical kinship scenarios. Also, the high power of discrimination values observed in this study for samples of female and male origin confirm the usefulness of the present X-STRs in identification analysis. Copyright © 2017 Elsevier B.V. All rights reserved.
GenoMycDB: a database for comparative analysis of mycobacterial genes and genomes.
Catanho, Marcos; Mascarenhas, Daniel; Degrave, Wim; Miranda, Antonio Basílio de
2006-03-31
Several databases and computational tools have been created with the aim of organizing, integrating and analyzing the wealth of information generated by large-scale sequencing projects of mycobacterial genomes and those of other organisms. However, with very few exceptions, these databases and tools do not allow for massive and/or dynamic comparison of these data. GenoMycDB (http://www.dbbm.fiocruz.br/GenoMycDB) is a relational database built for large-scale comparative analyses of completely sequenced mycobacterial genomes, based on their predicted protein content. Its central structure is composed of the results obtained after pairwise sequence alignments among all the predicted proteins coded by the genomes of six mycobacteria: Mycobacterium tuberculosis (strains H37Rv and CDC1551), M. bovis AF2122/97, M. avium subsp. paratuberculosis K10, M. leprae TN, and M. smegmatis MC2 155. The database stores the computed similarity parameters of every aligned pair, providing for each protein sequence the predicted subcellular localization, the assigned cluster of orthologous groups, the features of the corresponding gene, and links to several important databases. Tables containing pairs or groups of potential homologs between selected species/strains can be produced dynamically by user-defined criteria, based on one or multiple sequence similarity parameters. In addition, searches can be restricted according to the predicted subcellular localization of the protein, the DNA strand of the corresponding gene and/or the description of the protein. Massive data search and/or retrieval are available, and different ways of exporting the results are offered. GenoMycDB provides an online resource for the functional classification of mycobacterial proteins as well as for the analysis of genome structure, organization, and evolution.
Thiel, William H.; Bair, Thomas; Peek, Andrew S.; Liu, Xiuying; Dassie, Justin; Stockdale, Katie R.; Behlke, Mark A.; Miller, Francis J.; Giangrande, Paloma H.
2012-01-01
Background The broad applicability of RNA aptamers as cell-specific delivery tools for therapeutic reagents depends on the ability to identify aptamer sequences that selectively access the cytoplasm of distinct cell types. Towards this end, we have developed a novel approach that combines a cell-based selection method (cell-internalization SELEX) with high-throughput sequencing (HTS) and bioinformatics analyses to rapidly identify cell-specific, internalization-competent RNA aptamers. Methodology/Principal Findings We demonstrate the utility of this approach by enriching for RNA aptamers capable of selective internalization into vascular smooth muscle cells (VSMCs). Several rounds of positive (VSMCs) and negative (endothelial cells; ECs) selection were performed to enrich for aptamer sequences that preferentially internalize into VSMCs. To identify candidate RNA aptamer sequences, HTS data from each round of selection were analyzed using bioinformatics methods: (1) metrics of selection enrichment; and (2) pairwise comparisons of sequence and structural similarity, termed edit and tree distance, respectively. Correlation analyses of experimentally validated aptamers or rounds revealed that the best cell-specific, internalizing aptamers are enriched as a result of the negative selection step performed against ECs. Conclusions and Significance We describe a novel approach that combines cell-internalization SELEX with HTS and bioinformatics analysis to identify cell-specific, cell-internalizing RNA aptamers. Our data highlight the importance of performing a pre-clear step against a non-target cell in order to select for cell-specific aptamers. We expect the extended use of this approach to enable the identification of aptamers to a multitude of different cell types, thereby facilitating the broad development of targeted cell therapies. PMID:22962591
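Edit distance, used above as the pairwise sequence-similarity metric, is commonly taken to be the Levenshtein distance: the minimum number of single-nucleotide insertions, deletions and substitutions that turns one sequence into another. A minimal Python sketch follows, assuming unit costs for all three operations; the exact cost scheme used in the study is not given here, `edit_distance` is a hypothetical helper name, and the structural tree distance is not covered.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance with unit costs, via a two-row dynamic program."""
    prev = list(range(len(t) + 1))  # distance from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i] + [0] * len(t)  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete cs from s
                curr[j - 1] + 1,           # insert ct into s
                prev[j - 1] + (cs != ct),  # substitute (free if they match)
            )
        prev = curr
    return prev[-1]


# Two toy RNA sequences differing by two substitutions.
print(edit_distance("GGAUCC", "GGACCA"))  # 2
```

Keeping only two rows makes memory linear in the length of the second sequence, which matters when all pairs from a high-throughput sequencing round are compared.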
Brelsford, Alan; Perrin, Nicolas
2014-01-01
In contrast with mammals and birds, most poikilothermic vertebrates feature structurally undifferentiated sex chromosomes, which may result either from frequent turnovers, or from occasional events of XY recombination. The latter mechanism was recently suggested to be responsible for sex-chromosome homomorphy in European tree frogs (Hyla arborea). However, no single case of male recombination has been identified in large-scale laboratory crosses, and populations from NW Europe consistently display sex-specific allelic frequencies with male-diagnostic alleles, suggesting the absence of recombination in their recent history. To address this apparent paradox, we extended the phylogeographic scope of investigations, by analyzing the sequences of three sex-linked markers throughout the whole species distribution. Refugial populations (southern Balkans and Adriatic coast) show a mix of X and Y alleles in haplotypic networks, and no more within-individual pairwise nucleotide differences in males than in females, testifying to recurrent XY recombination. In contrast, populations of NW Europe, which originated from a recent postglacial expansion, show a clear pattern of XY differentiation; the X and Y gametologs of the sex-linked gene Med15 present different alleles, likely fixed by drift on the front wave of expansions, and kept differentiated since. Our results support the view that sex-chromosome homomorphy in H. arborea is maintained by occasional or historical events of recombination; whether the frequency of these events indeed differs between populations remains to be clarified. PMID:24892652
Correlations and Functional Connections in a Population of Grid Cells
Roudi, Yasser
2015-01-01
We study the statistics of spike trains of simultaneously recorded grid cells in freely behaving rats. We evaluate pairwise correlations between these cells and, using a maximum entropy kinetic pairwise model (kinetic Ising model), study their functional connectivity. Even when we account for the covariations in firing rates due to overlapping fields, both the pairwise correlations and functional connections decay as a function of the shortest distance between the vertices of the spatial firing pattern of pairs of grid cells, i.e. their phase difference. They take positive values between cells with nearby phases and approach zero or negative values for larger phase differences. We find similar results also when, in addition to correlations due to overlapping fields, we account for correlations due to theta oscillations and head directional inputs. The inferred connections between neurons in the same module and those from different modules can be both negative and positive, with a mean close to zero, but with the strongest inferred connections found between cells of the same module. Taken together, our results suggest that grid cells in the same module do indeed form a local network of interconnected neurons with a functional connectivity that supports a role for attractor dynamics in the generation of grid pattern. PMID:25714908
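As a simplified illustration of a pairwise correlation between two spike trains, one can bin each train into spike counts and compute the Pearson correlation of the counts. This is a minimal sketch under that assumption only: it does not correct for overlapping firing fields, theta oscillations or head-direction inputs, and it does not fit the kinetic Ising model described above. The helper names `bin_spikes` and `pearson` are hypothetical.

```python
import math


def bin_spikes(times, t_end, bin_width):
    """Count spikes (times in seconds) in consecutive bins covering [0, t_end)."""
    n_bins = math.ceil(t_end / bin_width)
    counts = [0] * n_bins
    for t in times:
        if 0 <= t < t_end:
            # min() guards against floating-point rounding at the last edge
            counts[min(int(t // bin_width), n_bins - 1)] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / math.sqrt(vx * vy)


cell_a = bin_spikes([0.1, 0.5, 1.2, 2.7], t_end=3.0, bin_width=1.0)
cell_b = bin_spikes([0.2, 0.6, 1.1, 2.9], t_end=3.0, bin_width=1.0)
print(cell_a, cell_b, pearson(cell_a, cell_b))
```

The bin width is a free choice in this sketch; correlation values between real cells generally depend on it.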
The complete genome sequence of the Atlantic salmon paramyxovirus (ASPV)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nylund, Stian; Karlsen, Marius; Nylund, Are
2008-03-30
The complete RNA genome of the Atlantic salmon paramyxovirus (ASPV), isolated from Atlantic salmon suffering from proliferative gill inflammation (PGI), has been determined. The genome is 16,965 nucleotides in length and consists of six nonoverlapping genes in the order 3'- N - P/C/V - M - F - HN - L -5', coding for the nucleocapsid, phospho-, matrix, fusion, hemagglutinin-neuraminidase and large polymerase proteins, respectively. The gene junctions contain highly conserved transcription start and stop signal sequences and trinucleotide intergenic regions similar to those of other Paramyxoviridae. The ASPV P-gene expression strategy is like that of the respiro- and morbilliviruses, which express the phosphoprotein from the primary transcript and edit a portion of the mRNA to encode the accessory proteins V and W. It also encodes the C-protein by ribosomal choice of translation initiation. Pairwise comparisons of amino acid identities, and phylogenetic analysis of deduced ASPV protein sequences with homologous sequences from other Paramyxoviridae, show that ASPV has an affinity for the genus Respirovirus, but may represent a new genus within the subfamily Paramyxovirinae.
Genotype to Phenotype Mapping of the E. coli lac Promoter
NASA Astrophysics Data System (ADS)
Otwinowski, Jakub; Nemenman, Ilya
2014-03-01
Genotype-to-phenotype maps and the related fitness landscapes that include epistatic interactions are difficult to measure because of their high-dimensional structure. Here we construct such a map using recently collected corpora of high-throughput sequence data from the 75-base-pair mutagenized E. coli lac promoter region, where each sequence is associated with an induced transcriptional activity measured by a fluorescent reporter. We find that the additive (non-epistatic) contributions of individual mutations account for about two-thirds of the explainable phenotype variance, while pairwise epistasis explains about 7% of the variance for the full mutagenized sequence and about 15% for the subsequence associated with protein binding sites. Surprisingly, there is no evidence for third-order epistatic contributions, and our inferred fitness landscape is essentially single-peaked, with a small amount of antagonistic epistasis. We identify the binding sites of the transcription factor CRP and of RNA polymerase in the promoter region, as well as their interactions. We conclude with a cautionary note that inferred properties of fitness landscapes may be severely influenced by biases in the sequence data. Funded in part by HFSP and the James S. McDonnell Foundation.
Nayarisseri, Anuraj; Suppahia, Anjana; Nadh, Anuroopa G; Nair, Achuthsankar S
2015-06-01
Organophosphates such as chlorpyrifos, diazinon, and malathion have become the most common and indisputably the most toxic pest-control agents, adversely affecting the human nervous system even at low levels of exposure. Because of their relatively low cost and their applicability to a wide range of target insects and crops, organophosphorus pesticides account for a large share of all insecticides used in India, and this in turn raises severe health concerns. With this in mind, the present investigation aimed to identify novel species of Flavobacterium capable of degrading pesticides such as chlorpyrifos, diazinon, or malathion. The bacterium was isolated from agricultural soil collected in Guntur District, Andhra Pradesh, India. The samples were serially diluted, and the aliquots were incubated for a suitable time, after which the suspected colony was subjected to 16S rRNA gene sequencing. The resulting sequence was aligned pairwise against Flavobacterium species, which led to the identification of a novel Flavobacterium species, later named EMBS0145; its sequence was deposited in GenBank under Accession Number JN794045.
Nayarisseri, Anuraj; Suppahia, Anjana; Nadh, Anuroopa G; Nair, Achuthsankar S
2014-08-09
Organophosphates (OPs) such as chlorpyrifos, diazinon, and malathion have become the most common and indisputably the most toxic pest-control agents, adversely affecting the human nervous system even at low levels of exposure. Because of their relatively low cost and their applicability to a wide range of target insects and crops, organophosphorus pesticides account for a large share of all insecticides used in India, which in turn raises severe health concerns. With this in mind, the present investigation aimed to identify novel species of Flavobacterium capable of degrading pesticides such as chlorpyrifos, diazinon, or malathion. The bacterium was isolated from agricultural soil collected in Guntur District, Andhra Pradesh, India. The samples were serially diluted, and the aliquots were incubated for a suitable time, after which the suspected colony was subjected to 16S rRNA gene sequencing. The resulting sequence was aligned pairwise against Flavobacterium species, which led to the identification of a novel Flavobacterium species, later named EMBS0145; its sequence was deposited in GenBank under accession number JN794045.
DNA Barcode Sequence Identification Incorporating Taxonomic Hierarchy and within Taxon Variability
Little, Damon P.
2011-01-01
For DNA barcoding to succeed as a scientific endeavor, an accurate and expeditious query sequence identification method is needed. Although a global multiple-sequence alignment can be generated for some barcoding markers (e.g. COI, rbcL), not all barcoding markers are as structurally conserved (e.g. matK). Thus, algorithms that depend on global multiple-sequence alignments are not universally applicable. Some sequence identification methods that use local pairwise alignments (e.g. BLAST) are unable to accurately differentiate between highly similar sequences and are not designed to cope with hierarchic phylogenetic relationships or within-taxon variability. Here, I present a novel alignment-free sequence identification algorithm, BRONX, that accounts for observed within-taxon variability and hierarchic relationships among taxa. BRONX identifies short variable segments and corresponding invariant flanking regions in reference sequences. These flanking regions are used to score variable regions in the query sequence without the production of a global multiple-sequence alignment. By incorporating observed within-taxon variability into the scoring procedure, misidentifications arising from shared alleles/haplotypes are minimized. An explicit treatment of more inclusive terminals allows separate identifications to be made for each taxonomic level and/or for user-defined terminals. BRONX performs better than all other methods when there is imperfect overlap between query and reference sequences (e.g. mini-barcode queries against a full-length barcode database). BRONX consistently produced better identifications at the genus level for all query types. PMID:21857897
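The flank-anchoring idea behind BRONX can be illustrated with a small sketch. Everything here is a simplifying assumption rather than BRONX itself: flanks are matched exactly, the variable segment is compared position by position without gaps, and each taxon is scored by its best-matching observed allele so that within-taxon variability is taken into account. The taxon names, flanks and alleles are made up.

```python
def extract_variable(seq, left_flank, right_flank):
    """Return the segment between the first exact left flank and the next right flank, or None."""
    i = seq.find(left_flank)
    if i < 0:
        return None
    start = i + len(left_flank)
    j = seq.find(right_flank, start)
    if j < 0:
        return None
    return seq[start:j]


def identity(a, b):
    """Fraction of matching positions, divided by the longer of two ungapped segments."""
    if not a and not b:
        return 1.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def score_query(query, left_flank, right_flank, reference):
    """Score a query against each taxon.

    reference maps taxon -> list of variable-segment alleles observed in that taxon;
    a taxon's score is its best identity over those alleles (hypothetical scheme).
    """
    segment = extract_variable(query, left_flank, right_flank)
    if segment is None:
        return {}
    return {taxon: max(identity(segment, allele) for allele in alleles)
            for taxon, alleles in reference.items()}


# Usage with hypothetical flanks and alleles
reference = {"taxon_A": ["GATTA", "GATCA"], "taxon_B": ["CCTTA"]}
scores = score_query("AAACGTGATCATTGCAAA", "ACGT", "TTGC", reference)
# {'taxon_A': 1.0, 'taxon_B': 0.4}
```

Because taxon_A lists both of its observed alleles, a query carrying the rarer allele still scores perfectly against it, which is the intuition behind using within-taxon variability during scoring.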
A protein block based fold recognition method for the annotation of twilight zone sequences.
Suresh, V; Ganesan, K; Parthasarathy, S
2013-03-01
The description of the protein backbone was recently improved by a group of structural fragments called Structural Alphabets, which replace the regular three-state (Helix, Sheet and Coil) secondary structure description. Protein Blocks is one of the Structural Alphabets used to describe every region of the protein backbone, including the coil. According to de Brevern (2000), the Protein Blocks alphabet has 16 structural fragments, each five residues long. Among the available Structural Alphabets, Protein Blocks are highly informative, and they have been used for many applications. Here, we present a protein fold recognition method based on Protein Blocks for the annotation of twilight zone sequences. In our method, we align the predicted Protein Blocks of a query amino acid sequence with a library of assigned Protein Blocks of 953 known folds using local pairwise alignment. Alignments with a z-value ≥ 2.5 and a P-value ≤ 0.08 are predicted as possible folds. Our method recognizes possible folds for nearly 35.5% of the twilight zone sequences, using their predicted Protein Block sequences obtained with pb_prediction, which is available at the Protein Block Export server.
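The local pairwise alignment and z-value steps can be sketched as below, treating Protein Block sequences as strings over the 16 letters a to p. The match, mismatch and gap scores are placeholder assumptions (a Protein Block substitution matrix would normally be used), and the z-value here is estimated against shuffled copies of the query; the abstract does not state how its z-value and P-value were computed, so the 2.5 threshold is only meaningful for the authors' own scoring.

```python
import random
import statistics


def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b (linear gap penalty, placeholder scores)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores are floored at zero
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def z_value(query, target, n_shuffles=100, seed=0):
    """Z-score of the real alignment score against scores of shuffled copies of the query."""
    rng = random.Random(seed)
    real = smith_waterman(query, target)
    letters = list(query)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        scores.append(smith_waterman("".join(letters), target))
    sd = statistics.pstdev(scores)
    return (real - statistics.mean(scores)) / sd if sd > 0 else float("inf")


# Usage with hypothetical Protein Block strings
z = z_value("abcdefghijklmnop", "abcdefghijklmnop")
possible_fold = z >= 2.5
```

Shuffling preserves the Protein Block composition of the query, so the z-value measures how much better the real ordering aligns than a composition-matched random sequence.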
Jia, Yi; Huan, Jun; Buhr, Vincent; Zhang, Jintao; Carayannopoulos, Leonidas N
2009-01-01
Background: Automatic identification of structure fingerprints from a group of diverse protein structures is challenging, especially for proteins whose divergent amino acid sequences may fall into the "twilight" or "midnight" zones, where pairwise sequence identities to known sequences fall below 25% and sequence-based functional annotations often fail. Results: Here we report a novel graph database mining method and demonstrate its application to protein structure pattern identification and structure classification. The biological motivation of our study is to recognize common structure patterns in "immunoevasins", proteins that mediate viral evasion of host immune defense. Our experimental study, using both viral and non-viral proteins, demonstrates the efficiency and efficacy of the proposed method. Conclusion: We present a theoretical framework, offer a practical software implementation for incorporating prior domain knowledge, such as the substitution matrices studied here, and devise an efficient algorithm to identify approximately matched frequent subgraphs. By doing so, we significantly expand the analytical power of sophisticated data mining algorithms in dealing with large volumes of complicated and noisy protein structure data. Without loss of generality, an appropriate choice of compatibility matrices allows our method to be easily employed in domains where subgraph labels have some uncertainty. PMID:19208148
Oligonucleotide fingerprinting of rRNA genes for analysis of fungal community composition.
Valinsky, Lea; Della Vedova, Gianluca; Jiang, Tao; Borneman, James
2002-12-01
Thorough assessments of fungal diversity are currently hindered by technological limitations. Here we describe a new method for identifying fungi, oligonucleotide fingerprinting of rRNA genes (OFRG). OFRG sorts arrayed rRNA gene (ribosomal DNA [rDNA]) clones into taxonomic clusters through a series of hybridization experiments, each using a single oligonucleotide probe. A simulated annealing algorithm was used to design an OFRG probe set for fungal rDNA. Analysis of 1,536 fungal rDNA clones derived from soil generated 455 clusters. A pairwise sequence analysis showed that clones with an average sequence identity of 99.2% were grouped into the same cluster. To examine the accuracy of the taxonomic identities produced by this OFRG experiment, we determined the nucleotide sequences of 117 clones distributed throughout the tree. For all but two of these clones, the taxonomic identities generated by this OFRG experiment were consistent with those generated by a nucleotide sequence analysis. Eighty-eight percent of the clones were affiliated with Ascomycota, while 12% belonged to Basidiomycota. A large fraction of the clones were affiliated with the genera Fusarium (404 clones) and Raciborskiomyces (176 clones). Smaller assemblages of clones had high sequence identities to the Alternaria, Ascobolus, Chaetomium, Cryptococcus, and Rhizoctonia clades.
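In its simplest form, the clustering step can be thought of as grouping clones by their binary hybridization fingerprints (one positive or negative call per probe). The sketch below groups clones with identical fingerprints; this is an assumed simplification, since the mention of a tree above suggests the published pipeline builds a hierarchical clustering and must also handle ambiguous calls. The clone ids and calls are hypothetical.

```python
from collections import defaultdict


def cluster_by_fingerprint(hybridization):
    """Group clones whose probe-call vectors are identical.

    hybridization maps clone id -> list of 0/1 calls, one per oligonucleotide probe.
    Returns a sorted list of sorted clusters (lists of clone ids).
    """
    clusters = defaultdict(list)
    for clone, calls in hybridization.items():
        clusters[tuple(calls)].append(clone)
    return sorted(sorted(members) for members in clusters.values())


# Usage with three hypothetical clones and three probes
calls = {"c1": [1, 0, 1], "c2": [1, 0, 1], "c3": [0, 1, 1]}
groups = cluster_by_fingerprint(calls)  # [['c1', 'c2'], ['c3']]
```

With n probes there are at most 2^n distinct fingerprints, which is why a well-designed probe set (here, chosen by simulated annealing) can separate many taxa with relatively few hybridization experiments.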
Pittschieler, Elisabeth; Szomolanyi, Pavol; Schmid-Schwap, Martina; Weber, Michael; Egerbacher, Monika; Traxler, Hannes; Trattnig, Siegfried
2014-01-01
Objective: To 1) test the feasibility of delayed Gadolinium-Enhanced Magnetic Resonance Imaging of Cartilage (dGEMRIC) at 3 T in the temporomandibular joint (TMJ) and 2) determine the optimal delay for measurements of the TMJ disc after i.v. contrast agent (CA) administration. Design: MRI of the right and left TMJ of six asymptomatic volunteers was performed at 3 T using a dedicated coil. 2D inversion recovery (2D-IR) sequences were performed at 4 time points covering 120 minutes, and 3D gradient-echo (3D GRE) dual flip-angle sequences were performed at 14 time points covering 130 minutes, after the administration of 0.2 mmol/kg of Gd-diethylenetriamine pentaacetic acid ion (Gd-DTPA)²⁻, i.e., 0.4 mL of Magnevist™ per kg body weight. Pairwise tests were used to assess differences between pre- and post-contrast T1 values. Results: 2D-IR sequences showed a statistically significant drop (p < 0.001) in T1 values after i.v. CA administration. The T1 drop of 50% was reached 60 minutes after bolus injection in the TMJ disc. The 3D GRE dual flip-angle sequences confirmed these results and showed a plateau of T1 after 60 minutes. Conclusions: T1(Gd) maps calculated from dGEMRIC data allow in vivo assessment of the fibrocartilage disc of the TMJ. The recommended measurement time for dGEMRIC in the TMJ after i.v. CA administration is from 60 to 120 minutes. PMID:25131629
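Dual flip-angle T1 maps rest on the spoiled gradient-echo signal equation. A minimal sketch, assuming ideal spoiling and accurate flip angles (no B1 correction): simulate the signal at two flip angles, recover T1 from the linearised form, and express a post-contrast change as a relative T1 drop. The repetition time, flip angles and T1 values are illustrative, not the study's acquisition parameters.

```python
import math


def spgr_signal(m0, t1, tr, alpha_deg):
    """Ideal steady-state spoiled gradient-echo signal (T2* decay and B1 errors ignored)."""
    a = math.radians(alpha_deg)
    e1 = math.exp(-tr / t1)
    return m0 * math.sin(a) * (1 - e1) / (1 - e1 * math.cos(a))


def t1_dual_flip(s1, s2, alpha1_deg, alpha2_deg, tr):
    """Estimate T1 from signals acquired at two flip angles.

    Linearised signal equation: S/sin(a) = E1 * S/tan(a) + M0 * (1 - E1),
    so the slope of the line through the two points is E1 = exp(-TR/T1).
    """
    a1, a2 = math.radians(alpha1_deg), math.radians(alpha2_deg)
    y1, y2 = s1 / math.sin(a1), s2 / math.sin(a2)
    x1, x2 = s1 / math.tan(a1), s2 / math.tan(a2)
    e1 = (y1 - y2) / (x1 - x2)
    return -tr / math.log(e1)


def relative_t1_drop(t1_pre, t1_post):
    """Fractional T1 decrease after contrast agent (0.5 corresponds to a 50% drop)."""
    return (t1_pre - t1_post) / t1_pre


# Usage with illustrative values (ms, degrees), not the study's protocol
tr = 15.0
s_low = spgr_signal(1000.0, 800.0, tr, 5.0)
s_high = spgr_signal(1000.0, 800.0, tr, 26.0)
t1_est = t1_dual_flip(s_low, s_high, 5.0, 26.0, tr)  # recovers about 800 ms
```

Because the gadolinium agent shortens T1, a voxel-wise T1 map before and after injection gives the kind of T1 drop reported above; in practice B1 inhomogeneity at 3 T can bias the flip angles and hence the estimate.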
Pittschieler, Elisabeth; Szomolanyi, Pavol; Schmid-Schwap, Martina; Weber, Michael; Egerbacher, Monika; Traxler, Hannes; Trattnig, Siegfried
2014-12-01
To 1) test the feasibility of delayed Gadolinium-Enhanced Magnetic Resonance Imaging of Cartilage (dGEMRIC) at 3 T in the temporomandibular joint (TMJ) and 2) determine the optimal delay for measurements of the TMJ disc after i.v. contrast agent (CA) administration. MRI of the right and left TMJ of six asymptomatic volunteers was performed at 3 T using a dedicated coil. 2D inversion recovery (2D-IR) sequences were performed at 4 time points covering 120 minutes, and 3D gradient-echo (3D GRE) dual flip-angle sequences were performed at 14 time points covering 130 minutes, after the administration of 0.2 mmol/kg of Gd-diethylenetriamine pentaacetic acid ion (Gd-DTPA)²⁻, i.e., 0.4 mL of Magnevist™ per kg body weight. Pairwise tests were used to assess differences between pre- and post-contrast T1 values. 2D-IR sequences showed a statistically significant drop (p<0.001) in T1 values after i.v. CA administration. The T1 drop of 50% was reached 60 minutes after bolus injection in the TMJ disc. The 3D GRE dual flip-angle sequences confirmed these results and showed a plateau of T1 after 60 minutes. T1(Gd) maps calculated from dGEMRIC data allow in vivo assessment of the fibrocartilage disc of the TMJ. The recommended measurement time for dGEMRIC in the TMJ after i.v. CA administration is from 60 to 120 minutes. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
Gritsun, T S; Venugopal, K; Zanotto, P M; Mikhailov, M V; Sall, A A; Holmes, E C; Polkinghorne, I; Frolova, T V; Pogodina, V V; Lashkevich, V A; Gould, E A
1997-05-01
The complete nucleotide sequences of two tick-transmitted flaviviruses, Vasilchenko (Vs) from Siberia and louping ill (LI) from the UK, have been determined. The genomes were 10928 and 10871 nucleotides (nt) in length, respectively. The coding strategy and functional protein sequence motifs of tick-borne flaviviruses are present in both Vs and LI viruses. The phylogenies based on maximum likelihood, maximum parsimony and distance analysis of the polyproteins identified Vs virus as a member of the tick-borne encephalitis virus subgroup within the tick-borne serocomplex, genus Flavivirus, family Flaviviridae. Comparative alignment of the 3'-untranslated regions revealed deletions of different lengths at essentially the same position downstream of the stop codon for all tick-borne viruses. Two direct 27-nucleotide repeats at the 3'-end were found only for Vs and LI viruses. Immediately following the deletions, a region of 332-334 nt with relatively conserved primary structure (67-94% identity) was observed at the 3'-non-coding end of the virus genome. Pairwise comparisons of the nucleotide sequence data revealed similar levels of variation in the coding region and in the 5'- and 3'-termini of the genome, implying equivalently strong selective control of translated and untranslated regions. Indeed, the predicted folding of the 5'- and 3'-untranslated regions revealed patterns of stem and loop structures conserved in all tick-borne flaviviruses, suggesting purifying selection for the preservation of essential RNA secondary structures that could be involved in translational control and replication. The possible implications of these findings are discussed.
Recombination in feline immunodeficiency virus from feral and companion domestic cats.
Hayward, Jessica J; Rodrigo, Allen G
2008-06-17
Recombination is a relatively common phenomenon in retroviruses. We investigated recombination in Feline Immunodeficiency Virus (FIV) from naturally infected New Zealand domestic cats (Felis catus) by sequencing regions of the gag, pol and env genes. The occurrence of intragenic recombination was highest in env, with evidence of recombination in 6.4% (n = 156) of all cats. A further recombinant was identified in each of the gag (n = 48) and pol (n = 91) genes. Comparisons of phylogenetic trees across genes identified cases of incongruence, indicating intergenic recombination. Three (7.7%, n = 39) of these incongruencies were found to be significantly different using the Shimodaira-Hasegawa test. Surprisingly, our phylogenies from the gag and pol genes showed that no New Zealand sequences group with reference subtype C sequences within intrasubtype pairwise distances. Indeed, we find one and two distinct unknown subtype groups in gag and pol, respectively. These observations lead us to speculate that these New Zealand FIV strains have undergone several recombination events between subtype A parent strains and undefined unknown subtype strains, similar to the evolutionary history hypothesised for HIV-1 "subtype E". Endpoint dilution sequencing was used to confirm the consensus sequences of the putative recombinants and unknown subtype groups, providing evidence for the authenticity of these sequences. Endpoint dilution sequencing also resulted in the identification of a dual infection event in the env gene. In addition, an intrahost recombination event between variants of the same subtype in the pol gene was established. This is the first known example of naturally occurring recombination in a cat also infected with the parent strains. Evidence of intragenic recombination in the gag, pol and env regions, and of complex intergenic recombination, was found in FIV from naturally infected domestic cats in New Zealand.
Strains of unknown subtype were identified in all three gene regions. These results have implications for the use of the current FIV vaccine in New Zealand.
Concerted evolution at the population level: pupfish HindIII satellite DNA sequences.
Elder, J F; Turner, B J
1994-01-01
The canonical monomers (approximately 170 bp) of an abundant (1.9 × 10^6 copies per diploid genome) satellite DNA sequence family in the genome of Cyprinodon variegatus, a "pupfish" that ranges along the Atlantic coast from Cape Cod to central Mexico, are divergent in base sequence in 10 of 12 samples collected from natural populations. The divergence involves substitutions, deletions, and insertions, is marked in scope (mean pairwise sequence similarity = 61.6%; range = 35-95.9%), is largely confined to the 3' half of the monomer, and is not correlated with the distance among collecting sites. Repetitive cloning and direct genomic sequencing experiments failed to detect intrapopulation and intraindividual variation, suggesting high levels of sequence homogeneity within populations. The satellite sequence has therefore undergone "concerted evolution" at the level of the local population. Concerted evolution has previously almost always been discussed in terms of the divergence of species or higher taxa; its intraspecific occurrence apparently has not been reported previously. The generality of the observation is difficult to evaluate, for although satellite DNAs from a large number of organisms have been studied in detail, there appear to be little or no other data on their sequence variation in natural populations. The relationship (if any) between concerted, population-level satellite DNA divergence and the extent of gene flow/genetic isolation among conspecific natural populations remains to be established. PMID:8302879
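A summary such as "mean pairwise sequence similarity = 61.6%; range = 35-95.9%" is an average and range taken over all pairs of aligned sequences. A minimal sketch, assuming similarity is measured as percent identity over already-aligned sequences, with gap-versus-residue columns counted as mismatches and gap-versus-gap columns skipped (the paper may define similarity differently):

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identical columns between two aligned sequences of equal length.

    Columns where both sequences have a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not cols:
        raise ValueError("no aligned columns to compare")
    same = sum(x == y for x, y in cols)
    return 100.0 * same / len(cols)


def pairwise_summary(aligned):
    """Mean, minimum and maximum percent identity over all pairs of aligned sequences."""
    values = [percent_identity(a, b) for a, b in combinations(aligned, 2)]
    return sum(values) / len(values), min(values), max(values)


# Usage with three short hypothetical aligned monomers
mean_id, min_id, max_id = pairwise_summary(["ACGT", "ACGA", "TCGA"])
# mean about 66.7, min 50.0, max 75.0
```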
Kazmier, Kelli; Alexander, Nathan S.; Meiler, Jens; Mchaourab, Hassane S.
2010-01-01
A hybrid protein structure determination approach combining sparse Electron Paramagnetic Resonance (EPR) distance restraints and Rosetta de novo protein folding has previously been demonstrated to yield high-quality models (Alexander et al., 2008). However, widespread application of this methodology to proteins of unknown structure is hindered by the lack of a general strategy to place spin-label pairs in the primary sequence. In this work, we report the development of an algorithm that optimally selects spin-labeling positions for distance measurements by EPR. For the α-helical subdomain of T4 lysozyme (T4L), simulated restraints that maximize sequence separation between the two spin labels while simultaneously ensuring pairwise connectivity of secondary structure elements yielded vastly improved Rosetta models. Half of all these models have the correct fold, compared with only 21% and 8% correctly folded models when randomly placed restraints or no restraints are used, respectively. Moreover, the improvements in model quality require only a limited number of optimized restraints, a number determined by the pairwise connectivities of the T4L α-helices. The predicted improvement in Rosetta model quality was verified by experimental determination of distances between spin-label pairs selected by the algorithm. Overall, our results reinforce the rationale for the combined use of sparse EPR distance restraints and de novo folding. By alleviating the experimental bottleneck associated with restraint selection, this algorithm sets the stage for extending computational structure determination to larger, traditionally elusive protein topologies of critical structural and biochemical importance. PMID:21074624
The OGCleaner: filtering false-positive homology clusters.
Fujimoto, M Stanley; Suvorov, Anton; Jensen, Nicholas O; Clement, Mark J; Snell, Quinn; Bybee, Seth M
2017-01-01
Detecting homologous sequences in organisms is an essential step in protein structure and function prediction, gene annotation and phylogenetic tree construction. Heuristic methods are often employed for quality control of putative homology clusters. These heuristics, however, usually only apply to pairwise sequence comparison and do not examine clusters as a whole. We present the Orthology Group Cleaner (the OGCleaner), a tool designed for filtering putative orthology groups as homology or non-homology clusters by considering all sequences in a cluster. The OGCleaner relies on high-quality orthologous groups identified in OrthoDB to train machine learning algorithms that are able to distinguish between true-positive and false-positive homology groups. This package aims to improve the quality of phylogenetic tree construction, especially in instances of lower-quality transcriptome assemblies. The software is available at https://github.com/byucsl/ogcleaner (contact: sfujimoto@gmail.com). Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Coan, Heather B.; Youker, Robert T.
2017-01-01
Understanding how proteins mutate is critical to solving a host of biological problems. Mutations can cause one amino acid to be substituted for another in a protein sequence. The set of likelihoods for amino acid substitutions is stored in a matrix and used as input to alignment algorithms. The quality of the resulting alignment is used to assess the similarity of two or more sequences and can vary according to the assumptions modeled by the substitution matrix. Substitution strategies with minor parameter variations are often grouped together in families. For example, the BLOSUM and PAM matrix families are commonly used because they provide a standard, predefined way of modeling substitutions. However, researchers often do not know whether a given matrix family, or any individual matrix within a family, is the most suitable. Furthermore, predefined matrix families may inaccurately reflect a particular hypothesis that a researcher wishes to model or otherwise result in unsatisfactory alignments. In these cases, the ability to compare the effects of one or more custom matrices may be needed. This laborious process is often performed manually because current software tools do not readily allow users to load multiple matrices simultaneously and compare their effects on alignments. This paper presents SubVis, an interactive R package for loading and applying multiple substitution matrices to pairwise alignments. Users can simultaneously explore alignments resulting from multiple predefined and custom substitution matrices. SubVis utilizes several of the alignment functions found in R, a common language among protein scientists. Functions are tied together with the Shiny platform, which allows the modification of input parameters. Information regarding alignment quality and individual amino acid substitutions is displayed using JavaScript, which provides interactive visualizations revealing both high-level and low-level alignment information. PMID:28674656
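SubVis itself is an R/Shiny package; the Python sketch below only mirrors the core idea of re-scoring the same pair of sequences under several substitution matrices. It uses a Needleman-Wunsch global alignment with a linear gap penalty and two toy matrices (hypothetical, not BLOSUM or PAM), one of which treats I, L and V as similar.

```python
def nw_score(a, b, matrix, gap=-4):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    matrix maps (residue, residue) -> substitution score.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = max(prev[j - 1] + matrix[(a[i - 1], b[j - 1])],  # substitution
                         prev[j] + gap,                               # gap in b
                         cur[j - 1] + gap)                            # gap in a
        prev = cur
    return prev[-1]


def make_matrix(alphabet, match, mismatch, similar=(), similar_score=0):
    """Build a toy symmetric matrix; pairs listed in `similar` get similar_score."""
    m = {}
    for x in alphabet:
        for y in alphabet:
            if x == y:
                m[(x, y)] = match
            elif (x, y) in similar or (y, x) in similar:
                m[(x, y)] = similar_score
            else:
                m[(x, y)] = mismatch
    return m


# Usage: score one pair under two hypothetical matrices
matrices = {
    "strict": make_matrix("ILVK", 5, -3),
    "hydrophobic": make_matrix("ILVK", 5, -3,
                               similar={("I", "L"), ("I", "V"), ("L", "V")},
                               similar_score=2),
}
scores = {name: nw_score("ILVK", "LIVK", m) for name, m in matrices.items()}
# {'strict': 7, 'hydrophobic': 14}
```

Under the strict matrix the best alignment opens two gaps to pair identical residues, whereas the hydrophobic matrix rewards the ungapped I/L swap; this is the kind of matrix-dependent difference an interactive comparison is meant to expose.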
Structure based alignment and clustering of proteins (STRALCP)
Zemla, Adam T.; Zhou, Carol E.; Smith, Jason R.; Lam, Marisa W.
2013-06-18
Disclosed are computational methods of clustering a set of protein structures based on local and pairwise global similarity values. Pairwise local and global similarity values are generated from pairwise structural alignments for each pair of proteins in the set of protein structures. Initially, the protein structures are clustered based on pairwise local similarity values. The protein structures are then clustered based on pairwise global similarity values. For each given cluster, both a representative structure and spans of conserved residues are identified. The representative protein structure is used to assign newly solved protein structures to a group. The spans are used to characterize conservation and assign a "structural footprint" to the cluster.
The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation
Casadio, Rita
2017-01-01
BAR 3.0 updates our server BAR (Bologna Annotation Resource) for predicting protein structural and functional features from sequence. We increase data volume, query capabilities and information conveyed to the user. The core of BAR 3.0 is a graph-based clustering procedure of UniProtKB sequences, following strict pairwise similarity criteria (sequence identity ≥40% with alignment coverage ≥90%). Each cluster contains the available annotation downloaded from UniProtKB, GO, PFAM and PDB. After statistical validation, GO terms and PFAM domains are cluster-specific and annotate new sequences entering the cluster after satisfying similarity constraints. BAR 3.0 includes 28 869 663 sequences in 1 361 773 clusters, of which 22.2% (22 241 661 sequences) and 47.4% (24 555 055 sequences) have at least one validated GO term and one PFAM domain, respectively. Of the clusters, 1.4% (36% of all sequences) include PDB structures, and each such cluster is associated with a hidden Markov model that allows building template-target alignments suitable for structural modeling. A further 3 399 026 sequences are singletons. BAR 3.0 offers an improved search interface, allowing queries by UniProtKB accession, FASTA sequence, GO term, PFAM domain, organism, PDB and ligand(s). When evaluated on the CAFA2 targets, BAR 3.0 largely outperforms our previous version and scores among state-of-the-art methods. BAR 3.0 is publicly available and accessible at http://bar.biocomp.unibo.it/bar3. PMID:28453653
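One reading of the graph-based clustering described above is: sequences are nodes, an edge joins two sequences whose alignment passes both thresholds, and clusters are connected components, with unconnected sequences left as singletons. The sketch below implements that reading with a union-find structure. Requiring the coverage threshold on both sequences, and using single-linkage connected components, are assumptions; BAR's actual procedure may differ.

```python
def bar_like_clusters(n_seqs, hits, min_identity=40.0, min_coverage=90.0):
    """Connected components of a graph whose edges are pairwise hits passing both thresholds.

    hits: iterable of (i, j, percent_identity, coverage_of_i, coverage_of_j).
    Returns a sorted list of clusters (sorted lists of sequence indices).
    """
    parent = list(range(n_seqs))

    def find(x):
        # Path-halving union-find lookup
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, ident, cov_i, cov_j in hits:
        if ident >= min_identity and min(cov_i, cov_j) >= min_coverage:
            parent[find(i)] = find(j)

    groups = {}
    for k in range(n_seqs):
        groups.setdefault(find(k), []).append(k)
    return sorted(groups.values())


# Usage with five hypothetical sequences; the 3-4 hit fails the coverage threshold
hits = [(0, 1, 55.0, 95.0, 92.0), (1, 2, 41.0, 91.0, 99.0), (3, 4, 80.0, 60.0, 95.0)]
clusters = bar_like_clusters(5, hits)  # [[0, 1, 2], [3], [4]]
```

The strict coverage criterion is what keeps multi-domain sequences from bridging otherwise unrelated clusters through a short shared region.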
Falk, K.; Batts, W.N.; Kvellestad, A.; Kurath, G.; Wiik-Nielsen, J.; Winton, J.R.
2008-01-01
Atlantic salmon paramyxovirus (ASPV) was isolated in 1995 from gills of farmed Atlantic salmon suffering from proliferative gill inflammation. The complete genome sequence of ASPV was determined, revealing a genome 16,968 nucleotides in length consisting of six non-overlapping genes coding for the nucleo- (N), phospho- (P), matrix- (M), fusion- (F), haemagglutinin-neuraminidase- (HN) and large polymerase (L) proteins in the order 3'-N-P-M-F-HN-L-5'. The various conserved features related to virus replication found in most paramyxoviruses were also found in ASPV. These include: conserved and complementary leader and trailer sequences, tri-nucleotide intergenic regions and highly conserved transcription start and stop signal sequences. The P gene expression strategy of ASPV was like that of the respiro-, morbilli- and henipaviruses, which express the P and C proteins from the primary transcript and edit a portion of the mRNA to encode V and W proteins. Sequence similarities among various features related to virus replication, pairwise comparisons of all deduced ASPV protein sequences with homologous regions from other members of the family Paramyxoviridae, and phylogenetic analyses of these amino acid sequences suggested that ASPV was a novel member of the sub-family Paramyxovirinae, most closely related to the respiroviruses. © 2008 Elsevier B.V. All rights reserved.
Holm, Liisa; Laakso, Laura M
2016-07-08
The Dali server (http://ekhidna2.biocenter.helsinki.fi/dali) is a network service for comparing protein structures in 3D. In favourable cases, comparing 3D structures may reveal biologically interesting similarities that are not detectable by comparing sequences. The Dali server has been running in various places for over 20 years and is used routinely by crystallographers on newly solved structures. The latest update of the server provides enhanced analytics for the study of sequence and structure conservation. The server performs three types of structure comparisons: (i) Protein Data Bank (PDB) search compares one query structure against those in the PDB and returns a list of similar structures; (ii) pairwise comparison compares one query structure against a list of structures specified by the user; and (iii) all-against-all structure comparison returns a structural similarity matrix, a dendrogram and a multidimensional scaling projection of a set of structures specified by the user. Structural superimpositions are visualized using the Java-free WebGL viewer PV. The structural alignment view is enhanced by sequence similarity searches against UniProt. The combined structure-sequence alignment information is compressed into a stack of aligned sequence logos. In the stack, each structure is structurally aligned to the query protein and represented by a sequence logo. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Development of Genetic Markers in Eucalyptus Species by Target Enrichment and Exome Sequencing
Dasgupta, Modhumita Ghosh; Dharanishanthi, Veeramuthu; Agarwal, Ishangi; Krutovsky, Konstantin V.
2015-01-01
The advent of next-generation sequencing has facilitated large-scale discovery, validation and assessment of genetic markers for high-density genotyping. The present study was undertaken to identify markers in genes putatively related to wood property traits in three Eucalyptus species. Ninety-four genes involved in xylogenesis were selected for hybridization-probe-based nuclear genomic DNA target enrichment and exome sequencing. Genomic DNA was isolated from leaf tissues and used for on-array probe hybridization followed by Illumina sequencing. The raw sequence reads were trimmed, high-quality reads were mapped to the E. grandis reference sequence, and single nucleotide variants (SNVs) and insertions/deletions (InDels) were identified across the three species. The average read coverage was 216X, and a total of 2294 SNVs and 479 InDels were discovered in E. camaldulensis, 2383 SNVs and 518 InDels in E. tereticornis, and 1228 SNVs and 409 InDels in E. grandis. Additionally, SNV calling and InDel detection were conducted in pairwise comparisons of E. tereticornis vs. E. grandis, E. camaldulensis vs. E. tereticornis and E. camaldulensis vs. E. grandis. This study presents an efficient, high-throughput method for developing genetic markers for family-based QTL and association analysis in Eucalyptus. PMID:25602379
High-throughput sequence alignment using Graphics Processing Units
Schatz, Michael C; Trapnell, Cole; Delcher, Arthur L; Varshney, Amitabh
2007-01-01
Background: The recent availability of new, less expensive high-throughput DNA sequencing technologies has yielded a dramatic increase in the volume of sequence data that must be analyzed. These data are being generated for several purposes, including genotyping, genome resequencing, metagenomics, and de novo genome assembly projects. Sequence alignment programs such as MUMmer have proven essential for analysis of these data, but researchers will need ever faster, high-throughput alignment tools running on inexpensive hardware to keep up with new sequence technologies. Results: This paper describes MUMmerGPU, an open-source high-throughput parallel pairwise local sequence alignment program that runs on commodity Graphics Processing Units (GPUs) in common workstations. MUMmerGPU uses the new Compute Unified Device Architecture (CUDA) from NVIDIA to align multiple query sequences against a single reference sequence stored as a suffix tree. By processing the queries in parallel on the highly parallel graphics card, MUMmerGPU achieves more than a 10-fold speedup over a serial CPU version of the sequence alignment kernel, and outperforms the exact alignment component of MUMmer on a high-end CPU by 3.5-fold in total application time when aligning reads from recent sequencing projects using Solexa/Illumina, 454, and Sanger sequencing technologies. Conclusion: MUMmerGPU is a low-cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies. MUMmerGPU demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU. PMID:18070356
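MUMmer-style aligners look up, for each position of a query, the longest substring starting there that also occurs in the reference. The sketch below does this on the CPU with a naive sorted-suffix array and binary search instead of the suffix tree MUMmerGPU uses; it is a stand-in for illustration, not MUMmerGPU's kernel. Because each query is processed independently, this per-query loop is the kind of work that can be distributed across GPU threads.

```python
from bisect import bisect_left


def build_suffix_array(reference):
    """Naive suffix array as a sorted list of (suffix, start); O(n^2 log n), fine for a sketch."""
    return sorted((reference[i:], i) for i in range(len(reference)))


def common_prefix(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def matching_statistics(query, sa):
    """For each query position i, the length of the longest prefix of query[i:] in the reference."""
    suffixes = [s for s, _ in sa]
    out = []
    for i in range(len(query)):
        q = query[i:]
        k = bisect_left(suffixes, q)
        best = 0
        # The longest match is shared with one of the two lexicographic neighbours of q
        for idx in (k - 1, k):
            if 0 <= idx < len(suffixes):
                best = max(best, common_prefix(q, suffixes[idx]))
        out.append(best)
    return out


# Usage with a toy reference and query
sa = build_suffix_array("ACGTACGT")
ms = matching_statistics("CGTT", sa)  # [3, 2, 1, 1]
```

A production implementation would store suffix start positions rather than suffix strings and, like MUMmer, use a suffix tree or an enhanced suffix array to avoid the quadratic memory of this version.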
Criterion Predictability: Identifying Differences Between r-Squares
ERIC Educational Resources Information Center
Malgady, Robert G.
1976-01-01
An analysis of variance procedure for testing differences in r-squared, the coefficient of determination, across independent samples is proposed and briefly discussed. The principal advantage of the procedure is that it minimizes Type I error in follow-up tests of pairwise differences. (Author/JKS)
Registration of 4D time-series of cardiac images with multichannel Diffeomorphic Demons.
Peyrat, Jean-Marc; Delingette, Hervé; Sermesant, Maxime; Pennec, Xavier; Xu, Chenyang; Ayache, Nicholas
2008-01-01
In this paper, we propose a generic framework for inter-subject non-linear registration of 4D time-series images. In this framework, spatio-temporal registration is defined by mapping trajectories of physical points, as opposed to spatial registration, which solely aims at mapping homologous points. First, we determine the trajectories to be registered in each sequence using a motion tracking algorithm based on the Diffeomorphic Demons algorithm. Then, we simultaneously perform pairwise registrations of corresponding time points under the constraint that the same physical points are mapped over time. We show that this trajectory registration can be formulated as a multichannel registration of 3D images, and we solve it using the Diffeomorphic Demons algorithm extended to vector-valued 3D images. This framework is applied to the inter-subject non-linear registration of 4D cardiac CT sequences.
Scale dependence in species turnover reflects variance in species occupancy.
McGlinn, Daniel J; Hurlbert, Allen H
2012-02-01
Patterns of species turnover may reflect the processes driving community dynamics across scales. While the majority of studies on species turnover have examined pairwise comparison metrics (e.g., the average Jaccard dissimilarity), it has been proposed that the species-area relationship (SAR) also offers insight into patterns of species turnover because these two patterns may be analytically linked. However, these previous links only apply in the special case where turnover is scale invariant, and we demonstrate across three different plant communities that over 90% of the pairwise turnover values are larger than expected based on scale-invariant predictions from the SAR. Furthermore, the degree of scale dependence in turnover was negatively related to the degree of variance in the occupancy frequency distribution (OFD). These findings suggest that species turnover diverges from scale invariance, and as such pairwise turnover and the slope of the SAR are not redundant. In addition, models developed to explain the OFD should be linked with those developed to explain species turnover to achieve a more unified understanding of community structure.
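Average pairwise Jaccard dissimilarity, the turnover metric named above, is simple to compute from per-plot species lists. A minimal Python sketch, assuming each plot is given as a set of species names (the plot data below are hypothetical, and this is not the authors' code):

```python
from itertools import combinations


def jaccard_dissimilarity(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of species."""
    union = a | b
    if not union:
        return 0.0  # two empty plots are treated as identical
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_jaccard(plots):
    """Average Jaccard dissimilarity over all unordered pairs of plots."""
    pairs = list(combinations(plots, 2))
    return sum(jaccard_dissimilarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical species lists for three plots
plots = [{"A", "B", "C"}, {"B", "C", "D"}, {"A", "D"}]
print(round(mean_pairwise_jaccard(plots), 3))  # 0.667
```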
A new graph-based method for pairwise global network alignment
Klau, Gunnar W
2009-01-01
Background In addition to component-based comparative approaches, network alignments provide the means to study conserved network topology such as common pathways and more complex network motifs. Yet, unlike in classical sequence alignment, the comparison of networks becomes computationally more challenging, as most meaningful assumptions instantly lead to NP-hard problems. Most previous algorithmic work on network alignments is heuristic in nature. Results We introduce the graph-based maximum structural matching formulation for pairwise global network alignment. We relate the formulation to previous work and prove NP-hardness of the problem. Based on the new formulation we build upon recent results in computational structural biology and present a novel Lagrangian relaxation approach that, in combination with a branch-and-bound method, computes provably optimal network alignments. The Lagrangian algorithm alone is a powerful heuristic method, which produces solutions that are often near-optimal and – unlike those computed by pure heuristics – come with a quality guarantee. Conclusion Computational experiments on the alignment of protein-protein interaction networks and on the classification of metabolic subnetworks demonstrate that the new method is reasonably fast and has advantages over pure heuristics. Our software tool is freely available as part of the LISA library. PMID:19208162
Galpert, Deborah; del Río, Sara; Herrera, Francisco; Ancede-Gallardo, Evys; Antunes, Agostinho; Agüero-Chapin, Guillermin
2015-01-01
Orthology detection requires algorithms that scale more effectively. In this paper, a set of gene pair features based on similarity measures (alignment scores, sequence length, gene membership in conserved regions, and physicochemical profiles) is combined in a supervised pairwise ortholog detection approach to improve effectiveness, given the low ratio of orthologs to all possible pairwise comparisons between two genomes. In this scenario, big data supervised classifiers that manage the imbalance between ortholog and non-ortholog pair classes allow for an effective scaling solution built from two genomes and extended to other genome pairs. The supervised approach was compared with the RBH, RSD, and OMA algorithms using the following yeast genome pairs as benchmark datasets: Saccharomyces cerevisiae-Kluyveromyces lactis, Saccharomyces cerevisiae-Candida glabrata, and Saccharomyces cerevisiae-Schizosaccharomyces pombe. Because of the large amount of imbalanced data, building and testing the supervised model was only possible using big data supervised classifiers that manage imbalance. Evaluation metrics that take low ortholog ratios into account were applied. In terms of effectiveness, MapReduce Random Oversampling combined with Spark SVM outperformed RBH, RSD, and OMA, probably because it considers gene pair features beyond alignment similarity and benefits from advances in big data supervised classification. PMID:26605337
New Measurement for Correlation of Co-evolution Relationship of Subsequences in Protein.
Gao, Hongyun; Yu, Xiaoqing; Dou, Yongchao; Wang, Jun
2015-12-01
Many computational tools have been developed to measure co-evolution among protein residues. Most of them focus only on co-evolution between pairs of residues in a protein sequence. However, co-evolution may involve more than two residues, and some co-evolved residues are clustered in several distinct regions of the primary structure. Therefore, the co-evolution among adjacent residues and the correlation between distinct regions offer insights into the function and evolution of the protein and its residues. A subsequence is used to represent multiple adjacent residues in one distinct region. In this paper, the co-evolution relationship within each subsequence is represented by a mutual information matrix (MIM). Pearson's correlation coefficient (the R value) is then used to measure the similarity of two MIMs. MSAs from a catalytic database (the Catalytic Site Atlas, CSA) are used for testing. The R value characterizes a specific class of residues. In contrast to individual pairwise co-evolved residues, adjacent residues without high individual MI values are found, because the co-evolution relationship among them is similar to that among another set of adjacent residues. These subsequences possess some flexibility in the composition of side chains, such as the catalyzed environment.
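As a rough illustration of comparing two subsequences through their mutual information matrices, the sketch below computes column-pair MI from an MSA and then Pearson's R between the off-diagonal upper triangles of two MIMs. It rests on several assumptions: natural-log MI without pseudocounts or background corrections, upper-triangle flattening, and a tiny hypothetical MSA. The paper's exact MI estimator and comparison details may differ.

```python
import math
from collections import Counter


def column(msa, i):
    """Residues at alignment column i across all sequences."""
    return [seq[i] for seq in msa]


def mutual_information(col_a, col_b):
    """Plug-in MI (natural log) between two alignment columns, no pseudocounts."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), c in count_ab.items():
        p_xy = c / n
        mi += p_xy * math.log(p_xy / ((count_a[x] / n) * (count_b[y] / n)))
    return mi


def mim(msa, start, length):
    """Mutual information matrix for columns start .. start + length - 1."""
    cols = [column(msa, start + k) for k in range(length)]
    return [[mutual_information(cols[i], cols[j]) for j in range(length)]
            for i in range(length)]


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries, row by row."""
    size = len(matrix)
    return [matrix[i][j] for i in range(size) for j in range(i + 1, size)]


def pearson(xs, ys):
    """Pearson's R; raises ZeroDivisionError if either vector is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mim_correlation(msa, start_a, start_b, length):
    """R value between the MIMs of two equal-length subsequences."""
    return pearson(upper_triangle(mim(msa, start_a, length)),
                   upper_triangle(mim(msa, start_b, length)))


# Hypothetical MSA: columns 3-5 repeat the co-variation pattern of columns 0-2
msa = ["AKLSEV", "AKMSEI", "GRLTDV", "GRMTDI"]
print(round(mim_correlation(msa, 0, 3, 3), 6))  # 1.0
```

Because MI is unchanged by relabeling residues, the two subsequences here have identical MIMs even though they share no amino acids, which is the kind of similarity the R value is meant to detect.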
Raupach, Michael J.; Hendrich, Lars; Küchler, Stefan M.; Deister, Fabian; Morinière, Jérome; Gossner, Martin M.
2014-01-01
During the last few years, DNA barcoding has become an efficient method for the identification of species. In the case of insects, most published DNA barcoding studies focus on species of the Ephemeroptera, Trichoptera, Hymenoptera and especially Lepidoptera. In this study we test the efficiency of DNA barcoding for true bugs (Hemiptera: Heteroptera), an ecologically and economically highly important as well as morphologically diverse insect taxon. As part of our study we analyzed DNA barcodes for 1742 specimens of 457 species, comprising 39 families of the Heteroptera. We found low nucleotide distances, with a minimum pairwise K2P distance <2.2%, within 21 species pairs (39 species). For ten of these species pairs (18 species), minimum pairwise distances were zero. In contrast, deep intraspecific sequence divergences with maximum pairwise distances >2.2% were detected for 16 traditionally recognized and valid species. With a successful identification rate of 91.5% (418 species), our study emphasizes the use of DNA barcodes for the identification of true bugs and represents an important step in building up a comprehensive barcode library for true bugs in Germany and Central Europe. Our study also highlights the urgent need for taxonomic revisions of various taxa of the Heteroptera, with a special focus on various species of the Miridae. In this context we found evidence for ongoing hybridization events within various taxonomically challenging genera (e.g. Nabis Latreille, 1802 (Nabidae), Lygus Hahn, 1833 (Miridae), Phytocoris Fallén, 1814 (Miridae)) as well as the putative existence of cryptic species (e.g. Aneurus avenius (Duffour, 1833) (Aradidae) or Orius niger (Wolff, 1811) (Anthocoridae)). PMID:25203616
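The K2P (Kimura two-parameter) distance behind the 2.2% threshold treats transitions (A↔G, C↔T) separately from transversions. The sketch below implements the standard formula d = -½ ln(1 - 2P - Q) - ¼ ln(1 - 2Q), where P and Q are the proportions of transitions and transversions. It assumes two pre-aligned sequences and skips gaps and ambiguous bases; the barcoding pipeline used in the study may handle these differently.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError when the distance is saturated
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Hypothetical 10-bp barcodes differing by one transition
d = k2p_distance("ACGTACGTAC", "GCGTACGTAC")
print(f"{100 * d:.2f}%")  # 11.16%
```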
Improving prediction of heterodimeric protein complexes using combination with pairwise kernel.
Ruan, Peiying; Hayashida, Morihiro; Akutsu, Tatsuya; Vert, Jean-Philippe
2018-02-19
Since many proteins become functional only after they interact with their partner proteins and form protein complexes, it is essential to identify the sets of proteins that form complexes. Therefore, several computational methods have been proposed to predict complexes from the topology and structure of experimental protein-protein interaction (PPI) networks. These methods work well to predict complexes involving at least three proteins, but generally fail to identify complexes involving only two different proteins, called heterodimeric complexes or heterodimers. There is, however, an urgent need for efficient methods to predict heterodimers, since the majority of known protein complexes are precisely heterodimers. In this paper, we use three promising kernel functions: the Min kernel and two pairwise kernels, the Metric Learning Pairwise Kernel (MLPK) and the Tensor Product Pairwise Kernel (TPPK). We also consider normalized forms of the Min kernel. We then combine the Min kernel or its normalized form with one of the pairwise kernels by plugging it into the pairwise kernel. We applied kernels based on PPI, domain, phylogenetic profile, and subcellular localization properties to predict heterodimers. We then evaluate our method by employing C-Support Vector Classification (C-SVC), carrying out 10-fold cross-validation, and calculating the average F-measures. The results suggest that the combination of the normalized Min kernel and MLPK leads to the best F-measure and improves on the performance of our previous work, which had been the best existing method so far. We propose new methods to predict heterodimers using a machine learning-based approach. We train a support vector machine (SVM) to discriminate interacting vs non-interacting protein pairs, based on information extracted from PPI, domain, phylogenetic profiles and subcellular localization. We evaluate in detail new kernel functions to encode these data, and report prediction performance that outperforms the state of the art.
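The pairwise kernels named above have standard closed forms built from a base kernel k on single proteins: TPPK((a,b),(c,d)) = k(a,c)k(b,d) + k(a,d)k(b,c) and MLPK((a,b),(c,d)) = (k(a,c) - k(a,d) - k(b,c) + k(b,d))². The sketch below plugs the Min kernel in as k and assumes non-negative feature vectors. The cosine-style normalization shown is one common choice and may not match the normalized forms used in the paper; the feature vectors are hypothetical.

```python
import math


def min_kernel(x, y):
    """Min (histogram intersection) kernel for non-negative feature vectors."""
    return sum(min(a, b) for a, b in zip(x, y))


def normalized_min_kernel(x, y):
    """Cosine-normalized Min kernel (one possible normalization, assumed here)."""
    denom = math.sqrt(min_kernel(x, x) * min_kernel(y, y))
    return min_kernel(x, y) / denom if denom else 0.0


def tppk(pair1, pair2, k=min_kernel):
    """Tensor Product Pairwise Kernel between protein pairs (a, b) and (c, d)."""
    (a, b), (c, d) = pair1, pair2
    return k(a, c) * k(b, d) + k(a, d) * k(b, c)


def mlpk(pair1, pair2, k=min_kernel):
    """Metric Learning Pairwise Kernel between protein pairs (a, b) and (c, d)."""
    (a, b), (c, d) = pair1, pair2
    return (k(a, c) - k(a, d) - k(b, c) + k(b, d)) ** 2


# Hypothetical per-protein feature vectors
a, b, c, d = [1, 0, 2], [0, 1, 1], [1, 1, 0], [2, 0, 1]
print(tppk((a, b), (c, d)), mlpk((a, b), (c, d)))  # 3 1
```

Both pairwise kernels are invariant to the order of proteins within a pair, which suits unordered interaction pairs. A Gram matrix of these values over all training pairs could then be passed to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.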
Accuracy of taxonomy prediction for 16S rRNA and fungal ITS sequences
2018-01-01
Prediction of taxonomy for marker gene sequences such as 16S ribosomal RNA (rRNA) is a fundamental task in microbiology. Most experimentally observed sequences diverge from the reference sequences of authoritatively named organisms, creating a challenge for prediction methods. I assessed the accuracy of several algorithms using cross-validation by identity, a new benchmark strategy that explicitly models the variation in distances between query sequences and the closest entry in a reference database. When the accuracy of genus predictions was averaged over a representative range of identities with the reference database (100%, 99%, 97%, 95% and 90%), all tested methods had ≤50% accuracy on the currently popular V4 region of 16S rRNA. Accuracy was found to fall rapidly with identity; for example, better methods had V4 genus prediction accuracy of ∼100% at 100% identity but ∼50% at 97% identity. The relationship between identity and taxonomy was quantified as the probability that a rank is the lowest shared by a pair of sequences with a given pairwise identity. With the V4 region, 95% identity was found to be a twilight zone where taxonomy is highly ambiguous, because the probabilities that the lowest shared rank between a pair of sequences is genus, family, order or class are approximately equal. PMID:29682424
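The lowest-shared-rank probability described above can be tallied directly from pairs of reference sequences. A minimal sketch, assuming each pair comes with a precomputed pairwise identity (in percent) and a rank-to-name lineage dictionary per sequence; the rank list, identity bin and lineages are illustrative, not those of the study.

```python
from collections import Counter

# Illustrative rank order, from broadest to narrowest
RANKS = ["domain", "phylum", "class", "order", "family", "genus", "species"]


def lowest_shared_rank(lineage1, lineage2):
    """Deepest rank at which two lineages (rank -> name dicts) still agree."""
    shared = None
    for rank in RANKS:
        name = lineage1.get(rank)
        if name is None or name != lineage2.get(rank):
            break
        shared = rank
    return shared


def lsr_probabilities(pairs, low, high):
    """P(lowest shared rank = r) for pairs with low <= identity < high.

    pairs: iterable of (percent_identity, lineage1, lineage2).
    """
    counts = Counter(lowest_shared_rank(l1, l2)
                     for identity, l1, l2 in pairs if low <= identity < high)
    total = sum(counts.values())
    return {rank: n / total for rank, n in counts.items()} if total else {}


# Hypothetical lineages and identities
g1 = {"domain": "Bacteria", "phylum": "P1", "class": "C1",
      "order": "O1", "family": "F1", "genus": "G1"}
g2 = dict(g1, genus="G2")
g3 = dict(g1, order="O2", family="F3", genus="G3")
pairs = [(95.0, g1, g2), (95.2, g1, g3), (99.0, g1, g1)]
print(lsr_probabilities(pairs, 94.5, 95.5))  # {'family': 0.5, 'class': 0.5}
```

A "twilight zone" in this sense is an identity bin where the returned probabilities are spread roughly evenly over several ranks.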
Eyre, David W; Davies, Kerrie A; Davis, Georgina; Fawley, Warren N; Dingle, Kate E; De Maio, Nicola; Karas, Andreas; Crook, Derrick W; Peto, Tim E A; Walker, A Sarah; Wilcox, Mark H
2018-04-06
Rates of Clostridium difficile infection vary widely across Europe, as do prevalent ribotypes. The extent of Europe-wide diversity within each ribotype is, however, unknown. Inpatient diarrhoeal faecal samples submitted on one day in summer and winter (2012-2013) to laboratories in 482 European hospitals were cultured for C. difficile, and isolates were ribotyped; those from the 10 most prevalent ribotypes were Illumina whole-genome sequenced. Pairwise single nucleotide differences (SNPs) were obtained from recombination-corrected maximum-likelihood phylogenies. Within each ribotype, country-based sequence clustering was assessed using the ratio of the median SNPs between isolates within versus across different countries, with permutation tests. Time-scaled Bayesian phylogenies were used to reconstruct the historic location of each lineage. Sequenced isolates (n=624) were from 19 countries. Five ribotypes had within-country clustering: ribotype-356, only in Italy; ribotype-018, predominantly in Italy; ribotype-176, with distinct Czech and German clades; ribotype-001/072, including distinct German, Slovakian, and Spanish clades; and ribotype-027, with multiple predominantly country-specific clades, including in Hungary, Italy, Germany, Romania and Poland. By contrast, we found no within-country clustering for ribotypes 078, 015, 002, 014, and 020, consistent with a Europe-wide distribution. Fluoroquinolone resistance was significantly more common in within-country clustered ribotypes (p=0.009). Fluoroquinolone-resistant isolates were also more tightly geographically clustered, with a median (IQR) of 43 (0-213) miles between each isolate and the most closely genetically related isolate vs. 421 (204-680) in non-resistant pairs (p<0.001). Two distinct patterns of C. difficile ribotype spread were observed, consistent with either predominantly healthcare-associated acquisition or Europe-wide dissemination via other routes/sources, e.g. the food chain.
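One way to read the within-versus-across ratio test is sketched below: compute the median pairwise SNP distance for same-country and different-country pairs, then compare the observed ratio with ratios obtained after shuffling country labels. This is a hedged illustration with a hypothetical four-isolate matrix; the study's exact permutation scheme and its one- or two-sided choice are not given in the abstract, and a one-sided test for small ratios is assumed here.

```python
import random
from statistics import median


def within_across_ratio(snp, countries):
    """Median within-country SNP distance / median across-country distance."""
    within, across = [], []
    n = len(countries)
    for i in range(n):
        for j in range(i + 1, n):
            target = within if countries[i] == countries[j] else across
            target.append(snp[i][j])
    # Assumes at least one within pair and a non-zero across median
    return median(within) / median(across)


def permutation_test(snp, countries, n_perm=1000, seed=0):
    """One-sided test: is the observed ratio smaller than under shuffled labels?"""
    rng = random.Random(seed)
    observed = within_across_ratio(snp, countries)
    labels = list(countries)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if within_across_ratio(snp, labels) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical pairwise SNP matrix for four isolates from two countries
snp = [[0, 2, 50, 50],
       [2, 0, 50, 50],
       [50, 50, 0, 3],
       [50, 50, 3, 0]]
countries = ["IT", "IT", "DE", "DE"]
ratio, p = permutation_test(snp, countries)
print(ratio, round(p, 2))
```

With only four isolates, one in three label shuffles reproduces the observed grouping, so p cannot fall much below 1/3 here; real ribotype datasets with dozens of isolates give the test its power.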
iPARTS2: an improved tool for pairwise alignment of RNA tertiary structures, version 2.
Yang, Chung-Han; Shih, Cheng-Ting; Chen, Kun-Tze; Lee, Po-Han; Tsai, Ping-Han; Lin, Jian-Cheng; Yen, Ching-Yu; Lin, Tiao-Yin; Lu, Chin Lung
2016-07-08
Since its first release in 2010, iPARTS has become a valuable tool for globally or locally aligning two RNA 3D structures. It was implemented using a structural alphabet (SA)-based approach, which uses an SA of 23 letters to reduce RNA 3D structures to 1D sequences of SA letters and applies traditional sequence alignment to these SA-encoded sequences to determine their global or local similarity. In this version, we have re-implemented iPARTS as a new web server, iPARTS2, by constructing an entirely new SA, which consists of 92 elements, each carrying information on both the base and backbone geometry of a representative nucleotide. This SA differs significantly from the one used in iPARTS, whose 23 elements each carry only the backbone geometry information of a representative nucleotide. Our experimental results show that iPARTS2 outperforms its previous version iPARTS and also achieves better accuracy than other popular tools, such as SARA, SETTER and RASS, in RNA alignment quality and function prediction. iPARTS2 takes as input two RNA 3D structures in PDB format and outputs their global or local alignments with a graphical display. iPARTS2 is now available online at http://genome.cs.nthu.edu.tw/iPARTS2/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
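Once structures are encoded as strings over a structural alphabet, the alignment step is ordinary dynamic programming. The sketch below is a plain Needleman-Wunsch global alignment with hypothetical match, mismatch and gap scores. iPARTS2 uses its own SA-specific scoring, which is not reproduced here, and local alignment would instead use the Smith-Waterman recurrence.

```python
def global_align(s, t, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Scores are hypothetical placeholders, not iPARTS2's SA substitution scores.
    """
    n, m = len(s), len(t)

    def sub(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score aligning s[:i] with t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(s[i - 1], t[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(s[i - 1], t[j - 1]):
            top.append(s[i - 1])
            bottom.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(s[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(t[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


# Hypothetical SA-encoded strings (one letter per nucleotide)
print(global_align("ABCD", "ABD"))  # (4, 'ABCD', 'AB-D')
```

With 92 SA elements the alphabet exceeds 26 letters, so a real encoding would need multi-character or integer tokens; the recurrence itself is unchanged.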
Rubio-Moraga, Angela; Candel-Perez, David; Lucas-Borja, Manuel E.; Tiscar, Pedro A.; Viñegla, Benjamin; Linares, Juan C.; Gómez-Gómez, Lourdes; Ahrazem, Oussama
2012-01-01
Eight Pinus nigra Arn. populations from Southern Spain and Northern Morocco were examined using inter-simple sequence repeat markers to characterize the genetic variability amongst populations. Pairwise population genetic distance ranged from 0.031 to 0.283, with a mean of 0.150 between populations. The highest inter-population average distance was between PaCU from Cuenca and YeCA from Cazorla, while the lowest distance was between TaMO from Morocco and MA from Sierra Mágina. Analysis of molecular variance (AMOVA) and Nei’s genetic diversity analyses revealed higher genetic variation within populations than among populations. Genetic differentiation (Gst) was 0.233. Cuenca showed the highest Nei’s genetic diversity, followed by the Moroccan region, Sierra Mágina, and the Cazorla region. However, clustering of populations was not in accordance with their geographical locations. Principal component analysis showed the presence of two major groups (Group 1 contained all populations from Cuenca, while Group 2 contained populations from Cazorla, Sierra Mágina and Morocco), whereas Bayesian analysis revealed the presence of three clusters. The low genetic diversity observed in PaCU and YeCA is probably a consequence of inappropriate management, since no estimation of genetic variability was performed before the silvicultural treatments. The data indicate that the inter-simple sequence repeat (ISSR) method is sufficiently informative and powerful to assess genetic variability among populations of P. nigra. PMID:22754321
Rostami, S; Salavati, R; Beech, R N; Babaei, Z; Sharbatkhori, M; Baneshi, M R; Hajialilo, E; Shad, H; Harandi, M F
2015-03-01
Although Taenia hydatigena is one of the most prevalent taeniid species of livestock, very little molecular genetic information exists for this parasite. Up to 100 sheep isolates of T. hydatigena were collected from 19 abattoirs located in the provinces of Tehran, Alborz and Kerman. A calibrated microscope was used to measure the larval rostellar hook lengths. Following DNA extraction, fragments of cytochrome c oxidase 1 (CO1) and 12S rRNA genes were amplified by the polymerase chain reaction method and the amplicons were subjected to sequencing. The mean total length of large and small hooks was 203.4 μm and 135.9 μm, respectively. Forty CO1 and 39 12S rRNA sequence haplotypes were obtained in the study. The levels of pairwise nucleotide variation between individual haplotypes of CO1 and 12S rRNA genes were determined to be between 0.3-3.4% and 0.2-2.1%, respectively. The overall nucleotide variation among all the CO1 haplotypes was 9.7%, and for all the 12S rRNA haplotypes it was 10.1%. A significant difference was observed between rostellar hook morphometry and both CO1 and 12S rRNA sequence variability. A significantly high level of genetic variation was observed in the present study. The results showed that the 12S rRNA gene is more variable than CO1.
Watanobe, Takuma; Ishiguro, Naotaka; Nakano, Masuo; Takamiya, Hiroto; Matsui, Akira; Hongo, Hitomi
2002-08-01
Ancient DNA from Sus scrofa specimens excavated from archaeological sites on the Okinawa islands was examined to clarify the genetic relationships among prehistoric Sus scrofa, modern wild boars and domestic pigs inhabiting the Ryukyu archipelago, the Japanese islands, and the Asian continent. We extracted DNA from 161 bone specimens excavated from 12 archaeological sites on the Okinawa islands and successfully amplified mitochondrial DNA control region fragments from 33 of them. Pairwise differences between prehistoric and modern S. scrofa nucleotide sequences showed that haplotypes of the East Asian domestic pig lineage were present in archaeological specimens together with those of Ryukyu wild boars native to the Ryukyu archipelago. Phylogenetic analysis of 14 ancient sequences (11 haplotypes; 574 bp) indicated that S. scrofa specimens from two Yayoi-Heian sites (Kitahara and Ara shell middens) and two Recent Times sites (Wakuta Kiln and Kiyuna sites) are grouped with modern East Asian domestic pigs. Sus scrofa specimens from the Shimizu shell midden (Yayoi-Heian Period) were very closely related to modern Sus scrofa riukiuanus but had a unique nucleotide insertion, indicating that the population is genetically distinct from the lineage of modern Ryukyu wild boars. This genetic evidence suggests that domestic pigs from the Asian continent were introduced to the Okinawa islands in the early Yayoi-Heian period (1700-2000 BP) or earlier.
Chaiyasan, P; Pramual, P
2016-09-01
An understanding of the genetic structure and diversity of vector species is crucial for effective control and management. In this study, mitochondrial DNA sequences were used to examine the genetic structure, diversity and demographic history of a black fly vector, Simulium nodosum Puri (Diptera: Simuliidae), in Thailand. A total of 145 sequences were obtained from 10 sampling locations spanning the species' geographical range in the country. Low genetic diversity was found in populations of S. nodosum, which could be explained by the recent population history of this species. Demographic history analysis revealed a signature of demographic expansion dating back only 2600-5200 years. Recent population expansion in S. nodosum possibly followed an increase in agriculture that allowed the densities of its hosts (humans and domestic animals) to increase. Alternatively, the Thai populations could be derived from an older expansion event in more northern populations. The mitochondrial DNA genealogy revealed no genetically divergent lineages, which agrees with the previous cytogenetic study. Genetic structure analyses found that only 27% of the pairwise comparisons were significantly different. The most likely explanation for the pattern of genetic structuring is genetic drift resulting from recent colonization. © 2016 The Royal Entomological Society.
Descriptive Statistics of the Genome: Phylogenetic Classification of Viruses.
Hernandez, Troy; Yang, Jie
2016-10-01
The typical process for classifying a newly sequenced virus and submitting it to the NCBI database involves two steps. First, a BLAST search is performed to determine likely family candidates. This is followed by checking the candidate families with the pairwise sequence alignment tool for similar species. The submitter's judgment is then used to determine the most likely species classification. The aim of this article is to show that this process can be automated into a fast, accurate, one-step process using the proposed alignment-free method and properly implemented machine learning techniques. We present a new family of alignment-free vectorizations of the genome, the generalized vector, that maintains the speed of existing alignment-free methods while outperforming all available methods. This new alignment-free vectorization uses the frequency of genomic words (k-mers), as is done in the composition vector, and incorporates descriptive statistics of those k-mers' positional information, as inspired by the natural vector. We analyze five different characterizations of genome similarity using k-nearest neighbor classification and evaluate these on two collections of viruses totaling over 10,000 viruses. We show that our proposed method performs better than, or as well as, other methods at every level of the phylogenetic hierarchy. The data and R code are available upon request.
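The composition-vector idea underlying the method can be sketched with plain k-mer frequency vectors and a nearest-neighbour vote. This sketch deliberately omits the positional descriptive statistics that define the authors' generalized vector, uses Euclidean distance as an assumed metric, and relies on hypothetical reference sequences and family labels.

```python
import math
from collections import Counter
from itertools import product


def kmer_vector(seq, k=3, alphabet="ACGT"):
    """Relative frequencies of all len(alphabet)**k k-mers (others ignored)."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers) or 1
    return [counts[km] / total for km in kmers]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_classify(query, references, n_neighbors=1, k=3):
    """Majority label among the n_neighbors closest reference genomes."""
    qv = kmer_vector(query, k)
    ranked = sorted((euclidean(qv, kmer_vector(seq, k)), label)
                    for seq, label in references)
    top = [label for _, label in ranked[:n_neighbors]]
    return Counter(top).most_common(1)[0][0]


# Hypothetical reference genomes with family labels
references = [("ATATATATAT", "family_A"), ("GCGCGCGCGC", "family_B")]
print(knn_classify("ATATATATGC", references))  # family_A
```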
A Proposed Genus Boundary for the Prokaryotes Based on Genomic Insights
Qin, Qi-Long; Xie, Bin-Bin; Zhang, Xi-Ying; Chen, Xiu-Lan; Zhou, Bai-Cheng; Zhou, Jizhong; Oren, Aharon
2014-01-01
Genomic information has already been applied to prokaryotic species definition and classification. However, the contribution of the genome sequence to prokaryotic genus delimitation has been less studied. To gain insights into genus definition for the prokaryotes, we attempted to reveal the genus-level genomic differences in the current prokaryotic classification system and to delineate the boundary of a genus on the basis of genomic information. The average nucleotide sequence identity between two genomes can be used for prokaryotic species delineation, but it is not suitable for genus demarcation. We used the percentage of conserved proteins (POCP) between two strains to estimate their evolutionary and phenotypic distance. A comprehensive genomic survey indicated that the POCP can serve as a robust genomic index for establishing the genus boundary for prokaryotic groups. Basically, two species belonging to the same genus would share at least half of their proteins. In a specific lineage, the genus and family/order ranks showed slight or no overlap in terms of POCP values. A prokaryotic genus can be defined as a group of species with all pairwise POCP values higher than 50%. Integration of whole-genome data into the current taxonomy system can provide comprehensive information for prokaryotic genus definition and delimitation. PMID:24706738
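The abstract does not spell out how POCP is computed; the commonly cited definition is POCP = (C1 + C2) / (T1 + T2) × 100, where C1 and C2 are the numbers of conserved proteins in each genome when compared with the other, and T1 and T2 are the total numbers of proteins in the two genomes. A minimal sketch, assuming the conserved-protein counts have already been obtained from a protein similarity search (the counts and species names below are hypothetical):

```python
def pocp(conserved_1, conserved_2, total_1, total_2):
    """Percentage of conserved proteins between two genomes."""
    return 100.0 * (conserved_1 + conserved_2) / (total_1 + total_2)


def is_single_genus(pairwise_pocp, threshold=50.0):
    """True if every pairwise POCP among the species exceeds the threshold."""
    return all(value > threshold for value in pairwise_pocp.values())


# Hypothetical protein counts: conserved in each genome, then genome totals
print(round(pocp(2200, 1700, 4000, 3000), 1))  # 55.7

# Hypothetical pairwise POCP table for three species
pairwise = {("sp1", "sp2"): 55.7, ("sp1", "sp3"): 61.2, ("sp2", "sp3"): 48.9}
print(is_single_genus(pairwise))  # False
```

Requiring all pairwise values to pass, rather than an average, mirrors the abstract's wording that a genus is a group of species "with all pairwise POCP values higher than 50%".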
Boufana, Belgees; Scala, Antonio; Lahmar, Samia; Pointing, Steve; Craig, Philip S; Dessì, Giorgia; Zidda, Antonella; Pipia, Anna Paola; Varcasia, Antonio
2015-11-30
Cysticercosis caused by the metacestode stage of Taenia hydatigena is endemic in Sardinia. Information on the genetic variation of this parasite is important for epidemiological studies and implementation of control programs. Using two mitochondrial genes, the cytochrome c oxidase subunit 1 (cox1) and the NADH dehydrogenase subunit 1 (ND1), we investigated the genetic variation and population structure of Cysticercus tenuicollis from Sardinian intermediate hosts and compared it to that from other hosts from various geographical regions. The parsimony cox1 network analysis indicated the existence of a common lineage for T. hydatigena, and the overall diversity and neutrality indices indicated demographic expansion. Using the cox1 sequences, low pairwise fixation index (Fst) values were recorded for Sardinian, Iranian and Palestinian sheep C. tenuicollis, which suggested the absence of genetic differentiation. Using the ND1 sequences, C. tenuicollis from Sardinian sheep appeared to be differentiated from those of goat and pig origin. In addition, goat C. tenuicollis were genetically different from adult T. hydatigena, as indicated by the statistically significant Fst value. Our results are consistent with biochemical and morphological studies that suggest the existence of variants of T. hydatigena. Copyright © 2015 Elsevier B.V. All rights reserved.
Diversity of indoor fungi as revealed by DNA metabarcoding.
Korpelainen, Helena; Pietiläinen, Maria
2017-01-01
In the present study, we conducted DNA metabarcoding (the nuclear ITS2 region) of indoor fungal samples originating from two nursery schools with a suspected mould problem (sampled before and after renovation), from two university buildings, and from an old farmhouse. Good-quality sequences were obtained, and the results showed that DNA metabarcoding provides high resolution in fungal identification. The pooled proportions of sequences representing filamentous ascomycetes, filamentous basidiomycetes, yeasts, and other fungi were 62.3%, 8.0%, 28.3%, and 1.4%, respectively, and the total number of fungal genera found during the study was 585. When comparing fungal diversity and taxonomic composition between different types of buildings, no obvious pattern was detected. The average pairwise Sørensen Chao indices, used to compare the similarity of taxon composition between samples, were 0.693 and 0.736 for the two university buildings, 0.852 and 0.928 for the two nurseries, and 0.981 for the farmhouse, while the mean similarity index across all samples was 0.864. We found that drawing explicit conclusions on the relationship between indoor air quality and mycoflora is complicated by the lack of appropriate indicators for air quality and by wide spatial and temporal changes in diversity and composition among samples.
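The study reports Chao's abundance-based Sørensen index, which also estimates the contribution of unseen shared taxa. That estimator is not reproduced here. As a simpler illustration of what a pairwise Sørensen similarity measures, this sketch uses the classic incidence-based form 2a / (2a + b + c) on sets of genera and averages it over all sample pairs. The function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def sorensen(taxa_a: set, taxa_b: set) -> float:
    """Classic incidence-based Sørensen similarity: 2a / (2a + b + c).

    a = taxa shared by both samples; b, c = taxa unique to each sample.
    This is NOT the abundance-based Chao variant used in the study.
    """
    shared = len(taxa_a & taxa_b)
    total = len(taxa_a) + len(taxa_b)  # equals 2a + b + c
    return 2 * shared / total if total else 1.0


def mean_pairwise_sorensen(samples: list) -> float:
    """Average Sørensen similarity over every pair of samples."""
    return mean(sorensen(x, y) for x, y in combinations(samples, 2))
```

Values near 1 indicate near-identical taxon lists. Abundance-based variants additionally weight taxa by their read counts.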
Nie, Yuanyang; Zhou, Zhiwei; Guan, Jiuqiang; Xia, Baixue; Luo, Xiaolin; Yang, Yang; Fu, Yu; Sun, Qun
2017-01-01
Objective To understand the dynamic structure and function of the gut microbial community in yaks of different ages, and its influence on host nutrient metabolism, it is crucial to assess the community's genetic potential. Methods Denaturing gradient gel electrophoresis (DGGE) profiling and Illumina-based metagenomic sequencing were performed on colon contents of 15 semi-domestic yaks. The DGGE fingerprints were analyzed with unweighted pair group method with arithmetic mean (UPGMA) clustering and principal component analysis (PCA). The Illumina sequences were assembled, used for gene prediction and functional annotation, and then classified by querying the protein sequences of the predicted genes against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. Results Metagenomic sequencing showed that more than 85% of ribosomal RNA (rRNA) gene sequences belonged to the phyla Firmicutes and Bacteroidetes, with the families Ruminococcaceae (46.5%), Rikenellaceae (11.3%), Lachnospiraceae (10.0%), and Bacteroidaceae (6.3%) as the dominant gut microbes. Over 50% of non-rRNA gene sequences represented the metabolic pathways of amino acids (14.4%), proteins (12.3%), sugars (11.9%), nucleotides (6.8%), lipids (1.7%), xenobiotics (1.4%), and coenzymes and vitamins (3.6%). Gene functional classification showed that most enzyme-coding genes were related to cellulose digestion and amino acid metabolic pathways. Conclusion Yak age had a substantial effect on gut microbial composition. Comparative metagenomics of gut microbiota in 0.5-, 1.5-, and 2.5-year-old yaks revealed that the abundances of the classes Clostridia, Bacteroidia, and Lentisphaeria, as well as the phyla Firmicutes, Bacteroidetes, Lentisphaerae, Tenericutes, and Cyanobacteria, varied most during growth, especially in young animals (0.5 and 1.5 years old).
Gut microbes, including Bacteroides, Clostridium, and Lentisphaeria, contribute to energy metabolism and amino acid synthesis, which are essential to the normal growth of yaks. PMID:28183172
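UPGMA, used above to cluster the DGGE fingerprints, repeatedly merges the two closest clusters. It then sets the distance from the merged cluster to any other cluster to the size-weighted mean of the two old distances, which equals the arithmetic mean over all leaf pairs. This minimal sketch assumes pairwise distances between fingerprints are already available, for example 1 minus a band-sharing similarity, which is an assumption here. `upgma` is a hypothetical helper that returns a nested-tuple tree without branch lengths.

```python
from itertools import combinations


def upgma(labels, distance):
    """Build a UPGMA tree as nested tuples.

    labels:   list of sample names
    distance: function (a, b) -> pairwise distance between two samples
    """
    size = {label: 1 for label in labels}  # current cluster -> number of leaves
    d = {frozenset((a, b)): distance(a, b) for a, b in combinations(labels, 2)}
    while len(size) > 1:
        # merge the two clusters with the smallest average distance
        closest = min(d, key=d.get)
        a, b = tuple(closest)
        merged = (a, b)
        for other in size:
            if other in (a, b):
                continue
            # size-weighted average = mean distance over all leaf pairs
            d[frozenset((merged, other))] = (
                size[a] * d[frozenset((a, other))] + size[b] * d[frozenset((b, other))]
            ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
        # drop every distance that still refers to the two merged clusters
        d = {k: v for k, v in d.items() if a not in k and b not in k}
    return next(iter(size))
```

Child order inside each tuple is not meaningful, because it depends on set iteration order.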
Kelishadi, Roya; Haghjooy Javanmard, Shaghayegh; Tajadini, Mohammad Hasan; Mansourian, Marjan; Motlagh, Mohammad Esmaeil; Ardalan, Gelayol; Ban, Matthew
2014-11-01
Depressed high-density lipoprotein cholesterol (HDL-C) is prevalent in the Middle East and North Africa. Some studies have documented associations between HDL-C and several single nucleotide polymorphisms (SNPs) in candidate genes. We investigated the associations between SNP genotypes and HDL-C levels in Iranian students aged 10-18 years. Genotyping was performed in 750 randomly selected participants among those with low HDL-C levels (below the 5th percentile), intermediate HDL-C levels (5th-95th percentile) and high HDL-C levels (above the 95th percentile). Minor allele frequencies (MAFs) of the SNPs of interest were compared between the three HDL-C groups. The vast majority of pairwise comparisons of MAFs between HDL-C groups were significant. Pairwise comparisons between low and high HDL-C groups showed significant between-group differences in MAFs for all SNPs except APOC3 rs5128. Pairwise comparisons between low and intermediate HDL-C groups showed significant between-group differences in MAFs for all SNPs except APOC3 rs5128 and APOA1 rs2893157. Pairwise comparisons between intermediate and high HDL-C groups showed significant between-group differences in MAFs for all SNPs except ABCA1 APOC3 rs5128 and APOA1 rs2893157. After adjustment for confounding factors, including age, sex, body mass index, low physical activity, consumption of saturated fats, and socioeconomic status, ABCA1 R1587K and CETP A373P significantly increased the risk of depressed HDL-C, whereas CETP Taq1 had a protective role. This study replicated, in an Iranian pediatric population, several associations between HDL-C levels and candidate-gene SNPs previously identified in genome-wide association studies of HDL-C. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
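The abstract does not name the test used for the pairwise MAF comparisons. Purely as an illustration, this sketch compares minor-allele counts between two groups with a 2×2 Pearson chi-square test (1 degree of freedom, no continuity correction), which is one common choice and is assumed here. Allele totals are twice the number of genotyped individuals. The p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)). `allele_chi_square` is a hypothetical name, and the input counts in the example are made up.

```python
import math


def allele_chi_square(minor_a, total_a, minor_b, total_b):
    """Compare minor-allele counts between two groups with a 2x2 chi-square test.

    minor_*: minor-allele count in the group; total_*: total alleles (2 x individuals).
    Returns (MAF_a, MAF_b, chi2, p); assumes no row or column of the table is empty.
    """
    table = [[minor_a, total_a - minor_a], [minor_b, total_b - minor_b]]
    row = [total_a, total_b]
    col = [table[0][j] + table[1][j] for j in range(2)]
    n = total_a + total_b
    chi2 = sum(
        (table[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
        for i in range(2)
        for j in range(2)
    )
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return minor_a / total_a, minor_b / total_b, chi2, p
```

With three HDL-C groups there are three such pairwise tests per SNP. In practice a multiple-testing correction would usually be considered.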
Metabolic Pathway Assignment of Plant Genes based on Phylogenetic Profiling–A Feasibility Study
Weißenborn, Sandra; Walther, Dirk
2017-01-01
Despite many developed experimental and computational approaches, functional gene annotation remains challenging. With the rapidly growing number of sequenced genomes, the concept of phylogenetic profiling, which predicts functional links between genes that share a common co-occurrence pattern across different genomes, has gained renewed attention as it promises to annotate gene functions based on presence/absence calls alone. We applied phylogenetic profiling to the problem of metabolic pathway assignments of plant genes with a particular focus on secondary metabolism pathways. We determined phylogenetic profiles for 40,960 metabolic pathway enzyme genes with assigned EC numbers from 24 plant species based on sequence and pathway annotation data from KEGG and Ensembl Plants. For gene sequence family assignments, needed to determine the presence or absence of particular gene functions in the given plant species, we included data of all 39 species available at the Ensembl Plants database and established gene families based on pairwise sequence identities and annotation information. Aside from performing profiling comparisons, we used machine learning approaches to predict pathway associations from phylogenetic profiles alone. Selected metabolic pathways were indeed found to be composed of gene families of greater than expected phylogenetic profile similarity. This was particularly evident for primary metabolism pathways, whereas for secondary pathways, both the available annotation in different species as well as the abstraction of functional association via distinct pathways proved limiting. While phylogenetic profile similarity was generally not found to correlate with gene co-expression, direct physical interactions of proteins were reflected by a significantly increased profile similarity suggesting an application of phylogenetic profiling methods as a filtering step in the identification of protein-protein interactions. 
This feasibility study highlights the potential and the challenges of phylogenetic profiling methods for detecting functional relationships between genes, the need to enlarge the set of plant genes with proven involvement in secondary metabolism, and the limitations of distinct pathways as abstractions of relationships between genes. PMID:29163570
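A phylogenetic profile is a presence/absence vector of a gene family across a fixed list of species. Profiling compares such vectors to flag families that co-occur. The abstract does not state which similarity measure was used, so this sketch assumes the Jaccard index (shared presences divided by species where either family is present). It ranks all family pairs by that score. The family names and helpers are hypothetical.

```python
from itertools import combinations


def jaccard(profile_a, profile_b):
    """Jaccard similarity of two 0/1 profiles listed in the same species order."""
    both = sum(1 for x, y in zip(profile_a, profile_b) if x and y)
    either = sum(1 for x, y in zip(profile_a, profile_b) if x or y)
    return both / either if either else 0.0


def most_similar_pairs(profiles, top=3):
    """Rank gene-family pairs by profile similarity (highest first)."""
    scored = [
        (jaccard(profiles[a], profiles[b]), a, b)
        for a, b in combinations(sorted(profiles), 2)
    ]
    return sorted(scored, reverse=True)[:top]
```

For example, two families present in exactly the same species score 1.0. Families with no species in common score 0.0.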
Brown, Angus M
2010-04-01
The objective of the method described in this paper is to develop a spreadsheet template for comparing multiple sample means. An initial analysis of variance (ANOVA) test on the data returns F, the test statistic. If F is larger than the critical F value drawn from the F distribution at the appropriate degrees of freedom, convention dictates rejection of the null hypothesis, which allows subsequent multiple comparison testing to determine where the inequalities between the sample means lie. A variety of multiple comparison methods are described that return the 95% confidence intervals for differences between means using an inclusive pairwise comparison of the sample means. © 2009 Elsevier Ireland Ltd. All rights reserved.
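The first step of the template, the one-way ANOVA F statistic, can be sketched outside a spreadsheet as the ratio of between-group to within-group mean squares. This sketch also lists the pairwise mean differences that post hoc methods then turn into confidence intervals. The confidence-interval formulas of the individual multiple comparison methods are not reproduced, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict {group name: list of values}."""
    means = {name: mean(values) for name, values in groups.items()}
    all_values = [x for values in groups.values() for x in values]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in groups.items())
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def pairwise_mean_differences(groups):
    """Every pairwise difference between sample means (input to post hoc tests)."""
    means = {name: mean(values) for name, values in groups.items()}
    return {(a, b): means[a] - means[b] for a, b in combinations(groups, 2)}
```

For the groups [1, 2, 3], [4, 5, 6] and [7, 8, 9], F = 27 with (2, 6) degrees of freedom. This exceeds the tabulated 5% critical value F(2, 6) ≈ 5.14, so pairwise comparisons would follow.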
Nielsen, Jennifer L.; Zimmerman, Christian E.; Olsen, Jeffrey B.; Wiacek, Talia; Kretschmer, E.J.; Greenwald, Glenn M.; Wenburg, John K.
2003-01-01
Microsatellite allelic and mitochondrial DNA (mtDNA) haplotype diversity are analyzed in eight rainbow trout (Oncorhynchus mykiss) collections: two from tributaries flowing into the upper Santa Ynez River watershed at Gibraltar Reservoir (Camuesa and Gidney creeks); three from tributaries between Gibraltar and Jameson reservoirs (Fox, Blue Canyon, and Alder creeks); one from a tributary above Jameson Reservoir (Juncal Creek); one from Jameson Reservoir; and one from the mainstem Santa Ynez River above Jameson Reservoir. Both analyses reveal a high degree of population structure. Thirteen microsatellite loci are amplified from 376 fish. Population pairwise comparisons show significant differences in allelic frequency among all populations except Juncal Creek and Jameson Reservoir (p = 0.4). Pairwise Fst values range from 0.001 (Juncal Creek and Jameson Reservoir) to 0.17 (Camuesa and Juncal creeks), with an overall value of 0.021. Regression analyses (Slatkin 1993) support an isolation-by-distance model in the five populations below Jameson Reservoir (intercept = 1.187, slope = -0.41, r2 = 0.67). A neighbor-joining bootstrap value of 100% (based on 2000 replicate trees) separates the populations sampled above and below Juncal Dam. Composite haplotypes from 321 fish generated using mtDNA sequence data (D-loop) reveal four previously described haplotypes (MYS1, MYS3, MYS5 and MYS8; Nielsen et al. 1994a), one of which (MYS5) was found in all populations. Mean haplotype diversity is 0.48. Pairwise Fst values from mtDNA range from -0.019 to 0.530 (0.177 over all populations) and are larger than those for microsatellites in 26 of 28 pairwise comparisons. In addition, the mtDNA and microsatellites provide contrasting evidence of the relationship of Fox and Alder creeks to the other six populations.
Discrepancies between the two markers are likely due to the unique properties of the two marker types and their value in revealing historic (mtDNA) versus contemporary (microsatellites) genetic relationships. The contrasting results may indicate how relationships among the upper Santa Ynez River populations have changed since the installation of Juncal Dam. Comparisons of mtDNA haplotype frequencies from fish collected for this study with samples analyzed previously in JLN’s laboratory (1993) reveal significant differences in mtDNA haplotypes for Fox and Alder creeks. In the 2001 samples from this study, there is a loss of three haplotypes despite larger sample sizes. AMOVA of what we term “upper” (Alder, Fox, Blue Canyon, Camuesa, Gidney creeks and the upper Santa Ynez mainstem) and “lower” (Hilton, Salsipuedes and the lower mainstem Santa Ynez River) Santa Ynez River populations (1993-2001) reveals that 11% of the variance in haplotypes is found between the upper and lower drainage. A comparison of the mtDNA data from this study with those available for southern California coastal and California hatchery O. mykiss populations yields Fst values of 0.15 and 0.47, respectively. Tests of mtDNA haplotype differentiation for population pairs of Santa Ynez River and hatchery fish show no significant differentiation between wild fish and at least one hatchery strain in Cachuma Reservoir, Hilton Creek, and the lower Santa Ynez River.
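Pairwise Fst values like those above summarize how much haplotype (or allele) diversity lies between two populations rather than within them. The abstract does not name its estimator. This sketch uses the simple Nei-style form F_ST = (H_T − H_S) / H_T on haplotype frequencies, with equal population weights, as an assumed illustration. This uncorrected form cannot fall below zero. A small negative value such as the reported −0.019 presumably comes from an estimator with sample-size correction, which is also an assumption. Haplotype labels in the example are hypothetical.

```python
def gene_diversity(freqs):
    """Expected heterozygosity 1 - sum(p^2) for a dict {haplotype: frequency}."""
    return 1.0 - sum(p * p for p in freqs.values())


def pairwise_fst(freqs_a, freqs_b):
    """Nei-style F_ST = (H_T - H_S) / H_T for two equally weighted populations."""
    haplotypes = set(freqs_a) | set(freqs_b)
    # pooled frequencies: simple average of the two populations
    pooled = {h: (freqs_a.get(h, 0.0) + freqs_b.get(h, 0.0)) / 2 for h in haplotypes}
    h_s = (gene_diversity(freqs_a) + gene_diversity(freqs_b)) / 2  # mean within-population
    h_t = gene_diversity(pooled)  # total diversity
    return (h_t - h_s) / h_t if h_t > 0 else 0.0
```

Identical frequency profiles give 0, and populations fixed for different haplotypes give 1. For example, frequencies {H1: 0.8, H2: 0.2} versus {H1: 0.4, H2: 0.6} give F_ST = 0.08 / 0.48 ≈ 0.167.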
A new species of masked-owl (Aves: Strigiformes: Tytonidae) from Seram, Indonesia.
Jønsson, Knud Andreas; Poulsen, Michael Køie; Haryoko, Tri; Reeve, Andrew Hart; Fabre, Pierre-Henri
2013-01-01
We describe a new species of masked-owl from the lower montane forest of Seram, one of the largest islands in the Moluccas of eastern Indonesia, for which we propose the name Tyto almae (Seram Masked-Owl), sp. nov. Molecular (mitochondrial cyt-b) differences show that Tyto sororcula of Buru and Tanimbar is closely related to T. novaehollandiae of Australia and New Guinea (~1% uncorrected pairwise distance), and that Tyto almae of Seram differs from both of them by ~3% (uncorrected pairwise distance). These differences are further corroborated by morphology and colouration. Although a photograph from Seram published in 1987 had already established the presence of a Tyto owl on the island, ours represents the first specimen of this species. The bird was mist-netted in wet, mossy lower montane forest at an elevation of 1,350 m. No further observations of the owl were made during four weeks of fieldwork in Seram.
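An uncorrected pairwise distance (p-distance) is the fraction of aligned sites at which two sequences differ, with no correction for multiple substitutions. This minimal sketch assumes the two cyt-b sequences are already aligned. It also assumes sites with a gap or ambiguous base in either sequence are skipped (pairwise deletion), which is an assumption rather than the study's stated procedure. `p_distance` is a hypothetical name.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two aligned nucleotide sequences.

    Only sites with an unambiguous A/C/G/T in both sequences are compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in valid and y in valid:
            compared += 1
            differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared
```

A value of 0.01 corresponds to the ~1% distance quoted above. One difference over 10 comparable sites gives 0.1.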
NASA Astrophysics Data System (ADS)
Wismüller, Axel; DSouza, Adora M.; Abidin, Anas Z.; Wang, Xixi; Hobbs, Susan K.; Nagarajan, Mahesh B.
2015-03-01
Echo state networks (ESNs) are recurrent neural networks in which the hidden layer is replaced with a fixed reservoir of neurons. Unlike feed-forward networks, ESNs restrict training to the output neurons alone, thereby providing a computational advantage. We demonstrate the use of such ESNs in our mutual connectivity analysis (MCA) framework for recovering the primary motor cortex network associated with hand movement from resting-state functional MRI (fMRI) data. The framework consists of two steps: (1) defining a pairwise affinity matrix between different pixel time series within the brain to characterize network activity, and (2) recovering network components from the affinity matrix with non-metric clustering. Here, ESNs are used to evaluate pairwise cross-estimation performance between pixel time series to create the affinity matrix, which is subsequently subjected to non-metric clustering with the Louvain method. For comparison, the ground truth of the motor cortex network structure is established with a task-based fMRI sequence. Overlap between the primary motor cortex network recovered with our model-free MCA approach and the ground truth was measured with the Dice coefficient. Our results show that network recovery with the proposed MCA approach is in close agreement with the ground truth. This recovery is achieved without requiring low-pass filtering of the time series ensembles prior to analysis, an fMRI preprocessing step that has courted controversy in recent years. Thus, we conclude that our MCA framework can enable recovery and visualization of the underlying functionally connected networks in the brain from resting-state fMRI.
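The overlap score used for validation, the Dice coefficient, is 2|A ∩ B| / (|A| + |B|) for the recovered region A and the ground-truth region B. This sketch assumes both networks are given as sets of voxel coordinates, which is an assumed representation. The ESN and clustering steps themselves are not reproduced.

```python
def dice(recovered: set, ground_truth: set) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) between two voxel sets."""
    if not recovered and not ground_truth:
        return 1.0  # two empty masks are treated as identical
    overlap = len(recovered & ground_truth)
    return 2 * overlap / (len(recovered) + len(ground_truth))
```

A score of 1 means perfect overlap and 0 means none. For example, two four-voxel masks sharing two voxels score 0.5.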
NASA Technical Reports Server (NTRS)
Ricks, Wendell R.
1995-01-01
Pairwise comparison (PWC) is a computer program that collects data for psychometric scaling techniques now used in cognitive research. It applies the technique of pairwise comparisons, one of many techniques commonly used to acquire the data necessary for such analyses. PWC administers the task, collects data from the test subject, and formats the data for analysis. It is written in Turbo Pascal v6.0.
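The record does not describe how PWC's output is analysed. As a generic illustration of pairwise-comparison scaling, this sketch does two things. It generates every item pair once in random order and random left/right position, and it converts preference counts into interval-scale values with Thurstone's Case V. In Case V, an item's scale value is the mean z-score of the proportions of times it was preferred over each other item. The +0.5 smoothing, which keeps proportions away from 0 and 1, and the function names are assumptions.

```python
import random
from itertools import combinations
from statistics import NormalDist, mean


def make_trials(items, seed=None):
    """Every unordered pair once, shuffled, with random left/right position."""
    rng = random.Random(seed)
    trials = [tuple(rng.sample(pair, 2)) for pair in combinations(items, 2)]
    rng.shuffle(trials)
    return trials


def thurstone_case_v(items, wins):
    """Case V scale values from wins[(i, j)] = times i was preferred over j."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    scale = {}
    for i in items:
        zs = []
        for j in items:
            if i == j:
                continue
            n_ij, n_ji = wins.get((i, j), 0), wins.get((j, i), 0)
            p = (n_ij + 0.5) / (n_ij + n_ji + 1)  # smoothed preference proportion
            zs.append(z(p))
        scale[i] = mean(zs)
    return scale
```

Because the smoothed proportions for (i, j) and (j, i) sum to 1, their z-scores cancel and the scale values sum to zero. Only the differences between items are interpretable.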
Multilinguals' Perceptions of Feeling Different When Switching Languages
ERIC Educational Resources Information Center
Dewaele, Jean-Marc; Nakano, Seiji
2013-01-01
Research into multilingualism and personality has shown that a majority of multilinguals report feeling different when they switch from one language to another. The present study looks at perceived shifts on five scales of feelings (feeling logical, serious, emotional, fake and different) in pair-wise comparisons between languages following the…
Transient Classifier Systems and Man-Machine Interface Research.
1987-08-31
different timbre from two different resonant sources, i.e., like a violin and oboe emitting nearly the same fundamental mode frequency, but each with its...the subjects by examining both hits and misses for signal and noise stimuli. A pairwise comparison of the means resulted in significant differences (at
SALAD database: a motif-based database of protein annotations for plant comparative genomics
Mihara, Motohiro; Itoh, Takeshi; Izawa, Takeshi
2010-01-01
Proteins often have several motifs with distinct evolutionary histories. Proteins with similar motifs have similar biochemical properties and thus related biological functions. We constructed a unique comparative genomics database termed the SALAD database (http://salad.dna.affrc.go.jp/salad/) from plant-genome-based proteome data sets. We extracted evolutionarily conserved motifs by MEME software from 209 529 protein-sequence annotation groups selected by BLASTP from the proteome data sets of 10 species: rice, sorghum, Arabidopsis thaliana, grape, a lycophyte, a moss, 3 algae, and yeast. Similarity clustering of each protein group was performed by pairwise scoring of the motif patterns of the sequences. The SALAD database provides a user-friendly graphical viewer that displays a motif pattern diagram linked to the resulting bootstrapped dendrogram for each protein group. Amino-acid-sequence-based and nucleotide-sequence-based phylogenetic trees for motif combination alignment, a logo comparison diagram for each clade in the tree, and a Pfam-domain pattern diagram are also available. We also developed a viewer named ‘SALAD on ARRAYs’ to view arbitrary microarray data sets of paralogous genes linked to the same dendrogram in a window. The SALAD database is a powerful tool for comparing protein sequences and can provide valuable hints for biological analysis. PMID:19854933
SANSparallel: interactive homology search against Uniprot
Somervuo, Panu; Holm, Liisa
2015-01-01
Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest. PMID:25855811
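As we understand the suffix array neighborhood search idea, all suffixes of the database are sorted. Each query suffix is then located by binary search, and the few suffixes sorted next to it vote for the sequences they came from. This is a toy sketch of that idea only, with quadratic memory. The window size, vote counting and function names are assumptions, and SANSparallel's real scoring, parallelization and subsequent alignment step are not described in the abstract.

```python
from bisect import bisect_left
from collections import Counter


def build_suffix_array(database):
    """Toy suffix array: sorted (suffix, sequence id) pairs. Illustration only."""
    return sorted(
        (seq[i:], seq_id) for seq_id, seq in database.items() for i in range(len(seq))
    )


def neighborhood_search(query, suffix_array, window=2, top=3):
    """Vote for database sequences whose suffixes sort next to the query's suffixes."""
    votes = Counter()
    for i in range(len(query)):
        # (suffix,) sorts just before any (suffix, id) with the same suffix
        pos = bisect_left(suffix_array, (query[i:],))
        lo, hi = max(0, pos - window), min(len(suffix_array), pos + window)
        for _, seq_id in suffix_array[lo:hi]:
            votes[seq_id] += 1
    return votes.most_common(top)
```

Searching "MKTAYIAKQR" against three short sequences ranks the exact match and a one-residue variant equally, ahead of an unrelated sequence. This illustrates why vote counts can serve only as a candidate filter, to be followed by pairwise alignment.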
Alabi, Olufemi J; Villegas, Cecilia; Gregg, Lori; Murray, K Daniel
2016-06-01
Two isolates of a novel bipartite begomovirus, tentatively named malvastrum bright yellow mosaic virus (MaBYMV), were molecularly characterized from naturally infected plants of the genus Malvastrum showing bright yellow mosaic disease symptoms in South Texas. Six complete DNA-A and five DNA-B genome sequences of MaBYMV obtained from the isolates ranged in length from 2,608 to 2,609 nucleotides (nt) and 2,578 to 2,605 nt, respectively. Both genome segments shared a 178- to 180-nt common region. In pairwise comparisons, the complete DNA-A and DNA-B sequences of MaBYMV were most similar (87-88 % and 79-81 % identity, respectively) and phylogenetically related to the corresponding sequences of sida mosaic Sinaloa virus-[MX-Gua-06]. Further analysis revealed that MaBYMV is a putative recombinant virus, thus supporting the notion that malvaceous hosts may be influencing the evolution of several begomoviruses. The design of new diagnostic primers enabled the detection of MaBYMV in cohorts of Bemisia tabaci collected from symptomatic Malvastrum sp. plants, thus implicating whiteflies as potential vectors of the virus.
CAFE: aCcelerated Alignment-FrEe sequence analysis
Lu, Yang Young; Tang, Kujin; Ren, Jie; Fuhrman, Jed A.; Waterman, Michael S.
2017-01-01
Alignment-free genome and metagenome comparisons are increasingly important with the development of next generation sequencing (NGS) technologies. Recently developed state-of-the-art k-mer based alignment-free dissimilarity measures, including CVTree, $d_2^*$ and $d_2^S$, are more computationally expensive than measures based solely on the k-mer frequencies. Here, we report a standalone software, aCcelerated Alignment-FrEe sequence analysis (CAFE), for efficient calculation of 28 alignment-free dissimilarity measures. CAFE allows for both assembled genome sequences and unassembled NGS shotgun reads as input, and wraps the output in a standard PHYLIP format. In downstream analyses, CAFE can also be used to visualize the pairwise dissimilarity measures, including dendrograms, heatmap, principal coordinate analysis and network display. CAFE serves as a general k-mer based alignment-free analysis platform for studying the relationships among genomes and metagenomes, and is freely available at https://github.com/younglululu/CAFE. PMID:28472388
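This sketch shows what a measure "based solely on the k-mer frequencies" looks like. It computes the cosine-based $d_2$ dissimilarity, $(1 - \cos)/2$ between two raw k-mer count vectors, and writes a pairwise matrix in a PHYLIP-style layout. It does not implement $d_2^*$ or $d_2^S$, which additionally centre the counts by their expected values under a Markov background model. The function names are hypothetical, and the helpers are not CAFE's API.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_dissimilarity(seq_a: str, seq_b: str, k: int = 3) -> float:
    """(1 - cosine similarity of k-mer count vectors) / 2, in [0, 0.5] for counts."""
    x, y = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    dot = sum(x[w] * y[w] for w in x.keys() & y.keys())
    norm = sqrt(sum(v * v for v in x.values())) * sqrt(sum(v * v for v in y.values()))
    if norm == 0:
        raise ValueError("each sequence must be at least k letters long")
    return 0.5 * (1 - dot / norm)


def to_phylip(names, matrix):
    """Square distance matrix as text: taxon count, then one row per taxon."""
    rows = [str(len(names))]
    for name, row in zip(names, matrix):
        rows.append(f"{name:<10} " + " ".join(f"{v:.4f}" for v in row))
    return "\n".join(rows)
```

Identical sequences score 0, and sequences sharing no k-mers score 0.5. Building the full matrix requires one call per sequence pair, which is the cost CAFE is designed to reduce.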
Zygosaccharomyces favi sp. nov., an obligate osmophilic yeast species from bee bread and honey.
Čadež, Neža; Fülöp, László; Dlauchy, Dénes; Péter, Gábor
2015-03-01
Five yeast strains representing a hitherto undescribed yeast species were isolated from bee bread and honey in Hungary. They are obligate osmophilic, i.e. they are unable to grow in/on high water activity culture media. Following isogamous conjugation, they form 1-4 spheroid or subspheroid ascospores in persistent asci. The analysis of the sequences of their large subunit rRNA gene D1/D2 domain placed the new species in the Zygosaccharomyces clade. In terms of pairwise sequence similarity, Zygosaccharomyces gambellarensis is the most closely related species. Comparisons of D1/D2, internal transcribed spacer and translation elongation factor-1α (EF-1α) gene sequences of the five strains with that of the type strain of Z. gambellarensis revealed that they represent a new yeast species. The name Zygosaccharomyces favi sp. nov. (type strain: NCAIM Y.01994(T) = CBS 13653(T) = NRRL Y-63719(T) = ZIM 2551(T)) is proposed for this new yeast species, which based on phenotype can be distinguished from related Zygosaccharomyces species by its obligate osmophilic nature. Some intragenomic sequence variability, mainly indels, was detected among the ITS copies of the strains of the new species.
Galilean-invariant Nosé-Hoover-type thermostats.
Pieprzyk, S; Heyes, D M; Maćkowiak, Sz; Brańka, A C
2015-03-01
A new pairwise Nosé-Hoover-type thermostat for molecular dynamics (MD) simulations is proposed and tested. It is similar in construction to the pair-velocity thermostat of Allen and Schmid [Mol. Simul. 33, 21 (2007), 10.1080/08927020601052856] (AS) but is based on the configurational thermostat. Both thermostats generate the canonical velocity distribution, are Galilean invariant, and conserve linear and angular momentum. The unique feature of the pairwise thermostats is an unconditional conservation of the total angular momentum, which is important for thermalizing isolated systems and nonequilibrium bulk systems that manifest local rotating currents. These thermostats were benchmarked against the corresponding Nosé-Hoover (NH) and Braga-Travis prescriptions, which are based on the kinetic and configurational definitions of temperature, respectively. Some differences in the shear-rate-dependent shear viscosity obtained from SLLOD nonequilibrium MD are observed at high shear rates with the different thermostats. The thermostats based on the configurational temperature produced very similar, monotonically decaying shear viscosity (shear thinning) with increasing shear rate, while the NH method showed discontinuous shear thinning into a string phase, and the AS method produced a continuous increase of viscosity (shear thickening) after a shear-thinning region at lower shear rates. Both pairwise additive thermostats are neither purely kinetic nor purely configurational in definition, and possible directions for further improvement in certain aspects are discussed.
The Bologna Annotation Resource (BAR 3.0): improving protein functional annotation.
Profiti, Giuseppe; Martelli, Pier Luigi; Casadio, Rita
2017-07-03
BAR 3.0 updates our server BAR (Bologna Annotation Resource) for predicting protein structural and functional features from sequence. We increase the data volume, the query capabilities and the information conveyed to the user. The core of BAR 3.0 is a graph-based clustering procedure of UniProtKB sequences, following strict pairwise similarity criteria (sequence identity ≥40% with alignment coverage ≥90%). Each cluster contains the available annotation downloaded from UniProtKB, GO, PFAM and PDB. After statistical validation, GO terms and PFAM domains are cluster-specific and annotate new sequences entering the cluster after satisfying similarity constraints. BAR 3.0 includes 28 869 663 sequences in 1 361 773 clusters, of which 22.2% (22 241 661 sequences) and 47.4% (24 555 055 sequences) have at least one validated GO term and one PFAM domain, respectively. 1.4% of the clusters (36% of all sequences) include PDB structures, and each such cluster is associated with a hidden Markov model that allows building template-target alignments suitable for structural modeling. A further 3 399 026 sequences are singletons. BAR 3.0 offers an improved search interface, allowing queries by UniProtKB accession, FASTA sequence, GO term, PFAM domain, organism, PDB and ligand(s). When evaluated on the CAFA2 targets, BAR 3.0 largely outperforms our previous version and scores among state-of-the-art methods. BAR 3.0 is publicly available and accessible at http://bar.biocomp.unibo.it/bar3. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
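A graph-based clustering under the stated thresholds can be pictured as follows. Sequences are nodes, a pairwise hit becomes an edge only if identity is at least 40% and coverage is at least 90%, and clusters are the connected components. Nodes without an edge remain singletons. The abstract does not say whether BAR uses connected components or another graph procedure, nor how coverage is measured, so both are assumptions in this union-find sketch. `cluster_sequences` is a hypothetical name.

```python
def cluster_sequences(ids, hits, min_identity=40.0, min_coverage=90.0):
    """Connected components of the graph of hits passing both thresholds.

    hits: iterable of (id_a, id_b, identity_percent, coverage_percent)
    Returns sorted lists of member ids; singletons appear as one-element lists.
    """
    parent = {i: i for i in ids}

    def find(i):
        # follow parent links to the root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, identity, coverage in hits:
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two components

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())
```

A hit with 38% identity or 85% coverage does not link its two sequences, even if the other criterion is comfortably met.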
Links, Matthew G; Dumonceaux, Tim J; Hemmingsen, Sean M; Hill, Janet E
2012-01-01
Barcoding with molecular sequences is widely used to catalogue eukaryotic biodiversity. Studies investigating the community dynamics of microbes have relied heavily on gene-centric metagenomic profiling using two genes (16S rRNA and cpn60) to identify and track Bacteria. While criteria have been formalized for barcoding of eukaryotes, these criteria have not been used to evaluate gene targets for other domains of life. Using the framework of the International Barcode of Life, we evaluated DNA barcodes for Bacteria. Candidates from the 16S rRNA gene and the protein-coding cpn60 gene were evaluated. Within complete bacterial genomes in the public domain representing 983 species from 21 phyla, the largest difference between median pairwise inter- and intra-specific distances ("barcode gap") was found for cpn60. Distribution of sequence diversity along the ∼555 bp cpn60 target region was remarkably uniform. The barcode gap of the cpn60 universal target facilitated the faithful de novo assembly of full-length operational taxonomic units from pyrosequencing data from a synthetic microbial community. Analysis supported the recognition of both 16S rRNA and cpn60 as DNA barcodes for Bacteria. The cpn60 universal target was found to have a much larger barcode gap than 16S rRNA, suggesting cpn60 as a preferred barcode for Bacteria. A large barcode gap for cpn60 provided a robust target for species-level characterization of data. The assembly of consensus sequences for barcodes was shown to be a reliable method for the identification and tracking of novel microbes in metagenomic studies.
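Following the definition in the abstract, the barcode gap is the median pairwise inter-specific distance minus the median pairwise intra-specific distance. This sketch assumes the pairwise distances, for example p-distances, are already computed for every sequence pair. It also assumes each sequence is labelled with its species. The function name and the example data are hypothetical.

```python
from statistics import median


def barcode_gap(distances, species_of):
    """Median inter-specific minus median intra-specific pairwise distance.

    distances:  dict {(seq_a, seq_b): distance} covering each pair once
    species_of: dict {seq: species name}
    """
    intra = [d for (a, b), d in distances.items() if species_of[a] == species_of[b]]
    inter = [d for (a, b), d in distances.items() if species_of[a] != species_of[b]]
    return median(inter) - median(intra)
```

A larger positive gap means species are easier to tell apart with that marker, which is the sense in which cpn60 outperformed 16S rRNA.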
Figueiredo, Joana; Simões, Maria José; Gomes, Paula; Barroso, Cristina; Pinho, Diogo; Conceição, Luci; Fonseca, Luís; Abrantes, Isabel; Pinheiro, Miguel; Egas, Conceição
2013-01-01
The pinewood nematode, Bursaphelenchus xylophilus, is native to North America, but it causes damaging pine wilt disease only in those regions of the world where it has been introduced. Accurate detection of the species and its dispersal routes is thus essential to define effective control measures. The main goals of this study were to analyse the genetic diversity among B. xylophilus isolates from different geographic locations and to identify single nucleotide polymorphism (SNP) markers for geographic origin through a comparative transcriptomic approach. The transcriptomes of seven B. xylophilus isolates, from Continental Portugal (4), China (1), Japan (1) and USA (1), were sequenced on the Roche 454 next-generation sequencing platform. Analysis of effector gene transcripts revealed inter-isolate nucleotide diversity that was validated by Sanger sequencing in the genomic DNA of the seven isolates and eight additional isolates from different geographic locations: Madeira Island (2), China (1), USA (1), Japan (2) and South Korea (2). The analysis identified 136 polymorphic positions in 10 effector transcripts. Pairwise comparison of the 136 SNPs through the Neighbor-Joining and Maximum Likelihood methods, and 5-mer frequency analysis with the alignment-independent bilinear multivariate modelling approach, correlated the SNPs with the isolates' geographic origin. Furthermore, the SNP analysis indicated a closer proximity of the Portuguese isolates to the Korean and Chinese isolates than to the Japanese or American isolates. Each geographic cluster carried exclusive alleles that can be used as SNP markers for B. xylophilus isolate identification. PMID:24391785
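The alignment-independent 5-mer frequency analysis mentioned above starts by turning each sequence into a vector of k-mer frequencies. A minimal sketch, assuming DNA input and skipping k-mers that contain non-ACGT characters; the function name is hypothetical, and the downstream bilinear multivariate model used in the study is not reproduced.

```python
from collections import Counter
from itertools import product
from typing import List


def kmer_frequencies(seq: str, k: int = 5, alphabet: str = "ACGT") -> List[float]:
    """Relative frequencies of all len(alphabet)**k k-mers in a fixed order."""
    seq = seq.upper()
    allowed = set(alphabet)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Drop k-mers containing ambiguous bases such as N.
    counts = Counter({kmer: n for kmer, n in counts.items() if set(kmer) <= allowed})
    total = sum(counts.values())
    order = ("".join(p) for p in product(alphabet, repeat=k))
    return [counts[kmer] / total if total else 0.0 for kmer in order]


vector = kmer_frequencies("ACGTACGTAC")
print(len(vector))  # 1024 (4**5 possible 5-mers)
```

Because every vector uses the same fixed k-mer order, vectors from different isolates can be stacked into a matrix for multivariate analysis without any alignment step.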
Dahl, Christopher R.; Bickham, John W.; Wickliffe, Jeffery K.; Custer, Thomas W.
2001-01-01
DNA sequence analysis of a 215 base-pair region of the mitochondrial cytochrome b gene was used to examine genetic variation and search for evidence of an increased mutation rate in black-crowned night-herons. We examined five populations exposed to environmental contamination (primarily PAHs and PCBs) and one reference population from the eastern U.S. There was no evidence of a high mutation rate, even within populations previously shown to exhibit increased variation in DNA content among somatic cells as a result of petroleum exposure. Three haplotypes were observed among 99 individuals. The low level of variability could be evidence of a genetic bottleneck, or could indicate that cytochrome b is too conservative for use in population genetic studies of this species. With the exception of one population from Louisiana, pairwise ΦST estimates were very low, indicative of little population structure and potentially high rates of effective migration among populations.
Population genetics of the malaria vector Anopheles aconitus in China and Southeast Asia
Chen, Bin; Harbach, Ralph E.; Walton, Catherine; He, Zhengbo; Zhong, Daibin; Yan, Guiyun; Butlin, Roger K.
2012-01-01
Anopheles aconitus is a well-known vector of malaria and is broadly distributed in the Oriental Region, yet there is no information on its population genetic characteristics. In this study, the genetic differentiation among populations was examined using 140 mtDNA COII sequences from 21 sites throughout southern China, Myanmar, Vietnam, Thailand, Laos and Sri Lanka. The population in Sri Lanka has characteristic rDNA D3 and ITS2, mtDNA COII and ND5 haplotypes, and may be considered a distinct subspecies. Clear genetic structure was observed, with highly significant genetic variation present among population groups in Southeast Asia. The greatest genetic diversity exists in the Yunnan and Myanmar population groups. All population groups are significantly different from one another in pairwise Fst values, except northern and central Thailand. Mismatch distributions and highly significant Fs values suggest that the populations passed through a recent demographic expansion. These patterns are discussed in relation to the likely biogeographic history of the region and compared to other Anopheles species. PMID:22982161
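Pairwise Fst values such as those above can be estimated in several ways. A minimal sketch of one simple frequency-based estimator, (HT - HS) / HT in the style of Nei's GST, computed from the haplotype labels of two populations. It omits the sample-size corrections and sequence-distance weighting used by population-genetics packages, so it illustrates the quantity rather than the study's estimator. The function names and population data are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def haplotype_freqs(haplotypes: Sequence[str]) -> Dict[str, float]:
    n = len(haplotypes)
    return {h: c / n for h, c in Counter(haplotypes).items()}


def heterozygosity(freqs: Dict[str, float]) -> float:
    """Expected heterozygosity 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def pairwise_fst(pop1: Sequence[str], pop2: Sequence[str]) -> float:
    """(HT - HS) / HT with equal population weights; no sample-size correction."""
    f1, f2 = haplotype_freqs(pop1), haplotype_freqs(pop2)
    hs = (heterozygosity(f1) + heterozygosity(f2)) / 2  # mean within-population diversity
    pooled = {h: (f1.get(h, 0.0) + f2.get(h, 0.0)) / 2 for h in set(f1) | set(f2)}
    ht = heterozygosity(pooled)  # diversity of the pooled population
    return 0.0 if ht == 0 else (ht - hs) / ht


print(pairwise_fst(["H1"] * 5, ["H2"] * 5))  # 1.0: fixed for different haplotypes
print(pairwise_fst(["H1", "H2"], ["H1", "H2"]))  # 0.0: identical frequencies
```

Values near 0 indicate little differentiation between the two populations; values near 1 indicate that they share almost no haplotypes.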
Xu, Rui; Yang, Zhao-Hui; Zheng, Yue; Liu, Jian-Bo; Xiong, Wei-Ping; Zhang, Yan-Ru; Lu, Yue; Xue, Wen-Jing; Fan, Chang-Zheng
2018-04-22
Understanding of how anaerobic digestion (AD)-related microbiomes are shaped by operational parameters, or by their interactions within the biochemical process, is limited. Using high-throughput sequencing and molecular ecological network analysis, this study shows the succession of the AD-related microbiome, hosting diverse members of the phyla Actinobacteria, Bacteroidetes, Euryarchaeota, and Firmicutes, which were affected by organic loading rate (OLR) and hydraulic retention time (HRT). OLR formed finer microbial network modules than HRT (12 vs. 6), suggesting further subdivision of functional components. Biomarkers were also identified in the OLR or HRT groups (e.g. the families Actinomycetaceae, Methanosaetaceae and Aminiphilaceae). The largest number of pairwise links was between Firmicutes and biogas production, indicating that keystone members identified from network features can be considered markers in the regulation of AD. A set of 40% of species (the "core microbiome") was shared across the different digesters. Such a notable overlap of microbiomes indicates that these species are generalists in maintaining the ecological stability of digesters. Copyright © 2018 Elsevier Ltd. All rights reserved.
Chang, Jian-Cheng; Ponnath, Daniel W; Ramasamy, Srinivasan
2016-01-01
Leucinodes orbonalis is the most detrimental South and Southeast Asian insect pest of eggplant. To help reduce the impact of this pest, population genetic diversity and structure of L. orbonalis were examined in eight populations from six countries using mitochondrial cytochrome c oxidase subunit I DNA sequences. No correlation between genetic diversity and geographic distance was detected among populations. Low levels of haplotype and nucleotide diversities were observed in the Philippines population, suggesting recent colonization. No significant gene flow was found among local populations in different countries. The Vietnam population is highly differentiated, indicated by significant pairwise FST values, and may be ascribed to a new subspecies or race. India was confirmed to be the source of genetic variation in L. orbonalis populations. Our study showed that L. orbonalis formed subpopulations for each local region, and the corresponding pest management technology should be developed at the country scale.
Constructing STR multiplexes for individual identification of Hungarian red deer.
Szabolcsi, Zoltan; Egyed, Balazs; Zenke, Petra; Padar, Zsolt; Borsy, Adrienn; Steger, Viktor; Pasztor, Erzsebet; Csanyi, Sandor; Buzas, Zsuzsanna; Orosz, Laszlo
2014-07-01
Red deer is the most valuable game species in Hungary, and there is a strong need for genetic identification of individuals. For this purpose, 10 tetranucleotide STR markers were developed and amplified in two 5-plex systems. The study presented here includes the flanking region sequence analysis and the allele nomenclature of the 10 loci, as well as the PCR optimization of DeerPlex I and II. Pairwise LD tests and cross-species similarity analyses showed the 10 loci to be independently inherited. Considerable genetic differences between two subpopulations were recorded, with an FST of 0.034 using AMOVA. The average probability of identity (PIave) was 2.6736 × 10⁻¹⁵. This low value for PIave nearly eliminates false identification. An illegal hunting case solved by DeerPlex is described herein. The calculated likelihood ratio (LR) illustrates the potential of the 10 red deer microsatellite markers for forensic investigations. © 2014 American Academy of Forensic Sciences.
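The probability of identity (PI) is commonly computed per locus from allele frequencies as the sum of p_i^4 over alleles plus the sum of (2 p_i p_j)^2 over allele pairs. It is then multiplied across loci, which assumes Hardy-Weinberg and linkage equilibrium; this is why independent inheritance of the loci matters. A minimal sketch with hypothetical allele frequencies; how the study averaged PI across subpopulations to obtain PIave is not stated here.

```python
from itertools import combinations
from math import prod
from typing import Sequence


def locus_pi(freqs: Sequence[float]) -> float:
    """Probability that two random individuals share a genotype at one locus."""
    homozygous = sum(p ** 4 for p in freqs)
    heterozygous = sum((2 * p * q) ** 2 for p, q in combinations(freqs, 2))
    return homozygous + heterozygous


def multilocus_pi(loci: Sequence[Sequence[float]]) -> float:
    """Product over loci; assumes independent (unlinked) loci."""
    return prod(locus_pi(freqs) for freqs in loci)


print(locus_pi([0.5, 0.5]))  # 0.375
print(multilocus_pi([[0.5, 0.5], [0.5, 0.5]]))  # 0.140625
```

Each additional informative locus multiplies the combined PI by a factor below 1, which is how ten polymorphic loci can reach values as small as the one reported.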
Naz, Sadia; Ngo, Tony; Farooq, Umar
2017-01-01
Background The rapid increase in antibiotic resistance by various bacterial pathogens underscores the importance of developing new therapies and exploring different drug targets. A fraction of bacterial pathogens abbreviated as ESKAPE by the European Center for Disease Prevention and Control have been considered a major threat due to the rise in nosocomial infections. Here, we compared putative drug binding pockets of twelve essential and mostly conserved metabolic enzymes in numerous bacterial pathogens, including those of the ESKAPE group and Mycobacterium tuberculosis. The comparative analysis will provide guidelines for the likelihood of transferability of the inhibitors from one species to another. Methods Nine bacterial species, including six ESKAPE pathogens, Mycobacterium tuberculosis, and two non-pathogenic bacteria, Mycobacterium smegmatis and Escherichia coli, were selected for drug binding pocket analysis of twelve essential enzymes. The amino acid sequences were obtained from UniProt, aligned using ICM v3.8-4a and matched against the Pocketome encyclopedia. We used known co-crystal structures of selected target enzyme orthologs to evaluate the location of their active sites and binding pockets and to calculate a matrix of pairwise sequence identities for each target enzyme across the different species. This was used to generate sequence maps. Results High sequence identity of enzyme binding pockets, derived from experimentally determined co-crystallized structures, was observed among various species. Comparison at both the full-sequence level and for drug binding pockets of key metabolic enzymes showed that binding pockets are highly conserved (sequence similarity up to 100%) among various ESKAPE pathogens as well as Mycobacterium tuberculosis. Enzyme orthologs with conserved binding sites may interact with inhibitors in a similar way, which might help in the design of a similar class of inhibitors for a particular species.
The derived pocket alignments and distance-based maps provide guidelines for drug discovery and repurposing. In addition, they provide recommendations for the relevant model bacteria that may be used for initial drug testing. Discussion Comparing ligand binding sites through sequence identity calculation could be an effective approach to identify conserved orthologs, as drug binding pockets have shown a higher level of conservation among various species. By using this approach we could avoid the problems associated with full sequence comparison. We identified essential metabolic enzymes among ESKAPE pathogens that share high sequence identity in their putative drug binding pockets (up to 100%), so that known inhibitors could potentially antagonize these identical pockets in the various species in a similar manner. PMID:28948099
Naz, Sadia; Ngo, Tony; Farooq, Umar; Abagyan, Ruben
2017-01-01
The rapid increase in antibiotic resistance by various bacterial pathogens underscores the importance of developing new therapies and exploring different drug targets. A fraction of bacterial pathogens abbreviated as ESKAPE by the European Center for Disease Prevention and Control have been considered a major threat due to the rise in nosocomial infections. Here, we compared putative drug binding pockets of twelve essential and mostly conserved metabolic enzymes in numerous bacterial pathogens, including those of the ESKAPE group and Mycobacterium tuberculosis. The comparative analysis will provide guidelines for the likelihood of transferability of the inhibitors from one species to another. Nine bacterial species, including six ESKAPE pathogens, Mycobacterium tuberculosis, and two non-pathogenic bacteria, Mycobacterium smegmatis and Escherichia coli, were selected for drug binding pocket analysis of twelve essential enzymes. The amino acid sequences were obtained from UniProt, aligned using ICM v3.8-4a and matched against the Pocketome encyclopedia. We used known co-crystal structures of selected target enzyme orthologs to evaluate the location of their active sites and binding pockets and to calculate a matrix of pairwise sequence identities for each target enzyme across the different species. This was used to generate sequence maps. High sequence identity of enzyme binding pockets, derived from experimentally determined co-crystallized structures, was observed among various species. Comparison at both the full-sequence level and for drug binding pockets of key metabolic enzymes showed that binding pockets are highly conserved (sequence similarity up to 100%) among various ESKAPE pathogens as well as Mycobacterium tuberculosis. Enzyme orthologs with conserved binding sites may interact with inhibitors in a similar way, which might help in the design of a similar class of inhibitors for a particular species.
The derived pocket alignments and distance-based maps provide guidelines for drug discovery and repurposing. In addition, they provide recommendations for the relevant model bacteria that may be used for initial drug testing. Comparing ligand binding sites through sequence identity calculation could be an effective approach to identify conserved orthologs, as drug binding pockets have shown a higher level of conservation among various species. By using this approach we could avoid the problems associated with full sequence comparison. We identified essential metabolic enzymes among ESKAPE pathogens that share high sequence identity in their putative drug binding pockets (up to 100%), so that known inhibitors could potentially antagonize these identical pockets in the various species in a similar manner.
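A minimal sketch of a pairwise percent-identity matrix over aligned binding-pocket residues, assuming the pocket positions have already been extracted from an alignment. The study used ICM and the Pocketome for that step, which is not reproduced here. Columns with a gap in either sequence are skipped, which is only one of several identity conventions. The function names, species keys and pocket strings are hypothetical.

```python
from itertools import combinations
from typing import Dict


def percent_identity(a: str, b: str) -> float:
    """Identity over aligned positions where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned pocket sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def identity_matrix(pockets: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Symmetric matrix of pairwise identities, keyed by species name."""
    names = sorted(pockets)
    matrix = {name: {name: 100.0} for name in names}
    for a, b in combinations(names, 2):
        matrix[a][b] = matrix[b][a] = percent_identity(pockets[a], pockets[b])
    return matrix


# Hypothetical pocket residues, one aligned string per species.
pockets = {"E_coli": "GHSLDK", "M_tb": "GHALDK", "S_aureus": "GHSLDK"}
m = identity_matrix(pockets)
print(m["E_coli"]["S_aureus"], round(m["E_coli"]["M_tb"], 1))  # 100.0 83.3
```

Restricting the comparison to pocket positions, rather than full sequences, is what lets two otherwise divergent orthologs score 100% identity.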
An alternative view of protein fold space.
Shindyalov, I N; Bourne, P E
2000-02-15
Comparing and subsequently classifying protein structure information has received significant attention concurrent with the increase in the number of experimentally derived 3-dimensional structures. Classification schemes have focused on biological function found within protein domains and on structure classification based on topology. Here an alternative view is presented that groups substructures. Substructures are long (50-150 residues), highly repetitive, near-contiguous pieces of polypeptide chain that occur frequently in a set of proteins from the PDB defined as structurally non-redundant over the complete polypeptide chain. The substructure classification is based on a previously reported Combinatorial Extension (CE) algorithm that provides a significantly different set of structure alignments than those previously described, having, for example, only a 40% overlap with FSSP. Qualitatively, the algorithm provides longer contiguous aligned segments at the price of a slightly higher root-mean-square deviation (rmsd). Clustering these alignments gives a discrete and highly repetitive set of substructures not detectable by sequence similarity alone. In some cases different substructures represent all or different parts of well-known folds, indicative of the Russian doll effect (the continuity of protein fold space). In other cases they fall into different structural and functional classifications. It is too early to determine whether these newly classified substructures represent new insights into the evolution of a structural framework important to many proteins. What is apparent from ongoing work is that these substructures have the potential to be useful probes in finding remote sequence homology and in structure prediction studies. The characteristics of the complete all-by-all comparison of the polypeptide chains present in the PDB, and details of the filtering procedure by pairwise structure alignment that led to the emergent substructure gallery, are discussed.
Substructure classification, alignments, and tools to analyze them are available at http://cl.sdsc.edu/ce.html.
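The rmsd quoted for structure alignments is computed over aligned residue pairs after optimal rigid-body superposition. A minimal sketch using the Kabsch algorithm with NumPy, assuming two equal-length arrays of already-aligned Cα coordinates. It does not implement CE's combinatorial search for aligned fragment pairs, and the coordinates below are random test data.

```python
import numpy as np


def kabsch_rmsd(P, Q) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of equal shape")
    P = P - P.mean(axis=0)  # remove translation
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)  # SVD of the 3x3 covariance matrix
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # avoid improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation taking P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


rng = np.random.default_rng(0)
P = rng.normal(size=(10, 3))
t = np.radians(30.0)
Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t), np.cos(t), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, 2.0, 3.0])  # rotated and translated copy
print(kabsch_rmsd(P, Q) < 1e-8)  # True
```

A rigidly moved copy superposes exactly, so its rmsd is zero up to rounding. Longer aligned segments usually include more imperfectly matched residues, which is the trade-off against rmsd described above.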
Zhang, Zhengqing; Chang, Yong; Li, Menglou
2017-06-01
Dastarcus helophoroides (Fairmaire) (Coleoptera: Bothrideridae) is an important natural enemy of long-horned beetles in China, Japan, and Korea. In this study, the sequence of the cytochrome oxidase subunit I gene was used to investigate the genetics and relationships within and among D. helophoroides populations collected from five different geographic locations. We used principal component analysis, heatmaps, and Venn diagrams to determine the relationship between haplotypes and populations. In total, 26 haplotypes with 51 polymorphic nucleotide sites were defined, and low genetic diversity was found among the different populations. Significant genetic variation was observed mainly within populations, and no correlation was found between genetic distribution and geographical distance. Low pairwise fixation index values (-0.01424 to 0.04896) and high gene flow show that there was high gene exchange between populations. The codistributed haplotype DH01 was suggested to be the most ancestral haplotype, and other haplotypes were thought to have evolved from it through several mutations. In four of the populations, both common haplotypes (DH01, DH03, and DH22) and unique haplotypes were found. Low genetic diversity among different populations is related to a relatively high flight capacity, host movement, and human-aided dispersal of D. helophoroides. The high gene exchange and typically weak population genetic structure among the five populations, especially among populations of Anoplophora glabripennis (Motschulsky), Monochamus alternatus (Hope), and Massicus raddei (Blessig), may suggest that these populations cross naturally in the field. © The Authors 2017. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
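Defining haplotypes and polymorphic sites, as done above for the COI sequences, can be sketched as follows. The sketch assumes pre-aligned sequences of equal length and treats each distinct sequence string as one haplotype. A site counts as polymorphic if it carries more than one of A, C, G and T, so gaps and ambiguity codes are ignored at that step. The function name and sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def haplotype_summary(aligned: Sequence[str]) -> Tuple[Dict[str, int], List[int]]:
    """Return haplotype counts and 0-based polymorphic site positions."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal aligned length")
    counts = dict(Counter(aligned))  # each distinct sequence is one haplotype
    polymorphic = [i for i in range(len(aligned[0]))
                   if len({s[i] for s in aligned if s[i] in "ACGT"}) > 1]
    return counts, polymorphic


seqs = ["ACGTA", "ACGTA", "ACGTT", "GCGTA"]
counts, sites = haplotype_summary(seqs)
print(len(counts), sites)  # 3 [0, 4]
```

Haplotypes found in several populations (like a codistributed DH01) and those found in only one (unique haplotypes) can then be read off by running the same count per population.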
GIGA: a simple, efficient algorithm for gene tree inference in the genomic age
2010-01-01
Background Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. Results We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process. Conclusions GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. 
We compared trees produced by GIGA to those in the TreeFam database, and they were very similar in general, with most differences likely due to poor alignment quality. However, some remaining differences are algorithmic, and can be explained by the fact that GIGA tends to put a larger emphasis on minimizing gene duplication and deletion events. PMID:20534164
GIGA: a simple, efficient algorithm for gene tree inference in the genomic age.
Thomas, Paul D
2010-06-09
Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process. GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. 
We compared trees produced by GIGA to those in the TreeFam database, and they were very similar in general, with most differences likely due to poor alignment quality. However, some remaining differences are algorithmic, and can be explained by the fact that GIGA tends to put a larger emphasis on minimizing gene duplication and deletion events.
Villacís, Anita G; Marcet, Paula L; Yumiseva, César A; Dotson, Ellen M; Tibayrenc, Michel; Brenière, Simone Frédérique; Grijalva, Mario J
2017-09-01
Effective control of Chagas disease vector populations requires a good understanding of the epidemiological components, including a reliable analysis of the genetic structure of vector populations. Rhodnius ecuadoriensis is the most widespread vector of Chagas disease in Ecuador, occupying domestic, peridomestic and sylvatic habitats. It is widely distributed in the central coast and southern highlands regions of Ecuador, two very different regions in terms of biogeographical characteristics. To evaluate the genetic relationship among R. ecuadoriensis populations in these two regions, we analyzed genetic variability at two microsatellite loci for 326 specimens (n=122 in Manabí and n=204 in Loja) and mitochondrial cytochrome b gene (Cyt b) sequences for 174 individuals collected in the two provinces (n=73 and n=101 in Manabí and Loja, respectively). The individual samples were grouped into populations according to their community of origin. A few populations presented positive FIS values, possibly due to the Wahlund effect. Significant pairwise differentiation was detected between populations within each province for both genetic markers, and the isolation-by-distance model was significant for these populations. Microsatellite markers showed significant genetic differentiation between the populations of the two provinces. The partial sequences of the Cyt b gene (578 bp) identified a total of 34 haplotypes among the 174 specimens sequenced, which translated into high haplotype diversity (Hd=0.929). The haplotype distribution differed between provinces (significant Fisher's exact test). Overall, the genetic differentiation of R. ecuadoriensis between provinces detected in this study is consistent with the biological and phenotypic differences previously observed between Manabí and Loja populations. The current phylogenetic analysis supported the monophyly of the populations of R. ecuadoriensis within the R. pallescens species complex; R. pallescens and R. colombiensis were more closely related to each other than they were to R. ecuadoriensis. Copyright © 2017 Elsevier B.V. All rights reserved.
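Haplotype diversity (Hd) is usually computed with Nei's unbiased formula, Hd = n/(n - 1) × (1 - Σ p_i²), where n is the number of sequences and p_i is the frequency of haplotype i. A minimal sketch with hypothetical haplotype labels; it does not reproduce the study's data.

```python
from collections import Counter
from typing import Sequence


def haplotype_diversity(haplotypes: Sequence[str]) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    sum_sq = sum((c / n) ** 2 for c in Counter(haplotypes).values())
    return n / (n - 1) * (1.0 - sum_sq)


print(round(haplotype_diversity(["H1", "H1", "H2", "H3"]), 3))  # 0.833
```

Hd is the probability that two randomly drawn sequences carry different haplotypes. It is 0 when every sequence is identical and approaches 1 when most sequences are distinct, so 34 haplotypes among 174 sequences is consistent with a value such as 0.929.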
Bioinformatic mining of EST-SSR loci in the Pacific oyster, Crassostrea gigas.
Wang, Y; Ren, R; Yu, Z
2008-06-01
A set of expressed sequence tag-simple sequence repeat (EST-SSR) markers of the Pacific oyster, Crassostrea gigas, was developed through bioinformatic mining of the GenBank public database. As of June 30, 2007, a total of 5132 EST sequences were downloaded from GenBank and screened for di-, tri- and tetra-nucleotide repeats, with criteria set at a minimum of 5, 4 and 4 repeats for the three categories of SSRs, respectively. Seventeen polymorphic microsatellite markers were characterized. Allele numbers ranged from 3 to 10, and the observed and expected heterozygosity values varied from 0.125 to 0.770 and from 0.113 to 0.732, respectively. Eleven loci were at Hardy-Weinberg equilibrium (HWE); the other six loci showed significant departure from HWE (P < 0.01), suggesting the possible presence of null alleles. Pairwise tests of linkage disequilibrium (LD) indicated that 11 of 136 pairs of loci showed significant LD (P < 0.01), likely due to departures from HWE in single markers. Cross-species amplification was examined for five other Crassostrea species with reasonable results, suggesting that these markers will be useful in oyster genetics.
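The screening criteria above (at least 5 di-, 4 tri- or 4 tetranucleotide repeat units) can be sketched with regular expressions. This is a simplified, hypothetical implementation, not the study's pipeline. It finds perfect repeats only and skips units that are themselves repeats of a shorter motif, so an AC repeat is not reported again as ACAC. It ignores the compound or interrupted SSRs that dedicated tools handle, and the EST string is invented.

```python
import re
from typing import Dict, List, Tuple

# Minimum number of repeat units per motif length (2 = di-, 3 = tri-, 4 = tetranucleotide).
MIN_REPEATS: Dict[int, int] = {2: 5, 3: 4, 4: 4}


def _is_compound(unit: str) -> bool:
    """True if the unit is itself a repeat of a shorter motif (e.g. 'AA', 'ACAC')."""
    k = len(unit)
    return any(k % d == 0 and unit == unit[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least min_rep - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if not _is_compound(unit):
                hits.append((m.start(), unit, len(m.group(0)) // unit_len))
    return hits


est = "GG" + "AC" * 6 + "TT" + "AGC" * 4 + "GG"
print(find_ssrs(est))  # [(2, 'AC', 6), (16, 'AGC', 4)]
```

The 136 locus pairs tested for LD follow directly from the 17 markers: 17 × 16 / 2 = 136.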
SODa: an Mn/Fe superoxide dismutase prediction and design server.
Kwasigroch, Jean Marc; Wintjens, René; Gilis, Dimitri; Rooman, Marianne
2008-06-02
Superoxide dismutases (SODs) are ubiquitous metalloenzymes that play an important role in the defense of aerobic organisms against oxidative stress, by converting reactive oxygen species into nontoxic molecules. We focus here on the SOD family that uses Fe or Mn as cofactor. The SODa webtool http://babylone.ulb.ac.be/soda predicts if a target sequence corresponds to an Fe/Mn SOD. If so, it predicts the metal ion specificity (Fe, Mn or cambialistic) and the oligomerization mode (dimer or tetramer) of the target. In addition, SODa proposes a list of residue substitutions likely to improve the predicted preferences for the metal cofactor and oligomerization mode. The method is based on residue fingerprints, consisting of residues conserved in SOD sequences or typical of SOD subgroups, and of interaction fingerprints, containing residue pairs that are in contact in SOD structures. SODa is shown to outperform and to be more discriminative than traditional techniques based on pairwise sequence alignments. Moreover, the fact that it proposes selected mutations makes it a valuable tool for rational protein design.
Classification of HCV and HIV-1 Sequences with the Branching Index
Hraber, Peter; Kuiken, Carla; Waugh, Mark; Geer, Shaun; Bruno, William J.; Leitner, Thomas
2009-01-01
SUMMARY Classification of viral sequences should be fast, objective, accurate, and reproducible. Most methods that classify sequences use either pairwise distances or phylogenetic relations, but cannot discern when a sequence is unclassifiable. The branching index (BI) combines distance and phylogeny methods to compute a ratio that quantifies how closely a query sequence clusters with a subtype clade. In the hypothesis-testing framework of statistical inference, the BI is compared with a threshold to test whether sufficient evidence exists for the query sequence to be classified among known sequences. If the BI is above the threshold, the null hypothesis of no support for the subtype relation is rejected and the sequence is taken as belonging to the subtype clade with which it clusters on the tree. This study evaluates statistical properties of the branching index for subtype classification in HCV and HIV-1. Pairs of BI values with known positive and negative test results were computed from 10,000 random fragments of reference alignments. Sampled fragments were of sufficient length to contain phylogenetic signal that groups reference sequences together properly into subtype clades. For HCV, a threshold BI of 0.71 yields 95.1% agreement with reference subtypes, with equal false positive and false negative rates. For HIV-1, a threshold of 0.66 yields 93.5% agreement. Higher thresholds can be used where lower false positive rates are required. In synthetic recombinants, regions without breakpoints are recognized accurately; regions with breakpoints do not uniquely represent any known subtype. Web-based services for viral subtype classification with the branching index are available online. PMID:18753218
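The threshold choice described above, which equalizes false positive and false negative rates, can be sketched as a search over candidate thresholds. The sketch assumes BI scores for queries with known positive and negative outcomes are already computed; the BI calculation itself is not reproduced. A query is treated as classified when its BI is at or above the threshold. The function names and scores are invented for illustration.

```python
from typing import Sequence, Tuple


def error_rates(positives: Sequence[float], negatives: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate); classify if BI >= threshold."""
    fnr = sum(bi < threshold for bi in positives) / len(positives)
    fpr = sum(bi >= threshold for bi in negatives) / len(negatives)
    return fpr, fnr


def equal_error_threshold(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """Candidate threshold where the FP and FN rates are closest to equal."""
    def gap(t: float) -> float:
        fpr, fnr = error_rates(positives, negatives, t)
        return abs(fpr - fnr)
    return min(sorted(set(positives) | set(negatives)), key=gap)


# Invented BI scores for queries with known subtype membership.
positives = [0.9, 0.8, 0.75, 0.6]
negatives = [0.2, 0.4, 0.65, 0.7]
t = equal_error_threshold(positives, negatives)
print(t, error_rates(positives, negatives, t))  # 0.7 (0.25, 0.25)
```

Raising the threshold above this point lowers the false positive rate at the cost of more false negatives, matching the trade-off noted in the abstract.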
Analysis of X-ray structures of matrix metalloproteinases via chaotic map clustering.
Giangreco, Ilenia; Nicolotti, Orazio; Carotti, Angelo; De Carlo, Francesco; Gargano, Gianfranco; Bellotti, Roberto
2010-10-08
Matrix metalloproteinases (MMPs) are well-known biological targets implicated in tumour progression, homeostatic regulation, innate immunity, impaired delivery of pro-apoptotic ligands, and the release and cleavage of cell-surface receptors. With this in mind, understanding the close relationships among diverse MMPs could provide a solid basis for accelerating the design of new selective MMP inhibitors. In this regard, uncovering the underlying molecular determinants of similarity among MMPs is a key challenge. We describe a pairwise variant of the non-parametric chaotic map clustering (CMC) algorithm and its application to 104 X-ray MMP structures. In this analysis, electrostatic potentials are computed and used as input for the CMC algorithm. It was shown that differences between proteins reflect genuine variation of their electrostatic potentials. In addition, the analysis was extended to the protein primary structures and the molecular shapes of the MMP co-crystallised ligands. The CMC algorithm was shown to be a valuable tool for knowledge acquisition and transfer from MMP structures. Based on the variation of electrostatic potentials, CMC was successful in analysing the MMP target family landscape and different subsites. The first investigation yielded a rational interpretation of both domain organization and substrate specificity classifications. The second made it possible to distinguish the MMP classes, demonstrating the high specificity of the S1' pocket, and to detect both point mutations of ionisable residues and different side-chain conformations that likely account for induced-fit phenomena. In addition, CMC performed comparably to the popular UPGMA (Unweighted Pair Group Method with Arithmetic mean) method, which currently represents a standard clustering approach in bioinformatics.
Interestingly, CMC and UPGMA produced closely comparable outcomes, but CMC often produced more informative and more easily interpretable dendrograms. Finally, CMC was successful for standard pairwise analysis (i.e., the Smith-Waterman algorithm) of protein sequences and was used to convincingly explain the complementarity between the molecular shapes of the co-crystallised ligand molecules and the accessible MMP void volumes.
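For reference, the Smith-Waterman local alignment score mentioned above can be computed with a simple dynamic-programming table. A minimal sketch with a linear gap penalty and match/mismatch scores chosen only for illustration; protein analyses normally use a substitution matrix such as BLOSUM62 and affine gap penalties instead. The sequences are toy strings.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score with a linear gap penalty."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diagonal = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a new local alignment start anywhere.
            H[i][j] = max(0, diagonal, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


print(smith_waterman("XXACGTYY", "ACGT"))  # 8: four matches x 2, flanks ignored
print(smith_waterman("AAAA", "TTTT"))  # 0: no positive-scoring local alignment
```

All-against-all scores of this kind form the pairwise similarity matrix that a clustering method such as CMC or UPGMA can take as input.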
related: an R package for analysing pairwise relatedness from codominant molecular markers.
Pew, Jack; Muir, Paul H; Wang, Jinliang; Frasier, Timothy R
2015-05-01
Analyses of pairwise relatedness represent a key component to addressing many topics in biology. However, such analyses have been limited for two reasons. First, most available programs provide a means to estimate relatedness based on only a single estimator, making comparison across estimators difficult. Second, all programs to date have been platform specific, working only on a specific operating system. This has the undesirable outcome of making choice of relatedness estimator limited by operating system preference, rather than being based on scientific rationale. Here, we present a new R package, called related, that can calculate relatedness based on seven estimators, can account for genotyping errors, missing data and inbreeding, and can estimate 95% confidence intervals. Moreover, simulation functions are provided that allow for easy comparison of the performance of different estimators and for analyses of how much resolution to expect from a given data set. Because this package works in R, it is platform independent. Combined, this functionality should allow for more appropriate analyses and interpretation of pairwise relatedness and will also allow for the integration of relatedness data into larger R workflows. © 2014 John Wiley & Sons Ltd.
ERIC Educational Resources Information Center
Sari, Halil Ibrahim; Huggins, Anne Corinne
2015-01-01
This study compares two methods of defining groups for the detection of differential item functioning (DIF): (a) pairwise comparisons and (b) composite group comparisons. We aim to emphasize and empirically support the notion that the choice of pairwise versus composite group definitions in DIF is a reflection of how one defines fairness in DIF…
Testing hypotheses for differences between linear regression lines
Stanley J. Zarnoch
2009-01-01
Five hypotheses are identified for testing differences between simple linear regression lines. The distinctions between these hypotheses are based on a priori assumptions and illustrated with full and reduced models. The contrast approach is presented as an easy and complete method for testing for overall differences between the regressions and for making pairwise...
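The full-versus-reduced-model idea can be sketched for the simplest of these hypotheses, that two simple linear regression lines are coincident (same intercept and same slope). This is a hedged illustration using the standard extra-sum-of-squares F statistic, not a reproduction of the contrast approach; the function names are hypothetical. The full model fits a separate line to each group (4 parameters) and the reduced model fits one pooled line (2 parameters).

```python
def fit_sse(xs, ys):
    """Least-squares fit of y = a + b*x; return the residual sum of squares."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def coincidence_f(x1, y1, x2, y2):
    """Extra-sum-of-squares F for H0: both groups share one regression line.

    x1, y1, x2, y2 are lists; each group needs at least 3 points.
    """
    sse_full = fit_sse(x1, y1) + fit_sse(x2, y2)   # separate lines: 4 parameters
    sse_reduced = fit_sse(x1 + x2, y1 + y2)        # pooled line: 2 parameters
    df_full = len(x1) + len(x2) - 4
    df_diff = 2                                    # parameters dropped under H0
    return ((sse_reduced - sse_full) / df_diff) / (sse_full / df_full)


x = [0, 1, 2, 3]
print(coincidence_f(x, [0, 1, 2, 3.1], x, [5, 7, 9, 11.2]))
```

The statistic is compared with an F distribution on 2 and n1 + n2 - 4 degrees of freedom; if SciPy is available, `scipy.stats.f.sf(F, 2, df_full)` gives the corresponding p-value. Tests of parallelism (equal slopes only) follow the same pattern with a different reduced model.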
BioWord: A sequence manipulation suite for Microsoft Word
2012-01-01
Background: The ability to manipulate, edit and process DNA and protein sequences has rapidly become a necessary skill for practicing biologists across a wide swath of disciplines. In spite of this, most everyday sequence manipulation tools are distributed across several programs and web servers, sometimes requiring installation and typically involving frequent switching between applications. To address this problem, here we have developed BioWord, a macro-enabled self-installing template for Microsoft Word documents that integrates an extensive suite of DNA and protein sequence manipulation tools. Results: BioWord is distributed as a single macro-enabled template that self-installs with a single click. After installation, BioWord will open as a tab in the Office ribbon. Biologists can then easily manipulate DNA and protein sequences using a familiar interface and minimize the need to switch between applications. Beyond simple sequence manipulation, BioWord integrates functionality ranging from dyad search and consensus logos to motif discovery and pair-wise alignment. Written in Visual Basic for Applications (VBA) as an open source, object-oriented project, BioWord allows users with varying programming experience to expand and customize the program to better meet their own needs. Conclusions: BioWord integrates a powerful set of tools for biological sequence manipulation within a handy, user-friendly tab in a widely used word processing software package. The use of a simple scripting language and an object-oriented scheme facilitates customization by users and provides a very accessible educational platform for introducing students to basic bioinformatics algorithms. PMID:22676326
HIPPI: highly accurate protein family classification with ensembles of HMMs.
Nguyen, Nam-Phuong; Nute, Michael; Mirarab, Siavash; Warnow, Tandy
2016-11-11
Given a new biological sequence, detecting membership in a known family is a basic step in many bioinformatics analyses, with applications to protein structure and function prediction and metagenomic taxon identification and abundance profiling, among others. Yet family identification of sequences that are distantly related to sequences in public databases or that are fragmentary remains one of the more difficult analytical problems in bioinformatics. We present a new technique for family identification called HIPPI (Hierarchical Profile Hidden Markov Models for Protein family Identification). HIPPI uses a novel technique to represent a multiple sequence alignment for a given protein family or superfamily by an ensemble of profile hidden Markov models computed using HMMER. An evaluation of HIPPI on the Pfam database shows that HIPPI has better overall precision and recall than blastp, HMMER, and pipelines based on HHsearch, and maintains good accuracy even for fragmentary query sequences and for protein families with low average pairwise sequence identity, both conditions where other methods degrade in accuracy. HIPPI provides accurate protein family identification and is robust to difficult model conditions. Our results, combined with observations from previous studies, show that ensembles of profile Hidden Markov models can better represent multiple sequence alignments than a single profile Hidden Markov model, and thus can improve downstream analyses for various bioinformatic tasks. Further research is needed to determine the best practices for building the ensemble of profile Hidden Markov models. HIPPI is available on GitHub at https://github.com/smirarab/sepp .
BioWord: a sequence manipulation suite for Microsoft Word.
Anzaldi, Laura J; Muñoz-Fernández, Daniel; Erill, Ivan
2012-06-07
The ability to manipulate, edit and process DNA and protein sequences has rapidly become a necessary skill for practicing biologists across a wide swath of disciplines. In spite of this, most everyday sequence manipulation tools are distributed across several programs and web servers, sometimes requiring installation and typically involving frequent switching between applications. To address this problem, here we have developed BioWord, a macro-enabled self-installing template for Microsoft Word documents that integrates an extensive suite of DNA and protein sequence manipulation tools. BioWord is distributed as a single macro-enabled template that self-installs with a single click. After installation, BioWord will open as a tab in the Office ribbon. Biologists can then easily manipulate DNA and protein sequences using a familiar interface and minimize the need to switch between applications. Beyond simple sequence manipulation, BioWord integrates functionality ranging from dyad search and consensus logos to motif discovery and pair-wise alignment. Written in Visual Basic for Applications (VBA) as an open source, object-oriented project, BioWord allows users with varying programming experience to expand and customize the program to better meet their own needs. BioWord integrates a powerful set of tools for biological sequence manipulation within a handy, user-friendly tab in a widely used word processing software package. The use of a simple scripting language and an object-oriented scheme facilitates customization by users and provides a very accessible educational platform for introducing students to basic bioinformatics algorithms.
Niira, Kazutaka; Ito, Mika; Masuda, Tsuneyuki; Saitou, Toshiya; Abe, Tadatsugu; Komoto, Satoshi; Sato, Mitsuo; Yamasato, Hiroshi; Kishimoto, Mai; Naoi, Yuki; Sano, Kaori; Tuchiaka, Shinobu; Okada, Takashi; Omatsu, Tsutomu; Furuya, Tetsuya; Aoki, Hiroshi; Katayama, Yukie; Oba, Mami; Shirai, Junsuke; Taniguchi, Koki; Mizutani, Tetsuya; Nagai, Makoto
2016-10-01
Porcine rotavirus C (RVC) is distributed throughout the world and is thought to be a pathogenic agent of diarrhea in piglets. Although the VP7, VP4, and VP6 gene sequences of Japanese porcine RVCs are currently available, no whole-genome sequence data exist for Japanese RVCs. Furthermore, only one to three sequences are available for the porcine RVC VP1-VP3 and NSP1-NSP3 genes. We therefore determined nearly full-length whole-genome sequences of nine Japanese porcine RVCs from seven piglets with diarrhea and two healthy pigs and compared them with published RVC sequences from a database. The VP7 genes of two Japanese RVCs from healthy pigs were highly divergent from other known RVC strains and were provisionally classified as G12 and G13 based on the 86% nucleotide identity cut-off value. Pairwise sequence identity calculations and phylogenetic analyses identified candidate novel genotypes of Japanese porcine RVC in the NSP1-, NSP2- and NSP3-encoding genes. Furthermore, VP3 of Japanese porcine RVCs was shown to be closely related to that of human RVCs, suggesting a gene reassortment event between porcine and human RVCs and past interspecies transmission. The present study demonstrated that porcine RVCs show greater genetic diversity among strains than human and bovine RVCs. Copyright © 2016 Elsevier B.V. All rights reserved.
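The genotype assignment step can be illustrated with a small sketch: compute percent nucleotide identity between two pre-aligned sequences and compare it with the 86% cut-off. Excluding columns where either sequence has a gap is an assumption made here; published classification schemes may treat gaps differently, and the function names are hypothetical.

```python
def percent_identity(seq1: str, seq2: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gap columns are ignored
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def same_genotype(seq1: str, seq2: str, cutoff: float = 86.0) -> bool:
    """True if identity reaches the cut-off (86% was used for VP7)."""
    return percent_identity(seq1, seq2) >= cutoff


print(percent_identity("AC-GT", "ACTGA"))  # 75.0 (3 matches over 4 columns)
```

A strain falling below the cut-off against every reference genotype would be a candidate for a new genotype, which is the reasoning behind the provisional G12 and G13 assignments.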
Comparative Genome Sequence Analysis of the Bpa/Str Region in Mouse and Man
Mallon, A.-M.; Platzer, M.; Bate, R.; Gloeckner, G.; Botcherby, M.R.M.; Nordsiek, G.; Strivens, M.A.; Kioschis, P.; Dangel, A.; Cunningham, D.; Straw, R.N.A.; Weston, P.; Gilbert, M.; Fernando, S.; Goodall, K.; Hunter, G.; Greystrong, J.S.; Clarke, D.; Kimberley, C.; Goerdes, M.; Blechschmidt, K.; Rump, A.; Hinzmann, B.; Mundy, C.R.; Miller, W.; Poustka, A.; Herman, G.E.; Rhodes, M.; Denny, P.; Rosenthal, A.; Brown, S.D.M.
2000-01-01
The progress of human and mouse genome sequencing programs presages the possibility of systematic cross-species comparison of the two genomes as a powerful tool for gene and regulatory element identification. As the opportunities to perform comparative sequence analysis emerge, it is important to develop parameters for such analyses and to examine the outcomes of cross-species comparison. Our analysis used gene prediction and a database search of 430 kb of genomic sequence covering the Bpa/Str region of the mouse X chromosome, and 745 kb of genomic sequence from the homologous human X chromosome region. We identified 11 genes in mouse and 13 genes and two pseudogenes in human. In addition, we compared the mouse and human sequences using pairwise alignment and searches for evolutionarily conserved regions (ECRs) exceeding a defined threshold of sequence identity. This approach aided the identification of at least four further putative conserved genes in the region. Comparative sequencing revealed that this region is a mosaic in evolutionary terms, with considerably more rearrangement between the two species than realized previously from comparative mapping studies. Surprisingly, this region showed an extremely high LINE and low SINE content, low G+C content, and yet a relatively high gene density, in contrast to the low gene density usually associated with such regions. [The sequence data described in this paper have been submitted to EMBL under the following accession nos.: Mouse Genomic Sequence: Mouse contig A (AL021127), Mouse contig B (AL049866), BAC41M10 (AL136328), PAC303O11 (AL136329). Human Genomic Sequence: Human contig 1 (U82671, U82670), Human contig 2 (U82695).] PMID:10854409
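Searching a pairwise alignment for regions that exceed an identity threshold can be sketched as a sliding-window scan. The window length and threshold defaults below (100 columns, 70% identity) are illustrative assumptions, not parameters reported in the study, and the function name is hypothetical.

```python
def conserved_windows(aln1: str, aln2: str, window: int = 100,
                      min_identity: float = 0.70):
    """Return (start, end) column ranges of windows meeting min_identity.

    aln1 and aln2 are the two rows of a pairwise alignment ('-' = gap).
    Assumption: a gap column counts as a mismatch. Overlapping
    qualifying windows are merged into a single region.
    """
    if len(aln1) != len(aln2):
        raise ValueError("alignment rows must have equal length")
    same = [a == b and a != "-" for a, b in zip(aln1, aln2)]
    regions = []
    for start in range(len(same) - window + 1):
        if sum(same[start:start + window]) / window >= min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)  # extend overlapping region
            else:
                regions.append((start, end))
    return regions


mouse = "A" * 10 + "C" * 10   # toy alignment rows, not real sequence
human = "A" * 10 + "G" * 10
print(conserved_windows(mouse, human, window=5, min_identity=1.0))  # [(0, 10)]
```

Merging overlapping windows reports each conserved stretch once, which makes the output easier to compare with gene predictions.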
Eisinga, Rob; Heskes, Tom; Pelzer, Ben; Te Grotenhuis, Manfred
2017-01-25
The Friedman rank sum test is a widely used nonparametric method in computational biology. In addition to examining the overall null hypothesis of no significant difference among any of the rank sums, it is typically of interest to conduct pairwise comparison tests. Current approaches to such tests rely on large-sample approximations, due to the numerical complexity of computing the exact distribution. These approximate methods lead to inaccurate estimates in the tail of the distribution, which is most relevant for p-value calculation. We propose an efficient, combinatorial exact approach for calculating the probability mass distribution of the rank sum difference statistic for pairwise comparison of Friedman rank sums, and compare exact results with recommended asymptotic approximations. Whereas the chi-squared approximation performs worse than exact computation overall, other approximations, particularly the normal, perform well, except in the extreme tail. Hence exact calculation offers an improvement when small p-values occur following multiple testing correction. Exact inference also enhances the identification of significant differences whenever the observed values are close to the approximate critical value. We illustrate the proposed method in the context of biological machine learning, where Friedman rank sum difference tests are commonly used for the comparison of classifiers over multiple datasets. We provide a computationally fast method to determine the exact p-value of the absolute rank sum difference of a pair of Friedman rank sums, making asymptotic tests obsolete. Calculation of exact p-values is easy to implement in statistical software, and the implementation in R is provided in one of the Additional files and is also available at http://www.ru.nl/publish/pages/726696/friedmanrsd.zip .
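The exact null distribution can be illustrated by direct convolution, assuming no ties and independent blocks: within each of n blocks, the ranks of two given treatments among k form a uniformly random ordered pair of distinct values, so their difference d has probability (k - |d|) / (k(k - 1)) for 1 <= |d| <= k - 1. Summing over blocks gives the distribution of the rank sum difference. This is a sketch, not the authors' combinatorial closed form or their R implementation, and it becomes slow for large n and k.

```python
from collections import defaultdict
from fractions import Fraction


def block_difference_pmf(k: int) -> dict:
    """PMF of r_i - r_j for two of k treatments ranked within one block (no ties)."""
    pairs = k * (k - 1)  # ordered pairs of distinct ranks
    return {d: Fraction(k - abs(d), pairs) for d in range(1 - k, k) if d != 0}


def rank_sum_difference_pmf(n: int, k: int) -> dict:
    """Exact PMF of D = R_i - R_j over n independent blocks, by convolution."""
    step = block_difference_pmf(k)
    pmf = {0: Fraction(1)}
    for _ in range(n):
        nxt = defaultdict(Fraction)  # Fraction() == 0
        for total, p in pmf.items():
            for d, q in step.items():
                nxt[total + d] += p * q
        pmf = dict(nxt)
    return pmf


def exact_p_value(d_obs: int, n: int, k: int) -> Fraction:
    """Two-sided exact P(|D| >= |d_obs|) under the null hypothesis."""
    pmf = rank_sum_difference_pmf(n, k)
    return sum((p for d, p in pmf.items() if abs(d) >= abs(d_obs)), Fraction(0))


print(exact_p_value(2, n=2, k=2))  # 1/2
```

Exact rational arithmetic avoids the tail inaccuracy discussed above, at the cost of growing memory as n increases.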
Caveats for the spatial arrangement method: Comment on Hout, Goldinger, and Ferguson (2013).
Verheyen, Steven; Voorspoels, Wouter; Vanpaemel, Wolf; Storms, Gert
2016-03-01
The gold standard among proximity data collection methods for multidimensional scaling is the (dis)similarity rating of pairwise presented stimuli. A drawback of the pairwise method is its lengthy duration, which may cause participants to change their strategy over time, become fatigued, or disengage altogether. Hout, Goldinger, and Ferguson (2013) recently made a case for the Spatial Arrangement Method (SpAM) as an alternative to the pairwise method, arguing that it is faster and more engaging. SpAM invites participants to directly arrange stimuli on a computer screen such that the interstimulus distances are proportional to psychological proximity. Based on a reanalysis of the Hout et al. (2013) data, we identify three caveats for SpAM. An investigation of the distributional characteristics of the SpAM proximity data reveals that the spatial nature of SpAM imposes structure on the data, invoking a bias against featural representations. Individual-differences scaling of the SpAM proximity data reveals that the two-dimensional nature of SpAM allows individuals to properly communicate only two dimensions of variation among stimuli, invoking a bias against high-dimensional scaling representations. Monte Carlo simulations indicate that SpAM requires more individuals to be tested in order to obtain reliable estimates of the group average. We conclude with an overview of considerations that can inform the choice between SpAM and the pairwise method and offer suggestions on how to overcome their respective limitations. (c) 2016 APA, all rights reserved.
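The practical difference between the two collection methods can be sketched in a few lines: the pairwise method needs n(n - 1)/2 rating trials, whereas SpAM derives all pairwise dissimilarities from a single on-screen arrangement as Euclidean distances between stimulus positions. Because every distance comes from a two-dimensional layout, the resulting proximity data are constrained to two dimensions. The stimulus names, coordinates and function names are hypothetical.

```python
import math
from itertools import combinations


def n_pairwise_trials(n_stimuli: int) -> int:
    """Number of (dis)similarity ratings needed by the pairwise method."""
    return n_stimuli * (n_stimuli - 1) // 2


def spam_dissimilarities(positions: dict) -> dict:
    """Pairwise Euclidean distances between on-screen stimulus positions."""
    return {(a, b): math.dist(positions[a], positions[b])
            for a, b in combinations(sorted(positions), 2)}


# Hypothetical screen coordinates (pixels) for three stimuli
layout = {"apple": (0.0, 0.0), "pear": (3.0, 4.0), "car": (30.0, 40.0)}
print(n_pairwise_trials(len(layout)))                   # 3
print(spam_dissimilarities(layout)[("apple", "pear")])  # 5.0
```

`math.dist` requires Python 3.8 or later.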
Carlson, Jonathan M.; Chan, Benjamin; Chopera, Denis R.; Brumme, Chanson J.; Markle, Tristan J.; Martin, Eric; Shahid, Aniqa; Anmole, Gursev; Mwimanzi, Philip; Nassab, Pauline; Penney, Kali A.; Rahman, Manal A.; Milloy, M.-J.; Schechter, Martin T.; Markowitz, Martin; Carrington, Mary; Walker, Bruce D.; Wagner, Theresa; Buchbinder, Susan; Fuchs, Jonathan; Koblin, Beryl; Mayer, Kenneth H.; Harrigan, P. Richard; Brockman, Mark A.; Poon, Art F. Y.; Brumme, Zabrina L.
2014-01-01
HLA-restricted immune escape mutations that persist following HIV transmission could gradually spread through the viral population, thereby compromising host antiviral immunity as the epidemic progresses. To assess the extent and phenotypic impact of this phenomenon in an immunogenetically diverse population, we genotypically and functionally compared linked HLA and HIV (Gag/Nef) sequences from 358 historic (1979–1989) and 382 modern (2000–2011) specimens from four key cities in the North American epidemic (New York, Boston, San Francisco, Vancouver). Inferred HIV phylogenies were star-like, with approximately two-fold greater mean pairwise distances in modern versus historic sequences. The reconstructed epidemic ancestral (founder) HIV sequence was essentially identical to the North American subtype B consensus. Consistent with gradual diversification of a “consensus-like” founder virus, the median “background” frequencies of individual HLA-associated polymorphisms in HIV (in individuals lacking the restricting HLA[s]) were ∼2-fold higher in modern versus historic HIV sequences, though these remained notably low overall (e.g. in Gag, medians were 3.7% in the 2000s versus 2.0% in the 1980s). HIV polymorphisms exhibiting the greatest relative spread were those restricted by protective HLAs. Despite these increases, when HIV sequences were analyzed as a whole, their total average burden of polymorphisms that were “pre-adapted” to the average host HLA profile was only ∼2% greater in modern versus historic eras. Furthermore, HLA-associated polymorphisms identified in historic HIV sequences were consistent with those detectable today, with none identified that could explain the few HIV codons where the inferred epidemic ancestor differed from the modern consensus. Results are therefore consistent with slow HIV adaptation to HLA, but at a rate unlikely to yield imminent negative implications for cellular immunity, at least in North America. 
Intriguingly, temporal changes in protein activity of patient-derived Nef (though not Gag) sequences were observed, suggesting functional implications of population-level HIV evolution on certain viral proteins. PMID:24762668
Ortí, G; Meyer, A
1996-04-01
The rate and pattern of DNA evolution of ependymin, a single-copy gene coding for a highly expressed glycoprotein in the brain matrix of teleost fishes, is characterized and its phylogenetic utility for fish systematics is assessed. DNA sequences were determined from catfish, electric fish, and characiforms and compared with published ependymin sequences from cyprinids, salmon, pike, and herring. Among these groups, ependymin amino acid sequences were highly divergent (up to 60% sequence difference), but had surprisingly similar hydropathy profiles and invariant glycosylation sites, suggesting that functional properties of the proteins are conserved. Comparison of base composition at third codon positions and introns revealed AT-rich introns and GC-rich third codon positions, suggesting that the biased codon usage observed might not be due to mutational bias. Phylogenetic information content of third codon positions was surprisingly high and sufficient to recover the most basal nodes of the tree, in spite of the observation that pairwise distances (at third codon positions) were well above the presumed saturation level. This finding can be explained by the high proportion of phylogenetically informative nonsynonymous changes at third codon positions among these highly divergent proteins. Ependymin DNA sequences have established the first molecular evidence for the monophyly of a group containing salmonids and esociforms. In addition, ependymin suggests a sister group relationship of electric fish (Gymnotiformes) and Characiformes, constituting a significant departure from currently accepted classifications. However, relationships among characiform lineages were not completely resolved by ependymin sequences in spite of seemingly appropriate levels of variation among taxa and considerably low levels of homoplasy in the data (consistency index = 0.7). 
If the diversification of Characiformes took place in an "explosive" manner over a relatively short period of time, this pattern should also be observed using other phylogenetic markers. Poor conservation of ependymin's primary structure hinders the design of efficient primers for PCR that could be used in wide-ranging fish systematic studies. However, alternative methods like PCR amplification from cDNA used here should provide promising comparative sequence data for the resolution of phylogenetic relationships among other basal lineages of teleost fishes.
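Saturation of pairwise distances can be illustrated with the Jukes-Cantor (JC69) correction, d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. The abstract does not state which distance model was used, so JC69 is an assumption chosen only because it makes the point simply: as p approaches 0.75, the value expected for unrelated random sequences, the corrected distance grows without bound and further substitutions no longer change the observed differences.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two equal-length, ungapped sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jukes_cantor(p: float) -> float:
    """JC69-corrected distance (assumed model); infinite once p reaches 0.75."""
    if p >= 0.75:
        return math.inf  # saturated: no finite distance fits the data
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# The correction grows sharply as p nears saturation
for p in (0.1, 0.5, 0.7, 0.74):
    print(p, round(jukes_cantor(p), 3))
```

In practice third-codon-position distances in this regime carry large variance, which is why their retained phylogenetic signal here is notable.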
JCoDA: a tool for detecting evolutionary selection.
Steinway, Steven N; Dannenfelser, Ruth; Laucius, Christopher D; Hayes, James E; Nayak, Sudhir
2010-05-27
The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site-specific and regional positive/negative evolutionary selection amongst homologous coding sequences. JCoDA accepts user-inputted unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and computes dN/dS values using PAML (Phylogenetic Analysis Using Maximum Likelihood, yn00 and codeml) in order to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for Phylip (Phylogeny Inference Package) to generate phylogenetic trees, manages formatting of all required file types, and streamlines passage of information between underlying programs. The raw data are output to user-configurable graphs with sliding window options for straightforward visualization of pairwise or gene family comparisons. Additionally, codon-delimited alignments are output in a variety of common formats and all dN/dS calculations can be output in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that are facilitated by JCoDA, we have taken advantage of the well-studied sex determination pathway in nematodes as well as the extensive sequence information available to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway. 
JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda.
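One common way to obtain a codon-delimited alignment is to align the translated proteins (for example with ClustalW) and then thread the original coding sequences back onto that protein alignment, so every protein gap becomes a three-base gap and codons are never split. This sketch illustrates that general technique under that assumption; it is not a description of JCoDA's internals, and the function name is hypothetical.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map a gapped protein alignment row back onto its coding sequence.

    Each residue consumes the next codon of `cds`; each '-' becomes '---'.
    Assumes any trailing stop codon has already been removed from `cds`.
    """
    residues = len(aligned_protein) - aligned_protein.count("-")
    if len(cds) != 3 * residues:
        raise ValueError("CDS length does not match the ungapped protein")
    out, pos = [], 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    return "".join(out)


print(thread_codons("M-K", "ATGAAA"))  # ATG---AAA
```

Keeping codons intact is what makes the later per-codon dN/dS estimates meaningful, since a split codon would mix synonymous and nonsynonymous sites.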
JCoDA: a tool for detecting evolutionary selection
2010-01-01
Background: The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site-specific and regional positive/negative evolutionary selection amongst homologous coding sequences. Results: JCoDA accepts user-inputted unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and computes dN/dS values using PAML (Phylogenetic Analysis Using Maximum Likelihood, yn00 and codeml) in order to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for Phylip (Phylogeny Inference Package) to generate phylogenetic trees, manages formatting of all required file types, and streamlines passage of information between underlying programs. The raw data are output to user-configurable graphs with sliding window options for straightforward visualization of pairwise or gene family comparisons. Additionally, codon-delimited alignments are output in a variety of common formats and all dN/dS calculations can be output in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that are facilitated by JCoDA, we have taken advantage of the well-studied sex determination pathway in nematodes as well as the extensive sequence information available to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway. 
Conclusions: JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda. PMID:20507581
Spotin, Adel; Mahami-Oskouei, Mahmoud; Harandi, Majid Fasihi; Baratchian, Mehdi; Bordbar, Ali; Ahmadpour, Ehsan; Ebrahimi, Sahar
2017-01-01
To investigate the genetic variability and population structure of the Echinococcus granulosus complex, 79 isolates from different host species (human, dog, camel, goat, sheep and cattle) and from various geographical sub-populations of Iran (Northwestern, Northern, and Southeastern) were sequenced. In addition, 36 sequences from other geographical populations (Western, Southeastern and Central Iran) were retrieved directly from the GenBank database for the mitochondrial cytochrome c oxidase subunit 1 (cox1) gene. The confirmed isolates were grouped as G1 genotype (n=92), G6 genotype (n=14), G3 genotype (n=8) and G2 genotype (n=1). Fifty unique haplotypes were identified based on the analyzed cox1 sequences. A parsimonious network of the sequence haplotypes displayed star-like features in the overall population, with IR23 (22: 19.1%) as the most common haplotype. According to the analysis of molecular variance (AMOVA), the genetic variability of the E. granulosus complex lay within populations, with high haplotype diversity, while nucleotide diversity was low in all populations. Neutrality indices of cox1 (Tajima's D and Fu's Fs tests) showed negative values in the Western-Northwestern, Northern and Southeastern populations, indicating significant deviation from neutrality, and positive but non-significant values in Central isolates. Pairwise fixation indices (Fst), as a measure of gene flow, were generally low for all populations (0.00647-0.15198). These Fst values indicate that Echinococcus sensu stricto (genotypes G1-G3) populations are not genetically well differentiated across geographical regions of Iran. To evaluate the hypothetical evolutionary scenario, further studies should analyze concatenated mitogenomes, and a panel of single-locus nuclear markers should be considered over wider areas of Iran and neighboring countries. Copyright © 2016 Elsevier B.V. All rights reserved.
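Two of the summary statistics mentioned above can be sketched for a set of aligned, equal-length sequences: haplotype (gene) diversity, Hd = n/(n - 1) (1 - sum of squared haplotype frequencies), and nucleotide diversity, pi, the mean number of pairwise differences per site. Treating every distinct sequence string as a haplotype and ignoring gaps and missing data are simplifying assumptions; dedicated population-genetics software handles these cases more carefully.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs: list) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum(p_i^2))."""
    n = len(seqs)
    freqs = [c / n for c in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs: list) -> float:
    """Mean pairwise differences per site over all sequence pairs."""
    length = len(seqs[0])
    diffs = [sum(a != b for a, b in zip(s1, s2))
             for s1, s2 in combinations(seqs, 2)]
    return sum(diffs) / len(diffs) / length


sample = ["AAAA", "AAAA", "AAAT", "AATT"]  # toy sequences, not cox1 data
print(round(haplotype_diversity(sample), 4))   # 0.8333
print(round(nucleotide_diversity(sample), 4))  # 0.2917
```

The pattern reported for the Iranian isolates, high Hd with low pi, corresponds to many haplotypes that differ from one another at only a few sites, consistent with the star-like network.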
Suwannasai, Nuttika; Martín, María P; Phosri, Cherdchai; Sihanonth, Prakitsin; Whalley, Anthony J S; Spouge, John L
2013-01-01
Thailand, a part of the Indo-Burma biodiversity hotspot, has many endemic animals and plants. Some of its fungal species are difficult to recognize and separate, complicating assessments of biodiversity. We assessed species diversity within the fungal genera Annulohypoxylon and Hypoxylon, which produce biologically active and potentially therapeutic compounds, by applying classical taxonomic methods to 552 teleomorphs collected from across Thailand. Using probability of correct identification (PCI), we also assessed the efficacy of automated species identification with a fungal barcode marker, ITS, in the model system of Annulohypoxylon and Hypoxylon. The 552 teleomorphs yielded 137 ITS sequences; in addition, we examined 128 GenBank ITS sequences, to assess biases in evaluating a DNA barcode with GenBank data. The use of multiple sequence alignment in a barcode database like BOLD raises some concerns about non-protein barcode markers like ITS, so we also compared species identification using different alignment methods. Our results suggest the following. (1) Multiple sequence alignment of ITS sequences is competitive with pairwise alignment when identifying species, so BOLD should be able to preserve its present bioinformatics workflow for species identification with ITS, and possibly also with at least some other non-protein barcode markers. (2) Automated species identification is insensitive to a specific choice of evolutionary distance, contributing to resolution of a current debate in DNA barcoding. (3) Statistical methods are available to address, at least partially, the possibility of expert misidentification of species. Phylogenetic trees revealed a cryptic species and strongly supported monophyletic clades for many Annulohypoxylon and Hypoxylon species, suggesting that ITS can contribute usefully to a barcode for these fungi. 
The PCIs here, derived solely from ITS, suggest that a fungal barcode will require secondary markers in Annulohypoxylon and Hypoxylon, however. The URL http://tinyurl.com/spouge-barcode contains computer programs and other supplementary material relevant to this article.
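Automated barcode identification of the kind evaluated here can be illustrated with a leave-one-out nearest-neighbour sketch: each sequence is assigned the species of its closest other sequence, and the proportion of correct assignments is reported. This is a simplified stand-in, not the probability of correct identification (PCI) as defined by the authors; the p-distance on pre-aligned sequences and the function names are assumptions.

```python
def p_dist(s1: str, s2: str) -> float:
    """Proportion of differing aligned positions (equal-length sequences)."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def nn_success_rate(records: list) -> float:
    """Leave-one-out nearest-neighbour identification success.

    records: list of (species, aligned_sequence). Ties go to the first
    closest record encountered, an arbitrary choice made for simplicity.
    """
    correct = 0
    for i, (species, seq) in enumerate(records):
        others = [r for j, r in enumerate(records) if j != i]
        nearest_species, _ = min(others, key=lambda r: p_dist(seq, r[1]))
        correct += nearest_species == species
    return correct / len(records)


# Hypothetical toy barcodes for two species
data = [("sp1", "AAAAAA"), ("sp1", "AAAAAT"),
        ("sp2", "CCCCCC"), ("sp2", "CCCCCA")]
print(nn_success_rate(data))  # 1.0
```

Swapping `p_dist` for another distance is a one-line change, which is one way to probe the reported insensitivity of identification to the choice of evolutionary distance.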
Fontenele, Rafaela S; Alves-Freitas, Dione M T; Silva, Pedro I T; Foresti, Josemar; Silva, Paulo R; Godinho, Márcio T; Varsani, Arvind; Ribeiro, Simone G
2018-01-01
The genus Mastrevirus (family Geminiviridae) is composed of single-stranded DNA viruses that infect mono- and dicotyledonous plants and are transmitted by leafhoppers. In South America, there have been only two previous reports of mastreviruses, both identified in sweet potatoes (from Peru and Uruguay). As part of a general viral surveillance program, we used a vector-enabled metagenomics (VEM) approach and sampled leafhoppers (Dalbulus maidis) in Itumbiara (State of Goiás), Brazil. High-throughput sequencing of viral DNA purified from the leafhopper sample revealed mastrevirus-like contigs. Using a set of abutting primers, a 2746-nt circular genome was recovered. The circular genome has a typical mastrevirus genome organization and shares <63% pairwise identity with other mastrevirus isolates from around the world. Therefore, the new mastrevirus was tentatively named "maize striate mosaic virus". Seventeen maize leaf samples were collected in the same field as the leafhoppers, and ten samples were found to be positive for this mastrevirus. Furthermore, the ten genomes recovered from the maize samples share >99% pairwise identity with the one from the leafhopper. This is the first report of a maize-infecting mastrevirus in the Americas, the first identified in a non-vegetatively propagated mastrevirus host in South America, and the first mastrevirus to be identified in Brazil.
Molecular characterization of diazotrophic and denitrifying bacteria associated with mangrove roots.
Flores-Mireles, Ana L; Winans, Stephen C; Holguin, Gina
2007-11-01
An analysis of the molecular diversity of N(2) fixers and denitrifiers associated with mangrove roots was performed using terminal restriction fragment length polymorphism (T-RFLP) of nifH (N(2) fixation) and nirS and nirK (denitrification), and the compositions and structures of these communities among three sites were compared. The number of operational taxonomic units (OTU) for nifH was higher than that for nirK or nirS at all three sites. Site 3, which had the highest organic matter and sand content in the rhizosphere sediment, as well as the lowest pore water oxygen concentration, had the highest nifH diversity. Principal component analysis of biogeochemical parameters identified soil texture, organic matter content, pore water oxygen concentration, and salinity as the main variables that differentiated the sites. Nonmetric multidimensional scaling (MDS) analyses of the T-RFLP data using the Bray-Curtis coefficient, group analyses, and pairwise comparisons between the sites clearly separated the OTU of site 3 from those of sites 1 and 2. For nirS, there were statistically significant differences in the composition of OTU among the sites, but the variability was less than for nifH. OTU defined on the basis of nirK were highly similar, and the three sites were not clearly separated on the basis of these sequences. The phylogenetic trees of nifH, nirK, and nirS showed that most of the cloned sequences were more similar to sequences from the rhizosphere isolates than to those from known strains or from other environments.
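The Bray-Curtis coefficient used in the MDS analyses can be sketched for two T-RFLP profiles, represented here as hypothetical mappings from terminal-fragment size (one OTU) to relative peak abundance. The dissimilarity is sum |x_i - y_i| / sum (x_i + y_i); it ranges from 0 (identical profiles) to 1 (no shared OTU), and the corresponding similarity is 1 minus this value. The site profiles below are invented for illustration, not data from the study.

```python
def bray_curtis(profile_a: dict, profile_b: dict) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles (OTU -> abundance)."""
    otus = set(profile_a) | set(profile_b)
    num = sum(abs(profile_a.get(o, 0.0) - profile_b.get(o, 0.0)) for o in otus)
    den = sum(profile_a.get(o, 0.0) + profile_b.get(o, 0.0) for o in otus)
    return num / den


# Hypothetical fragment sizes (bp) mapped to relative peak heights (%)
site1 = {101: 40.0, 215: 35.0, 330: 25.0}
site3 = {101: 10.0, 412: 60.0, 500: 30.0}
print(bray_curtis(site1, site3))  # 0.9
```

A full site-by-site matrix of these values is the usual input to nonmetric MDS.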
Molecular Characterization of Diazotrophic and Denitrifying Bacteria Associated with Mangrove Roots
Flores-Mireles, Ana L.; Winans, Stephen C.; Holguin, Gina
2007-01-01
An analysis of the molecular diversity of N2 fixers and denitrifiers associated with mangrove roots was performed using terminal restriction fragment length polymorphism (T-RFLP) of nifH (N2 fixation) and nirS and nirK (denitrification), and the compositions and structures of these communities among three sites were compared. The number of operational taxonomic units (OTU) for nifH was higher than that for nirK or nirS at all three sites. Site 3, which had the highest organic matter and sand content in the rhizosphere sediment, as well as the lowest pore water oxygen concentration, had the highest nifH diversity. Principal component analysis of biogeochemical parameters identified soil texture, organic matter content, pore water oxygen concentration, and salinity as the main variables that differentiated the sites. Nonmetric multidimensional scaling (MDS) analyses of the T-RFLP data using the Bray-Curtis coefficient, group analyses, and pairwise comparisons between the sites clearly separated the OTU of site 3 from those of sites 1 and 2. For nirS, there were statistically significant differences in the composition of OTU among the sites, but the variability was less than for nifH. OTU defined on the basis of nirK were highly similar, and the three sites were not clearly separated on the basis of these sequences. The phylogenetic trees of nifH, nirK, and nirS showed that most of the cloned sequences were more similar to sequences from the rhizosphere isolates than to those from known strains or from other environments. PMID:17827324
Comparison of Metabolic Pathways in Escherichia coli by Using Genetic Algorithms.
Ortegon, Patricia; Poot-Hernández, Augusto C; Perez-Rueda, Ernesto; Rodriguez-Vazquez, Katya
2015-01-01
In order to understand how cellular metabolism has taken its modern form, the conservation and variations between metabolic pathways were evaluated by using a genetic algorithm (GA). The GA approach considered information on the complete metabolism of the bacterium Escherichia coli K-12, as deposited in the KEGG database, and the enzymes belonging to a particular pathway were transformed into enzymatic step sequences by using the breadth-first search algorithm. These sequences represent contiguous enzymes linked to each other, based on their catalytic activities as they are encoded in the Enzyme Commission numbers. In a subsequent step, these sequences were compared using a GA in an all-against-all (pairwise comparisons) approach. Individual reactions were chosen based on their measure of fitness to act as parents of offspring, which constitute the new generation. The compared sequences were used to construct a similarity matrix (of fitness values) that was then clustered by using a k-medoids algorithm. A total of 34 clusters of conserved reactions were obtained, and their sequences were finally aligned with a multiple-sequence alignment GA optimized to align all the reaction sequences included in each group or cluster. From these comparisons, maps associated with the metabolism of similar compounds also contained similar enzymatic step sequences, reinforcing the Patchwork Model for the evolution of metabolism in E. coli K-12, an observation that can be extended to other organisms for which metabolic information is available. Finally, our mapping of these reactions is discussed, with illustrations from a particular case. PMID:25973143
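The abstract turns each pathway into an enzymatic step sequence by breadth-first search. A minimal Python sketch, assuming the pathway is already encoded as a mapping from each EC number to the EC numbers whose reactions consume its product; the simplified, generic upper-glycolysis links below are illustrative and are not taken from the paper or parsed from KEGG.

```python
from collections import deque


def bfs_step_sequence(links: dict[str, list[str]], start: str) -> list[str]:
    """Order the enzymes reachable from `start` breadth-first."""
    order, seen = [], {start}
    queue = deque([start])
    while queue:
        ec = queue.popleft()
        order.append(ec)
        for nxt in links.get(ec, []):
            if nxt not in seen:  # visit each enzyme once
                seen.add(nxt)
                queue.append(nxt)
    return order


# Hexokinase -> G6P isomerase -> PFK -> aldolase -> (TPI, GAPDH)
glycolysis = {
    "2.7.1.1": ["5.3.1.9"],
    "5.3.1.9": ["2.7.1.11"],
    "2.7.1.11": ["4.1.2.13"],
    "4.1.2.13": ["5.3.1.1", "1.2.1.12"],
    "5.3.1.1": ["1.2.1.12"],
}
print(bfs_step_sequence(glycolysis, "2.7.1.1"))
# ['2.7.1.1', '5.3.1.9', '2.7.1.11', '4.1.2.13', '5.3.1.1', '1.2.1.12']
```

Sequences of EC numbers produced this way are what a pairwise comparison method (a GA in the paper) would then score against one another.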
Zeng, Lu; Kortschak, R Daniel; Raison, Joy M; Bertozzi, Terry; Adelson, David L
2018-01-01
Transposable Elements (TEs) are mobile DNA sequences that make up significant fractions of amniote genomes. However, they are difficult to detect and annotate ab initio because of their variable features, lengths and clade-specific variants. We have addressed this problem by refining and developing a Comprehensive ab initio Repeat Pipeline (CARP) to identify and cluster TEs and other repetitive sequences in genome assemblies. The pipeline begins with a pairwise alignment using krishna, a custom aligner. Single linkage clustering is then carried out to produce families of repetitive elements. Consensus sequences are then screened for protein-coding genes and annotated using Repbase and a custom library of retrovirus and reverse transcriptase sequences. This process yields three types of families: fully annotated, partially annotated and unannotated. Fully annotated families reflect recently diverged/young known TEs present in Repbase. The remaining two types of families contain a mixture of novel TEs and segmental duplications. These can be resolved by aligning these consensus sequences back to the genome to assess copy number vs. length distribution. Our pipeline has three significant advantages compared to other methods for ab initio repeat identification: 1) we not only generate consensus sequences but also keep the genomic intervals for the original aligned sequences, allowing straightforward analysis of evolutionary dynamics; 2) consensus sequences represent low-divergence, recently/currently active TE families; and 3) segmental duplications are annotated as a useful by-product. We have compared our ab initio repeat annotations for 7 genome assemblies to other methods and demonstrate that CARP compares favourably with RepeatModeler, the most widely used repeat annotation package. PMID:29538441
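Single linkage clustering of pairwise alignment hits reduces to finding connected components: two repeat copies end up in the same family whenever a chain of alignments links them. A minimal union-find sketch in Python; the hit list is hypothetical, and this is not CARP's or krishna's code.

```python
def single_linkage(n: int, hits: list[tuple[int, int]]) -> list[list[int]]:
    """Group n sequences into families; `hits` are pairs that aligned."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in hits:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two families

    families: dict[int, list[int]] = {}
    for i in range(n):
        families.setdefault(find(i), []).append(i)
    return sorted(families.values())


print(single_linkage(6, [(0, 1), (1, 2), (3, 4)]))  # [[0, 1, 2], [3, 4], [5]]
```

Single linkage needs only one alignment to join two copies, which is why a family can grow to include distant members through intermediate copies.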
Morgan, Ethan; Nyaku, Amesika N; DʼAquila, Richard T; Schneider, John A
2017-07-01
Phylogenetic analysis determines similarities among HIV genetic sequences from persons infected with HIV, identifying clusters of transmission. We determined characteristics associated with both membership in an HIV transmission cluster and the number of clustered sequences among a cohort of young black men who have sex with men (YBMSM) in Chicago. HIV-1 pol sequences were collected during 2013-2016, and pairwise genetic distances between them were calculated. Potential transmission ties were identified among HIV-infected persons whose sequences were ≤1.5% genetically distant. Putative transmission pairs were defined as sequences with ≥1 tie to another sequence. We then determined demographic and risk attributes associated with both membership in an HIV transmission cluster and the number of ties to the sequences from other persons in the cluster. Of 86 available sequences, 31 (36.0%) were tied to ≥1 other sequence. Through multivariable analyses, we determined that those who reported symptoms of depression and those who had a higher number of confidants in their network had significantly decreased odds of membership in transmission clusters. We found that those who had unstable housing and who reported heavy marijuana use had significantly more ties to other individuals within transmission clusters, whereas those identifying as bisexual, those participating in group sex, and those with higher numbers of sexual partners had significantly fewer ties. This study demonstrates the potential for combining phylogenetic data with individual and network attributes to target HIV control efforts to persons with potentially higher transmission risk, and suggests some previously unappreciated specific predictors of transmission risk among YBMSM in Chicago for future study.
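The tie definition above (sequences at most 1.5% apart) can be sketched as a pairwise distance scan. This Python example assumes an uncorrected p-distance on already aligned pol sequences; the abstract does not state which distance model the study used, and the sequence names and toy alignment are hypothetical.

```python
from itertools import combinations


def p_distance(s: str, t: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(s, t)) / len(s)


def count_ties(seqs: dict[str, str], threshold: float = 0.015) -> dict[str, int]:
    """Number of other sequences within `threshold` distance of each sequence."""
    ties = {name: 0 for name in seqs}
    for (n1, s1), (n2, s2) in combinations(seqs.items(), 2):
        if p_distance(s1, s2) <= threshold:
            ties[n1] += 1
            ties[n2] += 1
    return ties


# Toy 100-site alignment: P2 differs from P1 at 1 site, P3 at 5 sites
p1 = "ACGT" * 25
p2 = "G" + p1[1:]
p3 = "".join("G" if i in (0, 4, 8, 12, 16) else c for i, c in enumerate(p1))
ties = count_ties({"P1": p1, "P2": p2, "P3": p3})
clustered = [name for name, k in ties.items() if k >= 1]
print(ties, clustered)  # {'P1': 1, 'P2': 1, 'P3': 0} ['P1', 'P2']
```

The per-sequence tie counts correspond to the "number of ties" outcome, and the clustered list to cluster membership (≥1 tie).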
A water market simulator considering pair-wise trades between agents
NASA Astrophysics Data System (ADS)
Huskova, I.; Erfani, T.; Harou, J. J.
2012-04-01
In many basins in England, no further water abstraction licences are available. Trading water between water rights holders has been recognized as a potentially effective and economically efficient strategy to mitigate increasing scarcity. A screening tool that could assess the potential for trade through realistic simulation of individual water rights holders would help assess the solution's potential contribution to local water management. We propose an optimisation-driven water market simulator that predicts pair-wise trade in a catchment and represents its interaction with natural hydrology and engineered infrastructure. A model is used to emulate licence-holders' willingness to engage in short-term trade transactions. In their simplest form, agents are represented using an economic benefit function. The working hypothesis is that trading behaviour can be partially predicted based on differences in marginal values of water over space and time and estimates of transaction costs on pair-wise trades. We discuss the further possibility of embedding rules, norms and preferences of the different water user sectors to more realistically represent the behaviours, motives and constraints of individual licence holders. The potential benefits and limitations of such a social simulation (agent-based) approach are contrasted with our simulator, in which agents are driven by economic optimization. A case study based on the Dove River Basin (UK) demonstrates model inputs and outputs. The ability of the model to suggest impacts of water rights policy reforms on trading is discussed.
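The working hypothesis (trade happens where marginal values differ by more than the transaction cost) can be illustrated with a one-period screening check over all ordered pairs of licence holders. This is a hedged Python sketch, not the authors' optimisation model: it holds marginal values constant, ignores hydrology and infrastructure, and uses hypothetical agents, values and costs.

```python
from itertools import permutations


def candidate_trades(marginal_value: dict[str, float],
                     transaction_cost: dict[tuple[str, str], float]) -> list[tuple[str, str, float]]:
    """Seller->buyer pairs whose marginal-value gap exceeds the transaction cost.

    Marginal values are held constant (no diminishing returns), and pairs
    without a listed cost are treated as unable to trade.
    """
    trades = []
    for seller, buyer in permutations(marginal_value, 2):
        cost = transaction_cost.get((seller, buyer))
        if cost is None:
            continue
        gain = marginal_value[buyer] - marginal_value[seller] - cost
        if gain > 0:
            trades.append((seller, buyer, gain))
    return sorted(trades, key=lambda t: t[2], reverse=True)


# Hypothetical marginal values and transaction costs (per cubic metre)
mv = {"farm": 0.05, "industry": 0.30, "utility": 0.20}
tc = {("farm", "industry"): 0.10, ("farm", "utility"): 0.18,
      ("utility", "industry"): 0.12}
for seller, buyer, gain in candidate_trades(mv, tc):
    print(f"{seller} -> {buyer}: net gain {gain:.2f}")  # farm -> industry: net gain 0.15
```

Here only the farm-to-industry pair clears its transaction cost; in a full simulator the marginal values would change as water moves, which is what an optimisation formulation captures.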
Pair-Wise Trajectory Management-Oceanic (PTM-O) Concept of Operations, Version 3.9
NASA Technical Reports Server (NTRS)
Jones, Kenneth M.
2014-01-01
This document describes the Pair-wise Trajectory Management-Oceanic (PTM-O) Concept of Operations (ConOps). Pair-wise Trajectory Management (PTM) is a concept that includes airborne and ground-based capabilities designed to enable, and to benefit from, an airborne pair-wise distance-monitoring capability. PTM includes the capabilities needed for the controller to issue a PTM clearance that resolves a conflict for a specific pair of aircraft. PTM avionics include the capabilities needed for the flight crew to manage their trajectory relative to specific designated aircraft. PTM-Oceanic (PTM-O) is a region-specific application of the PTM concept. PTM is sponsored by the National Aeronautics and Space Administration (NASA) Concept and Technology Development Project (part of NASA's Airspace Systems Program). The goal of PTM is to use enhanced and distributed communications and surveillance along with airborne tools to permit reduced separation standards for given aircraft pairs, thereby increasing the capacity and efficiency of aircraft operations at a given altitude or volume of airspace.
A pairwise maximum entropy model accurately describes resting-state human brain networks
Watanabe, Takamitsu; Hirose, Satoshi; Wada, Hiroyuki; Imai, Yoshio; Machida, Toru; Shirouzu, Ichiro; Konishi, Seiki; Miyashita, Yasushi; Masuda, Naoki
2013-01-01
The resting-state human brain networks underlie fundamental cognitive functions and consist of complex interactions among brain regions. However, the level of complexity of the resting-state networks has not been quantified, which has prevented comprehensive descriptions of the brain activity as an integrative system. Here, we address this issue by demonstrating that a pairwise maximum entropy model, which takes into account region-specific activity rates and pairwise interactions, can be robustly and accurately fitted to resting-state human brain activities obtained by functional magnetic resonance imaging. Furthermore, to validate the approximation of the resting-state networks by the pairwise maximum entropy model, we show that the functional interactions estimated by the pairwise maximum entropy model reflect anatomical connexions more accurately than the conventional functional connectivity method. These findings indicate that a relatively simple statistical model not only captures the structure of the resting-state networks but also provides a possible method to derive physiological information about various large-scale brain networks. PMID:23340410
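For a handful of regions, the pairwise maximum entropy model can be written out exactly. This Python sketch assumes region activity binarized to s_i in {-1, +1}, with h_i the region-specific activity bias and J_ij the pairwise interaction; the three-region parameters are hypothetical, and the model is evaluated by brute-force enumeration rather than fitted to data.

```python
import itertools
import math


def pairwise_maxent(h: list[float], J: list[list[float]]) -> dict[tuple[int, ...], float]:
    """Probability of every binary activity pattern under a pairwise MEM.

    P(s) is proportional to exp(sum_i h_i s_i + sum_{i<j} J_ij s_i s_j), s_i in {-1, +1}.
    Exhaustive enumeration, so only practical for a handful of regions.
    """
    n = len(h)
    weights = {}
    for s in itertools.product((-1, 1), repeat=n):
        log_w = sum(h[i] * s[i] for i in range(n))
        log_w += sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
        weights[s] = math.exp(log_w)
    z = sum(weights.values())  # partition function
    return {s: w / z for s, w in weights.items()}


def mean_activity(p, i):
    return sum(prob * s[i] for s, prob in p.items())


def pair_correlation(p, i, j):
    return sum(prob * s[i] * s[j] for s, prob in p.items())


# Three hypothetical regions; only regions 0 and 1 interact directly
h = [0.0, 0.0, 0.0]
J = [[0.0, 0.5, 0.0],
     [0.0, 0.0, 0.0],
     [0.0, 0.0, 0.0]]
p = pairwise_maxent(h, J)
print(round(pair_correlation(p, 0, 1), 4))  # 0.4621 (= tanh 0.5)
```

Fitting runs this in reverse: h and J are adjusted until the model's mean activities and pairwise correlations match those measured from the imaging data. With zero biases and a single coupling, the correlation of the coupled pair reduces to tanh(J_01).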
Dynamics of prebiotic RNA reproduction illuminated by chemical game theory
Yeates, Jessica A. M.; Hilbe, Christian; Zwick, Martin; Nowak, Martin A.; Lehman, Niles
2016-01-01
Many origins-of-life scenarios depict a situation in which there are common and potentially scarce resources needed by molecules that compete for survival and reproduction. The dynamics of RNA assembly in a complex mixture of sequences is a frequency-dependent process and mimics such scenarios. By synthesizing Azoarcus ribozyme genotypes that differ in their single-nucleotide interactions with other genotypes, we can create molecules that interact with one another to reproduce. Pairwise interplays between RNAs involve both cooperation and selfishness, quantifiable in a 2 × 2 payoff matrix. We show that a simple model of differential equations based on chemical kinetics accurately predicts the outcomes of these molecular competitions using simple rate inputs into these matrices. In some cases, we find that mixtures of different RNAs reproduce much better than each RNA type alone, reflecting a molecular form of reciprocal cooperation. We also demonstrate that three RNA genotypes can stably coexist in a rock–paper–scissors analog. Our experiments suggest a new type of evolutionary game dynamics, called prelife game dynamics or chemical game dynamics. These operate without template-directed replication, illustrating how small networks of RNAs could have developed and evolved in an RNA world. PMID:27091972
Dynamics of prebiotic RNA reproduction illuminated by chemical game theory.
Yeates, Jessica A M; Hilbe, Christian; Zwick, Martin; Nowak, Martin A; Lehman, Niles
2016-05-03
Many origins-of-life scenarios depict a situation in which there are common and potentially scarce resources needed by molecules that compete for survival and reproduction. The dynamics of RNA assembly in a complex mixture of sequences is a frequency-dependent process and mimics such scenarios. By synthesizing Azoarcus ribozyme genotypes that differ in their single-nucleotide interactions with other genotypes, we can create molecules that interact with one another to reproduce. Pairwise interplays between RNAs involve both cooperation and selfishness, quantifiable in a 2 × 2 payoff matrix. We show that a simple model of differential equations based on chemical kinetics accurately predicts the outcomes of these molecular competitions using simple rate inputs into these matrices. In some cases, we find that mixtures of different RNAs reproduce much better than each RNA type alone, reflecting a molecular form of reciprocal cooperation. We also demonstrate that three RNA genotypes can stably coexist in a rock-paper-scissors analog. Our experiments suggest a new type of evolutionary game dynamics, called prelife game dynamics or chemical game dynamics. These operate without template-directed replication, illustrating how small networks of RNAs could have developed and evolved in an RNA world.
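A 2 × 2 payoff matrix can be turned into dynamics with the textbook replicator equation. The Python sketch below is a generic illustration under that assumption, not the authors' chemical-kinetics model (which, per the abstract, works without template-directed replication); the payoff entries are hypothetical.

```python
def replicator_2x2(payoff, x0, dt=0.01, steps=5000):
    """Euler-integrate dx/dt = x(1-x)(f1 - f2) for two types.

    x is the frequency of type 1; payoff[i][j] is the payoff to type i
    when it meets type j.
    """
    (a, b), (c, d) = payoff
    x = x0
    for _ in range(steps):
        f1 = a * x + b * (1 - x)  # expected payoff of type 1
        f2 = c * x + d * (1 - x)  # expected payoff of type 2
        x += dt * x * (1 - x) * (f1 - f2)
    return x


# Hypothetical matrix in which each type does better with the other (b > d, c > a)
payoff = [[1.0, 3.0],
          [2.0, 1.0]]
print(round(replicator_2x2(payoff, x0=0.1), 4))  # 0.6667
```

Because each type earns more against the other type than against itself, the frequency settles from either side at the interior point x* = (b - d) / ((b - d) + (c - a)) = 2/3, a simple analogue of a mixture outperforming either type alone.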
Clark, Andrew P; Howard, Kate L; Woods, Andy T; Penton-Voak, Ian S; Neumann, Christof
2018-01-01
We introduce "EloChoice", a package for R that uses Elo rating to assess pairwise comparisons between stimuli in order to measure perceived stimulus characteristics. To demonstrate the package and compare results from forced-choice pairwise comparisons to those from more standard single-stimulus rating tasks using Likert (or Likert-type) items, we investigated perceptions of physical strength from images of male bodies. The stimulus set comprised images of 82 men standing on a raised platform with minimal clothing. Strength-related anthropometrics and grip strength measurements were available for each man in the set. UK laboratory participants (Study 1) and US online participants (Study 2) viewed all images in both a Likert rating task, to collect mean Likert scores, and a pairwise comparison task, to calculate Elo, mean Elo (mElo), and Bradley-Terry scores. Within both studies, Likert, Elo and Bradley-Terry scores were closely correlated with mElo scores (all rs > 0.95), and all measures were correlated with stimulus grip strength (all rs > 0.38) and body size (all rs > 0.59). However, mElo scores were less variable than Elo scores and were hundreds of times quicker to compute than Bradley-Terry scores. Responses in pairwise comparison trials were 2/3 quicker than in Likert tasks, indicating that participants found pairwise comparisons to be easier. In addition, mElo scores generated from a data set with half the participants randomly excluded produced results very similar to those produced with Likert scores from the full participant set, indicating that researchers require fewer participants when using pairwise comparisons.
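Elo scoring of forced-choice trials updates two ratings after each comparison according to how surprising the outcome was. A minimal Python sketch of Elo and of mean Elo over randomized trial orders (our reading of mElo); the starting value, k = 100, the number of orderings and the trial list are assumptions for illustration, not EloChoice's R interface or defaults.

```python
import random


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that stimulus A is chosen over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo(trials, start=1000.0, k=100.0):
    """Sequential Elo ratings from (chosen, not_chosen) pairs, in trial order."""
    ratings = {}
    for winner, loser in trials:
        rw = ratings.setdefault(winner, start)
        rl = ratings.setdefault(loser, start)
        delta = k * (1.0 - expected_score(rw, rl))  # bigger for upsets
        ratings[winner] = rw + delta
        ratings[loser] = rl - delta
    return ratings


def mean_elo(trials, n_orders=100, seed=1, **kwargs):
    """Average Elo over many random orderings of the same trials."""
    rng = random.Random(seed)
    totals = {}
    for _ in range(n_orders):
        order = list(trials)
        rng.shuffle(order)
        for stim, r in elo(order, **kwargs).items():
            totals[stim] = totals.get(stim, 0.0) + r
    return {stim: t / n_orders for stim, t in totals.items()}


# Hypothetical forced-choice trials: (image judged stronger, other image)
trials = [("body_A", "body_B"), ("body_A", "body_C"),
          ("body_B", "body_C"), ("body_A", "body_B")]
scores = mean_elo(trials)
print(sorted(scores, key=scores.get, reverse=True))  # ['body_A', 'body_B', 'body_C']
```

Plain Elo depends on the order in which trials happen to arrive; averaging over shuffled orders removes most of that dependence, which is consistent with the lower variability reported for mElo.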
A general transformation to canonical form for potentials in pairwise interatomic interactions.
Walton, Jay R; Rivera-Rivera, Luis A; Lucchese, Robert R; Bevan, John W
2015-06-14
A generalized formulation of explicit force-based transformations is introduced to investigate the concept of a canonical potential in both fundamental chemical and intermolecular bonding. Different classes of representative ground electronic state pairwise interatomic interactions are referenced to a chosen canonical potential, illustrating the application of such transformations. Specifically, accurately determined potentials of the diatomic molecules H2, H2+, HF, LiH, the argon dimer, and one-dimensional dissociative coordinates in Ar-HBr, OC-HF, and OC-Cl2 are investigated throughout their bound potentials. Advantages of the current formulation for accurately evaluating equilibrium dissociation energies and a fundamentally different unified perspective on the nature of intermolecular interactions will be emphasized. In particular, this canonical approach is relevant to previous assertions that there is no very fundamental distinction between van der Waals bonding and covalent bonding or, for that matter, between hydrogen and halogen bonds.
Beyond pairwise strategy updating in the prisoner's dilemma game
NASA Astrophysics Data System (ADS)
Wang, Xiaofeng; Perc, Matjaž; Liu, Yongkui; Chen, Xiaojie; Wang, Long
2012-10-01
In spatial games, players typically alter their strategy by imitating the most successful neighbor or one randomly selected neighbor. Since a single neighbor is taken as reference, the information stemming from other neighbors is neglected, which begets the consideration of alternative, possibly more realistic approaches. Here we show that strategy changes inspired not only by the performance of individual neighbors but rather by entire neighborhoods introduce a qualitatively different evolutionary dynamics that is able to support the stable existence of very small cooperative clusters. This leads to phase diagrams that differ significantly from those obtained by means of pairwise strategy updating. In particular, the survivability of cooperators is possible even at high temptations to defect and over a much wider uncertainty range. We support the simulation results by means of pair approximations and analysis of spatial patterns, which jointly highlight the importance of local information for the resolution of social dilemmas.
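Pairwise strategy updating in spatial prisoner's dilemma models is commonly implemented with the Fermi imitation rule, in which player x copies a randomly chosen neighbor y with probability 1 / (1 + exp((P_x - P_y) / K)), K playing the role of the uncertainty. The Python sketch below assumes that rule, a weak prisoner's dilemma and a periodic square lattice; lattice size, b, K and the seed are hypothetical, and the neighborhood-based rule proposed in the paper is not shown.

```python
import math
import random

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def fermi_probability(p_x: float, p_y: float, k: float) -> float:
    """Probability that player x copies neighbor y; k sets the noise (uncertainty)."""
    z = (p_x - p_y) / k
    if z > 700:  # avoid math.exp overflow; probability is effectively zero
        return 0.0
    return 1.0 / (1.0 + math.exp(z))


def payoff(grid, i, j, b):
    """Weak prisoner's dilemma (R=1, T=b, S=P=0); 1 = cooperator, 0 = defector."""
    n = len(grid)
    total = 0.0
    for di, dj in NEIGHBORS:
        if grid[(i + di) % n][(j + dj) % n] == 1:  # only cooperating partners pay
            total += 1.0 if grid[i][j] == 1 else b
    return total


def pairwise_step(grid, b, k, rng):
    """One elementary update: a random player compares itself with one random neighbor."""
    n = len(grid)
    i, j = rng.randrange(n), rng.randrange(n)
    di, dj = rng.choice(NEIGHBORS)
    yi, yj = (i + di) % n, (j + dj) % n
    if rng.random() < fermi_probability(payoff(grid, i, j, b), payoff(grid, yi, yj, b), k):
        grid[i][j] = grid[yi][yj]


rng = random.Random(42)
grid = [[rng.randint(0, 1) for _ in range(20)] for _ in range(20)]
for _ in range(20 * 20 * 50):  # 50 Monte Carlo steps
    pairwise_step(grid, b=1.05, k=0.1, rng=rng)
print(sum(map(sum, grid)) / 400)  # fraction of cooperators
```

Only one neighbor's payoff enters each decision here; the paper's point is that letting the whole neighborhood inform the update changes the resulting phase diagrams.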
SANSparallel: interactive homology search against Uniprot.
Somervuo, Panu; Holm, Liisa
2015-07-01
Proteins evolve by mutations and natural selection. The network of sequence similarities is a rich source for mining homologous relationships that inform on protein structure and function. There are many servers available to browse the network of homology relationships but one has to wait up to a minute for results. The SANSparallel webserver provides protein sequence database searches with immediate response and professional alignment visualization by third-party software. The output is a list, pairwise alignment or stacked alignment of sequence-similar proteins from Uniprot, UniRef90/50, Swissprot or Protein Data Bank. The stacked alignments are viewed in Jalview or as sequence logos. The database search uses the suffix array neighborhood search (SANS) method, which has been re-implemented as a client-server, improved and parallelized. The method is extremely fast and as sensitive as BLAST above 50% sequence identity. Benchmarks show that the method is highly competitive compared to previously published fast database search programs: UBLAST, DIAMOND, LAST, LAMBDA, RAPSEARCH2 and BLAT. The web server can be accessed interactively or programmatically at http://ekhidna2.biocenter.helsinki.fi/cgi-bin/sans/sans.cgi. It can be used to make protein functional annotation pipelines more efficient, and it is useful in interactive exploration of the detailed evidence supporting the annotation of particular proteins of interest. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
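The SANS method is built around a suffix array over the sequence database. The Python toy below only shows that underlying idea: exact k-mer lookup by binary search over a suffix array, plus a per-protein hit count. It does not reproduce SANS's neighborhood ranking, client-server parallelism or alignment output, and the database entries and k = 3 are hypothetical.

```python
def build_suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes in sorted order (naive, toy-sized inputs only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """All start positions of `pattern`, found by two binary searches over the suffix array."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


def kmer_hits(proteins: dict[str, str], query: str, k: int = 3) -> dict[str, int]:
    """Count exact k-mer matches between `query` and each database protein."""
    names = list(proteins)
    text = "".join(proteins[n] + "$" for n in names)  # "$" separates entries
    owner = [n for n in names for _ in range(len(proteins[n]) + 1)]
    sa = build_suffix_array(text)
    hits: dict[str, int] = {}
    for i in range(len(query) - k + 1):
        for pos in occurrences(text, sa, query[i:i + k]):
            hits[owner[pos]] = hits.get(owner[pos], 0) + 1
    return hits


db = {"P1": "MKTAYIAK", "P2": "MKTLLV", "P3": "GGSGGS"}
print(kmer_hits(db, "MKTAY"))  # {'P1': 3, 'P2': 1}
```

Once the suffix array is built, each lookup costs only a logarithmic number of comparisons, which is the property that makes near-instant database searches feasible.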
Problems of classification in the family Paramyxoviridae.
Rima, Bert; Collins, Peter; Easton, Andrew; Fouchier, Ron; Kurath, Gael; Lamb, Robert A; Lee, Benhur; Maisner, Andrea; Rota, Paul; Wang, Lin-Fa
2018-05-01
A number of unassigned viruses in the family Paramyxoviridae need to be classified either as a new genus or placed into one of the seven genera currently recognized in this family. Furthermore, numerous new paramyxoviruses continue to be discovered. However, attempts at classification have highlighted the difficulties that arise from applying historic criteria or criteria based on sequence alone to the classification of the viruses in this family. While the recent taxonomic change that elevated the previous subfamily Pneumovirinae to a separate family Pneumoviridae is readily justified on the basis of RNA-dependent RNA polymerase (RdRp or L protein) sequence motifs, using RdRp sequence comparisons for assignment to lower-level taxa raises problems that would require an overhaul of the current criteria for assignment into genera in the family Paramyxoviridae. Arbitrary cut-off points to delineate genera and species would have to be set if classification were based on the amino acid sequence of the RdRp alone or on pairwise analysis of sequence complementarity (PASC) of all open reading frames (ORFs). While these cut-offs cannot be made consistent with the current classification in this family, resorting to genus-level demarcation criteria with additional input from the biological context may afford a way forward. Such criteria would reflect the increasingly dynamic nature of virus taxonomy even if it would require a complete revision of the current classification.
Lo, Yu-Sheng; Tseng, Wen-Hsuan; Chuang, Chien-Ying; Hou, Ming-Hon
2013-01-01
The potent anticancer drug actinomycin D (ActD) functions by intercalating into DNA at GpC sites, thereby interrupting essential biological processes including replication and transcription. Certain neurological diseases are correlated with the expansion of (CGG)n trinucleotide sequences, which contain many contiguous GpC sites separated by a single G:G mispair. To characterize the binding of ActD to CGG triplet repeat sequences, the structural basis for the strong binding of ActD to neighbouring GpC sites flanking a G:G mismatch has been determined based on the crystal structure of ActD bound to ATGCGGCAT, which contains a CGG triplet sequence. The binding of ActD molecules to GCGGC causes many unexpected conformational changes including nucleotide flipping out, a sharp bend and a left-handed twist in the DNA helix via a two site-binding model. Heat denaturation, circular dichroism and surface plasmon resonance analyses showed that adjacent GpC sequences flanking a G:G mismatch are preferred ActD-binding sites. In addition, ActD was shown to bind the hairpin conformation of (CGG)16 in a pairwise combination and with greater stability than that of other DNA intercalators. Our results provide evidence of a possible biological consequence of ActD binding to CGG triplet repeat sequences. PMID:23408860
Perera, Piyumali K; Gasser, Robin B; Jabbar, Abdul
2015-03-01
Oriental theileriosis is a tick-borne, protozoan disease of cattle caused by one or more genotypes of the Theileria orientalis complex. In this study, we assessed sequence variability in a region of the 23 kDa piroplasm membrane protein (p23) gene within and among three T. orientalis genotypes (designated buffeli, chitose and ikeda) in south-eastern Australia. Genomic DNA (n=100) was extracted from the blood of infected cattle from various locations endemic for oriental theileriosis and tested by polymerase chain reaction (PCR)-coupled mutation scanning (single-strand conformation polymorphism (SSCP)) and targeted sequencing analysis. Eight distinct sequences represented all DNA samples and belonged to three genotypes: buffeli (n=3), chitose (3) and ikeda (2). Nucleotide pairwise comparisons among these eight sequences revealed considerably higher variability among the genotypes (6.6-11.7%) than within them (0-1.9%), indicating that the p23 gene region allows the accurate identification of T. orientalis genotypes. In the future, we will combine this gene with other molecular markers to study the genetic structure of T. orientalis populations in Australasia, which will pave the way to establish a highly sensitive and specific PCR-based assay for genotypic diagnosis of infection and for assessing levels of parasitaemia in cattle. Copyright © 2014 Elsevier GmbH. All rights reserved.
Gibbs motif sampling: detection of bacterial outer membrane protein repeats.
Neuwald, A. F.; Liu, J. S.; Lawrence, C. E.
1995-01-01
The detection and alignment of locally conserved regions (motifs) in multiple sequences can provide insight into protein structure, function, and evolution. A new Gibbs sampling algorithm is described that detects motif-encoding regions in sequences and optimally partitions them into distinct motif models; this is illustrated using a set of immunoglobulin fold proteins. When applied to sequences sharing a single motif, the sampler can be used to classify motif regions into related submodels, as is illustrated using helix-turn-helix DNA-binding proteins. Other statistically based procedures are described for searching a database for sequences matching motifs found by the sampler. When applied to a set of 32 very distantly related bacterial integral outer membrane proteins, the sampler revealed that they share a subtle, repetitive motif. Although BLAST (Altschul SF et al., 1990, J Mol Biol 215:403-410) fails to detect significant pairwise similarity between any of the sequences, the repeats present in these outer membrane proteins, taken as a whole, are highly significant (based on a generally applicable statistical test for motifs described here). Analysis of bacterial porins with known trimeric beta-barrel structure and related proteins reveals a similar repetitive motif corresponding to alternating membrane-spanning beta-strands. These beta-strands occur on the membrane interface (as opposed to the trimeric interface) of the beta-barrel. The broad conservation and structural location of these repeats suggest that they play important functional roles. PMID:8520488
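A basic single-motif Gibbs site sampler, simpler than the multi-motif partitioning sampler described here, shows the core idea: hold one sequence out, build a motif profile from the others, then resample the held-out sequence's motif position in proportion to its motif-versus-background likelihood ratio. This Python sketch uses a DNA alphabet, a uniform background, +1 pseudocounts and a planted toy motif, all assumptions for illustration.

```python
import math
import random

ALPHABET = "ACGT"  # DNA for brevity; the paper works with protein sequences


def column_profile(seqs, starts, width, skip):
    """Motif column frequencies from every sequence except `skip` (+1 pseudocounts)."""
    counts = [[1.0] * len(ALPHABET) for _ in range(width)]
    for idx, (seq, start) in enumerate(zip(seqs, starts)):
        if idx != skip:
            for col in range(width):
                counts[col][ALPHABET.index(seq[start + col])] += 1.0
    return [[c / sum(col) for c in col] for col in counts]


def gibbs_site_sampler(seqs, width, sweeps=200, seed=0):
    """Sample one motif start per sequence; returns the final start positions."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    background = 1.0 / len(ALPHABET)  # uniform background, a simplification
    for _ in range(sweeps):
        for z, seq in enumerate(seqs):
            q = column_profile(seqs, starts, width, skip=z)
            # Likelihood ratio (motif model vs background) of each candidate window
            weights = [
                math.exp(sum(math.log(q[col][ALPHABET.index(seq[s + col])] / background)
                             for col in range(width)))
                for s in range(len(seq) - width + 1)
            ]
            starts[z] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


# Hypothetical sequences with the planted motif TTGACA at varying offsets
rng = random.Random(7)
seqs = []
for offset in (3, 10, 0, 15, 7, 12):
    bg = [rng.choice(ALPHABET) for _ in range(24)]
    bg[offset:offset + 6] = "TTGACA"
    seqs.append("".join(bg))
starts = gibbs_site_sampler(seqs, width=6)
print([seqs[i][p:p + 6] for i, p in enumerate(starts)])
```

Being stochastic, the sampler often but not always settles on the planted windows; it can lock onto a shifted copy of the motif, which is why practical samplers add moves such as phase shifts and use several random starts.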
Complete Chloroplast Genome of the Wollemi Pine (Wollemia nobilis): Structure and Evolution.
Yap, Jia-Yee S; Rohner, Thore; Greenfield, Abigail; Van Der Merwe, Marlien; McPherson, Hannah; Glenn, Wendy; Kornfeld, Geoff; Marendy, Elessa; Pan, Annie Y H; Wilton, Alan; Wilkins, Marc R; Rossetto, Maurizio; Delaney, Sven K
2015-01-01
The Wollemi pine (Wollemia nobilis) is a rare Southern conifer with striking morphological similarity to fossil pines. A small population of W. nobilis was discovered in 1994 in a remote canyon system in the Wollemi National Park (near Sydney, Australia). This population contains fewer than 100 individuals and is critically endangered. Previous genetic studies of the Wollemi pine have investigated its evolutionary relationship with other pines in the family Araucariaceae, and have suggested that the Wollemi pine genome contains little or no variation. However, these studies were performed prior to the widespread use of genome sequencing, and their conclusions were based on a limited fraction of the Wollemi pine genome. In this study, we address this problem by determining the entire sequence of the W. nobilis chloroplast genome. A detailed analysis of the structure of the genome is presented, and the evolution of the genome is inferred by comparison with the chloroplast sequences of other members of the Araucariaceae and the related family Podocarpaceae. Pairwise alignments of whole genome sequences, and the presence of unique pseudogenes, gene duplications and insertions in W. nobilis and Araucariaceae, indicate that the W. nobilis chloroplast genome is most similar to that of its sister taxon Agathis. However, the W. nobilis genome contains an unusually high number of repetitive sequences, and these could be used in future studies to investigate and conserve any remnant genetic diversity in the Wollemi pine.
Zhang, Weidai; Zhang, Jiawei; Yang, Baojun; Wu, Kefei; Lin, Hanfei; Wang, Yanping; Zhou, Lihong; Wang, Huatao; Zeng, Chujuan; Chen, Xiao; Wang, Zhixing; Zhu, Junxing; Songming, Chen
2018-06-01
The effectiveness of oral hydration in preventing contrast-induced acute kidney injury (CI-AKI) in patients undergoing coronary angiography or intervention has not been well established. This study aims to evaluate the efficacy of oral hydration compared with intravenous hydration and other frequently used hydration strategies. PubMed, Embase, Web of Science, and the Cochrane central register of controlled trials were searched from inception to 8 October 2017. To be eligible for analysis, studies had to evaluate the relative efficacy of different prophylactic hydration strategies. We selected and assessed the studies that fulfilled the inclusion criteria and carried out a pairwise and network meta-analysis using RevMan5.2 and Aggregate Data Drug Information System 1.16.8 software. A total of four studies (538 participants) were included in our pairwise meta-analysis and 1754 participants from eight studies with four frequently used hydration strategies were included in a network meta-analysis. Pairwise meta-analysis indicated that oral hydration was as effective as intravenous hydration for the prevention of CI-AKI (5.88 vs. 8.43%; odds ratio: 0.73; 95% confidence interval: 0.36-1.47; P>0.05), with no significant heterogeneity between studies. Network meta-analysis showed that there was no significant difference in the prevention of CI-AKI. However, the rank probability plot suggested that oral plus intravenous hydration had a higher probability (51%) of being the best strategy, followed by diuretic plus intravenous hydration (39%) and oral hydration alone (10%). Intravenous hydration alone was the strategy with the highest probability (70%) of being the worst hydration strategy. Our study shows that oral hydration is not inferior to intravenous hydration for the prevention of CI-AKI in patients with normal or mild-to-moderate renal dysfunction undergoing coronary angiography or intervention.
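The pairwise meta-analysis reports odds ratios such as 0.73 (95% CI 0.36-1.47). The Python sketch below shows how a per-study odds ratio with a Wald 95% CI, and a Mantel-Haenszel fixed-effect pooled odds ratio, are computed from 2×2 tables; the counts are hypothetical, not the included trials, and this is not a claim about the exact RevMan settings used in the study.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for one 2x2 table.

    a, b = events / non-events in the treatment arm (e.g. oral hydration)
    c, d = events / non-events in the comparator arm (e.g. intravenous)
    Zero cells would need a continuity correction, omitted here.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


def mantel_haenszel_or(tables):
    """Fixed-effect pooled odds ratio across studies (Mantel-Haenszel)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical trials: (CI-AKI, no CI-AKI) oral, then (CI-AKI, no CI-AKI) IV
tables = [(5, 95, 8, 92), (3, 47, 4, 46)]
for t in tables:
    print("study OR %.2f (95%% CI %.2f-%.2f)" % odds_ratio_ci(*t))
print("pooled MH OR %.2f" % mantel_haenszel_or(tables))
```

A confidence interval that spans 1, as in the reported 0.36-1.47, is what underlies the conclusion that oral hydration was not shown to differ from intravenous hydration.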
Ren, Shuang; Hao, You-Jin; Chen, Bin; Yin, You-Ping
2017-01-01
The onion maggot, Delia antiqua, is a worldwide subterranean pest that can enter diapause during the summer and winter seasons. The molecular regulation of this ontogenetic transition remains largely unknown. Here we used high-throughput RNA sequencing to identify candidate genes and processes linked to summer diapause (SD) induction, by comparing transcriptome differences between SD- and nondiapause (ND)-destined larvae at their most sensitive developmental stage. Nine pairwise comparisons were performed, and significantly differentially regulated transcripts were identified. Several functional terms related to lipid, carbohydrate, and energy metabolism, environmental adaptation, immune response, and aging were enriched during the most sensitive SD induction period. A subset of genes, including circadian clock genes, was expressed differentially under diapause induction conditions, and there was much more variation in the most sensitive period of ND- than SD-destined larvae. These expression variations probably resulted in a deep restructuring of metabolic pathways. Potential regulatory elements of SD induction include genes related to lipid, carbohydrate, and energy metabolism and to environmental adaptation. Collectively, our results suggest that the circadian clock is one of the key drivers integrating environmental signals into SD induction. Our transcriptome analysis provides insight into the fundamental role of the circadian clock in SD induction in this important model insect species. It also contributes to the in-depth elucidation of the molecular mechanisms regulating insect diapause induction. PMID:29158334
A greedy, graph-based algorithm for the alignment of multiple homologous gene lists.
Fostier, Jan; Proost, Sebastian; Dhoedt, Bart; Saeys, Yvan; Demeester, Piet; Van de Peer, Yves; Vandepoele, Klaas
2011-03-15
Many comparative genomics studies rely on the correct identification of homologous genomic regions using accurate alignment tools. In such cases, the alphabet of the input sequences consists of complete genes rather than nucleotides or amino acids. As optimal multiple sequence alignment is computationally impractical, a progressive alignment strategy is often employed. However, such an approach is susceptible to the propagation of alignment errors in early pairwise alignment steps, especially when dealing with strongly diverged genomic regions. In this article, we present a novel, accurate and efficient greedy, graph-based algorithm for the alignment of multiple homologous genomic segments, represented as ordered gene lists. Based on provable properties of the graph structure, several heuristics are developed to resolve local alignment conflicts that occur due to gene duplication and/or rearrangement events on the different genomic segments. The performance of the algorithm is assessed by comparing the alignment results of homologous genomic segments in Arabidopsis thaliana to those obtained by using both a progressive alignment method and an earlier graph-based implementation. Especially for datasets that contain strongly diverged segments, the proposed method achieves substantially higher alignment accuracy. It is also sufficiently fast for large datasets including a few dozen eukaryotic genomes. http://bioinformatics.psb.ugent.be/software. The algorithm is implemented as a part of the i-ADHoRe 3.0 package.
Fiebig, Lena; Kohl, Thomas A; Popovici, Odette; Mühlenfeld, Margarita; Indra, Alexander; Homorodean, Daniela; Chiotan, Domnica; Richter, Elvira; Rüsch-Gerdes, Sabine; Schmidgruber, Beatrix; Beckert, Patrick; Hauer, Barbara; Niemann, Stefan; Allerberger, Franz; Haas, Walter
2017-01-01
Molecular surveillance of multidrug-resistant tuberculosis (MDR-TB) using 24-loci MIRU-VNTR in the European Union suggests the occurrence of international transmission. In early 2014, Austria detected a molecular MDR-TB cluster of five isolates. Links to Romania and Germany prompted the three countries to investigate possible cross-border MDR-TB transmission jointly. We searched genotyping databases, genotyped additional isolates from Romania, used whole genome sequencing (WGS) to infer putative transmission links, and investigated pairwise epidemiological links and patient mobility. Ten isolates from 10 patients shared the same 24-loci MIRU-VNTR pattern. Within this cluster, WGS defined two subgroups of four patients each. The first comprised an MDR-TB patient from Romania who had sought medical care in Austria and two patients from Austria. The second comprised patients, two of them epidemiologically linked, who lived in three different countries but had the same city of provenance in Romania. Our findings strongly suggested that the two cases in Austrian citizens resulted from a newly introduced MDR-TB strain, followed by domestic transmission. For the other cases, transmission probably occurred in the same city of provenance. To prevent further MDR-TB transmission, we need to ensure universal access to early and adequate therapy and collaborate closely in tuberculosis care beyond administrative borders. PMID:28106529
Vinatzer, Boris A; Weisberg, Alexandra J; Monteil, Caroline L; Elmarakeby, Haitham A; Sheppard, Samuel K; Heath, Lenwood S
2017-01-01
Taxonomy of plant pathogenic bacteria is challenging because pathogens of different crops often belong to the same named species but current taxonomy does not provide names for bacteria below the subspecies level. The introduction of the host range-based pathovar system in the 1980s provided a temporary solution to this problem but has many limitations. The affordability of genome sequencing now provides the opportunity for developing a new genome-based taxonomic framework. We already proposed to name individual bacterial isolates based on pairwise genome similarity. Here, we expand on this idea and propose to use genome similarity-based codes, which we now call life identification numbers (LINs), to describe and name bacterial taxa. Using 93 genomes of Pseudomonas syringae sensu lato, LINs were compared with a P. syringae genome tree whereby the assigned LINs were found to be informative of a majority of phylogenetic relationships. LINs also reflected host range and outbreak association for strains of P. syringae pathovar actinidiae, a pathovar for which many genome sequences are available. We conclude that LINs could provide the basis for a new taxonomic framework to address the shortcomings of the current pathovar system and to complement the current taxonomic system of bacteria in general.
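The abstract does not spell out how a LIN is built. The sketch below shows one way to picture a code derived from pairwise genome similarity. It assumes a list of increasing similarity thresholds (for example ANI cutoffs), one per LIN position. A new genome copies the leading positions of its most similar existing genome for every threshold it meets. It then receives the next free number at the first position it fails. The function name, thresholds and similarity values are hypothetical, and this simplification is not the published LIN procedure.

```python
def assign_lin(similarities, lins, thresholds):
    """Hypothetical LIN-style code for a new genome (illustrative only).

    similarities: {genome_id: similarity (%) of the new genome to that genome}
    lins: {genome_id: existing code as a tuple of ints}
    thresholds: increasing similarity cutoffs, one per code position
    """
    if not lins:  # the first genome gets an all-zero code
        return (0,) * len(thresholds)
    nearest = max(similarities, key=similarities.get)
    sim = similarities[nearest]
    # Number of leading positions shared with the most similar genome.
    k = sum(1 for t in thresholds if sim >= t)
    if k == len(thresholds):  # meets every cutoff: reuse the neighbor's code
        return lins[nearest]
    prefix = lins[nearest][:k]
    # Next unused value at position k among genomes sharing that prefix.
    used = [lin[k] for lin in lins.values() if lin[:k] == prefix]
    return prefix + (max(used) + 1,) + (0,) * (len(thresholds) - k - 1)


thresholds = [70, 95, 99, 99.9]  # hypothetical cutoffs
lins = {"A": assign_lin({}, {}, thresholds)}
lins["B"] = assign_lin({"A": 96.0}, lins, thresholds)
lins["C"] = assign_lin({"A": 80.0, "B": 85.0}, lins, thresholds)
print(lins)
```

Here B shares the first two positions with A (it meets the 70 and 95 cutoffs). C only meets the first cutoff relative to its closest genome, so it branches at the second position.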
Niu, Li-na; Luo, Xiao-juan; Li, Guo-hua; Bortoluzzi, Eduardo A.; Mao, Jing; Chen, Ji-hua; Gutmann, James L.; Pashley, David H.; Tay, Franklin R.
2014-01-01
Objectives The effects of different EndoActivator® (EA) sonic activation protocols on root canal debridement efficacy were examined. Methods Root canals in 48 single-rooted teeth were instrumented, irrigated initially with NaOCl and divided into 6 groups (N=8) based on the application time of QMix (antimicrobial calcium-chelating irrigant), and the time and sequence of EA irrigant activation - Positive Control: 90 sec QMix; Negative Control: 90 sec saline; Group 1A: 15 sec QMix + 15 sec QMix with EA-activation; Group 1B: 30 sec QMix + 30 sec QMix with EA-activation; Group 2A: 15 sec QMix with EA-activation + 15 sec QMix; Group 2B: 30 sec QMix with EA-activation + 30 sec QMix. Split roots were examined with scanning electron microscopy for assignment of smear and debris scores in locations along the coronal, middle and apical thirds of the canals. The overall cleanliness of pooled canal locations in the Positive Control and the 4 experimental groups was compared with chi-square tests. Results Significant differences were detected among the 5 groups (p<0.001). Post-hoc pairwise comparisons indicated that overall canal cleanliness was in the order (from best to worst): 1B = 2B > 2A > 1A > Positive Control. Completely clean canals could not be achieved because, without continuous irrigant flow, EA could not clear intraradicular debris. Conclusions Irrespective of the sonic activation sequence, irrigant activation for 30 seconds during a 60-second period of QMix application appears to maximize the smear layer and debris removal potential of the EndoActivator® system. PMID:24878251
Intercenter Differences in Bronchopulmonary Dysplasia or Death Among Very Low Birth Weight Infants
Walsh, Michele; Bobashev, Georgiy; Das, Abhik; Levine, Burton; Carlo, Waldemar A.; Higgins, Rosemary D.
2011-01-01
OBJECTIVES: To determine (1) the magnitude of clustering of bronchopulmonary dysplasia (36 weeks) or death (the outcome) across centers of the Eunice Kennedy Shriver National Institute of Child Health and Human Development Neonatal Research Network, (2) the infant-level variables associated with the outcome and estimate their clustering, and (3) the center-specific practices associated with the differences and build predictive models. METHODS: Data on neonates with a birth weight of <1250 g from the cluster-randomized benchmarking trial were used to determine the magnitude of clustering of the outcome. Clustering was assessed with alternating logistic regression, using the pairwise odds ratio, and with predictive modeling. Clinical variables associated with the outcome were identified by using multivariate analysis. The magnitude of clustering was then evaluated after correction for infant-level variables. Predictive models were developed by using center-specific and infant-level variables for data from 2001–2004 and projected to 2006. RESULTS: In 2001–2004, clustering of bronchopulmonary dysplasia/death was significant (pairwise odds ratio: 1.3; P < .001) and increased in 2006 (pairwise odds ratio: 1.6; overall incidence: 52%; range across centers: 32%–74%); center rates were relatively stable over time. Variables that varied according to center and were associated with increased risk of outcome included lower body temperature at NICU admission, use of prophylactic indomethacin, specific drug therapy on day 1, and lack of endotracheal intubation. Center differences remained significant even after correction for clustered variables. CONCLUSION: Bronchopulmonary dysplasia/death rates demonstrated moderate clustering according to center. Clinical variables associated with the outcome were also clustered. Center differences after correction of clustered variables indicate the presence of as-yet unmeasured center variables. PMID:21149431
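A pairwise odds ratio summarizes how strongly the binary outcomes of two infants from the same center are associated. A value above 1 indicates clustering. The study estimated it with alternating logistic regression, which also adjusts for covariates. The sketch below computes only the crude, unadjusted pairwise odds ratio from within-center pairs. It assumes that pairs within a center are exchangeable and uses hypothetical data. It is not the ALR estimator.

```python
def crude_pairwise_or(centers):
    """Crude pairwise odds ratio from binary outcomes grouped by center.

    centers: list of lists of 0/1 outcomes, one inner list per center.
    Counts ordered pairs of distinct infants within the same center.
    """
    n11 = n00 = n10 = 0
    for outcomes in centers:
        m, k = len(outcomes), sum(outcomes)
        n11 += k * (k - 1)            # both infants had the outcome
        n00 += (m - k) * (m - k - 1)  # neither infant had the outcome
        n10 += k * (m - k)            # first yes, second no (equals n01)
    return (n11 * n00) / (n10 * n10)


# Hypothetical data: outcomes concentrated in the first center.
print(crude_pairwise_or([[1, 1, 1, 1, 0], [0, 0, 0, 0, 1]]))  # 2.25
```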
Joseph, Agnel Praveen; Srinivasan, Narayanaswamy; de Brevern, Alexandre G
2012-09-01
Comparison of multiple protein structures has a broad range of applications in the analysis of protein structure, function and evolution. Multiple structure alignment tools (MSTAs) are necessary to obtain a simultaneous comparison of a family of related folds. In this study, we have developed a method for multiple structure comparison largely based on sequence alignment techniques. A widely used structural alphabet named Protein Blocks (PBs) was used to transform the information on 3D protein backbone conformation into a 1D sequence string. A progressive alignment strategy similar to CLUSTALW was adopted for multiple PB sequence alignment (mulPBA). Highly similar stretches identified by the pairwise alignments are given higher weights during the alignment. The residue equivalences from PB-based alignments are used to obtain a three-dimensional fit of the structures, followed by an iterative refinement of the structural superposition. Systematic comparisons using benchmark datasets of MSTAs show that the alignment quality is better than that of MULTIPROT, MUSTANG and the alignments in HOMSTRAD in more than 85% of the cases. Comparison with other rigid-body and flexible MSTAs also indicates that mulPBA alignments are superior to most of the rigid-body MSTAs and highly comparable to the flexible alignment methods. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
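mulPBA aligns Protein Block strings (one letter, a to p, per residue) using sequence-alignment machinery. The core pairwise step can be pictured as a standard Needleman-Wunsch global alignment. The sketch below uses a simple match/mismatch score and a linear gap penalty, with all scores chosen arbitrarily. The method described instead uses a PB substitution matrix, upweighting of highly similar stretches and progressive merging, none of which are reproduced here. The PB strings are made up.

```python
def nw_align(s, t, match=2, mismatch=-1, gap=-2):
    """Global alignment score and aligned strings (Needleman-Wunsch)."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(s), len(t)
    # score[i][j]: best score for aligning s[:i] with t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(s[i - 1], t[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    a, b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(s[i - 1], t[j - 1]):
            a.append(s[i - 1])
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            a.append(s[i - 1])
            b.append("-")
            i -= 1
        else:
            a.append("-")
            b.append(t[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(a)), "".join(reversed(b))


print(nw_align("mmdddfklm", "mddfklmm"))
```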
NASA Astrophysics Data System (ADS)
Xu, Jiajie; Jiang, Bo; Chai, Sanming; He, Yuan; Zhu, Jianyi; Shen, Zonggen; Shen, Songdong
2016-09-01
Filamentous Bangia, which are distributed extensively throughout the world, have simple and similar morphological characteristics. Scientists can classify these organisms using molecular markers in combination with morphology. We successfully sequenced the complete nuclear ribosomal DNA, approximately 13 kb in length, from a marine Bangia population. We further analyzed the small subunit ribosomal DNA gene (nrSSU) and the internal transcribed spacer (ITS) sequence regions, along with nine other marine and two freshwater Bangia samples from China. Pairwise distances of the nrSSU and 5.8S ribosomal DNA gene sequences show the marine samples grouping together with low divergences from each other (0-0.003 and 0-0.006, respectively) but high divergences from the freshwater samples (0.123-0.126 and 0.198, respectively). An exception is the marine sample collected from Weihai, which shows high divergence from both the other marine samples (0.063-0.065 and 0.129, respectively) and the freshwater samples (0.097 and 0.120, respectively). A phylogenetic tree constructed with the maximum likelihood method from a combined SSU-ITS dataset divides the samples into three clades. Two are marine clades containing Bangia spp. from North America, Europe, Asia, and Australia, and one is a freshwater clade containing Bangia atropurpurea from North America and China.
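Pairwise distances of this kind are commonly reported as p-distances: the proportion of aligned sites at which two sequences differ. The abstract does not state the distance model. The sketch below therefore assumes an uncorrected p-distance, with sites containing a gap skipped. The sequences are toy examples, not Bangia data.

```python
from itertools import combinations


def p_distance(seq1, seq2, gap="-"):
    """Proportion of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    sites = [(a, b) for a, b in zip(seq1, seq2) if a != gap and b != gap]
    diffs = sum(1 for a, b in sites if a != b)
    return diffs / len(sites)


def distance_table(aligned):
    """p-distance for every unordered pair in a {name: sequence} dict."""
    return {(x, y): p_distance(aligned[x], aligned[y])
            for x, y in combinations(aligned, 2)}


aligned = {"marine1": "ACGTACGTAC", "marine2": "ACGTACGTAA", "fresh1": "ACTTAGGTCA"}
for pair, d in distance_table(aligned).items():
    print(pair, round(d, 3))
```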
Accelerated probabilistic inference of RNA structure evolution
Holmes, Ian
2005-01-01
Background Pairwise stochastic context-free grammars (Pair SCFGs) are powerful tools for evolutionary analysis of RNA, including simultaneous RNA sequence alignment and secondary structure prediction, but the associated algorithms are intensive in both CPU and memory usage. The same problem is faced by other RNA alignment-and-folding algorithms based on Sankoff's 1985 algorithm. It is therefore desirable to constrain such algorithms by pre-processing the sequences and using this first pass to limit the range of structures and/or alignments that can be considered. Results We demonstrate how flexible classes of constraint can be imposed, greatly reducing the computational costs while maintaining a high quality of structural homology prediction. Any score-attributed context-free grammar (e.g. energy-based scoring schemes, or conditionally normalized Pair SCFGs) is amenable to this treatment. It is now possible to combine independent structural and alignment constraints of unprecedented general flexibility in Pair SCFG alignment algorithms. We outline several applications to the bioinformatics of RNA sequence and structure, including Waterman-Eggert N-best alignments and progressive multiple alignment. We evaluate the performance of the algorithm on test examples from the RFAM database. Conclusion A program, Stemloc, that implements these algorithms for efficient RNA sequence alignment and structure prediction is available under the GNU General Public License. PMID:15790387
Heipertz, Richard A; Sanders-Buell, Eric; Kijak, Gustavo; Howell, Shana; Lazzaro, Michelle; Jagodzinski, Linda L; Eggleston, John; Peel, Sheila; Malia, Jennifer; Armstrong, Adam; Michael, Nelson L; Kim, Jerome H; O'Connell, Robert J; Scott, Paul T; Brett-Major, David M; Tovanabutra, Sodsai
2013-10-01
The U.S. military represents a unique population within the human immunodeficiency virus 1 (HIV-1) pandemic. The last comprehensive study of HIV-1 in members of the U.S. Navy and Marine Corps (Sea Services) was completed in 2000, before large-scale combat operations were taking place. Here, we present molecular characterization of HIV-1 from 40 Sea Services personnel who were identified during their seroconversion window and initially classified as HIV-1 negative during screening. Protease/reverse transcriptase (pro/rt) and envelope (env) sequences were obtained from each member of the cohort. Phylogenetic analyses were carried out on these regions to determine relatedness within the cohort and calculate the most recent common ancestor for the related sequences. We identified 39 individuals infected with subtype B and one infected with CRF01_AE. Comparison of the pairwise genetic distance of Sea Service sequences and reference sequences in the env and pro/rt regions showed that five samples were part of molecular clusters, a group of two and a group of three, confirmed by single genome amplification. Real-time molecular monitoring of new HIV-1 acquisitions in the Sea Services may have a role in facilitating public health interventions at sites where related HIV-1 infections are identified.
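Molecular clusters like those described here are often defined by two steps. Any two sequences whose pairwise genetic distance falls below a threshold are linked, and the connected groups are taken as clusters. The abstract gives neither the threshold nor the linking rule used. The sketch below is therefore an assumption: a union-find pass over all pairs, with a hypothetical cutoff and made-up distances.

```python
from itertools import combinations


def distance_clusters(ids, dist, threshold):
    """Group sequences linked by pairwise distance <= threshold.

    ids: sequence identifiers
    dist: {frozenset({a, b}): distance} for every pair
    Returns only groups with at least two members.
    """
    parent = {i: i for i in ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]


ids = ["s1", "s2", "s3", "s4", "s5", "s6"]
close = {frozenset(p): d for p, d in [(("s1", "s2"), 0.010),
                                      (("s3", "s4"), 0.012),
                                      (("s4", "s5"), 0.020)]}
dist = {frozenset(p): close.get(frozenset(p), 0.10) for p in combinations(ids, 2)}
print(distance_clusters(ids, dist, threshold=0.03))
```

Linking by single pairs means a cluster can chain through intermediate sequences (s3 and s5 group together via s4), which is one reason such clusters are often confirmed by other evidence.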
Gueli Alletti, Gianpiero; Eigenbrod, Marina; Carstens, Eric B; Kleespies, Regina G; Jehle, Johannes A
2017-06-01
The European isolate Agrotis segetum granulovirus DA (AgseGV-DA) is a slow-killing, type I granulovirus, as indicated by low dose-mortality responses within seven days post infection and a tissue tropism of infection restricted solely to the fat body of infected Agrotis segetum host larvae. The genome of AgseGV-DA was completely sequenced and compared to the whole genome sequences of the Chinese isolates AgseGV-XJ and AgseGV-L1. All three isolates share highly conserved genomes. The AgseGV-DA genome is 131,557 bp in length and encodes 149 putative open reading frames, including 37 baculovirus core genes and the per os infectivity factor ac110. Comprehensive investigations of repeat regions identified one putative non-hr like origin of replication in AgseGV-DA. Two analyses confirmed AgseGV-DA, AgseGV-XJ and AgseGV-L1 as representative isolates of the same Betabaculovirus species. These were a phylogenetic analysis based on concatenated amino acid alignments of 37 baculovirus core genes, and pairwise distances based on the nucleotide alignments of partial granulin, lef-8 and lef-9 sequences with deposited betabaculoviruses. AgseGV encodes a distinct putative enhancin, distantly related to enhancins from other granuloviruses. Copyright © 2017. Published by Elsevier Inc.
van Til, Janine; Groothuis-Oudshoorn, Catharina; Lieferink, Marijke; Dolan, James; Goetghebeur, Mireille
2014-01-01
There is an increased interest in the use of multi-criteria decision analysis (MCDA) to support regulatory and reimbursement decision making. The EVIDEM framework was developed to provide pragmatic multi-criteria decision support in health care, to estimate the value of healthcare interventions, and to aid in priority-setting. The objectives of this study were to test 1) the influence of different weighting techniques on the overall outcome of an MCDA exercise, 2) the power of such techniques to discriminate between criteria when weighting them, and 3) whether different techniques result in similar weights for the criteria set proposed by the EVIDEM framework. A sample of 60 Dutch and Canadian students participated in the study. Each student used an online survey to provide weights for 14 criteria with two different techniques: a five-point rating scale and one of the following techniques selected randomly: ranking, point allocation, pairwise comparison and best worst scaling. The results of this study indicate that differences in weights have no effect on value estimates at the group level. On an individual level, considerable differences in criteria weights and rank order occur as a result of the weight elicitation method used and of the ability of different techniques to discriminate in criteria importance. Of the five techniques tested, pairwise comparison of criteria has the highest ability to discriminate in weights when fourteen criteria are compared. When weights are intended to support group decisions, the choice of elicitation technique has negligible impact on criteria weights and the overall value of an innovation. However, when weights are used to support individual decisions, the choice of elicitation technique influences the outcome, and studies that use dissimilar techniques cannot be easily compared.
Weight elicitation through pairwise comparison of criteria is preferred when taking into account its superior ability to discriminate between criteria and respondents' preferences.
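The abstract does not say how pairwise comparison judgments were turned into weights. One common approach, shown here only as an illustration, is the analytic hierarchy process. It fills a reciprocal comparison matrix, where entry a_ij states how much more important criterion i is than criterion j. It then takes the normalized geometric mean of each row. The three criteria and their judgments are hypothetical; the study compared fourteen criteria.

```python
import math


def geometric_mean_weights(matrix):
    """Normalized row geometric means of a reciprocal pairwise comparison matrix."""
    n = len(matrix)
    gmeans = [math.prod(row) ** (1 / n) for row in matrix]
    total = sum(gmeans)
    return [g / total for g in gmeans]


# Hypothetical judgments for three criteria; a_ji = 1 / a_ij.
comparisons = [
    [1.0, 5 / 3, 2.5],
    [3 / 5, 1.0, 1.5],
    [1 / 2.5, 1 / 1.5, 1.0],
]
print([round(w, 3) for w in geometric_mean_weights(comparisons)])
```

For a perfectly consistent matrix like this one, the row geometric means recover the underlying weights (0.5, 0.3, 0.2) exactly. With real, partly inconsistent judgments, they give the logarithmic least-squares compromise.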
Song, Yang; Zhang, Yong; Fan, Qin; Cui, Hui; Yan, Dongmei; Zhu, Shuangli; Tang, Haishu; Sun, Qiang; Wang, Dongyan; Xu, Wenbo
2017-02-23
Human enterovirus B106 (EV-B106) is a new member of the enterovirus B species. To date, only three nucleotide sequences of EV-B106 have been published, and only one full-length genome sequence (the Yunnan strain 148/YN/CHN/12) is available in the GenBank database. In this study, we conducted phylogenetic characterisation of four EV-B106 strains isolated in Xinjiang, China. Pairwise comparisons of the nucleotide sequences and the deduced amino acid sequences revealed that the four Xinjiang EV-B106 strains had only 80.5-80.8% nucleotide identity and 95.4-97.3% amino acid identity with the Yunnan EV-B106 strain, indicating high mutagenicity. Similarity plots and bootscanning analyses revealed that frequent intertypic recombination occurred in all four Xinjiang EV-B106 strains in the non-structural region. These four strains may share a donor sequence with the EV-B85 strain, which circulated in Xinjiang in 2011, indicating extensive genetic exchanges between these strains. All Xinjiang EV-B106 strains were temperature-sensitive. An antibody seroprevalence study against EV-B106 in two Xinjiang prefectures also showed low titres of neutralizing antibodies, suggesting limited exposure and transmission in the population. This study contributes the whole genome sequences of EV-B106 to the GenBank database and provides valuable information regarding the molecular epidemiology of EV-B106 in China.
Song, Yang; Zhang, Yong; Fan, Qin; Cui, Hui; Yan, Dongmei; Zhu, Shuangli; Tang, Haishu; Sun, Qiang; Wang, Dongyan; Xu, Wenbo
2017-01-01
Human enterovirus B106 (EV-B106) is a new member of the enterovirus B species. To date, only three nucleotide sequences of EV-B106 have been published, and only one full-length genome sequence (the Yunnan strain 148/YN/CHN/12) is available in the GenBank database. In this study, we conducted phylogenetic characterisation of four EV-B106 strains isolated in Xinjiang, China. Pairwise comparisons of the nucleotide sequences and the deduced amino acid sequences revealed that the four Xinjiang EV-B106 strains had only 80.5–80.8% nucleotide identity and 95.4–97.3% amino acid identity with the Yunnan EV-B106 strain, indicating high mutagenicity. Similarity plots and bootscanning analyses revealed that frequent intertypic recombination occurred in all four Xinjiang EV-B106 strains in the non-structural region. These four strains may share a donor sequence with the EV-B85 strain, which circulated in Xinjiang in 2011, indicating extensive genetic exchanges between these strains. All Xinjiang EV-B106 strains were temperature-sensitive. An antibody seroprevalence study against EV-B106 in two Xinjiang prefectures also showed low titres of neutralizing antibodies, suggesting limited exposure and transmission in the population. This study contributes the whole genome sequences of EV-B106 to the GenBank database and provides valuable information regarding the molecular epidemiology of EV-B106 in China. PMID:28230168
An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.
Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K
2014-01-01
Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing (NGS) data commonly require working knowledge of the command line interface, massive computational resources and expertise, which makes them daunting for biologists. Further, the SNP information generated may not be readily usable for downstream processes such as genotyping. Hence, a comprehensive pipeline with a graphical user interface, called Integrated SNP Mining and Utilization (ISMU), has been developed by integrating several open source NGS tools. It supports SNP discovery and the use of the discovered SNPs in developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction methods (SAMtools/SOAPsnp/CNS2snp and CbCC) and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs. These are provided in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets, such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data, at high speed. The pipeline is useful for the plant genetics and breeding community, including users without computational expertise, for discovering SNPs and using them in genomics, genetics and breeding studies. The pipeline has been parallelized to process large next generation sequencing datasets.
It has been developed in Java and is available as standalone free software at http://hpc.icrisat.cgiar.org/ISMU.
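ISMU reports SNPs between all pairwise combinations of the genotypes analyzed. The bookkeeping for that step can be sketched as follows. The sketch assumes each genotype has already been reduced to one high-quality base call per shared reference position, with None for a missing call. Real pipelines work from aligned reads and variant callers such as SAMtools, which this toy example does not model. The genotype names and calls are made up.

```python
from itertools import combinations


def pairwise_snps(calls):
    """Positions where each pair of genotypes carries different base calls.

    calls: {genotype: sequence of base calls at shared positions,
            with None where no confident call is available}
    """
    result = {}
    for g1, g2 in combinations(sorted(calls), 2):
        result[(g1, g2)] = [
            pos for pos, (b1, b2) in enumerate(zip(calls[g1], calls[g2]))
            if b1 is not None and b2 is not None and b1 != b2
        ]
    return result


calls = {"G1": "ACGT", "G2": "ACGA", "G3": ["T", "C", None, "T"]}
print(pairwise_snps(calls))
```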
NASA Astrophysics Data System (ADS)
Haneda, Eri; Luo, Jiajia; Can, Ali; Ramani, Sathish; Fu, Lin; De Man, Bruno
2016-05-01
In this study, we implement and compare model based iterative reconstruction (MBIR) with dictionary learning (DL) over MBIR with pairwise pixel-difference regularization, in the context of transportation security. DL is a technique of sparse signal representation using an overcomplete dictionary. It has provided promising results in image processing applications including denoising [1], as well as in medical CT reconstruction [2]. It has been previously reported that DL produces promising results in terms of noise reduction and preservation of structural details, especially for low dose and few-view CT acquisitions [2]. A distinguishing feature of transportation security CT is that scanned baggage may contain items with a wide range of material densities. While medical CT typically scans soft tissues, blood with and without contrast agents, and bones, luggage typically contains more high density materials (e.g. metals and glass), which can produce severe distortions such as metal streaking artifacts. Security CT places emphasis on image quality factors such as resolution, contrast, noise level, and CT number accuracy for target detection. While MBIR has shown exemplary performance in the trade-off between noise reduction and resolution preservation, we demonstrate that DL may further improve this trade-off. In this study, we used K-SVD-based DL [3] combined with the MBIR cost-minimization framework. We compared the results to Filtered Back Projection (FBP) and to MBIR with pairwise pixel-difference regularization. We performed a parameter analysis to show the image quality impact of each parameter. We also investigated few-view CT acquisitions, where DL can show an additional advantage relative to pairwise pixel-difference regularization.
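In MBIR, a pairwise pixel-difference regularizer adds a penalty phi(x_i - x_j) for every pair of neighboring pixels to the data-fidelity term. The sketch below evaluates such a penalty on a 4-neighborhood for two illustrative choices of phi: a quadratic and an edge-preserving Huber function. The abstract does not say which potential function the study used. The forward model and data term are omitted.

```python
def quadratic(d):
    return 0.5 * d * d


def huber(d, delta=1.0):
    """Quadratic near zero, linear for large differences (edge preserving)."""
    a = abs(d)
    return 0.5 * d * d if a <= delta else delta * (a - 0.5 * delta)


def pairwise_penalty(img, phi=quadratic):
    """Sum of phi over horizontally and vertically adjacent pixel pairs."""
    rows, cols = len(img), len(img[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                total += phi(img[r][c] - img[r][c + 1])
            if r + 1 < rows:
                total += phi(img[r][c] - img[r + 1][c])
    return total


edge = [[0.0, 10.0], [0.0, 10.0]]  # a sharp vertical edge, e.g. metal next to air
print(pairwise_penalty(edge), pairwise_penalty(edge, huber))
```

The quadratic potential penalizes the large jump far more heavily than the Huber potential (100.0 vs. 19.0 here). For this reason, edge-preserving potentials are usually preferred when sharp material boundaries must survive regularization.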
Genomic differentiation among wild cyanophages despite widespread horizontal gene transfer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gregory, Ann C.; Solonenko, Sergei A.; Ignacio-Espinoza, J. Cesar
Genetic recombination is a driving force in genome evolution. Among viruses it has a dual role. For genomes with higher fitness, it maintains genome integrity in the face of high mutation rates. Conversely, for genomes with lower fitness, it provides immediate access to sequence space that cannot be reached by mutation alone. How recombination impacts the cohesion and dissolution of individual whole genomes within viral sequence space is poorly understood for double-stranded DNA bacteriophages (a.k.a. phages). This is owing to the challenges of obtaining appropriately scaled genomic datasets. Here we explore the role of recombination in both maintaining and differentiating whole genomes of 142 wild double-stranded DNA marine cyanophages. Phylogenomic analysis across the 51 core genes revealed ten lineages, six of which were well represented. These phylogenomic lineages represent discrete genotypic populations based on comparisons of intra- and inter-lineage shared gene content, genome-wide average nucleotide identity, as well as detected gaps in the distribution of pairwise differences between genomes. McDonald-Kreitman selection tests identified putative niche-differentiating genes under positive selection that differed across the six well-represented genotypic populations and that may have driven initial divergence. Concurrent with patterns of recombination of discrete populations, recombination analyses of both genic and intergenic regions largely revealed decreased genetic exchange across individual genomes between populations relative to within populations. Lastly, these findings suggest that discrete double-stranded DNA marine cyanophage populations occur in nature and are maintained by patterns of recombination akin to those observed in bacteria, archaea and sexual eukaryotes.
Genomic differentiation among wild cyanophages despite widespread horizontal gene transfer
Gregory, Ann C.; Solonenko, Sergei A.; Ignacio-Espinoza, J. Cesar; ...
2016-11-16
Genetic recombination is a driving force in genome evolution. Among viruses it has a dual role. For genomes with higher fitness, it maintains genome integrity in the face of high mutation rates. Conversely, for genomes with lower fitness, it provides immediate access to sequence space that cannot be reached by mutation alone. How recombination impacts the cohesion and dissolution of individual whole genomes within viral sequence space is poorly understood for double-stranded DNA bacteriophages (a.k.a. phages). This is owing to the challenges of obtaining appropriately scaled genomic datasets. Here we explore the role of recombination in both maintaining and differentiating whole genomes of 142 wild double-stranded DNA marine cyanophages. Phylogenomic analysis across the 51 core genes revealed ten lineages, six of which were well represented. These phylogenomic lineages represent discrete genotypic populations based on comparisons of intra- and inter-lineage shared gene content, genome-wide average nucleotide identity, as well as detected gaps in the distribution of pairwise differences between genomes. McDonald-Kreitman selection tests identified putative niche-differentiating genes under positive selection that differed across the six well-represented genotypic populations and that may have driven initial divergence. Concurrent with patterns of recombination of discrete populations, recombination analyses of both genic and intergenic regions largely revealed decreased genetic exchange across individual genomes between populations relative to within populations. Lastly, these findings suggest that discrete double-stranded DNA marine cyanophage populations occur in nature and are maintained by patterns of recombination akin to those observed in bacteria, archaea and sexual eukaryotes.
Howard, Kate L.; Woods, Andy T.; Penton-Voak, Ian S.; Neumann, Christof
2018-01-01
We introduce “EloChoice”, a package for R which uses Elo rating to assess pairwise comparisons between stimuli in order to measure perceived stimulus characteristics. To demonstrate the package and compare results from forced choice pairwise comparisons to those from more standard single stimulus rating tasks using Likert (or Likert-type) items, we investigated perceptions of physical strength from images of male bodies. The stimulus set comprised images of 82 men standing on a raised platform with minimal clothing. Strength-related anthropometrics and grip strength measurements were available for each man in the set. UK laboratory participants (Study 1) and US online participants (Study 2) viewed all images in both a Likert rating task, to collect mean Likert scores, and a pairwise comparison task, to calculate Elo, mean Elo (mElo), and Bradley-Terry scores. Within both studies, Likert, Elo and Bradley-Terry scores were closely correlated to mElo scores (all rs > 0.95), and all measures were correlated with stimulus grip strength (all rs > 0.38) and body size (all rs > 0.59). However, mElo scores were less variable than Elo scores and were hundreds of times quicker to compute than Bradley-Terry scores. Responses in pairwise comparison trials were 2/3 quicker than in Likert tasks, indicating that participants found pairwise comparisons to be easier. In addition, mElo scores generated from a data set with half the participants randomly excluded produced very comparable results to those produced with Likert scores from the full participant set, indicating that researchers require fewer participants when using pairwise comparisons. PMID:29293615
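Elo rating turns a sequence of pairwise choices into scores. After each trial it updates both stimuli according to how surprising the outcome was. mElo averages the final Elo scores over many random orderings of the trials, which removes the dependence on trial order. The sketch below illustrates that idea in Python rather than reproducing the EloChoice R package. The start value of 0, the k-factor of 100 and the number of runs are assumptions chosen for illustration.

```python
import random


def elo_scores(trials, stimuli, k=100, start=0.0):
    """Sequential Elo ratings from (winner, loser) pairs in the given order."""
    ratings = {s: start for s in stimuli}
    for winner, loser in trials:
        # Expected probability that the eventual winner would be chosen.
        expected = 1 / (1 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
        change = k * (1 - expected)
        ratings[winner] += change
        ratings[loser] -= change
    return ratings


def mean_elo(trials, stimuli, runs=100, seed=1, **kwargs):
    """Average final Elo over random permutations of the trial order."""
    rng = random.Random(seed)
    totals = {s: 0.0 for s in stimuli}
    for _ in range(runs):
        order = list(trials)
        rng.shuffle(order)
        for s, r in elo_scores(order, stimuli, **kwargs).items():
            totals[s] += r
    return {s: t / runs for s, t in totals.items()}


trials = [("A", "B"), ("B", "C"), ("A", "C"), ("A", "B"), ("C", "B")]
print(mean_elo(trials, ["A", "B", "C"]))
```

Each update is zero-sum, so the ratings always total the start value times the number of stimuli. Averaging over orderings is cheap compared with fitting a Bradley-Terry model, which is consistent with the speed difference the abstract reports.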
Random Partition Distribution Indexed by Pairwise Information
Dahl, David B.; Day, Ryan; Tsai, Jerry W.
2017-01-01
We propose a random partition distribution indexed by pairwise similarity information such that partitions compatible with the similarities are given more probability. The use of pairwise similarities, in the form of distances, is common in some clustering algorithms (e.g., hierarchical clustering), but we show how to use this type of information to define a prior partition distribution for flexible Bayesian modeling. A defining feature of the distribution is that it allocates probability among partitions within a given number of subsets, but it does not shift probability among sets of partitions with different numbers of subsets. Our distribution places more probability on partitions that group similar items yet keeps the total probability of partitions with a given number of subsets constant. The distribution of the number of subsets (and its moments) is available in closed-form and is not a function of the similarities. Our formulation has an explicit probability mass function (with a tractable normalizing constant) so the full suite of MCMC methods may be used for posterior inference. We compare our distribution with several existing partition distributions, showing that our formulation has attractive properties. We provide three demonstrations to highlight the features and relative performance of our distribution. PMID:29276318
Process perspective on image quality evaluation
NASA Astrophysics Data System (ADS)
Leisti, Tuomas; Halonen, Raisa; Kokkonen, Anna; Weckman, Hanna; Mettänen, Marja; Lensu, Lasse; Ritala, Risto; Oittinen, Pirkko; Nyman, Göte
2008-01-01
The psychological complexity of multivariate image quality evaluation makes it difficult to develop general image quality metrics. Quality evaluation includes several mental processes, and ignoring these processes or relying on only a few test images can lead to biased results. Using a qualitative/quantitative methodology (Interpretation Based Quality, IBQ), we examined the process of pair-wise comparison in a setting where the quality of images printed by a laser printer on different paper grades was evaluated. The test image consisted of a picture of a table covered with several objects. Three other images were also used: photographs of a woman, a cityscape and countryside. In addition to making the pair-wise comparisons, observers (N=10) were interviewed about the subjective quality attributes they used in making their quality decisions. An examination of the individual pair-wise comparisons revealed serious inconsistencies in observers' evaluations of the test image content, but not of the other image contexts. The qualitative analysis showed that this inconsistency was due to the observers' focus of attention. The lack of easily recognizable context in the test image may have contributed to this inconsistency. To obtain reliable knowledge of the effect of image context or attention on subjective image quality, a qualitative methodology is needed.
Pairwise registration of TLS point clouds using covariance descriptors and a non-cooperative game
NASA Astrophysics Data System (ADS)
Zai, Dawei; Li, Jonathan; Guo, Yulan; Cheng, Ming; Huang, Pengdi; Cao, Xiaofei; Wang, Cheng
2017-12-01
Automatically registering TLS point clouds with noise, outliers and varying overlap is challenging. In this paper, we propose a new method for pairwise registration of TLS point clouds. We first generate covariance matrix descriptors with an adaptive neighborhood size from the point clouds to find candidate correspondences; we then construct a non-cooperative game to isolate mutually compatible correspondences, which are considered true positives. The method was tested on three models acquired by two different TLS systems. Experimental results demonstrate that our proposed adaptive covariance (ACOV) descriptor is invariant to rigid transformation and robust to noise and varying resolutions. The average registration errors achieved on the three models are 0.46 cm, 0.32 cm and 1.73 cm, respectively, and the computation times are about 288 s, 184 s and 903 s, respectively. Moreover, our registration framework using ACOV descriptors and a game-theoretic method outperforms state-of-the-art methods in terms of both registration error and computational time. An experiment on a large outdoor scene further demonstrates the feasibility and effectiveness of the proposed pairwise registration framework.
Impaired inference in a case of developmental amnesia.
D'Angelo, Maria C; Rosenbaum, R Shayna; Ryan, Jennifer D
2016-10-01
Amnesia is associated with impairments in relational memory, which is critically supported by the hippocampus. By adapting the transitivity paradigm, we previously showed that age-related impairments in inference were mitigated when judgments could be predicated on known pairwise relations; however, such advantages were not observed in the adult-onset amnesic case D.A. Here, we replicate and extend this finding in a developmental amnesic case (N.C.), who also shows impaired relational learning and transitive expression. Unlike in D.A., N.C.'s damage affected the extended hippocampal system and diencephalic structures but did not extend to the neocortical areas affected in D.A. Critically, despite their differences in etiology and affected structures, N.C. and D.A. perform similarly on the task. N.C. showed intact pairwise knowledge, suggesting that he is able to use existing semantic information, but this semantic knowledge was insufficient to support transitive expression. The present results suggest a critical role for regions connected to the hippocampus and/or medial prefrontal cortex in inference beyond learning of pairwise relations. © 2016 The Authors Hippocampus Published by Wiley Periodicals, Inc.
Arai, Satoru; Gu, Se Hun; Baek, Luck Ju; Tabara, Kenji; Bennett, Shannon; Oh, Hong-Shik; Takada, Nobuhiro; Kang, Hae Ji; Tanaka-Taya, Keiko; Morikawa, Shigeru; Okabe, Nobuhiko; Yanagihara, Richard; Song, Jin-Won
2012-01-01
Spurred by the recent isolation of a novel hantavirus, named Imjin virus (MJNV), from the Ussuri white-toothed shrew (Crocidura lasiura), targeted trapping was conducted for the phylogenetically related Asian lesser white-toothed shrew (Crocidura shantungensis). Pair-wise alignment and comparison of the S, M and L segments of a newfound hantavirus, designated Jeju virus (JJUV), indicated remarkably low nucleotide and amino acid sequence similarity with MJNV. Phylogenetic analyses, using maximum likelihood and Bayesian methods, showed divergent ancestral lineages for JJUV and MJNV, despite the close phylogenetic relationship of their reservoir soricid hosts. Also, no evidence of host switching was apparent in tanglegrams, generated by TreeMap 2.0β. PMID:22230701
Chemale, Gustavo; Paneto, Greiciane Gaburro; Menezes, Meiga Aurea Mendes; de Freitas, Jorge Marcelo; Jacques, Guilherme Silveira; Cicarelli, Regina Maria Barretto; Fagundes, Paulo Roberto
2013-05-01
Mitochondrial DNA (mtDNA) analysis is usually a last resort in routine forensic DNA casework. However, it has become a powerful tool for the analysis of highly degraded samples or samples containing too little or no nuclear DNA, such as old bones and hair shafts. The gold standard methodology is still the direct sequencing of polymerase chain reaction (PCR) products or cloned amplicons from the HVS-1 and HVS-2 (hypervariable segment) control region segments. Identifications using mtDNA are time-consuming and expensive, and can be very complex depending on the amount and nature of the material being tested. The main goal of this work is to develop a less labour-intensive and less expensive screening method for mtDNA analysis, to aid in the exclusion of non-matching samples and to serve as a presumptive test prior to final confirmatory DNA sequencing. We selected 14 highly discriminatory single nucleotide polymorphisms (SNPs), based on simulations performed by Salas and Amigo (2010), to be typed using SNaPShot(TM) (Applied Biosystems, Foster City, CA, USA). The assay was validated by typing more than 100 HVS-1/HVS-2 sequenced samples. No differences were observed between the SNP typing and DNA sequencing results, with the exception of allelic dropouts observed in a few haplotypes. Haplotype diversity simulations were performed using 172 mtDNA sequences representative of the Brazilian population, and a haplotype diversity of 0.9794 was obtained when the 14 SNPs were used, confirming in the population studied the theoretical prediction approach for selecting highly discriminatory SNPs suggested by Salas and Amigo (2010). As the main goal of the work is to develop a screening assay that avoids sequencing all samples in a particular case, a pair-wise comparison of the sequences was done using the selected SNPs. 
When both HVS-1/HVS-2 SNPs were used for simulations, at least two differences were observed in 93.2% of the comparisons performed. The assay was validated with casework samples. Results show that the method is straightforward and can be used for exclusionary purposes, saving time and laboratory resources. The assay confirms the theoretical prediction suggested by Salas and Amigo (2010). All forensic advantages, such as high sensitivity and power of discrimination, as well as the disadvantages, such as the occurrence of allele dropouts, are discussed throughout the article. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
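Once each sample is reduced to its string of SNP alleles, two quantities in this abstract are straightforward to compute: haplotype diversity and the share of sample pairs separated by at least two SNPs. The sketch below assumes Nei's unbiased estimator, h = n/(n-1) * (1 - sum of p_i^2), where p_i is the frequency of haplotype i; this is the usual definition but is not stated in the abstract. The haplotype strings are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def fraction_pairs_with_min_differences(haplotypes, min_diff=2):
    """Share of sample pairs that differ at >= min_diff SNP positions."""
    pairs = list(combinations(haplotypes, 2))
    hits = sum(1 for a, b in pairs if sum(x != y for x, y in zip(a, b)) >= min_diff)
    return hits / len(pairs)


# Hypothetical samples typed at four SNP positions each
samples = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(haplotype_diversity(samples))
print(fraction_pairs_with_min_differences(samples))
```

Requiring at least two differences, rather than one, gives an exclusion rule that tolerates a single typing artefact such as an allele dropout.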
Nimbalkar, Smita; Oh, Yih Y; Mok, Reei Y; Tioh, Jing Y; Yew, Kai J; Patil, Pravinkumar G
2018-03-16
Buccal corridor space and its variations greatly influence smile attractiveness. Facial types differ among ethnic populations, and so does smile attractiveness. The subjective perception of smile attractiveness in different populations may vary with different buccal corridor spaces and facial patterns. The purpose of this study was to determine the esthetic perceptions of the Malaysian population regarding the width of buccal corridor spaces and their effect on smile esthetics in individuals with short, normal, and long faces. The image of a smiling individual with a mesofacial face was modified to create 2 different facial types (brachyfacial and dolichofacial). Each face form was further modified into 5 different buccal corridors (2%, 10%, 15%, 22%, and 28%). The images were submitted to 3 different ethnic groups of evaluators (Chinese, Malay, Indian; 100 each), aged between 17 and 21 years. A visual analog scale (50 mm in length) was used for assessment. The scores given to each image were compared with the Kruskal-Wallis test, and pairwise comparisons were performed using the Mann-Whitney U test (α=.05). All 3 groups of evaluators could distinguish gradations of dark spaces in the buccal corridor at 2%, 10%, and 28%. Statistically significant differences in esthetic perception were observed among the 3 groups of evaluators when pairwise comparisons were performed. A 15% buccal corridor received equal esthetic scores across the 3 face types from all 3 groups of evaluators. The Indian evaluators were more critical in their evaluations than the Chinese or Malay evaluators. In pairwise comparisons, in which the normal face was compared separately with the long face and with the short face, more significant differences were found between the long and short faces and the normal face. The width of the buccal corridor space influences smile attractiveness in different facial types. 
A medium buccal corridor (15%) is the esthetic characteristic preferred by all groups of evaluators in short, normal, and long face types. Copyright © 2017 Editorial Council for the Journal of Prosthetic Dentistry. Published by Elsevier Inc. All rights reserved.
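The Mann-Whitney U test used above for pairwise group comparisons counts, over all pairs of observations drawn from the two groups, how often a value from one group exceeds a value from the other. A minimal pure-Python sketch follows; it uses a normal approximation for the p-value without tie or continuity correction, which is a simplification and not the study's statistical software. The example scores are hypothetical visual analog scale values in mm.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """U statistic for x versus y: number of pairs where x is larger, ties counted as 1/2."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def mann_whitney_p(x, y):
    """Two-sided p-value from the normal approximation (no tie or continuity correction)."""
    n1, n2 = len(x), len(y)
    u = mann_whitney_u(x, y)
    mean = n1 * n2 / 2.0
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean) / sd
    return erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


# Hypothetical VAS scores (mm) from two evaluator groups for one image
group_a = [31, 35, 28, 40, 33, 37]
group_b = [25, 22, 30, 27, 24, 29]
print(mann_whitney_u(group_a, group_b), mann_whitney_p(group_a, group_b))
```

In practice, `scipy.stats.mannwhitneyu` and `scipy.stats.kruskal` provide tested implementations, including exact p-values for small samples and handling of ties.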
Lactococcus petauri sp. nov., isolated from an abscess of a sugar glider
Goodman, Laura B.; Lawton, Marie R.; Franklin-Guild, Rebecca J.; Anderson, Renee R.; Schaan, Lynn; Thachil, Anil J.; Wiedmann, Martin; Miller, Claire B.; Alcaine, Samuel D.; Kovac, Jasna
2017-01-01
A strain of lactic acid bacteria, designated 159469T, isolated from a facial abscess in a sugar glider, was characterized genetically and phenotypically. Cells of the strain were Gram-stain-positive, coccoid and catalase-negative. Morphological, physiological and phylogenetic data indicated that the isolate belongs to the genus Lactococcus. Strain 159469T was closely related to Lactococcus garvieae ATCC 43921T, showing 95.86 and 98.08 % sequence similarity in 16S rRNA gene and rpoB gene sequences, respectively. Furthermore, a pairwise average nucleotide identity blast (ANIb) value of 93.54 % and in silico DNA–DNA hybridization value of 50.7 % were determined for the genome of strain 159469T, when compared with the genome of the type strain of Lactococcus garvieae. Based on the data presented here, the isolate represents a novel species of the genus Lactococcus, for which the name Lactococcus petauri sp. nov. is proposed. The type strain is 159469T (=LMG 30040T=DSM 104842T). PMID:28945531
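The genomic comparisons above are interpreted against species boundaries that the abstract does not state. A minimal sketch, assuming the commonly cited cut-offs of about 95-96 % ANI, 70 % (in silico) DDH and 98.7 % 16S rRNA gene identity, shows why the reported values point to a novel species. Real species descriptions also weigh phylogeny and phenotype, and the function name is hypothetical.

```python
# Commonly cited prokaryotic species boundaries (assumed; not given in the abstract)
ANI_CUTOFF = 95.0  # % ANI; often quoted as a 95-96 % range, lower end used here
DDH_CUTOFF = 70.0  # % DDH / in silico dDDH
SSU_CUTOFF = 98.7  # % 16S rRNA identity; above it, 16S alone cannot separate species


def below_species_cutoffs(ani: float, ddh: float, ssu: float) -> dict:
    """Flag which measurements fall below the assumed species boundaries."""
    return {
        "ANI": ani < ANI_CUTOFF,
        "dDDH": ddh < DDH_CUTOFF,
        "16S": ssu < SSU_CUTOFF,
    }


# Values reported for strain 159469T versus Lactococcus garvieae ATCC 43921T
flags = below_species_cutoffs(ani=93.54, ddh=50.7, ssu=95.86)
print(flags)
```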
ExoLocator--an online view into genetic makeup of vertebrate proteins.
Khoo, Aik Aun; Ogrizek-Tomas, Mario; Bulovic, Ana; Korpar, Matija; Gürler, Ece; Slijepcevic, Ivan; Šikic, Mile; Mihalek, Ivana
2014-01-01
ExoLocator (http://exolocator.eopsf.org) collects in a single place information needed for comparative analysis of protein-coding exons from vertebrate species. The main source of data--the genomic sequences, and the existing exon and homology annotation--is the ENSEMBL database of completed vertebrate genomes. To these, ExoLocator adds the search for ostensibly missing exons in orthologous protein pairs across species, using an extensive computational pipeline to narrow down the search region for the candidate exons and find a suitable template in the other species, as well as state-of-the-art implementations of pairwise alignment algorithms. The resulting complements of exons are organized in a way currently unique to ExoLocator: multiple sequence alignments, both on the nucleotide and on the peptide levels, clearly indicating the exon boundaries. The alignments can be inspected in the web-embedded viewer, downloaded or used on the spot to produce an estimate of conservation within orthologous sets, or functional divergence across paralogues.
Listeria costaricensis sp. nov.
Núñez-Montero, Kattia; Leclercq, Alexandre; Moura, Alexandra; Vales, Guillaume; Peraza, Johnny; Pizarro-Cerdá, Javier; Lecuit, Marc
2018-03-01
A bacterial strain isolated from a food processing drainage system in Costa Rica fulfilled the criteria for belonging to the genus Listeria, but could not be assigned to any of the known species. Phylogenetic analysis based on the 16S rRNA gene revealed highest sequence similarity with the type strain of Listeria floridensis (98.7 %). Phylogenetic analysis based on Listeria core genomes placed the novel taxon within the Listeria fleischmannii, L. floridensis and Listeria aquatica clade (Listeria sensu lato). Whole-genome sequence analyses based on average nucleotide blast identity (ANI<80 %) indicated that this isolate belonged to a novel species. Results of pairwise amino acid identity (AAI>70 %) and percentage of conserved proteins (POCP>68 %) with currently known Listeria species, as well as of biochemical characterization, confirmed that the strain constituted a novel species within the genus Listeria. The name Listeria costaricensis sp. nov. is proposed for the novel species, which is represented by the type strain CLIP 2016/00682T (=CIP 111400T=DSM 105474T).
Alignment of RNA molecules: Binding energy and statistical properties of random sequences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Valba, O. V., E-mail: valbaolga@gmail.com; Nechaev, S. K., E-mail: sergei.nechaev@gmail.com; Tamm, M. V., E-mail: thumm.m@gmail.com
2012-02-15
A new statistical approach to the problem of pairwise alignment of RNA sequences is proposed. The problem is analyzed for a pair of interacting polymers forming RNA-like hierarchical cloverleaf structures. An alignment is characterized by the numbers of matches, mismatches, and gaps. A weight function is assigned to each alignment; this function is interpreted as a free energy taking into account both direct monomer-monomer interactions and a combinatorial contribution due to the formation of various cloverleaf secondary structures. The binding free energy is determined for a pair of RNA molecules. Statistical properties are discussed, including fluctuations of the binding energy between a pair of RNA molecules and the loop length distribution in a complex. Based on an analysis of the free energy per nucleotide pair for complexes of random RNAs as a function of the number of nucleotide types c, a hypothesis is put forward about the exclusivity of the alphabet c = 4 used by nature.
Model for calculation of electrostatic contribution into protein stability
NASA Astrophysics Data System (ADS)
Kundrotas, Petras; Karshikoff, Andrey
2003-03-01
Existing models of the denatured state of proteins consider only one possible spatial distribution of protein charges and therefore are applicable to a limited number of cases. In this presentation a more general framework for the modeling of the denatured state is proposed. It is based on the assumption that the titratable groups of an unfolded protein can adopt a quasi-random distribution, restricted by the protein sequence. The model was tested on two proteins, barnase and the N-terminal domain of the ribosomal protein L9. The calculated free energy of denaturation, ΔG(pH), reproduces the experimental data substantially better than the commonly used null approximation (NA). It was demonstrated that the seemingly good agreement with experimental data obtained by NA originates from a compensatory effect between the pair-wise electrostatic interactions and the desolvation energy of the individual sites. It was also found that the ionization properties of denatured proteins are influenced by the protein sequence.
Su, B; Fu, Y; Wang, Y; Jin, L; Chakraborty, R
2001-06-01
The red panda (Ailurus fulgens) is one of the flagship species in worldwide conservation and is of special interest in evolutionary studies due to its taxonomic uniqueness. We sequenced a 236-bp fragment of the mitochondrial D-loop region in a sample of 53 red pandas from two populations in southwestern China. Seventeen polymorphic sites were found, together with a total of 25 haplotypes, indicating a high level of genetic diversity in the red panda. However, no obvious genetic divergence was detected between the Sichuan and Yunnan populations. The consensus phylogenetic tree of the 25 haplotypes was starlike. The pairwise mismatch distribution fitted the pattern expected for populations undergoing expansion. Furthermore, Fu's Fs test of neutrality was significant for the total population (Fs = -7.573), which also suggests a recent population expansion. Interestingly, the effective population size in the Sichuan population was both larger and more stable than that in the Yunnan population, implying a southward expansion from Sichuan to Yunnan.
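A pairwise mismatch distribution is the histogram of the number of differing sites over every pair of sampled sequences. Under the usual interpretation, a smooth unimodal shape points to population expansion, while a ragged or multimodal shape suggests a long-term stable population size. The sketch below assumes the sequences are already aligned to equal length and counts every character difference, including gaps, as a mismatch; the toy sequences are not the red panda data.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Map each number of pairwise differences to how many sequence pairs show it."""
    hist = Counter()
    for a, b in combinations(seqs, 2):
        if len(a) != len(b):
            raise ValueError("sequences must be aligned to equal length")
        hist[sum(x != y for x, y in zip(a, b))] += 1
    return dict(sorted(hist.items()))


# Hypothetical aligned D-loop fragments
print(mismatch_distribution(["AAAA", "AAAT", "AATT", "TTTT"]))
```

Comparing the observed histogram with the curve expected under a sudden-expansion model is what turns this descriptive summary into a test.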
Masuda, R; Lopez, J V; Slattery, J P; Yuhki, N; O'Brien, S J
1996-12-01
Molecular phylogeny of the cat family Felidae is derived using two mitochondrial genes, cytochrome b and 12S rRNA. Phylogenetic methods of weighted maximum parsimony and minimum evolution estimated by neighbor-joining are employed to reconstruct topologies among 20 extant felid species. Sequence analyses of 363 bp of cytochrome b and 376 bp of the 12S rRNA genes yielded average pair-wise similarity values between felids ranging from 94 to 99% and from 85 to 99%, respectively. Phylogenetic reconstruction supports more recent, intralineage associations but fails to completely resolve interlineage relationships. Both genes produce a monophyletic group of Felis species but vary in the placement of Pallas's cat. The ocelot lineage represents an early divergence within the Felidae, with strong associations between ocelot and margay, Geoffroy's cat and kodkod, and pampas cat and tigrina. Implications of the relative recency of felid evolution, presence of ancestral polymorphisms, and influence of outgroups in placement of the topological root are discussed.
Lactococcus petauri sp. nov., isolated from an abscess of a sugar glider.
Goodman, Laura B; Lawton, Marie R; Franklin-Guild, Rebecca J; Anderson, Renee R; Schaan, Lynn; Thachil, Anil J; Wiedmann, Martin; Miller, Claire B; Alcaine, Samuel D; Kovac, Jasna
2017-11-01
A strain of lactic acid bacteria, designated 159469T, isolated from a facial abscess in a sugar glider, was characterized genetically and phenotypically. Cells of the strain were Gram-stain-positive, coccoid and catalase-negative. Morphological, physiological and phylogenetic data indicated that the isolate belongs to the genus Lactococcus. Strain 159469T was closely related to Lactococcus garvieae ATCC 43921T, showing 95.86 and 98.08 % sequence similarity in 16S rRNA gene and rpoB gene sequences, respectively. Furthermore, a pairwise average nucleotide identity blast (ANIb) value of 93.54 % and in silico DNA-DNA hybridization value of 50.7 % were determined for the genome of strain 159469T, when compared with the genome of the type strain of Lactococcus garvieae. Based on the data presented here, the isolate represents a novel species of the genus Lactococcus, for which the name Lactococcus petauri sp. nov. is proposed. The type strain is 159469T (=LMG 30040T=DSM 104842T).
Smibert, O C; Aung, A K; Woolnough, E; Carter, G P; Schultz, M B; Howden, B P; Seemann, T; Spelman, D; McGloughlin, S; Peleg, A Y
2018-03-02
Few studies have used molecular epidemiological methods to study transmission links to clinical isolates in intensive care units. Ninety-four multidrug-resistant organisms (MDROs) cultured from routine specimens from intensive care unit (ICU) patients over 13 weeks were stored (11 meticillin-resistant Staphylococcus aureus (MRSA), two vancomycin-resistant enterococci and 81 Gram-negative bacteria). Medical staff personal mobile phones, departmental phones, and ICU keyboards were swabbed and cultured for MDROs; MRSA was isolated from two phones. Environmental and patient isolates of the same genus were selected for whole genome sequencing. On whole genome sequencing, the mobile phone isolates had a pairwise single nucleotide polymorphism (SNP) distance of 183. However, >15,000 core genome SNPs separated the mobile phone and clinical isolates. In a low-endemic setting, mobile phones and keyboards appear unlikely to contribute to hospital-acquired MDROs. Copyright © 2018 The Healthcare Infection Society. Published by Elsevier Ltd. All rights reserved.
Lee, J; Hymowitz, T
2001-11-01
Phylogenetic relationships among 13 genera of the subtribe Glycininae, two genera of the allied subtribe Diocleinae that were included within Glycininae by Polhill, and two genera of the subtribe Erythrininae as outgroups were inferred from chloroplast DNA rps16 intron sequence variation. Pairwise sequence divergence values ranged from identity between Teramnus mollis and T. micans and between T. flexilis and T. labialis to 7.89% between Pueraria wallichii and Pseudeminia comosa across all accessions. Phylogenies estimated using parsimony and neighbor-joining methods revealed that (1) Glycininae is monophyletic if Pachyrhizus and Calopogonium (both Diocleinae) are included within Glycininae; (2) the genus Teramnus is closely related to Glycine, and Amphicarpaea showed a sister relationship to the clade comprising Teramnus and Glycine; (3) the expanded Glycininae including two genera of Diocleinae is divided into three branches, temporarily named I (comprising the rest of the examined taxa), II (Pueraria wallichii), and III (Mastersia), but their relationships are equivocal; and (4) the genus Pueraria, regarded as a closely related genus to Glycine, is not monophyletic and should be divided into at least four genera (a hypothesis supported previously by Lackey).
Minimap2: pairwise alignment for nucleotide sequences.
Li, Heng
2018-05-10
Recent advances in sequencing technologies promise ultra-long reads of ∼100 kilobases (kb) on average, full-length mRNA or cDNA reads in high throughput, and genomic contigs over 100 megabases (Mb) in length. Existing alignment programs are unable to process such data at scale or do so inefficiently, which presses for the development of new alignment algorithms. Minimap2 is a general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It works with accurate short reads of ≥100 bp in length, ≥1 kb genomic reads at an error rate of ∼15%, full-length noisy Direct RNA or cDNA reads, and assembly contigs or closely related full chromosomes of hundreds of megabases in length. Minimap2 does split-read alignment, employs a concave gap cost for long insertions and deletions (INDELs) and introduces new heuristics to reduce spurious alignments. It is 3-4 times as fast as mainstream short-read mappers at comparable accuracy, and is ≥30 times faster than long-read genomic or cDNA mappers at higher accuracy, surpassing most aligners specialized in one type of alignment. https://github.com/lh3/minimap2. hengli@broadinstitute.org.
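The concave gap cost mentioned above can be illustrated with a two-piece affine function, cost(l) = min(q + l*e, q2 + l*e2): short gaps pay the steeper first piece and long gaps the flatter second piece, so a long INDEL is penalized far less than under a single affine cost. The values q=4, e=2, q2=24, e2=1 follow our understanding of minimap2's documented defaults (`-O 4,24 -E 2,1`) but should be treated as assumptions; the functions are an illustration, not minimap2's code.

```python
def affine_gap(length: int, q: int = 4, e: int = 2) -> int:
    """Single affine gap cost: open once, then pay e per gapped base."""
    return q + length * e if length > 0 else 0


def two_piece_affine_gap(length: int, q: int = 4, e: int = 2, q2: int = 24, e2: int = 1) -> int:
    """Two-piece affine gap cost: the cheaper of two affine pieces (assumed default values)."""
    if length <= 0:
        return 0
    return min(q + length * e, q2 + length * e2)


for length in (1, 10, 20, 100, 1000):
    print(length, affine_gap(length), two_piece_affine_gap(length))
```

Taking the minimum of two linear functions yields a concave cost. With these values the pieces cross where 4 + 2l = 24 + l, i.e. at l = 20; beyond that point each extra gapped base costs 1 instead of 2.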
Analysis of functional redundancies within the Arabidopsis TCP transcription factor family.
Danisman, Selahattin; van Dijk, Aalt D J; Bimbo, Andrea; van der Wal, Froukje; Hennig, Lars; de Folter, Stefan; Angenent, Gerco C; Immink, Richard G H
2013-12-01
Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy between its individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pair-wise protein-protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. For this, the work completed any missing gene expression and protein-protein interaction data experimentally and then performed a comprehensive prediction of potentially functionally redundant TCP pairs. Subsequently, redundant functions could be confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence in a redundant fashion with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family.
Analysis of functional redundancies within the Arabidopsis TCP transcription factor family
Danisman, Selahattin; de Folter, Stefan; Immink, Richard G. H.
2013-01-01
Analyses of the functions of TEOSINTE-LIKE1, CYCLOIDEA, and PROLIFERATING CELL FACTOR1 (TCP) transcription factors have been hampered by functional redundancy between its individual members. In general, putative functionally redundant genes are predicted based on sequence similarity and confirmed by genetic analysis. In the TCP family, however, identification is impeded by relatively low overall sequence similarity. In a search for functionally redundant TCP pairs that control Arabidopsis leaf development, this work performed an integrative bioinformatics analysis, combining protein sequence similarities, gene expression data, and results of pair-wise protein–protein interaction studies for the 24 members of the Arabidopsis TCP transcription factor family. For this, the work completed any missing gene expression and protein–protein interaction data experimentally and then performed a comprehensive prediction of potentially functionally redundant TCP pairs. Subsequently, redundant functions could be confirmed for selected predicted TCP pairs by genetic and molecular analyses. It is demonstrated that the previously uncharacterized class I TCP19 gene plays a role in the control of leaf senescence in a redundant fashion with TCP20. Altogether, this work shows the power of combining classical genetic and molecular approaches with bioinformatics predictions to unravel functional redundancies in the TCP transcription factor family. PMID:24129704
STELLAR: fast and exact local alignments
2011-01-01
Background: Large-scale comparison of genomic sequences requires reliable tools for the search of local alignments. Practical local aligners are in general fast but heuristic, and hence sometimes miss significant matches. Results: We present here the local pairwise aligner STELLAR, which has full sensitivity for ε-alignments, i.e. it is guaranteed to report all local alignments of a given minimal length and maximal error rate. The aligner is composed of two steps, filtering and verification. We apply the SWIFT algorithm for lossless filtering, and have developed a new verification strategy that we prove to be exact. Our results on simulated and real genomic data confirm and quantify the conjecture that heuristic tools like BLAST or BLAT miss a large percentage of significant local alignments. Conclusions: STELLAR is very practical and fast on very long sequences, which makes it a suitable new tool for finding local alignments between genomic sequences under the edit distance model. Binaries are freely available for Linux, Windows, and Mac OS X at http://www.seqan.de/projects/stellar. The source code is freely distributed with the SeqAn C++ library version 1.3 and later at http://www.seqan.de. PMID:22151882
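STELLAR's guarantee is phrased in terms of ε-alignments: local alignments at least a minimal length long whose edit distance is at most ε times their length. The brute-force check below verifies one candidate pair of segments under that definition, assuming "length" means the longer of the two segments (STELLAR's exact convention may differ). It does not reproduce STELLAR's SWIFT-based filtering, which is what makes exhaustive search practical on long sequences; the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost substitutions, insertions, deletions), one DP row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion from a
                           cur[j - 1] + 1,             # insertion into a
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def is_epsilon_match(a: str, b: str, min_len: int, eps: float) -> bool:
    """Hypothetical check: long enough, and edit distance at most eps * length."""
    length = max(len(a), len(b))
    return length >= min_len and edit_distance(a, b) <= eps * length


# One substitution in 10 bases: an error rate of exactly 0.1
print(is_epsilon_match("ACGTACGTAC", "ACGTTCGTAC", min_len=10, eps=0.1))
```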
Weigel, B J; Burgett, S G; Chen, V J; Skatrud, P L; Frolik, C A; Queener, S W; Ingolia, T D
1988-01-01
beta-Lactam antibiotics such as penicillins and cephalosporins are synthesized by a wide variety of microbes, including procaryotes and eucaryotes. Isopenicillin N synthetase catalyzes a key reaction in the biosynthetic pathway of penicillins and cephalosporins. The genes encoding this protein have previously been cloned from the filamentous fungi Cephalosporium acremonium and Penicillium chrysogenum and characterized. We have extended our analysis to the isopenicillin N synthetase genes from the fungus Aspergillus nidulans and the gram-positive procaryote Streptomyces lipmanii. The isopenicillin N synthetase genes from these organisms have been cloned and sequenced, and the proteins encoded by the open reading frames were expressed in Escherichia coli. Active isopenicillin N synthetase enzyme was recovered from extracts of E. coli cells containing each of the genes in expression vectors. The four isopenicillin N synthetase genes studied are closely related. Pairwise comparison of the DNA sequences showed between 62.5 and 75.7% identity; comparison of the predicted amino acid sequences showed between 53.9 and 80.6% identity. The close homology of the procaryotic and eucaryotic isopenicillin N synthetase genes suggests horizontal transfer of the genes during evolution. PMID:3045077
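Pairwise identity values such as those above depend on how gaps are counted. A minimal sketch, assuming identity is computed over alignment columns where neither sequence has a gap; other conventions divide by the full alignment length or by the shorter sequence and give different numbers. The aligned rows are hypothetical input, not data from the study.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over aligned columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("rows must come from the same alignment")
    cols = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


# Hypothetical aligned rows; the gap column is excluded, leaving 4 matches in 5 columns
print(percent_identity("ACGT-A", "ACCTTA"))
```

The same function works for nucleotide and protein alignments, which is why DNA-level and amino-acid-level identities for the same gene pair can be reported side by side.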
Bourland, William A; Wendell, Laura; Hampikian, Greg; Vďačný, Peter
2014-02-01
We describe the morphology and 18S rDNA phylogeny of Bryophryoides ocellatus n. g., n. sp., a bryophryid ciliate inhabiting in situ soil percolates from Idaho, U.S.A. The new genus is distinguished from other bryophryid genera by a combination of the following features: (1) kreyellid (irregularly meshed) silverline pattern, (2) polymorphic adoral organelles in the preoral suture, (3) absence of vestibular kineties. In phylogenetic analyses, Bryophryoides ocellatus is most closely related to Bryophrya gemmea. The 18S rDNA sequence pairwise distance of 2% between these genera, while similar to that between many colpodidan species, exceeds that between some colpodidan genera (e.g. Mykophagophrys and Pseudoplatyophrya, 1.1%), further supporting establishment of the new genus. Topology hypothesis testing strongly supports the monophyly of the Colpodida including the bryophryids. Despite weak nodal support, tests of topology constraints narrowly reject the non-monophyly of the sequenced Bryophryidae (Bryophrya+Bryophryoides+Notoxoma). Likewise, the monophyletic origin of the sequenced Bryophryidae is indicated in the phylogenetic networks though with low support. Copyright © 2013 Elsevier GmbH. All rights reserved.
rVISTA 2.0: Evolutionary Analysis of Transcription Factor Binding Sites
DOE Office of Scientific and Technical Information (OSTI.GOV)
Loots, G G; Ovcharenko, I
2004-01-28
Identifying and characterizing the patterns of DNA cis-regulatory modules represents a challenge that has the potential to reveal the regulatory language the genome uses to dictate transcriptional dynamics. Several studies have demonstrated that regulatory modules are under positive selection and therefore are often conserved between related species. Using this evolutionary principle, we have created a comparative tool, rVISTA, for analyzing the regulatory potential of noncoding sequences. The rVISTA tool combines transcription factor binding site (TFBS) predictions, sequence comparisons and cluster analysis to identify noncoding DNA regions that are highly conserved and present in a specific configuration within an alignment. Here we present the newly developed version 2.0 of the rVISTA tool, which can process alignments generated by both the zPicture and PipMaker alignment programs or use pre-computed pairwise alignments of seven vertebrate genomes available from the ECR Browser. The rVISTA web server is closely interconnected with the TRANSFAC database, allowing users either to search for matrices present in the TRANSFAC library collection or to search for user-defined consensus sequences. The rVISTA tool is publicly available at http://rvista.dcode.org/.
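The core idea of combining TFBS predictions with conservation can be illustrated under strong simplifying assumptions: a single IUPAC consensus (rather than a TRANSFAC matrix) is matched in both rows of a gapped pairwise alignment, and a site is kept only if it matches at the same alignment columns in both species. This is a hypothetical sketch, not rVISTA's implementation; it omits conservation thresholds and cluster analysis, and the function names and IUPAC subset are illustrative.

```python
import re

# Subset of IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def consensus_regex(consensus: str) -> re.Pattern:
    """Compile an IUPAC consensus into a regex; gap characters never match."""
    return re.compile("".join(IUPAC[c] for c in consensus.upper()))


def conserved_sites(aln1: str, aln2: str, consensus: str) -> list[int]:
    """Return alignment start columns where the consensus matches in both rows.

    Windows containing a gap ('-') in either row cannot match, because '-'
    is not in any of the character classes above.
    """
    if len(aln1) != len(aln2):
        raise ValueError("alignment rows must have equal length")
    pattern = consensus_regex(consensus)
    k = len(consensus)
    hits = []
    for start in range(len(aln1) - k + 1):
        w1 = aln1[start:start + k].upper()
        w2 = aln2[start:start + k].upper()
        if pattern.fullmatch(w1) and pattern.fullmatch(w2):
            hits.append(start)
    return hits
```

For instance, `conserved_sites("TTGATAAGC", "TTGATAGGC", "WGATAR")` reports a single site at column 1, matched in both species despite the A/G difference at the degenerate R position.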
Lee, Justin S; Bevins, Sarah N; Serieys, Laurel E K; Vickers, Winston; Logan, Ken A; Aldredge, Mat; Boydston, Erin E; Lyren, Lisa M; McBride, Roy; Roelke-Parker, Melody; Pecon-Slattery, Jill; Troyer, Jennifer L; Riley, Seth P; Boyce, Walter M; Crooks, Kevin R; VandeWoude, Sue
2014-07-01
Mountain lions (Puma concolor) throughout North and South America are infected with puma lentivirus clade B (PLVB). A second, highly divergent lentiviral clade, PLVA, infects mountain lions in southern California and Florida. Bobcats (Lynx rufus) in these two geographic regions are also infected with PLVA, and to date, this is the only strain of lentivirus identified in bobcats. We sequenced full-length PLV genomes in order to characterize the molecular evolution of PLV in bobcats and mountain lions. Low sequence homology (88% average pairwise identity) and frequent recombination (1 recombination breakpoint per 3 isolates analyzed) were observed in both clades. Viral proteins have markedly different patterns of evolution; sequence homology and negative selection were highest in Gag and Pol and lowest in Vif and Env. A total of 1.7% of sites across the PLV genome evolve under positive selection, indicating that host-imposed selection pressure is an important force shaping PLV evolution. PLVA strains are highly spatially structured, reflecting the population dynamics of their primary host, the bobcat. In contrast, the phylogeography of PLVB reflects the highly mobile mountain lion, with diverse PLVB isolates cocirculating in some areas and genetically related viruses being present in populations separated by thousands of kilometers. We conclude that PLVA and PLVB are two different viral species with distinct feline hosts and evolutionary histories. Importance: An understanding of viral evolution in natural host populations is a fundamental goal of virology, molecular biology, and disease ecology. Here we provide a detailed analysis of puma lentivirus (PLV) evolution in two natural carnivore hosts, the bobcat and mountain lion. Our results illustrate that PLV evolution is a dynamic process that results from high rates of viral mutation/recombination and host-imposed selection pressure. Copyright © 2014, American Society for Microbiology. All Rights Reserved.
ASPM and the Evolution of Cerebral Cortical Size in a Community of New World Monkeys
Villanea, Fernando A.; Perry, George H.; Gutiérrez-Espeleta, Gustavo A.; Dominy, Nathaniel J.
2012-01-01
The ASPM (abnormal spindle-like microcephaly associated) gene has been proposed as a major determinant of cerebral cortical size among primates, including humans. Yet the specific functions of ASPM and its connection to human intelligence remain controversial. This debate is limited in part by a taxonomic focus on Old World monkeys and apes. Here we expand the comparative context of ASPM sequence analyses with a study of New World monkeys, a radiation of primates in which enlarged brain size has evolved in parallel in spider monkeys (genus Ateles) and capuchins (genus Cebus). The primate community of Costa Rica is perhaps a model system because it allows for independent pairwise comparisons of smaller- and larger-brained species within two taxonomic families. Accordingly, we analyzed the complete sequence of exon 18 of ASPM in Ateles geoffroyi, Alouatta palliata, Cebus capucinus, and Saimiri oerstedii. As the analysis of multiple species in a genus improves phylogenetic reconstruction, we also analyzed eleven published sequences from other New World monkeys. Our exon-wide, lineage-specific analysis of eleven genera and the ratio of rates of nonsynonymous to synonymous substitutions (dN/dS) on ASPM revealed no detectable evidence for positive selection in the lineages leading to Ateles or Cebus, as indicated by dN/dS ratios of <1.0 (0.6502 and 0.4268, respectively). Our results suggest that a multitude of interacting genes have driven the evolution of larger brains among primates, with different genes involved in this process in different encephalized lineages, or at least with evidence for positive selection not readily apparent for the same genes in all lineages. The primate community of Costa Rica may serve as a model system for future studies that aim to elucidate the molecular mechanisms underlying cognitive capacity and cortical size. PMID:23028686
Selection of a DNA barcode for Nectriaceae from fungal whole-genomes.
Zeng, Zhaoqing; Zhao, Peng; Luo, Jing; Zhuang, Wenying; Yu, Zhihe
2012-01-01
A DNA barcode is a short segment of sequence that is able to distinguish species. A barcode should ideally contain enough variation to distinguish every individual species and be easy to obtain. Fungi of Nectriaceae are economically important and show high species diversity. To establish a standard DNA barcode for this group of fungi, the genomes of Neurospora crassa and 30 other filamentous fungi were compared. The expect value was used as the criterion for recognizing homologous sequences. Four candidate markers, Hsp90, AAC, CDC48, and EF3, were tested for their feasibility as barcodes in the identification of 34 well-established species belonging to 13 genera of Nectriaceae. Two hundred and fifteen sequences were analyzed. Intra- and inter-specific variation and the success rate of PCR amplification and sequencing were the main criteria for evaluating the candidate markers. Ultimately, the partial EF3 gene met the requirements for a good DNA barcode: no overlap was found between the intra- and inter-specific pairwise distances. The smallest inter-specific distance of the EF3 gene was 3.19%, while the largest intra-specific distance was 1.79%. In addition, this gene had a high PCR and sequencing success rate (96.3%). CDC48 showed sufficiently high sequence variation among species, but its PCR and sequencing success rate was only 84% using a single primer pair. Although the Hsp90 and AAC genes had higher PCR and sequencing success rates (96.3% and 97.5%, respectively), their intra- and inter-specific variation overlapped, which could lead to misidentification. Therefore, we propose the EF3 gene as a possible DNA barcode for the nectriaceous fungi.
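The "no overlap" criterion above is usually called a barcode gap: the smallest inter-specific distance must exceed the largest intra-specific one. A minimal sketch, assuming the pairwise distances are already computed as percentages; the function name is hypothetical.

```python
def barcode_gap(intra: list[float], inter: list[float]) -> float:
    """Smallest inter-specific minus largest intra-specific distance.

    A positive value means the two distributions do not overlap.
    """
    if not intra or not inter:
        raise ValueError("need at least one intra- and one inter-specific distance")
    return min(inter) - max(intra)
```

With the reported EF3 extremes (largest intra-specific distance 1.79%, smallest inter-specific distance 3.19%) the gap is 1.40 percentage points; a marker whose distributions overlap, as reported for Hsp90 and AAC, would give a negative value.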
Phylogenetic analysis of Demodex caprae based on mitochondrial 16S rDNA sequence.
Zhao, Ya-E; Hu, Li; Ma, Jun-Xian
2013-11-01
Demodex caprae infests the hair follicles and sebaceous glands of goats worldwide; it not only seriously impairs goat farming but also causes substantial economic losses. However, there are few DNA-level studies of D. caprae. To clarify the taxonomic position of D. caprae within the genus Demodex, the present study conducted a phylogenetic analysis of D. caprae based on mt16S rDNA sequence data. D. caprae adults and eggs were obtained from a skin nodule of a goat suffering from demodicidosis. The mt16S rDNA sequences of individual mites were amplified using specific primers, then cloned, sequenced, and aligned. Sequence divergence, genetic distance, and transition/transversion rate were computed, and phylogenetic trees for Demodex were reconstructed. Partial sequences of 339 bp were obtained from six D. caprae isolates, and sequence identity among the isolates was 100%. The pairwise divergences between D. caprae and Demodex canis, Demodex folliculorum, and Demodex brevis were 22.2-24.0%, 24.0-24.9%, and 22.9-23.2%, respectively. The corresponding average genetic distances were 2.840, 2.926, and 2.665, and the average transition/transversion rates were 0.70, 0.55, and 0.54, respectively. The divergences, genetic distances, and transition/transversion rates of D. caprae versus the other three species all reached the interspecies level. All five phylogenetic trees showed that D. caprae clustered first with D. brevis, and then with D. canis, D. folliculorum, and Demodex injai in sequence. In conclusion, D. caprae is an independent species, and it is closer to D. brevis than to D. canis, D. folliculorum, or D. injai.
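The divergence and transition/transversion figures above are derived from aligned sequence pairs. The sketch below counts uncorrected differences (p-distance), transitions (purine-purine or pyrimidine-pyrimidine changes) and transversions for one aligned pair. It does not reproduce the model-corrected genetic distances reported in the abstract, and the function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_summary(seq1: str, seq2: str) -> dict:
    """Compare two aligned DNA sequences, skipping gaps and ambiguous bases."""
    transitions = transversions = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gap or ambiguity-code columns
        compared += 1
        if x == y:
            continue
        pair = {x, y}
        # Transition: both bases are purines or both are pyrimidines.
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    differences = transitions + transversions
    return {
        "compared_sites": compared,
        "p_distance": differences / compared if compared else 0.0,
        "transitions": transitions,
        "transversions": transversions,
        "ts_tv": transitions / transversions if transversions else float("inf"),
    }
```

A p-distance of 0.222 would correspond to the lower end of the 22.2-24.0% divergence range reported against D. canis.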
Conservation and diversification of Msx protein in metazoan evolution.
Takahashi, Hirokazu; Kamiya, Akiko; Ishiguro, Akira; Suzuki, Atsushi C; Saitou, Naruya; Toyoda, Atsushi; Aruga, Jun
2008-01-01
Msx (/msh) family genes encode homeodomain (HD) proteins that control ontogeny in many animal species. We compared the structures of Msx genes from a wide range of Metazoa (Porifera, Cnidaria, Nematoda, Arthropoda, Tardigrada, Platyhelminthes, Mollusca, Brachiopoda, Annelida, Echiura, Echinodermata, Hemichordata, and Chordata) to gain an understanding of the role of these genes in phylogeny. Exon-intron boundary analysis suggested that the position of the intron located N-terminally to the HDs was widely conserved in all the genes examined, including those of cnidarians. Amino acid (aa) sequence comparison revealed 3 new evolutionarily conserved domains, as well as very strong conservation of the HDs. Two of the three domains were associated with Groucho-like protein binding in both a vertebrate and a cnidarian Msx homolog, suggesting that the interaction between Groucho-like proteins and Msx proteins was established in eumetazoan ancestors. Pairwise comparison among the collected HDs and their C-flanking aa sequences revealed that the degree of sequence conservation varied depending on the animal taxa from which the sequences were derived. Highly conserved Msx genes were identified in the Vertebrata, Cephalochordata, Hemichordata, Echinodermata, Mollusca, Brachiopoda, and Anthozoa. The wide distribution of the conserved sequences in the animal phylogenetic tree suggested that metazoan ancestors had already acquired a set of conserved domains of the current Msx family genes. Interestingly, although strongly conserved sequences were recovered from the Vertebrata, Cephalochordata, and Anthozoa, the sequences from the Urochordata and Hydrozoa showed weak conservation. Because the Vertebrata-Cephalochordata-Urochordata and Anthozoa-Hydrozoa represent sister groups in the Chordata and Cnidaria, respectively, Msx sequence diversification may have occurred differentially in the course of evolution. 
We speculate that selective loss of the conserved domains in Msx family proteins contributed to the diversification of animal body organization.
2009-01-01
We analyzed mtDNA control region sequences of green turtles (Chelonia mydas) from Arvoredo Island, a foraging ground in southern Brazil, and identified eight haplotypes. Of these, CM-A8 (64%) and CM-A5 (22%) were dominant, the remainder presenting low frequencies (< 5%). Haplotype (h) and nucleotide (π) diversities were 0.5570 ± 0.0697 and 0.0021 ± 0.0016, respectively. Exact tests of differentiation and AMOVA ΦST pairwise values between the study area and eight other Atlantic foraging grounds revealed significant differences in most areas, except Ubatuba and Rocas/Noronha, in Brazil (p > 0.05). Mixed Stock Analysis, incorporating eleven Atlantic and one Mediterranean rookery as possible sources of individuals, indicated Ascension and Aves islands as the main contributing stocks to the Arvoredo aggregation (68.01% and 22.96%, respectively). These results demonstrate the extensive relationships between Arvoredo Island and other Atlantic foraging and breeding areas. Such an understanding provides a framework for establishing adequate management and conservation strategies for this endangered species. PMID:21637527
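Haplotype diversity is commonly computed with Nei's unbiased estimator, h = n/(n-1) (1 - Σ p_i²), where n is the number of sequences and p_i the frequency of haplotype i. The abstract does not state which estimator was used, so the sketch below is an assumption, and the haplotype labels in the usage example are illustrative only.

```python
from collections import Counter


def haplotype_diversity(haplotypes: list[str]) -> float:
    """Nei's unbiased haplotype (gene) diversity from a list of haplotype labels."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies.
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)
```

For example, four sequences split evenly between two haplotypes give h = 4/3 × 0.5 ≈ 0.667, while a sample containing a single haplotype gives 0.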
Ekström, Johanna; Forslund, Ola; Dillner, Joakim
2010-02-20
To expand our knowledge of the genomic diversity of human papillomaviruses (HPVs), we searched for new HPVs in squamous cell carcinomas of the skin (SCC) and in lesions that are typically HPV-associated but seemed HPV-negative. We describe the characterization of three novel HPV types. HPV109 was isolated from an SCC, HPV112 from a condyloma and HPV114 from a low-grade cervical lesion. Pairwise alignment of the L1 sequences classified HPV114 to genus alpha species 3, whereas HPV112 defined a new species in the genus gamma. The classification of HPV109 remained uncertain because its L1 gene showed low and roughly equal similarity (between 60% and 65%) to different genera. Type-specific real-time PCRs of cervical samples (n=2856, the majority from women with low-grade atypical cytology) and various cutaneous samples (n=538) found HPV114 in 1.7% (48/2856) of the genital samples, whereas both HPV109 and HPV112 were rare viruses, found at high viral loads only in their index samples. Copyright 2009 Elsevier Inc. All rights reserved.
Laukkanen, Anne-Maria; Pulakka, Hannu; Alku, Paavo; Vilkman, Erkki; Hertegård, Stellan; Lindestad, Per-Ake; Larsson, Hans; Granqvist, Svante
2007-01-01
Vocal exercises that increase the vocal tract impedance are widely used in voice training and therapy. The present study applies a versatile methodology to investigate phonation during varying artificial extension of the vocal tract. Two males and one female phonated into a hard-walled plastic tube (diameter 2 cm), whose physical length was changed pairwise in random order between 30 cm, 60 cm and 100 cm. High-speed image sequences (1900 frames/s) of the vocal folds were obtained via a rigid endoscope. Acoustic and electroglottographic (EGG) signals were recorded. Oral pressure during shuttering of the tube was used to estimate subglottic pressure (Psub). The only consistent trend was that, with the two longer tubes compared to the shortest one, fundamental frequency was lower, the open time of the glottis was shorter, and Psub was higher. The results may partly reflect the increased vocal tract impedance itself and partly the increased vocal effort needed to compensate for it. For the other parameters, tube length-related changes differed between individuals, suggesting a complex coupling between the supraglottic space and the glottis.
Gene Flow Patterns of the Mayfly Fallceon quilleri in San Diego County, California.
NASA Astrophysics Data System (ADS)
Zickovich, J.; Bohonak, A. J.
2005-05-01
Management decisions and conservation strategies for freshwater invertebrates critically depend on an understanding of gene flow and genetic structure. We collected the mayfly Fallceon quilleri (Ephemeroptera: Baetidae) from 15 streams across three geographically distinct watersheds in San Diego County, California (San Dieguito, Santa Margarita, and Tijuana) and from one site in the Anza-Borrego Desert. We sequenced a 667-base-pair region of the mitochondrial COI gene to assess genetic structure and gene flow. We found eight haplotypes across all populations. San Dieguito and Santa Margarita each contained six haplotypes; Tijuana and Anza-Borrego each contained four. The expected heterozygosity for San Dieguito, Santa Margarita, Tijuana, and Anza-Borrego was 0.81, 0.83, 0.75, and 1.0, respectively. A hierarchical AMOVA indicated restricted gene flow, and pairwise comparisons indicated that the Tijuana watershed differs significantly from San Dieguito and Anza-Borrego. A haplotype cladogram revealed two internal ancestral haplotypes and six derived tip haplotypes that are unique to particular watersheds. These results suggest that Tijuana (the southernmost and most impacted watershed) is more genetically distinct and isolated than the other watersheds sampled.
A Procedure for Testing the Difference between Effect Sizes.
ERIC Educational Resources Information Center
Lambert, Richard G.; Flowers, Claudia
A special case of the homogeneity of effect size test, as applied to pairwise comparisons of standardized mean differences, was evaluated. Procedures for comparing pairs of pretest to posttest effect sizes, as well as pairs of treatment versus control group effect sizes, were examined. Monte Carlo simulation was used to generate Type I error rates…
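For context, the standardized mean difference (Cohen's d with a pooled standard deviation) and a simple z statistic for comparing two independent effect sizes can be sketched as below, using the large-sample variance approximation of Hedges and Olkin. This is an assumption for illustration only: the abstract evaluates a special case of the homogeneity test, including pretest-posttest comparisons where effect sizes are not independent, which this sketch does not handle.

```python
import math


def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def d_variance(d, n1, n2):
    """Large-sample variance of d (Hedges and Olkin approximation)."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def compare_independent_d(d1, v1, d2, v2):
    """z statistic for H0: delta1 == delta2, assuming independent effect sizes."""
    return (d1 - d2) / math.sqrt(v1 + v2)
```

With means 10 and 8, both standard deviations 2 and n = 10 per group, d = 1.0 and its approximate variance is 0.2 + 0.025 = 0.225.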
Iskow, Rebecca C.; Austermann, Christian; Scharer, Christopher D.; Raj, Towfique; Boss, Jeremy M.; Sunyaev, Shamil; Price, Alkes; Stranger, Barbara; Simon, Viviana; Lee, Charles
2013-01-01
Ancient population structure shaping contemporary genetic variation has been recently appreciated and has important implications regarding our understanding of the structure of modern human genomes. We identified a ∼36-kb DNA segment in the human genome that displays an ancient substructure. The variation at this locus exists primarily as two highly divergent haplogroups. One of these haplogroups (the NE1 haplogroup) aligns with the Neandertal haplotype and contains a 4.6-kb deletion polymorphism in perfect linkage disequilibrium with 12 single nucleotide polymorphisms (SNPs) across diverse populations. The other haplogroup, which does not contain the 4.6-kb deletion, aligns with the chimpanzee haplotype and is likely ancestral. Africans have higher overall pairwise differences with the Neandertal haplotype than Eurasians do for this NE1 locus (p < 10^-15). Moreover, the nucleotide diversity at this locus is higher in Eurasians than in Africans. These results mimic signatures of recent Neandertal admixture contributing to this locus. However, an in-depth assessment of the variation in this region across multiple populations reveals that African NE1 haplotypes, albeit rare, harbor more sequence variation than NE1 haplotypes found in Europeans, indicating an ancient African origin of this haplogroup and refuting recent Neandertal admixture. Population genetic analyses of the SNPs within each of these haplogroups, along with genome-wide comparisons, revealed significant FST (p = 0.00003) and positive Tajima's D (p = 0.00285) statistics, pointing to non-neutral evolution of this locus. The NE1 locus harbors no protein-coding genes, but contains transcribed sequences as well as sequences with putative regulatory function based on bioinformatic predictions and in vitro experiments. We postulate that the variation observed at this locus predates Human–Neandertal divergence and is evolving under balancing selection, especially among European populations. PMID:23593015
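The comparisons above rest on two quantities: the mean number of differences between a population's haplotypes and a reference haplotype (here, the Neandertal one), and nucleotide diversity π, the average proportion of differing sites over all pairs of sequences. A minimal sketch on equal-length, gap-free haplotypes; the function names and toy sequences are hypothetical, and real analyses must also handle missing data and phasing.

```python
from itertools import combinations


def count_differences(a: str, b: str) -> int:
    """Number of differing sites between two equal-length haplotypes."""
    if len(a) != len(b):
        raise ValueError("haplotypes must have equal length")
    return sum(1 for x, y in zip(a, b) if x != y)


def nucleotide_diversity(seqs: list[str]) -> float:
    """Average per-site differences over all distinct pairs of sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    pairs = list(combinations(seqs, 2))
    length = len(seqs[0])
    total = sum(count_differences(a, b) for a, b in pairs)
    return total / (len(pairs) * length)


def mean_differences_to_reference(reference: str, seqs: list[str]) -> float:
    """Mean number of differences between each sequence and a reference haplotype."""
    return sum(count_differences(reference, s) for s in seqs) / len(seqs)
```

For the toy sample ["AAAA", "AAAT", "AATT"], the three pairs differ at 1, 2 and 1 sites, so π = 4 / (3 × 4) ≈ 0.333.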
Zhu, Fuxiang; Sun, Ying; Wang, Yan; Pan, Hongyu; Wang, Fengting; Zhang, Xianghui; Zhang, Yanhua; Liu, Jinliang
2016-06-04
Turnip mosaic virus (TuMV) infects crop plants of the family Brassicaceae worldwide. TuMV isolates have been clustered into five lineages: basal-B, basal-BR, Asian-BR, world-B and OMs. Here, we determined the complete genome sequences of three TuMV basal-BR isolates infecting radish in Shandong and Jilin Provinces, China. Each genome is 9833 nucleotides long, excluding the 3'-terminal poly(A) tail. The genomes contain two open reading frames (ORFs): the large one encodes a polyprotein of 3164 amino acids and the small overlapping ORF encodes a PIPO protein of 61 amino acids. The encoded proteins contain the conserved motifs typical of members of the genus Potyvirus. In pairwise comparisons with 30 other TuMV genome sequences, these three isolates shared the highest identities with isolates from Eurasian countries (Germany, Italy, Turkey and China). Recombination analysis showed that the three isolates in this study had no "clear" recombination. Analysis of conserved amino acid changes between groups showed that at three codon sites (852, 1006, 1548) the TuMV outgroup (OGp) and the OMs group had the same codons, whereas the other TuMV groups (basal-B, basal-BR, Asian-BR, world-B) differed. This pattern suggests that the codon in the OMs progenitor did not change, whereas in the other TuMV groups the progenitor sequence changed at divergence. Genetic diversity analyses indicate that the PIPO gene was under the highest selection pressure and that the selection pressure on P3N-PIPO and P3 was almost the same. This suggests that most of the selection pressure on P3 was probably imposed through P3N-PIPO.
Yang, Cheng-Hong; Chuang, Li-Yeh; Shih, Tsung-Mu; Chang, Hsueh-Wei
2010-12-17
SAGE (serial analysis of gene expression) is a powerful method of analyzing gene expression for the entire transcriptome. There are currently many well-developed SAGE tools. However, cross-comparison of different tissues is seldom addressed, which limits the identification of common and tissue-specific tumor markers. To improve SAGE mining methods, we propose a novel function for cross-tissue comparison of SAGE data that combines mathematical set theory and logic with a unique "multi-pool method", which analyzes multiple pools of pairwise case-control comparisons individually. When all tissues are set to "inclusion", the common SAGE tag sequences are mined. When one tissue type is set to "inclusion" and the other tissue types are not, the tissue-specific SAGE tag sequences of the selected tissue are generated. The results are displayed as tags-per-million (TPM) and fold values, and are also visualized on four kinds of scales as a color gradient. In the fold visualization display, the top-scoring SAGE tag sequences are provided, along with cluster plots. A user-defined matrix file supports cross-tissue comparison using libraries selected from publicly available databases or user-defined libraries. The hSAGEing tool is the first to combine user-friendly cross-tissue analysis with an interface for comparing SAGE libraries. Several up- or down-regulated genes, including tissue-specific or common tumor markers and suppressors, were identified computationally. The tool is useful and convenient for in silico cancer transcriptomic studies and is freely available at http://bio.kuas.edu.tw/hSAGEing.
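The "inclusion" logic described above maps onto ordinary set operations. As a hypothetical sketch (not the hSAGEing implementation), suppose each tissue contributes the set of tags that passed its own case-control comparison: tags included in every tissue form the intersection, and tags specific to one tissue are its set minus the union of all the others. Tags-per-million normalization is shown as well; all names and data are illustrative.

```python
def common_tags(pools: dict[str, set[str]]) -> set[str]:
    """Tags present in every tissue pool (all tissues set to 'inclusion')."""
    return set.intersection(*pools.values())


def tissue_specific_tags(pools: dict[str, set[str]], tissue: str) -> set[str]:
    """Tags in the selected tissue pool that appear in no other pool."""
    others = set().union(*(tags for name, tags in pools.items() if name != tissue))
    return pools[tissue] - others


def tags_per_million(count: int, library_size: int) -> float:
    """Normalize a raw tag count to tags per million (TPM)."""
    return count * 1_000_000 / library_size
```

With pools {"brain": {"A", "B", "C"}, "liver": {"B", "C", "D"}, "lung": {"C", "E"}}, the common tag is "C" and the brain-specific tag is "A".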
Figuerola, Eva L M; Guerrero, Leandro D; Türkowsky, Dominique; Wall, Luis G; Erijman, Leonardo
2015-03-01
The goal of this study was to investigate the spatial turnover of soil bacterial communities in response to environmental changes introduced by the practices of soybean monoculture or crop rotations, relative to grassland soils. Amplicon sequencing of the 16S rRNA gene was used to analyse bacterial diversity in producer fields over three successive cropping cycles spanning one and a half years, at a regional scale across the Argentinean Pampas. Unlike local diversity, which was not significantly affected by land use type, agricultural management had a strong influence on β-diversity patterns. Distributions of pairwise distances between all soil samples under soybean monoculture showed significantly lower β-diversity and a narrower breadth than distributions of pairwise distances between soils managed with crop rotation. Interestingly, soils under good agricultural practices had a degree of β-diversity similar to that of natural grasslands. The higher phylogenetic relatedness of bacterial communities in soils under monoculture across the region was likely determined by the observed loss of endemic species, and mostly affected phyla with low regional diversity, such as Acidobacteria, Verrucomicrobia and the candidate phyla SPAM and WS3. These results suggest that the implementation of good agricultural practices, including crop rotation, may be critical for the long-term conservation of soil biodiversity. © 2014 Society for Applied Microbiology and John Wiley & Sons Ltd.
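The β-diversity comparison rests on the distribution of pairwise distances between samples. The abstract does not name the distance it used (and it discusses phylogenetic relatedness, which suggests a phylogeny-aware metric), so the sketch below substitutes the simpler Bray-Curtis dissimilarity on taxon count vectors purely for illustration. The function names and data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def bray_curtis(u: list[float], v: list[float]) -> float:
    """Bray-Curtis dissimilarity between two taxon-abundance vectors."""
    if len(u) != len(v):
        raise ValueError("abundance vectors must have equal length")
    total = sum(a + b for a, b in zip(u, v))
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def pairwise_distances(samples: list[list[float]]) -> list[float]:
    """Distances for every unordered pair of samples."""
    return [bray_curtis(samples[i], samples[j])
            for i, j in combinations(range(len(samples)), 2)]


def summarize(distances: list[float]) -> dict:
    """Location (mean) and breadth (min, max) of a distance distribution."""
    return {"mean": mean(distances), "min": min(distances), "max": max(distances)}
```

Comparing the `summarize` output for samples grouped by land use (for example, monoculture versus rotation) mirrors the contrast in location and breadth described above.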
Yang, Xian-Ming; Sun, Jing-Tao; Xue, Xiao-Feng; Zhu, Wen-Chao; Hong, Xiao-Yue
2012-01-01
The western flower thrips, Frankliniella occidentalis (Pergande), is an invasive species and the most economically important pest within the insect order Thysanoptera. For a better understanding of the genetic makeup and migration patterns of F. occidentalis throughout the world, we characterized 18 novel polymorphic EST-derived microsatellites. The mutational mechanism of these EST-SSRs was also investigated to facilitate the selection of appropriate combinations of markers for population genetic studies. Genetic diversity of these novel markers was assessed in 96 individuals from three populations in China (Harbin, Dali, and Guiyang). The results showed that all these 18 loci were highly polymorphic; the number of alleles ranged from 2 to 15, with an average of 5.50 alleles per locus. The observed (HO) and expected (HE) heterozygosities ranged from 0.072 to 0.707 and 0.089 to 0.851, respectively. Furthermore, only two locus/population combinations (WFT144 in Dali and WFT50 in Guiyang) significantly deviated from Hardy–Weinberg equilibrium (HWE). Pairwise FST analysis showed a low but significant differentiation (0.026 < FST < 0.032) among all three pairwise population comparisons. Sequence analysis of alleles per locus revealed a complex mutational pattern of these EST-SSRs. Thus, these EST-SSRs are useful markers but greater attention should be paid to the mutational characteristics of these microsatellites when they are used in population genetic studies. PMID:22489130
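Pairwise FST between two populations can be illustrated with Nei's formulation, FST = (HT - HS) / HT, where HS is the mean expected heterozygosity within populations and HT is the expected heterozygosity of the pooled allele frequencies. This is an assumption for illustration: the abstract does not say which estimator was used, and sample-size-corrected estimators such as Weir and Cockerham's θ are typical in practice. The function names and allele labels are hypothetical.

```python
def expected_heterozygosity(freqs: dict[str, float]) -> float:
    """1 minus the sum of squared allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst_nei(freqs1: dict[str, float], freqs2: dict[str, float]) -> float:
    """Nei-style FST for one locus and two populations with equal weights."""
    alleles = set(freqs1) | set(freqs2)
    hs = (expected_heterozygosity(freqs1) + expected_heterozygosity(freqs2)) / 2
    pooled = {a: (freqs1.get(a, 0.0) + freqs2.get(a, 0.0)) / 2 for a in alleles}
    ht = expected_heterozygosity(pooled)
    return 0.0 if ht == 0 else (ht - hs) / ht
```

Populations fixed for different alleles give FST = 1, and identical allele frequencies give 0; low values like those reported above (0.026-0.032) indicate weak differentiation.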
MultiSETTER: web server for multiple RNA structure comparison.
Čech, Petr; Hoksza, David; Svozil, Daniel
2015-08-12
Understanding the architecture and function of RNA molecules requires methods for comparing and analyzing their tertiary and quaternary structures. While structural superposition of short RNAs is achievable in reasonable time, large structures represent a much bigger challenge. Therefore, we have developed a fast and accurate algorithm for RNA pairwise structure superposition called SETTER and implemented it in the SETTER web server. However, although biological relationships can be inferred from a pairwise structure alignment, key features preserved by evolution can be identified only from a multiple structure alignment. Thus, we extended the SETTER algorithm to the alignment of multiple RNA structures and developed the MultiSETTER algorithm. In this paper, we present the updated version of the SETTER web server that implements a user-friendly interface to the MultiSETTER algorithm. The server accepts RNA structures either as a list of PDB IDs or as user-defined PDB files. After the superposition is computed, structures are visualized in 3D and several reports and statistics are generated. To the best of our knowledge, the MultiSETTER web server is the first publicly available tool for multiple RNA structure alignment. The MultiSETTER server offers visual inspection of an alignment in 3D space, which may reveal structural and functional relationships not captured by other multiple alignment methods based on either sequence or secondary structure motifs.
Aguirre, I M; Fuentes, R; Celedón, M O
2014-01-31
Llamas and alpacas are domesticated South American camelids (SACs) that are important to ancestral populations in the Altiplano region and to various communities worldwide where they have been introduced. These ungulates have been shown to be susceptible to several livestock viral pathogens, such as members of the Pestivirus genus, in particular Bovine Viral Diarrhea Virus (BVDV), but few data are available on Pestivirus infections in SACs. In this study we aimed to detect and identify Pestivirus genotypes and subgroups infecting SACs in both wild and confined environments. Samples were collected from 136 llamas and 30 alpacas from different areas of the Chilean Altiplano (wild animals), and from 22 llamas and 26 alpacas diagnosed as Pestivirus-positive in the Metropolitana region of Chile (confined animals). Seroneutralization tests showed titers lower than 2 in all 166 samples from the Chilean Altiplano. These samples were also negative for BVDV isolation, indicating that these animals had not been exposed to Pestivirus. After reactivation of positive samples from the Metropolitana region, the 5' non-coding region (5'NCR) and the E2 glycoprotein gene were amplified from the Pestivirus genome by RT-PCR. Viral sequences were compared pairwise and phylogenetic trees were constructed. The 5'NCR analysis showed that all 12 sequenced isolates belonged to BVDV-1. Of particular interest, isolates from eight llamas and two alpacas were BVDV-1j, and those from two alpacas were BVDV-1b. In agreement with these results, E2 phylogenetic analysis produced a similar grouping, indicating that all 16 isolates belong to BVDV-1. However, because fewer E2 sequences are available, the E2 analysis resolves fewer subgroups than the 5'NCR analysis. Based on the E2 sequences, the 5'NCR BVDV-1j group, consisting of all the llamas and 3 alpacas, is completely included in the E2 BVDV-1e group.
Because the 5'NCR segment is universally available, we propose classifying these Chilean llama and alpaca Pestivirus isolates as BVDV-1j and BVDV-1b, respectively. This is thus the first report of BVDV-1j in SACs. In addition, these results indicate that Pestivirus infection in llamas and alpacas is associated with the bovine population, as the genotypes and subgroups are the same as those affecting Chilean livestock. Copyright © 2013 Elsevier B.V. All rights reserved.
Galpert, Deborah; Fernández, Alberto; Herrera, Francisco; Antunes, Agostinho; Molina-Ruiz, Reinaldo; Agüero-Chapin, Guillermin
2018-05-03
The development of new ortholog detection algorithms and the improvement of existing ones are of major importance in functional genomics. We have previously introduced a successful supervised pairwise ortholog classification approach implemented in a big data platform that considered several pairwise protein features and the low ortholog pair ratios found between two annotated proteomes (Galpert, D et al., BioMed Research International, 2015). The supervised models were built and tested using a Saccharomycete yeast benchmark dataset proposed by Salichos and Rokas (2011). Although several pairwise protein features were combined in that supervised big data approach, all of them were, to some extent, alignment-based, and the proposed algorithms were evaluated on a single test set. Here, we aim to evaluate the impact of alignment-free features on the performance of supervised models implemented in the Spark big data platform for pairwise ortholog detection in several related yeast proteomes. Spark Random Forest and Decision Trees with oversampling and undersampling techniques, built either with alignment-based similarity measures alone or with these measures combined with several alignment-free pairwise protein features, showed the highest classification performance for ortholog detection in three yeast proteome pairs. Although these supervised approaches outperformed traditional methods, there were no significant differences between the exclusive use of alignment-based similarity measures and their combination with alignment-free features, even within the twilight zone of the studied proteomes. Only when alignment-based and alignment-free features were combined in Spark Decision Trees with imbalance management could a higher success rate (98.71%) within the twilight zone be achieved, for a yeast proteome pair that underwent a whole genome duplication.
The feature selection study showed that alignment-based features were top-ranked for the best classifiers while the runners-up were alignment-free features related to amino acid composition. The incorporation of alignment-free features in supervised big data models did not significantly improve ortholog detection in yeast proteomes regarding the classification qualities achieved with just alignment-based similarity measures. However, the similarity of their classification performance to that of traditional ortholog detection methods encourages the evaluation of other alignment-free protein pair descriptors in future research.
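Amino acid composition is the group of alignment-free features that ranked just below the alignment-based ones, and it is simple to compute for a protein pair. The sketch below is illustrative only. The composition-difference pair feature and all function names are hypothetical, and it does not reproduce the authors' Spark pipeline. `undersample` shows plain random undersampling of the majority (non-ortholog) class as one example of the imbalance techniques mentioned.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq: str) -> list[float]:
    """Relative frequency of each standard amino acid; non-standard symbols are ignored."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def composition_pair_features(seq_a: str, seq_b: str) -> list[float]:
    """Hypothetical alignment-free pair feature: |composition difference| per residue type."""
    return [abs(x - y) for x, y in zip(aa_composition(seq_a), aa_composition(seq_b))]


def undersample(pairs, labels, seed=0):
    """Randomly drop majority-class (label 0, non-ortholog) pairs to match the positives."""
    rng = random.Random(seed)
    pos = [p for p, y in zip(pairs, labels) if y == 1]
    neg = [p for p, y in zip(pairs, labels) if y == 0]
    kept_neg = rng.sample(neg, min(len(neg), len(pos)))
    return pos + kept_neg, [1] * len(pos) + [0] * len(kept_neg)
```

Two sequences that are permutations of each other have identical composition and therefore a zero feature vector. This is exactly the information an alignment-free descriptor of this kind cannot distinguish.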
Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model.
Wang, Sheng; Sun, Siqi; Li, Zhen; Zhang, Renyu; Xu, Jinbo
2017-01-01
Protein contacts contain key information for understanding protein structure and function; thus, contact prediction from sequence is an important problem. Recently, exciting progress has been made on this problem, but the predicted contacts for proteins without many sequence homologs are still of low quality and not very useful for de novo structure prediction. This paper presents a new deep learning method that predicts contacts by integrating both evolutionary coupling (EC) and sequence conservation information through an ultra-deep neural network formed by two deep residual neural networks. The first residual network conducts a series of 1-dimensional convolutional transformations of sequential features; the second residual network conducts a series of 2-dimensional convolutional transformations of pairwise information, including the output of the first residual network, EC information and pairwise potential. By using very deep residual networks, we can accurately model contact occurrence patterns and complex sequence-structure relationships and thus obtain higher-quality contact prediction regardless of how many sequence homologs are available for the proteins in question. Our method greatly outperforms existing methods and leads to much more accurate contact-assisted folding. Tested on 105 CASP11 targets, 76 past CAMEO hard targets, and 398 membrane proteins, the average top L long-range prediction accuracy obtained by our method, one representative EC method CCMpred and the CASP11 winner MetaPSICOV is 0.47, 0.21 and 0.30, respectively; the average top L/10 long-range accuracy of our method, CCMpred and MetaPSICOV is 0.77, 0.47 and 0.59, respectively. Ab initio folding using our predicted contacts as restraints but without any force fields can yield correct folds (i.e., TMscore>0.6) for 203 of the 579 test proteins, while folding with MetaPSICOV- and CCMpred-predicted contacts does so for only 79 and 62 of them, respectively.
Our contact-assisted models also have much better quality than template-based models, especially for membrane proteins. The 3D models built from our contact prediction have TMscore>0.5 for 208 of the 398 membrane proteins, while those from homology modeling have TMscore>0.5 for only 10 of them. Further, even though it was trained mostly on soluble proteins, our deep learning method works very well on membrane proteins. In the recent blind CAMEO benchmark, our fully-automated web server implementing this method successfully folded 6 targets with a new fold and only 0.3L-2.3L effective sequence homologs, including one β protein of 182 residues, one α+β protein of 125 residues, one α protein of 140 residues, one α protein of 217 residues, one α/β protein of 260 residues and one α protein of 462 residues. Our method also achieved the highest F1 score on free-modeling targets in the latest CASP (Critical Assessment of Structure Prediction), although it was not yet fully implemented at that time. http://raptorx.uchicago.edu/ContactMap/ PMID:28056090
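The "top L" and "top L/10" long-range accuracies quoted above are precision scores over the highest-ranked predictions, where L is the sequence length. The sketch below shows one common way to compute such a score. It assumes the usual CASP convention that "long-range" means a sequence separation of at least 24 residues. The function name and the handling of short prediction lists are assumptions; this is not the authors' evaluation code.

```python
def top_long_range_precision(pred, native, length, divisor=1, min_sep=24):
    """Fraction of true contacts among the top L/divisor predicted long-range pairs.

    pred:   dict mapping residue-index pairs (i, j), i < j, to predicted contact probability
    native: set of (i, j) pairs that are contacts in the reference structure
    length: sequence length L (divisor=1 gives "top L", divisor=10 gives "top L/10")
    """
    k = max(1, length // divisor)
    # Keep only long-range pairs (sequence separation >= min_sep), ranked by probability.
    ranked = sorted(
        ((prob, pair) for pair, prob in pred.items() if pair[1] - pair[0] >= min_sep),
        reverse=True,
    )
    top = [pair for _, pair in ranked[:k]]
    if not top:
        return 0.0
    # Convention assumed here: divide by the number of predictions actually available.
    return sum(pair in native for pair in top) / len(top)
```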
Building dynamic population graph for accurate correspondence detection.
Du, Shaoyi; Guo, Yanrong; Sanroma, Gerard; Ni, Dong; Wu, Guorong; Shen, Dinggang
2015-12-01
In medical imaging studies, there is growing interest in discovering the intrinsic anatomical differences across individual subjects in a dataset, such as hand images for skeletal bone age estimation. Pair-wise matching is often used to detect correspondences between each individual subject and a pre-selected model image with manually-placed landmarks. However, the large anatomical variability across individual subjects can easily compromise this pair-wise matching step. In this paper, we present a new framework to simultaneously detect correspondences among a population of individual subjects, by propagating all manually-placed landmarks from a small set of model images through a dynamically constructed image graph. Specifically, we first establish graph links between models and individual subjects according to pair-wise shape similarity (the forward step). Next, we detect correspondences for the individual subjects with direct links to any of the model images, using a new multi-model correspondence detection approach based on our recently published sparse point matching method. To correct inaccurate correspondences, we further apply an error detection mechanism that automatically detects wrong correspondences and then updates the image graph accordingly (the backward step). After that, all subject images with detected correspondences are added to the set of model images, and the above two steps of graph expansion and error correction are repeated until accurate correspondences for all subject images are established. Evaluations on real hand X-ray images demonstrate that our proposed method using dynamic graph construction achieves much higher accuracy and robustness than state-of-the-art pair-wise correspondence detection methods, as well as a similar method that uses a static population graph. Copyright © 2015 Elsevier B.V. All rights reserved.
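The expansion loop can be sketched as a round-based propagation over a similarity graph. This is a simplified sketch under stated assumptions. `similarity` and `threshold` are hypothetical stand-ins for the paper's shape-similarity links. Landmarks are taken from the most similar linked image rather than from the multi-model sparse point matching, and the backward error-detection step is omitted.

```python
def propagate_landmarks(models, subjects, similarity, threshold):
    """Grow the model set round by round through a shape-similarity graph.

    models:     initial model images with manual landmarks
    subjects:   unlabeled subject images
    similarity: callable(a, b) -> float, higher means more similar shape (assumed)
    Returns (source, rounds, unreached): the image each subject took its landmarks
    from, the round in which it was linked, and the subjects never linked.
    """
    labeled = list(models)
    remaining = list(subjects)
    source, rounds = {}, {}
    current_round = 0
    while remaining:
        current_round += 1
        # Forward step: link each remaining subject to sufficiently similar labeled images.
        newly = {}
        for s in remaining:
            linked = [m for m in labeled if similarity(s, m) >= threshold]
            if linked:
                # Placeholder for correspondence detection: copy from the most similar link.
                newly[s] = max(linked, key=lambda m: similarity(s, m))
        if not newly:
            break  # no remaining subject is close enough to any labeled image
        for s, m in newly.items():
            source[s], rounds[s] = m, current_round
        # (The paper's backward error-detection step would go here; it is omitted.)
        labeled.extend(newly)
        remaining = [s for s in remaining if s not in newly]
    return source, rounds, remaining
```

Consider one-dimensional "shapes" with `similarity = lambda a, b: -abs(a - b)`. A subject too far from the initial model can still be reached in a later round through an intermediate subject, which is the advantage of a dynamic graph over a static one.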
NASA Astrophysics Data System (ADS)
Chella, Federico; Pizzella, Vittorio; Zappasodi, Filippo; Nolte, Guido; Marzetti, Laura
2016-05-01
Brain cognitive functions arise through the coordinated activity of several brain regions, which form complex dynamical systems operating at multiple frequencies. These systems often consist of interacting subsystems, whose characterization is important for a complete understanding of brain interaction processes. To address this issue, we present a technique, bispectral pairwise interacting source analysis (biPISA), for analyzing systems of cross-frequency interacting brain sources when multichannel electroencephalographic (EEG) or magnetoencephalographic (MEG) data are available. Specifically, biPISA makes it possible to identify one or many subsystems of cross-frequency interacting sources by decomposing the antisymmetric components of the cross-bispectra between EEG or MEG signals, based on the assumption that interactions are pairwise. Thanks to the properties of the antisymmetric components of the cross-bispectra, biPISA is also robust to spurious interactions arising from mixing artifacts, i.e., volume conduction or field spread, which always affect EEG or MEG functional connectivity estimates. This method extends pairwise interacting source analysis (PISA), which was originally introduced for investigating interactions at the same frequency, to the study of cross-frequency interactions. The effectiveness of this approach is demonstrated in simulations for up to three interacting source pairs and for real MEG recordings of spontaneous brain activity. Simulations show that the performance of biPISA in estimating the phase difference between the interacting sources is affected by increasing noise levels rather than by the number of interacting subsystems. The analysis of real MEG data reveals an interaction between two pairs of sources of central mu and beta rhythms, localized near the left and right central sulci.
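The abstract does not give biPISA's equations. The sketch below only illustrates why antisymmetric components of cross-bispectra resist mixing artifacts. It assumes the common definition B_ijk(f1, f2) = E[X_i(f1) X_j(f2) X_k*(f1 + f2)], estimated by averaging over segments, and antisymmetrizes over the first two channel indices. It is not the biPISA decomposition itself, and the segment length and frequency bins are arbitrary choices.

```python
import cmath
import random


def dft_bin(segment, k):
    """Single discrete Fourier coefficient X(k) of a real-valued segment."""
    n = len(segment)
    return sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(segment))


def cross_bispectrum(xi, xj, xk, k1, k2, seg_len):
    """Segment-averaged estimate of B_ijk(k1, k2) = E[X_i(k1) X_j(k2) conj(X_k(k1 + k2))]."""
    n_seg = len(xi) // seg_len
    total = 0j
    for s in range(n_seg):
        win = slice(s * seg_len, (s + 1) * seg_len)
        total += (dft_bin(xi[win], k1) * dft_bin(xj[win], k2)
                  * dft_bin(xk[win], k1 + k2).conjugate())
    return total / n_seg


def antisymmetric_part(channels, i, j, k, k1, k2, seg_len=16):
    """B_ijk - B_jik: cancels (up to rounding) for an instantaneously mixed single source."""
    return (cross_bispectrum(channels[i], channels[j], channels[k], k1, k2, seg_len)
            - cross_bispectrum(channels[j], channels[i], channels[k], k1, k2, seg_len))


# Usage: one skewed source "leaks" into three sensors with different gains (volume conduction).
rng = random.Random(0)
source = [rng.gauss(0, 1) ** 2 for _ in range(128)]
gains = [1.0, 0.6, -0.3]
sensors = [[g * v for v in source] for g in gains]
full = cross_bispectrum(sensors[0], sensors[1], sensors[2], 2, 3, 16)  # generally nonzero
anti = antisymmetric_part(sensors, 0, 1, 2, 2, 3)                      # ~0: pure mixing
```

With instantaneous mixing, each sensor spectrum is a scaled copy of the source spectrum. Swapping the first two indices therefore leaves the product unchanged, whereas a genuine interaction between distinct sources need not be symmetric.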
Generalized priority-queue network dynamics: Impact of team and hierarchy
NASA Astrophysics Data System (ADS)
Cho, Won-Kuk; Min, Byungjoon; Goh, K.-I.; Kim, I.-M.
2010-06-01
We study the effect of team and hierarchy on the waiting-time dynamics of priority-queue networks. To this end, we introduce generalized priority-queue network models incorporating interaction rules based on team execution and on hierarchy in decision making, respectively. Numerical simulations show that the waiting-time distribution exhibits a power law for long waiting times in both cases, yet with exponents that depend on the team size and on the position of queue nodes in the hierarchy, respectively. In many cases, the observed power-law behaviors correspond to those of single or pairwise-interacting queue dynamics, suggesting that pairwise interaction may be a major determinant of the dynamics of priority-queue networks. We also find that the reciprocity of influence is a relevant factor for priority-queue network dynamics.
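The single-queue dynamics referred to above can be simulated in a few lines. This sketch assumes the standard priority-queue model with a fixed number of tasks: the highest-priority task is executed with probability p, otherwise a random one. It does not implement the team or hierarchy rules of the generalized network models, and the parameter values are illustrative.

```python
import random
from collections import Counter


def priority_queue_waits(steps, queue_len=2, p=0.999, seed=1):
    """Single-queue priority model: waiting times of executed tasks.

    Each step, with probability p the highest-priority task is executed, otherwise a
    randomly chosen one; the executed task is replaced by a new task with a uniform
    random priority, and its waiting time is the number of steps it spent queued.
    """
    rng = random.Random(seed)
    queue = [(rng.random(), 0) for _ in range(queue_len)]  # (priority, arrival step)
    waits = []
    for t in range(1, steps + 1):
        if rng.random() < p:
            idx = max(range(queue_len), key=lambda n: queue[n][0])
        else:
            idx = rng.randrange(queue_len)
        waits.append(t - queue[idx][1])
        queue[idx] = (rng.random(), t)
    return waits


waits = priority_queue_waits(100_000)
histogram = Counter(waits)  # for p close to 1, a heavy tail of long waits is expected
```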
Speranskaya, Anna S; Krinitsina, Anastasia A; Kudryavtseva, Anna V; Poltronieri, Palmiro; Santino, Angelo; Oparina, Nina Y; Dmitriev, Alexey A; Belenikin, Maxim S; Guseva, Marina A; Shevelev, Alexei B
2012-08-01
The group of Kunitz-type protease inhibitors (KPI) from potato is encoded by a polymorphic family of multiple allelic and non-allelic genes. Previous explanations of KPI variability were based on the hypothesis that random mutagenesis is the key factor in KPI polymorphism. KPI-A genes from the genomes of Solanum tuberosum cv. Istrinskii and the wild species Solanum palustre were amplified by PCR and subsequently cloned into plasmids. True KPI sequences were derived from a comparison of the cloned copies. "Hot spots" of recombination in KPI genes were independently identified with DnaSP 4.0 and TOPALi v2.5 software. The KPI-A sequence from potato cv. Istrinskii was found to be 100% identical to the gene from Solanum nigrum, illustrating a high degree of similarity of KPI genes in the genus Solanum. Pairwise comparison of KPI A and B genes unambiguously showed a non-uniform extent of polymorphism at different nt positions. Moreover, substitutions did not occur randomly along the strand. Taken together, these facts contradict the traditional hypothesis of random mutagenesis as the principal source of KPI gene polymorphism. The experimentally found mosaic structure of KPI genes in both plants studied is consistent with the hypothesis of recombination of ancestral genes. The same mechanism was proposed earlier for other resistance-conferring genes in the nightshade family (Solanaceae). Based on the data obtained, we searched for potential motifs of site-specific binding with plant DNA recombinases. During this work, we analyzed the sequencing data reported by the Potato Genome Sequencing Consortium (PGSC, 2011) and found considerable inconsistencies in those data concerning the number, location, and orientation of KPI genes of groups A and B. The key role of recombination, rather than random point mutagenesis, in KPI polymorphism was demonstrated for the first time. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
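A pairwise comparison showing position-dependent polymorphism can be pictured as a sliding-window divergence profile. Under uniformly random point mutation, windows should show broadly similar divergence, whereas sharp contrasts between windows point to a non-random, possibly mosaic structure. This is a sketch only. DnaSP and TOPALi use their own recombination statistics, and the window and step sizes here are hypothetical.

```python
def window_divergence(seq_a, seq_b, window=50, step=10):
    """Proportion of differing nucleotides in sliding windows along two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped.
    Returns a list of (window start, p-distance) tuples.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, len(seq_a) - window + 1, step):
        cols = [(x, y) for x, y in zip(seq_a[start:start + window], seq_b[start:start + window])
                if x != "-" and y != "-"]
        diffs = sum(x != y for x, y in cols)
        profile.append((start, diffs / len(cols) if cols else float("nan")))
    return profile
```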
2013-01-01
Background Rapid and reliable identification of quarantine pests is essential for plant inspection services to prevent the introduction of invasive species. For insects, this may be a serious problem when dealing with morphologically similar cryptic species complexes and early developmental stages that lack distinctive characters useful for taxonomic identification. DNA-based barcoding could solve many of these problems. The standard barcode fragment, an approximately 650-bp sequence from the 5′ end of the mitochondrial cytochrome oxidase I (COI) gene, enables differentiation of a very wide range of arthropods. However, problems remain in some taxa, such as Tephritidae, where recent genetic differentiation among some of the described species hinders accurate molecular discrimination. Results In order to explore the full species discrimination potential of COI, we sequenced the barcoding region of the COI gene of a range of economically important Tephritid species and complemented these data with all GenBank and BOLD entries for the systematic group available as of January 2012. We explored the limits of species delimitation of this barcode fragment among 193 putative Tephritid species and established operational taxonomic units (OTUs) between which discrimination is reliably possible. Furthermore, to enable future development of rapid diagnostic assays based on this sequence information, we characterized all single nucleotide polymorphisms (SNPs) and established “near-minimal” sets of SNPs that differentiate among all included OTUs with at least three and four SNPs, respectively. Conclusions We found that although several species cannot be differentiated based on the genetic diversity observed in COI and hence form composite OTUs, 85% of all OTUs correspond to described species.
Because our SNP panels are developed based on all currently available sequence information and rely on a minimal pairwise difference of three SNPs, they are highly reliable and hence represent an important resource for developing taxon-specific diagnostic assays. For selected cases, possible explanations that may cause composite OTUs are discussed. PMID:23718854
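A panel in which every pair of OTUs differs at three or more SNPs can be assembled greedily. The sketch below shows one heuristic way to obtain such a "near-minimal" panel: greedy covering, which is not guaranteed to be minimal. The authors' actual selection procedure is not described in the abstract, and the function names are hypothetical.

```python
from itertools import combinations


def min_pairwise_difference(seqs, positions):
    """Smallest number of differing sites (among `positions`) over all pairs of sequences."""
    return min(sum(a[p] != b[p] for p in positions)
               for a, b in combinations(seqs.values(), 2))


def greedy_snp_panel(seqs, min_diff=3):
    """Greedy panel of sites separating every pair of OTUs by >= min_diff sites.

    seqs: dict mapping OTU name -> aligned sequence (all the same length).
    Returns sorted site indices, or None if some pair cannot reach min_diff.
    """
    length = len(next(iter(seqs.values())))
    deficit = {pair: min_diff for pair in combinations(seqs, 2)}  # sites still needed per pair
    candidates, panel = set(range(length)), []
    while any(d > 0 for d in deficit.values()):
        # Pick the site that separates the most still-unresolved pairs.
        best, best_gain = None, 0
        for pos in sorted(candidates):
            gain = sum(1 for (a, b), d in deficit.items()
                       if d > 0 and seqs[a][pos] != seqs[b][pos])
            if gain > best_gain:
                best, best_gain = pos, gain
        if best is None:
            return None  # remaining sites cannot separate some pair any further
        panel.append(best)
        candidates.remove(best)
        for a, b in deficit:
            if seqs[a][best] != seqs[b][best]:
                deficit[(a, b)] -= 1
    return sorted(panel)
```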
Shen, Xing-Xing; Salichos, Leonidas; Rokas, Antonis
2016-09-02
Molecular phylogenetic inference is inherently dependent on choices in both methodology and data. Many insightful studies have shown how choices in methodology, such as the model of sequence evolution or optimality criterion used, can strongly influence inference. In contrast, much less is known about the impact of choices in the properties of the data, typically genes, on phylogenetic inference. We investigated the relationships of 52 gene properties (24 sequence-based, 19 function-based, and 9 tree-based) with each other and with three measures of phylogenetic signal in two assembled data sets of 2,832 yeast and 2,002 mammalian genes. We found that most gene properties, such as evolutionary rate (measured through the percent average of pairwise identity across taxa) and total tree length, were highly correlated with each other. Similarly, several gene properties, such as gene alignment length, Guanine-Cytosine content, and the proportion of tree distance on internal branches divided by relative composition variability (treeness/RCV), were strongly correlated with phylogenetic signal. Analysis of partial correlations between gene properties and phylogenetic signal, in which gene evolutionary rate and alignment length were simultaneously controlled, showed similar patterns of correlations, albeit weaker in strength. Examination of the relative importance of each gene property on phylogenetic signal identified gene alignment length, along with the number of parsimony-informative sites and variable sites, as the most important predictors. Interestingly, the subsets of gene properties that optimally predicted phylogenetic signal differed considerably across our three phylogenetic measures and two data sets; however, gene alignment length and RCV were consistently included as predictors of all three phylogenetic measures in both yeasts and mammals.
These results suggest that a handful of sequence-based gene properties are reliable predictors of phylogenetic signal and could be useful in guiding the choice of phylogenetic markers. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
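Evolutionary rate is measured here as the percent average pairwise identity across taxa, so lower identity implies faster evolution. The sketch below is minimal and assumes that columns containing a gap are ignored pair by pair. The study's exact gap handling is not stated in the abstract.

```python
from itertools import combinations


def mean_pairwise_identity(alignment):
    """Average percent identity over all sequence pairs in an alignment.

    Columns where either sequence of a pair has a gap ('-') are ignored for that pair.
    """
    identities = []
    for a, b in combinations(alignment, 2):
        cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        if cols:
            identities.append(100.0 * sum(x == y for x, y in cols) / len(cols))
    if not identities:
        raise ValueError("need at least two overlapping sequences")
    return sum(identities) / len(identities)
```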
Ito, Mika; Kuroda, Moegi; Masuda, Tsuneyuki; Akagami, Masataka; Haga, Kei; Tsuchiaka, Shinobu; Kishimoto, Mai; Naoi, Yuki; Sano, Kaori; Omatsu, Tsutomu; Katayama, Yukie; Oba, Mami; Aoki, Hiroshi; Ichimaru, Toru; Mukono, Itsuro; Ouchi, Yoshinao; Yamasato, Hiroshi; Shirai, Junsuke; Katayama, Kazuhiko; Mizutani, Tetsuya; Nagai, Makoto
2017-06-01
Porcine astroviruses (PoAstVs) are ubiquitous enteric viruses of pigs that are distributed in several countries throughout the world. Because PoAstVs are detected in apparently healthy pigs, the clinical significance of infection is unknown. However, AstVs have recently been associated with severe neurological disorders in animals, including humans, and zoonotic potential has been suggested. To date, little is known about the epidemiology of PoAstVs among the pig population in Japan. In this report, we present an analysis of nearly complete genomes of 36 PoAstVs detected by a metagenomics approach in the feces of Japanese pigs. Based on phylogenetic analysis and pairwise sequence comparison, 10, 5, 15, and 6 sequences were classified as PoAstV2, PoAstV3, PoAstV4, and PoAstV5, respectively. Co-infection with two or three strains was found in individual fecal samples from eight pigs. The phylogenetic trees of ORF1a, ORF1b, and ORF2 of PoAstV2 and PoAstV4 showed differences in their topologies. The PoAstV3 and PoAstV5 strains shared high sequence identities within each genotype in all ORFs; however, one PoAstV3 strain and one PoAstV5 strain showed considerable sequence divergence from the other PoAstV3 and PoAstV5 strains, respectively, in ORF2. Recombination analysis using whole genomes revealed evidence of multiple possible intra-genotype recombination events in PoAstV2 and PoAstV4, suggesting that recombination might have contributed to the genetic diversity and played an important role in the evolution of Japanese PoAstVs. Copyright © 2017 Elsevier B.V. All rights reserved.
Sechovcová, Hana; Killer, Jiri; Pechar, Radko; Geigerová, Martina; Švejstil, Roman; Salmonová, Hana; Mekadim, Chahrazed; Rada, Vojtěch; Vlková, Eva; Kofroňová, Olga; Benada, Oldřich
2017-08-01
A slightly irregular, short rod-shaped bacterial strain, MOZIV/2T, showing fructose 6-phosphate phosphoketolase activity was isolated from the oral cavity of a home-bred guinea-pig. Based on comparative 16S rRNA gene sequence analyses, its closest relatives were Alloscardovia omnicolens DSM 21503T and Alloscardovia criceti DSM 17774T, with 96.0 and 95.6 % pairwise similarities, respectively. Completeness of the compared sequences was 97.3 and 96.9 %, respectively. Growth occurred only under anaerobic conditions. Activities of α- and β-gluco(galacto)sidases, which are characteristic of almost all members of the family Bifidobacteriaceae, were detected in strain MOZIV/2T. Sequencing of other molecular markers (fusA, gyrB and xfp) revealed low gene sequence similarities to A. omnicolens DSM 21503T, ranging from 72.7 to 87.5 %. Strain MOZIV/2T differed from other species within the genus Alloscardovia by the presence of C18 : 1ω9t. In addition, much higher proportions of C8 : 0, C11 : 0, C12 : 0, C14 : 1, C16 : 1 and C17 : 0 fatty acids were found in cells of strain MOZIV/2T. The peptidoglycan structure was of type A4α [l-Lys(l-Orn)-d-Asp], which is consistent with its classification within the genus Alloscardovia. The DNA G+C content (45.8 mol%) was lower than those found in other Alloscardovia species. Phylogenetic studies and evaluation of phenotypic characteristics, including the results of biochemical, physiological and chemotaxonomic analyses, confirmed the novel species status of strain MOZIV/2T, for which the name Alloscardovia venturai sp. nov. is proposed. The type strain is MOZIV/2T (=DSM 100237T=CCM 8604T=LMG 28781T).
How to Choose the Suitable Template for Homology Modelling of GPCRs: 5-HT7 Receptor as a Test Case.
Shahaf, Nir; Pappalardo, Matteo; Basile, Livia; Guccione, Salvatore; Rayan, Anwar
2016-09-01
G protein-coupled receptors (GPCRs) are a super-family of membrane proteins that attract great pharmaceutical interest due to their involvement in almost every physiological activity, including extracellular stimuli, neurotransmission, and hormone regulation. Currently, structural information on many GPCRs is obtained mainly through computer modelling in general and homology modelling in particular. Based on a quantitative analysis of eighteen antagonist-bound, resolved structures of rhodopsin family "A" receptors - also used as templates to build 153 homology models - it was concluded that a higher sequence identity between two receptors does not guarantee a lower RMSD between their structures, especially when their pair-wise sequence identity (within the trans-membrane domain and/or binding pocket) lies between 25 % and 40 %. This study suggests that all template receptors having a sequence identity ≤50 % with the query receptor should be considered. In fact, most GPCRs fall within this range of identity to the currently available resolved GPCR structures, a range in which structure and sequence lack a correlation. When testing suitability for structure-based drug design, it was found that choosing as a template the most similar resolved protein, based on sequence resemblance only, led to unsound results in many cases. Molecular docking analyses were carried out, and enrichment factors as well as attrition rates were utilized as criteria for assessing suitability for structure-based drug design. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
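Enrichment factors are commonly defined as the hit rate among the top-ranked fraction of a screened library divided by the hit rate of the whole library, so a value of 1 means no better than random. The sketch below is minimal and uses that common definition. It assumes that lower docking scores are better; the study's exact cutoffs are not given in the abstract.

```python
def enrichment_factor(scores, is_active, top_fraction=0.01):
    """EF = (actives in the top-ranked subset / subset size) / (all actives / library size).

    Assumes lower scores are better, as with many docking scoring functions.
    """
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    n_top = max(1, round(len(ranked) * top_fraction))
    hits_top = sum(active for _, active in ranked[:n_top])
    total_actives = sum(is_active)
    if total_actives == 0:
        raise ValueError("library contains no actives")
    return (hits_top / n_top) / (total_actives / len(ranked))
```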
Population genetic implications from sequence variation in four Y chromosome genes.
Shen, P; Wang, F; Underhill, P A; Franco, C; Yang, W H; Roxas, A; Sung, R; Lin, A A; Hyman, R W; Vollrath, D; Davis, R W; Cavalli-Sforza, L L; Oefner, P J
2000-06-20
Some insight into human evolution has been gained from the sequencing of four Y chromosome genes. Primary genomic sequencing determined gene SMCY to be composed of 27 exons that comprise 4,620 bp of coding sequence. The unfinished sequencing of the 5' portion of gene UTY1 was completed by primer walking, and a total of 20 exons were found. By using denaturing HPLC, these two genes, as well as DBY and DFFRY, were screened for polymorphic sites in 53-72 representatives of the five continents. A total of 98 variants were found, yielding nucleotide diversity estimates of 2.45 x 10(-5), 5.07 x 10(-5), and 8.54 x 10(-5) for the coding regions of SMCY, DFFRY, and UTY1, respectively, with no variant having been observed in DBY. In agreement with most autosomal genes, diversity estimates for the noncoding regions were about 2- to 3-fold higher and ranged from 9.16 x 10(-5) to 14.2 x 10(-5) for the four genes. Analysis of the frequencies of derived alleles for all four genes showed that they more closely fit the expectation of a Luria-Delbrück distribution than a distribution expected under a constant population size model, providing evidence for exponential population growth. Pairwise nucleotide mismatch distributions date the occurrence of population expansion to approximately 28,000 years ago. This estimate is in accord with the spread of Aurignacian technology and the disappearance of the Neanderthals.
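Nucleotide diversity and the pairwise mismatch distribution are both built from the number of differences between every pair of sequences. The sketch below is minimal and assumes equal-length aligned sequences with no gaps. Converting a mismatch distribution into a date also requires a mutation rate and a generation time, which are not modelled here.

```python
from collections import Counter
from itertools import combinations


def pairwise_differences(seqs):
    """Number of differing sites for every pair of equal-length aligned sequences."""
    return [sum(a != b for a, b in zip(s1, s2)) for s1, s2 in combinations(seqs, 2)]


def nucleotide_diversity(seqs):
    """Per-site nucleotide diversity (pi): mean pairwise differences divided by length."""
    diffs = pairwise_differences(seqs)
    return sum(diffs) / (len(diffs) * len(seqs[0]))


def mismatch_distribution(seqs):
    """counts[k] = number of sequence pairs differing at exactly k sites."""
    counts = Counter(pairwise_differences(seqs))
    return [counts[k] for k in range(max(counts) + 1)]
```

A smooth, unimodal mismatch distribution is commonly interpreted as a signature of past population expansion. A ragged, multimodal one is more typical of a population of constant size.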
Complete Chloroplast Genome of the Wollemi Pine (Wollemia nobilis): Structure and Evolution
Yap, Jia-Yee S.; Rohner, Thore; Greenfield, Abigail; Van Der Merwe, Marlien; McPherson, Hannah; Glenn, Wendy; Kornfeld, Geoff; Marendy, Elessa; Pan, Annie Y. H.; Wilkins, Marc R.; Rossetto, Maurizio; Delaney, Sven K.
2015-01-01
The Wollemi pine (Wollemia nobilis) is a rare Southern conifer with striking morphological similarity to fossil pines. A small population of W. nobilis was discovered in 1994 in a remote canyon system in the Wollemi National Park (near Sydney, Australia). This population contains fewer than 100 individuals and is critically endangered. Previous genetic studies of the Wollemi pine have investigated its evolutionary relationship with other pines in the family Araucariaceae, and have suggested that the Wollemi pine genome contains little or no variation. However, these studies were performed prior to the widespread use of genome sequencing, and their conclusions were based on a limited fraction of the Wollemi pine genome. In this study, we address this problem by determining the entire sequence of the W. nobilis chloroplast genome. A detailed analysis of the structure of the genome is presented, and the evolution of the genome is inferred by comparison with the chloroplast sequences of other members of the Araucariaceae and the related family Podocarpaceae. Pairwise alignments of whole genome sequences, and the presence of unique pseudogenes, gene duplications and insertions in W. nobilis and Araucariaceae, indicate that the W. nobilis chloroplast genome is most similar to that of its sister taxon Agathis. However, the W. nobilis genome contains an unusually high number of repetitive sequences, and these could be used in future studies to investigate and conserve any remnant genetic diversity in the Wollemi pine. PMID:26061691
Xia, Xia-Yu; Ge, Meng; Hsi, Jenny H; He, Xiang; Ruan, Yu-Hua; Wang, Zhi-Xin; Shao, Yi-Ming; Pan, Xian-Ming
2014-01-01
Accurate estimates of HIV-1 incidence are essential for monitoring epidemic trends and evaluating intervention efforts. However, the long asymptomatic stage of HIV-1 infection makes it difficult to effectively distinguish incident infections from chronic ones. Current incidence assays based on either serology or viral sequence diversity still lack accuracy. In the present work, a sequence clustering based diversity (SCBD) assay was devised by exploiting the fact that viral sequences derived from each transmitted/founder (T/F) strain tend to cluster together at an early stage, and that only the intra-cluster diversity is correlated with the time since HIV-1 infection. Dot-matrix pairwise alignment was used to eliminate the disproportionate impact of insertions/deletions (indels) and recombination events, and the proportion of clusterable sequences (Pc) was used as an index to identify late chronic infections with reduced viral genetic diversity. Tested on a dataset containing 398 incident and 163 chronic infection cases collected from the Los Alamos HIV database (last modified 2/8/2012), our SCBD method achieved 99.5% sensitivity and 98.8% specificity, with an overall accuracy of 99.3%. Further analysis and evaluation also suggested that its performance was not affected by host factors such as the viral subtypes and transmission routes. The SCBD method demonstrated the potential of sequencing-based techniques to become useful for identifying incident infections. Its use may be most advantageous for settings with low to moderate incidence relative to available resources. The online service is available at http://www.bioinfo.tsinghua.edu.cn:8080/SCBD/index.jsp.
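The two quantities the assay relies on can be sketched as follows: the mean diversity within clusters of closely related sequences, and the proportion of clusterable sequences. This is a simplified sketch under explicit assumptions. Sequences are taken as already aligned to equal length, and a plain p-distance replaces the paper's dot-matrix alignment. The clustering threshold is hypothetical, and "clusterable" is interpreted as belonging to a cluster with at least two members.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two equal-length aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_sequences(seqs, threshold):
    """Single-linkage clusters: sequences joined whenever their p-distance <= threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if p_distance(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def intra_cluster_diversity(seqs, threshold=0.02):
    """Mean within-cluster p-distance and the proportion of clusterable sequences (Pc)."""
    clusters = [c for c in cluster_sequences(seqs, threshold) if len(c) > 1]
    dists = [p_distance(seqs[i], seqs[j]) for c in clusters for i, j in combinations(c, 2)]
    diversity = sum(dists) / len(dists) if dists else 0.0
    pc = sum(len(c) for c in clusters) / len(seqs)
    return diversity, pc
```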
The Gap Procedure: for the identification of phylogenetic clusters in HIV-1 sequence data.
Vrbik, Irene; Stephens, David A; Roger, Michel; Brenner, Bluma G
2015-11-04
In the context of infectious disease, sequence clustering can be used to provide important insights into the dynamics of transmission. Cluster analysis is usually performed using a phylogenetic approach whereby clusters are assigned on the basis of sufficiently small genetic distances and high bootstrap support (or posterior probabilities). The computational burden involved in this phylogenetic threshold approach is a major drawback, especially when a large number of sequences are being considered. In addition, this method requires a skilled user to specify appropriate threshold values, which may vary widely depending on the application. This paper presents the Gap Procedure, a distance-based clustering algorithm for the classification of DNA sequences sampled from individuals infected with the human immunodeficiency virus type 1 (HIV-1). Our heuristic algorithm bypasses the need for phylogenetic reconstruction, thereby supporting the quick analysis of large genetic data sets. Moreover, this fully automated procedure relies on data-driven gaps in sorted pairwise distances to infer clusters, so no user-specified threshold values are required. The clustering results obtained by the Gap Procedure on both real and simulated data closely agree with those found using the threshold approach, while requiring only a fraction of the time to complete the analysis. Apart from the dramatic savings in computational time, the Gap Procedure is highly effective in finding distinct groups of genetically similar sequences and obviates the need for subjective user-specified values. The clusters of genetically similar sequences returned by this procedure can be used to detect patterns in HIV-1 transmission and thereby aid in the prevention, treatment and containment of the disease.
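The abstract does not spell out the Gap Procedure in detail. The sketch below only illustrates the core idea it names, a data-driven gap in sorted pairwise distances. It assumes that the single largest gap serves as the cutoff and that clusters are the connected components of sequences closer than that cutoff. The published procedure may differ, and the function names are hypothetical.

```python
from itertools import combinations


def largest_gap_threshold(distances):
    """Midpoint of the largest jump between consecutive sorted pairwise distances."""
    d = sorted(distances)
    if len(d) < 2:
        raise ValueError("need at least two pairwise distances")
    _, i = max((d[n + 1] - d[n], n) for n in range(len(d) - 1))
    return (d[i] + d[i + 1]) / 2


def gap_clusters(dist):
    """Connected components of the graph linking sequences closer than the gap threshold.

    dist: symmetric matrix (list of lists) of pairwise genetic distances.
    """
    n = len(dist)
    threshold = largest_gap_threshold([dist[i][j] for i, j in combinations(range(n), 2)])
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            u = stack.pop()
            component.append(u)
            for v in range(n):
                if v not in seen and dist[u][v] < threshold:
                    seen.add(v)
                    stack.append(v)
        clusters.append(sorted(component))
    return threshold, clusters
```

No user-supplied cutoff appears anywhere. The threshold is read from the data, which is the property the abstract emphasizes.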
Yokoyama, Eiji; Hirai, Shinichiro; Ishige, Taichiro; Murakami, Satoshi
2018-01-02
Seventeen clusters of Shiga toxin-producing Escherichia coli O157:H7/- (O157) strains, determined by cluster analysis of pulsed-field gel electrophoresis patterns, were analyzed using whole genome sequence (WGS) data to investigate this pathogen's molecular epidemiology. The 17 clusters comprised 136 strains, including strains from nine outbreaks, each caused by a single source contaminated with the organism, as shown by epidemiological contact surveys. WGS data of these strains were used to identify single nucleotide polymorphisms (SNPs) by two methods: direct mapping of short-read data to a reference genome (mapping-derived SNPs), and selection of the SNPs shared between the mapping-derived SNPs and SNPs in assemblies of the short-read data (common SNPs). For both SNP types, SNPs detected in genes with a gap were excluded to remove ambiguous SNPs from further analysis. The effectiveness of both SNP types was investigated for several SNP sets: all the concatenated SNPs that were detected (whole SNP set); three categories based on the genes in which the SNPs were located (backbone SNP set, O-island SNP set, and mobile element SNP set); and SNPs in non-coding regions (intergenic region SNP set). When SNPs from strains isolated from the nine single-source outbreaks were analyzed using an unweighted pair group method with arithmetic mean (UPGMA) tree and a minimum spanning tree (MST), the maximum pair-wise distances of the backbone SNP set of the mapping-derived SNPs were significantly smaller than those of the whole and intergenic region SNP sets on both UPGMAs and MSTs. This significant difference was also observed when the backbone SNP set of the common SNPs was examined (Steel-Dwass test, P≤0.01). When the maximum pair-wise distances were compared between the mapping-derived and common SNPs, significant differences were observed for the whole, mobile element, and intergenic region SNP sets (Wilcoxon signed rank test, P≤0.01).
When all the strains included in one complex on an MST or one cluster on a UPGMA were designated as the same genotype, the values of the Hunter-Gaston Discriminatory Power Index for the backbone SNP set of the mapping-derived and common SNPs were higher than those of the other SNP sets. In contrast, the mobile element SNP set could not robustly subdivide the lineage I strains among the tested O157 strains with either the mapping-derived or the common SNPs. These results suggest that the backbone SNP set is the most effective for analyzing WGS data of O157 and for appropriately understanding its molecular epidemiology. Copyright © 2017 Elsevier B.V. All rights reserved.
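The Hunter-Gaston Discriminatory Power Index is the probability that two strains drawn at random without replacement are assigned different genotypes: D = 1 - Σ n_j(n_j - 1) / (N(N - 1)), where n_j is the number of strains of genotype j and N is the total number of strains. A minimal sketch, with hypothetical genotype labels standing in for MST complexes or UPGMA clusters:

```python
from collections import Counter


def hunter_gaston_index(genotypes: list[str]) -> float:
    """D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)) over genotype counts n_j."""
    n = len(genotypes)
    if n < 2:
        raise ValueError("at least two strains are required")
    counts = Counter(genotypes)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1 - same_type_pairs / (n * (n - 1))


# Hypothetical assignment: one label per MST complex or UPGMA cluster
print(round(hunter_gaston_index(["A", "A", "B", "C"]), 3))  # 0.833
```

A value of 1 means every strain has its own genotype; 0 means all strains share one, so a higher index indicates a SNP set that separates strains more finely.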
How Good Are Statistical Models at Approximating Complex Fitness Landscapes?
du Plessis, Louis; Leventhal, Gabriel E.; Bonhoeffer, Sebastian
2016-01-01
Fitness landscapes determine the course of adaptation by constraining and shaping evolutionary trajectories. Knowledge of the structure of a fitness landscape can thus predict evolutionary outcomes. Empirical fitness landscapes, however, have so far offered only limited insight into real-world questions, as the high dimensionality of sequence spaces makes it impossible to exhaustively measure the fitness of all variants of biologically meaningful sequences. We must therefore resort to statistical descriptions of fitness landscapes that are based on a sparse sample of fitness measurements. It remains unclear, however, how much data are required for such statistical descriptions to be useful. Here, we assess the ability of regression models accounting for single and pairwise mutations to correctly approximate a complex quasi-empirical fitness landscape. We compare approximations based on various sampling regimes of an RNA landscape and find that the sampling regime strongly influences the quality of the regression. On the one hand, it is generally impossible to generate sufficient samples to achieve a good approximation of the complete fitness landscape; on the other hand, systematic sampling schemes can only provide a good description of the immediate neighborhood of a sequence of interest. Nevertheless, we obtain a remarkably good and unbiased fit to the local landscape when using sequences from a population that has evolved under strong selection. Thus, current statistical methods can provide a good approximation to the landscape of naturally evolving populations. PMID:27189564
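A regression model with single- and pairwise-mutation terms can be illustrated with ordinary least squares. This sketch is a simplification under stated assumptions: each site is encoded as 0 (reference) or 1 (mutated) rather than as the four RNA nucleotides, the fitness values are synthetic, and the function names are hypothetical.

```python
from itertools import combinations, product

import numpy as np


def design_matrix(genotypes: np.ndarray) -> np.ndarray:
    """Intercept, one column per site (single-mutation effects) and one column
    per site pair (pairwise interaction effects) for 0/1-encoded genotypes."""
    n_seq, n_sites = genotypes.shape
    columns = [np.ones(n_seq)]
    columns += [genotypes[:, i] for i in range(n_sites)]
    columns += [genotypes[:, i] * genotypes[:, j]
                for i, j in combinations(range(n_sites), 2)]
    return np.column_stack(columns).astype(float)


def fit_pairwise_model(genotypes: np.ndarray, fitness: np.ndarray) -> np.ndarray:
    """Least-squares estimates of the intercept, single and pairwise coefficients."""
    coef, *_ = np.linalg.lstsq(design_matrix(genotypes), fitness, rcond=None)
    return coef


def predict(coef: np.ndarray, genotypes: np.ndarray) -> np.ndarray:
    """Predicted fitness for each genotype row."""
    return design_matrix(genotypes) @ coef


# Synthetic 3-site landscape: additive effects plus one pairwise (epistatic) term
genotypes = np.array(list(product([0, 1], repeat=3)))
fitness = (1.0 + 0.5 * genotypes[:, 0] - 0.2 * genotypes[:, 1]
           + 0.3 * genotypes[:, 0] * genotypes[:, 2])
coef = fit_pairwise_model(genotypes, fitness)
```

With all 2^3 genotypes sampled, the fit recovers the generating coefficients exactly. For realistic sequence lengths only a sparse subset of genotypes can be measured, and which rows enter `genotypes` then determines how well the model generalizes.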
DNA-based watermarks using the DNA-Crypt algorithm.
Heider, Dominik; Barnekow, Angelika
2007-05-29
The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program, leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in the case of DNA-Crypt and the least significant bit in the case of image steganography. DNA-Crypt can be combined with binary encryption algorithms such as AES, RSA or Blowfish, and it can correct mutations in the target DNA with several mutation correction codes, such as the Hamming code or the WDH code. Mutations, which occur infrequently, may destroy the encrypted information; however, an integrated fuzzy controller applies a set of heuristics over three input dimensions and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using Ypt7 in Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that, like all steganographic algorithms, DNA-Crypt produces few mismatches between the sequences.
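The least-significant-base idea can be illustrated with codons whose third base does not affect the encoded amino acid. The sketch below is a hypothetical scheme, not DNA-Crypt's actual encoding: it writes two bits into the third base of each fourfold-degenerate codon (the Ala GCN, Gly GGN, Val GTN, Thr ACN, Pro CCN, Leu CTN, Ser TCN and Arg CGN families), so the translated protein is unchanged. The function names and the bit-to-base mapping are assumptions.

```python
# Fourfold-degenerate codon families: the first two bases fix the amino acid,
# so the third ("least significant") base can be any of A, C, G, T.
FOURFOLD_PREFIXES = {"GC", "GG", "GT", "AC", "CC", "CT", "TC", "CG"}
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}  # assumed mapping
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def embed(cds: str, bits: str) -> str:
    """Hide an even-length bit string in the third bases of fourfold-degenerate codons."""
    if len(cds) % 3 or len(bits) % 2:
        raise ValueError("need whole codons and an even number of bits")
    pending = [bits[i:i + 2] for i in range(0, len(bits), 2)]
    codons = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        if pending and codon[:2] in FOURFOLD_PREFIXES:
            codon = codon[:2] + BASE_FOR_BITS[pending.pop(0)]
        codons.append(codon)
    if pending:
        raise ValueError("not enough fourfold-degenerate codons for the message")
    return "".join(codons)


def extract(cds: str, n_bits: int) -> str:
    """Read the first n_bits back from the third bases of fourfold-degenerate codons."""
    bits = ""
    for i in range(0, len(cds), 3):
        if len(bits) >= n_bits:
            break
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES:
            bits += BITS_FOR_BASE[codon[2]]
    return bits[:n_bits]


print(embed("ATGGCTGGAAAATTGTAA", "1001"))  # ATGGCGGGCAAATTGTAA
```

Only the GCT (Ala) and GGA (Gly) codons change, and both still encode the same amino acids; ATG, AAA, TTG and the TAA stop codon are left untouched because their third base is not freely interchangeable.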
DNA-based watermarks using the DNA-Crypt algorithm
Heider, Dominik; Barnekow, Angelika
2007-01-01
Background The aim of this paper is to demonstrate the application of watermarks based on DNA sequences to identify the unauthorized use of genetically modified organisms (GMOs) protected by patents. Predicted mutations in the genome can be corrected by the DNA-Crypt program, leaving the encrypted information intact. Existing DNA cryptographic and steganographic algorithms use synthetic DNA sequences to store binary information; however, although these sequences can be used for authentication, they may change the target DNA sequence when introduced into living organisms. Results The DNA-Crypt algorithm and image steganography are based on the same watermark-hiding principle, namely using the least significant base in the case of DNA-Crypt and the least significant bit in the case of image steganography. DNA-Crypt can be combined with binary encryption algorithms such as AES, RSA or Blowfish, and it can correct mutations in the target DNA with several mutation correction codes, such as the Hamming code or the WDH code. Mutations, which occur infrequently, may destroy the encrypted information; however, an integrated fuzzy controller applies a set of heuristics over three input dimensions and recommends whether or not to use a correction code. These three input dimensions are the length of the sequence, the individual mutation rate and the stability over time, which is represented by the number of generations. In silico experiments using Ypt7 in Saccharomyces cerevisiae show that the DNA watermarks produced by DNA-Crypt do not alter the translation of mRNA into protein. Conclusion The program is able to store watermarks in living organisms and can maintain the original information by correcting mutations itself. Pairwise or multiple sequence alignments show that, like all steganographic algorithms, DNA-Crypt produces few mismatches between the sequences. PMID:17535434
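The Hamming code mentioned above corrects any single-bit error per codeword. A minimal sketch of the textbook Hamming(7,4) code follows, as it might be applied to watermark bits before they are written into DNA; how DNA-Crypt maps codewords onto bases is not described here, so this operates on bits only, and the function names are hypothetical.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as [p1, p2, d1, p4, d2, d3, d4] (parity at positions 1, 2, 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c: list[int]) -> list[int]:
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the error, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


codeword = hamming74_encode([1, 0, 1, 1])  # [0, 1, 1, 0, 0, 1, 1]
codeword[4] ^= 1                           # simulate a single point mutation
print(hamming74_decode(codeword))          # [1, 0, 1, 1]
```

The cost is three parity bits per four data bits, which is why deciding whether a correction code is worth using, as the fuzzy controller does, depends on sequence length and expected mutation load.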
Jones, David T; Kandathil, Shaun M
2018-04-26
In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation. Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions. DeepCov is freely available at https://github.com/psipred/DeepCov. d.t.jones@ucl.ac.uk.
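DeepCov operates on amino-acid pair frequencies or covariances computed directly from the alignment. A minimal NumPy sketch of the covariance form, C_ij(a, b) = f_ij(a, b) - f_i(a) f_j(b), follows. It assumes a 21-letter alphabet (20 amino acids plus gap), treats unknown symbols as gaps and omits any sequence weighting, so it is a simplification rather than the published feature pipeline; the function names are hypothetical.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids + gap = 21 states
INDEX = {aa: k for k, aa in enumerate(ALPHABET)}


def one_hot(msa: list[str]) -> np.ndarray:
    """(n_sequences, L, 21) indicator array for an aligned MSA."""
    n, length = len(msa), len(msa[0])
    x = np.zeros((n, length, len(ALPHABET)))
    for s, seq in enumerate(msa):
        for i, aa in enumerate(seq):
            x[s, i, INDEX.get(aa, INDEX["-"])] = 1.0  # unknown symbols -> gap
    return x


def pair_covariance(msa: list[str]) -> np.ndarray:
    """C[i, j, a, b] = f_ij(a, b) - f_i(a) * f_j(b), unweighted."""
    x = one_hot(msa)
    n = x.shape[0]
    f_i = x.mean(axis=0)                         # (L, 21) single-column frequencies
    f_ij = np.einsum("nia,njb->ijab", x, x) / n  # (L, L, 21, 21) pair frequencies
    return f_ij - np.einsum("ia,jb->ijab", f_i, f_i)


features = pair_covariance(["ACDE", "ACDF", "GCDE"])
print(features.shape)  # (4, 4, 21, 21)
```

The (L, L, 21, 21) result can be reshaped to (L, L, 441), giving one feature vector per residue pair that a fully convolutional network can consume; no global inverse-covariance or pseudolikelihood step is involved.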
Montangie, Lisandro; Montani, Fernando
2016-10-01
Spike correlations among neurons are widely encountered in the brain. Although models accounting for pairwise interactions have proved able to capture some of the most important features of population activity at the level of the retina, the evidence shows that pairwise neuronal correlation analysis does not resolve cooperative population dynamics by itself. By means of a series expansion for short time scales of the mutual information conveyed by a population of neurons, the information transmission can be broken down into firing rate and correlational components. In a proposed extension of this framework, we investigate the information components considering both second- and higher-order correlations. We show that the existence of a mixed stimulus-dependent correlation term defines a new scenario for the interplay between pairwise and higher-than-pairwise interactions in noise and signal correlations that would lead either to redundancy or synergy in the information-theoretic sense.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tartakovsky, Alexandre M.; Panchenko, Alexander
2016-01-01
We present a novel formulation of the Pairwise Force Smoothed Particle Hydrodynamics Model (PF-SPH) and use it to simulate two- and three-phase flows in bounded domains. In the PF-SPH model, the Navier-Stokes equations are discretized with the Smoothed Particle Hydrodynamics (SPH) method and the Young-Laplace boundary condition at the fluid-fluid interface and the Young boundary condition at the fluid-fluid-solid interface are replaced with pairwise forces added into the Navier-Stokes equations. We derive a relationship between the parameters in the pairwise forces and the surface tension and static contact angle. Next, we demonstrate the accuracy of the model under static and dynamic conditions. Finally, to demonstrate the capabilities and robustness of the model we use it to simulate flow of three fluids in a porous material.
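The pairwise-force idea can be sketched for a handful of 2D particles. This sketch assumes a cosine-shaped interaction of the type used in earlier pairwise-force SPH work (repulsive at short range, attractive at intermediate range, zero beyond the cutoff h); the exact force shape and parameters of the formulation described here may differ, and the phase names, strengths and function names are hypothetical.

```python
import math


def pairwise_force(xi, xj, s, h):
    """Force on particle i from particle j (assumed cosine form): repulsive for
    r < h/3, attractive for h/3 < r < h, zero at and beyond h."""
    dx, dy = xi[0] - xj[0], xi[1] - xj[1]
    r = math.hypot(dx, dy)
    if r == 0.0 or r >= h:
        return 0.0, 0.0
    magnitude = s * math.cos(1.5 * math.pi * r / h)
    return magnitude * dx / r, magnitude * dy / r


def total_forces(positions, phases, strength, h):
    """Sum pairwise forces on every particle; strength[(a, b)] is keyed by the
    sorted pair of phase names, so the interaction is symmetric."""
    forces = [[0.0, 0.0] for _ in positions]
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            s = strength[tuple(sorted((phases[i], phases[j])))]
            fx, fy = pairwise_force(positions[i], positions[j], s, h)
            forces[i][0] += fx
            forces[i][1] += fy
            forces[j][0] -= fx  # equal and opposite force on j
            forces[j][1] -= fy
    return forces


positions = [(0.0, 0.0), (0.2, 0.0), (0.5, 0.3)]
phases = ["water", "water", "oil"]
strength = {("water", "water"): 1.0, ("oil", "water"): 0.4, ("oil", "oil"): 1.0}
forces = total_forces(positions, phases, strength, h=1.0)
```

The double loop is O(N²) for clarity; SPH codes normally restrict the sum to neighbors within h using cell or neighbor lists. Relating the strengths to surface tension and static contact angle, as the abstract describes, is not reproduced here.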
MR-assisted PET motion correction in simultaneous PET/MRI studies of dementia subjects.
Chen, Kevin T; Salcedo, Stephanie; Chonde, Daniel B; Izquierdo-Garcia, David; Levine, Michael A; Price, Julie C; Dickerson, Bradford C; Catana, Ciprian
2018-03-08
Subject motion in positron emission tomography (PET) studies leads to image blurring and artifacts; simultaneously acquired magnetic resonance imaging (MRI) data provide a means for motion correction (MC) in integrated PET/MRI scanners. This study aimed to assess the effect of realistic head motion and MR-based MC on static [18F]-fluorodeoxyglucose (FDG) PET images in dementia patients. This was an observational study. Thirty dementia subjects were recruited. Imaging was performed on a 3T hybrid PET/MR scanner, with EPI-based and T1-weighted sequences acquired simultaneously with the PET data. Head motion parameters estimated from high temporal resolution MR volumes were used for PET MC. The MR-based MC method was compared to PET frame-based MC methods in which motion parameters were estimated by coregistering 5-minute frames before and after accounting for the attenuation-emission mismatch. The relative changes in standardized uptake value ratios (SUVRs) between the PET volumes processed with the various MC methods, without MC, and the PET volumes with simulated motion were compared in relevant brain regions. The absolute value of the regional SUVR relative change was assessed with pairwise paired t-tests at the P = 0.05 level, comparing the values obtained through different MR-based MC processing methods as well as across different motion groups. The intraregion voxelwise variability of regional SUVRs obtained through different MR-based MC processing methods was also assessed with pairwise paired t-tests at the P = 0.05 level. MC had a greater impact on PET data quantification in subjects with larger amplitude motion (higher than 18% in the medial orbitofrontal cortex), and greater changes were generally observed for the MR-based MC method than for the frame-based methods. Furthermore, a mean relative change of ∼4% was observed after MC even at the group level, suggesting the importance of routinely applying this correction.
The intraregion voxelwise variability of regional SUVRs was also decreased using MR-based MC. All comparisons were significant at the P = 0.05 level. Incorporating temporally correlated MR data to account for intraframe motion has a positive impact on FDG PET image quality and data quantification in dementia patients. Level of Evidence: 3. Technical Efficacy: Stage 1. J. Magn. Reson. Imaging 2018. © 2018 International Society for Magnetic Resonance in Medicine.
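The regional quantities compared above can be sketched in a few lines. This sketch assumes an SUVR defined as mean uptake in a target region divided by mean uptake in a reference region (the reference region is not named in the abstract), measures relative change against a baseline such as the uncorrected volume, and computes the paired t statistic for two per-subject series. A p-value would additionally need the t distribution, for example via `scipy.stats.ttest_rel`. Function names and numbers are hypothetical.

```python
import math
import statistics


def suvr(target_mean: float, reference_mean: float) -> float:
    """Standardized uptake value ratio: target-region uptake over reference-region uptake."""
    return target_mean / reference_mean


def relative_change_pct(value: float, baseline: float) -> float:
    """Percent change of value relative to baseline (e.g. with MC vs. without MC)."""
    return 100.0 * (value - baseline) / baseline


def paired_t_statistic(a: list[float], b: list[float]) -> float:
    """t = mean(d) / (sd(d) / sqrt(n)) for per-subject differences d = a - b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long series with at least two subjects")
    diffs = [x - y for x, y in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical regional uptake values for one subject
no_mc = suvr(9.0, 7.5)     # 1.2
with_mc = suvr(9.36, 7.5)  # 1.248
print(round(relative_change_pct(with_mc, no_mc), 2))  # 4.0
```

Comparing two MC methods would apply `paired_t_statistic` to the per-subject absolute relative changes produced by each method in the same region.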
Cortical Dynamics in Presence of Assemblies of Densely Connected Weight-Hub Neurons
Setareh, Hesam; Deger, Moritz; Petersen, Carl C. H.; Gerstner, Wulfram
2017-01-01
Experimental measurements of pairwise connection probability of pyramidal neurons together with the distribution of synaptic weights have been used to construct randomly connected model networks. However, several experimental studies suggest that both wiring and synaptic weight structure between neurons show statistics that differ from random networks. Here we study a network containing a subset of neurons, which we call weight-hub neurons, that are characterized by strong inward synapses. We propose a connectivity structure for excitatory neurons that contains assemblies of densely connected weight-hub neurons, while the pairwise connection probability and synaptic weight distribution remain consistent with experimental data. Simulations of such a network with generalized integrate-and-fire neurons display regular and irregular slow oscillations akin to experimentally observed up/down state transitions in the activity of cortical neurons, with a broad distribution of pairwise spike correlations. Moreover, stimulation of a model network in the presence or absence of assembly structure exhibits responses similar to light-evoked responses of cortical layers in optogenetically modified animals. We conclude that a high connection probability into and within assemblies of excitatory weight-hub neurons, as is likely present in some but not all cortical layers, significantly changes the dynamics of a layer of cortical microcircuitry. PMID:28690508
Fiebig, Lena; Kohl, Thomas A; Popovici, Odette; Mühlenfeld, Margarita; Indra, Alexander; Homorodean, Daniela; Chiotan, Domnica; Richter, Elvira; Rüsch-Gerdes, Sabine; Schmidgruber, Beatrix; Beckert, Patrick; Hauer, Barbara; Niemann, Stefan; Allerberger, Franz; Haas, Walter
2017-01-12
Molecular surveillance of multidrug-resistant tuberculosis (MDR-TB) using 24-loci MIRU-VNTR in the European Union suggests the occurrence of international transmission. In early 2014, Austria detected a molecular MDR-TB cluster of five isolates. Links to Romania and Germany prompted the three countries to investigate possible cross-border MDR-TB transmission jointly. We searched genotyping databases, genotyped additional isolates from Romania, used whole genome sequencing (WGS) to infer putative transmission links, and investigated pairwise epidemiological links and patient mobility. Ten isolates from 10 patients shared the same 24-loci MIRU-VNTR pattern. Within this cluster, WGS defined two subgroups of four patients each. The first comprised an MDR-TB patient from Romania who had sought medical care in Austria and two patients from Austria. The second comprised patients, two of them epidemiologically linked, who lived in three different countries but had the same city of provenance in Romania. Our findings strongly suggested that the two cases in Austrian citizens resulted from a newly introduced MDR-TB strain, followed by domestic transmission. For the other cases, transmission probably occurred in the same city of provenance. To prevent further MDR-TB transmission, we need to ensure universal access to early and adequate therapy and collaborate closely in tuberculosis care beyond administrative borders. This article is copyright of The Authors, 2017.
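A common way to turn WGS data into putative transmission links is to connect isolates whose pairwise SNP distance falls at or below a threshold. The abstract does not state the rule the authors used, so the threshold, isolate names and distances below are hypothetical; the sketch groups isolates by single linkage using a union-find structure.

```python
import math
from itertools import combinations


def snp_clusters(isolates, distances, threshold):
    """Single-linkage groups: two isolates are linked when their SNP distance is
    <= threshold. `distances` maps frozenset({a, b}) to a SNP count; pairs that
    are missing from the mapping are treated as unlinked."""
    parent = {x: x for x in isolates}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(isolates, 2):
        if distances.get(frozenset((a, b)), math.inf) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for x in isolates:
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(group) for group in groups.values())


isolates = ["P1", "P2", "P3", "P4", "P5"]
distances = {
    frozenset(("P1", "P2")): 2,
    frozenset(("P2", "P3")): 3,
    frozenset(("P1", "P3")): 5,
    frozenset(("P4", "P5")): 1,
}
print(snp_clusters(isolates, distances, threshold=5))
# [['P1', 'P2', 'P3'], ['P4', 'P5']]
```

Single linkage can chain isolates that are far apart through intermediates, which is one reason WGS-defined subgroups are interpreted together with epidemiological links and patient mobility rather than on SNP distance alone.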
Genetic diversity analysis of the oriental river prawn (Macrobrachium nipponense) in Huaihe River.
Cui, Feng; Yu, Yanyan; Bao, Fangyin; Wang, Song; Xiao, Ming Song
2018-04-19
The oriental river prawn (Macrobrachium nipponense) is an economically and nutritionally important decapod crustacean in China. The genetic structure and demographic history of Macrobrachium nipponense were examined using sequence data from a portion of the mitochondrial DNA cytochrome oxidase subunit I (COI) gene. A total of 191 individuals were collected from 10 localities in the upper to middle reaches of the Huaihe River. Variation was detected at 42 of the 684 nucleotide sites of the homologous sequence (6.14%), and base substitutions occurred mostly at the second codon position. The haplotype diversity (h) and nucleotide diversity (π) of all populations were 0.9136 ± 0.0116 and 0.0078 ± 0.0042, respectively. A phylogenetic tree constructed using the maximum-likelihood (ML) method assigned the 44 haplotypes to two distinct clades associated with geographic regions. Moreover, the median-joining network of the 44 haplotypes was similar to the topology of the phylogenetic tree. The pairwise FST values between populations ranged from -0.0298 to 0.2994. Overall, moderate genetic differentiation (FST = 0.1598, p = 0.0000) was detected among geographic populations, with significant differentiation between the Huaibin (HB) population and the other Macrobrachium nipponense populations. Both mismatch distribution analyses and neutrality tests suggested a population expansion of the species in the early Late Pleistocene, about 85,500 years before present, which is consistent with the palaeoclimatic conditions of the Huaihe River Basin.
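The diversity statistics reported above follow standard definitions: haplotype diversity h = n/(n - 1) × (1 - Σ p_i²), where p_i is the frequency of haplotype i among n sequences, and nucleotide diversity π, the mean number of pairwise differences per site. A minimal sketch for aligned, gap-free sequences (toy data, hypothetical function names) follows; it does not compute the standard deviations quoted in the abstract.

```python
from collections import Counter
from itertools import combinations


def segregating_sites(seqs: list[str]) -> int:
    """Number of alignment columns that contain more than one nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def haplotype_diversity(seqs: list[str]) -> float:
    """Nei's haplotype diversity h = n / (n - 1) * (1 - sum(p_i ** 2))."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean number of pairwise differences per site (pi)."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs) / len(seqs[0])


sample = ["AAAA", "AAAA", "AAAT", "AATT"]
print(segregating_sites(sample))               # 2
print(round(haplotype_diversity(sample), 4))   # 0.8333
print(round(nucleotide_diversity(sample), 4))  # 0.2917
```

The same arithmetic gives the reported proportion of variable sites: 42 / 684 ≈ 0.0614, or 6.14%.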
Hellberg, M E; Moy, G W; Vacquier, V D
2000-03-01
Male-specific proteins have increasingly been reported as targets of positive selection and are of special interest because of the role they may play in the evolution of reproductive isolation. We report the rapid interspecific divergence of cDNA encoding a major acrosomal protein of unknown function (TMAP) of sperm from five species of teguline gastropods. A mitochondrial DNA clock (calibrated by congeneric species divided by the Isthmus of Panama) estimates that these five species diverged 2-10 MYA. Inferred amino acid sequences reveal a propeptide that has diverged rapidly between species. The mature protein has diverged faster still due to high nonsynonymous substitution rates (> 25 nonsynonymous substitutions per site per 10^9 years). cDNA encoding the mature protein (89-100 residues) shows evidence of positive selection (Dn/Ds > 1) for 4 of 10 pairwise species comparisons. cDNA and predicted secondary-structure comparisons suggest that TMAP is neither orthologous nor paralogous to abalone lysin, and thus marks a second, phylogenetically independent, protein subject to strong positive selection in free-spawning marine gastropods. In addition, an internal repeat in one species (Tegula aureotincta) produces a duplicated cleavage site which results in two alternatively processed mature proteins differing by nine amino acid residues. Such alternative processing may provide a mechanism for introducing novel amino acid sequence variation at the amino-termini of proteins. Highly divergent TMAP N-termini from two other tegulines (Tegula regina and Norrisia norrisii) may have originated by such a mechanism.
David, Sophia; Rusniok, Christophe; Mentasti, Massimo; Gomez-Valero, Laura; Harris, Simon R.; Lechat, Pierre; Lees, John; Ginevra, Christophe; Glaser, Philippe; Ma, Laurence; Bouchier, Christiane; Underwood, Anthony; Jarraud, Sophie; Harrison, Timothy G.; Parkhill, Julian; Buchrieser, Carmen
2016-01-01
Legionella pneumophila is an environmental bacterium and the leading cause of Legionnaires’ disease. Just five sequence types (ST), from more than 2000 currently described, cause nearly half of disease cases in northwest Europe. Here, we report the sequence and analyses of 364 L. pneumophila genomes, including 337 from the five disease-associated STs and 27 representative of the species diversity. Phylogenetic analyses revealed that the five STs have independent origins within a highly diverse species. The number of de novo mutations is extremely low, with maximum pairwise single-nucleotide polymorphisms (SNPs) ranging from 19 (ST47) to 127 (ST1), which suggests that these STs emerged within the last century. Isolates sampled geographically far apart differ by only a few SNPs, demonstrating rapid dissemination. These five STs have been recombining recently, leading to a shared pool of allelic variants potentially contributing to their increased disease propensity. The oldest clone, ST1, has spread globally; between 1940 and 2000, four new clones emerged in Europe, which show long-distance, rapid dispersal. That a large proportion of clinical cases is caused by recently emerged and internationally dispersed clones, linked by convergent evolution, is surprising for an environmental bacterium traditionally considered to be an opportunistic pathogen. To simultaneously explain recent emergence, rapid spread and increased disease association, we hypothesize that these STs have adapted to new man-made environmental niches, which may be linked by human infection and transmission. PMID:27662900
David, Sophia; Rusniok, Christophe; Mentasti, Massimo; Gomez-Valero, Laura; Harris, Simon R; Lechat, Pierre; Lees, John; Ginevra, Christophe; Glaser, Philippe; Ma, Laurence; Bouchier, Christiane; Underwood, Anthony; Jarraud, Sophie; Harrison, Timothy G; Parkhill, Julian; Buchrieser, Carmen
2016-11-01
Legionella pneumophila is an environmental bacterium and the leading cause of Legionnaires' disease. Just five sequence types (ST), from more than 2000 currently described, cause nearly half of disease cases in northwest Europe. Here, we report the sequence and analyses of 364 L. pneumophila genomes, including 337 from the five disease-associated STs and 27 representative of the species diversity. Phylogenetic analyses revealed that the five STs have independent origins within a highly diverse species. The number of de novo mutations is extremely low, with maximum pairwise single-nucleotide polymorphisms (SNPs) ranging from 19 (ST47) to 127 (ST1), which suggests that these STs emerged within the last century. Isolates sampled geographically far apart differ by only a few SNPs, demonstrating rapid dissemination. These five STs have been recombining recently, leading to a shared pool of allelic variants potentially contributing to their increased disease propensity. The oldest clone, ST1, has spread globally; between 1940 and 2000, four new clones emerged in Europe, which show long-distance, rapid dispersal. That a large proportion of clinical cases is caused by recently emerged and internationally dispersed clones, linked by convergent evolution, is surprising for an environmental bacterium traditionally considered to be an opportunistic pathogen. To simultaneously explain recent emergence, rapid spread and increased disease association, we hypothesize that these STs have adapted to new man-made environmental niches, which may be linked by human infection and transmission. © 2016 David et al.; Published by Cold Spring Harbor Laboratory Press.
Mariadassou, Mahendra; Bardowski, Jacek K.; Bidnenko, Elena
2011-01-01
Background The single-stranded nucleic acid binding (SSB) protein superfamily includes proteins encoded by organisms ranging from Bacteria and their phages to Eukaryotes. SSB proteins share common structural characteristics and have been suggested to descend from an ancestral polypeptide. However, like other proteins involved in DNA replication, bacterial SSB proteins are clearly different from those found in Archaea and Eukaryotes. It was proposed that the corresponding genes in phage genomes were transferred from the bacterial hosts. Recently, new SSB proteins encoded by the virulent lactococcal bacteriophages (Orf14bIL67-like proteins) have been identified and characterized structurally and biochemically. Methodology/Principal Findings This study focused on determining the phylogenetic relationships between Orf14bIL67-like proteins and other SSBs. We performed a large-scale phylogenetic analysis and pairwise sequence comparisons of SSB proteins from different phyla. The results show that, in remarkable contrast to other phage SSBs, the Orf14bIL67-like proteins form a distinct, self-contained and well-supported phylogenetic group connected to the archaeal SSBs. Functional studies demonstrated that, despite the structural and amino acid sequence differences from bacterial SSBs, the Orf14bIL67 protein complements the conditional lethal ssb-1 mutation of Escherichia coli. Conclusions/Significance Here we identified, for the first time, a group of phage-encoded SSBs that are clearly distinct from their bacterial counterparts. All methods supported the recognition of these phage proteins as a new family within the SSB superfamily. Our findings suggest that, unlike other phages, the virulent lactococcal phages carry ssb genes that were not acquired from their hosts but were transferred from an archaeal genome. This represents a unique example of horizontal gene transfer between Archaea and bacterial phages. PMID:22073223
Mohanta, U K; Rana, H B; Devkota, B; Itagaki, T
2017-07-01
Explanatum explanatum flukes, liver amphistomes of ruminants, cause significant economic loss in the livestock industry by inducing severe liver damage. A total of 66 flukes from 26 buffaloes and 7 cattle in four different geographic areas of Bangladesh and 20 flukes from 10 buffaloes in the Chitwan district of Nepal were examined. The sequences of the second internal transcribed spacer (ITS2) of ribosomal DNA (442 bp) and variable fragments of the mitochondrial NADH dehydrogenase subunit 1 (nad1) gene (657 bp) of E. explanatum flukes from Bangladesh and Nepal were analysed. The aim of this study was to characterize the flukes molecularly and to elucidate their origin and biogeography. In the ITS2 region, two genotypes were detected among the flukes from Bangladesh, while flukes from Nepal were of only one genotype. Phylogenetic analyses inferred from the nad1 gene revealed that at least four divergent populations (groups I-IV) are distributed in Bangladesh, whereas two divergent populations were found in Nepal. Pairwise fixation index (Fst) values suggest that the Bangladeshi and Nepalese populations of groups I to IV are significantly different from each other, but within groups III and IV, the populations from Bangladesh and Nepal were genetically close. This divergence in the nad1 gene indicates that each lineage of E. explanatum from diverse geography was co-adapted during the multiple domestication events of ruminants. This study provides the first molecular characterization of E. explanatum in Bangladesh and Nepal and may provide useful information for elucidating its origin and dispersal route in Asia.
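Pairwise Fst between two populations can be estimated from sequence data in several ways. A minimal sketch of a Hudson-style estimator, Fst = 1 - H_w/H_b, follows. Here H_w is the mean number of pairwise differences within populations (averaged over the two) and H_b is the mean number between them. The abstract does not say which estimator the authors used, so this is an illustrative choice, and the populations and function names are hypothetical.

```python
from itertools import combinations, product


def mean_pairwise_differences(pairs) -> float:
    """Average number of mismatched sites over an iterable of aligned sequence pairs."""
    pairs = list(pairs)
    return sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)


def hudson_fst(pop1: list[str], pop2: list[str]) -> float:
    """Fst = 1 - H_within / H_between; each population needs at least two sequences."""
    h_within = (mean_pairwise_differences(combinations(pop1, 2))
                + mean_pairwise_differences(combinations(pop2, 2))) / 2
    h_between = mean_pairwise_differences(product(pop1, pop2))
    if h_between == 0:
        raise ValueError("no differences between populations; Fst is undefined")
    return 1 - h_within / h_between


pop_a = ["AAAA", "AAAT"]  # hypothetical nad1 fragments
pop_b = ["TTTA", "TTTT"]
print(round(hudson_fst(pop_a, pop_b), 4))  # 0.7143
```

Values near 1 indicate that most variation lies between populations. The estimator can also be negative when individuals from different populations differ no more than individuals within them.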