Science.gov

Sample records for multiple-sequence local alignment

  1. Biclustering as a method for RNA local multiple sequence alignment.

    PubMed

    Wang, Shu; Gutell, Robin R; Miranker, Daniel P

    2007-12-15

    Biclustering is a clustering method that simultaneously clusters both the domain and range of a relation. A challenge in multiple sequence alignment (MSA) is that the alignment of sequences is often intended to reveal groups of conserved functional subsequences. Simultaneously, the grouping of the sequences can impact the alignment; this is precisely the kind of dual situation biclustering is intended to address. We define a representation of the MSA problem enabling the application of biclustering algorithms. We develop a computer program for local MSA, BlockMSA, that combines biclustering with divide-and-conquer. BlockMSA simultaneously finds groups of similar sequences and locally aligns subsequences within them. Further alignment is accomplished by dividing both the set of sequences and their contents. The net result is both a multiple sequence alignment and a hierarchical clustering of the sequences. BlockMSA was tested on the subsets of the BRAliBase 2.1 benchmark suite that display high variability and on an extension of that suite to larger problem sizes. Alignments of two large datasets of current biological interest, T box sequences and Group IC1 introns, were also evaluated. The results were compared with alignments computed by the ClustalW, MAFFT, MUSCLE and PROBCONS alignment programs using the Sum-of-Pairs Score (SPS) and Consensus Count. Results for the benchmark suite are sensitive to problem size. On problems of 15 or more sequences, BlockMSA is consistently the best. On none of the problems in the test suite are there appreciable differences in scores among BlockMSA, MAFFT and PROBCONS. On the T box sequences, BlockMSA does the most faithful job of reproducing known annotations; MAFFT and PROBCONS do not. On the Intron sequences, BlockMSA, MAFFT and MUSCLE are comparable at identifying conserved regions. BlockMSA is implemented in Java. Source code and supplementary datasets are available at http://aug.csres.utexas.edu/msa/
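
    A hedged illustration of a Sum-of-Pairs Score (SPS) of the kind used to compare a test alignment with a reference: the fraction of residue pairs aligned in the reference that the test alignment also aligns. This is a minimal sketch assuming '-' as the gap character and that both alignments list the same sequences in the same row order; the names aligned_pairs and sum_of_pairs_score are hypothetical, not taken from BlockMSA or BRAliBase.

```python
def aligned_pairs(alignment):
    """Set of aligned residue pairs in an MSA (list of equal-length gapped strings).

    A residue is identified by (row index, position in the ungapped sequence).
    """
    pairs = set()
    counters = [0] * len(alignment)
    for col in range(len(alignment[0])):
        residues = []
        for row, seq in enumerate(alignment):
            if seq[col] != '-':
                residues.append((row, counters[row]))
                counters[row] += 1
        # Every pair of residues sharing this column counts as aligned.
        for a in range(len(residues)):
            for b in range(a + 1, len(residues)):
                pairs.add((residues[a], residues[b]))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs that are reproduced in the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)
```

    For example, scoring ["ACG-T", "ACAGT"] against the reference ["AC-GT", "ACAGT"] gives 0.75: three of the four reference pairs survive, and only the placement of the first row's G differs.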

  2. Multiple Sequence Alignment.

    PubMed

    Bawono, Punto; Dijkstra, Maurits; Pirovano, Walter; Feenstra, Anton; Abeln, Sanne; Heringa, Jaap

    2017-01-01

    The increasing importance of Next Generation Sequencing (NGS) techniques has highlighted the key role of multiple sequence alignment (MSA) in comparative structure and function analysis of biological sequences. MSA often leads to fundamental biological insight into sequence-structure-function relationships of nucleotide or protein sequence families. Significant advances have been achieved in this field, and many useful tools have been developed for constructing alignments, although many biological and methodological issues are still open. This chapter first provides some background information and considerations associated with MSA techniques, concentrating on the alignment of protein sequences. Then, a practical overview of currently available methods and a description of their specific advantages and limitations are given, to serve as a helpful guide or starting point for researchers who aim to construct a reliable MSA.

  3. DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment.

    PubMed

    Wright, Erik S

    2015-10-06

    Alignment of large and diverse sequence sets is a common task in biological investigations, yet there remains considerable room for improvement in alignment quality. Multiple sequence alignment programs tend to reach maximal accuracy when aligning only a few sequences, and then diminish steadily as more sequences are added. This drop in accuracy can be partly attributed to a build-up of error and ambiguity as more sequences are aligned. Most high-throughput sequence alignment algorithms do not use contextual information under the assumption that sites are independent. This study examines the extent to which local sequence context can be exploited to improve the quality of large multiple sequence alignments. Two predictors based on local sequence context were assessed: (i) single sequence secondary structure predictions, and (ii) modulation of gap costs according to the surrounding residues. The results indicate that context-based predictors have appreciable information content that can be utilized to create more accurate alignments. Furthermore, local context becomes more informative as the number of sequences increases, enabling more accurate protein alignments of large empirical benchmarks. These discoveries became the basis for DECIPHER, a new context-aware program for sequence alignment, which outperformed other programs on large sequence sets. Predicting secondary structure based on local sequence context is an efficient means of breaking the independence assumption in alignment. Since secondary structure is more conserved than primary sequence, it can be leveraged to improve the alignment of distantly related proteins. Moreover, secondary structure predictions increase in accuracy as more sequences are used in the prediction. This enables the scalable generation of large sequence alignments that maintain high accuracy even on diverse sequence sets. The DECIPHER R package and source code are freely available for download at DECIPHER.cee.wisc.edu and from the

  4. Mulan: Multiple-Sequence Local Alignment and Visualization for Studying Function and Evolution

    SciTech Connect

    Ovcharenko, I; Loots, G; Giardine, B; Hou, M; Ma, J; Hardison, R; Stubbs, L; Miller, W

    2004-07-14

    Multiple sequence alignment analysis is a powerful approach for understanding phylogenetic relationships, annotating genes and detecting functional regulatory elements. With a growing number of partly or fully sequenced vertebrate genomes, effective tools for performing multiple comparisons are required to accurately and efficiently assist biological discoveries. Here we introduce Mulan (http://mulan.dcode.org/), a novel method and a network server for comparing multiple draft and finished-quality sequences to identify functional elements conserved over evolutionary time. Mulan brings together several novel algorithms: the tba multi-aligner program for rapid identification of local sequence conservation and the multiTF program for detecting evolutionarily conserved transcription factor binding sites in multiple alignments. In addition, Mulan supports two-way communication with the GALA database; alignments of multiple species dynamically generated in GALA can be viewed in Mulan, and conserved transcription factor binding sites identified with Mulan/multiTF can be integrated and overlaid with extensive genome annotation data using GALA. Local multiple alignments computed by Mulan ensure reliable representation of short- and large-scale genomic rearrangements in distant organisms. Mulan allows for interactive modification of critical conservation parameters to differentially predict conserved regions in comparisons of both closely and distantly related species. We illustrate the uses and applications of the Mulan tool through multi-species comparisons of the GATA3 gene locus and the identification of elements that are conserved differently in avian genomes than in other genomes, allowing speculation on the evolution of birds. Source code for the aligners and the aligner-evaluation software can be freely downloaded from http://bio.cse.psu.edu/.

  5. An efficient method for multiple sequence alignment

    SciTech Connect

    Kim, J.; Pramanik, S.

    1994-12-31

    Multiple sequence alignment has been a useful method in the study of molecular evolution and sequence-structure relationships. This paper presents a new method for multiple sequence alignment based on the simulated annealing technique. Dynamic programming has been widely used to find an optimal alignment. However, dynamic programming has several limitations in obtaining an optimal alignment: it requires long computation times and cannot accommodate certain types of cost functions. We describe the detailed mechanisms of simulated annealing for the multiple sequence alignment problem. It is shown that simulated annealing can be an effective approach to overcoming these limitations of dynamic programming in the multiple sequence alignment problem.
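
    A generic simulated annealing sketch for MSA, not the authors' implementation: the sum-of-pairs objective, the match/mismatch/gap scores, the gap-shifting neighbourhood move and the geometric cooling schedule are all illustrative assumptions. The input rows must have equal length, and because the move only relocates gaps, each row's ungapped sequence is preserved.

```python
import math
import random


def sp_score(rows, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs objective: score every pair of rows column by column.
    Gap-gap pairs score 0; residue-gap pairs score `gap`."""
    total = 0
    for column in zip(*rows):
        for a in range(len(column)):
            for b in range(a + 1, len(column)):
                x, y = column[a], column[b]
                if x == '-' and y == '-':
                    continue
                if x == '-' or y == '-':
                    total += gap
                else:
                    total += match if x == y else mismatch
    return total


def neighbour(rows, rng):
    """Shift one gap in one row by swapping it with an adjacent character.
    Residue order within every row is unchanged."""
    rows = list(rows)
    r = rng.randrange(len(rows))
    seq = list(rows[r])
    gaps = [i for i, c in enumerate(seq) if c == '-']
    if gaps:
        i = rng.choice(gaps)
        j = i + rng.choice((-1, 1))
        if 0 <= j < len(seq):
            seq[i], seq[j] = seq[j], seq[i]
            rows[r] = ''.join(seq)
    return rows


def anneal(rows, steps=5000, t0=2.0, cooling=0.999, seed=0):
    """Simulated annealing over gap placements; returns the best alignment seen."""
    rng = random.Random(seed)
    current, current_score = list(rows), sp_score(rows)
    best, best_score = current, current_score
    t = t0
    for _ in range(steps):
        candidate = neighbour(current, rng)
        s = sp_score(candidate)
        # Metropolis rule: accept improvements, accept worse moves with prob exp(delta / t).
        if s >= current_score or rng.random() < math.exp((s - current_score) / t):
            current, current_score = candidate, s
            if s > best_score:
                best, best_score = candidate, s
        t *= cooling  # geometric cooling schedule
    return best, best_score
```

    Calling anneal(["ACGT--", "--ACGT", "A-CG-T"]) returns the best-scoring gap arrangement found from that start. Occasionally accepting worse moves is what lets the search leave local optima that a purely greedy gap shift might get stuck in.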

  6. Heuristics for multiobjective multiple sequence alignment.

    PubMed

    Abbasi, Maryam; Paquete, Luís; Pereira, Francisco B

    2016-07-15

    Aligning multiple sequences arises in many tasks in Bioinformatics. However, the alignments produced by current software packages are highly dependent on the parameter settings, such as the relative importance of opening gaps with respect to the increase of similarity. Choosing only one parameter setting may introduce an undesirable bias in further steps of the analysis and lead to overly simplistic interpretations. In this work, we reformulate multiple sequence alignment from a multiobjective point of view. The goal is to generate several sequence alignments that represent a trade-off between maximizing the substitution score and minimizing the number of indels/gaps in the sum-of-pairs score function. This trade-off gives the practitioner further information about the similarity of the sequences, from which she could analyse and choose the most plausible alignment. We introduce several heuristic approaches, based on local search procedures, that compute a set of sequence alignments representative of the trade-off between the two objectives (substitution score and indels). Several algorithm design options are discussed and analysed, with particular emphasis on the influence of the starting alignment and neighborhood search definitions on the overall performance. A perturbation technique is proposed to improve the local search, which provides a wide range of high-quality alignments. The proposed approach is tested experimentally on a wide range of instances. We performed several experiments with sequences obtained from the benchmark database BAliBASE 3.0. To evaluate the quality of the results, we calculate the hypervolume indicator of the set of score vectors returned by the algorithms. The results obtained allow us to identify reasonably good choices of parameters for our approach. Further, we compared our method with reference alignments in terms of the correctly aligned pairs ratio and the correctly aligned columns ratio. Experimental results show
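
    The trade-off described in this abstract can be made concrete with a small sketch: filter candidate alignments to the non-dominated set of (substitution score, gap count) pairs, then compute the two-dimensional hypervolume of that set. The coordinate convention (maximise score, minimise gaps), the reference point and the function names are assumptions for illustration; the authors' exact normalisation is not given in the abstract.

```python
def pareto_front(points):
    """Return the non-dominated (substitution_score, gap_count) pairs, sorted by gaps.

    A point dominates another if its score is >= and its gap count is <=,
    with at least one strict inequality (score maximised, gaps minimised).
    """
    front = []
    for s, g in points:
        dominated = any(
            s2 >= s and g2 <= g and (s2 > s or g2 < g) for s2, g2 in points
        )
        if not dominated and (s, g) not in front:
            front.append((s, g))
    return sorted(front, key=lambda p: p[1])


def hypervolume(front, ref):
    """Area dominated by a 2-D Pareto front, bounded by the reference point
    ref = (score_ref, gaps_ref), which must be worse than every front point."""
    score_ref, gaps_ref = ref
    pts = sorted(front, key=lambda p: p[1])
    area = 0.0
    for k, (s, g) in enumerate(pts):
        # On a front, score rises with gap count, so point k bounds the
        # dominated region between its gap count and the next point's.
        next_g = pts[k + 1][1] if k + 1 < len(pts) else gaps_ref
        area += (s - score_ref) * (next_g - g)
    return area
```

    For instance, pareto_front([(10, 2), (12, 4), (9, 3), (12, 5), (8, 1)]) keeps (8, 1), (10, 2) and (12, 4); with reference point (0, 6) their hypervolume is 52.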

  7. PRALINE: a versatile multiple sequence alignment toolkit.

    PubMed

    Bawono, Punto; Heringa, Jaap

    2014-01-01

    Profile ALIgNmEnt (PRALINE) is a versatile multiple sequence alignment toolkit. In its main alignment protocol, PRALINE follows the global progressive alignment algorithm. It provides various alignment optimization strategies to address the different situations that call for protein multiple sequence alignment: global profile preprocessing, homology-extended alignment, secondary structure-guided alignment, and transmembrane aware alignment. A number of combinations of these strategies are enabled as well. PRALINE is accessible via the online server http://www.ibi.vu.nl/programs/PRALINEwww/. The server facilitates extensive visualization possibilities aiding the interpretation of alignments generated, which can be written out in pdf format for publication purposes. PRALINE also allows the sequences in the alignment to be represented in a dendrogram to show their mutual relationships according to the alignment. The chapter ends with a discussion of various issues occurring in multiple sequence alignment.

  8. SNUFER: A software for localization and presentation of single nucleotide polymorphisms using a Clustal multiple sequence alignment output file

    PubMed Central

    Mansur, Marco A B; Cardozo, Giovana P; Santos, Elaine V; Marins, Mozart

    2008-01-01

    SNUFER is software for the automatic localization of single nucleotide polymorphisms (SNPs) and the generation of tables used to present them. After input of a FASTA file containing the sequences to be analyzed, a multiple sequence alignment is generated using ClustalW run inside SNUFER. The ClustalW output file is then used to generate a table which displays the SNPs detected in the aligned sequences and their degree of similarity. This table can be exported to Microsoft Word, Microsoft Excel or as a single text file, permitting further editing for publication. The software was written using Delphi 7 for programming and FireBird 2.0 for sequence database management. It is freely available for noncommercial use and can be downloaded from http://www.heranza.com.br/bioinformatica2.htm. PMID:19238196
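
    SNUFER itself is a Delphi program that parses ClustalW output; the Python sketch below only illustrates the underlying idea of scanning an already-parsed alignment for columns that carry more than one nucleotide. Ignoring gaps, reporting 1-based columns and the function name snp_columns are assumptions.

```python
def snp_columns(alignment, gap='-'):
    """Return (column, {base: count}) for every alignment column in which the
    aligned sequences carry more than one distinct nucleotide. Gaps are ignored;
    columns are numbered from 1."""
    hits = []
    for i, column in enumerate(zip(*alignment)):
        counts = {}
        for base in column:
            if base != gap:
                b = base.upper()
                counts[b] = counts.get(b, 0) + 1
        if len(counts) > 1:
            hits.append((i + 1, counts))
    return hits
```

    On ["ACGTA", "ACCTA", "A-GTT"] this reports column 3 (G twice, C once) and column 5 (A twice, T once); column 2 is not reported because the gap is ignored.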

  9. Instability in progressive multiple sequence alignment algorithms.

    PubMed

    Boyce, Kieran; Sievers, Fabian; Higgins, Desmond G

    2015-01-01

    Progressive alignment is the standard approach used to align large numbers of sequences. As with all heuristics, this involves a tradeoff between alignment accuracy and computation time. We examine this tradeoff and find that, because of a loss of information in the early steps of the approach, the alignments generated by the most common multiple sequence alignment programs are inherently unstable, and simply reversing the order of the sequences in the input file will cause a different alignment to be generated. Although this effect is more obvious with larger numbers of sequences, it can also be seen with data sets on the order of one hundred sequences. We also outline the means to determine the number of sequences in a data set beyond which the probability of instability will become more pronounced. This has major ramifications for both the designers of large-scale multiple sequence alignment algorithms, and for the users of these alignments.

  10. Two Hybrid Algorithms for Multiple Sequence Alignment

    NASA Astrophysics Data System (ADS)

    Naznin, Farhana; Sarker, Ruhul; Essam, Daryl

    2010-01-01

    In order to design life-saving drugs, such as cancer drugs, the design of protein or DNA structures has to be accurate. These structures depend on Multiple Sequence Alignment (MSA). MSA is used to find the accurate structure of protein and DNA sequences from existing approximately correct sequences. To overcome the overly greedy nature of the well-known global progressive alignment method for multiple sequence alignment, we propose two different algorithms in this paper: the first uses an iterative approach with a progressive alignment method (PAMIM) and the second uses a genetic algorithm with a progressive alignment method (PAMGA). Both methods start with a "kmer" distance table to generate a single guide tree. In the iterative approach, we introduce two new techniques: the first generates guide trees with randomly selected sequences and the second shuffles the sequences inside that tree. The resulting multiple sequence alignment is evaluated by the Sum of Pairs Method (SPM) using the real-valued scores of PAM250. In our second, GA-based approach, these two techniques are used to generate an initial population, and two different approaches to the genetic operators are implemented for crossover and mutation. To test the performance of our two algorithms, we compared them with the existing well-known methods T-Coffee, MUSCLE, MAFFT and ProbCons, using the BAliBASE benchmarks. The experimental results show that the first algorithm works well in some situations where other existing methods have difficulty obtaining better solutions. The second method performs well compared to the existing methods in all situations and outperforms the first one.
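
    One simple way to build a k-mer distance table, sketched under assumptions: the distance here is one minus the number of shared k-mers (counted with multiplicity) divided by the number of k-mers in the shorter sequence, with k = 3 by default. The abstract does not state the paper's exact k-mer measure, so treat this formula and the function names as illustrative.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Multiset of all overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """1 - shared k-mers / k-mers in the shorter sequence (0 = identical k-mer content)."""
    shared = sum((kmer_counts(a, k) & kmer_counts(b, k)).values())  # & keeps min counts
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        return 1.0  # a sequence shorter than k has no k-mers to compare
    return 1.0 - shared / denom


def distance_table(seqs, k=3):
    """Symmetric all-against-all k-mer distance matrix, e.g. as input to guide-tree building."""
    n = len(seqs)
    return [[kmer_distance(seqs[i], seqs[j], k) for j in range(n)] for i in range(n)]
```

    Because it needs no alignment, a table like this is cheap to compute for many sequences, which is why k-mer distances are a common starting point for guide trees.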

  11. Multiple sequence alignment accuracy and phylogenetic inference.

    PubMed

    Ogden, T Heath; Rosenberg, Michael S

    2006-04-01

    Phylogenies are often thought to be more dependent upon the specifics of the sequence alignment than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy, that of the total data set and that of individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends of many analyses under different conditions, and any one specific analysis, independent of the alignment accuracy, may recover very accurate or inaccurate topologies. Maximum likelihood and Bayesian methods, in general, outperformed neighbor joining and maximum parsimony in terms of tree reconstruction accuracy. Results also indicated that as the length of the branch and of the neighboring branches increases, alignment accuracy decreases, and the length of the neighboring branches is the major factor in topological accuracy. Thus, multiple-sequence alignment can be an important factor in downstream effects on topological reconstruction.

  12. MergeAlign: improving multiple sequence alignment performance by dynamic reconstruction of consensus multiple sequence alignments

    PubMed Central

    2012-01-01

    Background: The generation of multiple sequence alignments (MSAs) is a crucial step for many bioinformatic analyses. Thus improving MSA accuracy and identifying potential errors in MSAs is important for a wide range of post-genomic research. We present a novel method called MergeAlign which constructs consensus MSAs from multiple independent MSAs and assigns an alignment precision score to each column. Results: Using conventional benchmark tests we demonstrate that on average MergeAlign MSAs are more accurate than MSAs generated using any single matrix of sequence substitution. We show that MergeAlign column scores are related to alignment precision and hence provide an ab initio method of estimating alignment precision in the absence of curated reference MSAs. Using two novel and independent alignment performance tests that utilise a large set of orthologous gene families we demonstrate that increasing MSA performance leads to an increase in the performance of downstream phylogenetic analyses. Conclusion: Using multiple tests of alignment performance we demonstrate that this novel method has broad general application in biological research. PMID:22646090
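
    To illustrate the idea of scoring columns by agreement across independently built MSAs, the sketch below computes, for each column of a candidate alignment, the fraction of alternative alignments that contain exactly the same column. This is a simplification for illustration only and does not reproduce MergeAlign's consensus construction; the column encoding and the names column_signatures and column_support are hypothetical.

```python
def column_signatures(alignment, gap='-'):
    """Encode each column as a tuple of per-row residue indices (None for a gap)."""
    counters = [0] * len(alignment)
    signatures = []
    for column in zip(*alignment):
        sig = []
        for row, ch in enumerate(column):
            if ch == gap:
                sig.append(None)
            else:
                sig.append(counters[row])
                counters[row] += 1
        signatures.append(tuple(sig))
    return signatures


def column_support(candidate, alternatives):
    """For each column of `candidate`, the fraction of `alternatives` containing an identical column."""
    alt_sets = [set(column_signatures(a)) for a in alternatives]
    return [sum(sig in s for s in alt_sets) / len(alt_sets)
            for sig in column_signatures(candidate)]
```

    With candidate ["AC-GT", "ACAGT"] and alternatives [candidate, ["ACG-T", "ACAGT"]], the supports are [1.0, 1.0, 0.5, 0.5, 1.0]: the two alignments disagree only about where the first row's gap sits, and low-support columns flag exactly that region.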

  13. Multiple sequence alignment based on dynamic weighted guidance tree.

    PubMed

    Nguyen, Ken D; Pan, Yi

    2011-01-01

    Aligning multiple DNA/RNA/protein sequences to identify common functionalities, structures, or relationships between species is a fundamental task in bioinformatics. In this study, we propose a new multiple sequence alignment strategy that extracts sequence information, together with global and local sequence similarities, to assign a different weight to each input sequence. A weighted pairwise distance matrix is calculated from these sequences to build a dynamic alignment guide tree. The tree can reorder its higher-level branches based on the corresponding alignment results from lower tree levels to guarantee the highest alignment scores at each level of the tree. Compared with alignment tools such as CLUSTALW (Thompson et al., 1994), DIALIGN (Morgenstern, 1999), T-COFFEE (Notredame et al., 2000), MUSCLE (Edgar, 2004), and PROBCONS (Do et al., 2005), this technique improves alignment accuracy by up to 10% on many of the benchmarks tested.

  14. Identifying subset errors in multiple sequence alignments.

    PubMed

    Roy, Aparna; Taddese, Bruck; Vohra, Shabana; Thimmaraju, Phani K; Illingworth, Christopher J R; Simpson, Lisa M; Mukherjee, Keya; Reynolds, Christopher A; Chintapalli, Sree V

    2014-01-01

    Multiple sequence alignment (MSA) accuracy is important, but there is no widely accepted method of judging the accuracy that different alignment algorithms give. We present a simple approach to detecting two types of error, namely block shifts and the misplacement of residues within a gap. Given an MSA, subsets of very similar sequences are generated through the use of a redundancy filter, typically using a 70-90% sequence identity cut-off. Subsets thus produced are typically small and degenerate, and errors can be easily detected even by manual examination. The errors, albeit minor, are inevitably associated with gaps in the alignment, and so the procedure is particularly relevant to homology modelling of protein loop regions. The usefulness of the approach is illustrated in the context of the universal but little known [K/R]KLH motif that occurs in intracellular loop 1 of G protein coupled receptors (GPCR); other issues relevant to GPCR modelling are also discussed.
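
    A minimal sketch of the redundancy-filter step, assuming greedy clustering against each cluster's first member and percent identity computed only over columns where both rows have a residue; the paper's filter may define identity differently, and the function names are hypothetical. The 80% default sits inside the 70-90% range quoted above.

```python
def percent_identity(a, b, gap='-'):
    """Percent identity between two aligned rows, over columns where both have a residue."""
    both = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not both:
        return 0.0
    return 100.0 * sum(x == y for x, y in both) / len(both)


def redundancy_filter(alignment, cutoff=80.0):
    """Greedily group rows: each row joins the first cluster whose representative
    (first member) it matches at >= cutoff % identity, else starts a new cluster.
    Returns lists of row indices; each list is a subset of very similar sequences."""
    clusters = []
    for i, row in enumerate(alignment):
        for cluster in clusters:
            if percent_identity(alignment[cluster[0]], row) >= cutoff:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

    For example, ["ACDEFGHIKL", "ACDEFGHIKM", "WWWWWWWWWW", "ACDEFGHIK-"] splits into [[0, 1, 3], [2]]; inspecting the small subset [0, 1, 3] by eye is the kind of check the abstract describes.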

  15. Protein multiple sequence alignment by hybrid bio-inspired algorithms.

    PubMed

    Cutello, Vincenzo; Nicosia, Giuseppe; Pavone, Mario; Prizzi, Igor

    2011-03-01

    This article presents an immune-inspired algorithm to tackle the Multiple Sequence Alignment (MSA) problem. MSA is one of the most important tasks in biological sequence analysis. Although this paper focuses on protein alignments, most of the discussion and methodology may also be applied to DNA alignments. The problem of finding the multiple alignment was investigated in the studies by Bonizzoni and Vedova and by Wang and Jiang, and proved to be NP-hard (non-deterministic polynomial-time hard). The presented algorithm, called Immunological Multiple Sequence Alignment Algorithm (IMSA), incorporates two new strategies to create the initial population and specific ad hoc mutation operators. It uses the 'weighted sum of pairs' as the objective function to evaluate a given candidate alignment. IMSA was tested on the classical BAliBASE benchmarks (versions 1.0, 2.0 and 3.0), and experimental results indicate that it is comparable with state-of-the-art multiple alignment algorithms in terms of alignment quality, weighted Sums-of-Pairs (SP) and Column Score (CS) values. The main novelty of IMSA is its ability to generate more than a single suboptimal alignment for every MSA instance; this behaviour is due to the stochastic nature of the algorithm and of the populations evolved during the convergence process. This feature will help the decision maker to assess and select a biologically relevant multiple sequence alignment. Finally, the designed algorithm can be used as a local search procedure to properly explore promising alignments of the search space.

  16. Multiple sequence alignment based on profile alignment of intermediate sequences.

    PubMed

    Lu, Yue; Sze, Sing-Hoi

    2008-09-01

    Despite considerable efforts, it remains difficult to obtain accurate multiple sequence alignments. By using additional hits from database search of the input sequences, a few strategies have been proposed to significantly improve alignment accuracy, including the construction of profiles from the hits while performing profile alignment, the inclusion of high scoring hits into the input sequences, the use of intermediate sequence search to link distant homologs, and the use of secondary structure information. We develop an algorithm that integrates these strategies to further improve alignment accuracy by modifying the pair-Hidden Markov Model (HMM) approach in ProbCons to incorporate profiles of intermediate sequences from database search and utilize secondary structure predictions as in SPEM. We test our algorithm on a few sets of benchmark multiple alignments, including BAliBASE, HOMSTRAD, PREFAB, and SABmark, and show that it significantly outperforms MAFFT and ProbCons, which are among the best multiple alignment algorithms that do not utilize additional information, and SPEM, which is among the best multiple alignment algorithms that utilize additional hits from database search. The improvement in accuracy over SPEM can be as much as 5-10% when aligning divergent sequences. A software program that implements this approach (ISPAlign) is available at http://faculty.cs.tamu.edu/shsze/ispalign.

  17. Multiple sequence alignment with hierarchical clustering.

    PubMed Central

    Corpet, F

    1988-01-01

    An algorithm is presented for the multiple alignment of sequences, either proteins or nucleic acids, that is both accurate and easy to use on microcomputers. The approach is based on the conventional dynamic-programming method of pairwise alignment. Initially, a hierarchical clustering of the sequences is performed using the matrix of the pairwise alignment scores. The closest sequences are aligned creating groups of aligned sequences. Then close groups are aligned until all sequences are aligned in one group. The pairwise alignments included in the multiple alignment form a new matrix that is used to produce a hierarchical clustering. If it is different from the first one, iteration of the process can be performed. The method is illustrated by an example: a global alignment of 39 sequences of cytochrome c. PMID:2849754
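
    The hierarchical clustering step can be sketched as follows. The abstract does not name a linkage rule, so this assumes UPGMA (average linkage) and assumes the pairwise alignment scores have already been converted to distances; the returned merge list gives the order in which sequences and groups would be aligned. The function name upgma_merges is hypothetical.

```python
def upgma_merges(names, dist):
    """Average-linkage (UPGMA) clustering on a symmetric distance matrix.
    Returns (left_group, right_group, distance) tuples in merge order."""
    clusters = {i: (name,) for i, name in enumerate(names)}
    size = {i: 1 for i in clusters}
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}

    def get(a, b):
        return d[(a, b)] if a < b else d[(b, a)]

    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])  # closest pair of groups
        merges.append((clusters[i], clusters[j], dij))
        # Distance from every other group to the new group: size-weighted average.
        for k in clusters:
            if k not in (i, j):
                d[(k, next_id)] = (size[i] * get(k, i) + size[j] * get(k, j)) / (size[i] + size[j])
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        size[next_id] = size[i] + size[j]
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        next_id += 1
    return merges
```

    For four sequences A-D with d(A,B) = 2, d(C,D) = 4, d(A or B, C) = 6 and d(A or B, D) = 10, the merges are A+B at 2, C+D at 4, then the two groups at 8, which is the "closest sequences first, then close groups" order the abstract describes.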

  18. Progressive multiple sequence alignments from triplets

    PubMed Central

    Kruspe, Matthias; Stadler, Peter F

    2007-01-01

    Background: The quality of progressive sequence alignments strongly depends on the accuracy of the individual pairwise alignment steps, since gaps that are introduced at one step cannot be removed at later aggregation steps. Adjacent insertions and deletions necessarily appear in arbitrary order in pairwise alignments and hence form an unavoidable source of errors. Research: Here we present a modified variant of progressive sequence alignment that addresses both issues. Instead of pairwise alignments we use exact dynamic programming to align sequence or profile triples. This avoids a large fraction of the ambiguities arising in pairwise alignments. In the subsequent aggregation steps we follow the logic of the Neighbor-Net algorithm, which constructs a phylogenetic network by replacing triples by pairs step by step instead of combining pairs to singletons. To this end the three-way alignments are subdivided into two partial alignments, at which stage all-gap columns are naturally removed. This alleviates the "once a gap, always a gap" problem of progressive alignment procedures. Conclusion: The three-way Neighbor-Net based alignment program aln3nn is shown to compare favorably on both protein sequences and nucleic acid sequences to other progressive alignment tools. In the latter case one can easily include scoring terms that consider secondary structure features. Overall, the quality of the resulting alignments in general exceeds that of clustalw or other multiple alignment tools even though our software does not include heuristics for context-dependent (mis)match scores. PMID:17631683
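
    The all-gap column removal mentioned in this abstract is easy to show in isolation. The sketch restricts a three-way alignment to two of its rows and drops columns that contain only gaps; it does not implement aln3nn's exact triplet dynamic programming or its Neighbor-Net aggregation, and the function name project is hypothetical.

```python
def project(alignment, rows, gap='-'):
    """Restrict an alignment to a subset of rows and drop columns that become all-gap."""
    sub = [alignment[r] for r in rows]
    keep = [c for c in range(len(sub[0])) if any(s[c] != gap for s in sub)]
    return [''.join(s[c] for c in keep) for s in sub]
```

    Projecting ["AC-GT", "A--GT", "ACTG-"] onto rows 0 and 1 gives ["ACGT", "A-GT"]: the third column existed only to accommodate the T in row 2, so once that row is split off the column disappears instead of persisting as a frozen gap.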

  19. Characterization of pairwise and multiple sequence alignment errors.

    PubMed

    Landan, Giddy; Graur, Dan

    2009-07-15

    We characterize pairwise and multiple sequence alignment (MSA) errors by comparing true alignments from simulations of sequence evolution with reconstructed alignments. The vast majority of reconstructed alignments contain many errors. Error rates increase rapidly with sequence divergence; thus, even for intermediate degrees of sequence divergence, more than half of the columns of a reconstructed alignment may be expected to be erroneous. In closely related sequences, most errors consist of the erroneous positioning of a single indel event and their effect is local. As sequences diverge, errors become more complex as a result of the simultaneous mis-reconstruction of many indel events, and the lengths of the affected MSA segments increase dramatically. We found a systematic bias towards underestimation of the number of gaps, which leads to the reconstructed MSA being on average shorter than the true one. Alignment errors are unavoidable even when the evolutionary parameters are known in advance. Correct reconstruction can only be guaranteed when the likelihood of the true alignment is uniquely optimal. However, true alignment features are very frequently sub-optimal or co-optimal, with the result that optimal albeit erroneous features are incorporated into the reconstructed MSA. Progressive MSA utilizes a guide tree in the reconstruction of MSAs. The quality of the guide tree was found to affect MSA error levels only marginally.
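
    One way to quantify the column errors discussed above, sketched under an assumption: a reconstructed column counts as correct only if exactly the same residues are aligned together in some column of the true alignment. The paper's precise error criterion may differ, and the function names are hypothetical.

```python
def column_signatures(alignment, gap='-'):
    """Encode each column as a tuple of per-row residue indices (None for a gap)."""
    counters = [0] * len(alignment)
    signatures = []
    for column in zip(*alignment):
        sig = []
        for row, ch in enumerate(column):
            if ch == gap:
                sig.append(None)
            else:
                sig.append(counters[row])
                counters[row] += 1
        signatures.append(tuple(sig))
    return signatures


def column_error_rate(reconstructed, true):
    """Fraction of reconstructed columns that do not occur in the true alignment."""
    true_columns = set(column_signatures(true))
    rec_columns = column_signatures(reconstructed)
    return sum(c not in true_columns for c in rec_columns) / len(rec_columns)
```

    With true alignment ["AC-GT", "ACAGT"] and reconstruction ["ACG-T", "ACAGT"], a single misplaced gap makes two of five columns wrong, an error rate of 0.4. This illustrates how one indel mispositioning already affects more than one column.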

  20. MANGO: a new approach to multiple sequence alignment.

    PubMed

    Zhang, Zefeng; Lin, Hao; Li, Ming

    2007-01-01

    Multiple sequence alignment is a classical and challenging task for biological sequence analysis. The problem is NP-hard. The full dynamic programming takes too much time. The progressive alignment heuristics adopted by most state-of-the-art multiple sequence alignment programs suffer from the 'once a gap, always a gap' phenomenon. Is there a radically new way to do multiple sequence alignment? This paper introduces a novel and orthogonal multiple sequence alignment method, using multiple optimized spaced seeds and new algorithms to handle these seeds efficiently. Our new algorithm processes information of all sequences as a whole, avoiding problems caused by the popular progressive approaches. Because the optimized spaced seeds are provably significantly more sensitive than the consecutive k-mers, the new approach promises to be more accurate and reliable. To validate our new approach, we have implemented MANGO: Multiple Alignment with N Gapped Oligos. Experiments were carried out on large 16S RNA benchmarks showing that MANGO compares favorably, in both accuracy and speed, against state-of-the-art multiple sequence alignment methods, including ClustalW 1.83, MUSCLE 3.6, MAFFT 5.861, ProbConsRNA 1.11, Dialign 2.2.1, DIALIGN-T 0.2.1, T-Coffee 4.85, POA 2.0 and Kalign 2.0.
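
    To show why a spaced seed tolerates mismatches that a contiguous k-mer does not, here is a sketch that finds seed hits between two sequences. In the pattern, '1' marks a position that must match and '0' a "don't care" position. The pattern 1101 is an arbitrary example, not one of MANGO's optimized seeds, MANGO's multi-seed algorithms are not reproduced, and the function name seed_hits is hypothetical.

```python
def seed_hits(a, b, seed="1101"):
    """All (i, j) such that a[i:i+len(seed)] and b[j:j+len(seed)] agree at
    every '1' position of the spaced seed ('0' positions are ignored)."""
    care = [k for k, c in enumerate(seed) if c == '1']
    span = len(seed)
    # Index every window of `a` by the characters at the seed's care positions.
    index = {}
    for i in range(len(a) - span + 1):
        key = ''.join(a[i + k] for k in care)
        index.setdefault(key, []).append(i)
    hits = []
    for j in range(len(b) - span + 1):
        key = ''.join(b[j + k] for k in care)
        for i in index.get(key, []):
            hits.append((i, j))
    return hits
```

    seed_hits("ACGTAC", "ACTTAC", "1101") finds a hit at (0, 0) despite the G/T mismatch at offset 2, whereas the contiguous seed "1111" of the same length finds none.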

  21. Refining multiple sequence alignments with conserved core regions

    PubMed Central

    Chakrabarti, Saikat; Lanczycki, Christopher J.; Panchenko, Anna R.; Przytycka, Teresa M.; Thiessen, Paul A.; Bryant, Stephen H.

    2006-01-01

    Accurate multiple sequence alignments of proteins are very important to several areas of computational biology and provide an understanding of the phylogenetic history of domain families, their identification and classification. This article presents a new algorithm, REFINER, that refines a multiple sequence alignment by iterative realignment of its individual sequences with the predetermined conserved core (block) model of a protein family. Realignment of each sequence can correct misalignments between a given sequence and the rest of the profile while preserving the family's overall block model. Large-scale benchmarking studies showed a noticeable improvement of alignment after refinement. This can be inferred from the increased alignment score and enhanced sensitivity for database searching using the sequence profiles derived from refined alignments compared with the original alignments. A standalone version of the program is available by FTP distribution and will be incorporated into the next release of the Cn3D structure/alignment viewer. PMID:16707662

  22. Novel hybrid genetic algorithm for progressive multiple sequence alignment.

    PubMed

    Afridi, Muhammad Ishaq

    2013-01-01

    The family of evolutionary or genetic algorithms is used in various fields of bioinformatics. Genetic algorithms (GAs) can be used for the simultaneous comparison of a large pool of DNA or protein sequences. This article explains how the GA is used in combination with other methods, such as the progressive multiple sequence alignment strategy, to obtain an optimal multiple sequence alignment (MSA). Optimal MSAs are of great importance in bioinformatics and related disciplines. Evolutionary algorithms evolve and improve their performance. In this optimisation, the initial pairwise alignment is achieved through a progressive method, and then a good objective function is used to select and align further alignments and profiles. Child and subpopulation initialisation is based upon changes in the probability of similarity or the distance matrix of the alignment population. In this genetic algorithm, the optimisation of mutation, crossover and migration in the population of candidate solutions reflects events of natural organic evolution.

  3. A knowledge-based multiple-sequence alignment algorithm.

    PubMed

    Nguyen, Ken D; Pan, Yi

    2013-01-01

    A common and cost-effective mechanism to identify the functionalities, structures, or relationships between species is multiple-sequence alignment, in which DNA/RNA/protein sequences are arranged and aligned so that similarities between sequences are clustered together. Correctly identifying and aligning these biological sequence similarities helps in applications ranging from unraveling the mystery of species evolution to drug design. We present our knowledge-based multiple sequence alignment (KB-MSA) technique, which uses existing knowledge databases such as SWISSPROT, GENBANK, or HOMSTRAD to provide more realistic and reliable sequence alignments. We also provide a modified version of this algorithm (CB-MSA) that uses sequence consistency information when sequence knowledge databases are not available. Our benchmark tests on the BAliBASE, PREFAB, HOMSTRAD, and SABMARK references show accuracy improvements of up to 10 percent on twilight-zone data sets against many leading alignment tools such as ISPALIGN, PADT, CLUSTALW, MAFFT, PROBCONS, and T-COFFEE.
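    The abstract does not specify how CB-MSA uses consistency information. As a hedged illustration, the sketch below implements the widely used T-Coffee-style library extension, in which the weight of a residue pair (x, y) is increased by min(W(x, z), W(z, y)) for every residue z of a third sequence aligned to both. The library layout (a dict keyed by residue pairs, with residues as (sequence_id, position) tuples) and the function name are assumptions, not the authors' code.

```python
from collections import defaultdict


def extend_library(library):
    """T-Coffee-style consistency extension of a pairwise residue library.

    library maps (x, y) -> weight, where x and y are (sequence_id, position)
    tuples from different sequences; each unordered pair is stored once.
    Returns a dict keyed by frozenset({x, y}) with the extended weights.
    """
    # Symmetric adjacency: residue -> {aligned residue: weight}.
    adj = defaultdict(dict)
    for (x, y), w in library.items():
        adj[x][y] = w
        adj[y][x] = w

    # Start from the direct (primary library) weights.
    extended = defaultdict(float)
    for (x, y), w in library.items():
        extended[frozenset((x, y))] += w

    # For every intermediate residue z, each pair of its partners x, y that
    # come from two different sequences gains min(W(x, z), W(z, y)).
    # This can also create pairs that were never aligned directly.
    for z, partners in adj.items():
        items = list(partners.items())
        for a in range(len(items)):
            for b in range(a + 1, len(items)):
                (x, wx), (y, wy) = items[a], items[b]
                if x[0] != y[0]:
                    extended[frozenset((x, y))] += min(wx, wy)
    return dict(extended)
```

    In this toy form, a pair supported by several third sequences ends up with a larger weight than a pair seen in only one pairwise alignment, which is what lets a consistency-based aligner prefer globally agreeing residue matches.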

  4. msa: an R package for multiple sequence alignment.

    PubMed

    Bodenhofer, Ulrich; Bonatesta, Enrico; Horejš-Kainrath, Christoph; Hochreiter, Sepp

    2015-12-15

    Although the R platform and the add-on packages of the Bioconductor project are widely used in bioinformatics, the standard task of multiple sequence alignment has been neglected so far. The msa package, for the first time, provides a unified R interface to the popular multiple sequence alignment algorithms ClustalW, ClustalOmega and MUSCLE. The package requires no additional software and runs on all major platforms. Moreover, the msa package provides an R interface to the powerful LaTeX package TeXshade, which allows for flexible and customizable plotting of multiple sequence alignments. msa is available via the Bioconductor project: http://bioconductor.org/packages/release/bioc/html/msa.html. Further information and the R code of the example presented in this paper are available at http://www.bioinf.jku.at/software/msa/. © The Author 2015. Published by Oxford University Press.

  5. Genetic algorithms with permutation coding for multiple sequence alignment.

    PubMed

    Ben Othman, Mohamed Tahar; Abdel-Azim, Gamil

    2013-08-01

    Multiple sequence alignment (MSA) is a topic of bioinformatics that has been seriously researched. It is known to be an NP-complete problem and is considered one of the most important and daunting tasks in computational biology. Consequently, a wide range of heuristic algorithms have been proposed to find the optimal alignment, among them genetic algorithms (GAs). The GA has two major weaknesses: it is time-consuming and it can become trapped in local minima. One of the significant aspects of the GA process in MSA is maximizing the similarity between sequences by adding and shuffling the gaps of the Solution Coding (SC). Several SC schemes have been introduced; one of them is Permutation Coding (PC). We propose a hybrid algorithm based on GAs with PC and the 2-opt algorithm. PC is used to encode the MSA solution, which maximizes the gain of resources, reliability and diversity of the GA. The use of PC opens the way to applying all functions over permutations to MSA. We therefore propose an algorithm to calculate the scoring function for multiple alignments based on PC, which is used as the fitness function; this algorithm reduces the time complexity of the GA. Our GA is implemented with different selection strategies and different crossovers. The probabilities of crossover and mutation are set as one strategy. Relevant patents on the topic have also been examined.

  6. MSAViewer: interactive JavaScript visualization of multiple sequence alignments.

    PubMed

    Yachdav, Guy; Wilzbach, Sebastian; Rauscher, Benedikt; Sheridan, Robert; Sillitoe, Ian; Procter, James; Lewis, Suzanna E; Rost, Burkhard; Goldberg, Tatyana

    2016-11-15

    The MSAViewer is a quick and easy visualization and analysis JavaScript component for Multiple Sequence Alignment data of any size. Core features include interactive navigation through the alignment, application of popular color schemes, sorting, selecting and filtering. The MSAViewer is 'web ready': written entirely in JavaScript, compatible with modern web browsers and does not require any specialized software. The MSAViewer is part of the BioJS collection of components.

  7. Quantifying the Displacement of Mismatches in Multiple Sequence Alignment Benchmarks

    PubMed Central

    Bawono, Punto; van der Velde, Arjan; Abeln, Sanne; Heringa, Jaap

    2015-01-01

    Multiple Sequence Alignment (MSA) methods are typically benchmarked on sets of reference alignments. The quality of the alignment can then be represented by the sum-of-pairs (SP) or column (CS) scores, which measure the agreement between a reference and corresponding query alignment. Both the SP and CS scores treat all mismatches between a query and reference alignment as equally bad, and do not take into account the separation, in the query alignment, between two amino acids that should have been matched according to the reference alignment. This matters because the magnitude of alignment shifts is often relevant in biological analyses, including homology modeling and MSA refinement/manual alignment editing. In this study we develop a new alignment benchmark scoring scheme, SPdist, that takes the degree of discordance of mismatches into account by measuring the sequence distance between mismatched residue pairs in the query alignment. Using this new score along with the standard SP score, we investigate the discriminatory behavior of the new score by assessing how well six different MSA methods perform with respect to BAliBASE reference alignments. The SP score and the SPdist score yield very similar outcomes when the reference and query alignments are close. However, for more divergent reference alignments the SPdist score is able to distinguish between methods that keep alignments approximately close to the reference and those exhibiting larger shifts. We observed that by using SPdist together with SP scoring we were able to better delineate the alignment quality difference between alternative MSA methods. With a case study we exemplify why it is important, from a biological perspective, to consider the separation of mismatches. The SPdist scoring scheme has been implemented in the VerAlign web server (http://www.ibi.vu.nl/programs/veralignwww/). The code for calculating SPdist score is also available upon request. PMID:25993129
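    To make the two ideas concrete, the Python sketch below computes the standard SP score (the fraction of residue pairs aligned in the reference that are also aligned in the query) together with a simplified displacement measure: for each missed pair, how many residues away the query places the partner. The displacement part is a hypothetical simplification in the spirit of SPdist, not the published SPdist definition, and it reports residues facing a gap separately instead of scoring them.

```python
from itertools import combinations


def residue_columns(row):
    """Column index of each residue (in order) in one gapped row."""
    return [c for c, ch in enumerate(row) if ch != "-"]


def aligned_pairs(alignment):
    """All (seq_a, residue_i, seq_b, residue_j) pairs sharing a column, a < b."""
    cols = [residue_columns(row) for row in alignment]
    pairs = set()
    for a, b in combinations(range(len(alignment)), 2):
        col_to_j = {c: j for j, c in enumerate(cols[b])}
        for i, c in enumerate(cols[a]):
            if c in col_to_j:
                pairs.add((a, i, b, col_to_j[c]))
    return pairs


def sp_and_shift(reference, query):
    """Return (SP score, mean residue shift of mismatched pairs, pairs facing a gap).

    The shift measure is a simplified illustration of the SPdist idea, not the
    published formula.
    """
    ref, qry = aligned_pairs(reference), aligned_pairs(query)
    sp = len(ref & qry) / len(ref)
    partner = {(a, i, b): j for a, i, b, j in qry}  # query partner of residue i
    shifts, facing_gap = [], 0
    for a, i, b, j in ref - qry:
        k = partner.get((a, i, b))
        if k is None:
            facing_gap += 1  # residue i of seq a sits opposite a gap in seq b
        else:
            shifts.append(abs(k - j))  # how far the match slid along seq b
    mean_shift = sum(shifts) / len(shifts) if shifts else 0.0
    return sp, mean_shift, facing_gap


if __name__ == "__main__":
    print(sp_and_shift(["ACGT", "ACGT"], ["ACGT-", "-ACGT"]))  # (0.0, 1.0, 1)
```

    In the example, the query is off by one residue everywhere: SP is 0, yet the mean shift of 1 shows it is still close to the reference, which is the distinction the abstract argues SP alone cannot make.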

  8. Quantifying the displacement of mismatches in multiple sequence alignment benchmarks.

    PubMed

    Bawono, Punto; van der Velde, Arjan; Abeln, Sanne; Heringa, Jaap

    2015-01-01

    Multiple Sequence Alignment (MSA) methods are typically benchmarked on sets of reference alignments. The quality of the alignment can then be represented by the sum-of-pairs (SP) or column (CS) scores, which measure the agreement between a reference and corresponding query alignment. Both the SP and CS scores treat all mismatches between a query and reference alignment as equally bad, and do not take into account the separation, in the query alignment, between two amino acids that should have been matched according to the reference alignment. This matters because the magnitude of alignment shifts is often relevant in biological analyses, including homology modeling and MSA refinement/manual alignment editing. In this study we develop a new alignment benchmark scoring scheme, SPdist, that takes the degree of discordance of mismatches into account by measuring the sequence distance between mismatched residue pairs in the query alignment. Using this new score along with the standard SP score, we investigate the discriminatory behavior of the new score by assessing how well six different MSA methods perform with respect to BAliBASE reference alignments. The SP score and the SPdist score yield very similar outcomes when the reference and query alignments are close. However, for more divergent reference alignments the SPdist score is able to distinguish between methods that keep alignments approximately close to the reference and those exhibiting larger shifts. We observed that by using SPdist together with SP scoring we were able to better delineate the alignment quality difference between alternative MSA methods. With a case study we exemplify why it is important, from a biological perspective, to consider the separation of mismatches. The SPdist scoring scheme has been implemented in the VerAlign web server (http://www.ibi.vu.nl/programs/veralignwww/). The code for calculating SPdist score is also available upon request.

  9. Vertical decomposition with Genetic Algorithm for Multiple Sequence Alignment.

    PubMed

    Naznin, Farhana; Sarker, Ruhul; Essam, Daryl

    2011-08-25

    Many Bioinformatics studies begin with a multiple sequence alignment as the foundation for their research. This is because multiple sequence alignment can be a useful technique for studying molecular evolution and analyzing sequence structure relationships. In this paper, we have proposed a Vertical Decomposition with Genetic Algorithm (VDGA) for Multiple Sequence Alignment (MSA). In VDGA, we divide the sequences vertically into two or more subsequences, and then solve them individually using a guide tree approach. Finally, we combine all the subsequences to generate a new multiple sequence alignment. This technique is applied on the solutions of the initial generation and of each child generation within VDGA. We have used two mechanisms to generate an initial population in this research: the first mechanism is to generate guide trees with randomly selected sequences and the second is shuffling the sequences inside such trees. Two different genetic operators have been implemented with VDGA. To test the performance of our algorithm, we have compared it with existing well-known methods, namely PRRP, CLUSTALX, DIALIGN, HMMT, SB_PIMA, ML_PIMA, MULTALIGN, and PILEUP8, and also other methods, based on Genetic Algorithms (GA), such as SAGA, MSA-GA and RBT-GA, by solving a number of benchmark datasets from BAliBase 2.0. The experimental results showed that the VDGA with three vertical divisions was the most successful variant for most of the test cases in comparison to other divisions considered with VDGA. The experimental results also confirmed that VDGA outperformed the other methods considered in this research.

  10. Vertical decomposition with Genetic Algorithm for Multiple Sequence Alignment

    PubMed Central

    2011-01-01

    Background: Many Bioinformatics studies begin with a multiple sequence alignment as the foundation for their research. This is because multiple sequence alignment can be a useful technique for studying molecular evolution and analyzing sequence structure relationships. Results: In this paper, we have proposed a Vertical Decomposition with Genetic Algorithm (VDGA) for Multiple Sequence Alignment (MSA). In VDGA, we divide the sequences vertically into two or more subsequences, and then solve them individually using a guide tree approach. Finally, we combine all the subsequences to generate a new multiple sequence alignment. This technique is applied on the solutions of the initial generation and of each child generation within VDGA. We have used two mechanisms to generate an initial population in this research: the first mechanism is to generate guide trees with randomly selected sequences and the second is shuffling the sequences inside such trees. Two different genetic operators have been implemented with VDGA. To test the performance of our algorithm, we have compared it with existing well-known methods, namely PRRP, CLUSTALX, DIALIGN, HMMT, SB_PIMA, ML_PIMA, MULTALIGN, and PILEUP8, and also other methods, based on Genetic Algorithms (GA), such as SAGA, MSA-GA and RBT-GA, by solving a number of benchmark datasets from BAliBase 2.0. Conclusions: The experimental results showed that the VDGA with three vertical divisions was the most successful variant for most of the test cases in comparison to other divisions considered with VDGA. The experimental results also confirmed that VDGA outperformed the other methods considered in this research. PMID:21867510

  11. Scaling statistical multiple sequence alignment to large datasets.

    PubMed

    Nute, Michael; Warnow, Tandy

    2016-11-11

    Multiple sequence alignment is an important task in bioinformatics, and alignments of large datasets containing hundreds or thousands of sequences are increasingly of interest. While many alignment methods exist, the most accurate alignments are likely to be based on stochastic models where sequences evolve down a tree with substitutions, insertions, and deletions. While some methods have been developed to estimate alignments under these stochastic models, only the Bayesian method BAli-Phy has been able to run on even moderately large datasets, containing 100 or so sequences. A technique to extend BAli-Phy to enable alignments of thousands of sequences could potentially improve alignment and phylogenetic tree accuracy on large-scale data beyond the best-known methods today. We use simulated data with up to 10,000 sequences representing a variety of model conditions, including some that are significantly divergent from the statistical models used in BAli-Phy and elsewhere. We give a method for incorporating BAli-Phy into PASTA and UPP, two strategies for enabling alignment methods to scale to large datasets, and give alignment and tree accuracy results measured against the ground truth from simulations. Comparable results are also given for other methods capable of aligning this many sequences. Extensions of BAli-Phy using PASTA and UPP produce significantly more accurate alignments and phylogenetic trees than the current leading methods.

  12. Multiple sequence alignment: in pursuit of homologous DNA positions.

    PubMed

    Kumar, Sudhir; Filipski, Alan

    2007-02-01

    DNA sequence alignment is a prerequisite to virtually all comparative genomic analyses, including the identification of conserved sequence motifs, estimation of evolutionary divergence between sequences, and inference of historical relationships among genes and species. While it is simply common sense that inaccuracies in multiple sequence alignments can have detrimental effects on downstream analyses, it is important to know the extent to which the inferences drawn from these alignments are robust to errors and biases inherent in all sequence alignments. A survey of investigations into strengths and weaknesses of sequence alignments reveals, as expected, that alignment quality is generally poor for two distantly related sequences and can often be improved by adding sequences as stepping stones between distantly related species. Errors in sequence alignment are also found to have a significant negative effect on subsequent inference of sequence divergence, phylogenetic trees, and conserved motifs. However, our understanding of alignment biases remains rudimentary, and sequence alignment procedures continue to be used somewhat like benign formatting operations to make sequences equal in length. Because of the central role these alignments now play in our endeavors to establish the tree of life and to identify important parts of genomes through evolutionary functional genomics, we see a need for increased community effort to investigate influences of alignment bias on the accuracy of large-scale comparative genomics.

  13. Multiple sequence alignment accuracy and evolutionary distance estimation.

    PubMed

    Rosenberg, Michael S

    2005-11-23

    Sequence alignment is a common tool in bioinformatics and comparative genomics. It is generally assumed that multiple sequence alignment yields better results than pairwise sequence alignment, but this assumption has rarely been tested, and never with the control provided by simulation analysis. This study used sequence simulation to examine the gain in accuracy from adding a third sequence to a pairwise alignment, particularly concentrating on how the phylogenetic position of the additional sequence relative to the first pair changes the accuracy of the initial pair's alignment as well as their estimated evolutionary distance. The maximal gain in alignment accuracy was found not when the third sequence is directly intermediate between the initial two sequences, but rather when it perfectly subdivides the branch leading from the root of the tree to one of the original sequences (making it half as close to one sequence as the other). Evolutionary distance estimation in the multiple alignment framework, however, is largely unrelated to alignment accuracy and rather is dependent on the position of the third sequence; the closer the branch leading to the third sequence is to the root of the tree, the larger the estimated distance between the first two sequences. The bias in distance estimation appears to be a direct result of the standard greedy progressive algorithm used by many multiple alignment methods. These results have implications for choosing new taxa and genomes to sequence when resources are limited.

  14. ReformAlign: improved multiple sequence alignments using a profile-based meta-alignment approach.

    PubMed

    Lyras, Dimitrios P; Metzler, Dirk

    2014-08-07

    Obtaining an accurate sequence alignment is fundamental for consistently analyzing biological data. Although this problem can be solved efficiently when only two sequences are considered, exact inference of the optimal alignment quickly becomes computationally intractable for multiple sequences. To cope with the high computational cost, approximate heuristic methods have been proposed that address the problem indirectly by progressively aligning the sequences in pairs according to their relatedness. These methods, however, cannot revise the alignment of an already aligned group of sequences in view of new data, which compromises the quality of the resulting alignment. In this paper we present ReformAlign, a novel meta-alignment approach that may significantly improve the quality of the alignments produced by popular aligners. We call ReformAlign a meta-aligner because it requires an initial alignment, for which a variety of alignment programs can be used. The main idea behind ReformAlign is straightforward: first, an existing alignment is used to construct a standard profile that summarizes the initial alignment, and then all sequences are individually re-aligned against this profile. From each sequence-profile comparison, the alignment of each sequence against the profile is recorded, and the final alignment is inferred indirectly by merging all the individual sub-alignments into a unified set. ReformAlign may often produce alignments that are significantly more accurate than the starting alignments. We evaluated the effect of ReformAlign on the alignments generated by ten leading alignment methods using real data of variable size and sequence identity. The experimental results suggest that the proposed meta-aligner approach may often lead to statistically significantly more accurate alignments. Furthermore, we show that ReformAlign results in more substantial improvement in

  15. Accuracy Estimation and Parameter Advising for Protein Multiple Sequence Alignment

    PubMed Central

    DeBlasio, Dan

    2013-01-01

    We develop a novel and general approach to estimating the accuracy of multiple sequence alignments without knowledge of a reference alignment, and use our approach to address a new task that we call parameter advising: the problem of choosing values for alignment scoring function parameters from a given set of choices to maximize the accuracy of a computed alignment. For protein alignments, we consider twelve independent features that contribute to a quality alignment. An accuracy estimator is learned that is a polynomial function of these features; its coefficients are determined by minimizing its error with respect to true accuracy using mathematical optimization. Compared to prior approaches for estimating accuracy, our new approach (a) introduces novel feature functions that measure nonlocal properties of an alignment yet are fast to evaluate, (b) considers more general classes of estimators beyond linear combinations of features, and (c) develops new regression formulations for learning an estimator from examples; in addition, for parameter advising, we (d) determine the optimal parameter set of a given cardinality, which specifies the best parameter values from which to choose. Our estimator, which we call Facet (for “feature-based accuracy estimator”), yields a parameter advisor that on the hardest benchmarks provides more than a 27% improvement in accuracy over the best default parameter choice, and for parameter advising significantly outperforms the best prior approaches to assessing alignment quality. PMID:23489379
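    The estimator-plus-advisor loop can be sketched as follows. This assumes the simplest case of a linear estimator (a degree-1 polynomial) fitted by ordinary least squares with NumPy; Facet's actual feature functions, higher-degree terms and regression formulations are not reproduced, and the feature values and parameter-set names in the example are invented for illustration.

```python
import numpy as np


def fit_estimator(features, accuracies):
    """Fit accuracy ~ c0 + c . features by ordinary least squares."""
    X = np.column_stack([np.ones(len(features)), np.asarray(features, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(accuracies, dtype=float), rcond=None)
    return coef


def estimate(coef, feature_vector):
    """Estimated accuracy of one alignment from its feature vector."""
    return float(coef[0] + np.dot(coef[1:], feature_vector))


def advise(coef, candidates):
    """Return the parameter set whose alignment has the highest estimated accuracy.

    candidates maps a parameter-set label to the feature vector of the
    alignment computed with that parameter set.
    """
    return max(candidates, key=lambda p: estimate(coef, candidates[p]))


if __name__ == "__main__":
    # Invented training data: two features per alignment, known accuracies.
    train_x = [[0, 0], [1, 0], [0, 1], [1, 1]]
    train_y = [0.1, 0.6, 0.4, 0.9]
    coef = fit_estimator(train_x, train_y)
    print(advise(coef, {"default": [0.2, 0.2], "gap-heavy": [0.9, 0.1], "tuned": [0.7, 0.8]}))
```

    Advising then reduces to running the aligner once per candidate parameter set, computing features of each resulting alignment, and keeping the alignment whose estimated accuracy is highest; no reference alignment is needed at that stage.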

  16. A simple genetic algorithm for multiple sequence alignment.

    PubMed

    Gondro, C; Kinghorn, B P

    2007-10-05

    Multiple sequence alignment plays an important role in molecular sequence analysis. An alignment is the arrangement of two (pairwise alignment) or more (multiple alignment) sequences of 'residues' (nucleotides or amino acids) that maximizes the similarities between them. Algorithmically, the problem consists of opening and extending gaps in the sequences to maximize an objective function (measurement of similarity). A simple genetic algorithm was developed and implemented in the software MSA-GA. Genetic algorithms, a class of evolutionary algorithms, are well suited for problems of this nature since residues and gaps are discrete units. An evolutionary algorithm cannot compete in terms of speed with progressive alignment methods, but it has the advantage of being able to correct for initially misaligned sequences, which is not possible with the progressive method. This was shown using the BaliBase benchmark, where Clustal-W alignments were used to seed the initial population in MSA-GA, improving the outcome. Alignment scoring functions still constitute an open field of research, and it is important to develop methods that simplify the testing of new functions. A general evolutionary framework for testing and implementing different scoring functions was developed. The results show that a simple genetic algorithm is capable of optimizing an alignment without the need for the excessively complex operators used in prior studies. The clear distinction between the objective function and the genetic algorithm in MSA-GA makes extending and/or replacing objective functions a trivial task.
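    The separation between objective function and genetic operators that the abstract emphasizes can be seen in a minimal Python sketch. Everything here is an assumption rather than MSA-GA's design: the sum-of-pairs scores (match +1, mismatch -1, residue-gap -2), size-2 tournament selection, uniform row crossover, a gap-moving mutation, and random initial gap placement in place of seeding from Clustal-W alignments.

```python
import random
from itertools import combinations

# Hypothetical scoring scheme (not MSA-GA's published objective function).
MATCH, MISMATCH, GAP_PEN = 1, -1, -2


def sp_score(rows):
    """Sum-of-pairs objective over every pair of rows and every column."""
    total = 0
    for r1, r2 in combinations(rows, 2):
        for a, b in zip(r1, r2):
            if a == "-" and b == "-":
                continue  # gap-gap positions are ignored
            if a == "-" or b == "-":
                total += GAP_PEN
            else:
                total += MATCH if a == b else MISMATCH
    return total


def random_gapped(seq, length, rng):
    """Insert gaps at random positions until seq reaches the target length."""
    row = list(seq)
    for _ in range(length - len(seq)):
        row.insert(rng.randint(0, len(row)), "-")
    return "".join(row)


def move_gap(row, rng):
    """Mutation: move one randomly chosen gap to a random new position."""
    gaps = [i for i, ch in enumerate(row) if ch == "-"]
    if not gaps:
        return row
    chars = list(row)
    del chars[rng.choice(gaps)]
    chars.insert(rng.randint(0, len(chars)), "-")
    return "".join(chars)


def evolve(seqs, extra_gaps=2, pop_size=30, generations=200, seed=0):
    """Evolve a population of alignments; returns the best one found."""
    rng = random.Random(seed)
    length = max(map(len, seqs)) + extra_gaps
    pop = [[random_gapped(s, length, rng) for s in seqs] for _ in range(pop_size)]
    best = max(pop, key=sp_score)
    for _ in range(generations):
        children = [best]  # elitism: the current best alignment always survives
        while len(children) < pop_size:
            # Tournament selection of two parents (size-2 tournaments).
            p1 = max(rng.sample(pop, 2), key=sp_score)
            p2 = max(rng.sample(pop, 2), key=sp_score)
            # Uniform row crossover: each row is copied whole from one parent,
            # so every child row is still a gapped copy of its input sequence.
            child = [rng.choice(pair) for pair in zip(p1, p2)]
            i = rng.randrange(len(child))
            child[i] = move_gap(child[i], rng)
            children.append(child)
        pop = children
        best = max(pop, key=sp_score)
    return best


if __name__ == "__main__":
    for row in evolve(["ACGTT", "ACGT", "AGTT"]):
        print(row)
```

    Because sp_score is the only function that knows about scoring, trying a different objective means replacing that one function, which mirrors the design point made in the abstract.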

  17. Phylo: A Citizen Science Approach for Improving Multiple Sequence Alignment

    PubMed Central

    Kam, Alfred; Kwak, Daniel; Leung, Clarence; Wu, Chu; Zarour, Eleyine; Sarmenta, Luis; Blanchette, Mathieu; Waldispühl, Jérôme

    2012-01-01

    Background: Comparative genomics, or the study of the relationships of genome structure and function across different species, offers a powerful tool for studying evolution, annotating genomes, and understanding the causes of various genetic disorders. However, aligning multiple sequences of DNA, an essential intermediate step for most types of analyses, is a difficult computational task. In parallel, citizen science, an approach that takes advantage of the fact that the human brain is exquisitely tuned to solving specific types of problems, is becoming increasingly popular. There, instances of hard computational problems are dispatched to a crowd of non-expert human game players and solutions are sent back to a central server. Methodology/Principal Findings: We introduce Phylo, a human-based computing framework applying “crowd sourcing” techniques to solve the Multiple Sequence Alignment (MSA) problem. The key idea of Phylo is to convert the MSA problem into a casual game that can be played by ordinary web users with minimal prior knowledge of the biological context. We applied this strategy to improve the alignment of the promoters of disease-related genes from up to 44 vertebrate species. Since the launch in November 2010, we have received more than 350,000 solutions submitted by more than 12,000 registered users. Our results show that the submitted solutions contributed to improving the accuracy of up to 70% of the alignment blocks considered. Conclusions/Significance: We demonstrate that, combined with classical algorithms, crowd computing techniques can be successfully used to help improve the accuracy of MSA. More importantly, we show that an NP-hard computational problem can be embedded in a casual game that can be easily played by people without significant scientific training. This suggests that citizen science approaches can be used to exploit the billions of “human-brain peta-flops” of computation that are spent every day playing games. Phylo is

  18. Phylo: a citizen science approach for improving multiple sequence alignment.

    PubMed

    Kawrykow, Alexander; Roumanis, Gary; Kam, Alfred; Kwak, Daniel; Leung, Clarence; Wu, Chu; Zarour, Eleyine; Sarmenta, Luis; Blanchette, Mathieu; Waldispühl, Jérôme

    2012-01-01

    Comparative genomics, or the study of the relationships of genome structure and function across different species, offers a powerful tool for studying evolution, annotating genomes, and understanding the causes of various genetic disorders. However, aligning multiple sequences of DNA, an essential intermediate step for most types of analyses, is a difficult computational task. In parallel, citizen science, an approach that takes advantage of the fact that the human brain is exquisitely tuned to solving specific types of problems, is becoming increasingly popular. There, instances of hard computational problems are dispatched to a crowd of non-expert human game players and solutions are sent back to a central server. We introduce Phylo, a human-based computing framework applying "crowd sourcing" techniques to solve the Multiple Sequence Alignment (MSA) problem. The key idea of Phylo is to convert the MSA problem into a casual game that can be played by ordinary web users with minimal prior knowledge of the biological context. We applied this strategy to improve the alignment of the promoters of disease-related genes from up to 44 vertebrate species. Since the launch in November 2010, we have received more than 350,000 solutions submitted by more than 12,000 registered users. Our results show that the submitted solutions contributed to improving the accuracy of up to 70% of the alignment blocks considered. We demonstrate that, combined with classical algorithms, crowd computing techniques can be successfully used to help improve the accuracy of MSA. More importantly, we show that an NP-hard computational problem can be embedded in a casual game that can be easily played by people without significant scientific training. This suggests that citizen science approaches can be used to exploit the billions of "human-brain peta-flops" of computation that are spent every day playing games. Phylo is available at: http://phylo.cs.mcgill.ca.

  19. Evaluating the Accuracy and Efficiency of Multiple Sequence Alignment Methods

    PubMed Central

    Pervez, Muhammad Tariq; Babar, Masroor Ellahi; Nadeem, Asif; Aslam, Muhammad; Awan, Ali Raza; Aslam, Naeem; Hussain, Tanveer; Naveed, Nasir; Qadri, Salman; Waheed, Usman; Shoaib, Muhammad

    2014-01-01

    A comparison of the 10 most popular Multiple Sequence Alignment (MSA) tools, namely MUSCLE, MAFFT (L-INS-i), MAFFT (FFT-NS-2), T-Coffee, ProbCons, SATe, Clustal Omega, Kalign, Multalin, and Dialign-TX, is presented. We also focused on the significance of some implementations embedded in the algorithm of each tool. Based on 10 simulated trees with different numbers of taxa generated in R, 400 known alignments and sequence files were constructed using indel-Seq-Gen. A total of 4000 test alignments were generated to study the effect of sequence length, indel size, deletion rate, and insertion rate. Results showed that alignment quality was highly dependent on the number of deletions and insertions in the sequences, and that sequence length and indel size had a weaker effect. Overall, ProbCons was consistently at the top of the list of evaluated MSA tools. SATe, while slightly less accurate, was 529.10% faster than ProbCons and 236.72% faster than MAFFT (L-INS-i). Among the other tools, Kalign and MUSCLE achieved the highest sum-of-pairs scores. We also considered the BAliBASE benchmark datasets, and the results relative to BAliBASE- and indel-Seq-Gen-generated alignments were consistent in most cases. PMID:25574120

  20. Inferring phylogenies of evolving sequences without multiple sequence alignment.

    PubMed

    Chan, Cheong Xin; Bernard, Guillaume; Poirion, Olivier; Hogan, James M; Ragan, Mark A

    2014-09-30

    Alignment-free methods, in which shared properties of sub-sequences (e.g. identity or match length) are extracted and used to compute a distance matrix, have recently been explored for phylogenetic inference. However, the scalability and robustness of these methods to key evolutionary processes remain to be investigated. Here, using simulated sequence sets of various sizes in both nucleotides and amino acids, we systematically assess the accuracy of phylogenetic inference using an alignment-free approach, based on D2 statistics, under different evolutionary scenarios. We find that compared to a multiple sequence alignment approach, D2 methods are more robust against among-site rate heterogeneity, compositional biases, genetic rearrangements and insertions/deletions, but are more sensitive to recent sequence divergence and sequence truncation. Across diverse empirical datasets, the alignment-free methods perform well for sequences sharing low divergence, at greater computation speed. Our findings provide strong evidence for the scalability and the potential use of alignment-free methods in large-scale phylogenomics.
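    The D2 statistic itself is simple: it counts matching k-mer occurrences between two sequences, D2 = Σ_w X_w · Y_w, where X_w and Y_w are the counts of word w in each sequence. The Python sketch below computes it and converts it into a distance using a cosine-style normalization; that normalization is one common choice and an assumption here, since the abstract does not define which D2 variants were used.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer (word of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x, y, k):
    """D2 statistic: sum over words w of count_x(w) * count_y(w)."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(n * cy[w] for w, n in cx.items())  # missing words count as 0


def d2_distance(x, y, k):
    """Cosine-style normalization (an assumption): 0 when k-mer count vectors
    are proportional, 1 when no k-mer is shared.

    Both sequences must be at least k long, otherwise the denominator is zero.
    """
    return 1.0 - d2(x, y, k) / sqrt(d2(x, x, k) * d2(y, y, k))


if __name__ == "__main__":
    print(d2("ACGT", "ACGT", 2))  # 3: AC, CG and GT each occur once in both
```

    A matrix of such pairwise distances can then be passed to a distance-based tree builder such as neighbor-joining, the usual route from alignment-free statistics to a phylogeny.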

  1. Inferring phylogenies of evolving sequences without multiple sequence alignment

    PubMed Central

    Chan, Cheong Xin; Bernard, Guillaume; Poirion, Olivier; Hogan, James M.; Ragan, Mark A.

    2014-01-01

    Alignment-free methods, in which shared properties of sub-sequences (e.g. identity or match length) are extracted and used to compute a distance matrix, have recently been explored for phylogenetic inference. However, the scalability and robustness of these methods to key evolutionary processes remain to be investigated. Here, using simulated sequence sets of various sizes in both nucleotides and amino acids, we systematically assess the accuracy of phylogenetic inference using an alignment-free approach, based on D2 statistics, under different evolutionary scenarios. We find that compared to a multiple sequence alignment approach, D2 methods are more robust against among-site rate heterogeneity, compositional biases, genetic rearrangements and insertions/deletions, but are more sensitive to recent sequence divergence and sequence truncation. Across diverse empirical datasets, the alignment-free methods perform well for sequences sharing low divergence, at greater computation speed. Our findings provide strong evidence for the scalability and the potential use of alignment-free methods in large-scale phylogenomics. PMID:25266120

  2. Improved multiple sequence alignments using coupled pattern mining.

    PubMed

    Hossain, K S M Tozammel; Patnaik, Debprakash; Laxman, Srivatsan; Jain, Prateek; Bailey-Kellogg, Chris; Ramakrishnan, Naren

    2013-01-01

    We present alignment refinement by mining coupled residues (ARMiCoRe), a novel approach to a classical bioinformatics problem, viz., multiple sequence alignment (MSA) of gene and protein sequences. Aligning multiple biological sequences is a key step in elucidating evolutionary relationships, annotating newly sequenced segments, and understanding the relationship between biological sequences and functions. Classical MSA algorithms are designed primarily to capture conservation in sequences, whereas couplings, or correlated mutations, are well known as an additional important aspect of sequence evolution. (Two sequence positions are coupled when mutations in one are accompanied by compensatory mutations in the other.) As a result, better exposition of couplings is sometimes one of the reasons practitioners hand-tweak MSAs. ARMiCoRe introduces a distinct pattern-mining approach to improving MSAs: using frequent episode mining as a foundational basis, we define the notion of a coupled pattern and demonstrate how the discovery and tiling of coupled patterns using a max-flow approach can yield MSAs that are better than conservation-based alignments. Although we were motivated to improve MSAs for the sake of better exposing couplings, we demonstrate that our MSAs are also improvements in terms of traditional assessment metrics. We demonstrate the effectiveness of ARMiCoRe on a large collection of data sets.

  3. RNA-RNA interaction prediction based on multiple sequence alignments.

    PubMed

    Li, Andrew X; Marz, Manja; Qin, Jing; Reidys, Christian M

    2011-02-15

    Many computerized methods for RNA-RNA interaction structure prediction have been developed. Recently, $O(N^6)$ time and $O(N^4)$ space dynamic programming algorithms have become available that compute the partition function of RNA-RNA interaction complexes. However, few of these methods incorporate knowledge of related sequences, so relevant evolutionary information is often neglected in structure determination. It is therefore of considerable practical interest to introduce a method that takes both into consideration: thermodynamic stability as well as sequence/structure covariation. We present the a priori folding algorithm ripalign, whose input consists of two (given) multiple sequence alignments (MSA). ripalign outputs (i) the partition function, (ii) base pairing probabilities, (iii) hybrid probabilities and (iv) a set of Boltzmann-sampled suboptimal structures consisting of canonical joint structures that are compatible with the alignments. Compared to the single sequence-pair folding algorithm rip, ripalign requires negligible additional memory but offers much better sensitivity and specificity, provided alignments of suitable quality are given. ripalign also allows structure constraints to be incorporated as input parameters. The algorithm described here is implemented in C as part of the rip package.

  4. Score distributions of gapped multiple sequence alignments down to the low-probability tail

    NASA Astrophysics Data System (ADS)

    Fieth, Pascal; Hartmann, Alexander K.

    2016-08-01

    Assessing the significance of alignment scores of optimally aligned DNA or amino acid sequences can be achieved via knowledge of the score distribution of random sequences. But this requires obtaining the distribution in the biologically relevant high-scoring region, where the probabilities are exponentially small. For gapless local alignments of infinitely long sequences this distribution is known analytically to follow a Gumbel distribution. Distributions for gapped local alignments and global alignments of finite lengths can only be obtained numerically. To obtain results for the small-probability region, specific rare-event algorithms based on statistical mechanics can be applied. In previous studies, this was achieved for pairwise alignments. They showed that, contrary to results from earlier simple-sampling studies, strong deviations from the Gumbel distribution occur for finite sequence lengths. Here we extend these studies to multiple sequence alignments with gaps, which are much more relevant for practical applications in molecular biology. We study the distributions of scores over a large range of the support, reaching probabilities as small as $10^{-160}$, for global and local (sum-of-pairs scores) multiple alignments. We find that even after suitable rescaling, eliminating the sequence-length dependence, the distributions for multiple alignments differ from the pairwise alignment case. Furthermore, we show that the previously discussed Gaussian correction to the Gumbel distribution needs to be refined, also for the case of pairwise alignments.
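    For reference, the analytic result this abstract starts from is the standard Karlin-Altschul extreme-value (Gumbel) form for the maximal score S of a gapless local alignment between two random sequences of lengths m and n, with background letter frequencies p_a and substitution scores s(a,b). It is the textbook statement, not a formula taken from this paper, and it holds asymptotically for long sequences when the expected pair score is negative and a positive score is possible; the paper's point is precisely that finite-length, gapped and multiple alignments deviate from it.

```latex
\Pr(S \ge x) \;\approx\; 1 - \exp\!\left(-K\,m\,n\,e^{-\lambda x}\right)
\;\approx\; K\,m\,n\,e^{-\lambda x} \quad (\text{large } x),
\qquad \text{where } \lambda > 0 \text{ solves } \sum_{a,b} p_a\,p_b\,e^{\lambda s(a,b)} = 1 .
```

    Here K is a constant that depends on the scoring system and letter frequencies; the linear dependence on mn is the sequence-length dependence that rescaling removes.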

  5. A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery

    PubMed Central

    Yen, Ian E. H.; Lin, Xin; Zhang, Jiong; Ravikumar, Pradeep; Dhillon, Inderjit S.

    2016-01-01

    Multiple Sequence Alignment and Motif Discovery, both known to be NP-hard problems, are two fundamental tasks in Bioinformatics. Existing approaches to these two problems are based on either local search methods, such as Expectation Maximization (EM) and Gibbs Sampling, or greedy heuristic methods. In this work, we develop a convex relaxation approach to both problems based on the recent concept of the atomic norm, and develop a new algorithm, termed the Greedy Direction Method of Multiplier, for solving the convex relaxation with two convex atomic constraints. Experiments show that our convex relaxation approach produces solutions of higher quality than the standard tools widely used in the Bioinformatics community on the Multiple Sequence Alignment and Motif Discovery problems. PMID:27559428

  6. The Construction and Use of Log-Odds Substitution Scores for Multiple Sequence Alignment

    PubMed Central

    Altschul, Stephen F.; Wootton, John C.; Zaslavsky, Elena; Yu, Yi-Kuo

    2010-01-01

    Most pairwise and multiple sequence alignment programs seek alignments with optimal scores. Central to defining such scores is selecting a set of substitution scores for aligned amino acids or nucleotides. For local pairwise alignment, substitution scores are implicitly of log-odds form. We now extend the log-odds formalism to multiple alignments, using Bayesian methods to construct “BILD” (“Bayesian Integral Log-odds”) substitution scores from prior distributions describing columns of related letters. This approach has been used previously only to define scores for aligning individual sequences to sequence profiles, but it has much broader applicability. We describe how to calculate BILD scores efficiently, and illustrate their uses in Gibbs sampling optimization procedures, gapped alignment, and the construction of hidden Markov model profiles. BILD scores enable automated selection of optimal motif and domain model widths, and can inform the decision of whether to include a sequence in a multiple alignment, and the selection of insertion and deletion locations. Other applications include the classification of related sequences into subfamilies, and the definition of profile-profile alignment scores. Although a fully realized multiple alignment program must rely upon more than substitution scores, many existing multiple alignment programs can be modified to employ BILD scores. We illustrate how simple BILD score based strategies can enhance the recognition of DNA binding domains, including the Api-AP2 domain in Toxoplasma gondii and Plasmodium falciparum. PMID:20657661
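    The abstract notes that, for local pairwise alignment, substitution scores are implicitly of log-odds form: s(a,b) is proportional to log(q_ab / (p_a p_b)), where q_ab is the target frequency of the aligned pair and p_a, p_b are background frequencies. The sketch below computes such scores in half-bit units (base-2 logarithm scaled by 2, the convention used for BLOSUM matrices) for a toy two-letter alphabet. It illustrates only the pairwise log-odds form that BILD generalises, not the BILD computation itself, and the frequencies are invented for the example.

```python
from math import log2


def log_odds_scores(target: dict, background: dict, scale: float = 2.0) -> dict:
    """Integer log-odds scores s(a,b) = round(scale * log2(q_ab / (p_a * p_b))).

    target:     joint frequencies q_ab of aligned pairs in related sequences
    background: letter frequencies p_a in unrelated sequences
    """
    return {
        (a, b): round(scale * log2(q / (background[a] * background[b])))
        for (a, b), q in target.items()
    }


# Toy example (made-up frequencies): identical letters are over-represented
# in related sequences relative to chance, so they receive positive scores.
TARGET = {("A", "A"): 0.4, ("A", "B"): 0.1, ("B", "A"): 0.1, ("B", "B"): 0.4}
BACKGROUND = {"A": 0.5, "B": 0.5}
SCORES = log_odds_scores(TARGET, BACKGROUND)
```

    Pairs seen more often in related sequences than expected by chance score positively, rarer pairs negatively; this is the property BILD carries over from pairs of letters to whole alignment columns.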

  7. The construction and use of log-odds substitution scores for multiple sequence alignment.

    PubMed

    Altschul, Stephen F; Wootton, John C; Zaslavsky, Elena; Yu, Yi-Kuo

    2010-07-15

    Most pairwise and multiple sequence alignment programs seek alignments with optimal scores. Central to defining such scores is selecting a set of substitution scores for aligned amino acids or nucleotides. For local pairwise alignment, substitution scores are implicitly of log-odds form. We now extend the log-odds formalism to multiple alignments, using Bayesian methods to construct "BILD" ("Bayesian Integral Log-odds") substitution scores from prior distributions describing columns of related letters. This approach has been used previously only to define scores for aligning individual sequences to sequence profiles, but it has much broader applicability. We describe how to calculate BILD scores efficiently, and illustrate their uses in Gibbs sampling optimization procedures, gapped alignment, and the construction of hidden Markov model profiles. BILD scores enable automated selection of optimal motif and domain model widths, and can inform the decision of whether to include a sequence in a multiple alignment, and the selection of insertion and deletion locations. Other applications include the classification of related sequences into subfamilies, and the definition of profile-profile alignment scores. Although a fully realized multiple alignment program must rely upon more than substitution scores, many existing multiple alignment programs can be modified to employ BILD scores. We illustrate how simple BILD score based strategies can enhance the recognition of DNA binding domains, including the Api-AP2 domain in Toxoplasma gondii and Plasmodium falciparum.

  8. TM-Aligner: Multiple sequence alignment tool for transmembrane proteins with reduced time and improved accuracy.

    PubMed

    Bhat, Basharat; Ganai, Nazir A; Andrabi, Syed Mudasir; Shah, Riaz A; Singh, Ashutosh

    2017-10-02

    Membrane proteins play significant roles in living cells. Transmembrane proteins are estimated to constitute approximately 30% of proteins at the genomic scale. It has been difficult to develop specific alignment tools for transmembrane proteins because of the limited number of experimentally validated protein structures. Alignment tools based on homology modeling provide fairly good results, recapitulating 70-80% of residues in reference alignments, provided that all input sequences have known template structures. However, homology modeling tools take a substantial amount of time, so aligning large numbers of sequences becomes computationally demanding. Here we present TM-Aligner, a new tool for transmembrane protein sequence alignment. TM-Aligner is based on the Wu-Manber and dynamic string matching algorithms, which significantly improve the accuracy and speed of its multiple sequence alignment. We compared TM-Aligner with other popular tools and benchmarked it on three separate reference sets: the BAliBASE 3.0 reference set 7 of alpha-helical transmembrane proteins, structure-based alignments of transmembrane proteins from the Pfam database, and structure alignments from GPCRDB. Benchmarking against the reference datasets indicated that TM-Aligner is a more advanced method, with the shortest turnaround time and significant improvements over the most accurate methods such as PROMALS, MAFFT, TM-Coffee, Kalign, ClustalW, Muscle and PRALINE. TM-Aligner is freely available at http://lms.snu.edu.in/TM-Aligner/.

  9. A simple method to control over-alignment in the MAFFT multiple sequence alignment program.

    PubMed

    Katoh, Kazutaka; Standley, Daron M

    2016-07-01

    We present a new feature of the MAFFT multiple alignment program for suppressing over-alignment (aligning unrelated segments). Conventional MAFFT is highly sensitive in aligning conserved regions in remote homologs, but the risk of over-alignment has recently been growing, as low-quality or noisy sequences are increasing in protein sequence databases due, for example, to sequencing errors and difficulties in gene prediction. The proposed method uses a variable scoring matrix for different pairs of sequences (or groups) in a single multiple sequence alignment, based on the global similarity of each pair. This method significantly increases the number of correctly gapped sites in real examples and in simulations under various conditions. Regarding sensitivity, the effect of the proposed method is slightly negative in real protein-based benchmarks, and mostly neutral in simulation-based benchmarks. This approach is based on natural biological reasoning and should be compatible with many methods based on dynamic programming for multiple sequence alignment. The new feature is available in MAFFT versions 7.263 and higher: http://mafft.cbrc.jp/alignment/software/. Contact: katoh@ifrec.osaka-u.ac.jp. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  10. Lower bounds on multiple sequence alignment using exact 3-way alignment.

    PubMed

    Colbourn, Charles J; Kumar, Sudhir

    2007-04-30

    Multiple sequence alignment is fundamental. Exponential growth in computation time appears to be inevitable when an optimal alignment is required for many sequences. Exact costs of optimum alignments are therefore rarely computed. Consequently much effort has been invested in algorithms for alignment that are heuristic, or explore a restricted class of solutions. These give an upper bound on the alignment cost, but it is equally important to determine the quality of the solution obtained. In the absence of an optimal alignment with which to compare, lower bounds may be calculated to assess the quality of the alignment. As more effort is invested in improving upper bounds (alignment algorithms), it is therefore important to improve lower bounds as well. Although numerous cost metrics can be used to determine the quality of an alignment, many are based on sum-of-pairs (SP) measures and their generalizations. Two standard and two new methods are considered for using exact 2-way and 3-way alignments to compute lower bounds on total SP alignment cost; one new method fares well with respect to accuracy, while the other reduces the computation time. The first employs exhaustive computation of exact 3-way alignments, while the second employs an efficient heuristic to compute a much smaller number of exact 3-way alignments. Calculating all 3-way alignments exactly and computing their average improves lower bounds on sum of SP cost in v-way alignments. However judicious selection of a subset of all 3-way alignments can yield a further improvement with minimal additional effort. On the other hand, a simple heuristic to select a random subset of 3-way alignments (a random packing) yields accuracy comparable to averaging all 3-way alignments with substantially less computational effort. Calculation of lower bounds on SP cost (and thus the quality of an alignment) can be improved by employing a mixture of 3-way and 2-way alignments.
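    The simplest of the bounds discussed here is the 2-way one: every pair of rows in an MSA induces a pairwise alignment (after dropping gap-gap columns, which cost nothing), so its cost can be no lower than the optimal pairwise cost, and summing over pairs bounds the SP cost of any MSA from below. The sketch assumes unit mismatch and unit (linear) gap costs purely for illustration; it does not implement the paper's 3-way bounds, and the function names are hypothetical.

```python
from itertools import combinations

GAP = "-"


def edit_cost(a: str, b: str) -> int:
    """Optimal global pairwise alignment cost with unit mismatch and unit gap costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # ca aligned to a gap
                           cur[j - 1] + 1,            # cb aligned to a gap
                           prev[j - 1] + (ca != cb)))  # match or mismatch
        prev = cur
    return prev[-1]


def column_cost(x: str, y: str) -> int:
    """Cost of one column of an induced pairwise alignment; gap-gap columns are free."""
    if x == GAP and y == GAP:
        return 0
    if x == GAP or y == GAP:
        return 1
    return int(x != y)


def sp_cost(msa: list[str]) -> int:
    """Sum-of-pairs cost: total cost of all pairwise alignments induced by the MSA rows."""
    assert len({len(r) for r in msa}) == 1, "rows must have equal length"
    return sum(column_cost(x, y)
               for r1, r2 in combinations(msa, 2)
               for x, y in zip(r1, r2))


def sp_lower_bound(seqs: list[str]) -> int:
    """2-way lower bound: sum of optimal pairwise costs of the unaligned sequences."""
    return sum(edit_cost(a, b) for a, b in combinations(seqs, 2))


MSA = ["AC-GT", "ACTGT", "A--GT"]
SEQS = [row.replace(GAP, "") for row in MSA]
```

    When the bound equals the SP cost of a heuristic alignment, as it does for `MSA` here, that alignment is provably optimal; otherwise the difference measures how far the heuristic could at most be from optimal.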

  11. webPRANK: a phylogeny-aware multiple sequence aligner with interactive alignment browser.

    PubMed

    Löytynoja, Ari; Goldman, Nick

    2010-11-26

    Phylogeny-aware progressive alignment has been found to perform well in phylogenetic alignment benchmarks and to produce superior alignments for the inference of selection on codon sequences. Its implementation in the PRANK alignment program package also allows modelling of complex evolutionary processes and inference of posterior probabilities for sequence sites evolving under each distinct scenario, either simultaneously with the alignment of sequences or as a post-processing step for an existing alignment. This has led to software with many advanced features, and users may find it difficult to generate optimal alignments, visualise the full information in their alignment results, or post-process these results, e.g. by objectively selecting subsets of alignment sites. We have created a web server called webPRANK that provides an easy-to-use interface to the PRANK phylogeny-aware alignment algorithm. The webPRANK server supports the alignment of DNA, protein and codon sequences as well as protein-translated alignment of cDNAs, and includes built-in structure models for the alignment of genomic sequences. The resulting alignments can be exported in various formats widely used in evolutionary sequence analyses. The webPRANK server also includes a powerful web-based alignment browser for the visualisation and post-processing of the results in the context of a cladogram relating the sequences, allowing (e.g.) removal of alignment columns with low posterior reliability. In addition to de novo alignments, webPRANK can be used for the inference of ancestral sequences with phylogenetically realistic gap patterns, and for the annotation and post-processing of existing alignments. The webPRANK server is freely available on the web at http://tinyurl.com/webprank . The webPRANK server incorporates phylogeny-aware multiple sequence alignment, visualisation and post-processing in an easy-to-use web interface. It widens the user base of phylogeny-aware multiple sequence

  12. M2Align: parallel multiple sequence alignment with a multi-objective metaheuristic.

    PubMed

    Zambrano-Vega, Cristian; Nebro, Antonio J; García-Nieto, José; Aldana-Montes, José F

    2017-10-01

    Multiple sequence alignment (MSA) is an NP-complete optimization problem found in computational biology, where the time complexity of finding an optimal alignment grows exponentially with the number of sequences and their lengths. Additionally, a number of objectives can be taken into account to assess the quality of an MSA, such as maximizing the sum-of-pairs, maximizing the number of totally conserved columns, minimizing the number of gaps, or maximizing structural-information-based scores such as STRIKE. One approach to MSA problems is to use multi-objective metaheuristics, which are non-exact stochastic optimization methods that can produce high-quality solutions to complex problems with two or more objectives to be optimized simultaneously. Our motivation is to provide a multi-objective metaheuristic for MSA that can run in parallel, taking advantage of multi-core computers. The software tool we propose, called M2Align (Multi-objective Multiple Sequence Alignment), is a parallel and more efficient version of the three-objective optimizer for sequence alignments MO-SAStrE, capable of reducing the algorithm's computing time by exploiting the computing capabilities of common multi-core CPU clusters. Our performance evaluation on datasets from the BAliBASE (v3.0) benchmark shows that significant time reductions can be achieved using up to 20 cores. Even in sequential executions, M2Align is faster than MO-SAStrE, thanks to the encoding method used for the alignments. M2Align is an open source project hosted on GitHub, where the source code and sample datasets can be freely obtained: https://github.com/KhaosResearch/M2Align. Contact: antonio@lcc.uma.es. Supplementary data are available at Bioinformatics online.
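    Two of the objectives listed in this abstract are simple to compute directly from an alignment. The sketch below evaluates totally conserved columns and two common readings of "number of gaps" (gap characters and gap openings); which definition M2Align or MO-SAStrE actually uses is not stated here, so both are shown as assumptions, and STRIKE is omitted because it needs structural data.

```python
import re

GAP = "-"


def totally_conserved_columns(msa: list[str]) -> int:
    """Columns in which every row carries the same residue and no row has a gap."""
    return sum(1 for col in zip(*msa) if col[0] != GAP and len(set(col)) == 1)


def gap_characters(msa: list[str]) -> int:
    """Total number of gap symbols in the alignment."""
    return sum(row.count(GAP) for row in msa)


def gap_openings(msa: list[str]) -> int:
    """Number of maximal gap runs, a count that ignores how long each gap is."""
    return sum(len(re.findall(r"-+", row)) for row in msa)


def objectives(msa: list[str]) -> tuple[int, int, int]:
    """Hypothetical objective vector: (TC to maximise, gap chars to minimise, openings to minimise)."""
    return totally_conserved_columns(msa), gap_characters(msa), gap_openings(msa)
```

    A multi-objective metaheuristic compares candidate alignments by such vectors rather than by a single combined score.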

  13. An enhanced algorithm for multiple sequence alignment of protein sequences using genetic algorithm.

    PubMed

    Kumar, Manish

    2015-01-01

    One of the most fundamental operations in biological sequence analysis is multiple sequence alignment (MSA). The basic goal of the multiple sequence alignment problem is to determine the most biologically plausible alignment of protein or DNA sequences. In this paper, an alignment method based on a genetic algorithm is proposed for multiple sequence alignment. Two genetic operators, crossover and mutation, were defined and implemented in the proposed method in order to study the evolution of the population and the quality of the aligned sequences. The proposed method is assessed on a protein benchmark dataset, BAliBASE, by comparing the results obtained with those of other alignment algorithms, e.g., SAGA, RBT-GA, PRRP, HMMT, SB-PIMA, CLUSTALX, CLUSTAL W, DIALIGN and PILEUP8. Experiments on a wide range of data have shown that the proposed algorithm is much better (in terms of score) than previously proposed algorithms in its ability to achieve high alignment quality.

  14. An enhanced algorithm for multiple sequence alignment of protein sequences using genetic algorithm

    PubMed Central

    Kumar, Manish

    2015-01-01

    One of the most fundamental operations in biological sequence analysis is multiple sequence alignment (MSA). The basic goal of the multiple sequence alignment problem is to determine the most biologically plausible alignment of protein or DNA sequences. In this paper, an alignment method based on a genetic algorithm is proposed for multiple sequence alignment. Two genetic operators, crossover and mutation, were defined and implemented in the proposed method in order to study the evolution of the population and the quality of the aligned sequences. The proposed method is assessed on a protein benchmark dataset, BAliBASE, by comparing the results obtained with those of other alignment algorithms, e.g., SAGA, RBT-GA, PRRP, HMMT, SB-PIMA, CLUSTALX, CLUSTAL W, DIALIGN and PILEUP8. Experiments on a wide range of data have shown that the proposed algorithm is much better (in terms of score) than previously proposed algorithms in its ability to achieve high alignment quality. PMID:27065770
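    Genetic-algorithm aligners need mutation operators that change the gap layout without altering the underlying sequences. The sketch below shows one generic gap-shift mutation of that kind; it is an illustration under the assumption that individuals are lists of equal-length gapped rows, not the operator defined in this paper, and the names and the `rate` parameter are hypothetical.

```python
import random

GAP = "-"


def gap_shift_mutation(row: str, rng: random.Random) -> str:
    """Remove one randomly chosen gap and reinsert it at a random position.

    Residues and their order are unchanged, so the row still encodes the same
    sequence, and the row length (hence the alignment width) is preserved.
    """
    gap_positions = [i for i, c in enumerate(row) if c == GAP]
    if not gap_positions:
        return row
    chars = list(row)
    del chars[rng.choice(gap_positions)]
    chars.insert(rng.randrange(len(chars) + 1), GAP)
    return "".join(chars)


def mutate_alignment(msa: list[str], rng: random.Random, rate: float = 0.2) -> list[str]:
    """Apply gap_shift_mutation to each row independently with probability `rate`."""
    return [gap_shift_mutation(r, rng) if rng.random() < rate else r for r in msa]
```

    Because each row keeps its length and residue order, every mutated individual remains a valid alignment of the same input sequences, so fitness (for example a sum-of-pairs score) can be re-evaluated directly.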

  15. A new greedy randomised adaptive search procedure for Multiple Sequence Alignment.

    PubMed

    Layeb, Abdesslem; Selmane, Marwa; Elhoucine, Maroua Bencheikh

    2013-01-01

    The Multiple Sequence Alignment (MSA) is one of the most challenging tasks in bioinformatics. It consists of aligning several sequences to show the fundamental relationships and common characteristics among a set of protein or nucleic acid sequences; this problem has been shown to be NP-complete when the number of sequences is greater than 2. In this paper, a new incomplete algorithm based on a Greedy Randomised Adaptive Search Procedure (GRASP) is presented to deal with the MSA problem. The first phase of GRASP is a new greedy algorithm based on a new random progressive method and a hybrid global/local algorithm. The second phase is an adaptive refinement method based on consensus alignment. The results obtained are very encouraging and show the feasibility and effectiveness of the proposed approach.

  16. MSA-PAD: DNA multiple sequence alignment framework based on PFAM accessed domain information.

    PubMed

    Balech, Bachir; Vicario, Saverio; Donvito, Giacinto; Monaco, Alfonso; Notarangelo, Pasquale; Pesole, Graziano

    2015-08-01

    Here we present the MSA-PAD application, a DNA multiple sequence alignment framework that uses PFAM protein domain information to align DNA sequences encoding either single or multiple protein domains. MSA-PAD has two alignment options: gene and genome mode.

  17. IP-MSA: Independent order of progressive multiple sequence alignments using different substitution matrices

    NASA Astrophysics Data System (ADS)

    Boraik, Aziz Nasser; Abdullah, Rosni; Venkat, Ibrahim

    2014-12-01

    Multiple sequence alignment (MSA) is an essential process for many biological sequence analyses. Many algorithms have been developed to solve MSA, but an efficient computational method with very high accuracy is still a challenge. Progressive alignment is the most widely used approach to compute the final MSA. In this paper, we present a simple and effective progressive approach. Building on the independent-order progressive alignment proposed in QOMA, the method has been modified to align whole sequences so as to maximize the score of the MSA. Moreover, to further improve accuracy, we estimate the similarity of each pair of input sequences by their percent identity and, based on this measure, choose different substitution matrices during the progressive alignment. In addition, we include horizontal information in the alignment by adjusting the weights of amino acid residues based on their neighboring residues. The method was tested on the popular benchmarks BAliBASE 3.0 (global protein sequences) and IRMBASE 2.0 (local protein sequences). The proposed approach outperforms the original QOMA method in sum-of-pairs score and column score by up to 14% and 7%, respectively.
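    Choosing a substitution matrix from pairwise percent identity can be sketched as below. The abstract does not say which matrices or cut-offs IP-MSA uses, so the BLOSUM choices and thresholds are assumptions for illustration; the underlying idea is that higher-numbered BLOSUM matrices were derived from more closely related blocks and so suit more similar pairs. Percent identity itself has several definitions; this one counts only columns where neither row has a gap.

```python
GAP = "-"


def percent_identity(r1: str, r2: str) -> float:
    """Identical residue pairs as a percentage of columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(r1, r2) if x != GAP and y != GAP]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def choose_matrix(pid: float) -> str:
    """Pick a substitution matrix by similarity (hypothetical thresholds)."""
    if pid >= 60.0:
        return "BLOSUM80"   # closely related pair
    if pid >= 30.0:
        return "BLOSUM62"   # moderately related pair
    return "BLOSUM45"       # distantly related pair
```

    In a progressive aligner this choice would be made each time two sequences or profiles are merged, rather than once for the whole run.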

  18. Multiple sequence alignment in HTML: colored, possibly hyperlinked, compact representations.

    PubMed

    Campagne, F; Maigret, B

    1998-02-01

    Protein sequence alignments are widely used in protein structure prediction, protein engineering, protein modeling, etc. This type of representation is useful at different stages of scientific activity: looking at previous results, working on a research project, and presenting the results. There is a need to make alignments available over a network (intranet or WWW) in a way that allows biologists, chemists, and other non-computer specialists to look at the data and carry on research, possibly collaboratively. Previous methods (text-based, Java-based) are reported and their advantages are discussed. We have developed two novel approaches to represent alignments as colored, hyperlinked HTML pages. The first method creates an HTML page that makes efficient use of the WWW browser's image cache, so that the user waits for images to load over the network only for the first alignment viewed and can then browse other alignments without delay. The generated pages can be viewed with any HTML 2.0-compliant browser. The second method uses W3C CSS1 style sheets to render alignments; the pages it generates require recent browsers. We implemented these methods in the Viseur program and made available a WWW service that converts an MSF alignment file to HTML for WWW publishing. The service is available at http://www.lctn.u-nancy.fr/viseur/services.html.
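    The style-sheet approach can be illustrated with a short generator: each residue becomes a `span` whose class is coloured by a single CSS rule, so colours are defined once in the style sheet instead of being repeated for every residue. This is a minimal sketch, not the Viseur program's output; the residue groups, class names and colours are hypothetical, and the image-cache method is not reproduced.

```python
from html import escape

# Hypothetical residue groups and CSS classes, chosen only for illustration.
RESIDUE_CLASS = {
    **dict.fromkeys("AVLIMFWC", "hyd"),   # hydrophobic
    **dict.fromkeys("KRH", "pos"),        # positively charged
    **dict.fromkeys("DE", "neg"),         # negatively charged
    **dict.fromkeys("STNQGPY", "pol"),    # polar / other
}

CSS = """<style>
.aln { font-family: monospace }
.hyd { background: #80a0f0 } .pos { background: #f01505 }
.neg { background: #c048c0 } .pol { background: #15c015 }
</style>"""


def alignment_to_html(names: list[str], rows: list[str]) -> str:
    """Render an alignment as a <pre> block with one classed <span> per residue."""
    width = max(len(n) for n in names)
    lines = [CSS, '<pre class="aln">']
    for name, row in zip(names, rows):
        cells = "".join(
            f'<span class="{RESIDUE_CLASS[c]}">{c}</span>' if c in RESIDUE_CLASS
            else escape(c)                      # gaps and unknown symbols stay plain
            for c in row.upper()
        )
        lines.append(f"{escape(name.ljust(width))} {cells}")
    lines.append("</pre>")
    return "\n".join(lines)
```

    Changing the colour scheme then means editing only the `<style>` block, which is the main practical advantage of the style-sheet method over per-residue markup or images.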

  19. MSACompro: protein multiple sequence alignment using predicted secondary structure, solvent accessibility, and residue-residue contacts.

    PubMed

    Deng, Xin; Cheng, Jianlin

    2011-12-14

    Multiple Sequence Alignment (MSA) is a basic tool for bioinformatics research and analysis. It is used in almost all bioinformatics tasks, such as protein structure modeling, gene and protein function prediction, DNA motif recognition, and phylogenetic analysis. Therefore, improving the accuracy of multiple sequence alignment is important for advancing many bioinformatics fields. We designed and developed a new method, MSACompro, to synergistically incorporate predicted secondary structure, relative solvent accessibility, and residue-residue contact information into the currently most accurate posterior probability-based MSA methods to improve the accuracy of multiple sequence alignments. The method differs from multiple sequence alignment methods (e.g. 3D-Coffee) that use the tertiary structure information of some sequences, since the structural information used by our method is entirely predicted from sequence. To the best of our knowledge, applying predicted relative solvent accessibility and contact maps to multiple sequence alignment is novel. Rigorous benchmarking of our method on standard benchmarks (i.e. BAliBASE, SABmark and OXBENCH) clearly demonstrated that incorporating predicted protein structural information improves multiple sequence alignment accuracy over leading multiple protein sequence alignment tools that do not use this information, such as MSAProbs, ProbCons, Probalign, T-coffee, MAFFT and MUSCLE. The performance of the method is comparable to, with slightly lower scores than, the state-of-the-art method PROMALS, which uses structural features and additional homologous sequences. MSACompro is an efficient and reliable multiple protein sequence alignment tool that can effectively incorporate predicted protein structural information into multiple sequence alignment. The software is available at http://sysbio.rnet.missouri.edu/multicom_toolbox/.

  20. A review on multiple sequence alignment from the perspective of genetic algorithm.

    PubMed

    Chowdhury, Biswanath; Garai, Gautam

    2017-06-29

    Sequence alignment is an active research area in bioinformatics. It is also a crucial task, as it guides many other tasks such as phylogenetic analysis and function and/or structure prediction of biological macromolecules like DNA, RNA, and proteins. Proteins are the building blocks of every living organism. Although the protein alignment problem has been studied for several decades, different available methods unfortunately produce different alignments for the same alignment problem. Multiple sequence alignment is a computationally very complex problem. Many stochastic methods have therefore been considered for improving alignment accuracy, and among them the genetic algorithm is frequently used. In this study, we present the different types of methods applied to alignment and recent trends in multiobjective genetic algorithms for solving multiple sequence alignment. Many recent studies have demonstrated considerable progress in improving alignment accuracy. Copyright © 2017 Elsevier Inc. All rights reserved.

  1. Binary particle swarm optimization algorithm with mutation for multiple sequence alignment.

    PubMed

    Long, Hai-Xia; Xu, Wen-Bo; Sun, Jun

    2009-01-01

    Multiple sequence alignment (MSA) is a fundamental and challenging problem in the analysis of biological sequences. The MSA problem is hard to solve directly because its complexity grows exponentially with the scale of the problem. In this paper, we propose mutation-based binary particle swarm optimization (M-BPSO) for solving MSA. In the proposed M-BPSO algorithm, the BPSO algorithm is used to generate alignments, and a mutation operator is then applied to escape local optima and speed up convergence. Simulation results on nucleic acid and amino acid sequences show that the proposed M-BPSO algorithm performs better than other existing algorithms. Furthermore, the algorithm can be applied quickly and efficiently to small and medium-sized sequences.
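    The two ingredients named here, binary PSO and a mutation step, can be sketched generically. The code below follows the standard binary PSO update (velocity pulled towards personal and global bests, each bit resampled with probability given by the sigmoid of its velocity) and adds a small bit-flip mutation. How M-BPSO encodes an alignment as bits is not described in the abstract, so the fitness function is left abstract; the parameter values and names are assumptions.

```python
import math
import random


def bpso(fitness, n_bits, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
         v_max=4.0, p_mut=0.01, seed=0):
    """Maximise fitness(bits) with binary PSO plus bit-flip mutation (generic sketch)."""
    rng = random.Random(seed)
    xs = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vs = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_f = [fitness(x) for x in xs]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(iters):
        for i in range(n_particles):
            x, v = xs[i], vs[i]
            for d in range(n_bits):
                # Velocity is attracted to the personal and global best bits.
                v[d] = (w * v[d]
                        + c1 * rng.random() * (pbest[i][d] - x[d])
                        + c2 * rng.random() * (gbest[d] - x[d]))
                v[d] = max(-v_max, min(v_max, v[d]))   # clamp to avoid saturation
                # The sigmoid of the velocity is the probability that the bit is 1.
                x[d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v[d])) else 0
                if rng.random() < p_mut:               # mutation: random bit flip
                    x[d] ^= 1
            f = fitness(x)
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = x[:], f
                if f > gbest_f:
                    gbest, gbest_f = x[:], f
    return gbest, gbest_f
```

    For a quick check, `bpso(sum, 16)` maximises the number of ones in a 16-bit string; in an alignment setting the bits might, for example, mark gap positions, which is an assumption rather than the paper's encoding.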

  2. A novel approach to multiple sequence alignment using hadoop data grids.

    PubMed

    Sudha Sadasivam, G; Baktavatchalam, G

    2010-01-01

    Multiple alignment of protein sequences helps to determine evolutionary linkage and to predict molecular structures. The factors to be considered while aligning multiple sequences are speed and accuracy of alignment. Although dynamic programming algorithms produce accurate alignments, they are computation intensive. In this paper we propose a time efficient approach to sequence alignment that also produces quality alignment. The dynamic nature of the algorithm coupled with data and computational parallelism of hadoop data grids improves the accuracy and speed of sequence alignment. The principle of block splitting in hadoop coupled with its scalability facilitates alignment of very large sequences.

  3. The impact of single substitutions on multiple sequence alignments.

    PubMed

    Klaere, Steffen; Gesell, Tanja; von Haeseler, Arndt

    2008-12-27

    We introduce another view of sequence evolution. Contrary to other approaches, we model the substitution process in two steps. First we assume (arbitrary) scaled branch lengths on a given phylogenetic tree. Second we allocate a Poisson distributed number of substitutions on the branches. The probability to place a mutation on a branch is proportional to its relative branch length. More importantly, the action of a single mutation on an alignment column is described by a doubly stochastic matrix, the so-called one-step mutation matrix. This matrix leads to analytical formulae for the posterior probability distribution of the number of substitutions for an alignment column.

  4. Multiple sequence alignment with affine gap by using multi-objective genetic algorithm.

    PubMed

    Kaya, Mehmet; Sarhan, Abdullah; Alhajj, Reda

    2014-04-01

    Multiple sequence alignment is of central importance to bioinformatics and computational biology. Although a large number of algorithms for computing a multiple sequence alignment have been designed, the efficient computation of highly accurate and statistically significant multiple alignments is still a challenge. In this paper, we propose an efficient method that uses a multi-objective genetic algorithm (MSAGMOGA) to discover optimal alignments with affine gaps in multiple sequence data. The main advantage of our approach is that a large number of tradeoff (i.e., non-dominated) alignments can be obtained in a single run with respect to conflicting objectives: affine gap penalty minimization and similarity and support maximization. To the best of our knowledge, this is the first effort with three objectives in this direction. The proposed method can be applied to any data set with a sequential character. Furthermore, it allows any choice of similarity measure for finding alignments. By analyzing the obtained optimal alignments, the decision maker can understand the tradeoff between the objectives. We compared our method with three well-known multiple sequence alignment methods, MUSCLE, SAGA and MSA-GA; the first is a progressive method, and the other two are based on evolutionary algorithms. Experiments were conducted on the BAliBASE 2.0 database, and the results confirm that MSAGMOGA obtains results with better accuracy and statistical significance than the three well-known methods when aligning multiple sequences with affine gaps. The proposed method also finds solutions faster than the other evolutionary approaches mentioned above. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
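
    To make the affine-gap objective concrete, here is a minimal sketch of how an affine gap penalty can be scored on a set of aligned rows. It assumes the common convention in which each maximal run of L gap characters costs `gap_open + (L - 1) * gap_extend`, that `-` is the gap symbol, and that terminal gaps are charged like internal ones. The default penalty values are illustrative assumptions, not those used by MSAGMOGA, and the similarity and support objectives are not shown.

```python
from typing import List


def affine_gap_penalty(row: str, gap_open: float = 10.0, gap_extend: float = 0.5) -> float:
    """Each maximal run of L gaps costs gap_open + (L - 1) * gap_extend."""
    penalty = 0.0
    in_gap = False
    for ch in row:
        if ch == "-":
            # The first gap of a run pays the opening cost, later ones the extension cost.
            penalty += gap_extend if in_gap else gap_open
            in_gap = True
        else:
            in_gap = False
    return penalty


def alignment_gap_penalty(
    rows: List[str], gap_open: float = 10.0, gap_extend: float = 0.5
) -> float:
    """Total affine gap penalty of an alignment (the objective to be minimized)."""
    if len({len(r) for r in rows}) > 1:
        raise ValueError("all aligned rows must have the same length")
    return sum(affine_gap_penalty(r, gap_open, gap_extend) for r in rows)
```

    For example, the row `AC---GT--A` contains gap runs of length 3 and 2, so with the defaults it costs (10 + 2 × 0.5) + (10 + 0.5) = 21.5.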

  5. A Comprehensive Benchmark Study of Multiple Sequence Alignment Methods: Current Challenges and Future Perspectives

    PubMed Central

    Thompson, Julie D.; Linard, Benjamin; Lecompte, Odile; Poch, Olivier

    2011-01-01

    Multiple comparison or alignment of protein sequences has become a fundamental tool in many different domains in modern molecular biology, from evolutionary studies to prediction of 2D/3D structure, molecular function and inter-molecular interactions, etc. By placing the sequence in the framework of the overall family, multiple alignments can be used to identify conserved features and to highlight differences or specificities. In this paper, we describe a comprehensive evaluation of many of the most popular methods for multiple sequence alignment (MSA), based on a new benchmark test set. The benchmark is designed to represent typical problems encountered when aligning the large protein sequence sets that result from today's high throughput biotechnologies. We show that alignment methods have significantly progressed and can now identify most of the shared sequence features that determine the broad molecular function(s) of a protein family, even for divergent sequences. However, we have identified a number of important challenges. First, the locally conserved regions, that reflect functional specificities or that modulate a protein's function in a given cellular context, are less well aligned. Second, motifs in natively disordered regions are often misaligned. Third, the badly predicted or fragmentary protein sequences, which make up a large proportion of today's databases, lead to a significant number of alignment errors. Based on this study, we demonstrate that the existing MSA methods can be exploited in combination to improve alignment accuracy, although novel approaches will still be needed to fully explore the most difficult regions. We then propose knowledge-enabled, dynamic solutions that will hopefully pave the way to enhanced alignment construction and exploitation in future evolutionary systems biology studies. PMID:21483869

  6. ABC: software for interactive browsing of genomic multiple sequence alignment data.

    PubMed

    Cooper, Gregory M; Singaravelu, Senthil A G; Sidow, Arend

    2004-12-08

    Alignment and comparison of related genome sequences is a powerful method to identify regions likely to contain functional elements. Such analyses are data intensive, requiring the inclusion of genomic multiple sequence alignments, sequence annotations, and scores describing regional attributes of columns in the alignment. Visualization and browsing of results can be difficult, and there are currently limited software options for performing this task. The Application for Browsing Constraints (ABC) is interactive Java software for intuitive and efficient exploration of multiple sequence alignments and data typically associated with alignments. It is used to move quickly from a summary view of the entire alignment via arbitrary levels of resolution to individual alignment columns. It allows for the simultaneous display of quantitative data (e.g., sequence similarity or evolutionary rates) and annotation data (e.g., the locations of genes, repeats, and constrained elements). It can be used to facilitate basic comparative sequence tasks, such as export of data in plain-text formats, visualization of phylogenetic trees, and generation of alignment summary graphics. The ABC is a lightweight, stand-alone, and flexible graphical user interface for browsing genomic multiple sequence alignments of specific loci, up to hundreds of kilobases or a few megabases in length. It is coded in Java for cross-platform use, and the program and source code are freely available under the General Public License. Documentation and a sample data set are also available at http://mendel.stanford.edu/sidowlab/downloads.html.

  7. PASTA: Ultra-Large Multiple Sequence Alignment for Nucleotide and Amino-Acid Sequences.

    PubMed

    Mirarab, Siavash; Nguyen, Nam; Guo, Sheng; Wang, Li-San; Kim, Junhyong; Warnow, Tandy

    2015-05-01

    We introduce PASTA, a new multiple sequence alignment algorithm. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy and scalability of the leading alignment methods (including SATé). We also show that trees estimated on PASTA alignments are highly accurate: slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is faster than SATé, highly parallelizable, and requires relatively little memory.

  8. MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

    PubMed

    Katoh, Kazutaka; Standley, Daron M

    2013-04-01

    We report a major update of the MAFFT multiple sequence alignment program. This version has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update. This report shows actual examples to explain how these features work, alone and in combination. Some examples incorrectly aligned by MAFFT are also shown to clarify its limitations. We discuss how to avoid misalignments, and our ongoing efforts to overcome such limitations.
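
    As an illustration of the feature for adding unaligned sequences to an existing alignment, the sketch below builds and optionally runs a MAFFT command line from Python. It uses the `--add`, `--reorder`, `--adjustdirection`, and `--thread` options described in the MAFFT version 7 documentation; the file names are hypothetical, and it is worth checking the manual for your installed version before combining options. MAFFT writes the alignment to standard output, which `run_mafft` captures; it returns `None` if no `mafft` executable is found on the PATH.

```python
import shutil
import subprocess
from typing import List, Optional


def build_mafft_add_command(
    new_seqs: str, existing_aln: str, threads: int = 1, adjust_direction: bool = False
) -> List[str]:
    """Argument list for adding unaligned sequences to an existing alignment."""
    cmd = ["mafft", "--add", new_seqs, "--reorder"]
    if adjust_direction:
        # Let MAFFT reverse-complement nucleotide sequences given in the wrong orientation.
        cmd.append("--adjustdirection")
    cmd += ["--thread", str(threads), existing_aln]
    return cmd


def run_mafft(cmd: List[str]) -> Optional[str]:
    """Run MAFFT and return the alignment it writes to stdout, or None if MAFFT is not installed."""
    if shutil.which(cmd[0]) is None:
        return None
    # check=True raises CalledProcessError if MAFFT exits with an error.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

    For example, `build_mafft_add_command("new.fasta", "existing.fasta", threads=4)` yields `mafft --add new.fasta --reorder --thread 4 existing.fasta`.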

  9. Fuzzy Hidden Markov Models: a new approach in multiple sequence alignment.

    PubMed

    Collyda, Chrysa; Diplaris, Sotiris; Mitkas, Pericles A; Maglaveras, Nicos; Pappas, Costas

    2006-01-01

    This paper proposes a novel method for aligning multiple genomic or proteomic sequences using a fuzzified Hidden Markov Model (HMM). HMMs are known to provide compelling performance among multiple sequence alignment (MSA) algorithms, yet their stochastic nature does not help them cope with the existing dependence among the sequence elements. Fuzzy HMMs are a novel type of HMM, based on fuzzy sets and fuzzy integrals, that generalizes the classical stochastic HMM by relaxing its independence assumptions. In this paper, the fuzzy HMM model for MSA is mathematically defined. New fuzzy algorithms are described for building and training fuzzy HMMs, as well as for their use in aligning multiple sequences. Fuzzy HMMs can also improve the model's ability to align multiple sequences, mainly in terms of computation time. Modeling the multiple sequence alignment procedure with fuzzy HMMs can yield a robust and time-effective solution that can be widely used in bioinformatics in various applications, such as protein classification, phylogenetic analysis and gene prediction, among others.

  10. KISSa: a strategy to build multiple sequence alignments from pairwise comparisons of very closely related sequences.

    PubMed

    Marass, Francesco; Upton, Chris

    2009-05-20

    The volume of viral genomic sequence data continues to increase rapidly. This is especially true for the smaller RNA viruses, which are relatively easy to sequence in large numbers. The data volumes cause a number of significant problems for research applications that require large multiple alignments of essentially complete genomes, which are of the order of 10 kb. We present a simple strategy to enable the creation of large quasi-multiple sequence alignments from pairwise alignment data. This process is suitable for large, closely related sequences such as the polyproteins of dengue viruses, which need the insertion of very few indels. The quasi-multiple sequence alignments generated by KISSa are sufficiently accurate to support tree-based genome selection for interactive bioinformatics analysis tools. The speed of this process is critical to providing an interactive experience for the user.
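
    The abstract does not give KISSa's merging procedure, so the following is only a sketch of one simple way to turn pairwise alignments into a quasi-multiple alignment, assuming every pairwise alignment was made against the same ungapped reference sequence. Residues that a sequence inserts relative to the reference are padded with gaps so that every row keeps the reference columns in register. Insertions from different sequences at the same position are simply left-justified, not aligned to each other, which is acceptable only when indels are rare, as the abstract notes for closely related sequences.

```python
from typing import List, Tuple


def decompose(ref_row: str, seq_row: str, ref_len: int) -> Tuple[List[str], List[str]]:
    """Split one pairwise alignment into insertions before each reference
    position (slot ref_len holds trailing insertions) and the character
    aligned to each reference residue."""
    inserts = [""] * (ref_len + 1)
    matches: List[str] = []
    j = 0  # index of the next reference residue
    for r, s in zip(ref_row, seq_row):
        if r == "-":
            if s != "-":  # columns gapped in both rows are ignored
                inserts[j] += s
        else:
            matches.append(s)  # may itself be "-" (deletion)
            j += 1
    return inserts, matches


def merge_on_reference(reference: str, pairwise: List[Tuple[str, str]]) -> List[str]:
    """Return [reference row, row for each pairwise alignment], all equal length."""
    parts = []
    for ref_row, seq_row in pairwise:
        if ref_row.replace("-", "") != reference:
            raise ValueError("pairwise alignment does not use the shared reference")
        if len(ref_row) != len(seq_row):
            raise ValueError("aligned rows differ in length")
        parts.append(decompose(ref_row, seq_row, len(reference)))
    n = len(reference)
    # Widest insertion at each slot determines how many gap columns to open there.
    width = [max((len(ins[j]) for ins, _ in parts), default=0) for j in range(n + 1)]
    rows = ["".join("-" * width[j] + reference[j] for j in range(n)) + "-" * width[n]]
    for ins, matches in parts:
        row = "".join(ins[j].ljust(width[j], "-") + matches[j] for j in range(n))
        rows.append(row + ins[n].ljust(width[n], "-"))
    return rows
```

    For instance, merging `AC-GT`/`ACTGT` and `ACGT-`/`A-GTA` against the reference `ACGT` yields the rows `AC-GT-`, `ACTGT-` and `A--GTA`.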

  11. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega

    PubMed Central

    Sievers, Fabian; Wilm, Andreas; Dineen, David; Gibson, Toby J; Karplus, Kevin; Li, Weizhong; Lopez, Rodrigo; McWilliam, Hamish; Remmert, Michael; Söding, Johannes; Thompson, Julie D; Higgins, Desmond G

    2011-01-01

    Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam. PMID:21988835

  12. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega.

    PubMed

    Sievers, Fabian; Wilm, Andreas; Dineen, David; Gibson, Toby J; Karplus, Kevin; Li, Weizhong; Lopez, Rodrigo; McWilliam, Hamish; Remmert, Michael; Söding, Johannes; Thompson, Julie D; Higgins, Desmond G

    2011-10-11

    Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.

  13. Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

    PubMed

    Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias

    2011-01-01

    The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E. coli, Shigella and S. pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
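
    The sorted k-mer lists mentioned above can be sketched in a few lines. This is a single-machine illustration, not the Blue Gene/P implementation: it assumes plain uppercase strings, a hypothetical k, no reverse-complement handling and no repeat filtering, and it finds shared k-mers (candidate seed matches between two genomes) by merging two sorted lists, loosely in the spirit of the seed-finding step of progressiveMauve-style aligners.

```python
from typing import List, Tuple


def sorted_kmers(seq: str, k: int) -> List[Tuple[str, int]]:
    """All (k-mer, start position) pairs of seq, sorted lexicographically."""
    return sorted((seq[i:i + k], i) for i in range(len(seq) - k + 1))


def shared_kmers(a: str, b: str, k: int) -> List[Tuple[str, int, int]]:
    """Merge-join two sorted k-mer lists; return (k-mer, pos in a, pos in b)."""
    la, lb = sorted_kmers(a, k), sorted_kmers(b, k)
    i = j = 0
    hits: List[Tuple[str, int, int]] = []
    while i < len(la) and j < len(lb):
        ka, kb = la[i][0], lb[j][0]
        if ka < kb:
            i += 1
        elif ka > kb:
            j += 1
        else:
            # Find the run of this k-mer in both lists and report every pairing.
            i_end, j_end = i, j
            while i_end < len(la) and la[i_end][0] == ka:
                i_end += 1
            while j_end < len(lb) and lb[j_end][0] == ka:
                j_end += 1
            for _, pa in la[i:i_end]:
                for _, pb in lb[j:j_end]:
                    hits.append((ka, pa, pb))
            i, j = i_end, j_end
    return hits
```

    Because both lists are sorted, the merge touches each entry once (plus the output size), which is why sorted k-mer lists are a common building block for seed finding.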

  14. FASconCAT-G: extensive functions for multiple sequence alignment preparations concerning phylogenetic studies.

    PubMed

    Kück, Patrick; Longo, Gary C

    2014-01-01

    Phylogenetic and population genetic studies often deal with multiple sequence alignments that require manipulation or processing steps such as sequence concatenation, sequence renaming, sequence translation or consensus sequence generation. In recent years phylogenetic data sets have expanded from single genes to genome-wide markers comprising hundreds to thousands of loci. Processing of these large phylogenomic data sets is impracticable without using automated process pipelines. Currently no stand-alone or pipeline-compatible program exists that offers a broad range of manipulation and processing steps for multiple sequence alignments in a single process run. Here we present FASconCAT-G, a system-independent editor, which offers various processing options for multiple sequence alignments. The software provides a wide range of possibilities to edit and concatenate multiple nucleotide, amino acid, and structure sequence alignment files for phylogenetic and population genetic purposes. The main options include sequence renaming, file format conversion, sequence translation between nucleotide and amino acid states, consensus generation of specific sequence blocks, sequence concatenation, model selection of amino acid replacement with ProtTest, two types of RY coding as well as site exclusions and extraction of parsimony informative sites. Conveniently, most options can be invoked in combination and performed during a single process run. Additionally, FASconCAT-G prints useful information regarding alignment characteristics and editing processes such as base compositions of single in- and outfiles, sequence areas in a concatenated supermatrix, as well as paired stem and loop regions in secondary structure sequence strings. FASconCAT-G is a command-line driven Perl program that delivers computationally fast and user-friendly processing of multiple sequence alignments for phylogenetic and population genetic applications and is well suited for incorporation into

  15. Color and graphic display (CGD): programs for multiple sequence alignment analysis in spreadsheet software.

    PubMed

    Delamarche, C

    2000-07-01

    Interpretation of multiple sequence alignments is of major interest for the prediction of functional and structural domains in proteins or for the organization of related sequences in families and subfamilies. However, bench scientists need high-quality programs that run in a user-friendly computing environment. This paper describes Color and Graphic Display (CGD), a set of modules that runs as part of the Microsoft Excel spreadsheet to color and analyze multiple sequence alignments. Discussed here are the main functions of CGD and the use of the program to highlight residues of importance in a water channel family. Although CGD was created for protein sequences, most of the modules are compatible with DNA sequences.

  16. Bayesian segmental models with multiple sequence alignment profiles for protein secondary structure and contact map prediction.

    PubMed

    Chu, Wei; Ghahramani, Zoubin; Podtelezhnikov, Alexei; Wild, David L

    2006-01-01

    In this paper, we develop a segmental semi-Markov model (SSMM) for protein secondary structure prediction which incorporates multiple sequence alignment profiles with the purpose of improving the predictive performance. The segmental model is a generalization of the hidden Markov model where a hidden state generates segments of various lengths and secondary structure types. A novel parameterized model is proposed for the likelihood function that explicitly represents multiple sequence alignment profiles to capture the segmental conformation. Numerical results on benchmark data sets show that incorporating the profiles results in substantial improvements and the generalization performance is promising. By incorporating the information from long range interactions in beta-sheets, this model is also capable of carrying out inference on contact maps. This is an important advantage of probabilistic generative models over the traditional discriminative approach to protein secondary structure prediction. The Web server of our algorithm and supplementary materials are available at http://public.kgi.edu/-wild/bsm.html.

  17. Who watches the watchmen? An appraisal of benchmarks for multiple sequence alignment.

    PubMed

    Iantorno, Stefano; Gori, Kevin; Goldman, Nick; Gil, Manuel; Dessimoz, Christophe

    2014-01-01

    Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Thus alignment accuracy is crucial to a vast range of analyses, often in ways difficult to assess in those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies (based on simulation, consistency, protein structure, and phylogeny) and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should base their choice of benchmark on the context of application, with a keen awareness of the assumptions underlying each benchmarking strategy.

  18. Greene SCPrimer: a rapid comprehensive tool for designing degenerate primers from multiple sequence alignments

    PubMed Central

    Jabado, Omar J.; Palacios, Gustavo; Kapoor, Vishal; Hui, Jeffrey; Renwick, Neil; Zhai, Junhui; Briese, Thomas; Lipkin, W. Ian

    2006-01-01

    Polymerase chain reaction (PCR) is widely applied in clinical and environmental microbiology. Primer design is key to the development of successful assays and is often performed manually by using multiple nucleic acid alignments. Few public software tools exist that allow comprehensive design of degenerate primers for large groups of related targets based on complex multiple sequence alignments. Here we present a method for designing such primers based on tree building followed by application of a set covering algorithm, and demonstrate its utility in compiling multiplex PCR primer panels for detection and differentiation of viral pathogens. PMID:17135211
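
    The set-covering step can be illustrated with the classic greedy heuristic, which repeatedly picks the candidate that covers the most still-uncovered targets. This is a minimal sketch under stated assumptions: the candidate primer names and the sets of targets each one matches are hypothetical inputs (in practice they would come from scanning the alignment), the tree-building step and primer degeneracy are not modeled, and whether SCPrimer uses exactly this greedy rule is not stated in the abstract.

```python
from typing import Dict, List, Set


def greedy_primer_cover(coverage: Dict[str, Set[str]], targets: Set[str]) -> List[str]:
    """Greedy set cover: coverage maps a candidate primer to the targets it amplifies."""
    uncovered = set(targets)
    chosen: List[str] = []
    while uncovered and coverage:
        # Iterate in sorted order so ties are broken deterministically (max keeps the first).
        best = max(sorted(coverage), key=lambda p: len(coverage[p] & uncovered))
        gain = coverage[best] & uncovered
        if not gain:
            break  # remaining targets cannot be covered by any candidate
        chosen.append(best)
        uncovered -= gain
    return chosen
```

    Greedy selection is a standard choice because finding a minimum set cover exactly is NP-hard, while the greedy solution is guaranteed to be within a logarithmic factor of the optimum.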

  19. RBT-L: a location based approach for solving the Multiple Sequence Alignment problem.

    PubMed

    Taheri, Javid; Zomaya, Albert Y

    2010-01-01

    This paper presents a novel approach to solving the Multiple Sequence Alignment (MSA) problem. The Rubber Band Technique: Location Base (RBT-L) introduced in this paper is inspired by the elastic behaviour of a Rubber Band (RB) on a plate with poles. RBT-L is an iterative optimisation algorithm designed and implemented to find the optimal alignment for a set of input protein sequences. RBT-L was tested on one of the well-known benchmarks in this field (BAliBASE 2.0). The results show the superiority of the proposed technique, even for difficult sequences.

  20. AlexSys: a knowledge-based expert system for multiple sequence alignment construction and analysis.

    PubMed

    Aniba, Mohamed Radhouene; Poch, Olivier; Marchler-Bauer, Aron; Thompson, Julie Dawn

    2010-10-01

    Multiple sequence alignment (MSA) is a cornerstone of modern molecular biology and represents a unique means of investigating the patterns of conservation and diversity in complex biological systems. Many different algorithms have been developed to construct MSAs, but previous studies have shown that no single aligner consistently outperforms the rest. This has led to the development of a number of 'meta-methods' that systematically run several aligners and merge the output into one single solution. Although these methods generally produce more accurate alignments, they are inefficient because all the aligners need to be run first and the choice of the best solution is made a posteriori. Here, we describe the development of a new expert system, AlexSys, for the multiple alignment of protein sequences. AlexSys incorporates an intelligent inference engine to automatically select an appropriate aligner a priori, depending only on the nature of the input sequences. The inference engine was trained on a large set of reference multiple alignments, using a novel machine learning approach. Applying AlexSys to a test set of 178 alignments, we show that the expert system represents a good compromise between alignment quality and running time, making it suitable for high throughput projects. AlexSys is freely available from http://alnitak.u-strasbg.fr/∼aniba/alexsys.

  1. Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference

    PubMed Central

    Tan, Ge; Muffato, Matthieu; Ledergerber, Christian; Herrero, Javier; Goldman, Nick; Gil, Manuel; Dessimoz, Christophe

    2015-01-01

    Phylogenetic inference is generally performed on the basis of multiple sequence alignments (MSA). Because errors in an alignment can lead to errors in tree estimation, there is a strong interest in identifying and removing unreliable parts of the alignment. In recent years several automated filtering approaches have been proposed, but despite their popularity, a systematic and comprehensive comparison of different alignment filtering methods on real data has been lacking. Here, we extend and apply recently introduced phylogenetic tests of alignment accuracy on a large number of gene families and contrast the performance of unfiltered versus filtered alignments in the context of single-gene phylogeny reconstruction. Based on multiple genome-wide empirical and simulated data sets, we show that the trees obtained from filtered MSAs are on average worse than those obtained from unfiltered MSAs. Furthermore, alignment filtering often leads to an increase in the proportion of well-supported branches that are actually wrong. We confirm that our findings hold for a wide range of parameters and methods. Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms. PMID:26031838

  2. Multiple sequence alignment algorithm based on a dispersion graph and ant colony algorithm.

    PubMed

    Chen, Weiyang; Liao, Bo; Zhu, Wen; Xiang, Xuyu

    2009-10-01

    In this article, we describe a representation of the process of multiple sequence alignment (MSA) and use it to solve the MSA problem. With this representation, every possible alignment result is taken into account by defining the representation of gap insertion, the value of heuristic information on every optional path, and a scoring rule. On the basis of the proposed multidimensional graph, we use the ant colony algorithm to find a better path, which denotes a better alignment result. We present instances of the three-dimensional and four-dimensional graphs and propose a special ichnographic representation for analyzing MSA. The software is still experimental, and we give an example of finding the best alignment result with the three-dimensional graph and the ant colony algorithm. Experimental results show that our method can improve the solution quality on MSA benchmarks. Copyright 2009 Wiley Periodicals, Inc.

  3. Multiple sequence alignment with arbitrary gap costs: computing an optimal solution using polyhedral combinatorics.

    PubMed

    Althaus, Ernst; Caprara, Alberto; Lenhof, Hans-Peter; Reinert, Knut

    2002-01-01

    Multiple sequence alignment is one of the dominant problems in computational molecular biology. Numerous scoring functions and methods have been proposed, most of which result in NP-hard problems. In this paper we propose for the first time a general formulation for multiple alignment with arbitrary gap costs based on an integer linear program (ILP). In addition we describe a branch-and-cut algorithm to effectively solve the ILP to optimality. We evaluate the performance of our approach in terms of running time and quality of the alignments using the BAliBase database of reference alignments. The results show that our implementation ranks amongst the best programs developed so far.

  4. A memory-efficient algorithm for multiple sequence alignment with constraints.

    PubMed

    Lu, Chin Lung; Huang, Yen Pin

    2005-01-01

    Recently, the concept of the constrained sequence alignment was proposed to incorporate the knowledge of biologists about structures/functionalities/consensuses of their datasets into sequence alignment such that the user-specified residues/nucleotides are aligned together in the computed alignment. The currently developed programs use the so-called progressive approach to efficiently obtain a constrained alignment of several sequences. However, the kernels of these programs, the dynamic programming algorithms for computing an optimal constrained alignment between two sequences, run in $O(\gamma n^2)$ memory, where $\gamma$ is the number of the constraints and $n$ is the maximum of the lengths of sequences. As a result, such a high memory requirement limits the overall programs to align short sequences only. We adopt the divide-and-conquer approach to design a memory-efficient algorithm for computing an optimal constrained alignment between two sequences, which greatly reduces the memory requirement of the dynamic programming approaches at the expense of a small constant factor in CPU time. This new algorithm consumes only $O(\alpha n)$ space, where $\alpha$ is the sum of the lengths of constraints and usually $\alpha < n$ in practical applications. Based on this algorithm, we have developed a memory-efficient tool for multiple sequence alignment with constraints. http://genome.life.nctu.edu.tw/MUSICME.
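
    The memory saving described above rests on the divide-and-conquer idea of Hirschberg's algorithm: compute only the last row of the dynamic-programming matrix for each half of one sequence, pick the split point of the other sequence that maximizes the combined score, and recurse. The sketch below applies that idea to ordinary, unconstrained global alignment with a linear gap penalty; it does not implement the constrained variant of the paper, and the scores `MATCH`, `MISMATCH` and `GAP` are illustrative assumptions.

```python
from typing import List, Tuple

MATCH, MISMATCH, GAP = 1, -1, -2  # illustrative scores (assumed)


def sub(x: str, y: str) -> int:
    return MATCH if x == y else MISMATCH


def nw_last_row(a: str, b: str) -> List[int]:
    """Last row of the global-alignment DP matrix, using O(len(b)) memory."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * GAP] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = max(prev[j - 1] + sub(a[i - 1], b[j - 1]),
                         prev[j] + GAP, cur[j - 1] + GAP)
        prev = cur
    return prev


def nw_full(a: str, b: str) -> Tuple[str, str]:
    """Quadratic-space Needleman-Wunsch with traceback (used for base cases)."""
    n, m = len(a), len(b)
    S = [[i * GAP if j == 0 else (j * GAP if i == 0 else 0)
          for j in range(m + 1)] for i in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          S[i - 1][j] + GAP, S[i][j - 1] + GAP)
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and S[i][j] == S[i - 1][j] + GAP:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def hirschberg(a: str, b: str) -> Tuple[str, str]:
    """Optimal global alignment using linear-space score rows plus recursion."""
    if len(a) <= 1 or len(b) <= 1:
        return nw_full(a, b)  # one side has at most one residue: linear space
    mid = len(a) // 2
    left = nw_last_row(a[:mid], b)
    right = nw_last_row(a[mid:][::-1], b[::-1])  # scores of suffixes, reversed
    m = len(b)
    split = max(range(m + 1), key=lambda j: left[j] + right[m - j])
    a1, b1 = hirschberg(a[:mid], b[:split])
    a2, b2 = hirschberg(a[mid:], b[split:])
    return a1 + a2, b1 + b2


def alignment_score(x: str, y: str) -> int:
    return sum(GAP if "-" in (p, q) else sub(p, q) for p, q in zip(x, y))
```

    For example, `hirschberg("GATTACA", "GCATGCT")` returns an alignment with the same optimal score as the full quadratic-space `nw_full`, while the score rows it keeps are linear in the sequence length.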

  5. A Novel Approach to Multiple Sequence Alignment Using Multiobjective Evolutionary Algorithm Based on Decomposition.

    PubMed

    Zhu, Huazheng; He, Zhongshi; Jia, Yuanyuan

    2016-03-01

    Multiple sequence alignment (MSA) is a fundamental and key step for implementing other tasks in bioinformatics, such as phylogenetic analyses, identification of conserved motifs and domains, structure prediction, etc. Although there are many methods for performing MSA, no approach has yet produced biologically perfect alignments. This paper proposes a novel idea for performing MSA, in which MSA is treated as a multiobjective optimization problem. A well-known multiobjective evolutionary algorithm framework based on decomposition is applied to MSA; the resulting method is named MOMSA. In the MOMSA algorithm, we develop a new population initialization method and a novel mutation operator. We compare the performance of MOMSA with several alignment methods based on evolutionary algorithms, including VDGA, GAPAM, and IMSA, and also with state-of-the-art progressive alignment approaches, such as MSAprobs, Probalign, MAFFT, ProbCons, Clustal Omega, T-Coffee, Kalign2, MUSCLE, FSA, Dialign, PRANK, and CLUSTALW. These alignment algorithms are tested on the benchmark datasets BAliBASE 2.0 and BAliBASE 3.0. Experimental results show that MOMSA obtains significantly better alignments than VDGA and GAPAM on most test cases according to statistical analyses, produces better alignments than IMSA in terms of TC scores, and is comparable with the leading progressive alignment approaches in terms of alignment quality.

  6. PnpProbs: a better multiple sequence alignment tool by better handling of guide trees.

    PubMed

    Ye, Yongtao; Lam, Tak-Wah; Ting, Hing-Fung

    2016-08-31

    This paper describes a new MSA tool called PnpProbs, which constructs better multiple sequence alignments by better handling of guide trees. It classifies sequences into two types: normally related and distantly related. For normally related sequences, it uses an adaptive approach to construct the guide tree needed for progressive alignment; it first estimates the input's discrepancy by computing the standard deviation of their percent identities, and based on this estimate, it chooses the better method to construct the guide tree. For distantly related sequences, PnpProbs abandons the guide tree and uses instead some non-progressive alignment method to generate the alignment. To evaluate PnpProbs, we have compared it with thirteen other popular MSA tools, and PnpProbs has the best alignment scores in all but one test. We have also used it for phylogenetic analysis, and found that the phylogenetic trees constructed from PnpProbs' alignments are closest to the model trees. By combining the strength of the progressive and non-progressive alignment methods, we have developed an MSA tool called PnpProbs. We have compared PnpProbs with thirteen other popular MSA tools and our results showed that our tool usually constructed the best alignments.
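
    The guide-tree decision described above can be sketched as follows. It assumes the input rows are already aligned (for example by a quick preliminary alignment), computes percent identity over columns where neither row has a gap, and compares the population standard deviation of all pairwise identities with a threshold. The threshold value and the two method labels are hypothetical placeholders; the abstract does not say which tree-building methods PnpProbs chooses between or what cut-off it uses.

```python
import itertools
import statistics
from typing import List


def percent_identity(x: str, y: str) -> float:
    """Identity (%) over aligned columns where neither row has a gap."""
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(p == q for p, q in pairs) / len(pairs)


def identity_spread(rows: List[str]) -> float:
    """Population standard deviation of all pairwise percent identities (needs >= 2 rows)."""
    ids = [percent_identity(x, y) for x, y in itertools.combinations(rows, 2)]
    return statistics.pstdev(ids)


def choose_guide_tree_method(rows: List[str], threshold: float = 10.0) -> str:
    # Hypothetical labels and threshold, for illustration only.
    if identity_spread(rows) > threshold:
        return "tree_method_for_divergent_input"
    return "tree_method_for_uniform_input"
```

    A large spread signals that some pairs are much more similar than others, which is the kind of input discrepancy the abstract says PnpProbs uses to pick a guide-tree method.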

  7. M-Coffee: combining multiple sequence alignment methods with T-Coffee

    PubMed Central

    Wallace, Iain M.; O'Sullivan, Orla; Higgins, Desmond G.; Notredame, Cedric

    2006-01-01

    We introduce M-Coffee, a meta-method for assembling multiple sequence alignments (MSAs) by combining the output of several individual methods into a single MSA. M-Coffee is an extension of T-Coffee and uses consistency to estimate a consensus alignment. We show that the procedure is robust to variations in the choice of constituent methods and reasonably tolerant to duplicate MSAs. We also show that performance can be improved by carefully selecting the constituent methods. M-Coffee outperforms all the individual methods on three major reference datasets: HOMSTRAD, Prefab and BAliBASE. We also show that, on a case-by-case basis, M-Coffee is twice as likely to deliver the best alignment as any individual method. Given a collection of pre-computed MSAs, M-Coffee has CPU requirements similar to those of the original T-Coffee. M-Coffee is a freeware open-source package available from . PMID:16556910
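    A T-Coffee-style method starts from a library of aligned residue pairs; when the inputs are whole MSAs, a simple way to build it is to count how many of the input alignments place each pair of residues in the same column. The sketch below shows only that library-building step, with every input MSA given equal weight (an assumption); M-Coffee's actual weighting and the consistency-based progressive step that follows are not reproduced.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Residue = Tuple[str, int]  # (sequence name, 0-based index in the ungapped sequence)


def aligned_pairs(msa: Dict[str, str]) -> Set[Tuple[Residue, Residue]]:
    """All residue pairs that share a column in one MSA (rows must have equal length)."""
    names = sorted(msa)
    position = {name: 0 for name in names}
    pairs = set()
    for col in range(len(msa[names[0]])):
        residues = []
        for name in names:
            if msa[name][col] != "-":
                residues.append((name, position[name]))
                position[name] += 1
        # Every pair of residues in this column is aligned by this MSA.
        pairs.update(combinations(residues, 2))
    return pairs


def support_library(msas: Iterable[Dict[str, str]]) -> Counter:
    """Count how many input MSAs align each residue pair (equal weight per MSA)."""
    counts = Counter()
    for msa in msas:
        counts.update(aligned_pairs(msa))
    return counts
```

    Pairs supported by several constituent methods receive higher counts, which is the intuition behind using consistency to reach a consensus.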

  8. SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes.

    PubMed

    Pruesse, Elmar; Peplies, Jörg; Glöckner, Frank Oliver

    2012-07-15

    In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes like the ribosomal RNA (rRNA) genes, for which millions of sequences are already publicly available and individual studies can easily produce hundreds of thousands of new sequences. Methods have been developed to cope with such numbers, but further improvements are needed to meet accuracy requirements. In this study, we present the SILVA Incremental Aligner (SINA) used to align the rRNA gene databases provided by the SILVA ribosomal RNA project. SINA uses a combination of k-mer searching and partial order alignment (POA) to maintain very high alignment accuracy while satisfying high-throughput performance demands. SINA was evaluated in comparison with the commonly used high-throughput MSA programs PyNAST and mothur. The three BRAliBase III benchmark MSAs could be reproduced with 99.3%, 97.6% and 96.1% accuracy. A larger benchmark MSA comprising 38 772 sequences could be reproduced with 98.9% and 99.3% accuracy using reference MSAs comprising 1000 and 5000 sequences, respectively. SINA achieved higher accuracy than PyNAST and mothur in all performed benchmarks. Alignment of up to 500 sequences using the latest SILVA SSU/LSU Ref datasets as reference MSA is offered at http://www.arb-silva.de/aligner. This page also links to Linux binaries, a user manual and a tutorial. SINA is made available under a personal use license.
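    The k-mer searching step selects reference sequences similar to the query before the query is aligned against them. The sketch below ranks references by the number of distinct k-mers they share with the query; it is a simplified stand-in, not SINA's own index or scoring, and the defaults `k=8` and `n=3` are hypothetical. The partial order alignment step is not shown.

```python
from typing import Dict, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def closest_references(query: str, references: Dict[str, str],
                       k: int = 8, n: int = 3) -> List[str]:
    """Names of the n references sharing the most k-mers with the query (gaps ignored)."""
    q = kmers(query.replace("-", "").upper(), k)
    scored = []
    for name, ref in references.items():
        shared = len(q & kmers(ref.replace("-", "").upper(), k))
        scored.append((shared, name))
    # Most shared k-mers first; ties broken by name for a deterministic order.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [name for _, name in scored[:n]]
```

    Because references in an existing MSA are gapped, gaps are stripped before k-mers are extracted.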

  9. R3D-2-MSA: the RNA 3D structure-to-multiple sequence alignment server.

    PubMed

    Cannone, Jamie J; Sweeney, Blake A; Petrov, Anton I; Gutell, Robin R; Zirbel, Craig L; Leontis, Neocles

    2015-07-01

    The RNA 3D Structure-to-Multiple Sequence Alignment Server (R3D-2-MSA) is a new web service that seamlessly links RNA three-dimensional (3D) structures to high-quality RNA multiple sequence alignments (MSAs) from diverse biological sources. In this first release, R3D-2-MSA provides manual and programmatic access to curated, representative ribosomal RNA sequence alignments from bacterial, archaeal, eukaryal and organellar ribosomes, using nucleotide numbers from representative atomic-resolution 3D structures. A web-based front end is available for manual entry and an Application Program Interface for programmatic access. Users can specify up to five ranges of nucleotides and 50 nucleotide positions per range. The R3D-2-MSA server maps these ranges to the appropriate columns of the corresponding MSA and returns the contents of the columns, either for display in a web browser or in JSON format for subsequent programmatic use. The browser output page provides a 3D interactive display of the query, a full list of sequence variants with taxonomic information and a statistical summary of distinct sequence variants found. The output can be filtered and sorted in the browser. Previous user queries can be viewed at any time by resubmitting the output URL, which encodes the search and re-generates the results. The service is freely available with no login requirement at http://rna.bgsu.edu/r3d-2-msa.
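    The core of the mapping described above is translating residue numbers of one sequence into alignment columns and then reading those columns across all rows. The sketch assumes consecutive numbering starting at `first_number`; real 3D structures can have insertion codes and numbering gaps, which the server handles and this sketch does not. Function names are hypothetical.

```python
from typing import Dict, List


def residue_to_column(row: str, first_number: int = 1) -> Dict[int, int]:
    """Map residue numbers of one gapped row to 0-based alignment columns."""
    mapping = {}
    number = first_number
    for col, ch in enumerate(row):
        if ch not in "-.":
            mapping[number] = col
            number += 1
    return mapping


def columns_for_range(row: str, start: int, end: int, first_number: int = 1) -> List[int]:
    """Alignment columns covering residue numbers start..end (inclusive)."""
    mapping = residue_to_column(row, first_number)
    return [mapping[n] for n in range(start, end + 1) if n in mapping]


def column_contents(rows: List[str], columns: List[int]) -> List[str]:
    """Contents of each selected column, read top to bottom across all rows."""
    return ["".join(r[c] for r in rows) for c in columns]
```

    For example, in the row `AC--GU` residue 3 sits in column 4, because the two gap columns are skipped.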

  10. R3D-2-MSA: the RNA 3D structure-to-multiple sequence alignment server

    PubMed Central

    Cannone, Jamie J.; Sweeney, Blake A.; Petrov, Anton I.; Gutell, Robin R.; Zirbel, Craig L.; Leontis, Neocles

    2015-01-01

    The RNA 3D Structure-to-Multiple Sequence Alignment Server (R3D-2-MSA) is a new web service that seamlessly links RNA three-dimensional (3D) structures to high-quality RNA multiple sequence alignments (MSAs) from diverse biological sources. In this first release, R3D-2-MSA provides manual and programmatic access to curated, representative ribosomal RNA sequence alignments from bacterial, archaeal, eukaryal and organellar ribosomes, using nucleotide numbers from representative atomic-resolution 3D structures. A web-based front end is available for manual entry and an Application Program Interface for programmatic access. Users can specify up to five ranges of nucleotides and 50 nucleotide positions per range. The R3D-2-MSA server maps these ranges to the appropriate columns of the corresponding MSA and returns the contents of the columns, either for display in a web browser or in JSON format for subsequent programmatic use. The browser output page provides a 3D interactive display of the query, a full list of sequence variants with taxonomic information and a statistical summary of distinct sequence variants found. The output can be filtered and sorted in the browser. Previous user queries can be viewed at any time by resubmitting the output URL, which encodes the search and re-generates the results. The service is freely available with no login requirement at http://rna.bgsu.edu/r3d-2-msa. PMID:26048960

  11. Accurate Simulation and Detection of Coevolution Signals in Multiple Sequence Alignments

    PubMed Central

    Ackerman, Sharon H.; Tillier, Elisabeth R.; Gatti, Domenico L.

    2012-01-01

    Background: While the conserved positions of a multiple sequence alignment (MSA) are clearly of interest, non-conserved positions can also be important because, for example, destabilizing effects at one position can be compensated by stabilizing effects at another position. Different methods have been developed to recognize the evolutionary relationship between amino acid sites, and to disentangle functional/structural dependencies from historical/phylogenetic ones. Methodology/Principal Findings: We have used two complementary approaches to test the efficacy of these methods. In the first approach, we have used a new program, MSAvolve, for the in silico evolution of MSAs, which records a detailed history of all covarying positions, and builds a global coevolution matrix as the accumulated sum of individual matrices for the positions forced to co-vary, the recombinant coevolution, and the stochastic coevolution. We have simulated over 1600 MSAs for 8 protein families, which reflect sequences of different sizes and proteins with widely different functions. The calculated coevolution matrices were compared with the coevolution matrices obtained for the same evolved MSAs with different coevolution detection methods. In a second approach we have evaluated the capacity of the different methods to predict close contacts in the representative X-ray structures of an additional 150 protein families using only experimental MSAs. Conclusions/Significance: Methods based on the identification of global correlations between pairs were found to be generally superior to methods based only on local correlations in their capacity to identify coevolving residues using either simulated or experimental MSAs. However, the significant variability in the performance of different methods with different proteins suggests that the simulation of MSAs that replicate the statistical properties of the experimental MSA can be a valuable tool to identify the coevolution detection method that is most effective in each case.

  12. Accurate simulation and detection of coevolution signals in multiple sequence alignments.

    PubMed

    Ackerman, Sharon H; Tillier, Elisabeth R; Gatti, Domenico L

    2012-01-01

    While the conserved positions of a multiple sequence alignment (MSA) are clearly of interest, non-conserved positions can also be important because, for example, destabilizing effects at one position can be compensated by stabilizing effects at another position. Different methods have been developed to recognize the evolutionary relationship between amino acid sites, and to disentangle functional/structural dependencies from historical/phylogenetic ones. We have used two complementary approaches to test the efficacy of these methods. In the first approach, we have used a new program, MSAvolve, for the in silico evolution of MSAs, which records a detailed history of all covarying positions, and builds a global coevolution matrix as the accumulated sum of individual matrices for the positions forced to co-vary, the recombinant coevolution, and the stochastic coevolution. We have simulated over 1600 MSAs for 8 protein families, which reflect sequences of different sizes and proteins with widely different functions. The calculated coevolution matrices were compared with the coevolution matrices obtained for the same evolved MSAs with different coevolution detection methods. In a second approach we have evaluated the capacity of the different methods to predict close contacts in the representative X-ray structures of an additional 150 protein families using only experimental MSAs. Methods based on the identification of global correlations between pairs were found to be generally superior to methods based only on local correlations in their capacity to identify coevolving residues using either simulated or experimental MSAs. However, the significant variability in the performance of different methods with different proteins suggests that the simulation of MSAs that replicate the statistical properties of the experimental MSA can be a valuable tool to identify the coevolution detection method that is most effective in each case.
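    Mutual information (MI) between two alignment columns is a classic example of a local, pair-by-pair covariation measure of the kind contrasted above with global methods. The sketch computes raw MI in nats; it is not MSAvolve and applies no correction for phylogenetic background or small samples, which is one reason such local scores can mislead.

```python
from collections import Counter
from math import log


def mutual_information(col_a: str, col_b: str) -> float:
    """MI (natural log) between two alignment columns given as equal-length strings."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), count in pab.items():
        pxy = count / n
        mi += pxy * log(pxy / ((pa[x] / n) * (pb[y] / n)))
    return mi
```

    Two perfectly covarying columns such as `AALL` and `KKEE` give log 2 (about 0.693), while independent columns such as `AALL` and `KEKE` give 0.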

  13. Open-Phylo: a customizable crowd-computing platform for multiple sequence alignment

    PubMed Central

    2013-01-01

    Citizen science games such as Galaxy Zoo, Foldit, and Phylo aim to harness the intelligence and processing power generated by crowds of online gamers to solve scientific problems. However, the selection of the data to be analyzed through these games is under the exclusive control of the game designers, and so are the results produced by gamers. Here, we introduce Open-Phylo, a freely accessible crowd-computing platform that enables any scientist to enter our system and use crowds of gamers to assist computer programs in solving one of the most fundamental problems in genomics: the multiple sequence alignment problem. PMID:24148814

  14. Open-Phylo: a customizable crowd-computing platform for multiple sequence alignment.

    PubMed

    Kwak, Daniel; Kam, Alfred; Becerra, David; Zhou, Qikuan; Hops, Adam; Zarour, Eleyine; Kam, Arthur; Sarmenta, Luis; Blanchette, Mathieu; Waldispühl, Jérôme

    2013-01-01

    Citizen science games such as Galaxy Zoo, Foldit, and Phylo aim to harness the intelligence and processing power generated by crowds of online gamers to solve scientific problems. However, the selection of the data to be analyzed through these games is under the exclusive control of the game designers, and so are the results produced by gamers. Here, we introduce Open-Phylo, a freely accessible crowd-computing platform that enables any scientist to enter our system and use crowds of gamers to assist computer programs in solving one of the most fundamental problems in genomics: the multiple sequence alignment problem.

  15. Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference.

    PubMed

    Tan, Ge; Muffato, Matthieu; Ledergerber, Christian; Herrero, Javier; Goldman, Nick; Gil, Manuel; Dessimoz, Christophe

    2015-09-01

    Phylogenetic inference is generally performed on the basis of multiple sequence alignments (MSA). Because errors in an alignment can lead to errors in tree estimation, there is a strong interest in identifying and removing unreliable parts of the alignment. In recent years several automated filtering approaches have been proposed, but despite their popularity, a systematic and comprehensive comparison of different alignment filtering methods on real data has been lacking. Here, we extend and apply recently introduced phylogenetic tests of alignment accuracy on a large number of gene families and contrast the performance of unfiltered versus filtered alignments in the context of single-gene phylogeny reconstruction. Based on multiple genome-wide empirical and simulated data sets, we show that the trees obtained from filtered MSAs are on average worse than those obtained from unfiltered MSAs. Furthermore, alignment filtering often leads to an increase in the proportion of well-supported branches that are actually wrong. We confirm that our findings hold for a wide range of parameters and methods. Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society of Systematic Biologists.
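    To make the notion of "light filtering (up to 20% of alignment positions)" concrete, the sketch below removes the gappiest columns while never removing more than a given fraction of all columns. It is a generic gap-based filter written for illustration; it is not one of the filtering tools evaluated in the study, and the cap is a parameter, not a recommendation.

```python
from typing import List


def filter_gappy_columns(rows: List[str], max_fraction: float = 0.2) -> List[str]:
    """Drop the most gapped columns, removing at most max_fraction of all columns."""
    ncols = len(rows[0])
    gap_frac = [sum(r[c] == "-" for r in rows) / len(rows) for c in range(ncols)]
    budget = int(max_fraction * ncols)
    # Only columns containing gaps are candidates, most gapped first (ties by position).
    order = sorted((c for c in range(ncols) if gap_frac[c] > 0),
                   key=lambda c: (-gap_frac[c], c))
    drop = set(order[:budget])
    return ["".join(r[c] for c in range(ncols) if c not in drop) for r in rows]
```

    With 10 columns and `max_fraction=0.2`, at most two columns are removed, no matter how many contain gaps.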

  16. SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction.

    PubMed

    Hagopian, Raffi; Davidson, John R; Datta, Ruchira S; Samad, Bushra; Jarvis, Glen R; Sjölander, Kimmen

    2010-07-01

    We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on a benchmark dataset of 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/.

  17. EvalMSA: A Program to Evaluate Multiple Sequence Alignments and Detect Outliers.

    PubMed

    Chiner-Oms, Alvaro; González-Candelas, Fernando

    2016-01-01

    We present EvalMSA, a software tool for evaluating and detecting outliers in multiple sequence alignments (MSAs). This tool identifies divergent sequences in MSAs by scoring the contribution of each row in the alignment to its quality, using a sum-of-pairs-based method and additional analyses. Our main goal is to provide users with objective data so that they can make informed decisions about the relevance and/or pertinence of including or retaining a particular sequence in an MSA. EvalMSA is written in standard Perl and also uses some routines from the statistical language R; therefore, the R-base package must be installed to get full functionality. Binary packages are freely available from http://sourceforge.net/projects/evalmsa/ for Linux and Windows.
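    Scoring each row's contribution with a sum-of-pairs method can be sketched as summing, for every row, its pairwise scores against all other rows; a row with an unusually low total is a candidate outlier. The match, mismatch and gap values below are hypothetical simplifications (EvalMSA's own scoring scheme and additional analyses are not reproduced).

```python
from typing import List


def pair_score(x: str, y: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score of one column for two rows; gap-gap columns score 0."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def row_contributions(rows: List[str], **scores: int) -> List[int]:
    """Sum-of-pairs contribution of each row: total score against every other row."""
    n = len(rows)
    contrib = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            s = sum(pair_score(a, b, **scores) for a, b in zip(rows[i], rows[j]))
            contrib[i] += s
            contrib[j] += s
    return contrib
```

    For `["ACGT", "ACGT", "TTTA"]` the contributions are `[0, 0, -8]`, flagging the third row as the divergent one.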

  18. EvalMSA: A Program to Evaluate Multiple Sequence Alignments and Detect Outliers

    PubMed Central

    Chiner-Oms, Alvaro; González-Candelas, Fernando

    2016-01-01

    We present EvalMSA, a software tool for evaluating and detecting outliers in multiple sequence alignments (MSAs). This tool identifies divergent sequences in MSAs by scoring the contribution of each row in the alignment to its quality, using a sum-of-pairs-based method and additional analyses. Our main goal is to provide users with objective data so that they can make informed decisions about the relevance and/or pertinence of including or retaining a particular sequence in an MSA. EvalMSA is written in standard Perl and also uses some routines from the statistical language R; therefore, the R-base package must be installed to get full functionality. Binary packages are freely available from http://sourceforge.net/projects/evalmsa/ for Linux and Windows. PMID:27920488

  19. Multiple sequence alignment based on combining genetic algorithm with chaotic sequences.

    PubMed

    Gao, C; Wang, B; Zhou, C J; Zhang, Q

    2016-06-24

    In bioinformatics, sequence alignment is one of the most common problems. Multiple sequence alignment is an NP-hard (nondeterministic polynomial-time hard) problem that requires further study and exploration. The chaos optimization algorithm, derived from chaos theory, exploits the ergodicity and inherent randomness of chaotic iteration; combining it with the genetic algorithm (GA) is an efficient way to counter premature convergence, the basic weakness of the GA. Applying the logistic map to the GA and using chaotic sequences to carry out chaotic perturbation can improve the convergence of the basic GA. In addition, random tournament selection and an optimal preservation strategy are used in the GA. Experiments indicate that this approach gives good results.
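    The logistic map mentioned above generates a chaotic sequence via x(k+1) = r * x(k) * (1 - x(k)), which is chaotic on (0, 1) for r = 4. The sketch generates such a sequence and maps each value to an index, for example to choose which gap to perturb; the index mapping is a hypothetical use, since the abstract does not describe how the chaotic values drive the perturbation.

```python
from typing import List


def logistic_sequence(x0: float, n: int, r: float = 4.0) -> List[float]:
    """n iterates of the logistic map x_{k+1} = r * x_k * (1 - x_k).

    With r = 4 the map is chaotic on (0, 1); avoid seeds such as 0.25, 0.5 and 0.75,
    which land on a fixed point or on 0 after a few steps.
    """
    xs = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        xs.append(x)
    return xs


def chaotic_index(c: float, n: int) -> int:
    """Hypothetical mapping of a chaos variable c in [0, 1] to an index in range(n)."""
    return min(int(c * n), n - 1)
```

    Starting from 0.3, the first iterates are 0.84, 0.5376 and about 0.9943; unlike a periodic schedule, the sequence keeps visiting the whole interval.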

  20. A probabilistic coding based quantum genetic algorithm for multiple sequence alignment.

    PubMed

    Huo, Hongwei; Xie, Qiaoluan; Shen, Xubang; Stojkovic, Vojislav

    2008-01-01

    This paper presents an original Quantum Genetic algorithm for Multiple sequence ALIGNment (QGMALIGN) that combines a genetic algorithm and a quantum algorithm. A quantum probabilistic coding is designed to represent the multiple sequence alignment. A quantum rotation gate is used as a mutation operator to guide the evolution of the quantum state. Six genetic operators are designed on the basis of this coding to improve the solution during the evolutionary process. The implicit parallelism and state superposition of quantum mechanics and the global search capability of the genetic algorithm are exploited for efficient computation. A set of well-known test cases from BAliBASE 2.0 is used as a reference to evaluate the efficiency of QGMALIGN optimization. The QGMALIGN results have been compared with those of the most popular methods (CLUSTALX, SAGA, DIALIGN and SB_PIMA). The results show that QGMALIGN performs well on the presented biological data, and that adding genetic operators to the quantum algorithm reduces the overall running time.
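    A quantum rotation gate of the kind used in quantum-inspired evolutionary algorithms rotates a qubit's amplitude pair (alpha, beta) by an angle theta, keeping alpha^2 + beta^2 = 1; observing the qubit yields 1 with probability beta^2. The sketch is a classical simulation of that generic gate. QGMALIGN's mapping from qubits to alignments and its rule for choosing theta are not given in the abstract, so the angle used here is illustrative.

```python
from math import cos, pi, sin
from typing import Tuple


def rotate(alpha: float, beta: float, theta: float) -> Tuple[float, float]:
    """Apply the rotation gate [[cos, -sin], [sin, cos]] to amplitudes (alpha, beta)."""
    return (cos(theta) * alpha - sin(theta) * beta,
            sin(theta) * alpha + cos(theta) * beta)


def observe(beta: float, r: float) -> int:
    """Collapse a qubit: returns 1 with probability beta**2, given uniform r in [0, 1)."""
    return 1 if r < beta ** 2 else 0


if __name__ == "__main__":
    a = b = 1 / 2 ** 0.5          # equal superposition of 0 and 1
    a, b = rotate(a, b, pi / 4)   # rotate the state towards 1
    print(round(a, 6), round(b, 6))
```

    Because the gate is orthogonal, repeated rotations shift probability between 0 and 1 without ever denormalizing the state.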

  1. SINA: Accurate high-throughput multiple sequence alignment of ribosomal RNA genes

    PubMed Central

    Pruesse, Elmar; Peplies, Jörg; Glöckner, Frank Oliver

    2012-01-01

    Motivation: In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes like the ribosomal RNA (rRNA) genes, for which millions of sequences are already publicly available and individual studies can easily produce hundreds of thousands of new sequences. Methods have been developed to cope with such numbers, but further improvements are needed to meet accuracy requirements. Results: In this study, we present the SILVA Incremental Aligner (SINA) used to align the rRNA gene databases provided by the SILVA ribosomal RNA project. SINA uses a combination of k-mer searching and partial order alignment (POA) to maintain very high alignment accuracy while satisfying high-throughput performance demands. SINA was evaluated in comparison with the commonly used high-throughput MSA programs PyNAST and mothur. The three BRAliBase III benchmark MSAs could be reproduced with 99.3%, 97.6% and 96.1% accuracy. A larger benchmark MSA comprising 38 772 sequences could be reproduced with 98.9% and 99.3% accuracy using reference MSAs comprising 1000 and 5000 sequences, respectively. SINA achieved higher accuracy than PyNAST and mothur in all performed benchmarks. Availability: Alignment of up to 500 sequences using the latest SILVA SSU/LSU Ref datasets as reference MSA is offered at http://www.arb-silva.de/aligner. This page also links to Linux binaries, a user manual and a tutorial. SINA is made available under a personal use license. Contact: epruesse@mpi-bremen.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22556368

  2. A parallel approach of COFFEE objective function to multiple sequence alignment

    NASA Astrophysics Data System (ADS)

    Zafalon, G. F. D.; Visotaky, J. M. V.; Amorim, A. R.; Valêncio, C. R.; Neves, L. A.; de Souza, R. C. G.; Machado, J. M.

    2015-09-01

    As the amount of available data grows rapidly, computational tools to assist genomic analyses become ever more necessary. Because deterministic sequence alignment algorithms have high computational costs, many works concentrate on developing heuristic approaches to multiple sequence alignment. However, selecting an approach that offers solutions with good biological significance and feasible execution time is a great challenge. This work therefore parallelizes the processing steps of the MSA-GA tool, using the multithread paradigm to execute the COFFEE objective function. The standard objective function implemented in the tool is the Weighted Sum of Pairs (WSP), which produces some distortions in the final alignments when sets of sequences with low similarity are aligned. In previous studies, we implemented the COFFEE objective function in the tool to smooth these distortions. Although the COFFEE objective function increases execution time, it contains steps that can be executed in parallel. With the improvements implemented in this work, the new approach is 24% faster than the sequential approach with COFFEE. Moreover, the multithreaded COFFEE approach is more efficient than WSP: besides being slightly faster, it gives better biological results.
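    The Weighted Sum of Pairs objective adds up, over every pair of rows, a pairwise alignment score multiplied by a pair weight. The sketch uses hypothetical match, mismatch and gap values and caller-supplied weights; practical WSP implementations typically use a substitution matrix and affine gap penalties, and MSA-GA's exact settings are not given here. Each pair term is computed independently, which is the property that makes this kind of objective easy to split across threads.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def pair_score(a: str, b: str, match: float = 1.0, mismatch: float = 0.0,
               gap: float = -1.0) -> float:
    """Column-by-column score of two aligned rows; gap-gap columns score 0."""
    total = 0.0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        if x == "-" or y == "-":
            total += gap
        else:
            total += match if x == y else mismatch
    return total


def weighted_sum_of_pairs(rows: List[str], weights: Dict[Tuple[int, int], float]) -> float:
    """WSP = sum over i < j of w_ij * score(row_i, row_j); pair terms are independent."""
    return sum(weights[(i, j)] * pair_score(rows[i], rows[j])
               for i, j in combinations(range(len(rows)), 2))
```

    For the rows `["AC-", "ACG", "A-G"]` with weights 1.0, 0.5 and 2.0 on the pairs (0, 1), (0, 2) and (1, 2), the pair scores are 1, -1 and 1, so the WSP is 2.5.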

  3. Graph-based modeling of tandem repeats improves global multiple sequence alignment

    PubMed Central

    Szalkowski, Adam M.; Anisimova, Maria

    2013-01-01

    Tandem repeats (TRs) are often present in proteins with crucial functions, such as those responsible for resistance and pathogenicity, and are associated with infectious or neurodegenerative diseases. This motivates numerous studies of TRs and their evolution, requiring accurate multiple sequence alignment. TRs may be lost or inserted at any position of a TR region by replication slippage or recombination, but current methods assume fixed unit boundaries and are nevertheless of high complexity. We present a new global graph-based alignment method that does not restrict TR unit indels by unit boundaries. TR indels are modeled separately and penalized using the phylogeny-aware alignment algorithm. This ensures enhanced accuracy of reconstructed alignments, disentangling TRs and measuring indel events and rates in a biologically meaningful way. Our method detects not only duplication events but also all changes in TR regions owing to recombination, strand slippage and other events inserting or deleting TR units. We evaluate our method by simulation incorporating TR evolution, by either sampling TRs from a profile hidden Markov model or by mimicking strand slippage with duplications. The new method is illustrated on a family of type III effectors, a pathogenicity determinant in the agriculturally important bacterium Ralstonia solanacearum. We show that TR indel rate variation contributes to the diversification of this protein family. PMID:23877246

  4. Multiple sequence alignment using multi-objective based bacterial foraging optimization algorithm.

    PubMed

    Rani, R Ranjani; Ramyachitra, D

    2016-12-01

    Multiple sequence alignment (MSA) is a widespread approach in computational biology and bioinformatics. MSA arranges nucleotide or amino acid sequences so that they are aligned with the minimum number of gaps between them, which reveals the functional, evolutionary and structural relationships among the sequences. Computing MSAs that are accurate and statistically significant remains a challenging task. In this work, the Bacterial Foraging Optimization Algorithm was employed to align biological sequences, yielding a non-dominated optimal solution. It uses multiple objectives: maximization of similarity, non-gap percentage and conserved blocks, and minimization of gap penalty. The BAliBASE 3.0 benchmark database was used to compare the proposed algorithm against other methods. In this paper, two algorithms are proposed: a Hybrid Genetic Algorithm with Artificial Bee Colony (GA-ABC) and the Bacterial Foraging Optimization Algorithm. GA-ABC was found to perform better than the existing optimization algorithms, but it did not recover the conserved blocks. BFO was then used for the alignment, and the conserved blocks were obtained. The proposed Multi-Objective Bacterial Foraging Optimization Algorithm (MO-BFO) was compared with the widely used MSA methods Clustal Omega, Kalign, MUSCLE, MAFFT, Genetic Algorithm (GA), Ant Colony Optimization (ACO), Artificial Bee Colony (ABC), Particle Swarm Optimization (PSO) and GA-ABC. The final results show that MO-BFO yields better alignments than most widely used methods.
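    A "non-dominated" solution is one that no other candidate beats on every objective at once. The sketch computes one of the listed objectives (non-gap percentage) and filters objective vectors by Pareto dominance, with the gap penalty negated so that all objectives are maximized. The objective values in the usage note are made up, and the bacterial foraging search itself is not shown.

```python
from typing import List, Sequence, Tuple


def non_gap_percentage(rows: List[str]) -> float:
    """Percentage of alignment cells that hold a residue rather than a gap."""
    total = sum(len(r) for r in rows)
    return 100.0 * sum(ch != "-" for r in rows for ch in r) / total


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u Pareto-dominates v, with every objective maximized."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def non_dominated(candidates: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Candidates not dominated by any other candidate."""
    return [u for u in candidates if not any(dominates(v, u) for v in candidates)]
```

    With hypothetical vectors (similarity, non-gap %, conserved blocks, negated gap penalty) of `(60, 90, 3, -10)`, `(55, 95, 3, -8)` and `(50, 85, 2, -12)`, the first two trade off against each other and survive, while the third is dominated by the first.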

  5. Java bioinformatics analysis web services for multiple sequence alignment--JABAWS:MSA.

    PubMed

    Troshin, Peter V; Procter, James B; Barton, Geoffrey J

    2011-07-15

    JABAWS is a web services framework that simplifies the deployment of web services for bioinformatics. JABAWS:MSA provides services for five multiple sequence alignment (MSA) methods (Probcons, T-coffee, Muscle, Mafft and ClustalW), and is the system employed by the Jalview multiple sequence analysis workbench since version 2.6. A fully functional, easy to set up server is provided as a Virtual Appliance (VA), which can be run on most operating systems that support a virtualization environment such as VMware or Oracle VirtualBox. JABAWS is also distributed as a Web Application aRchive (WAR) and can be configured to run on a single computer and/or a cluster managed by Grid Engine, LSF or other queuing systems that support DRMAA. JABAWS:MSA provides clients full access to each application's parameters, allows administrators to specify named parameter preset combinations and execution limits for each application through simple configuration files. The JABAWS command-line client allows integration of JABAWS services into conventional scripts. JABAWS is made freely available under the Apache 2 license and can be obtained from: http://www.compbio.dundee.ac.uk/jabaws.

  6. IBBOMSA: An Improved Biogeography-based Approach for Multiple Sequence Alignment

    PubMed Central

    Yadav, Rohit Kumar; Banka, Haider

    2016-01-01

    In bioinformatics, multiple sequence alignment (MSA) is an NP-hard problem; hence, nature-inspired techniques can better approximate the solution. In the current study, a novel biogeography-based optimization (NBBO) is proposed to solve the MSA problem. Biogeography-based optimization (BBO) is a new optimization paradigm, but it has some deficiencies in solving complicated problems, such as low population diversity and a slow convergence rate. NBBO is an enhanced version of BBO in which a new migration operation is proposed to overcome these limitations. The new migration adopts more information from other habitats, maintains population diversity and preserves exploitation ability. In the performance analysis, the proposed technique and existing techniques such as VDGA, MOMSA and GAPAM are tested on publicly available benchmark datasets (i.e., BAliBASE). The proposed method is observed to be superior or competitive with respect to the existing techniques. PMID:27812276

  7. IBBOMSA: An Improved Biogeography-based Approach for Multiple Sequence Alignment.

    PubMed

    Yadav, Rohit Kumar; Banka, Haider

    2016-01-01

    In bioinformatics, multiple sequence alignment (MSA) is an NP-hard problem; hence, nature-inspired techniques can better approximate the solution. In the current study, a novel biogeography-based optimization (NBBO) is proposed to solve the MSA problem. Biogeography-based optimization (BBO) is a new optimization paradigm, but it has some deficiencies in solving complicated problems, such as low population diversity and a slow convergence rate. NBBO is an enhanced version of BBO in which a new migration operation is proposed to overcome these limitations. The new migration adopts more information from other habitats, maintains population diversity and preserves exploitation ability. In the performance analysis, the proposed technique and existing techniques such as VDGA, MOMSA and GAPAM are tested on publicly available benchmark datasets (i.e., BAliBASE). The proposed method is observed to be superior or competitive with respect to the existing techniques.

  8. Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV).

    PubMed

    Martin, Andrew C R

    2014-01-01

    The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and 'dotifying' repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/.
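    JSAV itself is JavaScript and its configuration options are not reproduced here. As a language-neutral illustration, the Python sketch below assumes that "dotifying" means the common alignment-viewer convention of showing residues identical to a reference sequence as dots, so that only differences stand out; the function name and the choice of the first row as reference are hypothetical.

```python
from typing import List


def dotify(rows: List[str], reference_index: int = 0) -> List[str]:
    """Replace residues identical to the reference row with '.'; the reference is unchanged."""
    ref = rows[reference_index]
    out = []
    for i, row in enumerate(rows):
        if i == reference_index:
            out.append(row)
        else:
            # Gaps are kept visible even when the reference also has a gap.
            out.append("".join("." if a == b and a != "-" else a for a, b in zip(row, ref)))
    return out
```

    For `["ACDE", "ACKE", "-CDE"]` this yields `["ACDE", "..K.", "-..."]`, leaving only the substitution and the gap visible.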

  9. Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV)

    PubMed Central

    Martin, Andrew C. R.

    2014-01-01

    The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and ’dotifying’ repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/. PMID:25653836

  10. A unified statistical model of protein multiple sequence alignment integrating direct coupling and insertions

    PubMed Central

    Kinjo, Akira R.

    2016-01-01

    The multiple sequence alignment (MSA) of a protein family provides a wealth of information in terms of the conservation pattern of amino acid residues not only at each alignment site but also between distant sites. In order to statistically model the MSA incorporating both short-range and long-range correlations as well as insertions, I have derived a lattice gas model of the MSA based on the principle of maximum entropy. The partition function, obtained by the transfer matrix method with a mean-field approximation, accounts for all possible alignments with all possible sequences. The model parameters for short-range and long-range interactions were determined by a self-consistent condition and by a Gaussian approximation, respectively. Using this model with and without long-range interactions, I analyzed the globin and V-set domains by increasing the “temperature” and by “mutating” a site. The correlations between residue conservation and various measures of the system’s stability indicate that the long-range interactions make the conservation pattern more specific to the structure, and increasingly stabilize better conserved residues. PMID:27924257
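The transfer matrix method mentioned above turns a sum over exponentially many configurations into a sequence of small matrix-vector products. The sketch below applies it to a generic one-dimensional chain with site fields `h` and nearest-neighbour couplings `J` (at unit temperature) and checks it against brute-force enumeration. It is not Kinjo's lattice gas model, which also handles insertions and treats long-range couplings with approximations.

```python
import itertools
import math

def partition_transfer(h, J, L):
    """Partition function of a 1D chain via the transfer matrix (beta = 1).

    Energy of a configuration s: E(s) = -sum_i h[s_i] - sum_i J[s_i][s_{i+1}].
    Cost is O(L * q^2) instead of O(q^L) for brute-force enumeration.
    """
    q = len(h)
    v = [math.exp(h[a]) for a in range(q)]  # Boltzmann weights of the first site
    for _ in range(L - 1):
        # Propagate one site: v_new[b] = sum_a v[a] * exp(J[a][b] + h[b]).
        v = [sum(v[a] * math.exp(J[a][b] + h[b]) for a in range(q)) for b in range(q)]
    return sum(v)

def partition_brute(h, J, L):
    """Same quantity by enumerating all q**L configurations (for checking only)."""
    q = len(h)
    Z = 0.0
    for s in itertools.product(range(q), repeat=L):
        e = sum(h[x] for x in s) + sum(J[s[i]][s[i + 1]] for i in range(L - 1))
        Z += math.exp(e)
    return Z
```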

  11. QuickProbs--a fast multiple sequence alignment algorithm designed for graphics processors.

    PubMed

    Gudyś, Adam; Deorowicz, Sebastian

    2014-01-01

    Multiple sequence alignment is a crucial task in a number of biological analyses, such as secondary structure prediction, domain searching and phylogeny. MSAProbs is currently the most accurate alignment algorithm, but its effectiveness comes at the expense of computational time. In this paper we present QuickProbs, a variant of MSAProbs customised for graphics processors. We selected the two most time-consuming stages of MSAProbs to be redesigned for GPU execution: the posterior matrices calculation and the consistency transformation. Experiments on three popular benchmarks (BAliBASE, PREFAB, OXBench-X) on a quad-core PC equipped with a high-end graphics card show QuickProbs to be 5.7 to 9.7 times faster than the original CPU-parallel MSAProbs. Additional tests performed on several protein families from the Pfam database give an overall speed-up of 6.7. Compared to other algorithms such as MAFFT, MUSCLE, or ClustalW, QuickProbs proved to be much more accurate at similar speed. Additionally, we introduce a tuned variant of QuickProbs which is significantly more accurate on sets of distantly related sequences than MSAProbs without exceeding its computation time. The GPU part of QuickProbs was implemented in OpenCL, so the package is suitable for graphics processors produced by all major vendors.
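For context on the consistency transformation stage, the sketch below implements the unweighted probabilistic consistency transformation described for ProbCons, P'_xy = (1/|S|) Σ_z P_xz P_zy, with P_xx taken as the identity. MSAProbs uses a weighted variant and QuickProbs runs this stage on the GPU; neither detail is reproduced here. The dictionary layout of the posterior matrices is an assumption.

```python
def matmul(A, B):
    """Plain-Python matrix product of A (m x k) and B (k x n)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def consistency_transform(P, lengths):
    """One round of unweighted probabilistic consistency (ProbCons-style sketch).

    P[(x, y)]: posterior matrix (len_x x len_y) that residue i of sequence x
    aligns to residue j of sequence y, for every ordered pair x != y.
    lengths: sequence lengths; P[(x, x)] is treated as the identity matrix.
    Returns new matrices P'_xy = (1/|S|) * sum over z of P_xz @ P_zy.
    """
    n = len(lengths)

    def get(x, y):
        return identity(lengths[x]) if x == y else P[(x, y)]

    new = {}
    for x in range(n):
        for y in range(n):
            if x == y:
                continue
            acc = [[0.0] * lengths[y] for _ in range(lengths[x])]
            for z in range(n):
                prod = matmul(get(x, z), get(z, y))
                for i in range(lengths[x]):
                    for j in range(lengths[y]):
                        acc[i][j] += prod[i][j] / n
            new[(x, y)] = acc
    return new
```

The triple loop over sequence triples is what makes this stage expensive, and each matrix product is independent, which is why it is a natural target for GPU execution.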

  12. QuickProbs—A Fast Multiple Sequence Alignment Algorithm Designed for Graphics Processors

    PubMed Central

    Gudyś, Adam; Deorowicz, Sebastian

    2014-01-01

    Multiple sequence alignment is a crucial task in a number of biological analyses, such as secondary structure prediction, domain searching and phylogeny. MSAProbs is currently the most accurate alignment algorithm, but its effectiveness comes at the expense of computational time. In this paper we present QuickProbs, a variant of MSAProbs customised for graphics processors. We selected the two most time-consuming stages of MSAProbs to be redesigned for GPU execution: the posterior matrices calculation and the consistency transformation. Experiments on three popular benchmarks (BAliBASE, PREFAB, OXBench-X) on a quad-core PC equipped with a high-end graphics card show QuickProbs to be 5.7 to 9.7 times faster than the original CPU-parallel MSAProbs. Additional tests performed on several protein families from the Pfam database give an overall speed-up of 6.7. Compared to other algorithms such as MAFFT, MUSCLE, or ClustalW, QuickProbs proved to be much more accurate at similar speed. Additionally, we introduce a tuned variant of QuickProbs which is significantly more accurate on sets of distantly related sequences than MSAProbs without exceeding its computation time. The GPU part of QuickProbs was implemented in OpenCL, so the package is suitable for graphics processors produced by all major vendors. PMID:24586435

  13. MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems.

    PubMed

    González-Domínguez, Jorge; Liu, Yongchao; Touriño, Juan; Schmidt, Bertil

    2016-12-15

    MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input datasets. In this work we present MSAProbs-MPI, a distributed-memory parallel version of the multithreaded MSAProbs tool that reduces runtimes by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on a cluster with 32 nodes (each containing two Intel Haswell processors) shows reductions in execution time of over one order of magnitude for typical input datasets. Furthermore, MSAProbs-MPI using eight nodes is faster than the GPU-accelerated QuickProbs running on a Tesla K20. Another strong point is that MSAProbs-MPI can deal with large datasets for which MSAProbs and QuickProbs might fail due to time and memory constraints, respectively. Source code in C++ and MPI running on Linux systems, as well as a reference manual, is available at http://msaprobs.sourceforge.net. Contact: jgonzalezd@udc.es. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved.
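MSAProbs-MPI's actual work distribution is not described in the abstract. As an assumption-labelled illustration of why the all-pairs stages scale across nodes, the sketch assigns the n(n-1)/2 sequence pairs to ranks in balanced contiguous blocks; `pair_block` is a hypothetical helper.

```python
def pair_block(n, rank, size):
    """Contiguous, balanced block of the n*(n-1)/2 sequence pairs for one rank.

    The first (total % size) ranks receive one extra pair, so block sizes
    differ by at most one and every pair is assigned exactly once.
    """
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    base, extra = divmod(len(pairs), size)
    start = rank * base + min(rank, extra)
    end = start + base + (1 if rank < extra else 0)
    return pairs[start:end]
```

With mpi4py, `rank` and `size` would typically come from `MPI.COMM_WORLD.Get_rank()` and `MPI.COMM_WORLD.Get_size()`; each rank would then process only its own pairs before the results are gathered.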

  14. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments.

    PubMed

    Schwarz, Roland F; Tamuri, Asif U; Kultys, Marek; King, James; Godwin, James; Florescu, Ana M; Schultz, Jörg; Goldman, Nick

    2016-05-05

    Sequence logos and their variants are the most commonly used methods for visualizing multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment; consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. Here we present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly and highly customizable, and can export results as publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis) and its Sequence Bundles visualization module is further available as a web application (http://science-practice.com/projects/sequence-bundles). © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
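Consensus summaries such as logos hide covariation between columns. One common way to quantify covariant sites is the mutual information between two alignment columns; the sketch below computes it in bits. The abstract does not state whether Alvis uses this exact measure, so treat `column_mi` as an illustrative, hypothetical helper.

```python
import math
from collections import Counter

def column_mi(alignment, i, j):
    """Mutual information (bits) between columns i and j of an alignment.

    alignment: list of equal-length strings; gaps are treated as ordinary symbols.
    """
    pairs = [(s[i], s[j]) for s in alignment]
    n = len(pairs)
    pij = Counter(pairs)                 # joint counts
    pi = Counter(a for a, _ in pairs)    # marginal counts of column i
    pj = Counter(b for _, b in pairs)    # marginal counts of column j
    mi = 0.0
    for (a, b), c in pij.items():
        mi += (c / n) * math.log2((c / n) / ((pi[a] / n) * (pj[b] / n)))
    return mi
```

Two perfectly covarying columns with two equally frequent states give 1 bit, while independent columns give 0, even though a logo of either pair of columns would look the same.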

  15. Resolving the multiple sequence alignment problem using biogeography-based optimization with multiple populations.

    PubMed

    Zemali, El-Amine; Boukra, Abdelmadjid

    2015-08-01

    The multiple sequence alignment (MSA) problem is one of the most challenging problems in bioinformatics; it involves discovering similarity among a set of protein or DNA sequences. This paper introduces a new method for the MSA problem called biogeography-based optimization with multiple populations (BBOMP). It is based on a recent metaheuristic inspired by the mathematics of biogeography, named biogeography-based optimization (BBO). To improve the exploration ability of BBO, we introduce a new concept that allows better exploration of the search space: it consists of manipulating multiple populations, each with its own parameters. These parameters are used to build progressive alignments, allowing more diversity. At each iteration, the best solution found is injected into each population. Moreover, to improve solution quality, six operators are defined. These operators are selected with a dynamic probability that changes according to each operator's efficiency. To test the performance of the proposed approach, we considered a set of datasets from BAliBASE 2.0 and compared it with many recent algorithms, such as GAPAM, MSA-GA, QEAMSA and RBT-GA. The results show that the proposed approach achieves a better average score than the previously cited methods.
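The abstract says operators are chosen with a dynamic probability that tracks their efficiency, but gives no formula. A common way to do this is probability matching with a probability floor `p_min`, sketched below; the class name, the moving-average reward and the parameter values are assumptions, not BBOMP's published rule.

```python
import random

class OperatorSelector:
    """Probability matching over a fixed set of operators (illustrative sketch)."""

    def __init__(self, n_ops, p_min=0.05, alpha=0.3):
        if n_ops * p_min >= 1:
            raise ValueError("p_min is too large for this number of operators")
        self.q = [1.0] * n_ops   # running estimate of each operator's efficiency
        self.p_min = p_min       # probability floor so no operator is starved
        self.alpha = alpha       # adaptation rate of the moving average

    def probabilities(self):
        k = len(self.q)
        total = sum(self.q)
        if total == 0:
            return [1.0 / k] * k
        return [self.p_min + (1 - k * self.p_min) * qi / total for qi in self.q]

    def select(self, rng=random):
        # Roulette-wheel choice using the current probabilities.
        return rng.choices(range(len(self.q)), weights=self.probabilities())[0]

    def reward(self, op, improvement):
        # Exponential moving average of the non-negative fitness improvement.
        r = max(improvement, 0.0)
        self.q[op] += self.alpha * (r - self.q[op])
```

The floor keeps every operator selectable, so an operator that performs poorly early in the run can still be rediscovered later.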

  16. RBT-GA: a novel metaheuristic for solving the multiple sequence alignment problem

    PubMed Central

    Taheri, Javid; Zomaya, Albert Y

    2009-01-01

    Background: Multiple sequence alignment (MSA) has always been an active area of research in bioinformatics. MSA is mainly focused on discovering biologically meaningful relationships among different sequences or proteins in order to investigate their underlying main characteristics and functions. This information is also used to generate phylogenetic trees. Results: This paper presents a novel approach, namely RBT-GA, to solve the MSA problem using a hybrid solution methodology combining the Rubber Band Technique (RBT) and the Genetic Algorithm (GA) metaheuristic. RBT is inspired by the behavior of an elastic rubber band (RB) on a plate with several poles, which are analogous to locations in the input sequences that could potentially be biologically related. A GA attempts to mimic the evolutionary processes of life in order to locate optimal solutions in an often very complex landscape. RBT-GA is a population-based optimization algorithm designed to find the optimal alignment for a set of input protein sequences. In this technique, each candidate alignment is modeled as a chromosome consisting of several poles in the RBT framework. These poles represent locations in the input sequences that are most likely to be correlated and/or biologically related. A GA-based optimization process improves these chromosomes gradually, yielding a set of near-optimal answers for the MSA problem. Conclusion: RBT-GA is tested with one of the well-known benchmark suites in this area (BAliBASE 2.0). The obtained results show the superiority of the proposed technique, even in the case of formidable sequences. PMID:19594869
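A GA such as RBT-GA needs a fitness function to rank chromosomes. The abstract does not give RBT-GA's objective, so the sketch below shows a generic sum-of-pairs score with illustrative match, mismatch and gap values (assumptions, not a substitution matrix). Note that the BAliBASE SP score used for benchmarking is a different measure that compares a test alignment against a reference alignment.

```python
from itertools import combinations

def sum_of_pairs(alignment, match=1, mismatch=0, gap=-1):
    """Sum-of-pairs score of an alignment given as equal-length gapped strings.

    Every column contributes the score of each pair of rows; a gap aligned
    to a gap contributes 0. The default scores are illustrative only.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("rows must have equal length")
    total = 0
    for col in range(length):
        for a, b in combinations((s[col] for s in alignment), 2):
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                total += gap
            else:
                total += match if a == b else mismatch
    return total
```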

  17. Evidence of Statistical Inconsistency of Phylogenetic Methods in the Presence of Multiple Sequence Alignment Uncertainty.

    PubMed

    Md Mukarram Hossain, A S; Blackburne, Benjamin P; Shah, Abhijeet; Whelan, Simon

    2015-07-01

    Evolutionary studies usually use a two-step process to investigate sequence data. Step one estimates a multiple sequence alignment (MSA) and step two applies phylogenetic methods to ask evolutionary questions of that MSA. Modern phylogenetic methods infer evolutionary parameters using maximum likelihood or Bayesian inference, mediated by a probabilistic substitution model that describes sequence change over a tree. The statistical properties of these methods mean that more data directly translates to increased confidence in downstream results, provided that the substitution model is adequate and the MSA is correct. Many studies have investigated the robustness of phylogenetic methods in the presence of substitution model misspecification, but few have examined the statistical properties of those methods when the MSA is unknown. This simulation study examines the statistical properties of the complete two-step process when inferring sequence divergence and the phylogenetic tree topology. Both nucleotide and amino acid analyses are negatively affected by the alignment step, both through inaccurate guide tree estimates and through overfitting to that guide tree. For many alignment tools these effects become more pronounced when additional sequences are added to the analysis. Nucleotide sequences are particularly susceptible, with MSA errors leading to statistical support for long-branch attraction artifacts, which are usually associated with gross substitution model misspecification. Amino acid MSAs are more robust, but do tend to arbitrarily resolve multifurcations in favor of the guide tree. No inference strategy produces consistently accurate estimates of divergence between sequences, although amino acid MSAs are again more accurate than their nucleotide counterparts. We conclude with some practical suggestions about how to limit the effect of MSA uncertainty on evolutionary inference. © The Author(s) 2015. Published by Oxford University Press on behalf of the
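To see how the alignment step feeds into divergence estimates, consider the Jukes-Cantor (JC69) distance, d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. The sketch below computes it from one aligned pair and skips gapped columns; this textbook estimator is chosen for illustration and is not the inference pipeline used in the study.

```python
import math

def jc69_distance(a, b):
    """Jukes-Cantor (JC69) distance between two aligned nucleotide sequences.

    Columns where either sequence has a gap are skipped, so the estimate
    depends directly on which positions the alignment placed together.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no ungapped columns to compare")
    p = sum(x != y for x, y in sites) / len(sites)
    if p >= 0.75:
        return math.inf  # saturation: the JC69 formula is undefined here
    return -0.75 * math.log(1 - 4.0 * p / 3.0)
```

Realigning the same two sequences differently changes which residues are compared, and therefore p and d, which is one concrete route by which alignment errors propagate into divergence estimates.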

  18. AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences

    PubMed Central

    2010-01-01

    Background: Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses. Results: AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of conserved sequences, focusing particularly on divergence. It accepts alignments (protein or nucleic acid) obtained using any of a variety of algorithms; the choice of algorithm does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can even inspect their results on a different computer. Data can be downloaded onto a user disk in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to design specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that deploys several oligonucleotide parameters for designing primers "on the fly". Conclusions: AlignMiner can be used to reliably detect
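AlignMiner's exact scoring formulas are not given here. As a hedged sketch of entropy-based detection of divergent regions, the code below computes the Shannon entropy of each column and merges sliding windows whose mean entropy reaches a threshold; the window size, the threshold and the treatment of gaps as a symbol are assumptions.

```python
import math
from collections import Counter

def column_entropy(alignment):
    """Shannon entropy (bits) of each alignment column; gaps count as a symbol."""
    n = len(alignment)
    ent = []
    for col in zip(*alignment):
        counts = Counter(col)
        ent.append(-sum((c / n) * math.log2(c / n) for c in counts.values()))
    return ent

def divergent_regions(alignment, window=3, threshold=1.0):
    """Merge windows with mean entropy >= threshold into (start, end) regions.

    Positions are 0-based and end is exclusive. Overlapping or touching
    qualifying windows are merged into a single region.
    """
    ent = column_entropy(alignment)
    regions = []
    for start in range(len(ent) - window + 1):
        if sum(ent[start:start + window]) / window >= threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions
```

Lowering the threshold or widening the window yields longer, more numerous regions, which mirrors the trade-off the abstract describes between its more permissive and more restrictive scoring methods.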

  19. A quantum-inspired genetic algorithm based on probabilistic coding for multiple sequence alignment.

    PubMed

    Huo, Hong-Wei; Stojkovic, Vojislav; Xie, Qiao-Luan

    2010-02-01

    Quantum parallelism arises from the ability of a quantum memory register to exist in a superposition of base states. Since the number of possible base states is 2^n, where n is the number of qubits in the quantum memory register, one operation on a quantum computer can act on a number of states that would require an exponential number of operations on a classical computer. The power of quantum algorithms comes from taking advantage of quantum parallelism, and some quantum algorithms are exponentially faster than the best known classical algorithms. Genetic optimization algorithms are stochastic search algorithms used to search large, nonlinear spaces where expert knowledge is lacking or difficult to encode. QGMALIGN, a probabilistic-coding-based quantum-inspired genetic algorithm for multiple sequence alignment, is presented. A quantum rotation gate is used as a mutation operator to guide the evolution of the quantum state. Six genetic operators are designed on the coding basis to improve the solution during the evolutionary process. The experimental results show that QGMALIGN can compete with popular methods such as CLUSTALX and SAGA, and performs well on the biological data presented. Moreover, adding genetic operators to the quantum-inspired algorithm reduces the overall running time.
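The abstract does not detail QGMALIGN's probabilistic coding, so the sketch below shows the generic quantum-inspired representation that such algorithms build on: each Q-bit stores an angle θ, is observed as 1 with probability sin²θ, and a rotation gate nudges θ toward the best solution found so far. The fixed rotation step and the clamping to [0, π/2] are simplifying assumptions; published variants usually choose the angle from a lookup table.

```python
import math
import random

class QubitIndividual:
    """Q-bit chromosome: bit k is 1 with probability sin(theta_k)**2 (generic sketch)."""

    def __init__(self, n_bits):
        # theta = pi/4 is an equal superposition: P(0) = P(1) = 0.5.
        self.theta = [math.pi / 4] * n_bits

    def prob_one(self):
        return [math.sin(t) ** 2 for t in self.theta]

    def observe(self, rng=random):
        # "Collapse" each Q-bit into a classical bit.
        return [1 if rng.random() < p else 0 for p in self.prob_one()]

    def rotate_towards(self, best_bits, delta=0.05 * math.pi):
        # Rotation gate: move each angle toward the best solution's bit,
        # clamped to [0, pi/2] so that sin^2 stays monotone in theta.
        for k, b in enumerate(best_bits):
            step = delta if b == 1 else -delta
            self.theta[k] = min(max(self.theta[k] + step, 0.0), math.pi / 2)
```

Repeated rotations concentrate probability on the best solution's bits while early observations still explore, which is the exploration/exploitation balance these methods rely on.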

  20. CUDA ClustalW: An efficient parallel algorithm for progressive multiple sequence alignment on Multi-GPUs.

    PubMed

    Hung, Che-Lun; Lin, Yu-Shiang; Lin, Chun-Yuan; Chung, Yeh-Ching; Chung, Yi-Fang

    2015-10-01

    For biological applications, sequence alignment is an important strategy for analyzing DNA and protein sequences. Multiple sequence alignment is an essential methodology for studying biological data, with uses such as homology modeling and phylogenetic reconstruction. However, multiple sequence alignment is an NP-hard problem. In past decades, the progressive approach has been proposed to align multiple sequences successfully by applying a series of pairwise alignments. Owing to the rapid growth of next-generation sequencing technologies, a large number of sequences can be produced in a short period of time. When the problem instance is large, progressive alignment becomes time-consuming. Parallel computing is a suitable solution for such applications, and the GPU is one of the important architectures in contemporary parallel computing research. Therefore, in this work we propose a GPU version of ClustalW v2.0.11, called CUDA ClustalW v1.0. The experimental results show that CUDA ClustalW v1.0 achieves an overall speedup of more than 33× in execution time compared with ClustalW v2.0.11.
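Progressive aligners spend much of their time on independent pairwise alignments, which is what makes this stage suitable for GPUs. The sketch below scores one pair with Needleman-Wunsch global alignment (linear gap penalty, illustrative scores) and then scores all pairs. It is a simplified stand-in: ClustalW's pairwise stage uses its own scoring parameters and converts the results into distances for the guide tree.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score, linear gap penalty, O(len(a)*len(b)).

    Keeps only two rows of the dynamic-programming matrix in memory.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]

def all_pair_scores(seqs):
    """Score every pair; each call is independent, so pairs can run in parallel."""
    return {(i, j): nw_score(seqs[i], seqs[j])
            for i in range(len(seqs)) for j in range(i + 1, len(seqs))}
```

For n sequences this stage performs n(n-1)/2 independent dynamic-programming runs, which is why it dominates the runtime on large inputs and maps well onto many GPU threads.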

  1. Embedded-Based Graphics Processing Unit Cluster Platform for Multiple Sequence Alignments

    PubMed Central

    Wei, Jyh-Da; Cheng, Hui-Jun; Lin, Chun-Yuan; Ye, Jin; Yeh, Kuan-Yu

    2017-01-01

    High-end graphics processing units (GPUs), such as NVIDIA Tesla/Fermi/Kepler series cards with thousands of cores per chip, have been widely applied in high-performance computing over the past decade. These desktop GPU cards must be installed in personal computers or servers with desktop CPUs, so the cost and power consumption of constructing a GPU cluster platform are very high. In recent years, NVIDIA released an embedded board called the Jetson Tegra K1 (TK1), which contains 4 ARM Cortex-A15 CPUs and 192 Compute Unified Device Architecture (CUDA) cores of the Kepler architecture. The Jetson Tegra K1 has several advantages, such as low cost, low power consumption and high applicability, and it has been used in several specific applications. In our previous work, a bioinformatics platform with a single TK1 (STK platform) was constructed, and that work also showed, by comparing the STK platform with a desktop CPU and GPU, that Web and mobile services can be implemented on the STK platform with a good cost-performance ratio. In this work, an embedded-based GPU cluster platform is constructed with multiple TK1s (MTK platform). Complex system installation and setup are required first. Then, two job assignment modes are designed for the MTK platform to provide services for users. Finally, ClustalW v2.0.11 and ClustalWtk are ported to the MTK platform. The experimental results showed speedups of 5.5 and 4.8 times for ClustalW v2.0.11 and ClustalWtk, respectively, using 6 TK1s compared with a single TK1. The MTK platform is shown to be useful for multiple sequence alignments. PMID:28835734
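The reported speedups can be read against the 6 boards used. The sketch computes parallel efficiency (speedup divided by the number of workers) and the Karp-Flatt metric, the serial fraction implied by Amdahl's law for an observed speedup; interpreting the overhead as a serial fraction is a modelling assumption, not a claim from the paper.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / workers

def amdahl_serial_fraction(speedup, workers):
    """Karp-Flatt metric: serial fraction f solving S = 1 / (f + (1 - f) / p)."""
    return (1 / speedup - 1 / workers) / (1 - 1 / workers)

for name, s in [("ClustalW v2.0.11", 5.5), ("ClustalWtk", 4.8)]:
    print(name, round(parallel_efficiency(s, 6), 3), round(amdahl_serial_fraction(s, 6), 4))
```

For ClustalW v2.0.11 (5.5× on 6 TK1s) this gives an efficiency of about 0.92 and a serial fraction of about 0.018; for ClustalWtk (4.8×) about 0.80 and 0.05.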

  2. Embedded-Based Graphics Processing Unit Cluster Platform for Multiple Sequence Alignments.

    PubMed

    Wei, Jyh-Da; Cheng, Hui-Jun; Lin, Chun-Yuan; Ye, Jin; Yeh, Kuan-Yu

    2017-01-01

    High-end graphics processing units (GPUs), such as NVIDIA Tesla/Fermi/Kepler series cards with thousands of cores per chip, have been widely applied in high-performance computing over the past decade. These desktop GPU cards must be installed in personal computers or servers with desktop CPUs, so the cost and power consumption of constructing a GPU cluster platform are very high. In recent years, NVIDIA released an embedded board called the Jetson Tegra K1 (TK1), which contains 4 ARM Cortex-A15 CPUs and 192 Compute Unified Device Architecture (CUDA) cores of the Kepler architecture. The Jetson Tegra K1 has several advantages, such as low cost, low power consumption and high applicability, and it has been used in several specific applications. In our previous work, a bioinformatics platform with a single TK1 (STK platform) was constructed, and that work also showed, by comparing the STK platform with a desktop CPU and GPU, that Web and mobile services can be implemented on the STK platform with a good cost-performance ratio. In this work, an embedded-based GPU cluster platform is constructed with multiple TK1s (MTK platform). Complex system installation and setup are required first. Then, two job assignment modes are designed for the MTK platform to provide services for users. Finally, ClustalW v2.0.11 and ClustalWtk are ported to the MTK platform. The experimental results showed speedups of 5.5 and 4.8 times for ClustalW v2.0.11 and ClustalWtk, respectively, using 6 TK1s compared with a single TK1. The MTK platform is shown to be useful for multiple sequence alignments.

  3. Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures.

    PubMed

    Taly, Jean-Francois; Magis, Cedrik; Bussotti, Giovanni; Chang, Jia-Ming; Di Tommaso, Paolo; Erb, Ionas; Espinosa-Carrasco, Jose; Kemena, Carsten; Notredame, Cedric

    2011-11-01

    T-Coffee (Tree-based consistency objective function for alignment evaluation) is a versatile multiple sequence alignment (MSA) method suitable for aligning most types of biological sequences. The main strength of T-Coffee is its ability to combine third-party aligners and to integrate structural (or homology) information when building MSAs. The series of protocols presented here shows how the package can be used to multiply align proteins, RNA and DNA sequences. The protein section shows how users can select the most suitable T-Coffee mode for their data set. Detailed protocols include T-Coffee, the default mode; M-Coffee, a meta version able to combine several third-party aligners into one; PSI (position-specific iterated)-Coffee, the homology-extended mode suitable for remote homologs; and Expresso, the structure-based multiple aligner. We then also show how the T-RMSD (tree based on root mean square deviation) option can be used to produce a functionally informative structure-based clustering. RNA alignment procedures are described for R-Coffee, a mode able to use predicted RNA secondary structures when aligning RNA sequences. DNA alignments are illustrated with Pro-Coffee, a multiple aligner specific to promoter regions. We also present some of the many reformatting utilities bundled with T-Coffee. The package is open-source freeware available from http://www.tcoffee.org/.
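T-Coffee's consistency comes from "library extension": the weight for aligning two residues is increased by the support they receive through residues of third sequences, taking the minimum of the two supporting weights. The sketch below follows that published idea on a dictionary of residue-pair weights; the data layout is an assumption, and the sketch omits how T-Coffee builds the primary library in the first place.

```python
from collections import defaultdict

def extend_library(primary):
    """T-Coffee-style library extension (a sketch of the published idea).

    primary: dict mapping frozenset({(x, i), (y, j)}) -> weight, where (x, i)
    is residue i of sequence x and x != y. For every intermediate residue z_k
    paired with both x_i and y_j (z a third sequence), the pair x_i/y_j gains
    min(W(x_i, z_k), W(z_k, y_j)). Pairs absent from the primary library can
    appear in the result if they are supported indirectly.
    """
    neighbours = defaultdict(dict)  # residue -> {partner residue: weight}
    for pair, w in primary.items():
        r1, r2 = tuple(pair)
        neighbours[r1][r2] = w
        neighbours[r2][r1] = w
    extended = defaultdict(float, primary)
    for mid, partners in neighbours.items():
        items = list(partners.items())
        for a in range(len(items)):
            for b in range(a + 1, len(items)):
                (r1, w1), (r2, w2) = items[a], items[b]
                # Skip partners from the same sequence or from mid's own sequence.
                if r1[0] == r2[0] or mid[0] in (r1[0], r2[0]):
                    continue
                extended[frozenset((r1, r2))] += min(w1, w2)
    return dict(extended)
```

A residue pair that is consistent with the other pairwise alignments is thus rewarded, while an isolated, unsupported pairing keeps only its primary weight.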

  4. Computational approaches for protein function prediction: a combined strategy from multiple sequence alignment to molecular docking-based virtual screening.

    PubMed

    Pierri, Ciro Leonardo; Parisi, Giovanni; Porcelli, Vito

    2010-09-01

    The functional characterization of proteins represents a daily challenge for the biochemical, medical and computational sciences. Although a protein's function must ultimately be confirmed at the bench, it can often be predicted successfully by computational approaches that then drive further experimental assays. Current methods for comparative modeling allow the construction of accurate 3D models for proteins of unknown structure, provided that a crystal structure of a homologous protein is available. Binding regions can be proposed by using binding site predictors, data inferred from homologous crystal structures, and a careful interpretation of the multiple sequence alignment of the investigated protein and its homologs. Once the location of a binding site has been proposed, chemical ligands with a high likelihood of binding can be identified by ligand docking and structure-based virtual screening of chemical libraries. Most docking algorithms can produce a list of the library's ligands ranked by the energy of each ligand's lowest-energy docking configuration. In this review, the state of the art of computational approaches in 3D protein comparative modeling and in the study of protein-ligand interactions is presented. Furthermore, a possible combined multistep strategy for protein function prediction, based on multiple sequence alignment, comparative modeling, binding region prediction, and structure-based virtual screening of chemical libraries, is described using suitable examples. As practical examples, Abl-kinase molecular modeling studies, HPV-E6 protein multiple sequence alignment analysis, and some other model docking-based characterization reports are briefly described to highlight the importance of computational approaches in protein function prediction.

  5. Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization.

    PubMed

    Bauer, Markus; Klau, Gunnar W; Reinert, Knut

    2007-07-27

    The discovery of functional non-coding RNA sequences has led to increasing interest in algorithms related to RNA analysis. Traditional sequence alignment algorithms, however, fail to compute reliable alignments of low-homology RNA sequences. The spatial conformation of RNA sequences largely determines their function, and therefore RNA alignment algorithms have to take structural information into account. We present a graph-based representation for sequence-structure alignments, which we model as an integer linear program (ILP). We sketch how we compute an optimal or near-optimal solution to the ILP using methods from combinatorial optimization, and present results on a recently published benchmark set for RNA alignments. The implementation of our algorithm yields better alignments, in terms of two published scores, than the other programs that we tested; this is especially the case as the number of input sequences increases. Our program LARA is freely available for academic purposes from http://www.planet-lisa.net.
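The structural part of a sequence-structure alignment score rewards base pairs that the alignment preserves. The sketch below counts such matched arcs for two RNAs, given their base-pair sets and an order-preserving alignment mapping. An ILP formulation such as LARA's maximizes a combination of this kind of term with sequence scores under compatibility constraints; this helper is illustrative and is not LARA's scoring function.

```python
def matched_arcs(pairs_a, pairs_b, mapping):
    """Count base pairs of RNA A that the alignment maps onto base pairs of RNA B.

    pairs_a, pairs_b: sets of (i, j) base pairs with i < j (0-based positions).
    mapping: dict from position in A to position in B for matched columns;
    assumed order-preserving, so a mapped pair (k, l) keeps k < l.
    """
    count = 0
    for i, j in pairs_a:
        k, l = mapping.get(i), mapping.get(j)
        if k is not None and l is not None and (k, l) in pairs_b:
            count += 1
    return count
```

For a stem GGGAAACCC with pairs {(0, 8), (1, 7), (2, 6)} aligned to GGAAACC with pairs {(0, 6), (1, 5)}, a mapping that leaves position 2 of the first RNA unmatched preserves two arcs.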

  6. New Challenges of the Computation of Multiple Sequence Alignments in the High-Throughput Era (2010 JGI/ANL HPC Workshop)

    ScienceCinema

    Notredame, Cedric [Centre for Genomic Regulation]

    2016-07-12

    Cedric Notredame from the Centre for Genomic Regulation gives a presentation on "New Challenges of the Computation of Multiple Sequence Alignments in the High-Throughput Era" at the JGI/Argonne HPC Workshop on January 26, 2010.

  7. A non-independent energy-based multiple sequence alignment improves prediction of transcription factor binding sites.

    PubMed

    Salama, Rafik A; Stekel, Dov J

    2013-11-01

    Multiple sequence alignments (MSAs) are usually scored under the assumption that the sequences being aligned have evolved by common descent. Consequently, the differences between sequences reflect the impact of insertions, deletions and mutations. However, non-coding DNA binding sequences, such as transcription factor binding sites (TFBSs), are frequently not related by common descent, and so the existing alignment scoring methods are not well suited to aligning such sequences. We present a novel MSA methodology that scores TFBS DNA sequences by including the interdependence of neighboring bases. We introduced two variants supported by different underlying null hypotheses, one statistically and the other thermodynamically generated. We assessed the alignments through their performance in TFBS prediction; both methods show considerable improvements when compared with standard MSA algorithms. Moreover, the thermodynamically generated null hypothesis outperforms the statistical one due to improved stability in the base stacking free energy of the alignment. The thermodynamically generated null hypothesis method can be downloaded from http://sourceforge.net/projects/msa-edna/. Contact: dov.stekel@nottingham.ac.uk. Supplementary data are available at Bioinformatics online.
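The key idea above is that scoring should not treat neighbouring bases as independent. A generic way to capture this is a position-specific first-order Markov (dinucleotide) model trained on aligned sites, sketched below with pseudocounts and a uniform background. This is a swapped-in statistical illustration, not the authors' energy-based scoring or their thermodynamic null model.

```python
import math
from collections import Counter

BASES = "ACGT"

def train_first_order(sites, pseudo=1.0):
    """Position-specific first-order Markov model from aligned, equal-length sites.

    Returns P(first base) and, for each position i, P(base at i+1 | base at i).
    """
    L = len(sites[0])
    first = Counter(s[0] for s in sites)
    p_first = {b: (first[b] + pseudo) / (len(sites) + 4 * pseudo) for b in BASES}
    trans = []
    for i in range(L - 1):
        di = Counter((s[i], s[i + 1]) for s in sites)  # dinucleotide counts at i, i+1
        table = {}
        for a in BASES:
            row_total = sum(di[(a, b)] for b in BASES) + 4 * pseudo
            table[a] = {b: (di[(a, b)] + pseudo) / row_total for b in BASES}
        trans.append(table)
    return p_first, trans

def log_odds(seq, model):
    """log2 P(seq | model) - log2 P(seq | uniform background of 0.25 per base)."""
    p_first, trans = model
    lp = math.log2(p_first[seq[0]])
    for i in range(len(seq) - 1):
        lp += math.log2(trans[i][seq[i]][seq[i + 1]])
    return lp - len(seq) * math.log2(0.25)
```

Because each transition is conditioned on the preceding base, a candidate site is rewarded for reproducing the neighbouring-base combinations seen in the training sites, not just the individual base frequencies.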

  8. JDet: interactive calculation and visualization of function-related conservation patterns in multiple sequence alignments and structures.

    PubMed

    Muth, Thilo; García-Martín, Juan A; Rausell, Antonio; Juan, David; Valencia, Alfonso; Pazos, Florencio

    2012-02-15

    We have implemented in a single package all the features required for extracting, visualizing and manipulating fully conserved positions, as well as those with a family-dependent conservation pattern, in multiple sequence alignments. Among other things, the program allows users to run different methods for extracting these positions, combine the results and visualize them in protein 3D structures and sequence spaces. JDet is a multiplatform application written in Java. It is freely available, including the source code, at http://csbg.cnb.csic.es/JDet. The package includes two of our recently developed programs for detecting functional positions in protein alignments (Xdet and S3Det), and support for other methods can be added as plug-ins. A help file and a guided tutorial for JDet are also available.
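JDet distinguishes fully conserved positions from positions with a family-dependent pattern. The sketch below shows the simplest versions of both: columns with one residue everywhere, and columns conserved within each subfamily but not across all of them. The subfamily assignment is supplied by hand here; Xdet and S3Det use their own, more elaborate criteria.

```python
def conserved_positions(alignment):
    """Columns in which every sequence has the same non-gap residue."""
    return [k for k, col in enumerate(zip(*alignment))
            if len(set(col)) == 1 and col[0] != "-"]

def family_dependent_positions(alignment, groups):
    """Columns conserved within every subfamily but not across all subfamilies.

    groups: list of lists of row indices (a hypothetical subfamily assignment).
    """
    hits = []
    for k, col in enumerate(zip(*alignment)):
        residues = []
        for g in groups:
            group_res = {col[r] for r in g}
            if len(group_res) != 1 or "-" in group_res:
                break  # not conserved inside this subfamily
            residues.append(group_res.pop())
        else:
            if len(set(residues)) > 1:  # subfamilies use different residues
                hits.append(k)
    return hits
```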

  9. Implied alignment: a synapomorphy-based multiple-sequence alignment method and its use in cladogram search

    NASA Technical Reports Server (NTRS)

    Wheeler, Ward C.

    2003-01-01

    A method to align sequence data based on parsimonious synapomorphy schemes generated by direct optimization (DO; earlier termed optimization alignment) is proposed. DO directly diagnoses sequence data on cladograms without an intervening multiple-alignment step, thereby creating topology-specific, dynamic homology statements. Hence, no multiple alignment is required to generate cladograms. Unlike general and globally optimal multiple-alignment procedures, the method described here, implied alignment (IA), takes these dynamic homologies and traces them back through a single cladogram, linking the unaligned sequence positions in the terminal taxa via DO transformation series. These "lines of correspondence" link ancestor-descendant states and, when displayed as linearly arrayed columns without hypothetical ancestors, are largely indistinguishable from a standard multiple alignment. Since this method is based on synapomorphy, the treatment of certain classes of insertion-deletion (indel) events may differ from that of other alignment procedures. As with all alignment methods, results depend on parameter assumptions such as indel cost and transversion:transition ratios. Such an IA could be used as a basis for phylogenetic search, but this would be questionable, since the homologies derived from the implied alignment depend on its natal cladogram, and any variance between DO and IA + Search is due to the heuristic approach. The utility of this procedure in heuristic cladogram searches using DO and the improvement of heuristic cladogram cost calculations are discussed. © 2003 The Willi Hennig Society. Published by Elsevier Science (USA). All rights reserved.

  10. Implied alignment: a synapomorphy-based multiple-sequence alignment method and its use in cladogram search

    NASA Technical Reports Server (NTRS)

    Wheeler, Ward C.

    2003-01-01

    A method to align sequence data based on parsimonious synapomorphy schemes generated by direct optimization (DO; earlier termed optimization alignment) is proposed. DO directly diagnoses sequence data on cladograms without an intervening multiple-alignment step, thereby creating topology-specific, dynamic homology statements. Hence, no multiple-alignment is required to generate cladograms. Unlike general and globally optimal multiple-alignment procedures, the method described here, implied alignment (IA), takes these dynamic homologies and traces them back through a single cladogram, linking the unaligned sequence positions in the terminal taxa via DO transformation series. These "lines of correspondence" link ancestor-descendent states and, when displayed as linearly arrayed columns without hypothetical ancestors, are largely indistinguishable from standard multiple alignment. Since this method is based on synapomorphy, the treatment of certain classes of insertion-deletion (indel) events may be different from that of other alignment procedures. As with all alignment methods, results are dependent on parameter assumptions such as indel cost and transversion:transition ratios. Such an IA could be used as a basis for phylogenetic search, but this would be questionable since the homologies derived from the implied alignment depend on its natal cladogram and any variance, between DO and IA + Search, due to heuristic approach. The utility of this procedure in heuristic cladogram searches using DO and the improvement of heuristic cladogram cost calculations are discussed. c2003 The Willi Hennig Society. Published by Elsevier Science (USA). All rights reserved.

  11. Implied alignment: a synapomorphy-based multiple-sequence alignment method and its use in cladogram search.

    PubMed

    Wheeler, Ward C

    2003-06-01

    A method to align sequence data based on parsimonious synapomorphy schemes generated by direct optimization (DO; earlier termed optimization alignment) is proposed. DO directly diagnoses sequence data on cladograms without an intervening multiple-alignment step, thereby creating topology-specific, dynamic homology statements. Hence, no multiple alignment is required to generate cladograms. Unlike general and globally optimal multiple-alignment procedures, the method described here, implied alignment (IA), takes these dynamic homologies and traces them back through a single cladogram, linking the unaligned sequence positions in the terminal taxa via DO transformation series. These "lines of correspondence" link ancestor-descendant states and, when displayed as linearly arrayed columns without hypothetical ancestors, are largely indistinguishable from a standard multiple alignment. Because this method is based on synapomorphy, certain classes of insertion-deletion (indel) events may be treated differently than in other alignment procedures. As with all alignment methods, results depend on parameter assumptions such as indel cost and transversion:transition ratios. Such an IA could be used as a basis for phylogenetic search, but this would be questionable, since the homologies derived from the implied alignment depend on its natal cladogram, and any variance between DO and IA + Search would reflect the heuristic approach. The utility of this procedure in heuristic cladogram searches using DO, and the improvement of heuristic cladogram cost calculations, are discussed.

  12. Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles.

    PubMed

    Gautheret, D; Lambert, A

    2001-11-09

    We present here a new approach to the problem of defining RNA signatures and finding their occurrences in sequence databases. The proposed method is based on "secondary structure profiles". An RNA sequence alignment with secondary structure information is used as input. Two types of weight matrices/profiles are constructed from this alignment: single strands are represented by a classical lod-score profile, while helical regions are represented by an extended "helical profile" comprising 16 lod-scores per position, one for each of the 16 possible base pairs. Database searches are then conducted using a simultaneous search for helical profiles and dynamic programming alignment of single-strand profiles. The algorithm has been implemented in a new program, ERPIN, that performs both profile construction and database search. Applications are presented for several RNA motifs. The automated use of sequence information in both single-stranded and helical regions yields better sensitivity/specificity ratios than descriptor-based programs. Furthermore, since the translation of alignments into profiles is straightforward with ERPIN, iterative searches can easily be conducted to enrich collections of homologous RNAs. Copyright 2001 Academic Press.
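
    A helical profile stores, for each pair of paired alignment columns, 16 lod-scores, one per possible base pair. The minimal sketch below builds such a 16-entry profile for one helical position of an RNA alignment. The pseudocount of 1 and the uniform 1/16 background are assumptions made for illustration and need not match ERPIN's own estimators; `helical_profile` is a hypothetical name.

```python
import math
from itertools import product

BASES = "ACGU"
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]  # the 16 possible base pairs


def helical_profile(alignment, i, j, pseudocount=1.0, background=1 / 16):
    """Return {pair: lod-score} for the helix position formed by columns i and j."""
    counts = {pair: pseudocount for pair in PAIRS}
    for seq in alignment:
        pair = seq[i] + seq[j]
        if pair in counts:  # gaps and ambiguous bases are skipped
            counts[pair] += 1
    total = sum(counts.values())
    # log2 of observed pair frequency over the assumed background frequency
    return {pair: math.log2((c / total) / background) for pair, c in counts.items()}


# Toy alignment: column 0 pairs with column 6 (mostly G-C, one C-G).
aln = ["GCAAAGC", "GCUUUGC", "CGAAACG", "GCAAAGC"]
profile = helical_profile(aln, 0, 6)
print(round(profile["GC"], 3))  # → 1.678
```

    The classical single-strand profile is the same calculation over one column and the four bases.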

  13. Match-Box_server: a multiple sequence alignment tool placing emphasis on reliability.

    PubMed

    Depiereux, E; Baudoux, G; Briffeuil, P; Reginster, I; De Bolle, X; Vinals, C; Feytmans, E

    1997-06-01

    The Match-Box software comprises protein sequence alignment tools based on strict statistical thresholds of similarity between protein segments. The method circumvents the need for a gap penalty: gaps are a result of the alignment rather than a governing parameter of the procedure. The reliable conserved regions outlined by Match-Box are particularly relevant for homology modelling of protein structures, prediction of essential residues for site-directed mutagenesis and oligonucleotide design for cloning homologous genes by polymerase chain reaction (PCR). The method produces reliable results, as assessed by tests performed on protein families of known structure and low sequence similarity. A reliability score is computed in relation to a similarity threshold that is progressively raised to extend the aligned regions to their maximal length, up to the significance limit of matching segments. The score obtained at each position is printed below the sequences and allows a discriminant reading of each aligned region. Sequences may be submitted to a Web server at http://www.fundp.ac.be/sciences/biologie/bms/+ ++matchbox_submit.html or sent by e-mail to matchbox/biq.fundp.ac.be (help is available by simply mailing "help").

  14. Issues in bioinformatics benchmarking: the case study of multiple sequence alignment

    PubMed Central

    Aniba, Mohamed Radhouene; Poch, Olivier; Thompson, Julie D.

    2010-01-01

    The post-genomic era presents many new challenges for the field of bioinformatics. Novel computational approaches are now being developed to handle the large, complex and noisy datasets produced by high throughput technologies. Objective evaluation of these methods is essential (i) to assure high quality, (ii) to identify strong and weak points of the algorithms, (iii) to measure the improvements introduced by new methods and (iv) to enable non-specialists to choose an appropriate tool. Here, we discuss the development of formal benchmarks, designed to represent the current problems encountered in the bioinformatics field. We consider several criteria for building good benchmarks and the advantages to be gained when they are used intelligently. To illustrate these principles, we present a more detailed discussion of benchmarks for multiple alignments of protein sequences. As in many other domains, significant progress has been achieved in the multiple alignment field and the datasets have become progressively more challenging as the existing algorithms have evolved. Finally, we propose directions for future developments that will ensure that the bioinformatics benchmarks correspond to the challenges posed by the high throughput data. PMID:20639539

  15. PyMod: sequence similarity searches, multiple sequence-structure alignments, and homology modeling within PyMOL

    PubMed Central

    2012-01-01

    Background In recent years, an exponentially growing number of tools for protein sequence analysis, editing and modeling tasks have been put at the disposal of the scientific community. Although the vast majority of these tools have been released as open-source software, their steep learning curves often discourage even the most experienced users. Results A simple and intuitive interface, PyMod, between the popular molecular graphics system PyMOL and several other tools (i.e., [PSI-]BLAST, ClustalW, MUSCLE, CEalign and MODELLER) has been developed to show how integrating the individual steps required for homology modeling and sequence/structure analysis within the PyMOL framework can greatly simplify these tasks. Sequence similarity searches, generation and editing of multiple sequence and structural alignments, and even the merging of sequence and structure alignments have been implemented in PyMod, with the aim of creating a simple yet powerful tool for sequence and structure analysis and for building homology models. Conclusions PyMod represents a new tool for the analysis and manipulation of protein sequences and structures. Its key features are its ease of use and its integration with many sequence-retrieval and alignment tools and with PyMOL, one of the most widely used molecular visualization systems. Source code, installation instructions, video tutorials and a user's guide are freely available at the URL http://schubert.bio.uniroma1.it/pymod/index.html PMID:22536966

  16. Multi-RELIEF: a method to recognize specificity determining residues from multiple sequence alignments using a Machine-Learning approach for feature weighting.

    PubMed

    Ye, Kai; Feenstra, K Anton; Heringa, Jaap; Ijzerman, Adriaan P; Marchiori, Elena

    2008-01-01

    Identification of residues that account for protein function specificity is crucial, not only for understanding the nature of functional specificity, but also for protein engineering experiments aimed at switching the specificity of an enzyme, regulator or transporter. Available algorithms generally use multiple sequence alignments to identify residue positions that are conserved within subfamilies but divergent between them. However, many biological examples show a much subtler picture than simple intra-group conservation versus inter-group divergence. We present multi-RELIEF, a novel approach for identifying specificity residues that is based on RELIEF, a state-of-the-art Machine-Learning technique for feature weighting. It estimates the expected 'local' functional specificity of residues from an alignment divided into multiple classes. Optionally, 3D structure information is exploited by increasing the weight of residues that have high-weight neighbors. Using ROC curves over a large body of experimental reference data, we show that (a) multi-RELIEF identifies specificity residues for the seven test sets used, (b) incorporating structural information improves prediction for specificity of interaction with small molecules and (c) comparison of multi-RELIEF with four other state-of-the-art algorithms indicates its robustness and best overall performance. A web-server implementation of multi-RELIEF is available at www.ibi.vu.nl/programs/multirelief. Matlab source code of the algorithm and data sets are available on request for academic use.
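
    RELIEF weights each feature (here, each alignment column) by comparing every instance with its nearest neighbour of the same class (the "hit") and of a different class (the "miss"). The sketch below is the basic two-class RELIEF idea applied to aligned sequences with a Hamming distance; it is not multi-RELIEF itself, which handles multiple classes, estimates 'local' specificity and can add 3D-structure weighting. The function names and the toy alignment are hypothetical.

```python
def hamming(a, b):
    """Number of aligned positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def relief_weights(sequences, labels):
    """Basic two-class RELIEF: one weight per alignment column.

    Every sequence serves once as the reference instance. A column gains weight
    when the reference differs there from its nearest other-class sequence
    (nearest miss) and loses weight when it differs from its nearest
    same-class sequence (nearest hit).
    """
    n, n_cols = len(sequences), len(sequences[0])
    weights = [0.0] * n_cols
    for r, (seq, label) in enumerate(zip(sequences, labels)):
        hits = [k for k in range(n) if k != r and labels[k] == label]
        misses = [k for k in range(n) if labels[k] != label]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda k: hamming(seq, sequences[k]))
        miss = min(misses, key=lambda k: hamming(seq, sequences[k]))
        for c in range(n_cols):
            weights[c] -= (seq[c] != sequences[hit][c]) / n
            weights[c] += (seq[c] != sequences[miss][c]) / n
    return weights


# Columns 0-1 separate the two subfamilies, column 2 varies within them,
# column 3 is fully conserved.
seqs = ["AKLV", "AKIV", "ERLV", "ERIV"]
labels = ["A", "A", "B", "B"]
print(relief_weights(seqs, labels))  # → [1.0, 1.0, -1.0, 0.0]
```

    Positive weights mark candidate specificity-determining columns; a fully conserved column scores zero because it separates nothing.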

  17. Optimizing multiple sequence alignments using a genetic algorithm based on three objectives: structural information, non-gaps percentage and totally conserved columns.

    PubMed

    Ortuño, Francisco M; Valenzuela, Olga; Rojas, Fernando; Pomares, Hector; Florido, Javier P; Urquiza, Jose M; Rojas, Ignacio

    2013-09-01

    Multiple sequence alignments (MSAs) are widely used in bioinformatics as a basis for other tasks such as structure prediction, biological function analysis or phylogenetic modeling. However, current tools usually provide only partially optimal alignments, as each one focuses on specific biological features. Thus, the same set of sequences can produce different alignments, especially when the sequences are less similar. Consequently, researchers and biologists do not agree on the most suitable way to evaluate MSAs. Recent evaluations tend to use more complex scores that include further biological features. Among them, 3D structures are increasingly being used to evaluate alignments. Because protein structures are more conserved than their sequences, scores with structural information are better suited to evaluating more distant relationships between sequences. The proposed multiobjective algorithm, based on the non-dominated sorting genetic algorithm, aims to jointly optimize three objectives: STRIKE score, non-gaps percentage and totally conserved columns. It was assessed on the BAliBASE benchmark, with significance established by the Kruskal-Wallis test (P < 0.01). This algorithm also outperforms other aligners, such as ClustalW, Multiple Sequence Alignment Genetic Algorithm (MSA-GA), PRRP, DIALIGN, Hidden Markov Model Training (HMMT), Pattern-Induced Multi-sequence Alignment (PIMA), MULTIALIGN, Sequence Alignment Genetic Algorithm (SAGA), PILEUP, Rubber Band Technique Genetic Algorithm (RBT-GA) and Vertical Decomposition Genetic Algorithm (VDGA), according to the Wilcoxon signed-rank test (P < 0.05), whereas its results are not significantly different from those of 3D-COFFEE (P > 0.05), with the advantage of requiring fewer structures. Structural information is included within the objective function to evaluate the obtained alignments more accurately. The source code is available at http://www.ugr.es/~fortuno/MOSAStrE/MO-SAStrE.zip.
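
    Two of the three objectives can be computed directly from an alignment, and non-dominated sorting rests on Pareto dominance between objective vectors. The sketch below assumes that the non-gaps percentage is non-gap cells over all cells and that a totally conserved column carries one identical non-gap residue in every sequence; these are plausible readings, not the authors' exact definitions. The STRIKE score requires 3D structures, so it appears only as made-up placeholder numbers in the dominance example, and all function names are hypothetical.

```python
def non_gap_percentage(alignment, gap="-"):
    """Percentage of alignment cells that are residues rather than gaps."""
    cells = [ch for row in alignment for ch in row]
    return 100.0 * sum(ch != gap for ch in cells) / len(cells)


def totally_conserved_columns(alignment, gap="-"):
    """Number of columns in which every sequence has the same non-gap residue."""
    return sum(
        1 for column in zip(*alignment)
        if column[0] != gap and all(ch == column[0] for ch in column)
    )


def dominates(a, b):
    """Pareto dominance for maximised objectives: a is never worse and once better."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


aln = ["MK-LV", "MKALV", "MR-LV"]
print(round(non_gap_percentage(aln), 1), totally_conserved_columns(aln))  # → 86.7 3

# Objective vectors (STRIKE placeholder, non-gaps %, conserved columns).
print(dominates((0.62, 86.7, 3), (0.60, 80.0, 3)))  # → True
print(dominates((0.62, 80.0, 3), (0.60, 86.7, 3)))  # → False (a trade-off)
```

    Alignments that no other candidate dominates form the first front kept by non-dominated sorting.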

  18. PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases.

    PubMed

    Floden, Evan W; Tommaso, Paolo D; Chatzou, Maria; Magis, Cedrik; Notredame, Cedric; Chang, Jia-Ming

    2016-07-08

    The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency-based alignment approach. Homology extension is performed with Position-Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is that it allows databases of reduced complexity to be used for rapid homology extension. The server also offers transmembrane protein (TMP) reference databases, allowing even faster homology extension for this important category of proteins. Aside from an MSA, the server also outputs topological predictions for TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown that this approach outperforms the most accurate alignment methods, such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  19. PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases

    PubMed Central

    Floden, Evan W.; Tommaso, Paolo D.; Chatzou, Maria; Magis, Cedrik; Notredame, Cedric; Chang, Jia-Ming

    2016-01-01

    The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency-based alignment approach. Homology extension is performed with Position-Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is that it allows databases of reduced complexity to be used for rapid homology extension. The server also offers transmembrane protein (TMP) reference databases, allowing even faster homology extension for this important category of proteins. Aside from an MSA, the server also outputs topological predictions for TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown that this approach outperforms the most accurate alignment methods, such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee. PMID:27106060

  20. Liquid-theory analogy of direct-coupling analysis of multiple-sequence alignment and its implications for protein structure prediction.

    PubMed

    Kinjo, Akira R

    2015-01-01

    The direct-coupling analysis is a powerful method for protein contact prediction, and enables us to extract "direct" correlations between distant sites that are latent in "indirect" correlations observed in a protein multiple-sequence alignment. I show that the direct correlation can be obtained by using a formulation analogous to the Ornstein-Zernike integral equation in liquid theory. This formulation intuitively illustrates how the indirect or apparent correlation arises from an infinite series of direct correlations, and provides interesting insights into protein structure prediction.
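
    In a discrete matrix form, assume C holds the direct correlations between alignment sites and H the total (apparent) correlations, with any density factors absorbed into C. An Ornstein-Zernike-like relation and its series expansion can then be sketched as follows. This is a generic illustration of how an infinite series of direct correlations produces the apparent ones, not necessarily the exact formulation used in the paper; the series converges only when the spectral radius of C is below 1, and the last identity recovers the direct correlations from the apparent ones.

```latex
H = C + C H
\;\Longrightarrow\;
H = (I - C)^{-1} C = C + C^{2} + C^{3} + \cdots,
\qquad
C = H (I + H)^{-1}.
```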

  1. Liquid-theory analogy of direct-coupling analysis of multiple-sequence alignment and its implications for protein structure prediction

    PubMed Central

    Kinjo, Akira R.

    2015-01-01

    The direct-coupling analysis is a powerful method for protein contact prediction, and enables us to extract “direct” correlations between distant sites that are latent in “indirect” correlations observed in a protein multiple-sequence alignment. I show that the direct correlation can be obtained by using a formulation analogous to the Ornstein-Zernike integral equation in liquid theory. This formulation intuitively illustrates how the indirect or apparent correlation arises from an infinite series of direct correlations, and provides interesting insights into protein structure prediction. PMID:27493860

  2. ProfileGrids as a new visual representation of large multiple sequence alignments: a case study of the RecA protein family

    PubMed Central

    Roca, Alberto I; Almada, Albert E; Abajian, Aaron C

    2008-01-01

    Background Multiple sequence alignments are a fundamental tool for the comparative analysis of proteins and nucleic acids. However, large data sets are no longer manageable for visualization and investigation using the traditional stacked sequence alignment representation. Results We introduce ProfileGrids that represent a multiple sequence alignment as a matrix color-coded according to the residue frequency occurring at each column position. JProfileGrid is a Java application for computing and analyzing ProfileGrids. A dynamic interaction with the alignment information is achieved by changing the ProfileGrid color scheme, by extracting sequence subsets at selected residues of interest, and by relating alignment information to residue physical properties. Conserved family motifs can be identified by the overlay of similarity plot calculations on a ProfileGrid. Figures suitable for publication can be generated from the saved spreadsheet output of the colored matrices as well as by the export of conservation information for use in the PyMOL molecular visualization program. We demonstrate the utility of ProfileGrids on 300 bacterial homologs of the RecA family – a universally conserved protein involved in DNA recombination and repair. Careful attention was paid to curating the collected RecA sequences since ProfileGrids allow the easy identification of rare residues in an alignment. We relate the RecA alignment sequence conservation to the following three topics: the recently identified DNA binding residues, the unexplored MAW motif, and a unique Bacillus subtilis RecA homolog sequence feature. Conclusion ProfileGrids allow large protein families to be visualized more effectively than the traditional stacked sequence alignment form. This new graphical representation facilitates the determination of the sequence conservation at residue positions of interest, enables the examination of structural patterns by using residue physical properties, and permits the display
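
    The core of a ProfileGrid is a residue-by-column matrix of residue frequencies, which is then colour-coded. The minimal sketch below computes that matrix only; the mapping to colours is omitted, and the 20-amino-acid-plus-gap alphabet and the name `profile_grid` are assumptions rather than JProfileGrid's API.

```python
from collections import Counter

PROTEIN_ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus gap (assumption)


def profile_grid(alignment, alphabet=PROTEIN_ALPHABET):
    """Return {residue: [fraction of sequences with that residue in each column]}."""
    n_seqs = len(alignment)
    column_counts = [Counter(column) for column in zip(*alignment)]
    # Counter returns 0 for residues absent from a column.
    return {res: [counts[res] / n_seqs for counts in column_counts] for res in alphabet}


aln = ["MKV", "MRV", "MK-"]
grid = profile_grid(aln)
print(grid["K"])  # → [0.0, 0.6666666666666666, 0.0]
```

    Rare residues show up as small non-zero cells, which is why curating the input sequences matters for this representation.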

  3. CLIPS-1D: analysis of multiple sequence alignments to deduce for residue-positions a role in catalysis, ligand-binding, or protein structure

    PubMed Central

    2012-01-01

    Background One aim of the in silico characterization of proteins is to identify all residue-positions that are crucial for function or structure. Several sequence-based algorithms exist that predict functionally important sites. However, in terms of sequence information, many functionally and structurally important sites are hard to distinguish, and consequently a large number of incorrectly predicted functional sites has to be expected. We were therefore interested in designing a new classifier that differentiates between functionally and structurally important sites, and in assessing its performance on representative datasets. Results We have implemented CLIPS-1D, which predicts a role in catalysis, ligand-binding, or protein structure for residue-positions in a mutually exclusive manner. By analyzing a multiple sequence alignment, the algorithm scores the conservation as well as the abundance of residues at individual sites and their local neighborhood, and categorizes them by means of a multiclass support vector machine. Cross-validation confirmed that residue-positions involved in catalysis were identified with state-of-the-art quality; the mean MCC was 0.34. For structurally important sites, prediction quality was considerably higher (mean MCC = 0.67). For ligand-binding sites, prediction quality was lower (mean MCC = 0.12), because binding sites and structurally important residue-positions share conservation and abundance values, which makes their separation difficult. We show that classification success varies for residues in a class-specific manner. For this reason, our algorithm computes residue-specific p-values, which allow for the statistical assessment of each individual prediction. CLIPS-1D is available as a Web service at http://www-bioinf.uni-regensburg.de/. Conclusions CLIPS-1D is a classifier whose prediction quality has been determined separately for catalytic sites, ligand-binding sites, and structurally important sites. It generates hypotheses about
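
    The quality figures above are Matthews correlation coefficients (MCC). For a multiclass predictor, a per-class MCC is commonly obtained by treating one class as positive and all others as negative; whether CLIPS-1D's evaluation did exactly this is an assumption here. A minimal sketch of the standard binary formula, returning 0.0 when the denominator vanishes (a common convention):

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


print(round(mcc(tp=30, tn=50, fp=10, fn=10), 3))  # → 0.583
```

    MCC ranges from -1 (total disagreement) through 0 (chance level) to 1 (perfect prediction), so a mean of 0.12 for ligand-binding sites is only modestly above chance.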

  4. Knowledge-based expert systems and a proof-of-concept case study for multiple sequence alignment construction and analysis.

    PubMed

    Aniba, Mohamed Radhouene; Siguenza, Sophie; Friedrich, Anne; Plewniak, Frédéric; Poch, Olivier; Marchler-Bauer, Aron; Thompson, Julie Dawn

    2009-01-01

    The traditional approach to bioinformatics analyses relies on independent task-specific services and applications, using different input and output formats, often idiosyncratic, and frequently not designed to inter-operate. In general, such analyses were performed by experts who manually verified the results obtained at each step in the process. Today, the amount of bioinformatics information continuously being produced means that handling the various applications used to study this information presents a major data management and analysis challenge to researchers. It is now impossible to manually analyse all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. We review the recent use of integrated expert systems aimed at providing more efficient knowledge extraction for bioinformatics research. A general methodology for building knowledge-based expert systems is described, focusing on the unstructured information management architecture, UIMA, which provides facilities for both data and process management. A case study involving a multiple alignment expert system prototype called AlexSys is also presented.

  5. Knowledge-based expert systems and a proof-of-concept case study for multiple sequence alignment construction and analysis

    PubMed Central

    Aniba, Mohamed Radhouene; Siguenza, Sophie; Friedrich, Anne; Plewniak, Frédéric; Poch, Olivier; Marchler-Bauer, Aron

    2009-01-01

    The traditional approach to bioinformatics analyses relies on independent task-specific services and applications, using different input and output formats, often idiosyncratic, and frequently not designed to inter-operate. In general, such analyses were performed by experts who manually verified the results obtained at each step in the process. Today, the amount of bioinformatics information continuously being produced means that handling the various applications used to study this information presents a major data management and analysis challenge to researchers. It is now impossible to manually analyse all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. We review the recent use of integrated expert systems aimed at providing more efficient knowledge extraction for bioinformatics research. A general methodology for building knowledge-based expert systems is described, focusing on the unstructured information management architecture, UIMA, which provides facilities for both data and process management. A case study involving a multiple alignment expert system prototype called AlexSys is also presented. PMID:18971242

  6. ChromatoGate: A Tool for Detecting Base Mis-Calls in Multiple Sequence Alignments by Semi-Automatic Chromatogram Inspection.

    PubMed

    Alachiotis, Nikolaos; Vogiatzi, Emmanouella; Pavlidis, Pavlos; Stamatakis, Alexandros

    2013-01-01

    Automated DNA sequencers generate chromatograms that contain raw sequencing data. They also generate data that translates the chromatograms into molecular sequences of A, C, G, T, or N (undetermined) characters. Since chromatogram translation programs frequently introduce errors, a manual inspection of the generated sequence data is required. As sequence numbers and lengths increase, visual inspection and manual correction of chromatograms and corresponding sequences on a per-peak and per-nucleotide basis becomes an error-prone, time-consuming, and tedious process. Here, we introduce ChromatoGate (CG), an open-source software that accelerates and partially automates the inspection of chromatograms and the detection of sequencing errors for bidirectional sequencing runs. To provide users full control over the error correction process, a fully automated error correction algorithm has not been implemented. Initially, the program scans a given multiple sequence alignment (MSA) for potential sequencing errors, assuming that each polymorphic site in the alignment may be attributed to a sequencing error with a certain probability. The guided MSA assembly procedure in ChromatoGate detects chromatogram peaks of all characters in an alignment that lead to polymorphic sites, given a user-defined threshold. The threshold value represents the sensitivity of the sequencing error detection mechanism. After this pre-filtering, the user only needs to inspect a small number of peaks in every chromatogram to correct sequencing errors. Finally, we show that correcting sequencing errors is important, because population genetic and phylogenetic inferences can be misled by MSAs with uncorrected mis-calls. Our experiments indicate that estimates of population mutation rates can be affected two- to three-fold by uncorrected errors.
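
    The pre-filtering step flags alignment characters at polymorphic sites so that only their chromatogram peaks need inspection. A minimal sketch of that first step is below. It assumes, which the abstract does not state, that the user-defined threshold is a maximum count for a minority base within a polymorphic column; gaps and N are ignored. `suspect_calls` and `max_minor_count` are hypothetical names, and chromatogram peak inspection itself is out of scope.

```python
from collections import Counter


def suspect_calls(alignment, max_minor_count=1, ignore=frozenset("-N")):
    """Return (sequence index, column) pairs whose base is a rare variant.

    A column is polymorphic if it holds more than one base (gaps and N are
    ignored). In such a column, bases observed at most `max_minor_count`
    times are flagged as candidate mis-calls to check in the chromatogram.
    """
    flagged = []
    for col, column in enumerate(zip(*alignment)):
        counts = Counter(b for b in column if b not in ignore)
        if len(counts) < 2:
            continue  # monomorphic column: nothing to inspect
        for row, base in enumerate(column):
            if base in counts and counts[base] <= max_minor_count:
                flagged.append((row, col))
    return flagged


aln = ["ACGTA", "ACGTA", "ACTTA", "ACGTN"]
print(suspect_calls(aln))  # → [(2, 2)]
```

    Raising the threshold flags more characters (higher sensitivity, more peaks to inspect); lowering it to 0 flags none.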

  7. Inexact Local Alignment Search over Suffix Arrays.

    PubMed

    Ghodsi, Mohammadreza; Pop, Mihai

    2009-11-01

    We describe an algorithm for finding approximate seeds for DNA homology searches. In contrast to previous algorithms that use exact or spaced seeds, our approximate seeds may contain insertions and deletions. We present a generalized heuristic for finding such seeds efficiently and prove that the heuristic does not affect sensitivity. We show how to adapt this algorithm to work over the memory efficient suffix array with provably minimal overhead in running time. We demonstrate the effectiveness of our algorithm on two tasks: whole genome alignment of bacteria and alignment of the DNA sequences of 177 genes that are orthologous in human and mouse. We show our algorithm achieves better sensitivity and uses less memory than other commonly used local alignment tools.
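
    To make "approximate seeds with insertions and deletions over a suffix array" concrete, the sketch below uses the naive approach: enumerate every variant of a seed that is one insertion or one deletion away, then look each variant up exactly in a suffix array by binary search. It does not reproduce the authors' generalized heuristic or its sensitivity guarantee. The quadratic suffix-array construction is for illustration only, and all function names are hypothetical.

```python
def build_suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text, sa, pattern):
    """Start positions of exact occurrences of pattern, via two binary searches."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose length-m prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def one_indel_variants(seed, alphabet="ACGT"):
    """The seed plus every string one insertion or one deletion away from it."""
    variants = {seed}
    for i in range(len(seed)):
        variants.add(seed[:i] + seed[i + 1:])  # base missing from the target
    for i in range(len(seed) + 1):
        for ch in alphabet:
            variants.add(seed[:i] + ch + seed[i:])  # extra base in the target
    return variants


def approximate_seed_hits(text, sa, seed, alphabet="ACGT"):
    """Map each indel variant of the seed to its exact hits in text."""
    hits = {}
    for variant in sorted(one_indel_variants(seed, alphabet)):
        positions = find_exact(text, sa, variant)
        if positions:
            hits[variant] = positions
    return hits


text = "TTACGGTACGTT"
sa = build_suffix_array(text)
hits = approximate_seed_hits(text, sa, "ACGT")
print(hits["ACGT"], hits["ACGGT"])  # → [7] [2]
```

    The seed "ACGT" occurs exactly at position 7, while the one-insertion variant "ACGGT" also recovers the site at position 2 that an exact seed would miss.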

  8. MC64-ClustalWP2: a highly-parallel hybrid strategy to align multiple sequences in many-core architectures.

    PubMed

    Díaz, David; Esteban, Francisco J; Hernández, Pilar; Caballero, Juan Antonio; Guevara, Antonio; Dorado, Gabriel; Gálvez, Sergio

    2014-01-01

    We have developed MC64-ClustalWP2 as a new implementation of the Clustal W algorithm, integrating a novel parallelization strategy that significantly increases performance when aligning long sequences on architectures with many cores. It must be stressed that in such a process, a detailed analysis of both the software and hardware features and peculiarities is of paramount importance to reveal the key points to exploit in order to realize the full potential of parallelism in many-core CPU systems. The new parallelization approach focuses on the most time-consuming stages of this algorithm. In particular, the performance of the so-called progressive alignment stage improved drastically, owing to a fine-grained approach in which the forward and backward loops were unrolled and parallelized. Another key element has been the implementation of the new algorithm in a hybrid-computing system, integrating both an Intel Xeon multi-core CPU and a Tilera Tile64 many-core card. A comparison with other Clustal W implementations reveals the high performance of the new algorithm and strategy on many-core CPU architectures, in a scenario where the sequences to align are relatively long (more than 10 kb) and, hence, many-core GPU hardware cannot be used. Thus, MC64-ClustalWP2 runs multiple alignments more than 18x faster than the original Clustal W algorithm and more than 7x faster than the best x86 parallel implementation to date, and it is publicly available through a web service. In addition, these developments have been deployed on cost-effective personal computers and should be useful for life-science researchers, including the identification of identities and differences for mutation/polymorphism analyses, biodiversity and evolutionary studies and for the development of molecular markers for paternity testing, germplasm management and protection, to assist breeding, illegal traffic control, fraud prevention and for the protection of the intellectual property (identification

  9. MC64-ClustalWP2: A Highly-Parallel Hybrid Strategy to Align Multiple Sequences in Many-Core Architectures

    PubMed Central

    Díaz, David; Esteban, Francisco J.; Hernández, Pilar; Caballero, Juan Antonio; Guevara, Antonio

    2014-01-01

    We have developed MC64-ClustalWP2 as a new implementation of the Clustal W algorithm, integrating a novel parallelization strategy that significantly increases performance when aligning long sequences on architectures with many cores. It must be stressed that in such a process, a detailed analysis of both the software and hardware features and peculiarities is of paramount importance to reveal the key points to exploit in order to realize the full potential of parallelism in many-core CPU systems. The new parallelization approach focuses on the most time-consuming stages of this algorithm. In particular, the performance of the so-called progressive alignment stage improved drastically, owing to a fine-grained approach in which the forward and backward loops were unrolled and parallelized. Another key element has been the implementation of the new algorithm in a hybrid-computing system, integrating both an Intel Xeon multi-core CPU and a Tilera Tile64 many-core card. A comparison with other Clustal W implementations reveals the high performance of the new algorithm and strategy on many-core CPU architectures, in a scenario where the sequences to align are relatively long (more than 10 kb) and, hence, many-core GPU hardware cannot be used. Thus, MC64-ClustalWP2 runs multiple alignments more than 18x faster than the original Clustal W algorithm and more than 7x faster than the best x86 parallel implementation to date, and it is publicly available through a web service. In addition, these developments have been deployed on cost-effective personal computers and should be useful for life-science researchers, including the identification of identities and differences for mutation/polymorphism analyses, biodiversity and evolutionary studies and for the development of molecular markers for paternity testing, germplasm management and protection, to assist breeding, illegal traffic control, fraud prevention and for the protection of the intellectual property (identification

  10. Image denoising using local tangent space alignment

    NASA Astrophysics Data System (ADS)

    Feng, JianZhou; Song, Li; Huo, Xiaoming; Yang, XiaoKang; Zhang, Wenjun

    2010-07-01

    We propose a novel image denoising approach based on exploring an underlying (nonlinear) low-dimensional manifold. Using local tangent space alignment (LTSA), we 'learn' such a manifold, which approximates the image content effectively. The denoising is performed by minimizing a newly defined objective function that is a sum of two terms: (a) the difference between the noisy image and the denoised image, and (b) the distance from the image patch to the manifold. We thus extend the LTSA method from manifold learning to denoising. We introduce the concept of local dimension, which makes the method adaptive to different kinds of image patches, e.g. flat patches having lower dimension. We also plug in a basic denoising stage to estimate the local coordinates more accurately. The proposed method is found to be competitive: its performance surpasses that of the K-SVD denoising method.
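
    One way to write the two-term objective described above: y is the noisy image, x a candidate denoised image, P_i extracts the i-th patch, \mathcal{M} is the manifold learned with LTSA, d measures the distance of a patch to \mathcal{M}, and \lambda > 0 balances the terms. The squared Euclidean norms and the single weight \lambda are assumptions for illustration; the abstract states only that the objective is a sum of the two terms.

```latex
\hat{x} = \arg\min_{x} \; \lVert y - x \rVert_2^{2}
  + \lambda \sum_{i} d\!\left(P_i x, \mathcal{M}\right)^{2}
```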

  11. Differentiated evolutionary relationships among chordates from comparative alignments of multiple sequences of MyoD and MyoG myogenic regulatory factors.

    PubMed

    Oliani, L C; Lidani, K C F; Gabriel, J E

    2015-10-16

    MyoD and MyoG are transcription factors that have essential roles in myogenic lineage determination and muscle differentiation. The purpose of this study was to compare multiple amino acid sequences of myogenic regulatory proteins to infer evolutionary relationships among chordates. Protein sequences from mouse Mus musculus (P10085 and P12979), human Homo sapiens (P15172 and P15173), bovine Bos taurus (Q7YS82 and Q7YS81), wild pig Sus scrofa (P49811 and P49812), quail Coturnix coturnix (P21572 and P34060), chicken Gallus gallus (P16075 and P17920), rat Rattus norvegicus (Q02346 and P20428), domestic water buffalo Bubalus bubalis (D2SP11 and A7L034), and sheep Ovis aries (Q90477 and D3YKV7) were retrieved from the non-redundant protein sequence database UniProtKB/Swiss-Prot and subsequently analyzed using the Mega6.0 software. MyoD evolutionary analyses revealed three main clusters, with all mammals branched in one cluster, members of the order Rodentia (mouse and rat) in a second branch linked to the first, and birds of the order Galliformes (chicken and quail) remaining isolated in a third. MyoG evolutionary analyses grouped the sequences into two main clusters, with all mammalian specimens grouped in different sub-branches and the birds clustered in a second branch. These analyses suggest that the evolution of MyoD and MyoG was driven by different pathways.

  12. Local versus global biological network alignment

    PubMed Central

    Meng, Lei; Striegel, Aaron; Milenković, Tijana

    2016-01-01

    Motivation: Network alignment (NA) aims to find regions of similarity between species’ molecular networks. There are two NA categories: local (LNA) and global (GNA). LNA finds small, highly conserved network regions and produces a many-to-many node mapping. GNA finds large conserved regions and produces a one-to-one node mapping. Given the different outputs of LNA and GNA, when a new NA method is proposed, it is compared against existing methods from the same category. However, both NA categories have the same goal: to allow for transferring functional knowledge from well- to poorly-studied species between conserved network regions. So, which one should be chosen, LNA or GNA? To answer this, we introduce the first systematic evaluation of the two NA categories. Results: We introduce new measures of alignment quality that allow for a fair comparison of the different LNA and GNA outputs, as no such measures existed. We provide user-friendly software for efficient alignment evaluation that implements the new and existing measures. We evaluate prominent LNA and GNA methods on synthetic and real-world biological networks. We study how using different interaction types and confidence levels affects alignment quality. We find that the superiority of one NA category over the other is context-dependent. Further, when we contrast LNA and GNA in the application of learning novel protein functional knowledge, the two produce very different predictions, indicating their complementarity. Our results and software provide guidelines for future NA method development and evaluation. Availability and implementation: Software: http://www.nd.edu/~cone/LNA_GNA Contact: tmilenko@nd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27357169

  13. Local alignment of two-base encoded DNA sequence

    PubMed Central

    Homer, Nils; Merriman, Barry; Nelson, Stanley F

    2009-01-01

    Background: DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity. Results: We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two-base-encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions. Conclusion: The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, and it facilitates genome re-sequencing efforts based on this form of sequence data. PMID:19508732
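
    As an illustration of why decoding alone is fragile, the following minimal Python sketch assumes a SOLiD-style color code (an assumption; the abstract does not name the code): bases A, C, G, T map to 2-bit values 0 to 3, and each color is the XOR of two adjacent base codes. Decoding is deterministic once the first base is known, but a single color-call error changes every base decoded after it.

```python
# Minimal sketch of two-base ("color space") encoding.
# ASSUMPTION: SOLiD-style code, A/C/G/T -> 0/1/2/3, color = code(b1) XOR code(b2).
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode(seq: str) -> list[int]:
    """Encode a base sequence as one color per pair of adjacent bases."""
    codes = [BASE_TO_CODE[b] for b in seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def decode(first_base: str, colors: list[int]) -> str:
    """Deterministically decode colors, given the known first base."""
    code = BASE_TO_CODE[first_base]
    out = [first_base]
    for c in colors:
        code ^= c  # XOR is its own inverse, so each next base is recovered
        out.append(CODE_TO_BASE[code])
    return "".join(out)


if __name__ == "__main__":
    seq = "ATGGCA"
    colors = encode(seq)              # [3, 1, 0, 3, 1]
    assert decode(seq[0], colors) == seq

    # A single measurement error in one color corrupts every later base.
    bad = colors.copy()
    bad[1] ^= 1
    print(decode(seq[0], bad))        # prints ATTTAC
```

    Under this assumed code, a true single-base variant changes two adjacent colors while an isolated measurement error changes only one, which is the kind of distinction an aligner that decodes and aligns jointly can exploit.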

  14. HomBlocks: A multiple-alignment construction pipeline for organelle phylogenomics based on locally collinear block searching.

    PubMed

    Bi, Guiqi; Mao, Yunxiang; Xing, Qikun; Cao, Min

    2017-08-03

    Organelle phylogenomic analysis requires precisely constructed multi-gene alignment matrices concatenated from pre-aligned single-gene datasets. For non-bioinformaticians, it can take days to weeks to manually create high-quality multi-gene alignments comprising tens or hundreds of homologous genes. Here, we describe a new and highly efficient pipeline, HomBlocks, which uses a homologous block searching method to construct multiple sequence alignments. This approach can automatically recognize locally collinear blocks among organelle genomes and excavate phylogenetically informative regions to construct a multiple sequence alignment in a few hours. In addition, HomBlocks supports organelle genomes without annotation and adjusts to different taxon datasets, thereby enabling the inclusion of as many common genes as possible. A topology comparison of trees built from conventional multi-gene and HomBlocks alignments in different taxon categories shows that HomBlocks achieves the same efficiency as the traditional method. The availability of HomBlocks makes organelle phylogenetic analyses more accessible to non-bioinformaticians, thereby promising to lead to a better understanding of phylogenetic relationships at the organelle genome level. HomBlocks is implemented in Perl and is supported by Unix-like operating systems, including Linux and macOS. The Perl source code is freely available for download from https://github.com/fenghen360/HomBlocks.git, and documentation and tutorials are available at https://github.com/fenghen360/HomBlocks. Contact: yxmao@ouc.edu.cn or fenghen360@126.com. Copyright © 2017 Elsevier Inc. All rights reserved.

  15. Dynamical localization in molecular alignment of kicked quantum rotors

    SciTech Connect

    Kamalov, A.; Broege, D. W.; Bucksbaum, P. H.

    2015-07-13

    The periodically δ-kicked quantum linear rotor is known to experience nonclassical bounded energy growth due to quantum dynamical localization in angular momentum space. We study the effect of random deviations of the kick period in simulations and experiments. This breaks the energy and angular momentum localization and increases the rotational alignment, which is the analog of the onset of Anderson localization in one-dimensional chains.

  16. Local coordinates alignment with global preservation for dimensionality reduction.

    PubMed

    Chen, Jing; Ma, Zhengming; Liu, Yang

    2013-01-01

    Dimensionality reduction is vital in many fields, and alignment-based methods for nonlinear dimensionality reduction have recently become popular because they can map high-dimensional data into a low-dimensional subspace while preserving local isometry. However, the relationships between patches in the original high-dimensional space are not guaranteed to be fully preserved during the alignment process. In this paper, we propose a novel method for nonlinear dimensionality reduction called local coordinates alignment with global preservation. We first introduce a reasonable definition of topology-preserving landmarks (TPLs), which not only help preserve the global structure of datasets and construct a collection of overlapping linear patches, but also ensure that the right landmark is allocated to a new test point. Then, an existing dimensionality reduction method that performs well in preserving global structure is used to derive the low-dimensional coordinates of the TPLs. The local coordinates of each patch are derived using the tangent space of the manifold at the corresponding landmark, and these local coordinates are then aligned into a global coordinate space, with the set of landmarks in the low-dimensional space serving as reference points. The proposed alignment method, called landmarks-based alignment, produces a closed-form solution without any constraints, whereas most previous alignment-based methods impose a unit covariance constraint, which results in the loss of global metric information and undesired rescaling of the manifold. Experiments on both synthetic and real-world datasets demonstrate the effectiveness of the proposed algorithm.

  17. Proteins comparison through probabilistic optimal structure local alignment.

    PubMed

    Micale, Giovanni; Pulvirenti, Alfredo; Giugno, Rosalba; Ferro, Alfredo

    2014-01-01

    Multiple local structure comparison helps to identify common structural motifs or conserved binding sites in the 3D structures of distantly related proteins. Since there is no single best way to compare structures and evaluate the alignment, a wide variety of techniques and similarity scoring schemes have been proposed. Existing algorithms usually compute the best superposition of two structures or attempt to solve the problem as an optimization problem in a simpler setting (e.g., considering contact maps or distance matrices). Here, we present PROPOSAL (PROteins comparison through Probabilistic Optimal Structure local ALignment), a stochastic algorithm based on iterative sampling for multiple local alignment of protein structures. Our method can efficiently find conserved motifs across a set of protein structures. Only the distances between all pairs of residues in the structures are computed. To show the accuracy and effectiveness of PROPOSAL, we tested it on a few families of protein structures. We also compared PROPOSAL with two state-of-the-art tools for pairwise local alignment on a dataset of manually annotated motifs. PROPOSAL is available as a Java 2D standalone application or a command line program at http://ferrolab.dmi.unict.it/proposal/proposal.html.

  18. Automatic Parameter Learning for Multiple Local Network Alignment

    PubMed Central

    Novak, Antal; Do, Chuong B.; Srinivasan, Balaji S.; Batzoglou, Serafim

    2009-01-01

    We developed Græmlin 2.0, a new multiple network aligner with (1) a new multi-stage approach to local network alignment; (2) a novel scoring function that can use arbitrary features of a multiple network alignment, such as protein deletions, protein duplications, protein mutations, and interaction losses; (3) a parameter learning algorithm that uses a training set of known network alignments to learn parameters for our scoring function and thereby adapt it to any set of networks; and (4) an algorithm that uses our scoring function to find approximate multiple network alignments in linear time. We tested Græmlin 2.0's accuracy on protein interaction networks from IntAct, DIP, and the Stanford Network Database. We show that, on each of these datasets, Græmlin 2.0 has higher sensitivity and specificity than existing network aligners. Græmlin 2.0 is available under the GNU public license at http://graemlin.stanford.edu. PMID:19645599

  19. Enforcing Convexity for Improved Alignment with Constrained Local Models

    PubMed Central

    Wang, Yang; Lucey, Simon; Cohn, Jeffrey F.

    2010-01-01

    Constrained local models (CLMs) have recently demonstrated good performance in non-rigid object alignment/tracking in comparison to leading holistic approaches (e.g., AAMs). A major problem hindering the further development of CLMs for non-rigid object alignment/tracking is how to jointly optimize the global warp update across all local search responses. Previous methods have used either general-purpose optimizers (e.g., simplex methods) or graph-based optimization techniques. Unfortunately, both of these approaches have problems when applied to CLMs. In this paper, we propose a new approach for optimizing the global warp update efficiently by enforcing convexity at each local patch response surface. Furthermore, we show that the classic Lucas-Kanade approach to gradient-descent image alignment can be viewed as a special case of our proposed framework. Finally, we demonstrate that our approach achieves improved performance for the task of non-rigid face alignment/tracking on the MultiPIE database and the UNBC-McMaster archive. PMID:20622926

  20. Critical thresholds in flocking hydrodynamics with non-local alignment.

    PubMed

    Tadmor, Eitan; Tan, Changhui

    2014-11-13

    We study the large-time behaviour of Eulerian systems augmented with non-local alignment. Such systems arise as hydrodynamic descriptions of agent-based models for self-organized dynamics, e.g. the Cucker & Smale (2007 IEEE Trans. Autom. Control 52, 852-862. (doi:10.1109/TAC.2007.895842)) and Motsch & Tadmor (2011 J. Stat. Phys. 144, 923-947. (doi:10.1007/s10955-011-0285-9)) models. We prove that, in analogy with the agent-based models, the presence of non-local alignment forces strong solutions to self-organize into a macroscopic flock. This then raises the question of the existence of such strong solutions. We address this question in one- and two-dimensional set-ups, proving global regularity for subcritical initial data. Indeed, we show that there exist critical thresholds in the phase space of the initial configuration which dictate global regularity versus finite-time blow-up. In particular, we explore the regularity of non-local alignment in the presence of vacuum. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
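
    For orientation, the agent-based Cucker-Smale model cited above can be simulated in a few lines. This is a minimal one-dimensional sketch, not the hydrodynamic (Eulerian) system the paper analyzes; it assumes the commonly used kernel phi(r) = (1 + r^2)^(-beta), a forward-Euler time step, and illustrative parameter values.

```python
def cucker_smale_1d(x, v, beta=0.25, dt=0.01, steps=2000):
    """Forward-Euler simulation of the 1D agent-based Cucker-Smale model.

    dx_i/dt = v_i
    dv_i/dt = (1/N) * sum_j phi(|x_j - x_i|) * (v_j - v_i)
    ASSUMED kernel: phi(r) = (1 + r**2) ** (-beta); parameters are illustrative.
    Returns the final positions and velocities.
    """
    x, v = list(x), list(v)
    n = len(x)
    for _ in range(steps):
        # Non-local alignment: each agent relaxes toward the others'
        # velocities, weighted by a kernel that decays with distance.
        acc = [
            sum((1 + (x[j] - x[i]) ** 2) ** (-beta) * (v[j] - v[i]) for j in range(n)) / n
            for i in range(n)
        ]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        v = [vi + dt * ai for vi, ai in zip(v, acc)]
    return x, v


if __name__ == "__main__":
    x0, v0 = [0.0, 1.0, 2.0, 3.0], [1.0, -1.0, 0.5, 0.0]
    x, v = cucker_smale_1d(x0, v0)
    print(max(v) - min(v))   # velocity spread, shrunk from the initial 2.0
    print(sum(v) / len(v))   # mean velocity, conserved at 0.125 up to rounding
```

    Because the kernel is symmetric, the mean velocity is conserved while the individual velocities converge toward it; for beta below 1/2 this flocking behaviour is known to occur for any initial data.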

  1. Flexible structural protein alignment by a sequence of local transformations

    PubMed Central

    Rocha, Jairo; Segura, Joan; Wilson, Richard C.; Dasgupta, Swagata

    2009-01-01

    Motivation: Throughout evolution, homologous proteins have common regions that stay semi-rigid relative to each other and other parts that vary more noticeably. To compare the increasing number of structures in the PDB, flexible geometric alignments that are reliable and easy to use are needed. Results: We present a protein structure alignment method whose main feature is the ability to consider different rigid transformations at different sites, allowing for deformations beyond a global rigid transformation. The performance of the method is comparable with that of the best of the 10 aligners tested, regarding both the quality of the alignments with respect to hand-curated ones and the classification ability. An analysis of some structure pairs from the literature that need to be matched in a flexible fashion is shown. The use of a series of local transformations can be exported to other classifiers, and a future golden protein similarity measure could benefit from it. Availability: A public server for the program is available at http://dmi.uib.es/ProtDeform/. Contact: jairo@uib.es Supplementary information: All data used, results and examples are available at http://dmi.uib.es/people/jairo/bio/ProtDeform. Supplementary data are available at Bioinformatics online. PMID:19417057

  2. Spin Alignment in Analogues of The Local Sheet

    NASA Astrophysics Data System (ADS)

    Conidis, George J.

    2016-10-01

    Tidal torque theory and simulations of large-scale structure predict that the spin vectors of massive galaxies should be coplanar with sheets in the cosmic web. It was recently demonstrated that the giants (K_s ≤ -22.5 mag) in the Local Volume beyond the Local Sheet have spin vectors directed close to the plane of the Local Supercluster, supporting the predictions of tidal torque theory. However, the giants in the Local Sheet encircling the Local Group display a distinctly different arrangement, suggesting that the mass asymmetry of the Local Group or its progenitor torqued them from their primordial spin directions. To investigate the origin of the spin alignment of giants locally, analogues of the Local Sheet were identified in SDSS DR9. Like the Local Sheet, the analogues have an interacting pair of disk galaxies isolated from the remaining sheet members. Modified sheets without an interacting pair of disk galaxies were identified as a control sample. Galaxies in face-on control sheets do not display axis ratios predominantly weighted toward low values, contrary to the expectation of tidal torque theory. For face-on and edge-on sheets, the distribution of axis ratios for galaxies in analogues is distinct from that in controls with a confidence of 97.6% and 96.9%, respectively. This corroborates the hypothesis that an interacting pair can affect the spin directions of neighbouring galaxies.

  3. Heuristic reusable dynamic programming: efficient updates of local sequence alignment.

    PubMed

    Hong, Changjin; Tewfik, Ahmed H

    2009-01-01

    Recomputation of previously evaluated similarity results between biological sequences becomes inevitable when researchers discover errors in their sequenced data or when they have to compare nearly similar sequences, e.g., in a family of proteins. We present an efficient scheme for updating local sequence alignments with an affine gap model. In principle, using the previous matching result between two amino acid sequences, we perform a forward-backward alignment to generate heuristic search bands, which are bounded by a set of suboptimal paths. Given a correctly updated sequence, we first predict a new alignment-path score for each contour in order to select the best candidates among them. Then, we run the Smith-Waterman algorithm in this confined space. Furthermore, our heuristic alignment for an updated sequence can be further accelerated by using reusable dynamic programming (rDP), our prior work. In this study, we successfully validate the "relative node tolerance bound" (RNTB) in the pruned search space. We also improve the computational performance by quantifying the successful RNTB tolerance probability and switching to rDP on perturbation-resilient columns only. In our search space, derived using a threshold of 90 percent of the optimal alignment score, we find that 98.3 percent of contours contain correctly updated paths. We also find that our method consumes only 25.36 percent of the runtime of the sparse dynamic programming (sDP) method, and only 2.55 percent of that of standard dynamic programming with the Smith-Waterman algorithm.
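
    The step of running Smith-Waterman only inside a confined search space can be sketched as follows. This is a simplification and an assumption: the band here is a fixed diagonal band of half-width `w` with a linear gap penalty, rather than the contour-bounded bands and affine gaps the authors describe, and the scoring values are illustrative.

```python
def banded_smith_waterman(a, b, w, match=2, mismatch=-1, gap=-2):
    """Local alignment score computed only for cells with |i - j| <= w.

    SIMPLIFICATION: fixed diagonal band and linear gap penalty; the
    abstract's method uses bands bounded by suboptimal paths and affine gaps.
    Returns (best_score, number_of_cells_computed).
    """
    n, m = len(a), len(b)
    NEG = float("-inf")
    # Cells outside the band stay at -inf, so no path can pass through them.
    H = [[NEG] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        H[i][0] = 0
    for j in range(m + 1):
        H[0][j] = 0
    best, cells = 0, 0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
            cells += 1
    return best, cells


if __name__ == "__main__":
    # The optimal local alignment (ACG vs ACG) lies one diagonal off the main one.
    print(banded_smith_waterman("ACGT", "TACG", 0))  # (0, 4): band misses it
    print(banded_smith_waterman("ACGT", "TACG", 1))  # (6, 10): band contains it
```

    The two calls show why the confined space must be chosen to contain the optimal path, which is what the abstract's 98.3 percent contour figure measures: a narrower band computes fewer cells but can return a wrong score.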

  4. Sampling rare events: statistics of local sequence alignments.

    PubMed

    Hartmann, Alexander K

    2002-05-01

    A method to calculate probability distributions in regions where the events are very unlikely (e.g., p ≈ 10^-40) is presented. The basic idea is to map the underlying model onto a physical system. The system is simulated at a low temperature, such that configurations with originally low probabilities are preferentially generated. Since the distribution of such a physical system is known, the original unbiased distribution can be obtained. As an application, local alignment of protein sequences is studied. The deviation of the distribution p(S) of optimum scores from the extreme-value distribution is quantified. This deviation decreases with growing sequence length.

  5. fMRI alignment based on local functional connectivity patterns

    NASA Astrophysics Data System (ADS)

    Jiang, Di; Du, Yuhui; Cheng, Hewei; Jiang, Tianzi; Fan, Yong

    2012-02-01

    In functional neuroimaging studies, the inter-subject alignment of functional magnetic resonance imaging (fMRI) data is a necessary precursor to improving functional consistency across subjects. Traditional registration methods based on structural MRI cannot achieve accurate inter-subject functional consistency, because functional units are not necessarily consistently located relative to anatomical structures, owing to functional variability across subjects. Although the spatial smoothing commonly used in fMRI data preprocessing can reduce inter-subject functional variability, it may blur the functional signals and thus lose fine-grained information. In this paper we propose a novel fMRI image registration method based on functional signals, which aligns the local functional connectivity patterns of different subjects to improve inter-subject functional consistency. In particular, functional connectivity is measured using Pearson correlation. For each voxel of an fMRI image, its functional connectivity to every voxel in its local spatial neighborhood, referred to as its local functional connectivity pattern, is characterized by a rotation- and shift-invariant representation. Based on this representation, the spatial registration of two fMRI images is achieved by minimizing the difference between the local functional connectivity patterns of their corresponding voxels using a deformable image registration model. Experimental results on simulated fMRI data demonstrate that the proposed method is more robust and reliable than existing fMRI image registration methods, including methods that maximize functional correlations and methods that minimize the difference of global connectivity matrices across subjects. Experimental results on real resting-state fMRI data further demonstrate that the proposed fMRI registration method can improve functional consistency across subjects with statistical significance.

  6. Alignment-free local structural search by writhe decomposition.

    PubMed

    Zhi, Degui; Shatsky, Maxim; Brenner, Steven E

    2010-05-01

    Rapid methods for protein structure search enable biological discoveries based on flexibly defined structural similarity, unleashing the power of the ever-growing number of solved protein structures. Projection methods show promise for the development of fast structural database search solutions. Projection methods map a structure to a point in a high-dimensional space and compare two structures by measuring the distance between their projected points. These methods offer a tremendous increase in speed over residue-level structural alignment methods. However, current projection methods are not practical, partly because they are unable to identify local similarities. We propose a new projection-based approach that can rapidly detect global as well as local structural similarities. Local structural search is enabled by a topology-inspired writhe decomposition protocol that produces a small number of fragments while ensuring that similar structures are cut in a similar manner. In benchmark tests, we show that our method, writher, improves accuracy over existing projection methods in recognizing SCOP domains within multi-domain proteins, while maintaining accuracy comparable with existing projection methods in a standard single-domain benchmark test. The source code is available at the following website: http://compbio.berkeley.edu/proj/writher/.

  7. A local multiple alignment method for detection of non-coding RNA sequences.

    PubMed

    Tabei, Yasuo; Asai, Kiyoshi

    2009-06-15

    Non-coding RNAs (ncRNAs) show a unique evolutionary process in which substitutions of distant bases are correlated in order to conserve the secondary structure of the ncRNA molecule. Therefore, multiple alignment methods for the detection of ncRNAs should take into account both the primary sequence and the secondary structure. Recently, there has been intense focus on multiple alignment methods for the detection of ncRNAs; however, most of the proposed methods are designed for global multiple alignment. For this reason, these methods are not appropriate for identifying locally conserved ncRNAs among genomic sequences, and a more efficient local multiple alignment method for the detection of ncRNAs is required. We propose a new local multiple alignment method for the detection of ncRNAs. This method uses a local multiple alignment construction procedure inspired by ProDA, a local multiple aligner for protein sequences with repeated and shuffled elements. To align sequences based on secondary structure information, we propose a new alignment model that incorporates secondary structure features. We define the conditional probability of an alignment via a conditional random field and use a gamma-centroid estimator to align sequences. The locally aligned subsequences are clustered into blocks of approximately globally alignable subsequences between pairwise alignments. Finally, these blocks are multiply aligned via MXSCARNA. In benchmark experiments, we demonstrate the strong performance of the implemented software, SCARNA_LM, in local multiple alignment for the detection of ncRNAs. The C++ source code for SCARNA_LM and its experimental datasets are available at http://www.ncrna.org/software/scarna_lm/download. Supplementary data are available at Bioinformatics online.
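
    The gamma-centroid step can be illustrated on its own. This minimal sketch assumes that a matrix `P` of posterior match probabilities has already been computed (in SCARNA_LM these would come from the conditional random field, which is not reproduced here) and ignores secondary structure. It uses the formulation in which the gamma-centroid alignment maximizes the sum of (gamma + 1) * P[i][j] - 1 over aligned pairs, solved by a Needleman-Wunsch-style recursion with zero gap cost.

```python
def gamma_centroid_alignment(P, gamma):
    """Aligned (i, j) pairs maximizing sum((gamma + 1) * P[i][j] - 1).

    P is an n x m matrix of posterior match probabilities (ASSUMED given).
    Gaps cost nothing, so a pair is only worth aligning when
    P[i][j] > 1 / (gamma + 1).
    """
    n, m = len(P), len(P[0])
    # F[i][j]: best total gain using the first i rows and first j columns.
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            gain = (gamma + 1) * P[i - 1][j - 1] - 1
            F[i][j] = max(F[i - 1][j], F[i][j - 1], F[i - 1][j - 1] + gain)
    # Trace back from the corner to recover the aligned pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        gain = (gamma + 1) * P[i - 1][j - 1] - 1
        if gain > 0 and F[i][j] == F[i - 1][j - 1] + gain:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


if __name__ == "__main__":
    P = [[0.9, 0.1],
         [0.1, 0.8]]
    print(gamma_centroid_alignment(P, 1))  # threshold 0.5 -> [(0, 0), (1, 1)]
    print(gamma_centroid_alignment(P, 0))  # threshold 1.0 -> []
```

    Since gaps are free, a pair is aligned only when P[i][j] exceeds 1/(gamma + 1); raising gamma lowers that threshold and trades precision for sensitivity.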

  8. Citation Matching in Sanskrit Corpora Using Local Alignment

    NASA Astrophysics Data System (ADS)

    Prasad, Abhinandan S.; Rao, Shrisha

    Citation matching is the problem of finding which citation occurs in a given textual corpus. Most existing citation matching work is done on scientific literature. The goal of this paper is to present methods for performing citation matching on Sanskrit texts. Exact matching and approximate matching are the two methods for performing citation matching. The exact matching method checks for exact occurrence of the citation with respect to the textual corpus. Approximate matching is a fuzzy string-matching method which computes a similarity score between an individual line of the textual corpus and the citation. The Smith-Waterman-Gotoh algorithm for local alignment, which is generally used in bioinformatics, is used here for calculating the similarity score. This similarity score is a measure of the closeness between the text and the citation. The exact- and approximate-matching methods are evaluated and compared. The methods presented can be easily applied to corpora in other Indic languages like Kannada, Tamil, etc. The approximate-matching method can in particular be used in the compilation of critical editions and plagiarism detection in a literary work.
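
    For readers unfamiliar with the algorithm, a minimal Python sketch of the Smith-Waterman-Gotoh similarity score follows. The match, mismatch and affine gap values are illustrative assumptions, not the paper's settings, and ranking corpus lines by their score against a citation is shown only as a hypothetical usage.

```python
def smith_waterman_gotoh(s, t, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score of s and t with affine gaps (Gotoh).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    Scoring values are ILLUSTRATIVE defaults, not the paper's.
    """
    NEG = float("-inf")
    n, m = len(s), len(t)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]  # best score ending at (i, j)
    E = [[NEG] * (m + 1) for _ in range(n + 1)]  # ending with a gap in s
    F = [[NEG] * (m + 1) for _ in range(n + 1)]  # ending with a gap in t
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            diag = H[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            H[i][j] = max(0.0, diag, E[i][j], F[i][j])  # 0 allows a local restart
            best = max(best, H[i][j])
    return best


if __name__ == "__main__":
    # Hypothetical usage: rank transliterated corpus lines against a citation.
    citation = "dharmakshetre kurukshetre"
    lines = ["dharmakshetre kurukshetre samaveta", "an unrelated line of text"]
    print(max(lines, key=lambda line: smith_waterman_gotoh(citation, line)))
```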

  9. Skeleton-based human action recognition using multiple sequence alignment

    NASA Astrophysics Data System (ADS)

    Ding, Wenwen; Liu, Kai; Cheng, Fei; Zhang, Jin; Li, YunSong

    2015-05-01

    Human action recognition and analysis has been an active research topic in computer vision for many years. This paper presents a method to represent human actions based on trajectories consisting of 3D joint positions. The method first decomposes an action into a sequence of meaningful atomic actions (actionlets) and then labels the actionlets with letters of the English alphabet according to the Davies-Bouldin index value. An action can therefore be represented as a sequence of actionlet symbols, which preserves the temporal order in which the actionlets occur. Finally, we employ sequence comparison to classify multiple actions using a string-matching algorithm (Needleman-Wunsch). The effectiveness of the proposed method is evaluated on datasets captured by commodity depth cameras. Experiments on three challenging 3D action datasets show promising results.
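
    The classification step can be sketched with a plain Needleman-Wunsch global alignment over actionlet symbol strings. The scoring values, the nearest-template decision rule and the action labels below are hypothetical; the abstract does not specify them.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-1):
    """Global alignment score between two symbol sequences (illustrative scores)."""
    n, m = len(x), len(y)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap  # leading gaps in y
    for j in range(1, m + 1):
        D[0][j] = j * gap  # leading gaps in x
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            D[i][j] = max(D[i - 1][j - 1] + s, D[i - 1][j] + gap, D[i][j - 1] + gap)
    return D[n][m]


def classify(query, templates):
    """HYPOTHETICAL rule: return the label whose template aligns best with query."""
    return max(templates, key=lambda label: needleman_wunsch(query, templates[label]))


if __name__ == "__main__":
    # Hypothetical actionlet strings; each letter stands for one actionlet.
    templates = {"wave": "ABCD", "kick": "DCBA"}
    print(classify("ABCD", templates))  # wave
```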

  10. Dark Field Technology - A Practical Approach To Local Alignment

    NASA Astrophysics Data System (ADS)

    Beaulieu, David R.; Hellebrekers, Paul P.

    1987-01-01

    A fully automated direct reticle reference alignment system for use in step and repeat camera systems is described. The technique, first outlined by Janus S. Wilczynski, ("Optical Step and Repeat Camera with Dark Field Alignment", J. Vac. Technol., 16(6), Nov./Dec. 1979), has been implemented on GCA Corporation's DSW Wafer Stepper. Results from various process levels covering the typical CMOS process have shown that better than ±0.2μm alignment accuracy can be obtained with minimal process sensitivity. The technique employs fixed illumination and microscope optics to achieve excellent registration stability and maintenance-free operation. Latent image techniques can be exploited for intra-field, grid and focus characterization.

  11. Prostate lesion detection and localization based on locality alignment discriminant analysis

    NASA Astrophysics Data System (ADS)

    Lin, Mingquan; Chen, Weifu; Zhao, Mingbo; Gibson, Eli; Bastian-Jordan, Matthew; Cool, Derek W.; Kassam, Zahra; Chow, Tommy W. S.; Ward, Aaron; Chiu, Bernard

    2017-03-01

    Prostatic adenocarcinoma is one of the most commonly occurring cancers among men worldwide, and it is also the most curable cancer when detected early. Multiparametric MRI (mpMRI) combines anatomic and functional prostate imaging techniques, which have been shown to produce high sensitivity and specificity in cancer localization, an important factor in planning biopsies and focal therapies. However, in previous investigations, lesion localization was achieved mainly by manual segmentation, which is time-consuming and prone to observer variability. Here, we developed an algorithm based on the locality alignment discriminant analysis (LADA) technique, which can be considered a version of linear discriminant analysis (LDA) localized to patches in the feature space. The sensitivity, specificity and accuracy achieved by the proposed LADA-based algorithm in five prostates were 52.2%, 89.1% and 85.1%, respectively, compared with 31.3%, 85.3% and 80.9% for LDA. The delineation accuracy attainable by this tool has the potential to increase the cancer detection rate in biopsies and to minimize collateral damage to surrounding tissues in focal therapies.

  12. Swarm intelligence in bioinformatics: methods and implementations for discovering patterns of multiple sequences.

    PubMed

    Cui, Zhihua; Zhang, Yi

    2014-02-01

    As a promising and innovative research field, bioinformatics has attracted increasing attention recently. Among the enormous number of open problems in this field, one fundamental issue is the need for accurate and efficient computational methodology that can deal with tremendous amounts of data. In this paper, we survey some applications of swarm intelligence to discovering patterns in multiple sequences. To provide deeper insight, ant colony optimization, particle swarm optimization, artificial bee colony and the artificial fish swarm algorithm are selected, and their applications to multiple sequence alignment and motif detection problems are discussed.

  13. Alignment of galaxies relative to their local environment in SDSS-DR8

    NASA Astrophysics Data System (ADS)

    Hirv, A.; Pelt, J.; Saar, E.; Tago, E.; Tamm, A.; Tempel, E.; Einasto, M.

    2017-03-01

    Aims: We study the alignment of galaxies relative to their local environment in SDSS-DR8 and, using these data, we discuss evolution scenarios for different types of galaxies. Methods: We defined a vector field of the direction of anisotropy of the local environment of galaxies. We summed the unit direction vectors of all close neighbours of a given galaxy in a particular way to estimate this field. We found the alignment angles between the spin axes of disc galaxies, or the minor axes of elliptical galaxies, and the direction of anisotropy. The distributions of cosines of these angles are compared to random distributions to analyse the alignment of galaxies. Results: Sab galaxies show perpendicular alignment relative to the direction of anisotropy in a sparse environment, for single galaxies and for galaxies of low luminosity. Most of the parallel alignment of Scd galaxies comes from dense regions, from 2-3 member groups and from galaxies with low luminosity. The perpendicular alignment of S0 galaxies does not depend strongly on environmental density or luminosity; it is detected for single and 2-3 member group galaxies, and for main galaxies of 4-10 member groups. The perpendicular alignment of elliptical galaxies is clearly detected for single galaxies and for members of ≤10 member groups; the alignment increases with environmental density and luminosity. Conclusions: We confirm the existence of fossil tidally induced alignment of Sab galaxies at low z. The alignment of Scd galaxies can be explained via the infall of matter to filaments. S0 galaxies may have encountered relatively massive mergers along the direction of anisotropy. Major mergers along this direction can explain the alignment of elliptical galaxies. Less massive, but repeated mergers are possibly responsible for the formation of elliptical galaxies in sparser areas and for less luminous elliptical galaxies.

  14. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments

    SciTech Connect

    Daily, Jeffrey A.

    2016-02-10

    Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375-residue query sequence, a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar’s ‘striped’ approach. When using only a single thread, parasail was 1.7 times faster than Rognes’s SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64-bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
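
    Farrar’s striped method vectorizes the standard dynamic-programming recurrences for local alignment. As a reference point, the sketch below is a plain scalar Smith-Waterman score computation with affine gaps (Gotoh's three-state recurrence). It is not parasail's API and uses simple match/mismatch scores instead of a substitution matrix; the default scores and the gap-cost convention are assumptions chosen for illustration.

```python
def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=3, gap_extend=1):
    """Best local-alignment score of sequences a and b (score only, no traceback).

    Gap convention assumed here: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    neg_inf = float("-inf")
    n = len(b)
    h_prev = [0] * (n + 1)          # H row i-1: best local score ending at (i-1, j)
    f_prev = [neg_inf] * (n + 1)    # F row i-1: best score ending with a gap in b (vertical move)
    best = 0
    for i in range(1, len(a) + 1):
        h_cur = [0] * (n + 1)
        f_cur = [neg_inf] * (n + 1)
        e = neg_inf                 # E: best score ending with a gap in a (horizontal move), carried along the row
        for j in range(1, n + 1):
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            diag = h_prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            h_cur[j] = max(0, diag, e, f_cur[j])   # 0 lets a local alignment restart anywhere
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best

print(smith_waterman_affine("ACGTACGT", "ACGTTACGT"))  # 13: eight matches (16) minus one gap opening (3)
```

    Each evaluation of the inner-loop body is one cell update, so a throughput figure in GCUPS is (query length × database residues) / elapsed seconds / 10^9.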

  15. Alignments of the galaxies in and around the Virgo cluster with the local velocity shear

    SciTech Connect

    Lee, Jounghun; Rey, Soo Chang; Kim, Suk

    2014-08-10

    Observational evidence is presented for the alignment between the cosmic sheet and the principal axis of the velocity shear field at the position of the Virgo cluster. The galaxies in and around the Virgo cluster from the Extended Virgo Cluster Catalog that was recently constructed by Kim et al. are used to determine the direction of the local sheet. The peculiar velocity field reconstructed from the Sloan Digital Sky Survey Data Release 7 is analyzed to estimate the local velocity shear tensor at the Virgo center. Showing first that the minor principal axis of the local velocity shear tensor is almost parallel to the direction of the line of sight, we detect a clear signal of alignment between the positions of the Virgo satellites and the intermediate principal axis of the local velocity shear projected onto the plane of the sky. Furthermore, the dwarf satellites are found to be more strongly aligned than their normal counterparts, which is interpreted as an indication of the following. (1) The normal satellites and the dwarf satellites fall into the Virgo cluster preferentially along the local filament and the local sheet, respectively. (2) The local filament is aligned with the minor principal axis of the local velocity shear while the local sheet is parallel to the plane spanned by the minor and intermediate principal axes. Our result is consistent with the recent numerical claim that the velocity shear is a good tracer of the cosmic web.

  16. Localized α4 Integrin Phosphorylation Directs Shear Stress-Induced Endothelial Cell Alignment

    PubMed Central

    Goldfinger, Lawrence E.; Tzima, Eleni; Stockton, Rebecca; Kiosses, William B.; Kinbara, Kayoko; Tkachenko, Eugene; Gutierrez, Edgar; Groisman, Alex; Nguyen, Phu; Chien, Shu; Ginsberg, Mark H.

    2009-01-01

    Vascular endothelial cells respond to laminar shear stress by aligning in the direction of flow, a process which may contribute to athero-protection. Here we report that localized α4 integrin phosphorylation is a mechanism for establishing the directionality of shear stress-induced alignment in microvascular endothelial cells. Within 5 minutes of exposure to a physiological level of shear stress, endothelial α4 integrins became phosphorylated on Ser988. In wounded monolayers, phosphorylation was enhanced at the downstream edges of cells relative to the source of flow. The shear-induced α4 integrin phosphorylation was blocked by inhibitors of cAMP-dependent protein kinase A (PKA), an enzyme involved in the alignment of endothelial cells under prolonged shear. Moreover, shear-induced localized activation of the small GTPase Rac1, which specifies the directionality of endothelial alignment, was similarly blocked by PKA inhibitors. Furthermore, endothelial cells bearing a non-phosphorylatable α4(S988A) mutation failed to align in response to shear stress, thus establishing α4 as a relevant PKA substrate. We thereby show that shear-induced PKA-dependent α4 integrin phosphorylation at the downstream edge of endothelial cells promotes localized Rac1 activation, which in turn directs cytoskeletal alignment in response to shear stress. PMID:18583710

  17. Information conveyed by inferior colliculus neurons about stimuli with aligned and misaligned sound localization cues

    PubMed Central

    Young, Eric D.

    2011-01-01

    Previous studies have demonstrated that single neurons in the central nucleus of the inferior colliculus (ICC) are sensitive to multiple sound localization cues. We investigated the hypothesis that ICC neurons are specialized to encode multiple sound localization cues that are aligned in space (as would naturally occur from a single broadband sound source). Sound localization cues including interaural time differences (ITDs), interaural level differences (ILDs), and spectral shapes (SSs) were measured in a marmoset monkey. Virtual space methods were used to generate stimuli with aligned and misaligned combinations of cues while recording in the ICC of the same monkey. Mutual information (MI) between spike rates and stimuli was compared for aligned versus misaligned cues. Neurons with best frequencies (BFs) less than ∼11 kHz mostly encoded information about a single sound localization cue, ITD or ILD depending on frequency, consistent with the dominance of ear acoustics by either ITD or ILD at those frequencies. Most neurons with BFs >11 kHz encoded information about multiple sound localization cues, usually ILD and SS, and were sensitive to their alignment. In some neurons MI between stimuli and spike responses was greater for aligned cues, while in others it was greater for misaligned cues. If SS cues were shifted to lower frequencies in the virtual space stimuli, a similar result was found for neurons with BFs <11 kHz, showing that the cue interaction reflects the spectra of the stimuli and not a specialization for representing SS cues. In general the results show that ICC neurons are sensitive to multiple localization cues if they are simultaneously present in the frequency response area of the neuron. However, the representation is diffuse in that there is not a specialization in the ICC for encoding aligned sound localization cues. PMID:21653729
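
    Mutual information here measures how much observing a neuron's spike rate reduces uncertainty about which stimulus was presented. A minimal plug-in estimate from discretized responses is sketched below. The stimulus labels and spike-count bins are hypothetical, and the plug-in estimator is biased upward for small samples, so published analyses typically apply a bias correction that this sketch omits.

```python
import math
from collections import Counter

def mutual_information_bits(pairs):
    """Plug-in estimate of I(S; R) in bits from a list of (stimulus, response) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    n_s = Counter(s for s, _ in pairs)
    n_r = Counter(r for _, r in pairs)
    mi = 0.0
    for (s, r), c in joint.items():
        # p(s,r) * log2(p(s,r) / (p(s) p(r))) written with counts: (c/n) * log2(c*n / (n_s * n_r))
        mi += (c / n) * math.log2(c * n / (n_s[s] * n_r[r]))
    return mi

# Hypothetical trials: stimulus label and a binned spike count.
informative = [("aligned", 5), ("misaligned", 1)] * 10
uninformative = [("aligned", 5), ("aligned", 1), ("misaligned", 5), ("misaligned", 1)] * 5
print(mutual_information_bits(informative))     # 1.0: the response identifies the stimulus
print(mutual_information_bits(uninformative))   # 0.0: the response is independent of the stimulus
```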

  18. Local alignment vectors reveal cancer cell-induced ECM fiber remodeling dynamics.

    PubMed

    Lee, Byoungkoo; Konen, Jessica; Wilkinson, Scott; Marcus, Adam I; Jiang, Yi

    2017-01-03

    Invasive cancer cells interact with the surrounding extracellular matrix (ECM), remodeling ECM fiber network structure by condensing, degrading, and aligning these fibers. We developed a novel local alignment vector analysis method to quantitatively measure collagen fiber alignment as a vector field using Circular Statistics. This method was applied to human non-small cell lung carcinoma (NSCLC) cell lines, embedded as spheroids in a collagen gel. Collagen remodeling was monitored using second harmonic generation imaging under normal conditions and when the LKB1-MARK1 pathway was disrupted through RNAi-based approaches. The results showed that inhibiting LKB1 or MARK1 in NSCLC increases the collagen fiber alignment and captures outward alignment vectors from the tumor spheroid, corresponding to high invasiveness of LKB1 mutant cancer cells. With time-lapse imaging of ECM micro-fiber morphology, the local alignment vector can measure the dynamic signature of invasive cancer cell activity and cell-migration-induced ECM and collagen remodeling and realigning dynamics.
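
    Collagen fibers are axial: an orientation θ and θ + π describe the same fiber, so ordinary vector averaging would cancel them. A common circular-statistics treatment doubles the angles before averaging. The sketch below assumes fiber orientations have already been extracted from the image within a local window; it illustrates the statistic, not necessarily the authors' exact procedure. Taking the mean axis as a local vector's direction and the resultant length as its magnitude is an assumption made here for illustration.

```python
import math

def axial_mean(angles_rad):
    """Mean axis and resultant length for axial data, where theta and theta + pi describe the same fiber.

    Angles are doubled so that opposite fiber directions reinforce instead of cancelling,
    averaged as unit vectors, and the mean angle is halved back into [0, pi).
    """
    n = len(angles_rad)
    c = sum(math.cos(2 * t) for t in angles_rad) / n
    s = sum(math.sin(2 * t) for t in angles_rad) / n
    r = math.hypot(c, s)                          # 1 = all fibers parallel, 0 = no preferred axis
    mean_axis = (0.5 * math.atan2(s, c)) % math.pi
    return mean_axis, r

# Hypothetical fiber orientations (radians) inside one image window.
print(axial_mean([0.30, 0.30 + math.pi, 0.35]))
```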

  19. Local alignment vectors reveal cancer cell-induced ECM fiber remodeling dynamics

    PubMed Central

    Lee, Byoungkoo; Konen, Jessica; Wilkinson, Scott; Marcus, Adam I.; Jiang, Yi

    2017-01-01

    Invasive cancer cells interact with the surrounding extracellular matrix (ECM), remodeling ECM fiber network structure by condensing, degrading, and aligning these fibers. We developed a novel local alignment vector analysis method to quantitatively measure collagen fiber alignment as a vector field using Circular Statistics. This method was applied to human non-small cell lung carcinoma (NSCLC) cell lines, embedded as spheroids in a collagen gel. Collagen remodeling was monitored using second harmonic generation imaging under normal conditions and when the LKB1-MARK1 pathway was disrupted through RNAi-based approaches. The results showed that inhibiting LKB1 or MARK1 in NSCLC increases the collagen fiber alignment and captures outward alignment vectors from the tumor spheroid, corresponding to high invasiveness of LKB1 mutant cancer cells. With time-lapse imaging of ECM micro-fiber morphology, the local alignment vector can measure the dynamic signature of invasive cancer cell activity and cell-migration-induced ECM and collagen remodeling and realigning dynamics. PMID:28045069

  20. Ligand Binding Site Detection by Local Structure Alignment and Its Performance Complementarity

    PubMed Central

    Lee, Hui Sun; Im, Wonpil

    2013-01-01

    Accurate determination of potential ligand binding sites (BS) is a key step for protein function characterization and structure-based drug design. Despite promising results of template-based BS prediction methods using global structure alignment (GSA), there is room to improve the performance by properly incorporating local structure alignment (LSA) because BS are local structures and often similar for proteins with dissimilar global folds. We present a template-based ligand BS prediction method using G-LoSA, our LSA tool. A large benchmark set validation shows that G-LoSA predicts drug-like ligands’ positions in single-chain protein targets more precisely than TM-align, a GSA-based method, while the overall success rate of TM-align is better. G-LoSA is particularly efficient for accurate detection of local structures conserved across proteins with diverse global topologies. Recognizing the performance complementarity of G-LoSA to TM-align and a non-template geometry-based method, fpocket, a robust consensus scoring method, CMCS-BSP (Complementary Methods and Consensus Scoring for ligand Binding Site Prediction), is developed and shows improvement on prediction accuracy. The G-LoSA source code is freely available at http://im.bioinformatics.ku.edu/GLoSA. PMID:23957286

  1. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments

    DOE PAGES

    Daily, Jeffrey A.

    2016-02-10

    Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375-residue query sequence, a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar’s ‘striped’ approach. When using only a single thread, parasail was 1.7 times faster than Rognes’s SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64-bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.

  2. LocalAli: an evolutionary-based local alignment approach to identify functionally conserved modules in multiple networks.

    PubMed

    Hu, Jialu; Reinert, Knut

    2015-02-01

    Sequences and protein interaction data are significant for understanding the underlying molecular mechanisms of organisms. Local network alignment is one of the key systematic ways of predicting protein functions, identifying functional modules and understanding phylogeny from these data. Most currently existing tools, however, have limitations, mainly concerning their scoring scheme, speed and scalability. Therefore, there is a growing demand for sophisticated network evolution models and efficient local alignment algorithms. We developed a fast and scalable local network alignment tool called LocalAli for the identification of functionally conserved modules in multiple networks. In this algorithm, we first propose a new framework to reconstruct the evolutionary history of conserved modules based on a maximum-parsimony evolutionary model. By relying on this model, LocalAli facilitates interpretation of the resulting local alignments in terms of conserved modules that have evolved from a common ancestral module through a series of evolutionary events. The meta-heuristic simulated annealing was used to search for the optimal or near-optimal inner nodes (i.e. ancestral modules) of the evolutionary tree. To evaluate its performance and statistical significance, LocalAli was tested on 26 real datasets and 1040 randomly generated datasets. The results suggest that LocalAli outperforms all existing algorithms in terms of coverage, consistency and scalability, while retaining high precision in the identification of functionally coherent subnetworks. The source code and test datasets are freely available for download under the GNU GPL v3 license at https://code.google.com/p/localali/. Contact: jialu.hu@fu-berlin.de or knut.reinert@fu-berlin.de. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
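
    LocalAli scores alignments with a maximum-parsimony model of module evolution. For orientation, the classic Fitch algorithm below computes the parsimony cost of a single discrete character on a fixed rooted binary tree. LocalAli's model operates on network modules and is more elaborate; the tree, leaf labels and states here are hypothetical. In a search such as simulated annealing, a cost of this kind would be re-evaluated for each candidate assignment of inner nodes.

```python
def fitch(tree, states):
    """Fitch small-parsimony on a rooted binary tree.

    tree: nested 2-tuples of leaf names, e.g. (("h1", "h2"), ("m1", "m2")).
    states: maps each leaf name to its observed character state.
    Returns (candidate state set at the root, minimum number of state changes).
    """
    if isinstance(tree, str):                  # leaf: its observed state, no changes yet
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:                                 # children can agree: no extra change needed
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1   # disagreement costs one change

print(fitch((("h1", "h2"), ("m1", "m2")), {"h1": "A", "h2": "C", "m1": "A", "m2": "G"}))  # ({'A'}, 2)
```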

  3. Calculation of residual dipolar couplings from disordered state ensembles using local alignment.

    PubMed

    Marsh, Joseph A; Baker, Jennifer M R; Tollinger, Martin; Forman-Kay, Julie D

    2008-06-25

    Residual dipolar couplings (RDCs) have been observed in disordered states of several proteins. While their nonuniform values were initially surprising, it has been shown that reasonable approximation of experimental RDCs can be obtained using simple statistical coil models and assuming global alignment of each structure, provided that many thousands of conformers are averaged. Here we show that, by using short local alignment tensors, we can achieve good agreement between experimental and simulated RDCs with far fewer structures than required when using global alignment. This makes the possibility of using RDCs as direct restraints in structural calculations of disordered proteins much more feasible. In addition, it provides insight into the nature of RDCs in disordered states, suggesting that they are primarily reporting on local structure.
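
    The forward calculation that links an alignment tensor to an RDC is short. In the sketch below, each ensemble member (a conformer, or a short segment in the spirit of the local-tensor approach) contributes its own tensor and bond vector, and the predicted RDC is the ensemble average. How the local tensors are obtained, which is the subject of the paper, is not shown; the tensor values and d_max are arbitrary placeholders, and the tensor convention is an assumption.

```python
import math

def rdc(alignment_tensor, bond_vector, d_max):
    """RDC of one bond vector: D = d_max * sum_kl A_kl * u_k * u_l, with u the unit bond vector.

    Assumed convention: A is a symmetric, traceless 3x3 alignment tensor. Other conventions
    use the Saupe order matrix and carry an extra factor of 3/2.
    """
    norm = math.sqrt(sum(x * x for x in bond_vector))
    u = [x / norm for x in bond_vector]
    return d_max * sum(alignment_tensor[k][l] * u[k] * u[l] for k in range(3) for l in range(3))

def ensemble_rdc(members, d_max):
    """Average RDC over (alignment_tensor, bond_vector) pairs, e.g. one per conformer or local segment."""
    return sum(rdc(a, v, d_max) for a, v in members) / len(members)

# Hypothetical axially symmetric, traceless tensor and unit d_max.
A = [[-0.5e-3, 0.0, 0.0], [0.0, -0.5e-3, 0.0], [0.0, 0.0, 1.0e-3]]
print(rdc(A, (0.0, 0.0, 1.0), 1.0))                          # bond along the tensor's z axis
print(ensemble_rdc([(A, (0, 0, 1)), (A, (1, 0, 0))], 1.0))   # two-member average
```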

  4. ALP & FALP: C++ libraries for pairwise local alignment E-values.

    PubMed

    Sheetlin, Sergey; Park, Yonil; Frith, Martin C; Spouge, John L

    2016-01-15

    Pairwise local alignment is an indispensable tool for molecular biologists. In real time (i.e. in about 1 s), ALP (Ascending Ladder Program) calculates the E-values for protein-protein or DNA-DNA local alignments of random sequences, for arbitrary substitution score matrix, gap costs and letter abundances; and FALP (Frameshift Ascending Ladder Program) performs a similar task, although more slowly, for frameshifting DNA-protein alignments. To permit other C++ programmers to implement the computational efficiencies in ALP and FALP directly within their own programs, C++ source code is available in the public domain at http://go.usa.gov/3GTSW under 'ALP' and 'FALP', along with the standalone programs ALP and FALP. Contact: spouge@nih.gov. Supplementary data are available at Bioinformatics online. Published by Oxford University Press 2015. This work is written by US Government employees and is in the public domain in the US.
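
    E-values of the kind ALP computes are used in the standard Karlin-Altschul form E = K·m·n·e^(−λS), where m and n are the sequence lengths, S is the alignment score, and λ and K depend on the scoring system. The sketch below assumes λ and K are already known; the values shown are illustrative assumptions (estimating them for an arbitrary scoring system is ALP's job), and finite-size (edge-effect) corrections are omitted.

```python
import math

def karlin_altschul_evalue(score, m, n, lam, k):
    """Expected number of chance local alignments with score >= `score` between lengths m and n."""
    return k * m * n * math.exp(-lam * score)

def bit_score(score, lam, k):
    """Normalized score S' = (lambda * S - ln K) / ln 2, so that E = m * n * 2**(-S')."""
    return (lam * score - math.log(k)) / math.log(2)

# Illustrative parameters only, assumed for this example.
lam, k = 0.267, 0.041
print(karlin_altschul_evalue(60, 300, 1_000_000, lam, k))
print(bit_score(60, lam, k))
```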

  5. Dust Grain Alignment and Magnetic Field Strength in the Wall of the Local Bubble

    NASA Astrophysics Data System (ADS)

    Andersson, B.-G.; Medan, Ilija

    2017-01-01

    We use archival data on polarization (Berdyugin 2014) and extinction in the wall of the Local Bubble to study the grain alignment efficiency and the magnetic field strength. We find that the grain alignment efficiency variations can be directly tied to the location of the known OB associations within 200 pc of the Sun, strongly supporting modern, radiation-driven dust grain alignment. Based on the Davis-Chandrasekhar-Fermi method, we find a bimodal magnetic field-strength distribution, where the locations of the strongest fields correlate with the directions towards the nearby OB associations. We hypothesize that this strengthening is due to compression of the bubble wall by the opposing outflows in the Local Bubble and from the surrounding OB associations.
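
    The Davis-Chandrasekhar-Fermi method estimates the plane-of-sky field strength from the ratio of the turbulent velocity dispersion to the scatter of polarization angles: a stronger field resists being tangled, so the angles scatter less. A minimal sketch of the usual cgs form follows. The density, dispersions, mean particle mass per hydrogen nucleus and correction factor are all assumptions, not the study's inputs.

```python
import math

M_H = 1.6735575e-24  # mass of a hydrogen atom, grams

def dcf_field_microgauss(n_h, sigma_v_kms, sigma_theta_deg, mu=1.4, q=0.5):
    """Davis-Chandrasekhar-Fermi plane-of-sky field strength in microgauss (cgs formula).

    B = q * sqrt(4 * pi * rho) * sigma_v / sigma_theta, with rho = mu * m_H * n_h.
    n_h: hydrogen number density (cm^-3); sigma_v: velocity dispersion (km/s);
    sigma_theta: dispersion of polarization angles (degrees). mu = 1.4 includes helium
    per hydrogen nucleus and q = 0.5 is a commonly used correction factor (both assumed).
    """
    rho = mu * M_H * n_h                       # mass density, g cm^-3
    sigma_v_cms = sigma_v_kms * 1.0e5          # km/s -> cm/s
    sigma_theta_rad = math.radians(sigma_theta_deg)
    b_gauss = q * math.sqrt(4.0 * math.pi * rho) * sigma_v_cms / sigma_theta_rad
    return b_gauss * 1.0e6                     # gauss -> microgauss

# Hypothetical inputs, not values from the study.
print(dcf_field_microgauss(n_h=1.0, sigma_v_kms=1.0, sigma_theta_deg=10.0))
```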

  6. Use of local alignment tensors for the determination of relative configurations in organic compounds.

    PubMed

    Thiele, Christina M; Maliniak, Arnold; Stevensson, Baltzar

    2009-09-16

    In this proof of principle the use of local alignment tensors for the determination of relative configurations in moderately flexible molecules is demonstrated. These tensors are derived from residual dipolar couplings. Two methods for the analysis of partly linearly dependent RDCs in a rigid molecular fragment are also presented.

  7. Coordination Analysis Using Global Structural Constraints and Alignment-based Local Features

    NASA Astrophysics Data System (ADS)

    Hara, Kazuo; Shimbo, Masashi; Matsumoto, Yuji

    We propose a hybrid approach to coordinate structure analysis that combines a simple grammar to ensure consistent global structure of coordinations in a sentence, and features based on sequence alignment to capture local symmetry of conjuncts. The weight of the alignment-based features, which in turn determines the score of coordinate structures, is optimized by perceptron training on a given corpus. A bottom-up chart parsing algorithm efficiently finds the best scoring structure, taking both nested and non-overlapping flat coordinations into account. We demonstrate that our approach outperforms existing parsers in coordination scope detection on the Genia corpus.

  8. Local-global alignment for finding 3D similarities in protein structures

    DOEpatents

    Zemla, Adam T [Brentwood, CA]

    2011-09-20

    A method of finding 3D similarities in protein structures of a first molecule and a second molecule. The method comprises providing preselected information regarding the first molecule and the second molecule; comparing the first molecule and the second molecule using Longest Continuous Segments (LCS) analysis; comparing them using Global Distance Test (GDT) analysis; comparing them using Local Global Alignment Scoring function (LGA_S) analysis; and verifying the constructed alignment and repeating the steps to find the regions of 3D similarities in protein structures.
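
    The Global Distance Test is commonly summarized as GDT_TS, the average percentage of residues lying within 1, 2, 4 and 8 Å of their counterparts after superposition. A simplified sketch follows, assuming the per-residue distances have already been measured under a single superposition (the full method optimizes the superposition for each cutoff); the distances are hypothetical.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS-style score (0-100) from per-residue distances in angstroms.

    Simplification: distances come from one fixed superposition, whereas GDT proper
    searches over superpositions to maximize the residue count at each cutoff.
    """
    n = len(distances)
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)

# Hypothetical CA-CA deviations (angstroms) for five aligned residues.
print(gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0]))   # fractions 0.2, 0.4, 0.6, 0.8 average to about 50
```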

  9. Usp16 regulates kinetochore localization of Plk1 to promote proper chromosome alignment in mitosis

    PubMed Central

    Zhuo, Xiaolong; Guo, Xiao; Zhang, Xiaoyan; Jing, Guihua; Wang, Yao; Chen, Qiang; Jiang, Qing

    2015-01-01

    During the G2 to M phase transition, a portion of mitotic regulator Plk1 localizes to the kinetochores and regulates the initiation of kinetochore–microtubule attachments for proper chromosome alignment. Once kinetochore–microtubule attachment is achieved, this portion of Plk1 is removed from the kinetochores as a result of ubiquitination. However, the crucial molecular mechanism that promotes the localization and the maintenance of Plk1 on the kinetochores until metaphase is still unclear. We report that ubiquitin-specific peptidase 16 (Usp16) plays a key role during this process. Usp16 deubiquitinates Plk1, resulting in an enhanced interaction with kinetochore-localized proteins such as BubR1, and thereby retains Plk1 on the kinetochores to promote proper chromosome alignment in early mitosis. Down-regulation of Usp16 causes increased ubiquitination and decreased kinetochore localization of Plk1. Thus, our data unveil a unique mechanism by which Usp16 promotes the localization and maintenance of Plk1 on the kinetochores for proper chromosome alignment. PMID:26323689

  10. Usp16 regulates kinetochore localization of Plk1 to promote proper chromosome alignment in mitosis.

    PubMed

    Zhuo, Xiaolong; Guo, Xiao; Zhang, Xiaoyan; Jing, Guihua; Wang, Yao; Chen, Qiang; Jiang, Qing; Liu, Junjun; Zhang, Chuanmao

    2015-08-31

    During the G2 to M phase transition, a portion of mitotic regulator Plk1 localizes to the kinetochores and regulates the initiation of kinetochore-microtubule attachments for proper chromosome alignment. Once kinetochore-microtubule attachment is achieved, this portion of Plk1 is removed from the kinetochores as a result of ubiquitination. However, the crucial molecular mechanism that promotes the localization and the maintenance of Plk1 on the kinetochores until metaphase is still unclear. We report that ubiquitin-specific peptidase 16 (Usp16) plays a key role during this process. Usp16 deubiquitinates Plk1, resulting in an enhanced interaction with kinetochore-localized proteins such as BubR1, and thereby retains Plk1 on the kinetochores to promote proper chromosome alignment in early mitosis. Down-regulation of Usp16 causes increased ubiquitination and decreased kinetochore localization of Plk1. Thus, our data unveil a unique mechanism by which Usp16 promotes the localization and maintenance of Plk1 on the kinetochores for proper chromosome alignment. © 2015 Zhuo et al.

  11. Method of Deployment of a Space Tethered System Aligned to the Local Vertical

    NASA Astrophysics Data System (ADS)

    Zakrzhevskii, A. E.

    2016-09-01

    The object of this research is a space tether of two bodies connected by a flexible massless string. The research objective is the development and theoretical justification of a novel approach to deploying the space tether in a circular orbit with its alignment to the local vertical. The approach is based on the theorem on the change of angular momentum. It allows the development of open-loop control of the tether length that, under the effect of the gravitational torque, changes the angular momentum of the tether to the value corresponding to the deployed tether aligned to the local vertical. An example of applying the approach to tether deployment demonstrates the simplicity of the method in practice and also illustrates how the mathematical model is validated.

  12. MSACompro: improving multiple protein sequence alignment by predicted structural features.

    PubMed

    Deng, Xin; Cheng, Jianlin

    2014-01-01

    Multiple Sequence Alignment (MSA) is an essential tool in protein structure modeling, gene and protein function prediction, DNA motif recognition, phylogenetic analysis, and many other bioinformatics tasks. Therefore, improving the accuracy of multiple sequence alignment is an important long-term objective in bioinformatics. We designed and developed a new method, MSACompro, to incorporate predicted secondary structure, relative solvent accessibility, and residue-residue contact information into the most accurate current posterior probability-based MSA methods to improve the accuracy of multiple sequence alignments. Unlike multiple sequence alignment methods that use the tertiary structure information of some sequences, our method uses structural information predicted purely from sequences. In this chapter, we first introduce some background and related techniques in the field of multiple sequence alignment. Then, we describe the detailed algorithm of MSACompro. Finally, we show that integrating predicted protein structural information improved the multiple sequence alignment accuracy.

  13. Local Release of Paclitaxel from Aligned, Electrospun Microfibers Promotes Axonal Extension.

    PubMed

    Roman, Jose A; Reucroft, Ian; Martin, Russell A; Hurtado, Andres; Mao, Hai-Quan

    2016-10-01

    Traumatic spinal cord injuries ultimately result in an inhibitory environment that prevents axonal regeneration from occurring. A low concentration administration of paclitaxel has been previously shown to promote axonal extension and attenuate the upregulation of inhibitory molecules after a spinal cord injury. In this study, paclitaxel is incorporated into electrospun poly(L-lactic acid) (PLA) microfibers, and it is established that a local release of paclitaxel from aligned, electrospun microfibers promotes neurite extension in both growth-conducive and inhibitory environments. Isolated dorsal root ganglion cells are cultured for 5 d directly on a tissue culture polystyrene surface, PLA film, or random or aligned electrospun PLA microfibers (1.44 ± 0.03 μm) with paclitaxel incorporated at various concentrations (0%-5.0% w/w in reference to fiber weight). To determine the effect of a local release of paclitaxel, paclitaxel-loaded microfibers are placed in CellCrown inserts above cultured neurons. Average neurite extension rate is quantified for each sample. A local release of paclitaxel maintains neuronal survival and neurite extension in a concentration-dependent manner when coupled with aligned microfibers when cultured on laminin or an inhibitory surface of aggrecan. The findings provide a targeted approach to improve axonal extension across the inhibitory environment present after a traumatic injury in the spinal cord.

  14. Local and nonlocal strain rate fields and vorticity alignment in turbulent flows.

    PubMed

    Hamlington, Peter E; Schumacher, Jörg; Dahm, Werner J A

    2008-02-01

    Local and nonlocal contributions to the total strain rate tensor $S_{ij}$ at any point $\mathbf{x}$ in a flow are formulated from an expansion of the vorticity field in a local spherical neighborhood of radius $R$ centered on $\mathbf{x}$. The resulting exact expression allows the nonlocal (background) strain rate tensor $S_{ij}^{B}(\mathbf{x})$ to be obtained from $S_{ij}(\mathbf{x})$. In turbulent flows, where the vorticity naturally concentrates into relatively compact structures, this allows the local alignment of vorticity with the most extensional principal axis of the background strain rate tensor to be evaluated. In the vicinity of any vortical structure, the required radius $R$ and corresponding order $n$ to which the expansion must be carried are determined by the viscous length scale $\lambda_\nu$. We demonstrate the convergence to the background strain rate field with increasing $R$ and $n$ for an equilibrium Burgers vortex, and show that this resolves the anomalous alignment of vorticity with the intermediate eigenvector of the total strain rate tensor. We then evaluate the background strain field $S_{ij}^{B}(\mathbf{x})$ in direct numerical simulations of homogeneous isotropic turbulence where, even for the limited $R$ and $n$ corresponding to the truncated series expansion, the results show an increase in the expected equilibrium alignment of vorticity with the most extensional principal axis of the background strain rate tensor.
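
    The alignment statistics in this abstract start from two standard objects: the strain rate tensor (the symmetric part of the velocity gradient) and the vorticity vector. The sketch below uses NumPy to compute the cosines between vorticity and the principal axes of the total strain rate tensor at a single point. It does not implement the paper's decomposition into local and background strain, and the velocity gradient (axial stretching along z combined with solid-body rotation about z) is a hypothetical example.

```python
import numpy as np

def strain_and_vorticity(grad_u):
    """Split a velocity-gradient tensor A[i, j] = du_i/dx_j into strain rate S and vorticity omega."""
    A = np.asarray(grad_u, dtype=float)
    S = 0.5 * (A + A.T)                              # symmetric part: strain rate tensor S_ij
    omega = np.array([A[2, 1] - A[1, 2],             # omega = curl u
                      A[0, 2] - A[2, 0],
                      A[1, 0] - A[0, 1]])
    return S, omega

def alignment_cosines(S, omega):
    """|cos| between omega and each principal axis of S, from most compressive to most extensional."""
    eigvals, eigvecs = np.linalg.eigh(S)             # ascending eigenvalues, eigenvectors in columns
    w = omega / np.linalg.norm(omega)
    return eigvals, np.abs(eigvecs.T @ w)

# Axial stretching rate a along z plus solid-body rotation (rate 1) about z.
a = 1.0
grad_u = [[-a / 2, -1.0, 0.0],
          [1.0, -a / 2, 0.0],
          [0.0, 0.0, a]]
S, omega = strain_and_vorticity(grad_u)
print(alignment_cosines(S, omega))   # vorticity lies along the most extensional axis here
```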

  15. An Efficient Parallel Algorithm for Multiple Sequence Similarities Calculation Using a Low Complexity Method

    PubMed Central

    Marucci, Evandro A.; Neves, Leandro A.; Valêncio, Carlo R.; Pinto, Alex R.; Cansian, Adriano M.; de Souza, Rogeria C. G.; Shiyou, Yang; Machado, José M.

    2014-01-01

    With the advance of genomic research, the number of sequences involved in comparative methods has grown immensely. Among them are methods for similarity calculation, which are used by many bioinformatics applications. Due to the huge amount of data, combining low-complexity methods with parallel computing is becoming desirable. k-mer counting is a very efficient method with good biological results. In this work, a parallel algorithm for multiple sequence similarity calculation using the k-mer counting method is proposed. Tests show that the algorithm presents very good scalability and a nearly linear speedup: with 14 nodes, a 12x speedup was obtained. This algorithm can be used to parallelize some multiple sequence alignment tools, such as MAFFT and MUSCLE. PMID:25140318
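
    k-mer counting turns each sequence into a multiset of short overlapping substrings, and similarity is then computed from shared k-mers. One common measure, similar to the fractional common k-mer count used in MUSCLE's first stage, is sketched below; the abstract does not state the exact measure, so treat it as an assumption, along with the default k. Because each pair is scored independently, the all-pairs step can be distributed across workers (for example with Python's concurrent.futures), which is not shown here.

```python
from collections import Counter
from itertools import combinations

def kmer_counts(seq, k):
    """Counts of every overlapping substring of length k."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kmer_similarity(a, b, k=3):
    """Shared k-mer count (with multiplicity) divided by the k-mer total of the shorter sequence."""
    denom = min(len(a), len(b)) - k + 1
    if denom <= 0:
        return 0.0                                    # a sequence shorter than k has no k-mers
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum(min(count, cb[kmer]) for kmer, count in ca.items())
    return shared / denom

def all_pairs_similarity(seqs, k=3):
    """Similarity for every unordered pair; the pairs are independent of one another."""
    return {(i, j): kmer_similarity(seqs[i], seqs[j], k)
            for i, j in combinations(range(len(seqs)), 2)}

print(all_pairs_similarity(["ACGT", "ACGA", "ACGT"]))   # {(0, 1): 0.5, (0, 2): 1.0, (1, 2): 0.5}
```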

  16. A new method for improving functional-to-structural MRI alignment using local Pearson correlation.

    PubMed

    Saad, Ziad S; Glen, Daniel R; Chen, Gang; Beauchamp, Michael S; Desai, Rutvik; Cox, Robert W

    2009-02-01

    Accurate registration of Functional Magnetic Resonance Imaging (FMRI) T2-weighted volumes to same-subject high-resolution T1-weighted structural volumes is important for Blood Oxygenation Level Dependent (BOLD) FMRI and crucial for applications such as cortical surface-based analyses and pre-surgical planning. Such registration is generally implemented by minimizing a cost functional, which measures the mismatch between two image volumes over the group of proper affine transformations. Widely used cost functionals, such as mutual information (MI) and correlation ratio (CR), appear to yield decent alignments when visually judged by matching outer brain contours. However, close inspection reveals that internal brain structures are often significantly misaligned. Poor registration is most evident in the ventricles and sulcal folds, where CSF is concentrated. This observation motivated our development of an improved modality-specific cost functional which uses a weighted local Pearson coefficient (LPC) to align T2- and T1-weighted images. In the absence of an alignment gold standard, we used three human observers blinded to registration method to provide an independent assessment of the quality of the registration for each cost functional. We found that LPC performed significantly better (p<0.001) than generic cost functionals including MI and CR. Generic cost functionals were very often not minimal near the best alignment, thereby suggesting that optimization is not the cause of their failure. Lastly, we emphasize the importance of precise visual inspection of alignment quality and present an automated method for generating composite images that help capture errors of misalignment.
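
    The motivation for a local coefficient is that a single global correlation can be dominated by large-scale intensity differences and miss whether fine structure lines up. A deliberately simplified illustration follows, assuming 1-D signals, non-overlapping windows and uniform weights; the paper's LPC works on 3-D neighbourhoods with weighting and serves as a cost functional within affine registration, none of which is reproduced here. The intensity profiles are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 when either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def local_pearson(x, y, window):
    """Unweighted mean of Pearson correlations over non-overlapping windows of a 1-D signal pair."""
    scores = [pearson(x[i:i + window], y[i:i + window])
              for i in range(0, len(x) - window + 1, window)]
    return sum(scores) / len(scores)

# Contrast is inverted within each region, but the second region is brighter in both signals.
t1 = [1, 2, 3, 4, 11, 12, 13, 14]
t2 = [4, 3, 2, 1, 14, 13, 12, 11]
print(pearson(t1, t2))           # about 0.90: the global value is dominated by the offset
print(local_pearson(t1, t2, 4))  # -1.0: each local window shows the inverted contrast
```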

  17. An effortless procedure to align the local frame of an inertial measurement unit to the local frame of another motion capture system.

    PubMed

    Chardonnens, Julien; Favre, Julien; Aminian, Kamiar

    2012-08-31

    Inertial measurement units (IMUs) offer great opportunities to analyze segment and joint kinematics. When combined with another motion capture system (MCS), for example, to validate new IMU-based applications or to develop mixed systems, it is necessary to align the local frame of the IMU sensors to the local frame of the MCS. Currently, all alignment methods use landmarks on the IMU's casing. Therefore, they can only be used with well-documented IMUs and they are prone to error when the IMU's casing is small. This study proposes an effortless procedure to align the local frame of any IMU to the local frame of any other MCS able to measure the orientation of its local frame. The general concept of this method is to derive the gyroscopic angles for both devices during an alignment movement, and then to use an optimization algorithm to calculate the alignment matrix between both local frames. The alignment movement consists of rotations around three more or less orthogonal axes and can easily be performed by hand. To test the alignment procedure, an IMU and a magnetic marker were attached to a plate, and 20 alignment movements were recorded. The maximum errors of alignment (accuracy±precision) were 1.02°±0.32°, and simulations showed that the method was robust against noise that typically affects IMUs. In conclusion, this study describes an efficient alignment procedure that is quick and easy to perform and that does not require any alignment device or any knowledge about the IMU casing.
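
    The abstract leaves the optimization algorithm unspecified. One standard way to obtain a rotation matrix from paired vector measurements is the Kabsch (orthogonal Procrustes) solution via the SVD; it is swapped in here as an illustration, not as the authors' method. The sketch assumes both sensors are rigidly attached to the same body and time-synchronized, so each sample is the same angular velocity seen in two local frames, differing only by the constant alignment rotation. The data are simulated.

```python
import numpy as np

def alignment_matrix(w_imu, w_mcs):
    """Rotation R minimizing sum_t ||R @ w_imu[t] - w_mcs[t]||^2 (Kabsch / orthogonal Procrustes).

    w_imu, w_mcs: (N, 3) arrays of the same angular-velocity samples expressed in each local frame.
    """
    H = np.asarray(w_imu, dtype=float).T @ np.asarray(w_mcs, dtype=float)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # -1 would mean a reflection; flip the last axis to avoid it
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

# Simulated check: gyroscope samples in the IMU frame, rotated into the MCS frame.
c, s = np.cos(0.5), np.sin(0.5)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]) @ np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
w_imu = np.random.default_rng(0).normal(size=(50, 3))
R_est = alignment_matrix(w_imu, w_imu @ R_true.T)
print(np.allclose(R_est, R_true))   # True
```

    With noisy gyroscope data the same function returns the least-squares rotation; rotations about three roughly orthogonal axes, as in the described alignment movement, keep the cross-covariance well conditioned.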

  18. Hole localization, water dissociation mechanisms, and band alignment at aqueous-titania interfaces

    NASA Astrophysics Data System (ADS)

    Lyons, John L.

    Photocatalytic water splitting is a promising method for generating clean energy, but materials that can efficiently act as photocatalysts are scarce. This is in part due to the fact that exposure to water can strongly alter semiconductor surfaces and therefore photocatalyst performance. Many materials are not stable in aqueous environments; in other cases, local changes in structure may occur, affecting energy-level alignment. Even in the simplest case, dynamic fluctuations modify the organization of interface water. Accounting for such effects requires knowledge of the dominant local structural motifs and also accurate semiconductor band-edge positions, making quantitative prediction of energy-level alignments computationally challenging. Here we employ a combined theoretical approach to study the structure, energy alignment, and hole localization at aqueous-titania interfaces. We calculate the explicit aqueous-semiconductor interface using ab initio molecular dynamics, which provides the fluctuating atomic structure, the extent of water dissociation, and the resulting electrostatic potential. For both anatase and rutile TiO2 we observe spontaneous water dissociation and re-association events that occur via distinct mechanisms. We also find a higher-density water layer occurring on anatase. In both cases, we find that the second monolayer of water plays a crucial role in controlling the extent of water dissociation. Using hybrid functional calculations, we then investigate the propensity for dissociated waters to stabilize photo-excited carriers, and compare the results of rutile and anatase aqueous interfaces. Finally, we use the GW approach from many-body perturbation theory to obtain the position of semiconductor band edges relative to the occupied 1b1 level and thus the redox levels of water, and examine how local structural modifications affect these offsets. This work was performed in collaboration with N. Kharche, M. Z. Ertem, J. T. Muckerman, and M. S

  19. Model of myosin node aggregation into a contractile ring: the effect of local alignment

    NASA Astrophysics Data System (ADS)

    Ojkic, Nikola; Wu, Jian-Qiu; Vavylonis, Dimitrios

    2011-09-01

    Actomyosin bundles frequently form through aggregation of membrane-bound myosin clusters. One such example is the formation of the contractile ring in fission yeast from a broad band of cortical nodes. Nodes are macromolecular complexes containing several dozen myosin-II molecules and a few formin dimers. The condensation of a broad band of nodes into the contractile ring has been previously described by a search, capture, pull and release (SCPR) model. In SCPR, a random search process mediated by actin filaments nucleated by formins leads to transient actomyosin connections among nodes that pull one another into a ring. The SCPR model reproduces the transport of nodes over long distances and predicts observed clump-formation instabilities in mutants. However, the model does not generate transient linear elements and meshwork structures as observed in some wild-type and mutant cells during ring assembly. As a minimal model of node alignment, we added short-range aligning forces to the SCPR model, representing currently unresolved mechanisms that may involve structural components, cross-linking and bundling proteins. We studied the effect of the local node alignment mechanism on ring formation numerically. We varied the new parameters and found viable rings for a realistic range of values. Morphologically, transient structures that form during ring assembly resemble those observed in experiments with wild-type and cdc25-22 cells. Our work supports a hierarchical process of ring self-organization involving components drawn together from distant parts of the cell followed by progressive stabilization.

  20. Co-Orientation: Quantifying Simultaneous Co-Localization and Orientational Alignment of Filaments in Light Microscopy.

    PubMed

    Nieuwenhuizen, Robert P J; Nahidiazar, Leila; Manders, Erik M M; Jalink, Kees; Stallinga, Sjoerd; Rieger, Bernd

    2015-01-01

    Co-localization analysis is a widely used tool to seek evidence for functional interactions between molecules in different color channels in microscopic images. Here we extend the basic co-localization analysis by including the orientations of the structures on which the molecules reside. We refer to the combination of co-localization of molecules and orientational alignment of the structures on which they reside as co-orientation. Because the orientation varies with the length scale at which it is evaluated, we consider this scale as a separate informative dimension in the analysis. Additionally, we introduce a data-driven method for testing the statistical significance of the co-orientation and provide a method for visualizing the local co-orientation strength in images. We demonstrate our methods on simulated localization microscopy data of filamentous structures, as well as experimental images of similar structures acquired with localization microscopy in different color channels. We also show that in cultured primary HUVEC endothelial cells, filaments of the intermediate filament vimentin run close to and parallel with microtubules. In contrast, no co-orientation was found between keratin and actin filaments. Co-orientation between vimentin and tubulin was also observed in an endothelial cell line, albeit to a lesser extent, but not in 3T3 fibroblasts. These data therefore suggest that microtubules functionally interact with the vimentin network in a cell-type-specific manner.
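As an illustration only, a simple co-orientation score can be formed from per-pixel orientation angles (in radians) and intensities of the two channels: an intensity-product-weighted mean of cos(2Δθ). The factor 2 reflects that filament orientation is axial, so θ and θ + π describe the same direction. This measure, the name `co_orientation`, and the assumption that orientations were already estimated at a chosen length scale are hypothetical; the paper's estimator and its significance test are not reproduced here.

```python
import numpy as np

def co_orientation(theta1, theta2, int1, int2):
    """Intensity-weighted mean of cos(2 * (theta1 - theta2)); a hypothetical score.

    theta1, theta2: per-pixel orientations (radians) of channels 1 and 2.
    int1, int2: per-pixel intensities; their product acts as a co-localization weight.
    Returns +1 for parallel, -1 for perpendicular, near 0 for unrelated orientations.
    """
    t1 = np.asarray(theta1, dtype=float)
    t2 = np.asarray(theta2, dtype=float)
    w = np.asarray(int1, dtype=float) * np.asarray(int2, dtype=float)
    if w.sum() <= 0:
        raise ValueError("no co-localized signal")
    # Orientations are axial (theta and theta + pi coincide), hence the factor 2.
    return float(np.sum(w * np.cos(2.0 * (t1 - t2))) / w.sum())
```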

  1. Mapping local orientation of aligned fibrous scatterers for cancerous tissues using backscattering Mueller matrix imaging

    NASA Astrophysics Data System (ADS)

    He, Honghui; Sun, Minghao; Zeng, Nan; Du, E.; Liu, Shaoxiong; Guo, Yihong; Wu, Jian; He, Yonghong; Ma, Hui

    2014-10-01

    Polarization measurements are sensitive to the microstructure of tissues and can be used to detect pathological changes. Many tissues contain anisotropic fibrous structures. We obtain the local orientation of aligned fibrous scatterers using different groups of the backscattering Mueller matrix elements. Experiments on concentrically well-aligned silk fibers and unstained human papillary thyroid carcinoma tissues show that the m22, m33, m23, and m32 elements have better contrast but higher degeneracy for the extraction of orientation angles. The m12 and m13 elements show lower contrast, but allow us to determine the orientation angle for the fibrous scatterers along all directions. Moreover, Monte Carlo simulations based on the sphere-cylinder scattering model indicate that the oblique incidence of the illumination beam introduces some errors in the orientation angles obtained by both methods. Mapping the local orientation of anisotropic tissues may not only provide information on pathological changes, but can also give new leads to reduce the orientation dependence of polarization measurements.
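Under a simplifying assumption, the full-range behaviour of m12 and m13 can be sketched: if, as for an ideal linear diattenuator, m12 ∝ cos 2θ and m13 ∝ sin 2θ (sign conventions vary), the orientation follows from a two-argument arctangent over the whole 0-180° range. This model and the helper name are assumptions, not the paper's extraction method. By contrast, in an ideal linear-retarder model m22 depends on θ only through cos²2θ, which is one way to read the degeneracy mentioned above.

```python
import numpy as np

def orientation_from_m12_m13(m12, m13):
    """Fibre orientation in degrees, in [0, 180).

    Assumes m12 ~ cos(2*theta) and m13 ~ sin(2*theta) (diattenuator-like model);
    arctan2 keeps the quadrant, so no directional ambiguity remains.
    """
    theta = 0.5 * np.degrees(np.arctan2(m13, m12))
    return np.mod(theta, 180.0)
```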

  2. Understanding the nanoscale local buckling behavior of vertically aligned MWCNT arrays with van der Waals interactions.

    PubMed

    Li, Yupeng; Kim, Hyung-ick; Wei, Bingqing; Kang, Junmo; Choi, Jae-boong; Nam, Jae-Do; Suhr, Jonghwan

    2015-09-14

    The local buckling behavior of vertically aligned carbon nanotubes (VACNTs) has been investigated and interpreted in view of a collective nanotube response by taking van der Waals interactions into account. To the best of our knowledge, this is the first report on the case of collective VACNT behavior regarding van der Waals force among nanotubes as a lateral support effect during the buckling process. The local buckling propagation and development of VACNTs were experimentally observed and theoretically analyzed by employing finite element modeling with lateral support from van der Waals interactions among nanotubes. Both experimental and theoretical analyses show that VACNTs buckled in the bottom region with many short waves and almost identical wavelengths, indicating high-mode buckling. Furthermore, the propagation and development mechanism of buckling waves follows the wave damping effect.
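A classical idealization helps explain why lateral support favours short, high-mode buckles. If the van der Waals support from neighbouring tubes is smeared into a continuous Winkler foundation of stiffness k per unit length, a long beam with bending stiffness EI buckles at P_cr = 2√(kEI) with wavelength λ = 2π(EI/k)^(1/4). This textbook model and the parameter names are assumptions for illustration, not the authors' finite element model.

```python
import math

def winkler_buckling(E, I, k):
    """Critical load and wavelength of a long beam on an elastic (Winkler) foundation.

    E: Young's modulus [Pa]; I: second moment of area [m^4];
    k: lateral foundation stiffness per unit length [N/m^2] (assumed uniform).
    """
    EI = E * I
    q = (k / EI) ** 0.25            # wavenumber minimizing P(q) = EI*q^2 + k/q^2
    P_cr = 2.0 * math.sqrt(k * EI)  # critical axial load [N]
    wavelength = 2.0 * math.pi / q  # buckle wavelength [m]
    return P_cr, wavelength
```

Because λ scales as k^(-1/4), stiffer lateral support shortens the buckles, consistent in spirit with the many short, nearly equal waves described above.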

  3. Improving the Robustness of Local Network Alignment: Design and Extensive Assessment of a Markov Clustering-Based Approach.

    PubMed

    Mina, Marco; Guzzi, Pietro Hiram

    2014-01-01

    The analysis of protein behavior at the network level has been applied to elucidate the mechanisms of protein interaction that are similar in different species. Published network alignment algorithms proved able to recapitulate known conserved modules and protein complexes, and to infer new conserved interactions confirmed by wet-lab experiments. In the meantime, however, a plethora of continuously evolving protein-protein interaction (PPI) data sets have been developed, each featuring different levels of completeness and reliability. Moreover, existing papers did not deeply investigate the robustness of alignment algorithms; for instance, the performance of some algorithms varies significantly when the data set used in their assessment changes. In this work, we design an extensive assessment of current algorithms, discussing the robustness of the results with respect to the input networks. We also present AlignMCL, a local network alignment algorithm based on an improved model of the alignment graph and Markov Clustering. AlignMCL performs better than other state-of-the-art local alignment algorithms over different updated data sets. In addition, AlignMCL features high levels of robustness, producing similar results regardless of the selected data set.
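The Markov Clustering step can be sketched with the standard expansion/inflation loop of generic MCL applied to a symmetric adjacency matrix. How AlignMCL builds and weights its alignment graph is not described here, so the input matrix, the default parameters, and the function name are assumptions.

```python
import numpy as np

def markov_cluster(adj, expansion=2, inflation=2.0, max_iter=100, tol=1e-8):
    """Generic Markov Clustering (MCL) on a symmetric, non-negative adjacency matrix.

    Returns a sorted list of clusters, each a tuple of node indices.
    """
    M = np.asarray(adj, dtype=float) + np.eye(len(adj))  # self-loops avoid zero columns
    M = M / M.sum(axis=0, keepdims=True)                  # make columns stochastic
    for _ in range(max_iter):
        prev = M
        M = np.linalg.matrix_power(M, expansion)          # expansion: spread random-walk flow
        M = M ** inflation                                # inflation: boost strong flows
        M = M / M.sum(axis=0, keepdims=True)              # renormalize columns
        if np.abs(M - prev).max() < tol:
            break
    clusters = set()
    for row in M:  # a row with remaining mass is an attractor; its non-zero columns form a cluster
        members = tuple(int(i) for i in np.nonzero(row > 1e-6)[0])
        if members:
            clusters.add(members)
    return sorted(clusters)
```

Higher inflation values yield finer clusters. In a network alignment setting, each node of the input graph would typically stand for a candidate pair of matched proteins from the two species.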

  4. Understanding the nanoscale local buckling behavior of vertically aligned MWCNT arrays with van der Waals interactions

    NASA Astrophysics Data System (ADS)

    Li, Yupeng; Kim, Hyung-Ick; Wei, Bingqing; Kang, Junmo; Choi, Jae-Boong; Nam, Jae-Do; Suhr, Jonghwan

    2015-08-01

    The local buckling behavior of vertically aligned carbon nanotubes (VACNTs) has been investigated and interpreted in view of a collective nanotube response by taking van der Waals interactions into account. To the best of our knowledge, this is the first report on the case of collective VACNT behavior regarding van der Waals force among nanotubes as a lateral support effect during the buckling process. The local buckling propagation and development of VACNTs were experimentally observed and theoretically analyzed by employing finite element modeling with lateral support from van der Waals interactions among nanotubes. Both experimental and theoretical analyses show that VACNTs buckled in the bottom region with many short waves and almost identical wavelengths, indicating high-mode buckling. Furthermore, the propagation and development mechanism of buckling waves follows the wave damping effect. Electronic supplementary information (ESI) available. See DOI: 10.1039/c5nr03581c

  5. Aligning multiple protein sequences by parallel hybrid genetic algorithm.

    PubMed

    Nguyen, Hung Dinh; Yoshihara, Ikuo; Yamamori, Kunihito; Yasunaga, Moritoshi

    2002-01-01

    This paper presents a parallel hybrid genetic algorithm (GA) for solving the sum-of-pairs multiple protein sequence alignment problem. A new chromosome representation and its corresponding genetic operators are proposed. A multi-population GENITOR-type GA is combined with local search heuristics. It is then extended to run in parallel on a multiprocessor system to speed up computation. Experimental results on benchmarks from BAliBASE show that the proposed method is superior to the MSA, OMA, and SAGA methods with regard to solution quality and running time. It can be used for finding multiple sequence alignments as well as for testing cost functions.
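The sum-of-pairs objective that such a GA optimizes can be sketched as follows, assuming rows of equal length with '-' for gaps, a user-supplied substitution function, and a linear gap penalty; ignoring gap-gap pairs is one common convention. The function name and the default penalty are illustrative; the paper's chromosome representation and cost function are not reproduced.

```python
from itertools import combinations

def sum_of_pairs(alignment, score, gap=-4):
    """Sum-of-pairs score of an alignment given as equal-length strings.

    score(a, b): substitution score for two residues (user-supplied).
    gap: linear penalty for a residue aligned to '-' (assumed value).
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")
    total = 0
    for col in range(length):
        column = [row[col] for row in alignment]
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue              # gap-gap pairs are ignored
            if a == "-" or b == "-":
                total += gap
            else:
                total += score(a, b)
    return total
```

A GA fitness function would call this on each candidate alignment; for proteins, score would typically look up a substitution matrix such as BLOSUM62.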

  6. Alignment-free cancelable fingerprint templates based on local minutiae information.

    PubMed

    Lee, Chulhan; Choi, Jeung-Yoon; Toh, Kar-Ann; Lee, Sangyoun; Kim, Jaihie

    2007-08-01

    To replace compromised biometric templates, cancelable biometrics has recently been introduced. The concept is to transform a biometric signal or feature into a new one for enrollment and matching. For making cancelable fingerprint templates, previous approaches used either the relative position of a minutia to a core point or the absolute position of a minutia in a given fingerprint image. Thus, a query fingerprint is required to be accurately aligned to the enrolled fingerprint in order to obtain identically transformed minutiae. In this paper, we propose a new method for making cancelable fingerprint templates that do not require alignment. For each minutia, a rotation and translation invariant value is computed from the orientation information of neighboring local regions around the minutia. The invariant value is used as the input to two changing functions that output two values for the translational and rotational movements of the original minutia, respectively, in the cancelable template. When a template is compromised, it is replaced by a new one generated by different changing functions. Our approach preserves the original geometric relationships (translation and rotation) between the enrolled and query templates after they are transformed. Therefore, the transformed templates can be used to verify a person without requiring alignment of the input fingerprint images. In our experiments, we evaluated the proposed method in terms of two criteria: performance and changeability. When evaluating the performance, we examined how verification accuracy varied as the transformed templates were used for matching. When evaluating the changeability, we measured the dissimilarities between the original and transformed templates, and between two differently transformed templates, which were obtained from the same original fingerprint. 
The experimental results show that the two criteria mutually affect each other and can be controlled by varying the control parameters of

  7. Dynamical formation of spatially localized arrays of aligned nanowires in plastic films with magnetic anisotropy.

    PubMed

    Fragouli, Despina; Buonsanti, Raffaella; Bertoni, Giovanni; Sangregorio, Claudio; Innocenti, Claudia; Falqui, Andrea; Gatteschi, Dante; Cozzoli, Pantaleo Davide; Athanassiou, Athanassia; Cingolani, Roberto

    2010-04-27

    We present a simple technique for magnetic-field-induced formation, assembly, and positioning of magnetic nanowires in a polymer film. Starting from a cast polymer/iron oxide nanoparticle solution that is allowed to dry under a weak applied magnetic field, nanocomposite films incorporating aligned arrays of nanocrystal-built nanowires are obtained. The dimensions of the nanowires and their localization across the polymer matrix are controlled by varying the duration of the applied magnetic field, in combination with the evaporation dynamics. These multifunctional anisotropic free-standing nanocomposite films, which exhibit high magnetic anisotropy, can be used in a wide range of technological applications, from sensors to microfluidics and magnetic devices.

  8. Global classical solutions of the Vlasov-Fokker-Planck equation with local alignment forces

    NASA Astrophysics Data System (ADS)

    Choi, Young-Pil

    2016-07-01

    In this paper, we are concerned with the global well-posedness and time-asymptotic decay of the Vlasov-Fokker-Planck equation with local alignment forces. The equation can be formally derived from an agent-based model for self-organized dynamics called the Motsch-Tadmor model with noises. We present the global existence and uniqueness of classical solutions to the equation around the global Maxwellian in the whole space. For the large-time behavior, we show the algebraic decay rate of solutions towards the equilibrium under suitable assumptions on the initial data. We also remark that the rate of convergence is exponential when the spatial domain is periodic. The main methods used in this paper are the classical energy estimates combined with hyperbolic-parabolic dissipation arguments.
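For orientation, a form of this equation commonly studied in the local-alignment literature is written out below. The notation (density f(x, v, t), dimension d, local mean velocity u_f) is an assumption, and the paper's scaling or constants may differ. The alignment term relaxes each velocity toward the local mean velocity u_f, which is what makes the alignment force local, and the global Maxwellian M(v) with zero mean velocity is a steady state because u_M = 0 and ∇_v M + vM = 0.

```latex
\partial_t f + v \cdot \nabla_x f
  = \nabla_v \cdot \bigl( \nabla_v f + (v - u_f)\, f \bigr),
\qquad
u_f(x,t) = \frac{\int_{\mathbb{R}^d} v\, f(x,v,t)\, dv}{\int_{\mathbb{R}^d} f(x,v,t)\, dv},
\qquad
M(v) = (2\pi)^{-d/2} e^{-|v|^2/2}.
```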

  9. Identification of ligand templates using local structure alignment for structure-based drug design.

    PubMed

    Lee, Hui Sun; Im, Wonpil

    2012-10-22

    With a rapid increase in the number of high-resolution protein-ligand structures, the known protein-ligand structures can be used to gain insight into ligand-binding modes in a target protein. On the basis of the fact that the structurally similar binding sites share information about their ligands, we have developed a local structure alignment tool, G-LoSA (graph-based local structure alignment). The known protein-ligand binding-site structure library is searched by G-LoSA to detect binding-site structures with similar geometry and physicochemical properties to a query binding-site structure regardless of sequence continuity and protein fold. Then, the ligands in the identified complexes are used as templates (i.e., template ligands) to predict/design a ligand for the target protein. The performance of G-LoSA is validated against 76 benchmark targets from the Astex diverse set. Using the currently available protein-ligand structure library, G-LoSA is able to identify a single template ligand (from a nonhomologous protein complex) that is highly similar to the target ligand in more than half of the benchmark targets. In addition, our benchmark analyses show that an assembly of structural fragments from multiple template ligands with partial similarity to the target ligand can be used to design novel ligand structures specific to the target protein. This study clearly indicates that a template-based ligand modeling has potential for de novo ligand design and can be a complementary approach to the receptor structure based methods.

  10. Identification of Ligand Templates using Local Structure Alignment for Structure-based Drug Design

    PubMed Central

    Lee, Hui Sun; Im, Wonpil

    2012-01-01

    With a rapid increase in the number of high-resolution protein-ligand structures, the known protein-ligand structures can be used to gain insight into ligand-binding modes in a target protein. Based on the fact that the structurally similar binding sites share information about their ligands, we have developed a local structure alignment tool, G-LoSA (Graph-based Local Structure Alignment). In G-LoSA, the known protein-ligand binding-site structure library is searched to detect binding-site structures with similar geometry and physicochemical properties to a query binding-site structure regardless of sequence continuity and protein fold. Then, the ligands in the identified complexes are used as templates (i.e., template ligands) to predict/design a ligand for the target protein. The performance of G-LoSA is validated against 76 benchmark targets from the Astex diverse set. Using the currently available protein-ligand structure library, G-LoSA is able to identify a single template ligand (from a non-homologous protein complex) that is highly similar to the target ligand in more than half of the benchmark targets. In addition, our benchmark analyses show that an assembly of structural fragments from multiple template ligands with partial similarity to the target ligand can be used to design novel ligand structures specific to the target protein. This study clearly indicates that a template-based ligand modeling has potential for de novo ligand design and can be a complementary approach to the receptor structure based methods. PMID:22978550

  11. Efficient Constrained Local Model Fitting for Non-Rigid Face Alignment

    PubMed Central

    Wang, Yang; Cox, Mark; Sridharan, Sridha; Cohn, Jeffery F.

    2009-01-01

    Active appearance models (AAMs) have demonstrated great utility when employed for non-rigid face alignment/tracking. The “simultaneous” algorithm for fitting an AAM achieves good non-rigid face registration performance, but has poor real-time performance (2-3 fps). The “project-out” algorithm for fitting an AAM achieves faster than real-time performance (> 200 fps) but suffers from poor generic alignment performance. In this paper we introduce an extension to a discriminative method for non-rigid face registration/tracking referred to as a constrained local model (CLM). Our proposed method is able to achieve superior performance to the “simultaneous” AAM algorithm along with real-time fitting speeds (35 fps). We improve upon the canonical CLM formulation, to gain this performance, in a number of ways by employing: (i) linear SVMs as patch-experts, (ii) a simplified optimization criterion, and (iii) a composite rather than additive warp update step. Most notably, our simplified optimization criterion for fitting the CLM divides the problem of finding a single complex registration/warp displacement into that of finding N simple warp displacements. From these N simple warp displacements, a single complex warp displacement is estimated using a weighted least-squares constraint. Another major advantage of this simplified optimization stems from its ability to be parallelized, a step which we also theoretically explore in this paper. We refer to our approach for fitting the CLM as the “exhaustive local search” (ELS) algorithm. Experiments were conducted on the CMU Multi-PIE database. PMID:20046797
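The step that combines N simple displacements into one warp can be illustrated with a weighted least-squares fit. As an assumption for the sketch, the warp is a 2D affine map and each patch expert supplies a displacement and a confidence weight; the paper's CLM uses its own shape model, and `fit_affine_wls` is a hypothetical name.

```python
import numpy as np

def fit_affine_wls(points, displacements, weights):
    """Weighted least-squares 2D affine warp from N local displacement estimates.

    points: (N, 2) current landmark positions; displacements: (N, 2) per-patch
    shifts from the local search; weights: (N,) positive confidences (assumed).
    Returns A (2x2) and t (2,) such that points + displacements ~ points @ A.T + t.
    """
    P = np.asarray(points, dtype=float)
    Q = P + np.asarray(displacements, dtype=float)   # target positions
    sw = np.sqrt(np.asarray(weights, dtype=float))   # sqrt weights for ordinary lstsq
    X = np.hstack([P, np.ones((len(P), 1))])         # design matrix rows [x, y, 1]
    sol, *_ = np.linalg.lstsq(X * sw[:, None], Q * sw[:, None], rcond=None)
    A = sol[:2].T                                    # linear part
    t = sol[2]                                       # translation
    return A, t
```

Each patch search is independent, which is what makes the local search step easy to parallelize; only the final solve couples the patches.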

  12. Efficient Constrained Local Model Fitting for Non-Rigid Face Alignment.

    PubMed

    Lucey, Simon; Wang, Yang; Cox, Mark; Sridharan, Sridha; Cohn, Jeffery F

    2009-11-01

    Active appearance models (AAMs) have demonstrated great utility when employed for non-rigid face alignment/tracking. The "simultaneous" algorithm for fitting an AAM achieves good non-rigid face registration performance, but has poor real-time performance (2-3 fps). The "project-out" algorithm for fitting an AAM achieves faster than real-time performance (> 200 fps) but suffers from poor generic alignment performance. In this paper we introduce an extension to a discriminative method for non-rigid face registration/tracking referred to as a constrained local model (CLM). Our proposed method is able to achieve superior performance to the "simultaneous" AAM algorithm along with real-time fitting speeds (35 fps). We improve upon the canonical CLM formulation, to gain this performance, in a number of ways by employing: (i) linear SVMs as patch-experts, (ii) a simplified optimization criterion, and (iii) a composite rather than additive warp update step. Most notably, our simplified optimization criterion for fitting the CLM divides the problem of finding a single complex registration/warp displacement into that of finding N simple warp displacements. From these N simple warp displacements, a single complex warp displacement is estimated using a weighted least-squares constraint. Another major advantage of this simplified optimization stems from its ability to be parallelized, a step which we also theoretically explore in this paper. We refer to our approach for fitting the CLM as the "exhaustive local search" (ELS) algorithm. Experiments were conducted on the CMU Multi-PIE database.

  13. Localized field-aligned currents in the polar cap associated with airglow patches

    NASA Astrophysics Data System (ADS)

    Zou, Ying; Nishimura, Yukitoshi; Burchill, Johnathan K.; Knudsen, David J.; Lyons, Larry R.; Shiokawa, Kazuo; Buchert, Stephan; Chen, Steve; Nicolls, Michael J.; Ruohoniemi, J. Michael; McWilliams, Kathryn A.; Nishitani, Nozomu

    2016-10-01

    Airglow patches have recently been associated with channels of enhanced antisunward ionospheric flows propagating across the polar cap from the dayside to nightside auroral ovals. However, how these flows maintain their localized nature without diffusing away remains unresolved. We examine whether patches and collocated flows are associated with localized field-aligned currents (FACs) in the polar cap by using coordinated observations of the Swarm spacecraft, a polar cap all-sky imager, and Super Dual Auroral Radar Network (SuperDARN) radars. We commonly (66% of cases) identify substantial FAC enhancements around patches, particularly near the patches' leading edge and center, in contrast to what is seen in the otherwise quiet polar cap. These FACs have densities of 0.1-0.2 μA/m² and a width distribution peaking at 75 km. They can be approximated as infinite current sheets oriented roughly parallel to patches. They usually exhibit a Region 1 sense, i.e., a downward FAC lying eastward of an upward FAC. With the addition of Resolute Bay Incoherent Scatter radar data, we find that the FACs can close through Pedersen currents in the ionosphere, consistent with the locally enhanced dawn-dusk electric field across the patch. Our results suggest that ionospheric polar cap flow channels are imposed by structures in the magnetospheric lobe via FACs, and thus manifest mesoscale magnetosphere-ionosphere coupling embedded in large-scale convection.
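The infinite-current-sheet approximation lets a current density be estimated from a single spacecraft crossing: by Ampère's law, a sheet of width w carrying a uniform field-aligned current density j changes the sheet-parallel magnetic perturbation by ΔB = μ0 j w. The sketch below assumes a crossing perpendicular to the sheet and ignores spacecraft-velocity geometry; the function name is illustrative.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability [T m / A] (classical value)

def fac_density(delta_b_nT, width_km):
    """Mean FAC density [uA/m^2] of an infinite current sheet of the given width,
    from the jump in the sheet-parallel magnetic perturbation across it."""
    delta_b = delta_b_nT * 1e-9   # nT -> T
    width = width_km * 1e3        # km -> m
    j = delta_b / (MU0 * width)   # A/m^2, from dB_parallel/dx = mu0 * j
    return j * 1e6                # A/m^2 -> uA/m^2
```

For example, an illustrative 15 nT perturbation jump across a 75 km wide sheet gives about 0.16 μA/m², within the range reported above.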

  14. MARS: improving multiple circular sequence alignment using refined sequences.

    PubMed

    Ayad, Lorraine A K; Pissis, Solon P

    2017-01-14

    A fundamental assumption of all widely-used multiple sequence alignment techniques is that the left- and right-most positions of the input sequences are relevant to the alignment. However, the position where a sequence starts or ends can be totally arbitrary for a number of reasons: arbitrariness in the linearisation (sequencing) of a circular molecular structure; or inconsistencies introduced into sequence databases due to different linearisation standards. These scenarios are relevant, for instance, in the process of multiple sequence alignment of mitochondrial DNA, viroid, viral or other genomes, which have a circular molecular structure. A solution for these inconsistencies would be to identify a suitable rotation (cyclic shift) for each sequence; these refined sequences may in turn lead to improved multiple sequence alignments using the preferred multiple sequence alignment program. We present MARS, a new heuristic method for improving Multiple circular sequence Alignment using Refined Sequences. MARS was implemented in the C++ programming language as a program to compute the rotations (cyclic shifts) required to best align a set of input sequences. Experimental results, using real and synthetic data, show that MARS improves the alignments, with respect to standard genetic measures and the inferred maximum-likelihood-based phylogenies, and outperforms state-of-the-art methods both in terms of accuracy and efficiency. Among other findings, our results show that the average pairwise distance in the multiple sequence alignment of a dataset of widely-studied mitochondrial DNA sequences is reduced by around 5% when MARS is applied before a multiple sequence alignment is performed. Analysing multiple sequences simultaneously is fundamental in biological research, and multiple sequence alignment has been found to be a popular method for this task. Conventional alignment techniques cannot be used effectively when the position where sequences start is arbitrary. 
We present
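To make the idea of a refined rotation concrete, the sketch below picks, by brute force, the cyclic shift of one sequence that minimizes the Hamming distance to a reference. This is a naive O(n²) baseline under the assumption of equal-length sequences, not the MARS heuristic itself; the function name is hypothetical.

```python
def best_rotation(reference, seq):
    """Return (shift, rotated) where rotated = seq[shift:] + seq[:shift]
    minimizes the Hamming distance to reference (equal lengths assumed)."""
    if len(reference) != len(seq):
        raise ValueError("this sketch assumes equal-length sequences")
    best_shift, best_dist = 0, len(seq) + 1
    for shift in range(len(seq)):          # try every cyclic shift
        rotated = seq[shift:] + seq[:shift]
        dist = sum(a != b for a, b in zip(reference, rotated))
        if dist < best_dist:               # keep the first minimal shift
            best_shift, best_dist = shift, dist
    return best_shift, seq[best_shift:] + seq[:best_shift]
```

The rotated sequences would then be passed to the preferred multiple sequence alignment program, as the abstract describes.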

  15. Performance evaluation of Warshall algorithm and dynamic programming for Markov chain in local sequence alignment.

    PubMed

    Khan, Mohammad Ibrahim; Kamal, Md Sarwar

    2015-03-01

    Markov chains are very effective for prediction, particularly on long data sets. In DNA sequencing, it is always very important to determine the occurrence of certain nucleotides based on the previous history of the data set. We applied the Chapman-Kolmogorov equation to carry out the Markov chain computation. The Chapman-Kolmogorov equation is key to addressing the proper positions of the DNA chain, and it is a very powerful tool in mathematics as well as in other prediction-based research. It incorporates the scores of DNA sequences calculated by various techniques. Our research uses the fundamentals of the Warshall Algorithm (WA) and Dynamic Programming (DP) to measure the scores of DNA segments. The experiments show that the Warshall Algorithm is good for small DNA sequences, whereas Dynamic Programming is good for long DNA sequences. Beyond these findings, it is very important to measure the risk factors of local sequencing when matching local sequence alignments, whatever their length.
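A minimal sketch of the Chapman-Kolmogorov relation for a first-order nucleotide Markov chain: transition probabilities are estimated from a sequence (with an assumed pseudocount), and n-step probabilities follow from matrix powers, so that P^(m+n) = P^(m) P^(n). The estimation scheme and function names are assumptions; the paper's way of combining this with WA and DP scores is not reproduced.

```python
import numpy as np

BASES = "ACGT"

def transition_matrix(seq, pseudocount=1.0):
    """First-order transition matrix P[i, j] = Pr(next base = j | current base = i)."""
    idx = {b: i for i, b in enumerate(BASES)}
    counts = np.full((4, 4), pseudocount)        # pseudocounts avoid zero rows
    for a, b in zip(seq, seq[1:]):               # count adjacent base pairs
        counts[idx[a], idx[b]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def n_step(P, n):
    """n-step transition probabilities; Chapman-Kolmogorov gives P^(n) = P^n."""
    return np.linalg.matrix_power(P, n)
```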

  16. PicXAA-R: Efficient structural alignment of multiple RNA sequences using a greedy approach

    PubMed Central

    2011-01-01

    Background Accurate and efficient structural alignment of non-coding RNAs (ncRNAs) has attracted increasing attention as recent studies have unveiled the significance of ncRNAs in living organisms. Because Sankoff-style structural alignment algorithms cannot efficiently handle multiple sequences, progressive schemes are mostly used to reduce the complexity. However, this approach tends to propagate early-stage errors throughout the entire process, thereby degrading the quality of the final alignment. For multiple protein sequence alignment, we have recently proposed PicXAA, which constructs an accurate alignment in a non-progressive fashion. Results Here, we propose PicXAA-R as an extension of PicXAA for greedy structural alignment of ncRNAs. PicXAA-R efficiently captures both folding information within each sequence and local similarities between sequences. It uses a set of probabilistic consistency transformations to improve the posterior base-pairing and base alignment probabilities using the information of all sequences in the alignment. Using a graph-based scheme, we greedily build up the structural alignment from sequence regions with high base-pairing and base alignment probabilities. Conclusions Several experiments on datasets with different characteristics confirm that PicXAA-R is one of the fastest algorithms for structural alignment of multiple RNAs and that it consistently yields accurate alignment results, especially for datasets with locally similar sequences. PicXAA-R source code is freely available at: http://www.ece.tamu.edu/~bjyoon/picxaa/. PMID:21342569
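The probabilistic consistency transformation can be sketched in its generic ProbCons-style form, P'(x, y) = (1/N) Σ_z P(x, z) P(z, y), where P(x, x) is the identity and N is the number of sequences. PicXAA-R also applies consistency to base-pairing probabilities, which this sketch omits; the dictionary layout of the posterior matrices and the function name are assumptions.

```python
import numpy as np

def consistency_transform(post, lengths):
    """One round of probabilistic consistency on base alignment posteriors.

    post[(x, y)]: (len_x, len_y) matrix of posterior match probabilities for x != y;
    both orders must be present, with post[(y, x)] == post[(x, y)].T.
    lengths: list of sequence lengths, indexed by sequence number.
    """
    n = len(lengths)

    def P(x, y):
        # a sequence is aligned to itself with certainty
        return np.eye(lengths[x]) if x == y else post[(x, y)]

    # average the evidence relayed through every sequence z (including x and y)
    return {(x, y): sum(P(x, z) @ P(z, y) for z in range(n)) / n for (x, y) in post}
```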

  17. Sheet resistance characterization of locally anisotropic transparent conductive films made of aligned metal-enriched single-walled carbon nanotubes.

    PubMed

    Kang, Hosung; Kim, Duckjong; Baik, Seunghyun

    2014-09-21

    One-dimensional conductive fillers such as single-walled carbon nanotubes (SWNTs) can be aggregated and aligned during transparent conductive film (TCF) formation by the vacuum filtration method. The potential error in analysing the average sheet resistance of these anisotropic films, using the four-point probe in-line method and the conversion formula developed assuming uniform isotropic material properties, was systematically investigated by finite element analysis and experiments. The finite element analysis of anisotropic stripe-patterned TCFs with alternating low (ρ1) and high (ρ2) resistivities revealed that the estimated average sheet resistance approached ρ1/t, where t is the film thickness, when the probes were parallel to the aligned nanotubes, and was closer to ρ2/t when the probes were perpendicular to the aligned tubes. Indeed, TCFs fabricated by the vacuum filtration method using metal-enriched SWNTs exhibited highly anisotropic local regions where tubes were aggregated and aligned. The local sheet resistances of randomly oriented, aligned, and perpendicular tube regions of the TCF at a transmittance of 89.9% were 5000, 2.4, and 12 300 Ω/□, respectively. Resistivities of the aggregated and aligned tube region (ρ1 = 1.2 × 10⁻⁵ Ω cm) and the region between tubes (ρ2 = 6.2 × 10⁻² Ω cm) could be approximated with the aid of finite element analysis. This work demonstrates the potential error of characterizing the average sheet resistance of anisotropic TCFs using the four-point probe in-line method, since surprisingly high or low values could be obtained depending on the measurement angle. On the other hand, better control of the aggregation and alignment of nanotubes could yield TCFs with very small anisotropic resistivity and high transparency.
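The isotropic conversion formula at issue can be sketched as follows: for a thin, laterally large, uniform film measured with a collinear four-point probe, R_s = (π/ln 2)·V/I ≈ 4.532·V/I ohms per square, and for a uniform film R_s = ρ/t. The function names are illustrative, and finite-size and edge corrections are ignored.

```python
import math

def sheet_resistance(voltage, current):
    """Collinear four-point probe on a thin, infinite, isotropic film:
    R_s = (pi / ln 2) * V / I, in ohms per square."""
    return (math.pi / math.log(2)) * voltage / current

def sheet_resistance_from_resistivity(rho_ohm_cm, thickness_cm):
    """R_s = rho / t for a uniform film (ohms per square)."""
    return rho_ohm_cm / thickness_cm
```

For stripe-patterned films the isotropic V/I conversion no longer holds, which is why the estimate drifts toward ρ1/t or ρ2/t with probe orientation.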

  18. Protein function annotation with Structurally Aligned Local Sites of Activity (SALSAs).

    PubMed

    Wang, Zhouxi; Yin, Pengcheng; Lee, Joslynn S; Parasuram, Ramya; Somarowthu, Srinivas; Ondrechen, Mary Jo

    2013-01-01

    The prediction of biochemical function from the 3D structure of a protein has proved to be much more difficult than was originally foreseen. A reliable method to test the likelihood of putative annotations and to predict function from structure would add tremendous value to structural genomics data. We report on a new method, Structurally Aligned Local Sites of Activity (SALSA), for the prediction of biochemical function based on a local structural match at the predicted catalytic or binding site. Implementation of the SALSA method is described. For the structural genomics protein PY01515 (PDB ID 2aqw) from Plasmodium yoelii, it is shown that the putative annotation, Orotidine 5'-monophosphate decarboxylase (OMPDC), is most likely correct. SALSA analysis of YP_001304206.1 (PDB ID 3h3l), a putative sugar hydrolase from Parabacteroides distasonis, shows that its active site does not bear close resemblance to any previously characterized member of its superfamily, the Concanavalin A-like lectins/glucanases. It is noted that three residues in the active site of the thermophilic beta-1,4-xylanase from Nonomuraea flexuosa (PDB ID 1m4w), Y78, E87, and E176, overlap with POOL-predicted residues of similar type, Y168, D153, and E232, in YP_001304206.1. The substrate recognition regions of the two proteins are rather different, suggesting that YP_001304206.1 is a new functional type within the superfamily. A structural genomics protein from Mycobacterium avium (PDB ID 3q1t) has been reported to be an enoyl-CoA hydratase (ECH), but SALSA analysis shows a poor match between the predicted residues for the SG protein and those of known ECHs. A better local structural match is obtained with Anabaena beta-diketone hydrolase (ABDH), a known β-diketone hydrolase from Cyanobacterium anabaena (PDB ID 2j5s). This suggests that the reported ECH function of the SG protein is incorrect and that it is more likely a β-diketone hydrolase. A local site match provides a more compelling

  19. Protein function annotation with Structurally Aligned Local Sites of Activity (SALSAs)

    PubMed Central

    2013-01-01

    Background The prediction of biochemical function from the 3D structure of a protein has proved to be much more difficult than was originally foreseen. A reliable method to test the likelihood of putative annotations and to predict function from structure would add tremendous value to structural genomics data. We report on a new method, Structurally Aligned Local Sites of Activity (SALSA), for the prediction of biochemical function based on a local structural match at the predicted catalytic or binding site. Results Implementation of the SALSA method is described. For the structural genomics protein PY01515 (PDB ID 2aqw) from Plasmodium yoelii, it is shown that the putative annotation, Orotidine 5'-monophosphate decarboxylase (OMPDC), is most likely correct. SALSA analysis of YP_001304206.1 (PDB ID 3h3l), a putative sugar hydrolase from Parabacteroides distasonis, shows that its active site does not bear close resemblance to any previously characterized member of its superfamily, the Concanavalin A-like lectins/glucanases. It is noted that three residues in the active site of the thermophilic beta-1,4-xylanase from Nonomuraea flexuosa (PDB ID 1m4w), Y78, E87, and E176, overlap with POOL-predicted residues of similar type, Y168, D153, and E232, in YP_001304206.1. The substrate recognition regions of the two proteins are rather different, suggesting that YP_001304206.1 is a new functional type within the superfamily. A structural genomics protein from Mycobacterium avium (PDB ID 3q1t) has been reported to be an enoyl-CoA hydratase (ECH), but SALSA analysis shows a poor match between the predicted residues for the SG protein and those of known ECHs. A better local structural match is obtained with Anabaena beta-diketone hydrolase (ABDH), a known β-diketone hydrolase from Cyanobacterium anabaena (PDB ID 2j5s). This suggests that the reported ECH function of the SG protein is incorrect and that it is more likely a β-diketone hydrolase. Conclusions A local site match
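    The comparison above rests on residues "of similar type" occupying matching positions (Y78/Y168, E87/D153, E176/E232). A minimal sketch of that kind of check follows, assuming the residues are listed in corresponding order and using a deliberately coarse, hypothetical residue classification; it is not SALSA's scoring scheme, which works on local 3D structural matches.

```python
import re

# Hypothetical, deliberately coarse residue classes for illustration only;
# SALSA's own scoring is not reproduced here.
RESIDUE_CLASS = {
    "D": "acidic", "E": "acidic",
    "K": "basic", "R": "basic",
    "F": "aromatic", "Y": "aromatic", "W": "aromatic",
    "N": "amide", "Q": "amide",
    "S": "hydroxyl", "T": "hydroxyl",
}


def parse_residue(label):
    """Split a label such as 'E87' into (one-letter code, residue number)."""
    match = re.fullmatch(r"([A-Z])(\d+)", label)
    if match is None:
        raise ValueError(f"unrecognised residue label: {label!r}")
    return match.group(1), int(match.group(2))


def compare_sites(site_a, site_b):
    """Pair residues position by position and flag identical or same-class types."""
    report = []
    for label_a, label_b in zip(site_a, site_b):
        aa_a, _ = parse_residue(label_a)
        aa_b, _ = parse_residue(label_b)
        cls_a = RESIDUE_CLASS.get(aa_a)
        similar = aa_a == aa_b or (cls_a is not None and cls_a == RESIDUE_CLASS.get(aa_b))
        report.append((label_a, label_b, similar))
    return report
```

    For example, `compare_sites(["Y78", "E87", "E176"], ["Y168", "D153", "E232"])` flags all three pairs as similar, since E and D fall in the same (acidic) class here.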

  20. Strategic alignment between senior and middle managers in local government and health.

    PubMed

    Clifford, N

    2001-01-01

    The North West Change Centre at Manchester Business School has been developing collaborative programmes aimed at public sector middle managers. A recently completed evaluation, using follow-up interviews, studied the effect of the programmes on individuals, on their organisation and on their intra-organisational work. Reference is made to management theories on strategic alignment and cultural change. The results suggest that middle managers have been highly satisfied, and that the programmes resulted in changed attitudes, greater confidence and the development of worked-through project ideas. However, a sense of being "out of alignment" with their organisation remains, manifested in poor support for inter-agency working. A model of strategic alignment between middle and senior managers is presented, which illustrates the effects of strong and weak alignment and the implications for implementing modernised services. Recommendations are made on contextualising change, on how managerial signalling can be improved, on how pathways to changed organisational systems can be identified and on how barriers to change can be overcome.

  1. Local extracellular matrix alignment directs cellular protrusion dynamics and migration through Rac1 and FAK.

    PubMed

    Carey, Shawn P; Goldblatt, Zachary E; Martin, Karen E; Romero, Bethsabe; Williams, Rebecca M; Reinhart-King, Cynthia A

    2016-08-08

    Cell migration within 3D interstitial microenvironments is sensitive to extracellular matrix (ECM) properties, but the mechanisms that regulate migration guidance by 3D matrix features remain unclear. To examine the mechanisms underlying the cell migration response to aligned ECM, which is prevalent at the tumor-stroma interface, we utilized time-lapse microscopy to compare the behavior of MDA-MB-231 breast adenocarcinoma cells within randomly organized and well-aligned 3D collagen ECM. We developed a novel experimental system in which cellular morphodynamics during initial 3D cell spreading served as a reductionist model for the complex process of matrix-directed 3D cell migration. Using this approach, we found that ECM alignment induced spatial anisotropy of cells' matrix probing by promoting protrusion frequency, persistence, and lengthening along the alignment axis and suppressing protrusion dynamics orthogonal to alignment. Preference for on-axis behaviors was dependent upon FAK and Rac1 signaling and translated across length and time scales such that cells within aligned ECM exhibited accelerated elongation, front-rear polarization, and migration relative to cells in random ECM. Together, these findings indicate that adhesive and protrusive signaling allow cells to respond to coordinated physical cues in the ECM, promoting migration efficiency and cell migration guidance by 3D matrix structure.

  2. Foundations of statistical methods for multiple sequence alignment and structure prediction

    SciTech Connect

    Lawrence, C.

    1995-12-31

    Statistical algorithms have proven to be useful in computational molecular biology. Many statistical problems are most easily addressed by pretending that critical missing data are available. For some problems, statistical inference is facilitated by creating a set of latent variables, none of which are observed. A key observation is that conditional probabilities for the values of the missing data can be inferred by applying Bayes' theorem to the observed data. The statistical framework described in this paper employs Boltzmann-like models, permuted data likelihood, EM, and Gibbs sampler algorithms. This tutorial reviews the common statistical framework behind all of these algorithms, largely in tabular or graphical terms, illustrates its application, and describes the biological underpinnings of the models used.
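    The missing-data idea above (infer the unobserved variable with Bayes' theorem, then re-estimate parameters as if it were known) is exactly what one EM iteration does. A minimal sketch on a toy problem of our own choosing, not one from the tutorial: a two-component binomial mixture in which each trial's coin identity is the latent variable. The function name, starting values and data are illustrative assumptions.

```python
import math


def em_two_coins(heads, flips, theta_a=0.6, theta_b=0.5, weight_a=0.5, iterations=50):
    """EM for a two-component binomial mixture.

    heads[k] is the number of heads in trial k of `flips` tosses; which coin
    (A or B) produced each trial is the latent, unobserved variable.
    Returns (theta_a, theta_b, weight_a, per-iteration log-likelihoods).
    """
    log_likelihoods = []
    for _ in range(iterations):
        # E-step: Bayes' theorem gives P(coin A | observed heads) per trial.
        # The binomial coefficient is omitted: it cancels in the posterior and
        # only shifts the log-likelihood by a constant.
        resp, log_lik = [], 0.0
        for h in heads:
            like_a = weight_a * theta_a ** h * (1 - theta_a) ** (flips - h)
            like_b = (1 - weight_a) * theta_b ** h * (1 - theta_b) ** (flips - h)
            resp.append(like_a / (like_a + like_b))
            log_lik += math.log(like_a + like_b)
        log_likelihoods.append(log_lik)
        # M-step: re-estimate parameters from the expected (soft) assignments.
        total_a = sum(resp)
        total_b = len(heads) - total_a
        theta_a = sum(r * h for r, h in zip(resp, heads)) / (total_a * flips)
        theta_b = sum((1 - r) * h for r, h in zip(resp, heads)) / (total_b * flips)
        weight_a = total_a / len(heads)
    return theta_a, theta_b, weight_a, log_likelihoods
```

    Calling `em_two_coins([5, 9, 8, 4, 7], flips=10)` returns the fitted parameters together with a log-likelihood trace that never decreases, the defining property of EM. A Gibbs sampler would instead draw the latent coin labels from the same posterior rather than averaging over them.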

  3. Interim Report on Multiple Sequence Alignments and TaqMan Signature Mapping to Phylogenetic Trees

    SciTech Connect

    Gardner, S; Jaing, C

    2012-03-27

    The goal of this project is to develop forensic genotyping assays for select agent viruses, addressing a significant capability gap for the viral bioforensics and law enforcement community. We used a multipronged approach combining bioinformatics analysis, PCR-enriched samples, microarrays and TaqMan assays to develop high-resolution and cost-effective genotyping methods for strain-level forensic discrimination of viruses. We have leveraged the substantial experience and efficiency gained in year 1 in software development, SNP discovery, TaqMan signature design and phylogenetic signature mapping to scale up the development of forensic signatures in year 2. In this report, we summarize the TaqMan signature development for South American hemorrhagic fever viruses, tick-borne encephalitis viruses and henipaviruses, Old World arenaviruses, filoviruses, Crimean-Congo hemorrhagic fever virus, Rift Valley fever virus and Japanese encephalitis virus.

  4. Gleaning structural and functional information from correlations in protein multiple sequence alignments.

    PubMed

    Neuwald, Andrew F

    2016-06-01

    The availability of vast amounts of protein sequence data facilitates detection of subtle statistical correlations due to imposed structural and functional constraints. Recent breakthroughs using Direct Coupling Analysis (DCA) and related approaches have tapped into correlations believed to be due to compensatory mutations. This has yielded some remarkable results, including substantially improved prediction of protein intra- and inter-domain 3D contacts, of membrane and globular protein structures, of substrate binding sites, and of protein conformational heterogeneity. A complementary approach is Bayesian Partitioning with Pattern Selection (BPPS), which partitions related proteins into hierarchically-arranged subgroups based on correlated residue patterns. These correlated patterns are presumably due to structural and functional constraints associated with evolutionary divergence rather than to compensatory mutations. Hence joint application of DCA- and BPPS-based approaches should help sort out the structural and functional constraints contributing to sequence correlations.
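    As a simple illustration of "statistical correlations" between alignment columns, the sketch below computes the mutual information (MI) between two columns of an MSA. MI is a much cruder statistic than DCA (it does not separate direct from indirect couplings) and is unrelated to BPPS; it is shown here only as a stand-in for the idea of column covariation. The function name and toy alignment are illustrative.

```python
import math
from collections import Counter


def mutual_information(msa, i, j):
    """Mutual information (in nats) between columns i and j of an MSA.

    msa: list of equal-length aligned sequences (gaps are treated as a symbol).
    """
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        p_a, p_b = count_i[a] / n, count_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi
```

    In the toy alignment `["AKL", "AKV", "GEL", "GEV"]`, columns 0 and 1 covary perfectly (MI = ln 2) while columns 0 and 2 are independent (MI = 0).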

  5. Multiple sequence assembly from reads alignable to a common reference genome.

    PubMed

    Peng, Qian; Smith, Andrew D

    2011-01-01

    We describe a set of computational problems motivated by certain analysis tasks in genome resequencing. These are assembly problems for which multiple distinct sequences must be assembled, but where the relative positions of reads to be assembled are already known. This information is obtained from a common reference genome and is characteristic of resequencing experiments. The simplest variant of the problem aims at determining a minimum set of superstrings such that each sequenced read matches at least one superstring. We give an algorithm with time complexity O(N), where N is the sum of the lengths of reads, substantially improving on previous algorithms for solving the same problem. We also examine the problem of finding the smallest number of reads to remove such that the remaining reads are consistent with k superstrings. By exploiting a surprising relationship with the minimum cost flow problem, we show that this problem can be solved in polynomial time when nested reads are excluded. If nested reads are permitted, this problem of removing the minimum number of reads becomes NP-hard. We show that permitting mismatches between reads and their nearest superstrings generally renders these problems NP-hard.
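    Because read positions are known from the reference, deciding whether a set of reads fits a single superstring reduces to checking that overlapping reads agree base by base. The sketch below performs only that consistency check; it is not the paper's O(N) algorithm for the minimum number of superstrings, and it ignores mismatches and coverage gaps. The function name and input format `(start, sequence)` are assumptions.

```python
def merge_positioned_reads(reads):
    """Merge reads with known reference offsets into one position->base layout.

    reads: iterable of (start, sequence), start being a 0-based reference offset.
    Returns a dict {position: base} if every overlap agrees, otherwise None
    (the reads cannot come from a single superstring without mismatches).
    """
    layout = {}
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            # setdefault stores the base on first sight and returns the stored
            # base afterwards, so a disagreement reveals a conflict.
            if layout.setdefault(pos, base) != base:
                return None
    return layout
```

    For instance, `(0, "ACGT")` and `(2, "GTTA")` merge into the string ACGTTA, whereas `(0, "ACGT")` and `(2, "CT")` conflict at position 2 and would need separate superstrings.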

  6. Combining Multiple Pairwise Structure-based Alignments

    SciTech Connect

    2014-11-12

    CombAlign is a new Python code that generates a gapped, one-to-many, multiple structure-based sequence alignment (MSSA) given a set of pairwise structure-based alignments. In order to better define regions of similarity among related protein structures, it is useful to detect the residue-residue correspondences among a set of pairwise structure alignments. Few codes exist for constructing a one-to-many, multiple sequence alignment derived from a set of structure alignments, and we perceived a need for creating a new tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure.
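    The core idea of combining reference-anchored pairwise alignments, while allowing gaps in the reference, can be sketched as a classic "star" merge: for each slot between reference residues, take the widest insertion seen in any pairwise alignment and pad every row to that width. This is a generic sketch under that assumption, not CombAlign's code; the function names and the left-justified padding of insertions are our own choices.

```python
def _split_by_reference(ref_gapped, other_gapped):
    """Decompose one pairwise alignment against the reference.

    Returns (inserts, matches): inserts[j] holds the characters of the other
    sequence in reference-gap columns just before reference residue j
    (inserts[L] holds those after the last residue); matches[j] is the
    character (or '-') aligned to reference residue j.
    """
    inserts, matches = [""], []
    for r, o in zip(ref_gapped, other_gapped):
        if r == "-":
            if o != "-":  # skip all-gap columns
                inserts[-1] += o
        else:
            matches.append(o)
            inserts.append("")
    return inserts, matches


def merge_pairwise(pairs):
    """Merge (gapped_reference, gapped_other) pairs into one one-to-many MSA.

    Gaps are inserted into the reference row wherever any pairwise
    alignment has an insertion relative to it. Row 0 is the reference.
    """
    reference = pairs[0][0].replace("-", "")
    parsed = []
    for ref_gapped, other_gapped in pairs:
        if len(ref_gapped) != len(other_gapped):
            raise ValueError("pairwise alignment rows must have equal length")
        if ref_gapped.replace("-", "") != reference:
            raise ValueError("every pair must use the same reference sequence")
        parsed.append(_split_by_reference(ref_gapped, other_gapped))
    L = len(reference)
    # Widest insertion observed at each slot across all pairwise alignments.
    width = [max(len(inserts[j]) for inserts, _ in parsed) for j in range(L + 1)]
    rows = ["".join("-" * width[j] + (reference[j] if j < L else "") for j in range(L + 1))]
    for inserts, matches in parsed:
        row = []
        for j in range(L + 1):
            row.append(inserts[j].ljust(width[j], "-"))  # pad insertion with gaps
            if j < L:
                row.append(matches[j])
        rows.append("".join(row))
    return rows
```

    With pairs `("AC-GT", "ACTGT")` and `("ACGT", "A-GT")`, the merge yields the rows `AC-GT`, `ACTGT` and `A--GT`: the reference gains a gap to accommodate the insertion found in the first alignment.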

  8. Topological characterization of neuronal arbor morphology via sequence representation: II--global alignment.

    PubMed

    Gillette, Todd A; Hosseini, Parsa; Ascoli, Giorgio A

    2015-07-04

    The increasing abundance of neuromorphological data provides both the opportunity and the challenge to compare massive numbers of neurons from a wide diversity of sources efficiently and effectively. We implemented a modified global alignment algorithm representing axonal and dendritic bifurcations as strings of characters. Sequence alignment quantifies neuronal similarity by identifying branch-level correspondences between trees. The space generated from pairwise similarities is capable of classifying neuronal arbor types as well as, or better than, traditional topological metrics. Unsupervised cluster analysis produces groups that significantly correspond with known cell classes for axons, dendrites, and pyramidal apical dendrites. Furthermore, the distinguishing consensus topology generated by multiple sequence alignment of a group of neurons reveals their shared branching blueprint. Interestingly, the axons of dendritic-targeting interneurons in the rodent cortex cluster with pyramidal axons but separately from the (more topologically symmetric) axons of perisomatic-targeting interneurons. Global pairwise and multiple sequence alignment of neurite topologies enables detailed comparison of neurites and identification of conserved topological features in alignment-defined clusters. The methods presented also provide a framework for incorporating additional branch-level morphological features. Moreover, comparison of multiple alignment with motif analysis shows that the two techniques provide complementary information, revealing global and local features, respectively.
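    Once each arbor is encoded as a string, neuronal similarity can be scored with a standard global (Needleman-Wunsch) alignment. The sketch below uses a linear gap penalty and simple match/mismatch scores; the character alphabet for bifurcation types and the scoring values are hypothetical, and the authors' modified algorithm is not reproduced.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    a, b: strings encoding branching topology (alphabet is hypothetical).
    Keeps only two DP rows, so memory is O(len(b)).
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]
```

    Pairwise scores from such a function can populate the similarity space used for clustering; identical strings score their length, and each unmatched branch symbol costs one gap.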

  9. The tree alignment problem.

    PubMed

    Varón, Andrés; Wheeler, Ward C

    2012-11-09

    The inference of homologies among DNA sequences, that is, positions in multiple genomes that share a common evolutionary origin, is a crucial, yet difficult task facing biologists. Its computational counterpart is known as the multiple sequence alignment problem. There are various criteria and methods available to perform multiple sequence alignments, and among these, the minimization of the overall cost of the alignment on a phylogenetic tree is known in combinatorial optimization as the Tree Alignment Problem. This problem typically occurs as a subproblem of the Generalized Tree Alignment Problem, which looks for the tree with the lowest alignment cost among all possible trees. This is equivalent to the Maximum Parsimony problem when the input sequences are not aligned, that is, when phylogeny and alignments are simultaneously inferred. For large data sets, a popular heuristic is Direct Optimization (DO). DO provides a good tradeoff between speed, scalability, and competitive scores, and is implemented in the computer program POY. All other (competitive) algorithms have greater time complexities than DO. Here, we introduce a new algorithm, Affine-DO, that accommodates the indel (alignment gap) models commonly used in phylogenetic analysis of molecular sequence data, and we present experiments with it. Affine-DO has the same time complexity as DO, but is correctly suited for the affine gap edit distance. We demonstrate its performance with more than 330,000 experimental tests. These experiments show that the solutions of Affine-DO are close to the lower bound inferred from a linear programming solution. Moreover, iterating over a solution produced using Affine-DO shows little improvement. Our results show that Affine-DO likely produces near-optimal solutions, with approximations within 10% for sequences with small divergence, and within 30% for random sequences, for which Affine-DO produced the worst solutions. The Affine-DO algorithm has the necessary scalability and
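    The affine gap edit distance mentioned above charges a gap of length k as `gap_open + k * gap_extend`, so one long indel is cheaper than several short ones. A minimal pairwise sketch using Gotoh's three-matrix recurrence follows; it computes only the distance between two sequences and says nothing about how Affine-DO handles sequences on a tree. Cost values and names are illustrative assumptions.

```python
INF = float("inf")


def affine_edit_distance(a, b, mismatch=1, gap_open=2, gap_extend=1):
    """Minimum edit cost with affine gaps (Gotoh's algorithm).

    A gap of length k costs gap_open + k * gap_extend; substitutions cost
    `mismatch`, identical characters cost 0.
    M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap; Y: b[j-1] to a gap.
    """
    n, m = len(a), len(b)
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    X = [[INF] * (m + 1) for _ in range(n + 1)]
    Y = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = sub + min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = min(M[i - 1][j] + gap_open + gap_extend,
                          X[i - 1][j] + gap_extend,           # extend gap
                          Y[i - 1][j] + gap_open + gap_extend)
            Y[i][j] = min(M[i][j - 1] + gap_open + gap_extend,
                          Y[i][j - 1] + gap_extend,           # extend gap
                          X[i][j - 1] + gap_open + gap_extend)
    return min(M[n][m], X[n][m], Y[n][m])
```

    With the defaults, aligning `"AAA"` to `"A"` costs 4 (one gap of length 2), not 6 (two separate gaps of length 1), which is the behaviour a linear gap model cannot express.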

  10. The tree alignment problem

    PubMed Central

    2012-01-01

    Background The inference of homologies among DNA sequences, that is, positions in multiple genomes that share a common evolutionary origin, is a crucial, yet difficult task facing biologists. Its computational counterpart is known as the multiple sequence alignment problem. There are various criteria and methods available to perform multiple sequence alignments, and among these, the minimization of the overall cost of the alignment on a phylogenetic tree is known in combinatorial optimization as the Tree Alignment Problem. This problem typically occurs as a subproblem of the Generalized Tree Alignment Problem, which looks for the tree with the lowest alignment cost among all possible trees. This is equivalent to the Maximum Parsimony problem when the input sequences are not aligned, that is, when phylogeny and alignments are simultaneously inferred. Results For large data sets, a popular heuristic is Direct Optimization (DO). DO provides a good tradeoff between speed, scalability, and competitive scores, and is implemented in the computer program POY. All other (competitive) algorithms have greater time complexities than DO. Here, we introduce a new algorithm, Affine-DO, that accommodates the indel (alignment gap) models commonly used in phylogenetic analysis of molecular sequence data, and we present experiments with it. Affine-DO has the same time complexity as DO, but is correctly suited for the affine gap edit distance. We demonstrate its performance with more than 330,000 experimental tests. These experiments show that the solutions of Affine-DO are close to the lower bound inferred from a linear programming solution. Moreover, iterating over a solution produced using Affine-DO shows little improvement. Conclusions Our results show that Affine-DO likely produces near-optimal solutions, with approximations within 10% for sequences with small divergence, and within 30% for random sequences, for which Affine-DO produced the worst solutions. The Affine-DO algorithm has
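    For sequences that are already aligned, the cost of that alignment on a fixed tree under a unit-cost model is the Maximum Parsimony score, computable column by column with Fitch's algorithm. The sketch below shows that scoring step only, treating gaps as an extra state and assuming a binary tree written as nested tuples; it does not solve the alignment itself, which is the harder part that DO and Affine-DO address.

```python
def fitch(tree, states):
    """Fitch small-parsimony pass for one alignment column.

    tree: leaf name (str) or a (left, right) tuple of subtrees (binary tree).
    states: mapping leaf name -> character in this column ('-' is a state).
    Returns (candidate state set at this node, changes counted in the subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    set_l, cost_l = fitch(left, states)
    set_r, cost_r = fitch(right, states)
    common = set_l & set_r
    if common:
        return common, cost_l + cost_r
    return set_l | set_r, cost_l + cost_r + 1  # one change needed here


def tree_alignment_cost(tree, aligned):
    """Sum of Fitch costs over all columns of an alignment (dict name -> row)."""
    length = len(next(iter(aligned.values())))
    total = 0
    for col in range(length):
        _, cost = fitch(tree, {name: row[col] for name, row in aligned.items()})
        total += cost
    return total
```

    For the tree `(("a", "b"), ("c", "d"))` with rows `ACGT`, `ACGA`, `TCGA` and `TCGT`, the first column needs one change and the last column two, giving a total cost of 3.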

  11. The RNA structure alignment ontology

    PubMed Central

    Brown, James W.; Birmingham, Amanda; Griffiths, Paul E.; Jossinet, Fabrice; Kachouri-Lafond, Rym; Knight, Rob; Lang, B. Franz; Leontis, Neocles; Steger, Gerhard; Stombaugh, Jesse; Westhof, Eric

    2009-01-01

    Multiple sequence alignments are powerful tools for understanding the structures, functions, and evolutionary histories of linear biological macromolecules (DNA, RNA, and proteins), and for finding homologs in sequence databases. We address several ontological issues related to RNA sequence alignments that are informed by structure. Multiple sequence alignments are usually shown as two-dimensional (2D) matrices, with rows representing individual sequences, and columns identifying nucleotides from different sequences that correspond structurally, functionally, and/or evolutionarily. However, the requirement that sequences and structures correspond nucleotide-by-nucleotide is unrealistic and hinders representation of important biological relationships. High-throughput sequencing efforts are also rapidly making 2D alignments unmanageable because of vertical and horizontal expansion as more sequences are added. Solving the shortcomings of traditional RNA sequence alignments requires explicit annotation of the meaning of each relationship within the alignment. We introduce the notion of “correspondence,” which is an equivalence relation between RNA elements in sets of sequences as the basis of an RNA alignment ontology. The purpose of this ontology is twofold: first, to enable the development of new representations of RNA data and of software tools that resolve the expansion problems with current RNA sequence alignments, and second, to facilitate the integration of sequence data with secondary and three-dimensional structural information, as well as other experimental information, to create simultaneously more accurate and more exploitable RNA alignments. PMID:19622678
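    Because "correspondence" is defined as an equivalence relation, pairwise correspondence assertions between RNA elements determine equivalence classes through transitive closure. A minimal union-find sketch follows; representing an element as a `(sequence_id, position)` tuple is an illustrative assumption, not the ontology's data model.

```python
def correspondence_classes(pairs):
    """Group elements into equivalence classes from pairwise correspondences.

    pairs: iterable of (element, element); elements must be hashable and
    mutually comparable (e.g. (sequence_id, position) tuples).
    Returns a list of sets, ordered by each class's smallest members.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    for x, y in pairs:
        union(x, y)
    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(groups.values(), key=sorted)
```

    Asserting that A:1 corresponds to B:3 and B:3 to C:2 places all three in one class, even though A:1 and C:2 were never paired directly, which is the transitivity an equivalence relation requires.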

  12. An efficient algorithm for pairwise local alignment of protein interaction networks

    DOE PAGES

    Chen, Wenbin; Schmidt, Matthew; Tian, Wenhong; ...

    2015-04-01

    Recently, researchers seeking to understand, modify, and create beneficial traits in organisms have looked for evolutionarily conserved patterns of protein interactions. Their conservation likely means that the proteins of these conserved functional modules are important to the trait's expression. In this paper, we formulate the problem of identifying these conserved patterns as a graph optimization problem, and develop a fast heuristic algorithm for this problem. We compare the performance of our network alignment algorithm to that of the MaWISh algorithm [Koyuturk M, Kim Y, Topkara U, Subramaniam S, Szpankowski W, Grama A, Pairwise alignment of protein interaction networks, J Comput Biol 13(2): 182-199, 2006.], which bases its search algorithm on a related decision problem formulation. We find that our algorithm discovers conserved modules with a larger number of proteins in an order of magnitude less time. In conclusion, the protein sets found by our algorithm correspond to known conserved functional modules at comparable precision and recall rates as those produced by the MaWISh algorithm.

  13. An efficient algorithm for pairwise local alignment of protein interaction networks

    SciTech Connect

    Chen, Wenbin; Schmidt, Matthew; Tian, Wenhong; Samatova, Nagiza F.; Zhang, Shaohong

    2015-04-01

    Recently, researchers seeking to understand, modify, and create beneficial traits in organisms have looked for evolutionarily conserved patterns of protein interactions. Their conservation likely means that the proteins of these conserved functional modules are important to the trait's expression. In this paper, we formulate the problem of identifying these conserved patterns as a graph optimization problem, and develop a fast heuristic algorithm for this problem. We compare the performance of our network alignment algorithm to that of the MaWISh algorithm [Koyuturk M, Kim Y, Topkara U, Subramaniam S, Szpankowski W, Grama A, Pairwise alignment of protein interaction networks, J Comput Biol 13(2): 182-199, 2006.], which bases its search algorithm on a related decision problem formulation. We find that our algorithm discovers conserved modules with a larger number of proteins in an order of magnitude less time. In conclusion, the protein sets found by our algorithm correspond to known conserved functional modules at comparable precision and recall rates as those produced by the MaWISh algorithm.
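    A basic ingredient of pairwise network alignment is checking which interactions are conserved under a candidate mapping between proteins of the two networks. The sketch below performs only that scoring step; it is not the paper's heuristic search or MaWISh's formulation, and the protein names are hypothetical.

```python
def conserved_edges(edges1, edges2, mapping):
    """Return edges of network 1 whose mapped endpoints interact in network 2.

    edges1, edges2: iterables of (protein, protein) undirected interactions.
    mapping: dict from network-1 proteins to network-2 proteins (partial).
    """
    undirected2 = {frozenset(edge) for edge in edges2}
    return [
        (u, v)
        for u, v in edges1
        if u in mapping and v in mapping
        and frozenset((mapping[u], mapping[v])) in undirected2
    ]
```

    Mapping a, b, c onto x, y, z conserves the interactions a-b and b-c when network 2 contains x-y and y-z; a search heuristic would then try to grow mappings that keep many such edges while matching similar proteins.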

  14. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    PubMed

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has made sequence alignment, a common task, one of the major bottlenecks in computational biology. These large datasets and complex computations typically require cost-prohibitive High Performance Computing (HPC). As such, parallelised solutions have been proposed, but many exhibit scalability limitations and are incapable of effectively processing "Big Data", the name given to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, consisting of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets, but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and a well-balanced computational workload while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap, memory-constrained hardware has significant implications for in-field clinical diagnostic testing, enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples.
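    Partitioning both the query set and the database turns one large search into a grid of independent (query block, database block) tasks that can be distributed to workers. The sketch below builds such a grid with partitions represented as index ranges rather than copied files; that representation is an assumption made for illustration and is not HBlast's documented "virtual partitioning" scheme.

```python
def index_ranges(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) ranges of near-equal size."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)  # spread the remainder
        ranges.append((start, stop))
        start = stop
    return ranges


def task_grid(n_queries, n_db_records, query_parts, db_parts):
    """Every pairing of a query block with a database block is one task."""
    return [
        (q_range, db_range)
        for q_range in index_ranges(n_queries, query_parts)
        for db_range in index_ranges(n_db_records, db_parts)
    ]
```

    Splitting 10 queries into 3 blocks and 100 database records into 4 blocks yields 12 tasks; since block sizes differ by at most one item, the workload stays roughly balanced across tasks.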

  15. An Analysis of State and Local Alignment of Teacher Evaluation in Maryland

    ERIC Educational Resources Information Center

    Peterson, Serene N.

    2014-01-01

    This study explored the components of Maryland's newly implemented teacher evaluation framework and compared the state requirements with the evaluation procedures of three local school systems. The study sought to investigate the relationship between three evaluation protocols in comparison to the state requirements. Three local school…

  17. Local growth of vertically aligned carbon nanotubes by laser-induced surface modification of coated silicon substrates

    NASA Astrophysics Data System (ADS)

    Zimmer, K.; Böhme, R.; Ruthe, D.; Rudolph, Th; Rauschenbach, B.

    2007-04-01

    The stimulation of carbon nanotube (CNT) growth in a thermal CVD process using an acetylene/nitrogen gas mixture by KrF excimer laser exposure of iron-nitrate-coated silicon is described. At moderate laser fluences of ~1 J/cm², the growth of nanotube bundles up to 100 μm consisting of vertically aligned multi-walled carbon nanotubes (VA-MWCNT) is observed. AFM measurements show the formation of nanoparticles in the laser-exposed areas. At these catalytic sites the nanotubes grow and support one another, forming well-defined bundles. Laser exposure thus allows control of the formation of catalytic sites and, consequently, of nanotube growth and properties.

  18. Sigma: multiple alignment of weakly-conserved non-coding DNA sequence.

    PubMed

    Siddharthan, Rahul

    2006-03-16

    Existing tools for multiple-sequence alignment focus on aligning protein sequence or protein-coding DNA sequence, and are often based on extensions to Needleman-Wunsch-like pairwise alignment methods. We introduce a new tool, Sigma, with a new algorithm and scoring scheme designed specifically for non-coding DNA sequence. This problem gains importance with the increasing number of published sequences of closely related species. In particular, studies of gene regulation seek to take advantage of comparative genomics, and recent algorithms for finding regulatory sites in phylogenetically related intergenic sequence require alignment as a preprocessing step. Much can also be learned about evolution from intergenic DNA, which tends to evolve faster than coding DNA. Sigma uses a strategy of seeking the best possible gapless local alignments (a strategy used earlier by DiAlign), at each step making the best possible alignment consistent with existing alignments, and scores the significance of the alignment based on the lengths of the aligned fragments and a background model which may be supplied or estimated from an auxiliary file of intergenic DNA. Comparative tests of Sigma against five earlier algorithms on synthetic data generated to mimic real data show excellent performance, with Sigma balancing high "sensitivity" (more bases aligned) with effective filtering of "incorrect" alignments. With real data, while "correctness" cannot be directly quantified for the alignment, running the PhyloGibbs motif finder on pre-aligned sequence suggests that Sigma's alignments are superior. By taking into account the peculiarities of non-coding DNA, Sigma fills a gap in the toolbox of bioinformatics.
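    The basic unit Sigma searches for is a gapless local alignment, a segment pair on a single diagonal of the two sequences. A minimal sketch of finding the best-scoring such segment follows, scanning every diagonal with a maximum-subarray (Kadane) pass. The +1/-1 scoring is a simplification chosen for illustration; Sigma's actual significance score uses fragment lengths and a background model, and its consistency bookkeeping across steps is not shown.

```python
def best_gapless_segment(a, b, match=1, mismatch=-1):
    """Best-scoring ungapped local alignment between strings a and b.

    Scans each diagonal d = j - i with a maximum-subarray pass.
    Returns (score, start_in_a, start_in_b, length); (0, 0, 0, 0) if no
    positive-scoring segment exists.
    """
    best = (0, 0, 0, 0)
    for d in range(-(len(a) - 1), len(b)):
        i = max(0, -d)
        j = i + d
        run, start = 0, i
        while i < len(a) and j < len(b):
            s = match if a[i] == b[j] else mismatch
            if run <= 0:          # restart the segment at this position
                run, start = s, i
            else:
                run += s
            if run > best[0]:
                best = (run, start, start + d, i - start + 1)
            i += 1
            j += 1
    return best
```

    For `"AAACGTAAA"` and `"TTCGTTT"` the best segment is the shared `CGT`, returned as `(3, 3, 2, 3)`: score 3, starting at index 3 of the first sequence and index 2 of the second, length 3.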

  19. Global and local aspects of the surface potential landscape for energy level alignment at organic-ZnO interfaces

    NASA Astrophysics Data System (ADS)

    Stähler, Julia; Rinke, Patrick

    2017-03-01

    Hybrid systems of organic and inorganic semiconductors are a promising route for the development of novel opto-electronic and light-harvesting devices. A key ingredient for achieving superior functionality in a hybrid system is the right relative position of energy levels at the interfaces of the two material classes. In this Perspective, we address the sensitivity of the potential energy landscape at various ZnO surfaces, a key ingredient for interfacial energy level alignment, by combining one- and two-photon photoelectron spectroscopy with density-functional theory (DFT) calculations. We show that even very large work function changes (>2.5 eV) need not be accompanied by surface band bending in ZnO. Band bending, if it does occur, may be localized to a few Å or extend over hundreds of nanometers, with very different results for the surface work function and energy level alignment. Managing the delicate balance of different interface manipulation mechanisms in organic-inorganic hybrid systems will be a major challenge for future applications.

  20. Robust Eye Center Localization through Face Alignment and Invariant Isocentric Patterns.

    PubMed

    Pang, Zhiyong; Wei, Chuansheng; Teng, Dongdong; Chen, Dihu; Tan, Hongzhou

    2015-01-01

    The localization of eye centers is a very useful cue for numerous applications such as face recognition, facial expression recognition, and the early screening of neurological pathologies. Several methods relying on available light for accurate eye-center localization have been proposed. However, despite the considerable improvements that eye-center localization systems have undergone in recent years, only a few of these developments address the challenges posed by profile (non-frontal) faces. In this paper, we first use the explicit shape regression method to obtain the rough location of the eye centers. Because this method extracts global information from the human face, it is robust against any changes in the eye region. We exploit this robustness and use it as a constraint. To locate the eye centers accurately, we employ isophote curvature features, the accuracy of which has been demonstrated in a previous study. By applying these features, we obtain a series of eye-center locations which are candidates for the actual position of the eye center. Among these candidates, the locations that minimize the reconstruction error between the two methods mentioned above are taken as the closest approximation of the eye-center locations. In this way, we combine explicit shape regression and isophote curvature feature analysis to achieve robustness and accuracy, respectively. In practical experiments, we use the BioID and FERET datasets to test our approach to obtaining an accurate eye-center location while retaining robustness against changes in scale and pose. In addition, we apply our method to non-frontal faces to test its robustness and accuracy, which are essential in gaze estimation but have seldom been addressed in previous works. Through extensive experimentation, we show that the proposed method can achieve a significant improvement in accuracy and robustness over state-of-the-art techniques, with our method ranking second in terms of accuracy

  1. Robust Eye Center Localization through Face Alignment and Invariant Isocentric Patterns

    PubMed Central

    Teng, Dongdong; Chen, Dihu; Tan, Hongzhou

    2015-01-01

    The localization of eye centers is a very useful cue for numerous applications like face recognition, facial expression recognition, and the early screening of neurological pathologies. Several methods relying on available light for accurate eye-center localization have been exploited. However, despite the considerable improvements that eye-center localization systems have undergone in recent years, only a few of these developments deal with the challenges posed by profile (non-frontal) faces. In this paper, we first use the explicit shape regression method to obtain the rough location of the eye centers. Because this method extracts global information from the human face, it is robust against changes in the eye region. We exploit this robustness and use it as a constraint. To locate the eye centers accurately, we employ isophote curvature features, the accuracy of which has been demonstrated in a previous study. By applying these features, we obtain a series of candidate eye-center locations. Among these candidates, the locations that minimize the reconstruction error between the two methods mentioned above are taken as the closest approximation of the eye-center locations. Therefore, we combine explicit shape regression and isophote curvature feature analysis to achieve robustness and accuracy, respectively. In practical experiments, we use the BioID and FERET datasets to test our approach to obtaining an accurate eye-center location while retaining robustness against changes in scale and pose. In addition, we apply our method to non-frontal faces to test its robustness and accuracy, which are essential in gaze estimation but have seldom been addressed in previous works. Through extensive experimentation, we show that the proposed method can achieve a significant improvement in accuracy and robustness over state-of-the-art techniques, with our method ranking second in terms of accuracy.
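
    The isophote step lends itself to a small sketch. The following NumPy code is a minimal illustration under stated assumptions, not the authors' implementation: every pixel votes for the center of its isophote using the displacement formula commonly used for isophote-based eye-center voting, D = -(Lx, Ly)(Lx² + Ly²) / (Ly²Lxx - 2LxLxyLy + Lx²Lyy). Votes are unweighted (no curvedness weighting or dark-center filtering), and the hypothetical `pick_center` helper replaces the paper's reconstruction-error criterion with a simpler rule: among the strongest vote maxima, take the one nearest a rough prior such as a shape-regression estimate.

```python
import numpy as np

def isophote_center_votes(img, eps=1e-9):
    """Accumulate unweighted isophote-center votes over an image.

    Each pixel votes for the center of its isophote, displaced by
    D = -(Lx, Ly) * (Lx**2 + Ly**2) / (Ly**2*Lxx - 2*Lx*Lxy*Ly + Lx**2*Lyy).
    """
    img = np.asarray(img, dtype=float)
    Ly, Lx = np.gradient(img, edge_order=2)        # derivatives along rows, cols
    Lxy, Lxx = np.gradient(Lx, edge_order=2)       # d(Lx)/drow, d(Lx)/dcol
    Lyy = np.gradient(Ly, axis=0, edge_order=2)    # d(Ly)/drow
    mag2 = Lx ** 2 + Ly ** 2
    denom = Ly ** 2 * Lxx - 2 * Lx * Lxy * Ly + Lx ** 2 * Lyy
    valid = (np.abs(denom) > eps) & (mag2 > eps)   # skip flat or degenerate pixels
    rows, cols = np.nonzero(valid)
    scale = mag2[valid] / denom[valid]
    cx = np.rint(cols - Lx[valid] * scale).astype(int)
    cy = np.rint(rows - Ly[valid] * scale).astype(int)
    inside = (cx >= 0) & (cx < img.shape[1]) & (cy >= 0) & (cy < img.shape[0])
    acc = np.zeros(img.shape)
    np.add.at(acc, (cy[inside], cx[inside]), 1)    # one vote per valid pixel
    return acc


def pick_center(acc, prior, top_k=3):
    """Hypothetical rule: among the top_k voted cells, return the one nearest prior (row, col)."""
    flat = np.argsort(acc, axis=None)[::-1][:top_k]
    flat = flat[acc.ravel()[flat] > 0]             # ignore cells without votes
    if flat.size == 0:
        raise ValueError("no votes were cast")
    cands = np.column_stack(np.unravel_index(flat, acc.shape))
    dist = np.linalg.norm(cands - np.asarray(prior), axis=1)
    return tuple(int(v) for v in cands[np.argmin(dist)])
```

    On a synthetic paraboloid (a stand-in for a dark pupil), every non-center pixel votes for the true minimum, so `pick_center(acc, prior=(11, 9))` recovers it; real images would need smoothing before differentiation.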

  2. A novel method of aligning molecules by local surface shape similarity.

    PubMed

    Cosgrove, D A; Bayada, D M; Johnson, A P

    2000-08-01

    A novel shape-based method has been developed for overlaying a series of molecule surfaces into a common reference frame. The surfaces are represented by a set of circular patches of approximately constant curvature. Two molecules are overlaid by using a clique-detection algorithm to find a set of corresponding patches on the two surfaces and then overlaying the molecules so that the similar patches on the two surfaces coincide. The method is thus able to detect areas of local, rather than global, similarity. A consensus overlay for a group of molecules is performed by examining the scores of all pairwise overlays and performing a set of overlays with the highest scores. The utility of the method has been examined by comparing the overlaid and experimental configurations of 4 sets of molecules for which there are X-ray crystal structures of the molecules bound to a protein active site. Results for the overlays are generally encouraging. Of particular note is the correct prediction of the 'reverse orientation' for ligands binding to human rhinovirus coat protein HRV14.
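
    The clique-detection step can be sketched as a search for a maximum clique in a correspondence graph. In this minimal Python sketch, which is an assumption rather than the authors' code, each patch is reduced to a 3D center point, patch curvature is ignored, a node is a pair (patch in A, patch in B), and two nodes are joined when their intra-surface distances agree within a hypothetical tolerance `tol`. The largest clique then gives a mutually consistent set of patch correspondences; the subsequent superposition (for example a least-squares rigid fit) is not shown.

```python
import itertools
import math

def correspondence_graph(a, b, tol=0.1):
    """Nodes are (i, j) patch pairs; edges join pairs whose intra-surface distances agree."""
    nodes = [(i, j) for i in range(len(a)) for j in range(len(b))]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in itertools.combinations(nodes, 2):
        # A patch may be used only once on each surface.
        if i != k and j != l and abs(math.dist(a[i], a[k]) - math.dist(b[j], b[l])) <= tol:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def max_clique(adj):
    """Bron-Kerbosch with pivoting; returns one maximum clique."""
    best = set()

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = r
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return best
```

    For four asymmetric points and a translated copy plus one unrelated outlier, the maximum clique pairs each point with its translated counterpart and leaves the outlier unmatched.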

  3. A novel method of aligning molecules by local surface shape similarity

    NASA Astrophysics Data System (ADS)

    Cosgrove, D. A.; Bayada, D. M.; Johnson, A. P.

    2000-08-01

    A novel shape-based method has been developed for overlaying a series of molecule surfaces into a common reference frame. The surfaces are represented by a set of circular patches of approximately constant curvature. Two molecules are overlaid by using a clique-detection algorithm to find a set of corresponding patches on the two surfaces and then overlaying the molecules so that the similar patches on the two surfaces coincide. The method is thus able to detect areas of local, rather than global, similarity. A consensus overlay for a group of molecules is performed by examining the scores of all pairwise overlays and performing a set of overlays with the highest scores. The utility of the method has been examined by comparing the overlaid and experimental configurations of 4 sets of molecules for which there are X-ray crystal structures of the molecules bound to a protein active site. Results for the overlays are generally encouraging. Of particular note is the correct prediction of the 'reverse orientation' for ligands binding to human rhinovirus coat protein HRV14.

  4. CSA: an efficient algorithm to improve circular DNA multiple alignment.

    PubMed

    Fernandes, Francisco; Pereira, Luísa; Freitas, Ana T

    2009-07-23

    The comparison of homologous sequences from different species is an essential approach to reconstruct the evolutionary history of species and of the genes they harbour in their genomes. Several complete mitochondrial and nuclear genomes are now available, increasing the importance of using multiple sequence alignment algorithms in comparative genomics. MtDNA has long been used in phylogenetic analysis, and errors in the alignments can lead to errors in the interpretation of evolutionary information. Although a large number of multiple sequence alignment algorithms have been proposed to date, they all deal with linear DNA and cannot directly handle circular DNA. Researchers interested in aligning circular DNA sequences must first rotate them to the "right" place using an essentially manual process before they can use multiple sequence alignment tools. In this paper we propose an efficient algorithm that identifies the most interesting region at which to cut circular genomes in order to improve phylogenetic analysis when using standard multiple sequence alignment algorithms. This algorithm identifies the largest chain of non-repeated longest subsequences common to a set of circular mitochondrial DNA sequences. All the sequences are then rotated and made linear for multiple alignment purposes. To evaluate the effectiveness of this new tool, three different sets of mitochondrial DNA sequences were considered. Other tests considering randomly rotated sequences were also performed. The software package Arlequin was used to evaluate the standard genetic measures of the alignments obtained with and without the use of the CSA algorithm with two well-known multiple alignment algorithms, the CLUSTALW and the MAVID tools, and also the visualization tool SinicView. The results show that a circularization and rotation pre-processing step significantly improves the efficiency of publicly available multiple sequence alignment algorithms when used in the alignment of circular DNA sequences.
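
    The rotation idea can be illustrated with a deliberately simplified stand-in. This Python sketch is not the CSA algorithm, which chains non-repeated longest common subsequences; it assumes instead that some k-mer (hypothetical default `k=4`) occurs exactly once in every circular sequence, and it rotates each sequence to start at that k-mer so that rotated copies of the same molecule line up before linear alignment.

```python
def circular_kmer_positions(seq, k):
    """Map each k-mer of a circular sequence to the list of its start positions."""
    ext = seq + seq[:k - 1]              # wrap around the origin of the circle
    positions = {}
    for i in range(len(seq)):
        positions.setdefault(ext[i:i + k], []).append(i)
    return positions


def rotate_to_common_anchor(seqs, k=4):
    """Rotate every circular sequence to start at a k-mer that is unique in all of them."""
    maps = [circular_kmer_positions(s, k) for s in seqs]
    unique_sets = [{km for km, pos in m.items() if len(pos) == 1} for m in maps]
    common = set.intersection(*unique_sets)
    if not common:
        raise ValueError("no k-mer occurs exactly once in every sequence")
    anchor = min(common)                 # deterministic choice among the candidates
    rotated = []
    for s, m in zip(seqs, maps):
        start = m[anchor][0]
        rotated.append(s[start:] + s[:start])
    return rotated, anchor
```

    Applied to three arbitrarily rotated copies of `ACGTTGCAAGCT`, the sketch picks the anchor `AAGC` and returns three identical linear strings.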

  5. Machinery running state identification based on discriminant semi-supervised local tangent space alignment for feature fusion and extraction

    NASA Astrophysics Data System (ADS)

    Su, Zuqiang; Xiao, Hong; Zhang, Yi; Tang, Baoping; Jiang, Yonghua

    2017-04-01

    Extraction of sensitive features is a challenging but key task in data-driven machinery running state identification. To address this problem, a method for machinery running state identification that applies discriminant semi-supervised local tangent space alignment (DSS-LTSA) for feature fusion and extraction is proposed. First, to extract more distinct features, the vibration signals are decomposed by wavelet packet decomposition (WPD), and a mixed-domain feature set consisting of statistical features, autoregressive (AR) model coefficients, instantaneous amplitude Shannon entropy and the WPD energy spectrum is extracted to comprehensively characterize the properties of machinery running states. Then, the mixed-domain feature set is fed into DSS-LTSA for feature fusion and extraction to eliminate redundant information and interference noise. The proposed DSS-LTSA can extract intrinsic structure information from both labeled and unlabeled state samples, and as a result the over-fitting problem of supervised manifold learning and the blindness problem of unsupervised manifold learning are overcome. Simultaneously, class discrimination information is integrated into the dimension reduction process in a semi-supervised manner to improve the sensitivity of the extracted fusion features. Lastly, the extracted fusion features are fed into a pattern recognition algorithm to achieve running state identification. The effectiveness of the proposed method is verified in a running state identification case study on a gearbox, and the results confirm the improved accuracy of running state identification.
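
    As an illustration of the WPD energy-spectrum features, the following NumPy sketch builds a full wavelet packet tree and returns the energy of each terminal node, its normalized distribution, and the Shannon entropy of that distribution. The Haar filter, the decomposition level and the natural (not frequency-sorted) node order are assumptions; the abstract does not state which wavelet was used, and DSS-LTSA itself is not reproduced.

```python
import numpy as np

def haar_wavelet_packet(x, level):
    """Full Haar wavelet-packet tree; returns the 2**level leaf coefficient arrays."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        nxt = []
        for c in nodes:
            if len(c) % 2:
                raise ValueError("signal length must be divisible by 2**level")
            even, odd = c[0::2], c[1::2]
            nxt.append((even + odd) / np.sqrt(2))   # approximation branch
            nxt.append((even - odd) / np.sqrt(2))   # detail branch
        nodes = nxt
    return nodes


def wpd_energy_features(x, level=3):
    """Node energies, relative energy spectrum, and Shannon entropy of that spectrum."""
    energies = np.array([np.sum(c ** 2) for c in haar_wavelet_packet(x, level)])
    p = energies / energies.sum()
    nz = p[p > 0]                                   # 0 * log(0) is taken as 0
    entropy = -np.sum(nz * np.log(nz))
    return energies, p, entropy
```

    Because the Haar packet transform is orthonormal, the node energies sum to the energy of the input signal, and the entropy is bounded by log(2**level).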

  6. Reconstructing DNA copy number by joint segmentation of multiple sequences

    PubMed Central

    2012-01-01

    Background Variations in DNA copy number carry information on the modalities of genome evolution and mis-regulation of DNA replication in cancer cells. Their study can help localize tumor suppressor genes, distinguish different populations of cancerous cells, and identify genomic variations responsible for disease phenotypes. A number of different high throughput technologies can be used to identify copy number variable sites, and the literature documents multiple effective algorithms. We focus here on the specific problem of detecting regions where variation in copy number is relatively common in the sample at hand. This problem encompasses the cases of copy number polymorphisms, related samples, technical replicates, and cancerous sub-populations from the same individual. Results We present a segmentation method named generalized fused lasso (GFL) to reconstruct copy number variant regions. GFL is based on penalized estimation and is capable of processing multiple signals jointly. Our approach is computationally very attractive and leads to sensitivity and specificity levels comparable to those of state-of-the-art specialized methodologies. We illustrate its applicability with simulated and real data sets. Conclusions The flexibility of our framework makes it applicable to data obtained with a wide range of technologies. Its versatility and speed make GFL particularly useful in the initial screening stages of large data sets. PMID:22897923
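
    To make "penalized estimation" concrete, the sketch below evaluates an objective of the generic kind used for joint segmentation: a squared-error fit plus a lasso term, a per-sample fused-lasso term, and a group term that rewards breakpoints shared across samples. This exact form, the weights `lam1`, `lam2`, `lam3`, and the samples-by-positions layout are assumptions for illustration; the paper's penalty and its optimization algorithm may differ, and the code only scores a candidate estimate rather than minimizing the objective.

```python
import numpy as np

def gfl_objective(Y, B, lam1, lam2, lam3):
    """Penalized least squares with lasso, fused-lasso and group-fused terms.

    Y, B: arrays of shape (n_samples, n_positions); rows are samples and
    columns are ordered probe positions along the genome.
    """
    Y, B = np.asarray(Y, float), np.asarray(B, float)
    fit = 0.5 * np.sum((Y - B) ** 2)
    sparsity = lam1 * np.sum(np.abs(B))                    # shrink toward copy-neutral 0
    jumps = np.diff(B, axis=1)                             # changes between neighboring probes
    fused = lam2 * np.sum(np.abs(jumps))                   # piecewise-constant within each sample
    group = lam3 * np.sum(np.linalg.norm(jumps, axis=0))   # cheaper when samples jump together
    return fit + sparsity + fused + group
```

    For example, with only the group term active, two samples that change copy number at the same probe incur a penalty of √2, whereas the same two jumps at different probes cost 2.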

  7. FASMA: a service to format and analyze sequences in multiple alignments.

    PubMed

    Costantini, Susan; Colonna, Giovanni; Facchiano, Angelo M

    2007-12-01

    Multiple sequence alignments are successfully applied in many studies for understanding the structural and functional relations among single nucleic acids and protein sequences as well as whole families. Because of the rapid growth of sequence databases, multiple sequence alignments can often be very large and difficult to visualize and analyze. We offer a new service aimed to visualize and analyze the multiple alignments obtained with different external algorithms, with new features useful for the comparison of the aligned sequences as well as for the creation of a final image of the alignment. The service is named FASMA and is available at http://bioinformatica.isa.cnr.it/FASMA/.

  8. Mango: multiple alignment with N gapped oligos.

    PubMed

    Zhang, Zefeng; Lin, Hao; Li, Ming

    2008-06-01

    Multiple sequence alignment is a classical and challenging task. The problem is NP-hard. The full dynamic programming takes too much time. The progressive alignment heuristics adopted by most state-of-the-art works suffer from the "once a gap, always a gap" phenomenon. Is there a radically new way to do multiple sequence alignment? In this paper, we introduce a novel and orthogonal multiple sequence alignment method, using both multiple optimized spaced seeds and new algorithms to handle these seeds efficiently. Our new algorithm processes information of all sequences as a whole and tries to build the alignment vertically, avoiding problems caused by the popular progressive approaches. Because the optimized spaced seeds have proved significantly more sensitive than the consecutive k-mers, the new approach promises to be more accurate and reliable. To validate our new approach, we have implemented MANGO: Multiple Alignment with N Gapped Oligos. Experiments were carried out on large 16S RNA benchmarks, showing that MANGO compares favorably, in both accuracy and speed, against state-of-the-art multiple sequence alignment methods, including ClustalW 1.83, MUSCLE 3.6, MAFFT 5.861, ProbConsRNA 1.11, Dialign 2.2.1, DIALIGN-T 0.2.1, T-Coffee 4.85, POA 2.0, and Kalign 2.0. We have further demonstrated the scalability of MANGO on very large datasets of repeat elements. MANGO can be downloaded at http://www.bioinfo.org.cn/mango/ and is free for academic usage.
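
    The advantage of spaced seeds over consecutive k-mers can be seen in a small hit-finding sketch. In this Python illustration, a seed is a string of '1' (must match) and '0' (don't care) positions; the pattern `1101011` is an arbitrary example, not one of Mango's optimized seeds, and Mango's vertical alignment construction is not shown.

```python
def spaced_seed_hits(s, t, seed="1101011"):
    """Return (i, j) offsets where s and t agree at every '1' position of the seed."""
    care = [k for k, c in enumerate(seed) if c == "1"]
    span = len(seed)
    # Index every seed-masked word of t by its matching positions.
    index = {}
    for j in range(len(t) - span + 1):
        key = "".join(t[j + k] for k in care)
        index.setdefault(key, []).append(j)
    # Look up every seed-masked word of s.
    hits = []
    for i in range(len(s) - span + 1):
        key = "".join(s[i + k] for k in care)
        for j in index.get(key, []):
            hits.append((i, j))
    return hits
```

    With one substitution at the third position of `ACGTACGTAC` versus `ACCTACGTAC`, the spaced seed still reports a hit at offset (0, 0), while a contiguous seed of the same span (`1111111`) does not.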

  9. Evaluation of global sequence comparison and one-to-one FASTA local alignment in regulatory allergenicity assessment of transgenic proteins in food crops.

    PubMed

    Song, Ping; Herman, Rod A; Kumpatla, Siva

    2014-09-01

    To address the high false positive rate of the >35% identity over 80 amino acids criterion in the regulatory assessment of transgenic proteins for potential allergenicity, and the change of E-value with database size, the Needleman-Wunsch global sequence alignment and a one-to-one (1:1) local FASTA search (one protein in the target database at a time) were evaluated by comparing proteins randomly selected from Arabidopsis, rice, corn, and soybean with known allergens in a peer-reviewed allergen database (http://www.allergenonline.org/). Compared with the >35%/80aa+ search approach, the false positive rate (measured by the specificity rate for identifying true allergens) was reduced by a 1:1 global sequence alignment with a cut-off threshold of ≥30% identity and by a 1:1 FASTA local alignment with a cut-off E-value of ≤1.0E-09, while the same sensitivity was maintained. Hence, a 1:1 sequence comparison, especially using the FASTA local alignment tool with a biologically relevant E-value of 1.0E-09 as a threshold, is recommended for the regulatory assessment of sequence identities between transgenic proteins in food crops and known allergens.
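
    The dependence of E-values on database size follows from the Karlin-Altschul formula E = K·m·n·e^(-λS), where m is the query length, n the total database length and S the alignment score. The Python sketch below is illustrative only: FASTA typically estimates its statistical parameters from the library score distribution rather than using fixed constants, and the λ and K values in the example are placeholders. It shows why a 1:1 comparison, whose search space is a single target protein, gives E-values that do not drift as the allergen database grows.

```python
import math

def karlin_altschul_evalue(score, query_len, db_len, lam, k):
    """Expected number of chance local alignments scoring >= score: E = K*m*n*exp(-lambda*S)."""
    return k * query_len * db_len * math.exp(-lam * score)
```

    Doubling `db_len` doubles the E-value for the same score, whereas setting `db_len` to the length of one target protein fixes the search space regardless of how many allergens the database holds.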

  10. Combining many multiple alignments in one improved alignment.

    PubMed

    Bucka-Lassen, K; Caprani, O; Hein, J

    1999-02-01

    The high complexity of the multiple sequence alignment problem has led to many different heuristic algorithms that attempt to find a solution in what would be considered a reasonable amount of computation time and space. Very few of these heuristics produce results that are guaranteed always to lie within a certain distance of an optimal solution (given a measure of quality, e.g. parsimony). Most practical heuristics cannot guarantee this, but nevertheless perform well for certain cases. An alignment obtained with one of these heuristics and with a bad overall score is not necessarily unusable, though: it might contain important information on how substrings should be aligned. This paper presents a method that extracts qualitatively good sub-alignments from a set of multiple alignments and combines these into a new, often improved alignment. The algorithm is implemented as a variant of the traditional dynamic programming technique. An implementation of ComAlign (the algorithm that combines multiple alignments) has been run on several sets of artificially generated sequences and a set of 5S RNA sequences. To assess the quality of the alignments obtained, the results have been compared with the output of MSA 2.1 (Gupta et al., Proceedings of the Sixth Annual Symposium on Combinatorial Pattern Matching, 1995; Kececioglu et al., http://www.techfak.uni-bielefeld.de/bcd/Lectures/kececioglu.html, 1995). In all cases, ComAlign was able to produce a solution with a score comparable to the solution obtained by MSA. The results also show that ComAlign actually does combine parts from different alignments and does not just select the best of them. The C source code of ComAlign (a Smalltalk version is being worked on) and of the other programs implemented in this context is free and available on the WWW (http://www.daimi.au.dk/~caprani). klaus@bucka-lassen.dk; jotun@pop.bio.au.dk; ocaprani@daimi.au.dk
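
    The abstract compares alignments by score without defining the score. As an assumption, the following Python sketch computes a simple sum-of-pairs (SP) score with linear gap costs, a common objective for MSA-style programs; the match, mismatch and gap values are hypothetical, and ComAlign's dynamic-programming combination step is not shown.

```python
from itertools import combinations

def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of a gapped alignment given as equal-length strings.

    Gap-gap pairs score 0; a residue paired with a gap costs `gap`.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("rows must have equal length")
    total = 0
    for col in range(length):
        for a, b in combinations((row[col] for row in alignment), 2):
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                total += gap
            else:
                total += match if a == b else mismatch
    return total
```

    Under these settings, the rows `AC-T`, `ACGT` and `A-GT` score 0, while three identical `ACGT` rows score 12.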

  11. Improved variant discovery through local re-alignment of short-read next-generation sequencing data using SRMA

    PubMed Central

    2010-01-01

    A primary component of next-generation sequencing analysis is to align short reads to a reference genome, with each read aligned independently. However, reads that observe the same non-reference DNA sequence are highly correlated and can be used to better model the true variation in the target genome. A novel short-read micro re-aligner, SRMA, that leverages this correlation to better resolve a consensus of the underlying DNA sequence of the targeted genome is described here. PMID:20932289

  12. ALIGN_MTX--an optimal pairwise textual sequence alignment program, adapted for using in sequence-structure alignment.

    PubMed

    Vishnepolsky, Boris; Pirtskhalava, Malak

    2009-06-01

    The presented program ALIGN_MTX aligns two textual sequences, allowing sequence elements to be designated by strings of several characters and allowing arbitrary user-defined substitution matrices. It can be used not only for aligning amino acid and nucleotide sequences but also for sequence-structure alignment in threading, for amino acid sequence alignment using a previously known PSSM, and in other cases where biological or non-biological textual sequences must be aligned. This distinguishes it from most similar alignment programs, which as a rule align only amino acid or nucleotide sequences represented as strings of single alphabetic characters. ALIGN_MTX is provided as a downloadable zip archive at http://www.imbbp.org/software/ALIGN_MTX/ and is free to use. As an application of the program, the alignment quality obtained with different types of substitution matrix on sets of distantly related protein pairs was compared. The threading matrix SORDIS, based on side-chain orientation relative to hydrophobic core centers, the evolutionary substitution matrix BLOSUM, and position-specific score matrices (PSSM) derived from multiple sequence alignments were tested for alignment accuracy. The PSSM matrix showed the best performance, but on a reduced set with lower sequence similarity the threading matrix SORDIS performed equally well, and a combined potential of SORDIS and PSSM was shown to improve alignment quality for evolutionarily distant protein pairs.
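
    The key idea, alignment over elements that may be several characters long and scored by an arbitrary user matrix, can be sketched as a Needleman-Wunsch alignment over Python lists. The linear gap penalty, the dictionary-based matrix and the token names in the usage example are assumptions; ALIGN_MTX's own input format and gap model are not described in the abstract.

```python
def global_align(a, b, subst, gap=-4):
    """Needleman-Wunsch over token lists with an arbitrary substitution dict.

    Tokens may be multi-character strings; subst maps (x, y) -> score.
    Returns (score, aligned_a, aligned_b), with None marking gaps.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + subst[(a[i - 1], b[j - 1])],
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + subst[(a[i - 1], b[j - 1])]:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append(None)
            i -= 1
        else:
            out_a.append(None)
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]
```

    For example, hypothetical tokens such as `"Hb"` (helix, buried) and `"Eb"` (strand, buried) could encode structural states in a threading-style alignment.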

  13. SAGA: sequence alignment by genetic algorithm.

    PubMed Central

    Notredame, C; Higgins, D G

    1996-01-01

    We describe a new approach to multiple sequence alignment using genetic algorithms and an associated software package called SAGA. The method involves evolving a population of alignments in a quasi-evolutionary manner and gradually improving the fitness of the population as measured by an objective function which measures multiple alignment quality. SAGA uses an automatic scheduling scheme to control the usage of 22 different operators for combining alignments or mutating them between generations. When used to optimise the well known sums of pairs objective function, SAGA performs better than some of the widely used alternative packages. This is seen with respect to the ability to achieve an optimal solution and with regard to the accuracy of alignment by comparison with reference alignments based on sequences of known tertiary structure. The general attraction of the approach is the ability to optimise any objective function that one can invent. PMID:8628686
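
    The evolve-and-select loop can be sketched in a few lines. This Python sketch is a much reduced stand-in for SAGA, not its implementation: alignments are lists of equal-length gapped rows, fitness is a simple sum-of-pairs score with hypothetical weights, the only operator is a gap-shift mutation (SAGA schedules 22 operators, including crossovers), and selection keeps the better half of the population each generation.

```python
import random
from itertools import combinations

def sp_score(rows, match=2, mismatch=-1, gap=-2):
    """Sum-of-pairs fitness; gap-gap pairs score 0."""
    total = 0
    for col in zip(*rows):
        for a, b in combinations(col, 2):
            if a == "-" and b == "-":
                continue
            total += gap if "-" in (a, b) else (match if a == b else mismatch)
    return total


def random_row(seq, length, rng):
    """Insert gaps at random positions so that the row has the given length."""
    gaps = set(rng.sample(range(length), length - len(seq)))
    it = iter(seq)
    return "".join("-" if k in gaps else next(it) for k in range(length))


def mutate(rows, rng):
    """Gap-shift operator: move one gap within one row to a new position."""
    rows = list(rows)
    r = rng.randrange(len(rows))
    gap_pos = [k for k, c in enumerate(rows[r]) if c == "-"]
    if not gap_pos:
        return rows
    chars = list(rows[r])
    chars.pop(rng.choice(gap_pos))
    chars.insert(rng.randrange(len(chars) + 1), "-")
    rows[r] = "".join(chars)
    return rows


def evolve(seqs, length, pop_size=30, generations=200, seed=0):
    rng = random.Random(seed)
    pop = [[random_row(s, length, rng) for s in seqs] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=sp_score, reverse=True)
        survivors = pop[: pop_size // 2]            # elitist truncation selection
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=sp_score)
```

    Because the better half always survives, the best fitness never decreases from one generation to the next, and every row still spells its original sequence once gaps are removed.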

  14. Pairwise statistical significance of local sequence alignment using sequence-specific and position-specific substitution matrices.

    PubMed

    Agrawal, Ankit; Huang, Xiaoqiu

    2011-01-01

    Pairwise sequence alignment is a central problem in bioinformatics, which forms the basis of various other applications. Two related sequences are expected to have a high alignment score, but relatedness is usually judged by statistical significance rather than by alignment score. Recently, it was shown that pairwise statistical significance gives promising results as an alternative to database statistical significance for getting individual significance estimates of pairwise alignment scores. The improvement was mainly attributed to making the statistical significance estimation process more sequence-specific and database-independent. In this paper, we use sequence-specific and position-specific substitution matrices to derive the estimates of pairwise statistical significance, which is expected to use more sequence-specific information in estimating pairwise statistical significance. Experiments on a benchmark database with sequence-specific substitution matrices at different levels of sequence-specific contribution were conducted, and results confirm that using sequence-specific substitution matrices for estimating pairwise statistical significance is significantly better than using a standard matrix like BLOSUM62, and than database statistical significance estimates reported by popular database search programs like BLAST, PSI-BLAST (without pretrained PSSMs), and SSEARCH on a benchmark database, but with pretrained PSSMs, PSI-BLAST results are significantly better. Further, using position-specific substitution matrices for estimating pairwise statistical significance gives significantly better results even than PSI-BLAST using pretrained PSSMs.
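
    A pairwise (rather than database) significance estimate can be sketched by comparing an observed local alignment score with the scores obtained against shuffled versions of the subject sequence. This Python sketch uses a plain Smith-Waterman score with hypothetical match, mismatch and gap values and an empirical p-value; the approach described above additionally fits a score distribution and uses sequence-specific or position-specific substitution matrices, neither of which is reproduced here.

```python
import random

def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def shuffle_pvalue(a, b, n_shuffles=200, seed=0):
    """Empirical pairwise significance: share of shuffled b scoring >= the observed score."""
    rng = random.Random(seed)
    observed = sw_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)                 # preserves b's composition
        if sw_score(a, "".join(letters)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_shuffles + 1)
```

    Shuffling preserves the composition of the subject sequence, which is what makes the estimate specific to the pair being compared rather than to the database.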

  15. Local photoelectric conversion properties of titanyl-phthalocyanine (TiOPc) coated aligned ZnO nanorods.

    PubMed

    Heng, Liping; Tian, Dongliang; Chen, Long; Su, Junxin; Zhai, Jin; Han, Dong; Jiang, Lei

    2010-02-21

    The direct electrical pathway for rapid collection of charge carriers generated in aligned TiOPc/ZnO nanorod is visualized by using photoconductive atomic force microscopy (pc-AFM), which can provide theoretical guidance for preparing high efficiency solar cells.

  16. Multiple protein structure alignment.

    PubMed Central

    Taylor, W. R.; Flores, T. P.; Orengo, C. A.

    1994-01-01

    A method was developed to compare protein structures and to combine them into a multiple structure consensus. Previous methods of multiple structure comparison have only concatenated pairwise alignments or produced a consensus structure by averaging coordinate sets. The current method is a fusion of the fast structure comparison program SSAP and the multiple sequence alignment program MULTAL. As in MULTAL, structures are progressively combined, producing intermediate consensus structures that are compared directly to each other and all remaining single structures. This leads to a hierarchic "condensation," continually evaluated in the light of the emerging conserved core regions. Following the SSAP approach, all interatomic vectors were retained with well-conserved regions distinguished by coherent vector bundles (the structural equivalent of a conserved sequence position). Each bundle of vectors is summarized by a resultant, whereas vector coherence is captured in an error term, which is the only distinction between conserved and variable positions. Resultant vectors are used directly in the comparison, which is weighted by their error values, giving greater importance to the matching of conserved positions. The resultant vectors and their errors can also be used directly in molecular modeling. Applications of the method were assessed by the quality of the resulting sequence alignments, phylogenetic tree construction, and databank scanning with the consensus. Visual assessment of the structural superpositions and consensus structure for various well-characterized families confirmed that the consensus had identified a reasonable core. PMID:7849601
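
    The bundle summary can be illustrated with NumPy. In this sketch, the resultant is taken as the mean of the equivalent vectors and the error term as their root-mean-square deviation from it; both choices, and the `bundle_similarity` weighting, are hypothetical simplifications rather than the exact SSAP/MULTAL formulation.

```python
import numpy as np

def summarize_bundle(vectors):
    """Resultant (mean) vector and RMS deviation of a bundle of equivalent vectors."""
    v = np.asarray(vectors, dtype=float)
    resultant = v.mean(axis=0)
    error = np.sqrt(np.mean(np.sum((v - resultant) ** 2, axis=1)))
    return resultant, error


def bundle_similarity(a, b):
    """Hypothetical score: closeness of resultants, down-weighted by bundle error."""
    ra, ea = summarize_bundle(a)
    rb, eb = summarize_bundle(b)
    weight = 1.0 / (1.0 + ea + eb)          # incoherent (variable) bundles count for less
    return weight * float(np.exp(-np.linalg.norm(ra - rb)))
```

    A perfectly coherent bundle has zero error and keeps full weight, so matches between conserved positions dominate the comparison.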

  17. SigniSite: Identification of residue-level genotype-phenotype correlations in protein multiple sequence alignments

    PubMed Central

    Jessen, Leon Eyrich; Hoof, Ilka; Lund, Ole; Nielsen, Morten

    2013-01-01

    Identifying which mutation(s) within a given genotype is responsible for an observable phenotype is important in many aspects of molecular biology. Here, we present SigniSite, an online application for subgroup-free residue-level genotype–phenotype correlation. In contrast to similar methods, SigniSite does not require any pre-definition of subgroups or binary classification. Input is a set of protein sequences where each sequence has an associated real number, quantifying a given phenotype. SigniSite will then identify which amino acid residues are significantly associated with the data set phenotype. As output, SigniSite displays a sequence logo, depicting the strength of the phenotype association of each residue and a heat-map identifying ‘hot’ or ‘cold’ regions. SigniSite was benchmarked against SPEER, a state-of-the-art method for the prediction of specificity determining positions (SDP) using a set of human immunodeficiency virus protease-inhibitor genotype–phenotype data and corresponding resistance mutation scores from the Stanford University HIV Drug Resistance Database, and a data set of protein families with experimentally annotated SDPs. For both data sets, SigniSite was found to outperform SPEER. SigniSite is available at: http://www.cbs.dtu.dk/services/SigniSite/. PMID:23761454
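
    The core idea of relating residues to a real-valued phenotype without predefined subgroups can be sketched with ranks. The Python code below is an assumption-based illustration, not SigniSite's published statistic: it ranks the phenotype values and, for each residue at each alignment column, computes a rank-sum z-score (the normal approximation used by the Mann-Whitney test, without tie correction). Positive scores indicate residues associated with high phenotype values; multiple-testing correction is omitted.

```python
import math

def rank(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def residue_zscores(alignment, phenotype):
    """Rank-sum z-score for every (column, residue) in a set of aligned sequences."""
    n = len(alignment)
    ranks = rank(phenotype)
    scores = {}
    for col in range(len(alignment[0])):
        for res in {seq[col] for seq in alignment}:
            idx = [s for s in range(n) if alignment[s][col] == res]
            m = len(idx)
            if m == n:
                continue                     # fully conserved column: nothing to contrast
            mean = m * (n + 1) / 2
            sd = math.sqrt(m * (n - m) * (n + 1) / 12)
            scores[(col, res)] = (sum(ranks[s] for s in idx) - mean) / sd
    return scores
```

    In a toy alignment where only the sequences carrying R at the second column have high phenotype values, that residue receives a positive z-score and K at the same column an equal negative one.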

  18. On the influence of localized electric fields and field-aligned currents associated with polar arcs on the global potential distribution

    SciTech Connect

    Marklund, G.T.; Blomberg, L.G.

    1991-08-01

    The influence of localized field-aligned currents associated with intense transpolar arcs, which mostly occur during periods of northward interplanetary magnetic field (IMF), on the global electrodynamics has been investigated using a numerical simulation model. Idealized field-aligned current distributions representing both the region 1/2 system of the auroral oval and the transpolar arc, as well as a corresponding ionospheric conductivity distribution, are fed into the model to calculate the potential distributions. The transpolar arc has been represented by a few alternative field-aligned current distributions which differ in the way the downward return currents are distributed in the ionosphere. If the conductivity of the main auroral oval is comparable to that of the polar arc, the dusk cell will have two local potential minima and thus a region of weak antisunward convection in between. Depending on the direction of the polar arc current sheets, the dawn-dusk electric field will either be reversed (or weakened) or intensified at the location of the transpolar arc. The presence of a reversal depends, however, not only on the relative magnitude of the polar arc currents and those of the region 1/2 system but also on the characteristics of the acceleration region and of the conductivity distribution associated with the polar arc. Comparisons are made between the model results and Viking electric field data for a number of polar arc crossings to reveal the most common electrodynamical signatures of these auroral phenomena.

  19. Simultaneous Alignment and Folding of Protein Sequences

    PubMed Central

    Waldispühl, Jérôme; O'Donnell, Charles W.; Will, Sebastian; Devadas, Srinivas; Backofen, Rolf

    2014-01-01

    Accurate comparative analysis of low-homology proteins remains a difficult challenge in computational biology, especially for sequence alignment and consensus folding problems. We present partiFold-Align, the first algorithm for simultaneous alignment and consensus folding of unaligned protein sequences; the algorithm's complexity is polynomial in time and space. Algorithmically, partiFold-Align exploits sparsity in the set of super-secondary structure pairings and alignment candidates to achieve an effectively cubic running time for simultaneous pairwise alignment and folding. We demonstrate the efficacy of these techniques on transmembrane β-barrel proteins, an important yet difficult class of proteins with few known three-dimensional structures. Testing against structurally derived sequence alignments, partiFold-Align significantly outperforms state-of-the-art pairwise and multiple sequence alignment tools in the most difficult low-sequence homology case. It also improves secondary structure prediction where current approaches fail. Importantly, partiFold-Align requires no prior training. These general techniques are widely applicable to many more protein families (partiFold-Align is available at http://partifold.csail.mit.edu/). PMID:24766258

  20. Imaging Analysis of Collagen Fiber Networks in Cusps of Porcine Aortic Valves: Effect of their Local Distribution and Alignment on Valve Functionality

    PubMed Central

    Mega, Mor; Marom, Gil; Halevi, Rotem; Hamdan, Ashraf; Bluestein, Danny; Haj-Ali, Rami

    2015-01-01

    The cusps of native Aortic Valve (AV) are composed of collagen bundles embedded in soft tissue, creating a heterogeneous tissue with asymmetric alignment in each cusp. This study compares native collagen fiber networks (CFNs) with a goal to better understand their influence on stress distribution and valve kinematics. Images of CFNs from five porcine tricuspid AVs are analyzed and fluid-structure interaction models are generated based on them. Although the valves had similar overall kinematics, the CFNs had distinctive influence on local mechanics. Regions with a dilute CFN are more prone to damage since they are subjected to higher stress magnitudes. PMID:26406926

  1. SISEQ: manipulation of multiple sequence and large database files for common platforms.

    PubMed

    Sato, N

    2000-02-01

    A multiple sequence file converter for common platforms, SISEQ, is described, which extracts DNA sequences that correspond to the CDS or RNA fields of a large database file and performs subsequent multi-sequence conversions for phylogenetic or molecular biological analysis. A command-line interface, a GUI and a script-driven operation mode are provided. The program is freely available to academic users in the form of a Macintosh FAT binary, DOS executable, or UNIX source code at http://www.molbiol.saitama-u.ac.jp/~naoki/Software.html. naokisat@molbiol.saitama-u.ac.jp
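
    A multi-sequence conversion of this kind can be sketched as parsing FASTA and writing another format. The abstract does not list SISEQ's formats, so the choice of relaxed sequential PHYLIP as the target, and the rule that each name is the first word of its FASTA header, are assumptions for illustration.

```python
def parse_fasta(text):
    """Return a list of (name, sequence) records from FASTA text."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_phylip(records):
    """Format aligned records as relaxed sequential PHYLIP."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("PHYLIP requires sequences of equal (aligned) length")
    width = max(len(name) for name, _ in records) + 2
    lines = [f"{len(records)} {lengths.pop()}"]
    lines += [name.ljust(width) + seq for name, seq in records]
    return "\n".join(lines) + "\n"
```

    Multi-line FASTA records are joined before conversion, and unequal lengths are rejected because PHYLIP expects an alignment.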

  2. webPRC: the Profile Comparer for alignment-based searching of public domain databases.

    PubMed

    Brandt, Bernd W; Heringa, Jaap

    2009-07-01

    Profile-profile methods are well suited to detect remote evolutionary relationships between protein families. Profile Comparer (PRC) is an existing stand-alone program for scoring and aligning hidden Markov models (HMMs), which are based on multiple sequence alignments. Since PRC compares profile HMMs instead of sequences, it can be used to find distant homologues. For this purpose, PRC is used by, for example, the CATH and Pfam-domain databases. As PRC is a profile comparer, it only reports profile HMM alignments and does not produce multiple sequence alignments. We have developed webPRC server, which makes it straightforward to search for distant homologues or similar alignments in a number of domain databases. In addition, it provides the results both as multiple sequence alignments and aligned HMMs. Furthermore, the user can view the domain annotation, evaluate the PRC hits with the Jalview multiple alignment editor and generate logos from the aligned HMMs or the aligned multiple alignments. Thus, this server assists in detecting distant homologues with PRC as well as in evaluating and using the results. The webPRC interface is available at http://www.ibi.vu.nl/programs/prcwww/.

  3. Making Health System Performance Measurement Useful to Policy Makers: Aligning Strategies, Measurement and Local Health System Accountability in Ontario

    PubMed Central

    Veillard, Jeremy; Huynh, Tai; Ardal, Sten; Kadandale, Sowmya; Klazinga, Niek S.; Brown, Adalsteinn D.

    2010-01-01

    This study examined the experience of the Ontario Ministry of Health and Long-Term Care in enhancing its stewardship and performance management role by developing a health system strategy map and a strategy-based scorecard through a process of policy reviews and expert consultations, and linking them to accountability agreements. An evaluation of the implementation and of the effects of the policy intervention has been carried out through direct policy observation over three years, document analysis, interviews with decision-makers and systematic discussion of findings with other authors and external reviewers. Cascading strategies at health system and local health system levels were identified, and a core set of health system and local health system performance indicators was selected and incorporated into accountability agreements with the Local Health Integration Networks. Despite the persistence of such challenges as measurement limitations and lack of systematic linkage to decision-making processes, these activities helped to strengthen substantially the ministry's performance management function. PMID:21286268

  4. DNAAlignEditor: DNA alignment editor tool

    PubMed Central

    Sanchez-Villeda, Hector; Schroeder, Steven; Flint-Garcia, Sherry; Guill, Katherine E; Yamasaki, Masanori; McMullen, Michael D

    2008-01-01

    Background With advances in DNA re-sequencing methods and Next-Generation parallel sequencing approaches, there has been a large increase in genomic efforts to define and analyze the sequence variability present among individuals within a species. For very polymorphic species such as maize, this has led to a need for intuitive, user-friendly software that aids the biologist, who often has limited programming experience, in tracking, editing, displaying, and exporting multiple individual sequence alignments. To fill this need we have developed a novel DNA alignment editor. Results We have generated a nucleotide sequence alignment editor (DNAAlignEditor) that provides an intuitive, user-friendly interface for manual editing of multiple sequence alignments with functions for input, editing, and output of sequence alignments. The color-coding of nucleotide identity and the display of associated quality scores aid in the manual alignment editing process. DNAAlignEditor works as a client/server tool having two main components: a relational database that collects the processed alignments and a user interface connected to the database through universal data access connectivity drivers. DNAAlignEditor can be used either as a stand-alone application or as a network application with multiple users concurrently connected. Conclusion We anticipate that this software will be of general interest to biologists and population geneticists in editing DNA sequence alignments and analyzing natural sequence variation regardless of species, and will be particularly useful for manual alignment editing of sequences in species with high levels of polymorphism. PMID:18366684

  5. Multiple whole-genome alignments without a reference organism.

    PubMed

    Dubchak, Inna; Poliakov, Alexander; Kislyuk, Andrey; Brudno, Michael

    2009-04-01

    Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes rely either on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed the best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.

  6. Multiple Whole Genome Alignments Without a Reference Organism

    SciTech Connect

    Dubchak, Inna; Poliakov, Alexander; Kislyuk, Andrey; Brudno, Michael

    2009-01-16

    Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes rely either on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed the best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.

  7. The genetic data environment: an expandable GUI for multiple sequence analysis.

    PubMed

    Smith, S W; Overbeek, R; Woese, C R; Gilbert, W; Gillevet, P M

    1994-12-01

    An X-Windows-based graphic user interface is presented which allows the seamless integration of numerous existing biomolecular programs into a single analysis environment. This environment is based on a core multiple sequence editor that is linked to external programs by a user-expandable menu system and is supported on Sun and DEC workstations. There is no limitation to the number of external functions that can be linked to the interface. The length and number of sequences that can be handled are limited only by the size of virtual memory present on the workstation. The sequence data itself is used as the reference point from which analysis is done, and scalable graphic views are supported. It is suggested that future software development utilizing this expandable, user-defined menu system and the I/O linkage of external programs will allow biologists to easily integrate expertise from disparate fields into a single environment.

  8. ReLA, a local alignment search tool for the identification of distal and proximal gene regulatory regions and their conserved transcription factor binding sites

    PubMed Central

    González, Santi; Montserrat-Sentís, Bàrbara; Sánchez, Friman; Puiggròs, Montserrat; Blanco, Enrique; Ramirez, Alex; Torrents, David

    2012-01-01

    Motivation: The prediction and annotation of the genomic regions involved in gene expression has been largely explored. Most of the energy has been devoted to the development of approaches that detect transcription start sites, leaving the identification of regulatory regions and their functional transcription factor binding sites (TFBSs) largely unexplored and with important quantitative and qualitative methodological gaps. Results: We have developed ReLA (for REgulatory region Local Alignment tool), a unique tool optimized with the Smith–Waterman algorithm that allows local searches of conserved TFBS clusters and the detection of regulatory regions proximal to genes and enhancer regions. ReLA's performance shows specificities of 81 and 50% when tested on experimentally validated proximal regulatory regions and enhancers, respectively. Availability: The source code of ReLA is freely available and can be remotely used through our web server under http://www.bsc.es/cg/rela. Contact: david.torrents@bsc.es Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22253291
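    The Smith–Waterman algorithm named above finds the best-scoring local alignment by clamping dynamic-programming scores at zero and tracing back from the highest-scoring cell. The minimal Python sketch below illustrates the generic algorithm on plain nucleotide strings with a linear gap penalty; it is not ReLA's implementation (which, per the abstract, searches for conserved clusters of TFBSs), and the function name and the scores match=2, mismatch=-1, gap=-2 are arbitrary assumptions.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # row 0 and column 0 stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,                                   # start a new local alignment
                          H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,                   # gap in b
                          H[i][j - 1] + gap)                   # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

    For example, `smith_waterman("TTACGTAA", "GGACGTCC")` returns `(8, "ACGT", "ACGT")`: the shared core is aligned while the dissimilar flanks are left out, which is what distinguishes local from global alignment.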

  9. Alignment validation

    SciTech Connect

    ALICE; ATLAS; CMS; LHCb; Golling, Tobias

    2008-09-06

    The four experiments, ALICE, ATLAS, CMS and LHCb, are currently under construction at CERN. They will study the products of proton-proton collisions at the Large Hadron Collider. All experiments are equipped with sophisticated tracking systems, unprecedented in size and complexity. Full exploitation of both the inner detector and the muon system requires an accurate alignment of all detector elements. Alignment information is deduced from dedicated hardware alignment systems and the reconstruction of charged particles. However, the system is degenerate, which means the data are insufficient to constrain all alignment degrees of freedom, so the techniques are prone to converging on wrong geometries. This deficiency necessitates validation and monitoring of the alignment. An exhaustive discussion of validation methods is the subject of this document, including examples and plans from all four LHC experiments, as well as other high energy experiments.

  10. Applying Agrep to r-NSA to solve multiple sequences approximate matching.

    PubMed

    Ni, Bing; Wong, Man-Hon; Lam, Chi-Fai David; Leung, Kwong-Sak

    2014-01-01

    This paper addresses the approximate matching problem in a database consisting of multiple DNA sequences, where the proposed approach applies Agrep to a new truncated suffix array, r-NSA. The construction time of the structure is linear in the database size, and indexing a substring in the structure takes constant time. The number of characters processed in applying Agrep is analysed theoretically, and the theoretical upper bound closely approximates the empirical number of characters, which is obtained by enumerating the characters in the actual structure built. Experiments are carried out using (synthetic) random DNA sequences, as well as (real) genome sequences including Hepatitis-B Virus and X-chromosome. Experimental results show that, compared to the straightforward approach that applies Agrep to multiple sequences individually, the proposed approach solves the matching problem in much shorter time. The speed-up of our approach depends on the sequence patterns, and for highly similar homologous genome sequences, which are the common cases in real-life genomes, it can be up to several orders of magnitude.
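    Agrep's approximate matching is based on the bit-parallel shift-and ("bitap") technique of Wu and Manber. The sketch below applies that technique to one sequence at a time, i.e. the straightforward per-sequence baseline the abstract compares against; it does not build the r-NSA index. It reports the 0-based end positions where the pattern matches with at most k insertions, deletions, or substitutions. The function name `bitap_search` is an assumption.

```python
def bitap_search(text: str, pattern: str, k: int) -> list[int]:
    """Return 0-based end positions in text where pattern matches with <= k edits."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full = (1 << m) - 1
    # masks[c] has bit i set when pattern[i] == c.
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    accept = 1 << (m - 1)
    # R[d] bit i set: pattern[:i+1] matches a suffix of the text read so far
    # with <= d edits (initially, up to d pattern characters can be deleted).
    R = [((1 << d) - 1) & full for d in range(k + 1)]
    ends = []
    for pos, c in enumerate(text):
        cm = masks.get(c, 0)
        prev_old = R[0]  # value of R[d-1] before reading c
        R[0] = ((R[0] << 1) | 1) & cm
        for d in range(1, k + 1):
            cur_old = R[d]
            R[d] = ((((cur_old << 1) | 1) & cm)   # match: extend a <= d match
                    | prev_old                    # insertion: extra text character
                    | ((prev_old << 1) | 1)       # substitution
                    | ((R[d - 1] << 1) | 1)       # deletion: skip a pattern character
                    ) & full
            prev_old = cur_old
        if R[k] & accept:
            ends.append(pos)
    return ends
```

    On the text `ACGTTGCA`, the pattern `GAT` has no exact occurrence, but with k=1 the call returns `[3, 4]` (ending at `GT`, one deletion, and at `GTT`, one substitution). Each character costs O(k) word operations while the pattern fits in a machine word; Python's unbounded integers hide that limit.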

  11. An introduction to the Lagan alignment toolkit.

    PubMed

    Brudno, Michael

    2007-01-01

    The Lagan Toolkit is a software package for comparison of genomic sequences. It includes the CHAOS local alignment program, the LAGAN global alignment program for two or more sequences, and Shuffle-LAGAN, a "glocal" alignment method that handles genomic rearrangements in a global alignment framework. The alignment programs included in the Lagan Toolkit have been widely used to compare genomes of many organisms, from bacteria to large mammalian genomes. This chapter provides an overview of the algorithms used by the LAGAN programs to construct genomic alignments, explains how to build alignments using either the standalone program or the web server, and discusses some of the common pitfalls users encounter when using the toolkit.
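    To make the global/local distinction concrete, here is a minimal Needleman–Wunsch score computation, the textbook global alignment recurrence in which every residue of both sequences is aligned or gapped from end to end. It is a generic baseline, not LAGAN's own procedure; the function name and the scores match=1, mismatch=-1, gap=-2 are assumptions.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                           gap: int = -2) -> int:
    """Optimal global alignment score of a and b with a linear gap penalty."""
    # Row 0: the empty prefix of a aligned against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)  # a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]
```

    For example, `needleman_wunsch_score("ACGT", "AGT")` is 1: three matches and one gap. Only two rows of the table are kept, so memory is linear in the length of `b`; recovering the alignment itself would need the full table or a divide-and-conquer scheme such as Hirschberg's.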

  12. Insight into strain effects on band alignment shifts, carrier localization and recombination kinetics in CdTe/CdS core/shell quantum dots.

    PubMed

    Jing, Lihong; Kershaw, Stephen V; Kipp, Tobias; Kalytchuk, Sergii; Ding, Ke; Zeng, Jianfeng; Jiao, Mingxia; Sun, Xiaoyu; Mews, Alf; Rogach, Andrey L; Gao, Mingyuan

    2015-02-11

    The impact of strain on the optical properties of semiconductor quantum dots (QDs) is fundamentally important while still awaiting detailed investigation. CdTe/CdS core/shell QDs represent a typical strained system due to the substantial lattice mismatch between CdTe and CdS. To probe the strain-related effects, aqueous CdTe/CdS QDs were synthesized by coating different-sized CdTe QD cores with CdS shells upon the thermal decomposition of glutathione as a sulfur source under reflux. The shell growth was carefully monitored by both steady-state absorption and fluorescence spectroscopy and transient fluorescence spectroscopy. In combination with structural analysis, the band alignments as a consequence of the strain were modified based on band deformation potential theory. By further taking account of these strain-induced band shifts, the effective mass approximation (EMA) model was modified to simulate the electronic structure, carrier spatial localization, and electron-hole wave function overlap for comparison with experimentally derived results. In particular, the electron/hole eigen energies were predicted for a range of structures with different CdTe core sizes and different CdS shell thicknesses. The overlap of electron and hole wave functions was further simulated to reveal the impact of strain on the electron-hole recombination kinetics as the electron wave function progressively shifts into the CdS shell region while the hole wave function remains heavily localized in the CdTe core upon the shell growth. The excellent agreement between the strain-modified EMA model and the experimental data suggests that strain has remarkable effects on the optical properties of mismatched core/shell QDs by altering the electronic structure of the system.

  13. Alignment fixture

    DOEpatents

    Bell, Grover C.; Gibson, O. Theodore

    1980-01-01

    A part alignment fixture is provided which may be used for precise variable lateral and tilt alignment relative to the fixture base of various shaped parts. The fixture may be used as a part holder for machining or inspection of parts or alignment of parts during assembly and the like. The fixture includes a disc-shaped hub with a precisely machined diameter, adapted to receive the part to be aligned. The hub is nested in a guide plate which is adapted to carry two oppositely disposed pairs of positioning wedges so that the wedges may be reciprocatively positioned by means of respective micrometer screws. The sloping faces of the wedges contact the hub at respective quadrants of the hub periphery. The lateral position of the hub relative to the guide plate is adjusted by positioning the wedges with the associated micrometer screws. The tilt of the part is adjusted relative to a base plate, to which the guide plate is pivotally connected by means of a holding plate. Two pairs of oppositely disposed wedges are mounted for reciprocative lateral positioning by means of separate micrometer screws between flanges of the guide plate and the base plate. Once the wedges are positioned to achieve the proper tilt of the part or hub on which the part is mounted relative to the base plate, the fixture may be bolted to a machining, inspection, or assembly device.

  14. Meta-Alignment with Crumble and Prune: Partitioning very large alignment problems for performance and parallelization

    PubMed Central

    2011-01-01

    Background Continuing research into the global multiple sequence alignment problem has resulted in more sophisticated and principled alignment methods. Unfortunately these new algorithms often require large amounts of time and memory to run, making it nearly impossible to run these algorithms on large datasets. As a solution, we present two general methods, Crumble and Prune, for breaking a phylogenetic alignment problem into smaller, more tractable sub-problems. We call Crumble and Prune meta-alignment methods because they use existing alignment algorithms and can be used with many current alignment programs. Crumble breaks long alignment problems into shorter sub-problems. Prune divides the phylogenetic tree into a collection of smaller trees to reduce the number of sequences in each alignment problem. These methods are orthogonal: they can be applied together to provide better scaling in terms of sequence length and in sequence depth. Both methods partition the problem such that many of the sub-problems can be solved independently. The results are then combined to form a solution to the full alignment problem. Results Crumble and Prune each provide a significant performance improvement with little loss of accuracy. In some cases, a gain in accuracy was observed. Crumble and Prune were tested on real and simulated data. Furthermore, we have implemented a system called Job-tree that allows hierarchical sub-problems to be solved in parallel on a compute cluster, significantly shortening the run-time. Conclusions These methods enabled us to solve gigabase alignment problems. These methods could enable a new generation of biologically realistic alignment algorithms to be applied to real world, large scale alignment problems. PMID:21569267

  15. Nova laser alignment control system

    SciTech Connect

    Van Arsdall, P.J.; Holloway, F.W.; McGuigan, D.L.; Shelton, R.T.

    1984-03-29

    Alignment of the Nova laser requires control of hundreds of optical components in the ten beam paths. Extensive application of computer technology makes daily alignment practical. The control system is designed to provide both centralized and local manual operator controls integrated with automatic closed-loop alignment. Menu-driven operator consoles using high-resolution color graphics displays overlaid with transparent touch panels allow laser personnel to interact efficiently with the computer system. Automatic alignment is accomplished by using image analysis techniques to determine beam reference points from video images acquired along the laser chain. A major goal of the design is to contribute substantially to rapid experimental turnaround and consistent alignment results. This paper describes the computer-based control structure and the software methods developed for aligning this large laser system.

  16. Bayesian coestimation of phylogeny and sequence alignment.

    PubMed

    Lunter, Gerton; Miklós, István; Drummond, Alexei; Jensen, Jens Ledet; Hein, Jotun

    2005-04-01

    reliability broadly correspond to structural features of the proteins, and thus provides biologically meaningful information that is absent from the usual point estimate of the alignment. Our methods can handle input data of moderate size (10-20 protein sequences, each 100-200 bp), which we analyzed overnight on a standard 2 GHz personal computer. Joint analysis of multiple sequence alignment, evolutionary trees and additional evolutionary parameters can now be done within a single coherent statistical framework.

  17. Field-aligned currents in Saturn's magnetosphere: Local time dependence of southern summer currents in the dawn sector between midnight and noon

    NASA Astrophysics Data System (ADS)

    Hunt, G. J.; Cowley, S. W. H.; Provan, G.; Bunce, E. J.; Alexeev, I. I.; Belenkaya, E. S.; Kalegaev, V. V.; Dougherty, M. K.; Coates, A. J.

    2016-08-01

    We examine and compare the magnetic field perturbations associated with field-aligned ionosphere-magnetosphere coupling currents at Saturn, observed by the Cassini spacecraft during two sequences of highly inclined orbits in 2006/2007 and 2008 under late southern summer conditions. These sequences explore the southern currents in the dawn-noon and midnight sectors, respectively, thus allowing investigation of possible origins of the local time (LT) asymmetry in auroral Saturn kilometric radiation (SKR) emissions, which peak in power at ~8 h LT in the dawn-noon sector. We first show that the dawn-noon field data generally have the same four-sheet current structure as found previously in the midnight data and that both are similarly modulated by "planetary period oscillation" (PPO) currents. We then separate the averaged PPO-independent (e.g., subcorotation) and PPO-related currents for both LT sectors by using the current system symmetry properties. Surprisingly, we find that the PPO-independent currents are essentially identical within uncertainties in the dawn-dusk and midnight sectors, thus providing no explanation for the LT dependence of the SKR emissions. The main PPO-related currents are, however, found to be slightly stronger and narrower in latitudinal width at dawn-noon than at midnight, leading to estimated precipitating electron powers, and hence emissions, that are on average a factor of ~1.3 larger at dawn-noon than at midnight, inadequate to account for the observed LT asymmetry in SKR power by a factor of ~2.7. Some other factors must also be involved, such as a LT asymmetry in the hot magnetospheric auroral source electron population.

  18. Field-aligned currents in Saturn's magnetosphere: Local time dependence of southern summer currents in the dawn sector between midnight and noon

    NASA Astrophysics Data System (ADS)

    Hunt, G. J.; Cowley, S. W. H.; Provan, G.; Bunce, E. J.; Belenkaya, E. S.; Alexeev, I. I.; Kalegaev, V. V.; Dougherty, M. K.; Coates, A. J.

    2016-12-01

    We examine and compare the magnetic field perturbations associated with field-aligned ionosphere-magnetosphere coupling currents at Saturn, observed by the Cassini spacecraft during two sequences of highly inclined orbits in 2006/7 and 2008 under late southern summer conditions. These sequences explore the southern currents in the dawn-noon and midnight sectors, respectively. This allows investigation of possible origins of the local time (LT) asymmetry in auroral Saturn kilometric radiation (SKR) emissions, which peak in power at ~8 h LT in the dawn-noon sector. We first show that the dawn-noon field data generally have the same four-sheet current structure as found previously in the midnight data, and that both are similarly modulated by "planetary period oscillation" (PPO) currents, these being associated with the ~10.7 h magnetic field oscillations observed throughout Saturn's magnetosphere. We then separate the averaged PPO-independent (e.g., subcorotation) and PPO-related currents for both LT sectors using the latter current system symmetry properties. Surprisingly, we find that the PPO-independent currents are essentially identical within uncertainties in the dawn-dusk and midnight sectors, thus providing no explanation for the LT dependence of the SKR emissions. The main PPO-related currents are, however, found to be slightly stronger and narrower in latitudinal width at dawn-noon than at midnight, leading to estimated precipitating electron powers, and hence emissions, that are on average a factor of ~1.3 larger at dawn-noon than at midnight, inadequate to account for the observed LT asymmetry in SKR power by a factor of ~2.7. Some other factor must also be involved, such as a LT asymmetry in the hot magnetospheric auroral source electron population.

  19. Improving pairwise sequence alignment accuracy using near-optimal protein sequence alignments

    PubMed Central

    2010-01-01

    Background While the pairwise alignments produced by sequence similarity searches are a powerful tool for identifying homologous proteins (proteins that share a common ancestor and a similar structure), pairwise sequence alignments often fail to represent accurately the structural alignments inferred from three-dimensional coordinates. Since sequence alignment algorithms produce optimal alignments, the best structural alignments must reflect suboptimal sequence alignment scores. Thus, we have examined a range of suboptimal sequence alignments and a range of scoring parameters to understand better which sequence alignments are likely to be more structurally accurate. Results We compared near-optimal protein sequence alignments produced by the Zuker algorithm and a set of probabilistic alignments produced by the probA program with structural alignments produced by four different structure alignment algorithms. There is significant overlap between the solution spaces of structural alignments and both the near-optimal sequence alignments produced by commonly used scoring parameters for sequences that share significant sequence similarity (E-values < 10⁻⁵) and the ensemble of probA alignments. We constructed a logistic regression model incorporating three input variables derived from sets of near-optimal alignments: robustness, edge frequency, and maximum bits-per-position. A ROC analysis shows that this model more accurately classifies amino acid pairs (edges in the alignment path graph) according to the likelihood of appearance in structural alignments than the robustness score alone. We investigated various trimming protocols for removing incorrect edges from the optimal sequence alignment; the most effective protocol is to remove matches from the semi-global optimal alignment that are outside the boundaries of the local alignment, although trimming according to the model-generated probabilities achieves a similar level of improvement. The model can also be used to

  20. ALIGNING JIG

    DOEpatents

    Culver, J.S.; Tunnell, W.C.

    1958-08-01

    A jig or device is described for setting or aligning an opening in one member relative to another member or structure, with a predetermined offset, or it may be used for measuring the amount of offset with which the parts have previously been set. This jig comprises two blocks rabbeted to each other, with means for securing the upper block to the lower block. The upper block has fingers for contacting one of the members to be aligned, the lower block is designed to ride in grooves within the reference member, and calibration marks are provided to determine the amount of offset. This jig is specially designed to align the collimating slits of a mass spectrometer.

  1. Image alignment

    DOEpatents

    Dowell, Larry Jonathan

    2014-04-22

    Disclosed is a method and device for aligning at least two digital images. An embodiment may use frequency-domain transforms of small tiles created from each image to identify substantially similar, "distinguishing" features within each of the images, and then align the images together based on the location of the distinguishing features. To accomplish this, an embodiment may create equal sized tile sub-images for each image. A "key" for each tile may be created by performing a frequency-domain transform calculation on each tile. An information-distance difference between each possible pair of tiles on each image may be calculated to identify distinguishing features. From analysis of the information-distance differences of the pairs of tiles, a subset of tiles with high discrimination metrics in relation to other tiles may be located for each image. The subset of distinguishing tiles for each image may then be compared to locate tiles with substantially similar keys and/or information-distance metrics to other tiles of other images. Once similar tiles are located for each image, the images may be aligned in relation to the identified similar tiles.

  2. An alignment confidence score capturing robustness to guide tree uncertainty.

    PubMed

    Penn, Osnat; Privman, Eyal; Landan, Giddy; Graur, Dan; Pupko, Tal

    2010-08-01

    Multiple sequence alignment (MSA) is the basis for a wide range of comparative sequence analyses from molecular phylogenetics to 3D structure prediction. Sophisticated algorithms have been developed for sequence alignment, but in practice, many errors can be expected and extensive portions of the MSA are unreliable. Hence, it is imperative to understand and characterize the various sources of errors in MSAs and to quantify site-specific alignment confidence. In this paper, we show that uncertainties in the guide tree used by progressive alignment methods are a major source of alignment uncertainty. We use this insight to develop a novel method for quantifying the robustness of each alignment column to guide tree uncertainty. We build on the widely used bootstrap method for perturbing the phylogenetic tree. Specifically, we generate a collection of trees and use each as a guide tree in the alignment algorithm, thus producing a set of MSAs. We next test the consistency of every column of the MSA obtained from the unperturbed guide tree with respect to the set of MSAs. We name this measure the "GUIDe tree based AligNment ConfidencE" (GUIDANCE) score. Using the Benchmark Alignment data BASE benchmark as well as simulation studies, we show that GUIDANCE scores accurately identify errors in MSAs. Additionally, we compare our results with the previously published Heads-or-Tails score and show that the GUIDANCE score is a better predictor of unreliably aligned regions.
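    The column score described above can be sketched in a simplified form, assuming each MSA is a list of equal-length gapped strings with `-` as the gap character. For each column of the base MSA, the sketch collects every pair of residues aligned in that column and reports the fraction of those pairs that are also aligned in the alternative MSAs (those built from bootstrapped guide trees). This follows the description in the abstract only; the actual GUIDANCE program may differ in details, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Optional


def residue_pairs_by_column(msa: list[str]) -> list[set[tuple[int, int, int, int]]]:
    """For each column, the set of aligned residue pairs (seq_a, res_a, seq_b, res_b).

    Residues are identified by sequence index and ungapped position, so the
    same pair can be recognized in a differently gapped alignment.
    """
    if not msa:
        return []
    next_res = [0] * len(msa)  # ungapped position counter per sequence
    columns = []
    for col in range(len(msa[0])):
        present = []
        for s, row in enumerate(msa):
            if row[col] != "-":
                present.append((s, next_res[s]))
                next_res[s] += 1
        columns.append({(s1, r1, s2, r2)
                        for (s1, r1), (s2, r2) in combinations(present, 2)})
    return columns


def column_confidence(base: list[str],
                      alternatives: list[list[str]]) -> list[Optional[float]]:
    """Fraction of each base column's residue pairs reproduced across alternative MSAs."""
    if not alternatives:
        raise ValueError("need at least one alternative MSA")
    # All aligned residue pairs of each alternative MSA, regardless of column.
    alt_pairs = [set().union(*residue_pairs_by_column(m)) for m in alternatives]
    scores: list[Optional[float]] = []
    for pairs in residue_pairs_by_column(base):
        if not pairs:
            scores.append(None)  # fewer than two residues: no pair to test
            continue
        hits = sum(len(pairs & alt) for alt in alt_pairs)
        scores.append(hits / (len(pairs) * len(alt_pairs)))
    return scores
```

    A column scores 1.0 when every alternative alignment reproduces all of its residue pairings and approaches 0 when they rarely agree; columns with fewer than two residues get `None` because no pair exists to test.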

  3. Erasing Errors due to Alignment Ambiguity When Estimating Positive Selection

    PubMed Central

    Redelings, Benjamin

    2014-01-01

    Current estimates of diversifying positive selection rely on first having an accurate multiple sequence alignment. Simulation studies have shown that under biologically plausible conditions, relying on a single estimate of the alignment from commonly used alignment software can lead to unacceptably high false-positive rates in detecting diversifying positive selection. We present a novel statistical method that eliminates excess false positives resulting from alignment error by jointly estimating the degree of positive selection and the alignment under an evolutionary model. Our model treats both substitutions and insertions/deletions as sequence changes on a tree and allows site heterogeneity in the substitution process. We conduct inference starting from unaligned sequence data by integrating over all alignments. This approach naturally accounts for ambiguous alignments without requiring ambiguously aligned sites to be identified and removed prior to analysis. We take a Bayesian approach and conduct inference using Markov chain Monte Carlo to integrate over all alignments on a fixed evolutionary tree topology. We introduce a Bayesian version of the branch-site test and assess the evidence for positive selection using Bayes factors. We compare two models of differing dimensionality using a simple alternative to reversible-jump methods. We also describe a more accurate method of estimating the Bayes factor using Rao-Blackwellization. We then show using simulated data that jointly estimating the alignment and the presence of positive selection solves the problem with excessive false positives from erroneous alignments and has nearly the same power to detect positive selection as when the true alignment is known. We also show that samples taken from the posterior alignment distribution using the software BAli-Phy have substantially lower alignment error compared with MUSCLE, MAFFT, PRANK, and FSA alignments. PMID:24866534

  4. SeqTools: visual tools for manual analysis of sequence alignments.

    PubMed

    Barson, Gemma; Griffiths, Ed

    2016-01-22

    Manual annotation is essential to create high-quality reference alignments and annotation. Annotators need to be able to view sequence alignments in detail. The SeqTools package provides three tools for viewing different types of sequence alignment: Blixem is a many-to-one browser of pairwise alignments, displaying multiple match sequences aligned against a single reference sequence; Dotter provides a graphical dot-plot view of a single pairwise alignment; and Belvu is a multiple sequence alignment viewer, editor, and phylogenetic tool. These tools were originally part of the AceDB genome database system but have been completely rewritten to make them generally available as a standalone package of greatly improved function. Blixem is used by annotators to give a detailed view of the evidence for particular gene models. Blixem displays the gene model positions and the match sequences aligned against the genomic reference sequence. Annotators use this for many reasons, including to check the quality of an alignment, to find missing/misaligned sequence and to identify splice sites and polyA sites and signals. Dotter is used to give a dot-plot representation of a particular pairwise alignment. This is used to identify sequence that is not represented (or is misrepresented) and to quickly compare annotated gene models with transcriptional and protein evidence that putatively supports them. Belvu is used to analyse conservation patterns in multiple sequence alignments and to perform a combination of manual and automatic processing of the alignment. High-quality reference alignments are essential if they are to be used as a starting point for further automatic alignment generation. While there are many different alignment tools available, the SeqTools package provides unique functionality that annotators have found to be essential for analysing sequence alignments as part of the manual annotation process.

  5. IUS prerelease alignment

    NASA Technical Reports Server (NTRS)

    Evans, F. A.

    1978-01-01

    Space shuttle orbiter/IUS alignment transfer was evaluated. Although the orbiter alignment accuracy was originally believed to be the major contributor to the overall alignment transfer error, it was shown that orbiter alignment accuracy is not a factor affecting IUS alignment accuracy, if certain procedures are followed. Results are reported of alignment transfer accuracy analysis.

  6. A simple and fast approach to prediction of protein secondary structure from multiply aligned sequences with accuracy above 70%.

    PubMed Central

    Mehta, P. K.; Heringa, J.; Argos, P.

    1995-01-01

    To improve secondary structure predictions in protein sequences, the information residing in multiple sequence alignments of substituted but structurally related proteins is exploited. A database comprised of 70 protein families and a total of 2,500 sequences, some of which were aligned by tertiary structural superpositions, was used to calculate residue exchange weight matrices within alpha-helical, beta-strand, and coil substructures, respectively. Secondary structure predictions were made based on the observed residue substitutions in local regions of the multiple alignments and the largest possible associated exchange weights in each of the three matrix types. Comparison of the observed and predicted secondary structure on a per-residue basis yielded a mean accuracy of 72.2%. Individual alpha-helix, beta-strand, and coil states were respectively predicted at 66.7, and 75.8% correctness, representing a well-balanced three-state prediction. The accuracy level, verified by cross-validation through jack-knife tests on all protein families, dropped, on average, to only 70.9%, indicating the rigor of the prediction procedure. On the basis of robustness, conceptual clarity, accuracy, and executable efficiency, the method has considerable advantage, especially with its sole reliance on amino acid substitutions within structurally related proteins. PMID:8580842
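
    The per-residue accuracy quoted above (72.2% overall, with separate figures per state) is a count of matching positions. The sketch below shows that calculation, often called Q3. It assumes observed and predicted states are encoded as one-letter strings (H = helix, E = strand, C = coil). The function names and example strings are hypothetical, and the prediction method itself (the exchange weight matrices) is not reproduced.

```python
def q3_accuracy(observed: str, predicted: str) -> float:
    """Percentage of residues whose predicted state matches the observed state."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("empty state strings")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


def per_state_accuracy(observed: str, predicted: str, state: str) -> float:
    """Percentage of residues observed in `state` that were also predicted as `state`."""
    positions = [i for i, s in enumerate(observed) if s == state]
    if not positions:
        return float("nan")  # state absent from the observed string
    hits = sum(predicted[i] == state for i in positions)
    return 100.0 * hits / len(positions)


# Hypothetical example: H = alpha-helix, E = beta-strand, C = coil.
obs = "HHHHEEEECC"
pred = "HHHCEEEECC"
print(q3_accuracy(obs, pred))              # 90.0
print(per_state_accuracy(obs, pred, "H"))  # 75.0
```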

  7. MISHIMA--a new method for high speed multiple alignment of nucleotide sequences of bacterial genome scale data.

    PubMed

    Kryukov, Kirill; Saitou, Naruya

    2010-03-18

    Large nucleotide sequence datasets are becoming increasingly common objects of comparison. Complete bacterial genomes are reported almost every day. This creates challenges for developing new multiple sequence alignment methods. Conventional multiple alignment methods are based on pairwise alignment and/or progressive alignment techniques. These approaches have performance problems when the number of sequences is large and when dealing with genome-scale sequences. We present a new method of multiple sequence alignment, called MISHIMA (Method for Inferring Sequence History In terms of Multiple Alignment), that does not depend on pairwise sequence comparison. A new algorithm is used to quickly find rare oligonucleotide sequences shared by all sequences. A divide-and-conquer approach is then applied to break the sequences into fragments that can be aligned independently by an external alignment program. These partial alignments are assembled to form a complete alignment of the original sequences. MISHIMA provides improved performance compared to the commonly used multiple alignment methods. As an example, six complete genome sequences of the bacterial species Helicobacter pylori (about 1.7 Mb each) were successfully aligned in about 6 hours using a single PC.
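
    The anchor-and-split idea described above can be illustrated with a short sketch. It is not MISHIMA's algorithm. Here "rare" is assumed to mean a k-mer that occurs exactly once in every sequence, and anchors are chained greedily in the order they appear in the first sequence. The hypothetical helpers `shared_anchors` and `split_on_anchors` only produce the fragments that an external aligner (not shown) would then align independently.

```python
from collections import Counter


def unique_kmers(seq: str, k: int) -> dict[str, int]:
    """Map every k-mer occurring exactly once in `seq` to its position."""
    counts: Counter = Counter()
    first: dict[str, int] = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] += 1
        first.setdefault(kmer, i)
    return {kmer: first[kmer] for kmer, n in counts.items() if n == 1}


def shared_anchors(seqs: list[str], k: int) -> list[list[int]]:
    """Greedily keep k-mers unique in all sequences whose positions stay collinear."""
    per_seq = [unique_kmers(s, k) for s in seqs]
    common = set(per_seq[0]).intersection(*per_seq[1:])
    anchors = []
    next_free = [0] * len(seqs)  # first position after the previous anchor, per sequence
    for kmer in sorted(common, key=lambda m: per_seq[0][m]):
        pos = [table[kmer] for table in per_seq]
        if all(p >= f for p, f in zip(pos, next_free)):
            anchors.append(pos)
            next_free = [p + k for p in pos]
    return anchors


def split_on_anchors(seqs: list[str], anchors: list[list[int]], k: int) -> list[list[str]]:
    """Cut each sequence at the anchors; fragment j of every sequence is one sub-problem."""
    fragments = []
    for i, s in enumerate(seqs):
        start, pieces = 0, []
        for pos in anchors:
            pieces.append(s[start:pos[i]])
            start = pos[i] + k  # the anchor itself is already aligned
        pieces.append(s[start:])
        fragments.append(pieces)
    return fragments
```

    For example, `shared_anchors(["AAACGTAAA", "CCACGTCC"], 4)` finds the single shared unique 4-mer `ACGT` at position 2 in both sequences, and splitting on it yields `[["AA", "AAA"], ["CC", "CC"]]`.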

  8. A minimal ligand binding pocket within a network of correlated mutations identified by multiple sequence and structural analysis of G protein coupled receptors.

    PubMed

    Moitra, Subhodeep; Tirupula, Kalyan C; Klein-Seetharaman, Judith; Langmead, Christopher James

    2012-06-29

    G protein coupled receptors (GPCRs) are seven helical transmembrane proteins that function as signal transducers. They bind ligands in their extracellular and transmembrane regions and activate cognate G proteins at their intracellular surface at the other side of the membrane. The relay of allosteric communication between the ligand binding site and the distant G protein binding site is poorly understood. In this study, GREMLIN [1], a recently developed method that identifies networks of co-evolving residues from multiple sequence alignments, was used to identify those that may be involved in communicating the activation signal across the membrane. The GREMLIN-predicted long-range interactions between amino acids were analyzed with respect to the seven GPCR structures that had been crystallized at the time this study was undertaken. GREMLIN significantly enriches the edges containing residues that are part of the ligand binding pocket, when compared to a control distribution of edges drawn from a random graph. An analysis of these edges reveals a minimal GPCR binding pocket containing four residues (T118(3.33), M207(5.42), Y268(6.51) and A292(7.39)). Additionally, of the ten residues predicted to have the most long-range interactions (A117(3.32), A272(6.55), E113(3.28), H211(5.46), S186(EC2), A292(7.39), E122(3.37), G90(2.57), G114(3.29) and M207(5.42)), nine are part of the ligand binding pocket. We demonstrate the use of GREMLIN to reveal a network of statistically correlated and functionally important residues in class A GPCRs. GREMLIN identified that ligand binding pocket residues are extensively correlated with distal residues. An analysis of the GREMLIN edges across multiple structures suggests that there may be a minimal binding pocket common to the seven known GPCRs. Further, the activation of rhodopsin involves these long-range interactions between extracellular and intracellular domain residues mediated by the retinal domain.
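
    GREMLIN fits a regularized statistical model to the whole alignment, which is beyond a short example. As a much simpler (and noisier) illustration of what "co-evolving residues" means, the sketch below computes plain mutual information between two alignment columns. This is a swapped-in statistic, not GREMLIN. Columns are assumed to be given as strings with one character per sequence, and the function name is hypothetical.

```python
from collections import Counter
from math import log


def column_mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (in nats) between two alignment columns of equal depth."""
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    p_i = Counter(col_i)                # residue counts in column i
    p_j = Counter(col_j)                # residue counts in column j
    p_ij = Counter(zip(col_i, col_j))   # joint counts of residue pairs
    mi = 0.0
    for (a, b), count in p_ij.items():
        joint = count / n
        mi += joint * log(joint / ((p_i[a] / n) * (p_j[b] / n)))
    return mi
```

    Two columns that change in lockstep, such as `"AACC"` and `"GGTT"`, give log 2 (about 0.693 nats), whereas `"ACAC"` and `"GGTT"` vary independently and give 0. Raw mutual information also picks up indirect and phylogenetic correlations, which is one reason model-based methods such as GREMLIN are used instead.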

  9. Fast discovery and visualization of conserved regions in DNA sequences using quasi-alignment.

    PubMed

    Nagar, Anurag; Hahsler, Michael

    2013-01-01

    Next Generation Sequencing techniques are producing enormous amounts of biological sequence data, and their analysis has become a major computational problem. Currently, most analysis, especially the identification of conserved regions, relies heavily on Multiple Sequence Alignment and its various heuristics such as progressive alignment, whose run time grows with the square of the number and the length of the aligned sequences and requires significant computational resources. In this work, we present a method to efficiently discover regions of high similarity across multiple sequences without performing expensive sequence alignment. The method is based on approximating edit distance between segments of sequences using p-mer frequency counts. Then, efficient high-throughput data stream clustering is used to group highly similar segments into so-called quasi-alignments. Quasi-alignments have numerous applications such as identifying species and their taxonomic class from sequences, comparing sequences for similarities, and, as in this paper, discovering conserved regions across related sequences. In this paper, we show that quasi-alignments can be used to discover highly similar segments across multiple sequences from related or different genomes efficiently and accurately. Experiments on a large number of unaligned 16S rRNA sequences obtained from the Greengenes database show that the method is able to identify conserved regions which agree with known hypervariable regions in 16S rRNA. Furthermore, the experiments show that the proposed method scales well for large data sets with a run time that grows only linearly with the number and length of sequences, whereas for existing multiple sequence alignment heuristics the run time grows super-linearly. Quasi-alignment-based algorithms can detect highly similar regions and conserved areas across multiple sequences. Since the run time is linear and the sequences are converted into a compact clustering model, we are able to
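
    The central trick here, replacing edit distance with a comparison of p-mer frequency counts, can be sketched in a few lines. The choice of L1 distance and the helper names are assumptions for illustration. The paper's exact approximation and its data stream clustering step are not reproduced.

```python
from collections import Counter


def pmer_counts(segment: str, p: int) -> Counter:
    """Count every overlapping substring of length p."""
    return Counter(segment[i:i + p] for i in range(len(segment) - p + 1))


def pmer_distance(a: str, b: str, p: int = 3) -> int:
    """L1 (Manhattan) distance between the p-mer count vectors of two segments."""
    ca, cb = pmer_counts(a, p), pmer_counts(b, p)
    return sum(abs(ca[m] - cb[m]) for m in set(ca) | set(cb))


def edit_distance_lower_bound(a: str, b: str, p: int = 3) -> float:
    # One substitution, insertion or deletion removes at most p p-mers and adds
    # at most p, so each edit changes the L1 distance by at most 2p.
    return pmer_distance(a, b, p) / (2 * p)
```

    For p = 2, `pmer_distance("AAAA", "AAAT", p=2)` compares {AA: 3} with {AA: 2, AT: 1} and returns 2, so the bound says at least 0.5 edits (hence at least one) separate the strings. Counting p-mers takes time linear in segment length, which is what allows this kind of comparison to scale.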

  10. STRAL: progressive alignment of non-coding RNA using base pairing probability vectors in quadratic time.

    PubMed

    Dalli, Deniz; Wilm, Andreas; Mainz, Indra; Steger, Gerhard

    2006-07-01

    Alignment of RNA has a wide range of applications, for example in phylogeny inference, consensus structure prediction and homology searches. Yet aligning structural or non-coding RNAs (ncRNAs) correctly is notoriously difficult as these RNA sequences may evolve by compensatory mutations, which maintain base pairing but destroy sequence homology. Ideally, alignment programs would take RNA structure into account. The Sankoff algorithm for the simultaneous solution of RNA structure prediction and RNA sequence alignment was proposed 20 years ago but suffers from its exponential complexity. A number of programs implement lightweight versions of the Sankoff algorithm by restricting its application to a limited type of structure and/or only pairwise alignment. Thus, despite recent advances, the proper alignment of multiple structural RNA sequences remains a problem. Here we present StrAl, a heuristic method for alignment of ncRNA that reduces sequence-structure alignment to a two-dimensional problem similar to standard multiple sequence alignment. The scoring function takes into account sequence similarity as well as up- and downstream pairing probability. To test the robustness of the algorithm and the performance of the program, we scored alignments produced by StrAl against a large set of published reference alignments. The quality of alignments predicted by StrAl is far better than that obtained by standard sequence alignment programs, especially when sequence homologies drop below approximately 65%; nevertheless StrAl's runtime is comparable to that of ClustalW.

  11. Reduction, alignment and visualisation of large diverse sequence families.

    PubMed

    Taylor, William R

    2016-08-02

    Current volumes of sequence data can lead to large numbers of hits identified on a search, typically in the range of 10s to 100s of thousands. It is often quite difficult to tell from these raw results whether the search has been a success or has picked up sequences with little or no relationship to the query. The best approach to this problem is to cluster and align the resulting families; however, existing methods concentrate on fast clustering and either do not align the sequences or only perform a limited alignment. A method (MULSEL) is presented that combines fast peptide-based pre-sorting with a following cascade of mini-alignments, each of which is generated with a robust profile/profile method. From these mini-alignments, a representative sequence is selected, based on a variety of intrinsic and user-specified criteria that are combined to produce the sequence collection for the next cycle of alignment. For moderately sized sequence collections (10s of thousands), the method executes on a laptop computer within seconds or minutes. MULSEL bridges a gap between fast clustering methods and slower multiple sequence alignment methods and provides a seamless transition from one to the other. Furthermore, it presents the resulting reduced family in a graphical manner that makes it clear if family members have been misaligned or if there are sequences present that appear inconsistent.

  12. Alignment of, and phylogenetic inference from, random sequences: the susceptibility of alternative alignment methods to creating artifactual resolution and support.

    PubMed

    Simmons, Mark P; Müller, Kai F; Norton, Andrew P

    2010-12-01

    We used random sequences to determine which alignment methods are most susceptible to aligning sequences so as to create artifactual resolution and branch support in phylogenetic trees derived from those alignments. We compared four alignment methods (progressive pairwise alignment, simultaneous multiple alignment of sequence fragments, local pairwise alignment, and direct optimization) to determine which methods are most susceptible to creating false positives in phylogenetic trees. Implied alignments created using direct optimization provided more artifactual support than progressive pairwise alignment methods, which in turn generally provided more artifactual support than simultaneous and local alignment methods. Artifactual support derived from base pairs was generally reinforced by the incorporation of gap characters for progressive pairwise alignment, local pairwise alignment, and implied alignments. The amount of artifactual resolution and support was generally greater for simulated nucleotide sequences than for simulated amino acid sequences. In the context of direct optimization, the differences between static and dynamic approaches to calculating support were extreme, ranging from maximal to nearly minimal support. When applied to highly divergent sequences, it is important that dynamic, rather than static, characters be used whenever calculating branch support using direct optimization. In contrast to the tree-based approaches to alignment, simultaneous alignment of sequences using the similarity criterion generally does not create alignments that are biased in favor of any particular tree topology. Copyright © 2010 Elsevier Inc. All rights reserved.

  13. Pin-Align: a new dynamic programming approach to align protein-protein interaction networks.

    PubMed

    Amir-Ghiasvand, Farid; Nowzari-Dalini, Abbas; Momenzadeh, Vida

    2014-01-01

    To date, few tools for aligning protein-protein interaction networks have been suggested. These tools typically find conserved interaction patterns using various local or global alignment algorithms. However, improving the speed, scalability, simplicity, and accuracy of network alignment tools remains a goal of new research. In this paper, we introduce Pin-Align, a new tool for local alignment of protein-protein interaction networks. Pin-Align's accuracy is tested on protein interaction networks from IntAct, DIP, and the Stanford Network Database, and the results are compared with other well-known algorithms. Pin-Align is shown to have higher sensitivity and specificity in terms of KEGG Ortholog groups.

  14. SinicView: a visualization environment for comparisons of multiple nucleotide sequence alignment tools.

    PubMed

    Shih, Arthur Chun-Chieh; Lee, D T; Lin, Laurent; Peng, Chin-Lin; Chen, Shiang-Heng; Wu, Yu-Wei; Wong, Chun-Yi; Chou, Meng-Yuan; Shiao, Tze-Chang; Hsieh, Mu-Fen

    2006-03-02

    As the rate and complexity of completed genomic sequences grow, the need to align longer sequences becomes more urgent, and many more tools have thus been developed. In the initial stage of genomic sequence analysis, a biologist is usually faced with the questions of how to choose the best tool to align sequences of interest and how to analyze and visualize the alignment results, and then with the question of whether poorly aligned regions produced by the tool are indeed not homologous or are just artifacts of inappropriate alignment tools or scoring systems. Although several systematic evaluations of multiple sequence alignment (MSA) programs have been proposed, they may not serve as a standard for most biologists, because the poorly aligned regions in these evaluations are never discussed. Thus, a tool that allows simultaneous cross-comparison of the alignment results obtained by different tools could help a biologist evaluate their correctness and accuracy. In this paper, we present a versatile alignment visualization system, called SinicView (for Sequence-aligning INnovative and Interactive Comparison VIEWer), which allows the user to efficiently compare and evaluate assorted nucleotide alignment results obtained by different tools. SinicView calculates similarity of the alignment outputs under a fixed window using the sum-of-pairs method and provides scoring profiles of each set of aligned sequences. The user can visually compare alignment results either in graphic scoring profiles or in plain text format of the aligned nucleotides along with the annotation information. We illustrate the capabilities of our visualization system by comparing alignment results obtained by MLAGAN, MAVID, and MULTIZ. With SinicView, users can use their own data sequences to compare various alignment tools or scoring systems and select the most suitable one to perform alignment in the initial stage of sequence analysis.
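
    A windowed sum-of-pairs profile of the kind SinicView plots can be sketched as follows. The scoring is an assumption: identical non-gap characters score 1 and everything else scores 0, averaged over all pairs in a column and then over the window. SinicView's actual scoring scheme may differ, and the function names are hypothetical.

```python
from itertools import combinations


def column_identity(column: list[str]) -> float:
    """Fraction of sequence pairs in a column that share the same non-gap character."""
    pairs = list(combinations(column, 2))
    if not pairs:
        return 0.0
    matches = sum(a == b and a != "-" for a, b in pairs)
    return matches / len(pairs)


def windowed_sp_profile(alignment: list[str], window: int) -> list[float]:
    """Mean per-column sum-of-pairs identity for each run of `window` columns."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    scores = [column_identity([row[j] for row in alignment]) for j in range(length)]
    return [sum(scores[i:i + window]) / window for i in range(length - window + 1)]
```

    Running the same function on the outputs of two aligners for the same input gives two profiles that can be plotted side by side, which is the kind of comparison described above.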

  15. Automatic Word Alignment

    DTIC Science & Technology

    2014-02-18

    for each of the paired units includes forming a first alignment of units of the first language to units of the second language, and forming a second ... alignment of units of the second language to units of the first language. The alignment parameters include a first set of parameters for forming an ... alignment from the first language to the second language and a second set of parameters for forming an alignment from the second language to the

  16. The interplanetary magnetic field B_y effects on large-scale field-aligned currents near local noon: Contributions from cusp part and noncusp part

    SciTech Connect

    Yamauchi, M.; Lundin, R.; Woch, J.

    1993-04-01

    The authors develop a model to account for the effect of the interplanetary magnetic field (IMF) B_y component on the dayside field-aligned currents (FACs). As part of the model, the FACs are divided into a "cusp part" and a "noncusp part". The authors then propose that the cusp part FACs shift in the longitudinal direction, while the noncusplike part FACs shift in both longitudinal and latitudinal directions in response to the y component of the IMF. When the two are combined, the noncusp part FAC is found poleward of the cusp part FAC system if the y component of the IMF is large. These two FAC systems flow in the same direction. They reinforce one another, creating a strong FAC, termed the DPY-FAC. The model also predicts that the polewardmost part of the DPY-FAC flows on closed field lines, even in regions conventionally occupied by the polar cap. Results of the model are successfully compared with particle and magnetic field data from Viking missions.

  17. SARA-Coffee web server, a tool for the computation of RNA sequence and structure multiple alignments

    PubMed Central

    Di Tommaso, Paolo; Bussotti, Giovanni; Kemena, Carsten; Capriotti, Emidio; Chatzou, Maria; Prieto, Pablo; Notredame, Cedric

    2014-01-01

    This article introduces the SARA-Coffee web server, a service allowing the online computation of 3D-structure-based multiple RNA sequence alignments. The server makes it possible to combine sequences with and without known 3D structures. Given a set of sequences, SARA-Coffee outputs a multiple sequence alignment along with a reliability index for every sequence, column and aligned residue. SARA-Coffee combines SARA, a pairwise structural RNA aligner, with the R-Coffee multiple RNA aligner in a way that has been shown to improve alignment accuracy over most sequence aligners when enough structural data is available. The server can be accessed from http://tcoffee.crg.cat/apps/tcoffee/do:saracoffee. PMID:24972831

  18. A minimal ligand binding pocket within a network of correlated mutations identified by multiple sequence and structural analysis of G protein coupled receptors

    PubMed Central

    2012-01-01

    Background G protein coupled receptors (GPCRs) are seven helical transmembrane proteins that function as signal transducers. They bind ligands in their extracellular and transmembrane regions and activate cognate G proteins at their intracellular surface at the other side of the membrane. The relay of allosteric communication between the ligand binding site and the distant G protein binding site is poorly understood. In this study, GREMLIN [1], a recently developed method that identifies networks of co-evolving residues from multiple sequence alignments, was used to identify those that may be involved in communicating the activation signal across the membrane. The GREMLIN-predicted long-range interactions between amino acids were analyzed with respect to the seven GPCR structures that had been crystallized at the time this study was undertaken. Results GREMLIN significantly enriches the edges containing residues that are part of the ligand binding pocket, when compared to a control distribution of edges drawn from a random graph. An analysis of these edges reveals a minimal GPCR binding pocket containing four residues (T118(3.33), M207(5.42), Y268(6.51) and A292(7.39)). Additionally, of the ten residues predicted to have the most long-range interactions (A117(3.32), A272(6.55), E113(3.28), H211(5.46), S186(EC2), A292(7.39), E122(3.37), G90(2.57), G114(3.29) and M207(5.42)), nine are part of the ligand binding pocket. Conclusions We demonstrate the use of GREMLIN to reveal a network of statistically correlated and functionally important residues in class A GPCRs. GREMLIN identified that ligand binding pocket residues are extensively correlated with distal residues. An analysis of the GREMLIN edges across multiple structures suggests that there may be a minimal binding pocket common to the seven known GPCRs. Further, the activation of rhodopsin involves these long-range interactions between extracellular and intracellular domain residues mediated by the retinal domain. PMID:22748306

  19. Analysis of Multiple Genomic Sequence Alignments: A Web Resource, Online Tools, and Lessons Learned From Analysis of Mammalian SCL Loci

    PubMed Central

    Chapman, Michael A.; Donaldson, Ian J.; Gilbert, James; Grafham, Darren; Rogers, Jane; Green, Anthony R.; Göttgens, Berthold

    2004-01-01

    Comparative analysis of genomic sequences is becoming a standard technique for studying gene regulation. However, only a limited number of tools are currently available for the analysis of multiple genomic sequences. An extensive data set for the testing and training of such tools is provided by the SCL gene locus. Here we have expanded the data set to eight vertebrate species by sequencing the dog SCL locus and by annotating the dog and rat SCL loci. To provide a resource for the bioinformatics community, all SCL sequences and functional annotations, comprising a collation of the extensive experimental evidence pertaining to SCL regulation, have been made available via a Web server. A Web interface to new tools specifically designed for the display and analysis of multiple sequence alignments was also implemented. The unique SCL data set and new sequence comparison tools allowed us to perform a rigorous examination of the true benefits of multiple sequence comparisons. We demonstrate that multiple sequence alignments are, overall, superior to pairwise alignments for identification of mammalian regulatory regions. In the search for individual transcription factor binding sites, multiple alignments markedly increase the signal-to-noise ratio compared to pairwise alignments. PMID:14718377

  20. Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees

    PubMed Central

    2010-01-01

    Background Methods of alignment masking, which refers to the technique of excluding alignment blocks prior to tree reconstructions, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well defined methods to identify randomness in sequence alignments has prevented a routine application of alignment masking. In this study, we compared the effects on tree reconstructions of the most commonly used profiling method (GBLOCKS), which uses a predefined set of rules in combination with alignment masking, with a new profiling approach (ALISCORE) based on Monte Carlo resampling within a sliding window, using different data sets and alignment methods. While the GBLOCKS approach excludes variable sections above a certain threshold whose choice is left arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and therefore more objective. Results ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences to assess randomness of the observed sequence similarity. When tested on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. In addition, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the greatest decrease in conflict. Conclusions Alignment masking improves signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction. Given the robust performance of alignment profiling, alignment masking
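
    The idea of judging a window's similarity against resampled, random-like versions of itself can be sketched loosely as below. This is not ALISCORE: it compares a single pair of aligned sequences, uses plain percent identity, and draws composition-preserving shuffles. The function names and defaults are hypothetical. A value near 1 means the observed similarity in that window is hard to distinguish from chance, which is the kind of block that masking would exclude.

```python
import random


def identity(a, b) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def window_randomness_pvalue(a: str, b: str, start: int, width: int,
                             n_shuffles: int = 1000, seed: int = 0) -> float:
    """Fraction of shuffles of b's window that are at least as similar to a's
    window as the observed window is (a simple Monte Carlo p-value)."""
    wa, wb = a[start:start + width], b[start:start + width]
    observed = identity(wa, wb)
    rng = random.Random(seed)   # seeded so the result is reproducible
    letters = list(wb)          # shuffling preserves the window's composition
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if identity(wa, letters) >= observed:
            hits += 1
    return hits / n_shuffles
```

    Sliding `start` along the alignment and flagging windows with high values gives a crude masking profile; ALISCORE's scoring, bootstrap scheme and handling of many sequences are considerably more involved.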

  1. LEON-BIS: multiple alignment evaluation of sequence neighbours using a Bayesian inference system.

    PubMed

    Vanhoutreve, Renaud; Kress, Arnaud; Legrand, Baptiste; Gass, Hélène; Poch, Olivier; Thompson, Julie D

    2016-07-07

    A standard procedure in many areas of bioinformatics is to use a multiple sequence alignment (MSA) as the basis for various types of homology-based inference. Applications include 3D structure modelling, protein functional annotation, prediction of molecular interactions, etc. These applications, however sophisticated, are generally highly sensitive to the alignment used, and neglecting non-homologous or uncertain regions in the alignment can lead to significant bias in the subsequent inferences. Here, we present a new method, LEON-BIS, which uses a robust Bayesian framework to estimate the homologous relations between sequences in a protein multiple alignment. Sequences are clustered into sub-families and relations are predicted at different levels, including 'core blocks', 'regions' and full-length proteins. The accuracy and reliability of the predictions are demonstrated in large-scale comparisons using well annotated alignment databases, where the homologous sequence segments are detected with very high sensitivity and specificity. LEON-BIS uses robust Bayesian statistics to distinguish the portions of multiple sequence alignments that are conserved either across the whole family or within subfamilies. LEON-BIS should thus be useful for automatic, high-throughput genome annotations, 2D/3D structure predictions, protein-protein interaction predictions etc.

  2. FastSP: linear time calculation of alignment accuracy.

    PubMed

    Mirarab, Siavash; Warnow, Tandy

    2011-12-01

    Multiple sequence alignment is a basic part of much biological research, including phylogeny estimation and protein structure and function prediction. Different alignments on the same set of unaligned sequences are often compared, sometimes in order to assess the accuracy of alignment methods or to infer a consensus alignment from a set of estimated alignments. Three of the standard scores for comparing alignments (the Developer, Modeler and Total Column (TC) scores) can be derived through calculations of the set of homologies that the alignments share. However, the brute-force technique for calculating this set is quadratic in the input size. The remaining standard technique, the Cline Shift Score, inherently requires quadratic time. In this article, we prove that each of these scores can be computed in linear time, and we present FastSP, a linear-time algorithm for calculating these scores. Even on the largest alignments we explored (one with 50 000 sequences), FastSP completed in under 2 min and used at most 2 GB of main memory. The best alternative is qscore, a method whose empirical running time is approximately the same as FastSP when given sufficient memory (at least 8 GB), but whose asymptotic running time has never been theoretically established. In addition, for comparisons of large alignments under lower memory conditions (at most 4 GB of main memory), qscore used substantial memory (up to 10 GB for the datasets we studied), took more time and failed to analyze the largest datasets. The open-source software and executables are available online at http://www.cs.utexas.edu/~phylo/software/fastsp/. Contact: tandy@cs.utexas.edu
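
    The observation that makes linear-time scoring possible is that two residues are homologous in an alignment exactly when they share a column, so shared homologies can be counted by grouping residues by their (reference column, estimated column) pair instead of enumerating pairs. The sketch below illustrates that counting idea, assuming "-" is the only gap character. It is not necessarily FastSP's exact algorithm, and the helper names are hypothetical.

```python
from collections import Counter


def residue_columns(alignment: list[str]) -> dict[tuple[int, int], int]:
    """Map (row, index of residue in its unaligned sequence) -> alignment column."""
    columns = {}
    for r, row in enumerate(alignment):
        k = 0
        for c, ch in enumerate(row):
            if ch != "-":
                columns[(r, k)] = c
                k += 1
    return columns


def n_pairs(n: int) -> int:
    return n * (n - 1) // 2


def homology_counts(reference: list[str], estimate: list[str]) -> tuple[int, int, int]:
    """Return (shared, reference, estimated) counts of homologous residue pairs."""
    ref = residue_columns(reference)
    est = residue_columns(estimate)
    if ref.keys() != est.keys():
        raise ValueError("both alignments must contain the same unaligned sequences")
    # A pair is shared iff it shares a column in both alignments, i.e. both
    # residues have the same (reference column, estimated column) key.
    joint = Counter((ref[key], est[key]) for key in ref)
    shared = sum(n_pairs(n) for n in joint.values())
    ref_total = sum(n_pairs(n) for n in Counter(ref.values()).values())
    est_total = sum(n_pairs(n) for n in Counter(est.values()).values())
    return shared, ref_total, est_total
```

    Dividing `shared` by the reference count gives the fraction of true homologies recovered (the SP-score, as it is commonly defined), and dividing by the estimated count gives the fraction of predicted homologies that are correct (the Modeler score). For example, `homology_counts(["AC-GT", "ACTGT"], ["ACG-T", "ACTGT"])` returns `(3, 4, 4)`: moving one G into the wrong column loses one of the four reference homologies. With hash tables, each step is expected linear in the number of residues.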

  3. PMD2HD--a web tool aligning a PubMed search results page with the local German Cancer Research Centre library collection.

    PubMed

    Bohne-Lang, Andreas; Lang, Elke; Taube, Anke

    2005-06-27

    Web-based searching is the accepted contemporary mode of retrieving relevant literature, and retrieving as many full text articles as possible is a typical prerequisite for research success. In most cases only a proportion of references will be directly accessible as digital reprints through displayed links. A large number of references, however, have to be verified in library catalogues and, depending on their availability, are accessible as print holdings or by interlibrary loan request. The problem of verifying local print holdings from an initial retrieval set of citations can be solved using Z39.50, an ANSI protocol for interactively querying library information systems. Numerous systems include Z39.50 interfaces and therefore can process Z39.50 interactive requests. However, the programmed query interaction command structure is non-intuitive and inaccessible to the average biomedical researcher. For the typical user, it is necessary to implement the protocol within a tool that hides and handles Z39.50 syntax, presenting a comfortable user interface. PMD2HD is a web tool implementing Z39.50 to provide an appropriately functional and usable interface to integrate into the typical workflow that follows an initial PubMed literature search, providing users with an immediate asset to assist in the most tedious step in literature retrieval, checking for subscription holdings against a local online catalogue. PMD2HD can facilitate literature access considerably with respect to the time and cost of manual comparisons of search results with local catalogue holdings. The example presented in this article is related to the library system and collections of the German Cancer Research Centre. However, the PMD2HD software architecture and use of common Z39.50 protocol commands allow for transfer to a broad range of scientific libraries using Z39.50-compatible library information systems.

  4. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses

    PubMed Central

    Capella-Gutiérrez, Salvador; Silla-Martínez, José M.; Gabaldón, Toni

    2009-01-01

    Summary: Multiple sequence alignments are central to many areas of bioinformatics. It has been shown that the removal of poorly aligned regions from an alignment increases the quality of subsequent analyses. Such an alignment trimming phase is complicated in large-scale phylogenetic analyses that deal with thousands of alignments. Here, we present trimAl, a tool for automated alignment trimming, which is especially suited for large-scale phylogenetic analyses. trimAl can consider several parameters, alone or in multiple combinations, for selecting the most reliable positions in the alignment. These include the proportion of sequences with a gap, the level of amino acid similarity and, if several alignments for the same set of sequences are provided, the level of consistency across different alignments. Moreover, trimAl can automatically select the parameters to be used in each specific alignment so that the signal-to-noise ratio is optimized. Availability: trimAl is written in C++ and is portable to all platforms. trimAl is freely available for download (http://trimal.cgenomics.org) and can be used online through the Phylemon web server (http://phylemon2.bioinfo.cipf.es/). Supplementary Material is available at http://trimal.cgenomics.org/publications. Contact: tgabaldon@crg.es PMID:19505945
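
    Of the criteria listed above, the proportion of sequences with a gap is the simplest to illustrate. The sketch below drops every column whose gap fraction exceeds a threshold. The function name and the `max_gap_fraction` parameter are hypothetical and do not correspond to trimAl's command-line options. trimAl's similarity and consistency criteria and its automatic parameter selection are not covered.

```python
def trim_gappy_columns(alignment: list[str], max_gap_fraction: float = 0.5):
    """Drop columns whose fraction of gap characters exceeds `max_gap_fraction`.

    Returns the trimmed rows and the indices of the columns that were kept."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    n = len(alignment)
    kept = [j for j in range(length)
            if sum(row[j] == "-" for row in alignment) / n <= max_gap_fraction]
    trimmed = ["".join(row[j] for j in kept) for row in alignment]
    return trimmed, kept
```

    Returning the kept column indices makes it possible to map trimmed positions back to the original alignment, which helps when the same trimming decision must be applied to thousands of alignments in a pipeline.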

  5. MP-Align: alignment of metabolic pathways

    PubMed Central

    2014-01-01

    Background Comparing the metabolic pathways of different species is useful for understanding metabolic functions and can help in studying diseases and engineering drugs. Several comparison techniques for metabolic pathways have been introduced in the literature as a first attempt in this direction. The approaches are based on some simplified representation of metabolic pathways and on a related definition of a similarity score (or distance measure) between two pathways. More recent comparative research focuses on alignment techniques that can identify similar parts between pathways. Results We propose a methodology for the pairwise comparison and alignment of metabolic pathways that aims at providing the largest conserved substructure of the pathways under consideration. The proposed methodology has been implemented in a tool called MP-Align, which has been used to perform several validation tests. The results showed that our similarity score makes it possible to discriminate between different domains and to reconstruct a meaningful phylogeny from metabolic data. The results further demonstrate that our alignment algorithm correctly identifies subpathways sharing a common biological function. Conclusion The results of the validation tests performed with MP-Align are encouraging. A comparison with another proposal in the literature showed that our alignment algorithm is particularly well-suited to finding the largest conserved subpathway of the pathways under examination. PMID:24886436

  6. Girder Alignment Plan

    SciTech Connect

    Wolf, Zackary; Ruland, Robert; LeCocq, Catherine; Lundahl, Eric; Levashov, Yurii; Reese, Ed; Rago, Carl; Poling, Ben; Schafer, Donald; Nuhn, Heinz-Dieter; Wienands, Uli; /SLAC

    2010-11-18

    The girders for the LCLS undulator system contain components which must be aligned with high accuracy relative to each other. The alignment is one of the last steps before the girders go into the tunnel, so it must be done efficiently, on a tight schedule. This note documents an alignment plan that is both efficient and highly accurate. The motivation for girder alignment involves the following considerations. Using beam-based alignment, the girder position will be adjusted until the beam goes through the center of the quadrupole and beam finder wire. For the machine to work properly, the undulator axis and the center of the undulator beam pipe must both lie on this line. The physics reasons for centering the undulator axis and the undulator beam pipe axis on the beam are different, but the alignment tolerances for both are similar. In addition, the beam position monitor must be centered on the beam to preserve its calibration. Thus, the undulator, undulator beam pipe, quadrupole, beam finder wire, and beam position monitor axes must all be aligned to a common line. All relative alignments are equally important, not only, for example, the one between quadrupole and undulator. We begin by making the common axis the nominal beam axis in the girder coordinate system. All components will be initially aligned to this axis. A more accurate alignment will then position the components relative to each other, without incorporating the girder itself.

  7. Covariance of maximum likelihood evolutionary distances between sequences aligned pairwise.

    PubMed

    Dessimoz, Christophe; Gil, Manuel

    2008-06-23

    The estimation of a distance between two biological sequences is a fundamental process in molecular evolution. It is usually performed by maximum likelihood (ML) on characters aligned either pairwise or jointly in a multiple sequence alignment (MSA). Estimators for the covariance of pairs from an MSA are known, but we are not aware of any solution for cases of pairs aligned independently. In large-scale analyses, it may be too costly to compute MSAs every time distances must be compared, and therefore a covariance estimator for distances estimated from pairs aligned independently is desirable. Knowledge of covariances improves any process that compares or combines distances, such as in generalized least-squares phylogenetic tree building, orthology inference, or lateral gene transfer detection. In this paper, we introduce an estimator for the covariance of distances from sequences aligned pairwise. Its performance is analyzed through extensive Monte Carlo simulations, and compared to the well-known variance estimator of ML distances. Our covariance estimator can be used together with the ML variance estimator to form covariance matrices. The estimator performs similarly to the ML variance estimator. In particular, it shows no sign of bias when sequence divergence is below 150 PAM units (i.e. above ~29% expected sequence identity). Above that distance, the covariances tend to be underestimated, but then ML variances are also underestimated.
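    As a concrete instance of an ML distance and its variance estimator, the sketch below uses the Jukes-Cantor (JC69) nucleotide model, where both have closed forms: d = -(3/4) ln(1 - 4p/3) and Var(d) ≈ p(1 - p) / (n(1 - 4p/3)^2), with p the proportion of differing sites among n aligned sites. This model is an assumption chosen for brevity: the paper works with protein distances in PAM units, and its covariance estimator is not reproduced here. `jc69_distance` is a hypothetical name.

```python
import math


def jc69_distance(seq_a, seq_b):
    """ML distance between two aligned, gap-free nucleotide sequences under JC69,
    together with its large-sample (delta-method) variance."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal aligned length")
    n = len(seq_a)
    p = sum(a != b for a, b in zip(seq_a, seq_b)) / n  # proportion of differing sites
    x = 1 - 4 * p / 3
    if x <= 0:
        raise ValueError("sequences too divergent: JC69 distance is undefined")
    distance = -0.75 * math.log(x)
    variance = p * (1 - p) / (n * x * x)
    return distance, variance


print(jc69_distance("ACGTACGTAC", "ACGTACGTAA"))
```

    The variance grows without bound as p approaches 3/4, where the JC69 distance itself diverges.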

  8. MSAIndelFR: a scheme for multiple protein sequence alignment using information on indel flanking regions.

    PubMed

    Al-Shatnawi, Mufleh; Ahmad, M Omair; Swamy, M N S

    2015-11-23

    The alignment of multiple protein sequences is one of the most commonly performed tasks in bioinformatics. Despite considerable recent research effort to improve the performance of multiple sequence alignment (MSA) algorithms, finding a highly accurate alignment of multiple protein sequences is still a challenging problem. We propose a novel and efficient algorithm called MSAIndelFR for multiple sequence alignment using the predicted locations of indel flanking regions (IndelFRs) and the average log-loss values computed by IndelFR predictors, each of which is designed for a different protein fold. We demonstrate that introducing into the proposed algorithm a new variable gap penalty function, based on the predicted IndelFR locations and the computed average log-loss values, substantially improves protein alignment accuracy. This is illustrated by evaluating the performance of the algorithm in aligning sequences belonging to the protein folds for which IndelFR predictors already exist, using the reference alignments of four popular benchmarks: BAliBASE 3.0, OXBENCH, PREFAB 4.0, and SABRE (SABmark 1.65). We have proposed a novel and efficient algorithm, the MSAIndelFR algorithm, for multiple protein sequence alignment incorporating a new variable gap penalty function. The proposed algorithm is shown to outperform the most widely used alignment algorithms, Clustal W2, Clustal Omega, Kalign2, MSAProbs, MAFFT, MUSCLE, ProbCons and Probalign, in terms of both the sum-of-pairs and total column metrics.
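    The sum-of-pairs (SP) and total-column (TC) metrics compare a test alignment with a reference alignment. A minimal sketch, assuming both alignments are lists of equal-length gapped strings (`-` for gaps) over the same ungapped sequences in the same order; it scores every reference column that holds at least two residues, whereas benchmark suites such as BAliBASE may restrict scoring to annotated core blocks. `column_cells` and `sp_tc_scores` are hypothetical names.

```python
from itertools import combinations


def column_cells(alignment):
    """For each column, the frozenset of (sequence index, residue index) cells it aligns."""
    next_residue = [0] * len(alignment)
    columns = []
    for col in range(len(alignment[0])):
        cells = set()
        for s, row in enumerate(alignment):
            if row[col] != "-":
                cells.add((s, next_residue[s]))
                next_residue[s] += 1
        columns.append(frozenset(cells))
    return columns


def sp_tc_scores(test, reference):
    """Return (SP, TC): fractions of reference residue pairs and reference columns found in test."""
    if [r.replace("-", "") for r in test] != [r.replace("-", "") for r in reference]:
        raise ValueError("test and reference must align the same sequences in the same order")
    ref_cols = [c for c in column_cells(reference) if len(c) >= 2]
    if not ref_cols:
        raise ValueError("reference has no column with two or more residues")
    test_cols = column_cells(test)
    test_pairs = {p for c in test_cols for p in combinations(sorted(c), 2)}
    ref_pairs = [p for c in ref_cols for p in combinations(sorted(c), 2)]
    sp = sum(p in test_pairs for p in ref_pairs) / len(ref_pairs)
    test_col_set = set(test_cols)
    tc = sum(c in test_col_set for c in ref_cols) / len(ref_cols)
    return sp, tc


print(sp_tc_scores(["A-CT", "ACGT"], ["AC-T", "ACGT"]))  # SP = TC = 2/3
```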

  9. Structural Signatures of Enzyme Binding Pockets from Order-Independent Surface Alignment: A Study of Metalloendopeptidase and NAD Binding Proteins

    PubMed Central

    Dundas, Joe; Adamian, Larisa; Liang, Jie

    2011-01-01

    Detecting similarities between local binding surfaces can facilitate identification of enzyme binding sites and prediction of enzyme functions, as well as aid our understanding of enzyme mechanisms. A challenging task is to construct a template of local surface characteristics for a specific enzyme function or binding activity, as the sizes and shapes of binding surfaces for a biochemical function often vary. Here we introduce the concept of signature binding pockets, which capture information about preserved and varied atomic positions at multiple resolution levels. For proteins with complex enzyme binding and activity, multiple signatures arise naturally in our model, forming a signature basis set that characterizes this class of proteins. Both signatures and signature basis sets can be automatically constructed by a method called Solar (Signature Of Local Active Regions). This method is based on a sequence order independent alignment of computed binding surface pockets. Solar also provides a structure-based multiple sequence fragment alignment (MSFA) to facilitate interpretation of computed signatures. For studying a family of evolutionarily related proteins, we show that for metzincin metalloendopeptidase, which has a broad spectrum of substrate binding, signature and basis set pockets can be used to discriminate metzincins from other enzymes, to predict the subclass of enzyme functions, and to identify the specific binding surfaces. For studying unrelated proteins which have evolved to bind to the same NAD co-factor, signatures of NAD binding pockets can be constructed and can be used to predict NAD binding proteins and to locate NAD binding pockets. By measuring preservation ratio and location variation, our method can identify residues and atoms important for binding affinity and specificity. In both cases, we show that signatures and signature basis sets reveal significant biological insight. PMID:21145898

  10. De novo genome assembly of the economically important weed horseweed using integrated data from multiple sequencing platforms.

    PubMed

    Peng, Yanhui; Lai, Zhao; Lane, Thomas; Nageswara-Rao, Madhugiri; Okada, Miki; Jasieniuk, Marie; O'Geen, Henriette; Kim, Ryan W; Sammons, R Douglas; Rieseberg, Loren H; Stewart, C Neal

    2014-11-01

    Horseweed (Conyza canadensis), a member of the Compositae (Asteraceae) family, was the first broadleaf weed to evolve resistance to glyphosate. Horseweed, one of the most problematic weeds in the world, is a true diploid (2n = 2x = 18), with the smallest genome of any known agricultural weed (335 Mb). Thus, it is an appropriate candidate to help us understand the genetic and genomic bases of weediness. We undertook a draft de novo genome assembly of horseweed by combining data from multiple sequencing platforms (454 GS-FLX, Illumina HiSeq 2000, and PacBio RS) using various libraries with different insert sizes (approximately 350 bp, 600 bp, 3 kb, and 10 kb) of a glyphosate-resistant horseweed biotype from Tennessee. From 116.3 Gb (approximately 350× coverage) of data, the genome was assembled into 13,966 scaffolds with an N50 of 33,561 bp (i.e., half of the assembly lies in scaffolds of at least this length). The assembly covered 92.3% of the genome, including the complete chloroplast genome (approximately 153 kb) and a nearly complete mitochondrial genome (approximately 450 kb in 120 scaffolds). The nuclear genome contains 44,592 protein-coding genes. Genome resequencing of seven additional horseweed biotypes was performed. These sequence data were assembled and used to analyze genome variation. Simple sequence repeats and single-nucleotide polymorphisms were surveyed. Genomic patterns associated with glyphosate-resistant or -susceptible biotypes were detected. The draft genome will be useful to better understand weediness and the evolution of herbicide resistance and to devise new management strategies. The genome will also be useful as another reference genome in the Compositae. To our knowledge, this article represents the first published draft genome of an agricultural weed.
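    The scaffold N50 quoted above is the length L such that scaffolds of length at least L together cover half of the total assembly length. A minimal sketch of the computation from a list of scaffold lengths; `n50` is a hypothetical helper name, not part of any assembly package.

```python
def n50(lengths):
    """Smallest length L such that sequences of length >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the total assembled length
            return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

    Here the total is 54, and the three longest sequences (10 + 9 + 8 = 27) are the first to reach half of it, so the N50 is 8.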

  11. Two Sample Logo: a graphical representation of the differences between two sets of sequence alignments.

    PubMed

    Vacic, Vladimir; Iakoucheva, Lilia M; Radivojac, Predrag

    2006-06-15

    Two Sample Logo is a web-based tool that detects and displays statistically significant differences in position-specific symbol compositions between two sets of multiple sequence alignments. In a typical scenario, two groups of aligned sequences will share a common motif but will differ in their functional annotation. The inclusion of the background alignment provides an appropriate underlying amino acid or nucleotide distribution and addresses intersite symbol correlations. In addition, the difference detection process is sensitive to the sizes of the aligned groups. Two Sample Logo extends WebLogo, a widely-used sequence logo generator. The source code is distributed under the MIT Open Source license agreement and is available for download free of charge.
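    A rough sketch of a position-specific comparison between two groups of aligned sequences. It uses a two-sided two-proportion z-test, which is an assumption and not necessarily the test Two Sample Logo applies; it also performs no multiple-testing correction and ignores any background distribution. `column_differences` and `alpha` are hypothetical names.

```python
import math
from collections import Counter


def column_differences(group_a, group_b, alpha=0.05):
    """List (column, symbol, freq_a - freq_b, p_value) for symbol frequencies that differ
    between two groups of aligned sequences (two-sided two-proportion z-test, uncorrected)."""
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one sequence")
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("all sequences must have the same aligned length")
    n_a, n_b = len(group_a), len(group_b)
    results = []
    for col in range(length):
        counts_a = Counter(seq[col] for seq in group_a)
        counts_b = Counter(seq[col] for seq in group_b)
        for symbol in sorted(set(counts_a) | set(counts_b)):
            p_a, p_b = counts_a[symbol] / n_a, counts_b[symbol] / n_b
            pooled = (counts_a[symbol] + counts_b[symbol]) / (n_a + n_b)
            se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
            if se == 0:
                continue  # symbol occupies this column in every sequence of both groups
            z = (p_a - p_b) / se
            p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
            if p_value < alpha:
                results.append((col, symbol, p_a - p_b, p_value))
    return results


print(column_differences(["AK"] * 10, ["AE"] * 10))
```

    In the example, column 0 is conserved in both groups and is skipped, while column 1 reports K as enriched in the first group and E in the second.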

  12. Multiple alignment analysis on phylogenetic tree of the spread of SARS epidemic using distance method

    NASA Astrophysics Data System (ADS)

    Amiroch, S.; Pradana, M. S.; Irawan, M. I.; Mukhlash, I.

    2017-09-01

    Multiple alignment (MA) is a particularly important tool for studying viral genomes and determining the evolutionary history of a specific virus. Applying MA to the spread of the severe acute respiratory syndrome (SARS) epidemic is of particular interest because the epidemic, a few years ago, spread so quickly that it drew medical attention in many countries. Although much software exists for processing multiple sequences, the use of pairwise alignment to build an MA deserves consideration. Previous research used the Super Pairwise Alignment algorithm to align the sequences for MA; this study instead uses the Needleman-Wunsch dynamic programming algorithm, simulated in Matlab. The MA analysis yields stable and unstable regions, the latter indicating positions where mutations occur; the network topology of the phylogenetic tree of the SARS epidemic obtained with a distance method; and a network of the mutated regions.
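    The pairwise step can be sketched as a textbook Needleman-Wunsch global alignment with traceback. This is a Python sketch rather than the authors' Matlab code, and the scoring scheme (match +1, mismatch -1, linear gap -1) is an assumption, not the study's parameters.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b with a linear gap penalty.
    Returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```

    Both the time and the memory cost are proportional to len(a) × len(b), which is manageable for individual viral genes but grows quickly for whole genomes.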

  13. Tidal alignment of galaxies

    SciTech Connect

    Blazek, Jonathan; Vlah, Zvonimir; Seljak, Uroš

    2015-08-01

    We develop an analytic model for galaxy intrinsic alignments (IA) based on the theory of tidal alignment. We calculate all relevant nonlinear corrections at one-loop order, including effects from nonlinear density evolution, galaxy biasing, and source density weighting. Contributions from density weighting are found to be particularly important and lead to bias dependence of the IA amplitude, even on large scales. This effect may be responsible for much of the luminosity dependence in IA observations. The increase in IA amplitude for more highly biased galaxies reflects their locations in regions with large tidal fields. We also consider the impact of smoothing the tidal field on halo scales. We compare the performance of this consistent nonlinear model in describing the observed alignment of luminous red galaxies with the linear model as well as the frequently used 'nonlinear alignment model,' finding a significant improvement on small and intermediate scales. We also show that the cross-correlation between density and IA (the 'GI' term) can be effectively separated into source alignment and source clustering, and we accurately model the observed alignment down to the one-halo regime using the tidal field from the fully nonlinear halo-matter cross correlation. Inside the one-halo regime, the average alignment of galaxies with density tracers no longer follows the tidal alignment prediction, likely reflecting nonlinear processes that must be considered when modeling IA on these scales. Finally, we discuss tidal alignment in the context of cosmic shear measurements.

  15. Alignability of Optical Interconnects

    NASA Astrophysics Data System (ADS)

    Beech, Russell Scott

    With the continuing drive towards higher speed, density, and functionality in electronics, electrical interconnects are becoming inadequate. Due to optics' high speed and bandwidth, freedom from capacitive loading effects, and freedom from crosstalk, optical interconnects can meet more stringent interconnect requirements. However, an optical interconnect requires additional components, such as an optical source and detector, lenses, holographic elements, etc. Fabrication and assembly of an optical interconnect require precise alignment of these components. The successful development and deployment of optical interconnects depend on how easily the interconnect components can be aligned and/or how tolerant the interconnect is to misalignments. In this thesis, a method of quantitatively specifying the relative difficulty of properly aligning an optical interconnect is described. Ways of using this theory of alignment to obtain design and packaging guidelines for optical interconnects are examined. The measure of the ease with which an optical interconnect can be aligned, called the alignability, uses the efficiency of power transfer as a measure of alignment quality. The alignability is related to interconnect package design through the overall cost measure, which depends upon various physical parameters of the interconnect, such as the cost of the components and the time required for fabrication and alignment. Through a mutual dependence on detector size, the relationship between an interconnect's alignability and its bandwidth, signal-to-noise ratio, and bit-error rate is examined. The results indicate that a range of device sizes exists for which given performance threshold values are satisfied. Next, the alignability of integrated planar-optic backplanes is analyzed in detail. The resulting data show that the alignability can be optimized by varying the substrate thickness or the angle of reflection. By including the effects of crosstalk in a multi-channel backplane, the

  16. The GEM Detector projective alignment simulation system

    SciTech Connect

    Wuest, C.R.; Belser, F.C.; Holdener, F.R.; Roeben, M.D.; Paradiso, J.A.; Mitselmakher, G.; Ostapchuk, A.; Pier-Amory, J.

    1993-07-09

    Precision position knowledge (< 25 microns RMS) of the GEM Detector muon system at the Superconducting Super Collider Laboratory (SSCL) is an important physics requirement necessary to minimize sagitta error in detecting and tracking high energy muons that are deflected by the magnetic field within the GEM Detector. To validate the concept of the sagitta correction function determined by projective alignment of the muon detectors (Cathode Strip Chambers or CSCs), the basis of the proposed GEM alignment scheme, a facility called the "Alignment Test Stand" (ATS) is being constructed. This system simulates the environment that the CSCs and chamber alignment systems are expected to experience in the GEM Detector, albeit without the 0.8 T magnetic field and radiation environment. The ATS experimental program will allow systematic study and characterization of the projective alignment approach, as well as general mechanical engineering of muon chamber mounting concepts, positioning systems and study of the mechanical behavior of the proposed 6-layer CSCs. The ATS will consist of a stable local coordinate system in which mock-ups of muon chambers (i.e., non-working mechanical analogs, representing the three superlayers of a selected barrel and endcap alignment tower) are implemented, together with a sufficient number of alignment monitors to overdetermine the sagitta correction function, providing a self-consistency check. This paper describes the approach to be used for the alignment of the GEM muon system, the design of the ATS, and the experiments to be conducted using the ATS.

  17. Differential Heating of Magnetically Aligned Dust Grains

    NASA Astrophysics Data System (ADS)

    Vaillancourt, John E.; Andersson, B.

    2013-01-01

    We use far-infrared photometric maps from IRAS and Herschel to search for the differential heating of asymmetric dust grains aligned with respect to an interstellar magnetic field and heated by a localized radiation source. The grains are known to be asymmetric and have a net alignment of their axes from observations of background starlight polarization. Modern theories on grain alignment suggest that photons from stars embedded in the foreground cloud are a key ingredient of the physical mechanism responsible for alignment (i.e., radiative torques). This theory predicts a relation between the grain alignment efficiency and the angle between the magnetic field and the direction to the aligning radiation source. This effect has been tentatively observed in a source with a very simple geometry (Andersson et al. 2011): the aligning photons are primarily from a single localized source (i.e., a single star) and the local magnetic field direction is known to be fairly uniform. Such a region also has consequences for the distribution of grain heating. For example, asymmetric grains whose largest cross-sections are normal to the incident stellar radiation will reach warmer equilibrium temperatures compared to grains whose largest cross-section is parallel to that direction. This should be observed as an azimuthal dependence of the dust color temperature. We present evidence of such a dependence using IRAS data at 60 and 100 micron. We expect this effect to be stronger using longer wavelength (i.e., 160 micron) data better coupled to the "big-grain" dust population, grains which are also more efficiently aligned with the local magnetic field. Here we also present the results of our on-going work to search for this signal using Herschel maps towards three candidate stars.

  18. Sequence alignment visualization in HTML5 without Java.

    PubMed

    Gille, Christoph; Birgit, Weyand; Gille, Andreas

    2014-01-01

    Java has been extensively used for the visualization of biological data on the web. However, the Java runtime environment is an additional layer of software with its own set of technical problems and security risks. HTML in its new version 5 provides features that for some tasks may render Java unnecessary. Alignment-To-HTML is the first HTML-based interactive visualization for annotated multiple sequence alignments. The server-side script interpreter can perform all the tasks, including (i) sequence retrieval, (ii) alignment computation, (iii) rendering, (iv) identification of homologous structural models and (v) communication with BioDAS servers. The rendered alignment can be included in web pages and is displayed in all browsers on all platforms including touch screen tablets. The functionality of the user interface is similar to legacy Java applets and includes color schemes, highlighting of conserved and variable alignment positions, row reordering by drag and drop, interlinked 3D visualization and sequence groups. Novel features are (i) support for multiple overlapping residue annotations, such as chemical modifications, single nucleotide polymorphisms and mutations, (ii) mechanisms to quickly hide residue annotations, (iii) export to MS-Word and (iv) sequence icons. Alignment-To-HTML, the first interactive alignment visualization that runs in web browsers without additional software, confirms that to some extent HTML5 is already sufficient to display complex biological data. The low speed at which programs are executed in browsers is still the main obstacle. Nevertheless, we envision an increased use of HTML and JavaScript for interactive biological software. Under GPL at: http://www.bioinformatics.org/strap/toHTML/.

  19. ORFEUS alignment concept

    NASA Astrophysics Data System (ADS)

    Graue, R.; Kampf, D.; Rippel, H.; Witte, G.

    1991-09-01

    The alignment concept of ORFEUS, a short-term scientific space payload scheduled for launch by the STS in January 1993, is discussed. ORFEUS comprises two alternately operating spectrometers (Echelle and Rowland) housed in a CFC telescope with a 4-m tube length and an aperture of 1000 mm. The lightweight primary mirror has a focal length of 2426 mm. To achieve the high spectrometric resolution required of the telescope in the UV range (40-125 nm), a sophisticated alignment concept was developed. The alignment diaphragm (diameter: 15 microns) must be centered in the focus of the primary mirror with the tube in the vertical position, using an autocollimation telescope. The spectrometers must be integrated into the telescope in the horizontal position, aligned within a special antigravity device that reduces optical surface deformations and ensures the optical performance of the primary mirror. All optical components are to be aligned in the visible spectral range.

  20. Fast and sensitive multiple alignment of large genomic sequences

    PubMed Central

    Brudno, Michael; Chapman, Michael; Göttgens, Berthold; Batzoglou, Serafim; Morgenstern, Burkhard

    2003-01-01

    Background Genomic sequence alignment is a powerful method for genome analysis and annotation, as alignments are routinely used to identify functional sites such as genes or regulatory elements. With a growing number of partially or completely sequenced genomes, multiple alignment is playing an increasingly important role in these studies. In recent years, various tools for pair-wise and multiple genomic alignment have been proposed. Some of them are extremely fast, but often efficiency is achieved at the expense of sensitivity. One way of combining speed and sensitivity is to use an anchored-alignment approach. In a first step, a fast search program identifies a chain of strong local sequence similarities. In a second step, regions between these anchor points are aligned using a slower but more accurate method. Results Herein, we present CHAOS, a novel algorithm for rapid identification of chains of local pair-wise sequence similarities. Local alignments calculated by CHAOS are used as anchor points to improve the running time of DIALIGN, a slow but sensitive multiple-alignment tool. We show that this way, the running time of DIALIGN can be reduced by more than 95% for BAC-sized and longer sequences, without affecting the quality of the resulting alignments. We apply our approach to a set of five genomic sequences around the stem-cell-leukemia (SCL) gene and demonstrate that exons and small regulatory elements can be identified by our multiple-alignment procedure. Conclusion We conclude that the novel CHAOS local alignment tool is an effective way to significantly speed up global alignment tools such as DIALIGN without reducing the alignment quality. We likewise demonstrate that the DIALIGN/CHAOS combination is able to accurately align short regulatory sequences in distant orthologues. PMID:14693042
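    The anchored approach keeps a high-scoring chain of local similarities that are co-linear in both sequences. A generic quadratic-time chaining recurrence illustrates the idea; it is not CHAOS's own algorithm. The `Anchor` fields (start in each sequence, length, score) and `best_chain` are hypothetical, and anchors are assumed to be ungapped, so one length covers both sequences.

```python
from typing import List, NamedTuple


class Anchor(NamedTuple):
    x: int          # start of the match in the first sequence
    y: int          # start of the match in the second sequence
    length: int     # ungapped match length (same in both sequences)
    score: float


def best_chain(anchors: List[Anchor]) -> List[Anchor]:
    """Return the highest-scoring chain of anchors that are co-linear and non-overlapping."""
    if not anchors:
        return []
    order = sorted(anchors, key=lambda a: (a.x, a.y))
    best = [a.score for a in order]   # best chain score ending at each anchor
    prev = [-1] * len(order)          # predecessor index for traceback
    for j, aj in enumerate(order):
        for i in range(j):
            ai = order[i]
            precedes = ai.x + ai.length <= aj.x and ai.y + ai.length <= aj.y
            if precedes and best[i] + aj.score > best[j]:
                best[j] = best[i] + aj.score
                prev[j] = i
    j = max(range(len(order)), key=best.__getitem__)
    chain = []
    while j != -1:
        chain.append(order[j])
        j = prev[j]
    return chain[::-1]


anchors = [Anchor(0, 0, 5, 5), Anchor(10, 12, 4, 4), Anchor(3, 20, 6, 6), Anchor(20, 25, 3, 3)]
print(best_chain(anchors))
```

    The anchor at (3, 20) is dropped because it crosses the others; in an anchored pipeline, the gaps between consecutive chained anchors would then be handed to the slower, more sensitive aligner.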

  1. Barriers and strategies to align stakeholders in healthcare alliances.

    PubMed

    Herald, Larry R; Alexander, Jeffrey A; Beich, Jeff; Mittler, Jessica N; O'Hora, Jennifer L

    2012-09-01

    To identify barriers to stakeholder alignment and strategies used by 14 multi-stakeholder alliances participating in the Aligning Forces for Quality initiative to overcome these barriers. The study used a mixed method, comparative case study design. Alliances were categorized as more or less highly aligned based on an alignment index constructed from survey responses. Six alliances (top and bottom quartile) were selected for more in-depth qualitative analysis. Semi-structured interviews of key informants were used to identify factors that distinguished more highly aligned alliances from less highly aligned alliances. Market context was one of the most important factors differentiating alliances. More highly aligned alliances had more extensive histories of collaboration, established more credibility in the local community, and were more effective at balancing collaborative initiatives against competitive interests. More highly aligned alliances also took more active approaches to build consensus among stakeholders regarding alliance initiatives, and were able to successfully utilize small decision-making bodies to foster this consensus. In contrast, leadership credibility, leadership stability, and trust were important facilitators of alignment for all alliances, regardless of the level of alignment. These factors intersect and overlap in a multitude of ways to influence stakeholder alignment. Alignment in an alliance context is critical for leveraging the unique knowledge, skills, and abilities of stakeholders to build capacity to improve community health in ways that stakeholders cannot achieve independently. The findings highlight the need for multifaceted approaches to promote stakeholder alignment.

  2. Algorithms for Automatic Alignment of Arrays

    NASA Technical Reports Server (NTRS)

    Chatterjee, Siddhartha; Gilbert, John R.; Oliker, Leonid; Schreiber, Robert; Sheffler, Thomas J.

    1996-01-01

    Aggregate data objects (such as arrays) are distributed across the processor memories when compiling a data-parallel language for a distributed-memory machine. The mapping determines the amount of communication needed to bring operands of parallel operations into alignment with each other. A common approach is to break the mapping into two stages: an alignment that maps all the objects to an abstract template, followed by a distribution that maps the template to the processors. This paper describes algorithms for solving the various facets of the alignment problem: axis and stride alignment, static and mobile offset alignment, and replication labeling. We show that optimal axis and stride alignment is NP-complete for general program graphs, and give a heuristic method that can explore the space of possible solutions in a number of ways. We show that some of these strategies can give better solutions than a simple greedy approach proposed earlier. We also show how local graph contractions can reduce the size of the problem significantly without changing the best solution. This allows more complex and effective heuristics to be used. We show how to model the static offset alignment problem using linear programming, and we show that loop-dependent mobile offset alignment is sometimes necessary for optimum performance. We describe an algorithm for determining mobile alignments for objects within do loops. We also identify situations in which replicated alignment is either required by the program itself or can be used to improve performance. We describe an algorithm based on network flow that replicates objects so as to minimize the total amount of broadcast communication in replication.
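    How an offset alignment changes communication can be seen in a toy model. The sketch below assumes a one-dimensional template, a block distribution, and the statement `A[i] = B[i + 1]`; `owner`, `remote_reads`, and `offset_b` are hypothetical names, and nothing here reproduces the paper's algorithms.

```python
def owner(cell, n_cells, n_procs):
    """Processor owning a template cell under a block distribution."""
    block = -(-n_cells // n_procs)  # ceiling division: cells per processor
    return cell // block


def remote_reads(n, offset_b, n_procs):
    """Count iterations of A[i] = B[i + 1] (i = 0..n-2) whose operands sit on different processors.

    A[i] is aligned to template cell i and B[k] to template cell k + offset_b;
    the template is shifted to start at 0 and block-distributed over n_procs processors."""
    lo = min(0, offset_b)
    hi = max(n - 1, n - 1 + offset_b)
    n_cells = hi - lo + 1
    count = 0
    for i in range(n - 1):
        a_cell = i - lo
        b_cell = (i + 1) + offset_b - lo
        if owner(a_cell, n_cells, n_procs) != owner(b_cell, n_cells, n_procs):
            count += 1
    return count


print(remote_reads(8, 0, 4), remote_reads(8, -1, 4))  # 3 0
```

    With `offset_b = -1`, each `B[i + 1]` shares a template cell with `A[i]`, so no operand crosses processors, while the identity alignment (`offset_b = 0`) forces a remote read at every block boundary.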

  3. MACSE: Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons.

    PubMed

    Ranwez, Vincent; Harispe, Sébastien; Delsuc, Frédéric; Douzery, Emmanuel J P

    2011-01-01

    Until now the most efficient solution to align nucleotide sequences containing open reading frames was to use indirect procedures that align amino acid translations before reporting the inferred gap positions at the codon level. There are two important pitfalls with this approach. Firstly, any premature stop codon impedes using such a strategy. Secondly, each sequence is translated with the same reading frame from beginning to end, so that the presence of a single additional nucleotide leads to both aberrant translation and alignment. We present an algorithm that has the same space and time complexity as the classical Needleman-Wunsch algorithm while accommodating sequencing errors and other biological deviations from the coding frame. The resulting pairwise coding sequence alignment method was extended to a multiple sequence alignment (MSA) algorithm implemented in a program called MACSE (Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons). MACSE is the first automatic solution to align protein-coding gene datasets containing non-functional sequences (pseudogenes) without disrupting the underlying codon structure. It has also proved useful in detecting undocumented frameshifts in public database sequences and in aligning next-generation sequencing reads/contigs against a reference coding sequence. MACSE is distributed as an open-source Java executable with freely available source code and can be used via a web interface at: http://mbb.univ-montp2.fr/macse.

  4. Sparse alignment for robust tensor learning.

    PubMed

    Lai, Zhihui; Wong, Wai Keung; Xu, Yong; Zhao, Cairong; Sun, Mingming

    2014-10-01

    Multilinear/tensor extensions of manifold learning based algorithms have been widely used in computer vision and pattern recognition. This paper first provides a systematic analysis of the multilinear extensions for the most popular methods by using alignment techniques, thereby obtaining a general tensor alignment framework. From this framework, it is easy to show that the manifold learning based tensor learning methods are intrinsically different from the alignment techniques. Based on the alignment framework, a robust tensor learning method called sparse tensor alignment (STA) is then proposed for unsupervised tensor feature extraction. Different from the existing tensor learning methods, L1- and L2-norms are introduced to enhance the robustness in the alignment step of the STA. The advantage of the proposed technique is that the difficulty in selecting the size of the local neighborhood can be avoided in the manifold learning based tensor feature extraction algorithms. Although STA is an unsupervised learning method, the sparsity encodes the discriminative information in the alignment step and provides the robustness of STA. Extensive experiments on the well-known image databases as well as action and hand gesture databases by encoding object images as tensors demonstrate that the proposed STA algorithm gives the most competitive performance when compared with the tensor-based unsupervised learning methods.

  5. Precision alignment device

    DOEpatents

    Jones, N.E.

    1988-03-10

    Apparatus for providing automatic alignment of beam devices having an associated structure for directing, collimating, focusing, reflecting, or otherwise modifying the main beam. A reference laser is attached to the structure enclosing the main beam producing apparatus and produces a reference beam substantially parallel to the main beam. Detector modules containing optical switching devices and optical detectors are positioned in the path of the reference beam and are effective to produce an electrical output indicative of the alignment of the main beam. This electrical output drives servomotor operated adjustment screws to adjust the position of elements of the structure associated with the main beam to maintain alignment of the main beam. 5 figs.

  6. Precision alignment device

    DOEpatents

    Jones, Nelson E.

    1990-01-01

    Apparatus for providing automatic alignment of beam devices having an associated structure for directing, collimating, focusing, reflecting, or otherwise modifying the main beam. A reference laser is attached to the structure enclosing the main beam producing apparatus and produces a reference beam substantially parallel to the main beam. Detector modules containing optical switching devices and optical detectors are positioned in the path of the reference beam and are effective to produce an electrical output indicative of the alignment of the main beam. This electrical output drives servomotor operated adjustment screws to adjust the position of elements of the structure associated with the main beam to maintain alignment of the main beam.

  7. Hybrid vehicle motor alignment

    DOEpatents

    Levin, Michael Benjamin

    2001-07-03

    A rotor of an electric motor for a hybrid motor vehicle having both an internal combustion engine and an electric motor is aligned to the axis of rotation of the engine's crankshaft. A first locator is provided on the crankshaft, and a piloting tool is located radially on the crankshaft by the first locator. A stator of the electric motor is aligned to a second locator provided on the piloting tool. The stator is secured to the engine block. The rotor is aligned to the crankshaft and secured thereto.

  8. Grain alignment in starless cores

    SciTech Connect

    Jones, T. J.; Bagley, M.; Krejny, M.; Andersson, B.-G.; Bastien, P.

    2015-01-01

    We present near-IR polarimetry data of background stars shining through a selection of starless cores, taken in the K band and probing visual extinctions up to $A_V \sim 48$. We find that $P_K/\tau_K$ continues to decline with increasing $A_V$, with a power-law slope of roughly $-0.5$. Examination of published submillimeter (submm) polarimetry of starless cores suggests that by $A_V \gtrsim 20$ the slope of $P$ versus $\tau$ becomes $\sim -1$, indicating no grain alignment at greater optical depths. Combining these two data sets, we find good evidence that, in the absence of an internal radiation source, the dust grains in dense molecular cloud cores cease to be aligned with the local magnetic field at optical depths greater than $A_V \sim 20$. A simple model relating the alignment efficiency to the optical depth into the cloud reproduces the observations well.
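
    The "power-law slope" quoted above is the exponent $a$ in $P_K/\tau_K \propto A_V^{a}$, i.e. the slope of a straight-line least-squares fit in log-log space. A minimal sketch of that fit follows; the input values are synthetic, generated with $a = -0.5$ purely for illustration, and are not the paper's measurements.

```python
import math

def power_law_slope(x, y):
    """Least-squares slope of log10(y) versus log10(x): the exponent a in y ∝ x**a."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx

# Synthetic example (not observed data): P/tau = 3 * A_V**-0.5.
a_v = [2.0, 5.0, 10.0, 20.0, 48.0]
ratio = [3.0 * a ** -0.5 for a in a_v]
slope = power_law_slope(a_v, ratio)
```

On noiseless synthetic data the recovered slope equals the generating exponent up to floating-point error; with real polarimetry the scatter would widen the uncertainty on the fitted slope.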

  9. OCPAT: an online codon-preserved alignment tool for evolutionary genomic analysis of protein coding sequences

    PubMed Central

    Liu, Guozhen; Uddin, Monica; Islam, Munirul; Goodman, Morris; Grossman, Lawrence I; Romero, Roberto; Wildman, Derek E

    2007-01-01

    Background: Rapidly accumulating genome sequence data from multiple species offer powerful opportunities for the detection of DNA sequence evolution. Phylogenetic tree construction and codon-based tests for natural selection are the prevailing tools used to detect functionally important evolutionary change in protein coding sequences. These analyses often require multiple DNA sequence alignments that maintain the correct reading frame for each collection of putative orthologous sequences. Since this feature is not available in most alignment tools, codon reading frames often must be checked manually before evolutionary analyses can commence. Results: Here we report an online codon-preserved alignment tool (OCPAT) that generates multiple sequence alignments automatically from the coding sequences of any list of human gene IDs and their putative orthologs from genomes of other vertebrate tetrapods. OCPAT is programmed to extract putative orthologous genes from genomes and to align the orthologs with the reading frame maintained in all species. OCPAT also optimizes the alignment by trimming the most variable alignment regions at the 5' and 3' ends of each gene. The resulting alignments are returned in several formats, which facilitates further molecular evolutionary analyses by appropriate available software. Alignments are generally robust and reliable, retaining the correct reading frame. The tool can serve as the first step for comparative genomic analyses of protein-coding gene sequences, including phylogenetic tree reconstruction and detection of natural selection. We aligned 20,658 human RefSeq mRNAs using OCPAT. Most alignments are missing sequence(s) from at least one species; however, functional annotation clustering of the ~1700 transcripts that were alignable to all species shows that genes involved in multi-subunit protein complexes are highly conserved. Conclusion: The OCPAT program facilitates large-scale evolutionary and phylogenetic analyses of
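
    The abstract does not say how OCPAT decides which end regions are "most variable". As an illustration only, the sketch below assumes a simple criterion: walk inward from each end of a codon alignment and drop whole codons while any of their three columns has identity below a threshold. Trimming in whole codons keeps the reading frame intact. The function names, the identity measure, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter

def column_identity(column: str, gap: str = "-") -> float:
    """Fraction of rows sharing the most common non-gap character (gaps count against)."""
    residues = [c for c in column if c != gap]
    if not residues:
        return 0.0
    return Counter(residues).most_common(1)[0][1] / len(column)

def trim_variable_ends(alignment: list[str], threshold: float = 0.5) -> list[str]:
    """Hypothetical end trimming of a codon alignment, in whole codons only."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment) or length % 3:
        raise ValueError("expected equal-length rows forming a codon alignment")

    def codon_ok(i: int) -> bool:
        # Codon i spans columns 3i, 3i+1, 3i+2; all three must pass the threshold.
        cols = ["".join(row[j] for row in alignment) for j in range(3 * i, 3 * i + 3)]
        return min(column_identity(c) for c in cols) >= threshold

    n = length // 3
    start = 0
    while start < n and not codon_ok(start):
        start += 1
    end = n
    while end > start and not codon_ok(end - 1):
        end -= 1
    return [row[3 * start:3 * end] for row in alignment]
```

For instance, for the rows "ATGAAACCC", "CTGAAACCG", and "GTGAAATTT", the first and last codons fail the 0.5 threshold and only the conserved middle codon "AAA" is kept in each row.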

  10. Aligning parallel arrays to reduce communication

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.; Schreiber, Robert; Gilbert, John R.; Chatterjee, Siddhartha

    1994-01-01

    Axis and stride alignment is an important optimization in compiling data-parallel programs for distributed-memory machines. We previously developed an optimal algorithm for aligning array expressions. Here, we examine alignment for more general program graphs. We show that optimal alignment is NP-complete in this setting, so we study heuristic methods. This paper makes two contributions. First, we show how local graph transformations can reduce the size of the problem significantly without changing the best solution. This allows more complex and effective heuristics to be used. Second, we give a heuristic that can explore the space of possible solutions in a number of ways. We show that some of these strategies can give better solutions than a simple greedy approach proposed earlier. Our algorithms have been implemented; we present experimental results showing their effect on the performance of some example programs running on the CM-5.

  11. Desktop aligner for fabrication of multilayer microfluidic devices

    NASA Astrophysics Data System (ADS)

    Li, Xiang; Yu, Zeta Tak For; Geraldo, Dalton; Weng, Shinuo; Alve, Nitesh; Dun, Wu; Kini, Akshay; Patel, Karan; Shu, Roberto; Zhang, Feng; Li, Gang; Jin, Qinghui; Fu, Jianping

    2015-07-01

    Multilayer assembly is a commonly used technique to construct multilayer polydimethylsiloxane (PDMS)-based microfluidic devices with complex 3D architecture and connectivity for large-scale microfluidic integration. Accurate alignment of structure features on different PDMS layers before their permanent bonding is critical in determining the yield and quality of assembled multilayer microfluidic devices. Herein, we report a custom-built desktop aligner capable of both local and global alignments of PDMS layers covering a broad size range. Two digital microscopes were incorporated into the aligner design to allow accurate global alignment of PDMS structures up to 4 in. in diameter. Both local and global alignment accuracies of the desktop aligner were determined to be about 20 μm cm⁻¹. To demonstrate its utility for fabrication of integrated multilayer PDMS microfluidic devices, we applied the desktop aligner to achieve accurate alignment of different functional PDMS layers in multilayer microfluidics including an organs-on-chips device as well as a microfluidic device integrated with vertical passages connecting channels located in different PDMS layers. Owing to its convenient operation, high accuracy, low cost, light weight, and portability, the desktop aligner is useful for microfluidic researchers to achieve rapid and accurate alignment for generating multilayer PDMS microfluidic devices.

  12. Desktop aligner for fabrication of multilayer microfluidic devices.

    PubMed

    Li, Xiang; Yu, Zeta Tak For; Geraldo, Dalton; Weng, Shinuo; Alve, Nitesh; Dun, Wu; Kini, Akshay; Patel, Karan; Shu, Roberto; Zhang, Feng; Li, Gang; Jin, Qinghui; Fu, Jianping

    2015-07-01

    Multilayer assembly is a commonly used technique to construct multilayer polydimethylsiloxane (PDMS)-based microfluidic devices with complex 3D architecture and connectivity for large-scale microfluidic integration. Accurate alignment of structure features on different PDMS layers before their permanent bonding is critical in determining the yield and quality of assembled multilayer microfluidic devices. Herein, we report a custom-built desktop aligner capable of both local and global alignments of PDMS layers covering a broad size range. Two digital microscopes were incorporated into the aligner design to allow accurate global alignment of PDMS structures up to 4 in. in diameter. Both local and global alignment accuracies of the desktop aligner were determined to be about 20 μm cm⁻¹. To demonstrate its utility for fabrication of integrated multilayer PDMS microfluidic devices, we applied the desktop aligner to achieve accurate alignment of different functional PDMS layers in multilayer microfluidics including an organs-on-chips device as well as a microfluidic device integrated with vertical passages connecting channels located in different PDMS layers. Owing to its convenient operation, high accuracy, low cost, light weight, and portability, the desktop aligner is useful for microfluidic researchers to achieve rapid and accurate alignment for generating multilayer PDMS microfluidic devices.

  13. Desktop aligner for fabrication of multilayer microfluidic devices

    PubMed Central

    Li, Xiang; Yu, Zeta Tak For; Geraldo, Dalton; Weng, Shinuo; Alve, Nitesh; Dun, Wu; Kini, Akshay; Patel, Karan; Shu, Roberto; Zhang, Feng; Li, Gang; Jin, Qinghui; Fu, Jianping

    2015-01-01

    Multilayer assembly is a commonly used technique to construct multilayer polydimethylsiloxane (PDMS)-based microfluidic devices with complex 3D architecture and connectivity for large-scale microfluidic integration. Accurate alignment of structure features on different PDMS layers before their permanent bonding is critical in determining the yield and quality of assembled multilayer microfluidic devices. Herein, we report a custom-built desktop aligner capable of both local and global alignments of PDMS layers covering a broad size range. Two digital microscopes were incorporated into the aligner design to allow accurate global alignment of PDMS structures up to 4 in. in diameter. Both local and global alignment accuracies of the desktop aligner were determined to be about 20 μm cm⁻¹. To demonstrate its utility for fabrication of integrated multilayer PDMS microfluidic devices, we applied the desktop aligner to achieve accurate alignment of different functional PDMS layers in multilayer microfluidics including an organs-on-chips device as well as a microfluidic device integrated with vertical passages connecting channels located in different PDMS layers. Owing to its convenient operation, high accuracy, low cost, light weight, and portability, the desktop aligner is useful for microfluidic researchers to achieve rapid and accurate alignment for generating multilayer PDMS microfluidic devices. PMID:26233409

  14. Refinement by shifting secondary structure elements improves sequence alignments.

    PubMed

    Tong, Jing; Pei, Jimin; Otwinowski, Zbyszek; Grishin, Nick V

    2015-03-01

    Constructing a model of a query protein based on its alignment to a homolog with experimentally determined spatial structure (the template) is still the most reliable approach to structure prediction. Alignment errors are the main bottleneck for homology modeling when the query is distantly related to the template. Alignment methods often misalign secondary structural elements by a few residues, so better alignments can often be found within a limited set of local shifts of secondary structures. We present a refinement method that improves pairwise sequence alignments by evaluating alignment variants generated by local shifts of template-defined secondary structures. Our method, SFESA, is based on a novel scoring function that combines a profile-based sequence score with a structure score derived from residue contacts in the template. This combined score frequently selects a better alignment variant among the candidate alignments generated by local shifts and leads to an overall increase in alignment accuracy. Evaluation on several benchmarks shows that our refinement method significantly improves alignments made by automatic methods such as PROMALS, HHpred and CNFpred. The web server is available at http://prodata.swmed.edu/sfesa. © 2014 Wiley Periodicals, Inc.

  15. Alignment of tactical tropo antennas

    NASA Astrophysics Data System (ADS)

    Bradley, Philip A.

    1986-07-01

    Alignment problems of parabolic reflector antennas for troposcatter radio communications are analyzed. Defects of previous alignment techniques are delineated and a new technique for automatic antenna alignment is presented.

  16. Alignment of CEBAF cryomodules

    SciTech Connect

    Schneider, W.J.; Bisognano, J.J.; Fischer, J.

    1993-06-01

    CEBAF, the Continuous Electron Beam Accelerator Facility, when completed, will house a 4 GeV recirculating accelerator. Each of the accelerator's two linacs contains 160 superconducting radio frequency (SRF) 1497 MHz niobium cavities in 20 cryomodules. Alignment of the cavities within the cryomodule with respect to the beam axis is critical to achieving optimum accelerator performance. This paper discusses the rationale for the current specification on cavity mechanical alignment: 2 mrad (rms) applied to the 0.5 m active-length cavities. We describe the tooling that was developed to achieve this tolerance at the time of cavity pair assembly, to preserve and integrate alignment during cryomodule assembly, and to translate alignment to appropriate installation in the beam line.

  17. Robust Nonnegative Patch Alignment for Dimensionality Reduction.

    PubMed

    You, Xinge; Ou, Weihua; Chen, Chun Lung Philip; Li, Qiang; Zhu, Ziqi; Tang, Yuanyan

    2015-11-01

    Dimensionality reduction is an important method for analyzing high-dimensional data and has many applications in pattern recognition and computer vision. In this paper, we propose a robust nonnegative patch alignment for dimensionality reduction, which includes a reconstruction error term and a whole alignment term. We use the correntropy-induced metric to measure the reconstruction error, in which the weight is learned adaptively for each entry. For the whole alignment, we propose locality-preserving robust nonnegative patch alignment (LP-RNA) and sparsity-preserving robust nonnegative patch alignment (SP-RNA), which are unsupervised and supervised, respectively. In LP-RNA, we propose a locally sparse graph to encode the local geometric structure of the manifold embedded in high-dimensional space. In particular, we select a large number p of nearest neighbors for each sample, then obtain the sparse representation with respect to these neighbors. The sparse representation is used to build a graph, which simultaneously enjoys locality, sparseness, and robustness. In SP-RNA, we simultaneously use local geometric structure and discriminative information, in which the sparse reconstruction coefficient is used to characterize the local geometric structure and a weighted distance is used to measure the separability of different classes. We formulate the induced nonconvex objective function as a weighted nonnegative matrix factorization based on half-quadratic optimization. We propose a multiplicative update rule to solve this problem and show that the objective function converges to a local optimum. Experimental results on several synthetic and real data sets demonstrate that the learned representation is more discriminative and robust than most existing dimensionality reduction methods.
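
    The abstract gives no update equations. As a hedged illustration of the general idea only, the sketch below combines standard weighted-NMF multiplicative updates (Lee-Seung style) with a half-quadratic reweighting step in which each entry's weight is a Gaussian kernel of its current residual, the usual way a correntropy objective is turned into a sequence of weighted least-squares problems. It omits the paper's patch-alignment term entirely, and the function name, the kernel width `sigma`, and the iteration counts are assumptions.

```python
import numpy as np

def correntropy_nmf(V, rank, sigma=1.0, outer=10, inner=50, seed=0, eps=1e-12):
    """Hypothetical sketch: approximate nonnegative V as W @ H, down-weighting outliers.

    Outer loop: half-quadratic step setting each entry's weight to a Gaussian
    kernel of its residual. Inner loop: weighted multiplicative updates for
    min sum(M * (V - W @ H) ** 2), which keep W and H nonnegative.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    M = np.ones_like(V, dtype=float)  # per-entry weights, initially uniform
    for _ in range(outer):
        for _ in range(inner):
            H *= (W.T @ (M * V)) / (W.T @ (M * (W @ H)) + eps)
            W *= ((M * V) @ H.T) / ((M * (W @ H)) @ H.T + eps)
        residual = V - W @ H
        # Entries with large residuals (likely outliers) get weights near 0.
        M = np.exp(-residual ** 2 / (2 * sigma ** 2))
    return W, H, M
```

With a rank-1 nonnegative matrix plus one large outlier entry, the returned weight for the outlier should end up near 0 while the other weights stay near 1, so the factors are fitted to the clean structure rather than dragged toward the outlier.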

  18. Pairwise Sequence Alignment Library

    SciTech Connect

    Jeff Daily, PNNL

    2015-05-20

    Vector extensions, such as SSE, have been part of x86 CPUs since the 1990s, with applications in graphics, signal processing, and scientific computing. Although many algorithms and applications naturally benefit from automatic vectorization, many others remain difficult to vectorize because of irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend toward wider vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that better exploits wider SIMD units was developed as part of the Pairwise Sequence Alignment Library (parasail). Parasail features reference implementations of all known vectorized sequence alignment approaches; implementations of the Smith-Waterman (SW), semi-global (SG), and Needleman-Wunsch (NW) sequence alignment algorithms; implementations across all modern CPU instruction sets, including AVX2 and KNC; and language interfaces for C/C++ and Python.
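
    For readers unfamiliar with the kernel being vectorized, here is a minimal scalar sketch of the Smith-Waterman local alignment score with a linear gap penalty. It is not parasail's API; parasail itself takes separate gap-open and gap-extend penalties, and the scoring values here are arbitrary assumptions. The `cur[j - 1]` term is a dependency along the same row, so the cells of a row cannot simply be computed independently in parallel; striped layouts and parallel prefix scans are two ways of handling such dependencies when a row is vectorized.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gaps, O(len(a) * len(b)) time)."""
    prev = [0] * (len(b) + 1)  # previous DP row
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local rather than global.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best
```

Keeping only two rows gives O(len(b)) memory for the score; recovering the aligned residues themselves would require a traceback over the full matrix.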

  19. Align-m--a new algorithm for multiple alignment of highly divergent sequences.

    PubMed

    Van Walle, Ivo; Lasters, Ignace; Wyns, Lode

    2004-06-12

    Multiple alignment of highly divergent sequences is a challenging problem for which available programs tend to show poor performance. Generally, this is due to a scoring function that does not describe biological reality accurately enough or a heuristic that cannot explore solution space efficiently enough. In this respect, we present a new program, Align-m, that uses a non-progressive local approach to guide a global alignment. Two large test sets were used that represent the entire SCOP classification and cover sequence similarities between 0 and 50% identity. Performance was compared with the publicly available algorithms ClustalW, T-Coffee and DiAlign. In general, Align-m has comparable or slightly higher accuracy in terms of correctly aligned residues, especially for distantly related sequences. Importantly, it aligns far fewer residues incorrectly, with average differences of over 15% compared with some of the other algorithms. Align-m and the test sets are available at http://bioinformatics.vub.ac.be

  20. PDV Probe Alignment Technique

    SciTech Connect

    Whitworth, T L; May, C M; Strand, O T

    2007-10-26

    This alignment technique was developed while performing heterodyne velocimetry measurements at LLNL. A few minor items are needed, such as a white card with an aperture in the center, a visible alignment laser, an IR back-reflection meter, and a microscope to view the bridge surface. The work was performed on KCP flyers that were 6 and 8 mils wide. The probes used were manufactured by Oz Optics, with focal distances of 42 mm and 26 mm. Both probes provide a spot size of approximately 80 μm at 1550 nm. The 42 mm probes were specified to provide an internal back reflection of -35 to -40 dB, and the probe back reflections were measured to be -37 dB and -33 dB. The 26 mm probes were specified as -30 dB and both measured -30.5 dB. The probe is initially aligned normal to the flyer/bridge surface. This provides a very high return signal, up to -2 dB, due to the bridge reflectivity. A white card with a hole in the center can be used as an aperture to check the position of the reflected beam relative to the probe and launch beam, and to check that the alignment laser spot is centered on the bridge (see Figures 1 and 2). The IR back-reflection meter is used to measure the dB return from the probe and surface; a white card or similar object is inserted between the probe and surface to block the surface reflection. It may take several iterations between the visible alignment laser and the IR back-reflection meter to complete this alignment procedure. Once aligned normal to the surface, the probe should be tilted to position the visible alignment beam as shown in Figure 3, and the flyer should be translated in the X and Y axes to reposition the alignment beam onto the flyer as shown in Figure 4. This tilting of the probe minimizes the amount of light from the bridge reflection coupled into the fiber within the probe, while keeping the alignment as near normal to the flyer surface as possible. When the back reflection is measured after the tilt adjustment, the level should be about -3 dB to -6 dB higher than the probes

  1. Curriculum Alignment Research Suggests that Alignment Can Improve Student Achievement

    ERIC Educational Resources Information Center

    Squires, David

    2012-01-01

    Curriculum alignment research has developed showing the relationship among three alignment categories: the taught curriculum, the tested curriculum and the written curriculum. Each pair (for example, the taught and the written curriculum) shows a positive impact for aligning those results. Following this, alignment results from the Third…

  3. Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.

    PubMed

    Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo

    2016-07-19

    Computing alignments between two or more sequences is an operation frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for efficient parallel implementations on modern accelerators. This paper presents new approaches to high-performance biological sequence database scanning with the Smith-Waterman algorithm and to the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture, i.e., cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we reorganize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance of up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes, for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi .
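
    GCUPS (giga cell updates per second) counts dynamic-programming matrix cells computed per second, in units of 10^9; for database scanning, the cell count is the query length times the total number of residues in the database. A small sketch of the arithmetic follows; the function names and all inputs are hypothetical.

```python
def gcups(query_len: int, db_residues: int, seconds: float) -> float:
    """Giga cell updates per second for scanning one query against a database."""
    cells = query_len * db_residues  # one DP cell per (query residue, database residue) pair
    return cells / (seconds * 1e9)

def seconds_at(query_len: int, db_residues: int, rate_gcups: float) -> float:
    """Time needed to scan the database at a given sustained GCUPS rate."""
    return query_len * db_residues / (rate_gcups * 1e9)
```

For example, a hypothetical 500-residue query scanned against 200 million database residues in one second corresponds to 100 GCUPS, and at the reported peak of 220 GCUPS a 500-residue query against 220 million residues would take about half a second.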

  4. Skip the Alignment: Degenerate, Multiplex Primer and Probe Design Using K-mer Matching Instead of Alignments

    PubMed Central

    Hysom, David A.; Naraghi-Arani, Pejman; Elsheikh, Maher; Carrillo, A. Celena; Williams, Peter L.; Gardner, Shea N.

    2012-01-01

    PriMux is a new software package for selecting multiplex compatible, degenerate primers and probes to detect diverse targets such as viruses. It requires no multiple sequence alignment, instead applying k-mer algorithms; hence it scales well for large target sets and spares users the effort of curating sequences into alignable groups. PriMux has the capability to predict degenerate primers as well as probes suitable for TaqMan or other primer/probe triplet assay formats, or simply probes for microarray or other single-oligo assay formats. PriMux employs suffix array methods for efficient calculations on oligos 10-∼100 nt in length. TaqMan® primers and probes for each segment of Rift Valley fever virus were designed using PriMux, and lab testing comparing signatures designed using PriMux versus those designed using traditional methods demonstrated equivalent or better sensitivity for the PriMux-designed signatures compared to traditional signatures. In addition, we used PriMux to design TaqMan® primers and probes for unalignable or poorly alignable groups of targets: that is, all segments of Rift Valley fever virus analyzed as a single target set of 198 sequences, or all 2863 Dengue virus genomes for all four serotypes available at the time of our analysis. The PriMux software is available as open source from http://sourceforge.net/projects/PriMux. PMID:22485178
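
    The core alignment-free idea can be illustrated with Python sets: collect every k-mer of each target and keep those present in all of them, which gives candidate conserved sites without building an alignment. This is only a toy sketch under that assumption; PriMux itself uses suffix arrays and also handles degenerate bases and multiplex selection, none of which is reproduced here, and the function names are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_kmers(targets: list[str], k: int) -> set[str]:
    """k-mers present in every target (targets must be non-empty); no alignment needed."""
    common = kmers(targets[0], k)
    for seq in targets[1:]:
        common &= kmers(seq, k)  # set intersection keeps only k-mers seen so far in all targets
    return common
```

For the three toy targets "ACGTAC", "TTACGT", and "ACGTTT" with k = 4, the only 4-mer present in all of them is "ACGT", even though the targets differ in how that site is positioned.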

  5. Skip the alignment: degenerate, multiplex primer and probe design using K-mer matching instead of alignments.

    PubMed

    Hysom, David A; Naraghi-Arani, Pejman; Elsheikh, Maher; Carrillo, A Celena; Williams, Peter L; Gardner, Shea N

    2012-01-01

    PriMux is a new software package for selecting multiplex compatible, degenerate primers and probes to detect diverse targets such as viruses. It requires no multiple sequence alignment, instead applying k-mer algorithms; hence it scales well for large target sets and spares users the effort of curating sequences into alignable groups. PriMux has the capability to predict degenerate primers as well as probes suitable for TaqMan or other primer/probe triplet assay formats, or simply probes for microarray or other single-oligo assay formats. PriMux employs suffix array methods for efficient calculations on oligos 10-~100 nt in length. TaqMan® primers and probes for each segment of Rift Valley fever virus were designed using PriMux, and lab testing comparing signatures designed using PriMux versus those designed using traditional methods demonstrated equivalent or better sensitivity for the PriMux-designed signatures compared to traditional signatures. In addition, we used PriMux to design TaqMan® primers and probes for unalignable or poorly alignable groups of targets: that is, all segments of Rift Valley fever virus analyzed as a single target set of 198 sequences, or all 2863 Dengue virus genomes for all four serotypes available at the time of our analysis. The PriMux software is available as open source from http://sourceforge.net/projects/PriMux.

  6. Optics Alignment Panel

    NASA Technical Reports Server (NTRS)

    Schroeder, Daniel J.

    1992-01-01

    The Optics Alignment Panel (OAP) was commissioned by the HST Science Working Group to determine the optimum alignment of the OTA optics. The goal was to find the position of the secondary mirror (SM) at which there is no coma or astigmatism in the camera images due to misaligned optics, either tilt or decenter. The despace position of the SM was also reviewed, and the optimum focus was sought. The results of these efforts are as follows: (1) the best estimate of the aligned position of the SM in the notation of HDOS is (DZ,DY,TZ,TY) = (+248 microns, +8 microns, +53 arcsec, -79 arcsec), and (2) the best focus, defined to be that despace which maximizes the fractional energy at 486 nm in a 0.1 arcsec radius of a stellar image, is 12.2 mm beyond paraxial focus. The data leading to these conclusions, and the estimated uncertainties in the final results, are presented.

  7. Improved docking alignment system

    NASA Technical Reports Server (NTRS)

    Monford, Leo G. (Inventor)

    1988-01-01

    Improved techniques are provided for the alignment of two objects. The present invention is particularly suited for 3-D translational and 3-D rotational alignment of objects in outer space. A camera is affixed to one object, such as a remote manipulator arm of the spacecraft, while a planar reflective surface is affixed to the other object, such as a grapple fixture. A monitor displays real-time images from the camera, showing both the reflected image of the camera and visible markings on the planar reflective surface when the objects are in proper alignment. The operator can thus view the monitor and manipulate the arm so that the reflective surface is perpendicular to the optical axis of the camera, the roll of the reflective surface is at a selected angle with respect to the camera, and the camera is spaced a preselected distance from the reflective surface.

  8. Whole-genome alignment.

    PubMed

    Dewey, Colin N

    2012-01-01

    Whole-genome alignment (WGA) is the prediction of evolutionary relationships at the nucleotide level between two or more genomes. It combines aspects of both colinear sequence alignment and gene orthology prediction, and is typically more challenging to address than either of these tasks due to the size and complexity of whole genomes. Despite the difficulty of this problem, numerous methods have been developed for its solution because WGAs are valuable for genome-wide analyses, such as phylogenetic inference, genome annotation, and function prediction. In this chapter, we discuss the meaning and significance of WGA and present an overview of the methods that address it. We also examine the problem of evaluating whole-genome aligners and offer a set of methodological challenges that need to be tackled in order to make the most effective use of our rapidly growing databases of whole genomes.

  9. Magnetically Aligned Supramolecular Hydrogels

    PubMed Central

    Wallace, Matthew; Cardoso, Andre Zamith; Frith, William J; Iggo, Jonathan A; Adams, Dave J

    2014-01-01

    The magnetic-field-induced alignment of the fibrillar structures present in an aqueous solution of a dipeptide gelator, and the subsequent retention of this alignment upon transformation to a hydrogel by the addition of CaCl₂ or by a reduction in solution pH, are reported. Utilising the switchable nature of the magnetic field coupled with the slow diffusion of CaCl₂, it is possible to precisely control the extent of anisotropy across a hydrogel, something that is generally very difficult to do using alternative methods. The approach is readily extended to other compounds that form viscous solutions at high pH. It is expected that this work will greatly expand the utility of such low-molecular-weight gelators (LMWG) in areas where alignment is key. PMID:25345918

  10. MUSE optical alignment procedure

    NASA Astrophysics Data System (ADS)

    Laurent, Florence; Renault, Edgard; Loupias, Magali; Kosmalski, Johan; Anwand, Heiko; Bacon, Roland; Boudon, Didier; Caillier, Patrick; Daguisé, Eric; Dubois, Jean-Pierre; Dupuy, Christophe; Kelz, Andreas; Lizon, Jean-Louis; Nicklas, Harald; Parès, Laurent; Remillieux, Alban; Seifert, Walter; Valentin, Hervé; Xu, Wenli

    2012-09-01

    MUSE (Multi Unit Spectroscopic Explorer) is a second-generation VLT integral field spectrograph (1x1 arcmin² field of view) developed for the European Southern Observatory (ESO), operating in the visible wavelength range (0.465-0.93 μm). A consortium of seven institutes is currently assembling and testing MUSE in the Integration Hall of the Observatoire de Lyon for the Preliminary Acceptance in Europe, scheduled for 2013. MUSE is composed of several subsystems, each under the responsibility of one institute. The Fore Optics derotates and anamorphoses the image at the focal plane. The Splitting and Relay Optics feed the 24 identical Integral Field Units (IFU), which are mounted within a large monolithic instrument mechanical structure. Each IFU incorporates an image slicer, a fully refractive spectrograph with a VPH grating, and a detector system connected to a global vacuum and cryogenic system. During 2011, all MUSE subsystems were integrated, aligned and tested independently in each institute. After validation, the systems were shipped to the P.I. institute in Lyon and assembled in the Integration Hall. This paper describes the end-to-end optical alignment procedure of the MUSE instrument. The design strategy is presented: it mixes optical alignment by manufacturing (a plug-and-play approach) with a few adjustments on key components. We describe the alignment method for identifying the optical axis using several references located in the pupil and image planes. All tools required to perform the global alignment between the subsystems are described. The good image quality obtained with MUSE demonstrates the success of this alignment approach. MUSE commissioning at the VLT (Very Large Telescope) is planned for 2013.

  11. PILOT optical alignment

    NASA Astrophysics Data System (ADS)

    Longval, Y.; Mot, B.; Ade, P.; André, Y.; Aumont, J.; Baustista, L.; Bernard, J.-Ph.; Bray, N.; de Bernardis, P.; Boulade, O.; Bousquet, F.; Bouzit, M.; Buttice, V.; Caillat, A.; Charra, M.; Chaigneau, M.; Crane, B.; Crussaire, J.-P.; Douchin, F.; Doumayrou, E.; Dubois, J.-P.; Engel, C.; Etcheto, P.; Gélot, P.; Griffin, M.; Foenard, G.; Grabarnik, S.; Hargrave, P.; Hughes, A.; Laureijs, R.; Lepennec, Y.; Leriche, B.; Maestre, S.; Maffei, B.; Martignac, J.; Marty, C.; Marty, W.; Masi, S.; Mirc, F.; Misawa, R.; Montel, J.; Montier, L.; Narbonne, J.; Nicot, J.-M.; Pajot, F.; Parot, G.; Pérot, E.; Pimentao, J.; Pisano, G.; Ponthieu, N.; Ristorcelli, I.; Rodriguez, L.; Roudil, G.; Salatino, M.; Savini, G.; Simonella, O.; Saccoccio, M.; Tapie, P.; Tauber, J.; Torre, J.-P.; Tucker, C.

    2016-07-01

    PILOT is a balloon-borne astronomy experiment designed to study the polarization of dust emission in the diffuse interstellar medium of our Galaxy at a wavelength of 240 μm, with an angular resolution of about two arcminutes. The PILOT optics consist of an off-axis Gregorian-type telescope and a refractive re-imager system. All optical elements except the primary mirror are in a cryostat cooled to 3 K. We combined optical and 3D dimensional measurement methods with thermo-elastic modeling to perform the optical alignment. The talk describes the system analysis, the alignment procedure, and finally the performance obtained during the first flight in September 2015.

  12. RHIC survey and alignment

    SciTech Connect

    Karl, F.X.; Anderson, R.R.; Goldman, M.A.; Hemmer, F.M.; Kazmark, D. Jr.; Mroczkowski, T.T.; Roecklien, J.C.

    1993-07-01

    The Relativistic Heavy Ion Collider consists of two interlaced planar rings, a pair of mirror-symmetric beam injection arcs, a spatially curved beam transfer line from the Alternating Gradient Synchrotron, and a collection of precisely positioned and aligned magnets, on appropriately positioned support stands, threaded on those arcs. RHIC geometry is defined by six beam crossing points lying exactly in a plane, precisely at the vertices of a regular hexagon of specified size; the position and orientation of this hexagon are defined geodetically. The survey control and alignment procedures currently used to construct RHIC are described.

  13. Laboratory simulation of field-aligned currents

    NASA Technical Reports Server (NTRS)

    Wessel, Frank J.; Rostoker, Norman

    1993-01-01

    A summary of progress during the period Apr. 1992 to Mar. 1993 is provided. Objectives of the research are (1) to simulate, via laboratory experiments, the three terms of the field-aligned current equation; (2) to simulate auroral-arc formation processes by configuring the boundary conditions of the experimental chamber and plasma parameters to produce highly localized return currents at the end of a field-aligned current system; and (3) to extrapolate these results, using theoretical and computational techniques, to the problem of magnetospheric-ionospheric coupling and to compare them with published literature signatures of auroral-arc phenomena.

  14. Impact decimation using alignment of granular spheres

    NASA Astrophysics Data System (ADS)

    Tiwari, Mukesh; Krishna Mohan, T. R.; Sen, Surajit

    2017-04-01

    Solitary waves in alignments of elastic beads have been an important area of study. A particularly rich area has been the behavior of solitary waves at a boundary, where features such as localization, anomalous behavior in scattering and transmission, and quasiequilibrium phases are being studied. A significant application area is the design of artificial granular alignments for shock decimation and dispersion. In this review article, we first present a summary and background of these unique features, and some designs in 1D that exploit them. We further discuss some extensions to higher-dimensional systems and their impact decimation ability.

  15. Curriculum Alignment: Establishing Coherence

    ERIC Educational Resources Information Center

    Gagné, Philippe; Dumont, Laurence; Brunet, Sabine; Boucher, Geneviève

    2013-01-01

    In this paper, we present a step-by-step guide to implement a curricular alignment project, directed at professional development and student support, and developed in a higher education French as a second language department. We outline best practices and preliminary results from our experience and provide ways to adapt our experience to other…

  16. Aligned-or Not?

    ERIC Educational Resources Information Center

    Roseman, Jo Ellen; Koppal, Mary

    2015-01-01

    When state leaders and national partners in the development of the Next Generation Science Standards met to consider implementation strategies, states and school districts wanted to know which materials were aligned to the new standards. The answer from the developers was short but not sweet: You won't find much now, and it's going to…

  17. Aligning brains and minds

    PubMed Central

    Tong, Frank

    2012-01-01

    In this issue of Neuron, Haxby and colleagues describe a new method for aligning functional brain activity patterns across participants. Their study demonstrates that objects are similarly represented across different brains, allowing for reliable classification of one person’s brain activity based on another’s. PMID:22017984

  18. Optically Aligned Drill Press

    NASA Technical Reports Server (NTRS)

    Adderholdt, Bruce M.

    1994-01-01

    Precise drill press equipped with rotary-indexing microscope. Microscope and drill exchange places when turret rotated. Microscope axis first aligned over future hole, then rotated out of way so drill axis assumes its precise position. New procedure takes less time to locate drilling positions and produces more accurate results. Apparatus adapted to such other machine tools as milling and measuring machines.

  19. Precision Antenna Alignment Procedure.

    DTIC Science & Technology

    Precise azimuthal alignment of troposcatter system antennas is achieved by centering on the great circle the combined pattern of intercepting beams...from two troposcatter antennas. The combined antenna pattern is determined to be centered on and symmetric about the great circle when the Doppler

  2. Variability in static alignment and kinematics for kinematically aligned TKA.

    PubMed

    Theodore, Willy; Twiggs, Joshua; Kolos, Elizabeth; Roe, Justin; Fritsch, Brett; Dickison, David; Liu, David; Salmon, Lucy; Miles, Brad; Howell, Stephen

    2017-08-01

    Total knee arthroplasty (TKA) significantly improves pain and restores a considerable degree of function. However, improvements are needed to increase patient satisfaction and to restore kinematics that allow the more physically demanding activities that active patients consider important. The aim of our study was to compare the alignment and motion of kinematically and mechanically aligned TKAs. A patient-specific musculoskeletal computer simulation was used to compare the tibio-femoral and patello-femoral kinematics of mechanically aligned and kinematically aligned TKA in 20 patients. When kinematically aligned, femoral components were on average in more valgus alignment relative to the mechanical axis and internally rotated relative to the surgical transepicondylar axis. Tibial components, by contrast, were on average in more varus alignment relative to the mechanical axis and internally rotated relative to the tibial AP rotational axis. With kinematic alignment, tibio-femoral motion displayed greater tibial external rotation and greater lateral femoral flexion facet centre (FFC) translation with knee flexion than mechanically aligned TKA. At the patellofemoral joint, patellar lateral shift of kinematically aligned TKA plateaued after 20 to 30° of flexion, whereas in mechanically aligned TKA it decreased continuously through the whole range of motion. Kinematic alignment resulted in greater variation than mechanical alignment for all tibio-femoral and patello-femoral motion. Kinematic alignment places TKA components in a patient-specific alignment that depends on the preoperative state of the knee, resulting in greater variation in kinematics. Computational models have the potential to predict which alignment, kinematic or mechanical, could improve knee function for patients undergoing TKA based on their native alignment. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. CAB-Align: A Flexible Protein Structure Alignment Method Based on the Residue-Residue Contact Area

    PubMed Central

    Terashi, Genki; Takeda-Shitaka, Mayuko

    2015-01-01

    Proteins are flexible, and this flexibility has an essential functional role. Flexibility can be observed in loop regions, rearrangements between secondary structure elements, and conformational changes between entire domains. However, most protein structure alignment methods treat protein structures as rigid bodies. Thus, these methods fail to identify the equivalences of residue pairs in regions with flexibility. In this study, we considered that the evolutionary relationship between proteins corresponds directly to the residue–residue physical contacts rather than the three-dimensional (3D) coordinates of proteins. Thus, we developed a new protein structure alignment method, contact area-based alignment (CAB-align), which uses the residue–residue contact area to identify regions of similarity. The main purpose of CAB-align is to identify homologous relationships at the residue level between related protein structures. The CAB-align procedure comprises two main steps: First, a rigid-body alignment method based on local and global 3D structure superposition is employed to generate a sufficient number of initial alignments. Then, iterative dynamic programming is executed to find the optimal alignment. We evaluated the performance and advantages of CAB-align based on four main points: (1) agreement with the gold standard alignment, (2) alignment quality based on an evolutionary relationship without 3D coordinate superposition, (3) consistency of the multiple alignments, and (4) classification agreement with the gold standard classification. Comparisons of CAB-align with other state-of-the-art protein structure alignment methods (TM-align, FATCAT, and DaliLite) using our benchmark dataset showed that CAB-align performed robustly in obtaining high-quality alignments and generating consistent multiple alignments with high coverage and accuracy rates, and it performed extremely well when discriminating between homologous and nonhomologous pairs of proteins in both

  4. CAB-Align: A Flexible Protein Structure Alignment Method Based on the Residue-Residue Contact Area.

    PubMed

    Terashi, Genki; Takeda-Shitaka, Mayuko

    2015-01-01

    Proteins are flexible, and this flexibility has an essential functional role. Flexibility can be observed in loop regions, rearrangements between secondary structure elements, and conformational changes between entire domains. However, most protein structure alignment methods treat protein structures as rigid bodies. Thus, these methods fail to identify the equivalences of residue pairs in regions with flexibility. In this study, we considered that the evolutionary relationship between proteins corresponds directly to the residue-residue physical contacts rather than the three-dimensional (3D) coordinates of proteins. Thus, we developed a new protein structure alignment method, contact area-based alignment (CAB-align), which uses the residue-residue contact area to identify regions of similarity. The main purpose of CAB-align is to identify homologous relationships at the residue level between related protein structures. The CAB-align procedure comprises two main steps: First, a rigid-body alignment method based on local and global 3D structure superposition is employed to generate a sufficient number of initial alignments. Then, iterative dynamic programming is executed to find the optimal alignment. We evaluated the performance and advantages of CAB-align based on four main points: (1) agreement with the gold standard alignment, (2) alignment quality based on an evolutionary relationship without 3D coordinate superposition, (3) consistency of the multiple alignments, and (4) classification agreement with the gold standard classification. Comparisons of CAB-align with other state-of-the-art protein structure alignment methods (TM-align, FATCAT, and DaliLite) using our benchmark dataset showed that CAB-align performed robustly in obtaining high-quality alignments and generating consistent multiple alignments with high coverage and accuracy rates, and it performed extremely well when discriminating between homologous and nonhomologous pairs of proteins in both

  5. DNA recognition for virus assembly through multiple sequence-independent interactions with a helix-turn-helix motif

    PubMed Central

    Greive, Sandra J.; Fung, Herman K.H.; Chechik, Maria; Jenkins, Huw T.; Weitzel, Stephen E.; Aguiar, Pedro M.; Brentnall, Andrew S.; Glousieau, Matthieu; Gladyshev, Grigory V.; Potts, Jennifer R.; Antson, Alfred A.

    2016-01-01

    The helix-turn-helix (HTH) motif features frequently in protein DNA-binding assemblies. Viral pac site-targeting small terminase proteins possess an unusual architecture in which the HTH motifs are displayed in a ring, distinct from the classical HTH dimer. Here we investigate how such a circular array of HTH motifs enables specific recognition of the viral genome for initiation of DNA packaging during virus assembly. Using surface plasmon resonance and analytical ultracentrifugation, we found that individual HTH motifs of the Bacillus phage SF6 small terminase bind the packaging regions of the SF6 and related SPP1 genomes weakly, with little local sequence specificity. Nuclear magnetic resonance chemical shift perturbation studies with an arbitrary single-site substrate suggest that the HTH motif contacts DNA similarly to how certain HTH proteins contact DNA non-specifically. Our observations support a model in which specificity is generated through conformational selection of an intrinsically bent DNA segment by a ring of HTHs that bind weakly but cooperatively. Such a system would enable viral gene regulation and control of the viral life cycle with a minimal genome, conferring a major evolutionary advantage on SPP1-like viruses. PMID:26673721

  6. MUSE alignment onto VLT

    NASA Astrophysics Data System (ADS)

    Laurent, Florence; Renault, Edgard; Boudon, Didier; Caillier, Patrick; Daguisé, Eric; Dupuy, Christophe; Jarno, Aurélien; Lizon, Jean-Louis; Migniau, Jean-Emmanuel; Nicklas, Harald; Piqueras, Laure

    2014-07-01

    MUSE (Multi Unit Spectroscopic Explorer) is a second-generation Very Large Telescope (VLT) integral field spectrograph developed for the European Southern Observatory (ESO). It combines a 1' x 1' field of view sampled at 0.2 arcsec for its Wide Field Mode (WFM) and a 7.5" x 7.5" field of view for its Narrow Field Mode (NFM). Both modes will operate with the improved spatial resolution provided by GALACSI (Ground Atmospheric Layer Adaptive Optics for Spectroscopic Imaging), which will use the VLT deformable secondary mirror and 4 Laser Guide Stars (LGS) foreseen in 2015. MUSE operates in the visible wavelength range (0.465-0.93 μm). A consortium of seven institutes is currently commissioning MUSE at the Very Large Telescope for the Preliminary Acceptance in Chile, scheduled for September 2014. MUSE is composed of several subsystems, each under the responsibility of one institute. The Fore Optics derotates and anamorphoses the image at the focal plane. The Splitting and Relay Optics feed the 24 identical Integral Field Units (IFU), which are mounted within a large monolithic structure. Each IFU incorporates an image slicer, a fully refractive spectrograph with a VPH grating, and a detector system connected to a global vacuum and cryogenic system. During 2012 and 2013, all MUSE subsystems were integrated, aligned and tested at the P.I. institute in Lyon. After a successful PAE in September 2013, the MUSE instrument was shipped to Chile, where it was aligned and tested in the ESO integration hall at Paranal. MUSE was then transported directly onto the VLT, fully aligned and without any optomechanical dismounting, and first light was achieved on 7 February 2014. This paper describes the alignment procedure of the whole MUSE instrument with respect to the Very Large Telescope (VLT). It describes how 6 tons could be moved with an accuracy better than 0.025 mm and less than 0.25 arcmin in order to reach the alignment requirements. The success

  7. Protein alignment algorithms with an efficient backtracking routine on multiple GPUs

    PubMed Central

    2011-01-01

    Background Pairwise sequence alignment methods are widely used in biological research. The increasing number of sequences is perceived as one of the upcoming challenges for sequence alignment methods in the near future. To overcome this challenge, several GPU (Graphics Processing Unit) computing approaches have been proposed recently. These solutions show the great potential of the GPU platform. In most cases, however, they address only sequence database scanning and compute only the alignment score, omitting the alignment itself. Thus, the need arose to implement the global and semiglobal Needleman-Wunsch and Smith-Waterman algorithms with the backtracking procedure needed to construct the alignment. Results In this paper we present a solution that performs the alignment of every given sequence pair, which is a required step for progressive multiple sequence alignment methods, as well as for DNA recognition at the DNA assembly stage. Tests show that the implementation, with performance of up to 6.3 GCUPS on a single GPU for affine gap penalties, is very efficient in comparison to other CPU- and GPU-based solutions. Moreover, support for multiple GPUs with load balancing makes the application very scalable. Conclusions The article shows that the backtracking procedure of sequence alignment algorithms can be designed to fit the GPU architecture. Therefore, our algorithm is able to compute pairwise alignments as well as scores. This opens a wide range of new possibilities, allowing other methods from the area of molecular biology to take advantage of the new computational architecture. Tests show that the efficiency of the implementation is excellent. Moreover, the speed of our GPU-based algorithms increases almost linearly when more than one graphics card is used. PMID:21599912
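
    The abstract emphasises the backtracking step. The following is a minimal CPU sketch of Smith-Waterman local alignment with traceback in plain Python. It uses a linear gap penalty rather than the affine penalties benchmarked in the paper, and the scoring values are illustrative assumptions. It says nothing about how the authors' GPU implementation is organised. The helper `gcups` shows how a GCUPS figure such as the 6.3 quoted above is defined: billions of dynamic-programming cells updated per second.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Local alignment score and one optimal aligned pair (linear gap penalty)."""

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # DP matrix; row 0 and column 0 stay zero
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(0,
                          h[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Backtracking: walk back from the best cell until a zero cell ends the local alignment.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def gcups(len_a: int, len_b: int, seconds: float) -> float:
    """Giga cell updates per second for one len_a x len_b DP matrix."""
    return len_a * len_b / (seconds * 1e9)


if __name__ == "__main__":
    print(smith_waterman("TTACGTTT", "GGACGTGG"))  # (8, 'ACGT', 'ACGT')
```

    Filling one 1,000 x 1,000 matrix in one millisecond corresponds to 1 GCUPS.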

  8. Inflation by alignment

    SciTech Connect

    Burgess, C.P.; Roest, Diederik

    2015-06-08

    Pseudo-Goldstone bosons (pGBs) can provide technically natural inflatons, as has been comparatively well-explored in the simplest axion examples. Although inflationary success requires trans-Planckian decay constants, f ≳ M_p, several mechanisms have been proposed to obtain this, relying on (mis-)alignments between potential and kinetic energies in multiple-field models. We extend these mechanisms to a broader class of inflationary models, including in particular the exponential potentials that arise for pGB potentials based on noncompact groups (and which might therefore apply to moduli in an extra-dimensional setting). The resulting potentials provide natural large-field inflationary models and can predict a larger primordial tensor signal than is true for simpler single-field versions of these models. In so doing we provide a unified treatment of several alignment mechanisms, showing how each emerges as a limit of the more general setup.

  9. Alignment reference device

    DOEpatents

    Patton, Gail Y.; Torgerson, Darrel D.

    1987-01-01

    An alignment reference device provides a collimated laser beam that minimizes angular deviations therein. A laser beam source outputs the beam into a single mode optical fiber. The output end of the optical fiber acts as a source of radiant energy and is positioned at the focal point of a lens system where the focal point is positioned within the lens. The output beam reflects off a mirror back to the lens that produces a collimated beam.

  10. Nuclear reactor alignment plate configuration

    SciTech Connect

    Altman, David A; Forsyth, David R; Smith, Richard E; Singleton, Norman R

    2014-01-28

    An alignment plate that is attached to a core barrel of a pressurized water reactor and fits within slots within a top plate of a lower core shroud and upper core plate to maintain lateral alignment of the reactor internals. The alignment plate is connected to the core barrel through two vertically spaced dowel pins that extend from the outside surface of the core barrel through a reinforcement pad and into corresponding holes in the alignment plate. Additionally, threaded fasteners are inserted around the perimeter of the reinforcement pad and into the alignment plate to further secure the alignment plate to the core barrel. A fillet weld is also deposited around the perimeter of the reinforcement pad. To accommodate thermal growth between the alignment plate and the core barrel, a gap is left above, below and at both sides of one of the dowel pins in the alignment plate holes through which the dowel pins pass.

  11. Orbit IMU alignment: Error analysis

    NASA Technical Reports Server (NTRS)

    Corson, R. W.

    1980-01-01

    A comprehensive accuracy analysis of orbit inertial measurement unit (IMU) alignments using the shuttle star trackers was completed, and the results are presented. Monte Carlo techniques were used in a computer simulation of the IMU alignment hardware and software systems for three purposes: (1) to determine the expected Space Transportation System 1 Flight (STS-1) manual mode IMU alignment accuracy; (2) to investigate the accuracy of alignments in later shuttle flights when the automatic mode of star acquisition may be used; and (3) to verify that an analytical model previously used for estimating the alignment error is valid. The analysis results do not differ significantly from expectations. The standard deviation in the IMU alignment error for STS-1 alignments was determined to be 68 arc seconds per axis. This corresponds to a 99.7% probability that the magnitude of the total alignment error is less than 258 arc seconds.
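
    The step from a per-axis standard deviation to a bound on the total error can be sketched under a simplifying assumption that the abstract does not state. The assumption is that the three axis errors are independent, zero-mean Gaussians with equal σ. The magnitude of the total error then follows a Maxwell distribution, whose quantile can be found by bisection. The helper names are hypothetical.

```python
import math


def total_error_cdf(r: float, sigma: float) -> float:
    """P(|e| <= r) for e with three independent N(0, sigma^2) axis components (Maxwell CDF)."""
    x = r / sigma
    return math.erf(x / math.sqrt(2.0)) - math.sqrt(2.0 / math.pi) * x * math.exp(-0.5 * x * x)


def total_error_bound(prob: float, sigma: float) -> float:
    """Radius r with P(|e| <= r) = prob, found by bisection (the CDF is increasing in r)."""
    lo, hi = 0.0, 10.0 * sigma
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if total_error_cdf(mid, sigma) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    sigma = 68.0  # per-axis standard deviation in arc seconds, taken from the abstract
    print(f"99.7% bound: {total_error_bound(0.997, sigma):.0f} arcsec")
```

    With σ = 68 arc seconds, this simple model gives a 99.7% bound of roughly 3.7σ, about 254 arc seconds. That is a little below the 258 arc seconds reported; the analytical model used in the report may include effects this sketch ignores.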

  12. Dynamic Alignment at SLS

    SciTech Connect

    Ruland, Robert E.

    2003-04-23

    The relative alignment of components in the storage ring of the Swiss Light Source (SLS) is guaranteed by mechanical means. The magnets are rigidly fixed to 48 girders by means of alignment rails with tolerances of less than ±15 μm. The bending magnets, supported by three-point ball bearings, overlap adjacent girders and thus establish virtual train links between the girders, located near the bending magnet centres. Keeping the distortion of the storage ring geometry within a tolerance of ±100 μm, in order to guarantee sufficient dynamic apertures, requires continuous monitoring and correction of the girder locations. Two monitoring systems, one for the horizontal and one for the vertical direction, will be installed. They will measure displacements of the train links between girders caused by ground settling and temperature effects. The hydrostatic levelling system (HLS) gives an absolute vertical reference. The horizontal positioning system (HPS), which employs low-cost linear encoders with sub-micron resolution, measures relative horizontal movements. The girder mover system, based on five DC motors per girder, allows a dynamic realignment of the storage ring within a working window of more than ±1 mm for girder translations and ±1 mrad for rotations. We will describe both monitoring systems (HLS and HPS) as well as the applied correction scheme based on the girder movers. We also show simulations indicating that beam-based girder alignment takes care of most of the static closed orbit correction.

  13. Docking alignment system

    NASA Technical Reports Server (NTRS)

    Monford, Leo G. (Inventor)

    1990-01-01

    Improved techniques are provided for alignment of two objects. The present invention is particularly suited for three-dimensional translation and three-dimensional rotational alignment of objects in outer space. A camera 18 is fixedly mounted to one object, such as a remote manipulator arm 10 of the spacecraft, while the planar reflective surface 30 is fixed to the other object, such as a grapple fixture 20. A monitor 50 displays in real-time images from the camera, such that the monitor displays both the reflected image of the camera and visible markings on the planar reflective surface when the objects are in proper alignment. The monitor may thus be viewed by the operator and the arm 10 manipulated so that the reflective surface is perpendicular to the optical axis of the camera, the roll of the reflective surface is at a selected angle with respect to the camera, and the camera is spaced a pre-selected distance from the reflective surface.

  14. Method for alignment of microwires

    DOEpatents

    Beardslee, Joseph A.; Lewis, Nathan S.; Sadtler, Bryce

    2017-01-24

    A method of aligning microwires includes modifying the microwires so they are more responsive to a magnetic field. The method also includes using a magnetic field so as to magnetically align the microwires. The method can further include capturing the microwires in a solid support structure that retains the longitudinal alignment of the microwires when the magnetic field is not applied to the microwires.

  15. Alignment as a Teacher Variable

    ERIC Educational Resources Information Center

    Porter, Andrew C.; Smithson, John; Blank, Rolf; Zeidner, Timothy

    2007-01-01

    With the exception of the procedures developed by Porter and colleagues (Porter, 2002), other methods of defining and measuring alignment are essentially limited to alignment between tests and standards. Porter's procedures have been generalized to investigating the alignment between content standards, tests, textbooks, and even classroom…

  17. CMSA: a heterogeneous CPU/GPU computing system for multiple similar RNA/DNA sequence alignment.

    PubMed

    Chen, Xi; Wang, Chen; Tang, Shanjiang; Yu, Ce; Zou, Quan

    2017-06-24

    The multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, MSA parallelization becomes necessary to keep its running time in an acceptable level. Although there are a lot of work on MSA problems, their approaches are either insufficient or contain some implicit assumptions that limit the generality of usage. First, the information of users' sequences, including the sizes of datasets and the lengths of sequences, can be of arbitrary values and are generally unknown before submitted, which are unfortunately ignored by previous work. Second, the center star strategy is suited for aligning similar sequences. But its first stage, center sequence selection, is highly time-consuming and requires further optimization. Moreover, given the heterogeneous CPU/GPU platform, prior studies consider the MSA parallelization on GPU devices only, making the CPUs idle during the computation. Co-run computation, however, can maximize the utilization of the computing resources by enabling the workload computation on both CPU and GPU simultaneously. This paper presents CMSA, a robust and efficient MSA system for large-scale datasets on the heterogeneous CPU/GPU platform. It performs and optimizes multiple sequence alignment automatically for users' submitted sequences without any assumptions. CMSA adopts the co-run computation model so that both CPU and GPU devices are fully utilized. Moreover, CMSA proposes an improved center star strategy that reduces the time complexity of its center sequence selection process from O(mn (2)) to O(mn). The experimental results show that CMSA achieves an up to 11× speedup and outperforms the state-of-the-art software. CMSA focuses on the multiple similar RNA/DNA sequence alignment and proposes a novel bitmap based algorithm to improve the center star strategy. 
We can conclude that harvesting the high performance of modern GPU is a promising approach to
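The center star strategy picks one "center" sequence and aligns every other sequence to it, so the choice of center drives both cost and quality. The abstract does not describe CMSA's bitmap-based selection in reproducible detail, so the following is only a minimal sketch of a different, hypothetical proxy: each sequence is scored by how many other sequences share each of its distinct k-mers, using a single global k-mer count. The function names, the scoring rule and the default k = 3 are assumptions, not CMSA's method.

```python
from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    """Distinct k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def select_center(seqs: list[str], k: int = 3) -> int:
    """Return the index of a heuristic center sequence.

    Hypothetical proxy, NOT CMSA's bitmap algorithm: a sequence earns one
    point for every *other* sequence that contains each of its distinct
    k-mers. One pass builds the global k-mer document frequency, so for a
    fixed k the work is linear in the summed sequence length (assuming
    O(1) hashing), instead of aligning every pair of sequences.
    """
    kmer_sets = [kmers(s, k) for s in seqs]
    doc_freq = Counter()
    for ks in kmer_sets:
        doc_freq.update(ks)  # each sequence counts a k-mer at most once
    # Subtract 1 so a sequence gets no credit for matching itself.
    scores = [sum(doc_freq[km] - 1 for km in ks) for ks in kmer_sets]
    return max(range(len(seqs)), key=scores.__getitem__)
```

Because only shared k-mers are counted, an unrelated outlier sequence scores near zero and is never chosen as the center; the trade-off is that k-mer sharing only approximates true alignment distance.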

  18. Integrated Automatic Workflow for Phylogenetic Tree Analysis Using Public Access and Local Web Services.

    PubMed

    Damkliang, Kasikrit; Tandayya, Pichaya; Sangket, Unitsa; Pasomsub, Ekawat

    2016-11-28

    At present, coding sequences (CDS) continue to be discovered, and ever larger CDS collections are revealed frequently. Approaches and related tools have also been developed and upgraded concurrently, especially for phylogenetic tree analysis. This paper proposes an integrated automatic Taverna workflow for phylogenetic tree inference using public access web services at the European Bioinformatics Institute (EMBL-EBI) and the Swiss Institute of Bioinformatics (SIB), together with our own deployed local web services. The workflow input is a set of CDS in FASTA format. The workflow supports 1,000 to 20,000 bootstrap replicates. The workflow performs tree inference using the Parsimony (PARS), Distance Matrix - Neighbor Joining (DIST-NJ), and Maximum Likelihood (ML) algorithms of the EMBOSS PHYLIPNEW package, based on our proposed Multiple Sequence Alignment (MSA) similarity score. The local web services are implemented and deployed in two ways, using Soaplab2 and Apache Axis2. These SOAP and Java Web Service (JWS) deployments provide WSDL endpoints to Taverna Workbench, a workflow manager. The workflow has been validated, its performance has been measured, and its results have been verified. Our workflow's execution time is less than ten minutes for inferring a tree with 10,000 bootstrap replicates. This paper proposes a new integrated automatic workflow that will benefit bioinformaticians with an intermediate level of knowledge and experience. All local services have been deployed at our portal http://bioservices.sci.psu.ac.th.
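The DIST-NJ branch of the workflow relies on the neighbor-joining algorithm, which the workflow runs through EMBOSS PHYLIPNEW. As a rough illustration only, the following sketch (not the EMBOSS code) performs a single textbook join: it picks the pair minimizing the Q-criterion Q(i, j) = (n - 2) d(i, j) - Σ_k d(i, k) - Σ_k d(j, k) and computes the branch lengths from each member of the pair to their new parent node. The function name is hypothetical; a full implementation would repeat this step on a reduced matrix until three nodes remain.

```python
def nj_join_step(d: list[list[float]]) -> tuple[int, int, float, float]:
    """One neighbor-joining step on a symmetric distance matrix d.

    Returns (i, j, branch_i, branch_j): the pair minimizing the Q-criterion
    and the branch lengths from i and j to their new parent node.
    """
    n = len(d)
    if n < 3:
        raise ValueError("neighbor joining needs at least 3 taxa")
    # Row sums are reused by both the Q-criterion and the branch lengths.
    row_sums = [sum(row) for row in d]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    # Standard NJ branch-length formula for the joined pair.
    branch_i = 0.5 * d[i][j] + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    branch_j = d[i][j] - branch_i
    return i, j, branch_i, branch_j
```

The test below uses a common five-taxon textbook matrix, not data from the paper; the first join pairs taxa 0 and 1.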

  19. Phylo-VISTA: An Interactive Visualization Tool for Multiple DNA Sequence Alignments

    SciTech Connect

    Shah, Nameeta; Couronne, Olivier; Pennacchio, Len A.; Brudno, Michael; Batzoglou, Serafim; Bethel, E. Wes; Rubin, Edward M.; Hamann, Bernd; Dubchak, Inna

    2004-04-01

    We have developed Phylo-VISTA (Shah et al., 2003), an interactive software tool for analyzing multiple alignments by visualizing a similarity measure for DNA sequences of multiple species. The complexity of visual presentation is effectively organized using a framework based upon inter-species phylogenetic relationships. The phylogenetic organization supports rapid, user-guided inter-species comparison. To aid in navigation through large sequence datasets, Phylo-VISTA provides a user with the ability to select and view data at varying resolutions. Multi-resolution data visualization and analysis, combined with the phylogenetic framework for inter-species comparison, produce a highly flexible and powerful tool for visual data analysis of multiple sequence alignments.
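The abstract does not specify which similarity measure Phylo-VISTA plots. A common choice in VISTA-style conservation plots is percent identity over a sliding window of an alignment; the following minimal sketch computes that for one pair of aligned rows. The window and step sizes, and counting a gap opposite a base as a mismatch, are assumptions; this is not Phylo-VISTA's code.

```python
def window_identity(a: str, b: str, window: int = 100, step: int = 1) -> list[float]:
    """Percent identity of two aligned rows in sliding windows.

    Gap-gap columns are dropped; a gap opposite a base counts as a
    mismatch (an assumption, not Phylo-VISTA's documented rule).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where at least one row has a base.
    cols = [(x, y) for x, y in zip(a.upper(), b.upper())
            if not (x == "-" and y == "-")]
    if len(cols) < window:
        return []
    # Gap-gap columns are gone, so x == y implies a real base match.
    matches = [1 if x == y else 0 for x, y in cols]
    # Prefix sums make each window O(1) to evaluate.
    prefix = [0]
    for m in matches:
        prefix.append(prefix[-1] + m)
    return [100.0 * (prefix[s + window] - prefix[s]) / window
            for s in range(0, len(cols) - window + 1, step)]
```

A multi-resolution viewer could call this with several window sizes and plot each curve along the alignment.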

  20. A data parallel strategy for aligning multiple biological sequences on multi-core computers.

    PubMed

    Zhu, Xiangyuan; Li, Kenli; Salah, Ahmad

    2013-05-01

    In this paper, we address the large-scale biological sequence alignment problem, which is in increasing demand in computational biology. We employ the data parallelism paradigm, which is suitable for large-scale processing on multi-core computers, to achieve a high degree of parallelism. Using this paradigm, we propose a general strategy that can be used to speed up any multiple sequence alignment method. We applied five different clustering algorithms in our strategy and ran rigorous tests on an 8-core computer using four traditional benchmarks and artificially generated sequences. The results show that our multi-core implementations achieve up to 151-fold improvements in execution time while losing 2.19% accuracy on average. The source code of the proposed strategy, together with the test sets used in our analysis, is available on request. Copyright © 2013 Elsevier Ltd. All rights reserved.

  1. Dust Grain Alignment in the Interstellar Medium

    NASA Astrophysics Data System (ADS)

    Vaillancourt, J.; Andersson, B. G.; Lazarian, A.

    The first observations of interstellar polarization at visible wavelengths over 60 years ago were quickly attributed to the net alignment of irregular dust grains with local magnetic fields. This mechanism provides a method to measure the topology and strength of the magnetic field and to probe the physical characteristics of the dust (e.g., material, size, and shape). However, to do so with confidence, the physics and variability of the alignment mechanism(s) must be quantitatively understood. The description of the physical alignment mechanism has a long history with key contributions spanning decades; the last 15 years have seen major advances in both the theoretical and observational understanding of the problem. For example, it is now clear that the canonical process of paramagnetic relaxation, in which grain rotational components perpendicular to the magnetic field are damped out, is inadequate to align grains on the necessary timescales (compared to damping via collisions) for typical interstellar medium conditions. However, the modern theory of radiative alignment has been more successful; in this theory grains are aligned with respect to the magnetic field via photon-grain interactions that impart the necessary torques to the rotation axes of grains. Here we highlight key observational tests of these alignment mechanisms, especially those involving spectropolarimetry of both dust extinction at near-optical wavelengths and dust emission at far-infrared through millimeter wavelengths. Observations in both these regimes can place limits on such grain aspects as their size and temperature. To date, most observations of the polarized emission have been in the densest regions of the interstellar medium, where interpretation in terms of grain alignment models is complicated by embedded stars and a wide range of temperatures. Additionally, direct comparison of the optical extinction polarization (A_V ≲ 10 magnitudes) with dust emission

  2. Precision alignment and mounting apparatus

    NASA Technical Reports Server (NTRS)

    Preston, Dennis R. (Inventor)

    1993-01-01

    An alignment and mounting apparatus for mounting two modules (10,12) includes a first portion having a cylindrical alignment pin (16) projecting normal to a module surface, a second portion having a three-stage alignment guide (18) including a shoehorn flange (34), a Y-slot (42) and a V-block (22) which sequentially guide the alignment pin (16) with successively finer precision and a third portion in the form of a spring-loaded captive fastener (20) for connecting the two modules after alignment is achieved.

  3. CDD: a database of conserved domain alignments with links to domain three-dimensional structure

    PubMed Central

    Marchler-Bauer, Aron; Panchenko, Anna R.; Shoemaker, Benjamin A.; Thiessen, Paul A.; Geer, Lewis Y.; Bryant, Stephen H.

    2002-01-01

    The Conserved Domain Database (CDD) is a compilation of multiple sequence alignments representing protein domains conserved in molecular evolution. It has been populated with alignment data from the public collections Pfam and SMART, as well as with contributions from colleagues at NCBI. The current version of CDD (v.1.54) contains 3693 such models. CDD alignments are linked to protein sequence and structure data in Entrez. The molecular structure viewer Cn3D serves as a tool to interactively visualize alignments and three-dimensional structure, and to link three-dimensional residue coordinates to descriptions of evolutionary conservation. CDD can be accessed on the World Wide Web at http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml. Protein query sequences may be compared against databases of position-specific score matrices derived from alignments in CDD, using a service named CD-Search, which can be found at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. CD-Search runs reverse-position-specific BLAST (RPS-BLAST), a variant of the widely used PSI-BLAST algorithm. CD-Search is run by default for protein–protein queries submitted to NCBI’s BLAST service at http://www.ncbi.nlm.nih.gov/BLAST. PMID:11752315

  4. MutationAligner: a resource of recurrent mutation hotspots in protein domains in cancer.

    PubMed

    Gauthier, Nicholas Paul; Reznik, Ed; Gao, Jianjiong; Sumer, Selcuk Onur; Schultz, Nikolaus; Sander, Chris; Miller, Martin L

    2016-01-04

    The MutationAligner web resource, available at http://www.mutationaligner.org, enables discovery and exploration of somatic mutation hotspots identified in protein domains in more than 5000 cancer patient samples (as of mid-2015) across 22 different tumor types. Using multiple sequence alignments of protein domains in the human genome, we extend the principle of recurrence analysis by aggregating mutations in homologous positions across sets of paralogous genes. Protein domain analysis enhances the statistical power to detect cancer-relevant mutations and links mutations to the specific biological functions encoded in domains. We illustrate how the MutationAligner database and interactive web tool can be used to explore, visualize and analyze mutation hotspots in protein domains across genes and tumor types. We believe that MutationAligner will be an important resource for the cancer research community by providing detailed clues to the functional importance of particular mutations, as well as for the design of functional genomics experiments and for decision support in precision medicine. MutationAligner is slated to be periodically updated to incorporate additional analyses and new data from cancer genomics projects. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  5. MutationAligner: a resource of recurrent mutation hotspots in protein domains in cancer

    PubMed Central

    Gauthier, Nicholas Paul; Reznik, Ed; Gao, Jianjiong; Sumer, Selcuk Onur; Schultz, Nikolaus; Sander, Chris; Miller, Martin L.

    2016-01-01

    The MutationAligner web resource, available at http://www.mutationaligner.org, enables discovery and exploration of somatic mutation hotspots identified in protein domains in more than 5000 cancer patient samples (as of mid-2015) across 22 different tumor types. Using multiple sequence alignments of protein domains in the human genome, we extend the principle of recurrence analysis by aggregating mutations in homologous positions across sets of paralogous genes. Protein domain analysis enhances the statistical power to detect cancer-relevant mutations and links mutations to the specific biological functions encoded in domains. We illustrate how the MutationAligner database and interactive web tool can be used to explore, visualize and analyze mutation hotspots in protein domains across genes and tumor types. We believe that MutationAligner will be an important resource for the cancer research community by providing detailed clues to the functional importance of particular mutations, as well as for the design of functional genomics experiments and for decision support in precision medicine. MutationAligner is slated to be periodically updated to incorporate additional analyses and new data from cancer genomics projects. PMID:26590264
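The core bookkeeping behind "aggregating mutations in homologous positions" is a map from each gene's residue numbers to alignment columns. The following minimal sketch pools per-gene mutation positions into per-column counts. The input layout, the function names and the gene names in the test are hypothetical, and MutationAligner's actual statistics for calling hotspots are not reproduced.

```python
from collections import Counter


def residue_to_column(aligned_row: str, start: int = 1) -> dict[int, int]:
    """Map residue numbers in a gapped MSA row to 0-based column indices.

    start is the residue number of the row's first non-gap character
    (assumption: the row covers one contiguous stretch of the protein).
    """
    mapping = {}
    residue = start
    for col, ch in enumerate(aligned_row):
        if ch not in "-.":  # '-' and '.' are treated as gap characters
            mapping[residue] = col
            residue += 1
    return mapping


def column_hotspots(msa: dict[str, tuple[str, int]],
                    mutations: dict[str, list[int]]) -> Counter:
    """Count mutations per alignment column, pooled across paralogs.

    msa: gene -> (gapped domain row, residue number of its first residue)
    mutations: gene -> mutated residue numbers (one entry per sample)
    Mutations outside the aligned domain are ignored.
    """
    counts = Counter()
    for gene, positions in mutations.items():
        row, start = msa[gene]
        mapping = residue_to_column(row, start)
        for pos in positions:
            if pos in mapping:
                counts[mapping[pos]] += 1
    return counts
```

Pooling by column is what lets two mutations at different residue numbers in different paralogs add up to one recurrent position.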

  6. Alignment of suprathermally rotating grains

    NASA Astrophysics Data System (ADS)

    Lazarian, A.

    1995-12-01

    It is shown that mechanical alignment can be efficient for suprathermally rotating grains, provided that they drift with supersonic velocities. Such drift should be widespread, driven by both Alfvénic waves and ambipolar diffusion. Moreover, if suprathermal rotation is caused by grain interaction with a radiative flux, it is shown that mechanical alignment may be present even in the absence of supersonic drift. This means that the range of applicability of mechanical alignment is wider than generally accepted and that it can rival the paramagnetic one. We also study the latter mechanism and re-examine the interplay between poisoning of active sites and desorption of molecules blocking access to the active sites of H_2 formation, in order to explain the observed poor alignment of small grains and good alignment of large grains. To obtain a more comprehensive picture of alignment, we briefly discuss alignment by radiation fluxes and by grain magnetic moments.

  7. Engineering cell alignment in vitro.

    PubMed

    Li, Yuhui; Huang, Guoyou; Zhang, Xiaohui; Wang, Lin; Du, Yanan; Lu, Tian Jian; Xu, Feng

    2014-01-01

    Cell alignment plays a critical role in various cell behaviors including cytoskeleton reorganization, membrane protein relocation, nuclear gene expression, and ECM remodeling. Cell alignment is also known to exert significant effects on tissue regeneration (e.g., neuron) and to modulate mechanical properties of tissues including skeleton, cardiac muscle and tendon. Therefore, it is essential to engineer cell alignment in vitro for biomechanics, cell biology, tissue engineering and regenerative medicine applications. With advances in nano- and micro-scale technologies, a variety of approaches have been developed to engineer cell alignment in vitro, including mechanical loading, topographical patterning, and surface chemical treatment. In this review, we first present alignments of various cell types and their functionality in different tissues in vivo, including muscle and nerve tissues. Then, we provide an overview of recent approaches for engineering cell alignment in vitro. Finally, we offer concluding remarks and perspectives on future improvements in engineering cell alignment.

  8. TSGC and JSC Alignment

    NASA Technical Reports Server (NTRS)

    Sanchez, Humberto

    2013-01-01

    NASA and the SGCs are, by design, intended to work closely together and have synergistic Vision, Mission, and Goals. The TSGC affiliates and JSC have been working together, but not always in a concise, coordinated, or strategic manner. Today we present a couple of simple ideas about how TSGC and JSC have started to work together in a more concise, coordinated, and strategic manner, and how JSC and non-TSG Jurisdiction members have started to collaborate: Idea I, TSGC and JSC Technical Alignment; Idea II, Concept of Clusters.

  9. Self-Aligning Coupler

    NASA Technical Reports Server (NTRS)

    Cooney, Earl T.

    1990-01-01

    Joint reduces assembly time and eliminates fumbling. Self-aligning coupler easy to use for people wearing heavy gloves or other restrictive clothing. Consists of two threaded sections, one with blade, other with slot - joined by threaded collar. Blade fits precisely in slot. Notch in blade engages pin in slot to form temporary attachment. Collar turned on continuous thread of joined sections to form tight, rigid joint. Designed for assembly of structures by astronauts in space suits, coupler used on Earth by firefighters wearing protective garments, technicians handling hazardous materials, and others working underwater or in other difficult environments.

  10. The N-terminal region of eukaryotic translation initiation factor 5A signals to nuclear localization of the protein

    SciTech Connect

    Parreiras-e-Silva, Lucas T.; Gomes, Marcelo D.; Oliveira, Eduardo B.; Costa-Neto, Claudio M.

    2007-10-19

    The eukaryotic translation initiation factor 5A (eIF5A) is a ubiquitous protein of eukaryotic and archaeal organisms which undergoes hypusination, a unique post-translational modification. We generated a polyclonal antibody against murine eIF5A, which in immunocytochemical assays in B16-F10 cells revealed that the endogenous protein is preferentially localized to the nuclear region. We therefore analyzed possible structural features present in eIF5A proteins that could be responsible for that characteristic. Multiple sequence alignment analysis of eIF5A proteins from different eukaryotic and archaeal organisms showed that the eukaryotic sequences have an extended N-terminal segment. We then performed in silico prediction analyses and constructed different truncated forms of murine eIF5A to verify any possible role that the N-terminal extension might have in determining the subcellular localization of eIF5A in eukaryotic organisms. Our results indicate that the N-terminal extension of eukaryotic eIF5A contributes to signaling this protein for nuclear localization, despite bearing no structural similarity to classical nuclear localization signals.

  11. OBSERVATIONS OF ENHANCED RADIATIVE GRAIN ALIGNMENT NEAR HD 97300

    SciTech Connect

    Andersson, B-G; Potter, S. B.

    2010-09-10

    We have obtained optical multi-band polarimetry toward sightlines through the Chamaeleon I cloud, particularly in the vicinity of the young B9/A0 star HD 97300. We show, in agreement with earlier studies, that the radiation field impinging on the cloud in the projected vicinity of the star is dominated by the flux from the star, as evidenced by a local enhancement in the grain heating. By comparing the differential grain heating with the differential change in the location of the peak of the polarization curve, we show that the grain alignment is enhanced by the increase in the radiation field. We also find a weak, but measurable, variation in the grain alignment with the relative angle between the radiation field anisotropy and the magnetic field direction. Such an anisotropy in the grain alignment is consistent with a unique prediction of modern radiative alignment torque theory and provides direct support for radiatively driven grain alignment.

  12. MRFalign: protein homology detection through alignment of Markov random fields.

    PubMed

    Ma, Jianzhu; Wang, Sheng; Wang, Zhiyong; Xu, Jinbo

    2014-03-01

    Sequence-based protein homology detection has been extensively studied, and so far the most sensitive methods are based on comparison of protein sequence profiles, which are derived from a multiple sequence alignment (MSA) of sequence homologs in a protein family. A sequence profile is usually represented as a position-specific scoring matrix (PSSM) or an HMM (Hidden Markov Model), and accordingly PSSM-PSSM or HMM-HMM comparison is used for homolog detection. This paper presents a new homology detection method, MRFalign, consisting of three key components: 1) a Markov Random Field (MRF) representation of a protein family; 2) a scoring function measuring the similarity of two MRFs; and 3) an efficient ADMM (Alternating Direction Method of Multipliers) algorithm for aligning two MRFs. Compared to HMMs, which can model only very short-range residue correlations, MRFs can model long-range residue interaction patterns and thus encode information about the global 3D structure of a protein family. Consequently, MRF-MRF comparison for remote homology detection should be much more sensitive than HMM-HMM or PSSM-PSSM comparison. Experiments confirm that MRFalign outperforms several popular HMM- or PSSM-based methods in terms of both alignment accuracy and remote homology detection, and that MRFalign works particularly well for mainly beta proteins. For example, tested on the benchmark SCOP40 (8353 proteins) for homology detection, PSSM-PSSM and HMM-HMM succeed on 48% and 52% of proteins, respectively, at superfamily level, and on 15% and 27% of proteins, respectively, at fold level. In contrast, MRFalign succeeds on 57.3% and 42.5% of proteins at superfamily and fold level, respectively. This study implies that long-range residue interaction patterns are very helpful for sequence-based homology detection. The software is available for download at http://raptorx.uchicago.edu/download/. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2-5.

  13. MRFalign: Protein Homology Detection through Alignment of Markov Random Fields

    PubMed Central

    Ma, Jianzhu; Wang, Sheng; Wang, Zhiyong; Xu, Jinbo

    2014-01-01

    Sequence-based protein homology detection has been extensively studied, and so far the most sensitive methods are based on comparison of protein sequence profiles, which are derived from a multiple sequence alignment (MSA) of sequence homologs in a protein family. A sequence profile is usually represented as a position-specific scoring matrix (PSSM) or an HMM (Hidden Markov Model), and accordingly PSSM-PSSM or HMM-HMM comparison is used for homolog detection. This paper presents a new homology detection method, MRFalign, consisting of three key components: 1) a Markov Random Field (MRF) representation of a protein family; 2) a scoring function measuring the similarity of two MRFs; and 3) an efficient ADMM (Alternating Direction Method of Multipliers) algorithm for aligning two MRFs. Compared to HMMs, which can model only very short-range residue correlations, MRFs can model long-range residue interaction patterns and thus encode information about the global 3D structure of a protein family. Consequently, MRF-MRF comparison for remote homology detection should be much more sensitive than HMM-HMM or PSSM-PSSM comparison. Experiments confirm that MRFalign outperforms several popular HMM- or PSSM-based methods in terms of both alignment accuracy and remote homology detection, and that MRFalign works particularly well for mainly beta proteins. For example, tested on the benchmark SCOP40 (8353 proteins) for homology detection, PSSM-PSSM and HMM-HMM succeed on 48% and 52% of proteins, respectively, at superfamily level, and on 15% and 27% of proteins, respectively, at fold level. In contrast, MRFalign succeeds on 57.3% and 42.5% of proteins at superfamily and fold level, respectively. This study implies that long-range residue interaction patterns are very helpful for sequence-based homology detection. The software is available for download at http://raptorx.uchicago.edu/download/. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2–5. PMID:24675572
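The abstract describes a PSSM as a profile derived from an MSA of homologs. The following is a minimal sketch of that baseline idea, not of MRFalign itself: per-column log-odds scores in bits, with a pseudocount spread in proportion to a background distribution, plus an ungapped score for a query of the same length. The uniform background, the single additive pseudocount and the absence of sequence weighting are simplifying assumptions; production tools such as PSI-BLAST use more elaborate weighting and priors.

```python
import math
from typing import Dict, List, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(msa: List[str], pseudocount: float = 1.0,
               background: Optional[Dict[str, float]] = None) -> List[Dict[str, float]]:
    """Log-odds PSSM (bits) from equal-length aligned protein sequences."""
    if background is None:
        # Assumption: uniform background over the 20 standard residues.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("MSA rows must all have the same length")
    pssm = []
    for col in range(length):
        counts = {aa: 0 for aa in AMINO_ACIDS}
        for row in msa:
            ch = row[col].upper()
            if ch in counts:  # skip gaps and non-standard letters
                counts[ch] += 1
        total = sum(counts.values()) + pseudocount
        column = {}
        for aa in AMINO_ACIDS:
            # Pseudocount is spread in proportion to the background frequency,
            # so the column probabilities still sum to 1.
            prob = (counts[aa] + pseudocount * background[aa]) / total
            column[aa] = math.log2(prob / background[aa])
        pssm.append(column)
    return pssm


def score_query(pssm: List[Dict[str, float]], query: str) -> float:
    """Ungapped score of a query (20 standard letters only) against the PSSM."""
    if len(query) != len(pssm):
        raise ValueError("query length must equal PSSM length")
    return sum(column[aa] for column, aa in zip(pssm, query.upper()))
```

Residues that are conserved in a column get positive scores and unseen residues get negative ones, which is what makes profile comparison more sensitive than a single-sequence substitution matrix.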

  14. Functional Alignment of Metabolic Networks.

    PubMed

    Mazza, Arnon; Wagner, Allon; Ruppin, Eytan; Sharan, Roded

    2016-05-01

    Network alignment has become a standard tool in comparative biology, allowing the inference of protein function, interaction, and orthology. However, current alignment techniques are based on topological properties of networks and do not take into account their functional implications. Here we propose, for the first time, an algorithm to align two metabolic networks by taking advantage of their coupled metabolic models. These models allow us to assess the functional implications of genes or reactions, captured by the metabolic fluxes that are altered following their deletion from the network. Such implications may spread far beyond the region of the network where the gene or reaction lies. We apply our algorithm to align metabolic networks from various organisms, ranging from bacteria to humans, showing that our alignment can reveal functional orthology relations that are missed by conventional topological alignments.

  15. Onorbit IMU alignment error budget

    NASA Technical Reports Server (NTRS)

    Corson, R. W.

    1980-01-01

    The Star Tracker, Crew Optical Alignment Sight (COAS), and Inertial Measurement Unit (IMU) form a complex navigation system with a multitude of error sources. A complete list of the system errors is presented. The errors were combined in a rational way to yield an estimate of the IMU alignment accuracy for STS-1. The expected standard deviation in the IMU alignment error for STS-1 type alignments was determined to be 72 arc seconds per axis for star tracker alignments and 188 arc seconds per axis for COAS alignments. These estimates are based on current knowledge of the star tracker, COAS, IMU, and navigation base error specifications, and were partially verified by preliminary Monte Carlo analysis.

  16. Nuclear reactor internals alignment configuration

    DOEpatents

    Gilmore, Charles B.; Singleton, Norman R.

    2009-11-10

    An alignment system that employs jacking block assemblies and alignment posts around the periphery of the top plate of a nuclear reactor lower internals core shroud to align an upper core plate with the lower internals and the core shroud with the core barrel. The distal ends of the alignment posts are chamfered and are closely received within notches machined in the upper core plate at spaced locations around the outer circumference of the upper core plate. The jacking block assemblies are used to center the core shroud in the core barrel, and the alignment posts ensure the proper orientation of the upper core plate. Alternatively, the alignment posts may be formed in the upper core plate and the notches in the top plate.

  17. Aligned Defrosting Dunes

    NASA Technical Reports Server (NTRS)

    2004-01-01

    17 August 2004: This July 2004 Mars Global Surveyor (MGS) Mars Orbiter Camera (MOC) image shows a group of aligned barchan sand dunes in the martian north polar region. At the time, the dunes were covered with seasonal frost, but the frost had begun to sublime away, leaving dark spots and dark outlines around the dunes. The surrounding plains exhibit small, diffuse spots that are also the result of subliming seasonal frost. This northern spring image, acquired on a descending ground track (as MGS was moving north to south on the 'night' side of Mars), is located near 78.8°N, 34.8°W. The image covers an area about 3 km (1.9 mi) across and sunlight illuminates the scene from the upper left.

  18. The alignment strategy of HADES

    NASA Astrophysics Data System (ADS)

    Pechenova, O.; Pechenov, V.; Galatyuk, T.; Hennino, T.; Holzmann, R.; Kornakov, G.; Markert, J.; Müntz, C.; Salabura, P.; Schmah, A.; Schwab, E.; Stroth, J.

    2015-06-01

    The global as well as intrinsic alignment of any spectrometer impacts directly on its performance and the quality of the achievable physics results. An overview of the current alignment procedure of the DiElectron Spectrometer HADES is presented with an emphasis on its main features and its accuracy. The sequence of all steps and procedures is given, including details on photogrammetric and track-based alignment.

  19. Lunar Alignments - Identification and Analysis

    NASA Astrophysics Data System (ADS)

    González-García, A. César

    Lunar alignments are difficult to establish given the apparent lack of written accounts clearly pointing toward lunar alignments for individual temples. While some individual cases are reviewed and highlighted, the weight of the proof must fall on statistical sampling. Some definitions for the lunar alignments are provided in order to clarify the targets, and thus, some new tools are provided to try to test the lunar hypothesis in several cases, especially in megalithic astronomy.

  20. Alignment of Helical Membrane Protein Sequences Using AlignMe

    PubMed Central

    Khafizov, Kamil; Forrest, Lucy R.

    2013-01-01

    Few sequence alignment methods have been designed specifically for integral membrane proteins, even though these important proteins have distinct evolutionary and structural properties that might affect their alignments. Existing approaches typically consider membrane-related information either by using membrane-specific substitution matrices or by assigning distinct penalties for gap creation in transmembrane and non-transmembrane regions. Here, we ask whether favoring matching of predicted transmembrane segments within a standard dynamic programming algorithm can improve the accuracy of pairwise membrane protein sequence alignments. We tested various strategies using a specifically designed program called AlignMe. An updated set of homologous membrane protein structures, called HOMEP2, was used as a reference for optimizing the gap penalties. The best of the membrane-protein optimized approaches were then tested on an independent reference set of membrane protein sequence alignments from the BAliBASE collection. When secondary structure (S) matching was combined with evolutionary information (using a position-specific substitution matrix (P)), in an approach we called AlignMePS, the resultant pairwise alignments were typically among the most accurate over a broad range of sequence similarities when compared to available methods. Matching transmembrane predictions (T), in addition to evolutionary information, and secondary-structure predictions, in an approach called AlignMePST, generally reduces the accuracy of the alignments of closely-related proteins in the BAliBASE set relative to AlignMePS, but may be useful in cases of extremely distantly related proteins for which sequence information is less informative. The open source AlignMe code is available at https://sourceforge.net/projects/alignme/, and at http://www.forrestlab.org, along with an online server and the HOMEP2 data set. PMID:23469223
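One of the existing strategies the abstract mentions is assigning distinct gap penalties inside and outside transmembrane regions within standard dynamic programming. The following minimal sketch shows that idea in a plain Needleman-Wunsch recurrence with linear, position-dependent gap costs. The match/mismatch scores, penalty values and the 'M' mask convention are placeholders; AlignMe's actual scoring (substitution matrices, profiles, secondary-structure and TM predictions, and the gap-creation penalties the abstract describes) is not modeled here.

```python
def align_score(a: str, b: str, tm_a: str, tm_b: str,
                match: float = 2.0, mismatch: float = -1.0,
                gap_sol: float = 1.0, gap_tm: float = 4.0) -> float:
    """Global (Needleman-Wunsch) alignment score with TM-dependent gap costs.

    tm_a / tm_b are per-residue masks: 'M' marks a predicted transmembrane
    residue, anything else is outside the membrane (hypothetical convention).
    Leaving a residue unpaired costs gap_tm inside a TM segment and gap_sol
    elsewhere, so gaps are pushed out of TM segments.
    """
    if len(a) != len(tm_a) or len(b) != len(tm_b):
        raise ValueError("each mask must match its sequence length")

    def gap(mask: str, i: int) -> float:
        return gap_tm if mask[i] == "M" else gap_sol

    n, m = len(a), len(b)
    # dp[i][j] = best score aligning a[:i] with b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] - gap(tm_a, i - 1)
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] - gap(tm_b, j - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,         # pair a[i-1] with b[j-1]
                           dp[i - 1][j] - gap(tm_a, i - 1),  # a[i-1] against a gap
                           dp[i][j - 1] - gap(tm_b, j - 1))  # b[j-1] against a gap
    return dp[n][m]
```

With the first residue of "LLL" marked as transmembrane, aligning it to "LL" places the gap on a soluble residue (score 3) rather than on the TM residue (score 0).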

  1. Cosmological information in the intrinsic alignments of luminous red galaxies

    SciTech Connect

    Chisari, Nora Elisa; Dvorkin, Cora

    2013-12-01

    The intrinsic alignments of galaxies are usually regarded as a contaminant to weak gravitational lensing observables. The alignment of Luminous Red Galaxies, detected unambiguously in observations from the Sloan Digital Sky Survey, can be reproduced by the linear tidal alignment model of Catelan, Kamionkowski and Blandford (2001) on large scales. In this work, we explore the cosmological information encoded in the intrinsic alignments of red galaxies. We make forecasts for the ability of current and future spectroscopic surveys to constrain local primordial non-Gaussianity and Baryon Acoustic Oscillations (BAO) in the cross-correlation function of intrinsic alignments and the galaxy density field. For the Baryon Oscillation Spectroscopic Survey, we find that the BAO signal in the intrinsic alignments is marginally significant with a signal-to-noise ratio of 1.8 and 2.2 with the current LOWZ and CMASS samples of galaxies, respectively, and increasing to 2.3 and 2.7 once the survey is completed. For the Dark Energy Spectroscopic Instrument and for a spectroscopic survey following the EUCLID redshift selection function, we find signal-to-noise ratios of 12 and 15, respectively. Local type primordial non-Gaussianity, parametrized by f_NL = 10, is only marginally significant in the intrinsic alignments signal with signal-to-noise ratios < 2 for the three surveys considered.

  2. Protein Secondary Structure Prediction Using Local Adaptive Techniques in Training Neural Networks

    NASA Astrophysics Data System (ADS)

    Aik, Lim Eng; Zainuddin, Zarita; Joseph, Annie

    2008-01-01

    One of the most significant problems in computational molecular biology today is how to predict a protein's three-dimensional structure from its one-dimensional amino acid sequence, generally called the protein folding problem; without that structure, the corresponding protein functions are difficult to determine. This paper therefore addresses protein secondary structure prediction using a neural network in order to tackle the protein folding problem. The neural network used for protein secondary structure prediction is a feed-forward multilayer perceptron (MLP). The training set consists of 120 proteins taken from the Protein Data Bank, and the test set consists of 60 proteins chosen randomly from the Protein Data Bank. Multiple sequence alignment (MSA) is used to find similar protein sequences, and a position-specific scoring matrix (PSSM) is used as the network input. Training the neural network involves local adaptive techniques; those used in this paper are learning-rate adaptation by sign changes, SuperSAB, Quickprop and Rprop. In the simulations, Rprop and Quickprop outperform all the other algorithms in convergence time. However, the best result was obtained with the Rprop algorithm.
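Rprop, which gave the best result here, adapts a separate step size for each weight using only the sign of its partial derivative. The following minimal sketch implements the iRprop- variant (the abstract does not say which variant was used) on a generic gradient function rather than the paper's MLP. eta_plus = 1.2 and eta_minus = 0.5 are commonly used defaults; the initial step and step bounds are assumptions.

```python
from typing import Callable, List


def rprop_minimize(grad_fn: Callable[[List[float]], List[float]], w: List[float],
                   steps: int = 100, step0: float = 0.1,
                   eta_plus: float = 1.2, eta_minus: float = 0.5,
                   step_min: float = 1e-6, step_max: float = 50.0) -> List[float]:
    """Minimize a function with iRprop-, given its gradient function."""
    w = list(w)
    step = [step0] * len(w)   # per-weight step sizes
    prev = [0.0] * len(w)     # previous gradient (sign is all that matters)
    for _ in range(steps):
        g = list(grad_fn(w))
        for k in range(len(w)):
            if g[k] * prev[k] > 0:
                # Same sign as last time: accelerate.
                step[k] = min(step[k] * eta_plus, step_max)
            elif g[k] * prev[k] < 0:
                # Sign flipped, so we overshot: shrink the step and, as in
                # iRprop-, skip this weight's update for one iteration.
                step[k] = max(step[k] * eta_minus, step_min)
                g[k] = 0.0
            # Move against the sign of the gradient by the current step.
            if g[k] > 0:
                w[k] -= step[k]
            elif g[k] < 0:
                w[k] += step[k]
            prev[k] = g[k]
    return w
```

Because the update ignores gradient magnitude, poorly scaled directions (such as the steep second coordinate in the test's toy quadratic) do not force a small global learning rate, which is one reason Rprop often converges quickly.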

  3. Coronal alignment of patellofemoral arthroplasty.

    PubMed

    Thienpont, Emmanuel; Lonner, Jess H

    2014-01-01

    Patellofemoral arthroplasty (PFA) can yield successful results in appropriately selected patients. The varus-valgus position, or coronal alignment, of the trochlear implant is determined by how its transitional edges articulate with the condylar cartilage. Whilst variation in condylar anatomy will not influence the axis of the lower limb in PFA, it can affect the Q-angle of the PF joint. The aim of this study was to analyze how the coronal alignment can be influenced by the choice of anatomical landmarks. This was a retrospective analysis of 57 PFAs, with alignment measured on full-leg radiographs. Coronal alignment that follows the anterior condylar anatomy leads to a mean (SD) proximal valgus alignment of 100° (9°). Aligning the component with Whiteside's line gives better alignment with less variance: 89° (3°). A trochlear component with a higher Q-angle compensates for patellar maltracking if the condylar anatomy would tend to put the implant in a more proximal varus or neutral position. If the trochlear component is proximally aligned in valgus, this may have the opposite effect. Aligning the trochlear component with the AP axis in the coronal plane avoids maltracking and optimally utilizes the design features of the implant. Level of evidence: III. © 2014 Elsevier B.V. All rights reserved.

  4. A Dynamic Alignment System for the Final Focus Test Beam

    SciTech Connect

    Ruland, R.E.; Bressler, V.E.; Fischer, G.; Plouffe, D.; /SLAC

    2005-08-16

    The Final Focus Test Beam (FFTB) was conceived as a technological stepping stone on the way to the next linear collider. Nowhere is this more evident than in the alignment subsystems. Alignment tolerances for components prior to beam turn-on are almost an order of magnitude smaller than for previous projects at SLAC. Position monitoring systems that operate independently of the beam are employed to monitor motions of the components locally and globally with unprecedented precision. An overview of the FFTB alignment system is presented herein.

  5. GS-align for glycan structure alignment and similarity measurement

    PubMed Central

    Lee, Hui Sun; Jo, Sunhwan; Mukherjee, Srayanta; Park, Sang-Jun; Skolnick, Jeffrey; Lee, Jooyoung; Im, Wonpil

    2015-01-01

    Motivation: Glycans play critical roles in many biological processes, and their structural diversity is key for specific protein-glycan recognition. Comparative structural studies of biological molecules provide useful insight into their biological relationships. However, most computational tools are designed for protein structure, and despite their importance, there is no currently available tool for comparing glycan structures in a sequence order- and size-independent manner. Results: A novel method, GS-align, is developed for glycan structure alignment and similarity measurement. GS-align generates possible alignments between two glycan structures through iterative maximum clique search and fragment superposition. The optimal alignment is then determined by the maximum structural similarity score, GS-score, which is size-independent. Benchmark tests against the Protein Data Bank (PDB) N-linked glycan library and PDB homologous/non-homologous N-glycoprotein sets indicate that GS-align is a robust computational tool to align glycan structures and quantify their structural similarity. GS-align is also applied to template-based glycan structure prediction and monosaccharide substitution matrix generation to illustrate its utility. Availability and implementation: http://www.glycanstructure.org/gsalign. Contact: wonpil@ku.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25857669

  6. MolAlign: an algorithm for aligning multiple small molecules

    NASA Astrophysics Data System (ADS)

    Chan, Shek Ling

    2017-06-01

    In small-molecule drug discovery projects, the receptor structure is not always available. In such cases it is enormously useful to be able to align known ligands in the way they bind in the receptor. Here we present an algorithm for the alignment of multiple small-molecule ligands. This algorithm takes pre-generated conformers as input and proposes aligned assemblies of the ligands. The algorithm consists of two stages: the first stage performs alignments for each pair of ligands, and the second stage uses the results of the first stage to build up multiple ligand alignment assemblies with a novel iterative procedure. The scoring functions are improved versions of the one described in our previous work. We have compared our results with some recent publications. While an exact comparison is impossible, it is clear that our algorithm is fast and produces very competitive results.

  7. Variable Reference Alignment: an improved peak alignment protocol for NMR spectral data with large inter-sample variation

    PubMed Central

    MacKinnon, Neil; Ge, Wencheng; Khan, Amjad P.; Somashekar, Bagganahalli S.; Tripathi, Pratima; Siddiqui, Javed; Wei, John T.; Chinnaiyan, Arul M.; Rajendiran, Thekkelnaycke M.; Ramamoorthy, Ayyalusamy

    2012-01-01

    In an effort to address the variable correspondence problem across large sample cohorts common in metabolomic/metabonomic studies, we have developed a pre-alignment protocol that aims to generate spectral segments sharing a common target spectrum. Under the assumption that a single reference spectrum will not correctly represent all spectra of a data set, the goal of this approach is to perform local alignment corrections on spectral regions that share a common ‘most similar’ spectrum. A natural beneficial outcome of this procedure is the automatic definition of spectral segments, a feature that is not common to all alignment methods. This protocol is shown to specifically improve the quality of alignment in 1H NMR data sets exhibiting large inter-sample compositional variation (e.g. pH, ionic strength). As a proof-of-principle demonstration, we have utilized two recently developed alignment algorithms specific to NMR data, recursive segment-wise peak alignment and interval correlated shifting, and applied them to two data sets comprising 15 aqueous cell-line extract and 20 human urine 1H NMR profiles. Application of this protocol represents a fundamental shift from current alignment methodologies that seek to correct misalignments using a single representative spectrum, with the added benefit that it can be appended to any alignment algorithm. PMID:22616856
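    The core step of the protocol, choosing a separate 'most similar' target for each spectral region instead of one global reference, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the published implementation: segment boundaries are taken as given (the protocol defines them automatically), similarity is the mean Pearson correlation to the other spectra, and each spectrum is corrected by the integer shift that best matches the target, a crude stand-in for algorithms such as interval correlated shifting. All function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def most_similar_index(segments):
    """Index of the segment with the highest mean correlation to all others (needs >= 2)."""
    best, best_score = 0, -math.inf
    for i, s in enumerate(segments):
        others = [pearson(s, t) for j, t in enumerate(segments) if j != i]
        score = sum(others) / len(others)
        if score > best_score:
            best, best_score = i, score
    return best


def shift(seg, k):
    """Shift a segment by k points (positive k moves peaks to higher indices), zero-padded."""
    if k > 0:
        return [0.0] * k + list(seg[:-k])
    if k < 0:
        return list(seg[-k:]) + [0.0] * (-k)
    return list(seg)


def align_region(segments, max_shift=3):
    """Align one spectral region from every spectrum to that region's most similar spectrum."""
    target = segments[most_similar_index(segments)]
    aligned = []
    for s in segments:
        best_k = max(range(-max_shift, max_shift + 1),
                     key=lambda k: pearson(shift(s, k), target))
        aligned.append(shift(s, best_k))
    return aligned
```

    Because the target is recomputed per region, a region in which most spectra agree with spectrum A and another in which they agree with spectrum B each get their own reference, which is the behaviour the abstract contrasts with single-reference methods.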

  8. Optical alignment of oval graphene flakes

    NASA Astrophysics Data System (ADS)

    Mobini, E.; Rahimzadegan, A.; Alaee, R.; Rockstuhl, C.

    2017-03-01

    Patterned graphene, as an atomically thin layer, supports localized surface plasmon-polaritons (LSPPs) at mid-infrared or far-infrared frequencies. This provides a pronounced optical force/torque in addition to large optical cross sections and makes it an ideal candidate for optical manipulation. Here, we study the optical force and torque exerted by a linearly polarized plane wave on circular and oval graphene flakes. Whereas the torque vanishes for circular flakes, the finite torque allows rotating and orienting oval flakes relative to the electric field polarization. Depending on the wavelength, the alignment is either perpendicular or parallel. In our contribution, we rely on full-wave numerical simulations but also on an analytical model that treats the graphene flakes in the dipole approximation. The presented results reveal a good level of control over the spatial alignment of graphene flakes subjected to far-infrared illumination.

  9. Optical alignment of oval graphene flakes.

    PubMed

    Mobini, E; Rahimzadegan, A; Alaee, R; Rockstuhl, C

    2017-03-15

    Patterned graphene, as an atomically thin layer, supports localized surface plasmon polaritons at mid-infrared or far-infrared frequencies. This provides a pronounced optical force/torque in addition to large optical cross sections and makes it an ideal candidate for optical manipulation. Here, we study the optical force and torque exerted by a linearly polarized plane wave on circular and oval graphene flakes (single layers of graphene). While the torque vanishes for circular flakes, the finite torque allows rotating and orienting oval flakes relative to the electric field polarization. Depending on the wavelength, the alignment is either parallel or perpendicular to the electric field vector. In our contribution, we rely on a full-wave numerical simulation and also on an analytical model that treats the graphene flakes in a dipole approximation. The presented results reveal a good level of control over the spatial alignment of graphene flakes subjected to far-infrared illumination.

  10. Mask alignment system for semiconductor processing

    DOEpatents

    Webb, Aaron P.; Carlson, Charles T.; Weaver, William T.; Grant, Christopher N.

    2017-02-14

    A mask alignment system for providing precise and repeatable alignment between ion implantation masks and workpieces. The system includes a mask frame having a plurality of ion implantation masks loosely connected thereto. The mask frame is provided with a plurality of frame alignment cavities, and each mask is provided with a plurality of mask alignment cavities. The system further includes a platen for holding workpieces. The platen may be provided with a plurality of mask alignment pins and frame alignment pins configured to engage the mask alignment cavities and frame alignment cavities, respectively. The mask frame can be lowered onto the platen, with the frame alignment cavities moving into registration with the frame alignment pins to provide rough alignment between the masks and workpieces. The mask alignment cavities are then moved into registration with the mask alignment pins, thereby shifting each individual mask into precise alignment with a respective workpiece.

  11. Calibration and Alignment.

    NASA Astrophysics Data System (ADS)

    Grassotti, Christopher; Iskenderian, Haig; Hoffman, Ross N.

    1999-06-01

    Discrepancies between estimates of rainfall from ground-based radar and satellite observing systems can be attributed either to calibration differences or to geolocation and sampling differences. The latter include differences due to radar or satellite misregistration, differences in observation times, or variations in instrument and retrieval algorithm sensitivities. A new methodology has been developed and tested for integrating radar- and satellite-based estimates of precipitation using a feature calibration and alignment (FCA) technique. The parameters describing the calibration and alignment are found using a variational approach and are composed of displacement and amplitude adjustments to the satellite rainfall retrievals, which minimize the differences with respect to the radar data and satisfy additional smoothness and magnitude constraints. In this approach the amplitude component represents a calibration of the satellite estimate to the radar, whereas the displacement components correct temporal and/or geolocation differences between the radar and satellite data. The method has been tested on a number of cases of the NASA WetNet PIP-2 dataset. These data consist of coincident estimates of rainfall by ground-based radar and the DMSP SSM/I. Sensitivity tests were conducted to tune the parameters of the algorithm. Results indicate the effectiveness of the technique in minimizing the discrepancies between radar and satellite observations of rainfall for a variety of rainfall events ranging from midlatitude frontal precipitation to heavy convection associated with a tropical cyclone (Hurricane Andrew). A remaining issue to be resolved is the incorporation of knowledge about location dependencies in the errors of the radar and microwave estimates. Once the satellite data have been adjusted to match the radar observations, the two independent estimates (radar and adjusted SSM/I rain rates) may be blended to improve the overall depiction of the rainfall event.
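    The idea of separating a displacement correction from an amplitude (calibration) correction can be illustrated with a deliberately simplified, one-dimensional analogue. The sketch below assumes a single global integer displacement and a single scale factor, found by brute-force search with the closed-form least-squares amplitude for each candidate shift. The actual FCA technique solves a variational problem with spatially varying displacements plus smoothness and magnitude constraints, which this code omits; the function names are hypothetical.

```python
def shifted(field, d):
    """Move a 1-D field d cells toward higher indices, filling vacated cells with 0."""
    n = len(field)
    return [field[i - d] if 0 <= i - d < n else 0.0 for i in range(n)]


def fit_displacement_amplitude(satellite, radar, max_disp=5):
    """Return (displacement, amplitude, squared error) that best maps satellite onto radar.

    Returns None if the satellite field is zero for every tested displacement.
    """
    best = None
    for d in range(-max_disp, max_disp + 1):
        s = shifted(satellite, d)
        ss = sum(v * v for v in s)
        if ss == 0:
            continue  # nothing left to scale at this displacement
        # For a fixed displacement the optimal amplitude has a closed form (least squares).
        a = sum(v * r for v, r in zip(s, radar)) / ss
        err = sum((a * v - r) ** 2 for v, r in zip(s, radar))
        if best is None or err < best[2]:
            best = (d, a, err)
    return best
```

    In this toy setting the displacement plays the role of the geolocation/timing correction and the amplitude that of the calibration, mirroring the split described in the abstract.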

  12. GUIDANCE2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters

    PubMed Central

    Sela, Itamar; Ashkenazy, Haim; Katoh, Kazutaka; Pupko, Tal

    2015-01-01

    Inference of multiple sequence alignments (MSAs) is a critical part of phylogenetic and comparative genomics studies. However, from the same set of sequences different MSAs are often inferred, depending on the methodologies used and the assumed parameters. Much effort has recently been devoted to improving the ability to identify unreliable alignment regions. Detecting such unreliable regions was previously shown to be important for downstream analyses relying on MSAs, such as the detection of positive selection. Here we developed GUIDANCE2, a new integrative methodology that accounts for: (i) uncertainty in the process of indel formation, (ii) uncertainty in the assumed guide tree and (iii) co-optimal solutions in the pairwise alignments, used as building blocks in progressive alignment algorithms. We compared GUIDANCE2 with seven methodologies to detect unreliable MSA regions using extensive simulations and empirical benchmarks. We show that GUIDANCE2 outperforms all previously developed methodologies. Furthermore, GUIDANCE2 also provides a set of alternative MSAs which can be useful for downstream analyses. The novel algorithm is implemented as a web-server, available at: http://guidance.tau.ac.il. PMID:25883146
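    GUIDANCE-style reliability scores rest on a simple question: how often is a pair of residues that shares a column in the base MSA also co-aligned in alternative MSAs? Below is a minimal sketch of such a residue-pair column score in Python, assuming every alignment contains the same ungapped sequences in the same order and that gaps are written as '-'. GUIDANCE2 itself generates the alternative MSAs by accounting for indel, guide-tree and co-optimal pairwise-alignment uncertainty, which is not reproduced here, and the function names are hypothetical.

```python
from itertools import combinations


def residue_columns(msa):
    """Map (sequence index, residue index) -> column index for an MSA given as strings."""
    mapping = {}
    for s, row in enumerate(msa):
        r = 0
        for c, ch in enumerate(row):
            if ch != "-":
                mapping[(s, r)] = c
                r += 1
    return mapping


def column_scores(base, alternatives):
    """Fraction of residue pairs in each base column that stay co-aligned in the alternatives.

    Columns with fewer than two residues have no pairs and get None.
    """
    alt_maps = [residue_columns(a) for a in alternatives]
    counters = [0] * len(base)          # next residue index in each sequence
    scores = []
    for c in range(len(base[0])):
        residues = []
        for s, row in enumerate(base):
            if row[c] != "-":
                residues.append((s, counters[s]))
                counters[s] += 1
        pairs = list(combinations(residues, 2))
        if not pairs:
            scores.append(None)
            continue
        agree = sum(1 for p, q in pairs for m in alt_maps if m[p] == m[q])
        scores.append(agree / (len(pairs) * len(alt_maps)))
    return scores
```

    For the base alignment ["AC-GT", "ACTGT"], scored against a copy of itself and the alternative ["ACG-T", "ACTGT"], the fourth column scores 0.5 because only one of the two alternatives keeps the two G residues in the same column; low-scoring columns are the candidates for masking before downstream analyses.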

  13. QuickProbs 2: Towards rapid construction of high-quality alignments of large protein families

    PubMed Central

    Gudyś, Adam; Deorowicz, Sebastian

    2017-01-01

    The ever-increasing size of sequence databases, driven by the development of high-throughput sequencing, poses one of the greatest challenges yet to multiple alignment algorithms. As we show, well-established techniques employed for increasing alignment quality, i.e., refinement and consistency, are ineffective when large protein families are investigated. We present QuickProbs 2, an algorithm for multiple sequence alignment. Based on probabilistic models and equipped with novel column-oriented refinement and selective consistency, it offers outstanding accuracy. When analysing hundreds of sequences, QuickProbs 2 is noticeably better than ClustalΩ and MAFFT, the previous leaders for processing numerous protein families. In the case of smaller sets, for which consistency-based methods are the best performing, QuickProbs 2 is also superior to the competitors. Due to the low computational requirements of selective consistency and the utilization of massively parallel architectures, the presented algorithm has execution times similar to ClustalΩ and is orders of magnitude faster than full consistency approaches such as MSAProbs or PicXAA. All this makes QuickProbs 2 an excellent tool for aligning families ranging from a few to hundreds of proteins. PMID:28139687

  14. Drive alignment pays maintenance dividends

    SciTech Connect

    Fedder, R.

    2008-12-15

    Proper alignment of the motor and gear drive on conveying and processing equipment will result in longer bearing and coupling life, along with lower maintenance costs. Selecting an alignment-free drive package instead of a traditional foot-mounted drive and motor is a major advancement toward these goals. 4 photos.

  15. CATO: The Clone Alignment Tool

    PubMed Central

    Henstock, Peter V.; LaPan, Peter

    2016-01-01

    High-throughput cloning efforts produce large numbers of sequences that need to be aligned, edited, compared with reference sequences, and organized as files and selected clones. Different pieces of software are typically required to perform each of these tasks. We have designed a single piece of software, CATO, the Clone Alignment Tool, that allows a user to align, evaluate, edit, and select clone sequences based on comparisons to reference sequences. The input and output are designed to be compatible with standard data formats, and thus suitable for integration into a clone processing pipeline. CATO provides both sequence alignment and visualizations to facilitate the analysis of cloning experiments. The alignment algorithm matches each of the relevant candidate sequences against each reference sequence. The visualization portion displays three levels of matching: 1) a top-level summary of the top candidate sequences aligned to each reference sequence, 2) a focused alignment view with the nucleotides of matched sequences displayed against one reference sequence, and 3) a pair-wise alignment of a single reference and candidate sequence pair. Users can select the minimum matching criteria for valid clones, edit or swap reference sequences, and export the results to a summary file as part of the high-throughput cloning workflow. PMID:27459605
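    The abstract does not specify CATO's alignment algorithm or scoring, so the following is only a generic illustration of matching a candidate clone against each reference sequence: a plain Needleman-Wunsch global alignment with assumed scores (match +1, mismatch -1, gap -1), reporting percent identity over the alignment length. The function names and scores are hypothetical, not CATO's interface.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    top, bottom, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(a, b):
    """Identical, non-gap positions as a percentage of the global alignment length."""
    x, y = global_align(a, b)
    if not x:
        return 0.0
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)


def best_reference(candidate, references):
    """Name of the reference (dict: name -> sequence) with the highest identity."""
    return max(references, key=lambda name: percent_identity(candidate, references[name]))
```

    A real clone pipeline would additionally apply the user's minimum matching criteria before accepting a clone, and would typically use a faster, library-grade aligner for long reads.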

  16. Engaging Teachers in Curriculum Alignment.

    ERIC Educational Resources Information Center

    Armstrong, Dale; Suddards, Carol

    1999-01-01

    In 1997, Edmonton Public Schools (Alberta, Canada) began developing a process to engage teachers in curriculum alignment with a view to improving student achievement. Ten principles guiding the curriculum alignment framework are listed, followed by first-year results and factors that led to the framework's success. (CDS)

  17. CATO: The Clone Alignment Tool.

    PubMed

    Henstock, Peter V; LaPan, Peter

    2016-01-01

    High-throughput cloning efforts produce large numbers of sequences that need to be aligned, edited, compared with reference sequences, and organized as files and selected clones. Different pieces of software are typically required to perform each of these tasks. We have designed a single piece of software, CATO, the Clone Alignment Tool, that allows a user to align, evaluate, edit, and select clone sequences based on comparisons to reference sequences. The input and output are designed to be compatible with standard data formats, and thus suitable for integration into a clone processing pipeline. CATO provides both sequence alignment and visualizations to facilitate the analysis of cloning experiments. The alignment algorithm matches each of the relevant candidate sequences against each reference sequence. The visualization portion displays three levels of matching: 1) a top-level summary of the top candidate sequences aligned to each reference sequence, 2) a focused alignment view with the nucleotides of matched sequences displayed against one reference sequence, and 3) a pair-wise alignment of a single reference and candidate sequence pair. Users can select the minimum matching criteria for valid clones, edit or swap reference sequences, and export the results to a summary file as part of the high-throughput cloning workflow.

  18. Lexical alignment in triadic communication

    PubMed Central

    Foltz, Anouschka; Gaspers, Judith; Thiele, Kristina; Stenneken, Prisca; Cimiano, Philipp

    2015-01-01

    Lexical alignment refers to the adoption of one’s interlocutor’s lexical items. Accounts of the mechanisms underlying such lexical alignment differ (among other aspects) in the role assigned to addressee-centered behavior. In this study, we used a triadic communicative situation to test which factors may modulate the extent to which participants’ lexical alignment reflects addressee-centered behavior. Pairs of naïve participants played a picture matching game and received information about the order in which pictures were to be matched from a voice over headphones. On critical trials, participants did or did not hear a name for the picture to be matched next over headphones. Importantly, when the voice over headphones provided a name, it did not match the name that the interlocutor had previously used to describe the object. Participants overwhelmingly used the word that the voice over headphones provided. This result points to non-addressee-centered behavior and is discussed in terms of disrupting alignment with the interlocutor as well as in terms of establishing alignment with the voice over headphones. In addition, the type of picture (line drawing vs. tangram shape) independently modulated lexical alignment, such that participants showed more lexical alignment to their interlocutor for (more ambiguous) tangram shapes compared to line drawings. Overall, the results point to a rather large role for non-addressee-centered behavior during lexical alignment. PMID:25762955

  19. Lexical alignment in triadic communication.

    PubMed

    Foltz, Anouschka; Gaspers, Judith; Thiele, Kristina; Stenneken, Prisca; Cimiano, Philipp

    2015-01-01

    Lexical alignment refers to the adoption of one's interlocutor's lexical items. Accounts of the mechanisms underlying such lexical alignment differ (among other aspects) in the role assigned to addressee-centered behavior. In this study, we used a triadic communicative situation to test which factors may modulate the extent to which participants' lexical alignment reflects addressee-centered behavior. Pairs of naïve participants played a picture matching game and received information about the order in which pictures were to be matched from a voice over headphones. On critical trials, participants did or did not hear a name for the picture to be matched next over headphones. Importantly, when the voice over headphones provided a name, it did not match the name that the interlocutor had previously used to describe the object. Participants overwhelmingly used the word that the voice over headphones provided. This result points to non-addressee-centered behavior and is discussed in terms of disrupting alignment with the interlocutor as well as in terms of establishing alignment with the voice over headphones. In addition, the type of picture (line drawing vs. tangram shape) independently modulated lexical alignment, such that participants showed more lexical alignment to their interlocutor for (more ambiguous) tangram shapes compared to line drawings. Overall, the results point to a rather large role for non-addressee-centered behavior during lexical alignment.

  20. Well-pump alignment system

    DOEpatents

    Drumheller, Douglas S.

    1998-01-01

    An improved well-pump for geothermal wells, an alignment system for a well-pump, and a method for aligning a rotor and stator within a well-pump, wherein the well-pump has a whistle assembly formed at its bottom portion such that variations in the frequency of the whistle, indicating misalignment, may be monitored during pumping.

  1. Alignment of the MINOS FD

    SciTech Connect

    Becker, B.; Boehnlein, D.; /Fermilab

    2004-11-01

    The results and procedure of the alignment of the MINOS Far Detector are presented. The far detector has independent alignments of SM1 and SM2. The misalignments have an estimated uncertainty of ~850 μm for SM1 and ~750 μm for SM2. The alignment takes as inputs the average rotations of U and V, as determined by optical survey, and the strip positions within modules measured by the module mapper. The output is a module-to-module correction for transverse misalignments. These results were verified by examining an independent set of data. On average, these alignment constants contribute much less than 1% to the total uncertainty in the transverse strip position.

  2. De Novo Genome Assembly of the Economically Important Weed Horseweed Using Integrated Data from Multiple Sequencing Platforms

    PubMed Central

    Peng, Yanhui; Lai, Zhao; Lane, Thomas; Nageswara-Rao, Madhugiri; Okada, Miki; Jasieniuk, Marie; O’Geen, Henriette; Kim, Ryan W.; Sammons, R. Douglas; Rieseberg, Loren H.; Stewart, C. Neal

    2014-01-01

    Horseweed (Conyza canadensis), a member of the Compositae (Asteraceae) family, was the first broadleaf weed to evolve resistance to glyphosate. Horseweed, one of the most problematic weeds in the world, is a true diploid (2n = 2x = 18), with the smallest genome of any known agricultural weed (335 Mb). Thus, it is an appropriate candidate to help us understand the genetic and genomic bases of weediness. We undertook a draft de novo genome assembly of horseweed by combining data from multiple sequencing platforms (454 GS-FLX, Illumina HiSeq 2000, and PacBio RS) using various libraries with different insertion sizes (approximately 350 bp, 600 bp, 3 kb, and 10 kb) of a Tennessee-accessed, glyphosate-resistant horseweed biotype. From 116.3 Gb (approximately 350× coverage) of data, the genome was assembled into 13,966 scaffolds with a scaffold N50 of 33,561 bp. The assembly covered 92.3% of the genome, including the complete chloroplast genome (approximately 153 kb) and a nearly complete mitochondrial genome (approximately 450 kb in 120 scaffolds). The nuclear genome is composed of 44,592 protein-coding genes. Genome resequencing of seven additional horseweed biotypes was performed. These sequence data were assembled and used to analyze genome variation. Simple sequence repeats and single-nucleotide polymorphisms were surveyed. Genomic patterns associated with glyphosate-resistant or -susceptible biotypes were detected. The draft genome will be useful for better understanding weediness and the evolution of herbicide resistance and for devising new management strategies. The genome will also be useful as another reference genome in the Compositae. To our knowledge, this article represents the first published draft genome of an agricultural weed. PMID:25209985
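    The scaffold statistic quoted for this assembly (the length at which 50% of the assembly is reached, 33,561 bp) is conventionally called the N50: sort scaffolds from longest to shortest and report the length of the scaffold at which the running total first reaches half of the total assembly size. A minimal Python sketch; the function name is illustrative and the toy lengths below are not horseweed data.

```python
def n50(lengths):
    """Length L such that scaffolds of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:   # reached half of the total assembly size
            return length
    return 0                        # empty input
```

    For example, `n50([2, 3, 4, 5, 6, 7, 8])` returns 6, because 8 + 7 + 6 = 21 is the first running total that reaches half of 35.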

  3. Alignment-Annotator web server: rendering and annotating sequence alignments.

    PubMed

    Gille, Christoph; Fähling, Michael; Weyand, Birgit; Wieland, Thomas; Gille, Andreas

    2014-07-01

    Alignment-Annotator is a novel web service designed to generate interactive views of annotated nucleotide and amino acid sequence alignments (i) de novo and (ii) embedded in other software. All computations are performed on the server side. Interactivity is implemented in HTML5, a language native to web browsers. The alignment is initially displayed using default settings and can be modified with the graphical user interface. For example, individual sequences can be reordered or deleted using drag and drop, amino acid color code schemes can be applied and annotations can be added. Annotations can be made manually or imported (from BioDAS servers, UniProt, the Catalytic Site Atlas and the PDB). Some edits take immediate effect while others require server interaction and may take a few seconds to execute. The final alignment document can be downloaded as a zip archive containing the HTML files. Because HTML is used, the resulting interactive alignment can be viewed on any platform, including Windows, Mac OS X, Linux, Android and iOS, in any standard web browser. Importantly, neither plugins nor Java are required, and therefore Alignment-Annotator represents the first interactive browser-based alignment visualization. http://www.bioinformatics.org/strap/aa/ and http://strap.charite.de/aa/. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  4. Alignment-Annotator web server: rendering and annotating sequence alignments

    PubMed Central

    Gille, Christoph; Fähling, Michael; Weyand, Birgit; Wieland, Thomas; Gille, Andreas

    2014-01-01

    Alignment-Annotator is a novel web service designed to generate interactive views of annotated nucleotide and amino acid sequence alignments (i) de novo and (ii) embedded in other software. All computations are performed on the server side. Interactivity is implemented in HTML5, a language native to web browsers. The alignment is initially displayed using default settings and can be modified with the graphical user interface. For example, individual sequences can be reordered or deleted using drag and drop, amino acid color code schemes can be applied and annotations can be added. Annotations can be made manually or imported (from BioDAS servers, UniProt, the Catalytic Site Atlas and the PDB). Some edits take immediate effect while others require server interaction and may take a few seconds to execute. The final alignment document can be downloaded as a zip archive containing the HTML files. Because HTML is used, the resulting interactive alignment can be viewed on any platform, including Windows, Mac OS X, Linux, Android and iOS, in any standard web browser. Importantly, neither plugins nor Java are required, and therefore Alignment-Annotator represents the first interactive browser-based alignment visualization. Availability: http://www.bioinformatics.org/strap/aa/ and http://strap.charite.de/aa/. PMID:24813445

  5. Magnetic alignment and the Poisson alignment reference system

    NASA Astrophysics Data System (ADS)

    Griffith, L. V.; Schenz, R. F.; Sommargren, G. E.

    1990-08-01

    Three distinct metrological operations are necessary to align a free-electron laser (FEL): the magnetic axis must be located, a straight line reference (SLR) must be generated, and the magnetic axis must be related to the SLR. This article begins with a review of the motivation for developing an alignment system that will assure better than 100-μm accuracy in the alignment of the magnetic axis throughout an FEL. The 100-μm accuracy is an error circle about an ideal axis for 300 m or more. The article describes techniques for identifying the magnetic axes of solenoids, quadrupoles, and wiggler poles. Propagation of a laser beam is described to the extent of revealing sources of nonlinearity in the beam. Development of a straight-line reference based on the Poisson line, a diffraction effect, is described in detail. Spheres in a large-diameter laser beam create Poisson lines and thus provide a necessary mechanism for gauging between the magnetic axis and the SLR. Procedures for installing FEL components and calibrating alignment fiducials to the magnetic axes of the components are also described. The Poisson alignment reference system should be accurate to 25 μm over 300 m, which is believed to be a factor-of-4 improvement over earlier techniques. An error budget shows that only 25% of the total budgeted tolerance is used for the alignment reference system, so the remaining tolerances should fall within the allowable range for FEL alignment.

  6. Testing the tidal alignment model of galaxy intrinsic alignment

    SciTech Connect

    Blazek, Jonathan; Seljak, Uroš; McQuinn, Matthew

    2011-05-01

    Weak gravitational lensing has become a powerful probe of large-scale structure and cosmological parameters. Precision weak lensing measurements require an understanding of the intrinsic alignment of galaxy ellipticities, which can in turn inform models of galaxy formation. It is hypothesized that elliptical galaxies align with the background tidal field and that this alignment mechanism dominates the correlation between ellipticities on cosmological scales (in the absence of lensing). We use recent large-scale structure measurements from the Sloan Digital Sky Survey to test this picture with several statistics: (1) the correlation between ellipticity and galaxy overdensity, w_g+; (2) the intrinsic alignment auto-correlation functions; (3) the correlation functions of curl-free, E, and divergence-free, B, modes, the latter of which is zero in the linear tidal alignment theory; (4) the alignment correlation function, w_g(r_p, θ), a recently developed statistic that generalizes the galaxy correlation function to account for the angle between the galaxy separation vector and the principal axis of ellipticity. We show that recent measurements are largely consistent with the tidal alignment model and discuss dependence on galaxy luminosity. In addition, we show that at linear order the tidal alignment model predicts that the angular dependence of w_g(r_p, θ) is simply w_g+(r_p) cos(2θ) and that this dependence is consistent with recent measurements. We also study how stochastic nonlinear contributions to galaxy ellipticity impact these statistics. We find that a significant fraction of the observed LRG ellipticity can be explained by alignment with the tidal field on scales ≳ 10 h⁻¹ Mpc. These considerations are relevant to galaxy formation and evolution.

  7. Predicted secondary structure for 28S and 18S rRNA from Ichneumonoidea (Insecta: Hymenoptera: Apocrita): impact on sequence alignment and phylogeny estimation.

    PubMed

    Gillespie, Joseph J; Yoder, Matthew J; Wharton, Robert A

    2005-07-01

    We utilize the secondary structural properties of the 28S rRNA D2-D10 expansion segments to hypothesize a multiple sequence alignment for major lineages of the hymenopteran superfamily Ichneumonoidea (Braconidae, Ichneumonidae). The alignment consists of 290 sequences (originally analyzed in Belshaw and Quicke, Syst Biol 51:450-477, 2002) and provides the first global alignment template for this diverse group of insects. Predicted structures for these expansion segments, as well as for over half of the 18S rRNA, are given, with highly variable regions characterized and isolated within conserved structures. We demonstrate several pitfalls of optimization alignment and illustrate how these are potentially addressed with structure-based alignments. Our global alignment is presented online (http://hymenoptera.tamu.edu/rna) with summary statistics, such as base-pair frequency tables, along with novel tools for parsing structure-based alignments into input files for the most commonly used phylogenetic software. These resources will be valuable for hymenopteran systematists, as well as for researchers using rRNA sequences for phylogeny estimation in any taxon. We explore the phylogenetic utility of our structure-based alignment by examining a subset of the data under a variety of optimality criteria, using results from Belshaw and Quicke (2002) as a benchmark.
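
    The abstract mentions tools for parsing structure-based alignments and base-pair frequency tables but does not describe their file format. As a hypothetical sketch, assuming the consensus secondary structure is stored as a dot-bracket string aligned column-for-column with the sequences (with '-' marking gaps), the paired columns and a per-pair frequency table can be extracted with a stack. The names `base_pairs` and `pair_frequencies` are illustrative, not the authors' tools.

```python
from collections import Counter


def base_pairs(structure: str) -> list[tuple[int, int]]:
    """Return 0-based (i, j) column pairs from a dot-bracket string.

    '(' opens a pair, ')' closes the most recently opened one, and any other
    character ('.' for unpaired, '-' for a gap) is skipped.
    """
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {j}")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError(f"unmatched '(' at column {stack[-1]}")
    return sorted(pairs)


def pair_frequencies(seqs: list[str], structure: str) -> dict[tuple[int, int], Counter]:
    """Count base-pair types (e.g. 'GC', 'GU') at each structural pair across aligned sequences.

    Sequences with a gap at either column of a pair are skipped for that pair.
    """
    table: dict[tuple[int, int], Counter] = {}
    for i, j in base_pairs(structure):
        table[(i, j)] = Counter(s[i] + s[j] for s in seqs if s[i] != "-" and s[j] != "-")
    return table
```

    For example, `pair_frequencies(["GGAACC", "GAAAUC"], "((..))")` reports two G-C pairs for the outer stem column pair and one G-C plus one A-U pair for the inner one.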

  8. DR-TAMAS: Diffeomorphic Registration for Tensor Accurate alignMent of Anatomical Structures

    PubMed Central

    Irfanoglu, M. Okan; Nayak, Amritha; Jenkins, Jeffrey; Hutchinson, Elizabeth B.; Sadeghi, Neda; Thomas, Cibu P.; Pierpaoli, Carlo

    2016-01-01

    In this work, we propose DR-TAMAS (Diffeomorphic Registration for Tensor Accurate alignMent of Anatomical Structures), a novel framework for intersubject registration of Diffusion Tensor Imaging (DTI) data sets. This framework is optimized for brain data, and its main goal is to achieve an accurate alignment of all brain structures, including white matter (WM), gray matter (GM), and spaces containing cerebrospinal fluid (CSF). Currently, most DTI-based spatial normalization algorithms emphasize the alignment of anisotropic structures. While some diffusion-derived metrics, such as diffusion anisotropy and tensor eigenvector orientation, are highly informative for proper alignment of WM, other tensor metrics such as the trace or mean diffusivity (MD) are fundamental for a proper alignment of GM and CSF boundaries. Moreover, it is desirable to include information from structural MRI data, e.g., T1-weighted or T2-weighted images, which are usually available together with the diffusion data. The fundamental property of DR-TAMAS is that it achieves global anatomical accuracy by incorporating the locally most informative metrics into its cost function. Another important feature of DR-TAMAS is a symmetric time-varying velocity-based transformation model, which enables it to account for potentially large anatomical variability in healthy subjects and patients. The performance of DR-TAMAS is evaluated with several data sets and compared with other widely used diffeomorphic image registration techniques employing full tensor information and/or DTI-derived scalar maps. Our results show that the proposed method has excellent overall performance in the entire brain, while being equivalent to the best existing methods in WM. PMID:26931817

  9. ANTICALIgN: visualizing, editing and analyzing combined nucleotide and amino acid sequence alignments for combinatorial protein engineering.

    PubMed

    Jarasch, Alexander; Kopp, Melanie; Eggenstein, Evelyn; Richter, Antonia; Gebauer, Michaela; Skerra, Arne

    2016-07-01

    ANTICALIgN is interactive software developed to simultaneously visualize, analyze and modify alignments of DNA and/or protein sequences that arise during combinatorial protein engineering, design and selection. ANTICALIgN combines powerful functions known from currently available sequence analysis tools with unique features for protein engineering, in particular the possibility to display and manipulate nucleotide sequences and their translated amino acid sequences at the same time. ANTICALIgN offers both template-based multiple sequence alignment (MSA), using the unmutated protein as reference, and conventional global alignment, to compare sequences that share an evolutionary relationship. The application of similarity-based clustering algorithms facilitates the identification of duplicates or of conserved sequence features among a set of selected clones. Imported nucleotide sequences from DNA sequence analysis are automatically translated into the corresponding amino acid sequences and displayed, with numerous options for selecting reading frames, highlighting sequence features and adjusting the graphical layout of the MSA. The complexity of the MSA can be reduced by hiding the conserved nucleotide and/or amino acid residues, thus putting emphasis on the relevant mutated positions. ANTICALIgN is also able to handle suppressed stop codons or even to incorporate non-natural amino acids into a coding sequence. We demonstrate crucial functions of ANTICALIgN with an example of Anticalins selected from a lipocalin random library against the fibronectin extradomain B (ED-B), an established marker of tumor vasculature. Apart from engineered protein scaffolds, ANTICALIgN provides a powerful tool in the area of antibody engineering and for directed enzyme evolution. © The Author 2016. Published by Oxford University Press.
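
    Two of the operations described above, translating a nucleotide sequence in a chosen reading frame and hiding residues that match the unmutated template, can be sketched as follows. This is a hypothetical illustration, not ANTICALIgN's code: it assumes the standard genetic code (NCBI table 1), writes stops as '*' and ambiguous codons as 'X', and does not model suppressed stop codons or non-natural amino acids.

```python
# Standard genetic code (NCBI table 1): codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA (or RNA) in reading frame 0, 1 or 2; a trailing partial codon is ignored."""
    seq = dna.upper().replace("U", "T")[frame:]
    codons = (seq[p:p + 3] for p in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def hide_conserved(reference: str, variants: list[str], mask: str = ".") -> list[str]:
    """Replace each residue that is identical to the template at the same column with `mask`,
    so that only mutated positions remain visible."""
    return ["".join(mask if v == r else v for v, r in zip(var, reference)) for var in variants]
```

    For instance, `translate("ATGGCCTAA")` gives `"MA*"`, and masking the variants `["MTK", "MAK"]` against the template `"MAK"` leaves `[".T.", "..."]`, so only the mutated second position of the first clone stays visible.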

  10. BAYESIAN PROTEIN STRUCTURE ALIGNMENT

    PubMed Central

    Rodriguez, Abel; Schmidler, Scott C.

    2015-01-01

    The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. Structure plays a critical role in defining the function of proteins and is more strongly conserved than amino acid sequence over evolutionary timescales. A key challenge is the identification and evaluation of structural similarity between proteins; such analysis can aid in understanding the role of newly discovered proteins and help elucidate evolutionary relationships between organisms. Computational biologists have developed many clever algorithmic techniques for comparing protein structures; however, all are based on heuristic optimization criteria, which makes statistical interpretation somewhat difficult. Here we present a fully probabilistic framework for pairwise structural alignment of proteins. Our approach has several advantages, including the ability to capture alignment uncertainty and to estimate key “gap” parameters, which critically affect the quality of the alignment. We show that several existing alignment methods arise as maximum a posteriori estimates under specific choices of prior distributions and error models. Our probabilistic framework is also easily extended to incorporate additional information, which we demonstrate by including primary sequence information to generate simultaneous sequence–structure alignments that can resolve ambiguities arising from structure alone. This combined model also provides a natural approach for the difficult task of estimating evolutionary distance based on structural alignments. The model is illustrated by comparison with well-established methods on several challenging protein alignment examples. PMID:26925188

  11. Orbit Alignment in Triple Stars

    NASA Astrophysics Data System (ADS)

    Tokovinin, Andrei

    2017-08-01

    The statistics of the angle Φ between orbital angular momenta in hierarchical triple systems with known inner visual or astrometric orbits are studied. A correlation between apparent revolution directions proves the partial orbit alignment known from earlier works. The alignment is strong in triples with outer projected separation less than ∼50 au, where the average Φ is about 20°. In contrast, outer orbits wider than 1000 au are not aligned with the inner orbits. It is established that the orbit alignment decreases with the increasing mass of the primary component. The average eccentricity of inner orbits in well-aligned triples is smaller than in randomly aligned ones. These findings highlight the role of dissipative interactions with gas in defining the orbital architecture of low-mass triple systems. On the other hand, chaotic dynamics apparently played a role in shaping more massive hierarchies. The analysis of projected configurations and triples with known inner and outer orbits indicates that the distribution of Φ is likely bimodal, where 80% of triples have Φ < 70° and the remaining ones are randomly aligned.
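
    The mutual inclination Φ between two orbits is commonly computed from their inclinations i and the position angles Ω of their ascending nodes via cos Φ = cos i₁ cos i₂ + sin i₁ sin i₂ cos(Ω₁ − Ω₂). The sketch below implements that standard relation; it is not code from the paper, and it assumes both node angles are known unambiguously (for purely visual orbits without radial velocities, Ω is only determined modulo 180°, so Φ can have two solutions).

```python
import math


def mutual_inclination(i1: float, i2: float, node1: float, node2: float) -> float:
    """Angle Phi (degrees) between the orbital angular momenta of two orbits.

    i1, i2: inclinations in degrees (0-180; values above 90 mean retrograde motion on the sky).
    node1, node2: position angles of the ascending nodes in degrees.
    Uses cos(Phi) = cos(i1) cos(i2) + sin(i1) sin(i2) cos(node1 - node2).
    """
    r = math.radians
    cos_phi = (math.cos(r(i1)) * math.cos(r(i2))
               + math.sin(r(i1)) * math.sin(r(i2)) * math.cos(r(node1 - node2)))
    # Clamp tiny floating-point overshoots outside [-1, 1] before taking acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_phi))))
```

    Two orbits with identical elements give Φ = 0°, while i₁ = 20° and i₂ = 160° with equal node angles give Φ = 140°, i.e. the orbits revolve in opposite senses on the sky.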

  12. Multi-Harmony: detecting functional specificity from sequence alignment.

    PubMed

    Brandt, Bernd W; Feenstra, K Anton; Heringa, Jaap

    2010-07-01

    Many protein families contain sub-families with functional specialization, such as binding different ligands or being involved in different protein-protein interactions. A small number of amino acids generally determine functional specificity. The identification of these residues can aid the understanding of protein function and help find targets for experimental analysis. Here, we present multi-Harmony, an interactive web server for detecting sub-type-specific sites in proteins starting from a multiple sequence alignment. Combining our Sequence Harmony (SH) and multi-Relief (mR) methods in one web server allows simultaneous analysis and comparison of specificity residues; furthermore, both methods have been significantly improved and extended. SH has been extended to cope with more than two sub-groups. mR has been changed from a sampling implementation to a deterministic one, making it more consistent and user friendly. For both methods, Z-scores are reported. The multi-Harmony web server produces a dynamic output page, which includes interactive connections to the Jalview and Jmol applets, thereby allowing interactive analysis of the results. Multi-Harmony is available at http://www.ibi.vu.nl/programs/shmrwww.

  13. Multi-Harmony: detecting functional specificity from sequence alignment

    PubMed Central

    Brandt, Bernd W.; Feenstra, K. Anton; Heringa, Jaap

    2010-01-01

    Many protein families contain sub-families with functional specialization, such as binding different ligands or being involved in different protein–protein interactions. A small number of amino acids generally determine functional specificity. The identification of these residues can aid the understanding of protein function and help find targets for experimental analysis. Here, we present multi-Harmony, an interactive web server for detecting sub-type-specific sites in proteins starting from a multiple sequence alignment. Combining our Sequence Harmony (SH) and multi-Relief (mR) methods in one web server allows simultaneous analysis and comparison of specificity residues; furthermore, both methods have been significantly improved and extended. SH has been extended to cope with more than two sub-groups. mR has been changed from a sampling implementation to a deterministic one, making it more consistent and user friendly. For both methods, Z-scores are reported. The multi-Harmony web server produces a dynamic output page, which includes interactive connections to the Jalview and Jmol applets, thereby allowing interactive analysis of the results. Multi-Harmony is available at http://www.ibi.vu.nl/programs/shmrwww. PMID:20525785

  14. Recursive dynamic programming for adaptive sequence and structure alignment

    SciTech Connect

    Thiele, R.; Zimmer, R.; Lengauer, T.

    1995-12-31

    We propose a new alignment procedure that can align protein sequences and structures in a unified manner. Recursive dynamic programming (RDP) is a hierarchical method which, on each level of the hierarchy, identifies locally optimal solutions and assembles them into partial alignments of sequences and/or structures. In contrast to classical dynamic programming, RDP can also handle alignment problems that use objective functions not obeying the principle of prefix optimality, e.g. scoring schemes derived from energy potentials of mean force. For such alignment problems, RDP aims at computing solutions that are both near-optimal with respect to the cost function involved and biologically meaningful. Towards this goal, RDP maintains a dynamic balance between different factors governing alignment fitness, such as evolutionary relationships and structural preferences. Because gaps are not scored explicitly in the RDP method, the problematic choice of gap cost parameters is circumvented. To evaluate the RDP approach, we analyse whether known and accepted multiple alignments based on structural information can be reproduced with the RDP method.

  15. Aligning for Innovation - Alignment Strategy to Drive Innovation

    NASA Technical Reports Server (NTRS)

    Johnson, Hurel; Teltschik, David; Bussey, Horace, Jr.; Moy, James

    2010-01-01

    With the sudden need for innovation that will help the country achieve its long-term space exploration objectives, it is worth asking whether NASA is aligned effectively to drive the innovation it so urgently needs to take space exploration to the next level. Authors such as Robert Kaplan and David Norton have noted that companies that use a formal system for implementing strategy consistently outperform their peers. They have outlined a six-stage management systems model for implementing strategy, which includes aligning the organization towards its objectives from the top down. This presentation will explore the impacts of existing U.S. industrial policy on technological innovation; assess the current NASA organizational alignment and its impacts on driving technological innovation; and finally suggest an alternative approach that may drive the innovation needed to take the world to the next level of space exploration, with NASA truly leading the way.

  17. QOMA: quasi-optimal multiple alignment of protein sequences.

    PubMed

    Zhang, Xu; Kahveci, Tamer

    2007-01-15

    We consider the problem of multiple alignment of protein sequences with the goal of achieving a large SP (Sum-of-Pairs) score. We introduce a new graph-based method, QOMA (Quasi-Optimal Multiple Alignment). QOMA starts with an initial alignment and represents it using a K-partite graph. It then improves the SP score of the initial alignment through local optimizations within a window that moves greedily along the alignment. QOMA uses two parameters to permit flexibility in the time/accuracy trade-off: (1) the size of the window for local optimization; and (2) the sparsity of the K-partite graph. Unlike traditional progressive methods, QOMA is independent of the order of the sequences. Experimental results on BAliBASE benchmarks show that QOMA produces higher SP scores than existing tools, including ClustalW, ProbCons, MUSCLE, T-Coffee and DCA. The difference is larger for distantly related proteins. The software is available from the authors upon request.
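
    The SP score that QOMA maximizes sums a pairwise score over every pair of rows in every column. A minimal sketch follows, assuming a simple hypothetical scheme (match +1, mismatch −1, residue against gap −2, gap against gap 0) rather than the substitution matrix and gap model the authors actually use. The optional `start`/`end` window only mirrors the idea of scoring a local region; it is not QOMA's K-partite graph optimization.

```python
from itertools import combinations
from typing import Optional


def pair_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score one pair of characters in a column; a gap facing a gap contributes 0."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sp_score(msa: list[str], start: int = 0, end: Optional[int] = None) -> int:
    """Sum-of-Pairs score over columns start..end-1 (the whole alignment by default)."""
    if len({len(row) for row in msa}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    end = len(msa[0]) if end is None else end
    return sum(
        pair_score(x[c], y[c])
        for x, y in combinations(msa, 2)  # every unordered pair of sequences
        for c in range(start, end)
    )
```

    Because each column's contribution is independent of the others, a local change inside a window only alters that window's share of the total, which is what makes window-restricted improvement of the SP score practical.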

  18. Fusion bonding and alignment fixture

    DOEpatents

    Ackler, Harold D.; Swierkowski, Stefan P.; Tarte, Lisa A.; Hicks, Randall K.

    2000-01-01

    An improved vacuum fusion bonding structure and process for the aligned bonding of large-area glass plates, patterned with microchannels and access holes and slots, at elevated glass fusion temperatures. All components are evacuated through the bottom platform, which yields an untouched, defect-free top surface that greatly improves optical access through this smooth surface. In addition, a completely non-adherent interlayer, such as graphite, with alignment and location features, is placed between the main steel platform and the glass plate pair; this greatly improves quality, yield, and ease of use, and enables aligned bonding of very large glass structures.

  19. Alignment and nonlinear elasticity in biopolymer gels.

    PubMed

    Feng, Jingchen; Levine, Herbert; Mao, Xiaoming; Sander, Leonard M

    2015-04-01

    We present a Landau-type theory for the nonlinear elasticity of biopolymer gels with a part of the order parameter describing induced nematic order of fibers in the gel. We attribute the nonlinear elastic behavior of these materials to fiber alignment induced by strain. We suggest an application to contact guidance of cell motility in tissue. We compare our theory to simulation of a disordered lattice model for biopolymers. We treat homogeneous deformations such as simple shear, hydrostatic expansion, and simple extension, and obtain good agreement between theory and simulation. We also consider a localized perturbation which is a simple model for a contracting cell in a medium.

  20. Hardware accelerator for genomic sequence alignment.

    PubMed

    Chiang, Jason; Studniberg, Michael; Shaw, Jack; Seto, Shaw; Truong, Kevin

    2006-01-01

    To infer homology and, subsequently, gene function, the Smith-Waterman algorithm is used to find the optimal local alignment between two sequences. When searching sequence databases that may contain billions of sequences, this algorithm becomes computationally expensive. In this paper, we therefore focused on accelerating the Smith-Waterman algorithm by implementing its computationally repeated portion as custom FPGA hardware instructions. These simple modifications accelerated the algorithm runtime by an average of 287% compared to the pure software implementation. Further design of FPGA-accelerated hardware therefore offers a promising direction for improving the runtime of genomic database searches.
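
    For reference, the score-only Smith-Waterman recurrence with a linear gap penalty can be sketched as below. The scoring parameters are illustrative assumptions, not those used in the paper. The four-way maximum in the inner loop is evaluated once per cell, len(a) × len(b) times, which is the kind of repeated kernel that the authors moved into FPGA custom instructions (the abstract does not detail their exact hardware partitioning).

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a and b (linear gap penalty, score only)."""
    prev = [0] * (len(b) + 1)  # row i-1 of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: cells are floored at 0 so an alignment can start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

    Keeping only two rows reduces memory to O(len(b)), which suits database scanning where only the best score is needed; recovering the alignment itself would require the full matrix or a traceback scheme.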

  1. Magnetic axis alignment and the Poisson alignment reference system

    NASA Astrophysics Data System (ADS)

    Griffith, Lee V.; Schenz, Richard F.; Sommargren, Gary E.

    1989-01-01

    Three distinct metrological operations are necessary to align a free-electron laser (FEL): the magnetic axis must be located, a straight line reference (SLR) must be generated, and the magnetic axis must be related to the SLR. This paper begins with a review of the motivation for developing an alignment system that will assure better than 100 micrometer accuracy in the alignment of the magnetic axis throughout an FEL. The paper describes techniques for identifying the magnetic axis of solenoids, quadrupoles, and wiggler poles. Propagation of a laser beam is described to the extent of revealing sources of nonlinearity in the beam. Development and use of the Poisson line, a diffraction effect, is described in detail. Spheres in a large-diameter laser beam create Poisson lines and thus provide a necessary mechanism for gauging between the magnetic axis and the SLR. Procedures for installing FEL components and calibrating alignment fiducials to the magnetic axes of the components are also described. An error budget shows that the Poisson alignment reference system will make it possible to meet the alignment tolerances for an FEL.

  2. Progressive alignment of genomic signals by multiple dynamic time warping.

    PubMed

    Skutkova, Helena; Vitek, Martin; Sedlar, Karel; Provaznik, Ivo

    2015-11-21

    This paper presents the use of the progressive alignment principle for the positional adjustment of a set of genomic signals with different lengths. The new method of multiple signal alignment, based on dynamic time warping, is tested for the purpose of evaluating the similarity of genes of different lengths in phylogenetic studies. Two sets of phylogenetic markers were used to demonstrate the effectiveness of the evaluation of intraspecies and interspecies genetic variability. Part of the proposed method is a modification of the pairwise alignment of two signals by dynamic time warping, using correlation in a sliding window. Correlation-based dynamic time warping allows more accurate alignment, dependent on local homologies in the sequences, without the need for a scoring matrix or evolutionary models, because the mutual similarities of residues are included in the numerical code of the signals.
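
    The pairwise building block can be illustrated with classic dynamic time warping, using the absolute difference of samples as the local cost. This is a simplified sketch, not the authors' correlation-in-a-sliding-window variant. The `dna_walk` encoding (a cumulative purine/pyrimidine walk) is a hypothetical example of turning a sequence into a numerical signal and is not necessarily the encoding used in the paper.

```python
def dna_walk(seq: str) -> list[float]:
    """Hypothetical genomic signal: cumulative sum of +1 for purines (A, G), -1 for pyrimidines (C, T).

    Any other symbol (e.g. N) contributes 0.
    """
    steps = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}
    total, signal = 0.0, []
    for base in seq.upper():
        total += steps.get(base, 0.0)
        signal.append(total)
    return signal


def dtw_distance(x: list[float], y: list[float]) -> float:
    """Classic dynamic time warping cost between signals x and y (|x_i - y_j| local cost)."""
    inf = float("inf")
    n, m = len(x), len(y)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest path: advance both signals, or repeat a sample of one of them.
            d[i][j] = cost + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])
    return d[n][m]
```

    Because a sample may be matched to several samples of the other signal, DTW tolerates the length differences between genes that motivate the method; a progressive multiple alignment would then apply such pairwise alignments along a guide order.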

  3. Laser beam alignment apparatus and method

    DOEpatents

    Gruhn, Charles R.; Hammond, Robert B.

    1981-01-01

    The disclosure relates to an apparatus and method for laser beam alignment. Thermoelectric properties of a disc in a laser beam path are used to provide an indication of beam alignment and/or automatic laser alignment.

  4. Laser beam alignment apparatus and method

    DOEpatents

    Gruhn, C.R.; Hammond, R.B.

    The disclosure relates to an apparatus and method for laser beam alignment. Thermoelectric properties of a disc in a laser beam path are used to provide an indication of beam alignment and/or automatic laser alignment.

  5. Visual attitude orientation and alignment system

    NASA Technical Reports Server (NTRS)

    Beam, R. A.; Morris, D. B.

    1967-01-01

    Active vehicle optical alignment aid and a passive vehicle three-dimensional alignment target ensure proper orientation and alignment plus control of the closure range and rate between two bodies, one in controlled motion and one at rest.

  6. Protein structure alignment beyond spatial proximity.

    PubMed

    Wang, Sheng; Ma, Jianzhu; Peng, Jian; Xu, Jinbo

    2013-01-01

    Protein structure alignment is a fundamental problem in computational structural biology. Many programs have been developed for automatic protein structure alignment, but most of them align two protein structures purely on the basis of geometric similarity, without considering evolutionary and functional relationships. As such, these programs may generate structure alignments that are not very biologically meaningful from an evolutionary perspective. This paper presents DeepAlign, a novel method for automatic pairwise protein structure alignment. DeepAlign aligns two protein structures using not only the spatial proximity of equivalent residues (after rigid-body superposition), but also evolutionary relationship and hydrogen-bonding similarity. Experimental results show that DeepAlign can generate structure alignments much more consistent with manually curated alignments than other automatic tools, especially when the proteins under consideration are remote homologs. These results imply that, in addition to geometric similarity, evolutionary information and hydrogen-bonding similarity are essential to aligning two protein structures.

  7. Theory of grain alignment in molecular clouds

    NASA Technical Reports Server (NTRS)

    Roberge, Wayne G.

    1993-01-01

    Research accomplishments are presented and include the following: (1) mathematical theory of grain alignment; (2) super-paramagnetic alignment of molecular cloud grains; and (3) theory of grain alignment by ambipolar diffusion.

  8. Net2Align: An Algorithm For Pairwise Global Alignment of Biological Networks

    PubMed Central

    Wadhwa, Gulshan; Upadhyaya, K. C.

    2016-01-01

    The amount of data on molecular interactions is growing at an enormous pace, whereas the development of methods for analysing these data is still lagging behind. This is a particular problem for the comparative analysis of biological networks, where one wishes to explore the similarity between two biological networks. Because biological function operates primarily at the network level, robust comparison methods are needed. In this paper, we describe Net2Align, an algorithm for pairwise global alignment that takes both node-to-node and edge-to-edge correspondences into consideration. The uniqueness of our algorithm lies in the fact that it is also able to detect the type of interaction, which is essential in the case of directed graphs. The existing algorithm is only able to identify the common nodes but not the common edges. Another striking feature of the algorithm is that it is able to remove duplicate entries when variable datasets are being aligned. This is achieved by creating a local database that helps exclude duplicate links. In an extensive computational study on a gene regulatory network, we establish that our algorithm outperforms its counterparts. Net2Align has been implemented in Java 7, and the source code is available as supplementary files. PMID:28356678
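
    Edge-to-edge correspondence on directed, typed networks can be illustrated as follows: given a node mapping between two networks, an edge counts as conserved only if its image exists in the other network with the same direction and the same interaction type. This is a hypothetical sketch; the abstract does not say how Net2Align computes the node mapping or stores its local database, so here the mapping is simply given and duplicate edges are removed by converting edge lists to sets.

```python
Edge = tuple[str, str, str]  # (source, target, interaction type) of a directed edge


def conserved_edges(edges1: list[Edge], edges2: list[Edge], mapping: dict[str, str]) -> set[Edge]:
    """Edges of network 1 whose image under `mapping` also occurs in network 2
    with the same direction and the same interaction type.

    Both edge lists are converted to sets first, so duplicate entries are ignored.
    """
    target = set(edges2)
    conserved: set[Edge] = set()
    for src, dst, kind in set(edges1):
        if src in mapping and dst in mapping and (mapping[src], mapping[dst], kind) in target:
            conserved.add((src, dst, kind))
    return conserved
```

    For example, with the mapping a→x, b→y, c→z, the edge (a, b, activates) is conserved if (x, y, activates) exists, but (b, c, represses) is not conserved when the corresponding edge (y, z) is an activation, and a reversed edge (y, x) never matches (x, y).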

  9. Fixture for aligning motor assembly

    SciTech Connect

    Shervington, Roger M.; Vaghani, Vallabh V.; Vanek, Laurence D.; Christensen, Scott A.

    2009-12-08

    An alignment fixture includes a rotor fixture, a stator fixture and a sensor system which measures a rotational displacement therebetween. The fixture precisely measures rotation of a generator stator assembly away from a NULL position referenced by a unique reference spline on the rotor shaft. By providing an adjustable location of the stator assembly within the housing, the magnetic axes within each generator shall be aligned to a predetermined and controlled tolerance between the generator interface mounting pin and the reference spline on the rotor shaft. Once magnetically aligned, each generator is essentially a line replaceable unit which may be readily mounted to any input of a multi-generator gearbox assembly with the assurance that the magnetic alignment will be within a predetermined tolerance.

  10. Stellar Alignments - Identification and Analysis

    NASA Astrophysics Data System (ADS)

    Ruggles, Clive L. N.

    Fortuitous stellar alignments can be fitted to structural orientations with relative ease by the unwary. Nonetheless, cautious approaches taking into account a broader range of cultural evidence, as well as paying due attention to potential methodological pitfalls, have been successful in identifying credible stellar alignments—and constructing plausible assessments of their cultural significance—in a variety of circumstances. These range from single instances of alignments upon particular asterisms where the corroborating historical or ethnographic evidence is strong to repeated instances of oriented structures with only limited independent cultural information but where systematic, data-driven approaches can be productive. In the majority of cases, the identification and interpretation of putative stellar alignments relates to groups of similar monuments or complex single sites and involves a balance between systematic studies of the alignments themselves, backed up by statistical analysis where appropriate, and the consideration of a range of contextual evidence, either derived from the archaeological record alone or from other relevant sources.

  11. RF Jitter Modulation Alignment Sensing

    NASA Astrophysics Data System (ADS)

    Ortega, L. F.; Fulda, P.; Diaz-Ortiz, M.; Perez Sanchez, G.; Ciani, G.; Voss, D.; Mueller, G.; Tanner, D. B.

    2017-01-01

    We will present the numerical and experimental results of a new alignment sensing scheme which can reduce the complexity of alignment sensing systems currently used, while maintaining the same shot noise limited sensitivity. This scheme relies on the ability of electro-optic beam deflectors to create angular modulation sidebands in radio frequency, and needs only a single-element photodiode and IQ demodulation to generate error signals for tilt and translation degrees of freedom in one dimension. It distances itself from current techniques by eliminating the need for beam centering servo systems, quadrant photodetectors and Gouy phase telescopes. RF Jitter alignment sensing can be used to reduce the complexity in the alignment systems of many laser optical experiments, including LIGO and the ALPS experiment.
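
    The IQ demodulation step mentioned above can be illustrated generically: mixing a real signal with cosine and sine references at the modulation frequency and averaging (a crude low-pass filter) yields in-phase and quadrature components I = A cos φ and Q = A sin φ. This is a textbook sketch under simplifying assumptions (a single noise-free tone, a record spanning a whole number of periods); how the paper maps I and Q onto tilt and translation error signals is specific to their scheme and is not shown.

```python
import math


def iq_demodulate(samples: list[float], freq: float, rate: float) -> tuple[float, float]:
    """Recover (amplitude, phase in radians) of a tone A*cos(2*pi*freq*t + phi) from real samples.

    Assumes the record spans a whole number of periods, so averaging the
    mixed products removes the 2*freq terms exactly.
    """
    n = len(samples)
    i_sum = q_sum = 0.0
    for k, s in enumerate(samples):
        wt = 2 * math.pi * freq * k / rate
        i_sum += s * math.cos(wt)   # mix with the in-phase reference
        q_sum -= s * math.sin(wt)   # mix with the quadrature reference
    i, q = 2 * i_sum / n, 2 * q_sum / n  # I = A cos(phi), Q = A sin(phi)
    return math.hypot(i, q), math.atan2(q, i)
```

    With a 10 Hz tone sampled at 1 kHz for 1000 samples (ten full periods), the function returns the input amplitude and phase to within floating-point rounding.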

  12. Webb Instrument Undergoes Alignment Testing

    NASA Image and Video Library

    2011-08-18

    The Mid-Infrared Instrument, a component of NASA's James Webb Space Telescope, underwent alignment testing at the Science and Technology Facilities Council's Rutherford Appleton Laboratory Space in Oxfordshire, England.

  13. Well-pump alignment system

    DOEpatents

    Drumheller, D.S.

    1998-10-20

    An improved well-pump for geothermal wells, an alignment system for a well-pump, and a method for aligning a rotor and stator within a well-pump are disclosed. The well-pump has a whistle assembly formed at its bottom portion, so that variations in the frequency of the whistle, which indicate misalignment, may be monitored during pumping. 6 figs.

  14. National Ignition Facility system alignment.

    PubMed

    Burkhart, S C; Bliss, E; Di Nicola, P; Kalantar, D; Lowe-Webb, R; McCarville, T; Nelson, D; Salmon, T; Schindler, T; Villanueva, J; Wilhelmsen, K

    2011-03-10

    The National Ignition Facility (NIF) is the world's largest optical instrument, comprising 192 beams, each 37 cm square and generating up to 9.6 kJ of 351 nm laser light in a 20 ns pulse precisely tailored in time and spectrum. The Facility houses a massive (10 m diameter) target chamber within which the beams converge onto an ∼1 cm target for the purpose of creating the conditions needed for deuterium/tritium nuclear fusion in a laboratory setting. A formidable challenge was building NIF to the precise requirements for beam propagation, commissioning the beam lines, and engineering systems to reliably and safely align 192 beams within the confines of a multihour shot cycle. Designing the facility to minimize drift and vibration, placing the optical components in their design locations, commissioning beam alignment, and performing precise system alignment are the key alignment accomplishments over the decade of work described herein. The design and positioning phases placed more than 3000 large (2.5 m × 2 m × 1 m) line-replaceable optics assemblies to within ±1 mm of the design requirement. The commissioning and alignment phases validated clear apertures (no clipping) for all beam lines, and demonstrated automated laser alignment within 10 min and alignment to target chamber center within 44 min. Pointing validation system shots to flat gold-plated x-ray-emitting targets showed that NIF met its design requirement of ±50 μm rms beam pointing to the target chamber. Finally, this paper describes the major alignment challenges faced by the NIF Project from inception to the present, and how these challenges were met and solved by the NIF design and commissioning teams.
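
    As a rough worked check on the figures quoted above, assuming purely for illustration that all 192 beams simultaneously deliver the stated maximum of 9.6 kJ and that the energy is spread evenly over the full 20 ns, the total energy and time-averaged power are:

```latex
E_{\max} = 192 \times 9.6\ \text{kJ} = 1843.2\ \text{kJ} \approx 1.84\ \text{MJ},
\qquad
\bar{P} \approx \frac{1.84 \times 10^{6}\ \text{J}}{20 \times 10^{-9}\ \text{s}} \approx 9.2 \times 10^{13}\ \text{W} \approx 92\ \text{TW}.
```

    Because the pulses are tailored in time, the actual peak power differs from this average, and the energy per beam in a given shot may be below the 9.6 kJ maximum.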

  15. Automated whole-genome multiple alignment of rat, mouse, and human

    SciTech Connect

    Brudno, Michael; Poliakov, Alexander; Salamov, Asaf; Cooper, Gregory M.; Sidow, Arend; Rubin, Edward M.; Solovyev, Victor; Batzoglou, Serafim; Dubchak, Inna

    2004-07-04

    We have built a whole-genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline that combines the local/global approach of the Berkeley Genome Pipeline and the LAGAN program. The strategy is based on progressive alignment and consists of two main steps: (1) alignment of the mouse and rat genomes; and (2) alignment of human to either the mouse-rat alignments from step 1 or the remaining unaligned mouse and rat sequences. The resulting alignments demonstrate high sensitivity, with 87% of all human gene-coding areas aligned in both mouse and rat. The specificity is also high: <7% of the rat contigs are aligned to multiple places in human, and 97% of all alignments with human sequence > 100 kb agree with a three-way synteny map built independently using predicted exons in the three genomes. At the nucleotide level, <1% of the rat nucleotides are mapped to multiple places in the human sequence in the alignment, and 96.5% of human nucleotides within all alignments agree with the synteny map. The alignments are publicly available online, with visualization through the novel Multi-VISTA browser that we also present.

  16. Binocular collimation vs conditional alignment

    NASA Astrophysics Data System (ADS)

    Cook, William J.

    2012-10-01

    As binocular enthusiasts share their passion, topics related to collimation abound. Typically, we read how observers, armed only with a jeweler's screwdriver, can "perfectly collimate" their binoculars, make them "spot on," or achieve something described in similar terms. Unfortunately, what most are addressing is a form of pseudo-collimation that I have called "Conditional Alignment" since the mid-1970s. Because it ignores the importance of the mechanical axis (hinge) in the alignment process, this "condition," although it can make alignment serviceable or even outstanding within a small range of IPD (interpupillary distance) settings relative to the user's spatial accommodation (the ability to accept small errors in parallelism of the optical axes), may take the instrument farther from the 3-axis collimation that conscientious manufacturers seek to implement. As consumers become more optically savvy, and especially with so many mechanically inferior binoculars entering the marketplace, anyone contemplating self-repair and alignment needs to understand the difference between clinical, 3-axis "collimation" (meaning both optical axes are parallel with the axis of the hinge) and "conditional alignment," as differentiated in this paper. Furthermore, I believe there has long been a need for the term "Conditional Alignment," or some equivalent, to be accepted into the vernacular of those who use binoculars extensively, whether for professional or recreational activities. Achieving that acceptance is the aim of this paper.

  17. Projection-Based Volume Alignment

    PubMed Central

    Yu, Lingbo; Snapp, Robert R.; Ruiz, Teresa; Radermacher, Michael

    2013-01-01

    When heterogeneous samples of macromolecular assemblies are examined by 3D electron microscopy (3DEM), multiple reconstructions are often obtained. For example, subtomograms of individual particles can be acquired from tomography, or volumes of multiple 2D classes can be obtained by random conical tilt reconstruction. Similar volumes among these can be averaged to achieve higher resolution. Volume alignment is an essential step before 3D classification and averaging. Here we present a projection-based volume alignment (PBVA) algorithm. We select a set of projections to represent the reference volume and align them to a second volume. Projection alignment is achieved by maximizing the cross-correlation function with respect to rotation and translation parameters. If data are missing, the cross-correlation functions are normalized accordingly. Accurate alignments are obtained by averaging and quadratic interpolation of the cross-correlation maximum. Comparisons of computation time show that PBVA is faster than traditional 3D cross-correlation methods. Performance tests were carried out with different signal-to-noise ratios using modeled noise and with different percentages of missing data using a cryo-EM dataset. All tests show that the algorithm is robust and highly accurate. PBVA was applied to align the reconstructions of a subcomplex of the NADH:ubiquinone oxidoreductase (Complex I) from the yeast Yarrowia lipolytica, followed by classification and averaging. PMID:23410725
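    The quadratic-interpolation part of the refinement step ("quadratic interpolation of the cross-correlation maximum") can be illustrated in one dimension. This is a minimal sketch, not the PBVA implementation. It assumes two equal-length, periodic 1D signals, computes their circular cross-correlation directly (an FFT would be used in practice), and refines the integer peak by fitting a parabola through the peak and its two neighbours. The function names are hypothetical.

```python
import math

def circular_xcorr(ref, sig):
    """c[k] = sum_i ref[i] * sig[(i + k) % n]; c peaks at the shift k that maps ref onto sig."""
    n = len(ref)
    return [sum(ref[i] * sig[(i + k) % n] for i in range(n)) for k in range(n)]

def subsample_peak(c):
    """Integer argmax of c, refined by the vertex of a parabola through the peak and its neighbours."""
    n = len(c)
    k = max(range(n), key=lambda i: c[i])
    y0, y1, y2 = c[(k - 1) % n], c[k], c[(k + 1) % n]
    denom = y0 - 2 * y1 + y2
    # The parabola through (-1, y0), (0, y1), (1, y2) has its vertex at 0.5 * (y0 - y2) / denom.
    offset = 0.0 if denom == 0 else 0.5 * (y0 - y2) / denom
    return k + offset

# Usage: a Gaussian bump and a copy shifted by 5.3 samples.
n = 64
ref = [math.exp(-((i - 20) ** 2) / 18.0) for i in range(n)]
sig = [math.exp(-((i - 25.3) ** 2) / 18.0) for i in range(n)]
shift = subsample_peak(circular_xcorr(ref, sig))  # close to 5.3
```

The integer peak alone would report a shift of 5. The parabola recovers the fractional part to within a few thousandths of a sample for this smooth peak.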

  18. Calibration of shaft alignment instruments

    NASA Astrophysics Data System (ADS)

    Hemming, Bjorn

    1998-09-01

    Correct shaft alignment is vital for most rotating machines. Several shaft alignment instruments, ranging from dial-indicator-based to laser-based, are commercially available. At VTT Manufacturing Technology, a device for calibrating shaft alignment instruments was developed during 1997. A feature of the device is its similarity to the typical use of shaft alignment instruments, i.e., the two shafts are rotated during calibration. The benefit of the rotation is that all errors of the shaft alignment instrument, for example the deformations of the suspension bars, are included. However, the rotation significantly increases the calibration uncertainty because of errors in the suspension of the shafts in the calibration device. Without rotation, the calibration uncertainty is 0.001 mm for the parallel offset scale and 0.003 mm/m for the angular scale. With rotation, it is 0.002 mm for the parallel offset scale and 0.004 mm/m for the angular scale.
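    Suppose the with-rotation figures are read as the static uncertainty combined in quadrature (root-sum-square, the usual rule for uncorrelated contributions) with an independent rotation-induced term. The size of that term can then be backed out. The abstract does not state how the uncertainty budget was combined, so this is an assumption. The resulting numbers are an illustration, not values reported by the author. The helper name is hypothetical.

```python
import math

def implied_component(total, baseline):
    """Uncertainty term that, combined in quadrature with `baseline`, yields `total` (assumes independence)."""
    if total < baseline:
        raise ValueError("total must not be smaller than baseline")
    return math.sqrt(total ** 2 - baseline ** 2)

# Calibration uncertainties quoted in the abstract: (with rotation, without rotation).
parallel = implied_component(0.002, 0.001)  # mm,   about 0.0017
angular = implied_component(0.004, 0.003)   # mm/m, about 0.0026
```

Under this reading, the rotation term would be larger than the static term on both scales, consistent with the statement that rotation "significantly increases" the uncertainty.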

  19. Parameterizing sequence alignment with an explicit evolutionary model.

    PubMed

    Rivas, Elena; Eddy, Sean R

    2015-12-10

    Inference of sequence homology is inherently an evolutionary question, dependent upon evolutionary divergence. However, the insertion and deletion penalties in the most widely used methods for inferring homology by sequence alignment, including BLAST and profile hidden Markov models (profile HMMs), are not based on any explicitly time-dependent evolutionary model. Using one fixed score system (BLOSUM62 with some gap open/extend costs, for example) corresponds to making an unrealistic assumption that all sequence relationships have diverged by the same time. Adoption of explicit time-dependent evolutionary models for scoring insertions and deletions in sequence alignments has been hindered by algorithmic complexity and technical difficulty. We identify and implement several probabilistic evolutionary models compatible with the affine-cost insertion/deletion model used in standard pairwise sequence alignment. Assuming an affine gap cost imposes important restrictions on the realism of the evolutionary models compatible with it, as single insertion events with geometrically distributed lengths do not result in geometrically distributed insert lengths at finite times. Nevertheless, we identify one evolutionary model compatible with symmetric pair HMMs that are the basis for Smith-Waterman pairwise alignment, and two evolutionary models compatible with standard profile-based alignment. We test different aspects of the performance of these "optimized branch length" models, including alignment accuracy and homology coverage (discrimination of residues in a homologous region from nonhomologous flanking residues). We test on benchmarks of both global homologies (full length sequence homologs) and local homologies (homologous subsequences embedded in nonhomologous sequence). Contrary to our expectations, we find that for global homologies a single long branch parameterization suffices both for distant and close homologous relationships. In contrast, we do see an advantage in
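    The affine-cost insertion/deletion model referred to above can be made concrete with a short sketch. It computes a Smith-Waterman local alignment score with affine gaps using Gotoh's three-state recursion, in which a gap of length k costs gap_open + (k - 1) * gap_extend. The match/mismatch scores and gap costs in the usage example are arbitrary placeholders. They are not BLOSUM62 or the evolutionary parameterizations studied in the paper. In a pair HMM, the same affine structure corresponds to a geometric gap-length distribution, which is why the abstract discusses geometrically distributed insert lengths.

```python
def smith_waterman_affine(a, b, score, gap_open, gap_extend):
    """Best local alignment score of a and b with affine gaps (Gotoh's recursion).

    score(x, y) scores aligning residue x with residue y; a gap of length k
    costs gap_open + (k - 1) * gap_extend.
    """
    NEG = float("-inf")
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]  # best local score ending at (i, j)
    E = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends with b[j-1] opposite a gap
    F = [[NEG] * (m + 1) for _ in range(n + 1)]  # ends with a[i-1] opposite a gap
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            H[i][j] = max(0.0, H[i - 1][j - 1] + score(a[i - 1], b[j - 1]), E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best

def match_score(x, y):
    return 2 if x == y else -1

# AAAGGGTTT / AAA---TTT: 6 matches (+12) minus one length-3 gap (3 + 1 + 1) = 7
best = smith_waterman_affine("AAAGGGTTT", "AAATTT", match_score, gap_open=3, gap_extend=1)
```

With a much larger opening cost (for example gap_open=10), the gapped alignment no longer pays off, and the best local alignment shrinks to an ungapped block of three matches (score 6).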

  20. ConBind: motif-aware cross-species alignment for the identification of functional transcription factor binding sites.

    PubMed

    Lelieveld, Stefan H; Schütte, Judith; Dijkstra, Maurits J J; Bawono, Punto; Kinston, Sarah J; Göttgens, Berthold; Heringa, Jaap; Bonzanni, Nicola

    2016-05-05

    Eukaryotic gene expression is regulated by transcription factors (TFs) binding to promoters as well as distal enhancers. TFs recognize short but specific binding sites (TFBSs) located within promoter and enhancer regions. Functionally relevant TFBSs are often highly conserved during evolution, leaving a strong phylogenetic signal. While multiple sequence alignment (MSA) is a potent tool for detecting this phylogenetic signal, current MSA implementations are optimized to align the maximum number of identical nucleotides. This approach might result in the omission of conserved motifs that contain interchangeable nucleotides, such as the ETS motif (IUPAC code: GGAW). Here, we introduce ConBind, a novel method to enhance alignment of short motifs, even if their mutual sequence similarity is only partial. ConBind improves the identification of conserved TFBSs by improving the alignment accuracy of TFBS families within orthologous DNA sequences. Functional validation of the Gfi1b +13 enhancer reveals that ConBind identifies additional functionally important ETS binding sites that were missed by all other tested alignment tools. In addition to the analysis of known regulatory regions, our web tool is useful for analyzing TFBSs in as-yet uncharacterized DNA regions identified through ChIP-sequencing. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
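    In IUPAC notation, the ETS core motif GGAW stands for both GGAA and GGAT, because W denotes A or T. The sketch below is not part of ConBind, which additionally aligns motif hits across orthologous sequences. It converts an IUPAC motif into a regular expression and reports every forward-strand match, including overlapping ones. The reverse strand is not scanned, and the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def iupac_to_regex(motif):
    """Turn e.g. 'GGAW' into '[G][G][A][AT]'."""
    return "".join(f"[{IUPAC[c]}]" for c in motif.upper())

def find_motif(seq, motif):
    """0-based start positions of forward-strand matches; the lookahead allows overlaps."""
    pattern = re.compile(f"(?={iupac_to_regex(motif)})")
    return [m.start() for m in pattern.finditer(seq.upper())]

hits = find_motif("TTGGAAGGATCGGAT", "GGAW")  # GGAA at 2, GGAT at 6 and 11
```

An aligner that maximizes identical nucleotides would treat the GGAA and GGAT sites as a mismatch, even though both satisfy the same motif.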

  1. ConBind: motif-aware cross-species alignment for the identification of functional transcription factor binding sites

    PubMed Central

    Lelieveld, Stefan H.; Schütte, Judith; Dijkstra, Maurits J.J.; Bawono, Punto; Kinston, Sarah J.; Göttgens, Berthold; Heringa, Jaap; Bonzanni, Nicola

    2016-01-01

    Eukaryotic gene expression is regulated by transcription factors (TFs) binding to promoters as well as distal enhancers. TFs recognize short but specific binding sites (TFBSs) located within promoter and enhancer regions. Functionally relevant TFBSs are often highly conserved during evolution, leaving a strong phylogenetic signal. While multiple sequence alignment (MSA) is a potent tool for detecting this phylogenetic signal, current MSA implementations are optimized to align the maximum number of identical nucleotides. This approach might result in the omission of conserved motifs that contain interchangeable nucleotides, such as the ETS motif (IUPAC code: GGAW). Here, we introduce ConBind, a novel method to enhance alignment of short motifs, even if their mutual sequence similarity is only partial. ConBind improves the identification of conserved TFBSs by improving the alignment accuracy of TFBS families within orthologous DNA sequences. Functional validation of the Gfi1b +13 enhancer reveals that ConBind identifies additional functionally important ETS binding sites that were missed by all other tested alignment tools. In addition to the analysis of known regulatory regions, our web tool is useful for analyzing TFBSs in as-yet uncharacterized DNA regions identified through ChIP-sequencing. PMID:26721389

  2. Cavity alignment using fringe scanning

    NASA Astrophysics Data System (ADS)

    Sinkunaite, Laura Paulina; Kawabe, Keita; Landry, Michael

    2017-01-01

    LIGO employs two 4-km-long Fabry-Pérot arm cavities, which must be aligned for the interferometer to lock on the TEM00 mode. Once a cavity is locked, alignment signals can be derived from wave-front sensors, which measure the TEM01 mode content. However, the alignment is not always good enough for locking on TEM00. Even then, the alignment can be evaluated using a free-swinging cavity, which shows flashes when higher-order modes become resonant. Moving the test masses makes small changes to the mirror orientation, so the TEM00 mode can be optimized iteratively. This is currently a manual and therefore very time-consuming procedure, so this project aims to study another way to lock the cavity on the TEM00 mode. Misalignment information can also be extracted from the power of the higher-order modes transmitted through the cavity. This talk will present an algorithm for this alternative, faster way of deriving the alignment state of the arm cavities. Supported by APS FIP, NSF, and Caltech SFP.
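    One textbook way to relate higher-order-mode power to misalignment is sketched below. It is offered only as an illustration, not as the algorithm presented in the talk. Consider a mode-matched Gaussian beam offset by x0 and tilted by θ relative to the cavity axis, both referred to the cavity waist w0, with divergence angle θ0 = λ/(πw0). The power it couples into the n-th order Hermite-Gaussian mode follows a Poisson distribution with mean (x0/w0)² + (θ/θ0)². The ratio of first-order to TEM00 power therefore gives the combined normalized misalignment in that direction, though a single ratio cannot separate offset from tilt. The sketch assumes transmitted flash heights are proportional to coupled power (equal cavity response for all modes) and treats one transverse direction; the other direction factorizes the same way. Function names are hypothetical.

```python
import math

def hg_mode_fractions(offset_over_w0, tilt_over_theta0, n_max):
    """Fraction of input power in the n-th order Hermite-Gaussian mode, n = 0..n_max.

    One transverse direction; offset and tilt are referred to the cavity waist and
    normalized by w0 and theta0 = lambda / (pi * w0). The fractions are Poisson.
    """
    mean = offset_over_w0 ** 2 + tilt_over_theta0 ** 2
    return [math.exp(-mean) * mean ** n / math.factorial(n) for n in range(n_max + 1)]

def misalignment_from_flashes(p_00, p_first):
    """Combined normalized misalignment sqrt((x0/w0)^2 + (theta/theta0)^2) from two flash powers."""
    return math.sqrt(p_first / p_00)

fractions = hg_mode_fractions(0.1, 0.2, n_max=3)
estimate = misalignment_from_flashes(fractions[0], fractions[1])  # sqrt(0.05)
```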

  3. Optimizing a global alignment of protein interaction networks

    PubMed Central

    Chindelevitch, Leonid; Ma, Cheng-Yu; Liao, Chung-Shou; Berger, Bonnie

    2013-01-01

    Motivation: The global alignment of protein interaction networks is a widely studied problem. It is an important first step in understanding the relationship between the proteins in different species and identifying functional orthologs. Furthermore, it can provide useful insights into the species’ evolution. Results: We propose a novel algorithm, PISwap, for optimizing global pairwise alignments of protein interaction networks, based on a local optimization heuristic that has previously demonstrated its effectiveness for a variety of other intractable problems. PISwap can begin with different types of network alignment approaches and then iteratively adjust the initial alignments by incorporating network topology information, trading it off for sequence information. In practice, our algorithm efficiently refines other well-studied alignment techniques with almost no additional time cost. We also show the robustness of the algorithm to noise in protein interaction data. In addition, the flexible nature of this algorithm makes it suitable for different applications of network alignment. This algorithm can yield interesting insights into the evolutionary dynamics of related species. Availability: Our software is freely available for non-commercial purposes from our Web site, http://piswap.csail.mit.edu/. Contact: bab@csail.mit.edu or csliao@ie.nthu.edu.tw Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24048352
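    Refining an existing global network alignment by local moves can be sketched as a simple hill-climb over pairwise swaps of a one-to-one node mapping. This is not PISwap itself. The objective below is a simplifying assumption chosen for illustration: a hypothetical weight alpha trades summed sequence similarity against the number of conserved edges. The plain swap move is likewise an assumption, and all names are hypothetical. Network 1 edges are given as node pairs, and network 2 edges as a set of frozensets.

```python
from itertools import combinations

def alignment_score(mapping, edges1, edges2, seq_sim, alpha):
    """alpha * summed sequence similarity + (1 - alpha) * number of conserved edges."""
    seq_part = sum(seq_sim[(u, v)] for u, v in mapping.items())
    conserved = sum(1 for u, w in edges1 if frozenset((mapping[u], mapping[w])) in edges2)
    return alpha * seq_part + (1 - alpha) * conserved

def swap_refine(mapping, edges1, edges2, seq_sim, alpha=0.5):
    """Swap the images of two nodes whenever that raises the score; stop at a local optimum."""
    mapping = dict(mapping)
    best = alignment_score(mapping, edges1, edges2, seq_sim, alpha)
    improved = True
    while improved:
        improved = False
        for u, w in combinations(list(mapping), 2):
            mapping[u], mapping[w] = mapping[w], mapping[u]
            score = alignment_score(mapping, edges1, edges2, seq_sim, alpha)
            if score > best + 1e-12:
                best, improved = score, True
            else:
                mapping[u], mapping[w] = mapping[w], mapping[u]  # undo the swap
    return mapping, best

# Usage: two 3-node paths a-b-c and x-y-z, starting from a poor mapping.
edges1 = [("a", "b"), ("b", "c")]
edges2 = {frozenset(("x", "y")), frozenset(("y", "z"))}
seq_sim = {(u, v): 0.0 for u in "abc" for v in "xyz"}
refined, best = swap_refine({"a": "y", "b": "x", "c": "z"}, edges1, edges2, seq_sim)
```

Each accepted swap strictly increases the score over a finite set of mappings, so the loop terminates. Like any local search, it may stop at a local rather than a global optimum.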

  4. CORAL: aligning conserved core regions across domain families.

    PubMed

    Fong, Jessica H; Marchler-Bauer, Aron

    2009-08-01

    Homologous protein families share highly conserved sequence and structure regions that are frequent targets for comparative analysis of related proteins and families. Many protein families, such as the curated domain families in the Conserved Domain Database (CDD), exhibit similar structural cores. To improve accuracy in aligning such protein families, we propose a profile-profile method CORAL that aligns individual core regions as gap-free units. CORAL computes optimal local alignment of two profiles with heuristics to preserve continuity within core regions. We benchmarked its performance on curated domains in CDD, which have pre-defined core regions, against COMPASS, HHalign and PSI-BLAST, using structure superpositions and comprehensive curator-optimized alignments as standards of truth. CORAL improves alignment accuracy on core regions over general profile methods, returning a balanced score of 0.57 for over 80% of all domain families in CDD, compared with the highest balanced score of 0.45 from other methods. Further, CORAL provides E-values to aid in detecting homologous protein families and, by respecting block boundaries, produces alignments with improved 'readability' that facilitate manual refinement. CORAL will be included in future versions of the NCBI Cn3D/CDTree software, which can be downloaded at http://www.ncbi.nlm.nih.gov/Structure/cdtree/cdtree.shtml. Supplementary data are available at Bioinformatics online.
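    Aligning a core region "as a gap-free unit" means placing a block of profile columns along the other profile with no internal insertions or deletions. The sketch below scores every ungapped placement of one block and keeps the best. Profile columns are represented as residue-frequency dictionaries and compared with a plain dot product. Both choices are simplifying assumptions: CORAL's own column scores, E-values and continuity heuristics are not reproduced here, and the function names are hypothetical.

```python
def column_score(p, q):
    """Similarity of two profile columns: dot product of residue-frequency dicts."""
    return sum(p[res] * q.get(res, 0.0) for res in p)

def place_core_block(block, profile):
    """Best gap-free placement of `block` (a list of columns) along `profile`.

    Returns (start, score); the block is scored as one ungapped unit.
    """
    L, n = len(block), len(profile)
    if L > n:
        raise ValueError("block is longer than the target profile")
    best_start, best = 0, float("-inf")
    for start in range(n - L + 1):
        s = sum(column_score(block[k], profile[start + k]) for k in range(L))
        if s > best:
            best_start, best = start, s
    return best_start, best

profile = [{"A": 1.0}, {"C": 1.0}, {"G": 0.5, "A": 0.5}, {"L": 1.0}, {"W": 1.0}]
block = [{"G": 1.0}, {"L": 1.0}]
placement = place_core_block(block, profile)  # (2, 1.5)
```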

  5. CORAL: aligning conserved core regions across domain families

    PubMed Central

    Fong, Jessica H.; Marchler-Bauer, Aron

    2009-01-01

    Motivation: Homologous protein families share highly conserved sequence and structure regions that are frequent targets for comparative analysis of related proteins and families. Many protein families, such as the curated domain families in the Conserved Domain Database (CDD), exhibit similar structural cores. To improve accuracy in aligning such protein families, we propose a profile–profile method CORAL that aligns individual core regions as gap-free units. Results: CORAL computes optimal local alignment of two profiles with heuristics to preserve continuity within core regions. We benchmarked its performance on curated domains in CDD, which have pre-defined core regions, against COMPASS, HHalign and PSI-BLAST, using structure superpositions and comprehensive curator-optimized alignments as standards of truth. CORAL improves alignment accuracy on core regions over general profile methods, returning a balanced score of 0.57 for over 80% of all domain families in CDD, compared with the highest balanced score of 0.45 from other methods. Further, CORAL provides E-values to aid in detecting homologous protein families and, by respecting block boundaries, produces alignments with improved ‘readability’ that facilitate manual refinement. Availability: CORAL will be included in future versions of the NCBI Cn3D/CDTree software, which can be downloaded at http://www.ncbi.nlm.nih.gov/Structure/cdtree/cdtree.shtml. Contact: fongj@ncbi.nlm.nih.gov. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19470584

  6. Feature-based Alignment of Volumetric Multi-modal Images

    PubMed Central

    Toews, Matthew; Zöllei, Lilla; Wells, William M.

    2014-01-01

    This paper proposes a method for aligning image volumes acquired from different imaging modalities (e.g., MR, CT) based on 3D scale-invariant image features. A novel method for encoding invariant feature geometry and appearance is developed, based on the assumption of locally linear intensity relationships, which addresses the poor repeatability of feature detection across image modalities. The encoding method is incorporated into a probabilistic feature-based model for multi-modal image alignment. The model parameters are estimated via a group-wise alignment algorithm that iteratively alternates between estimating a feature-based model from feature data and realigning feature data to the model, converging to a stable alignment solution with few pre-processing or pre-alignment requirements. The resulting model can be used to align multi-modal image data with the benefits of invariant feature correspondence: globally optimal solutions, high efficiency and low memory usage. The method is tested on the difficult RIRE data set of CT, T1, T2, PD and MP-RAGE brain images of subjects exhibiting significant inter-subject variability due to pathology. PMID:24683955
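    The group-wise alternation (estimate a model from the current feature data, then realign each data set to the model) can be illustrated with a deliberately simplified stand-in. The sketch uses 2D point sets with known correspondences and rigid (rotation plus translation) transforms, solved in closed form using complex numbers. The paper's method works with 3D scale-invariant features and a probabilistic model, so treat this only as a sketch of the iteration pattern. The function names are hypothetical.

```python
import cmath

def fit_rigid(src, dst):
    """Least-squares rotation + translation (no scaling) taking 2-D points src onto dst.

    Points are complex numbers x + 1j*y; returns (rot, shift) with abs(rot) == 1.
    """
    n = len(src)
    cs, cd = sum(src) / n, sum(dst) / n
    cross = sum((d - cd) * (s - cs).conjugate() for s, d in zip(src, dst))
    rot = cmath.exp(1j * cmath.phase(cross)) if cross != 0 else 1 + 0j
    return rot, cd - rot * cs

def groupwise_align(point_sets, iterations=10):
    """Alternate: realign every point set to the current model, then re-estimate the model as their mean."""
    model = list(point_sets[0])  # initial model: the first data set
    aligned = []
    for _ in range(iterations):
        aligned = []
        for ps in point_sets:
            rot, shift = fit_rigid(ps, model)
            aligned.append([rot * p + shift for p in ps])
        model = [sum(s[k] for s in aligned) / len(aligned) for k in range(len(model))]
    return model, aligned

# Usage: three rigidly moved copies of the same 4-point shape.
base = [0j, 1 + 0j, 1 + 1j, 0.5 + 2j]
sets = [[cmath.exp(1j * t) * p + sh for p in base] for t, sh in [(0.0, 0j), (0.7, 3 - 1j), (-1.2, -2 + 5j)]]
model, aligned = groupwise_align(sets)
```

With noise-free data, the alternation settles after one pass. With noisy or partially missing features, the model and alignments keep adjusting each other until they stabilize.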

  7. Direct numerical simulation of particle alignment in viscoelastic fluids

    NASA Astrophysics Data System (ADS)

    Hulsen, Martien; Jaensson, Nick; Anderson, Patrick

    2016-11-01

    Rigid particles suspended in viscoelastic fluids under shear can align in string-like structures in the flow direction. To unravel this phenomenon, we present 3D direct numerical simulations of the alignment of two and three rigid, non-Brownian particles in a shear flow of a viscoelastic fluid. The equations are solved on moving, boundary-fitted meshes, which are locally refined to accurately describe the polymer stresses around and in between the particles. A small minimal gap size between the particles is introduced. The Giesekus model is used, and the effects of the Weissenberg number, shear thinning and solvent viscosity are investigated. Alignment of two and three particles is observed. Morphology plots have been created for various combinations of fluid parameters. Alignment is mainly governed by the value of the elasticity parameter S, defined as half of the ratio between the first normal stress difference and the shear stress of the suspending fluid. Alignment appears to occur above a critical value of S, which decreases with increasing shear thinning. This result, together with simulations of a shear-thinning Carreau fluid, leads us to conclude that normal stress differences are essential for particle alignment to occur, but that alignment is also strongly promoted by shear thinning.
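    Written out, the elasticity parameter defined in the abstract is given below. As a worked example, take the Oldroyd-B limit of the Giesekus model: mobility parameter α = 0, so there is no shear thinning. Let η_p be the polymer viscosity, η_s the solvent viscosity and λ the relaxation time. In steady shear at rate γ̇, the standard expressions for the first normal stress difference and the shear stress make S the product of the polymer viscosity fraction β and the Weissenberg number Wi. For α > 0 the Giesekus expressions are longer and are not reproduced here.

```latex
S = \frac{N_1}{2\,\sigma_{xy}}, \qquad N_1 = \sigma_{xx} - \sigma_{yy}

\text{Oldroyd-B } (\alpha = 0):\quad
N_1 = 2\,\eta_p \lambda \dot{\gamma}^2, \qquad
\sigma_{xy} = (\eta_s + \eta_p)\,\dot{\gamma}
\;\;\Longrightarrow\;\;
S = \frac{\eta_p}{\eta_s + \eta_p}\,\lambda\dot{\gamma} = \beta\,\mathrm{Wi},
\qquad \beta = \frac{\eta_p}{\eta_s + \eta_p}, \quad \mathrm{Wi} = \lambda\dot{\gamma}
```

In this limit S grows linearly with Wi, and a larger solvent viscosity (smaller β) lowers S at a given Wi.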

  8. Multiple structural alignment and clustering of RNA sequences.

    PubMed

    Torarinsson, Elfar; Havgaard, Jakob H; Gorodkin, Jan

    2007-04-15

    An apparent paradox in computational RNA structure prediction is that many methods require, in advance, a multiple alignment of a set of related sequences when searching for a common structure between them. However, such a multiple alignment is hard to obtain, even for a few sequences with low sequence similarity, without simultaneously folding and aligning them. Furthermore, it is of interest to conduct a multiple alignment of RNA sequence candidates found by searching as few as two genomic sequences. Here, based on the PMcomp program, we present a global multiple alignment program, foldalignM, which performs especially well on a few sequences with low sequence similarity and is comparable in performance with state-of-the-art programs in general. In addition, it can cluster sequences based on sequence and structure similarity and output a multiple alignment for each cluster. Furthermore, preliminary results with local datasets indicate that the program is useful for post-processing foldalign pairwise scans. The program foldalignM is implemented in Java and is, along with some accompanying Perl scripts, available at http://foldalign.ku.dk/

  9. A two-stage approach to automatic face alignment

    NASA Astrophysics Data System (ADS)

    Wang, Tong; Ai, Haizhou; Huang, Gaofeng

    2003-09-01

    Face alignment is very important in face recognition, modeling and synthesis. Many approaches have been developed for this purpose, such as ASM, AAM, DAM and TC-ASM. After a brief review of these methods, we point out that they all require manual initialization of the landmark positions and are very sensitive to it. Despite all this work, the outline of a human face remains difficult to localize precisely. In this paper, we introduce a two-stage method that achieves frontal face alignment fully automatically. The first stage, landmark initialization, is called coarse face alignment. In this stage, after a face is detected by an AdaBoost cascade face detector, we use a Simple Direct Appearance Model (SDAM) to locate a few key points of the face from its texture. All the initial landmarks are then set up from these key points as the coarse alignment. The second stage is fine face alignment, which uses a variant of the AAM method in which shape variation is predicted from the texture reconstruction error, together with an embedded ASM refinement for the outline landmarks of the face. Experiments on a face database of 500 people show that this method is very effective for practical applications.
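    In the coarse stage, a few detected key points determine where all the initial landmarks go. One way to sketch this is to fit a least-squares similarity transform (rotation, uniform scale, translation) from the mean-shape key points to the detected ones, then apply it to the whole mean shape. The abstract does not say how the landmarks are set up from the SDAM key points, so this is an assumed, conventional choice. Points are 2D coordinates stored as complex numbers, and the names are hypothetical.

```python
def fit_similarity(src, dst):
    """Least-squares similarity transform z -> a*z + b (a encodes rotation and scale).

    Points are complex numbers x + 1j*y.
    """
    n = len(src)
    cs, cd = sum(src) / n, sum(dst) / n
    den = sum(abs(s - cs) ** 2 for s in src)
    if den == 0:
        raise ValueError("need at least two distinct key points")
    a = sum((d - cd) * (s - cs).conjugate() for s, d in zip(src, dst)) / den
    return a, cd - a * cs

def init_landmarks(mean_shape, key_idx, detected_keys):
    """Place every mean-shape landmark using the transform fitted on the key points."""
    a, b = fit_similarity([mean_shape[i] for i in key_idx], detected_keys)
    return [a * z + b for z in mean_shape]

# Usage: a 5-landmark mean shape; key points 0, 1 and 3 were "detected".
mean_shape = [0j, 2 + 0j, 1 + 1j, 1 + 3j, 0.5 + 2j]
a_true, b_true = 1.2 + 0.6j, 10 + 4j
detected = [a_true * mean_shape[i] + b_true for i in (0, 1, 3)]
initial = init_landmarks(mean_shape, [0, 1, 3], detected)
```

The fit leaves the mean shape's internal proportions unchanged, so the result is only a starting point. The fine stage is what adapts individual landmarks to the particular face.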

  10. Self-Aligning Optical Measurement Systems

    NASA Technical Reports Server (NTRS)

    Decker, Arthur J.

    1992-01-01

    The paper discusses how to teach a system of neural networks to respond to the alignment clues used by a human operator in performing routine initial alignments. A paradigm is proposed for automating the alignment of the components of optical measurement systems. The paradigm, which was tested on a spatial filter, proved successful for optical alignment.

  11. A Nonlinear Observer for Gyro Alignment Estimation

    NASA Technical Reports Server (NTRS)

    Thienel, J.; Sanner, R. M.

    2003-01-01

    A nonlinear observer for gyro alignment estimation is presented. The observer is composed of two error terms, an attitude error and an alignment error. The observer is globally stable with exponential convergence of the attitude errors. The gyro alignment estimate converges to the true alignment when the system is completely observable.

  12. Photosensitive Polymers for Liquid Crystal Alignment

    NASA Astrophysics Data System (ADS)

    Mahilny, U. V.; Stankevich, A. I.; Trofimova, A. V.; Muravsky, A. A.; Murauski, A. A.

    The characteristics of liquid crystal (LC) alignment by layers of photocrosslinkable polymers with side benzaldehyde groups are considered. The mechanism of photostimulated alignment by a rubbed benzaldehyde layer is investigated. Methods for creating multidomain aligning layers based on photostimulated rubbing alignment are described.

  13. Precise Synaptic Efficacy Alignment Suggests Potentiation Dominated Learning.

    PubMed

    Hartmann, Christoph; Miner, Daniel C; Triesch, Jochen

    2015-01-01

    Recent evidence suggests that parallel synapses from the same axonal branch onto the same dendritic branch have almost identical strength. It has been proposed that this alignment is only possible through learning rules that integrate activity over long time spans. However, learning mechanisms such as spike-timing-dependent plasticity (STDP) are commonly assumed to be temporally local. Here, we propose that the combination of temporally local STDP and a multi