Science.gov

Sample records for java sequence alignment

  1. Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV).

    PubMed

    Martin, Andrew C R

    2014-01-01

    The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and 'dotifying' repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/. PMID:25653836
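
    The 'dotifying' option described above can be illustrated with a minimal sketch. It is written in Python rather than JavaScript, is not JSAV's own code, and assumes that a "repeated" residue means one identical to the residue in the first sequence at the same column. The function name `dotify` is hypothetical.

```python
def dotify(sequences):
    """Replace residues that match the first (reference) sequence with '.'.

    Assumption: sequences are already aligned and of equal length. The first
    sequence is kept unchanged so the dots can be read against it, and gap
    characters ('-') are never replaced.
    """
    if not sequences:
        return []
    reference = sequences[0]
    result = [reference]
    for seq in sequences[1:]:
        if len(seq) != len(reference):
            raise ValueError("aligned sequences must have equal length")
        result.append("".join(
            "." if a == b and a != "-" else a
            for a, b in zip(seq, reference)
        ))
    return result


if __name__ == "__main__":
    for row in dotify(["ACDE-", "ACDF-", "GCDE-"]):
        print(row)
```

    Keeping the reference row intact makes the differing residues stand out, which is the usual reason for displaying an alignment this way.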

  3. JVM: Java Visual Mapping tool for next generation sequencing read.

    PubMed

    Yang, Ye; Liu, Juan

    2015-01-01

    We developed JVM (Java Visual Mapping), a program for mapping next-generation sequencing reads to a reference sequence. The program is implemented in Java and is designed to handle millions of short reads generated with Illumina sequencing technology. It employs a seed-index strategy and octal encoding operations for sequence alignment. JVM is useful for DNA-Seq and RNA-Seq when dealing with single-end resequencing. JVM is a desktop application that supports read data sets from 1 MB to 10 GB.
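
    A seed index of the general kind mentioned above can be sketched as follows. This is a generic k-mer hash index in Python, not JVM's implementation; the octal encoding step is not reproduced because the abstract does not describe it, and the function names and the choice of k are assumptions.

```python
from collections import defaultdict


def build_seed_index(reference, k):
    """Map every k-mer in the reference to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def candidate_positions(read, index, k):
    """Rank candidate start positions of the read by the number of seed hits.

    Each k-mer of the read that occurs in the reference votes for the
    reference position where the read would have to start.
    """
    votes = defaultdict(int)
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if start >= 0:
                votes[start] += 1
    # Most votes first; ties broken by leftmost position.
    return sorted(votes, key=lambda s: (-votes[s], s))


if __name__ == "__main__":
    reference = "ACGTACGTTTGACC"
    index = build_seed_index(reference, 4)
    print(candidate_positions("CGTTTG", index, 4))
```

    In a full mapper the top-ranked candidates would then be verified by an actual alignment; the index only narrows down where to look.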

  4. JavaScript DNA translator: DNA-aligned protein translations.

    PubMed

    Perry, William L

    2002-12-01

    There are many instances in molecular biology when it is necessary to identify ORFs in a DNA sequence. While programs exist for displaying protein translations in multiple ORFs in alignment with a DNA sequence, they are often expensive, exist as add-ons to software that must be purchased, or are only compatible with a particular operating system. JavaScript DNA Translator is a shareware application written in JavaScript, a scripting language interpreted by the Netscape Communicator and Internet Explorer Web browsers, which makes it compatible with several different operating systems. While the program uses a familiar Web page interface, it requires no connection to the Internet since calculations are performed on the user's own computer. The program analyzes one or multiple DNA sequences and generates translations in up to six reading frames aligned to a DNA sequence, in addition to displaying translations as separate sequences in FASTA format. ORFs within a reading frame can also be displayed as separate sequences. Flexible formatting options are provided, including the ability to hide ORFs below a minimum size specified by the user. The program is available free of charge at the BioTechniques Software Library (www.Biotechniques.com).
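
    The six-reading-frame translation described above can be sketched as follows. This is a Python illustration using the standard genetic code, not the JavaScript DNA Translator source; the function names and the frame labels (+1 to +3, -1 to -3) are assumptions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return translations for the three forward and three reverse frames."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translations("ATGGCCTAA").items():
        print(name, protein)
```

    ORFs within a frame could then be found by splitting each translation at the stop symbol and discarding pieces shorter than a chosen minimum length.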

  5. Pairwise Sequence Alignment Library

    SciTech Connect

    Jeff Daily, PNNL

    2015-05-20

    Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with uses in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Pairwise Sequence Alignment Library (parasail). Parasail features: reference implementations of all known vectorized sequence alignment approaches; implementations of the Smith-Waterman (SW), semi-global (SG), and Needleman-Wunsch (NW) sequence alignment algorithms; implementations across all modern CPU instruction sets, including AVX2 and KNC; and language interfaces for C/C++ and Python.
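
    For orientation, the data dependencies mentioned above are visible in a plain scalar Smith-Waterman recurrence. The sketch below is a pure-Python reference with a linear gap penalty; it is not parasail code and uses no SIMD, and the scoring values are arbitrary assumptions.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Each cell depends on its upper, left, and upper-left neighbours, which is
    the dependency pattern that makes vectorization non-trivial.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores are clamped at zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("XXACGTXX", "ACGT"))
```

    Only two rows are kept in memory because the score alone is needed; recovering the alignment itself would require storing the full matrix or a traceback structure.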

  7. Multiple sequence alignment with DIALIGN.

    PubMed

    Morgenstern, Burkhard

    2014-01-01

    DIALIGN is a software tool for multiple sequence alignment that combines global and local alignment features. It composes multiple alignments from local pairwise sequence similarities. This approach is particularly useful for discovering conserved functional regions in sequences that share only local homologies but are otherwise unrelated. An anchoring option allows external information and expert knowledge to be used in addition to primary-sequence similarity alone. The latest version of DIALIGN optionally uses matches to the PFAM database to detect weak homologies. Various versions of the program are available through Göttingen Bioinformatics Compute Server (GOBICS) at http://www.gobics.de/department/software.

  8. Pareto optimal pairwise sequence alignment.

    PubMed

    DeRonne, Kevin W; Karypis, George

    2013-01-01

    Sequence alignment using evolutionary profiles is a commonly employed tool when investigating a protein. Many profile-profile scoring functions have been developed for use in such alignments, but there has not yet been a comprehensive study of Pareto optimal pairwise alignments for combining multiple such functions. We show that the problem of generating Pareto optimal pairwise alignments has an optimal substructure property, and develop an efficient algorithm for generating Pareto optimal frontiers of pairwise alignments. All possible sets of two, three, and four profile scoring functions are used from a pool of 11 functions and applied to 588 pairs of proteins in the ce_ref data set. The performance of the best objective combinations on ce_ref is also evaluated on an independent set of 913 protein pairs extracted from the BAliBASE RV11 data set. Our dynamic-programming-based heuristic approach produces approximated Pareto optimal frontiers of pairwise alignments that contain comparable alignments to those on the exact frontier, but on average in less than 1/58th the time in the case of four objectives. Our results show that the Pareto frontiers contain alignments whose quality is better than the alignments obtained by single objectives. However, the task of identifying a single high-quality alignment among those in the Pareto frontier remains challenging.
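
    The notion of a Pareto frontier used above can be made concrete with a small sketch. It only filters a given list of candidate score vectors (one score per objective, higher is better) and does not reproduce the authors' dynamic-programming algorithm; the function name is hypothetical.

```python
def pareto_frontier(score_vectors):
    """Return the score vectors that no other vector dominates.

    A vector u dominates v if u is at least as good in every objective and
    strictly better in at least one. Duplicates are collapsed first.
    """
    def dominates(u, v):
        return (all(x >= y for x, y in zip(u, v))
                and any(x > y for x, y in zip(u, v)))

    unique = list(dict.fromkeys(tuple(v) for v in score_vectors))
    return [v for v in unique if not any(dominates(u, v) for u in unique)]


if __name__ == "__main__":
    candidates = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 5), (0, 6)]
    print(pareto_frontier(candidates))
```

    The output still contains several incomparable alignments, which mirrors the remaining difficulty noted in the abstract: choosing one high-quality alignment from the frontier.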

  9. Global Alignment System for Large Genomic Sequencing

    2002-03-01

    AVID is a global alignment system tailored for the alignment of large genomic sequences up to megabases in length. Features include the possibility of one sequence being in draft form, fast alignment, robustness and accuracy. The method is an anchor based alignment using maximal matches derived from suffix trees.

  10. Simultaneous Alignment and Folding of Protein Sequences

    PubMed Central

    Waldispühl, Jérôme; O'Donnell, Charles W.; Will, Sebastian; Devadas, Srinivas; Backofen, Rolf

    2014-01-01

    Accurate comparative analysis tools for low-homology proteins remain a difficult challenge in computational biology, especially sequence alignment and consensus folding problems. We present partiFold-Align, the first algorithm for simultaneous alignment and consensus folding of unaligned protein sequences; the algorithm's complexity is polynomial in time and space. Algorithmically, partiFold-Align exploits sparsity in the set of super-secondary structure pairings and alignment candidates to achieve an effectively cubic running time for simultaneous pairwise alignment and folding. We demonstrate the efficacy of these techniques on transmembrane β-barrel proteins, an important yet difficult class of proteins with few known three-dimensional structures. Testing against structurally derived sequence alignments, partiFold-Align significantly outperforms state-of-the-art pairwise and multiple sequence alignment tools in the most difficult low-sequence homology case. It also improves secondary structure prediction where current approaches fail. Importantly, partiFold-Align requires no prior training. These general techniques are widely applicable to many more protein families (partiFold-Align is available at http://partifold.csail.mit.edu/). PMID:24766258

  11. Phylo-VISTA: Interactive visualization of multiple DNA sequence alignments

    SciTech Connect

    Shah, Nameeta; Couronne, Olivier; Pennacchio, Len A.; Brudno, Michael; Batzoglou, Serafim; Bethel, E. Wes; Rubin, Edward M.; Hamann, Bernd; Dubchak, Inna

    2004-01-15

    The power of multi-sequence comparison for biological discovery is well established. The need for new capabilities to visualize and compare cross-species alignment data is intensified by the growing number of genomic sequence datasets being generated for an ever-increasing number of organisms. To be efficient these visualization algorithms must support the ability to accommodate consistently a wide range of evolutionary distances in a comparison framework based upon phylogenetic relationships. Results: We have developed Phylo-VISTA, an interactive tool for analyzing multiple alignments by visualizing a similarity measure for multiple DNA sequences. The complexity of visual presentation is effectively organized using a framework based upon interspecies phylogenetic relationships. The phylogenetic organization supports rapid, user-guided interspecies comparison. To aid in navigation through large sequence datasets, Phylo-VISTA leverages concepts from VISTA that provide a user with the ability to select and view data at varying resolutions. The combination of multiresolution data visualization and analysis, combined with the phylogenetic framework for interspecies comparison, produces a highly flexible and powerful tool for visual data analysis of multiple sequence alignments. Availability: Phylo-VISTA is available at http://www-gsd.lbl.gov/phylovista. It requires an Internet browser with Java Plugin 1.4.2 and it is integrated into the global alignment program LAGAN at http://lagan.stanford.edu.

  12. Multiple sequence alignment in HTML: colored, possibly hyperlinked, compact representations.

    PubMed

    Campagne, F; Maigret, B

    1998-02-01

    Protein sequence alignments are widely used in protein structure prediction, protein engineering, modeling of proteins, etc. This type of representation is useful at different stages of scientific activity: looking at previous results, working on a research project, and presenting the results. There is a need to make it available through a network (intranet or WWW), in a way that allows biologists, chemists, and noncomputer specialists to look at the data and carry on research, possibly in collaboration. Previous methods (text-based, Java-based) are reported and their advantages are discussed. We have developed two novel approaches to represent the alignments as colored, hyper-linked HTML pages. The first method creates an HTML page that efficiently uses the image cache mechanism of a WWW browser, so that the user waits for images to be loaded through the network only for the first alignment viewed and can then browse other alignments without delay. The generated pages can be browsed with any HTML2.0-compliant browser. The second method that we propose uses W3C-CSS1-style sheets to render alignments. This new method generates pages that require recent browsers to be viewed. We implemented these methods in the Viseur program and made a WWW service available that allows a user to convert an MSF alignment file into HTML for WWW publishing. The latter service is available at http://www.lctn.u-nancy.fr/viseur/services.html.

  13. DNA sequence matching processor using FPGA and JAVA interface.

    PubMed

    Brown, Benjamin O; Yin, Meng-Lai; Cheng, Yi

    2004-01-01

    This study uses an FPGA to perform high-speed DNA sequence matching as an alternative to using general purpose computer CPUs. The FPGA is programmed using the Verilog HDL and interfaced using a graphical user interface programmed in JAVA. Design overviews and details for a small scale design are given as well as plans for larger scale expansion. Encouraging results of the small scale model currently in production are also provided. Results of a successful match and no match are shown.

  14. GASSST: global alignment short sequence search tool

    PubMed Central

    Rizk, Guillaume; Lavenier, Dominique

    2010-01-01

    Motivation: The rapid development of next-generation sequencing technologies able to produce huge amounts of sequence data is leading to a wide range of new applications. This triggers the need for fast and accurate alignment software. Common techniques often restrict indels in the alignment to improve speed, whereas more flexible aligners are too slow for large-scale applications. Moreover, many current aligners are becoming inefficient as generated reads grow ever larger. Our goal with our new aligner GASSST (Global Alignment Short Sequence Search Tool) is thus 2-fold—achieving high performance with no restrictions on the number of indels with a design that is still effective on long reads. Results: We propose a new efficient filtering step that discards most alignments coming from the seed phase before they are checked by the costly dynamic programming algorithm. We use a carefully designed series of filters of increasing complexity and efficiency to quickly eliminate most candidate alignments in a wide range of configurations. The main filter uses a precomputed table containing the alignment score of short four base words aligned against each other. This table is reused several times by a new algorithm designed to approximate the score of the full dynamic programming algorithm. We compare the performance of GASSST against BWA, BFAST, SSAHA2 and PASS. We found that GASSST achieves high sensitivity in a wide range of configurations and faster overall execution time than other state-of-the-art aligners. Availability: GASSST is distributed under the CeCILL software license at http://www.irisa.fr/symbiose/projects/gassst/ Contact: guillaume.rizk@irisa.fr; dominique.lavenier@irisa.fr Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20739310

  15. MACSE: Multiple Alignment of Coding SEquences Accounting for Frameshifts and Stop Codons

    PubMed Central

    Ranwez, Vincent; Harispe, Sébastien; Delsuc, Frédéric; Douzery, Emmanuel J. P.

    2011-01-01

    Until now the most efficient solution to align nucleotide sequences containing open reading frames was to use indirect procedures that align amino acid translations before reporting the inferred gap positions at the codon level. There are two important pitfalls with this approach. Firstly, any premature stop codon impedes using such a strategy. Secondly, each sequence is translated with the same reading frame from beginning to end, so that the presence of a single additional nucleotide leads to both aberrant translation and alignment. We present an algorithm that has the same space and time complexity as the classical Needleman-Wunsch algorithm while accommodating sequencing errors and other biological deviations from the coding frame. The resulting pairwise coding sequence alignment method was extended to a multiple sequence alignment (MSA) algorithm implemented in a program called MACSE (Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons). MACSE is the first automatic solution to align protein-coding gene datasets containing non-functional sequences (pseudogenes) without disrupting the underlying codon structure. It has also proved useful in detecting undocumented frameshifts in public database sequences and in aligning next-generation sequencing reads/contigs against a reference coding sequence. MACSE is distributed as an open-source Java executable file with freely available source code and can be used via a web interface at: http://mbb.univ-montp2.fr/macse. PMID:21949676
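
    As a point of comparison, the classical Needleman-Wunsch algorithm referred to above can be sketched as follows. This Python version uses a linear gap penalty and arbitrary scoring values; it contains none of MACSE's handling of frameshifts or stop codons, and the function name is an assumption.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

    Both time and space are proportional to the product of the two sequence lengths, which is the complexity the abstract says the frameshift-aware algorithm preserves.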

  16. Two Hybrid Algorithms for Multiple Sequence Alignment

    NASA Astrophysics Data System (ADS)

    Naznin, Farhana; Sarker, Ruhul; Essam, Daryl

    2010-01-01

    In order to design life-saving drugs, such as cancer drugs, the design of Protein or DNA structures has to be accurate. These structures depend on Multiple Sequence Alignment (MSA). MSA is used to find the accurate structure of Protein and DNA sequences from existing approximately correct sequences. To overcome the overly greedy nature of the well-known global progressive alignment method for multiple sequence alignment, we have proposed two different algorithms in this paper; one uses an iterative approach with a progressive alignment method (PAMIM) and the second uses a genetic algorithm with a progressive alignment method (PAMGA). Both of our methods start with a "kmer" distance table to generate a single guide tree. In the iterative approach, we have introduced two new techniques: the first generates guide trees with randomly selected sequences and the second shuffles the sequences inside that tree. The output of the tree is a multiple sequence alignment, which has been evaluated by the Sum of Pairs Method (SPM) considering the real value data from PAM250. In our second GA approach, these two techniques are used to generate an initial population, and two different approaches of genetic operators are implemented in crossovers and mutation. To test the performance of our two algorithms, we have compared them with the existing well-known methods T-Coffee, MUSCLE, MAFFT and ProbCons, using BAliBASE benchmarks. The experimental results show that the first algorithm works well for some situations, where other existing methods face difficulties in obtaining better solutions. The proposed second method works well compared to the existing methods for all situations and it shows better performance than the first one.
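
    The sum-of-pairs evaluation mentioned above can be sketched as follows. For brevity the sketch scores residue pairs with a simple match/mismatch/gap scheme instead of the PAM250 matrix used in the paper; the scoring values and the function name are assumptions.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Score a multiple alignment by summing pairwise scores in every column.

    Pairs of two gaps contribute nothing; a residue paired with a gap costs
    the gap penalty.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of the alignment must have equal length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


if __name__ == "__main__":
    print(sum_of_pairs(["AC-T", "ACGT", "A-GT"]))
```

    Replacing the match/mismatch branch with a lookup in a substitution matrix would bring the sketch closer to the evaluation described in the abstract.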

  17. Robust temporal alignment of multimodal cardiac sequences

    NASA Astrophysics Data System (ADS)

    Perissinotto, Andrea; Queirós, Sandro; Morais, Pedro; Baptista, Maria J.; Monaghan, Mark; Rodrigues, Nuno F.; D'hooge, Jan; Vilaça, João L.; Barbosa, Daniel

    2015-03-01

    Given the dynamic nature of cardiac function, correct temporal alignment of pre-operative models and intraoperative images is crucial for augmented reality in cardiac image-guided interventions. As such, the current study focuses on the development of an image-based strategy for temporal alignment of multimodal cardiac imaging sequences, such as cine Magnetic Resonance Imaging (MRI) or 3D Ultrasound (US). First, we derive a robust, modality-independent signal from the image sequences, estimated by computing the normalized cross-correlation between each frame in the temporal sequence and the end-diastolic frame. This signal is a surrogate for the left-ventricle (LV) volume curve over time, whose variation indicates different temporal landmarks of the cardiac cycle. We then perform the temporal alignment of these surrogate signals derived from MRI and US sequences of the same patient through Dynamic Time Warping (DTW), allowing both sequences to be synchronized. The proposed framework was evaluated in 98 patients who had undergone both 3D+t MRI and US scans. The end-systolic frame could be accurately estimated as the minimum of the image-derived surrogate signal, presenting a relative error of 1.6 +/- 1.9% and 4.0 +/- 4.2% for the MRI and US sequences, respectively, thus supporting its association with key temporal instants of the cardiac cycle. The use of DTW reduces the desynchronization of the cardiac events in MRI and US sequences, allowing multimodal cardiac imaging sequences to be temporally aligned. Overall, a generic, fast and accurate method for temporal synchronization of MRI and US sequences of the same patient was introduced. This approach could be straightforwardly used for the correct temporal alignment of pre-operative MRI information and intra-operative US images.
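
    The Dynamic Time Warping step described above can be sketched on two one-dimensional signals. This is textbook DTW with an absolute-difference cost in Python, not the authors' implementation; the toy signals stand in for the image-derived surrogate curves and are invented for illustration.

```python
import math


def dtw(x, y):
    """Return (total cost, warping path) aligning sequences x and y.

    The path is a list of index pairs (i, j) meaning x[i] is matched to y[j].
    """
    n, m = len(x), len(y)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # x advances alone
                                 cost[i][j - 1],      # y advances alone
                                 cost[i - 1][j - 1])  # both advance

    # Walk back from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    distance, path = dtw([1, 2, 3], [1, 2, 2, 3])
    print(distance, path)
```

    Each pair in the returned path says which frame of one sequence corresponds to which frame of the other, which is the information needed to synchronize two recordings of different lengths.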

  18. DNA sequence chromatogram browsing using JAVA and CORBA.

    PubMed

    Parsons, J D; Buehler, E; Hillier, L

    1999-03-01

    DNA sequence chromatograms (traces) are the primary data source for all large-scale genomic and expressed sequence tag (EST) sequencing projects. Access to the sequencing trace assists many later analyses, for example contig assembly and polymorphism detection, but obtaining and using traces is problematic. Traces are not collected and published centrally, they are much larger than the base calls derived from them, and viewing them requires the interactivity of a local graphical client with local data. To provide efficient global access to DNA traces, we developed a client/server system based on flexible Java components integrated into other applications including an applet for use in a WWW browser and a stand-alone trace viewer. Client/server interaction is facilitated by CORBA middleware, which provides a well-defined interface, a naming service, and location independence. The software is packaged as a Jar file available from the following URL: http://www.ebi.ac.uk/jparsons. Links to working examples of the trace viewers can be found at http://corba.ebi.ac.uk/EST. All the Washington University mouse EST traces are available for browsing at the same URL.

  19. A rank-based sequence aligner with applications in phylogenetic analysis.

    PubMed

    Dinu, Liviu P; Ionescu, Radu Tudor; Tomescu, Alexandru I

    2014-01-01

    Recent tools for aligning short DNA reads have been designed to optimize the trade-off between correctness and speed. This paper introduces a method for assigning a set of short DNA reads to a reference genome, under Local Rank Distance (LRD). The rank-based aligner proposed in this work aims to improve correctness over speed. However, some indexing strategies to speed up the aligner are also investigated. The LRD aligner is improved in terms of speed by storing k-mer positions in a hash table for each read. Another improvement, which produces an approximate LRD aligner, is to consider only the positions in the reference that are likely to represent a good positional match of the read. The proposed aligner is evaluated and compared to other state-of-the-art alignment tools in several experiments. A set of experiments is conducted to determine the precision and the recall of the proposed aligner in the presence of contaminated reads. In another set of experiments, the proposed aligner is used to find the order, the family, or the species of a new (or unknown) organism, given only a set of short Next-Generation Sequencing DNA reads. The empirical results show that the aligner proposed in this work is highly accurate from a biological point of view. Compared to the other evaluated tools, the LRD aligner has the important advantage of being very accurate even for a very low base coverage. Thus, the LRD aligner can be considered a good alternative to standard alignment tools, especially when the accuracy of the aligner is of high importance. Source code and UNIX binaries of the aligner are freely available for future development and use at http://lrd.herokuapp.com/aligners. The software is implemented in C++ and Java, being supported on UNIX and MS Windows.

  20. DINAMO: a coupled sequence alignment editor/molecular graphics tool for interactive homology modeling of proteins.

    PubMed

    Hansen, M; Bentz, J; Baucom, A; Gregoret, L

    1998-01-01

    Gaining functional information about a novel protein is a universal problem in biomedical research. With the explosive growth of the protein sequence and structural databases, it is becoming increasingly common for researchers to attempt to build a three-dimensional model of their protein of interest in order to gain information about its structure and interactions with other molecules. The two most reliable methods for predicting the structure of a protein are homology modeling, in which the novel sequence is modeled on the known three-dimensional structure of a related protein, and fold recognition (threading), where the sequence is scored against a library of fold models, and the highest scoring model is selected. The sequence alignment to a known structure can be ambiguous, and human intervention is often required to optimize the model. We describe an interactive model building and assessment tool in which a sequence alignment editor is dynamically coupled to a molecular graphics display. By means of a set of assessment tools, the user may optimize his or her alignment to satisfy the known heuristics of protein structure. Adjustments to the sequence alignment made by the user are reflected in the displayed model by color and other visual cues. For instance, residues are colored by hydrophobicity in both the three-dimensional model and in the sequence alignment. This aids the user in identifying undesirable buried polar residues. Several different evaluation metrics may be selected including residue conservation, residue properties, and visualization of predicted secondary structure. These characteristics may be mapped to the model both singly and in combination. DINAMO is a Java-based tool that may be run either over the web or installed locally. Its modular architecture also allows Java-literate users to add plug-ins of their own design.

  1. PROMALS web server for accurate multiple protein sequence alignments.

    PubMed

    Pei, Jimin; Kim, Bong-Hyun; Tang, Ming; Grishin, Nick V

    2007-07-01

    Multiple sequence alignments are essential in homology inference, structure modeling, functional prediction and phylogenetic analysis. We developed a web server that constructs multiple protein sequence alignments using PROMALS, a progressive method that improves alignment quality by using additional homologs from PSI-BLAST searches and secondary structure predictions from PSIPRED. PROMALS shows higher alignment accuracy than other advanced methods, such as MUMMALS, ProbCons, MAFFT and SPEM. The PROMALS web server takes FASTA format protein sequences as input. The output includes a colored alignment augmented with information about sequence grouping, predicted secondary structures and positional conservation. The PROMALS web server is available at: http://prodata.swmed.edu/promals/ PMID:17452345

  2. Blasting and Zipping: Sequence Alignment and Mutual Information

    NASA Astrophysics Data System (ADS)

    Penner, Orion; Grassberger, Peter; Paczuski, Maya

    2009-03-01

    Alignment of biological sequences such as DNA, RNA or proteins is one of the most widely used tools in computational bioscience. While the accomplishments of sequence alignment algorithms are undeniable, the fact remains that these algorithms are based upon heuristic scoring schemes. Therefore, these algorithms do not provide model-independent and objective measures for how similar two (or more) sequences actually are. Although information theory provides such a similarity measure, the mutual information (MI), numerous previous attempts to connect sequence alignment and information have not produced realistic estimates for the MI from a given alignment. We report on a simple and flexible approach to get robust estimates of MI from global alignments. The presented results may help establish MI as a reliable tool for evaluating the quality of global alignments, judging the relative merits of different alignment algorithms, and estimating the significance of specific alignments.

  3. MULTAN: a program to align multiple DNA sequences.

    PubMed Central

    Bains, W

    1986-01-01

    I describe a computer program which can align a large number of nucleic acid sequences with one another. The program uses an heuristic, iterative algorithm which has been tested extensively, and is found to produce useful alignments of a variety of sequence families. The algorithm is fast enough to be practical for the analysis of large numbers of sequences, and is implemented in a program which contains a variety of other functions to facilitate the analysis of the aligned result. PMID:3003672

  4. High-speed multiple sequence alignment on a reconfigurable platform.

    PubMed

    Oliver, Tim; Schmidt, Bertil; Maskell, Douglas; Nathan, Darran; Clemens, Ralf

    2006-01-01

    Progressive alignment is a widely used approach to compute multiple sequence alignments (MSAs). However, aligning several hundred sequences by popular progressive alignment tools requires hours on sequential computers. Due to the rapid growth of sequence databases biologists have to compute MSAs in a far shorter time. In this paper we present a new approach to MSA on reconfigurable hardware platforms to gain high performance at low cost. We have constructed a linear systolic array to perform pairwise sequence distance computations using dynamic programming. This results in an implementation with significant runtime savings on a standard FPGA.

  5. The number of reduced alignments between two DNA sequences

    PubMed Central

    2014-01-01

    Background: In this study we consider DNA sequences as mathematical strings. Total and reduced alignments between two DNA sequences have been considered in the literature to measure their similarity. Results for explicit representations of some alignments have been already obtained. Results: We present exact, explicit and computable formulas for the number of different possible alignments between two DNA sequences and a new formula for a class of reduced alignments. Conclusions: A unified approach for a wide class of alignments between two DNA sequences has been provided. The formula is computable and, if complemented by software development, will provide a deeper insight into the theory of sequence alignment and give rise to new comparison methods. AMS Subject Classification: Primary 92B05, 33C20, secondary 39A14, 65Q30. PMID:24684679
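
    To give a sense of the counting problem, the number of total alignments between two sequences can be computed with a well-known recurrence. The sketch below assumes that an alignment is a sequence of columns, each pairing two residues or one residue with a gap, and never two gaps; it does not implement the paper's explicit formulas or its count of reduced alignments.

```python
def count_alignments(m, n):
    """Count alignments of sequences of lengths m and n.

    The last column pairs two residues, or gaps one of the sequences, giving
    f(m, n) = f(m-1, n) + f(m, n-1) + f(m-1, n-1) with f(0, n) = f(m, 0) = 1.
    """
    prev = [1] * (n + 1)
    for _ in range(m):
        curr = [1]
        for j in range(1, n + 1):
            curr.append(prev[j] + curr[j - 1] + prev[j - 1])
        prev = curr
    return prev[n]


if __name__ == "__main__":
    for size in (1, 2, 3, 10):
        print(size, count_alignments(size, size))
```

    The count depends only on the two lengths, not on the sequence content, and it grows very quickly, which is why alignment algorithms rely on dynamic programming rather than enumeration.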

  6. Local alignment of two-base encoded DNA sequence

    PubMed Central

    Homer, Nils; Merriman, Barry; Nelson, Stanley F

    2009-01-01

    Background: DNA sequence comparison is based on optimal local alignment of two sequences using a similarity score. However, some new DNA sequencing technologies do not directly measure the base sequence, but rather an encoded form, such as the two-base encoding considered here. In order to compare such data to a reference sequence, the data must be decoded into sequence. The decoding is deterministic, but the possibility of measurement errors requires searching among all possible error modes and resulting alignments to achieve an optimal balance of fewer errors versus greater sequence similarity. Results: We present an extension of the standard dynamic programming method for local alignment, which simultaneously decodes the data and performs the alignment, maximizing a similarity score based on a weighted combination of errors and edits, and allowing an affine gap penalty. We also present simulations that demonstrate the performance characteristics of our two base encoded alignment method and contrast those with standard DNA sequence alignment under the same conditions. Conclusion: The new local alignment algorithm for two-base encoded data has substantial power to properly detect and correct measurement errors while identifying underlying sequence variants, and facilitating genome re-sequencing efforts based on this form of sequence data. PMID:19508732
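
    The sketch below illustrates why measurement errors complicate decoding. It assumes the widely used color-space scheme in which each base gets a 2-bit code (A=0, C=1, G=2, T=3) and each color is the XOR of two adjacent base codes; the abstract does not spell out the encoding, so treat this mapping and the function names as assumptions.

```python
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_two_base(seq: str) -> list:
    """Encode each adjacent base pair as one color (XOR of the two base codes)."""
    codes = [BASE_TO_CODE[b] for b in seq]
    return [x ^ y for x, y in zip(codes, codes[1:])]


def decode_two_base(first_base: str, colors: list) -> str:
    """Deterministically decode colors, given the known first base."""
    code = BASE_TO_CODE[first_base]
    out = [first_base]
    for c in colors:
        code ^= c  # the next base is fully determined by the previous base and the color
        out.append(CODE_TO_BASE[code])
    return "".join(out)
```

    With this scheme, `encode_two_base("ACGT")` gives `[1, 3, 1]`, and decoding the corrupted colors `[1, 0, 1]` from `"A"` gives `"ACCA"`: a single wrong color alters every base downstream of it, which is the motivation for decoding and aligning simultaneously.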

  7. A comparative analysis of multiple sequence alignments for biological data.

    PubMed

    Manzoor, Umar; Shahid, Sarosh; Zafar, Bassam

    2015-01-01

    Multiple sequence alignment plays a key role in the computational analysis of biological data. Different programs are developed to analyze the sequence similarity. This paper highlights the algorithmic techniques of the most popular multiple sequence alignment programs. These programs are then evaluated on the basis of execution time and scalability. The overall performance of these programs is assessed to highlight their strengths and weaknesses with reference to their algorithmic techniques. In terms of overall alignment quality, T-Coffee and Mafft attain the highest average scores, whereas K-align has the minimum computation time. PMID:26405947

  8. [Tabular excel editor for analysis of aligned nucleotide sequences].

    PubMed

    Demkin, V V

    2010-01-01

    The Excel platform was used to convert the results of multiple nucleotide sequence alignments obtained from the BLAST network service into a form suitable for visual analysis and editing. Two macros for MS Excel 2007 were constructed. The array of aligned sequences, transformed into an Excel table and processed using these macros, is more convenient for analysis than the initial HTML data.

  9. Probabilistic sequence alignment of stratigraphic records

    NASA Astrophysics Data System (ADS)

    Lin, Luan; Khider, Deborah; Lisiecki, Lorraine E.; Lawrence, Charles E.

    2014-10-01

    The assessment of age uncertainty in stratigraphically aligned records is a pressing need in paleoceanographic research. The alignment of ocean sediment cores is used to develop mutually consistent age models for climate proxies and is often based on the δ18O of calcite from benthic foraminifera, which records a global ice volume and deep water temperature signal. To date, δ18O alignment has been performed by manual, qualitative comparison or by deterministic algorithms. Here we present a hidden Markov model (HMM) probabilistic algorithm to find 95% confidence bands for δ18O alignment. This model considers the probability of every possible alignment based on its fit to the δ18O data and transition probabilities for sedimentation rate changes obtained from radiocarbon-based estimates for 37 cores. Uncertainty is assessed using a stochastic back trace recursion to sample alignments in exact proportion to their probability. We applied the algorithm to align 35 late Pleistocene records to a global benthic δ18O stack and found that the mean width of 95% confidence intervals varies between 3 and 23 kyr depending on the resolution and noisiness of the record's δ18O signal. Confidence bands within individual cores also vary greatly, ranging from ~0 to >40 kyr. These alignment uncertainty estimates will allow researchers to examine the robustness of their conclusions, including the statistical evaluation of lead-lag relationships between events observed in different cores.

  10. ProbCons: Probabilistic consistency-based multiple sequence alignment.

    PubMed

    Do, Chuong B; Mahabhashyam, Mahathi S P; Brudno, Michael; Batzoglou, Serafim

    2005-02-01

    To study gene evolution across a wide range of organisms, biologists need accurate tools for multiple sequence alignment of protein families. Obtaining accurate alignments, however, is a difficult computational problem because of not only the high computational cost but also the lack of proper objective functions for measuring alignment quality. In this paper, we introduce probabilistic consistency, a novel scoring function for multiple sequence comparisons. We present ProbCons, a practical tool for progressive protein multiple sequence alignment based on probabilistic consistency, and evaluate its performance on several standard alignment benchmark data sets. On the BAliBASE, SABmark, and PREFAB benchmark alignment databases, ProbCons achieves statistically significant improvement over other leading methods while maintaining practical speed. ProbCons is publicly available as a Web resource.

  11. An enhanced algorithm for multiple sequence alignment of protein sequences using genetic algorithm

    PubMed Central

    Kumar, Manish

    2015-01-01

    One of the most fundamental operations in biological sequence analysis is multiple sequence alignment (MSA). The basic problem of multiple sequence alignment is to determine the most biologically plausible alignments of protein or DNA sequences. In this paper, an alignment method using a genetic algorithm for multiple sequence alignment is proposed. Two genetic operators, crossover and mutation, were defined and implemented with the proposed method in order to follow the evolution of the population and the quality of the aligned sequences. The proposed method is assessed on a protein benchmark dataset, e.g., BAliBASE, by comparing the obtained results to those obtained with other alignment algorithms, e.g., SAGA, RBT-GA, PRRP, HMMT, SB-PIMA, CLUSTALX, CLUSTAL W, DIALIGN and PILEUP8. Experiments on a wide range of data have shown that the proposed algorithm is much better (in terms of score) than previously proposed algorithms in its ability to achieve high alignment quality. PMID:27065770

  12. A simple method to control over-alignment in the MAFFT multiple sequence alignment program

    PubMed Central

    Katoh, Kazutaka; Standley, Daron M.

    2016-01-01

    Motivation: We present a new feature of the MAFFT multiple alignment program for suppressing over-alignment (aligning unrelated segments). Conventional MAFFT is highly sensitive in aligning conserved regions in remote homologs, but the risk of over-alignment is recently becoming greater, as low-quality or noisy sequences are increasing in protein sequence databases, due, for example, to sequencing errors and difficulty in gene prediction. Results: The proposed method utilizes a variable scoring matrix for different pairs of sequences (or groups) in a single multiple sequence alignment, based on the global similarity of each pair. This method significantly increases the correctly gapped sites in real examples and in simulations under various conditions. Regarding sensitivity, the effect of the proposed method is slightly negative in real protein-based benchmarks, and mostly neutral in simulation-based benchmarks. This approach is based on natural biological reasoning and should be compatible with many methods based on dynamic programming for multiple sequence alignment. Availability and implementation: The new feature is available in MAFFT versions 7.263 and higher. http://mafft.cbrc.jp/alignment/software/ Contact: katoh@ifrec.osaka-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153688

  13. A novel randomized iterative strategy for aligning multiple protein sequences.

    PubMed

    Berger, M P; Munson, P J

    1991-10-01

    The rigorous alignment of multiple protein sequences becomes impractical even with a modest number of sequences, since computer memory and time requirements increase as the product of the lengths of the sequences. We have devised a strategy to approach such an optimal alignment, which modifies the intensive computer storage and time requirements of dynamic programming. Our algorithm randomly divides a group of unaligned sequences into two subgroups, between which an optimal alignment is then obtained by a Needleman-Wunsch style of algorithm. Our algorithm uses a matrix with dimensions corresponding to the lengths of the two aligned sequence subgroups. The pairwise alignment process is repeated using different random divisions of the whole group into two subgroups. Compared with the rigorous approach of solving the n-dimensional lattice by dynamic programming, our iterative algorithm results in alignments that match or are close to the optimal solution, on a limited set of test problems. We have implemented this algorithm in a computer program that runs on the IBM PC class of machines, together with a user-friendly environment for interactively selecting sequences or groups of sequences to be aligned either simultaneously or progressively.
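
    The core step named in this abstract, a Needleman-Wunsch style alignment over a matrix sized by the two inputs, can be sketched for the simplest case of two single sequences. The match, mismatch and gap values below are illustrative assumptions, and the group-to-group alignment and random subdivision used by the authors are not reproduced.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of two sequences with a linear gap penalty (assumed scores).

    Returns (score, aligned_a, aligned_b).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))
```

    The table has (len(a)+1) x (len(b)+1) cells, which is why extending the same idea rigorously to n sequences requires memory and time proportional to the product of all the lengths.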

  14. Uncertainty in homology inferences: Assessing and improving genomic sequence alignment

    PubMed Central

    Lunter, Gerton; Rocco, Andrea; Mimouni, Naila; Heger, Andreas; Caldeira, Alexandre; Hein, Jotun

    2008-01-01

    Sequence alignment underpins all of comparative genomics, yet it remains an incompletely solved problem. In particular, the statistical uncertainty within inferred alignments is often disregarded, while parametric or phylogenetic inferences are considered meaningless without confidence estimates. Here, we report on a theoretical and simulation study of pairwise alignments of genomic DNA at human–mouse divergence. We find that >15% of aligned bases are incorrect in existing whole-genome alignments, and we identify three types of alignment error, each leading to systematic biases in all algorithms considered. Careful modeling of the evolutionary process improves alignment quality; however, these improvements are modest compared with the remaining alignment errors, even with exact knowledge of the evolutionary model, emphasizing the need for statistical approaches to account for uncertainty. We develop a new algorithm, Marginalized Posterior Decoding (MPD), which explicitly accounts for uncertainties, is less biased and more accurate than other algorithms we consider, and reduces the proportion of misaligned bases by a third compared with the best existing algorithm. To our knowledge, this is the first nonheuristic algorithm for DNA sequence alignment to show robust improvements over the classic Needleman–Wunsch algorithm. Despite this, considerable uncertainty remains even in the improved alignments. We conclude that a probabilistic treatment is essential, both to improve alignment quality and to quantify the remaining uncertainty. This is becoming increasingly relevant with the growing appreciation of the importance of noncoding DNA, whose study relies heavily on alignments. Alignment errors are inevitable, and should be considered when drawing conclusions from alignments. Software and alignments to assist researchers in doing this are provided at http://genserv.anat.ox.ac.uk/grape/. PMID:18073381

  15. Refinement by shifting secondary structure elements improves sequence alignments.

    PubMed

    Tong, Jing; Pei, Jimin; Otwinowski, Zbyszek; Grishin, Nick V

    2015-03-01

    Constructing a model of a query protein based on its alignment to a homolog with experimentally determined spatial structure (the template) is still the most reliable approach to structure prediction. Alignment errors are the main bottleneck for homology modeling when the query is distantly related to the template. Alignment methods often misalign secondary structural elements by a few residues. Therefore, better alignment solutions can be found within a limited set of local shifts of secondary structures. We present a refinement method to improve pairwise sequence alignments by evaluating alignment variants generated by local shifts of template-defined secondary structures. Our method SFESA is based on a novel scoring function that combines the profile-based sequence score and the structure score derived from residue contacts in a template. Such a combined score frequently selects a better alignment variant among a set of candidate alignments generated by local shifts and leads to overall increase in alignment accuracy. Evaluation of several benchmarks shows that our refinement method significantly improves alignments made by automatic methods such as PROMALS, HHpred and CNFpred. The web server is available at http://prodata.swmed.edu/sfesa. PMID:25546158

  16. Refinement by shifting secondary structure elements improves sequence alignments

    PubMed Central

    Tong, Jing; Pei, Jimin; Otwinowski, Zbyszek; Grishin, Nick V.

    2015-01-01

    Constructing a model of a query protein based on its alignment to a homolog with experimentally determined spatial structure (the template) is still the most reliable approach to structure prediction. Alignment errors are the main bottleneck for homology modeling when the query is distantly related to the template. Alignment methods often misalign secondary structural elements by a few residues. Therefore, better alignment solutions can be found within a limited set of local shifts of secondary structures. We present a refinement method to improve pairwise sequence alignments by evaluating alignment variants generated by local shifts of template-defined secondary structures. Our method SFESA is based on a novel scoring function that combines the profile-based sequence score and the structure score derived from residue contacts in a template. Such a combined score frequently selects a better alignment variant among a set of candidate alignments generated by local shifts and leads to overall increase in alignment accuracy. Evaluation of several benchmarks shows that our refinement method significantly improves alignments made by automatic methods such as PROMALS, HHpred and CNFpred. The web server is available at http://prodata.swmed.edu/sfesa. PMID:25546158

  17. Mercury BLASTP: Accelerating Protein Sequence Alignment

    PubMed Central

    Jacob, Arpith; Lancaster, Joseph; Buhler, Jeremy; Harris, Brandon; Chamberlain, Roger D.

    2008-01-01

    Large-scale protein sequence comparison is an important but compute-intensive task in molecular biology. BLASTP is the most popular tool for comparative analysis of protein sequences. In recent years, an exponential increase in the size of protein sequence databases has required either exponentially more running time or a cluster of machines to keep pace. To address this problem, we have designed and built a high-performance FPGA-accelerated version of BLASTP, Mercury BLASTP. In this paper, we describe the architecture of the portions of the application that are accelerated in the FPGA, and we also describe the integration of these FPGA-accelerated portions with the existing BLASTP software. We have implemented Mercury BLASTP on a commodity workstation with two Xilinx Virtex-II 6000 FPGAs. We show that the new design runs 11-15 times faster than software BLASTP on a modern CPU while delivering close to 99% identical results. PMID:19492068

  18. Evaluating the Accuracy and Efficiency of Multiple Sequence Alignment Methods

    PubMed Central

    Pervez, Muhammad Tariq; Babar, Masroor Ellahi; Nadeem, Asif; Aslam, Muhammad; Awan, Ali Raza; Aslam, Naeem; Hussain, Tanveer; Naveed, Nasir; Qadri, Salman; Waheed, Usman; Shoaib, Muhammad

    2014-01-01

    A comparison of the 10 most popular Multiple Sequence Alignment (MSA) tools, namely, MUSCLE, MAFFT (L-INS-i), MAFFT (FFT-NS-2), T-Coffee, ProbCons, SATe, Clustal Omega, Kalign, Multalin, and Dialign-TX, is presented. We also focused on the significance of some implementations embedded in the algorithm of each tool. Based on 10 simulated trees with different numbers of taxa generated by R, 400 known alignments and sequence files were constructed using indel-Seq-Gen. A total of 4000 test alignments were generated to study the effect of sequence length, indel size, deletion rate, and insertion rate. Results showed that alignment quality was highly dependent on the number of deletions and insertions in the sequences and that the sequence length and indel size had a weaker effect. Overall, ProbCons was consistently at the top of the list of the evaluated MSA tools. SATe, being a little less accurate, was 529.10% faster than ProbCons and 236.72% faster than MAFFT (L-INS-i). Among other tools, Kalign and MUSCLE achieved the highest sum of pairs. We also considered BAliBASE benchmark datasets, and the results relative to BAliBASE- and indel-Seq-Gen-generated alignments were consistent in most cases. PMID:25574120
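
    The sum-of-pairs measure mentioned in this abstract is commonly defined as the fraction of residue pairs aligned in a reference alignment that are also aligned in the test alignment. The sketch below follows that common definition; whether the study used exactly this variant is an assumption, and the function names are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Set of residue pairs sharing a column; residues are (sequence index, residue index)."""
    pairs = set()
    counters = [0] * len(alignment)  # next ungapped residue index for each sequence
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference) -> float:
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref = aligned_pairs(reference)
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(test)) / len(ref)
```

    For example, against the reference `["AC-G", "ACTG"]`, the test alignment `["ACG-", "ACTG"]` recovers two of the three reference pairs.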

  19. Evaluating the accuracy and efficiency of multiple sequence alignment methods.

    PubMed

    Pervez, Muhammad Tariq; Babar, Masroor Ellahi; Nadeem, Asif; Aslam, Muhammad; Awan, Ali Raza; Aslam, Naeem; Hussain, Tanveer; Naveed, Nasir; Qadri, Salman; Waheed, Usman; Shoaib, Muhammad

    2014-01-01

    A comparison of the 10 most popular Multiple Sequence Alignment (MSA) tools, namely, MUSCLE, MAFFT (L-INS-i), MAFFT (FFT-NS-2), T-Coffee, ProbCons, SATe, Clustal Omega, Kalign, Multalin, and Dialign-TX, is presented. We also focused on the significance of some implementations embedded in the algorithm of each tool. Based on 10 simulated trees with different numbers of taxa generated by R, 400 known alignments and sequence files were constructed using indel-Seq-Gen. A total of 4000 test alignments were generated to study the effect of sequence length, indel size, deletion rate, and insertion rate. Results showed that alignment quality was highly dependent on the number of deletions and insertions in the sequences and that the sequence length and indel size had a weaker effect. Overall, ProbCons was consistently at the top of the list of the evaluated MSA tools. SATe, being a little less accurate, was 529.10% faster than ProbCons and 236.72% faster than MAFFT (L-INS-i). Among other tools, Kalign and MUSCLE achieved the highest sum of pairs. We also considered BAliBASE benchmark datasets, and the results relative to BAliBASE- and indel-Seq-Gen-generated alignments were consistent in most cases.

  20. Protein folds and families: sequence and structure alignments.

    PubMed

    Holm, L; Sander, C

    1999-01-01

    Dali and HSSP are derived databases organizing protein space in the structurally known regions. We use an automatic structure alignment program (Dali) for the classification of all known 3D structures based on all-against-all comparison of 3D structures in the Protein Data Bank. The HSSP database associates 1D sequences with known 3D structures using a position-weighted dynamic programming method for sequence profile alignment (MaxHom). As a result, the HSSP database not only provides aligned sequence families, but also implies secondary and tertiary structures covering 36% of all sequences in Swiss-Prot. The structure classification by Dali and the sequence families in HSSP can be browsed jointly from a web interface providing a rich network of links between neighbours in fold space, between domains and proteins, and between structures and sequences. In particular, this results in a database of explicit multiple alignments of protein families in the twilight zone of sequence similarity. The organization of protein structures and families provides a map of the currently known regions of the protein universe that is useful for the analysis of folding principles, for the evolutionary unification of protein families and for maximizing the information return from experimental structure determination. The databases are available from http://www.embl-ebi.ac.uk/dali/

  1. Recursive dynamic programming for adaptive sequence and structure alignment

    SciTech Connect

    Thiele, R.; Zimmer, R.; Lengauer, T.

    1995-12-31

    We propose a new alignment procedure that is capable of aligning protein sequences and structures in a unified manner. Recursive dynamic programming (RDP) is a hierarchical method which, on each level of the hierarchy, identifies locally optimal solutions and assembles them into partial alignments of sequences and/or structures. In contrast to classical dynamic programming, RDP can also handle alignment problems that use objective functions not obeying the principle of prefix optimality, e.g. scoring schemes derived from energy potentials of mean force. For such alignment problems, RDP aims at computing solutions that are near-optimal with respect to the involved cost function and biologically meaningful at the same time. Towards this goal, RDP maintains a dynamic balance between different factors governing alignment fitness such as evolutionary relationships and structural preferences. As in the RDP method gaps are not scored explicitly, the problematic assignment of gap cost parameters is circumvented. In order to evaluate the RDP approach we analyse whether known and accepted multiple alignments based on structural information can be reproduced with the RDP method.

  2. Image-based temporal alignment of echocardiographic sequences

    NASA Astrophysics Data System (ADS)

    Danudibroto, Adriyana; Bersvendsen, Jørn; Mirea, Oana; Gerard, Olivier; D'hooge, Jan; Samset, Eigil

    2016-04-01

    Temporal alignment of echocardiographic sequences enables fair comparisons of multiple cardiac sequences by showing corresponding frames at given time points in the cardiac cycle. It is also essential for spatial registration of echo volumes where several acquisitions are combined to enhance image quality or to form a larger field of view. In this study, three different image-based temporal alignment methods were investigated. First, a method based on dynamic time warping (DTW). Second, a spline-based method that optimized the similarity between temporal characteristic curves of the cardiac cycle using 1D cubic B-spline interpolation. Third, a method based on the spline-based method with piecewise modification. These methods were tested on in-vivo data sets of 19 echo sequences. For each sequence, the mitral valve opening (MVO) time was manually annotated. The results showed that the average MVO timing error for all methods is well under the time resolution of the sequences.
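
    Dynamic time warping, the first method listed, can be sketched on two 1D curves. The assumption here is that each echo sequence has already been reduced to one scalar value per frame; how the study derived its temporal curves is not stated in the abstract, and the absolute-difference cost is an illustrative choice.

```python
def dtw_distance(x, y) -> float:
    """Classic dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(x), len(y)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])  # local mismatch between two frames
            cost[i][j] = d + min(cost[i - 1][j],      # frame of x repeated
                                 cost[i][j - 1],      # frame of y repeated
                                 cost[i - 1][j - 1])  # both advance together
    return cost[n][m]
```

    A curve and a time-stretched copy of it have zero DTW cost, which is the property that makes the method suitable for comparing cardiac cycles of different durations.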

  3. HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis

    PubMed Central

    Santana-Quintero, Luis; Dingerdissen, Hayley; Thierry-Mieg, Jean; Mazumder, Raja; Simonyan, Vahan

    2014-01-01

    Due to the size of Next-Generation Sequencing data, the computational challenge of sequence alignment has been vast. Inexact alignments can take up to 90% of total CPU time in bioinformatics pipelines. High-performance Integrated Virtual Environment (HIVE), a cloud-based environment optimized for storage and analysis of extra-large data, presents an algorithmic solution: the HIVE-hexagon DNA sequence aligner. HIVE-hexagon implements novel approaches to exploit both characteristics of sequence space and CPU, RAM and Input/Output (I/O) architecture to quickly compute accurate alignments. Key components of HIVE-hexagon include non-redundification and sorting of sequences; floating diagonals of linearized dynamic programming matrices; and consideration of cross-similarity to minimize computations. Availability: https://hive.biochemistry.gwu.edu/hive/ PMID:24918764

  4. FastaValidator: an open-source Java library to parse and validate FASTA formatted sequences

    PubMed Central

    2014-01-01

    Background: Advances in sequencing technologies challenge the efficient importing and validation of FASTA formatted sequence data, which is still a prerequisite for most bioinformatic tools and pipelines. Comparative analysis of commonly used Bio*-frameworks (BioPerl, BioJava and Biopython) shows that their scalability and accuracy are hampered. Findings: FastaValidator represents a platform-independent, standardized, light-weight software library written in the Java programming language. It targets computer scientists and bioinformaticians writing software which needs to parse large amounts of sequence data quickly and accurately. For end-users FastaValidator includes an interactive out-of-the-box validation of FASTA formatted files, as well as a non-interactive mode designed for high-throughput validation in software pipelines. Conclusions: The accuracy and performance of the FastaValidator library qualifies it for large data sets such as those commonly produced by massive parallel (NGS) technologies. It offers scientists a fast, accurate and standardized method for parsing and validating FASTA formatted sequence data. PMID:24929426
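
    To show what FASTA validation involves, here is a minimal single-pass sketch. It is not the FastaValidator API (that library is written in Java and its interface is not described in the abstract); the function name, the default nucleotide alphabet and the error messages are all assumptions made for illustration.

```python
def validate_fasta(text: str, alphabet: str = "ACGTN") -> list:
    """Return a list of error messages; an empty list means the input passed."""
    allowed = set(alphabet.upper())
    errors = []
    in_record = False      # have we seen at least one header?
    has_sequence = False   # does the current record have sequence data yet?
    header_line = 0
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are tolerated in this sketch
        if line.startswith(">"):
            if in_record and not has_sequence:
                errors.append(f"line {header_line}: header without sequence")
            if len(line) == 1:
                errors.append(f"line {lineno}: empty header")
            in_record, has_sequence, header_line = True, False, lineno
        elif not in_record:
            errors.append(f"line {lineno}: sequence data before first header")
        else:
            bad = sorted(set(line.upper()) - allowed)
            if bad:
                errors.append(f"line {lineno}: invalid characters {''.join(bad)}")
            has_sequence = True
    if in_record and not has_sequence:
        errors.append(f"line {header_line}: header without sequence")
    return errors
```

    Processing one line at a time keeps memory use constant, which matters for the large files the abstract has in mind; a production validator would read from a stream instead of a string.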

  5. The impact of single substitutions on multiple sequence alignments.

    PubMed

    Klaere, Steffen; Gesell, Tanja; von Haeseler, Arndt

    2008-12-27

    We introduce another view of sequence evolution. Contrary to other approaches, we model the substitution process in two steps. First we assume (arbitrary) scaled branch lengths on a given phylogenetic tree. Second we allocate a Poisson distributed number of substitutions on the branches. The probability to place a mutation on a branch is proportional to its relative branch length. More importantly, the action of a single mutation on an alignment column is described by a doubly stochastic matrix, the so-called one-step mutation matrix. This matrix leads to analytical formulae for the posterior probability distribution of the number of substitutions for an alignment column.
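
    The two-step process described in this abstract (draw a Poisson number of substitutions, then place each on a branch with probability proportional to its length) can be sketched as a small simulation. The branch names, the mean parameter and the use of Knuth's multiplication method for Poisson sampling are assumptions for illustration; the one-step mutation matrix itself is not modelled.

```python
import math
import random


def sample_poisson(mean: float, rng: random.Random) -> int:
    """Draw from a Poisson distribution with Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def place_substitutions(branch_lengths: dict, mean_substitutions: float,
                        rng: random.Random) -> dict:
    """Allocate a Poisson number of substitutions to branches, weighted by length."""
    names = list(branch_lengths)
    weights = [branch_lengths[name] for name in names]
    total = sample_poisson(mean_substitutions, rng)
    counts = {name: 0 for name in names}
    # Each substitution independently picks a branch in proportion to its length.
    for name in rng.choices(names, weights=weights, k=total):
        counts[name] += 1
    return counts
```

    Passing a seeded `random.Random` instance makes runs reproducible, for example `place_substitutions({"a": 1.0, "b": 3.0}, 10, random.Random(0))`.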

  6. Sequence Alignment Tools: One Parallel Pattern to Rule Them All?

    PubMed Central

    2014-01-01

    In this paper, we advocate high-level programming methodology for next generation sequencers (NGS) alignment tools for both productivity and absolute performance. We analyse the problem of parallel alignment and review the parallelisation strategies of the most popular alignment tools, which can all be abstracted to a single parallel paradigm. We compare these tools to their porting onto the FastFlow pattern-based programming framework, which provides programmers with high-level parallel patterns. By using a high-level approach, programmers are liberated from all complex aspects of parallel programming, such as synchronisation protocols, and task scheduling, gaining more possibility for seamless performance tuning. In this work, we show some use cases in which, by using a high-level approach for parallelising NGS tools, it is possible to obtain comparable or even better absolute performance for all used datasets. PMID:25147803

  7. FootPrinter3: phylogenetic footprinting in partially alignable sequences.

    PubMed

    Fang, Fei; Blanchette, Mathieu

    2006-07-01

    FootPrinter3 is a web server for predicting transcription factor binding sites by using phylogenetic footprinting. Until now, phylogenetic footprinting approaches have been based either on multiple alignment analysis (e.g. PhyloVista, PhastCons), or on motif-discovery algorithms (e.g. FootPrinter2). FootPrinter3 integrates these two approaches, making use of local multiple sequence alignment blocks when those are available and reliable, but also allowing finding motifs in unalignable regions. The result is a set of predictions that joins the advantages of alignment-based methods (good specificity) to those of motif-based methods (good sensitivity, even in the presence of highly diverged species). FootPrinter3 is thus a tool of choice to exploit the wealth of vertebrate genomes being sequenced, as it allows taking full advantage of the sequences of highly diverged species (e.g. chicken, zebrafish), as well as those of more closely related species (e.g. mammals). The FootPrinter3 web server is available at: http://www.mcb.mcgill.ca/~blanchem/FootPrinter3.

  8. Exploring Dance Movement Data Using Sequence Alignment Methods

    PubMed Central

    Chavoshi, Seyed Hossein; De Baets, Bernard; Neutens, Tijs; De Tré, Guy; Van de Weghe, Nico

    2015-01-01

    Despite the abundance of research on knowledge discovery from moving object databases, only a limited number of studies have examined the interaction between moving point objects in space over time. This paper describes a novel approach for measuring similarity in the interaction between moving objects. The proposed approach consists of three steps. First, we transform movement data into sequences of successive qualitative relations based on the Qualitative Trajectory Calculus (QTC). Second, sequence alignment methods are applied to measure the similarity between movement sequences. Finally, movement sequences are grouped based on similarity by means of an agglomerative hierarchical clustering method. The applicability of this approach is tested using movement data from samba and tango dancers. PMID:26181435

  9. NanoOK: multi-reference alignment analysis of nanopore sequencing data, quality and error profiles

    PubMed Central

    Leggett, Richard M.; Heavens, Darren; Caccamo, Mario; Clark, Matthew D.; Davey, Robert P.

    2016-01-01

    Motivation: The Oxford Nanopore MinION sequencer, currently in pre-release testing through the MinION Access Programme (MAP), promises long reads in real-time from an inexpensive, compact, USB device. Tools have been released to extract FASTA/Q from the MinION base calling output and to provide basic yield statistics. However, no single tool yet exists to provide comprehensive alignment-based quality control and error profile analysis—something that is extremely important given the speed with which the platform is evolving. Results: NanoOK generates detailed tabular and graphical output plus an in-depth multi-page PDF report including error profile, quality and yield data. NanoOK is multi-reference, enabling detailed analysis of metagenomic or multiplexed samples. Four popular Nanopore aligners are supported and it is easily extensible to include others. Availability and implementation: NanoOK is an open-source software, implemented in Java with supporting R scripts. It has been tested on Linux and Mac OS X and can be downloaded from https://github.com/TGAC/NanoOK. A VirtualBox VM containing all dependencies and the DH10B read set used in this article is available from http://opendata.tgac.ac.uk/nanook/. A Docker image is also available from Docker Hub—see program documentation https://documentation.tgac.ac.uk/display/NANOOK. Contact: richard.leggett@tgac.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26382197

  10. MACSIMS : multiple alignment of complete sequences information management system

    PubMed Central

    Thompson, Julie D; Muller, Arnaud; Waterhouse, Andrew; Procter, Jim; Barton, Geoffrey J; Plewniak, Frédéric; Poch, Olivier

    2006-01-01

    Background: In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. Results: MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences with these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIM results, while a graphical display using the JalView program allows manual analysis. Conclusion: MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at . PMID:16792820

  11. Extracting protein alignment models from the sequence database.

    PubMed Central

    Neuwald, A F; Liu, J S; Lipman, D J; Lawrence, C E

    1997-01-01

    Biologists often gain structural and functional insights into a protein sequence by constructing a multiple alignment model of the family. Here a program called Probe fully automates this process of model construction starting from a single sequence. Central to this program is a powerful new method to locate and align only those, often subtly, conserved patterns essential to the family as a whole. When applied to randomly chosen proteins, Probe found on average about four times as many relationships as a pairwise search and yielded many new discoveries. These include: an obscure subfamily of globins in the roundworm Caenorhabditis elegans; two new superfamilies of metallohydrolases; a lipoyl/biotin swinging arm domain in bacterial membrane fusion proteins; and a DH domain in the yeast Bud3 and Fus2 proteins. By identifying distant relationships and merging families into superfamilies in this way, this analysis further confirms the notion that proteins evolved from relatively few ancient sequences. Moreover, this method automatically generates models of these ancient conserved regions for rapid and sensitive screening of sequences. PMID:9108146

  12. Genome-wide synteny through highly sensitive sequence alignment: Satsuma

    PubMed Central

    Grabherr, Manfred G.; Russell, Pamela; Meyer, Miriah; Mauceli, Evan; Alföldi, Jessica; Di Palma, Federica; Lindblad-Toh, Kerstin

    2010-01-01

    Motivation: Comparative genomics heavily relies on alignments of large and often complex DNA sequences. From an engineering perspective, the problem here is to provide maximum sensitivity (to find all there is to find), specificity (to only find real homology) and speed (to accommodate the billions of base pairs of vertebrate genomes). Results: Satsuma addresses all three issues through novel strategies: (i) cross-correlation, implemented via fast Fourier transform; (ii) a match scoring scheme that eliminates almost all false hits; and (iii) an asynchronous ‘battleship’-like search that allows for aligning two entire fish genomes (470 and 217 Mb) in 120 CPU hours using 15 processors on a single machine. Availability: Satsuma is part of the Spines software package, implemented in C++ on Linux. The latest version of Spines can be freely downloaded under the LGPL license from http://www.broadinstitute.org/science/programs/genome-biology/spines/ Contact: grabherr@broadinstitute.org PMID:20208069
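
    The abstract does not show Satsuma's code, so what follows is only a minimal Python sketch of the general idea behind strategy (i): counting matching bases at every relative offset of two sequences with FFT-based cross-correlation. The function names (fft, match_counts, best_shift), the one-indicator-vector-per-base encoding and the textbook radix-2 FFT are illustrative assumptions, not Satsuma's implementation.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two.

    The inverse transform is NOT normalised here; the caller divides by n.
    """
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def match_counts(target, query):
    """Number of identical bases for every circular shift of query along target."""
    size = 1
    while size < len(target) + len(query):  # zero-pad so shifts do not wrap into each other
        size *= 2
    total = [0.0] * size
    for base in "ACGT":
        a = [1.0 if ch == base else 0.0 for ch in target] + [0.0] * (size - len(target))
        b = [1.0 if ch == base else 0.0 for ch in query] + [0.0] * (size - len(query))
        fa, fb = fft(a), fft(b)
        # Cross-correlation theorem: corr = IFFT(FFT(a) * conj(FFT(b)))
        corr = fft([x * y.conjugate() for x, y in zip(fa, fb)], inverse=True)
        for s in range(size):
            total[s] += corr[s].real / size
    return [round(v) for v in total]


def best_shift(target, query):
    """Return (shift, matches) for the offset of query on target with most matches."""
    counts = match_counts(target, query)
    size = len(counts)
    s = max(range(size), key=lambda i: counts[i])
    shift = s if s < len(target) else s - size  # high indices encode negative shifts
    return shift, counts[s]
```

    For a target that contains the query at offset 4, best_shift("TTTTACGTACGGATTT", "ACGTACGGA") reports a shift of 4 with 9 matching bases.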

  13. Multiple sequence alignment with the Clustal series of programs.

    PubMed

    Chenna, Ramu; Sugawara, Hideaki; Koike, Tadashi; Lopez, Rodrigo; Gibson, Toby J; Higgins, Desmond G; Thompson, Julie D

    2003-07-01

    The Clustal series of programs are widely used in molecular biology for the multiple alignment of both nucleic acid and protein sequences and for preparing phylogenetic trees. The popularity of the programs depends on a number of factors, including not only the accuracy of the results, but also the robustness, portability and user-friendliness of the programs. New features include NEXUS and FASTA format output, printing range numbers and faster tree calculation. Although Clustal was originally developed to run on a local computer, numerous Web servers have been set up, notably at the EBI (European Bioinformatics Institute) (http://www.ebi.ac.uk/clustalw/).

  14. Implied alignment: a synapomorphy-based multiple-sequence alignment method and its use in cladogram search

    NASA Technical Reports Server (NTRS)

    Wheeler, Ward C.

    2003-01-01

    A method to align sequence data based on parsimonious synapomorphy schemes generated by direct optimization (DO; earlier termed optimization alignment) is proposed. DO directly diagnoses sequence data on cladograms without an intervening multiple-alignment step, thereby creating topology-specific, dynamic homology statements. Hence, no multiple-alignment is required to generate cladograms. Unlike general and globally optimal multiple-alignment procedures, the method described here, implied alignment (IA), takes these dynamic homologies and traces them back through a single cladogram, linking the unaligned sequence positions in the terminal taxa via DO transformation series. These "lines of correspondence" link ancestor-descendent states and, when displayed as linearly arrayed columns without hypothetical ancestors, are largely indistinguishable from standard multiple alignment. Since this method is based on synapomorphy, the treatment of certain classes of insertion-deletion (indel) events may be different from that of other alignment procedures. As with all alignment methods, results are dependent on parameter assumptions such as indel cost and transversion:transition ratios. Such an IA could be used as a basis for phylogenetic search, but this would be questionable since the homologies derived from the implied alignment depend on its natal cladogram and any variance, between DO and IA + Search, due to heuristic approach. The utility of this procedure in heuristic cladogram searches using DO and the improvement of heuristic cladogram cost calculations are discussed. © 2003 The Willi Hennig Society. Published by Elsevier Science (USA). All rights reserved.

  15. DNA sequence alignment by microhomology sampling during homologous recombination

    PubMed Central

    Qi, Zhi; Redding, Sy; Lee, Ja Yil; Gibb, Bryan; Kwon, YoungHo; Niu, Hengyao; Gaines, William A.; Sung, Patrick

    2015-01-01

    Summary Homologous recombination (HR) mediates the exchange of genetic information between sister or homologous chromatids. During HR, members of the RecA/Rad51 family of recombinases must somehow search through vast quantities of DNA sequence to align and pair ssDNA with a homologous dsDNA template. Here we use single-molecule imaging to visualize Rad51 as it aligns and pairs homologous DNA sequences in real-time. We show that Rad51 uses a length-based recognition mechanism while interrogating dsDNA, enabling robust kinetic selection of 8-nucleotide (nt) tracts of microhomology, which kinetically confines the search to sites with a high probability of being a homologous target. Successful pairing with a 9th nucleotide coincides with an additional reduction in binding free energy and subsequent strand exchange occurs in precise 3-nt steps, reflecting the base triplet organization of the presynaptic complex. These findings provide crucial new insights into the physical and evolutionary underpinnings of DNA recombination. PMID:25684365

  16. MSA-PAD: DNA multiple sequence alignment framework based on PFAM accessed domain information.

    PubMed

    Balech, Bachir; Vicario, Saverio; Donvito, Giacinto; Monaco, Alfonso; Notarangelo, Pasquale; Pesole, Graziano

    2015-08-01

    Here we present the MSA-PAD application, a DNA multiple sequence alignment framework that uses PFAM protein domain information to align DNA sequences encoding either single or multiple protein domains. MSA-PAD has two alignment options: gene and genome mode. PMID:25819080

  17. Using reconfigurable hardware to accelerate multiple sequence alignment with ClustalW.

    PubMed

    Oliver, Tim; Schmidt, Bertil; Nathan, Darran; Clemens, Ralf; Maskell, Douglas

    2005-08-15

    Aligning hundreds of sequences using progressive alignment tools such as ClustalW requires several hours on state-of-the-art workstations. We present a new approach to compute multiple sequence alignments in far shorter time using reconfigurable hardware. This results in an implementation of ClustalW with significant runtime savings on a standard off-the-shelf FPGA.

  18. AlignerBoost: A Generalized Software Toolkit for Boosting Next-Gen Sequencing Mapping Accuracy Using a Bayesian-Based Mapping Quality Framework

    PubMed Central

    Zheng, Qi; Grice, Elizabeth A.

    2016-01-01

    Accurate mapping of next-generation sequencing (NGS) reads to reference genomes is crucial for almost all NGS applications and downstream analyses. Various repetitive elements in human and other higher eukaryotic genomes contribute in large part to ambiguously (non-uniquely) mapped reads. Most available NGS aligners attempt to address this by either removing all non-uniquely mapping reads, or reporting one random or "best" hit based on simple heuristics. Accurate estimation of the mapping quality of NGS reads is therefore critical albeit completely lacking at present. Here we developed a generalized software toolkit "AlignerBoost", which utilizes a Bayesian-based framework to accurately estimate mapping quality of ambiguously mapped NGS reads. We tested AlignerBoost with both simulated and real DNA-seq and RNA-seq datasets at various thresholds. In most cases, but especially for reads falling within repetitive regions, AlignerBoost dramatically increases the mapping precision of modern NGS aligners without significantly compromising the sensitivity even without mapping quality filters. When using higher mapping quality cutoffs, AlignerBoost achieves a much lower false mapping rate while exhibiting comparable or higher sensitivity compared to the aligner default modes, therefore significantly boosting the detection power of NGS aligners even using extreme thresholds. AlignerBoost is also SNP-aware, and higher quality alignments can be achieved if provided with known SNPs. AlignerBoost’s algorithm is computationally efficient, and can process one million alignments within 30 seconds on a typical desktop computer. AlignerBoost is implemented as a uniform Java application and is freely available at https://github.com/Grice-Lab/AlignerBoost. PMID:27706155
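
    The abstract does not give AlignerBoost's model, so the sketch below only illustrates the general notion of a Bayesian mapping quality: each candidate hit of a read gets a posterior probability from its likelihood, and the posterior is reported on the Phred scale. The uniform prior over hits, the log-likelihood inputs, the cap of 60 and both function names are assumptions made for illustration.

```python
import math


def hit_posteriors(log_likelihoods):
    """Posterior probability of each candidate hit under a uniform prior (assumed)."""
    peak = max(log_likelihoods)  # subtract the maximum for numerical stability
    weights = [math.exp(ll - peak) for ll in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]


def phred_mapq(posterior, cap=60):
    """Phred-scaled mapping quality: -10 * log10(P(hit is wrong)), capped."""
    error = 1.0 - posterior
    if error <= 0.0:
        return cap
    return min(cap, int(round(-10.0 * math.log10(error))))
```

    Under these assumptions a read with two equally likely hits gets a posterior of 0.5 per hit and a quality of 3, while a read with a single candidate hit reaches the cap.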

  19. PROMALS3D web server for accurate multiple protein sequence and structure alignments.

    PubMed

    Pei, Jimin; Tang, Ming; Grishin, Nick V

    2008-07-01

    Multiple sequence alignments are essential in computational sequence and structural analysis, with applications in homology detection, structure modeling, function prediction and phylogenetic analysis. We report PROMALS3D web server for constructing alignments for multiple protein sequences and/or structures using information from available 3D structures, database homologs and predicted secondary structures. PROMALS3D shows higher alignment accuracy than a number of other advanced methods. Input of PROMALS3D web server can be FASTA format protein sequences, PDB format protein structures and/or user-defined alignment constraints. The output page provides alignments with several formats, including a colored alignment augmented with useful information about sequence grouping, predicted secondary structures and consensus sequences. Intermediate results of sequence and structural database searches are also available. The PROMALS3D web server is available at: http://prodata.swmed.edu/promals3d/. PMID:18503087

  20. Multiple sequence alignment based on combining genetic algorithm with chaotic sequences.

    PubMed

    Gao, C; Wang, B; Zhou, C J; Zhang, Q

    2016-01-01

    In bioinformatics, sequence alignment is one of the most common problems. Multiple sequence alignment is an NP (nondeterministic polynomial time) problem, which requires further study and exploration. The chaos optimization algorithm is a type of chaos theory, and a procedure for combining the genetic algorithm (GA), which uses ergodicity, and inherent randomness of chaotic iteration. It is an efficient method to solve the basic premature phenomenon of the GA. Applying the Logistic map to the GA and using chaotic sequences to carry out the chaotic perturbation can improve the convergence of the basic GA. In addition, the random tournament selection and optimal preservation strategy are used in the GA. Experimental evidence indicates good results for this process. PMID:27420977
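
    As a rough illustration of the chaotic perturbation described above, the Python sketch below generates a chaotic sequence with the Logistic map x(n+1) = mu * x(n) * (1 - x(n)) and uses two of its values to move one gap in an aligned row. The abstract does not specify the alignment encoding, the value of mu, or the perturbation operator, so mu = 4.0, the gap-moving operator and both function names are assumptions.

```python
def logistic_sequence(x0, n, mu=4.0):
    """First n iterates of the Logistic map, starting from x0 in (0, 1)."""
    values = []
    x = x0
    for _ in range(n):
        x = mu * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_gap_move(row, chaos):
    """Move one gap of an aligned row; the gap and its new position are
    selected by two chaotic values in [0, 1] (hypothetical operator)."""
    gaps = [i for i, ch in enumerate(row) if ch == "-"]
    if not gaps:
        return row
    c1, c2 = chaos
    src = gaps[min(int(c1 * len(gaps)), len(gaps) - 1)]
    without = row[:src] + row[src + 1:]
    dst = min(int(c2 * (len(without) + 1)), len(without))
    return without[:dst] + "-" + without[dst:]
```

    The operator keeps the row length, the number of gaps and the residue order unchanged, so a perturbed individual remains a valid alignment row.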

  1. Multiple sequence alignment based on combining genetic algorithm with chaotic sequences.

    PubMed

    Gao, C; Wang, B; Zhou, C J; Zhang, Q

    2016-06-24

    In bioinformatics, sequence alignment is one of the most common problems. Multiple sequence alignment is an NP (nondeterministic polynomial time) problem, which requires further study and exploration. The chaos optimization algorithm is a type of chaos theory, and a procedure for combining the genetic algorithm (GA), which uses ergodicity, and inherent randomness of chaotic iteration. It is an efficient method to solve the basic premature phenomenon of the GA. Applying the Logistic map to the GA and using chaotic sequences to carry out the chaotic perturbation can improve the convergence of the basic GA. In addition, the random tournament selection and optimal preservation strategy are used in the GA. Experimental evidence indicates good results for this process.

  2. PROMALS3D: multiple protein sequence alignment enhanced with evolutionary and 3-dimensional structural information

    PubMed Central

    Pei, Jimin; Grishin, Nick V.

    2015-01-01

    SUMMARY Multiple sequence alignment (MSA) is an essential tool with many applications in bioinformatics and computational biology. Accurate MSA construction for divergent proteins remains a difficult computational task. The constantly increasing protein sequences and structures in public databases could be used to improve alignment quality. PROMALS3D is a tool for protein MSA construction enhanced with additional evolutionary and structural information from database searches. PROMALS3D automatically identifies homologs from sequence and structure databases for input proteins, derives structure-based constraints from alignments of 3-dimensional structures, and combines them with sequence-based constraints of profile-profile alignments in a consistency-based framework to construct high-quality multiple sequence alignments. PROMALS3D output is a consensus alignment enriched with sequence and structural information about input proteins and their homologs. PROMALS3D web server and package are available at http://prodata.swmed.edu/PROMALS3D. PMID:24170408

  3. PROMALS3D: multiple protein sequence alignment enhanced with evolutionary and three-dimensional structural information.

    PubMed

    Pei, Jimin; Grishin, Nick V

    2014-01-01

    Multiple sequence alignment (MSA) is an essential tool with many applications in bioinformatics and computational biology. Accurate MSA construction for divergent proteins remains a difficult computational task. The constantly increasing protein sequences and structures in public databases could be used to improve alignment quality. PROMALS3D is a tool for protein MSA construction enhanced with additional evolutionary and structural information from database searches. PROMALS3D automatically identifies homologs from sequence and structure databases for input proteins, derives structure-based constraints from alignments of three-dimensional structures, and combines them with sequence-based constraints of profile-profile alignments in a consistency-based framework to construct high-quality multiple sequence alignments. PROMALS3D output is a consensus alignment enriched with sequence and structural information about input proteins and their homologs. PROMALS3D Web server and package are available at http://prodata.swmed.edu/PROMALS3D. PMID:24170408

  4. PROMALS3D: multiple protein sequence alignment enhanced with evolutionary and three-dimensional structural information.

    PubMed

    Pei, Jimin; Grishin, Nick V

    2014-01-01

    Multiple sequence alignment (MSA) is an essential tool with many applications in bioinformatics and computational biology. Accurate MSA construction for divergent proteins remains a difficult computational task. The constantly increasing protein sequences and structures in public databases could be used to improve alignment quality. PROMALS3D is a tool for protein MSA construction enhanced with additional evolutionary and structural information from database searches. PROMALS3D automatically identifies homologs from sequence and structure databases for input proteins, derives structure-based constraints from alignments of three-dimensional structures, and combines them with sequence-based constraints of profile-profile alignments in a consistency-based framework to construct high-quality multiple sequence alignments. PROMALS3D output is a consensus alignment enriched with sequence and structural information about input proteins and their homologs. PROMALS3D Web server and package are available at http://prodata.swmed.edu/PROMALS3D.

  5. Fast single-pass alignment and variant calling using sequencing data

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Sequencing research requires efficient computation. Few programs use already known information about DNA variants when aligning sequence data to the reference map. New program findmap.f90 reads the previous variant list before aligning sequence, calling variant alleles, and summing the allele counts...

  6. Score distributions of gapped multiple sequence alignments down to the low-probability tail

    NASA Astrophysics Data System (ADS)

    Fieth, Pascal; Hartmann, Alexander K.

    2016-08-01

    Assessing the significance of alignment scores of optimally aligned DNA or amino acid sequences can be achieved via the knowledge of the score distribution of random sequences. But this requires obtaining the distribution in the biologically relevant high-scoring region, where the probabilities are exponentially small. For gapless local alignments of infinitely long sequences this distribution is known analytically to follow a Gumbel distribution. Distributions for gapped local alignments and global alignments of finite lengths can only be obtained numerically. To obtain results for the small-probability region, specific statistical mechanics-based rare-event algorithms can be applied. In previous studies, this was achieved for pairwise alignments. They showed that, contrary to results from previous simple sampling studies, strong deviations from the Gumbel distribution occur in case of finite sequence lengths. Here we extend the studies to multiple sequence alignments with gaps, which are much more relevant for practical applications in molecular biology. We study the distributions of scores over a large range of the support, reaching probabilities as small as 10^{-160}, for global and local (sum-of-pair scores) multiple alignments. We find that even after suitable rescaling, eliminating the sequence-length dependence, the distributions for multiple alignment differ from the pairwise alignment case. Furthermore, we also show that the previously discussed Gaussian correction to the Gumbel distribution needs to be refined, also for the case of pairwise alignments.

  7. Score distributions of gapped multiple sequence alignments down to the low-probability tail.

    PubMed

    Fieth, Pascal; Hartmann, Alexander K

    2016-08-01

    Assessing the significance of alignment scores of optimally aligned DNA or amino acid sequences can be achieved via the knowledge of the score distribution of random sequences. But this requires obtaining the distribution in the biologically relevant high-scoring region, where the probabilities are exponentially small. For gapless local alignments of infinitely long sequences this distribution is known analytically to follow a Gumbel distribution. Distributions for gapped local alignments and global alignments of finite lengths can only be obtained numerically. To obtain results for the small-probability region, specific statistical mechanics-based rare-event algorithms can be applied. In previous studies, this was achieved for pairwise alignments. They showed that, contrary to results from previous simple sampling studies, strong deviations from the Gumbel distribution occur in case of finite sequence lengths. Here we extend the studies to multiple sequence alignments with gaps, which are much more relevant for practical applications in molecular biology. We study the distributions of scores over a large range of the support, reaching probabilities as small as 10^{-160}, for global and local (sum-of-pair scores) multiple alignments. We find that even after suitable rescaling, eliminating the sequence-length dependence, the distributions for multiple alignment differ from the pairwise alignment case. Furthermore, we also show that the previously discussed Gaussian correction to the Gumbel distribution needs to be refined, also for the case of pairwise alignments. PMID:27627266
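
    For orientation, the Gumbel form referred to above is usually written, in Karlin-Altschul statistics for gapless local alignment of two random sequences of lengths m and n, as P(S >= s) ≈ 1 - exp(-K * m * n * exp(-lambda * s)). The Python sketch below evaluates that expression; the function name is hypothetical, and lambda and K depend on the scoring scheme and letter frequencies, so any values passed in are placeholders rather than parameters taken from the study.

```python
import math


def gumbel_pvalue(score, lam, k, m, n):
    """Tail probability P(S >= score) under the Gumbel approximation.

    lam and k are the Karlin-Altschul parameters (caller-supplied, assumed known);
    m and n are the two sequence lengths.
    """
    expected_hits = k * m * n * math.exp(-lam * score)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny,
    # which is exactly the low-probability tail of interest here.
    return -math.expm1(-expected_hits)
```

    In the far tail the result approaches the expected number of hits, K * m * n * exp(-lambda * s), which is why probabilities there shrink exponentially with the score.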

  8. B-MIC: An Ultrafast Three-Level Parallel Sequence Aligner Using MIC.

    PubMed

    Cui, Yingbo; Liao, Xiangke; Zhu, Xiaoqian; Wang, Bingqiang; Peng, Shaoliang

    2016-03-01

    Sequence alignment, the mapping of raw sequencing data to a reference genome, is the central process in sequence analysis. The large amount of data generated by NGS is far beyond the processing capabilities of existing alignment tools. Consequently, sequence alignment becomes the bottleneck of sequence analysis. Intensive computing power is required to address this challenge. Intel recently announced the MIC coprocessor, which can provide massive computing power. The Tianhe-2, now the world's fastest supercomputer, is equipped with three MIC coprocessors per compute node. A key feature of sequence alignment is that different reads are independent. Considering this property, we proposed a MIC-oriented three-level parallelization strategy to speed up BWA, a widely used sequence alignment tool, and developed our ultrafast parallel sequence aligner: B-MIC. B-MIC contains three levels of parallelization: firstly, parallelization of data IO and read alignment by a three-stage parallel pipeline; secondly, parallelization enabled by MIC coprocessor technology; thirdly, inter-node parallelization implemented by MPI. In this paper, we demonstrate that B-MIC outperforms BWA by a combination of those techniques using an Inspur NF5280M server and the Tianhe-2 supercomputer. To the best of our knowledge, B-MIC is the first sequence alignment tool to run on Intel MIC and it can achieve more than fivefold speedup over the original BWA while maintaining the alignment precision.

  9. Improvement in accuracy of multiple sequence alignment using novel group-to-group sequence alignment algorithm with piecewise linear gap cost

    PubMed Central

    Yamada, Shinsuke; Gotoh, Osamu; Yamana, Hayato

    2006-01-01

    Background Multiple sequence alignment (MSA) is a useful tool in bioinformatics. Although many MSA algorithms have been developed, there is still room for improvement in accuracy and speed. In the alignment of a family of protein sequences, global MSA algorithms perform better than local ones in many cases, while local ones perform better than global ones when some sequences have long insertions or deletions (indels) relative to others. Many recent leading MSA algorithms have incorporated pairwise alignment information obtained from a mixture of sources into their scoring system to improve accuracy of alignment containing long indels. Results We propose a novel group-to-group sequence alignment algorithm that uses a piecewise linear gap cost. We developed a program called PRIME, which employs our proposed algorithm to optimize the well-defined sum-of-pairs score. PRIME stands for Profile-based Randomized Iteration MEthod. We evaluated PRIME and some recent MSA programs using BAliBASE version 3.0 and PREFAB version 4.0 benchmarks. The results of benchmark tests showed that PRIME can construct accurate alignments comparable to the most accurate programs currently available, including L-INS-i of MAFFT, ProbCons, and T-Coffee. Conclusion PRIME enables users to construct accurate alignments without having to employ pairwise alignment information. PRIME is available at . PMID:17137519
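
    To make the term concrete: a concave piecewise linear gap cost charges long gaps less steeply than short ones, and one common way to write such a function is as the minimum over several affine pieces (opening cost + extension cost * length). The Python sketch below does exactly that; the function name and the two pieces (10.0, 1.0) and (16.0, 0.25) are hypothetical values chosen for illustration, not PRIME's parameters.

```python
def piecewise_linear_gap_cost(length, pieces=((10.0, 1.0), (16.0, 0.25))):
    """Cost of a gap of the given length as the minimum over affine pieces.

    Each piece is (open_cost, extend_cost). With a single piece this reduces
    to the ordinary affine gap cost; extra pieces with a higher opening cost
    and a lower slope make long indels cheaper than under the first piece alone.
    """
    if length <= 0:
        return 0.0
    return min(open_cost + extend_cost * length for open_cost, extend_cost in pieces)
```

    With these illustrative pieces the two lines cross at length 8: a gap of 4 costs 14.0 on the steep piece, while a gap of 20 costs 21.0 on the shallow piece instead of 30.0.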

  10. MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

    PubMed

    Katoh, Kazutaka; Standley, Daron M

    2013-04-01

    We report a major update of the MAFFT multiple sequence alignment program. This version has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update. This report shows actual examples to explain how these features work, alone and in combination. Some examples incorrectly aligned by MAFFT are also shown to clarify its limitations. We discuss how to avoid misalignments, and our ongoing efforts to overcome such limitations.

  11. Enhanced spatio-temporal alignment of plantar pressure image sequences using B-splines.

    PubMed

    Oliveira, Francisco P M; Tavares, João Manuel R S

    2013-03-01

    This article presents an enhanced methodology to align plantar pressure image sequences simultaneously in time and space. The temporal alignment of the sequences is accomplished using B-splines in the time modeling, and the spatial alignment can be attained using several geometric transformation models. The methodology was tested on a dataset of 156 real plantar pressure image sequences (3 sequences for each foot of the 26 subjects) that was acquired using a common commercial plate during barefoot walking. In the alignment of image sequences that were synthetically deformed both in time and space, an outstanding accuracy was achieved with the cubic B-splines. This accuracy was significantly better (p < 0.001) than the one obtained using the best solution proposed in our previous work. When applied to align real image sequences with unknown transformation involved, the alignment based on cubic B-splines also achieved better results than our previous methodology (p < 0.001). The consequences of the temporal alignment on the dynamic center of pressure (COP) displacement were also assessed by computing the intraclass correlation coefficients (ICC) before and after the temporal alignment of the three image sequence trials of each foot of the associated subject at six time instants. The results showed that, generally, the ICCs related to the medio-lateral COP displacement were greater when the sequences were temporally aligned than the ICCs of the original sequences. Based on the experimental findings, one can conclude that the cubic B-splines are a remarkable solution for the temporal alignment of plantar pressure image sequences. These findings also show that the temporal alignment can increase the consistency of the COP displacement on related acquired plantar pressure image sequences.

  12. Tcoffee@igs: A web server for computing, evaluating and combining multiple sequence alignments.

    PubMed

    Poirot, Olivier; O'Toole, Eamonn; Notredame, Cedric

    2003-07-01

    This paper presents Tcoffee@igs, a new server provided to the community by Hewlett Packard computers and the Centre National de la Recherche Scientifique. This server is a web-based tool dedicated to the computation, the evaluation and the combination of multiple sequence alignments. It uses the latest version of the T-Coffee package. Given a set of unaligned sequences, the server returns an evaluated multiple sequence alignment and the associated phylogenetic tree. This server also makes it possible to evaluate the local reliability of an existing alignment and to combine several alternative multiple alignments into a single new one. Tcoffee@igs can be used for aligning protein, RNA or DNA sequences. Datasets of up to 100 sequences (2000 residues long) can be processed. The server and its documentation are available from: http://igs-server.cnrs-mrs.fr/Tcoffee/.

  13. TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations.

    PubMed

    Abascal, Federico; Zardoya, Rafael; Telford, Maximilian J

    2010-07-01

    We present TranslatorX, a web server designed to align protein-coding nucleotide sequences based on their corresponding amino acid translations. Many comparisons between biological sequences (nucleic acids and proteins) involve the construction of multiple alignments. Alignments represent a statement regarding the homology between individual nucleotides or amino acids within homologous genes. As protein-coding DNA sequences evolve as triplets of nucleotides (codons) and it is known that sequence similarity degrades more rapidly at the DNA than at the amino acid level, alignments are generally more accurate when based on amino acids than on their corresponding nucleotides. TranslatorX novelties include: (i) use of all documented genetic codes and the possibility of assigning different genetic codes for each sequence; (ii) a battery of different multiple alignment programs; (iii) translation of ambiguous codons when possible; (iv) an innovative criterion to clean nucleotide alignments with GBlocks based on protein information; and (v) a rich output, including Jalview-powered graphical visualization of the alignments, codon-based alignments coloured according to the corresponding amino acids, measures of compositional bias and first, second and third codon position specific alignments. The TranslatorX server is freely available at http://translatorx.co.uk.
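
    The core step of any translation-guided nucleotide alignment can be sketched briefly: once the amino acid sequences are aligned, each residue is replaced by its original codon and each gap by three gap characters. The Python function below is a minimal illustration of that back-translation for a single sequence; its name is hypothetical, it assumes the nucleotide sequence is in frame and a whole number of codons, and it does not check that codons actually encode the residues. It is not TranslatorX's code.

```python
def codon_alignment(aligned_protein, nucleotides):
    """Thread an unaligned coding sequence onto its aligned protein row.

    aligned_protein: protein row with '-' for gaps.
    nucleotides: the in-frame coding sequence for that protein (no gaps).
    """
    if len(nucleotides) % 3 != 0:
        raise ValueError("nucleotide length is not a multiple of three")
    codons = [nucleotides[i:i + 3] for i in range(0, len(nucleotides), 3)]
    residues = aligned_protein.replace("-", "")
    if len(codons) != len(residues):
        raise ValueError("codon count does not match residue count")
    remaining = iter(codons)
    # Each residue consumes the next codon; each protein gap becomes a codon-sized gap.
    return "".join("---" if ch == "-" else next(remaining) for ch in aligned_protein)
```

    For example, codon_alignment("M-KL", "ATGAAACTG") yields "ATG---AAACTG", so gaps always fall between codons rather than inside them.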

  14. BarraCUDA - a fast short read sequence aligner using graphics processing units

    PubMed Central

    2012-01-01

    Background With the maturation of next-generation DNA sequencing (NGS) technologies, the throughput of DNA sequencing reads has soared to over 600 gigabases from a single instrument run. General purpose computing on graphics processing units (GPGPU) extracts the computing power from hundreds of parallel stream processors within graphics processing cores and provides a cost-effective and energy efficient alternative to traditional high-performance computing (HPC) clusters. In this article, we describe the implementation of BarraCUDA, a GPGPU sequence alignment software that is based on BWA, to accelerate the alignment of sequencing reads generated by these instruments to a reference DNA sequence. Findings Using the NVIDIA Compute Unified Device Architecture (CUDA) software development environment, we ported the most computationally intensive alignment component of BWA to GPU to take advantage of the massive parallelism. As a result, BarraCUDA offers an order-of-magnitude performance boost in alignment throughput when compared to a CPU core while delivering the same level of alignment fidelity. The software is also capable of supporting multiple CUDA devices in parallel to further accelerate the alignment throughput. Conclusions BarraCUDA is designed to take advantage of the parallelism of GPU to accelerate the alignment of millions of sequencing reads generated by NGS instruments. By doing this, we could, at least in part, streamline the current bioinformatics pipeline such that the wider scientific community could benefit from the sequencing technology. BarraCUDA is currently available from http://seqbarracuda.sf.net PMID:22244497

  15. Bayesian Top-Down Protein Sequence Alignment with Inferred Position-Specific Gap Penalties

    PubMed Central

    Neuwald, Andrew F.; Altschul, Stephen F.

    2016-01-01

    We describe a Bayesian Markov chain Monte Carlo (MCMC) sampler for protein multiple sequence alignment (MSA) that, as implemented in the program GISMO and applied to large numbers of diverse sequences, is more accurate than the popular MSA programs MUSCLE, MAFFT, Clustal-Ω and Kalign. Features of GISMO central to its performance are: (i) It employs a “top-down” strategy with a favorable asymptotic time complexity that first identifies regions generally shared by all the input sequences, and then realigns closely related subgroups in tandem. (ii) It infers position-specific gap penalties that favor insertions or deletions (indels) within each sequence at alignment positions in which indels are invoked in other sequences. This favors the placement of insertions between conserved blocks, which can be understood as making up the proteins’ structural core. (iii) It uses a Bayesian statistical measure of alignment quality based on the minimum description length principle and on Dirichlet mixture priors. Consequently, GISMO aligns sequence regions only when statistically justified. This is unlike methods based on the ad hoc, but widely used, sum-of-the-pairs scoring system, which will align random sequences. (iv) It defines a system for exploring alignment space that provides natural avenues for further experimentation through the development of new sampling strategies for more efficiently escaping from suboptimal traps. GISMO’s superior performance is illustrated using 408 protein sets containing, on average, 235 sequences. These sets correspond to NCBI Conserved Domain Database alignments, which have been manually curated in the light of available crystal structures, and thus provide a means to assess alignment accuracy. GISMO fills a different niche than other MSA programs, namely identifying and aligning a conserved domain present within a large, diverse set of full length sequences. The GISMO program is available at http://gismo.igs.umaryland.edu/. PMID

  16. Bayesian Top-Down Protein Sequence Alignment with Inferred Position-Specific Gap Penalties.

    PubMed

    Neuwald, Andrew F; Altschul, Stephen F

    2016-05-01

    We describe a Bayesian Markov chain Monte Carlo (MCMC) sampler for protein multiple sequence alignment (MSA) that, as implemented in the program GISMO and applied to large numbers of diverse sequences, is more accurate than the popular MSA programs MUSCLE, MAFFT, Clustal-Ω and Kalign. Features of GISMO central to its performance are: (i) It employs a "top-down" strategy with a favorable asymptotic time complexity that first identifies regions generally shared by all the input sequences, and then realigns closely related subgroups in tandem. (ii) It infers position-specific gap penalties that favor insertions or deletions (indels) within each sequence at alignment positions in which indels are invoked in other sequences. This favors the placement of insertions between conserved blocks, which can be understood as making up the proteins' structural core. (iii) It uses a Bayesian statistical measure of alignment quality based on the minimum description length principle and on Dirichlet mixture priors. Consequently, GISMO aligns sequence regions only when statistically justified. This is unlike methods based on the ad hoc, but widely used, sum-of-the-pairs scoring system, which will align random sequences. (iv) It defines a system for exploring alignment space that provides natural avenues for further experimentation through the development of new sampling strategies for more efficiently escaping from suboptimal traps. GISMO's superior performance is illustrated using 408 protein sets containing, on average, 235 sequences. These sets correspond to NCBI Conserved Domain Database alignments, which have been manually curated in the light of available crystal structures, and thus provide a means to assess alignment accuracy. GISMO fills a different niche than other MSA programs, namely identifying and aligning a conserved domain present within a large, diverse set of full length sequences. The GISMO program is available at http://gismo.igs.umaryland.edu/. PMID:27192614

  17. Upcoming challenges for multiple sequence alignment methods in the high-throughput era

    PubMed Central

    Kemena, Carsten; Notredame, Cedric

    2009-01-01

    This review focuses on recent trends in multiple sequence alignment tools. It describes the latest algorithmic improvements including the extension of consistency-based methods to the problem of template-based multiple sequence alignments. Some results are presented suggesting that template-based methods are significantly more accurate than simpler alternative methods. The validation of existing methods is also discussed at length with the detailed description of recent results and some suggestions for future validation strategies. The last part of the review addresses future challenges for multiple sequence alignment methods in the genomic era, most notably the need to cope with very large sequences, the need to integrate large amounts of experimental data, the need to accurately align non-coding and non-transcribed sequences and finally, the need to integrate many alternative methods and approaches. Contact: cedric.notredame@crg.es PMID:19648142

  18. RNA-Pareto: interactive analysis of Pareto-optimal RNA sequence-structure alignments.

    PubMed

    Schnattinger, Thomas; Schöning, Uwe; Marchfelder, Anita; Kestler, Hans A

    2013-12-01

    Incorporating secondary structure information into the alignment process improves the quality of RNA sequence alignments. Instead of using fixed weighting parameters, sequence and structure components can be treated as different objectives and optimized simultaneously. The result is not a single, but a Pareto-set of equally optimal solutions, which all represent different possible weighting parameters. We now provide the interactive graphical software tool RNA-Pareto, which allows a direct inspection of all feasible results to the pairwise RNA sequence-structure alignment problem and greatly facilitates the exploration of the optimal solution set.
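
    The idea of a Pareto set over two objectives can be illustrated with a minimal sketch. It assumes each candidate alignment has already been reduced to a hypothetical (sequence_score, structure_score) pair, both to be maximized; it is not the RNA-Pareto implementation, and the function name and data are illustrative only.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (both objectives maximized)."""
    front = []
    for p in points:
        # q dominates p if it is at least as good in both objectives
        # and strictly better in at least one of them.
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (sequence_score, structure_score) pairs for candidate alignments.
candidates = [(10, 2), (8, 5), (6, 6), (7, 4), (3, 9), (3, 3)]
front = pareto_front(candidates)
print(front)
```

    Every point on the front corresponds to some trade-off between the two objectives, so no single weighting has to be fixed in advance.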

  19. PFAAT version 2.0: A tool for editing, annotating, and analyzing multiple sequence alignments

    PubMed Central

    Caffrey, Daniel R; Dana, Paul H; Mathur, Vidhya; Ocano, Marco; Hong, Eun-Jong; Wang, Yaoyu E; Somaroo, Shyamal; Caffrey, Brian E; Potluri, Shobha; Huang, Enoch S

    2007-01-01

    Background: By virtue of their shared ancestry, homologous sequences are similar in their structure and function. Consequently, multiple sequence alignments are routinely used to identify trends that relate to function. This type of analysis is particularly productive when it is combined with structural and phylogenetic analysis. Results: Here we describe the release of PFAAT version 2.0, a tool for editing, analyzing, and annotating multiple sequence alignments. Support for multiple annotations is a key component of this release as it provides a framework for most of the new functionalities. The sequence annotations are accessible from the alignment and tree, where they are typically used to label sequences or hyperlink them to related databases. Sequence annotations can be created manually or extracted automatically from UniProt entries. Once a multiple sequence alignment is populated with sequence annotations, sequences can be easily selected and sorted through a sophisticated search dialog. The selected sequences can be further analyzed using statistical methods that explicitly model relationships between the sequence annotations and residue properties. Residue annotations are accessible from the alignment viewer and are typically used to designate binding sites or properties for a particular residue. Residue annotations are also searchable, and allow one to quickly select alignment columns for further sequence analysis, e.g. computing percent identities. Other features include: novel algorithms to compute sequence conservation, mapping conservation scores to a 3D structure in Jmol, displaying secondary structure elements, and sorting sequences by residue composition. Conclusion: PFAAT provides a framework whereby end-users can specify knowledge for a protein family in the form of annotation. The annotations can be combined with sophisticated analysis to test hypotheses that relate to sequence, structure and function. PMID:17931421

  20. DIALIGN P: Fast pair-wise and multiple sequence alignment using parallel processors

    PubMed Central

    Schmollinger, Martin; Nieselt, Kay; Kaufmann, Michael; Morgenstern, Burkhard

    2004-01-01

    Background: Parallel computing is frequently used to speed up computationally expensive tasks in Bioinformatics. Results: Herein, a parallel version of the multi-alignment program DIALIGN is introduced. We propose two ways of dividing the program into independent sub-routines that can be run on different processors: (a) pair-wise sequence alignments that are used as a first step to multiple alignment account for most of the CPU time in DIALIGN. Since alignments of different sequence pairs are completely independent of each other, they can be distributed to multiple processors without any effect on the resulting output alignments. (b) For alignments of large genomic sequences, we use a heuristic that splits sequences into sub-sequences based on a previously introduced anchored alignment procedure. For our test sequences, this combined approach reduces the program running time of DIALIGN by up to 97%. Conclusions: By distributing sub-routines to multiple processors, the running time of DIALIGN can be reduced substantially. With these improvements, it is possible to apply the program in large-scale genomics and proteomics projects that were previously beyond its scope. PMID:15357879
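
    Point (a) above, that alignments of different sequence pairs are independent tasks, can be sketched as follows. This is not DIALIGN code: pair_score is a toy stand-in for a real pairwise alignment, and all names are hypothetical. A thread pool is used only to keep the sketch short; for CPU-bound alignment in CPython a process pool would normally be the better choice.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def pair_score(pair):
    """Toy stand-in for a pairwise alignment: count matching positions."""
    a, b = pair
    return sum(x == y for x, y in zip(a, b))


def all_pair_scores(seqs, workers=4):
    """Score every sequence pair; each pair is an independent task."""
    index_pairs = list(combinations(range(len(seqs)), 2))
    tasks = [(seqs[i], seqs[j]) for i, j in index_pairs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in task order, so the output does
        # not depend on how the tasks were scheduled across workers.
        scores = list(pool.map(pair_score, tasks))
    return dict(zip(index_pairs, scores))


seqs = ["ACGTAC", "ACGTTC", "TCGTAC"]
serial = all_pair_scores(seqs, workers=1)
parallel = all_pair_scores(seqs, workers=4)
print(serial == parallel, parallel)
```

    Because no task reads another task's result, the number of workers changes only the running time, not the scores.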

  1. Gapped alignment of protein sequence motifs through Monte Carlo optimization of a hidden Markov model

    PubMed Central

    Neuwald, Andrew F; Liu, Jun S

    2004-01-01

    Background: Certain protein families are highly conserved across distantly related organisms and belong to large and functionally diverse superfamilies. The patterns of conservation present in these protein sequences presumably are due to selective constraints maintaining important but unknown structural mechanisms with some constraints specific to each family and others shared by a larger subset or by the entire superfamily. To exploit these patterns as a source of functional information, we recently devised a statistically based approach called contrast hierarchical alignment and interaction network (CHAIN) analysis, which infers the strengths of various categories of selective constraints from co-conserved patterns in a multiple alignment. The power of this approach strongly depends on the quality of the multiple alignments, which thus motivated development of theoretical concepts and strategies to improve alignment of conserved motifs within large sets of distantly related sequences. Results: Here we describe a hidden Markov model (HMM), an algebraic system, and Markov chain Monte Carlo (MCMC) sampling strategies for alignment of multiple sequence motifs. The MCMC sampling strategies are useful both for alignment optimization and for adjusting position-specific background amino acid frequencies for alignment uncertainties. Associated statistical formulations provide an objective measure of alignment quality as well as automatic gap penalty optimization. Improved alignments obtained in this way are compared with PSI-BLAST-based alignments within the context of CHAIN analysis of three protein families: Giα subunits, prolyl oligopeptidases, and transitional endoplasmic reticulum (p97) AAA+ ATPases. Conclusion: While not entirely replacing PSI-BLAST-based alignments, which likewise may be optimized for CHAIN analysis using this approach, these motif-based methods often more accurately align very distantly related sequences and thus can provide a better measure of

  2. IP-MSA: Independent order of progressive multiple sequence alignments using different substitution matrices

    NASA Astrophysics Data System (ADS)

    Boraik, Aziz Nasser; Abdullah, Rosni; Venkat, Ibrahim

    2014-12-01

    Multiple sequence alignment (MSA) is an essential process for many biological sequence analyses. Many algorithms have been developed to solve MSA, but an efficient computation method with very high accuracy is still a challenge. Progressive alignment is the most widely used approach to compute the final MSA. In this paper, we present a simple and effective progressive approach. It is based on the independent-order progressive alignment proposed in QOMA, modified to align whole sequences so as to maximize the score of the MSA. Moreover, to further improve the accuracy of the method, we estimate the similarity of each pair of input sequences by their percent identity and, based on this measure, choose different substitution matrices during the progressive alignment. In addition, we include horizontal information in the alignment by adjusting the weights of amino acid residues based on their neighboring residues. The approach was tested on the popular benchmarks BAliBASE 3.0 (global protein sequences) and IRMBASE 2.0 (local protein sequences). The proposed approach outperforms the original method in QOMA in terms of sum-of-pairs score and column score by up to 14% and 7%, respectively.
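
    The step of choosing a substitution matrix from percent identity might look roughly like the sketch below. The abstract does not say which matrices or cutoffs IP-MSA uses, so the BLOSUM names and the 80% and 50% thresholds here are illustrative assumptions, as are the function names.

```python
def percent_identity(a, b):
    """Percent of gap-free aligned column pairs that hold identical residues."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def choose_matrix(identity):
    """Pick a matrix name from percent identity (cutoffs are assumptions)."""
    if identity >= 80.0:
        return "BLOSUM80"  # closely related sequences
    if identity >= 50.0:
        return "BLOSUM62"  # intermediate similarity
    return "BLOSUM45"      # distantly related sequences


identity = percent_identity("MKT-AYIA", "MKTWAYLA")
print(round(identity, 2), choose_matrix(identity))
```

    The design point is only that the matrix is selected per sequence pair rather than fixed for the whole run.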

  3. Manipulating multiple sequence alignments via MaM and WebMaM

    PubMed Central

    Alkan, Can; Tüzün, Eray; Buard, Jerome; Lethiec, Franck; Eichler, Evan E.; Bailey, Jeffrey A.; Sahinalp, S. Cenk

    2005-01-01

    MaM is a software tool that processes and manipulates multiple alignments of genomic sequence. MaM computes the exact location of common repeat elements, exons and unique regions within aligned genomic sequences using a variety of user-identified programs, databases and/or tables. The program can extract subalignments corresponding to these various regions of DNA, to be analyzed independently or in conjunction with other elements of genomic DNA. Graphical displays further allow an assessment of sequence variation throughout these different regions of the aligned sequence, providing separate displays for the repeat, non-repeat and coding portions of genomic DNA. The program should facilitate the phylogenetic analysis and processing of different portions of genomic sequence as part of large-scale sequencing efforts. MaM source code is freely available for non-commercial use at ; and the web interface WebMaM is hosted at . PMID:15980474

  4. Predicting and improving the protein sequence alignment quality by support vector regression

    PubMed Central

    Lee, Minho; Jeong, Chan-seok; Kim, Dongsup

    2007-01-01

    Background: For successful protein structure prediction by comparative modeling, in addition to identifying a good template protein with known structure, obtaining an accurate sequence alignment between a query protein and a template protein is critical. It is known that the alignment accuracy can vary significantly depending on the choice of alignment parameters such as gap opening penalty and gap extension penalty. Because the accuracy of sequence alignment is typically measured by comparing it with its corresponding structure alignment, there is no good way of evaluating alignment accuracy without knowing the structure of a query protein, which is obviously not available at the time of structure prediction. Moreover, there is no universal alignment parameter option that would always yield the optimal alignment. Results: In this work, we develop a method to predict the quality of the alignment between a query and a template. We train support vector regression (SVR) models to predict the MaxSub scores as a measure of alignment quality. The alignment between a query protein and a template of length n is transformed into an (n + 1)-dimensional feature vector, which is then used as input to the trained SVR model to predict the alignment quality. Performance of our work is evaluated by various measures including the Pearson correlation coefficient between the observed and predicted MaxSub scores. The results show a high correlation coefficient of 0.945. For a pair of query and template, 48 alignments are generated by changing alignment options. Trained SVR models are then applied to predict the MaxSub scores of those alignments and to select the best alignment option, which is chosen specifically for the query-template pair. This adaptive selection procedure results in a 7.4% improvement of MaxSub scores, compared to those obtained when the single best parameter option is used for all query-template pairs. Conclusion: The present work demonstrates that the alignment quality can be
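
    Two pieces of the procedure above are easy to sketch: the Pearson correlation used for evaluation, and the selection of the option with the highest predicted score. The SVR model itself is not reproduced; the option labels and score values below are made-up placeholders, and the function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def select_best_option(predicted_scores):
    """Return the alignment option whose predicted quality score is highest."""
    return max(predicted_scores, key=predicted_scores.get)


# Placeholder predictions for three hypothetical parameter settings.
predicted = {"open=8,ext=1": 0.41, "open=10,ext=1": 0.63, "open=12,ext=2": 0.58}
best = select_best_option(predicted)
print(best, pearson([0.2, 0.5, 0.9], [0.25, 0.45, 0.95]))
```

    In the setting described above, predicted_scores would hold one entry per generated alignment (48 per query-template pair).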

  5. Predicting the accuracy of multiple sequence alignment algorithms by using computational intelligent techniques

    PubMed Central

    Ortuño, Francisco M.; Valenzuela, Olga; Pomares, Hector; Rojas, Fernando; Florido, Javier P.; Urquiza, Jose M.

    2013-01-01

    Multiple sequence alignments (MSAs) have become one of the most studied approaches in bioinformatics to perform other outstanding tasks such as structure prediction, biological function analysis or next-generation sequencing. However, current MSA algorithms do not always provide consistent solutions, since alignments become increasingly difficult when dealing with low similarity sequences. As is widely known, these algorithms depend directly on specific features of the sequences, which strongly influence alignment accuracy. Many MSA tools have been recently designed but it is not possible to know in advance which one is the most suitable for a particular set of sequences. In this work, we analyze some of the most used algorithms presented in the bibliography and their dependences on several features. A novel intelligent algorithm based on least square support vector machine is then developed to predict how accurate each alignment could be, depending on its analyzed features. This algorithm is performed with a dataset of 2180 MSAs. The proposed system first estimates the accuracy of possible alignments. The most promising methodologies are then selected in order to align each set of sequences. Since only one selected algorithm is run, the computational time is not excessively increased. PMID:23066102

  7. PhyPA: Phylogenetic method with pairwise sequence alignment outperforms likelihood methods in phylogenetics involving highly diverged sequences.

    PubMed

    Xia, Xuhua

    2016-09-01

    While pairwise sequence alignment (PSA) by dynamic programming is guaranteed to generate one of the optimal alignments, multiple sequence alignment (MSA) of highly divergent sequences often results in poorly aligned sequences, plaguing all subsequent phylogenetic analysis. One way to avoid this problem is to use only PSA to reconstruct phylogenetic trees, which can only be done with distance-based methods. I compared the accuracy of this new computational approach (named PhyPA for phylogenetics by pairwise alignment) against the maximum likelihood method using MSA (the ML+MSA approach), based on nucleotide, amino acid and codon sequences simulated with different topologies and tree lengths. I present a surprising discovery that the fast PhyPA method consistently outperforms the slow ML+MSA approach for highly diverged sequences even when all optimization options were turned on for the ML+MSA approach. Only when sequences are not highly diverged (i.e., when a reliable MSA can be obtained) does the ML+MSA approach outperform PhyPA. The true topologies are always recovered by ML with the true alignment from the simulation. However, with MSA derived from alignment programs such as MAFFT or MUSCLE, the recovered topology consistently has higher likelihood than that for the true topology. Thus, the failure to recover the true topology by the ML+MSA approach is not because of insufficient search of tree space, but because of the distortion of phylogenetic signal by MSA methods. I have implemented in DAMBE PhyPA and two approaches making use of multi-gene data sets to derive phylogenetic support for subtrees equivalent to resampling techniques such as bootstrapping and jackknifing. PMID:27377322
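
    The building block described above, pairwise alignment by dynamic programming followed by a distance computed from that alignment, can be sketched with the classic Needleman-Wunsch recurrence and a simple p-distance. This is not the DAMBE implementation; the linear gap cost, the scoring values and the p-distance are assumptions chosen to keep the example short.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def p_distance(aligned_a, aligned_b):
    """Fraction of gap-free aligned columns whose residues differ."""
    cols = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not cols:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in cols) / len(cols)


s, al_a, al_b = needleman_wunsch("ACGT", "AGT")
print(s, al_a, al_b, p_distance(al_a, al_b))
```

    Repeating this for every pair of sequences yields a distance matrix, which is the input that distance-based tree methods require.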

  8. SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics

    PubMed Central

    Will, Sebastian; Otto, Christina; Miladi, Milad; Möhl, Mathias; Backofen, Rolf

    2015-01-01

    Motivation: RNA-Seq experiments have revealed a multitude of novel ncRNAs. The gold standard for their analysis based on simultaneous alignment and folding suffers from extreme time complexity of O(n^6). Subsequently, numerous faster ‘Sankoff-style’ approaches have been suggested. Commonly, the performance of such methods relies on sequence-based heuristics that restrict the search space to optimal or near-optimal sequence alignments; however, the accuracy of sequence-based methods breaks down for RNAs with sequence identities below 60%. Alignment approaches like LocARNA that do not require sequence-based heuristics have been limited to high complexity (≥ quartic time). Results: Breaking this barrier, we introduce the novel Sankoff-style algorithm ‘sparsified prediction and alignment of RNAs based on their structure ensembles (SPARSE)’, which runs in quadratic time without sequence-based heuristics. To achieve this low complexity, on par with sequence alignment algorithms, SPARSE features strong sparsification based on structural properties of the RNA ensembles. Following PMcomp, SPARSE gains further speed-up from lightweight energy computation. Although all existing lightweight Sankoff-style methods restrict Sankoff’s original model by disallowing loop deletions and insertions, SPARSE transfers the Sankoff algorithm to the lightweight energy model completely for the first time. Compared with LocARNA, SPARSE achieves similar alignment and better folding quality in significantly less time (speedup: 3.7). At similar run-time, it aligns low sequence identity instances substantially more accurately than RAF, which uses sequence-based heuristics. Availability and implementation: SPARSE is freely available at http://www.bioinf.uni-freiburg.de/Software/SPARSE. Contact: backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25838465

  9. Who watches the watchmen? An appraisal of benchmarks for multiple sequence alignment.

    PubMed

    Iantorno, Stefano; Gori, Kevin; Goldman, Nick; Gil, Manuel; Dessimoz, Christophe

    2014-01-01

    Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Thus alignment accuracy is crucial to a vast range of analyses, often in ways difficult to assess in those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies (based on simulation, consistency, protein structure, and phylogeny) and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should choose a benchmark according to the context of application, with a keen awareness of the assumptions underlying each benchmarking strategy.

  10. A statistical physics perspective on alignment-independent protein sequence comparison

    PubMed Central

    Chattopadhyay, Amit K.; Nasiev, Diar; Flower, Darren R.

    2015-01-01

    Motivation: Within bioinformatics, the textual alignment of amino acid sequences has long dominated the determination of similarity between proteins, with all that implies for shared structure, function and evolutionary descent. Despite the relative success of modern-day sequence alignment algorithms, so-called alignment-free approaches offer a complementary means of determining and expressing similarity, with potential benefits in certain key applications, such as regression analysis of protein structure-function studies, where alignment-based similarity has performed poorly. Results: Here, we offer a fresh, statistical physics-based perspective focusing on the question of alignment-free comparison, in the process adapting results from ‘first passage probability distribution’ to summarize statistics of ensemble averaged amino acid propensity values. In this article, we introduce and elaborate this approach. Contact: d.r.flower@aston.ac.uk PMID:25810434

  11. Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference

    PubMed Central

    Tan, Ge; Muffato, Matthieu; Ledergerber, Christian; Herrero, Javier; Goldman, Nick; Gil, Manuel; Dessimoz, Christophe

    2015-01-01

    Phylogenetic inference is generally performed on the basis of multiple sequence alignments (MSA). Because errors in an alignment can lead to errors in tree estimation, there is a strong interest in identifying and removing unreliable parts of the alignment. In recent years several automated filtering approaches have been proposed, but despite their popularity, a systematic and comprehensive comparison of different alignment filtering methods on real data has been lacking. Here, we extend and apply recently introduced phylogenetic tests of alignment accuracy on a large number of gene families and contrast the performance of unfiltered versus filtered alignments in the context of single-gene phylogeny reconstruction. Based on multiple genome-wide empirical and simulated data sets, we show that the trees obtained from filtered MSAs are on average worse than those obtained from unfiltered MSAs. Furthermore, alignment filtering often leads to an increase in the proportion of well-supported branches that are actually wrong. We confirm that our findings hold for a wide range of parameters and methods. Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms. PMID:26031838
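
    To make "alignment filtering" concrete, here is a minimal sketch of one very simple filter: dropping columns whose gap fraction exceeds a cutoff. It is not one of the published filtering methods evaluated in the study; the function name, the 0.5 cutoff and the toy alignment are assumptions for illustration.

```python
def filter_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop columns with too many gaps; return (filtered_rows, kept_columns)."""
    if not alignment:
        return [], []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    kept = [
        c for c in range(length)
        if sum(row[c] == "-" for row in alignment) / len(alignment) <= max_gap_fraction
    ]
    filtered = ["".join(row[c] for c in kept) for row in alignment]
    return filtered, kept


msa = ["AC-GT", "A--GT", "ACTGT", "A--GT"]
filtered, kept = filter_gappy_columns(msa)
removed_fraction = 1 - len(kept) / len(msa[0])
print(filtered, kept, removed_fraction)
```

    Here one of five columns is removed, i.e. 20% of alignment positions, which is the upper end of what the abstract calls light filtering.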

  12. Current Methods for Automated Filtering of Multiple Sequence Alignments Frequently Worsen Single-Gene Phylogenetic Inference.

    PubMed

    Tan, Ge; Muffato, Matthieu; Ledergerber, Christian; Herrero, Javier; Goldman, Nick; Gil, Manuel; Dessimoz, Christophe

    2015-09-01

    Phylogenetic inference is generally performed on the basis of multiple sequence alignments (MSA). Because errors in an alignment can lead to errors in tree estimation, there is a strong interest in identifying and removing unreliable parts of the alignment. In recent years several automated filtering approaches have been proposed, but despite their popularity, a systematic and comprehensive comparison of different alignment filtering methods on real data has been lacking. Here, we extend and apply recently introduced phylogenetic tests of alignment accuracy on a large number of gene families and contrast the performance of unfiltered versus filtered alignments in the context of single-gene phylogeny reconstruction. Based on multiple genome-wide empirical and simulated data sets, we show that the trees obtained from filtered MSAs are on average worse than those obtained from unfiltered MSAs. Furthermore, alignment filtering often leads to an increase in the proportion of well-supported branches that are actually wrong. We confirm that our findings hold for a wide range of parameters and methods. Although our results suggest that light filtering (up to 20% of alignment positions) has little impact on tree accuracy and may save some computation time, contrary to widespread practice, we do not generally recommend the use of current alignment filtering methods for phylogenetic inference. By providing a way to rigorously and systematically measure the impact of filtering on alignments, the methodology set forth here will guide the development of better filtering algorithms. PMID:26031838

  13. SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data.

    PubMed

    Abuín, José M; Pichel, Juan C; Pena, Tomás F; Amigo, Jorge

    2016-01-01

    Next-generation sequencing (NGS) technologies have led to a huge amount of genomic data that need to be analyzed and interpreted. This fact has a huge impact on the DNA sequence alignment process, which nowadays requires the mapping of billions of small DNA sequences onto a reference genome. In this way, sequence alignment remains the most time-consuming stage in the sequence analysis workflow. To deal with this issue, state-of-the-art aligners take advantage of parallelization strategies. However, the existing solutions show limited scalability and have a complex implementation. In this work we introduce SparkBWA, a new tool that exploits the capabilities of a big data technology such as Spark to boost the performance of one of the most widely adopted aligners, the Burrows-Wheeler Aligner (BWA). The design of SparkBWA uses two independent software layers in such a way that no modifications to the original BWA source code are required, which assures its compatibility with any BWA version (future or legacy). SparkBWA is evaluated in different scenarios showing noticeable results in terms of performance and scalability. A comparison to other parallel BWA-based aligners validates the benefits of our approach. Finally, an intuitive and flexible API is provided to NGS professionals in order to facilitate the acceptance and adoption of the new tool. The source code of the software described in this paper is publicly available at https://github.com/citiususc/SparkBWA, with a GPL3 license. PMID:27182962

  14. BAliBASE (Benchmark Alignment dataBASE): enhancements for repeats, transmembrane sequences and circular permutations.

    PubMed

    Bahr, A; Thompson, J D; Thierry, J C; Poch, O

    2001-01-01

    BAliBASE is specifically designed to serve as an evaluation resource to address all the problems encountered when aligning complete sequences. The database contains high quality, manually constructed multiple sequence alignments together with detailed annotations. The alignments are all based on three-dimensional structural superpositions, with the exception of the transmembrane sequences. The first release provided sets of reference alignments dealing with the problems of high variability, unequal repartition and large N/C-terminal extensions and internal insertions. Here we describe version 2.0 of the database, which incorporates three new reference sets of alignments containing structural repeats, trans-membrane sequences and circular permutations to evaluate the accuracy of detection/prediction and alignment of these complex sequences. BAliBASE can be viewed at the web site http://www-igbmc.u-strasbg.fr/BioInfo/BAliBASE2/index.html or can be downloaded from ftp://ftp-igbmc.u-strasbg.fr/pub/BAliBASE2/.
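
    A reference benchmark of this kind is typically used by comparing a test alignment against the reference with a sum-of-pairs score and a column score. The sketch below is a simplified version of those two measures, assuming both alignments contain the same sequences in the same order; it ignores core-block annotations and is not the official scoring program.

```python
def residue_columns(alignment):
    """Describe each column as a tuple of residue indices (None marks a gap)."""
    counters = [0] * len(alignment)
    columns = []
    for c in range(len(alignment[0])):
        column = []
        for s, row in enumerate(alignment):
            if row[c] == "-":
                column.append(None)
            else:
                column.append(counters[s])
                counters[s] += 1
        columns.append(tuple(column))
    return columns


def aligned_pairs(alignment):
    """Set of (seq_s, residue_i, seq_t, residue_j) placed in the same column."""
    pairs = set()
    for column in residue_columns(alignment):
        for s in range(len(column)):
            for t in range(s + 1, len(column)):
                if column[s] is not None and column[t] is not None:
                    pairs.add((s, column[s], t, column[t]))
    return pairs


def sp_score(test, reference):
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref_pairs = aligned_pairs(reference)
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


def column_score(test, reference):
    """Fraction of reference columns reproduced exactly in the test alignment."""
    test_columns = set(residue_columns(test))
    ref_columns = residue_columns(reference)
    return sum(col in test_columns for col in ref_columns) / len(ref_columns)


reference = ["ACGT", "A-GT", "ACG-"]
test = ["ACGT", "AG-T", "ACG-"]
print(sp_score(test, reference), column_score(test, reference))
```

    The column score is the stricter of the two: a single misplaced residue invalidates the whole column, while the sum-of-pairs score still credits the pairs that remain correct.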

  15. SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data

    PubMed Central

    2016-01-01

    Next-generation sequencing (NGS) technologies have led to a huge amount of genomic data that need to be analyzed and interpreted. This fact has a huge impact on the DNA sequence alignment process, which nowadays requires the mapping of billions of small DNA sequences onto a reference genome. In this way, sequence alignment remains the most time-consuming stage in the sequence analysis workflow. To deal with this issue, state-of-the-art aligners take advantage of parallelization strategies. However, the existing solutions show limited scalability and have a complex implementation. In this work we introduce SparkBWA, a new tool that exploits the capabilities of a big data technology such as Spark to boost the performance of one of the most widely adopted aligners, the Burrows-Wheeler Aligner (BWA). The design of SparkBWA uses two independent software layers in such a way that no modifications to the original BWA source code are required, which assures its compatibility with any BWA version (future or legacy). SparkBWA is evaluated in different scenarios showing noticeable results in terms of performance and scalability. A comparison to other parallel BWA-based aligners validates the benefits of our approach. Finally, an intuitive and flexible API is provided to NGS professionals in order to facilitate the acceptance and adoption of the new tool. The source code of the software described in this paper is publicly available at https://github.com/citiususc/SparkBWA, with a GPL3 license. PMID:27182962

  16. BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark.

    PubMed

    Thompson, Julie D; Koehl, Patrice; Ripp, Raymond; Poch, Olivier

    2005-10-01

    Multiple sequence alignment is one of the cornerstones of modern molecular biology. It is used to identify conserved motifs, to determine protein domains, in 2D/3D structure prediction by homology and in evolutionary studies. Recently, high-throughput technologies such as genome sequencing and structural proteomics have led to an explosion in the amount of sequence and structure information available. In response, several new multiple alignment methods have been developed that improve both the efficiency and the quality of protein alignments. Consequently, the benchmarks used to evaluate and compare these methods must also evolve. We present here the latest release of the most widely used multiple alignment benchmark, BAliBASE, which provides high quality, manually refined, reference alignments based on 3D structural superpositions. Version 3.0 of BAliBASE includes new, more challenging test cases, representing the real problems encountered when aligning large sets of complex sequences. Using a novel, semiautomatic update protocol, the number of protein families in the benchmark has been increased and representative test cases are now available that cover most of the protein fold space. The total number of proteins in BAliBASE has also been significantly increased from 1444 to 6255 sequences. In addition, full-length sequences are now provided for all test cases, which represent difficult cases for both global and local alignment programs. Finally, the BAliBASE Web site (http://www-bio3d-igbmc.u-strasbg.fr/balibase) has been completely redesigned to provide a more user-friendly, interactive interface for the visualization of the BAliBASE reference alignments and the associated annotations.

  17. SINA: Accurate high-throughput multiple sequence alignment of ribosomal RNA genes

    PubMed Central

    Pruesse, Elmar; Peplies, Jörg; Glöckner, Frank Oliver

    2012-01-01

    Motivation: In the analysis of homologous sequences, computation of multiple sequence alignments (MSAs) has become a bottleneck. This is especially troublesome for marker genes like the ribosomal RNA (rRNA) where already millions of sequences are publicly available and individual studies can easily produce hundreds of thousands of new sequences. Methods have been developed to cope with such numbers, but further improvements are needed to meet accuracy requirements. Results: In this study, we present the SILVA Incremental Aligner (SINA) used to align the rRNA gene databases provided by the SILVA ribosomal RNA project. SINA uses a combination of k-mer searching and partial order alignment (POA) to maintain very high alignment accuracy while satisfying high throughput performance demands. SINA was evaluated in comparison with the commonly used high throughput MSA programs PyNAST and mothur. The three BRAliBase III benchmark MSAs could be reproduced with 99.3, 97.6 and 96.1% accuracy. A larger benchmark MSA comprising 38 772 sequences could be reproduced with 98.9 and 99.3% accuracy using reference MSAs comprising 1000 and 5000 sequences. SINA was able to achieve higher accuracy than PyNAST and mothur in all performed benchmarks. Availability: Alignment of up to 500 sequences using the latest SILVA SSU/LSU Ref datasets as reference MSA is offered at http://www.arb-silva.de/aligner. This page also links to Linux binaries, user manual and tutorial. SINA is made available under a personal use license. Contact: epruesse@mpi-bremen.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22556368

  18. Parasail: SIMD C library for global, semi-global, and local pairwise sequence alignments

    DOE PAGESBeta

    Daily, Jeffrey A.

    2016-02-10

    Sequence alignment algorithms are a key component of many bioinformatics applications. Though various fast Smith-Waterman local sequence alignment implementations have been developed for x86 CPUs, most are embedded into larger database search tools. In addition, fast implementations of Needleman-Wunsch global sequence alignment and its semi-global variants are not as widespread. This article presents the first software library for local, global, and semi-global pairwise intra-sequence alignments and improves the performance of previous intra-sequence implementations. As a result, a faster intra-sequence pairwise alignment implementation is described and benchmarked. Using a 375 residue query sequence a speed of 136 billion cell updates per second (GCUPS) was achieved on a dual Intel Xeon E5-2670 12-core processor system, the highest reported for an implementation based on Farrar’s ’striped’ approach. When using only a single thread, parasail was 1.7 times faster than Rognes’s SWIPE. For many score matrices, parasail is faster than BLAST. The software library is designed for 64 bit Linux, OS X, or Windows on processors with SSE2, SSE4.1, or AVX2. Source code is available from https://github.com/jeffdaily/parasail under the Battelle BSD-style license. In conclusion, applications that require optimal alignment scores could benefit from the improved performance. For the first time, SIMD global, semi-global, and local alignments are available in a stand-alone C library.
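
    To make the GCUPS figure concrete, the following is a minimal scalar sketch in Python of Smith-Waterman local alignment scoring together with the cell-update arithmetic. It is not parasail's API or its striped SIMD kernel; the function names, the scoring values and the linear gap penalty are assumptions chosen for brevity.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty (scalar, not SIMD).

    The scoring values are illustrative assumptions, not parasail defaults.
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(target) + 1)
        for j in range(1, len(target) + 1):
            diag = prev[j - 1] + (match if query[i - 1] == target[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(query_len, target_len, seconds):
    """Billions of dynamic-programming cell updates per second."""
    return query_len * target_len / seconds / 1e9
```

    Each query/target residue pair costs one cell update, so a 375 residue query scanned against one billion database residues in one second would correspond to 375 GCUPS under this definition.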

  19. Support for linguistic macrofamilies from weighted sequence alignment.

    PubMed

    Jäger, Gerhard

    2015-10-13

    Computational phylogenetics is in the process of revolutionizing historical linguistics. Recent applications have shed new light on controversial issues, such as the location and time depth of language families and the dynamics of their spread. So far, these approaches have been limited to single-language families because they rely on a large body of expert cognacy judgments or grammatical classifications, which is currently unavailable for most language families. The present study pursues a different approach. Starting from raw phonetic transcription of core vocabulary items from very diverse languages, it applies weighted string alignment to track both phonetic and lexical change. Applied to a collection of ∼1,000 Eurasian languages and dialects, this method, combined with phylogenetic inference, leads to a classification in excellent agreement with established findings of historical linguistics. Furthermore, it provides strong statistical support for several putative macrofamilies contested in current historical linguistics. In particular, there is a solid signal for the Nostratic/Eurasiatic macrofamily.

  20. Support for linguistic macrofamilies from weighted sequence alignment

    PubMed Central

    Jäger, Gerhard

    2015-01-01

    Computational phylogenetics is in the process of revolutionizing historical linguistics. Recent applications have shed new light on controversial issues, such as the location and time depth of language families and the dynamics of their spread. So far, these approaches have been limited to single-language families because they rely on a large body of expert cognacy judgments or grammatical classifications, which is currently unavailable for most language families. The present study pursues a different approach. Starting from raw phonetic transcription of core vocabulary items from very diverse languages, it applies weighted string alignment to track both phonetic and lexical change. Applied to a collection of ∼1,000 Eurasian languages and dialects, this method, combined with phylogenetic inference, leads to a classification in excellent agreement with established findings of historical linguistics. Furthermore, it provides strong statistical support for several putative macrofamilies contested in current historical linguistics. In particular, there is a solid signal for the Nostratic/Eurasiatic macrofamily. PMID:26403857

  1. SARA-Coffee web server, a tool for the computation of RNA sequence and structure multiple alignments

    PubMed Central

    Di Tommaso, Paolo; Bussotti, Giovanni; Kemena, Carsten; Capriotti, Emidio; Chatzou, Maria; Prieto, Pablo; Notredame, Cedric

    2014-01-01

    This article introduces the SARA-Coffee web server; a service allowing the online computation of 3D structure based multiple RNA sequence alignments. The server makes it possible to combine sequences with and without known 3D structures. Given a set of sequences SARA-Coffee outputs a multiple sequence alignment along with a reliability index for every sequence, column and aligned residue. SARA-Coffee combines SARA, a pairwise structural RNA aligner with the R-Coffee multiple RNA aligner in a way that has been shown to improve alignment accuracy over most sequence aligners when enough structural data is available. The server can be accessed from http://tcoffee.crg.cat/apps/tcoffee/do:saracoffee. PMID:24972831

  2. SARA-Coffee web server, a tool for the computation of RNA sequence and structure multiple alignments.

    PubMed

    Di Tommaso, Paolo; Bussotti, Giovanni; Kemena, Carsten; Capriotti, Emidio; Chatzou, Maria; Prieto, Pablo; Notredame, Cedric

    2014-07-01

    This article introduces the SARA-Coffee web server; a service allowing the online computation of 3D structure based multiple RNA sequence alignments. The server makes it possible to combine sequences with and without known 3D structures. Given a set of sequences SARA-Coffee outputs a multiple sequence alignment along with a reliability index for every sequence, column and aligned residue. SARA-Coffee combines SARA, a pairwise structural RNA aligner with the R-Coffee multiple RNA aligner in a way that has been shown to improve alignment accuracy over most sequence aligners when enough structural data is available. The server can be accessed from http://tcoffee.crg.cat/apps/tcoffee/do:saracoffee.

  3. Skeleton-based human action recognition using multiple sequence alignment

    NASA Astrophysics Data System (ADS)

    Ding, Wenwen; Liu, Kai; Cheng, Fei; Zhang, Jin; Li, YunSong

    2015-05-01

    Human action recognition and analysis has been an active research topic in computer vision for many years. This paper presents a method to represent human actions based on trajectories consisting of 3D joint positions. The method first decomposes an action into a sequence of meaningful atomic actions (actionlets) and then labels the actionlets with letters of the English alphabet according to the Davies-Bouldin index value. An action can therefore be represented as a sequence of actionlet symbols, which preserves the temporal order of occurrence of the actionlets. Finally, we employ sequence comparison to classify multiple actions using a string matching algorithm (Needleman-Wunsch). The effectiveness of the proposed method is evaluated on datasets captured by commodity depth cameras. Experiments on three challenging 3D action datasets show promising results.
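
    The classification step can be sketched as follows, assuming actions have already been encoded as strings of actionlet letters. The scoring values, the template dictionary and the `classify` helper are hypothetical illustrations, not the authors' implementation.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of two symbol strings (illustrative scoring values)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


def classify(query, templates):
    """Return the label whose template actionlet string aligns best with the query."""
    return max(templates, key=lambda label: needleman_wunsch(query, templates[label]))
```

    For example, with the hypothetical templates `{"wave": "ABCD", "kick": "DCBA"}`, the query `"ABBCD"` is assigned to `"wave"` because a repeated actionlet costs only one gap.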

  4. PyMod: sequence similarity searches, multiple sequence-structure alignments, and homology modeling within PyMOL

    PubMed Central

    2012-01-01

    Background: In recent years, an exponentially growing number of tools for protein sequence analysis, editing and modeling tasks have been put at the disposal of the scientific community. Although the vast majority of these tools have been released as open source software, their steep learning curves often discourage even the most experienced users. Results: A simple and intuitive interface, PyMod, between the popular molecular graphics system PyMOL and several other tools (i.e., [PSI-]BLAST, ClustalW, MUSCLE, CEalign and MODELLER) has been developed, to show how the integration of the individual steps required for homology modeling and sequence/structure analysis within the PyMOL framework can hugely simplify these tasks. Sequence similarity searches, generation and editing of multiple sequence and structural alignments, and even the possibility to merge sequence and structure alignments have been implemented in PyMod, with the aim of creating a simple, yet powerful tool for sequence and structure analysis and building of homology models. Conclusions: PyMod represents a new tool for the analysis and the manipulation of protein sequences and structures. The ease of use and the integration with many sequence retrieval and alignment tools and with PyMOL, one of the most used molecular visualization systems, are the key features of this tool. Source code, installation instructions, video tutorials and a user's guide are freely available at the URL http://schubert.bio.uniroma1.it/pymod/index.html PMID:22536966

  5. Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

    PubMed

    Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias

    2011-01-01

    The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E.coli, Shigella and S.pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time. PMID:22254462

  6. Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

    PubMed

    Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias

    2011-01-01

    The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E.coli, Shigella and S.pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.

  7. Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment

    SciTech Connect

    Lawrence, C.E.; Altschul, S.F.; Boguski, M.S.; Neuwald, A.F.; Wootton, J.C.; Liu, J.S.

    1993-10-08

    A wealth of protein and DNA sequence data is being generated by genome projects and other sequencing efforts. A crucial barrier to deciphering these sequences and understanding the relations among them is the difficulty of detecting subtle local residue patterns common to multiple sequences. Such patterns frequently reflect similar molecular structures and biological properties. A mathematical definition of this "local multiple alignment" problem suitable for full computer automation has been used to develop a new and sensitive algorithm, based on the statistical method of iterative sampling. This algorithm finds an optimized local alignment model for N sequences in N-linear time, requiring only seconds on current workstations, and allows the simultaneous detection and optimization of multiple patterns and pattern repeats. The method is illustrated as applied to helix-turn-helix proteins, lipocalins, and prenyltransferases.

  8. Spaced words and kmacs: fast alignment-free sequence comparison based on inexact word matches.

    PubMed

    Horwege, Sebastian; Lindner, Sebastian; Boden, Marcus; Hatje, Klas; Kollmar, Martin; Leimeister, Chris-André; Morgenstern, Burkhard

    2014-07-01

    In this article, we present a user-friendly web interface for two alignment-free sequence-comparison methods that we recently developed. Most alignment-free methods rely on exact word matches to estimate pairwise similarities or distances between the input sequences. By contrast, our new algorithms are based on inexact word matches. The first of these approaches uses the relative frequencies of so-called spaced words in the input sequences, i.e. words containing 'don't care' or 'wildcard' symbols at certain pre-defined positions. Various distance measures can then be defined on sequences based on their different spaced-word composition. Our second approach defines the distance between two sequences by estimating for each position in the first sequence the length of the longest substring at this position that also occurs in the second sequence with up to k mismatches. Both approaches take a set of deoxyribonucleic acid (DNA) or protein sequences as input and return a matrix of pairwise distance values that can be used as a starting point for clustering algorithms or distance-based phylogeny reconstruction. The two alignment-free programmes are accessible through a web interface at 'Göttingen Bioinformatics Compute Server (GOBICS)': http://spaced.gobics.de http://kmacs.gobics.de and the source codes can be downloaded.
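
    The two ideas in this abstract can be sketched in Python as follows. This is a naive illustration only, not the published spaced-words or kmacs code: the pattern notation ('1' for a match position, '0' for a don't-care position), the Euclidean distance and all function names are assumptions, and the k-mismatch search is a brute-force scan rather than the heuristic used by kmacs.

```python
from collections import Counter
from math import sqrt


def spaced_word_freqs(seq, pattern):
    """Relative frequencies of spaced words; '1' = match position, '0' = don't care."""
    keep = [i for i, c in enumerate(pattern) if c == "1"]
    n_windows = len(seq) - len(pattern) + 1
    counts = Counter("".join(seq[s + i] for i in keep) for s in range(n_windows))
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}


def euclidean(f, g):
    """One possible distance between two spaced-word frequency profiles."""
    words = set(f) | set(g)
    return sqrt(sum((f.get(w, 0.0) - g.get(w, 0.0)) ** 2 for w in words))


def longest_k_mismatch(s1, s2, i, k):
    """Length of the longest substring of s1 starting at i that occurs in s2
    with up to k mismatches (brute force over all start positions in s2)."""
    best = 0
    for j in range(len(s2)):
        mismatches = 0
        length = 0
        while i + length < len(s1) and j + length < len(s2):
            if s1[i + length] != s2[j + length]:
                mismatches += 1
                if mismatches > k:
                    break
            length += 1
        best = max(best, length)
    return best
```

    Either quantity can be evaluated for every pair of input sequences to fill a pairwise distance matrix of the kind the abstract describes.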

  9. Aptaligner: automated software for aligning pseudorandom DNA X-aptamers from next-generation sequencing data.

    PubMed

    Lu, Emily; Elizondo-Riojas, Miguel-Angel; Chang, Jeffrey T; Volk, David E

    2014-06-10

    Next-generation sequencing results from bead-based aptamer libraries have demonstrated that traditional DNA/RNA alignment software is insufficient. This is particularly true for X-aptamers containing specialty bases (W, X, Y, Z, ...) that are identified by special encoding. Thus, we sought an automated program that uses the inherent design scheme of bead-based X-aptamers to create a hypothetical reference library and Markov modeling techniques to provide improved alignments. Aptaligner provides this feature as well as length error and noise level cutoff features, is parallelized to run on multiple central processing units (cores), and sorts sequences from a single chip into projects and subprojects.

  10. Mulan: Multiple-Sequence Local Alignment and Visualization for Studying Function and Evolution

    SciTech Connect

    Ovcharenko, I; Loots, G; Giardine, B; Hou, M; Ma, J; Hardison, R; Stubbs, L; Miller, W

    2004-07-14

    Multiple sequence alignment analysis is a powerful approach for understanding phylogenetic relationships, annotating genes and detecting functional regulatory elements. With a growing number of partly or fully sequenced vertebrate genomes, effective tools for performing multiple comparisons are required to accurately and efficiently assist biological discoveries. Here we introduce Mulan (http://mulan.dcode.org/), a novel method and a network server for comparing multiple draft and finished-quality sequences to identify functional elements conserved over evolutionary time. Mulan brings together several novel algorithms: the tba multi-aligner program for rapid identification of local sequence conservation and the multiTF program for detecting evolutionarily conserved transcription factor binding sites in multiple alignments. In addition, Mulan supports two-way communication with the GALA database; alignments of multiple species dynamically generated in GALA can be viewed in Mulan, and conserved transcription factor binding sites identified with Mulan/multiTF can be integrated and overlaid with extensive genome annotation data using GALA. Local multiple alignments computed by Mulan ensure reliable representation of short-and large-scale genomic rearrangements in distant organisms. Mulan allows for interactive modification of critical conservation parameters to differentially predict conserved regions in comparisons of both closely and distantly related species. We illustrate the uses and applications of the Mulan tool through multi-species comparisons of the GATA3 gene locus and the identification of elements that are conserved differently in avians than in other genomes allowing speculation on the evolution of birds. Source code for the aligners and the aligner-evaluation software can be freely downloaded from http://bio.cse.psu.edu/.

  11. SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction.

    PubMed

    Hagopian, Raffi; Davidson, John R; Datta, Ruchira S; Samad, Bushra; Jarvis, Glen R; Sjölander, Kimmen

    2010-07-01

    We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on a benchmark dataset of 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/.

  12. ESPERR: learning strong and weak signals in genomic sequence alignments to identify functional elements.

    PubMed

    Taylor, James; Tyekucheva, Svitlana; King, David C; Hardison, Ross C; Miller, Webb; Chiaromonte, Francesca

    2006-12-01

    Genomic sequence signals - such as base composition, presence of particular motifs, or evolutionary constraint - have been used effectively to identify functional elements. However, approaches based only on specific signals known to correlate with function can be quite limiting. When training data are available, application of computational learning algorithms to multispecies alignments has the potential to capture broader and more informative sequence and evolutionary patterns that better characterize a class of elements. However, effective exploitation of patterns in multispecies alignments is impeded by the vast number of possible alignment columns and by a limited understanding of which particular strings of columns may characterize a given class. We have developed a computational method, called ESPERR (evolutionary and sequence pattern extraction through reduced representations), which uses training examples to learn encodings of multispecies alignments into reduced forms tailored for the prediction of chosen classes of functional elements. ESPERR produces a greatly improved Regulatory Potential score, which can discriminate regulatory regions from neutral sites with excellent accuracy (approximately 94%). This score captures strong signals (GC content and conservation), as well as subtler signals (with small contributions from many different alignment patterns) that characterize the regulatory elements in our training set. ESPERR is also effective for predicting other classes of functional elements, as we show for DNaseI hypersensitive sites and highly conserved regions with developmental enhancer activity. Our software, training data, and genome-wide predictions are available from our Web site (http://www.bx.psu.edu/projects/esperr).

  13. Spial: analysis of subtype-specific features in multiple sequence alignments of proteins

    PubMed Central

    Wuster, Arthur; Venkatakrishnan, A. J.; Schertler, Gebhard F. X.; Babu, M. Madan

    2010-01-01

    Motivation: Spial (Specificity in alignments) is a tool for the comparative analysis of two alignments of evolutionarily related sequences that differ in their function, such as two receptor subtypes. It highlights functionally important residues that are either specific to one of the two alignments or conserved across both alignments. It permits visualization of this information in three complementary ways: by colour-coding alignment positions, by sequence logos and optionally by colour-coding the residues of a protein structure provided by the user. This can aid in the detection of residues that are involved in the subtype-specific interaction with a ligand, other proteins or nucleic acids. Spial may also be used to detect residues that may be post-translationally modified in one of the two sets of sequences. Availability: http://www.mrc-lmb.cam.ac.uk/genomes/spial/; supplementary information is available at http://www.mrc-lmb.cam.ac.uk/genomes/spial/help.html Contact: ajv@mrc-lmb.cam.ac.uk PMID:20880955

  14. Multiple sequence alignment with arbitrary gap costs: computing an optimal solution using polyhedral combinatorics.

    PubMed

    Althaus, Ernst; Caprara, Alberto; Lenhof, Hans-Peter; Reinert, Knut

    2002-01-01

    Multiple sequence alignment is one of the dominant problems in computational molecular biology. Numerous scoring functions and methods have been proposed, most of which result in NP-hard problems. In this paper we propose for the first time a general formulation for multiple alignment with arbitrary gap-costs based on an integer linear program (ILP). In addition we describe a branch-and-cut algorithm to effectively solve the ILP to optimality. We evaluate the performances of our approach in terms of running time and quality of the alignments using the BAliBase database of reference alignments. The results show that our implementation ranks amongst the best programs developed so far.

  15. Multiple sequence alignment algorithm based on a dispersion graph and ant colony algorithm.

    PubMed

    Chen, Weiyang; Liao, Bo; Zhu, Wen; Xiang, Xuyu

    2009-10-01

    In this article, we describe a representation for the process of multiple sequence alignment (MSA) and use it to solve the MSA problem. With this representation, we take every possible alignment into account by defining the representation of gap insertion, the value of the heuristic information on every optional path, and the scoring rule. On the basis of the proposed multidimensional graph, we use the ant colony algorithm to find a better path, which denotes a better alignment. We present instances of a three-dimensional graph and a four-dimensional graph and put forward a special ichnographic representation to analyze MSA. The software is still experimental, and we give an example of finding the best alignment using a three-dimensional graph and the ant colony algorithm. Experimental results show that our method can improve the solution quality on MSA benchmarks. PMID:19130503

  16. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments.

    PubMed

    Schwarz, Roland F; Tamuri, Asif U; Kultys, Marek; King, James; Godwin, James; Florescu, Ana M; Schultz, Jörg; Goldman, Nick

    2016-05-01

    Sequence Logos and its variants are the most commonly used method for visualization of multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. We here present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results in publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis) and its Sequence Bundles visualization module is further available as a web application (http://science-practice.com/projects/sequence-bundles). PMID:26819408

  17. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments.

    PubMed

    Schwarz, Roland F; Tamuri, Asif U; Kultys, Marek; King, James; Godwin, James; Florescu, Ana M; Schultz, Jörg; Goldman, Nick

    2016-05-01

    Sequence Logos and its variants are the most commonly used method for visualization of multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. We here present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results in publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis) and its Sequence Bundles visualization module is further available as a web application (http://science-practice.com/projects/sequence-bundles).

  18. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments

    PubMed Central

    Schwarz, Roland F.; Tamuri, Asif U.; Kultys, Marek; King, James; Godwin, James; Florescu, Ana M.; Schultz, Jörg; Goldman, Nick

    2016-01-01

    Sequence Logos and its variants are the most commonly used method for visualization of multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. We here present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results in publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis) and its Sequence Bundles visualization module is further available as a web application (http://science-practice.com/projects/sequence-bundles). PMID:26819408

  19. A phylogenetic analysis of the brassicales clade based on an alignment-free sequence comparison method.

    PubMed

    Hatje, Klas; Kollmar, Martin

    2012-01-01

    Phylogenetic analyses reveal the evolutionary derivation of species. A phylogenetic tree can be inferred from multiple sequence alignments of proteins or genes. The alignment of whole genome sequences of higher eukaryotes is a computationally intensive and ambitious task, as is the computation of phylogenetic trees based on these alignments. To overcome these limitations, we here used an alignment-free method to compare genomes of the Brassicales clade. For each nucleotide sequence a Chaos Game Representation (CGR) can be computed, which represents each nucleotide of the sequence as a point in a square defined by the four nucleotides as vertices. Each CGR is therefore a unique fingerprint of the underlying sequence. If the CGRs are divided by grid lines, each grid square denotes the occurrence of oligonucleotides of a specific length in the sequence (Frequency Chaos Game Representation, FCGR). Here, we used distance measures between FCGRs to infer phylogenetic trees of Brassicales species. Three types of data were analyzed because of their different characteristics: (A) Whole genome assemblies as far as available for species belonging to the Malvidae taxon. (B) EST data of species of the Brassicales clade. (C) Mitochondrial genomes of the Rosids branch, a supergroup of the Malvidae. The trees reconstructed based on the Euclidean distance method are in general agreement with single gene trees. The Fitch-Margoliash and Neighbor joining algorithms resulted in similar to identical trees. Here, for the first time we have applied the bootstrap re-sampling concept to trees based on FCGRs to determine the support of the branchings. FCGRs have the advantage that they are fast to calculate, and can be used as additional information to alignment based data and morphological characteristics to improve the phylogenetic classification of species in ambiguous cases.
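
    The FCGR idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the corner assignment in `CORNERS`, the function names `fcgr` and `fcgr_distance`, and the normalization to relative frequencies are all assumptions made for illustration. Each word of length k is mapped to one cell of a 2^k by 2^k grid, with the last nucleotide of the word selecting the coarsest quadrant, as in the chaos game.

```python
from math import sqrt

# Assumed corner convention (A bottom-left, C top-left, G top-right, T bottom-right).
# Published CGR figures use differing conventions; this one is only illustrative.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k):
    """Return a 2^k x 2^k grid (grid[row][col]) of relative k-mer frequencies."""
    size = 2 ** k
    grid = [[0.0] * size for _ in range(size)]
    words = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        x = y = 0
        for pos, base in enumerate(seq[i:i + k]):
            if base not in CORNERS:
                break  # skip words that contain ambiguous bases such as N
            bx, by = CORNERS[base]
            # Later nucleotides set higher bits, i.e. coarser grid divisions.
            x |= bx << pos
            y |= by << pos
        else:
            grid[y][x] += 1.0
            words += 1
    if words:
        grid = [[count / words for count in row] for row in grid]
    return grid


def fcgr_distance(grid_a, grid_b):
    """Euclidean distance between two FCGR grids of the same size."""
    return sqrt(sum((a - b) ** 2
                    for row_a, row_b in zip(grid_a, grid_b)
                    for a, b in zip(row_a, row_b)))
```

    A matrix of such distances between all pairs of genomes could then be passed to a tree-building method such as neighbor joining; that step is not shown here.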

  20. Assessing Activity Pattern Similarity with Multidimensional Sequence Alignment based on a Multiobjective Optimization Evolutionary Algorithm

    PubMed Central

    Kwan, Mei-Po; Xiao, Ningchuan; Ding, Guoxiang

    2015-01-01

    Due to the complexity and multidimensional characteristics of human activities, assessing the similarity of human activity patterns and classifying individuals with similar patterns remains highly challenging. This paper presents a new and unique methodology for evaluating the similarity among individual activity patterns. It conceptualizes multidimensional sequence alignment (MDSA) as a multiobjective optimization problem, and solves this problem with an evolutionary algorithm. The study utilizes sequence alignment to code multiple facets of human activities into multidimensional sequences, and to treat similarity assessment as a multiobjective optimization problem that aims to minimize the alignment cost for all dimensions simultaneously. A multiobjective optimization evolutionary algorithm (MOEA) is used to generate a diverse set of optimal or near-optimal alignment solutions. Evolutionary operators are specifically designed for this problem, and a local search method also is incorporated to improve the search ability of the algorithm. We demonstrate the effectiveness of our method by comparing it with a popular existing method called ClustalG using a set of 50 sequences. The results indicate that our method outperforms the existing method for most of our selected cases. The multiobjective evolutionary algorithm presented in this paper provides an effective approach for assessing activity pattern similarity, and a foundation for identifying distinctive groups of individuals with similar activity patterns. PMID:26190858
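
    Treating alignment as a multiobjective problem means that one candidate alignment is preferred over another only if it is no worse in every dimension and better in at least one. The Python sketch below shows that Pareto-dominance test and a filter for the non-dominated set. It covers only this selection step, not the evolutionary operators or the local search of the paper; the function names and the example cost tuples are hypothetical.

```python
def dominates(a, b):
    """True if cost vector a is no worse than b in every dimension
    and strictly better in at least one (all costs are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(costs):
    """Keep only the cost vectors that no other vector dominates."""
    return [c for c in costs if not any(dominates(other, c) for other in costs)]


# Hypothetical alignment costs for two activity dimensions.
candidates = [(3, 5), (4, 4), (5, 3), (5, 5), (6, 2)]
front = pareto_front(candidates)
```

    In this example `(5, 5)` is dropped because `(4, 4)` is better in both dimensions, while the remaining four candidates represent different trade-offs and are all kept.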

  1. Sequence comparison alignment-free approach based on suffix tree and L-words frequency.

    PubMed

    Soares, Inês; Goios, Ana; Amorim, António

    2012-01-01

    The vast majority of methods available for sequence comparison rely on a first sequence alignment step, which requires a number of assumptions on evolutionary history and is sometimes very difficult or impossible to perform due to the abundance of gaps (insertions/deletions). In such cases, an alternative alignment-free method would prove valuable. Our method starts with the computation of a generalized suffix tree of all sequences, which is completed in linear time. Using this tree, the frequency of all possible words of a preset length L (L-words) in each sequence is rapidly calculated. Based on the L-words frequency profile of each sequence, a pairwise standard Euclidean distance is then computed, producing a symmetric genetic distance matrix, which can be used to generate a neighbor joining dendrogram or a multidimensional scaling graph. We present an improvement to word counting alignment-free approaches for sequence comparison, by determining a single optimal word length and combining suffix tree structures with the word counting tasks. Our approach is, thus, a fast and simple application that proved to be efficient and powerful when applied to mitochondrial genomes. The algorithm was implemented in Python language and is freely available on the web.
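
    The pipeline from L-word profiles to a symmetric distance matrix can be sketched in a few lines of Python. This sketch deliberately replaces the generalized suffix tree with a plain sliding-window count (simpler, but without the suffix tree's efficiency on many long sequences), and it assumes that "frequency" means relative frequency. The names `lword_profile` and `distance_matrix` are illustrative and are not taken from the published tool.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def lword_profile(seq, L):
    """Relative frequency of every word of length L that occurs in seq."""
    n_words = len(seq) - L + 1
    if n_words <= 0:
        return {}
    counts = Counter(seq[i:i + L] for i in range(n_words))
    return {word: count / n_words for word, count in counts.items()}


def distance_matrix(seqs, L):
    """Symmetric matrix (dict of dicts) of Euclidean distances between profiles.

    seqs maps a sequence name to its sequence string.
    """
    names = list(seqs)
    profiles = {name: lword_profile(seqs[name], L) for name in names}
    dist = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        words = set(profiles[a]) | set(profiles[b])
        d = sqrt(sum((profiles[a].get(w, 0.0) - profiles[b].get(w, 0.0)) ** 2
                     for w in words))
        dist[a][b] = dist[b][a] = d
    return dist
```

    The resulting matrix is the kind of input a neighbor joining or multidimensional scaling routine would expect.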

  2. Manipulating multiple sequence alignments via MaM and WebMaM.

    PubMed

    Alkan, Can; Tüzün, Eray; Buard, Jerome; Lethiec, Franck; Eichler, Evan E; Bailey, Jeffrey A; Sahinalp, S Cenk

    2005-07-01

    MaM is a software tool that processes and manipulates multiple alignments of genomic sequence. MaM computes the exact location of common repeat elements, exons and unique regions within aligned genomic sequences using a variety of user-identified programs, databases and/or tables. The program can extract subalignments corresponding to these various regions of DNA, to be analyzed independently or in conjunction with other elements of genomic DNA. Graphical displays further allow an assessment of sequence variation throughout these different regions of the aligned sequence, providing separate displays for their repeat, non-repeat and coding portions of genomic DNA. The program should facilitate the phylogenetic analysis and processing of different portions of genomic sequence as part of large-scale sequencing efforts. MaM source code is freely available for non-commercial use at http://compbio.cs.sfu.ca/MAM.htm; and the web interface WebMaM is hosted at http://atgc.lirmm.fr/mam.

  3. DUC-Curve, a highly compact 2D graphical representation of DNA sequences and its application in sequence alignment

    NASA Astrophysics Data System (ADS)

    Li, Yushuang; Liu, Qian; Zheng, Xiaoqi

    2016-08-01

    A highly compact and simple 2D graphical representation of DNA sequences, named DUC-Curve, is constructed through mapping four nucleotides to a unit circle with a cyclic order. DUC-Curve could directly detect nucleotide, di-nucleotide compositions and microsatellite structure from DNA sequences. Moreover, it also could be used for DNA sequence alignment. Taking geometric center vectors of DUC-Curves as sequence descriptor, we perform similarity analysis on the first exons of β-globin genes of 11 species, oncogene TP53 of 27 species and twenty-four Influenza A viruses, respectively. The obtained reasonable results illustrate that the proposed method is very effective in sequence comparison problems, and will at least play a complementary role in classification and clustering problems.

  4. SDT: a virus classification tool based on pairwise sequence alignment and identity calculation.

    PubMed

    Muhire, Brejnev Muhizi; Varsani, Arvind; Martin, Darren Patrick

    2014-01-01

    The perpetually increasing rate at which viral full-genome sequences are being determined is creating a pressing demand for computational tools that will aid the objective classification of these genome sequences. Taxonomic classification approaches that are based on pairwise genetic identity measures are potentially highly automatable and are progressively gaining favour with the International Committee on Taxonomy of Viruses (ICTV). There are, however, various issues with the calculation of such measures that could potentially undermine the accuracy and consistency with which they can be applied to virus classification. Firstly, pairwise sequence identities computed based on multiple sequence alignments rather than on multiple independent pairwise alignments can lead to the deflation of identity scores with increasing dataset sizes. Also, when gap-characters need to be introduced during sequence alignments to account for insertions and deletions, methodological variations in the way that these characters are introduced and handled during pairwise genetic identity calculations can cause high degrees of inconsistency in the way that different methods classify the same sets of sequences. Here we present Sequence Demarcation Tool (SDT), a free user-friendly computer program that aims to provide a robust and highly reproducible means of objectively using pairwise genetic identity calculations to classify any set of nucleotide or amino acid sequences. SDT can produce publication quality pairwise identity plots and colour-coded distance matrices to further aid the classification of sequences according to ICTV approved taxonomic demarcation criteria. Besides a graphical interface version of the program for Windows computers, command-line versions of the program are available for a variety of different operating systems (including a parallel version for cluster computing platforms). PMID:25259891

  5. SDT: a virus classification tool based on pairwise sequence alignment and identity calculation.

    PubMed

    Muhire, Brejnev Muhizi; Varsani, Arvind; Martin, Darren Patrick

    2014-01-01

    The perpetually increasing rate at which viral full-genome sequences are being determined is creating a pressing demand for computational tools that will aid the objective classification of these genome sequences. Taxonomic classification approaches that are based on pairwise genetic identity measures are potentially highly automatable and are progressively gaining favour with the International Committee on Taxonomy of Viruses (ICTV). There are, however, various issues with the calculation of such measures that could potentially undermine the accuracy and consistency with which they can be applied to virus classification. Firstly, pairwise sequence identities computed based on multiple sequence alignments rather than on multiple independent pairwise alignments can lead to the deflation of identity scores with increasing dataset sizes. Also, when gap-characters need to be introduced during sequence alignments to account for insertions and deletions, methodological variations in the way that these characters are introduced and handled during pairwise genetic identity calculations can cause high degrees of inconsistency in the way that different methods classify the same sets of sequences. Here we present Sequence Demarcation Tool (SDT), a free user-friendly computer program that aims to provide a robust and highly reproducible means of objectively using pairwise genetic identity calculations to classify any set of nucleotide or amino acid sequences. SDT can produce publication quality pairwise identity plots and colour-coded distance matrices to further aid the classification of sequences according to ICTV approved taxonomic demarcation criteria. Besides a graphical interface version of the program for Windows computers, command-line versions of the program are available for a variety of different operating systems (including a parallel version for cluster computing platforms).
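
    The abstract's point about gap handling is easy to see in a small Python sketch. The function below is not SDT's code; it is a hypothetical helper that computes percent identity for one already-aligned pair under two common conventions: ignoring columns where either sequence has a gap, or counting such columns as mismatches. The parameter name `gaps_as_mismatch` is an assumption made for this illustration.

```python
def pairwise_identity(a, b, gaps_as_mismatch=False):
    """Fraction of identical positions in two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue  # a shared gap column carries no information
        if x == "-" or y == "-":
            if gaps_as_mismatch:
                compared += 1  # count the indel column as a mismatch
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


ignore_gaps = pairwise_identity("ACG-TA", "ACGTTT")
count_gaps = pairwise_identity("ACG-TA", "ACGTTT", gaps_as_mismatch=True)
```

    For this single pair, the identity is 4/5 when the gapped column is ignored and 4/6 when it is counted, which is the kind of method-dependent difference that can move a sequence across a demarcation threshold.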

  6. Skylign: a tool for creating informative, interactive logos representing sequence alignments and profile hidden Markov models

    PubMed Central

    2014-01-01

    Background: Logos are commonly used in molecular biology to provide a compact graphical representation of the conservation pattern of a set of sequences. They render the information contained in sequence alignments or profile hidden Markov models by drawing a stack of letters for each position, where the height of the stack corresponds to the conservation at that position, and the height of each letter within a stack depends on the frequency of that letter at that position. Results: We present a new tool and web server, called Skylign, which provides a unified framework for creating logos for both sequence alignments and profile hidden Markov models. In addition to static image files, Skylign creates a novel interactive logo plot for inclusion in web pages. These interactive logos enable scrolling, zooming, and inspection of underlying values. Skylign can avoid sampling bias in sequence alignments by down-weighting redundant sequences and by combining observed counts with informed priors. It also simplifies the representation of gap parameters, and can optionally scale letter heights based on alternate calculations of the conservation of a position. Conclusion: Skylign is available as a website, a scriptable web service with a RESTful interface, and as a software package for download. Skylign’s interactive logos are easily incorporated into a web page with just a few lines of HTML markup. Skylign may be found at http://skylign.org. PMID:24410852
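
    The relationship between stack height and letter height can be made concrete with the classic information-content formula for sequence logos: the stack height of a column is log2 of the alphabet size minus the column's Shannon entropy, and each letter receives a share of that height proportional to its frequency. The Python sketch below applies this to one alignment column. It is not Skylign's code and omits the sequence weighting, priors and alternate conservation measures mentioned in the abstract; `logo_heights` and its parameters are hypothetical names.

```python
from collections import Counter
from math import log2


def logo_heights(column, alphabet_size=4):
    """Letter heights (in bits) for one alignment column.

    Gap characters are ignored; alphabet_size is 4 for DNA/RNA, 20 for protein.
    """
    residues = [c for c in column.upper() if c != "-"]
    if not residues:
        return {}
    n = len(residues)
    freqs = {r: count / n for r, count in Counter(residues).items()}
    entropy = -sum(p * log2(p) for p in freqs.values())
    information = log2(alphabet_size) - entropy  # total stack height
    return {r: p * information for r, p in freqs.items()}
```

    A fully conserved DNA column yields a single letter of height 2 bits, while a column with all four bases at equal frequency yields a stack of height 0.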

  7. R3D-2-MSA: the RNA 3D structure-to-multiple sequence alignment server

    PubMed Central

    Cannone, Jamie J.; Sweeney, Blake A.; Petrov, Anton I.; Gutell, Robin R.; Zirbel, Craig L.; Leontis, Neocles

    2015-01-01

    The RNA 3D Structure-to-Multiple Sequence Alignment Server (R3D-2-MSA) is a new web service that seamlessly links RNA three-dimensional (3D) structures to high-quality RNA multiple sequence alignments (MSAs) from diverse biological sources. In this first release, R3D-2-MSA provides manual and programmatic access to curated, representative ribosomal RNA sequence alignments from bacterial, archaeal, eukaryal and organellar ribosomes, using nucleotide numbers from representative atomic-resolution 3D structures. A web-based front end is available for manual entry and an Application Program Interface for programmatic access. Users can specify up to five ranges of nucleotides and 50 nucleotide positions per range. The R3D-2-MSA server maps these ranges to the appropriate columns of the corresponding MSA and returns the contents of the columns, either for display in a web browser or in JSON format for subsequent programmatic use. The browser output page provides a 3D interactive display of the query, a full list of sequence variants with taxonomic information and a statistical summary of distinct sequence variants found. The output can be filtered and sorted in the browser. Previous user queries can be viewed at any time by resubmitting the output URL, which encodes the search and re-generates the results. The service is freely available with no login requirement at http://rna.bgsu.edu/r3d-2-msa. PMID:26048960

  8. Phylo-VISTA: An Interactive Visualization Tool for Multiple DNA Sequence Alignments

    SciTech Connect

    Shah, Nameeta; Couronne, Olivier; Pennacchio, Len A.; Brudno, Michael; Batzoglou, Serafim; Bethel, E. Wes; Rubin, Edward M.; Hamann, Bernd; Dubchak, Inna

    2004-04-01

    We have developed Phylo-VISTA (Shah et al., 2003), an interactive software tool for analyzing multiple alignments by visualizing a similarity measure for DNA sequences of multiple species. The complexity of visual presentation is effectively organized using a framework based upon inter-species phylogenetic relationships. The phylogenetic organization supports rapid, user-guided inter-species comparison. To aid in navigation through large sequence datasets, Phylo-VISTA provides a user with the ability to select and view data at varying resolutions. The combination of multi-resolution data visualization and analysis, combined with the phylogenetic framework for inter-species comparison, produces a highly flexible and powerful tool for visual data analysis of multiple sequence alignments.

  9. A data parallel strategy for aligning multiple biological sequences on multi-core computers.

    PubMed

    Zhu, Xiangyuan; Li, Kenli; Salah, Ahmad

    2013-05-01

    In this paper, we address the large-scale biological sequence alignment problem, which is in increasing demand in computational biology. We employ the data parallelism paradigm, which is suitable for handling large-scale processing on multi-core computers, to achieve a high degree of parallelism. Using the data parallelism paradigm, we propose a general strategy which can be used to speed up any multiple sequence alignment method. We applied five different clustering algorithms in our strategy and implemented rigorous tests on an 8-core computer using four traditional benchmarks and artificially generated sequences. The results show that our multi-core-based implementations can achieve up to 151-fold improvements in execution time while losing 2.19% accuracy on average. The source code of the proposed strategy, together with the test sets used in our analysis, is available on request.

  10. Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer

    PubMed Central

    Bernard, Guillaume; Chan, Cheong Xin; Ragan, Mark A.

    2016-01-01

    Alignment-free (AF) approaches have recently been highlighted as alternatives to methods based on multiple sequence alignment in phylogenetic inference. However, the sensitivity of AF methods to genome-scale evolutionary scenarios is little known. Here, using simulated microbial genome data we systematically assess the sensitivity of nine AF methods to three important evolutionary scenarios: sequence divergence, lateral genetic transfer (LGT) and genome rearrangement. Among these, AF methods are most sensitive to the extent of sequence divergence, less sensitive to low and moderate frequencies of LGT, and most robust against genome rearrangement. We describe the application of AF methods to three well-studied empirical genome datasets, and introduce a new application of the jackknife to assess node support. Our results demonstrate that AF phylogenomics is computationally scalable to multi-genome data and can generate biologically meaningful phylogenies and insights into microbial evolution. PMID:27363362

  11. CRASP: a program for analysis of coordinated substitutions in multiple alignments of protein sequences.

    PubMed

    Afonnikov, Dmitry A; Kolchanov, Nikolay A

    2004-07-01

    Recent results suggest that during evolution certain substitutions at protein sites may occur in a coordinated manner due to interactions between amino acid residues. Information on these coordinated substitutions may be useful for analysis of protein structure and function. CRASP is an Internet-available software tool for the detection and analysis of coordinated substitutions in multiple alignments of protein sequences. The approach is based on estimation of the correlation coefficient between the values of a physicochemical parameter at a pair of positions of sequence alignment. The program enables the user to detect and analyze pairwise relationships between amino acid substitutions at protein sequence positions, estimate the contribution of the coordinated substitutions to the evolutionary invariance or variability in integral protein physicochemical characteristics such as the net charge of protein residues and hydrophobic core volume. The CRASP program is available at http://wwwmgs.bionet.nsc.ru/mgs/programs/crasp/.
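
    The core statistic described here, a correlation coefficient between the values of a physicochemical parameter at two alignment positions, can be sketched as follows. This is not CRASP's code: the abstract does not say which parameters are used, so the Kyte-Doolittle hydropathy scale below is only an assumed example, and `column_correlation` is a hypothetical helper. Rows with a gap or unknown residue at either position are skipped.

```python
from math import sqrt

# Kyte-Doolittle hydropathy values, used here only as an example parameter.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_correlation(alignment, i, j, scale=HYDROPATHY):
    """Pearson correlation of a residue property between columns i and j."""
    pairs = [(scale[s[i]], scale[s[j]])
             for s in alignment if s[i] in scale and s[j] in scale]
    n = len(pairs)
    if n < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # an invariant column has no defined correlation
    return sxy / sqrt(sxx * syy)
```

    A value near +1 or -1 for a pair of columns would flag the two positions as candidates for coordinated substitution; assessing significance is a separate step not covered by this sketch.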

  12. A parallel approach of COFFEE objective function to multiple sequence alignment

    NASA Astrophysics Data System (ADS)

    Zafalon, G. F. D.; Visotaky, J. M. V.; Amorim, A. R.; Valêncio, C. R.; Neves, L. A.; de Souza, R. C. G.; Machado, J. M.

    2015-09-01

    Computational tools that assist genomic analyses are increasingly necessary because of the rapid growth in the amount of available data. Given the high computational cost of deterministic algorithms for sequence alignment, many works concentrate on developing heuristic approaches to multiple sequence alignment. However, selecting an approach that offers solutions with good biological significance and a feasible execution time is a great challenge. Thus, this work presents the parallelization of the processing steps of the MSA-GA tool, using the multithread paradigm in the execution of the COFFEE objective function. The standard objective function implemented in the tool is the Weighted Sum of Pairs (WSP), which produces some distortions in the final alignments when sequence sets with low similarity are aligned. In previous studies we therefore implemented the COFFEE objective function in the tool to smooth these distortions. Although the nature of the COFFEE objective function implies an increase in execution time, the approach contains steps that can be executed in parallel. With the improvements implemented in this work, the new approach is 24% faster than the sequential approach with COFFEE. Moreover, the multithreaded COFFEE approach is more efficient than WSP because, besides being slightly faster, its biological results are better.
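
    For readers unfamiliar with the Weighted Sum of Pairs objective, the Python sketch below scores a multiple alignment by summing a weighted pairwise score over every pair of rows. The match, mismatch and gap values and the default weight of 1 are assumptions chosen for illustration; they are not the parameters of MSA-GA, and the COFFEE objective function is not shown.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score two aligned rows column by column (illustrative scoring values)."""
    score = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # gap against gap is not penalized
        if x == "-" or y == "-":
            score += gap
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


def weighted_sum_of_pairs(alignment, weights=None):
    """Sum of weighted pairwise scores over all row pairs of the alignment.

    weights maps an index pair (i, j) with i < j to a weight; default is 1.
    """
    total = 0.0
    for i, j in combinations(range(len(alignment)), 2):
        w = 1.0 if weights is None else weights[(i, j)]
        total += w * pair_score(alignment[i], alignment[j])
    return total
```

    Because each pair is scored independently, the loop over pairs is the kind of step that lends itself to parallel execution, which is the general motivation the abstract gives for multithreading an objective function.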

  13. Direct RNA motif definition and identification from multiple sequence alignments using secondary structure profiles.

    PubMed

    Gautheret, D; Lambert, A

    2001-11-01

    We present here a new approach to the problem of defining RNA signatures and finding their occurrences in sequence databases. The proposed method is based on "secondary structure profiles". An RNA sequence alignment with secondary structure information is used as an input. Two types of weight matrices/profiles are constructed from this alignment: single strands are represented by a classical lod-scores profile while helical regions are represented by an extended "helical profile" comprising 16 lod-scores per position, one for each of the 16 possible base-pairs. Database searches are then conducted using a simultaneous search for helical profiles and dynamic programming alignment of single strand profiles. The algorithm has been implemented into a new software, ERPIN, that performs both profile construction and database search. Applications are presented for several RNA motifs. The automated use of sequence information in both single-stranded and helical regions yields better sensitivity/specificity ratios than descriptor-based programs. Furthermore, since the translation of alignments into profiles is straightforward with ERPIN, iterative searches can easily be conducted to enrich collections of homologous RNAs.
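
    A "helical profile" position holds one lod-score for each of the 16 possible base-pairs. The Python sketch below builds such a position from the two alignment columns that form a base-paired helix position. It is not ERPIN's implementation: the uniform background of 1/16, the pseudocount, the log base and the name `helical_lod` are all assumptions made for this example.

```python
from collections import Counter
from itertools import product
from math import log2

BASES = "ACGU"


def helical_lod(column5, column3, pseudocount=1.0):
    """Log-odds scores for the 16 base-pairs at one helix position.

    column5 and column3 are the paired alignment columns (5' and 3' strands).
    pseudocount must be positive so that unseen pairs get a finite score.
    """
    pairs = [x + y for x, y in zip(column5.upper(), column3.upper())
             if x in BASES and y in BASES]
    counts = Counter(pairs)
    total = len(pairs) + 16 * pseudocount
    background = 1.0 / 16  # assumed uniform background over base-pairs
    return {a + b: log2(((counts[a + b] + pseudocount) / total) / background)
            for a, b in product(BASES, repeat=2)}


scores = helical_lod("GGGC", "CCCG")
```

    Scoring a candidate helix then amounts to summing, over its positions, the lod-score of the base-pair actually observed at each position.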

  14. SVM-BALSA: Remote Homology Detection based on Bayesian Sequence Alignment

    SciTech Connect

    Webb-Robertson, Bobbie-Jo M.; Oehmen, Chris S.; Matzke, Melissa M.

    2005-11-10

    Using biopolymer sequence comparison methods to identify evolutionarily related proteins is one of the most common tasks in bioinformatics. Recently, support vector machines (SVMs) utilizing statistical learning theory have been employed in the problem of remote homology detection and shown to outperform iterative profile methods such as PSI-BLAST. In this study we demonstrate that using a Bayesian alignment score, which accounts for the uncertainty of all possible alignments, in the SVM construction improves sensitivity compared to the traditional dynamic programming implementation.

  15. Comparative Topological Analysis of Neuronal Arbors via Sequence Representation and Alignment

    NASA Astrophysics Data System (ADS)

    Gillette, Todd Aaron

    Neuronal morphology is a key mediator of neuronal function, defining the profile of connectivity and shaping signal integration and propagation. Reconstructing neurite processes is technically challenging and thus data has historically been relatively sparse. Data collection and curation along with more efficient and reliable data production methods provide opportunities for the application of informatics to find new relationships and more effectively explore the field. This dissertation presents a method for aiding the development of data production as well as a novel representation and set of analyses for extracting morphological patterns. The DIADEM Challenge was organized for the purposes of determining the state of the art in automated neuronal reconstruction and what existing challenges remained. As one of the co-organizers of the Challenge, I developed the DIADEM metric, a tool designed to measure the effectiveness of automated reconstruction algorithms by comparing resulting reconstructions to expert-produced gold standards and identifying errors of various types. It has been used in the DIADEM Challenge and in the testing of several algorithms since. Further, this dissertation describes a topological sequence representation of neuronal trees amenable to various forms of sequence analysis, notably motif analysis, global pairwise alignment, clustering, and multiple sequence alignment. Motif analysis of neuronal arbors shows a large difference in bifurcation type proportions between axons and dendrites, but that relatively simple growth mechanisms account for most higher order motifs. Pairwise global alignment of topological sequences, modified from traditional sequence alignment to preserve tree relationships, enabled cluster analysis which displayed strong correspondence with known cell classes by cell type, species, and brain region. Multiple alignment of sequences in selected clusters enabled the extraction of conserved features, revealing mouse

  16. WAViS server for handling, visualization and presentation of multiple alignments of nucleotide or amino acids sequences.

    PubMed

    Zika, Radek; Paces, Jan; Pavlícek, Adam; Paces, Václav

    2004-07-01

    Web Alignment Visualization Server contains a set of web-tools designed for quick generation of publication-quality color figures of multiple alignments of nucleotide or amino acids sequences. It can be used for identification of conserved regions and gaps within many sequences using only common web browsers. The server is accessible at http://wavis.img.cas.cz.

  17. TCS: a web server for multiple sequence alignment evaluation and phylogenetic reconstruction.

    PubMed

    Chang, Jia-Ming; Di Tommaso, Paolo; Lefort, Vincent; Gascuel, Olivier; Notredame, Cedric

    2015-07-01

    This article introduces the Transitive Consistency Score (TCS) web server, a service that makes it possible to estimate the local reliability of protein multiple sequence alignments (MSAs) using the TCS index. The evaluation can be used to identify the aligned positions most likely to contain structurally analogous residues and also most likely to support an accurate phylogenetic reconstruction. The TCS scoring scheme has been shown to be an accurate predictor of structural alignment correctness among commonly used methods. It has also been shown to outperform common filtering schemes like Gblocks or trimAl when doing MSA post-processing prior to phylogenetic tree reconstruction. The web server is available from http://tcoffee.crg.cat/tcs.

  18. ANTICALIgN: visualizing, editing and analyzing combined nucleotide and amino acid sequence alignments for combinatorial protein engineering.

    PubMed

    Jarasch, Alexander; Kopp, Melanie; Eggenstein, Evelyn; Richter, Antonia; Gebauer, Michaela; Skerra, Arne

    2016-07-01

    ANTICALIgN is an interactive software tool developed to simultaneously visualize, analyze and modify alignments of DNA and/or protein sequences that arise during combinatorial protein engineering, design and selection. ANTICALIgN combines powerful functions known from currently available sequence analysis tools with unique features for protein engineering, in particular the possibility to display and manipulate nucleotide sequences and their translated amino acid sequences at the same time. ANTICALIgN offers both template-based multiple sequence alignment (MSA), using the unmutated protein as reference, and conventional global alignment, to compare sequences that share an evolutionary relationship. The application of similarity-based clustering algorithms facilitates the identification of duplicates or of conserved sequence features among a set of selected clones. Imported nucleotide sequences from DNA sequence analysis are automatically translated into the corresponding amino acid sequences and displayed, offering numerous options for selecting reading frames, highlighting of sequence features and graphical layout of the MSA. The MSA complexity can be reduced by hiding the conserved nucleotide and/or amino acid residues, thus putting emphasis on the relevant mutated positions. ANTICALIgN is also able to handle suppressed stop codons or even to incorporate non-natural amino acids into a coding sequence. We demonstrate crucial functions of ANTICALIgN in an example of Anticalins selected from a lipocalin random library against the fibronectin extradomain B (ED-B), an established marker of tumor vasculature. Apart from engineered protein scaffolds, ANTICALIgN provides a powerful tool in the area of antibody engineering and for directed enzyme evolution.
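
    Translating a nucleotide sequence in a chosen reading frame, with optional read-through of a suppressed stop codon, can be sketched in Python as follows. This is an illustration of the general technique using the standard genetic code, not the tool's own code; the function name `translate`, the `suppressed` parameter and the choice of glutamine for the amber codon in the usage line are assumptions.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, frame=0, suppressed=None):
    """Translate dna starting at offset frame (0, 1 or 2).

    suppressed optionally maps stop codons to the residue inserted by a
    suppressor tRNA, e.g. {"TAG": "Q"}. Unknown codons become "X".
    """
    table = dict(CODON_TABLE)
    if suppressed:
        table.update(suppressed)
    dna = dna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(table.get(codon, "X") for codon in codons)


plain = translate("ATGGCCTAGAAA")
read_through = translate("ATGGCCTAGAAA", suppressed={"TAG": "Q"})
```

    Here `plain` contains a stop symbol at the third position, while `read_through` continues through the amber codon, which is the behavior needed when a library is expressed in a suppressor strain.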

  19. OrthoSelect: a web server for selecting orthologous gene alignments from EST sequences.

    PubMed

    Schreiber, Fabian; Wörheide, Gert; Morgenstern, Burkhard

    2009-07-01

    In the absence of whole genome sequences for many organisms, the use of expressed sequence tags (EST) offers an affordable approach for researchers conducting phylogenetic analyses to gain insight about the evolutionary history of organisms. Reliable alignments for phylogenomic analyses are based on orthologous gene sequences from different taxa. So far, researchers have not sufficiently tackled the problem of the completely automated construction of such datasets. Existing software tools are either semi-automated, covering only part of the necessary data processing, or implemented as a pipeline, requiring the installation and configuration of a cascade of external tools, which may be time-consuming and hard to manage. To simplify data set construction for phylogenomic studies, we set up a web server that uses our recently developed OrthoSelect approach. To the best of our knowledge, our web server is the first web-based EST analysis pipeline that allows the detection of orthologous gene sequences in EST libraries and outputs orthologous gene alignments. Additionally, OrthoSelect provides the user with an extensive results section that lists and visualizes all important results, such as annotations, data matrices for each gene/taxon and orthologous gene alignments. The web server is available at http://orthoselect.gobics.de.

  20. Aligning biological sequences on distributed bus networks: a divisible load scheduling approach.

    PubMed

    Min, Wong Han; Veeravalli, Bharadwaj

    2005-12-01

    In this paper, we design a multiprocessor strategy that exploits the computational characteristics of the algorithms used for biological sequence comparison proposed in the literature. We employ divisible load theory (DLT) that is suitable for handling large scale processing on network based systems. For the first time in the domain of DLT, the problem of aligning biological sequences is attempted. The objective is to minimize the total processing time of the alignment process. In designing our strategy, DLT facilitates a clever partitioning of the entire computation process involved in such a way that the overall time consumed for aligning the sequences is a minimum. The partitioning takes into account the computation speeds of the nodes and the underlying communication network. Since this is a real-life application, the post-processing phase becomes important, and hence we consider propagating the results back in order to generate an exact alignment. We consider several cases in our analysis such as deriving closed-form solutions for the processing time for heterogeneous, homogeneous, and networks with slow links. Further, we attempt to employ a multiinstallment strategy to distribute the tasks such that a higher degree of parallelism can be achieved. For slow networks, our strategy recommends near-optimal solutions. We derive an important condition to identify such cases and propose two heuristic strategies. Also, our strategy can be extended for multisequence alignment by utilizing a clustering strategy such as the Berger-Munson algorithm proposed in the literature. Finally, we use real-life DNA samples of house mouse mitochondrion (Mus Musculus Mitochondrion, NC_001569) consisting of 16,295 residues and the DNA of human mitochondrion (Homo Sapiens Mitochondrion, NC_001807) consisting of 16,571 residues, obtainable from the GenBank, in our rigorous simulation experiments to illustrate all the theoretical findings.
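
    As a rough illustration of the partitioning idea in this abstract, the sketch below splits a workload across nodes in proportion to their computation speeds so that all nodes finish at about the same time. It is a simplified assumption-laden example: the function names `partition_load` and `finish_times` are hypothetical, communication delays (which the paper does take into account) are ignored, and the load is treated as arbitrarily divisible units rather than an alignment matrix with data dependencies.

```python
def partition_load(total_units: int, speeds: list[float]) -> list[int]:
    """Split total_units so that each node's share is proportional to its speed.

    Cumulative rounding keeps the shares non-negative and makes them sum to
    total_units. Communication cost is deliberately ignored (an assumption).
    """
    total_speed = sum(speeds)
    shares, assigned, cumulative = [], 0, 0.0
    for speed in speeds:
        cumulative += speed
        boundary = round(total_units * cumulative / total_speed)
        shares.append(boundary - assigned)
        assigned = boundary
    return shares


def finish_times(shares: list[int], speeds: list[float]) -> list[float]:
    """Time each node needs for its share, with speed in units per time step."""
    return [share / speed for share, speed in zip(shares, speeds)]


# Hypothetical cluster: two slow nodes and one node that is twice as fast.
speeds = [1.0, 1.0, 2.0]
shares = partition_load(100, speeds)
print(shares, finish_times(shares, speeds))
```

    With equal finish times no node sits idle while another is still computing, which is the intuition behind minimizing the total processing time.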

  1. Java XMGR

    SciTech Connect

    Dr. George L. Mesina; Steven P. Miller

    2004-08-01

    The XMGR5 graphing package [1] for drawing RELAP5 [2] plots is being re-written in Java [3]. Java is a robust programming language that is available at no cost for most computer platforms from Sun Microsystems, Inc. XMGR5 is an extension of an XY plotting tool called ACE/gr extended to plot data from several US Nuclear Regulatory Commission (NRC) applications. It is also the most popular graphing package worldwide for making RELAP5 plots. In Section 1, a short review of XMGR5 is given, followed by a brief overview of Java. In Section 2, shortcomings of both tkXMGR [4] and XMGR5 are discussed and the value of converting to Java is given. Details of the conversion to Java are given in Section 3. The progress to date, some conclusions and future work are given in Section 4. Some screen shots of the Java version are shown.

  2. Review of alignment and SNP calling algorithms for next-generation sequencing data.

    PubMed

    Mielczarek, M; Szyda, J

    2016-02-01

    Application of the massive parallel sequencing technology has become one of the most important issues in life sciences. Therefore, it was crucial to develop bioinformatics tools for next-generation sequencing (NGS) data processing. Currently, two of the most significant tasks include alignment to a reference genome and detection of single nucleotide polymorphisms (SNPs). In many types of genomic analyses, great numbers of reads need to be mapped to the reference genome; therefore, selection of the aligner is an essential step in NGS pipelines. Two main algorithms, suffix tries and hash tables, have been introduced for this purpose. Suffix array-based aligners are memory-efficient and work faster than hash-based aligners, but they are less accurate. In contrast, hash table algorithms tend to be slower, but more sensitive. SNP and genotype callers may also be divided into two main different approaches: heuristic and probabilistic methods. A variety of software has been subsequently developed over the past several years. In this paper, we briefly review the current development of NGS data processing algorithms and present the available software.
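
    To make the hash-table approach mentioned in this abstract more concrete, here is a minimal sketch of a k-mer index over a reference and a voting step that proposes where a read might map. The function names, the toy reference and the choice of k = 4 are all assumptions for illustration; real aligners add mismatch handling, an extension step and far more compact data structures.

```python
from __future__ import annotations

from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def candidate_positions(read: str, index: dict[str, list[int]], k: int) -> dict[int, int]:
    """Count, for each implied read start on the reference, how many k-mers agree."""
    votes = defaultdict(int)
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], []):
            votes[ref_pos - offset] += 1
    return dict(votes)


# Toy data, made up for this example.
reference = "ACGTACGTTAGCCGATAGGCTA"
index = build_kmer_index(reference, k=4)
votes = candidate_positions("TAGCCGAT", index, k=4)
best_start = max(votes, key=votes.get)
print(best_start, votes[best_start])
```

    The start position with the most votes is only a candidate; a full aligner would verify it with a proper alignment before reporting a mapping.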

  3. Enzyme sequence similarity improves the reaction alignment method for cross-species pathway comparison

    SciTech Connect

    Ovacik, Meric A.; Androulakis, Ioannis P.

    2013-09-15

    Pathway-based information has become an important source of information for both establishing evolutionary relationships and understanding the mode of action of a chemical or pharmaceutical among species. Cross-species comparison of pathways can address two broad questions: comparison in order to inform evolutionary relationships and to extrapolate species differences used in a number of different applications including drug and toxicity testing. Cross-species comparison of metabolic pathways is complex as there are multiple features of a pathway that can be modeled and compared. Among the various methods that have been proposed, reaction alignment has emerged as the most successful at predicting phylogenetic relationships based on NCBI taxonomy. We propose an improvement of the reaction alignment method by accounting for sequence similarity in addition to reaction alignment method. Using nine species, including human and some model organisms and test species, we evaluate the standard and improved comparison methods by analyzing glycolysis and citrate cycle pathways conservation. In addition, we demonstrate how organism comparison can be conducted by accounting for the cumulative information retrieved from nine pathways in central metabolism as well as a more complete study involving 36 pathways common in all nine species. Our results indicate that reaction alignment with enzyme sequence similarity results in a more accurate representation of pathway specific cross-species similarities and differences based on NCBI taxonomy.

  4. FAMSA: Fast and accurate multiple sequence alignment of huge protein families

    PubMed Central

    Deorowicz, Sebastian; Debudaj-Grabysz, Agnieszka; Gudyś, Adam

    2016-01-01

    Rapid development of modern sequencing platforms has contributed to the unprecedented growth of protein families databases. The abundance of sets containing hundreds of thousands of sequences is a formidable challenge for multiple sequence alignment algorithms. The article introduces FAMSA, a new progressive algorithm designed for fast and accurate alignment of thousands of protein sequences. Its features include the utilization of the longest common subsequence measure for determining pairwise similarities, a novel method of evaluating gap costs, and a new iterative refinement scheme. Importantly, its implementation is highly optimized and parallelized to make the most of modern computer platforms. Thanks to the above, quality indicators, i.e. sum-of-pairs and total-column scores, show FAMSA to be superior to competing algorithms, such as Clustal Omega or MAFFT, for datasets exceeding a few thousand sequences. Quality does not compromise on time or memory requirements, which are an order of magnitude lower than those in the existing solutions. For example, a family of 415519 sequences was analyzed in less than two hours and required no more than 8 GB of RAM. FAMSA is available for free at http://sun.aei.polsl.pl/REFRESH/famsa. PMID:27670777
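
    The longest common subsequence (LCS) measure named in this abstract can be illustrated with a textbook dynamic-programming routine. The sketch below is plain Python written for this listing, not FAMSA's optimized implementation, and the function name `lcs_length` is an assumption; how FAMSA turns the LCS length into a pairwise similarity is not stated in the abstract and is not reproduced here.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, using two rolling DP rows."""
    previous = [0] * (len(b) + 1)
    for char_a in a:
        current = [0]
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                # Extend the best subsequence that ends before both characters.
                current.append(previous[j - 1] + 1)
            else:
                # Drop one character from either sequence, keep the better option.
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


print(lcs_length("AGGTAB", "GXTXAYB"))
```

    The routine needs time proportional to the product of the two lengths and memory proportional to the shorter dimension chosen for the row, which is why large-scale tools rely on much faster formulations.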

  5. DendroBLAST: approximate phylogenetic trees in the absence of multiple sequence alignments.

    PubMed

    Kelly, Steven; Maini, Philip K

    2013-01-01

    The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly-used distance based methods though not as accurate as maximum likelihood methods from good quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.

  6. ClustalXeed: a GUI-based grid computation version for high performance and terabyte size multiple sequence alignment

    PubMed Central

    2010-01-01

    Background: There is an increasing demand to assemble and align large-scale biological sequence data sets. The commonly used multiple sequence alignment programs are still limited in their ability to handle very large amounts of sequences because the system lacks a scalable high-performance computing (HPC) environment with a greatly extended data storage capacity. Results: We designed ClustalXeed, a software system for multiple sequence alignment with incremental improvements over previous versions of the ClustalX and ClustalW-MPI software. The primary advantage of ClustalXeed over other multiple sequence alignment software is its ability to align a large family of protein or nucleic acid sequences. To solve the conventional memory-dependency problem, ClustalXeed uses both physical random access memory (RAM) and a distributed file-allocation system for distance matrix construction and pair-align computation. The computation efficiency of the disk-storage system was markedly improved by implementing an efficient load-balancing algorithm, called "idle node-seeking task algorithm" (INSTA). The new editing option and the graphical user interface (GUI) provide ready access to a parallel-computing environment for users who seek fast and easy alignment of large DNA and protein sequence sets. Conclusions: ClustalXeed can now compute a large volume of biological sequence data sets, which were not tractable in any other parallel or single MSA program. The main developments include: 1) the ability to tackle larger sequence alignment problems than possible with previous systems through markedly improved storage-handling capabilities; 2) an efficient task load-balancing algorithm, INSTA, which improves overall processing times for multiple sequence alignment with input sequences of non-uniform length; 3) support for both single PC and distributed cluster systems. PMID:20849574

  7. A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery

    PubMed Central

    Yen, Ian E. H.; Lin, Xin; Zhang, Jiong; Ravikumar, Pradeep; Dhillon, Inderjit S.

    2016-01-01

    Multiple Sequence Alignment and Motif Discovery, known as NP-hard problems, are two fundamental tasks in Bioinformatics. Existing approaches to these two problems are based on either local search methods such as Expectation Maximization (EM), Gibbs Sampling or greedy heuristic methods. In this work, we develop a convex relaxation approach to both problems based on the recent concept of atomic norm and develop a new algorithm, termed Greedy Direction Method of Multiplier, for solving the convex relaxation with two convex atomic constraints. Experiments show that our convex relaxation approach produces solutions of higher quality than the standard tools widely used in the Bioinformatics community on the Multiple Sequence Alignment and Motif Discovery problems. PMID:27559428

  8. KMAD: knowledge-based multiple sequence alignment for intrinsically disordered proteins

    PubMed Central

    Lange, Joanna; Wyrwicz, Lucjan S.; Vriend, Gert

    2016-01-01

    Summary: Intrinsically disordered proteins (IDPs) lack tertiary structure and thus differ from globular proteins in terms of their sequence–structure–function relations. IDPs have lower sequence conservation, different types of active sites and a different distribution of functionally important regions, which altogether make their multiple sequence alignment (MSA) difficult. The KMAD MSA software has been written specifically for the alignment and annotation of IDPs. It augments the substitution matrix with knowledge about post-translational modifications, functional domains and short linear motifs. Results: MSAs produced with KMAD describe well-conserved features among IDPs, tend to agree well with biological intuition, and are a good basis for designing new experiments to shed light on this large, understudied class of proteins. Availability and implementation: KMAD web server is accessible at http://www.cmbi.ru.nl/kmad/. A standalone version is freely available. Contact: vriend@cmbi.ru.nl PMID:26568635

  9. seqphase: a web tool for interconverting phase input/output files and fasta sequence alignments.

    PubMed

    Flot, J-F

    2010-01-01

    The program phase is widely used for Bayesian inference of haplotypes from diploid genotypes; however, manually creating phase input files from sequence alignments is an error-prone and time-consuming process, especially when dealing with numerous variable sites and/or individuals. Here, a web tool called seqphase is presented that generates phase input files from fasta sequence alignments and converts phase output files back into fasta. During the production of the phase input file, several consistency checks are performed on the dataset and suitable command line options to be used for the actual phase data analysis are suggested. seqphase was written in perl and is freely accessible over the Internet at the address http://www.mnhn.fr/jfflot/seqphase.

  10. rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison

    PubMed Central

    Hahn, Lars; Leimeister, Chris-André; Morgenstern, Burkhard

    2016-01-01

    Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don’t-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved, compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de/ PMID:27760124
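
    The binary patterns of match and don't-care positions described in this abstract can be illustrated with a few lines of code. This is a minimal sketch with hypothetical function names and a made-up pattern; it is not the rasbhari or Spaced Words implementation and does no pattern optimization.

```python
def spaced_words(sequence: str, pattern: str) -> list[str]:
    """Extract spaced words: keep characters under '1' (match), skip those under '0' (don't-care)."""
    match_positions = [i for i, flag in enumerate(pattern) if flag == "1"]
    span = len(pattern)
    return [
        "".join(sequence[start + i] for i in match_positions)
        for start in range(len(sequence) - span + 1)
    ]


def shared_spaced_words(seq_a: str, seq_b: str, pattern: str) -> int:
    """Count the distinct spaced words that occur in both sequences."""
    return len(set(spaced_words(seq_a, pattern)) & set(spaced_words(seq_b, pattern)))


# The two toy sequences differ by a single substitution at index 2.
print(shared_spaced_words("ACGTAC", "ACTTAC", "1101"))  # spaced pattern
print(shared_spaced_words("ACGTAC", "ACTTAC", "1111"))  # contiguous 4-mers
```

    In this toy case the spaced pattern still finds a shared word because the substitution falls on a don't-care position, whereas every contiguous 4-mer is disrupted. This is the effect that makes the choice of pattern set matter.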

  11. JAR3D Webserver: Scoring and aligning RNA loop sequences to known 3D motifs

    PubMed Central

    Roll, James; Zirbel, Craig L.; Sweeney, Blake; Petrov, Anton I.; Leontis, Neocles

    2016-01-01

    Many non-coding RNAs have been identified and may function by forming 2D and 3D structures. RNA hairpin and internal loops are often represented as unstructured on secondary structure diagrams, but RNA 3D structures show that most such loops are structured by non-Watson–Crick basepairs and base stacking. Moreover, different RNA sequences can form the same RNA 3D motif. JAR3D finds possible 3D geometries for hairpin and internal loops by matching loop sequences to motif groups from the RNA 3D Motif Atlas, by exact sequence match when possible, and by probabilistic scoring and edit distance for novel sequences. The scoring gauges the ability of the sequences to form the same pattern of interactions observed in 3D structures of the motif. The JAR3D webserver at http://rna.bgsu.edu/jar3d/ takes one or many sequences of a single loop as input, or else one or many sequences of longer RNAs with multiple loops. Each sequence is scored against all current motif groups. The output shows the ten best-matching motif groups. Users can align input sequences to each of the motif groups found by JAR3D. JAR3D will be updated with every release of the RNA 3D Motif Atlas, and so its performance is expected to improve over time. PMID:27235417

  12. Genomic signal processing methods for computation of alignment-free distances from DNA sequences.

    PubMed

    Borrayo, Ernesto; Mendizabal-Ruiz, E Gerardo; Vélez-Pérez, Hugo; Romo-Vázquez, Rebeca; Mendizabal, Adriana P; Morales, J Alejandro

    2014-01-01

    Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments.

  13. Genomic Signal Processing Methods for Computation of Alignment-Free Distances from DNA Sequences

    PubMed Central

    Borrayo, Ernesto; Mendizabal-Ruiz, E. Gerardo; Vélez-Pérez, Hugo; Romo-Vázquez, Rebeca; Mendizabal, Adriana P.; Morales, J. Alejandro

    2014-01-01

    Genomic signal processing (GSP) refers to the use of digital signal processing (DSP) tools for analyzing genomic data such as DNA sequences. A possible application of GSP that has not been fully explored is the computation of the distance between a pair of sequences. In this work we present GAFD, a novel GSP alignment-free distance computation method. We introduce a DNA sequence-to-signal mapping function based on the employment of doublet values, which increases the number of possible amplitude values for the generated signal. Additionally, we explore the use of three DSP distance metrics as descriptors for categorizing DNA signal fragments. Our results indicate the feasibility of employing GAFD for computing sequence distances and the use of descriptors for characterizing DNA fragments. PMID:25393409

  14. Multiple Amino Acid Sequence Alignment Nitrogenase Component 1: Insights into Phylogenetics and Structure-Function Relationships

    PubMed Central

    Howard, James B.; Kechris, Katerina J.; Rees, Douglas C.; Glazer, Alexander N.

    2013-01-01

    Amino acid residues critical for a protein's structure-function are retained by natural selection and these residues are identified by the level of variance in co-aligned homologous protein sequences. The relevant residues in the nitrogen fixation Component 1 α- and β-subunits were identified by the alignment of 95 protein sequences. Proteins were included from species encompassing multiple microbial phyla and diverse ecological niches as well as the nitrogen fixation genotypes, anf, nif, and vnf, which encode proteins associated with cofactors differing at one metal site. After adjusting for differences in sequence length, insertions, and deletions, the remaining >85% of the sequence co-aligned the subunits from the three genotypes. Six Groups, designated Anf, Vnf, and Nif I-IV, were assigned based upon genetic origin, sequence adjustments, and conserved residues. Both subunits subdivided into the same groups. Invariant and single variant residues were identified and were defined as “core” for nitrogenase function. Three species in Group Nif-III, Candidatus Desulforudis audaxviator, Desulfotomaculum kuznetsovii, and Thermodesulfatator indicus, were found to have a seleno-cysteine that replaces one cysteinyl ligand of the 8Fe:7S, P-cluster. Subsets of invariant residues, limited to individual groups, were identified; these unique residues help identify the gene of origin (anf, nif, or vnf) yet should not be considered diagnostic of the metal content of associated cofactors. Fourteen of the 19 residues that compose the cofactor pocket are invariant or single variant; the other five residues are highly variable but do not correlate with the putative metal content of the cofactor. The variable residues are clustered on one side of the cofactor, away from other functional centers in the three dimensional structure. Many of the invariant and single variant residues were not previously recognized as potentially critical and their identification provides the bases

  15. Alignment-free analysis of barcode sequences by means of compression-based methods

    PubMed Central

    2013-01-01

    Background: The key idea of the DNA barcode initiative is to identify, for each group of species belonging to different kingdoms of life, a short DNA sequence that can act as a true taxon barcode. DNA barcodes represent a valuable type of information that can be integrated with ecological, genetic, and morphological data in order to obtain a more consistent taxonomy. Recent studies have shown that, for the animal kingdom, the mitochondrial gene cytochrome c oxidase I (COI), about 650 bp long, can be used as a barcode sequence for identification and taxonomic purposes. In the present work we aim to introduce an alignment-free approach to the taxonomic analysis of barcode sequences. Our approach is based on two compression-based versions of the non-computable Universal Similarity Metric (USM) class of distances. Our purpose is to justify the use of USM for the analysis of short DNA barcode sequences as well, showing that USM is able to correctly extract taxonomic information from this kind of sequence. Results: We downloaded from the Barcode of Life Data System (BOLD) database 30 datasets of barcode sequences belonging to different animal species. We built phylogenetic trees for every dataset, according to compression-based and classic evolutionary methods, and compared them in terms of topology preservation. In the experimental tests, we obtained similarity scores between evolutionary and compression-based trees of between 80% and 100% for most of the datasets (94%). Moreover, we carried out experimental tests using simulated barcode datasets composed of 100, 150, 200 and 500 sequences, each simulation replicated 25-fold. In this case, mean similarity scores between evolutionary and compression-based trees range between 83% and 99% for all simulated datasets. Conclusions: In the present work we aim to introduce an alignment-free approach to the taxonomic analysis of barcode sequences. Our
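
    A widely used compression-based approximation of the Universal Similarity Metric is the normalized compression distance (NCD). The sketch below computes it with zlib from the Python standard library. Whether NCD and zlib correspond to the two variants and the compressor used in this study is an assumption, and the sequences are synthetic data generated for the example, not real barcodes.

```python
import random
import zlib


def compressed_size(text: str) -> int:
    """Number of bytes zlib needs for the text at its highest compression level."""
    return len(zlib.compress(text.encode("ascii"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance: small for similar inputs, close to 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    return (compressed_size(x + y) - min(cx, cy)) / max(cx, cy)


# Synthetic 650-character sequences (roughly the COI barcode length).
rng = random.Random(0)
seq_a = "".join(rng.choice("ACGT") for _ in range(650))
# seq_b is seq_a with a single substitution at position 300.
seq_b = seq_a[:300] + ("A" if seq_a[300] != "A" else "C") + seq_a[301:]
# seq_c is an independent random sequence.
seq_c = "".join(rng.choice("ACGT") for _ in range(650))

print(ncd(seq_a, seq_b), ncd(seq_a, seq_c))
```

    A matrix of such pairwise distances can then be passed to any distance-based tree-building method, with no alignment step required.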

  16. Graph-based modeling of tandem repeats improves global multiple sequence alignment.

    PubMed

    Szalkowski, Adam M; Anisimova, Maria

    2013-09-01

    Tandem repeats (TRs) are often present in proteins with crucial functions, responsible for resistance, pathogenicity and associated with infectious or neurodegenerative diseases. This motivates numerous studies of TRs and their evolution, requiring accurate multiple sequence alignment. TRs may be lost or inserted at any position of a TR region by replication slippage or recombination, but current methods assume fixed unit boundaries, and yet are of high complexity. We present a new global graph-based alignment method that does not restrict TR unit indels by unit boundaries. TR indels are modeled separately and penalized using the phylogeny-aware alignment algorithm. This ensures enhanced accuracy of reconstructed alignments, disentangling TRs and measuring indel events and rates in a biologically meaningful way. Our method detects not only duplication events but also all changes in TR regions owing to recombination, strand slippage and other events inserting or deleting TR units. We evaluate our method by simulation incorporating TR evolution, by either sampling TRs from a profile hidden Markov model or by mimicking strand slippage with duplications. The new method is illustrated on a family of type III effectors, a pathogenicity determinant in agriculturally important bacteria Ralstonia solanacearum. We show that TR indel rate variation contributes to the diversification of this protein family.

  17. Multiple Sequence Alignment with Hidden Markov Models Learned by Random Drift Particle Swarm Optimization.

    PubMed

    Sun, Jun; Palade, Vasile; Wu, Xiaojun; Fang, Wei

    2014-01-01

    Hidden Markov Models (HMMs) are powerful tools for multiple sequence alignment (MSA), which is known to be an NP-complete and important problem in bioinformatics. Learning HMMs is a difficult task, and many meta-heuristic methods, including particle swarm optimization (PSO), have been used for that. In this paper, a new variant of PSO, called the random drift particle swarm optimization (RDPSO) algorithm, is proposed to be used for HMM learning tasks in MSA problems. The proposed RDPSO algorithm, inspired by the free electron model in metal conductors in an external electric field, employs a novel set of evolution equations that can enhance the global search ability of the algorithm. Moreover, in order to further enhance the algorithmic performance of the RDPSO, we incorporate a diversity control method into the algorithm and, thus, propose an RDPSO with diversity-guided search (RDPSO-DGS). The performances of the RDPSO, RDPSO-DGS and other algorithms are tested and compared by learning HMMs for MSA on two well-known benchmark data sets. The experimental results show that the HMMs learned by the RDPSO and RDPSO-DGS are able to generate better alignments for the benchmark data sets than other most commonly used HMM learning methods, such as the Baum-Welch and other PSO algorithms. The performance comparison with well-known MSA programs, such as ClustalW and MAFFT, also shows that the proposed methods have advantages in multiple sequence alignment.

  18. Studies on structure-based sequence alignment and phylogenies of beta-lactamases.

    PubMed

    Salahuddin, Parveen; Khan, Asad U

    2014-01-01

    The β-lactamase enzymes cleave the amide bond in the β-lactam ring, rendering β-lactam antibiotics harmless to bacteria. In this communication we have studied the structure-function relationships and phylogenies of class A, B and D beta-lactamases using structure-based sequence alignment and the PHYLIP programs, respectively. The structure-based sequence alignment data suggest that, in different isolates of TEM-1, mutations did not occur at or near sequence motifs. Since deletions are reported to be lethal to the structure and function of the enzyme, the antibiotic hydrolysis profile and specificity will be affected in these variants. The alignment data for the class A enzymes SHV-1 and CTX-M-15, the class D enzyme OXA-10, and the class B enzymes VIM-2 and SIM-1 show that the sequence motifs, along with other parts of the polypeptide, are essentially conserved. These results imply that the conformations of these beta-lactamases are close to the native state and that they possess normal hydrolytic activities towards beta-lactam antibiotics. However, class B enzymes such as IMP-1 and NDM-1 are less conserved than the class A and D enzymes studied here because mutations and deletions occurred at critically important regions such as the active site. Therefore, the structure of these beta-lactamases will be altered and the antibiotic hydrolysis profile will be affected. Phylogenetic studies suggest that class A and D beta-lactamases, including TOHO-1 and OXA-10 respectively, evolved by horizontal gene transfer (HGT), whereas other members of class A such as TEM-1 evolved by a gene duplication mechanism. Taken together, these studies justify the structure-function relationship of beta-lactamases, and the phylogenetic studies suggest these enzymes evolved by different mechanisms. PMID:24966539

  19. Alignment of Short Reads: A Crucial Step for Application of Next-Generation Sequencing Data in Precision Medicine.

    PubMed

    Ye, Hao; Meehan, Joe; Tong, Weida; Hong, Huixiao

    2015-01-01

    Precision medicine or personalized medicine has been proposed as a modernized and promising medical strategy. Genetic variants of patients are the key information for implementation of precision medicine. Next-generation sequencing (NGS) is an emerging technology for deciphering genetic variants. Alignment of raw reads to a reference genome is one of the key steps in NGS data analysis. Many algorithms have been developed for alignment of short read sequences since 2008. Users have to make a decision on which alignment algorithm to use in their studies. Selection of the right alignment algorithm determines not only the alignment algorithm but also the set of suitable parameters to be used by the algorithm. Understanding these algorithms helps in selecting the appropriate alignment algorithm for different applications in precision medicine. Here, we review current available algorithms and their major strategies such as seed-and-extend and q-gram filter. We also discuss the challenges in current alignment algorithms, including alignment in multiple repeated regions, long reads alignment and alignment facilitated with known genetic variants.

  20. Alignment of Short Reads: A Crucial Step for Application of Next-Generation Sequencing Data in Precision Medicine

    PubMed Central

    Ye, Hao; Meehan, Joe; Tong, Weida; Hong, Huixiao

    2015-01-01

    Precision medicine or personalized medicine has been proposed as a modernized and promising medical strategy. Genetic variants of patients are the key information for implementation of precision medicine. Next-generation sequencing (NGS) is an emerging technology for deciphering genetic variants. Alignment of raw reads to a reference genome is one of the key steps in NGS data analysis. Many algorithms have been developed for alignment of short read sequences since 2008. Users have to make a decision on which alignment algorithm to use in their studies. Selection of the right alignment algorithm determines not only the alignment algorithm but also the set of suitable parameters to be used by the algorithm. Understanding these algorithms helps in selecting the appropriate alignment algorithm for different applications in precision medicine. Here, we review current available algorithms and their major strategies such as seed-and-extend and q-gram filter. We also discuss the challenges in current alignment algorithms, including alignment in multiple repeated regions, long reads alignment and alignment facilitated with known genetic variants. PMID:26610555
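
    The q-gram filter strategy named in this abstract rests on the q-gram lemma: a read of length m that matches a reference window with at most e edit errors must share at least m + 1 - q(e + 1) q-grams with it. The sketch below applies that bound as a cheap pre-filter. The function names and toy sequences are assumptions, and a practical filter would count q-grams with an index rather than window by window.

```python
from collections import Counter


def qgram_threshold(read_length: int, q: int, max_errors: int) -> int:
    """Minimum number of shared q-grams implied by the q-gram lemma."""
    return read_length + 1 - q * (max_errors + 1)


def shared_qgrams(a: str, b: str, q: int) -> int:
    """Number of q-grams common to both strings, counted with multiplicity."""
    counts_a = Counter(a[i:i + q] for i in range(len(a) - q + 1))
    counts_b = Counter(b[i:i + q] for i in range(len(b) - q + 1))
    return sum((counts_a & counts_b).values())


def passes_filter(read: str, window: str, q: int, max_errors: int) -> bool:
    """True if the window cannot be ruled out and deserves a full alignment."""
    return shared_qgrams(read, window, q) >= qgram_threshold(len(read), q, max_errors)


read = "ACGTACGTAC"
print(passes_filter(read, "ACGTTCGTAC", q=3, max_errors=1))  # one substitution
print(passes_filter(read, "TTTTTTTTTT", q=3, max_errors=1))  # unrelated window
```

    Windows that fail the test are discarded without alignment; windows that pass are only candidates and still need verification.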

  1. CUDA ClustalW: An efficient parallel algorithm for progressive multiple sequence alignment on Multi-GPUs.

    PubMed

    Hung, Che-Lun; Lin, Yu-Shiang; Lin, Chun-Yuan; Chung, Yeh-Ching; Chung, Yi-Fang

    2015-10-01

    For biological applications, sequence alignment is an important strategy to analyze DNA and protein sequences. Multiple sequence alignment is an essential methodology for studying biological data, for example in homology modeling and phylogenetic reconstruction. However, multiple sequence alignment is an NP-hard problem. In past decades, the progressive approach has been proposed to align multiple sequences by adopting iterative pairwise alignments. Due to the rapid growth of next-generation sequencing technologies, a large number of sequences can be produced in a short period of time. When the problem instance is large, progressive alignment is time consuming. Parallel computing is a suitable solution for such applications, and the GPU is one of the important architectures for contemporary parallel computing research. Therefore, we propose a GPU version of ClustalW v2.0.11, called CUDA ClustalW v1.0, in this work. The experimental results show that CUDA ClustalW v1.0 can achieve more than 33× speedup in overall execution time compared with ClustalW v2.0.11.

  2. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    PubMed

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost-prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and a well-balanced computational workload while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap, memory-constrained hardware has significant implications for in-field clinical diagnostic testing, enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples.

  3. HBLAST: Parallelised sequence similarity--A Hadoop MapReducable basic local alignment search tool.

    PubMed

    O'Driscoll, Aisling; Belogrudov, Vladislav; Carroll, John; Kropp, Kai; Walsh, Paul; Ghazal, Peter; Sleator, Roy D

    2015-04-01

    The recent exponential growth of genomic databases has resulted in the common task of sequence alignment becoming one of the major bottlenecks in the field of computational biology. It is typical for these large datasets and complex computations to require cost-prohibitive High Performance Computing (HPC) to function. As such, parallelised solutions have been proposed but many exhibit scalability limitations and are incapable of effectively processing "Big Data" - the name attributed to datasets that are extremely large, complex and require rapid processing. The Hadoop framework, comprised of distributed storage and a parallelised programming framework known as MapReduce, is specifically designed to work with such datasets but it is not trivial to efficiently redesign and implement bioinformatics algorithms according to this paradigm. The parallelisation strategy of "divide and conquer" for alignment algorithms can be applied to both data sets and input query sequences. However, scalability is still an issue due to memory constraints or large databases, with very large database segmentation leading to additional performance decline. Herein, we present Hadoop Blast (HBlast), a parallelised BLAST algorithm that proposes a flexible method to partition both databases and input query sequences using "virtual partitioning". HBlast presents improved scalability over existing solutions and a well-balanced computational workload while keeping database segmentation and recompilation to a minimum. Enhanced BLAST search performance on cheap, memory-constrained hardware has significant implications for in-field clinical diagnostic testing, enabling faster and more accurate identification of pathogenic DNA in human blood or tissue samples. PMID:25625550

  4. Assessing genetic diversity in Java fine-flavor cocoa (Theobroma cacao L.) germplasm by simple sequence repeat (SSR) markers

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Indonesia is the third-largest cocoa-producing country in the world, with an annual cacao bean production of 572,000 tons. The currently cultivated cacao varieties in Indonesia are inter-hybrids of various clones introduced from the Americas since the 16th century. Among them, “Java cocoa” is a wel...

  5. Open-Phylo: a customizable crowd-computing platform for multiple sequence alignment.

    PubMed

    Kwak, Daniel; Kam, Alfred; Becerra, David; Zhou, Qikuan; Hops, Adam; Zarour, Eleyine; Kam, Arthur; Sarmenta, Luis; Blanchette, Mathieu; Waldispühl, Jérôme

    2013-01-01

    Citizen science games such as Galaxy Zoo, Foldit, and Phylo aim to harness the intelligence and processing power generated by crowds of online gamers to solve scientific problems. However, the selection of the data to be analyzed through these games is under the exclusive control of the game designers, and so are the results produced by gamers. Here, we introduce Open-Phylo, a freely accessible crowd-computing platform that enables any scientist to enter our system and use crowds of gamers to assist computer programs in solving one of the most fundamental problems in genomics: the multiple sequence alignment problem. PMID:24148814

  6. Antibody-specific model of amino acid substitution for immunological inferences from alignments of antibody sequences.

    PubMed

    Mirsky, Alexander; Kazandjian, Linda; Anisimova, Maria

    2015-03-01

    Antibodies are glycoproteins produced by the immune system as a dynamically adaptive line of defense against invading pathogens. Very elegant and specific mutational mechanisms allow B lymphocytes to produce a large and diversified repertoire of antibodies, which is modified and enhanced throughout all adulthood. One of these mechanisms is somatic hypermutation, which stochastically mutates nucleotides in the antibody genes, forming new sequences with different properties and, eventually, higher affinity and selectivity to the pathogenic target. As somatic hypermutation involves fast mutation of antibody sequences, this process can be described using a Markov substitution model of molecular evolution. Here, using large sets of antibody sequences from mice and humans, we infer an empirical amino acid substitution model AB, which is specific to antibody sequences. Compared with existing general amino acid models, we show that the AB model provides significantly better description for the somatic evolution of mice and human antibody sequences, as demonstrated on large next generation sequencing (NGS) antibody data. General amino acid models are reflective of conservation at the protein level due to functional constraints, with most frequent amino acids exchanges taking place between residues with the same or similar physicochemical properties. In contrast, within the variable part of antibody sequences we observed an elevated frequency of exchanges between amino acids with distinct physicochemical properties. This is indicative of a sui generis mutational mechanism, specific to antibody somatic hypermutation. We illustrate this property of antibody sequences by a comparative analysis of the network modularity implied by the AB model and general amino acid substitution models. 
    We recommend using the new model for computational studies of antibody sequence maturation, including inference of alignments and phylogenetic trees describing antibody somatic hypermutation in

  7. Two Simple and Efficient Algorithms to Compute the SP-Score Objective Function of a Multiple Sequence Alignment

    PubMed Central

    Ranwez, Vincent

    2016-01-01

    Background: Multiple sequence alignment (MSA) is a crucial step in many molecular analyses and many MSA tools have been developed. Most of them use a greedy approach to construct a first alignment that is then refined by optimizing the sum of pair score (SP-score). The SP-score estimation is thus a bottleneck for most MSA tools since it is repeatedly required and is time consuming. Results: Given an alignment of n sequences and L sites, I introduce here optimized solutions reaching O(nL) time complexity for affine gap cost, instead of O(n²L), which are easy to implement. PMID:27505054
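    The gain from counting per column instead of looping over all sequence pairs can be illustrated with a deliberately simplified sketch. It assumes a match/mismatch score and a fixed per-position gap penalty rather than the affine gap cost treated in the paper, so it is not the published algorithm; the function names and score values are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical scores chosen for illustration only.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a, b):
    """Score one aligned pair of symbols; gap-gap pairs score 0."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sp_score_naive(alignment):
    """Reference version: sum over all sequence pairs, O(n^2 L)."""
    return sum(
        pair_score(a, b)
        for s, t in combinations(alignment, 2)
        for a, b in zip(s, t)
    )


def sp_score_counts(alignment):
    """Column-count version: O(n L) for a fixed-size alphabet."""
    total = 0
    for column in zip(*alignment):
        counts = Counter(column)
        gaps = counts.pop("-", 0)
        residues = len(column) - gaps
        # Number of pairs made of two identical residues.
        same = sum(c * (c - 1) // 2 for c in counts.values())
        residue_pairs = residues * (residues - 1) // 2
        total += MATCH * same
        total += MISMATCH * (residue_pairs - same)
        total += GAP * gaps * residues  # each residue-gap pair
    return total


alignment = ["AC-T", "A-GT", "ACGT"]
print(sp_score_naive(alignment), sp_score_counts(alignment))
```

    Both functions should agree on any alignment of equal-length rows; the second one never enumerates sequence pairs, which is where the factor of n is saved under these simplified scoring assumptions.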

  8. 3-d structure-based amino acid sequence alignment of esterases, lipases and related proteins

    SciTech Connect

    Gentry, M.K.; Doctor, B.P.; Cygler, M.; Schrag, J.D.; Sussman, J.L.

    1993-05-13

    Acetylcholinesterase and butyrylcholinesterase, enzymes with potential as pretreatment drugs for organophosphate toxicity, are members of a larger family of homologous proteins that includes carboxylesterases, cholesterol esterases, lipases, and several nonhydrolytic proteins. A computer-generated alignment of 18 of the proteins, the acetylcholinesterases, butyrylcholinesterases, carboxylesterases, some esterases, and the nonenzymatic proteins has been previously presented. More recently, the three-dimensional structures of two enzymes in this group, acetylcholinesterase from Torpedo californica and lipase from Geotrichum candidum, have been determined. Based on the x-ray structures and the superposition of these two enzymes, it was possible to obtain an improved amino acid sequence alignment of 32 members of this family of proteins. Examination of this alignment reveals that 24 amino acids are invariant in all of the hydrolytic proteins, and an additional 49 are well conserved. Conserved amino acids include those of the active site, the disulfide bridges, the salt bridges, those in the core of the proteins, and those at the edges of secondary structural elements. Comparison of the three-dimensional structures makes it possible to find a well-defined structural basis for the conservation of many of these amino acids.

  9. Memory-efficient dynamic programming backtrace and pairwise local sequence alignment

    PubMed Central

    Newberg, Lee A.

    2008-01-01

    Motivation: A backtrace through a dynamic programming algorithm's intermediate results in search of an optimal path, or to sample paths according to an implied probability distribution, or as the second stage of a forward–backward algorithm, is a task of fundamental importance in computational biology. When there is insufficient space to store all intermediate results in high-speed memory (e.g. cache) existing approaches store selected stages of the computation, and recompute missing values from these checkpoints on an as-needed basis. Results: Here we present an optimal checkpointing strategy, and demonstrate its utility with pairwise local sequence alignment of sequences of length 10 000. Availability: Sample C++-code for optimal backtrace is available in the Supplementary Materials. Contact: leen@cs.rpi.edu Supplementary information: Supplementary data is available at Bioinformatics online. PMID:18558620
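    The store-and-recompute idea described above can be sketched as follows. This is a minimal illustration, not the optimal checkpointing strategy of the paper: it assumes evenly spaced checkpoints (every k-th row) and uses the longest common subsequence as a stand-in for local alignment. The names next_row and lcs_checkpointed are hypothetical.

```python
def next_row(prev, a_char, b):
    """Compute one LCS dynamic-programming row from the previous row."""
    row = [0] * (len(b) + 1)
    for j in range(1, len(b) + 1):
        if a_char == b[j - 1]:
            row[j] = prev[j - 1] + 1
        else:
            row[j] = max(prev[j], row[j - 1])
    return row


def lcs_checkpointed(a, b, k):
    """Return an LCS of a and b while storing only every k-th row."""
    # Forward pass: keep row 0 and every row whose index is a multiple of k.
    checkpoints = {0: [0] * (len(b) + 1)}
    row = checkpoints[0]
    for i in range(1, len(a) + 1):
        row = next_row(row, a[i - 1], b)
        if i % k == 0:
            checkpoints[i] = row

    # Backtrace: recompute one block of at most k rows at a time.
    out = []
    i, j = len(a), len(b)
    while i > 0 and j > 0:
        base = ((i - 1) // k) * k  # nearest checkpoint strictly below row i
        block = [checkpoints[base]]  # block[t] holds row base + t
        for r in range(base + 1, i + 1):
            block.append(next_row(block[-1], a[r - 1], b))
        while i > base and j > 0:
            cur, prev = block[i - base], block[i - base - 1]
            if a[i - 1] == b[j - 1]:
                out.append(a[i - 1])
                i -= 1
                j -= 1
            elif prev[j] >= cur[j - 1]:
                i -= 1
            else:
                j -= 1
    return "".join(reversed(out))


print(lcs_checkpointed("AGGTAB", "GXTXAYB", 2))
```

    With k close to the square root of the number of rows, this keeps roughly that many rows in memory at once, at the cost of computing each row about twice. The paper's contribution is a better placement of checkpoints than this uniform spacing.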

  10. An alignment-free method to find and visualise rearrangements between pairs of DNA sequences

    PubMed Central

    Pratas, Diogo; Silva, Raquel M.; Pinho, Armando J.; Ferreira, Paulo J.S.G.

    2015-01-01

    Species evolution is indirectly registered in their genomic structure. The emergence and advances in sequencing technology provided a way to access genome information, namely to identify and study evolutionary macro-events, as well as chromosome alterations for clinical purposes. This paper describes a completely alignment-free computational method, based on a blind unsupervised approach, to detect large-scale and small-scale genomic rearrangements between pairs of DNA sequences. To illustrate the power and usefulness of the method we give complete chromosomal information maps for the pairs human-chimpanzee and human-orangutan. The tool by means of which these results were obtained has been made publicly available and is described in detail. PMID:25984837

  11. Nucleotide sequence alignment of hdcA from Gram-positive bacteria.

    PubMed

    Diaz, Maria; Ladero, Victor; Redruello, Begoña; Sanchez-Llana, Esther; Del Rio, Beatriz; Fernandez, Maria; Martin, Maria Cruz; Alvarez, Miguel A

    2016-03-01

    The decarboxylation of histidine, carried out mainly by some Gram-positive bacteria, yields the toxic dietary biogenic amine histamine (Ladero et al. 2010 〈10.2174/157340110791233256〉 [1], Linares et al. 2016 〈http://dx.doi.org/10.1016/j.foodchem.2015.11.013〉 [2]). The reaction is catalyzed by a pyruvoyl-dependent histidine decarboxylase (Linares et al. 2011 〈10.1080/10408398.2011.582813〉 [3]), which is encoded by the gene hdcA. In order to locate conserved regions in the hdcA gene of Gram-positive bacteria, this article provides a nucleotide sequence alignment of all the hdcA sequences from Gram-positive bacteria present in databases. For further utility and discussion, see 〈http://dx.doi.org/10.1016/j.foodcont.2015.11.035〉 [4].

  12. Nucleotide sequence alignment of hdcA from Gram-positive bacteria

    PubMed Central

    Diaz, Maria; Ladero, Victor; Redruello, Begoña; Sanchez-Llana, Esther; del Rio, Beatriz; Fernandez, Maria; Martin, Maria Cruz; Alvarez, Miguel A.

    2016-01-01

    The decarboxylation of histidine, carried out mainly by some Gram-positive bacteria, yields the toxic dietary biogenic amine histamine (Ladero et al. 2010 〈10.2174/157340110791233256〉 [1], Linares et al. 2016 〈http://dx.doi.org/10.1016/j.foodchem.2015.11.013〉 [2]). The reaction is catalyzed by a pyruvoyl-dependent histidine decarboxylase (Linares et al. 2011 〈10.1080/10408398.2011.582813〉 [3]), which is encoded by the gene hdcA. In order to locate conserved regions in the hdcA gene of Gram-positive bacteria, this article provides a nucleotide sequence alignment of all the hdcA sequences from Gram-positive bacteria present in databases. For further utility and discussion, see 〈http://dx.doi.org/10.1016/j.foodcont.2015.11.035〉 [4]. PMID:26958625

  13. An alignment-free method to find and visualise rearrangements between pairs of DNA sequences.

    PubMed

    Pratas, Diogo; Silva, Raquel M; Pinho, Armando J; Ferreira, Paulo J S G

    2015-01-01

    Species evolution is indirectly registered in their genomic structure. The emergence and advances in sequencing technology provided a way to access genome information, namely to identify and study evolutionary macro-events, as well as chromosome alterations for clinical purposes. This paper describes a completely alignment-free computational method, based on a blind unsupervised approach, to detect large-scale and small-scale genomic rearrangements between pairs of DNA sequences. To illustrate the power and usefulness of the method we give complete chromosomal information maps for the pairs human-chimpanzee and human-orangutan. The tool by means of which these results were obtained has been made publicly available and is described in detail.

  14. A probabilistic coding based quantum genetic algorithm for multiple sequence alignment.

    PubMed

    Huo, Hongwei; Xie, Qiaoluan; Shen, Xubang; Stojkovic, Vojislav

    2008-01-01

    This paper presents an original Quantum Genetic algorithm for Multiple sequence ALIGNment (QGMALIGN) that combines a genetic algorithm and a quantum algorithm. A quantum probabilistic coding is designed for representing the multiple sequence alignment. A quantum rotation gate is used as a mutation operator to guide the quantum state evolution. Six genetic operators are designed on the coding basis to improve the solution during the evolutionary process. The features of implicit parallelism and state superposition in quantum mechanics and the global search capability of the genetic algorithm are exploited to achieve efficient computation. A set of well-known test cases from BAliBASE2.0 is used as a reference to evaluate the efficiency of the QGMALIGN optimization. The QGMALIGN results have been compared with the results of the most popular methods (CLUSTALX, SAGA, DIALIGN, SB_PIMA, and QGMALIGN). The results show that QGMALIGN performs well on the presented biological data. The addition of genetic operators to the quantum algorithm lowers the overall running time.

  15. QuickProbs—A Fast Multiple Sequence Alignment Algorithm Designed for Graphics Processors

    PubMed Central

    Gudyś, Adam; Deorowicz, Sebastian

    2014-01-01

    Multiple sequence alignment is a crucial task in a number of biological analyses like secondary structure prediction, domain searching, phylogeny, etc. MSAProbs is currently the most accurate alignment algorithm, but its effectiveness is obtained at the expense of computational time. In this paper we present QuickProbs, a variant of MSAProbs customised for graphics processors. We selected the two most time-consuming stages of MSAProbs to be redesigned for GPU execution: the posterior matrices calculation and the consistency transformation. Experiments on three popular benchmarks (BAliBASE, PREFAB, OXBench-X) on a quad-core PC equipped with a high-end graphics card show QuickProbs to be 5.7 to 9.7 times faster than the original CPU-parallel MSAProbs. Additional tests performed on several protein families from the Pfam database give an overall speed-up of 6.7. Compared to other algorithms like MAFFT, MUSCLE, or ClustalW, QuickProbs proved to be much more accurate at similar speed. Additionally, we introduce a tuned variant of QuickProbs which is significantly more accurate on sets of distantly related sequences than MSAProbs without exceeding its computation time. The GPU part of QuickProbs was implemented in OpenCL, thus the package is suitable for graphics processors produced by all major vendors. PMID:24586435

  16. Resolving the multiple sequence alignment problem using biogeography-based optimization with multiple populations.

    PubMed

    Zemali, El-Amine; Boukra, Abdelmadjid

    2015-08-01

    Multiple sequence alignment (MSA) is one of the most challenging problems in bioinformatics; it involves discovering similarity between a set of protein or DNA sequences. This paper introduces a new method for the MSA problem called biogeography-based optimization with multiple populations (BBOMP). It is based on a recent metaheuristic inspired by the mathematics of biogeography named biogeography-based optimization (BBO). To improve the exploration ability of BBO, we have introduced a new concept allowing better exploration of the search space. It consists of manipulating multiple populations, each having its own parameters. These parameters are used to build up progressive alignments allowing more diversity. At each iteration, the best solution found is injected into each population. Moreover, to improve solution quality, six operators are defined. These operators are selected with a dynamic probability which changes according to the operators' efficiency. To test the performance of the proposed approach, we have considered a set of datasets from Balibase 2.0 and compared it with many recent algorithms such as GAPAM, MSA-GA, QEAMSA and RBT-GA. The results show that the proposed approach achieves a better average score than the previously cited methods.

  17. MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment.

    PubMed

    Kumar, Sudhir; Tamura, Koichiro; Nei, Masatoshi

    2004-06-01

    With its theoretical basis firmly established in molecular evolutionary and population genetics, the comparative DNA and protein sequence analysis plays a central role in reconstructing the evolutionary histories of species and multigene families, estimating rates of molecular evolution, and inferring the nature and extent of selective forces shaping the evolution of genes and genomes. The scope of these investigations has now expanded greatly owing to the development of high-throughput sequencing techniques and novel statistical and computational methods. These methods require easy-to-use computer programs. One such effort has been to produce Molecular Evolutionary Genetics Analysis (MEGA) software, with its focus on facilitating the exploration and analysis of the DNA and protein sequence variation from an evolutionary perspective. Currently in its third major release, MEGA3 contains facilities for automatic and manual sequence alignment, web-based mining of databases, inference of the phylogenetic trees, estimation of evolutionary distances and testing evolutionary hypotheses. This paper provides an overview of the statistical methods, computational tools, and visual exploration modules for data input and the results obtainable in MEGA.

  18. AREM: Aligning Short Reads from ChIP-Sequencing by Expectation Maximization

    NASA Astrophysics Data System (ADS)

    Newkirk, Daniel; Biesinger, Jacob; Chon, Alvin; Yokomori, Kyoko; Xie, Xiaohui

    High-throughput sequencing coupled to chromatin immunoprecipitation (ChIP-Seq) is widely used in characterizing genome-wide binding patterns of transcription factors, cofactors, chromatin modifiers, and other DNA binding proteins. A key step in ChIP-Seq data analysis is to map short reads from high-throughput sequencing to a reference genome and identify peak regions enriched with short reads. Although several methods have been proposed for ChIP-Seq analysis, most existing methods only consider reads that can be uniquely placed in the reference genome, and therefore have low power for detecting peaks located within repeat sequences. Here we introduce a probabilistic approach for ChIP-Seq data analysis which utilizes all reads, providing a truly genome-wide view of binding patterns. Reads are modeled using a mixture model corresponding to K enriched regions and a null genomic background. We use maximum likelihood to estimate the locations of the enriched regions, and implement an expectation-maximization (E-M) algorithm, called AREM (aligning reads by expectation maximization), to update the alignment probabilities of each read to different genomic locations. We apply the algorithm to identify genome-wide binding events of two proteins: Rad21, a component of cohesin and a key factor involved in chromatid cohesion, and Srebp-1, a transcription factor important for lipid/cholesterol homeostasis. Using AREM, we were able to identify 19,935 Rad21 peaks and 1,748 Srebp-1 peaks in the mouse genome with high confidence, including 1,517 (7.6%) Rad21 peaks and 227 (13%) Srebp-1 peaks that were missed using only uniquely mapped reads. The open source implementation of our algorithm is available at http://sourceforge.net/projects/arem
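    The expectation-maximization update for reads that map to several locations can be sketched in a few lines. This is a toy simplification, not the AREM model: it assumes each read simply lists candidate region indices and estimates one weight per region, with no enrichment or background mixture components. The function name em_multireads and the input layout are hypothetical.

```python
def em_multireads(read_candidates, n_regions, n_iter=50):
    """Toy EM: split each read across its candidate regions.

    read_candidates: list of lists of distinct region indices, one list per read.
    Returns (region weights, per-read alignment probabilities).
    """
    weights = [1.0 / n_regions] * n_regions
    resp = []
    for _ in range(n_iter):
        # E-step: probability of each candidate, proportional to region weight.
        resp = []
        for cands in read_candidates:
            z = sum(weights[c] for c in cands)
            resp.append({c: weights[c] / z for c in cands})
        # M-step: a region's weight is its expected share of all reads.
        totals = [0.0] * n_regions
        for probs in resp:
            for c, p in probs.items():
                totals[c] += p
        weights = [t / len(read_candidates) for t in totals]
    return weights, resp


# Three reads map uniquely to region 0, one uniquely to region 1,
# and one read is ambiguous between the two.
reads = [[0], [0], [0], [0, 1], [1]]
weights, resp = em_multireads(reads, n_regions=2)
print(weights, resp[3])
```

    In this toy setting the ambiguous read is pulled toward the region that already has more uniquely mapped support, which is the intuition behind using all reads rather than discarding multi-mapped ones.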

  19. Sequence stratigraphy, structural style, and age of deformation of the Malaita accretionary prism (Solomon arc-Ontong Java Plateau convergent zone)

    NASA Astrophysics Data System (ADS)

    Phinney, Eric J.; Mann, Paul; Coffin, Millard F.; Shipley, Thomas H.

    2004-10-01

    Possibilities for the fate of oceanic plateaus at subduction zones range from complete subduction of the plateau beneath the arc to complete plateau-arc accretion and resulting collisional orogenesis. Deep penetration, multi-channel seismic reflection (MCS) data from the northern flank of the Solomon Islands reveal the sequence stratigraphy, structural style, and age of deformation of an accretionary prism formed during late Neogene (5-0 Ma) convergence between the ~33-km-thick crust of the Ontong Java oceanic plateau and the ~15-km-thick Solomon island arc. Correlation of MCS data with the satellite-derived, free-air gravity field defines the tectonic boundaries and internal structure of the 800-km-long, 140-km-wide accretionary prism. We name this prism the "Malaita accretionary prism" or "MAP" after Malaita, the largest and best-studied island exposure of the accretionary prism in the Solomon Islands. MCS data, gravity data, and stratigraphic correlations to islands and ODP sites on the Ontong Java Plateau (OJP) reveal that the offshore MAP is composed of folded and thrust faulted sedimentary rocks and upper crystalline crust offscraped from the subducting Ontong Java Plateau (Pacific plate) and transferred to the Solomon arc. With the exception of an upper sequence of Quaternary(?) island-derived terrigenous sediments, the deformed stratigraphy of the MAP is identical to that of the incoming Ontong Java Plateau in the North Solomon trench. We divide the MAP into four distinct, folded and thrust fault-bounded structural domains interpreted to have formed by diachronous, southeast-to-northwest, and highly oblique entry of the Ontong Java Plateau into a former trench now marked by the Kia-Kaipito-Korigole (KKK) left-lateral strike-slip fault zone along the suture between the Solomon arc and the MAP. The structural style within each of the four structural domains consists of a parallel series of three to four fault propagation folds formed by the

  20. elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling.

    PubMed

    Herzeel, Charlotte; Costanza, Pascal; Decap, Dries; Fostier, Jan; Reumers, Joke

    2015-01-01

    elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture that allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost. PMID:26182406

  1. Alignment of 3D Building Models and TIR Video Sequences with Line Tracking

    NASA Astrophysics Data System (ADS)

    Iwaszczuk, D.; Stilla, U.

    2014-11-01

    Thermal infrared imagery of urban areas has become of interest for urban climate investigations and thermal building inspections. Using a flying platform such as a UAV or a helicopter for the acquisition and combining the thermal data with the 3D building models via texturing delivers a valuable groundwork for large-area building inspections. However, such thermal textures are useful for further analysis only if they are extracted with correct geometry. This can be achieved with a good coregistration between the 3D building models and thermal images, which cannot be achieved by direct georeferencing. Hence, this paper presents a methodology for alignment of 3D building models and oblique TIR image sequences taken from a flying platform. In a single image, line correspondences between model edges and image line segments are found using an accumulator approach, and based on these correspondences an optimal camera pose is calculated to ensure the best match between the projected model and the image structures. Across the sequence, the linear features are tracked based on visibility prediction. The results of the proposed methodology are presented using a TIR image sequence taken from a helicopter in a densely built-up urban area. The novelty of this work lies in employing the uncertainty of the 3D building models and in an innovative tracking strategy based on a priori knowledge from the 3D building model and visibility checking.

  2. elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling.

    PubMed

    Herzeel, Charlotte; Costanza, Pascal; Decap, Dries; Fostier, Jan; Reumers, Joke

    2015-01-01

    elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture that allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost.

  3. elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling

    PubMed Central

    Decap, Dries; Fostier, Jan; Reumers, Joke

    2015-01-01

    elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture that allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost. PMID:26182406

  4. The identification of complete domains within protein sequences using accurate E-values for semi-global alignment

    PubMed Central

    Kann, Maricel G.; Sheetlin, Sergey L.; Park, Yonil; Bryant, Stephen H.; Spouge, John L.

    2007-01-01

    The sequencing of complete genomes has created a pressing need for automated annotation of gene function. Because domains are the basic units of protein function and evolution, a gene can be annotated from a domain database by aligning domains to the corresponding protein sequence. Ideally, complete domains are aligned to protein subsequences, in a ‘semi-global alignment’. Local alignment, which aligns pieces of domains to subsequences, is common in high-throughput annotation applications, however. It is a mature technique, with the heuristics and accurate E-values required for screening large databases and evaluating the screening results. Hidden Markov models (HMMs) provide an alternative theoretical framework for semi-global alignment, but their use is limited because they lack heuristic acceleration and accurate E-values. Our new tool, GLOBAL, overcomes some limitations of previous semi-global HMMs: it has accurate E-values and the possibility of the heuristic acceleration required for high-throughput applications. Moreover, according to a standard of truth based on protein structure, two semi-global HMM alignment tools (GLOBAL and HMMer) had comparable performance in identifying complete domains, but distinctly outperformed two tools based on local alignment. When searching for complete protein domains, therefore, GLOBAL avoids disadvantages commonly associated with HMMs, yet maintains their superior retrieval performance. PMID:17596268

  5. IBBOMSA: An Improved Biogeography-based Approach for Multiple Sequence Alignment

    PubMed Central

    Yadav, Rohit Kumar; Banka, Haider

    2016-01-01

    In bioinformatics, multiple sequence alignment (MSA) is an NP-hard problem. Hence, nature-inspired techniques can better approximate the solution. In the current study, a novel biogeography-based optimization (NBBO) is proposed to solve an MSA problem. Biogeography-based optimization (BBO) is a new paradigm for optimization. However, it has some deficiencies in solving complicated problems, such as low population diversity and a slow convergence rate. NBBO is an enhanced version of BBO in which a new migration operation is proposed to overcome the limitations of BBO. The new migration adopts more information from other habitats, maintains population diversity, and preserves exploitation ability. In the performance analysis, the proposed technique and existing techniques such as VDGA, MOMSA, and GAPAM are tested on publicly available benchmark datasets (i.e., BAliBASE). The proposed method was observed to be superior to, or competitive with, the existing techniques. PMID:27812276

  6. SP-Designer: a user-friendly program for designing species-specific primer pairs from DNA sequence alignments.

    PubMed

    Villard, Pierre; Malausa, Thibaut

    2013-07-01

    SP-Designer is an open-source program providing a user-friendly tool for the design of specific PCR primer pairs from a DNA sequence alignment containing sequences from various taxa. SP-Designer selects PCR primer pairs for the amplification of DNA from a target species on the basis of several criteria: (i) primer specificity, as assessed by interspecific sequence polymorphism in the annealing regions, (ii) the biochemical characteristics of the primers and (iii) the intended PCR conditions. SP-Designer generates tables, detailing the primer pair and PCR characteristics, and a FASTA file locating the primer sequences in the original sequence alignment. SP-Designer is Windows-compatible and freely available from http://www2.sophia.inra.fr/urih/sophia_mart/sp_designer/info_sp_designer.php.

  7. A genome survey sequencing of the Java mouse deer (Tragulus javanicus) adds new aspects to the evolution of lineage specific retrotransposons in Ruminantia (Cetartiodactyla).

    PubMed

    Gallus, S; Kumar, V; Bertelsen, M F; Janke, A; Nilsson, M A

    2015-10-25

    Ruminantia, the ruminating, hoofed mammals (cow, deer, giraffe and allies) are an unranked artiodactylan clade. Around 50-60 million years ago the BovB retrotransposon entered the ancestral ruminantian genome through horizontal gene transfer. A survey genome screen using 454-pyrosequencing of the Java mouse deer (Tragulus javanicus) and the lesser kudu (Tragelaphus imberbis) was done to investigate and to compare the landscape of transposable elements within Ruminantia. The family Tragulidae (mouse deer) is the only representative of Tragulina and phylogenetically important, because it represents the earliest divergence in Ruminantia. The data analyses show that, relative to other ruminantian species, the lesser kudu genome has seen an expansion of BovB Long INterspersed Elements (LINEs) and BovB related Short INterspersed Elements (SINEs) like BOVA2. In comparison the genome of Java mouse deer has fewer BovB elements than other ruminants, especially Bovinae, and has in addition a novel CHR-3 SINE most likely propagated by LINE-1. By contrast the other ruminants have low amounts of CHR SINEs but high numbers of actively propagating BovB-derived and BovB-propagated SINEs. The survey sequencing data suggest that the transposable element landscape in mouse deer (Tragulina) is unique among Ruminantia, suggesting a lineage specific evolutionary trajectory that does not involve BovB mediated retrotransposition. This shows that the genomic landscape of mobile genetic elements can rapidly change in any lineage.

  8. A genome survey sequencing of the Java mouse deer (Tragulus javanicus) adds new aspects to the evolution of lineage specific retrotransposons in Ruminantia (Cetartiodactyla).

    PubMed

    Gallus, S; Kumar, V; Bertelsen, M F; Janke, A; Nilsson, M A

    2015-10-25

    Ruminantia, the ruminating, hoofed mammals (cow, deer, giraffe and allies) are an unranked artiodactylan clade. Around 50-60 million years ago the BovB retrotransposon entered the ancestral ruminantian genome through horizontal gene transfer. A survey genome screen using 454-pyrosequencing of the Java mouse deer (Tragulus javanicus) and the lesser kudu (Tragelaphus imberbis) was done to investigate and to compare the landscape of transposable elements within Ruminantia. The family Tragulidae (mouse deer) is the only representative of Tragulina and phylogenetically important, because it represents the earliest divergence in Ruminantia. The data analyses show that, relative to other ruminantian species, the lesser kudu genome has seen an expansion of BovB Long INterspersed Elements (LINEs) and BovB related Short INterspersed Elements (SINEs) like BOVA2. In comparison the genome of Java mouse deer has fewer BovB elements than other ruminants, especially Bovinae, and has in addition a novel CHR-3 SINE most likely propagated by LINE-1. By contrast the other ruminants have low amounts of CHR SINEs but high numbers of actively propagating BovB-derived and BovB-propagated SINEs. The survey sequencing data suggest that the transposable element landscape in mouse deer (Tragulina) is unique among Ruminantia, suggesting a lineage specific evolutionary trajectory that does not involve BovB mediated retrotransposition. This shows that the genomic landscape of mobile genetic elements can rapidly change in any lineage. PMID:26123917

  9. Efficient and Accurate OTU Clustering with GPU-Based Sequence Alignment and Dynamic Dendrogram Cutting.

    PubMed

    Nguyen, Thuy-Diem; Schmidt, Bertil; Zheng, Zejun; Kwoh, Chee-Keong

    2015-01-01

    De novo clustering is a popular technique to perform taxonomic profiling of a microbial community by grouping 16S rRNA amplicon reads into operational taxonomic units (OTUs). In this work, we introduce a new dendrogram-based OTU clustering pipeline called CRiSPy. The key idea used in CRiSPy to improve clustering accuracy is the application of an anomaly detection technique to obtain a dynamic distance cutoff instead of using the de facto value of 97 percent sequence similarity as in most existing OTU clustering pipelines. This technique works by detecting an abrupt change in the merging heights of a dendrogram. To produce the output dendrograms, CRiSPy employs the OTU hierarchical clustering approach that is computed on a genetic distance matrix derived from an all-against-all read comparison by pairwise sequence alignment. However, most existing dendrogram-based tools have difficulty processing datasets larger than 10,000 unique reads due to high computational complexity. We address this difficulty by developing two efficient algorithms for CRiSPy: a compute-efficient GPU-accelerated parallel algorithm for pairwise distance matrix computation and a memory-efficient hierarchical clustering algorithm. Our experiments on various datasets with distinct attributes show that CRiSPy is able to produce more accurate OTU groupings than most OTU clustering applications. PMID:26451819
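
    To make the idea of a dynamic distance cutoff concrete, here is a minimal Python sketch. It is an assumption-laden stand-in, not CRiSPy's anomaly detection technique: the function `dynamic_cutoff` is a hypothetical name, and "abrupt change" is approximated simply as the largest gap between consecutive merge heights of the dendrogram.

```python
def dynamic_cutoff(merge_heights):
    """Return a cut height placed in the largest gap between consecutive merge heights.

    Simplification: the 'abrupt change' is taken to be the single largest jump.
    """
    heights = sorted(merge_heights)
    if len(heights) < 2:
        raise ValueError("need at least two merge heights")
    gaps = [b - a for a, b in zip(heights, heights[1:])]
    i = max(range(len(gaps)), key=gaps.__getitem__)  # index of the largest jump
    return (heights[i] + heights[i + 1]) / 2  # cut halfway across that jump


# Toy merge heights: four tight merges, then a jump to much larger distances.
heights = [0.01, 0.02, 0.03, 0.04, 0.20, 0.25]
cutoff = dynamic_cutoff(heights)
print(cutoff)
```

    With these toy values the cut falls between 0.04 and 0.20, so the four tight merges are kept inside clusters, whereas a fixed cutoff would ignore where the data actually separates.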

  10. PETcofold: predicting conserved interactions and structures of two multiple alignments of RNA sequences

    PubMed Central

    Seemann, Stefan E.; Richter, Andreas S.; Gesell, Tanja; Backofen, Rolf; Gorodkin, Jan

    2011-01-01

    Motivation: Predicting RNA–RNA interactions is essential for determining the function of putative non-coding RNAs. Existing methods for the prediction of interactions are all based on single sequences. Since comparative methods have already been useful in RNA structure determination, we assume that conserved RNA–RNA interactions also imply conserved function. Of these, we further assume that a non-negligible amount of the existing RNA–RNA interactions have also acquired compensating base changes throughout evolution. We implement a method, PETcofold, that can take covariance information in intra-molecular and inter-molecular base pairs into account to predict interactions and secondary structures of two multiple alignments of RNA sequences. Results: PETcofold's ability to predict RNA–RNA interactions was evaluated on a carefully curated dataset of 32 bacterial small RNAs and their targets, which was manually extracted from the literature. For evaluation of both RNA–RNA interaction and structure prediction, we were able to extract only a few high-quality examples: one vertebrate small nucleolar RNA and four bacterial small RNAs. For these we show that the prediction can be improved by our comparative approach. Furthermore, PETcofold was evaluated on controlled data with phylogenetically simulated sequences enriched for covariance patterns at the interaction sites. We observed increased performance with increased amounts of covariance. Availability: The program PETcofold is available as source code and can be downloaded from http://rth.dk/resources/petcofold. Contact: gorodkin@rth.dk; backofen@informatik.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21088024

  11. StreamingTrim 1.0: a Java software for dynamic trimming of 16S rRNA sequence data from metagenetic studies.

    PubMed

    Bacci, G; Bazzicalupo, M; Benedetti, A; Mengoni, A

    2014-03-01

    Next-generation sequencing technologies are extensively used in the field of molecular microbial ecology to describe taxonomic composition and to infer functionality of microbial communities. In particular, the so-called barcode or metagenetic applications that are based on PCR amplicon library sequencing are very popular at present. One of the problems related to the use of data from these libraries is the analysis of read quality and the removal (trimming) of low-quality segments, while retaining sufficient information for subsequent analyses (e.g. taxonomic assignment). Here, we present StreamingTrim, a DNA read-trimming program, written in Java, with which researchers are able to analyse the quality of DNA sequences in fastq files and to search for low-quality zones in a very conservative way. This software has been developed with the aim of providing a tool capable of trimming amplicon library data while retaining as much taxonomic information as possible. The software is equipped with a graphical user interface for ease of use. Moreover, from a computational point of view, StreamingTrim reads and analyses sequences one by one from an input fastq file, without keeping anything in memory, which allows the computation to run on a normal desktop PC or even a laptop. Trimmed sequences are saved in an output file, and a statistics summary is displayed that contains the mean and standard deviation of the length and quality of the whole sequence file. Compiled software, a manual and example data sets are available under the BSD-2-Clause License at the GitHub repository at https://github.com/GiBacci/StreamingTrim/.
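
    The one-record-at-a-time processing described above can be sketched in a few lines of Python. This is an illustration only, not StreamingTrim's algorithm or code: the names `read_fastq` and `trim_3prime` are hypothetical, records are assumed to occupy exactly four lines, qualities are assumed to be Phred+33 encoded, and the trimming rule (cut bases from the 3' end while their quality is below a threshold) is a deliberately simple assumption.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) one record at a time (four-line records assumed)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Drop trailing bases whose Phred score (assumed Phred+33) is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# In-memory stand-in for a fastq file: 'I' encodes Q40 and '#' encodes Q2.
demo = io.StringIO("@read1\nACGTACGT\n+\nIIIII###\n")
trimmed = [(h,) + trim_3prime(s, q) for h, s, q in read_fastq(demo)]
print(trimmed)
```

    Because `read_fastq` is a generator, only the current record is held in memory, which is the property that lets this style of tool run on a desktop PC regardless of input size.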

  12. Alignment of nucleotide or amino acid sequences on microcomputers, using a modification of Sellers' (1974) algorithm which avoids the need for calculation of the complete distance matrix.

    PubMed

    Tyson, H; Haley, B

    1985-10-01

    A program to calculate optimum alignment between two sequences, which may be DNA, amino acid or other information, has been written in PASCAL. The Sellers' algorithm for calculating distance between sequences has been modified to reduce its demands on microcomputer memory space by more than half. Gap penalties and mismatch scores are user-adjustable. In 48 K of memory the program aligns sequences up to 170 elements in length; optimum alignment and total distance between a pair of sequences are displayed. The program aligns longer sequences by subdivision of both sequences into corresponding, overlapping sections. Section length and amount of section overlap are user-defined. More importantly, extension of this modification of Sellers' algorithm to align longer sequences, given hardware and compilers/languages capable of using a larger memory space (e.g. 640 K), shows that it is now possible to align, without subdivision, sequences with up to 700 elements each. The increase in computation time for this program with increasing sequence lengths aligned without subdivision is curvilinear, but total times are essentially dependent on hardware/language/compiler combinations. The statistical significance of an alignment is examined with conventional Monte Carlo approaches. PMID:3852712
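
    As a rough illustration of how an alignment distance can be computed without storing the complete distance matrix, the Python sketch below keeps only two rows of the dynamic-programming table. This is the standard space-saving technique and is offered as an assumption; it is not claimed to be the specific modification of Sellers' algorithm used by the authors, and it returns only the distance, not the alignment itself. The function name and the default penalties are hypothetical.

```python
def alignment_distance(a: str, b: str, mismatch: int = 1, gap: int = 1) -> int:
    """Distance between a and b using two table rows instead of the full matrix."""
    if len(b) > len(a):
        a, b = b, a  # keep the shorter sequence along the row to minimise memory
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0 if ca == cb else mismatch)
            delete = prev[j] + gap
            insert = curr[j - 1] + gap
            curr.append(min(substitute, delete, insert))
        prev = curr  # the older row is no longer needed
    return prev[-1]


print(alignment_distance("ACGT", "AGT"))
```

    Memory use grows with the length of the shorter sequence rather than with the product of both lengths; the user-adjustable gap and mismatch penalties mentioned in the abstract correspond here to the `gap` and `mismatch` arguments.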

  13. Evolution of the cytochrome P450 superfamily: sequence alignments and pharmacogenetics.

    PubMed

    Lewis, D F; Watson, E; Lake, B G

    1998-06-01

    The evolution of the cytochrome P450 (CYP) superfamily is described, with particular reference to major events in the development of biological forms during geological time. It is noted that the currently accepted timescale for the elaboration of the P450 phylogenetic tree exhibits close parallels with the evolution of terrestrial biota. Indeed, the present human P450 complement of xenobiotic-metabolizing enzymes may have originated from coevolutionary 'warfare' between plants and animals during the Devonian period about 400 million years ago. A number of key correspondences between the evolution of the P450 system and the course of biological development over time point to a mechanistic molecular biology of evolution which is consistent with a steady increase in atmospheric oxygenation beginning over 2000 million years ago, whereas dietary changes during more recent geological time may provide one possible explanation for certain species differences in metabolism. Alignment between P450 protein sequences within the same family or subfamily, together with across-family comparisons, aids the rationalization of drug metabolism specificities for different P450 isoforms, and can assist in an understanding of genetic polymorphisms in P450-mediated oxidations at the molecular level. Moreover, the variation in P450 regulatory mechanisms and inducibilities between different mammalian species is likely to have important implications for current procedures of chemical safety evaluation, which rely on pure genetic strains of laboratory bred rodents for the testing of compounds destined for human exposure.

  14. Structure-Based Sequence Alignment of the Transmembrane Domains of All Human GPCRs: Phylogenetic, Structural and Functional Implications

    PubMed Central

    Cvicek, Vaclav; Goddard, William A.; Abrol, Ravinder

    2016-01-01

    The understanding of G-protein coupled receptors (GPCRs) is undergoing a revolution due to increased information about their signaling and the experimental determination of structures for more than 25 receptors. The availability of at least one receptor structure for each of the GPCR classes, well separated in sequence space, enables an integrated superfamily-wide analysis to identify signatures involving the role of conserved residues, conserved contacts, and downstream signaling in the context of receptor structures. In this study, we align the transmembrane (TM) domains of all experimental GPCR structures to maximize the conserved inter-helical contacts. The resulting superfamily-wide GpcR Sequence-Structure (GRoSS) alignment of the TM domains for all human GPCR sequences is sufficient to generate a phylogenetic tree that correctly distinguishes all different GPCR classes, suggesting that the class-level differences in the GPCR superfamily are encoded at least partly in the TM domains. The inter-helical contacts conserved across all GPCR classes describe the evolutionarily conserved GPCR structural fold. The corresponding structural alignment of the inactive and active conformations, available for a few GPCRs, identifies activation hot-spot residues in the TM domains that get rewired upon activation. Many GPCR mutations, known to alter receptor signaling and cause disease, are located at these conserved contact and activation hot-spot residue positions. The GRoSS alignment places the chemosensory receptor subfamilies for bitter taste (TAS2R) and pheromones (Vomeronasal, VN1R) in the rhodopsin family, known to contain the chemosensory olfactory receptor subfamily. The GRoSS alignment also enables the quantification of the structural variability in the TM regions of experimental structures, useful for homology modeling and structure prediction of receptors. Furthermore, this alignment identifies structurally and functionally important residues in all human GPCRs

  15. An optimized and low-cost FPGA-based DNA sequence alignment--a step towards personal genomics.

    PubMed

    Shah, Hurmat Ali; Hasan, Laiq; Ahmad, Nasir

    2013-01-01

    DNA sequence alignment is a cardinal process in computational biology, but it is computationally very expensive when performed on traditional computational platforms such as CPUs. Of the many off-the-shelf platforms explored for speeding up the computation, the FPGA stands out as the best candidate owing to its performance per dollar spent and performance per watt. These two advantages make the FPGA the most appropriate choice for realizing the aim of personal genomics. The previous implementation of DNA sequence alignment did not take into consideration the price of the device on which optimization was performed. This paper presents optimizations over a previous FPGA implementation that improve both the overall speed-up achieved and the price incurred by the platform that was optimized. The optimizations are: (1) the array of processing elements is made to run on a change in input value and not on the clock, eliminating the need for tight clock synchronization; (2) the implementation is unrestrained by the size of the sequences to be aligned; (3) the waiting time required for the sequences to load onto the FPGA is reduced to the minimum possible; and (4) an efficient method is devised to store the output matrix, which makes it possible to save the diagonal elements to be used in the next pass, in parallel with the computation of the output matrix. Implemented on a Spartan3 FPGA, this implementation achieved a 20-fold performance improvement in terms of CUPS over a GPP implementation.

  16. An optimized and low-cost FPGA-based DNA sequence alignment--a step towards personal genomics.

    PubMed

    Shah, Hurmat Ali; Hasan, Laiq; Ahmad, Nasir

    2013-01-01

    DNA sequence alignment is a cardinal process in computational biology, but it is computationally very expensive when performed on traditional computational platforms such as CPUs. Of the many off-the-shelf platforms explored for speeding up the computation, the FPGA stands out as the best candidate owing to its performance per dollar spent and performance per watt. These two advantages make the FPGA the most appropriate choice for realizing the aim of personal genomics. The previous implementation of DNA sequence alignment did not take into consideration the price of the device on which optimization was performed. This paper presents optimizations over a previous FPGA implementation that improve both the overall speed-up achieved and the price incurred by the platform that was optimized. The optimizations are: (1) the array of processing elements is made to run on a change in input value and not on the clock, eliminating the need for tight clock synchronization; (2) the implementation is unrestrained by the size of the sequences to be aligned; (3) the waiting time required for the sequences to load onto the FPGA is reduced to the minimum possible; and (4) an efficient method is devised to store the output matrix, which makes it possible to save the diagonal elements to be used in the next pass, in parallel with the computation of the output matrix. Implemented on a Spartan3 FPGA, this implementation achieved a 20-fold performance improvement in terms of CUPS over a GPP implementation. PMID:24110283

  17. eMatchSite: Sequence Order-Independent Structure Alignments of Ligand Binding Pockets in Protein Models

    PubMed Central

    Brylinski, Michal

    2014-01-01

    Detecting similarities between ligand binding sites in the absence of global homology between target proteins has been recognized as one of the critical components of modern drug discovery. Local binding site alignments can be constructed using sequence order-independent techniques, however, to achieve a high accuracy, many current algorithms for binding site comparison require high-quality experimental protein structures, preferably in the bound conformational state. This, in turn, complicates proteome scale applications, where only various quality structure models are available for the majority of gene products. To improve the state-of-the-art, we developed eMatchSite, a new method for constructing sequence order-independent alignments of ligand binding sites in protein models. Large-scale benchmarking calculations using adenine-binding pockets in crystal structures demonstrate that eMatchSite generates accurate alignments for almost three times more protein pairs than SOIPPA. More importantly, eMatchSite offers a high tolerance to structural distortions in ligand binding regions in protein models. For example, the percentage of correctly aligned pairs of adenine-binding sites in weakly homologous protein models is only 4–9% lower than those aligned using crystal structures. This represents a significant improvement over other algorithms, e.g. the performance of eMatchSite in recognizing similar binding sites is 6% and 13% higher than that of SiteEngine using high- and moderate-quality protein models, respectively. Constructing biologically correct alignments using predicted ligand binding sites in protein models opens up the possibility to investigate drug-protein interaction networks for complete proteomes with prospective systems-level applications in polypharmacology and rational drug repositioning. eMatchSite is freely available to the academic community as a web-server and a stand-alone software distribution at http://www.brylinski.org/ematchsite. PMID

  18. Design Pattern Mining Using Distributed Learning Automata and DNA Sequence Alignment

    PubMed Central

    Esmaeilpour, Mansour; Naderifar, Vahideh; Shukur, Zarina

    2014-01-01

    Context: Over the last decade, design patterns have been used extensively to generate reusable solutions to frequently encountered problems in software engineering and object oriented programming. A design pattern is a repeatable software design solution that provides a template for solving various instances of a general problem. Objective: This paper describes a new method for pattern mining that isolates design patterns and the relationships between them, and a related tool, DLA-DNA, for all implemented patterns and all projects used for evaluation. DLA-DNA, which is based on distributed learning automata (DLA) and deoxyribonucleic acid (DNA) sequence alignment, achieves acceptable precision and recall compared with the other evaluated tools. Method: The proposed method mines structural design patterns in object oriented source code and extracts the strong and weak relationships between them, enabling analyzers and programmers to determine the dependency rate of each object, component, and other section of the code for parameter passing and modular programming. The proposed model can detect design patterns, and the strengths of their relationships, better than the other available tools, namely Pinot, PTIDEJ and DPJF. Results: The results demonstrate that, whether the source code is built in a standard or a non-standard way with respect to the design patterns, the results of the proposed method are close to those of DPJF and better than those of Pinot and PTIDEJ. The proposed model was tested on several source codes and compared with other related models and available tools; the precision and recall of the proposed method are, on average, 20% and 9.6% higher than those of Pinot, 27% and 31% higher than those of PTIDEJ, and 3.3% and 2% higher than those of DPJF, respectively. Conclusion: The proposed method is organized in two steps: in the first step, elemental design patterns are identified, and in the second step, these are composed to recognize actual design patterns. PMID:25243670

  19. Science course sequences: The alignment of written, enacted, and tested curricula and their impact on grade 11 HSPA science scores

    NASA Astrophysics Data System (ADS)

    Lentz, Christine A.

    The purpose of this mixed method study was to examine the alignment of the written, enacted, and tested curricula of the Ocean City High School science course sequencing and its impact on student achievement. This study also examined the school's ability to predict student scores on the science portion of the High School Proficiency Assessment (HSPA). Data collected for science achievement included the science portion of the Grade Eight Proficiency Assessment (GEPA) as a pretest and the scores for the science portion of the HSPA as a posttest. Data collected for curriculum alignment included an examination of teacher generated course curriculum maps to determine the alignment with the New Jersey Core Curriculum Content Standards and the HSPA Test Specifications Directory. The quantitative data were treated through a series of paired samples t-tests; Pearson product moment correlation was used to examine relationships between variables; and an ANCOVA analysis and a stepwise regression analysis were also completed. Based on the findings of the data analysis of this research effort, the following conclusions were drawn: (1) the alignment of the enacted curriculum with the tested and written curricula affected science achievement; (2) GEPA scores are significantly tied to HSPA scores; and (3) GEPA scores and enrollment in the science sequence whose curriculum was aligned with the written and tested curricula met the requirements of a predictor of scores on the HSPA exam. It is expected that educational leadership will use the results of this research to inform practice and drive decision-making with respect to student placement into course sequences. It is hoped that the results will not only increase support for the district's curricula development plan but also add to the overall body of knowledge surrounding science program effectiveness in relation to the No Child Left Behind standards.

  20. A Novel Method for Alignment-free DNA Sequence Similarity Analysis Based on the Characterization of Complex Networks

    PubMed Central

    Zhou, Jie; Zhong, Pianyu; Zhang, Tinghui

    2016-01-01

    Determination of sequence similarity is one of the major steps in computational phylogenetic studies. One of the major tasks of computational biologists is to develop novel mathematical descriptors for similarity analysis. DNA clustering is an important technology that automatically identifies inherent relationships among large-scale DNA sequences. The comparison between the DNA sequences of different species helps determine phylogenetic relationships among species. Alignment-free approaches have continuously gained interest in various sequence analysis applications such as phylogenetic inference and metagenomic classification/clustering, particularly for large-scale sequence datasets. Here, we construct a novel and simple mathematical descriptor based on the characterization of cis sequence complex DNA networks. This new approach is based on a code of three cis nucleotides in a gene that could code for an amino acid. In particular, for each DNA sequence, we will set up a cis sequence complex network that will be used to develop a characterization vector for the analysis of mitochondrial DNA sequence phylogenetic relationships among nine species. The resulting phylogenetic relationships among the nine species were determined to be in agreement with the actual situation. PMID:27746676

  1. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB.

    PubMed

    Pruesse, Elmar; Quast, Christian; Knittel, Katrin; Fuchs, Bernhard M; Ludwig, Wolfgang; Peplies, Jörg; Glöckner, Frank Oliver

    2007-01-01

    Sequencing ribosomal RNA (rRNA) genes is currently the method of choice for phylogenetic reconstruction, nucleic acid based detection and quantification of microbial diversity. The ARB software suite with its corresponding rRNA datasets has been accepted by researchers worldwide as a standard tool for large scale rRNA analysis. However, the rapid increase of publicly available rRNA sequence data has recently hampered the maintenance of comprehensive and curated rRNA knowledge databases. A new system, SILVA (from Latin silva, forest), was implemented to provide a central comprehensive web resource for up to date, quality controlled databases of aligned rRNA sequences from the Bacteria, Archaea and Eukarya domains. All sequences are checked for anomalies, carry a rich set of sequence associated contextual information, have multiple taxonomic classifications, and the latest validly described nomenclature. Furthermore, two precompiled sequence datasets compatible with ARB are offered for download on the SILVA website: (i) the reference (Ref) datasets, comprising only high quality, nearly full length sequences suitable for in-depth phylogenetic analysis and probe design and (ii) the comprehensive Parc datasets with all publicly available rRNA sequences longer than 300 nucleotides suitable for biodiversity analyses. The latest publicly available database release 91 (August 2007) hosts 547 521 sequences split into 461 823 small subunit and 85 689 large subunit rRNAs.

  2. PriFi: using a multiple alignment of related sequences to find primers for amplification of homologs.

    PubMed

    Fredslund, Jakob; Schauser, Leif; Madsen, Lene H; Sandal, Niels; Stougaard, Jens

    2005-07-01

    Using a comparative approach, the web program PriFi (http://cgi-www.daimi.au.dk/cgi-chili/PriFi/main) designs pairs of primers useful for PCR amplification of genomic DNA in species where prior sequence information is not available. The program works with an alignment of DNA sequences from phylogenetically related species and outputs a list of possibly degenerate primer pairs fulfilling a number of criteria, such that the primers have a maximal probability of amplifying orthologous sequences in other phylogenetically related species. Operating on a genome-wide scale, PriFi automates the first steps of a procedure for developing general markers serving as common anchor loci across species. To accommodate users with special preferences, configuration settings and criteria can be customized.
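
    The notion of a "possibly degenerate" primer derived from an alignment can be illustrated with a short Python sketch. It is not PriFi's procedure or scoring: the functions `degenerate_consensus` and `degeneracy` are hypothetical, the alignment columns are assumed to be gap-free and restricted to A, C, G and T, and no primer criteria (melting temperature, length, product size) are checked. The sketch only maps each alignment column to its IUPAC nucleotide code and counts how many concrete sequences the resulting primer stands for.

```python
# IUPAC nucleotide codes keyed by the set of bases observed in a column.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}
BASES_FOR_CODE = {code: bases for bases, code in IUPAC.items()}


def degenerate_consensus(aligned):
    """Collapse equal-length, gap-free aligned DNA sequences into one IUPAC string."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("expected a non-empty set of equal-length sequences")
    columns = zip(*(s.upper() for s in aligned))
    return "".join(IUPAC[frozenset(col)] for col in columns)


def degeneracy(primer):
    """Number of distinct concrete sequences represented by an IUPAC primer."""
    total = 1
    for code in primer:
        total *= len(BASES_FOR_CODE[code])
    return total


region = ["ACGTA", "ACATA", "ACGCA"]  # toy alignment of one candidate primer site
primer = degenerate_consensus(region)
print(primer, degeneracy(primer))
```

    A lower degeneracy means fewer distinct oligonucleotides in the primer mix, which is one reason conserved alignment regions are attractive primer sites.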

  3. New Challenges of the Computation of Multiple Sequence Alignments in the High-Throughput Era (2010 JGI/ANL HPC Workshop)

    ScienceCinema

    Notredame, Cedric [Centre for Genomic Regulation]

    2016-07-12

    Cedric Notredame from the Centre for Genomic Regulation gives a presentation on "New Challenges of the Computation of Multiple Sequence Alignments in the High-Throughput Era" at the JGI/Argonne HPC Workshop on January 26, 2010.

  4. New Challenges of the Computation of Multiple Sequence Alignments in the High-Throughput Era (2010 JGI/ANL HPC Workshop)

    SciTech Connect

    Notredame, Cedric

    2010-01-26

    Cedric Notredame from the Centre for Genomic Regulation gives a presentation on "New Challenges of the Computation of Multiple Sequence Alignments in the High-Throughput Era" at the JGI/Argonne HPC Workshop on January 26, 2010.

  5. Global alignment: Finding rearrangements during alignment

    SciTech Connect

    Brudno, Michael; Malde, Sanket; Poliakov, Alexander; Do, Chuong B.; Couronne, Olivier; Dubchak, Inna; Batzoglou, Serafim

    2003-01-06

    Motivation: To compare entire genomes from different species, biologists increasingly need alignment methods that are efficient enough to handle long sequences, and accurate enough to correctly align the conserved biological features between distant species. The two main classes of pairwise alignments are global alignment, where one string is transformed into the other, and local alignment, where all locations of similarity between the two strings are returned. Global alignments are less prone to demonstrating false homology as each letter of one sequence is constrained to being aligned to only one letter of the other. Local alignments, on the other hand, can cope with rearrangements between non-syntenic, orthologous sequences by identifying similar regions in sequences; this, however, comes at the expense of a higher false positive rate due to the inability of local aligners to take into account overall conservation maps.
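
    The contrast between global and local alignment can be sketched with the textbook dynamic-programming recurrences (Needleman-Wunsch for global, Smith-Waterman for local). The code below is a minimal sketch with an assumed linear gap penalty and assumed scoring values; it is not the method of the paper. The toy sequences swap two blocks to mimic a rearrangement.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of a and b with a linear gap penalty.

    local=False: global (Needleman-Wunsch) score, every letter is aligned or gapped.
    local=True:  local (Smith-Waterman) score, best-scoring pair of substrings.
    """
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                cell = max(cell, 0)      # a local alignment may restart anywhere
                best = max(best, cell)   # and may end anywhere
            curr.append(cell)
        prev = curr
    return best if local else prev[-1]


# Two blocks in swapped order: a toy stand-in for a rearrangement.
x = "GATTACA" + "CCCGGG"
y = "CCCGGG" + "GATTACA"
print("global:", align_score(x, y))
print("local: ", align_score(x, y, local=True))
```

    On the swapped-block input the local score recovers a conserved block, while the global score is penalized for forcing both sequences into a single collinear alignment.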

  6. Alignment of (dA).(dT) homopolymer tracts in gene flanking sequences suggests nucleosomal periodicity in D. discoideum DNA.

    PubMed

    Marx, K A; Hess, S T; Blake, R D

    1994-08-01

    It has been shown that the frequency versus size distribution of A and T overlapping and non-overlapping homopolymer tracts of N > 5 in D. discoideum gene flanking and intron regions is significantly greater than in coding regions (1). In the present report, we demonstrate that a spatial periodicity exists in long A and T tracts (N > 10) in long flanking sequences by scored alignments of those tracts (N > 10) with the nucleosomal repeat. A tract spacing was found at 185-190 bp that corresponds to a maximum alignment score. This is exactly the average spacing of D. discoideum nucleosomes determined experimentally. A majority of A and T tracts in flanking sequences are often spaced by short DNA stretches and the total length of adjacent A and T tracts plus the interrupting short DNA stretch corresponds closely to the average experimentally measured nucleosomal linker DNA size in D. discoideum (42 bp). These data suggest a model which has A and T runs of N > 10 bp in flanking DNA of D. discoideum organized in a regular phase with nonhomopolymer sequences along the DNA. This model has functional implications for A and T tracts, suggesting that they are found in nucleosomal linker DNA regions of chromatin during some necessary portion(s) of the life of the cell.
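
    To make the tract analysis concrete, the following sketch locates A or T homopolymer tracts with N > 10 and lists the distances between consecutive tract starts. It only illustrates the bookkeeping; the scored alignment against the nucleosomal repeat used in the report is not reproduced, and the function names and test sequence are assumptions.

```python
import re


def find_tracts(seq, min_len=11):
    """Return (start, length, base) for every run of A or T with at least min_len bases."""
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_len, min_len))
    return [(m.start(), m.end() - m.start(), m.group()[0])
            for m in pattern.finditer(seq.upper())]


def tract_spacings(tracts):
    """Distances (in bp) between the start positions of consecutive tracts."""
    starts = [start for start, _, _ in tracts]
    return [b - a for a, b in zip(starts, starts[1:])]


# Synthetic flanking sequence: a 12-bp A tract, a spacer, then an 11-bp T tract.
flank = "A" * 12 + "GC" * 88 + "T" * 11 + "GCGC"
tracts = find_tracts(flank)
print(tracts)
print(tract_spacings(tracts))
```

    A histogram of such spacings over many flanking sequences is one simple way to look for a preferred distance such as the 185-190 bp reported above.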

  7. Indel PDB: A database of structural insertions and deletions derived from sequence alignments of closely related proteins

    PubMed Central

    Hsing, Michael; Cherkasov, Artem

    2008-01-01

    Background: Insertions and deletions (indels) represent a common type of sequence variations, which are less studied and pose many important biological questions. Recent research has shown that the presence of sizable indels in protein sequences may be indicative of protein essentiality and their role in protein interaction networks. Examples of utilization of indels for structure-based drug design have also been recently demonstrated. Nonetheless many structural and functional characteristics of indels remain less researched or unknown. Description: We have created a web-based resource, Indel PDB, representing a structural database of insertions/deletions identified from the sequence alignments of highly similar proteins found in the Protein Data Bank (PDB). Indel PDB utilized large amounts of available structural information to characterize 1-, 2- and 3-dimensional features of indel sites. Indel PDB contains 117,266 non-redundant indel sites extracted from 11,294 indel-containing proteins. Unlike loop databases, Indel PDB features more indel sequences with secondary structures including alpha-helices and beta-sheets in addition to loops. The insertion fragments have been characterized by their sequences, lengths, locations, secondary structure composition, solvent accessibility, protein domain association and three dimensional structures. Conclusion: By utilizing the data available in Indel PDB, we have studied and presented here several sequence and structural features of indels. We anticipate that Indel PDB will not only enable future functional studies of indels, but will also assist protein modeling efforts and identification of indel-directed drug binding sites. PMID:18578882
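
    Identifying indel sites from a pairwise alignment amounts to scanning for runs of gap characters. The sketch below shows one way to do that; the tuple layout, the labels and the toy alignment are assumptions for illustration and do not describe how Indel PDB was actually built.

```python
def find_indels(aligned_a, aligned_b, gap="-"):
    """Return (kind, start_column, length) for every gap run in a pairwise alignment.

    kind is 'insertion_in_a' when b has the gap and 'insertion_in_b' when a has it.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    indels = []
    run_kind, run_start = None, 0
    for col, (x, y) in enumerate(zip(aligned_a, aligned_b)):
        if x == gap and y != gap:
            kind = "insertion_in_b"
        elif y == gap and x != gap:
            kind = "insertion_in_a"
        else:
            kind = None
        if kind != run_kind:
            if run_kind is not None:  # a gap run just ended: record it
                indels.append((run_kind, run_start, col - run_start))
            run_kind, run_start = kind, col
    if run_kind is not None:  # gap run reaching the end of the alignment
        indels.append((run_kind, run_start, len(aligned_a) - run_start))
    return indels


# Hypothetical alignment of two closely related protein fragments.
print(find_indels("MKT--AYIAK", "MKTGGAY-AK"))
```

    Each reported site could then be mapped onto structure coordinates to collect features such as secondary structure or solvent accessibility.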

  8. Application of the MAFFT sequence alignment program to large data—reexamination of the usefulness of chained guide trees

    PubMed Central

    Yamada, Kazunori D.; Tomii, Kentaro; Katoh, Kazutaka

    2016-01-01

    Motivation: Large multiple sequence alignments (MSAs), consisting of thousands of sequences, are becoming more and more common, due to advances in sequencing technologies. The MAFFT MSA program has several options for building large MSAs, but their performances have not been sufficiently assessed yet, because realistic benchmarking of large MSAs has been difficult. Recently, such assessments have been made possible through the HomFam and ContTest benchmark protein datasets. Along with the development of these datasets, an interesting theory was proposed: chained guide trees increase the accuracy of MSAs of structurally conserved regions. This theory challenges the basis of progressive alignment methods and needs to be examined by being compared with other known methods including computationally intensive ones. Results: We used HomFam, ContTest and OXFam (an extended version of OXBench) to evaluate several methods enabled in MAFFT: (1) a progressive method with approximate guide trees, (2) a progressive method with chained guide trees, (3) a combination of an iterative refinement method and a progressive method and (4) a less approximate progressive method that uses a rigorous guide tree and consistency score. Other programs, Clustal Omega and UPP, available for large MSAs, were also included into the comparison. The effect of method 2 (chained guide trees) was positive in ContTest but negative in HomFam and OXFam. Methods 3 and 4 increased the benchmark scores more consistently than method 2 for the three datasets, suggesting that they are safer to use. Availability and Implementation: http://mafft.cbrc.jp/alignment/software/ Contact: katoh@ifrec.osaka-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27378296

  9. Nearest Alignment Space Termination

    2006-07-13

    Nearest Alignment Space Termination (NAST) is the Greengenes algorithm that matches up submitted sequences with the Greengenes database to look for similarities and align the submitted sequences based on those similarities.

  10. Energy-based RNA consensus secondary structure prediction in multiple sequence alignments.

    PubMed

    Washietl, Stefan; Bernhart, Stephan H; Kellis, Manolis

    2014-01-01

    Many biologically important RNA structures are conserved in evolution leading to characteristic mutational patterns. RNAalifold is a widely used program to predict consensus secondary structures in multiple alignments by combining evolutionary information with traditional energy-based RNA folding algorithms. Here we describe the theory and applications of the RNAalifold algorithm. Consensus secondary structure prediction not only leads to significantly more accurate structure models, but it also allows the study of structural conservation of functional RNAs. PMID:24639158

  12. Sequence alignment status and amplicon size difference affecting EST-SSR primer performance and polymorphism

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Little attention has been given to failed, poorly-performing, and non-polymorphic expressed sequence tag (EST) simple sequence repeat (SSR) primers. This is due in part to a lack of interest and value in reporting them but also because of the difficulty in addressing the causes of failure on a prime...

  13. Prediction of Antimicrobial Peptides Based on Sequence Alignment and Support Vector Machine-Pairwise Algorithm Utilizing LZ-Complexity

    PubMed Central

    Shahrudin, Shahriza

    2015-01-01

    This study concerns an attempt to establish a new method for predicting antimicrobial peptides (AMPs) which are important to the immune system. Researchers have recently become interested in designing alternative drugs based on AMPs because they have found that a large number of bacterial strains have become resistant to available antibiotics. However, researchers have encountered obstacles in the AMP design process as experiments to extract AMPs from protein sequences are costly and require a long set-up time. Therefore, a computational tool for AMP prediction is needed to resolve this problem. In this study, an integrated algorithm is newly introduced to predict AMPs by integrating sequence alignment and support vector machine- (SVM-) LZ complexity pairwise algorithm. It was observed that, when all sequences in the training set are used, the sensitivity of the proposed algorithm is 95.28% in jackknife test and 87.59% in independent test, while the sensitivity obtained for jackknife test and independent test is 88.74% and 78.70%, respectively, when only the sequences that have less than 70% similarity are used. Applying the proposed algorithm may allow researchers to effectively predict AMPs from unknown protein peptide sequences with higher sensitivity. PMID:25802839
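
    LZ complexity counts how many new phrases are needed to build a sequence from left to right. The sketch below assumes the classic Lempel-Ziv (1976) parsing, in which each phrase is extended until it no longer occurs earlier in the text; the abstract does not say which variant the authors used, so treat this only as an illustration of the quantity.

```python
def lz_complexity(s):
    """Number of phrases in a left-to-right LZ76-style parsing of s."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it can be copied from the text seen so far
        # (the search window excludes the last character of the candidate).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


# A repetitive peptide needs fewer phrases than a varied one of the same length.
print(lz_complexity("AAAAAAAAAA"), lz_complexity("ACDEFGHIKL"))
```

    Because repetitive sequences parse into few phrases, counts like this can serve as a compression-style similarity feature when two peptides are parsed alone and in concatenation.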

  14. Program Synthesizes UML Sequence Diagrams

    NASA Technical Reports Server (NTRS)

    Barry, Matthew R.; Osborne, Richard N.

    2006-01-01

    A computer program called "Rational Sequence" generates Unified Modeling Language (UML) sequence diagrams of a target Java program running on a Java virtual machine (JVM). Rational Sequence thereby performs a reverse engineering function that aids in the design documentation of the target Java program. Whereas previously, the construction of sequence diagrams was a tedious manual process, Rational Sequence generates UML sequence diagrams automatically from the running Java code.

  15. [Research on the recombinant plasmid pDJH2 of L. interrogans serovar lai: sequencing and alignment with other known bacterial Omp sequence].

    PubMed

    Jiang, N; Dai, B; Yan, Z; Yang, W; Li, S; Fang, Z; Zhao, H; Wu, W; Ye, D; Yan, R; Liu, J; Song, S; Yang, Y; Zhang, Y; Liu, F; Tu, Y; Yang, H; Huang, Z; Liang, L; Hu, L; Zhao, M

    1996-12-01

    The Leptospira whole cell vaccine (LWCV) currently used in China is safe and effective, but the immunity following vaccination with two doses of the fluid medium vaccine is of low order. The duration of immunity conferred by this vaccine is rather short, six months or at most one year. Therefore, it is necessary to develop new generation vaccines against leptospirosis for the developing world. In this paper we report the sequencing of the insert fragment of pDJH2 from genomic DNA of L. interrogans serovar lai strain 017 and its alignment with other bacterial omp sequences. A genomic library of Leptospira interrogans serovar lai strain 017 was constructed with the plasmid vector pUC18. A recombinant plasmid designated pDJH2 was screened from the genomic library. The inserted fragment of pDJH2 is 1.9 kb by gel electrophoresis. Immunization/protection was studied in a BALB/c mouse model. The results showed a highly significant difference between pDJH2 and pUC18 (control). DNA sequencing of the inserted fragment of pDJH2 was performed by Dr Yan Zhengxin (Max-Planck-Institut for Biology, Tübingen, Germany). The insert fragment was cloned into pBluescript II KS- (Stratagene) and sequenced using an ABI sequencer (Applied Biosystems, Model 373A). Two open reading frames of 565 and 662 nucleotides were identified. There were identifiable initiation codons, terminators, Shine-Dalgarno ribosome binding sites, Pribnow boxes and Sextama boxes within the 2 sequenced regions. Nucleotide sequences were analysed using Gene Work, a suite of computer programs developed by the Department of Biochemistry, St. Jude Children's Research Hospital, Memphis, U.S.A. The results of formatted alignment showed the predicted nucleotide sequence of ORF1 of the serovar lai had significant similarity with ORF2 (49.36%), L. kirschneri ompL1 (49.26%), Borrelia burgdorferi omp (48.97%), Treponema phagedenis omp (47.3%), Salmonella typhimurium ompC (46.87%), Yersinia enterocolitica ompH (46.7%), Leptospira borgpetersenii pfap (46.3%), and

  16. SeqAPASS (Sequence Alignment to Predict Across Species Susceptibility) software and documentation

    EPA Science Inventory

    SeqAPASS is a software application that facilitates rapid and streamlined, yet transparent, comparisons of the similarity of toxicologically-significant molecular targets across species. The present application facilitates analysis of primary amino acid sequence similarity (including ...

  17. SGP-1: prediction and validation of homologous genes based on sequence alignments.

    PubMed

    Wiehe, T; Gebauer-Jung, S; Mitchell-Olds, T; Guigó, R

    2001-09-01

    Conventional methods of gene prediction rely on the recognition of DNA-sequence signals, the coding potential or the comparison of a genomic sequence with a cDNA, EST, or protein database. Reasons for limited accuracy in many circumstances are species-specific training and the incompleteness of reference databases. Lately, comparative genome analysis has attracted increasing attention. Several analysis tools that are based on human/mouse comparisons are already available. Here, we present a program for the prediction of protein-coding genes, termed SGP-1 (Syntenic Gene Prediction), which is based on the similarity of homologous genomic sequences. In contrast to most existing tools, the accuracy of SGP-1 depends little on species-specific properties such as codon usage or the nucleotide distribution. SGP-1 may therefore be applied to nonstandard model organisms in vertebrates as well as in plants, without the need for extensive parameter training. In addition to predicting genes in large-scale genomic sequences, the program may be useful to validate gene structure annotations from databases. To this end, SGP-1 output also contains comparisons between predicted and annotated gene structures in HTML format. The program can be accessed via a Web server at http://soft.ice.mpg.de/sgp-1. The source code, written in ANSI C, is available on request from the authors.

  18. Analysis and Visualization of ChIP-Seq and RNA-Seq Sequence Alignments Using ngs.plot.

    PubMed

    Loh, Yong-Hwee Eddie; Shen, Li

    2016-01-01

    The continual maturation and increasing applications of next-generation sequencing technology in scientific research have yielded ever-increasing amounts of data that need to be effectively and efficiently analyzed and innovatively mined for new biological insights. We have developed ngs.plot, a quick and easy-to-use bioinformatics tool that performs visualizations of the spatial relationships between sequencing alignment enrichment and specific genomic features or regions. More importantly, ngs.plot is customizable beyond the use of standard genomic feature databases to allow the analysis and visualization of user-specified regions of interest generated by the user's own hypotheses. In this protocol, we demonstrate and explain the use of ngs.plot using command line executions, as well as a web-based workflow on the Galaxy framework. We replicate the underlying commands used in the analysis of a true biological dataset that we had reported and published earlier and demonstrate how ngs.plot can easily generate publication-ready figures. With ngs.plot, users would be able to efficiently and innovatively mine their own datasets without having to be involved in the technical aspects of sequence coverage calculations and genomic databases. PMID:27115642

  20. PASS2 database for the structure-based sequence alignment of distantly related SCOP domain superfamilies: update to version 5 and added features

    PubMed Central

    Gandhimathi, Arumugam; Ghosh, Pritha; Hariharaputran, Sridhar; Mathew, Oommen K.; Sowdhamini, R.

    2016-01-01

    Structure-based sequence alignment is an essential step in assessing and analysing the relationship of distantly related proteins. PASS2 is a database that records such alignments for protein domain superfamilies and has been updated periodically. This update of the PASS2 version, named PASS2.5, directly corresponds to the SCOPe 2.04 release. All SCOPe structural domains that share less than 40% sequence identity, as defined by the ASTRAL compendium of protein structures, are included. The current version includes 1977 superfamilies and has been assembled utilizing the structure-based sequence alignment protocol. Such an alignment is obtained initially through MATT, followed by a refinement through the COMPARER program. The JOY program has been used for structural annotations of such alignments. In this update, we have automated the protocol and focused on inclusion of new features such as mapping of GO terms, absolutely conserved residues among the domains in a superfamily and inclusion of PDBs, that are absent in SCOPe 2.04, using the HMM profiles from the alignments of the superfamily members and are provided as a separate list. We have also implemented a more user-friendly manner of data presentation and options for downloading more features. PASS2.5 version is available at http://caps.ncbs.res.in/pass2/. PMID:26553811

  1. PASS2 database for the structure-based sequence alignment of distantly related SCOP domain superfamilies: update to version 5 and added features.

    PubMed

    Gandhimathi, Arumugam; Ghosh, Pritha; Hariharaputran, Sridhar; Mathew, Oommen K; Sowdhamini, R

    2016-01-01

    Structure-based sequence alignment is an essential step in assessing and analysing the relationship of distantly related proteins. PASS2 is a database that records such alignments for protein domain superfamilies and has been updated periodically. This update of the PASS2 version, named PASS2.5, directly corresponds to the SCOPe 2.04 release. All SCOPe structural domains that share less than 40% sequence identity, as defined by the ASTRAL compendium of protein structures, are included. The current version includes 1977 superfamilies and has been assembled utilizing the structure-based sequence alignment protocol. Such an alignment is obtained initially through MATT, followed by a refinement through the COMPARER program. The JOY program has been used for structural annotations of such alignments. In this update, we have automated the protocol and focused on inclusion of new features such as mapping of GO terms, absolutely conserved residues among the domains in a superfamily and inclusion of PDBs, that are absent in SCOPe 2.04, using the HMM profiles from the alignments of the superfamily members and are provided as a separate list. We have also implemented a more user-friendly manner of data presentation and options for downloading more features. PASS2.5 version is available at http://caps.ncbs.res.in/pass2/. PMID:26553811

  2. Fast and sensitive alignment of microbial whole genome sequencing reads to large sequence datasets on a desktop PC: application to metagenomic datasets and pathogen identification.

    PubMed

    Pongor, Lőrinc S; Vera, Roberto; Ligeti, Balázs

    2014-01-01

    Next generation sequencing (NGS) of metagenomic samples is becoming a standard approach to detect individual species or pathogenic strains of microorganisms. Computer programs used in the NGS community have to balance between speed and sensitivity and as a result, species or strain level identification is often inaccurate and low abundance pathogens can sometimes be missed. We have developed Taxoner, an open source, taxon assignment pipeline that includes a fast aligner (e.g. Bowtie2) and a comprehensive DNA sequence database. We tested the program on simulated datasets as well as experimental data from Illumina, IonTorrent, and Roche 454 sequencing platforms. We found that Taxoner performs as well as, and often better than BLAST, but requires two orders of magnitude less running time, meaning that it can be run on desktop or laptop computers. Taxoner is slower than the approaches that use small marker databases but is more sensitive due to the comprehensive reference database. In addition, it can be easily tuned to specific applications using small tailored databases. When applied to metagenomic datasets, Taxoner can provide a functional summary of the genes mapped and can provide strain level identification. Taxoner is written in C for Linux operating systems. The code and documentation are available for research applications at http://code.google.com/p/taxoner.

  3. NGC: lossless and lossy compression of aligned high-throughput sequencing data.

    PubMed

    Popitsch, Niko; von Haeseler, Arndt

    2013-01-01

    A major challenge of current high-throughput sequencing experiments is not only the generation of the sequencing data itself but also their processing, storage and transmission. The enormous size of these data motivates the development of data compression algorithms usable for the implementation of the various storage policies that are applied to the produced intermediate and final result files. In this article, we present NGC, a tool for the compression of mapped short read data stored in the wide-spread SAM format. NGC enables lossless and lossy compression and introduces the following two novel ideas: first, we present a way to reduce the number of required code words by exploiting common features of reads mapped to the same genomic positions; second, we present a highly configurable way for the quantization of per-base quality values, which takes their influence on downstream analyses into account. NGC, evaluated with several real-world data sets, saves 33-66% of disc space using lossless and up to 98% disc space using lossy compression. By applying two popular variant and genotype prediction tools to the decompressed data, we could show that the lossy compression modes preserve >99% of all called variants while outperforming comparable methods in some configurations.

  4. An Alignment-Free "Metapeptide" Strategy for Metaproteomic Characterization of Microbiome Samples Using Shotgun Metagenomic Sequencing.

    PubMed

    May, Damon H; Timmins-Schiffman, Emma; Mikan, Molly P; Harvey, H Rodger; Borenstein, Elhanan; Nunn, Brook L; Noble, William S

    2016-08-01

    In principle, tandem mass spectrometry can be used to detect and quantify the peptides present in a microbiome sample, enabling functional and taxonomic insight into microbiome metabolic activity. However, the phylogenetic diversity constituting a particular microbiome is often unknown, and many of the organisms present may not have assembled genomes. In ocean microbiome samples, with particularly diverse and uncultured bacterial communities, it is difficult to construct protein databases that contain the bulk of the peptides in the sample without losing detection sensitivity due to the overwhelming number of candidate peptides for each tandem mass spectrum. We describe a method for deriving "metapeptides" (short amino acid sequences that may be represented in multiple organisms) from shotgun metagenomic sequencing of microbiome samples. In two ocean microbiome samples, we constructed site-specific metapeptide databases to detect more than one and a half times as many peptides as by searching against predicted genes from an assembled metagenome and roughly three times as many peptides as by searching against the NCBI environmental proteome database. The increased peptide yield has the potential to enrich the taxonomic and functional characterization of sample metaproteomes. PMID:27396978

  5. The map-based genome sequence of Spirodela polyrhiza aligned with its chromosomes, a reference for karyotype evolution.

    PubMed

    Cao, Hieu Xuan; Vu, Giang Thi Ha; Wang, Wenqin; Appenroth, Klaus J; Messing, Joachim; Schubert, Ingo

    2016-01-01

    Duckweeds are aquatic monocotyledonous plants of potential economic interest with fast vegetative propagation, comprising 37 species with variable genome sizes (0.158-1.88 Gbp). The genomic sequence of Spirodela polyrhiza, the smallest and the most ancient duckweed genome, needs to be aligned to its chromosomes as a reference and prerequisite to study the genome and karyotype evolution of other duckweed species. We selected physically mapped bacterial artificial chromosomes (BACs) containing Spirodela DNA inserts with little or no repetitive elements as probes for multicolor fluorescence in situ hybridization (mcFISH), using an optimized BAC pooling strategy, to validate its physical map and correlate it with its chromosome complement. By consecutive mcFISH analyses, we assigned the originally assembled 32 pseudomolecules (supercontigs) of the genomic sequences to the 20 chromosomes of S. polyrhiza. A Spirodela cytogenetic map containing 96 BAC markers with an average distance of 0.89 Mbp was constructed. Using a cocktail of 41 BACs in three colors, all chromosome pairs could be individualized simultaneously. Seven ancestral blocks emerged from duplicated chromosome segments of 19 Spirodela chromosomes. The chromosomally integrated genome of S. polyrhiza and the established prerequisites for comparative chromosome painting enable future studies on the chromosome homoeology and karyotype evolution of duckweed species.

  6. CLaMS: Classifier for Metagenomic Sequences

    SciTech Connect

    Pati, Amrita

    2010-12-01

    CLaMS ("Classifier for Metagenomic Sequences") is a Java application for binning assembled metagenomes using user-specified training sequence sets and other user-specified initial parameters. Since CLaMS analyzes and matches sequence composition-based genomic signatures, it is much faster than binning tools that rely on alignments to homologs; CLaMS can bin ~20,000 sequences in 3 minutes on a laptop with a 2.4 GHz Intel Core 2 Duo processor and 2 GB RAM. CLaMS is meant to be a desktop application for biologists and can be run on any machine under any operating system on which the Java Runtime Environment is enabled. CLaMS is freely available in both GUI-based and command-line based forms.
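
    Composition-based binning can be sketched as comparing k-mer frequency vectors rather than aligning sequences. The example below is written in Python for brevity (CLaMS itself is a Java application) and assumes a dinucleotide signature with Euclidean distance; the abstract does not specify the actual signature or matching rule, and the training sequences are invented for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_signature(seq, k=2):
    """Normalized frequencies of all 4**k DNA k-mers, in lexicographic order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


def nearest_bin(seq, training, k=2):
    """Name of the training set whose signature is closest to that of seq."""
    sig = kmer_signature(seq, k)

    def distance(name):
        ref = kmer_signature(training[name], k)
        return sqrt(sum((x - y) ** 2 for x, y in zip(sig, ref)))

    return min(training, key=distance)


# Hypothetical training sets standing in for two source genomes.
training = {
    "at_rich": "ATATATTTAAATATAT",
    "gc_rich": "GCGCGGCCGCGCCGGC",
}
print(nearest_bin("TTATAAATAT", training))
print(nearest_bin("GGCCGCGC", training))
```

    Since each contig is reduced to a short fixed-length vector, assignment costs grow with the number of contigs and bins rather than with alignment length, which is consistent with the speed advantage described above.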

  7. CLaMS: Classifier for Metagenomic Sequences

    2010-12-01

    CLaMS ("Classifier for Metagenomic Sequences") is a Java application for binning assembled metagenomes using user-specified training sequence sets and other user-specified initial parameters. Since CLaMS analyzes and matches sequence composition-based genomic signatures, it is much faster than binning tools that rely on alignments to homologs; CLaMS can bin ~20,000 sequences in 3 minutes on a laptop with a 2.4 GHz Intel Core 2 Duo processor and 2 GB RAM. CLaMS is meant to be a desktop application for biologists and can be run on any machine under any operating system on which the Java Runtime Environment is enabled. CLaMS is freely available in both GUI-based and command-line based forms.

  8. aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity

    PubMed Central

    Kuraku, Shigehiro; Zmasek, Christian M.; Nishimura, Osamu; Katoh, Kazutaka

    2013-01-01

    We report a new web server, aLeaves (http://aleaves.cdb.riken.jp/), for homologue collection from diverse animal genomes. In molecular comparative studies involving multiple species, orthology identification is the basis on which most subsequent biological analyses rely. It can be achieved most accurately by explicit phylogenetic inference. More and more species are subjected to large-scale sequencing, but the resultant resources are scattered in independent project-based, and multi-species, but separate, web sites. This complicates data access and is becoming a serious barrier to the comprehensiveness of molecular phylogenetic analysis. aLeaves, launched to overcome this difficulty, collects sequences similar to an input query sequence from various data sources. The collected sequences can be passed on to the MAFFT sequence alignment server (http://mafft.cbrc.jp/alignment/server/), which has been significantly improved in interactivity. This update enables switching between (i) sequence selection using the Archaeopteryx tree viewer, (ii) multiple sequence alignment and (iii) tree inference. This can be performed as a loop until one reaches a sensible data set, which minimizes redundancy for better visibility and handling in phylogenetic inference while covering relevant taxa. The work flow achieved by the seamless link between aLeaves and MAFFT provides a convenient online platform to address various questions in zoology and evolutionary biology. PMID:23677614

  9. aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity.

    PubMed

    Kuraku, Shigehiro; Zmasek, Christian M; Nishimura, Osamu; Katoh, Kazutaka

    2013-07-01

    We report a new web server, aLeaves (http://aleaves.cdb.riken.jp/), for homologue collection from diverse animal genomes. In molecular comparative studies involving multiple species, orthology identification is the basis on which most subsequent biological analyses rely. It can be achieved most accurately by explicit phylogenetic inference. More and more species are subjected to large-scale sequencing, but the resultant resources are scattered in independent project-based, and multi-species, but separate, web sites. This complicates data access and is becoming a serious barrier to the comprehensiveness of molecular phylogenetic analysis. aLeaves, launched to overcome this difficulty, collects sequences similar to an input query sequence from various data sources. The collected sequences can be passed on to the MAFFT sequence alignment server (http://mafft.cbrc.jp/alignment/server/), which has been significantly improved in interactivity. This update makes it possible to switch between (i) sequence selection using the Archaeopteryx tree viewer, (ii) multiple sequence alignment and (iii) tree inference. This can be performed as a loop until one reaches a sensible data set, which minimizes redundancy for better visibility and handling in phylogenetic inference while covering relevant taxa. The workflow achieved by the seamless link between aLeaves and MAFFT provides a convenient online platform to address various questions in zoology and evolutionary biology. PMID:23677614

  10. aLeaves facilitates on-demand exploration of metazoan gene family trees on MAFFT sequence alignment server with enhanced interactivity.

    PubMed

    Kuraku, Shigehiro; Zmasek, Christian M; Nishimura, Osamu; Katoh, Kazutaka

    2013-07-01

    We report a new web server, aLeaves (http://aleaves.cdb.riken.jp/), for homologue collection from diverse animal genomes. In molecular comparative studies involving multiple species, orthology identification is the basis on which most subsequent biological analyses rely. It can be achieved most accurately by explicit phylogenetic inference. More and more species are subjected to large-scale sequencing, but the resultant resources are scattered in independent project-based, and multi-species, but separate, web sites. This complicates data access and is becoming a serious barrier to the comprehensiveness of molecular phylogenetic analysis. aLeaves, launched to overcome this difficulty, collects sequences similar to an input query sequence from various data sources. The collected sequences can be passed on to the MAFFT sequence alignment server (http://mafft.cbrc.jp/alignment/server/), which has been significantly improved in interactivity. This update makes it possible to switch between (i) sequence selection using the Archaeopteryx tree viewer, (ii) multiple sequence alignment and (iii) tree inference. This can be performed as a loop until one reaches a sensible data set, which minimizes redundancy for better visibility and handling in phylogenetic inference while covering relevant taxa. The workflow achieved by the seamless link between aLeaves and MAFFT provides a convenient online platform to address various questions in zoology and evolutionary biology.

  11. The Oryza map alignment project: Construction, alignment and analysis of 12 BAC fingerprint/end sequence framework physical maps that represent the 10 genome types of genus Oryza

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The Oryza Map Alignment Project (OMAP) provides the first comprehensive experimental system for understanding the evolution, physiology and biochemistry of a full genus in plants or animals. We have constructed twelve deep-coverage BAC libraries that are representative of both diploid and tetraploid...

  12. GUTSS: An Alignment-Free Sequence Comparison Method for Use in Human Intestinal Microbiome and Fecal Microbiota Transplantation Analysis

    PubMed Central

    Heltshe, Sonya L.; Hayden, Hillary S.; Radey, Matthew C.; Weiss, Eli J.; Damman, Christopher J.; Zisman, Timothy L.; Suskind, David L.; Miller, Samuel I.

    2016-01-01

    Background Comparative analysis of gut microbiomes in clinical studies of human diseases typically relies on identification and quantification of species or genes. In addition to exploring specific functional characteristics of the microbiome and potential significance of species diversity or expansion, microbiome similarity is also calculated to study change in response to therapies directed at altering the microbiome. Established ecological measures of similarity can be constructed from species abundances; however, methods for calculating these commonly used ecological measures of similarity directly from whole genome shotgun (WGS) metagenomic sequence are lacking. Results We present an alignment-free method for calculating similarity of WGS metagenomic sequences that is analogous to the Bray–Curtis index for species, implemented by the General Utility for Testing Sequence Similarity (GUTSS) software application. This method was applied to intestinal microbiomes of healthy young children to measure developmental changes toward an adult microbiome during the first 3 years of life. We also calculate similarity of donor and recipient microbiomes to measure establishment, or engraftment, of donor microbiota in fecal microbiota transplantation (FMT) studies focused on mild to moderate Crohn's disease. We show how a relative index of similarity to donor can be calculated as a measure of change in a patient's microbiome toward that of the donor in response to FMT. Conclusion Because clinical efficacy of the transplant procedure cannot be fully evaluated without analysis methods to quantify actual FMT engraftment, we developed a method for detecting change in the gut microbiome that is independent of species identification and database bias, sensitive to changes in relative abundance of the microbial constituents, and can be formulated as an index for correlating engraftment success with clinical measures of disease. More generally, this method may be applied to clinical
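
    To make the Bray–Curtis analogy concrete, here is a minimal sketch that applies the classic Bray–Curtis similarity, 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b)), to k-mer counts instead of species abundances. Using k-mer counts as the features, the value of k, and the function names are assumptions for illustration; the abstract does not describe how GUTSS itself computes its index.

```python
from collections import Counter


def kmer_counts(reads, k=3):
    """Count every k-mer across a collection of reads (no alignment involved)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity of two count profiles: 1.0 = identical, 0.0 = nothing shared."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0  # undefined for two empty samples; 0.0 is a convention chosen here
    shared = sum(min(a[key], b[key]) for key in a.keys() & b.keys())
    return 2.0 * shared / total


# Toy "samples" (made-up reads): they share the k-mers ACG and CGT.
donor = kmer_counts(["ACGTAC"])
recipient = kmer_counts(["ACGTTT"])
print(bray_curtis_similarity(donor, recipient))
```

    Because the score depends on counts rather than presence or absence, it changes when the relative abundance of sequence content changes, which matches the sensitivity to relative abundance that the abstract emphasizes.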

  13. ISHAN: sequence homology analysis package.

    PubMed

    Shil, Pratip; Dudani, Niraj; Vidyasagar, Pandit B

    2006-01-01

    Sequence based homology studies play an important role in evolutionary tracing and classification of proteins. Various methods are available to analyze biological sequence information. However, with the advent of the proteomics era, there is a growing demand for analysis of huge amounts of biological sequence information, and it has become necessary to have programs that would provide speedy analysis. ISHAN has been developed as a homology analysis package, built on various sequence analysis tools, viz. FASTA, ALIGN, CLUSTALW, PHYLIP and CODONW (for DNA sequences). This JAVA application offers the user a choice of analysis tools. For testing, ISHAN was applied to perform phylogenetic analysis for sets of Caspase 3 DNA sequences and NF-kappaB p105 amino acid sequences. By integrating several tools it has made analysis much faster and reduced manual intervention. PMID:17274766

  14. Model Checking JAVA Programs Using Java Pathfinder

    NASA Technical Reports Server (NTRS)

    Havelund, Klaus; Pressburger, Thomas

    2000-01-01

    This paper describes a translator called JAVA PATHFINDER from JAVA to PROMELA, the "programming language" of the SPIN model checker. The purpose is to establish a framework for verification and debugging of JAVA programs based on model checking. This work should be seen in a broader attempt to make formal methods applicable "in the loop" of programming within NASA's areas such as space, aviation, and robotics. Our main goal is to create automated formal methods such that programmers themselves can apply these in their daily work (in the loop) without the need for specialists to manually reformulate a program into a different notation in order to analyze the program. This work is a continuation of an effort to formally verify, using SPIN, a multi-threaded operating system programmed in Lisp for the Deep-Space 1 spacecraft, and of previous work in applying existing model checkers and theorem provers to real applications.

  15. Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS): A web-based tool for addressing the challenges of cross-species extrapolation of chemical toxicity

    EPA Science Inventory

    Conservation of a molecular target across species can be used as a line-of-evidence to predict the likelihood of chemical susceptibility. The web-based Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to simplify, streamline, and quantitat...

  16. Data for amino acid alignment of Japanese stingray melanocortin receptors with other gnathostome melanocortin receptor sequences, and the ligand selectivity of Japanese stingray melanocortin receptors.

    PubMed

    Takahashi, Akiyoshi; Davis, Perry; Reinick, Christina; Mizusawa, Kanta; Sakamoto, Tatsuya; Dores, Robert M

    2016-06-01

    This article contains structure and pharmacological characteristics of melanocortin receptors (MCRs) related to research published in "Characterization of melanocortin receptors from stingray Dasyatis akajei, a cartilaginous fish" (Takahashi et al., 2016) [1]. The amino acid sequences of the stingray, D. akajei, MC1R, MC2R, MC3R, MC4R, and MC5R were aligned with the corresponding melanocortin receptor sequences from the elephant shark, Callorhinchus milii, the dogfish, Squalus acanthias, the goldfish, Carassius auratus, and the mouse, Mus musculus. These alignments provide the basis for phylogenetic analysis of these gnathostome melanocortin receptor sequences. In addition, the Japanese stingray melanocortin receptors were separately expressed in Chinese Hamster Ovary cells, and stimulated with stingray ACTH, α-MSH, β-MSH, γ-MSH, δ-MSH, and β-endorphin. The dose response curves reveal the order of ligand selectivity for each stingray MCR. PMID:27408924

  17. Data in support of the discovery of alternative splicing variants of quail LEPR and the evolutionary conservation of qLEPRl by nucleotide and amino acid sequences alignment

    PubMed Central

    Wang, Dandan; Xu, Chunlin; Wang, Taian; Li, Hong; Li, Yanmin; Ren, Junxiao; Tian, Yadong; Li, Zhuanjian; Jiao, Yuping; Kang, Xiangtao; Liu, Xiaojun

    2015-01-01

    Leptin receptor (LEPR) belongs to the class I cytokine receptor superfamily which share common structural features and signal transduction pathways. Although multiple LEPR isoforms, which are derived from one gene, were identified in mammals, they were rarely found in avian except the long LEPR. Four alternative splicing variants of quail LEPR (qLEPR) had been cloned and sequenced for the first time (Wang et al., 2015 [1]). To define patterns of the four splicing variants (qLEPRl, qLEPR-a, qLEPR-b and qLEPR-c) and locate the conserved regions of qLEPRl, this data article provides nucleotide sequence alignment of qLEPR and amino acid sequence alignment of representative vertebrate LEPR. The detailed analysis was shown in [1]. PMID:26759819

  18. A cholinesterase genes server (ESTHER): a database of cholinesterase-related sequences for multiple alignments, phylogenetic relationships, mutations and structural data retrieval.

    PubMed Central

    Cousin, X; Hotelier, T; Liévin, P; Toutant, J P; Chatonnet, A

    1996-01-01

    We have built a database of sequences phylogenetically related to cholinesterases (ESTHER, for esterases, alpha/beta hydrolase enzymes and relatives). These sequences define a homogeneous group of enzymes (carboxylesterases, lipases and hormone-sensitive lipases) with some related proteins devoid of enzymatic activity. The purpose of ESTHER is to help comparison and alignment of any new sequence appearing in the field, to favour mutation analysis of structure-function relationships and to allow structural data recovery. ESTHER is a World Wide Web server with the URL http://www.montpellier.inra.fr:70/cholinesterase. PMID:8594562

  19. A cholinesterase genes server (ESTHER): a database of cholinesterase-related sequences for multiple alignments, phylogenetic relationships, mutations and structural data retrieval.

    PubMed

    Cousin, X; Hotelier, T; Liévin, P; Toutant, J P; Chatonnet, A

    1996-01-01

    We have built a database of sequences phylogenetically related to cholinesterases (ESTHER, for esterases, alpha/beta hydrolase enzymes and relatives). These sequences define a homogeneous group of enzymes (carboxylesterases, lipases and hormone-sensitive lipases) with some related proteins devoid of enzymatic activity. The purpose of ESTHER is to help comparison and alignment of any new sequence appearing in the field, to favour mutation analysis of structure-function relationships and to allow structural data recovery. ESTHER is a World Wide Web server with the URL http://www.montpellier.inra.fr:70/cholinesterase.

  20. Java online monitoring framework

    SciTech Connect

    Ronan, M.; Kirkby, D.; Johnson, A.S.; Groot, D. de

    1997-10-01

    An online monitoring framework has been written in the Java Language Environment to develop applications for monitoring special purpose detectors during commissioning of the PEP-II Interaction Region. PEP-II machine parameters and signals from several of the commissioning detectors are logged through VxWorks/EPICS and displayed by Java display applications. Remote clients are able to monitor the machine and detector performance using graphical displays and analysis histogram packages. In this paper, the design and implementation of the object-oriented Java framework are described. Illustrations of data acquisition, display and histogramming applications are also given.

  1. Introducing W.A.T.E.R.S.: a Workflow for the Alignment, Taxonomy, and Ecology of Ribosomal Sequences

    PubMed Central

    2010-01-01

    Background For more than two decades microbiologists have used a highly conserved microbial gene as a phylogenetic marker for bacteria and archaea. The small-subunit ribosomal RNA gene, also known as 16S rRNA, is encoded by ribosomal DNA, 16S rDNA, and has provided a powerful comparative tool to microbial ecologists. Over time, the microbial ecology field has matured from small-scale studies in a select number of environments to massive collections of sequence data that are paired with dozens of corresponding collection variables. As the complexity of data and tool sets has grown, the need for flexible automation and maintenance of the core processes of 16S rDNA sequence analysis has increased correspondingly. Results We present WATERS, an integrated approach for 16S rDNA analysis that bundles a suite of publicly available 16S rDNA analysis software tools into a single software package. The "toolkit" includes sequence alignment, chimera removal, OTU determination, taxonomy assignment, phylogenetic tree construction as well as a host of ecological analysis and visualization tools. WATERS employs a flexible, collection-oriented 'workflow' approach using the open-source Kepler system as a platform. Conclusions By packaging available software tools into a single automated workflow, WATERS simplifies 16S rDNA analyses, especially for those without specialized bioinformatics or programming expertise. In addition, WATERS, like some of the newer comprehensive rRNA analysis tools, allows researchers to minimize the time dedicated to carrying out tedious informatics steps and to focus their attention instead on the biological interpretation of the results. One advantage of WATERS over other comprehensive tools is that the use of the Kepler workflow system facilitates result interpretation and reproducibility via a data provenance sub-system. Furthermore, new "actors" can be added to the workflow as desired and we see WATERS as an initial seed for a sizeable and growing

  2. Java Programming Language

    NASA Technical Reports Server (NTRS)

    Shaykhian, Gholam Ali

    2007-01-01

    The Java seminar covers the fundamentals of the Java programming language. No prior programming experience is required for participation in the seminar. The first part of the seminar covers introductory concepts in Java programming including data types (integer, character, ..), operators, functions and constants, casts, input, output, control flow, scope, conditional statements, and arrays. Furthermore, an introduction to Object-Oriented programming in Java, relationships between classes, using packages, constructors, private data and methods, final instance fields, static fields and methods, and overloading are explained. The second part of the seminar covers extending classes, inheritance hierarchies, polymorphism, dynamic binding, abstract classes, and protected access. The seminar concludes by introducing interfaces, properties of interfaces, interfaces and abstract classes, interfaces and callbacks, basics of event handling, user interface components with Swing, applet basics, converting applications to applets, the applet HTML tags and attributes, exceptions and debugging.

  3. Java for flight software

    NASA Technical Reports Server (NTRS)

    Benowitz, E. G.; Niessner, A. F.

    2003-01-01

    We have successfully demonstrated a portion of the spacecraft attitude control and fault protection, running on a standard Java platform, and are currently in the process of taking advantage of the features provided by the RTSJ.

  4. PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases

    PubMed Central

    Floden, Evan W.; Tommaso, Paolo D.; Chatzou, Maria; Magis, Cedrik; Notredame, Cedric; Chang, Jia-Ming

    2016-01-01

    The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency based alignment approach. Homology extension is performed with Position Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is to allow databases of reduced complexity to rapidly perform homology extension. This server also gives the possibility to use transmembrane proteins (TMPs) reference databases to allow even faster homology extension on this important category of proteins. Aside from an MSA, the server also outputs topological prediction of TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown this approach outperforms the most accurate alignment methods such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee. PMID:27106060

  5. PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases.

    PubMed

    Floden, Evan W; Tommaso, Paolo D; Chatzou, Maria; Magis, Cedrik; Notredame, Cedric; Chang, Jia-Ming

    2016-07-01

    The PSI/TM-Coffee web server performs multiple sequence alignment (MSA) of proteins by combining homology extension with a consistency based alignment approach. Homology extension is performed with Position Specific Iterative (PSI) BLAST searches against a choice of redundant and non-redundant databases. The main novelty of this server is to allow databases of reduced complexity to rapidly perform homology extension. This server also gives the possibility to use transmembrane proteins (TMPs) reference databases to allow even faster homology extension on this important category of proteins. Aside from an MSA, the server also outputs topological prediction of TMPs using the HMMTOP algorithm. Previous benchmarking of the method has shown this approach outperforms the most accurate alignment methods such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The web server is available at http://tcoffee.crg.cat/tmcoffee.

  6. The ENSDF Java Package

    SciTech Connect

    Sonzogni, A.A.

    2005-05-24

    A package of computer codes has been developed to process and display nuclear structure and decay data stored in the ENSDF (Evaluated Nuclear Structure Data File) library. The codes were written in an object-oriented fashion using the java language. This allows for an easy implementation across multiple platforms as well as deployment on web pages. The structure of the different java classes that make up the package is discussed as well as several different implementations.

  7. MAVID multiple alignment server.

    PubMed

    Bray, Nicolas; Pachter, Lior

    2003-07-01

    MAVID is a multiple alignment program suitable for many large genomic regions. The MAVID web server allows biomedical researchers to quickly obtain multiple alignments for genomic sequences and to subsequently analyse the alignments for conserved regions. MAVID has been successfully used for the alignment of closely related species such as primates and also for the alignment of more distant organisms such as human and fugu. The server is fast, capable of aligning hundreds of kilobases in less than a minute. The multiple alignment is used to build a phylogenetic tree for the sequences, which is subsequently used as a basis for identifying conserved regions in the alignment. The server can be accessed at http://baboon.math.berkeley.edu/mavid/.

  8. HPMV: human protein mutation viewer - relating sequence mutations to protein sequence architecture and function changes.

    PubMed

    Sherman, Westley Arthur; Kuchibhatla, Durga Bhavani; Limviphuvadh, Vachiranee; Maurer-Stroh, Sebastian; Eisenhaber, Birgit; Eisenhaber, Frank

    2015-10-01

    Next-generation sequencing advances are rapidly expanding the number of human mutations to be analyzed for causative roles in genetic disorders. Our Human Protein Mutation Viewer (HPMV) is intended to explore the biomolecular mechanistic significance of non-synonymous human mutations in protein-coding genomic regions. The tool helps to assess whether protein mutations affect the occurrence of sequence-architectural features (globular domains, targeting signals, post-translational modification sites, etc.). As input, HPMV accepts protein mutations - as UniProt accessions with mutations (e.g. HGVS nomenclature), genome coordinates, or FASTA sequences. As output, HPMV provides an interactive cartoon showing the mutations in relation to elements of the sequence architecture. A large variety of protein sequence architectural features were selected for their particular relevance to mutation interpretation. Clicking a sequence feature in the cartoon expands a tree view of additional information including multiple sequence alignments of conserved domains and a simple 3D viewer mapping the mutation to known PDB structures, if available. The cartoon is also correlated with a multiple sequence alignment of similar sequences from other organisms. In cases where a mutation is likely to have a straightforward interpretation (e.g. a point mutation disrupting a well-understood targeting signal), this interpretation is suggested. The interactive cartoon can be downloaded as standalone viewer in Java jar format to be saved and viewed later with only a standard Java runtime environment. The HPMV website is: http://hpmv.bii.a-star.edu.sg/ .

  9. easyPAC: A Tool for Fast Prediction, Testing and Reference Mapping of Degenerate PCR Primers from Alignments or Consensus Sequences

    PubMed Central

    Rosenkranz, David

    2012-01-01

    The PCR amplification of unknown homologous or paralogous genes generally relies on PCR primers predicted from multiple sequence alignments. But increasing sequence divergence can induce the need to use degenerate primers, which entails the problem of testing the characteristics, unwanted interactions and potential mispriming of degenerate primers. Here I introduce easyPAC, a new software tool for the prediction of degenerate primers from multiple sequence alignments or single consensus sequences. As a major innovation, easyPAC makes it possible to apply all customary primer test procedures to degenerate primer sequences, including fast mapping to reference files. Thus, easyPAC simplifies and expedites the designing of specific degenerate primers enormously. Degenerate primers suggested by easyPAC were used in PCR amplification with subsequent de novo sequencing of TDRD1 exon 11 homologs from several representatives of the haplorrhine primate phylogeny. The results demonstrate the efficient performance of the suggested primers and therefore show that easyPAC can advance upcoming comparative genetic studies.
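
    As background on what a degenerate primer is, the sketch below collapses each column of a small nucleotide alignment into the standard IUPAC ambiguity code for the bases observed there, and counts how many concrete sequences the resulting primer represents. It is an illustration only: the function names are hypothetical, mapping gap-containing columns to N is a simplifying assumption, and easyPAC's own prediction and testing procedures are not described in the abstract.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
# Reverse lookup: ambiguity code -> number of bases it can stand for.
CODE_SIZE = {code: len(bases) for bases, code in IUPAC.items()}


def degenerate_consensus(aligned):
    """Collapse equal-length aligned sequences into one IUPAC-coded consensus string."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    out = []
    for column in zip(*aligned):
        bases = frozenset(ch.upper() for ch in column)
        # Assumption: a column containing a gap or any non-ACGT symbol becomes N.
        out.append(IUPAC.get(bases, "N"))
    return "".join(out)


def degeneracy(primer):
    """Number of distinct non-degenerate sequences encoded by an IUPAC primer."""
    result = 1
    for code in primer.upper():
        result *= CODE_SIZE[code]
    return result


# Made-up alignment of three sequences.
primer = degenerate_consensus(["ACGT", "ACAT", "ATGT"])
print(primer, degeneracy(primer))
```

    The degeneracy grows multiplicatively with each ambiguous position, which is one reason testing degenerate primers for unwanted interactions and mispriming is harder than testing a single sequence.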

  10. Pure Java-based streaming MPEG player

    NASA Astrophysics Data System (ADS)

    Tolba, Osama; Briceno, Hector; McMillan, Leonard

    1999-01-01

    We present a pure Java-based streaming MPEG-1 video player. By implementing the player entirely in Java, we guarantee its functionality across platforms within any Java-enabled web browser, without the need for native libraries. This allows greater use of MPEG video sequences, because the users will no longer need to pre-install any software to display video, beyond Java compatibility. This player features a novel forward-mapping IDCT algorithm that allows it to play locally stored, CIF-sized video sequences at 11 frames per second, when run on a personal computer with a Java 'just-in-time' compiler. The IDCT algorithm can run with greater speed when the sequence is viewed at reduced size; e.g., performing approximately 1/4 the amount of computation when the user resizes the sequence to 1/2 its original width and height. We are able to play video streams stored anywhere on the Internet with acceptable performance using a proxy server, eliminating the need for large-capacity auxiliary storage. Thus, the player is well suited to small devices, such as digital TV set-top decoders, requiring little more memory than is required for three video frames. Because of our modular design, it is possible to assemble multiple video streams from remote sources and present them simultaneously to the viewers, subject to network and local performance limitations. The same modular system can further provide viewers with their own customized view of each session; e.g., moving and resizing the video display window dynamically, and selecting their preferred set of video controls.

  11. AdoMet radical proteins--from structure to evolution--alignment of divergent protein sequences reveals strong secondary structure element conservation.

    PubMed

    Nicolet, Yvain; Drennan, Catherine L

    2004-01-01

    Eighteen subclasses of S-adenosyl-l-methionine (AdoMet) radical proteins have been aligned in the first bioinformatics study of the AdoMet radical superfamily to utilize crystallographic information. The recently resolved X-ray structure of biotin synthase (BioB) was used to guide the multiple sequence alignment, and the recently resolved X-ray structure of coproporphyrinogen III oxidase (HemN) was used as the control. Despite the low 9% sequence identity between BioB and HemN, the multiple sequence alignment correctly predicted all but one of the core helices in HemN, and correctly predicted the residues in the enzyme active site. This alignment further suggests that the AdoMet radical proteins may have evolved from half-barrel structures (alphabeta)4 to three-quarter-barrel structures (alphabeta)6 to full-barrel structures (alphabeta)8. It predicts that anaerobic ribonucleotide reductase (RNR) activase, an ancient enzyme that, it has been suggested, serves as a link between the RNA and DNA worlds, will have a half-barrel structure, whereas the three-quarter barrel, exemplified by HemN, will be the most common architecture for AdoMet radical enzymes, and fewer members of the superfamily will join BioB in using a complete (alphabeta)8 TIM-barrel fold to perform radical chemistry. These differences in barrel architecture also explain how AdoMet radical enzymes can act on substrates that range in size from 10 atoms to 608 residue proteins.

  12. Automated insertion of sequences into a ribosomal RNA alignment: An application of computational linguistics in molecular biology

    SciTech Connect

    Taylor, R.C.

    1991-11-01

    This thesis involved the construction of (1) a grammar that incorporates knowledge on base invariancy and secondary structure in a molecule and (2) a parser engine that uses the grammar to position bases into the structural subunits of the molecule. These concepts were combined with a novel pinning technique to form a tool that semi-automates insertion of a new species into the alignment for the 16S rRNA molecule (a component of the ribosome) maintained by Dr. Carl Woese's group at the University of Illinois at Urbana. The tool was tested on species extracted from the alignment and on a group of entirely new species. The results were very encouraging, and the tool should be a substantial aid to the curators of the 16S alignment. The construction of the grammar was itself automated, allowing application of the tool to alignments for other molecules. The logic programming language Prolog was used to construct all programs involved. The computational linguistics approach used here was found to be a useful way to attack the problem of insertion into an alignment.

  13. Automated insertion of sequences into a ribosomal RNA alignment: An application of computational linguistics in molecular biology

    SciTech Connect

    Taylor, R.C.

    1991-11-01

    This thesis involved the construction of (1) a grammar that incorporates knowledge on base invariancy and secondary structure in a molecule and (2) a parser engine that uses the grammar to position bases into the structural subunits of the molecule. These concepts were combined with a novel pinning technique to form a tool that semi-automates insertion of a new species into the alignment for the 16S rRNA molecule (a component of the ribosome) maintained by Dr. Carl Woese's group at the University of Illinois at Urbana. The tool was tested on species extracted from the alignment and on a group of entirely new species. The results were very encouraging, and the tool should be a substantial aid to the curators of the 16S alignment. The construction of the grammar was itself automated, allowing application of the tool to alignments for other molecules. The logic programming language Prolog was used to construct all programs involved. The computational linguistics approach used here was found to be a useful way to attack the problem of insertion into an alignment.

  14. Handling Permutation in Sequence Comparison: Genome-Wide Enhancer Prediction in Vertebrates by a Novel Non-Linear Alignment Scoring Principle

    PubMed Central

    Dolle, Dirk; Mateo, Juan L.; Eichenlaub, Michael P.; Sinn, Rebecca; Reinhardt, Robert; Höckendorf, Burkhard; Inoue, Daigo; Centanin, Lazaro; Ettwiller, Laurence; Wittbrodt, Joachim

    2015-01-01

    Enhancers have been described to evolve by permutation without changing function. This has posed the problem of how to predict enhancer elements that are hidden from alignment-based approaches due to the loss of co-linearity. Alignment-free algorithms have been proposed as one possible solution. However, this approach is hampered by several problems inherent to its underlying working principle. Here we present a new approach, which combines the power of alignment and alignment-free techniques into one algorithm. It allows the prediction of enhancers based on the query and target sequence only, no matter whether the regulatory logic is co-linear or reshuffled. To test our novel approach, we employ it for the prediction of enhancers across the evolutionary distance of ~450Myr between human and medaka. We demonstrate its efficacy by subsequent in vivo validation resulting in 82% (9/11) of the predicted medaka regions showing reporter activity. These include five candidates with partially co-linear and four with reshuffled motif patterns. Orthology in flanking genes and conservation of the detected co-linear motifs indicates that those candidates are likely functionally equivalent enhancers. In sum, our results demonstrate that the proposed principle successfully predicts mutated as well as permuted enhancer regions at an encouragingly high rate. PMID:26505748

  15. JAVA PathFinder

    NASA Technical Reports Server (NTRS)

    Mehlitz, Peter

    2005-01-01

    JPF is an explicit-state software model checker for Java bytecode. Today, JPF is a Swiss army knife for all sorts of runtime-based verification purposes. This basically means JPF is a Java virtual machine that executes your program not just once (like a normal VM), but theoretically in all possible ways, checking for property violations like deadlocks or unhandled exceptions along all potential execution paths. If it finds an error, JPF reports the whole execution that leads to it. Unlike a normal debugger, JPF keeps track of every step of how it got to the defect.
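
    The idea of executing a program "in all possible ways" can be illustrated with a toy sketch. The code below is not JPF and does not use any JPF API; it is a hypothetical, hand-written explorer (the class name InterleavingExplorer and its methods are assumptions made for illustration) that enumerates every interleaving of two simulated threads performing a non-atomic increment, and reports the full trace of each execution that ends with a lost update.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy explicit-state explorer (NOT JPF itself): enumerates every interleaving
 * of two simulated threads that each increment a shared counter non-atomically
 * (step 0 = read the counter into a local, step 1 = write local + 1 back).
 */
final class InterleavingExplorer {
    private static final int THREADS = 2;
    private static final int STEPS_PER_THREAD = 2;

    /** Returns one trace per complete execution whose final counter != expected. */
    static List<String> findViolations(int expected) {
        List<String> violations = new ArrayList<>();
        explore(0, new int[THREADS], new int[THREADS], "", expected, violations);
        return violations;
    }

    /** Depth-first search over scheduler choices; each call is one program state. */
    private static void explore(int shared, int[] pc, int[] local, String trace,
                                int expected, List<String> violations) {
        boolean finished = true;
        for (int t = 0; t < THREADS; t++) {
            if (pc[t] == STEPS_PER_THREAD) {
                continue; // this thread has no steps left
            }
            finished = false;
            // Copy the state so that sibling branches are not affected.
            int[] nextPc = pc.clone();
            int[] nextLocal = local.clone();
            int nextShared = shared;
            String step;
            if (pc[t] == 0) {
                nextLocal[t] = shared;          // read
                step = "T" + t + ":read ";
            } else {
                nextShared = local[t] + 1;      // write
                step = "T" + t + ":write ";
            }
            nextPc[t]++;
            explore(nextShared, nextPc, nextLocal, trace + step, expected, violations);
        }
        // A terminal state that violates the property is reported with its whole trace.
        if (finished && shared != expected) {
            violations.add(trace.trim() + " => counter=" + shared);
        }
    }

    public static void main(String[] args) {
        findViolations(2).forEach(System.out::println);
    }
}
```

    Of the six possible interleavings, the two serial ones leave the counter at 2; the sketch reports the other four, each with the complete sequence of steps leading to the defect. A real model checker adds state matching, backtracking over a real VM state, and many other techniques that this sketch omits.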

  16. Internal Transcribed Spacer rRNA Gene-Based Phylogenetic Reconstruction Using Algorithms with Local and Global Sequence Alignment for Black Yeasts and Their Relatives

    PubMed Central

    Caligiorne, R. B.; Licinio, P.; Dupont, J.; de Hoog, G. S.

    2005-01-01

    Sequences of rRNA gene internal transcribed spacer (ITS) of a standard set of black yeast-like fungal pathogens were compared using two methods: local and global alignments. The latter is based on DNA-walk divergence analysis. This method has recently become available as an algorithm (DNAWD program) which converts sequences into three-dimensional walks. The walks are compared with, or fit to, each other generating global alignments. The DNA-walk geometry defines a proper metric used to create a distance matrix appropriate for phylogenetic reconstruction. In this work, the analyses were carried out for species currently classified in Capronia, Cladophialophora, Exophiala, Fonsecaea, Phialophora, and Ramichloridium. Main groups were verified by small-subunit rRNA gene data. DNAWD applied to ITS2 alone enabled species recognition as well as phylogenetic reconstruction reflecting clades discriminated in small-subunit rRNA gene phylogeny, which was not possible with any other algorithm using local alignment for the same data set. It is concluded that DNAWD provides rapid insight into broader relationships between groups using genes that otherwise would be hardly usable for this purpose. PMID:15956403

  17. MC64-ClustalWP2: a highly-parallel hybrid strategy to align multiple sequences in many-core architectures.

    PubMed

    Díaz, David; Esteban, Francisco J; Hernández, Pilar; Caballero, Juan Antonio; Guevara, Antonio; Dorado, Gabriel; Gálvez, Sergio

    2014-01-01

    We have developed the MC64-ClustalWP2 as a new implementation of the Clustal W algorithm, integrating a novel parallelization strategy and significantly increasing the performance when aligning long sequences in architectures with many cores. It must be stressed that in such a process, the detailed analysis of both the software and hardware features and peculiarities is of paramount importance to reveal key points to exploit and optimize the full potential of parallelism in many-core CPU systems. The new parallelization approach has focused on the most time-consuming stages of this algorithm. In particular, the so-called progressive alignment has drastically improved the performance, due to a fine-grained approach where the forward and backward loops were unrolled and parallelized. Another key approach has been the implementation of the new algorithm in a hybrid-computing system, integrating both an Intel Xeon multi-core CPU and a Tilera Tile64 many-core card. A comparison with other Clustal W implementations reveals the high performance of the new algorithm and strategy in many-core CPU architectures, in a scenario where the sequences to align are relatively long (more than 10 kb) and, hence, many-core GPU hardware cannot be used. Thus, the MC64-ClustalWP2 runs multiple alignments more than 18x faster than the original Clustal W algorithm, and more than 7x faster than the best x86 parallel implementation to date, being publicly available through a web service. Besides, these developments have been deployed in cost-effective personal computers and should be useful for life-science researchers, including the identification of identities and differences for mutation/polymorphism analyses, biodiversity and evolutionary studies and for the development of molecular markers for paternity testing, germplasm management and protection, to assist breeding, illegal traffic control, fraud prevention and for the protection of the intellectual property (identification

  18. Java Metadata Facility

    SciTech Connect

    Buttler, D J

    2008-03-06

    The Java Metadata Facility is introduced by Java Specification Request (JSR) 175 [1], and incorporated into the Java language specification [2] in version 1.5 of the language. The specification allows annotations on Java program elements: classes, interfaces, methods, and fields. Annotations give programmers a uniform way to add metadata to program elements that can be used by code checkers, code generators, or other compile-time or runtime components. Annotations are defined by annotation types. These are defined the same way as interfaces, but with the symbol @ preceding the interface keyword. There are additional restrictions on defining annotation types: (1) They cannot be generic; (2) They cannot extend other annotation types or interfaces; (3) Methods cannot have any parameters; (4) Methods cannot have type parameters; (5) Methods cannot throw exceptions; and (6) The return type of methods of an annotation type must be a primitive, a String, a Class, an annotation type, or an array, where the type of the array is restricted to one of the four allowed types. See [2] for additional restrictions and syntax. The methods of an annotation type define the elements that may be used to parameterize the annotation in code. Annotation types may have default values for any of their elements. For example, an annotation that specifies a defect report could initialize an element defining the defect outcome submitted. Annotations may also have zero elements. This could be used to indicate serializability for a class (as opposed to the current Serializable interface).
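
    The rules above can be made concrete with a short sketch. The annotation types DefectReport and Persistable, and the class AlignmentParser, are hypothetical names invented for this illustration; only the @interface syntax, the default-value mechanism, and the java.lang.annotation and reflection calls are standard Java.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Hypothetical annotation type: declared like an interface, with @ before the keyword. */
@Retention(RetentionPolicy.RUNTIME)              // keep it visible to runtime components
@Target({ElementType.TYPE, ElementType.METHOD})
@interface DefectReport {
    int id();                                    // primitive element, no parameters
    String outcome() default "submitted";        // element with a default value
    String[] tags() default {};                  // array of an allowed type
}

/** Hypothetical marker annotation with zero elements. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface Persistable {}

@Persistable
@DefectReport(id = 42)                           // outcome falls back to "submitted"
final class AlignmentParser {
    @DefectReport(id = 43, outcome = "fixed", tags = {"alignment"})
    void parse() {}
}

final class AnnotationDemo {
    /** Reads the class-level annotation through reflection. */
    static String outcomeOfClass() {
        return AlignmentParser.class.getAnnotation(DefectReport.class).outcome();
    }

    /** Reads the method-level annotation through reflection. */
    static String outcomeOfMethod() {
        try {
            return AlignmentParser.class.getDeclaredMethod("parse")
                    .getAnnotation(DefectReport.class).outcome();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Checks for the zero-element marker annotation. */
    static boolean isPersistable() {
        return AlignmentParser.class.isAnnotationPresent(Persistable.class);
    }

    public static void main(String[] args) {
        System.out.println(outcomeOfClass());
        System.out.println(outcomeOfMethod());
        System.out.println(isPersistable());
    }
}
```

    The runtime retention policy is a choice made for this sketch so that the values can be read back by reflection; a compile-time checker or code generator would typically work from source or class-file retention instead.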

  19. Java for flight software

    NASA Technical Reports Server (NTRS)

    Benowitz, E.; Niessner, A.

    2003-01-01

    This work involves developing representative mission-critical spacecraft software using the Real-Time Specification for Java (RTSJ). The work currently leverages the design of actual flight software used on NASA's Deep Space 1 (DS1), which flew in 1998.

  20. A Java commodity grid kit.

    SciTech Connect

    von Laszewski, G.; Foster, I.; Gawor, J.; Lane, P.; Mathematics and Computer Science

    2001-07-01

    In this paper we report on the features of the Java Commodity Grid Kit. The Java CoG Kit provides middleware for accessing Grid functionality from the Java framework. Java CoG Kit middleware is general enough to design a variety of advanced Grid applications with quite different user requirements. Access to the Grid is established via Globus protocols, allowing the Java CoG Kit to communicate also with the C Globus reference implementation. Thus, the Java CoG Kit provides Grid developers with the ability to utilize the Grid, as well as numerous additional libraries and frameworks developed by the Java community to enable network, Internet, enterprise, and peer-to-peer computing. A variety of projects have successfully used the client libraries of the Java CoG Kit to access Grids driven by the C Globus software. In this paper we also report on the efforts to develop server side Java CoG Kit components. As part of this research we have implemented a prototype pure Java resource management system that enables one to run Globus jobs on platforms on which a Java virtual machine is supported, including Windows NT machines.

  1. CATO: The Clone Alignment Tool.

    PubMed

    Henstock, Peter V; LaPan, Peter

    2016-01-01

    High-throughput cloning efforts produce large numbers of sequences that need to be aligned, edited, compared with reference sequences, and organized as files and selected clones. Different pieces of software are typically required to perform each of these tasks. We have designed a single piece of software, CATO, the Clone Alignment Tool, that allows a user to align, evaluate, edit, and select clone sequences based on comparisons to reference sequences. The input and output are designed to be compatible with standard data formats, and thus suitable for integration into a clone processing pipeline. CATO provides both sequence alignment and visualizations to facilitate the analysis of cloning experiments. The alignment algorithm matches each of the relevant candidate sequences against each reference sequence. The visualization portion displays three levels of matching: 1) a top-level summary of the top candidate sequences aligned to each reference sequence, 2) a focused alignment view with the nucleotides of matched sequences displayed against one reference sequence, and 3) a pair-wise alignment of a single reference and candidate sequence pair. Users can select the minimum matching criteria for valid clones, edit or swap reference sequences, and export the results to a summary file as part of the high-throughput cloning workflow.
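
    The abstract does not say which alignment algorithm CATO uses, so the following is only a sketch of the "each candidate against each reference" matching step, with a plain Needleman-Wunsch global alignment score swapped in as a stand-in. The class name CloneMatcher, the scoring constants, and the method names are assumptions made for illustration.

```java
import java.util.List;

/** Sketch of all-against-all matching; NOT CATO's actual algorithm or API. */
final class CloneMatcher {
    // Assumed scoring scheme, chosen only for this example.
    static final int MATCH = 1;
    static final int MISMATCH = -1;
    static final int GAP = -2;

    /** Needleman-Wunsch global alignment score, computed with two rolling rows. */
    static int globalScore(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j * GAP;                   // aligning a prefix of b against nothing
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i * GAP;                   // aligning a prefix of a against nothing
            for (int j = 1; j <= b.length(); j++) {
                int diag = prev[j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH);
                int up = prev[j] + GAP;          // gap in b
                int left = curr[j - 1] + GAP;    // gap in a
                curr[j] = Math.max(diag, Math.max(up, left));
            }
            int[] tmp = prev;                    // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * For each reference, returns the index of the best-scoring candidate
     * (or -1 when there are no candidates). Ties keep the earliest candidate.
     */
    static int[] bestCandidatePerReference(List<String> references, List<String> candidates) {
        int[] best = new int[references.size()];
        for (int r = 0; r < references.size(); r++) {
            int bestIndex = -1;
            int bestScore = Integer.MIN_VALUE;
            for (int c = 0; c < candidates.size(); c++) {
                int score = globalScore(references.get(r), candidates.get(c));
                if (score > bestScore) {
                    bestScore = score;
                    bestIndex = c;
                }
            }
            best[r] = bestIndex;
        }
        return best;
    }

    public static void main(String[] args) {
        int[] best = bestCandidatePerReference(
                List.of("ACGT", "TTTT"), List.of("TTTA", "ACGA"));
        System.out.println(best[0] + " " + best[1]);
    }
}
```

    A user-selectable "minimum matching criterion" of the kind the abstract mentions could be layered on top by rejecting any best score that falls below a chosen threshold.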

  2. CATO: The Clone Alignment Tool.

    PubMed

    Henstock, Peter V; LaPan, Peter

    2016-01-01

    High-throughput cloning efforts produce large numbers of sequences that need to be aligned, edited, compared with reference sequences, and organized as files and selected clones. Different pieces of software are typically required to perform each of these tasks. We have designed a single piece of software, CATO, the Clone Alignment Tool, that allows a user to align, evaluate, edit, and select clone sequences based on comparisons to reference sequences. The input and output are designed to be compatible with standard data formats, and thus suitable for integration into a clone processing pipeline. CATO provides both sequence alignment and visualizations to facilitate the analysis of cloning experiments. The alignment algorithm matches each of the relevant candidate sequences against each reference sequence. The visualization portion displays three levels of matching: 1) a top-level summary of the top candidate sequences aligned to each reference sequence, 2) a focused alignment view with the nucleotides of matched sequences displayed against one reference sequence, and 3) a pair-wise alignment of a single reference and candidate sequence pair. Users can select the minimum matching criteria for valid clones, edit or swap reference sequences, and export the results to a summary file as part of the high-throughput cloning workflow. PMID:27459605

  3. CATO: The Clone Alignment Tool

    PubMed Central

    Henstock, Peter V.; LaPan, Peter

    2016-01-01

    High-throughput cloning efforts produce large numbers of sequences that need to be aligned, edited, compared with reference sequences, and organized as files and selected clones. Different pieces of software are typically required to perform each of these tasks. We have designed a single piece of software, CATO, the Clone Alignment Tool, that allows a user to align, evaluate, edit, and select clone sequences based on comparisons to reference sequences. The input and output are designed to be compatible with standard data formats, and thus suitable for integration into a clone processing pipeline. CATO provides both sequence alignment and visualizations to facilitate the analysis of cloning experiments. The alignment algorithm matches each of the relevant candidate sequences against each reference sequence. The visualization portion displays three levels of matching: 1) a top-level summary of the top candidate sequences aligned to each reference sequence, 2) a focused alignment view with the nucleotides of matched sequences displayed against one reference sequence, and 3) a pair-wise alignment of a single reference and candidate sequence pair. Users can select the minimum matching criteria for valid clones, edit or swap reference sequences, and export the results to a summary file as part of the high-throughput cloning workflow. PMID:27459605

  4. Java Vertexing Tools

    SciTech Connect

    Strube, Jan; Graf, Norman; /SLAC

    2006-03-03

    This document describes the implementation of the topological vertex finding algorithm ZVTOP within the org.lcsim reconstruction and analysis framework. At the present date, Java vertexing tools allow users to perform topological vertexing on tracks that have been obtained from a Fast MC simulation. An implementation that will be able to handle fully reconstructed events is being designed from the ground up for longevity and maintainability.

  5. Phylo-VISTA: An interactive visualization tool for multiple DNA sequence alignments

    SciTech Connect

    Shah, Nameeta; Couronne, Olivier; Pennacchio, Len A.; Brudno, Michael; Batzoglou, Serafim; Bethel, E. Wes; Rubin, Edward M.; Hamann, Bernd; Dubchak, Inna

    2003-04-25

    Motivation. The power of multi-sequence comparison for biological discovery is well established and sequence data from a growing list of organisms is becoming available. Thus, a need exists for computational strategies to visually compare multiple aligned sequences to support conservation analysis across various species. To be efficient these visualization algorithms require the ability to universally handle a wide range of evolutionary distances while taking into account phylogeny. Results. We have developed Phylo-VISTA, an interactive tool for analyzing multiple alignments by visualizing the similarity of DNA sequences among multiple species while considering their phylogenic relationships. Features include a broad spectrum of resolution parameters for examining the alignment and the ability to easily compare any subtree of sequences within a complete alignment dataset. Phylo-VISTA uses VISTA concepts that have been successfully applied previously to a wide range of comparative genomics data analysis problems. Availability. Phylo-VISTA is an interactive Java applet available for downloading at http://graphics.cs.ucdavis.edu/~nyshah/Phylo-VISTA. It is also available on-line at http://www-gsd.lbl.gov/phylovista and is integrated with the global alignment program LAGAN at http://lagan.stanford.edu. Contact: phylovista@lbl.gov

  6. Theoretical assessment of feasibility to sequence DNA through interlayer electronic tunneling transport at aligned nanopores in bilayer graphene

    PubMed Central

    Prasongkit, Jariyanee; Feliciano, Gustavo T.; Rocha, Alexandre R.; He, Yuhui; Osotchan, Tanakorn; Ahuja, Rajeev; Scheicher, Ralph H.

    2015-01-01

    Fast, cost effective, single-shot DNA sequencing could be the prelude of a new era in genetics. As DNA encodes the information for the production of proteins in all known living beings on Earth, determining the nucleobase sequences is the first and necessary step in that direction. Graphene-based nanopore devices hold great promise for next-generation DNA sequencing. In this work, we develop a novel approach for sequencing DNA using bilayer graphene to read the interlayer conductance through the layers in the presence of target nucleobases. Classical molecular dynamics simulations of DNA translocation through the pore were performed to trace the nucleobase trajectories and evaluate the interaction between the nucleobases and the nanopore. This interaction stabilizes the bases in different orientations, resulting in smaller fluctuations of the nucleobases inside the pore. We assessed the performance of a bilayer graphene nanopore setup for the purpose of DNA sequencing by employing density functional theory and non-equilibrium Green’s function method to investigate the interlayer conductance of nucleobases coupling simultaneously to the top and bottom graphene layers. The obtained conductance is significantly affected by the presence of DNA in the bilayer graphene nanopore, allowing us to analyze DNA sequences. PMID:26634811

  7. Theoretical assessment of feasibility to sequence DNA through interlayer electronic tunneling transport at aligned nanopores in bilayer graphene.

    PubMed

    Prasongkit, Jariyanee; Feliciano, Gustavo T; Rocha, Alexandre R; He, Yuhui; Osotchan, Tanakorn; Ahuja, Rajeev; Scheicher, Ralph H

    2015-12-04

    Fast, cost effective, single-shot DNA sequencing could be the prelude of a new era in genetics. As DNA encodes the information for the production of proteins in all known living beings on Earth, determining the nucleobase sequences is the first and necessary step in that direction. Graphene-based nanopore devices hold great promise for next-generation DNA sequencing. In this work, we develop a novel approach for sequencing DNA using bilayer graphene to read the interlayer conductance through the layers in the presence of target nucleobases. Classical molecular dynamics simulations of DNA translocation through the pore were performed to trace the nucleobase trajectories and evaluate the interaction between the nucleobases and the nanopore. This interaction stabilizes the bases in different orientations, resulting in smaller fluctuations of the nucleobases inside the pore. We assessed the performance of a bilayer graphene nanopore setup for the purpose of DNA sequencing by employing density functional theory and non-equilibrium Green's function method to investigate the interlayer conductance of nucleobases coupling simultaneously to the top and bottom graphene layers. The obtained conductance is significantly affected by the presence of DNA in the bilayer graphene nanopore, allowing us to analyze DNA sequences.

  8. Theoretical assessment of feasibility to sequence DNA through interlayer electronic tunneling transport at aligned nanopores in bilayer graphene.

    PubMed

    Prasongkit, Jariyanee; Feliciano, Gustavo T; Rocha, Alexandre R; He, Yuhui; Osotchan, Tanakorn; Ahuja, Rajeev; Scheicher, Ralph H

    2015-01-01

    Fast, cost effective, single-shot DNA sequencing could be the prelude of a new era in genetics. As DNA encodes the information for the production of proteins in all known living beings on Earth, determining the nucleobase sequences is the first and necessary step in that direction. Graphene-based nanopore devices hold great promise for next-generation DNA sequencing. In this work, we develop a novel approach for sequencing DNA using bilayer graphene to read the interlayer conductance through the layers in the presence of target nucleobases. Classical molecular dynamics simulations of DNA translocation through the pore were performed to trace the nucleobase trajectories and evaluate the interaction between the nucleobases and the nanopore. This interaction stabilizes the bases in different orientations, resulting in smaller fluctuations of the nucleobases inside the pore. We assessed the performance of a bilayer graphene nanopore setup for the purpose of DNA sequencing by employing density functional theory and non-equilibrium Green's function method to investigate the interlayer conductance of nucleobases coupling simultaneously to the top and bottom graphene layers. The obtained conductance is significantly affected by the presence of DNA in the bilayer graphene nanopore, allowing us to analyze DNA sequences. PMID:26634811

  9. Differentiated evolutionary relationships among chordates from comparative alignments of multiple sequences of MyoD and MyoG myogenic regulatory factors.

    PubMed

    Oliani, L C; Lidani, K C F; Gabriel, J E

    2015-10-16

    MyoD and MyoG are transcription factors that have essential roles in myogenic lineage determination and muscle differentiation. The purpose of this study was to compare multiple amino acid sequences of myogenic regulatory proteins to infer evolutionary relationships among chordates. Protein sequences from Mus musculus (P10085 and P12979), human Homo sapiens (P15172 and P15173), bovine Bos taurus (Q7YS82 and Q7YS81), wild pig Sus scrofa (P49811 and P49812), quail Coturnix coturnix (P21572 and P34060), chicken Gallus gallus (P16075 and P17920), rat Rattus norvegicus (Q02346 and P20428), domestic water buffalo Bubalus bubalis (D2SP11 and A7L034), and sheep Ovis aries (Q90477 and D3YKV7) were searched from a non-redundant protein sequence database UniProtKB/Swiss-Prot, and subsequently analyzed using the Mega6.0 software. MyoD evolutionary analyses revealed the presence of three main clusters with all mammals branched in one cluster, members of the order Rodentia (mouse and rat) in a second branch linked to the first, and birds of the order Galliformes (chicken and quail) remaining isolated in a third. MyoG evolutionary analyses aligned sequences in two main clusters, all mammalian specimens grouped in different sub-branches, and birds clustered in a second branch. These analyses suggest that the evolution of MyoD and MyoG was driven by different pathways.

  10. Differentiated evolutionary relationships among chordates from comparative alignments of multiple sequences of MyoD and MyoG myogenic regulatory factors.

    PubMed

    Oliani, L C; Lidani, K C F; Gabriel, J E

    2015-01-01

    MyoD and MyoG are transcription factors that have essential roles in myogenic lineage determination and muscle differentiation. The purpose of this study was to compare multiple amino acid sequences of myogenic regulatory proteins to infer evolutionary relationships among chordates. Protein sequences from Mus musculus (P10085 and P12979), human Homo sapiens (P15172 and P15173), bovine Bos taurus (Q7YS82 and Q7YS81), wild pig Sus scrofa (P49811 and P49812), quail Coturnix coturnix (P21572 and P34060), chicken Gallus gallus (P16075 and P17920), rat Rattus norvegicus (Q02346 and P20428), domestic water buffalo Bubalus bubalis (D2SP11 and A7L034), and sheep Ovis aries (Q90477 and D3YKV7) were searched from a non-redundant protein sequence database UniProtKB/Swiss-Prot, and subsequently analyzed using the Mega6.0 software. MyoD evolutionary analyses revealed the presence of three main clusters with all mammals branched in one cluster, members of the order Rodentia (mouse and rat) in a second branch linked to the first, and birds of the order Galliformes (chicken and quail) remaining isolated in a third. MyoG evolutionary analyses aligned sequences in two main clusters, all mammalian specimens grouped in different sub-branches, and birds clustered in a second branch. These analyses suggest that the evolution of MyoD and MyoG was driven by different pathways. PMID:26505406

  11. Jannovar: a java library for exome annotation.

    PubMed

    Jäger, Marten; Wang, Kai; Bauer, Sebastian; Smedley, Damian; Krawitz, Peter; Robinson, Peter N

    2014-05-01

    Transcript-based annotation and pedigree analysis are two basic steps in the computational analysis of whole-exome sequencing experiments in genetic diagnostics and disease-gene discovery projects. Here, we present Jannovar, a stand-alone Java application as well as a Java library designed to be used in larger software frameworks for exome and genome analysis. Jannovar uses an interval tree to identify all transcripts affected by a given variant, and provides Human Genome Variation Society-compliant annotations both for variants affecting coding sequences and splice junctions as well as untranslated regions and noncoding RNA transcripts. Jannovar can also perform family-based pedigree analysis with Variant Call Format (VCF) files with data from members of a family segregating a Mendelian disorder. Using a desktop computer, Jannovar requires a few seconds to annotate a typical VCF file with exome data. Jannovar is freely available under the BSD2 license. Source code as well as the Java application and library file can be downloaded from http://compbio.charite.de (with tutorial) and https://github.com/charite/jannovar. PMID:24677618
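
    The interval-tree lookup mentioned above can be sketched as follows. This is not Jannovar's implementation or API; it is a minimal augmented binary search tree written for illustration, and the names TranscriptIntervalTree, Interval, and overlapping, as well as the closed-interval coordinate convention, are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal static interval tree; a sketch, NOT Jannovar's own data structure. */
final class TranscriptIntervalTree {
    /** Closed interval [start, end] labelled with a (hypothetical) transcript name. */
    static final class Interval {
        final int start;
        final int end;
        final String name;

        Interval(int start, int end, String name) {
            this.start = start;
            this.end = end;
            this.name = name;
        }
    }

    private static final class Node {
        final Interval interval;
        Node left;
        Node right;
        int maxEnd;                              // largest end in this subtree

        Node(Interval interval) {
            this.interval = interval;
            this.maxEnd = interval.end;
        }
    }

    private final Node root;

    TranscriptIntervalTree(List<Interval> intervals) {
        List<Interval> sorted = new ArrayList<>(intervals);
        sorted.sort((Interval a, Interval b) -> Integer.compare(a.start, b.start));
        root = build(sorted, 0, sorted.size() - 1);
    }

    /** Builds a balanced tree keyed on start, recording the max end per subtree. */
    private static Node build(List<Interval> sorted, int lo, int hi) {
        if (lo > hi) {
            return null;
        }
        int mid = (lo + hi) >>> 1;
        Node node = new Node(sorted.get(mid));
        node.left = build(sorted, lo, mid - 1);
        node.right = build(sorted, mid + 1, hi);
        if (node.left != null) {
            node.maxEnd = Math.max(node.maxEnd, node.left.maxEnd);
        }
        if (node.right != null) {
            node.maxEnd = Math.max(node.maxEnd, node.right.maxEnd);
        }
        return node;
    }

    /** Names of all intervals overlapping [queryStart, queryEnd], ordered by start. */
    List<String> overlapping(int queryStart, int queryEnd) {
        List<String> hits = new ArrayList<>();
        collect(root, queryStart, queryEnd, hits);
        return hits;
    }

    private static void collect(Node node, int qs, int qe, List<String> hits) {
        if (node == null || node.maxEnd < qs) {
            return;                              // nothing below reaches the query
        }
        collect(node.left, qs, qe, hits);
        if (node.interval.start <= qe && node.interval.end >= qs) {
            hits.add(node.interval.name);
        }
        if (node.interval.start <= qe) {
            collect(node.right, qs, qe, hits);   // right subtree starts at or after this node
        }
    }

    public static void main(String[] args) {
        TranscriptIntervalTree tree = new TranscriptIntervalTree(List.of(
                new Interval(100, 200, "TX_A"), new Interval(150, 300, "TX_B"),
                new Interval(400, 500, "TX_C"), new Interval(50, 120, "TX_D")));
        System.out.println(tree.overlapping(110, 110)); // a single-base variant at position 110
    }
}
```

    The max-end bookkeeping is what lets the query skip whole subtrees, so a variant lookup does not have to scan every transcript; a production library would also need per-chromosome trees and strand handling, which are left out here.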

  12. MC64-ClustalWP2: A Highly-Parallel Hybrid Strategy to Align Multiple Sequences in Many-Core Architectures

    PubMed Central

    Díaz, David; Esteban, Francisco J.; Hernández, Pilar; Caballero, Juan Antonio; Guevara, Antonio

    2014-01-01

    We have developed the MC64-ClustalWP2 as a new implementation of the Clustal W algorithm, integrating a novel parallelization strategy and significantly increasing the performance when aligning long sequences in architectures with many cores. It must be stressed that in such a process, the detailed analysis of both the software and hardware features and peculiarities is of paramount importance to reveal key points to exploit and optimize the full potential of parallelism in many-core CPU systems. The new parallelization approach has focused on the most time-consuming stages of this algorithm. In particular, the so-called progressive alignment has drastically improved the performance, due to a fine-grained approach where the forward and backward loops were unrolled and parallelized. Another key approach has been the implementation of the new algorithm in a hybrid-computing system, integrating both an Intel Xeon multi-core CPU and a Tilera Tile64 many-core card. A comparison with other Clustal W implementations reveals the high performance of the new algorithm and strategy in many-core CPU architectures, in a scenario where the sequences to align are relatively long (more than 10 kb) and, hence, many-core GPU hardware cannot be used. Thus, the MC64-ClustalWP2 runs multiple alignments more than 18x faster than the original Clustal W algorithm, and more than 7x faster than the best x86 parallel implementation to date, being publicly available through a web service. Besides, these developments have been deployed in cost-effective personal computers and should be useful for life-science researchers, including the identification of identities and differences for mutation/polymorphism analyses, biodiversity and evolutionary studies and for the development of molecular markers for paternity testing, germplasm management and protection, to assist breeding, illegal traffic control, fraud prevention and for the protection of the intellectual property (identification

  13. Implementation of NAS Parallel Benchmarks in Java

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael; Schultz, Matthew; Jin, Hao-Qiang; Yan, Jerry

    2000-01-01

    A number of features make Java an attractive but debatable choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would move Java closer to Fortran in the competition for CFD applications.

  14. FANSe2: a robust and cost-efficient alignment tool for quantitative next-generation sequencing applications.

    PubMed

    Xiao, Chuan-Le; Mai, Zhi-Biao; Lian, Xin-Lei; Zhong, Jia-Yong; Jin, Jing-Jie; He, Qing-Yu; Zhang, Gong

    2014-01-01

    Correct and bias-free interpretation of the deep sequencing data is inevitably dependent on the complete mapping of all mappable reads to the reference sequence, especially for quantitative RNA-seq applications. Seed-based algorithms are generally slow but robust, while Burrows-Wheeler Transform (BWT) based algorithms are fast but less robust. To have both advantages, we developed an algorithm FANSe2 with an iterative mapping strategy based on the statistics of real-world sequencing error distribution to substantially accelerate the mapping without compromising the accuracy. Its sensitivity and accuracy are higher than those of the BWT-based algorithms in the tests using both prokaryotic and eukaryotic sequencing datasets. The gene identification results of FANSe2 are experimentally validated, while the previous algorithms have false positives and false negatives. FANSe2 showed remarkably better consistency to the microarray than most other algorithms in terms of gene expression quantifications. We implemented a scalable and almost maintenance-free parallelization method that can utilize the computational power of multiple office computers, a novel feature not present in any other mainstream algorithm. With three normal office computers, we demonstrated that FANSe2 mapped an RNA-seq dataset generated from an entire Illumina HiSeq 2000 flowcell (8 lanes, 608 M reads) to the masked human genome within 4.1 hours with higher sensitivity than Bowtie/Bowtie2. FANSe2 thus provides robust accuracy, full indel sensitivity, fast speed, versatile compatibility and economical computational utilization, making it a useful and practical tool for deep sequencing applications. FANSe2 is freely available at http://bioinformatics.jnu.edu.cn/software/fanse2/. PMID:24743329

  15. FANSe2: A Robust and Cost-Efficient Alignment Tool for Quantitative Next-Generation Sequencing Applications

    PubMed Central

    Xiao, Chuan-Le; Mai, Zhi-Biao; Lian, Xin-Lei; Zhong, Jia-Yong; Jin, Jing-jie; He, Qing-Yu; Zhang, Gong

    2014-01-01

    Correct and bias-free interpretation of the deep sequencing data is inevitably dependent on the complete mapping of all mappable reads to the reference sequence, especially for quantitative RNA-seq applications. Seed-based algorithms are generally slow but robust, while Burrows-Wheeler Transform (BWT) based algorithms are fast but less robust. To have both advantages, we developed an algorithm FANSe2 with an iterative mapping strategy based on the statistics of real-world sequencing error distribution to substantially accelerate the mapping without compromising the accuracy. Its sensitivity and accuracy are higher than those of the BWT-based algorithms in the tests using both prokaryotic and eukaryotic sequencing datasets. The gene identification results of FANSe2 are experimentally validated, while the previous algorithms have false positives and false negatives. FANSe2 showed remarkably better consistency to the microarray than most other algorithms in terms of gene expression quantifications. We implemented a scalable and almost maintenance-free parallelization method that can utilize the computational power of multiple office computers, a novel feature not present in any other mainstream algorithm. With three normal office computers, we demonstrated that FANSe2 mapped an RNA-seq dataset generated from an entire Illumina HiSeq 2000 flowcell (8 lanes, 608 M reads) to the masked human genome within 4.1 hours with higher sensitivity than Bowtie/Bowtie2. FANSe2 thus provides robust accuracy, full indel sensitivity, fast speed, versatile compatibility and economical computational utilization, making it a useful and practical tool for deep sequencing applications. FANSe2 is freely available at http://bioinformatics.jnu.edu.cn/software/fanse2/. PMID:24743329

  16. FANSe2: a robust and cost-efficient alignment tool for quantitative next-generation sequencing applications.

    PubMed

    Xiao, Chuan-Le; Mai, Zhi-Biao; Lian, Xin-Lei; Zhong, Jia-Yong; Jin, Jing-Jie; He, Qing-Yu; Zhang, Gong

    2014-01-01

    Correct and bias-free interpretation of the deep sequencing data is inevitably dependent on the complete mapping of all mappable reads to the reference sequence, especially for quantitative RNA-seq applications. Seed-based algorithms are generally slow but robust, while Burrows-Wheeler Transform (BWT) based algorithms are fast but less robust. To have both advantages, we developed an algorithm FANSe2 with an iterative mapping strategy based on the statistics of real-world sequencing error distribution to substantially accelerate the mapping without compromising the accuracy. Its sensitivity and accuracy are higher than those of the BWT-based algorithms in the tests using both prokaryotic and eukaryotic sequencing datasets. The gene identification results of FANSe2 are experimentally validated, while the previous algorithms have false positives and false negatives. FANSe2 showed remarkably better consistency to the microarray than most other algorithms in terms of gene expression quantifications. We implemented a scalable and almost maintenance-free parallelization method that can utilize the computational power of multiple office computers, a novel feature not present in any other mainstream algorithm. With three normal office computers, we demonstrated that FANSe2 mapped an RNA-seq dataset generated from an entire Illumina HiSeq 2000 flowcell (8 lanes, 608 M reads) to the masked human genome within 4.1 hours with higher sensitivity than Bowtie/Bowtie2. FANSe2 thus provides robust accuracy, full indel sensitivity, fast speed, versatile compatibility and economical computational utilization, making it a useful and practical tool for deep sequencing applications. FANSe2 is freely available at http://bioinformatics.jnu.edu.cn/software/fanse2/.

  17. Knowledge-based expert systems and a proof-of-concept case study for multiple sequence alignment construction and analysis.

    PubMed

    Aniba, Mohamed Radhouene; Siguenza, Sophie; Friedrich, Anne; Plewniak, Frédéric; Poch, Olivier; Marchler-Bauer, Aron; Thompson, Julie Dawn

    2009-01-01

    The traditional approach to bioinformatics analyses relies on independent task-specific services and applications, using different input and output formats, often idiosyncratic, and frequently not designed to inter-operate. In general, such analyses were performed by experts who manually verified the results obtained at each step in the process. Today, the amount of bioinformatics information continuously being produced means that handling the various applications used to study this information presents a major data management and analysis challenge to researchers. It is now impossible to manually analyse all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. We review the recent use of integrated expert systems aimed at providing more efficient knowledge extraction for bioinformatics research. A general methodology for building knowledge-based expert systems is described, focusing on the unstructured information management architecture, UIMA, which provides facilities for both data and process management. A case study involving a multiple alignment expert system prototype called AlexSys is also presented.

  18. Knowledge-based expert systems and a proof-of-concept case study for multiple sequence alignment construction and analysis

    PubMed Central

    Aniba, Mohamed Radhouene; Siguenza, Sophie; Friedrich, Anne; Plewniak, Frédéric; Poch, Olivier; Marchler-Bauer, Aron

    2009-01-01

    The traditional approach to bioinformatics analyses relies on independent task-specific services and applications, using different input and output formats, often idiosyncratic, and frequently not designed to inter-operate. In general, such analyses were performed by experts who manually verified the results obtained at each step in the process. Today, the amount of bioinformatics information continuously being produced means that handling the various applications used to study this information presents a major data management and analysis challenge to researchers. It is now impossible to manually analyse all this information and new approaches are needed that are capable of processing the large-scale heterogeneous data in order to extract the pertinent information. We review the recent use of integrated expert systems aimed at providing more efficient knowledge extraction for bioinformatics research. A general methodology for building knowledge-based expert systems is described, focusing on the unstructured information management architecture, UIMA, which provides facilities for both data and process management. A case study involving a multiple alignment expert system prototype called AlexSys is also presented. PMID:18971242

  19. SigniSite: Identification of residue-level genotype-phenotype correlations in protein multiple sequence alignments.

    PubMed

    Jessen, Leon Eyrich; Hoof, Ilka; Lund, Ole; Nielsen, Morten

    2013-07-01

    Identifying which mutation(s) within a given genotype is responsible for an observable phenotype is important in many aspects of molecular biology. Here, we present SigniSite, an online application for subgroup-free residue-level genotype-phenotype correlation. In contrast to similar methods, SigniSite does not require any pre-definition of subgroups or binary classification. Input is a set of protein sequences where each sequence has an associated real number, quantifying a given phenotype. SigniSite will then identify which amino acid residues are significantly associated with the data set phenotype. As output, SigniSite displays a sequence logo, depicting the strength of the phenotype association of each residue and a heat-map identifying 'hot' or 'cold' regions. SigniSite was benchmarked against SPEER, a state-of-the-art method for the prediction of specificity determining positions (SDP) using a set of human immunodeficiency virus protease-inhibitor genotype-phenotype data and corresponding resistance mutation scores from the Stanford University HIV Drug Resistance Database, and a data set of protein families with experimentally annotated SDPs. For both data sets, SigniSite was found to outperform SPEER. SigniSite is available at: http://www.cbs.dtu.dk/services/SigniSite/. PMID:23761454

  20. Pre-Eocene rocks of Java, Indonesia

    USGS Publications Warehouse

    Ketner, Keith B.; Kastowo,; Modjo, Subroto; Naeser, C.W.; Obradovich, J.D.; Robinson, Keith; Suptandar, Tatan; Wikarno,

    1976-01-01

    The exposed pre-Eocene rocks of Java can be divided into two compound units for purposes of reconnaissance mapping and structural interpretation: a sedimentary sequence and melange. The sedimentary sequence consists of moderately deformed and little-metamorphosed conglomerate, sandstone, mudstone, claystone, chert, and limestone. The melange consists of a chaotic mechanical mixture of rocks identical to those of the sedimentary sequence and their metamorphic equivalents, such as schist, phyllite, quartzite, and marble. In addition, it contains a large proportion of quartz porphyry and smaller amounts of granite, basalt, gabbro, peridotite, pyroxenite, and serpentinite. The sedimentary sequence is at least partly of Early Cretaceous age and the melange is of Early Cretaceous to very early Paleocene age. They are overlain unconformably by Eocene rocks. The presence in the melange of blocks of quartz porphyry and granite is not easily reconcilable with current plate tectonic concepts in which the sites of formation of melange and plutonic rocks should be hundreds of kilometres apart.

  1. Use of Alignment-Free Phylogenetics for Rapid Genome Sequence-Based Typing of Helicobacter pylori Virulence Markers and Antibiotic Susceptibility

    PubMed Central

    Kusters, Johannes G.

    2015-01-01

    Whole-genome sequencing is becoming a leading technology in the typing and epidemiology of microbial pathogens, but the increase in genomic information necessitates significant investment in bioinformatic resources and expertise, and currently used methodologies struggle with genetically heterogeneous bacteria such as the human gastric pathogen Helicobacter pylori. Here we demonstrate that the alignment-free analysis method feature frequency profiling (FFP) can be used to rapidly construct phylogenetic trees of draft bacterial genome sequences on a standard desktop computer and that coupling with in silico genotyping methods gives useful information for comparative and clinical genomic and molecular epidemiology applications. FFP-based phylogenetic trees of seven gastric Helicobacter species matched those obtained by analysis of 16S rRNA genes and ribosomal proteins, and FFP- and core genome single nucleotide polymorphism-based analysis of 63 H. pylori genomes again showed comparable phylogenetic clustering, consistent with genomotypes assigned by using multilocus sequence typing (MLST). Analysis of 377 H. pylori genomes highlighted the conservation of genomotypes and linkage with phylogeographic characteristics and predicted the presence of an incomplete or nonfunctional cag pathogenicity island in 18/276 genomes. In silico analysis of antibiotic susceptibility markers suggests that most H. pylori hspAmerind and hspEAsia isolates are predicted to carry the T2812C mutation potentially conferring low-level clarithromycin resistance, while levels of metronidazole resistance were similar in all multilocus sequence types. In conclusion, the use of FFP phylogenetic clustering and in silico genotyping allows determination of genome evolution and phylogeographic clustering and can contribute to clinical microbiology by genomotyping for outbreak management and the prediction of pathogenic potential and antibiotic susceptibility. PMID:26135867
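
    As a rough illustration of alignment-free comparison by feature frequency profiles, the sketch below counts overlapping k-mers and compares two profiles with the Jensen-Shannon divergence. The function names, the choice of k, and the toy sequences are assumptions; the study's actual pipeline and parameters are not reproduced here.

```python
import math
from collections import Counter

def feature_frequency_profile(sequence, k):
    """Relative frequency of every overlapping k-mer in the sequence."""
    if len(sequence) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

def jensen_shannon_divergence(p, q):
    """Base-2 JS divergence: 0 for identical profiles, 1 for disjoint ones."""
    divergence = 0.0
    for kmer in set(p) | set(q):
        pi, qi = p.get(kmer, 0.0), q.get(kmer, 0.0)
        mi = 0.5 * (pi + qi)  # mixture of the two profiles
        if pi > 0:
            divergence += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            divergence += 0.5 * qi * math.log2(qi / mi)
    return divergence

a = feature_frequency_profile("ACGTACGTACGT", k=3)
b = feature_frequency_profile("ACGTTCGTACGA", k=3)
print(jensen_shannon_divergence(a, b))
```

    A matrix of such pairwise divergences could then be passed to any distance-based tree-building method; no alignment step is needed.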

  2. JavaTech, an Introduction to Scientific and Technical Computing with Java

    NASA Astrophysics Data System (ADS)

    Lindsey, Clark S.; Tolliver, Johnny S.; Lindblad, Thomas

    2005-10-01

    Preface; Acknowledgements; Part I. Introduction to Java: 1. Introduction; 2. Language basics; 3. Classes and objects in Java; 4. More about objects in Java; 5. Organizing Java files and other practicalities; 6. Java graphics; 7. Graphical user interfaces; 8. Threads; 9. Java input/output; 10. Java utilities; 11. Image handling and processing; 12. More techniques and tips; Part II. Java and the Network: 13. Java networking basics; 14. A Java web server; 15. Client/server with sockets; 16. Distributed computing; 17. Distributed computing - the client; 18. Java remote method invocation (RMI); 19. CORBA; 20. Distributed computing - putting it all together; 21. Introduction to web services and XML; Part III. Out of the Sandbox: 22. The Java native interface (JNI); 23. Accessing the platform; 24. Embedded Java; Appendices; Index.

  3. JavaTech, an Introduction to Scientific and Technical Computing with Java

    NASA Astrophysics Data System (ADS)

    Lindsey, Clark S.; Tolliver, Johnny S.; Lindblad, Thomas

    2010-06-01

    Preface; Acknowledgements; Part I. Introduction to Java: 1. Introduction; 2. Language basics; 3. Classes and objects in Java; 4. More about objects in Java; 5. Organizing Java files and other practicalities; 6. Java graphics; 7. Graphical user interfaces; 8. Threads; 9. Java input/output; 10. Java utilities; 11. Image handling and processing; 12. More techniques and tips; Part II. Java and the Network: 13. Java networking basics; 14. A Java web server; 15. Client/server with sockets; 16. Distributed computing; 17. Distributed computing - the client; 18. Java remote method invocation (RMI); 19. CORBA; 20. Distributed computing - putting it all together; 21. Introduction to web services and XML; Part III. Out of the Sandbox: 22. The Java native interface (JNI); 23. Accessing the platform; 24. Embedded Java; Appendices; Index.

  4. The influence of alignment-free sequence representations on the semi-supervised classification of class C G protein-coupled receptors: semi-supervised classification of class C GPCRs.

    PubMed

    Cruz-Barbosa, Raúl; Vellido, Alfredo; Giraldo, Jesús

    2015-02-01

    G protein-coupled receptors (GPCRs) are integral cell membrane proteins of relevance for pharmacology. The tertiary structure of the transmembrane domain, a gate to the study of protein functionality, is unknown for almost all members of class C GPCRs, which are the target of the current study. As a result, their investigation must often rely on alignments of their amino acid sequences. Sequence alignment entails the risk of missing relevant information. Various approaches have attempted to circumvent this risk through alignment-free transformations of the sequences on the basis of different amino acid physicochemical properties. In this paper, we use several of these alignment-free methods, as well as a basic amino acid composition representation, to transform the available sequences. Novel semi-supervised statistical machine learning methods are then used to discriminate the different class C GPCRs types from the transformed data. This approach is relevant due to the existence of orphan proteins to which type labels should be assigned in a process of deorphanization or reverse pharmacology. The reported experiments show that the proposed techniques provide accurate classification even in settings of extreme class-label scarcity and that fair accuracy can be achieved even with very simple transformation strategies that ignore the sequence ordering.

  5. GramAlign: fast alignment driven by grammar-based phylogeny.

    PubMed

    Russell, David J

    2014-01-01

    Multiple sequence alignment involves identifying related subsequences among biological sequences. When matches are found, the associated pieces are shifted so that when sequences are presented as successive rows (one sequence per row), homologous residues line up in columns. Exact alignment of more than a few sequences is known to be computationally prohibitive. Thus many heuristic algorithms have been developed to produce good alignments in an efficient amount of time by determining an order by which pairs of sequences are progressively aligned and merged. GRAMALIGN is such a progressive alignment algorithm that uses a grammar-based relative complexity distance metric to determine the alignment order. This technique allows for a computationally efficient and scalable program useful for aligning both large numbers of sequences and sets of long sequences quickly. The GRAMALIGN software is available at http://bioinfo.unl.edu/gramalign.php for both source code download and a web-based alignment server.

  6. Java PathFinder: A Translator From Java to Promela

    NASA Technical Reports Server (NTRS)

    Havelund, Klaus

    1999-01-01

    JAVA PATHFINDER, JPF, is a prototype translator from JAVA to PROMELA, the modeling language of the SPIN model checker. JPF is a product of a major effort by the Automated Software Engineering group at NASA Ames to make model checking technology part of the software process. Experience has shown that severe bugs can be found in final code using this technique, and that automated translation from a programming language to a modeling language like PROMELA can help reduce the effort required.

  7. Combining Multiple Pairwise Structure-based Alignments

    SciTech Connect

    2014-11-12

    CombAlign is a new Python code that generates a gapped, one-to-many, multiple structure-based sequence alignment (MSSA) given a set of pairwise structure-based alignments. In order to better define regions of similarity among related protein structures, it is useful to detect the residue-residue correspondences among a set of pairwise structure alignments. Few codes exist for constructing a one-to-many, multiple sequence alignment derived from a set of structure alignments, and we perceived a need for creating a new tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure.

  8. Java applets for Physics Instruction

    NASA Astrophysics Data System (ADS)

    Dukes, Phillip; Hatch, Dorian

    1998-10-01

    We will present a number of Web accessible single concept Java applets created at Brigham Young University and designed for teaching introductory physics. Java applet based problems, along with digitized video, are being developed as part of a web-based physics course at Brigham Young University. It is believed that video and animation can significantly influence students' visualization of some types of physics and that use of these types of tools could significantly improve their understanding of physics concepts.

  9. Model Checker for Java Programs

    NASA Technical Reports Server (NTRS)

    Visser, Willem

    2007-01-01

    Java Pathfinder (JPF) is a verification and testing environment for Java that integrates model checking, program analysis, and testing. JPF consists of a custom-made Java Virtual Machine (JVM) that interprets bytecode, combined with a search interface to allow the complete behavior of a Java program to be analyzed, including interleavings of concurrent programs. JPF is implemented in Java, and its architecture is highly modular to support rapid prototyping of new features. JPF is an explicit-state model checker, because it enumerates all visited states and, therefore, suffers from the state-explosion problem inherent in analyzing large programs. It is suited to analyzing programs less than 10kLOC, but has been successfully applied to finding errors in concurrent programs up to 100kLOC. When an error is found, a trace from the initial state to the error is produced to guide the debugging. JPF works at the bytecode level, meaning that all of Java can be model-checked. By default, the software checks for all runtime errors (uncaught exceptions), assertion violations (it supports Java's assert), and deadlocks. JPF uses garbage collection and symmetry reductions of the heap during model checking to reduce state-explosion, as well as dynamic partial order reductions to lower the number of interleavings analyzed. JPF is capable of symbolic execution of Java programs, including symbolic execution of complex data such as linked lists and trees. JPF is extensible as it allows for the creation of listeners that can subscribe to events during searches. The creation of dedicated code to be executed in place of regular classes is supported and allows users to easily handle native calls and to improve the efficiency of the analysis.
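
    To make the idea of explicit-state exploration of thread interleavings concrete, here is a toy sketch. It is not JPF and does not use its API: the tiny lock-only "programs", the state encoding, and the function name `find_deadlock` are all assumptions chosen for illustration.

```python
from collections import deque

def find_deadlock(threads, num_locks):
    """Breadth-first search over all interleavings. Returns a schedule (list of
    thread ids) that leads to a deadlock, or None if none is reachable."""
    initial = (tuple(0 for _ in threads), tuple(None for _ in range(num_locks)))
    visited = {initial}  # every state seen so far is stored explicitly
    queue = deque([(initial, [])])
    while queue:
        (pcs, owners), trace = queue.popleft()
        successors = []
        for tid, program in enumerate(threads):
            if pcs[tid] == len(program):
                continue  # this thread has finished
            op, lock = program[pcs[tid]]
            new_owners = list(owners)
            if op == "acquire":
                if owners[lock] is not None:
                    continue  # blocked: the lock is currently held
                new_owners[lock] = tid
            else:  # "release"
                new_owners[lock] = None
            new_pcs = list(pcs)
            new_pcs[tid] += 1
            successors.append(((tuple(new_pcs), tuple(new_owners)), trace + [tid]))
        unfinished = any(pc < len(p) for pc, p in zip(pcs, threads))
        if unfinished and not successors:
            return trace  # work remains, but no thread can take a step
        for state, new_trace in successors:
            if state not in visited:
                visited.add(state)
                queue.append((state, new_trace))
    return None

# Two threads that take the same two locks in opposite order.
t0 = [("acquire", 0), ("acquire", 1), ("release", 1), ("release", 0)]
t1 = [("acquire", 1), ("acquire", 0), ("release", 0), ("release", 1)]
print(find_deadlock([t0, t1], num_locks=2))
```

    The returned schedule plays the role of the error trace described above, and the `visited` set is where state explosion shows up as programs grow.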

  10. Java based open architecture controller

    SciTech Connect

    Weinert, G F

    2000-01-13

    At Lawrence Livermore National Laboratory (LLNL) the authors have been developing an open architecture machine tool controller. This work has been patterned after the General Motors (GM) led Open Modular Architecture Controller (OMAC) work, where they have been involved since its inception. The OMAC work has centered on creating sets of implementation neutral application programming interfaces (APIs) for machine control software components. In the work at LLNL, they were among the early adopters of the Java programming language. As an application programming language, it is particularly well suited for component software development. The language contains many features, which along with a well-defined implementation API (such as the OMAC APIs) allows third party binary files to be integrated into a working system. Because of its interpreted nature, Java allows rapid integration testing of components. However, for real-time systems development, the Java programming language presents many drawbacks. For instance, lack of well defined scheduling semantics and threading behavior can present many unwanted challenges. Also, the interpreted nature of the standard Java Virtual Machine (JVM) presents an immediate performance hit. Various real-time Java vendors are currently addressing some of these drawbacks. The various pluses and minuses of using the Java programming language and environment, with regard to a component-based controller, will be outlined.

  11. Going back to Java.

    PubMed

    Critchfield, R

    1985-01-01

    In Indonesia, achievements in food production have helped lower the country's death rates and increase life expectancy, making concern about the birthrate all the more critical, particularly in the already crowded Java. Indonesia's rice production in 1985 is expected to reach 26.3 million tons, 58% more than the 1975-79 average. With every country except Malaysia now self-sufficient or surplus in rice, the world market price for rice has dropped markedly. Indonesia's National Logistics Board (BULOG), which aims to establish a floor price for rice, has had to stockpile 3.5 million tons, double its normal reserve and enough for 3 years. Some of it has been kept 2 years already, but it cannot be exported as the quality is low and everybody else also has plenty of rice. Peasants and agriculture experts agree that alternatives to rice pose greater risks in terms of weather and disease. Whatever the government does, rice prices have dropped sharply and are likely to stay down. Fertilizer use can also be expected to decline for the 1st time in years. Indonesia is the scene of a scientific breakthrough, a new hybrid seed corn that grows in the tropics. If seed companies are able to sell seed for half of Indonesia's existing corn acreage, this would be an increase of 1.3 million tons, which would mostly be a surplus to be used for export, processing, or increased human or animal consumption. In revisiting Indonesia, the biggest disappointment is the failure of family planning to slow the rate of population growth more drastically. 5 years ago, Indonesia's family planning program, started in 1970, appeared a great success. Countrywide, the proportion of women aged 15-44 using contraceptives increased from almost nothing to almost 40% and in Bali topped 60%. Indonesia's overall annual population growth rate had dropped to 1.7%, raising hopes it could be brought down to the 1.2% rate of East Java and Bali by 1985. What has happened instead is that an unexpectedly fast

  12. Going back to Java.

    PubMed

    Critchfield, R

    1985-01-01

    In Indonesia, achievements in food production have helped lower the country's death rates and increase life expectancy, making concern about the birthrate all the more critical, particularly in the already crowded Java. Indonesia's rice production in 1985 is expected to reach 26.3 million tons, 58% more than the 1975-79 average. With every country except Malaysia now self-sufficient or surplus in rice, the world market price for rice has dropped markedly. Indonesia's National Logistics Board (BULOG), which aims to establish a floor price for rice, has had to stockpile 3.5 million tons, double its normal reserve and enough for 3 years. Some of it has been kept 2 years already, but it cannot be exported as the quality is low and everybody else also has plenty of rice. Peasants and agriculture experts agree that alternatives to rice pose greater risks in terms of weather and disease. Whatever the government does, rice prices have dropped sharply and are likely to stay down. Fertilizer use can also be expected to decline for the 1st time in years. Indonesia is the scene of a scientific breakthrough, a new hybrid seed corn that grows in the tropics. If seed companies are able to sell seed for half of Indonesia's existing corn acreage, this would be an increase of 1.3 million tons, which would mostly be a surplus to be used for export, processing, or increased human or animal consumption. In revisiting Indonesia, the biggest disappointment is the failure of family planning to slow the rate of population growth more drastically. 5 years ago, Indonesia's family planning program, started in 1970, appeared a great success. Countrywide, the proportion of women aged 15-44 using contraceptives increased from almost nothing to almost 40% and in Bali topped 60%. Indonesia's overall annual population growth rate had dropped to 1.7%, raising hopes it could be brought down to the 1.2% rate of East Java and Bali by 1985. What has happened instead is that an unexpectedly fast

  13. JAVA Stereo Display Toolkit

    NASA Technical Reports Server (NTRS)

    Edmonds, Karina

    2008-01-01

    This toolkit provides a common interface for displaying graphical user interface (GUI) components in stereo using either specialized stereo display hardware (e.g., liquid crystal shutter or polarized glasses) or anaglyph display (red/blue glasses) on standard workstation displays. An application using this toolkit will work without modification in either environment, allowing stereo software to reach a wider audience without sacrificing high-quality display on dedicated hardware. The toolkit is written in Java for use with the Swing GUI Toolkit and has cross-platform compatibility. It hooks into the graphics system, allowing any standard Swing component to be displayed in stereo. It uses the OpenGL graphics library to control the stereo hardware and to perform the rendering. It also supports anaglyph and special stereo hardware using the same API (application-program interface), and has the ability to simulate color stereo in anaglyph mode by combining the red band of the left image with the green/blue bands of the right image. This is a low-level toolkit that accomplishes simply the display of components (including the JadeDisplay image display component). It does not include higher-level functions such as disparity adjustment, 3D cursor, or overlays, all of which can be built using this toolkit.
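
    The color-anaglyph combination described here (red band from the left image, green and blue bands from the right) can be sketched in a few lines. The image representation as nested lists of (r, g, b) tuples and the function name are assumptions for illustration; the toolkit itself is written in Java and its API is not shown.

```python
def color_anaglyph(left, right):
    """Combine two equally sized RGB images, given as rows of (r, g, b) tuples.
    Each output pixel takes red from the left eye and green/blue from the right."""
    return [
        [(lp[0], rp[1], rp[2]) for lp, rp in zip(left_row, right_row)]
        for left_row, right_row in zip(left, right)
    ]

left_image = [[(200, 10, 10), (180, 20, 20)]]
right_image = [[(5, 150, 90), (15, 140, 80)]]
print(color_anaglyph(left_image, right_image))
```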

  14. Java Radar Analysis Tool

    NASA Technical Reports Server (NTRS)

    Zaczek, Mariusz P.

    2005-01-01

    Java Radar Analysis Tool (JRAT) is a computer program for analyzing two-dimensional (2D) scatter plots derived from radar returns showing pieces of the disintegrating Space Shuttle Columbia. JRAT can also be applied to similar plots representing radar returns showing aviation accidents, and to scatter plots in general. The 2D scatter plots include overhead map views and side altitude views. The superposition of points in these views makes searching difficult. JRAT enables three-dimensional (3D) viewing: by use of a mouse and keyboard, the user can rotate to any desired viewing angle. The 3D view can include overlaid trajectories and search footprints to enhance situational awareness in searching for pieces. JRAT also enables playback: time-tagged radar-return data can be displayed in time order and an animated 3D model can be moved through the scene to show the locations of the Columbia (or other vehicle) at the times of the corresponding radar events. The combination of overlays and playback enables the user to correlate a radar return with a position of the vehicle to determine whether the return is valid. JRAT can optionally filter single radar returns, enabling the user to selectively hide or highlight a desired radar return.

  15. Java Application Shell: A Framework for Piecing Together Java Applications

    NASA Technical Reports Server (NTRS)

    Miller, Philip; Powers, Edward I. (Technical Monitor)

    2001-01-01

    This session describes the architecture of Java Application Shell (JAS), a Swing-based framework for developing interactive Java applications. Java Application Shell is being developed by Commerce One, Inc. for NASA Goddard Space Flight Center Code 588. The purpose of JAS is to provide a framework for the development of Java applications, providing features that enable the development process to be more efficient, consistent and flexible. Fundamentally, JAS is based upon an architecture where an application is considered a collection of 'plugins'. In turn, a plug-in is a collection of Swing actions defined using XML and packaged in a jar file. Plug-ins may be local to the host platform or remotely-accessible through HTTP. Local and remote plugins are automatically discovered by JAS upon application startup; plugins may also be loaded dynamically without having to re-start the application. Using Extensible Markup Language (XML) to define actions, as opposed to hardcoding them in application logic, allows easier customization of application-specific operations by separating application logic from presentation. Through XML, a developer defines an action that may appear on any number of menus, toolbars, and buttons. Actions maintain and propagate enable/disable states and specify icons, tool-tips, titles, etc. Furthermore, JAS allows actions to be implemented using various scripting languages through the use of IBM's Bean Scripting Framework. Scripted action implementation is seamless to the end-user. In addition to action implementation, scripts may be used for application and unit-level testing. In the case of application-level testing, JAS has hooks to assist a script in simulating end-user input. JAS also provides property and user preference management, JavaHelp, Undo/Redo, Multi-Document Interface, Single-Document Interface, printing, and logging. Finally, Jini technology has also been included into the framework by means of a Jini services browser and the

  16. Model Checking Real Time Java Using Java PathFinder

    NASA Technical Reports Server (NTRS)

    Lindstrom, Gary; Mehlitz, Peter C.; Visser, Willem

    2005-01-01

    The Real Time Specification for Java (RTSJ) is an augmentation of Java for real time applications of various degrees of hardness. The central features of RTSJ are real time threads; user defined schedulers; asynchronous events, handlers, and control transfers; a priority inheritance based default scheduler; non-heap memory areas such as immortal and scoped, and non-heap real time threads whose execution is not impeded by garbage collection. The Robust Software Systems group at NASA Ames Research Center has JAVA PATHFINDER (JPF) under development, a Java model checker. JPF at its core is a state exploring JVM which can examine alternative paths in a Java program (e.g., via backtracking) by trying all nondeterministic choices, including thread scheduling order. This paper describes our implementation of an RTSJ profile (subset) in JPF, including requirements, design decisions, and current implementation status. Two examples are analyzed: jobs on a multiprogramming operating system, and a complex resource contention example involving autonomous vehicles crossing an intersection. The utility of JPF in finding logic and timing errors is illustrated, and the remaining challenges in supporting all of RTSJ are assessed.

  17. Monitoring Java Programs with Java PathExplorer

    NASA Technical Reports Server (NTRS)

    Havelund, Klaus; Rosu, Grigore; Clancy, Daniel (Technical Monitor)

    2001-01-01

    We present recent work on the development of Java PathExplorer (JPAX), a tool for monitoring the execution of Java programs. JPAX can be used during program testing to gain increased information about program executions, and can potentially furthermore be applied during operation to survey safety critical systems. The tool facilitates automated instrumentation of a program's bytecode, which will then emit events to an observer during its execution. The observer checks the events against user provided high level requirement specifications, for example temporal logic formulae, and against lower level error detection procedures, for example concurrency-related ones such as deadlock and data race detection algorithms. High level requirement specifications together with their underlying logics are defined in the Maude rewriting logic, and then can either be directly checked using the Maude rewriting engine, or be first translated to efficient data structures and then checked in Java.

  18. AVID: A global alignment program.

    PubMed

    Bray, Nick; Dubchak, Inna; Pachter, Lior

    2003-01-01

    In this paper we describe a new global alignment method called AVID. The method is designed to be fast, memory efficient, and practical for sequence alignments of large genomic regions up to megabases long. We present numerous applications of the method, ranging from the comparison of assemblies to alignment of large syntenic genomic regions and whole genome human/mouse alignments. We have also performed a quantitative comparison of AVID with other popular alignment tools. To this end, we have established a format for the representation of alignments and methods for their comparison. These formats and methods should be useful for future studies. The tools we have developed for the alignment comparisons, as well as the AVID program, are publicly available. See Web Site References section for AVID Web address and Web addresses for other programs discussed in this paper. PMID:12529311
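
    For readers unfamiliar with global alignment, the classic Needleman-Wunsch dynamic program gives the baseline formulation. The sketch below is that textbook method, not AVID's own algorithm, and the scoring values are arbitrary assumptions. It keeps only two rows of the table, so memory grows with one sequence length and not with the product of both.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of a and b under a linear gap penalty."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "AGT"))
```

    The quadratic running time of this full table is what becomes impractical at megabase scale, which is the setting the abstract targets.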

  19. AVID: A global alignment program.

    PubMed

    Bray, Nick; Dubchak, Inna; Pachter, Lior

    2003-01-01

    In this paper we describe a new global alignment method called AVID. The method is designed to be fast, memory efficient, and practical for sequence alignments of large genomic regions up to megabases long. We present numerous applications of the method, ranging from the comparison of assemblies to alignment of large syntenic genomic regions and whole genome human/mouse alignments. We have also performed a quantitative comparison of AVID with other popular alignment tools. To this end, we have established a format for the representation of alignments and methods for their comparison. These formats and methods should be useful for future studies. The tools we have developed for the alignment comparisons, as well as the AVID program, are publicly available. See Web Site References section for AVID Web address and Web addresses for other programs discussed in this paper.

  20. Enhancing Web applications in radiology with Java: estimating MR imaging relaxation times.

    PubMed

    Dagher, A P; Fitzpatrick, M; Flanders, A E; Eng, J

    1998-01-01

    Java is a relatively new programming language that has been used to develop a World Wide Web-based tool for estimating magnetic resonance (MR) imaging relaxation times, thereby demonstrating how Java may be used for Web-based radiology applications beyond improving the user interface of teaching files. A standard processing algorithm coded with Java is downloaded along with the hypertext markup language (HTML) document. The user (client) selects the desired pulse sequence and inputs data obtained from a region of interest on the MR images. The algorithm is used to modify selected MR imaging parameters in an equation that models the phenomenon being evaluated. MR imaging relaxation times are estimated, and confidence intervals and a P value expressing the accuracy of the final results are calculated. Design features such as simplicity, object-oriented programming, and security restrictions allow Java to expand the capabilities of HTML by offering a more versatile user interface that includes dynamic annotations and graphics. Java also allows the client to perform more sophisticated information processing and computation than is usually associated with Web applications. Java is likely to become a standard programming option, and the development of stand-alone Java applications may become more common as Java is integrated into future versions of computer operating systems.

  1. MzJava: An open source library for mass spectrometry data processing.

    PubMed

    Horlacher, Oliver; Nikitin, Frederic; Alocci, Davide; Mariethoz, Julien; Müller, Markus; Lisacek, Frederique

    2015-11-01

    Mass spectrometry (MS) is a widely used and evolving technique for the high-throughput identification of molecules in biological samples. The need for sharing and reuse of code among bioinformaticians working with MS data prompted the design and implementation of MzJava, an open-source Java Application Programming Interface (API) for MS related data processing. MzJava provides data structures and algorithms for representing and processing mass spectra and their associated biological molecules, such as metabolites, glycans and peptides. MzJava includes functionality to perform mass calculation, peak processing (e.g. centroiding, filtering, transforming), spectrum alignment and clustering, protein digestion, fragmentation of peptides and glycans as well as scoring functions for spectrum-spectrum and peptide/glycan-spectrum matches. For data import and export MzJava implements readers and writers for commonly used data formats. For many classes support for the Hadoop MapReduce (hadoop.apache.org) and Apache Spark (spark.apache.org) frameworks for cluster computing was implemented. The library has been developed applying best practices of software engineering. To ensure that MzJava contains code that is correct and easy to use the library's API was carefully designed and thoroughly tested. MzJava is an open-source project distributed under the AGPL v3.0 licence. MzJava requires Java 1.7 or higher. Binaries, source code and documentation can be downloaded from http://mzjava.expasy.org and https://bitbucket.org/sib-pig/mzjava. This article is part of a Special Issue entitled: Computational Proteomics.

  2. MzJava: An open source library for mass spectrometry data processing.

    PubMed

    Horlacher, Oliver; Nikitin, Frederic; Alocci, Davide; Mariethoz, Julien; Müller, Markus; Lisacek, Frederique

    2015-11-01

    Mass spectrometry (MS) is a widely used and evolving technique for the high-throughput identification of molecules in biological samples. The need for sharing and reuse of code among bioinformaticians working with MS data prompted the design and implementation of MzJava, an open-source Java Application Programming Interface (API) for MS related data processing. MzJava provides data structures and algorithms for representing and processing mass spectra and their associated biological molecules, such as metabolites, glycans and peptides. MzJava includes functionality to perform mass calculation, peak processing (e.g. centroiding, filtering, transforming), spectrum alignment and clustering, protein digestion, fragmentation of peptides and glycans as well as scoring functions for spectrum-spectrum and peptide/glycan-spectrum matches. For data import and export MzJava implements readers and writers for commonly used data formats. For many classes support for the Hadoop MapReduce (hadoop.apache.org) and Apache Spark (spark.apache.org) frameworks for cluster computing was implemented. The library has been developed applying best practices of software engineering. To ensure that MzJava contains code that is correct and easy to use the library's API was carefully designed and thoroughly tested. MzJava is an open-source project distributed under the AGPL v3.0 licence. MzJava requires Java 1.7 or higher. Binaries, source code and documentation can be downloaded from http://mzjava.expasy.org and https://bitbucket.org/sib-pig/mzjava. This article is part of a Special Issue entitled: Computational Proteomics. PMID:26141507

  3. JAVA based LCD Reconstruction and Analysis Tools

    SciTech Connect

    Bower, G.

    2004-10-11

    We summarize the current status and future developments of the North American Group's Java-based system for studying physics and detector design issues at a linear collider. The system is built around Java Analysis Studio (JAS) an experiment-independent Java-based utility for data analysis. Although the system is an integrated package running in JAS, many parts of it are also standalone Java utilities.

  4. JavaGenes Molecular Evolution

    NASA Technical Reports Server (NTRS)

    Lohn, Jason; Smith, David; Frank, Jeremy; Globus, Al; Crawford, James

    2007-01-01

    JavaGenes is a general-purpose, evolutionary software system written in Java. It implements several versions of a genetic algorithm, simulated annealing, stochastic hill climbing, and other search techniques. This software has been used to evolve molecules, atomic force field parameters, digital circuits, Earth Observing Satellite schedules, and antennas. This version differs from version 0.7.28 in that it includes the molecule evolution code and other improvements. Except for the antenna code, JavaGenes is available for NASA Open Source distribution.

  5. Dynamic triggering of Lusi, East Java Basin

    NASA Astrophysics Data System (ADS)

    Lupi, Matteo; Saenger, Erik H.; Fuchs, Florian; Miller, Steve

    2016-04-01

    On the 27th of May 2006, an M6.3 strike slip earthquake struck beneath Yogyakarta, Java. Forty-seven hours later a mixture of mud, breccia, and gas reached the surface near Sidoarjo, 250 km from the epicenter, creating several mud vents aligned along a NW-SE direction. The mud eruption reached a peak of 180,000 m3 of erupted material per day and it is still ongoing. The major eruption crater was named Lusi and represents the surface expression of a newborn sedimentary-hosted hydrothermal system. Lusi flooded several villages causing a loss of approximately 4 billion to Indonesia. Previous geochemical and geological data suggest that the Yogyakarta earthquake may have reactivated parts of the Watukosek fault system, a strike slip structure upon which Lusi resides. The Watukosek fault system connects the East Java basin to the volcanic arc, which may explain the presence of both biogenic and thermogenic fluids. To quantify the effects of incoming seismic energy at Lusi we conducted a seismic wave propagation study on a geological model of Lusi's structure. A key feature of our model is a low velocity shear zone in the Kalibeng formation caused by elevated pore pressures, which is often neglected in other studies. Our analysis highlights the importance of the overall geological structure that focused the seismic energy causing elevated strain rates at depth. In particular, we show that body waves generated by the Yogyakarta earthquake may have induced liquefaction of the Kalibeng formation. As a consequence, the liquefied mud injected and reactivated parts of the Watukosek fault system. Our findings are in agreement with previous studies suggesting that Lusi was an unfortunate case of dynamic triggering promoted by the Yogyakarta earthquake.

  6. Alignment validation

    SciTech Connect

    ALICE; ATLAS; CMS; LHCb; Golling, Tobias

    2008-09-06

    The four experiments, ALICE, ATLAS, CMS and LHCb are currently under construction at CERN. They will study the products of proton-proton collisions at the Large Hadron Collider. All experiments are equipped with sophisticated tracking systems, unprecedented in size and complexity. Full exploitation of both the inner detector and the muon system requires an accurate alignment of all detector elements. Alignment information is deduced from dedicated hardware alignment systems and the reconstruction of charged particles. However, the system is degenerate which means the data is insufficient to constrain all alignment degrees of freedom, so the techniques are prone to converging on wrong geometries. This deficiency necessitates validation and monitoring of the alignment. An exhaustive discussion of means to validate is the subject of this document, including examples and plans from all four LHC experiments, as well as other high energy experiments.

  7. Features of the Java commodity grid kit.

    SciTech Connect

    von Laszewski, G.; Gawor, J.; Lane, P.; Rehn, N.; Russell, M.; Mathematics and Computer Science

    2002-11-01

    In this paper we report on the features of the Java Commodity Grid Kit (Java CoG Kit). The Java CoG Kit provides middleware for accessing Grid functionality from the Java framework. Java CoG Kit middleware is general enough to design a variety of advanced Grid applications with quite different user requirements. Access to the Grid is established via Globus Toolkit protocols, allowing the Java CoG Kit to also communicate with the services distributed as part of the C Globus Toolkit reference implementation. Thus, the Java CoG Kit provides Grid developers with the ability to utilize the Grid, as well as numerous additional libraries and frameworks developed by the Java community to enable network, Internet, enterprise and peer-to-peer computing. A variety of projects have successfully used the client libraries of the Java CoG Kit to access Grids driven by the C Globus Toolkit software. In this paper we also report on the efforts to develop server-side Java CoG Kit components. As part of this research we have implemented a prototype pure Java resource management system that enables one to run Grid jobs on platforms on which a Java virtual machine is supported, including Windows NT machines.

  8. Java: An Explosion on the Internet.

    ERIC Educational Resources Information Center

    Read, Tim; Hall, Hazel

    Summer 1995 saw the release, with considerable media attention, of draft versions of Sun Microsystems' Java computer programming language and the HotJava browser. Java has been heralded as the latest "killer" technology in the Internet explosion. Sun Microsystems and numerous companies including Microsoft, IBM, and Netscape have agreed upon…

  9. DNAAlignEditor: DNA alignment editor tool

    PubMed Central

    Sanchez-Villeda, Hector; Schroeder, Steven; Flint-Garcia, Sherry; Guill, Katherine E; Yamasaki, Masanori; McMullen, Michael D

    2008-01-01

    Background With advances in DNA re-sequencing methods and Next-Generation parallel sequencing approaches, there has been a large increase in genomic efforts to define and analyze the sequence variability present among individuals within a species. For very polymorphic species such as maize, this has led to a need for intuitive, user-friendly software that aids the biologist, often with naïve programming capability, in tracking, editing, displaying, and exporting multiple individual sequence alignments. To fill this need we have developed a novel DNA alignment editor. Results We have generated a nucleotide sequence alignment editor (DNAAlignEditor) that provides an intuitive, user-friendly interface for manual editing of multiple sequence alignments with functions for input, editing, and output of sequence alignments. The color-coding of nucleotide identity and the display of associated quality score aid in the manual alignment editing process. DNAAlignEditor works as a client/server tool having two main components: a relational database that collects the processed alignments and a user interface connected to the database through universal data access connectivity drivers. DNAAlignEditor can be used either as a stand-alone application or as a network application with multiple users concurrently connected. Conclusion We anticipate that this software will be of general interest to biologists and population geneticists in editing DNA sequence alignments and analyzing natural sequence variation regardless of species, and will be particularly useful for manual alignment editing of sequences in species with high levels of polymorphism. PMID:18366684

  10. Alignment fixture

    DOEpatents

    Bell, Grover C.; Gibson, O. Theodore

    1980-01-01

    A part alignment fixture is provided which may be used for precise variable lateral and tilt alignment relative to the fixture base of various shaped parts. The fixture may be used as a part holder for machining or inspection of parts or alignment of parts during assembly and the like. The fixture includes a precisely machined diameter disc-shaped hub adapted to receive the part to be aligned. The hub is nested in a guide plate which is adapted to carry two oppositely disposed pairs of positioning wedges so that the wedges may be reciprocatively positioned by means of respective micrometer screws. The sloping faces of the wedges contact the hub at respective quadrants of the hub periphery. The lateral position of the hub relative to the guide plate is adjusted by positioning the wedges with the associated micrometer screws. The tilt of the part is adjusted relative to a base plate, to which the guide plate is pivotally connected by means of a holding plate. Two pairs of oppositely disposed wedges are mounted for reciprocative lateral positioning by means of separate micrometer screws between flanges of the guide plate and the base plate. Once the wedges are positioned to achieve the proper tilt of the part or hub on which the part is mounted relative to the base plate, the fixture may be bolted to a machining, inspection, or assembly device.

  11. Curriculum Alignment.

    ERIC Educational Resources Information Center

    Crowell, Ronald; Tissot, Paula

    Curriculum alignment (CA) refers to the congruence of all the elements of a school's curriculum: curriculum goals; instructional program--what is taught and the materials used; and tests used to judge outcomes. CA can be a very powerful factor in improving schools. Although further research is needed on CA, there is…

  12. Java Mission Evaluation Workstation System

    NASA Technical Reports Server (NTRS)

    Pettinger, Ross; Watlington, Tim; Ryley, Richard; Harbour, Jeff

    2006-01-01

    The Java Mission Evaluation Workstation System (JMEWS) is a collection of applications designed to retrieve, display, and analyze both real-time and recorded telemetry data. This software is currently being used by both the Space Shuttle Program (SSP) and the International Space Station (ISS) program. JMEWS was written in the Java programming language to satisfy the requirement of platform independence. An object-oriented design was used to satisfy additional requirements and to make the software easily extendable. By virtue of its platform independence, JMEWS can be used on the UNIX workstations in the Mission Control Center (MCC) and on office computers. JMEWS includes an interactive editor that allows users to easily develop displays that meet their specific needs. The displays can be developed and modified while viewing data. By simply selecting a data source, the user can view real-time, recorded, or test data.

  13. The twilight zone of cis element alignments.

    PubMed

    Sebastian, Alvaro; Contreras-Moreira, Bruno

    2013-02-01

    Sequence alignment of proteins and nucleic acids is a routine task in bioinformatics. Although the comparison of complete peptides, genes or genomes can be undertaken with a great variety of tools, the alignment of short DNA sequences and motifs entails pitfalls that have not been fully addressed yet. Here we confront the structural superposition of transcription factors with the sequence alignment of their recognized cis elements. Our goals are (i) to test TFcompare (http://floresta.eead.csic.es/tfcompare), a structural alignment method for protein-DNA complexes; (ii) to benchmark the pairwise alignment of regulatory elements; (iii) to define the confidence limits and the twilight zone of such alignments and (iv) to evaluate the relevance of these thresholds with elements obtained experimentally. We find that the structure of cis elements and protein-DNA interfaces is significantly more conserved than their sequence and measure how this correlates with alignment errors when only sequence information is considered. Our results confirm that DNA motifs in the form of matrices produce better alignments than individual sequences. Finally, we report that empirical and theoretically derived twilight thresholds are useful for estimating the natural plasticity of regulatory sequences, and hence for filtering out unreliable alignments.

  14. JESS: Java extensible snakes system

    NASA Astrophysics Data System (ADS)

    McInerney, Tim; Akhavan Sharif, M. Reza; Pashotanizadeh, Nasrin

    2005-04-01

    Snakes (Active Contour Models) are powerful model-based image segmentation tools. Although researchers have proven them especially useful in medical image analysis over the past decade, Snakes have remained primarily in the academic world and they have not become widely used in clinical practice or widely available in commercial packages. A number of confusing and specialized variants exist and there has been no standard open-source implementation available. To address this problem, we present a Java Extensible Snakes System (JESS) that is general, portable, and extensible. The system uses Java Swing classes to allow for the rapid development of custom graphical user interfaces (GUIs). It also incorporates the Java Advanced Imaging (JAI) class library, which provides custom image preprocessing, image display and general image I/O. The Snakes algorithm itself is written in a hierarchical fashion, consisting of a general Snake class and several subclasses that span the main variants of Snakes including a new, powerful, robust subdivision-curve Snake. These subclasses can be easily and quickly extended and customized for any specific segmentation and analysis task. We demonstrate the utility of these classes for segmenting various anatomical structures from 2D medical images. We also demonstrate the effectiveness of JESS by using it to rapidly build a prototype semi-automatic sperm analysis system. The JESS software will be made publicly available in early 2005.

  15. ALIGNING JIG

    DOEpatents

    Culver, J.S.; Tunnell, W.C.

    1958-08-01

    A jig or device is described for setting or aligning an opening in one member relative to another member or structure, with a predetermined offset, or it may be used for measuring the amount of offset with which the parts have previously been set. This jig comprises two blocks rabbeted to each other, with means for securing the upper block to the lower block. The upper block has fingers for contacting one of the members to be aligned, the lower block is designed to ride in grooves within the reference member, and calibration marks are provided to determine the amount of offset. This jig is specially designed to align the collimating slits of a mass spectrometer.

  16. Image alignment

    SciTech Connect

    Dowell, Larry Jonathan

    2014-04-22

    Disclosed is a method and device for aligning at least two digital images. An embodiment may use frequency-domain transforms of small tiles created from each image to identify substantially similar, "distinguishing" features within each of the images, and then align the images together based on the location of the distinguishing features. To accomplish this, an embodiment may create equal sized tile sub-images for each image. A "key" for each tile may be created by performing a frequency-domain transform calculation on each tile. An information-distance difference between each possible pair of tiles on each image may be calculated to identify distinguishing features. From analysis of the information-distance differences of the pairs of tiles, a subset of tiles with high discrimination metrics in relation to other tiles may be located for each image. The subset of distinguishing tiles for each image may then be compared to locate tiles with substantially similar keys and/or information-distance metrics to other tiles of other images. Once similar tiles are located for each image, the images may be aligned in relation to the identified similar tiles.

  17. Jess, the Java expert system shell

    SciTech Connect

    Friedman-Hill, E.J.

    1997-11-01

    This report describes Jess, a clone of the popular CLIPS expert system shell written entirely in Java. Jess supports the development of rule-based expert systems which can be tightly coupled to code written in the powerful, portable Java language. The syntax of the Jess language is discussed, and a comprehensive list of supported functions is presented. A guide to extending Jess by writing Java code is also included.

  18. Performance and Scalability of the NAS Parallel Benchmarks in Java

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan A. (Technical Monitor)

    2002-01-01

    Several features make Java an attractive choice for scientific applications. In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for scientific applications.

  19. Implementation of the NAS Parallel Benchmarks in Java

    NASA Technical Reports Server (NTRS)

    Frumkin, Michael A.; Schultz, Matthew; Jin, Haoqiang; Yan, Jerry; Biegel, Bryan (Technical Monitor)

    2002-01-01

    Several features make Java an attractive choice for High Performance Computing (HPC). In order to gauge the applicability of Java to Computational Fluid Dynamics (CFD), we have implemented the NAS (NASA Advanced Supercomputing) Parallel Benchmarks in Java. The performance and scalability of the benchmarks point out the areas where improvement in Java compiler technology and in Java thread implementation would position Java closer to Fortran in the competition for CFD applications.

  20. Combining Multiple Pairwise Structure-based Alignments

    2014-11-12

    CombAlign is a new Python code that generates a gapped, one-to-many, multiple structure-based sequence alignment (MSSA) given a set of pairwise structure-based alignments. In order to better define regions of similarity among related protein structures, it is useful to detect the residue-residue correspondences among a set of pairwise structure alignments. Few codes exist for constructing a one-to-many, multiple sequence alignment derived from a set of structure alignments, and we perceived a need for creating a new tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure.

  1. DIDA: Distributed Indexing Dispatched Alignment

    PubMed Central

    Mohamadi, Hamid; Vandervalk, Benjamin P; Raymond, Anthony; Jackman, Shaun D; Chu, Justin; Breshears, Clay P; Birol, Inanc

    2015-01-01

    One essential application in bioinformatics that is affected by the high-throughput sequencing data deluge is the sequence alignment problem, where nucleotide or amino acid sequences are queried against targets to find regions of close similarity. When queries are too many and/or targets are too large, the alignment process becomes computationally challenging. This is usually addressed by preprocessing techniques, where the queries and/or targets are indexed for easy access while searching for matches. When the target is static, such as in an established reference genome, the cost of indexing is amortized by reusing the generated index. However, when the targets are non-static, such as contigs in the intermediate steps of a de novo assembly process, a new index must be computed for each run. To address such scalability problems, we present DIDA, a novel framework that distributes the indexing and alignment tasks into smaller subtasks over a cluster of compute nodes. It provides a workflow beyond the common practice of embarrassingly parallel implementations. DIDA is a cost-effective, scalable and modular framework for the sequence alignment problem in terms of memory usage and runtime. It can be employed in large-scale alignments to draft genomes and intermediate stages of de novo assembly runs. The DIDA source code, sample files and user manual are available through http://www.bcgsc.ca/platform/bioinfo/software/dida. The software is released under the British Columbia Cancer Agency License (BCCA), and is free for academic use. PMID:25923767
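
    The partition-index-dispatch workflow described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not DIDA's implementation: the function names are hypothetical, exact k-mer lookup stands in for a real aligner, and the partitions are processed in a single loop rather than on separate compute nodes.

```python
def partition_targets(targets, n_parts):
    """Split the target sequences round-robin into n_parts smaller sets."""
    parts = [dict() for _ in range(n_parts)]
    for idx, (name, seq) in enumerate(sorted(targets.items())):
        parts[idx % n_parts][name] = seq
    return parts


def build_index(part, k):
    """Index one partition: map every k-mer to the targets that contain it."""
    index = {}
    for name, seq in part.items():
        for pos in range(len(seq) - k + 1):
            index.setdefault(seq[pos:pos + k], set()).add(name)
    return index


def search(queries, indexes, k):
    """Look up each query in every partition index and merge the hits.

    Exact k-mer matching is a stand-in for alignment (an assumption).
    """
    hits = {}
    for qname, qseq in queries.items():
        found = set()
        for index in indexes:  # in a cluster, each index would live on its own node
            for pos in range(len(qseq) - k + 1):
                found |= index.get(qseq[pos:pos + k], set())
        hits[qname] = sorted(found)
    return hits


targets = {"t1": "ACGTACGT", "t2": "TTTTGGGG", "t3": "CCCCAAAA"}
queries = {"q1": "GTAC", "q2": "GGGG", "q3": "AAAC"}
indexes = [build_index(p, 4) for p in partition_targets(targets, 2)]
result = search(queries, indexes, 4)
```

    Each partition index is smaller than an index over all targets, which is the property that lets the per-node memory footprint shrink as nodes are added.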

  2. Principal component analysis implementation in Java

    NASA Astrophysics Data System (ADS)

    Wójtowicz, Sebastian; Belka, Radosław; Sławiński, Tomasz; Parian, Mahnaz

    2015-09-01

    In this paper we show how the PCA (Principal Component Analysis) method can be implemented using the Java programming language. We consider applying the PCA algorithm mainly to the analysis of data obtained from Raman spectroscopy measurements, but other applications of the developed software should also be possible. Our goal is to create a general purpose PCA application, ready to run on every platform which is supported by Java.
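
    As a rough illustration of what a PCA computation involves, the sketch below centers the data, forms the sample covariance matrix, and finds the first principal component by power iteration. It is written in Python for brevity and is not the authors' Java code. The function name and the choice of power iteration are assumptions made for the example.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, variance) of the first principal component.

    rows: list of equal-length numeric lists (one observation per row).
    Uses power iteration on the sample covariance matrix (an illustrative choice).
    """
    n, d = len(rows), len(rows[0])
    # Center each variable on its mean.
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centered = [[r[k] - means[k] for k in range(d)] for r in rows]
    # Sample covariance matrix (divides by n - 1).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalize.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # all variance is zero; keep the current direction
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance explained along v.
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, variance


direction, variance = first_principal_component([[1, 2], [2, 4], [3, 6], [4, 8]])
```

    For the four sample points, which lie exactly on the line y = 2x, the returned direction is proportional to (1, 2). Further components could be obtained by deflating the covariance matrix and repeating the iteration.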

  3. Sandia secure processor : a native Java processor.

    SciTech Connect

    Wickstrom, Gregory Lloyd; Gale, Jason Carl; Ma, Kwok Kee

    2003-08-01

    The Sandia Secure Processor (SSP) is a new native Java processor that has been specifically designed for embedded applications. The SSP's design is a system composed of a core Java processor that directly executes Java bytecodes, on-chip intelligent IO modules, and a suite of software tools for simulation and compiling executable binary files. The SSP is unique in that it provides a way to control real-time IO modules for embedded applications. The system software for the SSP is a 'class loader' that takes Java .class files (created with your favorite Java compiler), links them together, and compiles a binary. The complete SSP system provides very powerful functionality with very light hardware requirements with the potential to be used in a wide variety of small-system embedded applications. This paper gives a detailed description of the Sandia Secure Processor and its unique features.

  4. Accelerator and transport line survey and alignment

    SciTech Connect

    Ruland, R.E.

    1991-10-01

    This paper summarizes the survey and alignment processes of accelerators and transport lines and discusses the propagation of errors associated with these processes. The major geodetic principles governing the survey and alignment measurement space are introduced and their relationship to a lattice coordinate system shown. The paper continues with a broad overview of the activities involved in the step sequence from initial absolute alignment to final smoothing. Emphasis is given to the relative alignment of components, in particular to the importance of incorporating methods to remove residual systematic effects in surveying and alignment operations. Various approaches to smoothing used at major laboratories are discussed. 47 refs., 19 figs., 1 tab.

  5. A begomovirus associated with Ageratum yellow vein disease in Indonesia: evidence for natural recombination between tomato leaf curl Java virus and Ageratum yellow vein virus-[Java].

    PubMed

    Kon, T; Kuwabara, K; Hidayat, S H; Ikegami, M

    2007-01-01

    A begomovirus (2747 nucleotides) and a satellite DNA beta component (1360 nucleotides) have been isolated from Ageratum conyzoides L. plants with yellow vein symptoms growing in Java, Indonesia. The begomovirus is most closely related to Tomato leaf curl Java virus (ToLCJV) (91 and 98% in the total nucleotide and coat protein amino acid sequences, respectively), although the products of ORFs C1 and C4 are more closely related to those of Ageratum yellow vein virus-[Java] (91 and 95% identity, respectively). For this reason, the begomovirus is considered to be a strain of ToLCJV and is referred to as ToLCJV-Ageratum. The virus probably derives from a recombination event in which nucleotides 2389-2692 of ToLCJV have been replaced with the corresponding region of the AYVV-[Java] genome, which includes the 5' part of the intergenic region and the C1 and C4 ORFs. Infection of A. conyzoides with ToLCJV-Ageratum alone produced no symptoms, but co-infection with DNAbeta induced yellow vein symptoms. Symptoms induced in Nicotiana benthamiana by ToLCJV-Ageratum, ToLCJV and AYVV-[Java] are consistent with the exchange of pathogenicity determinant ORF C4 during recombination.

  6. BinAligner: a heuristic method to align biological networks

    PubMed Central

    2013-01-01

    The advances in high throughput omics technologies have made it possible to characterize molecular interactions within and across various species. Alignments and comparison of molecular networks across species will help detect orthologs and conserved functional modules and provide insights on the evolutionary relationships of the compared species. However, such analyses are not trivial due to the complexity of networks and high computational cost. Here we develop a mixed global and local algorithm, BinAligner, for network alignments. Based on the hypotheses that the similarity between two vertices across networks would be context dependent and that the information from the edges and the structures of subnetworks can be more informative than vertices alone, two scoring schemes, 1-neighborhood subnetwork and graphlet, were introduced to derive the scoring matrices between networks, besides the commonly used scoring scheme from vertices. Then the alignment problem is formulated as an assignment problem, which is solved by a combinatorial optimization algorithm such as the Hungarian method. The proposed algorithm was applied and validated in aligning the protein-protein interaction network of Kaposi's sarcoma associated herpesvirus (KSHV) and that of varicella zoster virus (VZV). Interestingly, we identified several putative functional orthologous proteins with similar functions but very low sequence similarity between the two viruses. For example, KSHV open reading frame 56 (ORF56) and VZV ORF55 are helicase-primase subunits with sequence identity 14.6%, and KSHV ORF75 and VZV ORF44 are tegument proteins with sequence identity 15.3%. These functional pairs cannot be identified if one restricts the alignment to orthologous protein pairs. In addition, BinAligner identified a conserved pathway between two viruses, which consists of 7 orthologous protein pairs and these proteins are connected by conserved links. This pathway might be crucial for virus packing and

  7. Fine-tuning structural RNA alignments in the twilight zone

    PubMed Central

    2010-01-01

    Background A widely used method to find conserved secondary structure in RNA is to first construct a multiple sequence alignment, and then fold the alignment, optimizing a score based on thermodynamics and covariance. This method works best around 75% sequence similarity. However, in a "twilight zone" below 55% similarity, the sequence alignment tends to obscure the covariance signal used in the second phase. Therefore, while the overall shape of the consensus structure may still be found, the degree of conservation cannot be estimated reliably. Results Based on a combination of available methods, we present a method named planACstar for improving structure conservation in structural alignments in the twilight zone. After constructing a consensus structure by alignment folding, planACstar abandons the original sequence alignment, refolds the sequences individually but consistently with the consensus, aligns the structures, irrespective of sequence, by a pure structure alignment method, and derives an improved sequence alignment from the alignment of structures, to be re-submitted to alignment folding, and so on. This cycle may be iterated as long as structural conservation improves, but normally, one step suffices. Conclusions Employing the tools ClustalW, RNAalifold, and RNAforester, we find that for sequences with 30-55% sequence identity, structural conservation can be improved by 10% on average, with a large variation, measured in terms of RNAalifold's own criterion, the structure conservation index. PMID:20433706

  8. Editor's Highlight: Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS): A Web-Based Tool for Addressing the Challenges of Cross-Species Extrapolation of Chemical Toxicity.

    PubMed

    LaLone, Carlie A; Villeneuve, Daniel L; Lyons, David; Helgen, Henry W; Robinson, Serina L; Swintek, Joseph A; Saari, Travis W; Ankley, Gerald T

    2016-10-01

    Conservation of a molecular target across species can be used as a line-of-evidence to predict the likelihood of chemical susceptibility. The web-based Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS; https://seqapass.epa.gov/seqapass/) application was developed to simplify, streamline, and quantitatively assess protein sequence/structural similarity across taxonomic groups as a means to predict relative intrinsic susceptibility. The intent of the tool is to allow for evaluation of any potential protein target while remaining amenable to variable degrees of protein characterization, in the context of available information about the chemical/protein interaction and the molecular target itself. To accommodate this flexibility in the analysis, 3 levels of evaluation were developed. The first level of the SeqAPASS analysis compares primary amino acid sequences to a query sequence, calculating a metric for sequence similarity (including detection of orthologs); the second level evaluates sequence similarity within selected functional domains (eg, ligand-binding domain); and the third level of analysis compares individual amino acid residue positions of importance for protein conformation and/or interaction with the chemical upon binding. Each level of the SeqAPASS analysis provides additional evidence to apply toward rapid, screening-level assessments of probable cross species susceptibility. Such analyses can support prioritization of chemicals for further evaluation, selection of appropriate species for testing, extrapolation of empirical toxicity data, and/or assessment of the cross-species relevance of adverse outcome pathways. Three case studies are described herein to demonstrate application of the SeqAPASS tool: the first 2 focused on predictions of pollinator susceptibility to molt-accelerating compounds and neonicotinoid insecticides, and the third on evaluation of cross-species susceptibility to strobilurin fungicides. These analyses

  10. Bringing Interactivity to the Web: The JAVA Solution.

    ERIC Educational Resources Information Center

    Knee, Richard H.; Cafolla, Ralph

    Java is an object-oriented programming language of the Internet. Its popularity lies in its ability to create interactive Web sites across platforms. The most common Java programs are applications and applets, which adhere to a set of conventions that lets them run within a Java-compatible browser. Java is becoming an essential subject matter and…

  11. Multiple alignment using hidden Markov models

    SciTech Connect

    Eddy, S.R.

    1995-12-31

    A simulated annealing method is described for training hidden Markov models and producing multiple sequence alignments from initially unaligned protein or DNA sequences. Simulated annealing in turn uses a dynamic programming algorithm for correctly sampling suboptimal multiple alignments according to their probability and a Boltzmann temperature factor. The quality of simulated annealing alignments is evaluated on structural alignments of ten different protein families, and compared to the performance of other HMM training methods and the ClustalW program. Simulated annealing is better able to find near-global optima in the multiple alignment probability landscape than the other tested HMM training methods. Neither ClustalW nor simulated annealing produces consistently better alignments than the other. Examination of the specific cases in which ClustalW outperforms simulated annealing, and vice versa, provides insight into the strengths and weaknesses of current hidden Markov model approaches.

  12. Java and its future in biomedical computing.

    PubMed Central

    Rodgers, R P

    1996-01-01

    Java, a new object-oriented computing language related to C++, is receiving considerable attention due to its use in creating network-sharable, platform-independent software modules (known as "applets") that can be used with the World Wide Web. The Web has rapidly become the most commonly used information-retrieval tool associated with the global computer network known as the Internet, and Java has the potential to further accelerate the Web's application to medical problems. Java's potentially wide acceptance due to its Web association and its own technical merits also suggests that it may become a popular language for non-Web-based, object-oriented computing. PMID:8880677

  13. Alignment algorithm for homology modeling and threading.

    PubMed Central

    Alexandrov, N. N.; Luethy, R.

    1998-01-01

    A DNA/protein sequence comparison is a popular computational tool for molecular biologists. Finding a good alignment implies an evolutionary and/or functional relationship between proteins or genomic loci. Sequential similarity between two proteins indicates their structural resemblance, providing a practical approach for structural modeling, when structure of one of these proteins is known. The first step in the homology modeling is a construction of an accurate sequence alignment. The commonly used alignment algorithms do not provide an adequate treatment of the structurally mismatched residues in locally dissimilar regions. We propose a simple modification of the existing alignment algorithm which treats these regions properly and demonstrate how this modification improves sequence alignments in real proteins. PMID:9521100
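
    As a point of reference for the "commonly used alignment algorithms" mentioned above, the following is a minimal sketch of standard global alignment scoring (Needleman-Wunsch with a linear gap penalty). It is not the authors' modification; the function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment of a and b.

    Standard Needleman-Wunsch dynamic programming with a linear gap
    penalty. The scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and row: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

    Every residue pair receives the same substitution and gap treatment here, which is the kind of uniform handling that a structure-aware modification would need to refine in locally dissimilar regions.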

  14. Analysis of variables affecting unemployment rate and detecting for cluster in West Java, Central Java, and East Java in 2012

    NASA Astrophysics Data System (ADS)

    Samuel, Putra A.; Widyaningsih, Yekti; Lestari, Dian

    2016-02-01

    The objective of this study is to model the Unemployment Rate (UR) in West Java, Central Java, and East Java, with rate of disease, infant mortality rate, educational level, population size, proportion of married people, and GDRP as the explanatory variables. Spatial factors are also considered in the modeling, since the closer two areas are, the higher their correlation. This study uses secondary data from BPS (Badan Pusat Statistik). The data are analyzed using the Moran's I test, to obtain information about spatial dependence, and using Spatial Autoregressive modeling, to determine which variables significantly affect UR and how strong the influence of the spatial factors is. The results show that the proportion of married people, rate of disease, and population size are significantly related to UR. Hotspots of unemployment among the districts/cities of all three regions are also detected using the Spatial Scan Statistics method. The results are: 22 districts/cities as a regional group with the highest unemployment (most likely cluster) in the study area; 2 districts/cities as a regional group with the highest unemployment in West Java; 1 district/city as a regional group with the highest unemployment in Central Java; and 15 districts/cities as a regional group with the highest unemployment in East Java.
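
    To make the spatial dependence measure concrete, here is a minimal sketch of the global Moran's I statistic, I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where W is the sum of all weights. The toy values and the chain-shaped neighbour matrix are assumptions for illustration and are not data from the study.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed in n regions.

    weights[i][j] is the spatial weight between regions i and j
    (for example 1 for neighbours and 0 otherwise).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


# Hypothetical example: four regions arranged in a line, each a
# neighbour of the next one.
CHAIN = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

if __name__ == "__main__":
    print(morans_i([1, 2, 3, 4], CHAIN))    # smooth gradient
    print(morans_i([1, -1, 1, -1], CHAIN))  # alternating pattern
```

    Positive values indicate that neighbouring regions tend to have similar values, and negative values indicate that neighbours tend to differ.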

  15. Factor D of the alternative pathway of human complement. Purification, alignment and N-terminal amino acid sequences of the major cyanogen bromide fragments, and localization of the serine residue at the active site.

    PubMed Central

    Johnson, D M; Gagnon, J; Reid, K B

    1980-01-01

    The serine esterase factor D of the complement system was purified from outdated human plasma with a yield of 20% of the initial haemolytic activity found in serum. This represented an approx. 60 000-fold purification. The final product was homogeneous as judged by sodium dodecyl sulphate/polyacrylamide-gel electrophoresis (with an apparent mol.wt. of 24 000), its migration as a single component in a variety of fractionation procedures based on size and charge, and its N-terminal amino-acid-sequence analysis. The N-terminal amino acid sequence of the first 36 residues of the intact molecule was found to be homologous with the N-terminal amino acid sequences of the catalytic chains of other serine esterases. Factor D showed an especially strong homology (greater than 60% identity) with rat 'group-specific protease' [Woodbury, Katunuma, Kobayashi, Titani, & Neurath (1978) Biochemistry 17, 811-819] over the first 16 amino acid residues. This similarity is of interest since it is considered that both enzymes may be synthesized in their active, rather than zymogen, forms. The three major CNBr fragments of factor D, which had apparent mol.wts. of 15 800, 6600 and 1700, were purified and then aligned by N-terminal amino acid sequence analysis and amino acid analysis. By using factor D labelled with di-[1,3-14C]isopropylphosphofluoridate it was shown that the CNBr fragment of apparent mol.wt. 6600, which is located in the C-terminal region of factor D, contained the active serine residue. The amino acid sequence around this residue was determined. PMID:6821372

  16. Java Parallel Secure Stream for Grid Computing

    SciTech Connect

    Chen, Jie; Akers, Walter; Chen, Ying; Watson, William

    2001-09-01

    The emergence of high-speed wide area networks makes grid computing a reality. However, grid applications that need reliable data transfer still have difficulty achieving optimal TCP performance, because the TCP window size must be tuned to improve bandwidth and reduce latency on a high-speed wide area network. This paper presents a pure Java package called JPARSS (Java Parallel Secure Stream) that divides data into partitions that are sent over several parallel Java streams simultaneously and allows Java or Web applications to achieve optimal TCP performance in a grid environment without the necessity of tuning the TCP window size. Several experimental results are provided to show that using parallel streams is more effective than tuning the TCP window size. In addition, an X.509 certificate based single sign-on mechanism and SSL based connection establishment are integrated into this package. Finally, a few applications using this package will be discussed.
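
    The partition-and-parallel-stream idea can be sketched as follows. This is not the JPARSS API: it is a Python illustration that uses local socket pairs in place of wide-area TCP connections, omits the X.509 and SSL layers entirely, and uses a hypothetical function name.

```python
import socket
import threading


def send_over_parallel_streams(data: bytes, n_streams: int) -> bytes:
    """Split data into n_streams partitions, move each one over its own
    local socket pair concurrently, and reassemble them in order."""
    size = max(1, -(-len(data) // n_streams))  # ceiling division
    parts = [data[i * size:(i + 1) * size] for i in range(n_streams)]
    received = [b""] * n_streams

    def sender(sock, chunk):
        with sock:  # closing the socket signals end-of-stream
            sock.sendall(chunk)

    def receiver(index, sock):
        buffer = bytearray()
        with sock:
            while True:
                block = sock.recv(65536)
                if not block:
                    break
                buffer.extend(block)
        received[index] = bytes(buffer)  # slot index preserves the order

    threads = []
    for index, chunk in enumerate(parts):
        left, right = socket.socketpair()
        threads.append(threading.Thread(target=sender, args=(left, chunk)))
        threads.append(threading.Thread(target=receiver, args=(index, right)))
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return b"".join(received)


if __name__ == "__main__":
    payload = bytes(range(256)) * 1000
    print(send_over_parallel_streams(payload, 4) == payload)
```

    Each partition travels on its own connection, so no single stream's window limits the whole transfer; the receiver only needs the partition index to restore the original order.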

  17. Java PathFinder User Guide

    NASA Technical Reports Server (NTRS)

    Havelund, Klaus

    1999-01-01

    The JAVA PATHFINDER, JPF, is a translator from a subset of JAVA 1.0 to PROMELA, the programming language of the SPIN model checker. The purpose of JPF is to establish a framework for verification and debugging of JAVA programs based on model checking. The main goal is to automate program verification such that a programmer can apply it in daily work without the need for a specialist to manually reformulate a program into a different notation in order to analyze the program. The system is especially suited for analyzing multi-threaded JAVA applications, where normal testing usually falls short. The system can find deadlocks and violations of boolean assertions stated by the programmer in a special assertion language. This document explains how to use JPF.

  18. On comparing two structured RNA multiple alignments.

    PubMed

    Patel, Vandanaben; Wang, Jason T L; Setia, Shefali; Verma, Anurag; Warden, Charles D; Zhang, Kaizhong

    2010-12-01

    We present a method, called BlockMatch, for aligning two blocks, where a block is an RNA multiple sequence alignment with the consensus secondary structure of the alignment in Stockholm format. The method employs a quadratic-time dynamic programming algorithm for aligning columns and column pairs of the multiple alignments in the blocks. Unlike many other tools that can perform pairwise alignment of either single sequences or structures only, BlockMatch takes into account the characteristics of all the sequences in the blocks along with their consensus structures during the alignment process, thus being able to achieve a high-quality alignment result. We apply BlockMatch to phylogeny reconstruction on a set of 5S rRNA sequences taken from fifteen bacteria species. Experimental results showed that the phylogenetic tree generated by our method is more accurate than the tree constructed based on the widely used ClustalW tool. The BlockMatch algorithm is implemented into a web server, accessible at http://bioinformatics.njit.edu/blockmatch. A jar file of the program is also available for download from the web server. PMID:21121021

  20. Muria Volcano, Island of Java, Indonesia

    NASA Technical Reports Server (NTRS)

    1991-01-01

    This view of the north coast of central Java, Indonesia centers on the currently inactive Muria Volcano (6.5S, 111.0E). Muria is 5,330 ft. tall and lies just north of Java's main volcanic belt which runs east - west down the spine of the island attesting to the volcanic origin of the more than 1,500 Indonesian Islands.

  1. Multiparadigm communications in Java for Grid computing.

    SciTech Connect

    Getov, V.; von Laszewski, G.; Philippsen, M.; Foster, I.; Mathematics and Computer Science; Univ. of Westminster; Univ. of Karlsruhe

    2001-01-01

    In this article, we argue that the rapid development of Java technology now makes it possible to support, in a single object-oriented framework, the different communication and coordination structures that arise in scientific applications. We outline how this integrated approach can be achieved, reviewing in the process the state-of-the-art in communication paradigms within Java. We also present recent evaluation results indicating that this integrated approach can be achieved without compromising on performance.

  2. Generation of Java code from Alvis model

    NASA Astrophysics Data System (ADS)

    Matyasik, Piotr; Szpyrka, Marcin; Wypych, Michał

    2015-12-01

    Alvis is a formal language that combines graphical modelling of interconnections between system entities (called agents) and a high level programming language to describe the behaviour of any individual agent. An Alvis model can be verified formally with model checking techniques applied to the model LTS graph that represents the model state space. This paper presents the transformation of an Alvis model into executable Java code. Thus, the approach provides a method of automatic generation of a Java application from a formally verified Alvis model.

  4. Java simulations of embedded control systems.

    PubMed

    Farias, Gonzalo; Cervin, Anton; Arzén, Karl-Erik; Dormido, Sebastián; Esquembre, Francisco

    2010-01-01

    This paper introduces a new Open Source Java library suited for the simulation of embedded control systems. The library is based on the ideas and architecture of TrueTime, a toolbox of Matlab devoted to this topic, and allows Java programmers to simulate the performance of control processes which run in a real time environment. Such simulations can improve considerably the learning and design of multitasking real-time systems. The choice of Java increases considerably the usability of our library, because many educators program already in this language. But also because the library can be easily used by Easy Java Simulations (EJS), a popular modeling and authoring tool that is increasingly used in the field of Control Education. EJS allows instructors, students, and researchers with less programming capabilities to create advanced interactive simulations in Java. The paper describes the ideas, implementation, and sample use of the new library both for pure Java programmers and for EJS users. The JTT library and some examples are online available on http://lab.dia.uned.es/jtt. PMID:22163674

  5. Java Simulations of Embedded Control Systems

    PubMed Central

    Farias, Gonzalo; Cervin, Anton; Årzén, Karl-Erik; Dormido, Sebastián; Esquembre, Francisco

    2010-01-01

    This paper introduces a new Open Source Java library suited for the simulation of embedded control systems. The library is based on the ideas and architecture of TrueTime, a toolbox of Matlab devoted to this topic, and allows Java programmers to simulate the performance of control processes which run in a real time environment. Such simulations can improve considerably the learning and design of multitasking real-time systems. The choice of Java increases considerably the usability of our library, because many educators program already in this language. But also because the library can be easily used by Easy Java Simulations (EJS), a popular modeling and authoring tool that is increasingly used in the field of Control Education. EJS allows instructors, students, and researchers with less programming capabilities to create advanced interactive simulations in Java. The paper describes the ideas, implementation, and sample use of the new library both for pure Java programmers and for EJS users. The JTT library and some examples are online available on http://lab.dia.uned.es/jtt. PMID:22163674

  6. Analyses of the radiation of birnaviruses from diverse host phyla and of their evolutionary affinities with other double-stranded RNA and positive strand RNA viruses using robust structure-based multiple sequence alignments and advanced phylogenetic methods

    PubMed Central

    2013-01-01

    Background Birnaviruses form a distinct family of double-stranded RNA viruses infecting animals as different as vertebrates, mollusks, insects and rotifers. With such a wide host range, they constitute a good model for studying the adaptation to the host. Additionally, several lines of evidence link birnaviruses to positive strand RNA viruses and suggest that phylogenetic analyses may provide clues about transition. Results We characterized the genome of a birnavirus from the rotifer Brachionus plicatilis. We used X-ray structures of RNA-dependent RNA polymerases and capsid proteins to obtain multiple structure alignments that allowed us to obtain reliable multiple sequence alignments and we employed “advanced” phylogenetic methods to study the evolutionary relationships between some positive strand and double-stranded RNA viruses. We showed that the rotifer birnavirus genome exhibited an organization remarkably similar to other birnaviruses. As this host was phylogenetically very distant from the other known species targeted by birnaviruses, we revisited the evolutionary pathways within the Birnaviridae family using phylogenetic reconstruction methods. We also applied a number of phylogenetic approaches based on structurally conserved domains/regions of the capsid and RNA-dependent RNA polymerase proteins to study the evolutionary relationships between birnaviruses, other double-stranded RNA viruses and positive strand RNA viruses. Conclusions We show that there is a good correlation between the phylogeny of the birnaviruses and that of their hosts at the phylum level using the RNA-dependent RNA polymerase (genomic segment B) on the one hand and a concatenation of the capsid protein, protease and ribonucleoprotein (genomic segment A) on the other hand. This correlation tends to vanish within phyla. The use of advanced phylogenetic methods and robust structure-based multiple sequence alignments allowed us to obtain a more accurate picture (in terms of

  7. Java Expert System Shell Version 6.0

    SciTech Connect

    Friedman-Hill, Ernest

    2002-06-18

    Java Expert System Shell (Jess) is a rule engine and scripting environment written entirely in Sun's Java language. Jess was originally inspired by the CLIPS expert system shell, but has grown into a complete, distinct Java-influenced environment of its own. Using Jess, you can build Java applets and applications that have the capacity to "reason" using knowledge you supply in the form of declarative rules. Jess is surprisingly fast, and for some problems is faster than CLIPS. Many Jess scripts are valid CLIPS scripts and vice versa. Like CLIPS, Jess uses the Rete algorithm to process rules, a very efficient mechanism for solving the difficult many-to-many matching problem. Jess adds many features to CLIPS, including backwards chaining and the ability to manipulate and directly reason about Java objects. Jess is also a powerful Java scripting environment, from which you can create Java objects and call Java methods without compiling any Java code.
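
    The idea of reasoning from declarative rules can be illustrated with a naive forward-chaining loop. This sketch is neither Jess syntax nor the Rete algorithm: it rescans every rule on each pass, which is exactly the repeated matching work that Rete is designed to avoid. The rule and fact names are hypothetical.

```python
def forward_chain(facts, rules):
    """Derive every fact reachable from the initial facts.

    rules is a list of (premises, conclusion) pairs. A rule fires when
    all of its premises are already known. This is a naive fixpoint
    loop, not Rete.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and set(premises) <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical knowledge base.
RULES = [
    (("has-fever", "has-cough"), "suspect-flu"),
    (("suspect-flu",), "recommend-rest"),
    (("has-rash",), "suspect-allergy"),
]

if __name__ == "__main__":
    print(sorted(forward_chain({"has-fever", "has-cough"}, RULES)))
```

    The second rule fires only because the first one added "suspect-flu", which shows how conclusions chain without the caller specifying an order of evaluation.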

  8. Plague in Central Java, Indonesia

    PubMed Central

    Williams, J. E.; Hudson, B. W.; Turner, R. W.; Saroso, J. Sulianti; Cavanaugh, D. C.

    1980-01-01

    Plague in man occurred from 1968 to 1970 in mountain villages of the Boyolali Regency in Central Java. Infected fleas, infected rats, and seropositive rats were collected in villages with human plague cases. Subsequent isolations of Yersinia pestis and seropositive rodents, detected during investigations of rodent plague undertaken by the Government of Indonesia and the WHO, attested to the persistence of plague in the region from 1972 to 1974. Since 1968, the incidence of both rodent and human plague has been greatest from December to May at elevations over 1000 m. Isolations of Y. pestis were obtained from the fleas Xenopsylla cheopis and Stivalius cognatus and the rats Rattus rattus diardii and R. exulans ephippium. The major risk to man has been fleas infected with Y. pestis of unique electrophoretic phenotype. Infected fleas were collected most often in houses. Introduced in 1920, rodent plague had persisted in the Boyolali Regency for at least 54 years. The recent data support specific requirements for continued plague surveillance. PMID:6968252

  9. Hyper-Threaded Java: Use the Java Concurrency API to Speed Up Time-Consuming Tasks

    SciTech Connect

    Scarberry, Randy

    2006-11-21

    This is for a Java World article that was already published on Nov 21, 2006. When I originally submitted the draft, Java World wasn't in the available lists of publications. Now that it is, Hanford Library staff recommended that I resubmit so it would be counted. Original submission ID: PNNL-SA-52490

  10. Phylogenetic Inference From Conserved Sites Alignments

    SciTech Connect

    Grundy, W.N.; Naylor, G.J.P.

    1999-08-15

    Molecular sequences provide a rich source of data for inferring the phylogenetic relationships among species. However, recent work indicates that even an accurate multiple alignment of a large sequence set may yield an incorrect phylogeny and that the quality of the phylogenetic tree improves when the input consists only of the highly conserved, motif regions of the alignment. This work introduces two methods of producing multiple alignments that include only the conserved regions of the initial alignment. The first method retains conserved motifs, whereas the second retains individual conserved sites in the initial alignment. Using parsimony analysis on a mitochondrial data set containing 19 species among which the phylogenetic relationships are widely accepted, both conserved alignment methods produce better phylogenetic trees than the complete alignment. Unlike any of the 19 inference methods used before to analyze this data, both methods produce trees that are completely consistent with the known phylogeny. The motif-based method employs far fewer alignment sites for comparable error rates. For a larger data set containing mitochondrial sequences from 39 species, the site-based method produces a phylogenetic tree that is largely consistent with known phylogenetic relationships and suggests several novel placements.

  11. Optimal Network Alignment with Graphlet Degree Vectors

    PubMed Central

    Milenković, Tijana; Ng, Weng Leong; Hayes, Wayne; Pržulj, Nataša

    2010-01-01

    Important biological information is encoded in the topology of biological networks. Comparative analyses of biological networks are proving to be valuable, as they can lead to transfer of knowledge between species and give deeper insights into biological function, disease, and evolution. We introduce a new method that uses the Hungarian algorithm to produce optimal global alignment between two networks using any cost function. We design a cost function based solely on network topology and use it in our network alignment. Our method can be applied to any two networks, not just biological ones, since it is based only on network topology. We use our new method to align protein-protein interaction networks of two eukaryotic species and demonstrate that our alignment exposes large and topologically complex regions of network similarity. At the same time, our alignment is biologically valid, since many of the aligned protein pairs perform the same biological function. From the alignment, we predict function of yet unannotated proteins, many of which we validate in the literature. Also, we apply our method to find topological similarities between metabolic networks of different species and build phylogenetic trees based on our network alignment score. The phylogenetic trees obtained in this way bear a striking resemblance to the ones obtained by sequence alignments. Our method detects topologically similar regions in large networks that are statistically significant. It does this independent of protein sequence or any other information external to network topology. PMID:20628593
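
    Casting alignment as an assignment problem can be sketched on a toy cost matrix. The code below uses exhaustive search over permutations as a stand-in for the Hungarian algorithm; it returns the same optimum but is only practical for a handful of nodes. The matrix values and the function name are assumptions for illustration, and the graphlet-based cost function itself is not reproduced here.

```python
from itertools import permutations


def min_cost_assignment(cost):
    """Pair each node of network A with a distinct node of network B so
    that the total cost is minimal.

    Brute force over all permutations, standing in for the Hungarian
    algorithm, which solves the same problem in polynomial time.
    """
    n = len(cost)
    best_total, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best_total is None or total < best_total:
            best_total, best_perm = total, perm
    return best_total, list(enumerate(best_perm))


# Hypothetical costs: COST[i][j] is the cost of aligning node i of
# network A to node j of network B.
COST = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]

if __name__ == "__main__":
    total, pairs = min_cost_assignment(COST)
    print(total, pairs)
```

    Note that the optimum does not pick the cheapest entry of the matrix (the zero at row 1, column 1), because taking it would force more expensive pairings elsewhere; this is why a global assignment solver is used instead of a greedy choice.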

  12. Computing posterior probabilities for score-based alignments using ppALIGN.

    PubMed

    Wolfsheimer, Stefan; Hartmann, Alexander; Rabus, Ralf; Nuel, Gregory

    2012-01-01

    Score-based pairwise alignments are widely used in bioinformatics, in particular with molecular database search tools such as the BLAST family. Due to sophisticated heuristics, such algorithms are usually fast, but the underlying scoring model unfortunately lacks a statistical description of the reliability of the reported alignments. In particular, close to gaps and in low-score or low-complexity regions, a huge number of alternative alignments arise, which decreases the certainty of the alignment. ppALIGN is a software package that uses hidden Markov model techniques to compute the position-wise reliability of score-based pairwise alignments of DNA or protein sequences. The design of the model allows for a direct connection between the scoring function and the parameters of the probabilistic model. For this reason it is suitable for analyzing the outcomes of popular score-based aligners and search tools without having to choose a complicated set of parameters. By contrast, our program only requires the classical score parameters (the scoring function and gap costs). The package comes with a library written in C++, a standalone program for user-defined alignments (ppALIGN) and another program (ppBLAST) which can process a complete result set of BLAST. The main algorithms essentially exhibit a linear time complexity (in the alignment lengths), and they are hence suitable for on-line computations. We have also included alternative decoding algorithms to provide alternative alignments. ppALIGN is a fast program/library that helps detect and quantify questionable regions in pairwise alignments. Due to its structure and input/output interface, it can be connected to other post-processing tools. Empirically, we illustrate its usefulness in terms of correctly predicted reliable regions for sequences generated using the ROSE model for sequence evolution, and identify sensor-specific regions in the denitrifying betaproteobacterium Aromatoleum aromaticum. 
PMID

  13. The design and performance of MedJava. Experience of developing performance-sensitive distributed applications with Java

    NASA Astrophysics Data System (ADS)

    Jain, Prashant; Widoff, Seth; Schmidt, Douglas C.

    1998-12-01

    The Java programming language has gained substantial popularity in the past two years. Java's networking features, along with the growing number of Web browsers that execute Java applets, facilitate Internet programming. Despite the popularity of Java, however, there are many concerns about its efficiency. In particular, networking and computation performance are key concerns when considering the use of Java to develop performance-sensitive distributed applications. This paper makes three contributions to the study of Java for performance-sensitive distributed applications. First, we describe an architecture using Java and the Web to develop MedJava, which is a distributed electronic medical imaging system with stringent networking and computation requirements. Second, we present benchmarks of MedJava image processing and compare the results with the performance of xv, which is an equivalent image processing application written in C. Finally, we present performance benchmarks using Java as a transport interface to exchange large medical images over high-speed ATM networks. For computationally-intensive algorithms like image filtering, Java code that is optimized both manually and with JIT compilers can sometimes compensate for the lack of compile-time optimizations and yield a performance commensurate with equivalently compiled C code. With rigorous compile-time optimizations, however, C compilers still generally generate more efficient code. The advent of highly optimizing Java compilers should make it feasible to use Java for performance-sensitive distributed applications where C and C++ are currently used.

  14. Instrumentation of Java Bytecode for Runtime Analysis

    NASA Technical Reports Server (NTRS)

    Goldberg, Allen; Havelund, Klaus

    2003-01-01

    This paper describes JSpy, a system for high-level instrumentation of Java bytecode, and its use with JPaX, our system for runtime analysis of Java programs. JPaX monitors the execution of temporal logic formulas and performs predictive analysis of deadlocks and data races. JSpy's input is an instrumentation specification, which consists of a collection of rules, where a rule is a predicate/action pair. The predicate is a conjunction of syntactic constraints on a Java statement, and the action is a description of logging information to be inserted in the bytecode corresponding to the statement. JSpy is built using JTrek, an instrumentation package at a lower level of abstraction.

  15. Multiple Whole Genome Alignments Without a Reference Organism

    SciTech Connect

    Dubchak, Inna; Poliakov, Alexander; Kislyuk, Andrey; Brudno, Michael

    2009-01-16

    Multiple sequence alignments have become one of the most commonly used resources in genomics research. Most algorithms for multiple alignment of whole genomes either rely on a reference genome, against which all of the other sequences are laid out, or require a one-to-one mapping between the nucleotides of the genomes, preventing the alignment of recently duplicated regions. Both approaches have drawbacks for whole-genome comparisons. In this paper we present a novel symmetric alignment algorithm. The resulting alignments not only represent all of the genomes equally well, but also include all relevant duplications that occurred since the divergence from the last common ancestor. Our algorithm, implemented as a part of the VISTA Genome Pipeline (VGP), was used to align seven vertebrate and six Drosophila genomes. The resulting whole-genome alignments demonstrate a higher sensitivity and specificity than the pairwise alignments previously available through the VGP and have higher exon alignment accuracy than comparable public whole-genome alignments. Of the multiple alignment methods tested, ours performed the best at aligning genes from multigene families, perhaps the most challenging test for whole-genome alignments. Our whole-genome multiple alignments are available through the VISTA Browser at http://genome.lbl.gov/vista/index.shtml.

  16. An integrated SNP mining and utilization (ISMU) pipeline for next generation sequencing data.

    PubMed

    Azam, Sarwar; Rathore, Abhishek; Shah, Trushar M; Telluri, Mohan; Amindala, BhanuPrakash; Ruperao, Pradeep; Katta, Mohan A V S K; Varshney, Rajeev K

    2014-01-01

    Open source single nucleotide polymorphism (SNP) discovery pipelines for next generation sequencing data commonly require working knowledge of the command line interface, massive computational resources and expertise, which makes them daunting for biologists. Further, the SNP information generated may not be readily usable for downstream processes such as genotyping. Hence, a comprehensive pipeline called Integrated SNP Mining and Utilization (ISMU) has been developed by integrating several open source next generation sequencing (NGS) tools along with a graphical user interface for SNP discovery and their utilization by developing genotyping assays. The pipeline features functionalities such as pre-processing of raw data, integration of open source alignment tools (Bowtie2, BWA, Maq, NovoAlign and SOAP2), SNP prediction (SAMtools/SOAPsnp/CNS2snp and CbCC) methods and interfaces for developing genotyping assays. The pipeline outputs a list of high quality SNPs between all pairwise combinations of genotypes analyzed, in addition to the reference genome/sequence. Visualization tools (Tablet and Flapjack) integrated into the pipeline enable inspection of the alignment and errors, if any. The pipeline also provides a confidence score or polymorphism information content value with flanking sequences for identified SNPs in the standard format required for developing marker genotyping (KASP and Golden Gate) assays. The pipeline enables users to process a range of NGS datasets such as whole genome re-sequencing, restriction site associated DNA sequencing and transcriptome sequencing data at a fast speed. The pipeline is very useful for members of the plant genetics and breeding community who have no computational expertise and want to discover SNPs and utilize them in genomics, genetics and breeding studies. The pipeline has been parallelized to process huge datasets of next generation sequencing. It has been developed in the Java language and is available at http://hpc.icrisat.cgiar.org/ISMU as a standalone

  17. A Java Applet for Illustrating Internet Error Control

    ERIC Educational Resources Information Center

    Holliday, Mark A.

    2004-01-01

    This paper discusses the author's experiences developing a Java applet that illustrates how error control is implemented in the Transmission Control Protocol (TCP). One section discusses the concepts which the TCP error control Java applet is intended to convey, while the nature of the Java applet is covered in another section. The author…

  18. Tectonic Control of Piercement Structures in Central Java, Indonesia

    NASA Astrophysics Data System (ADS)

    Mazzini, A.; Hadi, S.; Etiope, G.; Inguaggiato, S.

    2014-12-01

    A recent field expedition in Central Java targeted the mapping and sampling of several piercement structures in central Java (Indonesia), most of which have never been documented before. Here, at least seven structures erupting mud, water and gas are distributed along a NE-SW alignment that extends for about 10 kilometers. Some of the mapped structures (Bledug Kuwu, Bledug Cangkring Krabagan, Mendikil, Banjarsari, Krewek) have been named after the neighboring local village. None of these have obvious elevation despite the vigorous emission of gas and mud, suggesting that significant caldera collapse is ongoing. Among the most relevant: Bledug Kuwu is certainly the most impressive structure, with three main eruption sites in the crater area bursting hot mud bubbles more than 5 m wide. Similar characteristics are present at the smaller (200 m in diameter) Bledug Cangkring Krabagan, which is also surrounded by numerous pools and gryphons seeping around the main crater. The smaller Mendikil is the only visited structure that, at the moment of the sampling, did not show seepage of hot fluids. Banjarsari and Krewek (up to 200 m wide) are characterized by scattered hot water-dominated pools where gas is vented vigorously. In particular, the hot pools are systematically covered by travertine concretions. Water and gas geochemistry confirms the seepage of CO2-dominated gas and water with a hydrothermal signature. The investigated structures appear to follow an obvious NE-SW oriented lineament that most likely coincides with a tectonic structure (fault?) that controls their location. Indeed, the field observations and the analyses suggest that a likely scenario is that this fault (?) acts as a preferential pathway for the expulsion of hydrothermal fluids to the surface. Very little is known about this region, nor is it known why several of these structures erupt hot mud despite their significant distance from the two closest volcanic structures (i.e. Mt. Muria 60 km to the NW

  19. Local Structural Alignment of RNA with Affine Gap Model

    NASA Astrophysics Data System (ADS)

    Wong, Thomas K. F.; Cheung, Brenda W. Y.; Lam, T. W.; Yiu, S. M.

    Predicting new non-coding RNAs (ncRNAs) of a family can be done by aligning the potential candidate with a member of the family with known sequence and secondary structure. Existing tools either consider only the sequence similarity or cannot handle local alignment with gaps. In this paper, we consider the problem of finding the optimal local structural alignment between a query RNA sequence (with known secondary structure) and a target sequence (with unknown secondary structure) under the affine gap penalty model. We provide an algorithm to solve the problem. Based on a preliminary experiment, we show that there are ncRNA families in which local structural alignment with a gap penalty model can identify real hits more effectively than global alignment or local alignment without a gap penalty model.
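
    For readers unfamiliar with the affine gap model, the sketch below computes a local alignment score with affine gaps on plain sequences (Smith-Waterman with Gotoh's three-matrix recurrence). It deliberately ignores secondary structure, so it is not the structural algorithm of the record above; the scoring values and the function name are assumptions chosen for the example.

```python
def local_affine_score(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment score with affine gaps (sequence only).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    # H: best score of a local alignment ending at (i, j).
    # E: same, but ending with a gap that consumes a character of b.
    # F: same, but ending with a gap that consumes a character of a.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap or extend the one already open.
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 option lets a local alignment start anywhere.
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


# Six matches (12) minus one gap of length 3 (3 + 1 + 1) gives 7.
print(local_affine_score("GGGAAACCC", "GGGCCC"))  # 7
```

    With a linear gap cost of 3 per position the same gap would cost 9 and the two flanking matches would no longer be joined, which is the practical difference the affine model makes.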

  20. R3D Align: global pairwise alignment of RNA 3D structures using local superpositions

    PubMed Central

    Rahrig, Ryan R.; Leontis, Neocles B.; Zirbel, Craig L.

    2010-01-01

    Motivation: Comparing 3D structures of homologous RNA molecules yields information about sequence and structural variability. To compare large RNA 3D structures, accurate automatic comparison tools are needed. In this article, we introduce a new algorithm and web server to align large homologous RNA structures nucleotide by nucleotide using local superpositions that accommodate the flexibility of RNA molecules. Local alignments are merged to form a global alignment by employing a maximum clique algorithm on a specially defined graph that we call the ‘local alignment’ graph. Results: The algorithm is implemented in a program suite and web server called ‘R3D Align’. The R3D Align alignment of homologous 3D structures of 5S, 16S and 23S rRNA was compared to a high-quality hand alignment. A full comparison of the 16S alignment with the other state-of-the-art methods is also provided. The R3D Align program suite includes new diagnostic tools for the structural evaluation of RNA alignments. The R3D Align alignments were compared to those produced by other programs and were found to be the most accurate, in comparison with a high quality hand-crafted alignment and in conjunction with a series of other diagnostics presented. The number of aligned base pairs as well as measures of geometric similarity are used to evaluate the accuracy of the alignments. Availability: R3D Align is freely available through a web server http://rna.bgsu.edu/R3DAlign. The MATLAB source code of the program suite is also freely available for download at that location. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: r-rahrig@onu.edu PMID:20929913

  1. Dynamic Frames in Java Dynamic Logic

    NASA Astrophysics Data System (ADS)

    Schmitt, Peter H.; Ulbrich, Mattias; Weiß, Benjamin

    In this paper we present a realisation of the concept of dynamic frames in a dynamic logic for verifying Java programs. This is achieved by treating sets of heap locations as first class citizens in the logic. Syntax and formal semantics of the logic are presented, along with sound proof rules for modularly reasoning about method calls and heap dependent symbols using specification contracts.

  2. Rickettsia felis in Xenopsylla cheopis, Java, Indonesia

    PubMed Central

    Jiang, Ju; Soeatmadji, Djoko W.; Henry, Katherine M.; Ratiwayanto, Sutanti; Bangs, Michael J.

    2006-01-01

    Rickettsia typhi and R. felis, etiologic agents of murine typhus and fleaborne spotted fever, respectively, were detected in Oriental rat fleas (Xenopsylla cheopis) collected from rodents and shrews in Java, Indonesia. We describe the first evidence of R. felis in Indonesia and naturally occurring R. felis in Oriental rat fleas. PMID:16965716

  3. Interactive Economics Instruction with Java and CGI.

    ERIC Educational Resources Information Center

    Gerdes, Geoffrey R.

    2000-01-01

    States that this Web site is based on the conviction that Web-based materials must contain interactive modules to achieve value beyond that obtained by conventional media. Discusses three applets that can be reached at the homepage of the Web site by selecting the Java applets link. (CMK)

  4. Lisp as an Alternative to Java

    NASA Technical Reports Server (NTRS)

    Gat, E.

    2000-01-01

    In a recent study, Prechelt compared the relative performance of Java and C++ in terms of execution time and memory utilization. Unlike many benchmark studies, Prechelt compared multiple implementations of the same task by multiple programmers in order to control for the effects of differences in programmer skill.

  5. JAVA CLASSES FOR NONPROCEDURAL VARIOGRAM MONITORING

    EPA Science Inventory

    A set of Java classes was written for variogram modeling to support research for US EPA's Regional Vulnerability Assessment Program (ReVA). The modeling objectives of this research program are to use conceptual programming tools for numerical analysis for regional risk assessm...

  6. Modular VO oriented Java EE service deployer

    NASA Astrophysics Data System (ADS)

    Molinaro, Marco; Cepparo, Francesco; De Marco, Marco; Knapic, Cristina; Apollo, Pietro; Smareglia, Riccardo

    2014-07-01

    The International Virtual Observatory Alliance (IVOA) has produced many standards and recommendations whose aim is to generate an architecture that starts from astrophysical resources, in a general sense, and ends up in deployed consumable services (that are themselves astrophysical resources). Focusing on the Data Access Layer (DAL) system architecture that these standards define, a web-based application, VO-Dance, has been developed and maintained in recent years at INAF-OATs IA2 (Italian National institute for Astrophysics - Astronomical Observatory of Trieste, Italian center of Astronomical Archives) to deploy and manage multiple VO (Virtual Observatory) services in a uniform way. However, a number of critical issues have arisen since VO-Dance was conceived, and major changes have been made, and are still under way, in the IVOA DAL layer (and related standards); this prompted IA2 to identify a new solution for its own service layer. Keeping the basic ideas of VO-Dance (simple service configuration, service instantiation at call time and modularity) while switching to different software technologies (e.g. abandoning Java Reflection in favour of an Enterprise Java Bean (EJB) based solution), the new solution has been sketched out and tested for feasibility. Here we present the results originating from this test study. The main constraints for this new project come from various fields: a better homogenized solution arising from the IVOA DAL standards (for example, the new DALI (Data Access Layer Interface) specification, which acts as a common interface system for previous and upcoming access protocols); the need for a modular system where each component is based upon a single VO specification, allowing services to rely on common capabilities instead of homogenizing them inside service components directly; and the search for a scalable system that takes advantage of distributed systems. These constraints are addressed by the adopted solutions sketched hereafter. 
The

  7. GATA: A graphic alignment tool for comparative sequence analysis

    SciTech Connect

    Nix, David A.; Eisen, Michael B.

    2005-01-01

    Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution yet the enhancer retains its activity. Dotplot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.
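
    A dot plot of the kind discussed above can be computed in a few lines, which also makes its limits easy to see. The sketch below marks every pair of positions where two sequences share an exact word of length `window`; the function name, the window size, and the toy sequences are assumptions for illustration and are unrelated to the GATA implementation.

```python
def dot_plot(seq_a, seq_b, window=4):
    """Return (i, j) pairs where seq_a[i:i+window] == seq_b[j:j+window]."""
    # Index every word of seq_b by its start position.
    index = {}
    for j in range(len(seq_b) - window + 1):
        index.setdefault(seq_b[j:j + window], []).append(j)
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in index.get(seq_a[i:i + window], []):
            dots.append((i, j))
    return dots


dots = dot_plot("ACGTACGT", "TACGTT", window=4)
print(dots)  # [(0, 1), (3, 0), (4, 1)]
```

    The output is only a list of matching word positions. An inserted or deleted base shifts later dots onto a different diagonal, and nothing in the plot itself joins the two diagonals into one alignment or scores the relationship, which is the shortcoming the abstract describes.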

  8. An algorithm for linear metabolic pathway alignment.

    PubMed

    Chen, Ming; Hofestaedt, Ralf

    2005-01-01

    Metabolic pathway alignment represents one of the most powerful tools for comparative analysis of metabolism. It involves recognition of metabolites common to a set of functionally-related metabolic pathways, interpretation of biological evolution processes and determination of alternative metabolic pathways. Moreover, it is of assistance in function prediction and metabolism modeling. Although research on genomic sequence alignment is extensive, the problem of aligning metabolic pathways has received less attention. We are motivated to develop an algorithm of metabolic pathway alignment to reveal the similarities between metabolic pathways. A new definition of the metabolic pathway is introduced. The algorithm has been implemented into the PathAligner system; its web-based interface is available at http://bibiserv.techfak.uni-bielefeld.de/pathaligner/.

  9. A Telemetry Browser Built with Java Components

    NASA Astrophysics Data System (ADS)

    Poupart, E.

    In the context of CNES balloon scientific campaigns and the telemetry survey field, a generic telemetry processing product, called TelemetryBrowser in the following, was developed by reusing COTS components, most of them Java components. Connection between those components relies on a software architecture based on parameter producers and parameter consumers. Producers transmit parameter values to the consumers that have registered with them. All of those producers and consumers can be spread over the network thanks to Corba, and over every kind of workstation thanks to Java. This gives a very powerful means to adapt to constraints like network bandwidth, or workstation processing power or memory. It is also very useful for displaying and correlating, at the same time, information coming from multiple and various sources. An important point of this architecture is that the coupling between parameter producers and parameter consumers is reduced to the minimum and that transmission of information on the network is made asynchronously. So, if a parameter consumer goes down or runs slowly, there is no consequence for the other consumers, because producers don't wait for their consumers to finish their data processing before sending data to other consumers. Another interesting point is that parameter producers, also called TelemetryServers in the following, are generated nearly automatically from a telemetry description using the Flavor (i) component. Keywords: Java components, Corba, distributed application, OpenORB (ii), software reuse, COTS, Internet, Flavor. (i) Flavor (Formal Language for Audio-Visual Object Representation) is an object-oriented media representation language being developed at Columbia University. It is designed as an extension of Java and C++ and simplifies the development of applications that involve a significant media processing component (encoding, decoding, editing, manipulation, etc.) by providing bitstream representation semantics. (flavor.sourceforge.net) (ii) Open
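
    The decoupling described above (producers never wait for consumers) can be sketched with one unbounded queue per registered consumer. This is an in-process illustration using threads rather than Corba, and every name in it (`ParameterProducer`, `register`, `publish`, `close`, the `altitude_m` parameter) is hypothetical rather than taken from TelemetryBrowser.

```python
import queue
import threading
import time


class ParameterProducer:
    """Pushes parameter values to every registered consumer queue."""

    def __init__(self):
        self._queues = []
        self._lock = threading.Lock()

    def register(self):
        """Give a new consumer its own unbounded queue."""
        q = queue.Queue()
        with self._lock:
            self._queues.append(q)
        return q

    def publish(self, name, value):
        with self._lock:
            targets = list(self._queues)
        for q in targets:
            # put() on an unbounded queue returns at once, so a slow
            # consumer cannot delay delivery to the others.
            q.put((name, value))

    def close(self):
        """Send a sentinel telling each consumer to stop."""
        with self._lock:
            targets = list(self._queues)
        for q in targets:
            q.put(None)


def consume(q, out, delay=0.0):
    """Drain a queue into out, optionally simulating slow processing."""
    while True:
        item = q.get()
        if item is None:
            break
        time.sleep(delay)
        out.append(item)


producer = ParameterProducer()
fast_out, slow_out = [], []
workers = [
    threading.Thread(target=consume, args=(producer.register(), fast_out)),
    threading.Thread(target=consume, args=(producer.register(), slow_out, 0.01)),
]
for w in workers:
    w.start()
for k in range(5):
    producer.publish("altitude_m", 1000 + k)
producer.close()
for w in workers:
    w.join()
print(fast_out == slow_out)  # True: both saw every value, each at its own pace
```

    The trade-off of this design is memory: an unbounded queue for a consumer that has stopped responding keeps growing, so a real system would likely cap or expire queued values.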

  10. The complete mitochondrial genome of Java warty pig (Sus verrucosus).

    PubMed

    Fan, Jie; Li, Chun-Hong; Shi, Wei

    2015-06-01

    In the present study, the complete mitochondrial genome sequence of the Java warty pig was reported for the first time. The total length of the mitogenome was 16,479 bp. Like that of most other pigs, it had the typical structure, comprising 2 ribosomal RNA genes, 13 protein-coding genes, 22 transfer RNA genes and 1 non-coding control region (D-loop region). The overall composition of the mitogenome was estimated to be 34.9% A, 26.1% T, 26.0% C and 13.0% G, showing an A-T-rich (61.0%) feature. The mitochondrial genome analyzed here will provide a new genetic resource for uncovering the evolution of pigs.

  11. Interplate coupling along the Java trench from CGPS observation

    NASA Astrophysics Data System (ADS)

    Meilano, I.; Kuncoro, H.; Susilo, S.; Efendi, J.; Abidin, H. Z.; Nugraha, A. D.; Widiyantoro, S.

    2014-12-01

    Interplate seismogenic zones along the Java trench were estimated using continuous GPS (CGPS) observations from the south of Lampung in the west to Lombok Island in the east. The observation period runs from 2010 to 2013 and covers more than 60 CGPS stations. The GPS analysis indicates that the present-day deformation of Java Island is controlled by the rotation of Sunda land, extension in the southern Sunda Strait, postseismic deformation following the 2006 earthquake, and the coupling between the Indo-Australian plate and Sunda land. Strain rate solutions indicate compression in the south of Java Island. Using elastic dislocation theory, the estimated interplate seismogenic coupling in the Java trench is about 50 percent in the Sunda Strait, smaller in southwest Java, and larger toward the east. The slip deficit on the subduction interface has important implications for the seismic hazard of Java Island. Keywords: CGPS observation, Interplate Seismogenic, Java Trench

  12. Annotating RNA motifs in sequences and alignments

    PubMed Central

    Gardner, Paul P.; Eldai, Hisham

    2015-01-01

    RNA performs a diverse array of important functions across all cellular life. These functions include important roles in translation, building translational machinery and maturing messenger RNA. More recent discoveries include the miRNAs and bacterial sRNAs that regulate gene expression, the thermosensors, riboswitches and other cis-regulatory elements that help prokaryotes sense their environment and eukaryotic piRNAs that suppress transposition. However, there can be a long period between the initial discovery of a RNA and determining its function. We present a bioinformatic approach to characterize RNA motifs, which are critical components of many RNA structure–function relationships. These motifs can, in some instances, provide researchers with functional hypotheses for uncharacterized RNAs. Moreover, we introduce a new profile-based database of RNA motifs—RMfam—and illustrate some applications for investigating the evolution and functional characterization of RNA. All the data and scripts associated with this work are available from: https://github.com/ppgardne/RMfam. PMID:25520192

  13. JavaGenes and Condor: Cycle-Scavenging Genetic Algorithms

    NASA Technical Reports Server (NTRS)

    Globus, Al; Langhirt, Eric; Livny, Miron; Ramamurthy, Ravishankar; Soloman, Marvin; Traugott, Steve

    2000-01-01

    A genetic algorithm code, JavaGenes, was written in Java and used to evolve pharmaceutical drug molecules and digital circuits. JavaGenes was run under the Condor cycle-scavenging batch system managing 100-170 desktop SGI workstations. Genetic algorithms mimic biological evolution by evolving solutions to problems using crossover and mutation. While most genetic algorithms evolve strings or trees, JavaGenes evolves graphs representing (currently) molecules and circuits. Java was chosen as the implementation language because the genetic algorithm requires random splitting and recombining of graphs, a complex data structure manipulation with ample opportunities for memory leaks, loose pointers, out-of-bound indices, and other hard-to-find bugs. Java garbage-collection memory management, lack of pointer arithmetic, and array-bounds index checking prevent these bugs from occurring, substantially reducing development time. While a run-time performance penalty must be paid, the only unacceptable performance we encountered was using standard Java serialization to checkpoint and restart the code. This was fixed by a two-day implementation of custom checkpointing. JavaGenes is minimally integrated with Condor; in other words, JavaGenes must do its own checkpointing and I/O redirection. A prototype Java-aware version of Condor was developed using standard Java serialization for checkpointing. For the prototype to be useful, standard Java serialization must be significantly optimized. JavaGenes is approximately 8700 lines of code and a few thousand JavaGenes jobs have been run. Most jobs ran for a few days. Results include proof that genetic algorithms can evolve directed and undirected graphs, development of a novel crossover operator for graphs, a paper in the journal Nanotechnology, and another paper in preparation.
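
    The crossover-and-mutation loop mentioned above can be illustrated on bit strings, the simplest case. JavaGenes evolves graphs with its own crossover operator, which this sketch does not attempt to reproduce; the function name, the parameter values, and the "count the ones" fitness are assumptions for the example.

```python
import random


def evolve(fitness, length=20, pop_size=30, generations=40,
           mutation_rate=0.02, seed=0):
    """Tiny generational genetic algorithm on bit strings.

    Returns (best fitness in the initial population, best final individual).
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    initial_best = max(fitness(ind) for ind in pop)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]       # truncation selection
        children = [pop[0][:]]               # elitism: keep the current best
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)   # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = children
    return initial_best, max(pop, key=fitness)


# Fitness is the number of 1 bits, so the optimum is the all-ones string.
initial_best, best = evolve(sum)
print(initial_best, sum(best))
```

    Because the best individual is copied unchanged into each new generation, the best fitness can never decrease. On graphs the hard part is the crossover step, since splitting and recombining two graphs has no counterpart as simple as cutting two strings at the same index.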

  14. Girder Alignment Plan

    SciTech Connect

    Wolf, Zackary; Ruland, Robert; LeCocq, Catherine; Lundahl, Eric; Levashov, Yurii; Reese, Ed; Rago, Carl; Poling, Ben; Schafer, Donald; Nuhn, Heinz-Dieter; Wienands, Uli; /SLAC

    2010-11-18

    The girders for the LCLS undulator system contain components which must be aligned with high accuracy relative to each other. The alignment is one of the last steps before the girders go into the tunnel, so the alignment must be done efficiently, on a tight schedule. This note documents the alignment plan which includes efficiency and high accuracy. The motivation for girder alignment involves the following considerations. Using beam based alignment, the girder position will be adjusted until the beam goes through the center of the quadrupole and beam finder wire. For the machine to work properly, the undulator axis must be on this line and the center of the undulator beam pipe must be on this line. The physics reasons for the undulator axis and undulator beam pipe axis to be centered on the beam are different, but the alignment tolerance for both are similar. In addition, the beam position monitor must be centered on the beam to preserve its calibration. Thus, the undulator, undulator beam pipe, quadrupole, beam finder wire, and beam position monitor axes must all be aligned to a common line. All relative alignments are equally important, not just, for example, between quadrupole and undulator. We begin by making the common axis the nominal beam axis in the girder coordinate system. All components will be initially aligned to this axis. A more accurate alignment will then position the components relative to each other, without incorporating the girder itself.

  15. Interstellar Dust Grain Alignment

    NASA Astrophysics Data System (ADS)

    Andersson, B.-G.; Lazarian, A.; Vaillancourt, John E.

    2015-08-01

    Interstellar polarization at optical-to-infrared wavelengths is known to arise from asymmetric dust grains aligned with the magnetic field. This effect provides a potentially powerful probe of magnetic field structure and strength if the details of the grain alignment can be reliably understood. Theory and observations have recently converged on a quantitative, predictive description of interstellar grain alignment based on radiative processes. The development of a general, analytical model for this radiative alignment torque (RAT) theory has allowed specific, testable predictions for realistic interstellar conditions. We outline the theoretical and observational arguments in favor of RAT alignment, as well as reasons the "classical" paramagnetic alignment mechanism is unlikely to work, except possibly for the very smallest grains. With further detailed characterization of the RAT mechanism, grain alignment and polarimetry promise to not only better constrain the interstellar magnetic field but also provide new information on the dust characteristics.

  16. Methods for comparing a DNA sequence with a protein sequence.

    PubMed

    Huang, X; Zhang, J

    1996-12-01

    We describe two methods for constructing an optimal global alignment of, and an optimal local alignment between, a DNA sequence and a protein sequence. The alignment model of the methods addresses the problems of frameshifts and introns in the DNA sequence. The methods require computer memory proportional to the sequence lengths, so they can rigorously process very long sequences. Simplified versions of the methods were implemented as computer programs named NAP and LAP. The experimental results demonstrate that the programs are sensitive and powerful tools for finding genes by DNA-protein sequence homology.

  17. Genome alignment with graph data structures: a comparison

    PubMed Central

    2014-01-01

    Background Recent advances in rapid, low-cost sequencing have opened up the opportunity to study complete genome sequences. The computational approach of multiple genome alignment allows investigation of evolutionarily related genomes in an integrated fashion, providing a basis for downstream analyses such as rearrangement studies and phylogenetic inference. Graphs have proven to be a powerful tool for coping with the complexity of genome-scale sequence alignments. The potential of graphs to intuitively represent all aspects of genome alignments led to the development of graph-based approaches for genome alignment. These approaches construct a graph from a set of local alignments, and derive a genome alignment through identification and removal of graph substructures that indicate errors in the alignment. Results We compare the structures of commonly used graphs in terms of their abilities to represent alignment information. We describe how the graphs can be transformed into each other, and identify and classify graph substructures common to one or more graphs. Based on previous approaches, we compile a list of modifications that remove these substructures. Conclusion We show that crucial pieces of alignment information, associated with inversions and duplications, are not visible in the structure of all graphs. If we neglect vertex or edge labels, the graphs differ in their information content. Still, many ideas are shared among all graph-based approaches. Based on these findings, we outline a conceptual framework for graph-based genome alignment that can assist in the development of future genome alignment tools. PMID:24712884

  18. Alignment of multiple proteins with an ensemble of Hidden Markov Models

    PubMed Central

    Song, Yinglei; Qu, Junfeng; Hura, Gurdeep S.

    2011-01-01

    In this paper, we developed a new method that progressively constructs and updates a set of alignments by adding sequences in a certain order to each of the existing alignments. Each of the existing alignments is modelled with a profile Hidden Markov Model (HMM) and an added sequence is aligned to each of these profile HMMs. We introduced an integer parameter for the number of profile HMMs. The profile HMMs are then updated based on the alignments with leading scores. Our experiments on BaliBASE showed that our approach could efficiently explore the alignment space and significantly improve the alignment accuracy. PMID:20376922

  19. A Reconfigurable Processor Infrastructure for Accelerating Java Applications

    NASA Astrophysics Data System (ADS)

    Han, Youngsun; Hwang, Seok Joong; Kim, Seon Wook

    In this paper, we present a reconfigurable processor infrastructure to accelerate Java applications, called Jaguar. The Jaguar infrastructure consists of a compiler framework and a runtime environment support. The compiler framework selects a group of Java methods to be translated into hardware for delivering the best performance under limited resources, and translates the selected Java methods into Verilog synthesizable code modules. The runtime environment support includes the Java virtual machine (JVM) running on a host processor to provide Java execution environment to the generated Java accelerator through communication interface units while preserving Java semantics. Our compiler infrastructure is a tightly integrated and solid compiler-aided solution for Java reconfigurable computing. There is no limitation in generating synthesizable Verilog modules from any Java application while preserving Java semantics. In terms of performance, our infrastructure achieves the speedup by 5.4 times on average and by up to 9.4 times in measured benchmarks with respect to JVM-only execution. Furthermore, two optimization schemes such as an instruction folding and a live buffer removal can reduce 24% on average and up to 39% of the resource consumption.

  20. Implementation of a Parallel Protein Structure Alignment Service on Cloud

    PubMed Central

    Hung, Che-Lun; Lin, Yaw-Ling

    2013-01-01

    Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform. PMID:23671842

  1. Implementation of a parallel protein structure alignment service on cloud.

    PubMed

    Hung, Che-Lun; Lin, Yaw-Ling

    2013-01-01

    Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform. PMID:23671842

  2. Software tool for the analysis and visualization of whole genome alignments

    2011-08-01

    GenomeVISTA is a tool which performs and displays pairwise and multiple whole genome DNA alignments. The tool provides a graphical user interface by which users can navigate alignments at multiple levels of resolution and get information about individual aligned regions. Users can load their own sequences into GenomeVISTA or view pre-computed alignments for genomes in the VISTA database.

  3. The state of the Java universe

    SciTech Connect

    2011-02-08

    Speaker Bio: James Gosling received a B.Sc. in computer science from the University of Calgary, Canada in 1977. He received a Ph.D. in computer science from Carnegie-Mellon University in 1983. The title of his thesis was The Algebraic Manipulation of Constraints. He has built satellite data acquisition systems, a multiprocessor version of UNIX®, several compilers, mail systems, and window managers. He has also built a WYSIWYG text editor, a constraint-based drawing editor, and a text editor called Emacs, for UNIX systems. At Sun his early activity was as lead engineer of the NeWS window system. He did the original design of the Java programming language and implemented its original compiler and virtual machine. He has recently been a contributor to the Real-Time Specification for Java.

  4. Safe Commits for Transactional Featherweight Java

    NASA Astrophysics Data System (ADS)

    Thuong Tran, Thi Mai; Steffen, Martin

    Transactions are a high-level alternative for low-level concurrency-control mechanisms such as locks, semaphores, monitors. A recent proposal for integrating transactional features into programming languages is Transactional Featherweight Java (TFJ), extending Featherweight Java by adding transactions. With support for nested and multi-threaded transactions, its transactional model is rather expressive. In particular, the constructs governing transactions - to start and to commit a transaction - can be used freely with a non-lexical scope. On the downside, this flexibility also allows for an incorrect use of these constructs, e.g., trying to perform a commit outside any transaction. To catch those kinds of errors, we introduce a static type and effect system for the safe use of transactions for TFJ. We prove the soundness of our type system by subject reduction.

  5. The state of the Java universe

    ScienceCinema

    None

    2016-07-12

    Speaker Bio: James Gosling received a B.Sc. in computer science from the University of Calgary, Canada in 1977. He received a Ph.D. in computer science from Carnegie-Mellon University in 1983. The title of his thesis was The Algebraic Manipulation of Constraints. He has built satellite data acquisition systems, a multiprocessor version of UNIX®, several compilers, mail systems, and window managers. He has also built a WYSIWYG text editor, a constraint-based drawing editor, and a text editor called Emacs, for UNIX systems. At Sun his early activity was as lead engineer of the NeWS window system. He did the original design of the Java programming language and implemented its original compiler and virtual machine. He has recently been a contributor to the Real-Time Specification for Java.

  6. APINetworks Java. A Java approach to the efficient treatment of large-scale complex networks

    NASA Astrophysics Data System (ADS)

    Muñoz-Caro, Camelia; Niño, Alfonso; Reyes, Sebastián; Castillo, Miriam

    2016-10-01

    We present a new version of the core structural package of our Application Programming Interface, APINetworks, for the treatment of complex networks in arbitrary computational environments. The new version is written in Java and presents several advantages over the previous C++ version: the portability of the Java code, the easiness of object-oriented design implementations, and the simplicity of memory management. In addition, some additional data structures are introduced for storing the sets of nodes and edges. Also, by resorting to the different garbage collectors currently available in the JVM the Java version is much more efficient than the C++ one with respect to memory management. In particular, the G1 collector is the most efficient one because of the parallel execution of G1 and the Java application. Using G1, APINetworks Java outperforms the C++ version and the well-known NetworkX and JGraphT packages in the building and BFS traversal of linear and complete networks. The better memory management of the present version allows for the modeling of much larger networks.
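
    The benchmark described in this abstract, building a network and traversing it breadth-first, can be sketched in plain Java. The code below does not use the APINetworks API; the class NetworkBfs and its methods are hypothetical, and it assumes that a "linear" network is a simple path and a "complete" network connects every pair of nodes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NetworkBfs {
    /** Builds n isolated nodes as empty adjacency lists. */
    private static List<List<Integer>> emptyNetwork(int n) {
        List<List<Integer>> adj = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            adj.add(new ArrayList<>());
        }
        return adj;
    }

    /** Linear network (assumed to be a path): 0 - 1 - ... - (n-1). */
    static List<List<Integer>> linear(int n) {
        List<List<Integer>> adj = emptyNetwork(n);
        for (int i = 0; i + 1 < n; i++) {
            adj.get(i).add(i + 1);
            adj.get(i + 1).add(i);
        }
        return adj;
    }

    /** Complete network: every pair of distinct nodes is connected. */
    static List<List<Integer>> complete(int n) {
        List<List<Integer>> adj = emptyNetwork(n);
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    adj.get(i).add(j);
                }
            }
        }
        return adj;
    }

    /** Breadth-first search; returns hop distances from source, -1 if unreachable. */
    static int[] bfsDistances(List<List<Integer>> adj, int source) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1);
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        dist[source] = 0;
        queue.add(source);
        while (!queue.isEmpty()) {
            int u = queue.poll();
            for (int v : adj.get(u)) {
                if (dist[v] < 0) {          // first visit fixes the shortest distance
                    dist[v] = dist[u] + 1;
                    queue.add(v);
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(bfsDistances(linear(5), 0)));
        System.out.println(Arrays.toString(bfsDistances(complete(5), 0)));
    }
}
```

    The two topologies are opposite extremes for a traversal: the path has the fewest edges and the greatest depth, while the complete network has a number of edges that grows quadratically with the number of nodes, which is presumably why both stress memory management in different ways.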

  7. JavaGenes: Evolving Graphs with Crossover

    NASA Technical Reports Server (NTRS)

    Globus, Al; Atsatt, Sean; Lawton, John; Wipke, Todd

    2000-01-01

    Genetic algorithms usually use string or tree representations. We have developed a novel crossover operator for a directed and undirected graph representation, and used this operator to evolve molecules and circuits. Unlike strings or trees, a single point in the representation cannot divide every possible graph into two parts, because graphs may contain cycles. Thus, the crossover operator is non-trivial. A steady-state, tournament selection genetic algorithm code (JavaGenes) was written to implement and test the graph crossover operator. All runs were executed by cycle-scavenging on networked workstations using the Condor batch processing system. The JavaGenes code has evolved pharmaceutical drug molecules and simple digital circuits. Results to date suggest that JavaGenes can evolve moderate sized drug molecules and very small circuits in reasonable time. The algorithm has greater difficulty with somewhat larger circuits, suggesting that directed graphs (circuits) are more difficult to evolve than undirected graphs (molecules), although necessary differences in the crossover operator may also explain the results. In principle, JavaGenes should be able to evolve other graph-representable systems, such as transportation networks, metabolic pathways, and computer networks. However, large graphs evolve significantly slower than smaller graphs, presumably because the space-of-all-graphs explodes combinatorially with graph size. Since the representation strongly affects genetic algorithm performance, adding graphs to the evolutionary programmer's bag-of-tricks should be beneficial. Also, since graph evolution operates directly on the phenotype, the genotype-phenotype translation step, common in genetic algorithm work, is eliminated.

  8. ELIST8: simulating military deployments in Java

    SciTech Connect

    Van Groningen, C. N.; Blachowicz, D.; Braun, M. D.; Simunich, K. L.; Widing, M. A.

    2002-04-12

    Planning for the transportation of large amounts of equipment, troops, and supplies presents a complex problem. Many options, including modes of transportation, vehicles, facilities, routes, and timing, must be considered. The amount of data involved in generating and analyzing a course of action (e.g., detailed information about military units, logistical infrastructures, and vehicles) is enormous. Software tools are critical in defining and analyzing these plans. Argonne National Laboratory has developed ELIST (Enhanced Logistics Intra-theater Support Tool), a simulation-based decision support system, to assist military planners in determining the logistical feasibility of an intra-theater course of action. The current version of ELIST (v.8) contains a discrete event simulation developed using the Java programming language. Argonne selected Java because of its object-oriented framework, which has greatly facilitated entity and process development within the simulation, and because it fulfills a primary requirement for multi-platform execution. This paper describes the model, including setup and analysis, a high-level architectural design, and an evaluation of Java.

  9. astrojs: JavaScript Libraries for Astronomy

    NASA Astrophysics Data System (ADS)

    Kapadia, A.; Smith, A.

    2013-10-01

    Astronomers mainly use the web for data retrieval. To create visualizations and conduct analyses requires installation of many external packages, often creating a difficult task for the astronomer. An ideal situation would move many of the common tasks to a browser — a homogenous solution for data access, visualization, and analyses in one application. As part of an effort to build research tools around core citizen science experiences, the Zooniverse is building science grade tools for handling astronomical data. As the browser is Zooniverse's medium, JavaScript — the only client-side programming language — becomes ever more relevant for feature-rich web applications. The technology industry is investing large development time in improving JavaScript engines resulting in performance gains that exceed other scripting languages. The science community could benefit from this investment by migrating development of desktop applications to web applications. Similar to the astropy initiative, ASTROJS is providing a consolidation of JavaScript libraries for in-browser client-side astronomical data visualization and analyses.

  10. A Visual Editor in Java for View

    NASA Technical Reports Server (NTRS)

    Stansifer, Ryan

    2000-01-01

    In this project we continued the development of a visual editor in the Java programming language to create screens on which to display real-time data. The data comes from the numerous systems monitoring the operation of the space shuttle while on the ground and in space, and from the many tests of subsystems. The data can be displayed on any computer platform running a Java-enabled World Wide Web (WWW) browser and connected to the Internet. Previously a special-purpose program had been written to display data on emulations of character-based display screens used for many years at NASA. The goal now is to display bit-mapped screens created by a visual editor. We report here on the visual editor that creates the display screens. This project continues the work we had done previously. Previously we had followed the design of the 'beanbox,' a prototype visual editor created by Sun Microsystems. We abandoned this approach and implemented a prototype using a more direct approach. In addition, our prototype is based on newly released Java 2 graphical user interface (GUI) libraries. The result has been a visually more appealing appearance and a more robust application.

  11. Horizontal carbon nanotube alignment.

    PubMed

    Cole, Matthew T; Cientanni, Vito; Milne, William I

    2016-09-21

    The production of horizontally aligned carbon nanotubes offers a rapid means of realizing a myriad of self-assembled near-atom-scale technologies - from novel photonic crystals to nanoscale transistors. The ability to reproducibly align anisotropic nanostructures has huge technological value. Here we review the present state-of-the-art in horizontal carbon nanotube alignment. For both in and ex situ approaches, we quantitatively assess the reported linear packing densities alongside the degree of alignment possible for each of these core methodologies. PMID:27546174

  12. Horizontal carbon nanotube alignment.

    PubMed

    Cole, Matthew T; Cientanni, Vito; Milne, William I

    2016-09-21

    The production of horizontally aligned carbon nanotubes offers a rapid means of realizing a myriad of self-assembled near-atom-scale technologies - from novel photonic crystals to nanoscale transistors. The ability to reproducibly align anisotropic nanostructures has huge technological value. Here we review the present state-of-the-art in horizontal carbon nanotube alignment. For both in and ex situ approaches, we quantitatively assess the reported linear packing densities alongside the degree of alignment possible for each of these core methodologies.

  13. Orthodontics and Aligners

    MedlinePlus

    ... Repairing Chipped Teeth Teeth Whitening Tooth-Colored Fillings Orthodontics and Aligners Straighten teeth for a healthier smile. Orthodontics When consumers think about orthodontics, braces are the ...

  14. Tidal alignment of galaxies

    NASA Astrophysics Data System (ADS)

    Blazek, Jonathan; Vlah, Zvonimir; Seljak, Uroš

    2015-08-01

    We develop an analytic model for galaxy intrinsic alignments (IA) based on the theory of tidal alignment. We calculate all relevant nonlinear corrections at one-loop order, including effects from nonlinear density evolution, galaxy biasing, and source density weighting. Contributions from density weighting are found to be particularly important and lead to bias dependence of the IA amplitude, even on large scales. This effect may be responsible for much of the luminosity dependence in IA observations. The increase in IA amplitude for more highly biased galaxies reflects their locations in regions with large tidal fields. We also consider the impact of smoothing the tidal field on halo scales. We compare the performance of this consistent nonlinear model in describing the observed alignment of luminous red galaxies with the linear model as well as the frequently used "nonlinear alignment model," finding a significant improvement on small and intermediate scales. We also show that the cross-correlation between density and IA (the "GI" term) can be effectively separated into source alignment and source clustering, and we accurately model the observed alignment down to the one-halo regime using the tidal field from the fully nonlinear halo-matter cross correlation. Inside the one-halo regime, the average alignment of galaxies with density tracers no longer follows the tidal alignment prediction, likely reflecting nonlinear processes that must be considered when modeling IA on these scales. Finally, we discuss tidal alignment in the context of cosmic shear measurements.

  15. Alignability of Optical Interconnects

    NASA Astrophysics Data System (ADS)

    Beech, Russell Scott

    With the continuing drive towards higher speed, density, and functionality in electronics, electrical interconnects become inadequate. Due to optics' high speed and bandwidth, freedom from capacitive loading effects, and freedom from crosstalk, optical interconnects can meet more stringent interconnect requirements. But, an optical interconnect requires additional components, such as an optical source and detector, lenses, holographic elements, etc. Fabrication and assembly of an optical interconnect requires precise alignment of these components. The successful development and deployment of optical interconnects depend on how easily the interconnect components can be aligned and/or how tolerant the interconnect is to misalignments. In this thesis, a method of quantitatively specifying the relative difficulty of properly aligning an optical interconnect is described. Ways of using this theory of alignment to obtain design and packaging guidelines for optical interconnects are examined. The measure of the ease with which an optical interconnect can be aligned, called the alignability, uses the efficiency of power transfer as a measure of alignment quality. The alignability is related to interconnect package design through the overall cost measure, which depends upon various physical parameters of the interconnect, such as the cost of the components and the time required for fabrication and alignment. Through a mutual dependence on detector size, the relationship between an interconnect's alignability and its bandwidth, signal-to-noise ratio, and bit-error rate is examined. The results indicate that a range of device sizes exists for which given performance threshold values are satisfied. Next, the alignability of integrated planar-optic backplanes is analyzed in detail. The resulting data show that the alignability can be optimized by varying the substrate thickness or the angle of reflection.
By including the effects of crosstalk, in a multi-channel backplane, the

  16. Aligned genomic data compression via improved modeling.

    PubMed

    Ochoa, Idoia; Hernaez, Mikel; Weissman, Tsachy

    2014-12-01

    With the release of the latest Next-Generation Sequencing (NGS) machine, the HiSeq X by Illumina, the cost of sequencing the whole genome of a human is expected to drop to a mere $1000. This milestone in sequencing history marks the era of affordable sequencing of individuals and opens the doors to personalized medicine. In accord, unprecedented volumes of genomic data will require storage for processing. There will be dire need not only of compressing aligned data, but also of generating compressed files that can be fed directly to downstream applications to facilitate the analysis of and inference on the data. Several approaches to this challenge have been proposed in the literature; however, focus thus far has been on the low coverage regime and most of the suggested compressors are not based on effective modeling of the data. We demonstrate the benefit of data modeling for compressing aligned reads. Specifically, we show that, by working with data models designed for the aligned data, we can improve considerably over the best compression ratio achieved by previously proposed algorithms. Our results indicate that the pareto-optimal barrier for compression rate and speed claimed by Bonfield and Mahoney (2013) [Bonfield JK and Mahoney MV, Compression of FASTQ and SAM format sequencing data, PLOS ONE, 8(3):e59190, 2013.] does not apply for high coverage aligned data. Furthermore, our improved compression ratio is achieved by splitting the data in a manner conducive to operations in the compressed domain by downstream applications.

  17. Aligned genomic data compression via improved modeling.

    PubMed

    Ochoa, Idoia; Hernaez, Mikel; Weissman, Tsachy

    2014-12-01

    With the release of the latest Next-Generation Sequencing (NGS) machine, the HiSeq X by Illumina, the cost of sequencing the whole genome of a human is expected to drop to a mere $1000. This milestone in sequencing history marks the era of affordable sequencing of individuals and opens the doors to personalized medicine. In accord, unprecedented volumes of genomic data will require storage for processing. There will be dire need not only of compressing aligned data, but also of generating compressed files that can be fed directly to downstream applications to facilitate the analysis of and inference on the data. Several approaches to this challenge have been proposed in the literature; however, focus thus far has been on the low coverage regime and most of the suggested compressors are not based on effective modeling of the data. We demonstrate the benefit of data modeling for compressing aligned reads. Specifically, we show that, by working with data models designed for the aligned data, we can improve considerably over the best compression ratio achieved by previously proposed algorithms. Our results indicate that the pareto-optimal barrier for compression rate and speed claimed by Bonfield and Mahoney (2013) [Bonfield JK and Mahoney MV, Compression of FASTQ and SAM format sequencing data, PLOS ONE, 8(3):e59190, 2013.] does not apply for high coverage aligned data. Furthermore, our improved compression ratio is achieved by splitting the data in a manner conducive to operations in the compressed domain by downstream applications. PMID:25395305

  18. An Accurate Scalable Template-based Alignment Algorithm.

    PubMed

    Gardner, David P; Xu, Weijia; Miranker, Daniel P; Ozer, Stuart; Cannone, Jamie J; Gutell, Robin R

    2012-12-31

    The rapid determination of nucleic acid sequences is increasing the number of sequences that are available. Inherent in a template or seed alignment is the culmination of structural and functional constraints that are selecting those mutations that are viable during the evolution of the RNA. While we might not understand these structural and functional constraints, template-based alignment programs utilize the patterns of sequence conservation to encapsulate the characteristics of viable RNA sequences that are aligned properly. We have developed a program that utilizes the different dimensions of information in rCAD, a large RNA informatics resource, to establish a profile for each position in an alignment. The most significant include sequence identity and column composition in different phylogenetic taxa. We have compared our methods with a maximum of eight alternative alignment methods on different sets of 16S and 23S rRNA sequences with sequence percent identities ranging from 50% to 100%. The results showed that CRWAlign outperformed the other alignment methods in both speed and accuracy. A web-based alignment server is available at http://www.rna.ccbb.utexas.edu/SAE/2F/CRWAlign.

  19. Java Expert System Shell Version 6.0

    2002-06-18

    Java Expert System Shell (Jess) is a rule engine and scripting environment written entirely in Sun's Java language. Jess was originally inspired by the CLIPS expert system shell, but has grown into a complete, distinct Java-influenced environment of its own. Using Jess, you can build Java applets and applications that have the capacity to "reason" using knowledge you supply in the form of declarative rules. Jess is surprisingly fast, and for some problems is faster than CLIPS. Many Jess scripts are valid CLIPS scripts and vice versa. Like CLIPS, Jess uses the Rete algorithm to process rules, a very efficient mechanism for solving the difficult many-to-many matching problem. Jess adds many features to CLIPS, including backwards chaining and the ability to manipulate and directly reason about Java objects. Jess is also a powerful Java scripting environment, from which you can create Java objects and call Java methods without compiling any Java code.

  20. Hole-Aligning Tool

    NASA Technical Reports Server (NTRS)

    Collins, Frank A.; Saude, Frank; Sep, Martin J.

    1996-01-01

    Tool designed for use in aligning holes in plates or other structural members to be joined by bolt through holes. Holes aligned without exerting forces perpendicular to planes of holes. Tool features screw-driven-wedge design similar to (but simpler than) that of some automotive exhaust-pipe-expanding tools.

  1. Java-based framework for the secure distribution of electronic medical records.

    PubMed

    Goh, A

    1999-01-01

    In this paper, we present a Java-based framework for the processing, storage and delivery of Electronic Medical Records (EMR). The choice of Java as a developmental and operational environment ensures operability over a wide-range of client-side platforms, with our on-going work emphasising migration towards Extensible Markup Language (XML) capable Web browser clients. Telemedicine in support of womb-to-tomb healthcare as articulated by the Multimedia Supercorridor (MSC) Telemedicine initiative--which motivated this project--will require high-volume data exchange over an insecure public-access Wide Area Network (WAN), thereby requiring a hybrid cryptosystem with both symmetric and asymmetric components. Our prototype framework features a pre-transaction authentication and key negotiation sequence which can be readily modified for client-side environments ranging from Web browsers without local storage capability to workstations with serial connectivity to a tamper-proof device, and also for point-to-multipoint transaction processes.

  2. Structural analysis of aligned RNAs.

    PubMed

    Voss, Björn

    2006-01-01

    The knowledge about classes of non-coding RNAs (ncRNAs) is growing very fast and it is mainly the structure which is the common characteristic property shared by members of the same class. For correct characterization of such classes it is therefore of great importance to analyse the structural features in great detail. In this manuscript I present RNAlishapes which combines various secondary structure analysis methods, such as suboptimal folding and shape abstraction, with a comparative approach known as RNA alignment folding. RNAlishapes makes use of an extended thermodynamic model and covariance scoring, which rewards covariation of paired bases. Applied to a set of bacterial trp-operon leaders using shape abstraction, the algorithm was able to identify the two alternating conformations of this attenuator. Besides providing in-depth analysis methods for aligned RNAs, the tool also shows fairly good prediction accuracy. Therefore, RNAlishapes provides the community with a powerful tool for structural analysis of classes of RNAs and is also a reasonable method for consensus structure prediction based on sequence alignments. RNAlishapes is available for online use and download at http://rna.cyanolab.de. PMID:17020924

  3. Pairwise alignment of protein interaction networks.

    PubMed

    Koyutürk, Mehmet; Kim, Yohan; Topkara, Umut; Subramaniam, Shankar; Szpankowski, Wojciech; Grama, Ananth

    2006-03-01

    With an ever-increasing amount of available data on protein-protein interaction (PPI) networks and research revealing that these networks evolve at a modular level, discovery of conserved patterns in these networks becomes an important problem. Although available data on protein-protein interactions is currently limited, recently developed algorithms have been shown to convey novel biological insights through employment of elegant mathematical models. The main challenge in aligning PPI networks is to define a graph theoretical measure of similarity between graph structures that captures underlying biological phenomena accurately. In this respect, modeling of conservation and divergence of interactions, as well as the interpretation of resulting alignments, are important design parameters. In this paper, we develop a framework for comprehensive alignment of PPI networks, which is inspired by duplication/divergence models that focus on understanding the evolution of protein interactions. We propose a mathematical model that extends the concepts of match, mismatch, and gap in sequence alignment to that of match, mismatch, and duplication in network alignment and evaluates similarity between graph structures through a scoring function that accounts for evolutionary events. By relying on evolutionary models, the proposed framework facilitates interpretation of resulting alignments in terms of not only conservation but also divergence of modularity in PPI networks. Furthermore, as in the case of sequence alignment, our model allows flexibility in adjusting parameters to quantify underlying evolutionary relationships. Based on the proposed model, we formulate PPI network alignment as an optimization problem and present fast algorithms to solve this problem. 
    Detailed experimental results from an implementation of the proposed framework show that our algorithm is able to discover conserved interaction patterns very effectively, in terms of both accuracies and computational

  4. Long Read Alignment with Parallel MapReduce Cloud Platform

    PubMed Central

    Al-Absi, Ahmed Abdulhakim; Kang, Dae-Ki

    2015-01-01

    Genomic sequence alignment is an important technique to decode genome sequences in bioinformatics. Next-Generation Sequencing technologies produce genomic data of longer reads. Cloud platforms are adopted to address the problems arising from storage and analysis of large genomic data. Existing gene sequencing tools for cloud platforms predominantly consider short read gene sequences and adopt the Hadoop MapReduce framework for computation. However, serial execution of map and reduce phases is a problem in such systems. Therefore, in this paper, we introduce Burrows-Wheeler Aligner's Smith-Waterman Alignment on Parallel MapReduce (BWASW-PMR) cloud platform for long sequence alignment. The proposed cloud platform adopts the widely accepted and accurate BWA-SW algorithm for long sequence alignment. A custom MapReduce platform is developed to overcome the drawbacks of the Hadoop framework. A parallel execution strategy of the MapReduce phases and optimization of the Smith-Waterman algorithm are considered. Performance evaluation results exhibit an average speed-up of 6.7 for BWASW-PMR compared with the state-of-the-art Bwasw-Cloud. An average reduction of 30% in the map phase makespan is reported across all experiments comparing BWASW-PMR with Bwasw-Cloud. Optimization of Smith-Waterman results in reducing the execution time by 91.8%. The experimental study proves the efficiency of BWASW-PMR for aligning long genomic sequences on cloud platforms. PMID:26839887
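
    As a rough illustration of the Smith-Waterman local alignment step that BWA-SW builds on, the following is a minimal Python sketch of the textbook dynamic program. It is not the BWASW-PMR implementation or its optimized variant; the function name `smith_waterman` and the scoring values (match 2, mismatch -1, linear gap -2) are assumptions chosen for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Textbook Smith-Waterman with a linear gap penalty; scoring values are
    illustrative assumptions, not parameters taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

    With these assumed scores, `smith_waterman("GGACGTCC", "TTACGTAA")` finds the shared segment `ACGT` with score 8. The quadratic table fill is the part that parallel or optimized variants typically target.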

  5. Long Read Alignment with Parallel MapReduce Cloud Platform.

    PubMed

    Al-Absi, Ahmed Abdulhakim; Kang, Dae-Ki

    2015-01-01

    Genomic sequence alignment is an important technique to decode genome sequences in bioinformatics. Next-Generation Sequencing technologies produce genomic data of longer reads. Cloud platforms are adopted to address the problems arising from storage and analysis of large genomic data. Existing gene sequencing tools for cloud platforms predominantly consider short read gene sequences and adopt the Hadoop MapReduce framework for computation. However, serial execution of map and reduce phases is a problem in such systems. Therefore, in this paper, we introduce Burrows-Wheeler Aligner's Smith-Waterman Alignment on Parallel MapReduce (BWASW-PMR) cloud platform for long sequence alignment. The proposed cloud platform adopts the widely accepted and accurate BWA-SW algorithm for long sequence alignment. A custom MapReduce platform is developed to overcome the drawbacks of the Hadoop framework. A parallel execution strategy of the MapReduce phases and optimization of the Smith-Waterman algorithm are considered. Performance evaluation results exhibit an average speed-up of 6.7 for BWASW-PMR compared with the state-of-the-art Bwasw-Cloud. An average reduction of 30% in the map phase makespan is reported across all experiments comparing BWASW-PMR with Bwasw-Cloud. Optimization of Smith-Waterman results in reducing the execution time by 91.8%. The experimental study proves the efficiency of BWASW-PMR for aligning long genomic sequences on cloud platforms.

  6. PipeAlign: A new toolkit for protein family analysis.

    PubMed

    Plewniak, Frédéric; Bianchetti, Laurent; Brelivet, Yann; Carles, Annaick; Chalmel, Frédéric; Lecompte, Odile; Mochel, Thiebaut; Moulinier, Luc; Muller, Arnaud; Muller, Jean; Prigent, Veronique; Ripp, Raymond; Thierry, Jean-Claude; Thompson, Julie D; Wicker, Nicolas; Poch, Olivier

    2003-07-01

    PipeAlign is a protein family analysis tool integrating a five step process ranging from the search for sequence homologues in protein and 3D structure databases to the definition of the hierarchical relationships within and between subfamilies. The complete, automatic pipeline takes a single sequence or a set of sequences as input and constructs a high-quality, validated MACS (multiple alignment of complete sequences) in which sequences are clustered into potential functional subgroups. For the more experienced user, the PipeAlign server also provides numerous options to run only a part of the analysis, with the possibility to modify the default parameters of each software module. For example, the user can choose to enter an existing multiple sequence alignment for refinement, validation and subsequent clustering of the sequences. The aim is to provide an interactive workbench for the validation, integration and presentation of a protein family, not only at the sequence level, but also at the structural and functional levels. PipeAlign is available at http://igbmc.u-strasbg.fr/PipeAlign/.

  8. Volume visualization of multiple alignment of genomic DNA

    SciTech Connect

    Shah, Nameeta; Weber, Gunther H.; Dillard, Scott E.; Hamann, Bernd

    2004-05-01

    Genomes of hundreds of species have been sequenced to date and many more are being sequenced. As more and more sequence data sets become available, and as the challenge of comparing these massive "billion basepair DNA sequences" becomes substantial, so does the need for more powerful tools supporting the exploration of these data sets. Similarity score data used to compare aligned DNA sequences is inherently one-dimensional. One-dimensional (1D) representations of these data sets do not effectively utilize screen real estate. We present a technique to arrange 1D data in 3D space to allow us to apply state-of-the-art interactive volume visualization techniques for data exploration. We provide results for aligned DNA sequence data and compare it with traditional 1D line plots. Our technique, coupled with 1D line plots, results in effective multiresolution visualization of very large aligned sequence data sets.

  9. Implementation of BT, SP, LU, and FT of NAS Parallel Benchmarks in Java

    NASA Technical Reports Server (NTRS)

    Schultz, Matthew; Frumkin, Michael; Jin, Hao-Qiang; Yan, Jerry

    2000-01-01

    A number of Java features make it an attractive but debatable choice for High Performance Computing. We have implemented the benchmarks BT, SP, LU, and FT, which work on a single structured grid, in Java. The performance and scalability of the Java code show that significant improvements in Java compiler technology and in Java thread implementation are necessary for Java to compete with Fortran in HPC applications.

  10. Java interface for asserting interactive telerobotic control

    NASA Astrophysics Data System (ADS)

    DePasquale, Peter; Lewis, John; Stein, Matthew R.

    1997-12-01

    Many current web-based telerobotic interfaces use HyperText Markup Language (HTML) forms to assert user control on a robot. While acceptable for some tasks, a Java interface can provide better client-server interaction. The Puma Paint project is a joint effort between the Department of Computing Sciences at Villanova University and the Department of Mechanical and Materials Engineering at Wilkes University. The project utilizes a Java applet to control a Unimation Puma 1760 robot during the task of painting on a canvas. The interface allows the user to control the paint strokes as well as the pressure of a brush on the canvas and how deep the brush is dipped into a paint jar. To provide immediate feedback, a virtual canvas models the effects of the controls as the artist paints. Live color video feedback is provided, allowing the user to view the actual results of the robot's motions. Unlike the step-at-a-time model of many web forms, the application permits the user to assert interactive control. The greater the complexity of the interaction between the robot and its environment, the greater the need for high quality information presentation to the user. The use of Java allows the sophistication of the user interface to be raised to the level required for satisfactory control. This paper describes the Puma Paint project, including the interface and communications model. It also examines the challenges of using the Internet as the medium of communications and the challenges of encoding free ranging motions for transmission from the client to the robot.

  11. Java implementation of Class Association Rule algorithms

    2007-08-30

    Java implementation of three Class Association Rule mining algorithms, NETCAR, CARapriori, and clustering based rule mining. NETCAR algorithm is a novel algorithm developed by Makio Tamura. The algorithm is discussed in a paper: UCRL-JRNL-232466-DRAFT, and would be published in a peer-reviewed scientific journal. The software is used to extract combinations of genes relevant to a phenotype from a phylogenetic profile and a phenotype profile. The phylogenetic profile is represented by a binary matrix and a phenotype profile is represented by a binary vector. The present application of this software will be in genome analysis, however, it could be applied more generally.

  12. Java implementation of Class Association Rule algorithms

    SciTech Connect

    Tamura, Makio

    2007-08-30

    Java implementation of three Class Association Rule mining algorithms, NETCAR, CARapriori, and clustering based rule mining. NETCAR algorithm is a novel algorithm developed by Makio Tamura. The algorithm is discussed in a paper: UCRL-JRNL-232466-DRAFT, and would be published in a peer-reviewed scientific journal. The software is used to extract combinations of genes relevant to a phenotype from a phylogenetic profile and a phenotype profile. The phylogenetic profile is represented by a binary matrix and a phenotype profile is represented by a binary vector. The present application of this software will be in genome analysis, however, it could be applied more generally.
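
    To make the data layout concrete, here is a small Python sketch of how a single class association rule of the form "these genes are present, therefore the phenotype is present" could be scored against a binary phylogenetic profile and a binary phenotype vector. It uses the standard support and confidence measures from association rule mining; it is not the NETCAR or CARapriori algorithm, and the function name `rule_stats` and the row-per-genome layout are assumptions for illustration.

```python
def rule_stats(profile, phenotype, genes):
    """Support and confidence of the rule {all genes present} -> phenotype.

    profile   : list of rows, one per genome; row[g] is 1 if gene g is present
                (assumed layout, chosen for this example)
    phenotype : list of 0/1 values, one per genome
    genes     : indices of the genes on the left-hand side of the rule
    """
    n = len(profile)
    # A genome is "covered" when every gene in the rule is present in it.
    covered = [all(row[g] for g in genes) for row in profile]
    n_covered = sum(covered)
    n_both = sum(1 for c, p in zip(covered, phenotype) if c and p)
    support = n_both / n
    confidence = n_both / n_covered if n_covered else 0.0
    return support, confidence


# Four hypothetical genomes, three genes.
profile = [
    [1, 1, 0],
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
]
phenotype = [1, 1, 0, 0]
```

    In this toy data set, the rule using genes 0 and 1 together covers two genomes, both with the phenotype, so `rule_stats(profile, phenotype, (0, 1))` gives a support of 0.5 and a confidence of 1.0. A rule miner would enumerate candidate gene combinations and keep those whose support and confidence exceed chosen thresholds.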

  13. HubAlign: an accurate and efficient method for global alignment of protein–protein interaction networks

    PubMed Central

    Hashemifar, Somaye; Xu, Jinbo

    2014-01-01

    Motivation: High-throughput experimental techniques have produced a large amount of protein–protein interaction (PPI) data. The study of PPI networks, such as comparative analysis, shall benefit the understanding of life process and diseases at the molecular level. One way of comparative analysis is to align PPI networks to identify conserved or species-specific subnetwork motifs. A few methods have been developed for global PPI network alignment, but it still remains challenging in terms of both accuracy and efficiency. Results: This paper presents a novel global network alignment algorithm, denoted as HubAlign, that makes use of both network topology and sequence homology information, based upon the observation that topologically important proteins in a PPI network usually are much more conserved and thus, more likely to be aligned. HubAlign uses a minimum-degree heuristic algorithm to estimate the topological and functional importance of a protein from the global network topology information. Then HubAlign aligns topologically important proteins first and gradually extends the alignment to the whole network. Extensive tests indicate that HubAlign greatly outperforms several popular methods in terms of both accuracy and efficiency, especially in detecting functionally similar proteins. Availability: HubAlign is available freely for non-commercial purposes at http://ttic.uchicago.edu/~hashemifar/software/HubAlign.zip Contact: jinboxu@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25161231

  14. Precision alignment device

    DOEpatents

    Jones, Nelson E.

    1990-01-01

    Apparatus for providing automatic alignment of beam devices having an associated structure for directing, collimating, focusing, reflecting, or otherwise modifying the main beam. A reference laser is attached to the structure enclosing the main beam producing apparatus and produces a reference beam substantially parallel to the main beam. Detector modules containing optical switching devices and optical detectors are positioned in the path of the reference beam and are effective to produce an electrical output indicative of the alignment of the main beam. This electrical output drives servomotor operated adjustment screws to adjust the position of elements of the structure associated with the main beam to maintain alignment of the main beam.

  15. Precision alignment device

    DOEpatents

    Jones, N.E.

    1988-03-10

    Apparatus for providing automatic alignment of beam devices having an associated structure for directing, collimating, focusing, reflecting, or otherwise modifying the main beam. A reference laser is attached to the structure enclosing the main beam producing apparatus and produces a reference beam substantially parallel to the main beam. Detector modules containing optical switching devices and optical detectors are positioned in the path of the reference beam and are effective to produce an electrical output indicative of the alignment of the main beam. This electrical output drives servomotor operated adjustment screws to adjust the position of elements of the structure associated with the main beam to maintain alignment of the main beam. 5 figs.

  16. Galaxy Alignments: An Overview

    NASA Astrophysics Data System (ADS)

    Joachimi, Benjamin; Cacciato, Marcello; Kitching, Thomas D.; Leonard, Adrienne; Mandelbaum, Rachel; Schäfer, Björn Malte; Sifón, Cristóbal; Hoekstra, Henk; Kiessling, Alina; Kirk, Donnacha; Rassat, Anais

    2015-11-01

    The alignments between galaxies, their underlying matter structures, and the cosmic web constitute vital ingredients for a comprehensive understanding of gravity, the nature of matter, and structure formation in the Universe. We provide an overview on the state of the art in the study of these alignment processes and their observational signatures, aimed at a non-specialist audience. The development of the field over the past one hundred years is briefly reviewed. We also discuss the impact of galaxy alignments on measurements of weak gravitational lensing, and discuss avenues for making theoretical and observational progress over the coming decade.

  17. Radiative Grain Alignment

    NASA Astrophysics Data System (ADS)

    Andersson, B. G.

    2015-12-01

    Polarization due to aligned dust grains was discovered in the interstellar medium more than 60 years ago. A quantitative, observationally well tested theory of the phenomenon has finally emerged in the last decade, promising not only an improved understanding of interstellar magnetic fields, but new tools for studying the dust environments and grain characteristics. This Radiative Alignment Torque (RAT) theory also has many potential applications in solar system physics, including for comet dust characteristics. I will review the main aspects of the theory and the observational tests performed to date, as well as some of the new possibilities for using polarization as a tool to study dust and its environment, with RAT alignment.

  18. Hybrid vehicle motor alignment

    DOEpatents

    Levin, Michael Benjamin

    2001-07-03

    A rotor of an electric motor is aligned to the axis of rotation of a crankshaft of an internal combustion engine in a motor vehicle having an internal combustion engine and an electric motor. A first locator is provided on the crankshaft, and a piloting tool is located radially by the first locator to the crankshaft. A stator of the electric motor is aligned to a second locator provided on the piloting tool. The stator is secured to the engine block. The rotor is aligned to the crankshaft and secured thereto.

  19. A Geostationary Earth Orbit Satellite Model Using Easy Java Simulation

    ERIC Educational Resources Information Center

    Wee, Loo Kang; Goh, Giam Hwee

    2013-01-01

    We develop an Easy Java Simulation (EJS) model for students to visualize geostationary orbits near Earth, modelled using a Java 3D implementation of the EJS 3D library. The simplified physics model is described and simulated using a simple constant angular velocity equation. We discuss four computer model design ideas: (1) a simple and realistic…
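
    The constant angular velocity model mentioned in this abstract can be sketched in a few lines of Python. This is not the EJS model itself: the constants are standard published values (Earth's gravitational parameter and the sidereal day), and the function names are assumptions for the example. The orbit radius follows from equating gravitational and centripetal acceleration, r = (GM / omega^2)^(1/3), and the position is then a uniform circular motion.

```python
import math

GM_EARTH = 3.986004418e14   # Earth's gravitational parameter, m^3/s^2
SIDEREAL_DAY = 86164.0905   # Earth's rotation period, s

# A geostationary satellite turns with the Earth: omega = 2*pi / T.
OMEGA = 2 * math.pi / SIDEREAL_DAY


def geostationary_radius():
    """Orbit radius from GM / r^2 = omega^2 * r, measured from Earth's center (m)."""
    return (GM_EARTH / OMEGA ** 2) ** (1 / 3)


def position(t, phase=0.0):
    """Position (x, y) in the equatorial plane at time t (s), inertial frame."""
    r = geostationary_radius()
    angle = OMEGA * t + phase
    return r * math.cos(angle), r * math.sin(angle)
```

    Under these values the radius comes out at roughly 42,164 km from the Earth's center, and `position(SIDEREAL_DAY / 4)` places the satellite a quarter turn from its starting point.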

  20. JAVA SWING-BASED PLOTTING PACKAGE RESIDING WITHIN XAL

    SciTech Connect

    Shishlo, Andrei P; Chu, Paul; Pelaia II, Tom

    2007-01-01

    A data plotting package residing in the XAL tools set is presented. This package is based on Java SWING, and therefore it has the same portability as Java itself. The data types for charts, bar-charts, and color-surface plots are described. The algorithms, performance, interactive capabilities, limitations, and the best usage practices of this plotting package are discussed.

  1. Java: A New Brew for Educators, Administrators and Students.

    ERIC Educational Resources Information Center

    Gordon, Barbara

    1996-01-01

    Java is an object-oriented programming language developed by Sun Microsystems; its benefits include platform independence, security, and interactivity. Within the college community, Java is being used in programming courses, collaborative technology research projects, computer graphics instruction, and distance education. (AEF)

  2. Paintbrush of Discovery: Using Java Applets to Enhance Mathematics Education

    ERIC Educational Resources Information Center

    Eason, Ray; Heath, Garrett

    2004-01-01

    This article addresses the enhancement of the learning environment by using Java applets in the mathematics classroom. Currently, the first year mathematics program at the United States Military Academy involves one semester of modeling with discrete dynamical systems (DDS). Several faculty members from the Academy have integrated Java applets…

  3. Dynamic Learning Objects to Teach Java Programming Language

    ERIC Educational Resources Information Center

    Narasimhamurthy, Uma; Al Shawkani, Khuloud

    2010-01-01

    This article describes a model for teaching Java Programming Language through Dynamic Learning Objects. The design of the learning objects was based on effective learning design principles to help students learn the complex topic of Java Programming. Visualization was also used to facilitate the learning of the concepts. (Contains 1 figure and 2…

  4. Real-time Java for flight applications: an update

    NASA Technical Reports Server (NTRS)

    Dvorak, D.

    2003-01-01

    The RTSJ is a specification for supporting real-time execution in the Java programming language. The specification has been shaped by several guiding principles, particularly: predictable execution as the first priority in all tradeoffs, no syntactic extensions to Java, and backward compatibility.

  5. High-Performance Java Codes for Computational Fluid Dynamics

    NASA Technical Reports Server (NTRS)

    Riley, Christopher; Chatterjee, Siddhartha; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

    2001-01-01

    The computational science community is reluctant to write large-scale computationally intensive applications in Java due to concerns over Java's poor performance, despite the claimed software engineering advantages of its object-oriented features. Naive Java implementations of numerical algorithms can perform poorly compared to corresponding Fortran or C implementations. To achieve high performance, Java applications must be designed with good performance as a primary goal. This paper presents the object-oriented design and implementation of two real-world applications from the field of Computational Fluid Dynamics (CFD): a finite-volume fluid flow solver (LAURA, from NASA Langley Research Center), and an unstructured mesh adaptation algorithm (2D_TAG, from NASA Ames Research Center). This work builds on our previous experience with the design of high-performance numerical libraries in Java. We examine the performance of the applications using the currently available Java infrastructure and show that the Java version of the flow solver LAURA performs almost within a factor of 2 of the original procedural version. Our Java version of the mesh adaptation algorithm 2D_TAG performs within a factor of 1.5 of its original procedural version on certain platforms. Our results demonstrate that object-oriented software design principles are not necessarily inimical to high performance.

  6. Developmental Process Model for the Java Intelligent Tutoring System

    ERIC Educational Resources Information Center

    Sykes, Edward

    2007-01-01

    The Java Intelligent Tutoring System (JITS) was designed and developed to support the growing trend of Java programming around the world. JITS is an advanced web-based personalized tutoring system that is unique in several ways. Most programming Intelligent Tutoring Systems require the teacher to author problems with corresponding solutions. JITS,…

  7. JavaScript: Convenient Interactivity for the Class Web Page.

    ERIC Educational Resources Information Center

    Gray, Patricia

    This paper shows how JavaScript can be used within HTML pages to add interactive review sessions and quizzes incorporating graphics and sound files. JavaScript has the advantage of providing basic interactive functions without the use of separate software applications and players. Because it can be part of a standard HTML page, it is…

  8. Multi-threading the generation of Burrows-Wheeler Alignment.

    PubMed

    Jo, H

    2016-01-01

    Along with recent progress in next-generation sequencing technology, it has become easier to process larger amounts of genome sequencing data at a lower cost. The most time-consuming step of next-generation sequencing data analysis involves the mapping of read data into a reference genome. Although the Burrows-Wheeler Alignment (BWA) tool is one of the most widely used open-source software tools for aligning read sequences, it still has a limitation in that it does not fully support a multi-thread mechanism during the alignment generation step. In this article, we propose a BWA-MT tool based on BWA that supports multi-thread mechanisms for processing alignment generation. To evaluate BWA-MT, we used an evaluation system equipped with 24 cores and 128 GB of memory. As workloads, we used the hg19 human genome reference sequence and sequences of various read sizes from 1 to 40 M spots. In our evaluation, BWA-MT showed a maximum of 3.66-times better performance, and generated the same Sequence Alignment/Map result file as that of BWA. Although the ability to speed up the procedure might be dependent on computing resources, we confirmed that BWA-MT is a highly effective and fast alignment tool. PMID:27323088
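
    The general idea of distributing per-read work across threads while keeping the output identical to a serial run can be sketched as follows. This is only an illustration in Python: `align_read` is a hypothetical stand-in that does an exact substring search, not BWA's algorithm, and none of the names come from BWA-MT.

```python
from concurrent.futures import ThreadPoolExecutor


def align_read(reference, read):
    """Hypothetical stand-in for an aligner: offset of an exact match, or -1."""
    return reference.find(read)


def align_serial(reference, reads):
    """Process reads one after another."""
    return [align_read(reference, read) for read in reads]


def align_parallel(reference, reads, workers=4):
    """Process reads on a thread pool.

    Executor.map returns results in input order, so the output matches the
    serial version regardless of which thread finishes first.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda read: align_read(reference, read), reads))


reference = "ACGTACGTTAGC"
reads = ["ACGT", "TTAG", "GGGG", "CGTA"]
```

    Each read is independent, which is what makes this step easy to parallelize. In CPython, threads do not speed up CPU-bound pure-Python code because of the global interpreter lock, so the sketch shows the structure and the order-preserving property rather than a realistic speed-up.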

  9. Sequence information signal processor

    DOEpatents

    Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

    1999-01-01

    An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.

  10. Methylation of cytosine at C5 in a CpG sequence context causes a conformational switch of a benzo[a]pyrene diol epoxide-N2-guanine adduct in DNA from a minor groove alignment to intercalation with base displacement.

    SciTech Connect

    Zhang, N.; Lin, C.; Huang, X.; Kolbanovskiy, A.; Hingerty, Brian E; Amin, S.; Broyde, S.; Geacintov, N. E.; Patel, D. J.

    2005-03-01

    It is well known that CpG dinucleotide steps in DNA, which are highly methylated at the 5-position of cytosine (meC) in human tissues, exhibit a disproportionate number of mutations within certain codons of the p53 gene. There is ample published evidence indicating that the reactivity of guanine with anti-B[a]PDE (a metabolite of the environmental carcinogen benzo[a]pyrene) at CpG mutation hot spots is enhanced by the methylation of the cytosine residue flanking the target guanine residue on the 5'-side. In this work we demonstrate that such a methylation can also dramatically affect the conformational characteristics of an adduct derived from the reaction of one of the two enantiomers of anti-B[a]PDE with the exocyclic amino group of guanine ([BP]G adduct). A detailed NMR study indicates that the 10R (-)-trans-anti-[BP]G adduct undergoes a transition from a minor groove-binding alignment of the aromatic BP ring system in the unmethylated C-[BP]G sequence context, to an intercalative BP alignment with a concomitant displacement of the modified guanine residue into the minor groove in the methylated meC-[BP]G sequence context. By contrast, a minor groove-binding alignment was observed for the stereoisomeric 10S (+)-trans-anti-[BP]G adduct in both the C-[BP]G and meC-[BP]G sequence contexts. This remarkable conformational switch resulting from the presence of a single methyl group at the 5-position of the cytosine residue flanking the lesion on the 5'-side, is attributed to the hydrophobic effect of the methyl group that can stabilize intercalated adduct conformations in an adduct stereochemistry-dependent manner. Such conformational differences in methylated and unmethylated CpG sequences may be significant because of potential alterations in the cellular processing of the [BP]G adducts by DNA transcription, replication, and repair enzymes.

  11. FRESCO: flexible alignment with rectangle scoring schemes.

    PubMed

    Dalca, A V; Brudno, M

    2008-01-01

    While the popular DNA sequence alignment tools incorporate powerful heuristics to allow for fast and accurate alignment of DNA, most of them still optimize the classical Needleman-Wunsch scoring scheme. The development of novel scoring schemes is often hampered by the difficulty of finding an optimizing algorithm for each non-trivial scheme. In this paper we define the broad class of rectangle scoring schemes, and describe an algorithm and tool that can align two sequences with an arbitrary rectangle scoring scheme in polynomial time. Rectangle scoring schemes encompass some of the popular alignment scoring metrics currently in use, as well as many other functions. We investigate a novel scoring function based on minimizing the expected number of random diagonals observed with the given scores and show that it rivals the LAGAN and Clustal-W aligners, without using any biological or evolutionary parameters. The FRESCO program, freely available at http://compbio.cs.toronto.edu/fresco, gives bioinformatics researchers the ability to quickly compare the performance of other complex scoring formulas without having to implement new algorithms to optimize them.
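
    For reference, the classical scoring scheme that this abstract contrasts with rectangle scoring can be written as a short dynamic program. The Python sketch below computes only the optimal global alignment score under a match/mismatch/linear-gap scheme; it does not implement rectangle scoring or anything specific to FRESCO, and the function name and default scores are assumptions for the example.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b with a linear gap penalty."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix costs i gaps
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]
```

    With the assumed scores, aligning `ACGT` against `AGT` gives 2 (three matches and one gap). Every cell depends only on its three neighbors, which is the property that more general scoring schemes give up and that makes finding an optimizing algorithm for them harder.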

  12. JPARSS: A Java Parallel Network Package for Grid Computing

    SciTech Connect

    Chen, Jie; Akers, Walter; Chen, Ying; Watson, William

    2002-03-01

    The emergence of high speed wide area networks makes grid computing a reality. However, grid applications that need reliable data transfer still have difficulty achieving optimal TCP performance due to network tuning of TCP window size to improve bandwidth and to reduce latency on a high speed wide area network. This paper presents a Java package called JPARSS (Java Parallel Secure Stream (Socket)) that divides data into partitions that are sent over several parallel Java streams simultaneously and allows Java or Web applications to achieve optimal TCP performance in a grid environment without the necessity of tuning TCP window size. This package enables single sign-on, certificate delegation and secure or plain-text data transfer using several security components based on X.509 certificate and SSL. Several experiments will be presented to show that using Java parallel streams is more effective than tuning TCP window size. In addition a simple architecture using Web services
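
    The partition-and-reassemble idea described in this abstract can be illustrated with a small in-memory Python sketch. It does not use sockets, SSL, or any JPARSS API; `send_chunk` is a hypothetical placeholder for writing one partition to one stream, and all names are assumptions for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_streams):
    """Split data into n_streams indexed chunks of near-equal size."""
    size = -(-len(data) // n_streams)  # ceiling division
    return [(i, data[i * size:(i + 1) * size]) for i in range(n_streams)]


def send_chunk(indexed_chunk):
    """Hypothetical stand-in for sending one partition over one stream."""
    index, chunk = indexed_chunk
    return index, bytes(chunk)


def parallel_transfer(data, n_streams=4):
    """Send all partitions concurrently and reassemble them by index."""
    parts = partition(data, n_streams)
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        received = list(pool.map(send_chunk, parts))
    # Real streams may complete in any order, so sort by partition index.
    received.sort(key=lambda item: item[0])
    return b"".join(chunk for _, chunk in received)
```

    Carrying the partition index with each chunk is what lets the receiver rebuild the original byte sequence no matter which stream finishes first.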

  13. An implicit spatial memory alignment effect.

    PubMed

    Cerles, Mélanie; Gomez, Alice; Rousset, Stéphane

    2015-09-01

    The memory alignment effect is the advantage of reasoning from a perspective which is aligned with the frame of reference used to encode an environment in memory. It usually occurs when participants have to consciously take a perspective to perform a spatial memory task. The present experiment assesses whether the memory alignment effect can occur without requiring to consciously take a given perspective, when the misaligned perspective is only perceptively provided. In other words, does the memory alignment effect still arise when it is only implicitly prompted? Thirty participants learned a sequence of four objects' positions in a room from a north-as-up survey perspective. During the testing phase, they had to point to the direction of a target object from another object ('the reference') with a fixed north-up orientation. The background behind the reference object displayed either a uniform color (control condition) or a misaligned ground-level perspective. The latter displayed a reference object's position information which was either congruent with the studied environment (congruent misaligned condition) or incongruent (incongruent misaligned condition). Mean pointing errors were higher in the congruent misaligned condition than in the control condition, whereas the incongruent misaligned condition did not differ from the control one. The present study shows that the memory alignment effect can arise without requiring a conscious misaligned perspective taking. Moreover, the perceived misaligned perspective must share the same spatial content as the memorized spatial representation in order to induce an alignment effect. PMID:26233526

  14. HIV Sequence Compendium 2015

    SciTech Connect

    Foley, Brian Thomas; Leitner, Thomas Kenneth; Apetrei, Cristian; Hahn, Beatrice; Mizrachi, Ilene; Mullins, James; Rambaut, Andrew; Wolinsky, Steven; Korber, Bette Tina Marie

    2015-10-05

    This compendium is an annual printed summary of the data contained in the HIV sequence database. We try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2015. Hence, though it is published in 2015 and called the 2015 Compendium, its contents correspond to the 2014 curated alignments on our website. The number of sequences in the HIV database is still increasing. In total, at the end of 2014, there were 624,121 sequences in the HIV Sequence Database, an increase of 7% since the previous year. This is the first year that the number of new sequences added to the database has decreased compared to the previous year. The number of near complete genomes (>7000 nucleotides) increased to 5834 by end of 2014. However, as in previous years, the compendium alignments contain only a fraction of these. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  15. Analysis of viral protein-2 encoding gene of avian encephalomyelitis virus from field specimens in Central Java region, Indonesia

    PubMed Central

    Haryanto, Aris; Ermawati, Ratna; Wati, Vera; Irianingsih, Sri Handayani; Wijayanti, Nastiti

    2016-01-01

    Aim: Avian encephalomyelitis (AE) is a viral disease which can infect various types of poultry, especially chicken. In Indonesia, the incidence of AE infection in chicken has been reported since 2009, and the incidence tends to increase from year to year. The objective of this study was to analyze the viral protein 2 (VP-2) encoding gene of AE virus (AEV) from various species of birds in field specimens by reverse transcription polymerase chain reaction (RT-PCR) amplification using specific nucleotide primers for confirmation of AE diagnosis. Materials and Methods: A total of 13 AEV samples were isolated from various species of poultry that were serologically diagnosed as infected by AEV from some areas in Central Java, Indonesia. The research stages consisted of virus sample collection from field specimens, extraction of AEV RNA, amplification of the VP-2 protein encoding gene by RT-PCR, separation of the RT-PCR product by agarose gel electrophoresis, DNA sequencing and data analysis. Results: Amplification of the VP-2 encoding gene of AEV by RT-PCR from various types of poultry from field specimens showed a positive result for sample code 499/4/12, which generated a DNA fragment of 619 bp. The sensitivity test of RT-PCR amplification showed that the minimum concentration of RNA template is 127.75 ng/µl. The multiple alignment of the DNA sequencing product indicated that the positive sample with code 499/4/12 has 92% nucleotide homology with AEV accession number AV1775/07 and 85% nucleotide homology with accession number ZCHP2/0912695 from the GenBank database. Analysis of the VP-2 gene sequence found 46 nucleotide differences between isolate 499/4/12 and accession number AV1775/07 and 93 nucleotide differences with accession number ZCHP2/0912695. Conclusions: Analyses of the VP-2 encoding gene of AEV with the RT-PCR method from 13 samples from field specimens generated a DNA fragment of 619 bp from one sample with sample code 499
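
    The homology percentages and nucleotide difference counts quoted in this abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes that each pairwise comparison spans the full 619 bp fragment, which the abstract does not state, and the function name percent_identity is hypothetical.

```python
def percent_identity(differences: int, aligned_length: int) -> float:
    """Return the percentage of aligned positions that are identical."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= differences <= aligned_length:
        raise ValueError("differences must lie between 0 and aligned_length")
    return 100.0 * (aligned_length - differences) / aligned_length


# Assumption: both comparisons cover the whole 619 bp RT-PCR fragment.
FRAGMENT_LENGTH = 619

for accession, differences in (("AV1775/07", 46), ("ZCHP2/0912695", 93)):
    identity = percent_identity(differences, FRAGMENT_LENGTH)
    print(f"{accession}: {identity:.1f}% identity")
```

    Under that assumption, the two difference counts give identities close to the 92% and 85% figures reported above.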

  16. Anatomy of the western Java plate interface from depth-migrated seismic images

    USGS Publications Warehouse

    Kopp, H.; Hindle, D.; Klaeschen, D.; Oncken, O.; Reichert, C.; Scholl, D.

    2009-01-01

    Newly pre-stack depth-migrated seismic images resolve the structural details of the western Java forearc and plate interface. The structural segmentation of the forearc into discrete mechanical domains correlates with distinct deformation styles. Approximately 2/3 of the trench sediment fill is detached and incorporated into frontal prism imbricates, while the floor sequence is underthrust beneath the décollement. Western Java, however, differs markedly from margins such as Nankai or Barbados, where a uniform, continuous décollement reflector has been imaged. In our study area, the plate interface reveals a spatially irregular, nonlinear pattern characterized by the morphological relief of subducted seamounts and thicker than average patches of underthrust sediment. The underthrust sediment is associated with a low velocity zone as determined from wide-angle data. Active underplating is not resolved, but likely contributes to the uplift of the large bivergent wedge that constitutes the forearc high. Our profile is located 100 km west of the 2006 Java tsunami earthquake. The heterogeneous décollement zone regulates the friction behavior of the shallow subduction environment where the earthquake occurred. The alternating pattern of enhanced frictional contact zones associated with oceanic basement relief and weak material patches of underthrust sediment influences seismic coupling and possibly contributed to the heterogeneous slip distribution. Our seismic images resolve a steeply dipping splay fault, which originates at the décollement and terminates at the sea floor and which potentially contributes to tsunami generation during co-seismic activity. © 2009 Elsevier B.V.

  17. PDV Probe Alignment Technique

    SciTech Connect

    Whitworth, T L; May, C M; Strand, O T

    2007-10-26

    This alignment technique was developed while performing heterodyne velocimetry measurements at LLNL. There are a few minor items needed, such as a white card with aperture in center, visible alignment laser, IR back reflection meter, and a microscope to view the bridge surface. The work was performed on KCP flyers that were 6 and 8 mils wide. The probes used were Oz Optics manufactured with focal distances of 42mm and 26mm. Both probes provide a spot size of approximately 80 μm at 1550nm. The 42mm probes were specified to provide an internal back reflection of -35 to -40dB, and the probe back reflections were measured to be -37dB and -33dB. The 26mm probes were specified as -30dB and both measured -30.5dB. The probe is initially aligned normal to the flyer/bridge surface. This provides a very high return signal, up to -2dB, due to the bridge reflectivity. A white card with a hole in the center as an aperture can be used to check the reflected beam position relative to the probe and launch beam, and the alignment laser spot centered on the bridge, see Figure 1 and Figure 2. The IR back reflection meter is used to measure the dB return from the probe and surface, and a white card or similar object is inserted between the probe and surface to block surface reflection. It may take several iterations between the visible alignment laser and the IR back reflection meter to complete this alignment procedure. Once aligned normal to the surface, the probe should be tilted to position the visible alignment beam as shown in Figure 3, and the flyer should be translated in the X and Y axis to reposition the alignment beam onto the flyer as shown in Figure 4. This tilting of the probe minimizes the amount of light from the bridge reflection into the fiber within the probe while maintaining the alignment as near normal to the flyer surface as possible. When the back reflection is measured after the tilt adjustment, the level should be about -3dB to -6dB higher than the probes

  18. Curriculum Alignment Research Suggests that Alignment Can Improve Student Achievement

    ERIC Educational Resources Information Center

    Squires, David

    2012-01-01

    Curriculum alignment research has developed showing the relationship among three alignment categories: the taught curriculum, the tested curriculum and the written curriculum. Each pair (for example, the taught and the written curriculum) shows a positive impact for aligning those results. Following this, alignment results from the Third…

  19. An explorative multiproxy approach to characterize the ecospace of Homo erectus at Sangiran (Java, Indonesia)

    NASA Astrophysics Data System (ADS)

    Hertler, Christine; Haupt, Susanne; Lüdecke, Tina; Wirkner, Mathias; Bruch, Angela

    2015-04-01

    Homo erectus inhabited the islands of the Sunda Shelf in the late Early Pleistocene. This is illustrated by an extensive record of hominid specimens stemming from a variety of sites in Java. The hominid locality Sangiran plays a crucial role in studying related environments, because the geological record at the Sangiran dome covers a stratigraphic sequence, unlike any other hominid site in Java. Although the detailed chronology of the localities in Java is still under dispute, it covers the period between the late Early and early Middle Pleistocene. Fossil evidence includes the hominin specimens proper, diverse and evolving vertebrate faunas as well as pollen profiles. We applied a multiproxy approach to analyse and reconstruct features of the Homo erectus ecospace. Preliminary results of our explorative study are introduced in this paper. Based on the pollen record, we reconstructed temperature and precipitation for the major stratigraphic units. Although resulting values are averaging over wide chronological intervals, they illustrate general climatic trends in the late Early and early Middle Pleistocene in accordance with previous studies and the MIS record. The mammalian specimens we selected for this preliminary study possess a more restricted stratigraphic provenience. Our analyses are based on a dental sample of Duboisia santeng from the Koenigswald collection (n=14). The occurrence of the taxon is restricted to 3 layers in the stratigraphy. We reconstructed body mass and inferred diet from mesowear and isotope studies. There is no significant shift in body masses of Duboisia santeng. This result is in accordance with studies from other localities in Java. However, slight shifts in the mesowear signals (mixed feeder with increasingly browsing signal) are confirmed by studies of carbon isotopes. The analysis of oxygen isotopes provides evidence for seasonality which is compared with the signals from the vegetation.

  20. Petroleum systems of the Northwest Java Province, Java and offshore southeast Sumatra, Indonesia

    USGS Publications Warehouse

    Bishop, Michele G.

    2000-01-01

    Mature, synrift lacustrine shales of Eocene to Oligocene age and mature, late-rift coals and coaly shales of Oligocene to Miocene age are source rocks for oil and gas in two important petroleum systems of the onshore and offshore areas of the Northwest Java Basin. Biogenic gas and carbonate-sourced gas have also been identified. These hydrocarbons are trapped primarily in anticlines and fault blocks involving sandstone and carbonate reservoirs. These source rocks and reservoir rocks were deposited in a complex of Tertiary rift basins formed from single or multiple half-grabens on the south edge of the Sunda Shelf plate. The overall transgressive succession was punctuated by clastic input from the exposed Sunda Shelf and marine transgressions from the south. The Northwest Java province may contain more than 2 billion barrels of oil equivalent in addition to the 10 billion barrels of oil equivalent already identified.

  1. Topological network alignment uncovers biological function and phylogeny

    PubMed Central

    Kuchaiev, Oleksii; Milenković, Tijana; Memišević, Vesna; Hayes, Wayne; Pržulj, Nataša

    2010-01-01

    Sequence comparison and alignment has had an enormous impact on our understanding of evolution, biology and disease. Comparison and alignment of biological networks will probably have a similar impact. Existing network alignments use information external to the networks, such as sequence, because no good algorithm for purely topological alignment has yet been devised. In this paper, we present a novel algorithm based solely on network topology that can be used to align any two networks. We apply it to biological networks to produce by far the most complete topological alignments of biological networks to date. We demonstrate that both species phylogeny and detailed biological function of individual proteins can be extracted from our alignments. Topology-based alignments have the potential to provide a completely new, independent source of phylogenetic information. Our alignment of the protein–protein interaction networks of two very different species, yeast and human, indicates that even distant species share a surprising amount of network topology, suggesting broad similarities in internal cellular wiring across all life on Earth. PMID:20236959

  2. Mineralogy of a perudic Andosol in central Java, Indonesia

    SciTech Connect

    Van Ranst, Eric; Utami, S. R.; Verdoodt, A.; Qafoku, Nikolla

    2008-02-15

    We studied the mineralogy of a perudic Andosol developed on the Dieng Tephra Sequence in central Java, Indonesia. The objective was to confirm the presence and determine the origin and stability of 2:1 and interlayered 2:1 phyllosilicates in well-drained Andosols. This was and still is a debated topic in the literature. Total elemental and selective dissolution, as well as microscopic and X-ray diffraction analyses, were performed on the soil samples collected from this site. These analyses confirmed that andic properties were present in the soil samples. The allophane content determined by selective dissolution was 3-4% in the A horizons, and increased to 12-18% in the deeper subsoil horizons. In addition, the clay fraction contained dioctahedral smectite, hydroxy-Al-interlayered 2:1 minerals (HIS), Al-chlorite, kaolinite, pyrophyllite, mica, cristobalite and some gibbsite. The silt and sand fractions were rich in plagioclase and pyroxene. The 2:1 minerals (smectite and pyrophyllite), as well as chlorite and kaolinite were of hydrothermal origin and were incorporated in the tephra during volcanic eruption. Besides desilication during dissolution of unstable minerals, Al interlayering of 2:1 layer silicates was most likely the most prominent pedogenic process. Although hydroxy-Al polymeric interlayers would normally stabilize the 2:1 clay phases, the strong weakening, and even disappearance of the characteristic XRD peaks, indicated instability of these minerals in the upper A horizons due to the perudic and intensive leaching conditions.

  3. Enhancement of initial equivalency for protein structure alignment based on encoded local structures.

    PubMed

    Hung, Kenneth; Wang, Jui-Chih; Chen, Cheng-Wei; Chuang, Cheng-Long; Tsai, Kun-Nan; Chen, Chung-Ming

    2012-11-01

    Most alignment algorithms find an initial equivalent residue pair followed by an iterative optimization process to explore better near-optimal alignments in the surrounding solution space of the initial alignment. The initial alignment plays a decisive role in determining the alignment quality, since a poor initial alignment may leave the final alignment trapped in an undesirable local optimum even with an iterative optimization. We proposed a vector-based alignment algorithm, called MIRAGE-align, with a new initial alignment approach accounting for local structure features. The new idea is to enhance the quality of the initial alignment based on encoded local structural alphabets to identify the protein structure pair whose sequence identity falls in or below the twilight zone. The statistical analysis of alignment quality based on Match Index (MI) and computation time demonstrated that the MIRAGE-align algorithm outperformed four previously published algorithms, i.e., the residue-based algorithm (CE), the vector-based algorithm (SSM), TM-align, and Fr-TM-align. MIRAGE-align yields a better estimate of the initial solution to enhance the quality of the initial alignment and enable the employment of a non-iterative optimization process to achieve a better alignment. PMID:22717522

  4. New Web Server - the Java Version of Tempest - Produced

    NASA Technical Reports Server (NTRS)

    York, David W.; Ponyik, Joseph G.

    2000-01-01

    A new software design and development effort has produced a Java (Sun Microsystems, Inc.) version of the award-winning Tempest software (refs. 1 and 2). In 1999, the Embedded Web Technology (EWT) team received a prestigious R&D 100 Award for Tempest, Java Version. In this article, "Tempest" will refer to the Java version of Tempest, a World Wide Web server for desktop or embedded systems. Tempest was designed at the NASA Glenn Research Center at Lewis Field to run on any platform for which a Java Virtual Machine (JVM, Sun Microsystems, Inc.) exists. The JVM acts as a translator between the native code of the platform and the byte code of Tempest, which is compiled in Java. These byte code files are Java executables with a ".class" extension. Multiple byte code files can be zipped together as a "*.jar" file for more efficient transmission over the Internet. Today's popular browsers, such as Netscape (Netscape Communications Corporation) and Internet Explorer (Microsoft Corporation) have built-in Virtual Machines to display Java applets.

  5. HotJava: Sun's Animated Interactive World Wide Web Browser for the Internet.

    ERIC Educational Resources Information Center

    Machovec, George S., Ed.

    1995-01-01

    Examines HotJava and Java, World Wide Web technology for use on the Internet. HotJava, an interactive, animated Web browser, based on the object-oriented Java programming language, is different from HTML-based browsers such as Netscape. Its client/server design does not understand Internet protocols but can dynamically find what it needs to know.…

  6. Spilling the beans on java 3D: a tool for the virtual anatomist.

    PubMed

    Guttmann, G D

    1999-04-15

    The computing world has just provided the anatomist with another tool: Java 3D, within the Java 2 platform. On December 9, 1998, Sun Microsystems released Java 2. Java 3D classes are now included in the jar (Java Archive) archives of the extensions directory of Java 2. Java 3D is also a part of the Java Media Suite of APIs (Application Programming Interfaces). But what is Java? How does Java 3D work? How do you view Java 3D objects? A brief introduction to the concepts of Java and object-oriented programming is provided. Also, there is a short description of the tools of Java 3D and of the Java 3D viewer. Thus, the virtual anatomist has another set of computer tools to use for modeling various aspects of anatomy, such as embryological development. Also, the virtual anatomist will be able to assist the surgeon with virtual surgery using the tools found in Java 3D. Java 3D will be able to fill gaps, such as the lack of platform independence, interactivity, and manipulability of 3D images, currently existing in many anatomical computer-aided learning programs.

  7. VOTable JAVA Streaming Writer and Applications.

    NASA Astrophysics Data System (ADS)

    Kulkarni, P.; Kembhavi, A.; Kale, S.

    2004-07-01

    Virtual Observatory related tools use a new standard for data transfer called the VOTable format. This is a variant of the XML format that enables easy transfer of data over the web. We describe a streaming interface that can bridge the VOTable format, through a user friendly graphical interface, with the FITS and ASCII formats, which are commonly used by astronomers. A streaming interface is important for efficient use of memory because of the large size of catalogues. The tools are developed in Java to provide a platform independent interface. We have also developed a stand-alone version that can be used to convert data stored in ASCII or FITS format on a local machine. The Streaming writer is successfully being used in VOPlot (see Kale et al 2004 for a description of VOPlot). We present the test results of converting huge FITS and ASCII data into the VOTable format on machines that have only limited memory.
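
    The memory argument for a streaming writer can be illustrated with a short sketch. This is not the authors' Java tool or its API: it is a minimal Python illustration, assuming a simplified VOTable skeleton (VOTABLE, RESOURCE, TABLE, FIELD, DATA, TABLEDATA, TR, TD) without the XML namespace or metadata a real file would carry. The names write_votable and synthetic_rows are hypothetical.

```python
import io
from typing import Iterable, Sequence, TextIO, Tuple
from xml.sax.saxutils import escape, quoteattr


def write_votable(out: TextIO,
                  fields: Sequence[Tuple[str, str]],
                  rows: Iterable[Sequence[object]]) -> int:
    """Write rows to `out` one at a time and return the number written.

    `rows` may be any iterable, including a generator, so only one row
    needs to be held in memory at any moment.
    """
    out.write('<?xml version="1.0"?>\n')
    out.write('<VOTABLE version="1.3">\n <RESOURCE>\n  <TABLE>\n')
    for name, datatype in fields:
        out.write(f'   <FIELD name={quoteattr(name)} '
                  f'datatype={quoteattr(datatype)}/>\n')
    out.write('   <DATA>\n    <TABLEDATA>\n')
    count = 0
    for row in rows:
        # Each row is serialized and written immediately, never accumulated.
        cells = "".join(f"<TD>{escape(str(value))}</TD>" for value in row)
        out.write(f"     <TR>{cells}</TR>\n")
        count += 1
    out.write('    </TABLEDATA>\n   </DATA>\n  </TABLE>\n </RESOURCE>\n'
              '</VOTABLE>\n')
    return count


def synthetic_rows(n: int):
    """Yield n made-up coordinate pairs lazily (illustrative data only)."""
    for i in range(n):
        yield (i * 0.5, -i * 0.25)


buffer = io.StringIO()
written = write_votable(buffer,
                        [("ra", "double"), ("dec", "double")],
                        synthetic_rows(3))
print(written, "rows written")
```

    In practice `out` would be an open file and `rows` a reader that parses a FITS or ASCII catalogue incrementally, so peak memory stays independent of catalogue size.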

  8. Debris Dispersion Model Using Java 3D

    NASA Technical Reports Server (NTRS)

    Thirumalainambi, Rajkumar; Bardina, Jorge

    2004-01-01

    This paper describes a web-based simulation of Shuttle launch operations and debris dispersion. Java 3D graphics provides geometric and visual content with a suitable mathematical model and behaviors of the Shuttle launch. Because the model is so heterogeneous and interrelated with various factors, 3D graphics combined with physical models provides mechanisms to understand the complexity of launch and range operations. The main focus of the modeling and simulation is orbital dynamics and range safety. Range safety areas include destruct limit lines, telemetry and tracking, and population risk near the range. Debris dispersion in the event of a Shuttle explosion during launch is also explained. The Shuttle launch and range operations in this paper are discussed based on the operations from Kennedy Space Center, Florida, USA.

  9. BBMap: A Fast, Accurate, Splice-Aware Aligner

    SciTech Connect

    Bushnell, Brian

    2014-03-17

    Alignment of reads is one of the primary computational tasks in bioinformatics. Of paramount importance to resequencing, alignment is also crucial to other areas: quality control, scaffolding, string-graph assembly, homology detection, assembly evaluation, error-correction, expression quantification, and even as a tool to evaluate other tools. An optimal aligner would greatly improve virtually any sequencing process, but optimal alignment is prohibitively expensive for gigabases of data. Here, we will present BBMap [1], a fast splice-aware aligner for short and long reads. We will demonstrate that BBMap has superior speed, sensitivity, and specificity to alternative high-throughput aligners bowtie2 [2], bwa [3], smalt [4], GSNAP [5], and BLASR [6].

  10. Optics Alignment Panel

    NASA Technical Reports Server (NTRS)

    Schroeder, Daniel J.

    1992-01-01

    The Optics Alignment Panel (OAP) was commissioned by the HST Science Working Group to determine the optimum alignment of the OTA optics. The goal was to find the position of the secondary mirror (SM) for which there is no coma or astigmatism in the camera images due to misaligned optics, either tilt or decenter. The despace position of the SM was reviewed and the optimum focus was sought. The results of these efforts are as follows: (1) the best estimate of the aligned position of the SM in the notation of HDOS is (DZ,DY,TZ,TY) = (+248 microns, +8 microns, +53 arcsec, -79 arcsec), and (2) the best focus, defined to be that despace which maximizes the fractional energy at 486 nm in a 0.1 arcsec radius of a stellar image, is 12.2 mm beyond paraxial focus. The data leading to these conclusions, and the estimated uncertainties in the final results, are presented.

  11. Barrel alignment fixture

    NASA Astrophysics Data System (ADS)

    Sheeley, J. D.

    1981-04-01

    Fabrication of slapper type detonator cables requires bonding of a thin barrel over a bridge. Location of the barrel hole with respect to the bridge is critical: the barrel hole must be centered over the bridge with uniform spacing on each side. An alignment fixture which permits rapid adjustment of the barrel position with respect to the bridge is described. The barrel is manipulated by pincer-type fingers which are mounted on a small x-y table equipped with micrometer adjustments. Barrel positioning, performed under a binocular microscope, is rapid and accurate. After alignment, the microscope is moved out of position and an infrared (IR) heat source is aimed at the barrel. A 5-second pulse of infrared heat flows the adhesive under the barrel and bonds it to the cable. Sapphire and Fotoform glass barrels were bonded successfully with the alignment fixture.

  12. Magnetically aligned supramolecular hydrogels.

    PubMed

    Wallace, Matthew; Cardoso, Andre Zamith; Frith, William J; Iggo, Jonathan A; Adams, Dave J

    2014-12-01

    The magnetic-field-induced alignment of the fibrillar structures present in an aqueous solution of a dipeptide gelator, and the subsequent retention of this alignment upon transformation to a hydrogel upon the addition of CaCl2 or upon a reduction in solution pH is reported. Utilising the switchable nature of the magnetic field coupled with the slow diffusion of CaCl2, it is possible to precisely control the extent of anisotropy across a hydrogel, something that is generally very difficult to do using alternative methods. The approach is readily extended to other compounds that form viscous solutions at high pH. It is expected that this work will greatly expand the utility of such low-molecular-weight gelators (LMWG) in areas where alignment is key. PMID:25345918

  13. Short read alignment with populations of genomes

    PubMed Central

    Huang, Lin; Popic, Victoria; Batzoglou, Serafim

    2013-01-01

    Summary: The increasing availability of high-throughput sequencing technologies has led to thousands of human genomes having been sequenced in the past years. Efforts such as the 1000 Genomes Project further add to the availability of human genome variation data. However, to date, there is no method that can map reads of a newly sequenced human genome to a large collection of genomes. Instead, methods rely on aligning reads to a single reference genome. This leads to inherent biases and lower accuracy. To tackle this problem, a new alignment tool BWBBLE is introduced in this article. We (i) introduce a new compressed representation of a collection of genomes, which explicitly tackles the genomic variation observed at every position, and (ii) design a new alignment algorithm based on the Burrows–Wheeler transform that maps short reads from a newly sequenced genome to an arbitrary collection of two or more (up to millions of) genomes with high accuracy and no inherent bias to one specific genome. Availability: http://viq854.github.com/bwbble. Contact: serafim@cs.stanford.edu PMID:23813006
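
    For readers unfamiliar with the Burrows–Wheeler transform named in this abstract, a minimal sketch of the plain transform on a single string follows. It is a naive textbook version for illustration only, not the BWBBLE algorithm, which operates on a compressed multi-genome representation and does not sort full rotations. The function names and the "$" sentinel are assumptions made for this example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Return the Burrows-Wheeler transform of `text`.

    Naive construction: sort every rotation of text + sentinel and take
    the last column. Illustrative only; real aligners build the
    transform from a suffix array instead.
    """
    if sentinel in text:
        raise ValueError("text must not contain the sentinel character")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Recover the original text by repeatedly prepending and sorting."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(ch + row for ch, row in zip(last_column, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


transformed = bwt("GATTACA")
print(transformed, "->", inverse_bwt(transformed))
```

    The transform is reversible and tends to group identical characters together, which is what makes compressed, searchable indexes over reference sequences practical.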

  14. Prototyping Faithful Execution in a Java virtual machine.

    SciTech Connect

    Tarman, Thomas David; Campbell, Philip LaRoche; Pierson, Lyndon George

    2003-09-01

    This report presents the implementation of a stateless scheme for Faithful Execution, the design for which is presented in a companion report, "Principles of Faithful Execution in the Implementation of Trusted Objects" (SAND 2003-2328). We added a simple cryptographic capability to an already simplified class loader and its associated Java Virtual Machine (JVM) to provide a byte-level implementation of Faithful Execution. The extended class loader and JVM we refer to collectively as the Sandia Faithfully Executing Java architecture (or JavaFE for short). This prototype is intended to enable exploration of more sophisticated techniques which we intend to implement in hardware.

  15. JavaScript and interactive web pages in radiology.

    PubMed

    Gurney, J W

    2001-10-01

    Web publishing is becoming a more common method of disseminating information. JavaScript is an object-orientated language embedded into modern browsers and has a wide variety of uses. The use of JavaScript in radiology is illustrated by calculating the indices of sensitivity, specificity, and predictive values from a table of true positives, true negatives, false positives, and false negatives. In addition, a single line of JavaScript code can be used to annotate images, which has a wide variety of uses.
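
    The indices mentioned in this abstract follow directly from the four cells of a 2x2 table. The article's example is written in JavaScript and is not reproduced here; the sketch below is written in Python instead, and the function name and the counts in the usage lines are hypothetical.

```python
import math


def diagnostic_indices(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute standard test-performance indices from a 2x2 table.

    tp, tn, fp, fn are counts of true positives, true negatives,
    false positives and false negatives. A ratio whose denominator is
    zero is reported as NaN.
    """
    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else math.nan

    return {
        "sensitivity": ratio(tp, tp + fn),  # diseased cases detected
        "specificity": ratio(tn, tn + fp),  # healthy cases cleared
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


# Hypothetical counts, chosen only to exercise the function.
indices = diagnostic_indices(tp=90, tn=80, fp=20, fn=10)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

    Sensitivity and specificity describe the test itself, whereas the predictive values also depend on how common the condition is in the table's population.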

  16. HIV sequence compendium 2002

    SciTech Connect

    Kuiken, Carla; Foley, Brian; Freed, Eric; Hahn, Beatrice; Marx, Preston; McCutchan, Francine; Mellors, John; Wolinsky, Steven; Korber, Bette

    2002-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Traditionally, we present the sequence data themselves in the form of alignments: Section II, an alignment of a selection of HIV-1/SIVcpz full-length genomes (a lot of LAI-like sequences, for example, have been omitted because they are so similar that they bias the alignment); Section III, a combined HIV-1/HIV-2/SIV whole genome alignment; Sections IV–VI, amino acid alignments for HIV-1/SIV-cpz, HIV-2/SIV, and SIVagm. The HIV-2/SIV and SIVagm amino acid alignments are separate because the genetic distances between these groups are so great that presenting them in one alignment would make it very elongated because of the large number of gaps that have to be inserted. As always, tables with extensive background information gathered from the literature accompany the whole genome alignments. The collection of whole-gene sequences in the database is now large enough that we have abundant representation of most subtypes. For many subtypes, and especially for subtype B, a large number of sequences that span entire genes were not included in the printed alignments to conserve space. A more complete version of all alignments is available on our website, http://hiv-web.lanl.gov/content/hiv-db/ALIGN_CURRENT/ALIGN-INDEX.html. Importantly, all these alignments have been edited to include only one sequence per person, based on phylogenetic trees that were created for all of them, as well as on the literature. Because of the number of sequences available, we have decided to use a different selection principle this year, based on the epidemiological importance of the subtypes. Subtypes A–D and CRFs 01 and 02 are by far the most widespread variants, and for these (when available) we have included 8–10 representatives in the alignments. The other

  17. MUSE optical alignment procedure

    NASA Astrophysics Data System (ADS)

    Laurent, Florence; Renault, Edgard; Loupias, Magali; Kosmalski, Johan; Anwand, Heiko; Bacon, Roland; Boudon, Didier; Caillier, Patrick; Daguisé, Eric; Dubois, Jean-Pierre; Dupuy, Christophe; Kelz, Andreas; Lizon, Jean-Louis; Nicklas, Harald; Parès, Laurent; Remillieux, Alban; Seifert, Walter; Valentin, Hervé; Xu, Wenli

    2012-09-01

    MUSE (Multi Unit Spectroscopic Explorer) is a second generation VLT integral field spectrograph (1x1arcmin² Field of View) developed for the European Southern Observatory (ESO), operating in the visible wavelength range (0.465-0.93 μm). A consortium of seven institutes is currently assembling and testing MUSE in the Integration Hall of the Observatoire de Lyon for the Preliminary Acceptance in Europe, scheduled for 2013. MUSE is composed of several subsystems which are under the responsibility of each institute. The Fore Optics derotates and anamorphoses the image at the focal plane. A Splitting and Relay Optics feed the 24 identical Integral Field Units (IFU), that are mounted within a large monolithic instrument mechanical structure. Each IFU incorporates an image slicer, a fully refractive spectrograph with VPH-grating and a detector system connected to a global vacuum and cryogenic system. During 2011, all MUSE subsystems were integrated, aligned and tested independently in each institute. After validations, the systems were shipped to the P.I. institute at Lyon and were assembled in the Integration Hall. This paper describes the end-to-end optical alignment procedure of the MUSE instrument. The design strategy, mixing an optical alignment by manufacturing (plug and play approach) and few adjustments on key components, is presented. We depict the alignment method for identifying the optical axis using several references located in pupil and image planes. All tools required to perform the global alignment between each subsystem are described. The success of this alignment approach is demonstrated by the good results for the MUSE image quality. MUSE commissioning at the VLT (Very Large Telescope) is planned for 2013.

  18. Segment alignment control system

    NASA Technical Reports Server (NTRS)

    Aubrun, Jean-N.; Lorell, Ken R.

    1988-01-01

    The segmented primary mirror for the LDR will require a special segment alignment control system to precisely control the orientation of each of the segments so that the resulting composite reflector behaves like a monolith. The W.M. Keck Ten Meter Telescope will utilize a primary mirror made up of 36 actively controlled segments. Thus the primary mirror and its segment alignment control system are directly analogous to the LDR. The problems of controlling the segments in the face of disturbances and control/structures interaction, as analyzed for the TMT, are virtually identical to those for the LDR. The two systems are briefly compared.

  19. PILOT optical alignment

    NASA Astrophysics Data System (ADS)

    Longval, Y.; Mot, B.; Ade, P.; André, Y.; Aumont, J.; Baustista, L.; Bernard, J.-Ph.; Bray, N.; de Bernardis, P.; Boulade, O.; Bousquet, F.; Bouzit, M.; Buttice, V.; Caillat, A.; Charra, M.; Chaigneau, M.; Crane, B.; Crussaire, J.-P.; Douchin, F.; Doumayrou, E.; Dubois, J.-P.; Engel, C.; Etcheto, P.; Gélot, P.; Griffin, M.; Foenard, G.; Grabarnik, S.; Hargrave, P.; Hughes, A.; Laureijs, R.; Lepennec, Y.; Leriche, B.; Maestre, S.; Maffei, B.; Martignac, J.; Marty, C.; Marty, W.; Masi, S.; Mirc, F.; Misawa, R.; Montel, J.; Montier, L.; Narbonne, J.; Nicot, J.-M.; Pajot, F.; Parot, G.; Pérot, E.; Pimentao, J.; Pisano, G.; Ponthieu, N.; Ristorcelli, I.; Rodriguez, L.; Roudil, G.; Salatino, M.; Savini, G.; Simonella, O.; Saccoccio, M.; Tapie, P.; Tauber, J.; Torre, J.-P.; Tucker, C.

    2016-07-01

    PILOT is a balloon-borne astronomy experiment designed to study the polarization of dust emission in the diffuse interstellar medium in our Galaxy at a wavelength of 240 μm with an angular resolution of about two arcminutes. The PILOT optics is composed of an off-axis Gregorian-type telescope and a refractive re-imager system. All optical elements, except the primary mirror, are in a cryostat cooled to 3 K. We combined optical and 3D measurement methods with thermo-elastic modeling to perform the optical alignment. The talk describes the system analysis, the alignment procedure, and finally the performance obtained during the first flight in September 2015.

  20. JavaProtein Dossier: a novel web-based data visualization tool for comprehensive analysis of protein structure

    PubMed Central

    Neshich, Goran; Rocchia, Walter; Mancini, Adauto L.; Yamagishi, Michel E. B.; Kuser, Paula R.; Fileto, Renato; Baudet, Christian; Pinto, Ivan P.; Montagner, Arnaldo J.; Palandrani, Juliana F.; Krauchenco, Joao N.; Torres, Renato C.; Souza, Savio; Togawa, Roberto C.; Higa, Roberto H.

    2004-01-01

    JavaProtein Dossier (JPD) is a new concept, database and visualization tool providing one of the largest collections of the physicochemical parameters describing proteins' structure, stability, function and interaction with other macromolecules. By collecting as many descriptors/parameters as possible within a single database, we can achieve a better use of the available data and information. Furthermore, data grouping allows us to generate different parameters with the potential to provide new insights into the sequence–structure–function relationship. In JPD, residue selection can be performed according to multiple criteria. JPD can simultaneously display and analyze all the physicochemical parameters of any pair of structures, using precalculated structural alignments, allowing direct parameter comparison at corresponding amino acid positions among homologous structures. In order to focus on the physicochemical (and consequently pharmacological) profile of proteins, visualization tools (showing the structure and structural parameters) also had to be optimized. Our response to this challenge was the use of Java technology with its exceptional level of interactivity. JPD is freely accessible (within the Gold Sting Suite) at http://sms.cbi.cnptia.embrapa.br, http://mirrors.rcsb.org/SMS, http://trantor.bioc.columbia.edu/SMS and http://www.es.embnet.org/SMS/ (Option: JavaProtein Dossier). PMID:15215458

  1. HIV Sequence Compendium 2010

    SciTech Connect

    Kuiken, Carla; Foley, Brian; Leitner, Thomas; Apetrei, Christian; Hahn, Beatrice; Mizrachi, Ilene; Mullins, James; Rambaut, Andrew; Wolinsky, Steven; Korber, Bette

    2010-12-31

    This compendium is an annual printed summary of the data contained in the HIV sequence database. In these compendia we try to present a judicious selection of the data in such a way that it is of maximum utility to HIV researchers. Each of the alignments attempts to display the genetic variability within the different species, groups and subtypes of the virus. This compendium contains sequences published before January 1, 2010. Hence, though it is called the 2010 Compendium, its contents correspond to the 2009 curated alignments on our website. The number of sequences in the HIV database is still increasing exponentially. In total, at the time of printing, there were 339,306 sequences in the HIV Sequence Database, an increase of 45% since last year. The number of near complete genomes (>7000 nucleotides) increased to 2576 by end of 2009, reflecting a smaller increase than in previous years. However, as in previous years, the compendium alignments contain only a small fraction of these. Included in the alignments are a small number of sequences representing each of the subtypes and the more prevalent circulating recombinant forms (CRFs) such as 01 and 02, as well as a few outgroup sequences (group O and N and SIV-CPZ). Of the rarer CRFs we included one representative each. A more complete version of all alignments is available on our website, http://www.hiv.lanl.gov/content/sequence/NEWALIGN/align.html. Reprints are available from our website in the form of both HTML and PDF files. As always, we are open to complaints and suggestions for improvement. Inquiries and comments regarding the compendium should be addressed to seq-info@lanl.gov.

  2. Multiple structure alignment with msTALI

    PubMed Central

    2012-01-01

    Background: Multiple structure alignments have received increasing attention in recent years as an alternative to multiple sequence alignments. Although multiple structure alignment algorithms can potentially be applied to a number of problems, they have primarily been used for protein core identification. A method that is capable of solving a variety of problems using structure comparison is still absent. Here we introduce a program msTALI for aligning multiple protein structures. Our algorithm uses several informative features to guide its alignments: torsion angles, backbone Cα atom positions, secondary structure, residue type, surface accessibility, and properties of nearby atoms. The algorithm allows the user to weight the types of information used to generate the alignment, which expands its utility to a wide variety of problems. Results: msTALI exhibits competitive results on 824 families from the Homstrad and SABmark databases when compared to Matt and Mustang. We also demonstrate success at building a database of protein cores using 341 randomly selected CATH domains and highlight the contribution of msTALI compared to the CATH classifications. Finally, we present an example applying msTALI to the problem of detecting hinges in a protein undergoing rigid-body motion. Conclusions: msTALI is an effective algorithm for multiple structure alignment. In addition to its performance on standard comparison databases, it utilizes clear, informative features, allowing further customization for domain-specific applications. The C++ source code for msTALI is available for Linux on the web at http://ifestos.cse.sc.edu/mstali. PMID:22607234

  3. Fast and accurate short read alignment with Burrows–Wheeler transform

    PubMed Central

    Li, Heng; Durbin, Richard

    2009-01-01

    Motivation: The enormous amount of short reads generated by the new DNA sequencing technologies call for the development of fast and accurate read alignment programs. A first generation of hash table-based methods has been developed, including MAQ, which is accurate, feature rich and fast enough to align short reads from a single individual. However, MAQ does not support gapped alignment for single-end reads, which makes it unsuitable for alignment of longer reads where indels may occur frequently. The speed of MAQ is also a concern when the alignment is scaled up to the resequencing of hundreds of individuals. Results: We implemented Burrows-Wheeler Alignment tool (BWA), a new read alignment package that is based on backward search with Burrows–Wheeler Transform (BWT), to efficiently align short sequencing reads against a large reference sequence such as the human genome, allowing mismatches and gaps. BWA supports both base space reads, e.g. from Illumina sequencing machines, and color space reads from AB SOLiD machines. Evaluations on both simulated and real data suggest that BWA is ∼10–20× faster than MAQ, while achieving similar accuracy. In addition, BWA outputs alignment in the new standard SAM (Sequence Alignment/Map) format. Variant calling and other downstream analyses after the alignment can be achieved with the open source SAMtools software package. Availability: http://maq.sourceforge.net Contact: rd@sanger.ac.uk PMID:19451168
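
    The backward search that the abstract mentions can be illustrated with a minimal sketch. This is not BWA's implementation: it handles exact matches only (BWA also allows mismatches and gaps), builds the suffix array naively, and the names `bwt_index` and `backward_search` are hypothetical ones chosen for this example.

```python
from collections import Counter


def bwt_index(text):
    """Build a toy index for text: suffix array, C table and Occ table."""
    text += "$"  # sentinel, sorts before A/C/G/T
    # Naive suffix array by sorting suffixes; only suitable for short demo strings.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (text[-1] is the sentinel).
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[ch] = number of characters in text strictly smaller than ch.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i] = number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in counts:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, c_table, occ


def backward_search(pattern, sa, c_table, occ):
    """Return the sorted start positions of exact occurrences of pattern."""
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    for ch in reversed(pattern):  # consume the pattern right to left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    index = bwt_index("GATTACATTAG")
    print(backward_search("TTA", *index))
```

    Each pattern character narrows the suffix-array range with two table lookups, so the search cost in this sketch depends on the read length rather than on the reference length.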

  4. Curriculum Alignment: Establishing Coherence

    ERIC Educational Resources Information Center

    Gagné, Philippe; Dumont, Laurence; Brunet, Sabine; Boucher, Geneviève

    2013-01-01

    In this paper, we present a step-by-step guide to implement a curricular alignment project, directed at professional development and student support, and developed in a higher education French as a second language department. We outline best practices and preliminary results from our experience and provide ways to adapt our experience to other…

  5. Optically Aligned Drill Press

    NASA Technical Reports Server (NTRS)

    Adderholdt, Bruce M.

    1994-01-01

    Precise drill press equipped with rotary-indexing microscope. Microscope and drill exchange places when turret rotated. Microscope axis first aligned over future hole, then rotated out of way so drill axis assumes its precise position. New procedure takes less time to locate drilling positions and produces more accurate results. Apparatus adapted to such other machine tools as milling and measuring machines.

  6. Aligning brains and minds

    PubMed Central

    Tong, Frank

    2012-01-01

    In this issue of Neuron, Haxby and colleagues describe a new method for aligning functional brain activity patterns across participants. Their study demonstrates that objects are similarly represented across different brains, allowing for reliable classification of one person’s brain activity based on another’s. PMID:22017984

  7. Aligned-or Not?

    ERIC Educational Resources Information Center

    Roseman, Jo Ellen; Koppal, Mary

    2015-01-01

    When state leaders and national partners in the development of the Next Generation Science Standards met to consider implementation strategies, states and school districts wanted to know which materials were aligned to the new standards. The answer from the developers was short but not sweet: You won't find much now, and it's going to…

  8. The 17 July 2006 Tsunami earthquake in West Java, Indonesia

    USGS Publications Warehouse

    Mori, J.; Mooney, W.D.; Afnimar; Kurniawan, S.; Anaya, A.I.; Widiyantoro, S.

    2007-01-01

    A tsunami earthquake (Mw = 7.7) occurred south of Java on 17 July 2006. The event produced relatively low levels of high-frequency radiation, and local felt reports indicated only weak shaking in Java. There was no ground motion damage from the earthquake, but there was extensive damage and loss of life from the tsunami along 250 km of the southern coasts of West Java and Central Java. An inspection of the area a few days after the earthquake showed extensive damage to wooden and unreinforced masonry buildings that were located within several hundred meters of the coast. Since there was no tsunami warning system in place, efforts to escape the large waves depended on how people reacted to the earthquake shaking, which was only weakly felt in the coastal areas. This experience emphasizes the need for adequate tsunami warning systems for the Indian Ocean region.

  9. MALIDUP: a database of manually constructed structure alignments for duplicated domain pairs.

    PubMed

    Cheng, Hua; Kim, Bong-Hyun; Grishin, Nick V

    2008-03-01

    We describe MALIDUP (manual alignments of duplicated domains), a database of 241 pairwise structure alignments for homologous domains originated by internal duplication within the same polypeptide chain. Since duplicated domains within a protein frequently diverge in function and thus in sequence, this would be the first database of structurally similar homologs that is not strongly biased by sequence or functional similarity. Our manual alignments in most cases agree with the automatic structural alignments generated by several commonly used programs. This carefully constructed database could be used in studies on protein evolution and as a reference for testing structure alignment programs. The database is available at http://prodata.swmed.edu/malidup. PMID:17932926

  10. A tool for learning the programming style of Java

    NASA Astrophysics Data System (ADS)

    Arai, Masayuki

    2013-03-01

    We developed a tool for learning the programming style of Java. The tool has the following functions: (1) recommends the use of CamelCase and English words for the names of classes, methods, and variables; (2) recommends setting the correct scope level of variables and the appropriate length of variable names; (3) recommends writing comments in source programs; (4) shows sample source programs according to the programming style of Java.
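
    As an illustration of function (1), a naming check could look like the sketch below. The abstract does not give the tool's actual rules, so the two regular expressions and the `check_name` helper are assumptions based on common Java naming conventions (UpperCamelCase for classes, lowerCamelCase for methods and variables).

```python
import re

# Assumed conventions, not taken from the tool itself.
UPPER_CAMEL = re.compile(r"[A-Z][A-Za-z0-9]*")  # class names
LOWER_CAMEL = re.compile(r"[a-z][A-Za-z0-9]*")  # method and variable names


def check_name(kind, name):
    """Return True if name follows the assumed convention for its kind.

    kind is "class", "method" or "variable"; anything other than "class"
    is checked against lowerCamelCase.
    """
    pattern = UPPER_CAMEL if kind == "class" else LOWER_CAMEL
    return pattern.fullmatch(name) is not None


if __name__ == "__main__":
    for kind, name in [("class", "SequenceAligner"), ("variable", "Read_count")]:
        print(kind, name, check_name(kind, name))
```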

  11. A rapid protein structure alignment algorithm based on a text modeling technique

    PubMed Central

    Razmara, Jafar; Deris, Safaai; Parvizpour, Sepideh

    2011-01-01

    Structural alignment of proteins is widely used in various fields of structural biology. In order to further improve the quality of alignment, we describe an algorithm for structural alignment based on text modelling techniques. The technique first superimposes the secondary structure elements of two proteins and then models the 3D structure of each protein as a sequence of letters. These sequences are used by a step-by-step sequence alignment procedure to align the two protein structures. A benchmark test was organized on a set of 200 non-homologous proteins to evaluate the program and compare it to state-of-the-art programs, e.g. CE, SAL, TM-align and 3D-BLAST. On average, the results of all-against-all structure comparison by the program have an accuracy competitive with CE and TM-align, while the algorithm has a high running speed comparable to 3D-BLAST. PMID:21814392

  12. Fast and sensitive protein alignment using DIAMOND.

    PubMed

    Buchfink, Benjamin; Xie, Chao; Huson, Daniel H

    2015-01-01

    The alignment of sequencing reads against a protein reference database is a major computational bottleneck in metagenomics and data-intensive evolutionary projects. Although recent tools offer improved performance over the gold standard BLASTX, they exhibit only a modest speedup or low sensitivity. We introduce DIAMOND, an open-source algorithm based on double indexing that is 20,000 times faster than BLASTX on short reads and has a similar degree of sensitivity.

  13. jFuzz: A Concolic Whitebox Fuzzer for Java

    NASA Technical Reports Server (NTRS)

    Jayaraman, Karthick; Harvison, David; Ganesh, Vijay; Kiezun, Adam

    2009-01-01

    We present jFuzz, an automatic testing tool for Java programs. jFuzz is a concolic whitebox fuzzer, built on the NASA Java PathFinder, an explicit-state Java model checker, and a framework for developing reliability and analysis tools for Java. Starting from a seed input, jFuzz automatically and systematically generates inputs that exercise new program paths. jFuzz uses a combination of concrete and symbolic execution, and constraint solving. Time spent on solving constraints can be significant. We implemented several well-known optimizations and name-independent caching, which aggressively normalizes the constraints to reduce the number of calls to the constraint solver. We present preliminary results due to the optimizations, and demonstrate the effectiveness of jFuzz in creating good test inputs. jFuzz is intended to be a research testbed for investigating new testing and analysis techniques based on concrete and symbolic execution. The source code of jFuzz is available as part of the NASA Java PathFinder.

  14. A JAVA User Interface for the Virtual Human

    SciTech Connect

    Easterly, C E; Strickler, D J; Tolliver, J S; Ward, R C

    1999-10-13

    A human simulation environment, the Virtual Human (VH), is under development at the Oak Ridge National Laboratory (ORNL). Virtual Human connects three-dimensional (3D) anatomical models of the body with dynamic physiological models to investigate a wide range of human biological and physical responses to stimuli. We have utilized the Java programming language to develop a flexible user interface to the VH. The Java prototype interface has been designed to display dynamic results from selected physiological models, with user control of the initial model parameters and ability to steer the simulation as it is proceeding. Taking advantage of Java's Remote Method Invocation (RMI) features, the interface runs as a Java client that connects to a Java RMI server process running on a remote server machine. The RMI server can couple to physiological models written in Java, or in other programming languages, including C and FORTRAN. Future versions of the interface will be linked to 3D anatomical models of the human body to complete the development of the VH.

  15. Java Performance for Scientific Applications on LLNL Computer Systems

    SciTech Connect

    Kapfer, C; Wissink, A

    2002-05-10

    Languages in use for high performance computing at the laboratory--Fortran (f77 and f90), C, and C++--have many years of development behind them and are generally considered the fastest available. However, Fortran and C do not readily extend to object-oriented programming models, limiting their capability for very complex simulation software. C++ facilitates object-oriented programming but is a very complex and error-prone language. Java offers a number of capabilities that these other languages do not. For instance it implements cleaner (i.e., easier to use and less prone to errors) object-oriented models than C++. It also offers networking and security as part of the language standard, and cross-platform executables that make it architecture neutral, to name a few. These features have made Java very popular for industrial computing applications. The aim of this paper is to explain the trade-offs in using Java for large-scale scientific applications at LLNL. Despite its advantages, the computational science community has been reluctant to write large-scale computationally intensive applications in Java due to concerns over its poor performance. However, considerable progress has been made over the last several years. The Java Grande Forum [1] has been promoting the use of Java for large-scale computing. Members have introduced efficient array libraries, developed fast just-in-time (JIT) compilers, and built links to existing packages used in high performance parallel computing.

  16. Establishing a framework for comparative analysis of genome sequences

    SciTech Connect

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment. The framework integrates the information derived from multiple sequence alignment and phylogenetic tree (hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied on protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.
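
    The idea of a constrained column can be sketched as follows. This is only an illustration in Python, not the Prolog toolkit described above: it ignores the phylogenetic tree, and both the simplified residue grouping in `PROPERTY_GROUPS` and the function name `constrained_columns` are assumptions made for this example.

```python
# Hypothetical, simplified grouping of amino acids by physicochemical property.
PROPERTY_GROUPS = {
    "hydrophobic": set("AVLIMFW"),
    "positive": set("KRH"),
    "negative": set("DE"),
    "polar": set("STNQ"),
}


def constrained_columns(alignment, groups=PROPERTY_GROUPS):
    """Return (column, property) pairs for columns that vary in residue
    but whose non-gap residues all share one property.

    alignment is a list of equal-length strings, with '-' marking gaps.
    """
    hits = []
    for col, residues in enumerate(zip(*alignment)):
        observed = set(residues) - {"-"}
        if len(observed) < 2:
            continue  # identical residues or all gaps: no mutation in this column
        for name, members in groups.items():
            if observed <= members:  # every observed residue has this property
                hits.append((col, name))
                break
    return hits


if __name__ == "__main__":
    msa = ["MKVLD", "MRILE", "MKLLD", "MHVLE"]
    print(constrained_columns(msa))
```

    Treating the alignment as a list of columns (`zip(*alignment)`) mirrors the abstract-data-type view in the abstract, where operations are defined over the alignment rather than over individual sequences.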

  17. Aligning genomes with inversions and swaps

    SciTech Connect

    Holloway, J.L.; Cull, P.

    1994-12-31

    The decision about what operators to allow and how to charge for these operations when aligning strings that arise in a biological context is the decision about what model of evolution to assume. Frequently the operators used to construct an alignment between biological sequences are limited to deletion, insertion, or replacement of a character or block of characters, but there is biological evidence for the evolutionary operations of exchanging the positions of two segments in a sequence and the replacement of a segment by its reversed complement. In this paper we describe a family of heuristics designed to compute alignments of biological sequences assuming a model of evolution with swaps and inversions. The heuristics will necessarily be approximate since the appropriate way to charge for the evolutionary events (delete, insert, substitute, swap, and invert) is not known. The paper concludes with a pair-wise comparison of 20 Picornavirus genomes, and a detailed comparison of the hepatitis delta virus with the citrus exocortis viroid.
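
    The two extra operators named in the abstract, swap and inversion, can be written down directly for DNA strings. The sketch below only defines the operations themselves; it does not reproduce the paper's heuristics or any cost model, and the function names `invert` and `swap` are chosen for this example.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def invert(seq, start, end):
    """Replace seq[start:end] by its reversed complement."""
    segment = seq[start:end].translate(_COMPLEMENT)[::-1]
    return seq[:start] + segment + seq[end:]


def swap(seq, a_start, a_end, b_start, b_end):
    """Exchange two non-overlapping segments, seq[a_start:a_end] and
    seq[b_start:b_end], where the first segment lies before the second."""
    if not (0 <= a_start <= a_end <= b_start <= b_end <= len(seq)):
        raise ValueError("segments must be ordered and non-overlapping")
    return (
        seq[:a_start]
        + seq[b_start:b_end]   # second segment moves to the front position
        + seq[a_end:b_start]   # material between the segments is unchanged
        + seq[a_start:a_end]   # first segment moves to the back position
        + seq[b_end:]
    )


if __name__ == "__main__":
    print(invert("AACGTT", 1, 4))
    print(swap("AACCGGTT", 0, 2, 4, 6))
```

    Applying `invert` twice to the same range restores the original string, which is one reason an inversion can reasonably be charged as a single event rather than as a run of substitutions.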

  18. MUSE alignment onto VLT

    NASA Astrophysics Data System (ADS)

    Laurent, Florence; Renault, Edgard; Boudon, Didier; Caillier, Patrick; Daguisé, Eric; Dupuy, Christophe; Jarno, Aurélien; Lizon, Jean-Louis; Migniau, Jean-Emmanuel; Nicklas, Harald; Piqueras, Laure

    2014-07-01

    MUSE (Multi Unit Spectroscopic Explorer) is a second generation Very Large Telescope (VLT) integral field spectrograph developed for the European Southern Observatory (ESO). It combines a 1' x 1' field of view sampled at 0.2 arcsec for its Wide Field Mode (WFM) and a 7.5"x7.5" field of view for its Narrow Field Mode (NFM). Both modes will operate with the improved spatial resolution provided by GALACSI (Ground Atmospheric Layer Adaptive Optics for Spectroscopic Imaging), that will use the VLT deformable secondary mirror and 4 Laser Guide Stars (LGS) foreseen in 2015. MUSE operates in the visible wavelength range (0.465-0.93 μm). A consortium of seven institutes is currently commissioning MUSE at the Very Large Telescope for the Preliminary Acceptance in Chile, scheduled for September 2014. MUSE is composed of several subsystems which are under the responsibility of each institute. The Fore Optics derotates and anamorphoses the image at the focal plane. A Splitting and Relay Optics feed the 24 identical Integral Field Units (IFU), that are mounted within a large monolithic structure. Each IFU incorporates an image slicer, a fully refractive spectrograph with VPH-grating and a detector system connected to a global vacuum and cryogenic system. During 2012 and 2013, all MUSE subsystems were integrated, aligned and tested at the P.I. institute at Lyon. After a successful PAE in September 2013, the MUSE instrument was shipped to the Very Large Telescope in Chile, where it was aligned and tested in the ESO integration hall at Paranal. MUSE was then transported directly, fully aligned and without any optomechanical dismounting, onto the VLT telescope, where first light was achieved on 7 February 2014. This paper describes the alignment procedure of the whole MUSE instrument with respect to the Very Large Telescope (VLT). It describes how 6 tons could be moved with an accuracy better than 0.025 mm and 0.25 arcmin in order to meet the alignment requirements. The success

  19. Sedimentary deposits study of the 2006 Java tsunami, in Pangandaran, West Java (preliminary result)

    NASA Astrophysics Data System (ADS)

    Maemunah, Imun; Suparka, Emmy; Puspito, Nanang T.; Hidayati, Sri

    2015-04-01

    The 2006 Java Earthquake (Mw 7.2) generated a tsunami that reached the Pangandaran coastal plain with a wave height of 9.7 m above sea level. In 2014 we examined the tsunami deposit exposed in shallow trenches along ~300 m at 5 transects from the shoreline inland at Karapyak and Madasari, Pangandaran. We documented, stratigraphically and sedimentologically, the characteristics of the Java tsunami deposit at Karapyak and Madasari and compared both sediments. In local farmland a moderately-sorted, brown soil is buried by a poorly-sorted, grey, medium-grained sand sheet. The tsunami deposit was distinguished from the underlying soil by a pronounced increase in grain size that becomes finer upwards and landwards. The decreasing concentration of coarse particles with distance inland is in agreement with the grain size analysis. The thickest tsunami deposit is about 25 cm, found at 84 m from the shoreline in Madasari, and about 15 cm, found at 80 m from the shoreline in Karapyak. The tsunami deposits thin landward in some transects but show no consistent trend in others, suggesting that thickness is strongly affected by local topography. Tsunami deposits at Karapyak and Madasari show many similarities. Both deposits consist of coarse sand that sharply overlies a finer sandy soil. The presence of mud drapes and other sedimentary structures such as graded bedding, massive beds and mud clasts in many locations reflects the dynamic processes of the tsunami waves. The imbrication of coarse and shell fragments in the 2006 Java tsunami deposits also provides information about the current direction, allowing us to distinguish run-up deposits from backwash deposits.

  20. Sedimentary deposits study of the 2006 Java tsunami, in Pangandaran, West Java (preliminary result)

    SciTech Connect

    Maemunah, Imun; Suparka, Emmy; Puspito, Nanang T.; Hidayati, Sri

    2015-04-24

    The 2006 Java Earthquake (Mw 7.2) generated a tsunami that reached the Pangandaran coastal plain with a wave height of 9.7 m above sea level. In 2014 we examined the tsunami deposit exposed in shallow trenches along ~300 m at 5 transects from the shoreline inland at Karapyak and Madasari, Pangandaran. We documented, stratigraphically and sedimentologically, the characteristics of the Java tsunami deposit at Karapyak and Madasari and compared both sediments. In local farmland a moderately-sorted, brown soil is buried by a poorly-sorted, grey, medium-grained sand sheet. The tsunami deposit was distinguished from the underlying soil by a pronounced increase in grain size that becomes finer upwards and landwards. The decreasing concentration of coarse particles with distance inland is in agreement with the grain size analysis. The thickest tsunami deposit is about 25 cm, found at 84 m from the shoreline in Madasari, and about 15 cm, found at 80 m from the shoreline in Karapyak. The tsunami deposits thin landward in some transects but show no consistent trend in others, suggesting that thickness is strongly affected by local topography. Tsunami deposits at Karapyak and Madasari show many similarities. Both deposits consist of coarse sand that sharply overlies a finer sandy soil. The presence of mud drapes and other sedimentary structures such as graded bedding, massive beds and mud clasts in many locations reflects the dynamic processes of the tsunami waves. The imbrication of coarse and shell fragments in the 2006 Java tsunami deposits also provides information about the current direction, allowing us to distinguish run-up deposits from backwash deposits.

  1. Molecular characterization and phylogenetic analysis of Fasciola gigantica from western Java, Indonesia.

    PubMed

    Hayashi, Kei; Ichikawa-Seki, Madoka; Allamanda, Puttik; Wibowo, Putut Eko; Mohanta, Uday Kumar; Sodirun; Guswanto, Azirwan; Nishikawa, Yoshifumi

    2016-10-01

    Fasciola gigantica and aspermic (hybrid) Fasciola flukes are thought to be distributed in Southeast Asian countries. The objectives of this study were to investigate the distribution of these flukes from unidentified ruminants in western Java, Indonesia, and to determine their distribution history into the area. Sixty Fasciola flukes from western Java were identified as F. gigantica based on the nucleotide sequences of the nuclear phosphoenolpyruvate carboxykinase (pepck) and DNA polymerase delta (pold) genes. The flukes were then analyzed phylogenetically based on the nucleotide sequence of the mitochondrial NADH dehydrogenase subunit 1 (nad1) gene, together with Fasciola flukes from other Asian countries. All but one F. gigantica fluke were classified in F. gigantica haplogroup C, which mainly contains nad1 haplotypes detected in flukes from Thailand, Vietnam, and China. A population genetic analysis suggested that haplogroup C spread from Thailand to the neighboring countries including Indonesia together with domestic ruminants, such as the swamp buffalo, Bubalus bubalis. The swamp buffalo is one of the important definitive hosts of Fasciola flukes in Indonesia, and is considered to have been domesticated in the north of Thailand. The remaining one fluke displayed a novel nad1 haplotype that has never been detected in the reference countries. Therefore, the origin of the fluke could not be established. No hybrid Fasciola flukes were detected in this study, in contrast to neighboring Asian countries. PMID:27266482

  2. Molecular characterization and phylogenetic analysis of Fasciola gigantica from western Java, Indonesia.

    PubMed

    Hayashi, Kei; Ichikawa-Seki, Madoka; Allamanda, Puttik; Wibowo, Putut Eko; Mohanta, Uday Kumar; Sodirun; Guswanto, Azirwan; Nishikawa, Yoshifumi

    2016-10-01

    Fasciola gigantica and aspermic (hybrid) Fasciola flukes are thought to be distributed in Southeast Asian countries. The objectives of this study were to investigate the distribution of these flukes from unidentified ruminants in western Java, Indonesia, and to determine their distribution history into the area. Sixty Fasciola flukes from western Java were identified as F. gigantica based on the nucleotide sequences of the nuclear phosphoenolpyruvate carboxykinase (pepck) and DNA polymerase delta (pold) genes. The flukes were then analyzed phylogenetically based on the nucleotide sequence of the mitochondrial NADH dehydrogenase subunit 1 (nad1) gene, together with Fasciola flukes from other Asian countries. All but one F. gigantica fluke were classified in F. gigantica haplogroup C, which mainly contains nad1 haplotypes detected in flukes from Thailand, Vietnam, and China. A population genetic analysis suggested that haplogroup C spread from Thailand to the neighboring countries including Indonesia together with domestic ruminants, such as the swamp buffalo, Bubalus bubalis. The swamp buffalo is one of the important definitive hosts of Fasciola flukes in Indonesia, and is considered to have been domesticated in the north of Thailand. The remaining one fluke displayed a novel nad1 haplotype that has never been detected in the reference countries. Therefore, the origin of the fluke could not be established. No hybrid Fasciola flukes were detected in this study, in contrast to neighboring Asian countries.

  3. Inflation by alignment

    SciTech Connect

    Burgess, C.P.; Roest, Diederik

    2015-06-08

    Pseudo-Goldstone bosons (pGBs) can provide technically natural inflatons, as has been comparatively well-explored in the simplest axion examples. Although inflationary success requires trans-Planckian decay constants, f ≳ M_p, several mechanisms have been proposed to obtain this, relying on (mis-)alignments between potential and kinetic energies in multiple-field models. We extend these mechanisms to a broader class of inflationary models, including in particular the exponential potentials that arise for pGB potentials based on noncompact groups (and so which might apply to moduli in an extra-dimensional setting). The resulting potentials provide natural large-field inflationary models and can predict a larger primordial tensor signal than is true for simpler single-field versions of these models. In so doing we provide a unified treatment of several alignment mechanisms, showing how each emerges as a limit of the more general setup.

  4. Org.Lcsim: Event Reconstruction in Java

    SciTech Connect

    Graf, Norman A.; /SLAC

    2012-04-19

    Maximizing the physics performance of detectors being designed for the International Linear Collider, while remaining sensitive to cost constraints, requires a powerful, efficient, and flexible simulation, reconstruction and analysis environment to study the capabilities of a large number of different detector designs. The preparation of Letters Of Intent for the International Linear Collider involved the detailed study of dozens of detector options, layouts and readout technologies; the final physics benchmarking studies required the reconstruction and analysis of hundreds of millions of events. We describe the Java-based software toolkit (org.lcsim) which was used for full event reconstruction and analysis. The components are fully modular and are available for tasks from digitization of tracking detector signals through to cluster finding, pattern recognition, track-fitting, calorimeter clustering, individual particle reconstruction, jet-finding, and analysis. The detector is defined by the same XML input files used for the detector response simulation, ensuring the simulation and reconstruction geometries are always commensurate by construction. We discuss the architecture as well as the performance.

  5. Jeagle: a JAVA Runtime Verification Tool

    NASA Technical Reports Server (NTRS)

    d'Amorim, Marcelo; Havelund, Klaus

    2005-01-01

    We introduce the temporal logic Jeagle and its supporting tool for runtime verification of Java programs. A monitor for a Jeagle formula checks if a finite trace of program events satisfies the formula. Jeagle is a programming-oriented extension of the powerful rule-based Eagle logic, which has been shown to be capable of defining and implementing a range of finite trace monitoring logics, including future and past time temporal logic, real-time and metric temporal logics, interval logics, forms of quantified temporal logics, and so on. Monitoring is achieved on a state-by-state basis, avoiding any need to store the input trace. Jeagle extends Eagle with constructs for capturing parameterized program events such as method calls and method returns. Parameters can be the objects that methods are called upon, arguments to methods, and return values. Jeagle allows one to refer to these in formulas. The tool performs automated program instrumentation using AspectJ. We show the transformational semantics of Jeagle.
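
    The idea of checking a finite trace state by state, without storing it, can be illustrated with a minimal Python sketch. It does not reproduce Jeagle's syntax, its rule-based semantics, or its AspectJ instrumentation; the class name, the event tuple layout, and the property checked (every method call on an object is matched by a return before the trace ends) are all assumptions chosen for illustration.

```python
class CallReturnMonitor:
    """Hypothetical monitor: consumes events one at a time and keeps only a summary."""

    def __init__(self):
        # (target object, method name) -> number of calls still awaiting a return
        self.pending = {}
        self.violated = False

    def step(self, kind, target, method):
        """Process one parameterized event; the trace itself is never stored."""
        key = (target, method)
        if kind == "call":
            self.pending[key] = self.pending.get(key, 0) + 1
        elif kind == "return":
            if self.pending.get(key, 0) == 0:
                self.violated = True  # a return with no matching call
            else:
                self.pending[key] -= 1
                if self.pending[key] == 0:
                    del self.pending[key]

    def verdict(self):
        """Evaluate the property at the end of the finite trace."""
        return not self.violated and not self.pending


monitor = CallReturnMonitor()
for event in [("call", "obj1", "open"), ("return", "obj1", "open")]:
    monitor.step(*event)
print(monitor.verdict())
```

    The memory used depends on the number of outstanding calls rather than on the length of the trace, which is the point of state-by-state monitoring.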

  6. Orbit IMU alignment: Error analysis

    NASA Technical Reports Server (NTRS)

    Corson, R. W.

    1980-01-01

    A comprehensive accuracy analysis of orbit inertial measurement unit (IMU) alignments using the shuttle star trackers was completed and the results are presented. Monte Carlo techniques were used in a computer simulation of the IMU alignment hardware and software systems to: (1) determine the expected Space Transportation System 1 Flight (STS-1) manual mode IMU alignment accuracy; (2) investigate the accuracy of alignments in later shuttle flights when the automatic mode of star acquisition may be used; and (3) verify that an analytical model previously used for estimating the alignment error is a valid model. The analysis results do not differ significantly from expectations. The standard deviation in the IMU alignment error for STS-1 alignments was determined to be 68 arc seconds per axis. This corresponds to a 99.7% probability that the magnitude of the total alignment error is less than 258 arc seconds.
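
    The link between a per-axis standard deviation and a bound on the total error magnitude can be sketched with a small Monte Carlo experiment. The sketch assumes, which the abstract does not state, that the three axis errors are independent zero-mean Gaussians with the same sigma; the function name, sample count, and seed are hypothetical. It is not the simulation of the IMU hardware and software described in the report.

```python
import math
import random


def total_error_quantile(sigma_arcsec, prob, n_samples=200_000, seed=1):
    """Estimate the `prob` quantile of the three-axis error magnitude.

    Assumption: independent, zero-mean Gaussian errors with equal sigma on each axis.
    """
    rng = random.Random(seed)
    magnitudes = sorted(
        math.sqrt(sum(rng.gauss(0.0, sigma_arcsec) ** 2 for _ in range(3)))
        for _ in range(n_samples)
    )
    index = min(n_samples - 1, int(prob * n_samples))
    return magnitudes[index]


q = total_error_quantile(68.0, 0.997)
print(f"99.7% of simulated total errors fall below about {q:.0f} arc seconds")
```

    Under these simplifying assumptions the estimate falls in the 250 to 260 arc second range, the same neighborhood as the quoted 258 arc seconds; exact agreement is not expected, since the study modeled the actual alignment system.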

  7. Nuclear reactor alignment plate configuration

    DOEpatents

    Altman, David A; Forsyth, David R; Smith, Richard E; Singleton, Norman R

    2014-01-28

    An alignment plate that is attached to a core barrel of a pressurized water reactor and fits within slots within a top plate of a lower core shroud and upper core plate to maintain lateral alignment of the reactor internals. The alignment plate is connected to the core barrel through two vertically-spaced dowel pins that extend from the outside surface of the core barrel through a reinforcement pad and into corresponding holes in the alignment plate. Additionally, threaded fasteners are inserted around the perimeter of the reinforcement pad and into the alignment plate to further secure the alignment plate to the core barrel. A fillet weld also is deposited around the perimeter of the reinforcement pad. To accommodate thermal growth between the alignment plate and the core barrel, a gap is left above, below and at both sides of one of the dowel pins in the alignment plate holes through which the dowel pins pass.

  8. Alignment reference device

    DOEpatents

    Patton, Gail Y.; Torgerson, Darrel D.

    1987-01-01

    An alignment reference device provides a collimated laser beam that minimizes angular deviations therein. A laser beam source outputs the beam into a single mode optical fiber. The output end of the optical fiber acts as a source of radiant energy and is positioned at the focal point of a lens system where the focal point is positioned within the lens. The output beam reflects off a mirror back to the lens that produces a collimated beam.

  9. Automated whole-genome multiple alignment of rat, mouse, and human

    SciTech Connect

    Brudno, Michael; Poliakov, Alexander; Salamov, Asaf; Cooper, Gregory M.; Sidow, Arend; Rubin, Edward M.; Solovyev, Victor; Batzoglou, Serafim; Dubchak, Inna

    2004-07-04

    We have built a whole genome multiple alignment of the three currently available mammalian genomes using a fully automated pipeline which combines the local/global approach of the Berkeley Genome Pipeline and the LAGAN program. The strategy is based on progressive alignment, and consists of two main steps: (1) alignment of the mouse and rat genomes; and (2) alignment of human to either the mouse-rat alignments from step 1, or the remaining unaligned mouse and rat sequences. The resulting alignments demonstrate high sensitivity, with 87% of all human gene-coding areas aligned in both mouse and rat. The specificity is also high: <7% of the rat contigs are aligned to multiple places in human and 97% of all alignments with human sequence > 100kb agree with a three-way synteny map built independently using predicted exons in the three genomes. At the nucleotide level <1% of the rat nucleotides are mapped to multiple places in the human sequence in the alignment; and 96.5% of human nucleotides within all alignments agree with the synteny map. The alignments are publicly available online, with visualization through the novel Multi-VISTA browser that we also present.

  10. Dynamic Alignment at SLS

    SciTech Connect

    Ruland, Robert E.

    2003-04-23

    The relative alignment of components in the storage ring of the Swiss Light Source (SLS) is guaranteed by mechanical means. The magnets are rigidly fixed to 48 girders by means of alignment rails with tolerances of less than ±15 µm. The bending magnets, supported by 3 point ball bearings, overlap adjacent girders and thus establish virtual train links between the girders, located near the bending magnet centres. Keeping the distortion of the storage ring geometry within a tolerance of ±100 µm in order to guarantee sufficient dynamic apertures, requires continuous monitoring and correction of the girder locations. Two monitoring systems for the horizontal and the vertical direction will be installed to measure displacements of the train link between girders, which are due to ground settings and temperature effects: The hydrostatic levelling system (HLS) gives an absolute vertical reference, while the horizontal positioning system (HPS), which employs low cost linear encoders with sub-micron resolution, measures relative horizontal movements. The girder mover system based on five DC motors per girder allows a dynamic realignment of the storage ring within a working window of more than ±1 mm for girder translations and ±1 mrad for rotations. We will describe both monitoring systems (HLS and HPS) as well as the applied correction scheme based on the girder movers. We also show simulations indicating that beam based girder alignment takes care of most of the static closed orbit correction.

  11. Approximate protein structural alignment in polynomial time.

    PubMed

    Kolodny, Rachel; Linial, Nathan

    2004-08-17

    Alignment of protein structures is a fundamental task in computational molecular biology. Good structural alignments can help detect distant evolutionary relationships that are hard or impossible to discern from protein sequences alone. Here, we study the structural alignment problem as a family of optimization problems and develop an approximate polynomial-time algorithm to solve them. For a commonly used scoring function, the algorithm runs in O(n^10/ε^6) time, for globular protein of length n, and it detects alignments that score within an additive error of ε from all optima. Thus, we prove that this task is computationally feasible, although the method that we introduce is too slow to be a useful everyday tool. We argue that such approximate solutions are, in fact, of greater interest than exact ones because of the noisy nature of experimentally determined protein coordinates. The measurement of similarity between a pair of protein structures used by our algorithm involves the Euclidean distance between the structures (appropriately rigidly transformed). We show that an alternative approach, which relies on internal distance matrices, must incorporate sophisticated geometric ingredients if it is to guarantee optimality and run in polynomial time. We use these observations to visualize the scoring function for several real instances of the problem. Our investigations yield insights on the computational complexity of protein alignment under various scoring functions. These insights can be used in the design of scoring functions for which the optimum can be approximated efficiently and perhaps in the development of efficient algorithms for the multiple structural alignment problem. PMID:15304646

  12. Alignment and alignment transition of bent core nematics

    NASA Astrophysics Data System (ADS)

    Elamain, Omaima; Hegde, Gurumurthy; Komitov, Lachezar

    2013-07-01

    We report on the alignment of nematics consisting of bimesogen bent core molecules of chlorine substituent of benzene derivative and their binary mixture with rod like nematics. It was found that the alignment layer made from polyimide material, which is usually used for promoting vertical (homeotropic) alignment of rod like nematics, promotes instead a planar alignment of the bent core nematic and its nematic mixtures. At higher concentration of the rod like nematic component in these mixtures, a temperature driven transition from vertical to planar alignment was found near the transition to isotropic phase.

  13. Polar cap arcs: Sun-aligned or cusp-aligned?

    NASA Astrophysics Data System (ADS)

    Zhang, Y.; Paxton, L. J.; Zhang, Qinghe; Xing, Zanyang

    2016-08-01

    Polar cap arcs are often called sun-aligned arcs. Satellite observations reveal that polar cap arcs join together at the cusp and are actually cusp aligned. Strong ionospheric plasma velocity shears, and thus field-aligned currents, were associated with polar arcs, and they were likely caused by Kelvin-Helmholtz waves around the low-latitude magnetopause under a northward IMF Bz. The magnetic field lines around the magnetopause join together in the cusp region, and so do the field-aligned currents and particle precipitation. This explains why polar arcs are cusp aligned.

  14. COS to FGS Alignment {NUV}

    NASA Astrophysics Data System (ADS)

    Hartig, George

    2009-07-01

    DESCRIPTION: In order to determine the location of the COS reference frame with respect to the FGS reference frames, NUV MIRRORA images will be obtained of an astrometric target and field. Astrometric guide stars and targets must be employed for this activity in order to facilitate the alignment with the FGS. Images will be obtained at the initial pointing and at positions offset in V2 and in V3. Starting with the original blind pointing, obtain MIRRORA image exposures in a 5x5 POS-TARG grid centered on initial pointing; repeat the image sequence at two bracketing focus positions in same visit. Following completion of third pattern, return to nominal focus and perform 5x5 ACQ/SEARCH target acquisition and obtain one TIME-TAG MIRRORA image and one ACCUM verification exposure. Next perform an ACQ/IMAGE target acquisition followed by an ACCUM verification exposure. Also obtain ACCUM verification exposure for each of the two alternate focus positions used previously. Using MIRRORB obtain ACCUM confirmation image at nominal focus and ACCUM images at alternate focus positions and then perform an ACQ/IMAGE and confirming image at nominal focus. Analyze imagery, uplink pointing offset as offset 11469A and adjust nominal focus via patchable constant uplinked with subsequent visit of this program; update aperture locations via modified SIAF file uplinked with subsequent SMS. Use updated focus and offset pointing as input for COS 09 {program 11469 - NUV Optics Alignment and Focus} {note the SIAF update is not a prerequisite for COS 09 to proceed, but the pointing offset and focus update are}.

  15. Aspect-Oriented Subprogram Synthesizes UML Sequence Diagrams

    NASA Technical Reports Server (NTRS)

    Barry, Matthew R.; Osborne, Richard N.

    2006-01-01

    The Rational Sequence computer program described elsewhere includes a subprogram that utilizes the capability for aspect-oriented programming when that capability is present. This subprogram is denoted the Rational Sequence (AspectJ) component because it uses AspectJ, which is an extension of the Java programming language that introduces aspect-oriented programming techniques into the language.

  16. Building interactive virtual environments for simulated training in medicine using VRML and Java/JavaScript.

    PubMed

    Korocsec, D; Holobar, A; Divjak, M; Zazula, D

    2005-12-01

    Medicine is a difficult thing to learn. Experimenting with real patients should not be the only option; simulation deserves special attention here. Virtual Reality Modelling Language (VRML) as a tool for building virtual objects and scenes has a good record of educational applications in medicine, especially for static and animated visualisations of body parts and organs. However, to create computer simulations resembling situations in real environments the required level of interactivity and dynamics is difficult to achieve. In the present paper we describe some approaches and techniques which we used to push the limits of the current VRML technology further toward dynamic 3D representation of virtual environments (VEs). Our demonstration is based on the implementation of a virtual baby model, whose vital signs can be controlled from an external Java application. The main contributions of this work are: (a) outline and evaluation of the three-level VRML/Java implementation of the dynamic virtual environment, (b) proposal for a modified VRML TimeSensor node, which greatly improves the overall control of system performance, and (c) architecture of the prototype distributed virtual environment for training in neonatal resuscitation comprising the interactive virtual newborn, active bedside monitor for vital signs and full 3D representation of the surgery room. PMID:16520145

  18. Improving ASM stepper alignment accuracy by alignment signal intensity simulation

    NASA Astrophysics Data System (ADS)

    Li, Gerald; Pushpala, Sagar M.; Bradford, Bradley; Peng, Zezhong; Gottipati, Mohan

    1993-08-01

    As photolithography technology advances into the submicron regime, the requirement for alignment accuracy also becomes much tighter. The alignment accuracy is a function of the strength of the alignment signal. Therefore, a detailed alignment signal intensity simulation for the 0.8 µm EPROM poly-1 layer on an ASM stepper was done based on the process of record in the fab to reduce misalignment and improve die yield. Oxide thickness variation did not have significant impact on the alignment signal intensity. However, poly-1 thickness was the most important parameter to affect optical alignments. The real alignment intensity data versus resist thickness on production wafers was collected and it showed good agreement with the simulated results. Similar results were obtained for the ONO dielectric layer at a different fab.

  19. SWAMP+: multiple subsequence alignment using associative massive parallelism

    SciTech Connect

    Steinfadt, Shannon Irene; Baker, Johnnie W

    2010-10-18

    A new parallel algorithm SWAMP+ incorporates the Smith-Waterman sequence alignment on an associative parallel model known as ASC. It is a highly sensitive parallel approach that expands traditional pairwise sequence alignment. This is the first parallel algorithm to provide multiple non-overlapping, non-intersecting subsequence alignments with the accuracy of Smith-Waterman. The efficient algorithm provides multiple alignments similar to BLAST while creating a better workflow for the end users. The parallel portions of the code run in O(m+n) time using m processors. When m = n, the algorithmic analysis becomes O(n) with a coefficient of two, yielding a linear speedup. Implementation of the algorithm on the SIMD ClearSpeed CSX620 confirms this theoretical linear speedup with real timings.
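
    For reference, the Smith-Waterman recurrence that SWAMP+ builds on can be sketched sequentially in Python. This is the textbook local-alignment scoring scheme, not the associative (ASC) parallel algorithm of the paper; the function name and the scoring parameters (match 2, mismatch -1, linear gap -1) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score of strings a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

    Each cell depends only on its upper, left, and upper-left neighbors, so all cells on one anti-diagonal are independent of each other. Sweeping the m + n - 1 anti-diagonals in order is one common way to parallelize this recurrence, and it is consistent with the O(m+n) bound quoted above.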

  20. A divide and conquer approach to multiple alignment.

    PubMed

    Dress, A; Füllen, G; Perrey, S

    1995-01-01

    We present a report on work in progress on a divide and conquer approach to multiple alignment. The algorithm makes use of the costs calculated from applying the standard dynamic programming scheme to all pairs of sequences. The resulting cost matrices for pairwise alignment give rise to secondary matrices containing the additional costs imposed by fixing the path through the dynamic programming graph at a particular vertex. Such a constraint corresponds to a division of the problem obtained by slicing both sequences between two particular positions, and aligning the two sequences on the left and the two sequences on the right, charging for gaps introduced at the slicing point. To obtain an estimate for the additional cost imposed by forcing the multiple alignment through a particular vertex in the whole hypercube, we will take a (weighted) sum of secondary costs over all pairwise projections of the division of the problem, as defined by this vertex, that is, by slicing all sequences at the points suggested by the vertex. We then use that partition of every single sequence under consideration into two 'halves' which imposes a minimal (weighted) sum of pairwise additional costs, making sure that one of the sequences is divided somewhere close to its midpoint. Hence, each iteration can cut the problem size in half. As the enumeration of all possible partitions may restrict this approach to small-size problems, we eliminate futile partitions, and organize their enumeration in a way that starts with the most promising ones.(ABSTRACT TRUNCATED AT 250 WORDS)
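
    The secondary matrix of additional costs for one pair of sequences can be sketched in Python as follows. The sketch assumes a simple cost model (mismatch 1, linear gap 1, match 0) that is not taken from the paper, and the function names are hypothetical; it covers only the pairwise step, not the weighted sum over all pairs or the pruning of partitions.

```python
def prefix_costs(s, t, mismatch=1, gap=1):
    """D[i][j] = minimal cost of aligning s[:i] with t[:j] (standard dynamic programming)."""
    m, n = len(s), len(t)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            options = []
            if i > 0:
                options.append(D[i - 1][j] + gap)
            if j > 0:
                options.append(D[i][j - 1] + gap)
            if i > 0 and j > 0:
                options.append(D[i - 1][j - 1] + (0 if s[i - 1] == t[j - 1] else mismatch))
            D[i][j] = min(options)
    return D


def additional_costs(s, t):
    """C[i][j] = extra cost of forcing the alignment path through vertex (i, j)."""
    m, n = len(s), len(t)
    forward = prefix_costs(s, t)
    # Suffix costs come from aligning the reversed strings:
    # backward[m - i][n - j] is the cost of aligning s[i:] with t[j:].
    backward = prefix_costs(s[::-1], t[::-1])
    optimum = forward[m][n]
    return [
        [forward[i][j] + backward[m - i][n - j] - optimum for j in range(n + 1)]
        for i in range(m + 1)
    ]


C = additional_costs("ACGT", "AGT")
mid = len("ACGT") // 2
best_cut = min(range(len(C[mid])), key=lambda j: C[mid][j])
print(best_cut, C[mid][best_cut])
```

    Every entry is non-negative, and entries on an optimal pairwise path are zero, so slicing the first sequence at its midpoint and the second at a zero-cost column loses nothing for that pair. In the multiple-alignment setting described above, these pairwise matrices would be summed with weights over all pairs before choosing the slicing points.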